diff --git a/.claude/.gitignore b/.claude/.gitignore new file mode 100644 index 00000000..1b6c8fc3 --- /dev/null +++ b/.claude/.gitignore @@ -0,0 +1,4 @@ +* +!skills/ +!skills/**/* +!CLAUDE.md diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md new file mode 100644 index 00000000..28ea2439 --- /dev/null +++ b/.claude/CLAUDE.md @@ -0,0 +1,111 @@ +# Buckyball + +基于 RISC-V 的 DSA(Domain Specific Architecture)框架。Chisel 6.5.0,Nix Flake 构建。 + +## 项目结构 + +- `arch/src/main/scala/framework/` — 框架核心 + - `balldomain/prototype/` — Ball 算子实现(每个 Ball 一个子目录) + - `balldomain/blink/` — Blink 协议定义(BlinkIO、BankRead/Write、BallStatus) + - `balldomain/configs/` — BallDomainParam + default.json(ballIdMappings) + - `balldomain/bbus/` — BBus 总线 + - `balldomain/rs/` — BallRsIssue / BallRsComplete(命令/完成接口) + - `memdomain/backend/banks/` — SramReadIO / SramWriteIO + - `core/bbtile/` — BBTile 集成(Rocket core + Buckyball) + - `top/` — GlobalConfig(顶层参数汇聚) +- `arch/src/main/scala/examples/toy/balldomain/` — toy 配置 + - `DISA.scala` — 指令 opcode(funct7 BitPat) + - `DomainDecoder.scala` — 指令解码表(ListLookup) + - `bbus/busRegister.scala` — Ball 生成器注册(match case) +- `arch/src/main/scala/sims/` — 仿真配置 + - `verilator/` — Verilator 配置 + - `verify/` — 单 Ball elaboration(BallTopMain) +- `bb-tests/` — 测试 + - `workloads/lib/bbhw/isa/` — ISA C 宏(每条指令一个 .c 文件) + - `workloads/src/CTest/toy/` — C 测试用例 + - `sardine/` — pytest 测试框架 +- `bbdev/` — 开发工具链(Motia 工作流后端) + +## Blink 协议 + +Ball 通过 Blink 协议接入 BBus。每个 Ball 实现 `HasBlink` trait。 + +``` +BlinkIO(b: GlobalConfig, inBW: Int, outBW: Int): + cmdReq: Flipped(Decoupled(BallRsIssue)) // 命令输入(含 BallDecodeCmd + rob_id) + cmdResp: Decoupled(BallRsComplete) // 完成输出(含 rob_id) + bankRead: Vec(inBW, Flipped(BankRead)) // SRAM 读端口 + bankWrite: Vec(outBW, Flipped(BankWrite)) // SRAM 写端口 + status: BallStatus { idle, running } // 状态信号 + +BankRead/BankWrite 元数据字段(均为 Input): + bank_id, rob_id, ball_id, group_id + +SramReadIO: req.valid/ready + req.bits.addr → resp.valid + 
resp.bits.data +SramWriteIO: req.valid/ready + req.bits(addr, data, mask, wmode) → resp.valid + resp.bits.ok +``` + +关键时序:SRAM 读延迟 = 1 cycle(req.fire 后下一周期 resp.valid 拉高)。 + +## 注册不变量 + +添加或修改 Ball 注册时,以下 6 项必须同时满足: + +1. `default.json` 的 `ballNum` == `ballIdMappings` 数组长度 +2. `ballId` 严格递增(0, 1, 2, ...),不跳号 +3. `ballId` 无重复 +4. `DISA.scala` 中 funct7 无重复 +5. `busRegister.scala` 中的 case 名称集合 == `default.json` 中的 ballName 集合 +6. `DomainDecoder.scala` 中的 BID 值集合 == `default.json` 中的 ballId 集合 + +用 `/check` skill 可以自动检查所有不变量并可选自动修复。 + +## MCP 工具 + +项目配置了 `buckyball-dev` MCP Server,提供以下工具。 + +**重要:编译、仿真、综合、测试等操作必须通过 MCP 工具调用,禁止直接使用 bbdev CLI 或 nix develop 命令。** +bbdev CLI 是给人类程序员用的,MCP 工具是给 agent 用的——MCP 工具内部会自动管理 bbdev server 生命周期并通过 HTTP API 调用。 + +### 校验 +- `validate` — 检查 6 项注册不变量 + +### bbdev API 封装(自动管理 server 生命周期) +- `bbdev_workload_build` — 编译 CTest +- `bbdev_verilator_run(binary, config?, batch?, coverage?)` — 全流程 clean→verilog→build→sim +- `bbdev_verilator_verilog(config, balltype?)` — 生成 Verilog;`config` 必传,`balltype` 可选(单 Ball elaborate) +- `bbdev_verilator_build(jobs?, coverage?)` — 编译 Verilator 仿真器 +- `bbdev_verilator_sim(binary, batch?, coverage?)` — 跑仿真(需先 build) +- `bbdev_sardine_run(workload?, coverage?)` — 批量测试 +- `bbdev_yosys_synth(top?, config?)` — Yosys 综合 + OpenSTA 时序分析 + +默认 config 值:`sims.verilator.BuckyballToyVerilatorConfig` +仿真 binary 命名格式:`ctest__test_singlecore-baremetal` + +### 分析报告路径 +- 面积报告:`bbdev/api/steps/yosys/log/hierarchy_report.txt`(子模块分解)、`area_report.txt`(顶层) +- 时序报告:`bbdev/api/steps/yosys/log/timing_report.txt` +- 覆盖率报告:`bb-tests/sardine/reports/coverage/html/` +- 仿真日志:`arch/log//stdout.log`、`disasm.log` +- bdb 调试日志:`arch/log//bdb.log`,包含三种 DPI-C trace: + - `[ITRACE]` — 指令发射/完成 + - `[MTRACE]` — SRAM 读写 + - `[PMCTRACE]` — Ball/Mem 性能计数(elapsed cycles) + +## Skills + +项目 skill 位于 `.claude/skills/` 下: +- `/ball` — 创建新 Ball 算子(全流程:实现→注册→ISA→CTest→仿真) +- `/check` — 注册状态校验 + 自动修复 +- `/verify` — Ball 功能验证(编译→仿真→覆盖率→PMC 分析) +- 
`/optimize` — RTL 模块面积/延迟优化(适用于任意模块,不限 Ball) +- `/debug` — 仿真调试(日志分析→波形→故障模式库) +- `/waveform` — 波形分析(waveform-mcp 使用指南) + +## 约定 + +- 改 Ball 实现时不要碰注册文件,改注册文件时不要碰实现文件 +- Chisel 版本 6.5.0,不要使用 6.6+ 新 API +- CTest 用 `add_cross_platform_test_target` 注册到 CMakeLists.txt +- **禁止直接调用 `bbdev` CLI 或 `nix develop -c bbdev ...`**,必须通过 MCP 工具调用 +- Ball wrapper 类名必须和 `default.json` 中 `ballName` 一致 diff --git a/.claude/skills/ball/SKILL.md b/.claude/skills/ball/SKILL.md new file mode 100644 index 00000000..416dea2b --- /dev/null +++ b/.claude/skills/ball/SKILL.md @@ -0,0 +1,285 @@ +--- +name: ball +description: 创建一个名为 $ARGUMENTS 的新 Buckyball Ball 算子,完成从实现到验证的全流程。当用户要求创建新 Ball、新算子、新加速单元,或说"加一个 XX Ball"、"实现 XX 操作"时,使用此 skill。即使用户没有明确说"Ball",只要意图是在 Buckyball 框架中增加一个新的计算算子,都应触发。 +--- + +**重要:编译、仿真等操作必须通过 MCP 工具(validate、bbdev_workload_build、bbdev_verilator_run 等)调用,禁止直接使用 bbdev CLI 或 nix develop 命令。** + +## 阶段 1 — 需求收集 + +1. 读取当前注册状态,确定新 Ball 的 ballId 和 funct7: + - `arch/src/main/scala/framework/balldomain/configs/default.json` + - `arch/src/main/scala/examples/toy/balldomain/DISA.scala` +2. 检查是否已部分存在(增量模式): + - 搜索 `arch/src/main/scala/framework/balldomain/prototype/` 下是否有同名目录 + - 搜索 `bb-tests/workloads/lib/bbhw/isa/` 下是否有对应的 ISA 宏文件 + - 搜索 `bb-tests/workloads/src/CTest/toy/` 下是否有对应的 CTest + - 如果部分文件已存在,只补齐缺失部分 +3. 向用户确认以下信息: + - Ball 的计算语义(做什么运算) + - inBW / outBW(读/写 bank 端口数量) + - 是否需要第二个操作数(op2) + - iter(迭代次数)的含义 + +## 阶段 2 — 实现 Ball + +1. 读取参考代码,理解 Blink 协议和现有 Ball 的写法: + - 简单参考:`arch/src/main/scala/framework/balldomain/prototype/relu/ReluBall.scala` 和 `Relu.scala` + - 复杂参考:`arch/src/main/scala/framework/balldomain/prototype/systolicarray/` + - Blink 协议:`arch/src/main/scala/framework/balldomain/blink/blink.scala`、`bank.scala`、`status.scala` + - SRAM 接口:`arch/src/main/scala/framework/memdomain/backend/banks/SramIO.scala` +2. 
在 `arch/src/main/scala/framework/balldomain/prototype//` 下创建文件,使用 `references/` 中的模板作为起点。 + +### Ball wrapper 模板(`Ball.scala`) + +```scala +package framework.balldomain.prototype. + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink} +import framework.top.GlobalConfig + +@instantiable +class Ball(val b: GlobalConfig) extends Module with HasBlink { + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "Ball") + .getOrElse(throw new IllegalArgumentException("Ball not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + def blink: BlinkIO = io + + val core: Instance[] = Instantiate(new (b)) + + core.io.cmdReq <> io.cmdReq + core.io.cmdResp <> io.cmdResp + for (i <- 0 until inBW) { core.io.bankRead(i) <> io.bankRead(i) } + for (i <- 0 until outBW) { core.io.bankWrite(i) <> io.bankWrite(i) } + io.status <> core.io.status +} +``` + +### Core 模板(`.scala`) + +```scala +package framework.balldomain.prototype. 
+ +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.balldomain.blink.{BallStatus, BankRead, BankWrite} +import framework.top.GlobalConfig + +@instantiable +class (val b: GlobalConfig) extends Module { + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "Ball") + .getOrElse(throw new IllegalArgumentException("Ball not found in config")) + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + // Latch rob_id on cmdReq.fire + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + when(io.cmdReq.fire) { rob_id_reg := io.cmdReq.bits.rob_id } + + // Propagate rob_id to bank metadata + for (i <- 0 until inBW) { + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U + } + + // FSM: idle -> read -> compute -> write -> complete -> idle + val idle :: sRead :: sCompute :: sWrite :: complete :: Nil = Enum(5) + val state = RegInit(idle) + + // Default port assignments (override in FSM states) + for (i <- 0 until inBW) { + io.bankRead(i).io.req.valid := false.B + io.bankRead(i).io.req.bits.addr := 0.U + io.bankRead(i).io.resp.ready := false.B + io.bankRead(i).bank_id := 0.U + io.bankRead(i).group_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).io.req.valid := false.B + io.bankWrite(i).io.req.bits.addr := 0.U + io.bankWrite(i).io.req.bits.data := 0.U + io.bankWrite(i).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(0.U(1.W))) + io.bankWrite(i).io.req.bits.wmode := false.B + 
io.bankWrite(i).io.resp.ready := false.B + io.bankWrite(i).bank_id := 0.U + io.bankWrite(i).group_id := 0.U + } + + io.cmdReq.ready := state === idle + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := rob_id_reg + + // Latch command fields + val rbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val wbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val iter_reg = RegInit(0.U(b.frontend.iter_len.W)) + + switch(state) { + is(idle) { + when(io.cmdReq.fire) { + rbank_reg := io.cmdReq.bits.cmd.op1_bank + wbank_reg := io.cmdReq.bits.cmd.wr_bank + iter_reg := io.cmdReq.bits.cmd.iter + state := sRead + } + } + is(sRead) { + // TODO: implement read logic + // Key: SRAM read latency = 1 cycle (resp.valid on next cycle after req.fire) + } + is(sCompute) { + // TODO: implement compute logic + } + is(sWrite) { + // TODO: implement write logic + } + is(complete) { + io.cmdResp.valid := true.B + io.cmdResp.bits.rob_id := rob_id_reg + when(io.cmdResp.fire) { state := idle } + } + } + + io.status.idle := (state === idle) + io.status.running := (state =/= idle) && (state =/= complete) +} +``` + +### Param 模板(`configs/BallParam.scala`) + +```scala +package framework.balldomain.prototype..configs + +import upickle.default._ + +case class BallParam( + // TODO: add Ball-specific parameters +) + +object BallParam { + implicit val rw: ReadWriter[BallParam] = macroRW + + def apply(): BallParam = { + val jsonStr = scala.io.Source + .fromFile("src/main/scala/framework/balldomain/prototype//configs/default.json") + .mkString + read[BallParam](jsonStr) + } +} +``` + +关键约束: +- SRAM 读延迟 = 1 cycle(req.fire 后下一周期 resp.valid) +- cmdReq.fire 时 latch 命令字段到寄存器 +- FSM 基本模式:idle → 读数据 → 计算 → 写数据 → complete → idle +- status.idle 和 status.running 必须映射 FSM 状态 + +## 阶段 3 — 注册 + +按顺序更新以下四个文件: +1. `arch/src/main/scala/framework/balldomain/configs/default.json` — 追加 ballIdMappings 条目,更新 ballNum +2. 
`arch/src/main/scala/examples/toy/balldomain/bbus/busRegister.scala` — 添加 import 和 match case +3. `arch/src/main/scala/examples/toy/balldomain/DISA.scala` — 添加 `val XXX = BitPat("bxxxxxxx")` +4. `arch/src/main/scala/examples/toy/balldomain/DomainDecoder.scala` — 添加 ListLookup 解码行,BID = ballId.U + +## 阶段 4 — ISA C 宏 + +在 `bb-tests/workloads/lib/bbhw/isa/` 下创建 `_.c`: + +```c +#ifndef _BB__H_ +#define _BB__H_ + +#include "isa.h" + +#define BB__FUNC7 + +#define bb_(bank_id, wr_bank_id, iter) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(bank_id) | BB_BANK2(wr_bank_id) | \ + BB_RD0 | BB_WR | BB_ITER(iter)), \ + 0, BB__FUNC7) + +#endif // _BB__H_ +``` + +在 `bb-tests/workloads/lib/bbhw/isa/isa.h` 中 `#include` 新文件。 + +## 阶段 5 — CTest + +在 `bb-tests/workloads/src/CTest/toy/` 下创建 `_test.c`: + +```c +#include "buckyball.h" +#include +#include +#include +#include +#include + +#define DIM 16 + +// Fixed input and expected output matrices +static elem_t input_matrix[DIM * DIM] __attribute__((aligned(64))) = { /* ... */ }; +static elem_t expected_matrix[DIM * DIM] __attribute__((aligned(64))) = { /* ... 
*/ }; +static elem_t output_matrix[DIM * DIM] __attribute__((aligned(64))); + +void hw_(const char *test_name, elem_t *a, elem_t *b, int size) { + uint32_t op1_bank_id = 0; + uint32_t wr_bank_id = 1; + + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(wr_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_(op1_bank_id, wr_bank_id, size); + bb_mvout((uintptr_t)b, wr_bank_id, size, 1); + bb_fence(); +} + +int main() { + clear_i8_matrix(output_matrix, DIM, DIM); + hw_("", input_matrix, output_matrix, DIM); + + if (compare_i8_matrices(output_matrix, expected_matrix, DIM, DIM)) { + printf(" test PASSED\n"); + return 0; + } else { + printf(" test FAILED\n"); + return 1; + } +} +``` + +在 `bb-tests/workloads/src/CTest/toy/CMakeLists.txt` 中用 `add_cross_platform_test_target` 注册。 +在 `bb-tests/sardine/tests/test_ctest.py` 的 `ctest_workloads` 列表中追加对应条目。 + +## 阶段 6 — 校验 + 编译 + 仿真 + +1. 调用 MCP 工具 `validate` 做静态校验,确认 6 项不变量全部通过 +2. 调用 MCP 工具 `bbdev_workload_build` 编译 CTest +3. 调用 MCP 工具 `bbdev_verilator_run` 跑仿真,指定新 Ball 的 CTest binary +4. 解析仿真结果: + - PASSED → 完成 + - FAILED → 使用 `/debug` skill 进入调试流程 diff --git a/.claude/skills/check/SKILL.md b/.claude/skills/check/SKILL.md new file mode 100644 index 00000000..6ffc9edd --- /dev/null +++ b/.claude/skills/check/SKILL.md @@ -0,0 +1,42 @@ +--- +name: check +description: 对 Buckyball Ball 注册状态做静态校验,并可选自动修复不一致。当用户要求检查注册状态、校验 Ball 配置、排查注册问题,或在修改注册文件后需要验证一致性时使用此 skill。 +--- + +## 校验流程 + +调用 MCP 工具 `validate`,检查以下 6 项注册不变量: +1. ballNum == ballIdMappings 数组长度 +2. ballId 严格递增(0, 1, 2, ...),不跳号 +3. ballId 无重复 +4. DISA.scala funct7 无重复 +5. busRegister.scala case 名称与 default.json ballName 一致 +6. 
DomainDecoder.scala BID 与 default.json ballId 一致 + +每项报告 pass/fail 状态。 + +## 注册状态概览 + +校验后生成一张汇总表,展示当前所有 Ball 的注册信息。数据源: + +- `arch/src/main/scala/framework/balldomain/configs/default.json` — ballId, ballName, inBW, outBW +- `arch/src/main/scala/examples/toy/balldomain/DISA.scala` — funct7 值 +- `arch/src/main/scala/examples/toy/balldomain/DomainDecoder.scala` — 解码行中的 BID + +表格格式: + +| ballId | ballName | funct7 | inBW | outBW | DISA | busReg | Decoder | +|--------|----------|--------|------|-------|------|--------|---------| +| 0 | VecBall | 32 | 2 | 4 | ok | ok | ok | +| ... | ... | ... | ... | ... | ... | ... | ... | + +## 自动修复 + +如果校验发现不一致,且属于以下可确定性修复的类型,提示用户是否自动修复: + +1. **ballNum 不匹配** — 自动将 ballNum 更新为 ballIdMappings 数组长度 +2. **ballId 不连续** — 自动重编号为 0, 1, 2, ...(同步更新 DomainDecoder.scala 中的 BID) +3. **busRegister.scala 缺少 case** — 提示缺少哪些 Ball,给出需要添加的 import 和 match case 代码 +4. **DomainDecoder.scala BID 不一致** — 自动更新 BID 值与 default.json 匹配 + +对于无法自动修复的问题(如 funct7 冲突),给出原因分析和手动修复建议。 diff --git a/.claude/skills/debug/SKILL.md b/.claude/skills/debug/SKILL.md new file mode 100644 index 00000000..62bac63d --- /dev/null +++ b/.claude/skills/debug/SKILL.md @@ -0,0 +1,116 @@ +--- +name: debug +description: 系统性调试 Buckyball 仿真失败。当仿真返回 FAILED、CTest 测试不通过、Chisel 编译报错,或用户报告 Ball 行为异常时使用此 skill。涵盖日志分析、波形分析、常见故障模式匹配。即使用户只是说"仿真失败了"或"跑不过"也应触发。 +--- + +**重要:编译、仿真等操作必须通过 MCP 工具调用,禁止直接使用 bbdev CLI 或 nix develop 命令。** + +## 第一步 — 定位日志 + +1. 找到日志目录: + - MCP 工具返回的 JSON 中包含 `log_dir` 字段 + - 如果没有,用 `ls -t arch/log/ | head -5` 找最近的日志目录 +2. 
确认关键日志文件存在: + - `stdout.log` — 程序标准输出(PASSED/FAILED、printf) + - `disasm.log` — 反汇编指令流 + - `bdb.log` — Buckyball 硬件调试日志(最重要) + - `bbdev/server.log` — bbdev server 日志(编译错误在这里) + +## 第二步 — 分层分析 + +按照从高层到底层的顺序分析,先排除简单问题: + +### Level 1: 编译错误(bbdev/server.log) + +如果 MCP 工具返回 HTTP 500 或 returncode=1,先看 server.log: +- Chisel 编译错误(类型不匹配、缺少注册等) +- mill 构建错误(依赖问题) +- CTest 编译错误(C 语法、链接问题) + +### Level 2: 程序输出(stdout.log) + +- 搜索 `PASSED` / `FAILED` 确认测试结果 +- 搜索 `printf` 输出,检查实际值 vs 预期值 +- 搜索 `panic` / `abort` / `trap` 看是否有异常 + +### Level 3: 指令流(disasm.log) + +- 确认 Ball 的 custom 指令确实被执行了(搜索 `custom3`) +- 检查指令顺序是否正确(mvin → ball_op → mvout → fence) +- 检查是否有 trap 或 exception + +### Level 4: 硬件跟踪(bdb.log) + +这是定位 Ball 逻辑错误最重要的日志,包含三种 trace: + +**[ITRACE] 指令 trace:** +- `ISSUE rob_id=X domain=Y funct=0xZZ` — 指令发射 +- `COMPLETE rob_id=X` — 指令完成 +- 检查:指令是否被发射?是否完成?完成顺序是否正确? + +**[MTRACE] 内存 trace:** +- `READ ch=X vbank=Y group=Z addr=0xAA` — SRAM 读 +- `WRITE ch=X vbank=Y group=Z addr=0xAA data=0x...` — SRAM 写 +- 检查:读写地址是否正确?数据是否正确?bank_id 是否匹配? + +**[PMCTRACE] 性能计数 trace:** +- `BALL ball_id=X rob_id=Y elapsed=Z` — Ball 操作耗时 +- `LOAD/STORE rob_id=X elapsed=Y` — 内存操作耗时 +- 检查:elapsed 是否合理?是否有异常长的操作? + +### Level 5: 波形分析(waveform-mcp) + +如果日志分析无法定位问题,使用 waveform-mcp 做 cycle-level 分析。详见 `/waveform` skill。 + +## 第三步 — 常见故障模式 + +### 1. Ball 没有响应(cmdResp 永远不 fire) +**症状:** 仿真超时或死锁,bdb.log 中有 ISSUE 但没有 COMPLETE +**原因:** +- FSM 卡在某个状态(检查 state 转移条件) +- SRAM resp.valid 没被处理(忘了 resp.ready := true.B) +- cmdResp.valid 没有被拉高 + +### 2. 数据全零 +**症状:** CTest 报 FAILED,输出矩阵全是 0 +**原因:** +- 写操作 addr 错误(waddr 没有递增) +- 写操作 mask 全零(忘了设置 mask := 1) +- bank_id 错误(写到了错误的 bank) + +### 3. 数据不变(输出 == 输入) +**症状:** CTest 报 FAILED,输出矩阵等于输入矩阵 +**原因:** +- 计算逻辑没有生效(跳过了 compute 状态) +- 读到的数据没有被处理就直接写回了 + +### 4. 部分数据错误 +**症状:** CTest 报 FAILED,部分行正确部分行错误 +**原因:** +- iter 次数计算错误(少读/少写了几行) +- 地址偏移量计算错误(行之间的 stride) +- 边界条件处理错误 + +### 5. 
SRAM 时序错误 +**症状:** 数据看起来"偏移了一行" +**原因:** +- SRAM 读延迟是 1 cycle,但代码在 req.fire 的同一个 cycle 就取了 resp.bits.data +- 正确做法:req.fire 后等一个 cycle,在下一个 cycle resp.valid 时读取数据 + +### 6. bank_id 冲突 +**症状:** assertion failure 或数据错乱 +**原因:** +- op1_bank 和 wr_bank 使用了同一个 bank(读写冲突) +- 多个 Ball 同时访问同一个 bank + +### 7. rob_id 不匹配 +**症状:** 指令完成顺序混乱 +**原因:** +- cmdResp.bits.rob_id 没有返回正确的 rob_id +- rob_id 没有在 cmdReq.fire 时被 latch + +## 第四步 — 修复 + 验证 + +1. 结合日志/波形分析定位到的问题,修改 Chisel 源码 +2. 重新通过 MCP 工具编译和仿真验证修复结果 +3. 如果修复引入新问题,回到第二步重新分析 diff --git a/.claude/skills/optimize/SKILL.md b/.claude/skills/optimize/SKILL.md new file mode 100644 index 00000000..ea59425e --- /dev/null +++ b/.claude/skills/optimize/SKILL.md @@ -0,0 +1,116 @@ +--- +name: optimize +description: 分析并优化名为 $ARGUMENTS 的 RTL 模块的延迟和面积。适用于任何 RTL 模块(Ball、MemFrontend、BBus、GlobalROB 等),不限于 Ball 算子。当用户要求优化某个模块的面积、延迟、时序、性能,或说"XX 太大了"、"XX 太慢了"、"分析一下 XX 的面积"时使用此 skill。 +--- + +**重要:综合、仿真等操作必须通过 MCP 工具调用,禁止直接使用 bbdev CLI 或 nix develop 命令。** + +## 阶段 1 — 基线测量 + +1. 调用 MCP 工具 `bbdev_yosys_synth` 跑 Yosys 综合 + OpenSTA 时序分析 +2. 读取面积报告:`bbdev/api/steps/yosys/log/hierarchy_report.txt`(子模块分解) +3. 读取时序报告:`bbdev/api/steps/yosys/log/timing_report.txt`(关键路径) + +### hierarchy_report.txt 解析指引 + +报告格式为 Yosys `stat -top` 输出,关键字段: +- `Chip area for module` 行后面是每个子模块的面积分解 +- `Number of cells` — 单元数量 +- `Sequential` — 寄存器面积(flip-flops) +- `Combinational` — 组合逻辑面积 +- 搜索目标模块名定位其层次结构 + +### timing_report.txt 解析指引 + +OpenSTA 输出格式: +- `Startpoint:` / `Endpoint:` — 关键路径起止点 +- `Path Delay` — 路径延迟 +- `Slack` — 时序裕量(负值表示违约) +- 搜索目标模块名看是否在关键路径上 + +### 延迟测量 + +**如果是 Ball 模块:** +1. 检查仿真 bdb.log 中的 PMC trace 数据(`[PMCTRACE] BALL ball_id=X`),提取 elapsed cycles +2. 如果 CTest 中有 `read_rdcycle()` 计时代码,也可以从 stdout.log 获取 cycle 数 +3. 如果两者都没有,可以用 waveform-mcp 精确测量: + - 用 `find_conditional_events` 找 `cmdReq.valid && cmdReq.ready` 的时刻 + - 用 `find_conditional_events` 找 `cmdResp.valid` 的时刻 + - 两者之差即为操作延迟 + +**如果是其他模块:** +1. 用 waveform-mcp 在波形上测量关键操作的 cycle 数 +2. 
或在 CTest 中加入 `read_rdcycle()` 前后对比代码 + +## 阶段 2 — 面积分析 + +从 hierarchy_report.txt 提取目标模块及其子模块的面积数据: +- 总面积(Chip area) +- Sequential 占比(寄存器面积) +- Combinational 占比(组合逻辑面积) +- 子模块面积排名 + +识别面积大户: +- Sequential 占比高 → 寄存器多,考虑是否可用 SRAM 替代 +- Combinational 占比高 → 逻辑复杂,考虑是否可以简化或共享 + +对比同类模块面积,找出效率差距。 + +## 阶段 3 — 时序/延迟分析 + +1. 从 timing_report.txt 看关键路径是否经过该模块,以及路径延迟 +2. 从 PMC trace 或波形数据看操作延迟(cycle 数) +3. 如果是 Ball 或有 FSM 的模块,读取 Chisel 源码分析 FSM: + - 绘制 FSM 状态转移图 + - 计算每个状态的 cycle 数(最佳/最坏情况) + - 识别瓶颈状态(哪个状态耗时最多) + - 分析 SRAM 读写模式(串行 vs 流水 vs 多端口并行) + +## 阶段 4 — 优化方案 + +提出可量化的优化方案,每个方案包含: +- 优化手段描述 +- 预期面积变化(参考 hierarchy_report 数据量化) +- 预期延迟变化(cycle 数) +- 预期频率影响 +- trade-off 说明 + +常见优化模式: + +**降延迟**: +- 读写流水化:让写操作和下一轮读操作重叠,面积略增,延迟显著降 +- 多 bank 端口并行读:利用 inBW > 1 同时读多行,面积不变,延迟与端口数成比例降 +- 去中间等待状态:合并不必要的 FSM 状态,面积略降 +- 边读边算:计算嵌入读响应周期,利用 SRAM 1-cycle 延迟的下降沿做计算 + +**降面积**: +- regArray 改 SRAM:大块寄存器阵列换成 SRAM 端口访问,面积显著降,可能增延迟 +- 共享计算单元:多操作时分复用同一计算单元,面积降,延迟可能增 +- 减少 counter 位宽:根据实际范围缩减位宽,面积微降 + +**提频率**: +- 拆分组合逻辑长路径:在关键路径中间插入寄存器,面积略增,可能增 1 cycle 延迟 + +列出方案后让用户选择。 + +## 阶段 5 — 实施 + +根据用户选择的方案修改 Chisel 代码。 + +## 阶段 6 — 优化后测量 + +1. 再次调用 `bbdev_yosys_synth`,对比 hierarchy_report +2. 如果有 CTest,再次调用 `bbdev_verilator_run` 跑仿真,对比 PMC trace 中的 cycle 数 +3. 调用 `validate` 确认注册一致性未被破坏 +4. 
输出优化前后对比报告: + +| 指标 | 优化前 | 优化后 | 变化 | +|------|--------|--------|------| +| 面积 | X | Y | -Z% | +| Cycle 数 | A | B | -C% | +| 关键路径延迟 | D | E | -F% | + +## 故障排查 + +如果 MCP 工具返回 HTTP 500 或 returncode=1: +- 读取 `bbdev/server.log` 获取详细错误信息 diff --git a/.claude/skills/verify/SKILL.md b/.claude/skills/verify/SKILL.md new file mode 100644 index 00000000..fb34ff41 --- /dev/null +++ b/.claude/skills/verify/SKILL.md @@ -0,0 +1,59 @@ +--- +name: verify +description: 验证名为 $ARGUMENTS 的 Ball 的功能正确性。当用户要求验证、测试某个 Ball,检查 Ball 是否工作正常,或说"验证 XX Ball"、"跑一下 XX 测试"时使用此 skill。也适用于用户新建完 Ball 后想确认其正确性的场景。 +--- + +**重要:编译、仿真、测试等操作必须通过 MCP 工具调用,禁止直接使用 bbdev CLI 或 nix develop 命令。** + +## 阶段 1 — 完整性检查 + +使用 `/check` 的逻辑检查注册一致性,然后检查以下各项是否存在,缺什么补什么: +1. Ball 实现:`arch/src/main/scala/framework/balldomain/prototype//` 目录是否存在 +2. 注册:`arch/src/main/scala/framework/balldomain/configs/default.json` 中是否有条目 +3. ISA 宏:`bb-tests/workloads/lib/bbhw/isa/` 下对应的 .c 文件 +4. CTest:`bb-tests/workloads/src/CTest/toy/` 下对应的 _test.c 文件 +5. sardine 列表:`bb-tests/sardine/tests/test_ctest.py` 的 ctest_workloads 列表 + +## 阶段 2 — 编译 + 仿真 + +1. 调用 MCP 工具 `bbdev_workload_build` 编译所有 CTest +2. 调用 MCP 工具 `bbdev_verilator_run` 仿真该 Ball 的 CTest + - binary 名称格式:`ctest__test_singlecore-baremetal` + - 设置 batch=true +3. 如果编译或仿真失败,使用 `/debug` skill 进入调试流程 + +## 阶段 3 — 覆盖率分析 + +1. 调用 MCP 工具 `bbdev_sardine_run`,设置 coverage=true +2. 读取覆盖率报告: + - 行覆盖数据在 `bb-tests/sardine/reports/coverage/annotated/` 下 + - 找到对应 Ball 的文件:grep 搜索 Ball 类名 +3. 分析该 Ball 的 RTL 行覆盖情况: + - 查看未覆盖的行(标记为 `000000`) + - 重点关注 FSM 状态、边界条件、错误路径 +4. 如果覆盖率不足,建议或自动补充测试用例,补充后重新编译和仿真验证 + +## 阶段 4 — PMC 性能分析 + +仿真通过后,读取 bdb.log 中的 PMC trace 数据来分析性能: + +1. 找到日志目录(`ls -t arch/log/ | head -5`) +2. 在 bdb.log 中搜索 `[PMCTRACE] BALL` 条目,提取该 Ball 的 elapsed cycle 数据 +3. 
汇总报告: + - 平均 elapsed cycles per task + - 最大/最小 elapsed cycles + - 总调用次数 + +## 阶段 5 — 波形分析(仿真失败时) + +如果仿真失败,除了读日志外,还可以用 waveform-mcp 做精确的时序分析。详见 `/waveform` skill。 + +关键信号检查清单: +- `cmdReq.valid && cmdReq.ready`(命令握手) +- SRAM `req.valid/ready` 和 `resp.valid`(读写时序) +- FSM state 寄存器(状态转移) +- `cmdResp.valid && cmdResp.fire`(完成握手) + +## 失败分析 + +如果仿真结果为 FAILED,使用 `/debug` skill 进行系统性调试。 diff --git a/.claude/skills/waveform/SKILL.md b/.claude/skills/waveform/SKILL.md new file mode 100644 index 00000000..82dccfd5 --- /dev/null +++ b/.claude/skills/waveform/SKILL.md @@ -0,0 +1,139 @@ +--- +name: waveform +description: 使用 waveform-mcp 工具分析 VCD/FST 波形文件,检查 Buckyball 模块的信号时序。当需要做 cycle-level 的时序分析、调试握手协议、检查 FSM 状态转移,或用户要求"看波形"、"分析信号"时使用此 skill。也适用于 `/debug` 和 `/optimize` 中需要波形分析的场景。 +--- + +## 工具概览 + +项目配置了 `waveform-mcp` MCP server,提供以下工具: +- `open_waveform(file_path)` — 打开 VCD/FST 文件 +- `list_signals(waveform_id, hierarchy_prefix?, name_pattern?, recursive?)` — 列出信号 +- `read_signal(waveform_id, signal_path, time_index|time_indices)` — 读取信号值 +- `get_signal_info(waveform_id, signal_path)` — 获取信号元数据 +- `find_signal_events(waveform_id, signal_path, start?, end?, limit?)` — 找信号变化 +- `find_conditional_events(waveform_id, condition, start?, end?, limit?)` — 条件搜索 +- `close_waveform(waveform_id)` — 关闭波形 + +## 波形文件位置 + +仿真生成的波形文件通常在: +- `arch/log//` 目录下 +- 文件格式:`.vcd` 或 `.fst` +- 用 `find arch/log -name "*.vcd" -o -name "*.fst" | head -5` 查找 + +## 信号层次结构 + +Buckyball 的信号层次大致为: +``` +TOP +├── BuckyballToy (顶层) +│ ├── bbtile (BBTile) +│ │ ├── buckyball (Buckyball 主体) +│ │ │ ├── frontend +│ │ │ │ ├── globalDecoder +│ │ │ │ └── globalROB +│ │ │ ├── ballDomain +│ │ │ │ └── bbus (BBus) +│ │ │ │ ├── cmdRouter +│ │ │ │ ├── pmc (BallCyclePMC) +│ │ │ │ ├── balls_0 (第一个 Ball,按 ballId 排序) +│ │ │ │ ├── balls_1 +│ │ │ │ └── ... +│ │ │ └── memDomain +│ │ │ ├── memFrontend +│ │ │ └── memBackend +``` + +### 定位 Ball 信号 + +要找到特定 Ball 的信号: +1. `list_signals(waveform_id, recursive=false)` — 先看顶层 +2. 
逐级向下:`list_signals(waveform_id, hierarchy_prefix="TOP.BuckyballToy")` 等 +3. 或直接用 `name_pattern` 搜索:`list_signals(waveform_id, name_pattern="relu", recursive=true)` + +## 常用检查模板 + +### 1. 握手检查(Decoupled valid/ready) + +检查 cmdReq 握手是否成功: +``` +find_conditional_events(waveform_id, + condition="TOP...cmdReq_valid && TOP...cmdReq_ready") +``` + +检查 cmdResp 握手: +``` +find_conditional_events(waveform_id, + condition="TOP...cmdResp_valid && TOP...cmdResp_ready") +``` + +### 2. SRAM 读时序检查 + +验证 SRAM 1-cycle 读延迟: +``` +# 找到读请求 +find_conditional_events(waveform_id, + condition="TOP...bankRead_0_io_req_valid && TOP...bankRead_0_io_req_ready") + +# 检查下一个 cycle 是否有 resp.valid +# 取上面事件的 time_index + 1,然后 +read_signal(waveform_id, "TOP...bankRead_0_io_resp_valid", time_index=) +``` + +### 3. FSM 状态转移追踪 + +找 state 寄存器的所有变化: +``` +find_signal_events(waveform_id, signal_path="TOP...state") +``` + +然后读取每个变化点的值来重建状态转移序列。 + +### 4. Ball 操作延迟测量 + +测量从 cmdReq.fire 到 cmdResp.fire 的 cycle 数: +``` +# 找 cmdReq 握手时刻 +req_events = find_conditional_events(waveform_id, + condition="TOP...cmdReq_valid && TOP...cmdReq_ready") + +# 找 cmdResp 握手时刻 +resp_events = find_conditional_events(waveform_id, + condition="TOP...cmdResp_valid && TOP...cmdResp_ready") + +# 对应的 resp - req 即为延迟 +``` + +### 5. 
Bank 地址/数据检查 + +在某个时刻读取 SRAM 读写的地址和数据: +``` +read_signal(waveform_id, "TOP...bankRead_0_io_req_bits_addr", time_index=T) +read_signal(waveform_id, "TOP...bankRead_0_io_resp_bits_data", time_index=T+1) +read_signal(waveform_id, "TOP...bankWrite_0_io_req_bits_addr", time_index=T) +read_signal(waveform_id, "TOP...bankWrite_0_io_req_bits_data", time_index=T) +``` + +## 条件搜索语法 + +`find_conditional_events` 的 condition 支持: +- 信号路径:`TOP.module.signal` +- 位运算:`~`(NOT), `&`(AND), `|`(OR), `^`(XOR) +- 布尔运算:`&&`, `||`, `!` +- 比较:`==`, `!=` +- 位提取:`signal[bit]` 或 `signal[msb:lsb]` +- 前值:`$past(signal)` — 前一个 time_index 的值 +- Verilog 字面量:`4'b0001`, `8'hFF` + +常用 pattern: +- 上升沿:`!$past(TOP.signal) && TOP.signal` +- 下降沿:`$past(TOP.signal) && !TOP.signal` +- 握手:`TOP.valid && TOP.ready` +- 某位为 1:`TOP.flags & 4'b0001` + +## 使用建议 + +1. **先 list_signals 再 read** — 信号路径名可能和 Chisel 源码中的名称略有不同(Chisel 会做 name mangling),先搜索确认 +2. **用 find_conditional_events 而不是逐 cycle read** — 效率差几个数量级 +3. **限制 limit** — 波形可能很长,设置合理的 limit 避免返回过多数据 +4. **用完记得 close_waveform** — 释放内存 diff --git a/.github/workflows/doc.yml b/.github/workflows/doc.yml deleted file mode 100644 index 484a1b30..00000000 --- a/.github/workflows/doc.yml +++ /dev/null @@ -1,66 +0,0 @@ -# Sample workflow for building and deploying a mdBook site to GitHub Pages -# -# To get started with mdBook see: https://rust-lang.github.io/mdBook/index.html -# -name: Deploy Documentation - -on: - # Runs on pushes targeting the default branch - push: - branches: ["main"] - - # Allows you to run this workflow manually from the Actions tab - workflow_dispatch: - -# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages -permissions: - contents: read - pages: write - id-token: write - -# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued. -# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete. 
-concurrency: - group: "pages" - cancel-in-progress: false - -jobs: - # Build job - build: - runs-on: ubuntu-latest - env: - MDBOOK_VERSION: 0.4.52 - steps: - - uses: actions/checkout@v4 - - name: Install mdBook - run: | - curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf -y | sh - rustup update - cargo install --version ${MDBOOK_VERSION} mdbook - cargo install mdbook-linkcheck - cargo install mdbook-pandoc - cargo install mdbook-toc - cargo install mdbook-mermaid - - name: Setup Pages - id: pages - uses: actions/configure-pages@v4 - - name: Install mermaid.js - run: cd docs/bb-note && mdbook-mermaid install - - name: Build with mdBook - run: cd docs/bb-note && mdbook build - - name: Upload artifact - uses: actions/upload-pages-artifact@v3 - with: - path: ./docs/bb-note/book - - # Deployment job - deploy: - environment: - name: github-pages - url: ${{ steps.deployment.outputs.page_url }} - runs-on: ubuntu-latest - needs: build - steps: - - name: Deploy to GitHub Pages - id: deployment - uses: actions/deploy-pages@v4 diff --git a/.github/workflows/pr.yml b/.github/workflows/pr.yml index 37ff57ba..c7310683 100644 --- a/.github/workflows/pr.yml +++ b/.github/workflows/pr.yml @@ -9,6 +9,8 @@ jobs: PR-Test: name: PR Test runs-on: cpu-server + env: + ALL_PROXY: socks5h://127.0.0.1:7890 steps: - name: Print information run: | @@ -17,37 +19,43 @@ jobs: echo "🔎 Branch: ${{ github.head_ref }} -> ${{ github.base_ref }}" echo "🔎 Repository: ${{ github.repository }}" - - name: Pull from the repository + # - uses: actions/checkout@v4 + # - uses: DeterminateSystems/determinate-nix-action@v3 + + - name: Reset to clean state shell: zsh {0} run: | - source ~/.zshrc - buckyball_exec - setproxy - git fetch origin +refs/pull/${{ github.event.pull_request.number }}/head:refs/remotes/origin/pr/${{ github.event.pull_request.number }} + cd ~/Code/buckyball + git fetch origin git clean -fd - git reset --hard ${{ github.sha }} + git checkout ${{ github.head_ref }} + git pull + git 
submodule update + + - name: Nix build + shell: zsh {0} + run: | + cd ~/Code/buckyball + nix build - name: Build Workloads shell: zsh {0} run: | - source ~/.zshrc - buckyball_exec - bbdev workload --build + cd ~/Code/buckyball + nix develop -c bbdev workload --build - name: Build Verilator shell: zsh {0} run: | - source ~/.zshrc - buckyball_exec - bbdev verilator --clean - bbdev verilator --verilog - bbdev verilator --build '--jobs 16' + cd ~/Code/buckyball + nix develop -c bbdev verilator --clean + nix develop -c bbdev verilator --verilog '--config sims.verilator.BuckyballToyVerilatorConfig' + nix develop -c bbdev verilator --build '--jobs 16' - name: Smoke Test shell: zsh {0} run: | - source ~/.zshrc - buckyball_exec - bbdev sardine --run '--workload ctest' + cd ~/Code/buckyball + nix develop -c bbdev sardine --run '--workload ctest' - run: echo "🍏 PR Test is ${{ job.status }}." diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index d947f997..da390933 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -9,51 +9,49 @@ jobs: CI-Test: name: CI Test runs-on: cpu-server + env: + ALL_PROXY: socks5h://127.0.0.1:7890 steps: - name: Print information run: | echo "🎉 The job was automatically triggered by a ${{ github.event_name }} event." echo "🔎 The name of your branch is ${{ github.ref }} and your repository is ${{ github.repository }}." 
- - name: Pull from the repository + - name: Reset to clean state shell: zsh {0} run: | - source ~/.zshrc - buckyball_exec - setproxy - git fetch origin main + cd ~/Code/buckyball + git fetch origin git clean -fd - git reset --hard ${{ github.sha }} + git checkout ${{ github.head_ref }} + git pull + git submodule update + + - name: Nix build + shell: zsh {0} + run: | + cd ~/Code/buckyball + nix build - name: Build Workloads shell: zsh {0} run: | - source ~/.zshrc - buckyball_exec - rm -rf workflow/.motia || true - bbdev workload --build + cd ~/Code/buckyball + nix develop -c bbdev workload --build - name: Build Verilator shell: zsh {0} run: | - source ~/.zshrc - buckyball_exec - rm -rf workflow/.motia || true - bbdev verilator --clean - rm -rf workflow/.motia || true - bbdev verilator --verilog '--config sims.verilator.BuckyballToyVerilatorConfig' - rm -rf workflow/.motia || true - bbdev verilator --build '--jobs 16' + cd ~/Code/buckyball + nix develop -c bbdev verilator --clean + nix develop -c bbdev verilator --verilog '--config sims.verilator.BuckyballToyVerilatorConfig' + nix develop -c bbdev verilator --build '--jobs 16' - name: Smoke Test shell: zsh {0} run: | - source ~/.zshrc - buckyball_exec - rm -rf workflow/.motia || true - bbdev sardine --run '--workload ctest' - - - run: echo "🍏 CI Test is ${{ job.status }}." 
+ cd ~/Code/buckyball + nix develop -c bbdev sardine --run '--workload ctest' # if check failed, revert the branch # - name: Revert Bad Commit diff --git a/.gitignore b/.gitignore index 8f99e267..5421eb21 100644 --- a/.gitignore +++ b/.gitignore @@ -1,20 +1,26 @@ .vscode/ .metals/ -.claude/ .motia/ .cursor/ .bsp/ +out/ +.Xil/ +result + CLAUDE.local.md node_modules/ env.sh mill.* index.html +*.jou *.vcd *.fst *.log - +*.dot .clangd .kiro + +__pycache__/ diff --git a/.gitmodules b/.gitmodules index c0478c59..48ff281c 100644 --- a/.gitmodules +++ b/.gitmodules @@ -4,21 +4,25 @@ [submodule "arch/thirdparty/chipyard"] path = arch/thirdparty/chipyard url = https://github.com/ucb-bar/chipyard.git -[submodule "workflow/vscode"] - path = workflow/vscode - url = https://github.com/DangoSys/buckyball-vscode.git -[submodule "thirdparty/picker"] - path = thirdparty/picker - url = https://github.com/XS-MLVP/picker.git -[submodule "bebop/host/gem5/gem5"] - path = bebop/host/gem5/gem5 - url = https://github.com/gem5/gem5.git -[submodule "bebop/host/spike/riscv-isa-sim"] - path = bebop/host/spike/riscv-isa-sim - url = https://github.com/riscv-software-src/riscv-isa-sim -[submodule "tools/palladium"] - path = tools/palladium - url = https://github.com/SEU-ACAL/awesome-palladium.git [submodule "compiler"] path = compiler url = https://github.com/DangoSys/bb-compiler.git +[submodule "arch/thirdparty/t1"] + path = arch/thirdparty/t1 + url = https://github.com/DangoSys/t1.git +[submodule "bbdev"] + path = bbdev + url = https://github.com/DangoSys/bbdev.git +[submodule "thirdparty/palladium"] + path = thirdparty/palladium + url = https://github.com/SEU-ACAL/awesome-palladium.git +[submodule "docs"] + path = docs + url = https://github.com/DangoSys/documents.git +[submodule "thirdparty/waveform-mcp"] + path = thirdparty/waveform-mcp + url = https://github.com/jiegec/waveform-mcp.git +[submodule "bebop"] + path = bebop + url = https://github.com/DangoSys/bebop + branch = next diff --git 
a/.mcp.json b/.mcp.json new file mode 100644 index 00000000..0993e650 --- /dev/null +++ b/.mcp.json @@ -0,0 +1,23 @@ +{ + "mcpServers": { + "buckyball-dev": { + "type": "stdio", + "command": "nix", + "args": [ + "develop", + "-c", + "python3", + "-u", + "scripts/claude/mcp_server.py" + ], + "env": { + "NIX_QUIET": "1" + } + }, + "waveform-mcp": { + "type": "stdio", + "command": "thirdparty/waveform-mcp/target/release/waveform-mcp", + "args": [] + } + } +} diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 601bf685..a37efc89 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -6,6 +6,7 @@ repos: hooks: - id: clang-format name: clang-format C++ code + language: system files: \.(cpp|hpp|cc|cxx|h|c|hxx)$ args: [--style=llvm] @@ -13,7 +14,7 @@ repos: rev: 24.10.0 hooks: - id: black - language_version: python3.10 + language: system # a comprehensive tool for checking the style and quality of Python code. # It combines three popular Python tools: @@ -24,6 +25,7 @@ repos: rev: 6.1.0 hooks: - id: flake8 + language: system args: - --max-line-length=120 # Adjust as per your style guide - --ignore=F821,F403,F405,F401,W503,E203,E402,E401,W605,E712,E711,F841 @@ -40,17 +42,22 @@ repos: rev: v5.0.0 hooks: - id: end-of-file-fixer + language: system - id: trailing-whitespace + language: system - id: check-merge-conflict + language: system - id: check-yaml + language: system - id: check-added-large-files + language: system - # # Scala formatting with scalafmt - # - repo: local - # hooks: - # - id: scalafmt-check - # name: Check Scala formatting with scalafmt - # entry: bash -c 'cd arch && scalafmt --test' - # language: system - # files: ^arch/.*\.scala$ - # pass_filenames: false + # Scala formatting with scalafmt + - repo: local + hooks: + - id: scalafmt + name: Format Scala code with scalafmt + entry: bash -c 'cd arch && scalafmt --config .scalafmt.conf src/main/scala' + language: system + files: ^arch/src/main/scala/.*\.scala$ + pass_filenames: 
false diff --git a/README.md b/README.md index d422b8c9..be2949c4 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,13 @@

- +

[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/DangoSys/buckyball) [![zread](https://img.shields.io/badge/Ask_Zread-_.svg?style=flat&color=00b0aa&labelColor=000000&logo=data%3Aimage%2Fsvg%2Bxml%3Bbase64%2CPHN2ZyB3aWR0aD0iMTYiIGhlaWdodD0iMTYiIHZpZXdCb3g9IjAgMCAxNiAxNiIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPHBhdGggZD0iTTQuOTYxNTYgMS42MDAxSDIuMjQxNTZDMS44ODgxIDEuNjAwMSAxLjYwMTU2IDEuODg2NjQgMS42MDE1NiAyLjI0MDFWNC45NjAxQzEuNjAxNTYgNS4zMTM1NiAxLjg4ODEgNS42MDAxIDIuMjQxNTYgNS42MDAxSDQuOTYxNTZDNS4zMTUwMiA1LjYwMDEgNS42MDE1NiA1LjMxMzU2IDUuNjAxNTYgNC45NjAxVjIuMjQwMUM1LjYwMTU2IDEuODg2NjQgNS4zMTUwMiAxLjYwMDEgNC45NjE1NiAxLjYwMDFaIiBmaWxsPSIjZmZmIi8%2BCjxwYXRoIGQ9Ik00Ljk2MTU2IDEwLjM5OTlIMi4yNDE1NkMxLjg4ODEgMTAuMzk5OSAxLjYwMTU2IDEwLjY4NjQgMS42MDE1NiAxMS4wMzk5VjEzLjc1OTlDMS42MDE1NiAxNC4xMTM0IDEuODg4MSAxNC4zOTk5IDIuMjQxNTYgMTQuMzk5OUg0Ljk2MTU2QzUuMzE1MDIgMTQuMzk5OSA1LjYwMTU2IDE0LjExMzQgNS42MDE1NiAxMy43NTk5VjExLjAzOTlDNS42MDE1NiAxMC42ODY0IDUuMzE1MDIgMTAuMzk5OSA0Ljk2MTU2IDEwLjM5OTlaIiBmaWxsPSIjZmZmIi8%2BCjxwYXRoIGQ9Ik0xMy43NTg0IDEuNjAwMUgxMS4wMzg0QzEwLjY4NSAxLjYwMDEgMTAuMzk4NCAxLjg4NjY0IDEwLjM5ODQgMi4yNDAxVjQuOTYwMUMxMC4zOTg0IDUuMzEzNTYgMTAuNjg1IDUuNjAwMSAxMS4wMzg0IDUuNjAwMUgxMy43NTg0QzE0LjExMTkgNS42MDAxIDE0LjM5ODQgNS4zMTM1NiAxNC4zOTg0IDQuOTYwMVYyLjI0MDFDMTQuMzk4NCAxLjg4NjY0IDE0LjExMTkgMS42MDAxIDEzLjc1ODQgMS42MDAxWiIgZmlsbD0iI2ZmZiIvPgo8cGF0aCBkPSJNNCAxMkwxMiA0TDQgMTJaIiBmaWxsPSIjZmZmIi8%2BCjxwYXRoIGQ9Ik00IDEyTDEyIDQiIHN0cm9rZT0iI2ZmZiIgc3Ryb2tlLXdpZHRoPSIxLjUiIHN0cm9rZS1saW5lY2FwPSJyb3VuZCIvPgo8L3N2Zz4K&logoColor=ffffff)](https://zread.ai/DangoSys/buckyball) -[![Document](https://github.com/DangoSys/buckyball/actions/workflows/doc.yml/badge.svg?branch=main)](https://dangosys.github.io/buckyball) -[![Buckyball CI](https://github.com/DangoSys/buckyball/actions/workflows/test.yml/badge.svg)](https://github.com/DangoSys/buckyball/actions/workflows/test.yml) 
+[![Document](https://img.shields.io/badge/documents-online-30c452?style=plastic&logo=gitbook)](http://www.buckyball.tech/docs/en/Overview/Overview.md) +[![buckyball CI](https://github.com/DangoSys/buckyball/actions/workflows/test.yml/badge.svg)](https://github.com/DangoSys/buckyball/actions/workflows/test.yml)
@@ -17,90 +17,59 @@ Buckyball is a scalable framework for Domain Specific Architecture, built on RIS ## Project Overview -The Buckyball framework provides a complete hardware design, simulation verification, and software development toolchain, supporting the full development process from RTL design to system-level verification. The framework adopts a modular design that supports flexible configuration and extension, suitable for various specialized computing scenarios. +The buckyball framework provides a complete hardware design, simulation verification, and software development toolchain, supporting the full development process from RTL design to system-level verification. The framework adopts a modular design that supports flexible configuration and extension, suitable for various specialized computing scenarios. ## Quick Start -### Environment Dependencies +### Installation in Nix +We use Nix Flake as our main build system. If you have not installed nix, install it following the [guide](https://nix.dev/manual/nix/2.28/installation/installing-binary.html), and enable flake following the [wiki](https://nixos.wiki/wiki/Flakes#Enable_flakes). Or you can try the [installer](https://github.com/DeterminateSystems/nix-installer) provided by Determinate Systems, which enables flake by default. -Before getting started, please ensure your system meets the following dependency requirements: - -**Required Software**: -- Anaconda/Miniconda (Python environment management) -- Ninja Build System -- GTKWave (waveform viewer) -- Bash Shell environment (doesn't need to be the primary shell) - -**Installing Dependencies**: -```bash -# Install Anaconda -# Download from: https://www.anaconda.com/download/ - -# Install system tools -sudo apt install ninja-build gtkwave - -# Optional: FireSim passwordless configuration -# Add to /etc/sudoers: user_name ALL=(ALL) NOPASSWD:ALL -``` - -### Source Build **1. 
Clone Repository** + ```bash git clone https://github.com/DangoSys/buckyball.git -cd buckyball ``` **2. Initialize Environment** ```bash -./scripts/init.sh +cd buckyball +./scripts/nix/build-all.sh ``` -*Note: Initialization takes approximately 3 hours, including dependency downloads and compilation* -**3. Environment Activation** +After the first installation, you can enter the environment at any time by running: + ```bash -source buckyball/env.sh +nix develop ``` -**4. Verify Installation** +**3. Verify Installation** Run Verilator simulation test to verify installation: ```bash bbdev verilator --run '--jobs 16 --binary ctest_vecunit_matmul_ones_singlecore-baremetal --config sims.verilator.BuckyballToyVerilatorConfig --batch' ``` -### Docker Quick Experience - - We support providing a Docker environment for rapid deployment of buckyball. -**Notice**: -- Docker images are provided only for specific release versions. -- Docker image may not be the latest version, source build is recommended. - -> We do not provide support for this version as it is not a stable release. - -### Buckyball as a library + -## Quick Tutorial -You can start to learn ball and blink from [here](https://github.com/DangoSys/buckyball/blob/main/docs/bb-note/src/tutorial/tutorial.md) +## Tutorial +You can start learning about Ball and Blink [here](https://github.com/DangoSys/documents/blob/main/content/en/Tutorial/Building%20Your%20Own%20Hardware%20Designs.md) ## Additional Resources You can learn more from [DeepWiki](https://deepwiki.com/DangoSys/buckyball) and [Zread](https://zread.ai/DangoSys/buckyball) -## Community - -Join our discussion on [Slack](https://dangosys-buckyball.slack.com/) ## Contributors -Thank you for considering contributing to Buckyball! +Thank you for considering contributing to buckyball!
diff --git a/arch/.mill-version b/arch/.mill-version index 35ad3442..bd0119f9 100644 --- a/arch/.mill-version +++ b/arch/.mill-version @@ -1 +1 @@ -0.11.4 +0.11.12 diff --git a/arch/.scalafmt.conf b/arch/.scalafmt.conf index 46935cc9..91d1b07c 100644 --- a/arch/.scalafmt.conf +++ b/arch/.scalafmt.conf @@ -1,8 +1,8 @@ -version = 2.6.4 +version = 2.7.5 -# Exclude thirdparty directory from formatting -project.excludeFilters = [ - "thirdparty" +# Only format src/main/scala directory +project.includeFilters = [ + "src/main/scala/.*\\.scala$" ] # Basic formatting diff --git a/arch/build.sbt b/arch/build.sbt index 93f38dbb..19c9d68a 100644 --- a/arch/build.sbt +++ b/arch/build.sbt @@ -1,7 +1,7 @@ // See README.md for license details. -val chisel6Version = "6.5.0" -val chiselTestVersion = "6.0.0" +val chisel6Version = "6.5.0" +val chiselTestVersion = "6.0.0" val scalaVersionFromChisel = "2.13.12" // Fix for scalafix undefined setting @@ -19,11 +19,11 @@ lazy val chisel6Settings = Seq( lazy val chiselSettings = chisel6Settings ++ Seq( libraryDependencies ++= Seq( "org.apache.commons" % "commons-lang3" % "3.12.0", - "org.apache.commons" % "commons-text" % "1.9" + "org.apache.commons" % "commons-text" % "1.9" ) ) -lazy val scalaTestSettings = Seq( +lazy val scalaTestSettings = Seq( libraryDependencies ++= Seq( "org.scalatest" %% "scalatest" % "3.2.+" % "test" ) @@ -34,9 +34,8 @@ lazy val scalaTestSettings = Seq( // ------------------------------------------------------------------------------ lazy val chipyard = ProjectRef(file("thirdparty/chipyard"), "chipyard") lazy val firechip = ProjectRef(file("thirdparty/chipyard"), "firechip") - -// Palladium FPGA subproject (external reference) -lazy val palladium = ProjectRef(file("../tools/palladium"), "palladium") +// Palladium FPGA subproject +lazy val palladium = ProjectRef(file("../thirdparty/palladium"), "palladium") // ------------------------------------------------------------------------------ // Project Settings @@ -44,9 
+43,9 @@ lazy val palladium = ProjectRef(file("../tools/palladium"), "palladium") lazy val buckyball = (project in file(".")) .dependsOn(chipyard, firechip, palladium) .settings( - name := "buckyball", + name := "buckyball", organization := "com.buckyball", - version := "1.0.0", + version := "1.0.0", scalaVersion := scalaVersionFromChisel, scalacOptions ++= Seq( "-deprecation", @@ -58,10 +57,12 @@ lazy val buckyball = (project in file(".")) Resolver.sonatypeRepo("releases") ), chisel6Settings ++ - scalaTestSettings ++ - Seq( - libraryDependencies ++= Seq( - "edu.berkeley.cs" %% "rocketchip" % "1.6" + scalaTestSettings ++ + Seq( + libraryDependencies ++= Seq( + "edu.berkeley.cs" %% "rocketchip" % "1.6", + "com.lihaoyi" %% "os-lib" % "0.10.0", + "com.lihaoyi" %% "upickle" % "3.3.1" + ) ) - ) ) diff --git a/arch/build.sc b/arch/build.sc index ab437d61..9c8180f5 100644 --- a/arch/build.sc +++ b/arch/build.sc @@ -7,11 +7,10 @@ import scalalib._ // support BSP import mill.bsp._ - - object buckyball extends SbtModule { m => override def millSourcePath = os.pwd - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" + override def scalacOptions = Seq( "-language:reflectiveCalls", "-deprecation", @@ -24,7 +23,8 @@ object buckyball extends SbtModule { m => override def moduleDeps = Seq( chipyard, firechip, - palladium + palladium, + pegasus ) override def ivyDeps = Agg( @@ -46,18 +46,21 @@ object buckyball extends SbtModule { m => object test extends ScalaModule with TestModule.ScalaTest { override def scalaVersion = T("2.13.12") - override def moduleDeps = Seq(m) + override def moduleDeps = Seq(m) + override def ivyDeps = Agg( ivy"org.scalatest::scalatest::3.2.19" // ivy"org.scalatest::scalatest:3.2.16" ) + } + } // Define cde module - must be compiled first object cde extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "tools" / "cde" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" 
// Override sources to match freshProject behavior override def sources = T.sources { @@ -71,12 +74,13 @@ object cde extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define hardfloat module - depends on cde object hardfloat extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "hardfloat" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add cde dependency override def moduleDeps = Seq( @@ -95,12 +99,14 @@ object hardfloat extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define midas_target_utils module object midas_target_utils extends SbtModule { - override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "sims" / "firesim" / "sim" / "midas" / "targetutils" - override def scalaVersion = "2.13.12" + override def millSourcePath = + os.pwd / "thirdparty" / "chipyard" / "sims" / "firesim" / "sim" / "midas" / "targetutils" + override def scalaVersion = "2.13.12" override def ivyDeps = Agg( ivy"org.chipsalliance::chisel:6.5.0" @@ -109,12 +115,13 @@ object midas_target_utils extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define diplomacy module - depends on cde object diplomacy extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "diplomacy" / "diplomacy" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add cde dependency first override def moduleDeps = Seq( @@ -134,12 +141,13 @@ object diplomacy extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define rocket-chip module with proper dependencies object rocketchip extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "rocket-chip" - override def 
scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add required dependencies for rocket-chip override def moduleDeps = Seq( @@ -159,12 +167,13 @@ object rocketchip extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define chipyard module object chipyard extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Override sources to include tools/stage, generators/chipyard, and harness directories (as per build.sbt) override def sources = T.sources { @@ -213,12 +222,13 @@ object chipyard extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define testchipip module object testchipip extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "testchipip" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip and rocket-chip-blocks as dependencies override def moduleDeps = Seq( @@ -233,12 +243,13 @@ object testchipip extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define rocket-chip-blocks module (contains sifive package) object rocket_chip_blocks extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "rocket-chip-blocks" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -252,12 +263,13 @@ object rocket_chip_blocks extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define icenet module object icenet extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "icenet" - override def scalaVersion = "2.13.12" + override def 
scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -271,12 +283,13 @@ object icenet extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define nvdla module object nvdla extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "nvdla" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -290,12 +303,13 @@ object nvdla extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define fft_generator module object fft_generator extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "fft-generator" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip and rocket-dsp-utils as dependencies (as per build.sbt) override def moduleDeps = Seq( @@ -310,12 +324,13 @@ object fft_generator extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define constellation module object constellation extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "constellation" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -329,12 +344,13 @@ object constellation extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define boom module object boom extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "boom" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -348,12 +364,13 @@ object boom extends SbtModule { 
override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define tracegen module object tracegen extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "tracegen" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add testchipip, rocket-chip, rocketchip_inclusive_cache, and boom as dependencies (as per build.sbt) override def moduleDeps = Seq( @@ -370,12 +387,13 @@ object tracegen extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define shuttle module object shuttle extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "shuttle" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -389,12 +407,13 @@ object shuttle extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define rocketchip_inclusive_cache module object rocketchip_inclusive_cache extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "rocket-chip-inclusive-cache" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Override sources to match build.sbt behavior - point to design/craft directory override def sources = T.sources { @@ -413,12 +432,13 @@ object rocketchip_inclusive_cache extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define saturn module object saturn extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "saturn" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip and shuttle as dependencies (as per build.sbt) override def moduleDeps = Seq( @@ -433,12 +453,13 @@ object saturn extends 
SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define gemmini module object gemmini extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "gemmini" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -452,12 +473,13 @@ object gemmini extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define sodor module object sodor extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "riscv-sodor" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -471,12 +493,13 @@ object sodor extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define vexiiriscv module object vexiiriscv extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "vexiiriscv" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -490,12 +513,13 @@ object vexiiriscv extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define ibex module object ibex extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "ibex" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -509,12 +533,13 @@ object ibex extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define cva6 module object cva6 extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / 
"chipyard" / "generators" / "cva6" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -528,12 +553,13 @@ object cva6 extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define ara module object ara extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "ara" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip and shuttle as dependencies (as per build.sbt) override def moduleDeps = Seq( @@ -548,12 +574,13 @@ object ara extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define rerocc module object rerocc extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "rerocc" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip, constellation, boom, and shuttle as dependencies (as per build.sbt) override def moduleDeps = Seq( @@ -570,12 +597,13 @@ object rerocc extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define rocket-dsp-utils module object rocket_dsp_utils extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "tools" / "rocket-dsp-utils" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip, cde, and dsptools as dependencies (as per build.sbt) override def moduleDeps = Seq( @@ -591,12 +619,13 @@ object rocket_dsp_utils extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define dsptools module object dsptools extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "tools" / "dsptools" - override def scalaVersion = "2.13.12" + override 
def scalaVersion = "2.13.12" // Add rocket-chip and fixedpoint as dependencies (as per build.sbt) override def moduleDeps = Seq( @@ -614,12 +643,13 @@ object dsptools extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define fixedpoint module object fixedpoint extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "tools" / "fixedpoint" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -633,12 +663,13 @@ object fixedpoint extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define compressacc module object compressacc extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "compress-acc" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -652,12 +683,13 @@ object compressacc extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define mempress module object mempress extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "mempress" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -671,12 +703,13 @@ object mempress extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define barf module object barf extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "bar-fetchers" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -690,12 +723,13 @@ object barf extends SbtModule { 
override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define caliptra_aes module object caliptra_aes extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "caliptra-aes-acc" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip, rocc_acc_utils, and testchipip as dependencies (as per build.sbt) override def moduleDeps = Seq( @@ -711,12 +745,13 @@ object caliptra_aes extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define rocc_acc_utils module object rocc_acc_utils extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "rocc-acc-utils" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocket-chip as a dependency override def moduleDeps = Seq( @@ -730,22 +765,28 @@ object rocc_acc_utils extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define firrtl2 module object firrtl2 extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "tools" / "firrtl2" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" - // Override sources to include generated ANTLR sources and BuildInfo + // Override sources to include generated ANTLR sources and BuildInfo (from sbt antlr4Generate/compile) override def sources = T.sources { val baseSources = super.sources() - // Include pre-generated sources from target directory - val generatedDir = millSourcePath / "src" / "target" / "scala-2.13" / "src_managed" / "main" - if (os.exists(generatedDir)) { - baseSources ++ Seq(PathRef(generatedDir)) - } else { - baseSources + // Chipyard freshProject sets firrtl2 base to tools/firrtl2/src, so sbt puts target under src/ (normally is under here) + val underSrc = millSourcePath / "src" / "target" / "scala-2.13" 
/ "src_managed" / "main" + // If sbt was run from tools/firrtl2 directly, target is under tools/firrtl2/ + val underRoot = millSourcePath / "target" / "scala-2.13" / "src_managed" / "main" + val generatedDir = if (os.exists(underSrc)) Some(underSrc) else if (os.exists(underRoot)) Some(underRoot) else None + generatedDir match { + case Some(dir) => baseSources ++ Seq(PathRef(dir)) + case None => + throw new Exception( + "firrtl2.antlr not found. Run: cd arch/thirdparty/chipyard && sbt compile" + ) } } @@ -770,12 +811,13 @@ object firrtl2 extends SbtModule { "-language:existentials", "-language:implicitConversions" ) + } // Define firrtl2_bridge module object firrtl2_bridge extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "tools" / "firrtl2" / "bridge" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add firrtl2 as a dependency override def moduleDeps = Seq( @@ -789,12 +831,12 @@ object firrtl2_bridge extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) -} +} object firesim_lib extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "sims" / "firesim" / "sim" / "firesim-lib" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add midas_target_utils as a dependency override def moduleDeps = Seq( @@ -808,6 +850,7 @@ object firesim_lib extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Interfaces for target-specific bridges shared with FireSim. @@ -815,7 +858,7 @@ object firesim_lib extends SbtModule { // This is copied to FireSim's GoldenGate compiler. 
object firechip_bridgeinterfaces extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "firechip" / "bridgeinterfaces" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" override def ivyDeps = Agg( ivy"org.chipsalliance::chisel:6.5.0" @@ -824,13 +867,14 @@ object firechip_bridgeinterfaces extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Target-side bridge definitions, CC files, etc used for FireSim. // This only compiled with Chipyard. object firechip_bridgestubs extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "firechip" / "bridgestubs" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add chipyard, firesim_lib, and firechip_bridgeinterfaces as dependencies override def moduleDeps = Seq( @@ -846,12 +890,13 @@ object firechip_bridgestubs extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // FireSim top-level project that includes the FireSim harness, CC files, etc needed for FireSim. 
object firechip extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "generators" / "firechip" / "chip" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add chipyard, firesim_lib, firechip_bridgestubs, and firechip_bridgeinterfaces as dependencies override def moduleDeps = Seq( @@ -868,12 +913,13 @@ object firechip extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Define fpga_shells module object fpga_shells extends SbtModule { override def millSourcePath = os.pwd / "thirdparty" / "chipyard" / "fpga" / "fpga-shells" - override def scalaVersion = "2.13.12" + override def scalaVersion = "2.13.12" // Add rocketchip and rocket_chip_blocks as dependencies override def moduleDeps = Seq( @@ -888,12 +934,13 @@ object fpga_shells extends SbtModule { override def scalacPluginIvyDeps = Agg( ivy"org.chipsalliance:::chisel-plugin:6.5.0" ) + } // Palladium FPGA subproject (external reference) object palladium extends SbtModule { - override def millSourcePath = os.pwd / os.up / "tools" / "palladium" - override def scalaVersion = "2.13.12" + override def millSourcePath = os.pwd / os.up / "thirdparty" / "palladium" + override def scalaVersion = "2.13.12" // Add chipyard and fpga_shells as dependencies override def moduleDeps = Seq( @@ -914,48 +961,31 @@ object palladium extends SbtModule { "-unchecked", "-Ymacro-annotations" ) + } -// uvbb test module -// object uvbb extends ScalaModule { -// override def millSourcePath = os.pwd / os.up / "bb-tests" / "uvbb" -// override def scalaVersion = "2.13.12" - -// override def scalacOptions = Seq( -// "-language:reflectiveCalls", -// "-deprecation", -// "-feature", -// "-Xcheckinit", -// "-Ymacro-annotations" -// ) - -// // Depends on the buckyball module -// override def moduleDeps = Seq(buckyball) - -// override def ivyDeps = Agg( -// ivy"org.chipsalliance::chisel:6.5.0" -// ) - -// override def scalacPluginIvyDeps = Agg( -//
ivy"org.chipsalliance:::chisel-plugin:6.5.0" -// ) - -// // Include the DUT source path -// override def sources = T.sources { -// super.sources() ++ Seq( -// PathRef(millSourcePath / "dut" / "src" / "main" / "scala") -// ) -// } - -// // Include the resources path -// override def resources = T.sources { -// super.resources() ++ Seq( -// PathRef(millSourcePath / "dut" / "src" / "main" / "resources") -// ) -// } - -// // Task that generates Verilog -// def elaborate = T { -// runMain("uvbb.Elaborate")() -// } -// } +// Pegasus FPGA framework Chisel sources +// Physical location: pegasus/chisel/ (outside arch/) +// Pure Chisel only — no Chipyard dependency (avoids circular dep with buckyball) +// buckyball depends on pegasus (one-way) +object pegasus extends SbtModule { + override def millSourcePath = os.pwd / os.up / "pegasus" / "chisel" + override def scalaVersion = "2.13.12" + + override def ivyDeps = Agg( + ivy"org.chipsalliance::chisel:6.5.0" + ) + + override def scalacPluginIvyDeps = Agg( + ivy"org.chipsalliance:::chisel-plugin:6.5.0" + ) + + override def scalacOptions = Seq( + "-language:reflectiveCalls", + "-deprecation", + "-feature", + "-Xcheckinit", + "-Ymacro-annotations" + ) + +} diff --git a/arch/img/buckyball.png b/arch/img/buckyball.png deleted file mode 100644 index 6035040a..00000000 Binary files a/arch/img/buckyball.png and /dev/null differ diff --git a/arch/img/dma1.png b/arch/img/dma1.png deleted file mode 100644 index 373efa42..00000000 Binary files a/arch/img/dma1.png and /dev/null differ diff --git a/arch/img/dma2.png b/arch/img/dma2.png deleted file mode 100644 index 9aa671e4..00000000 Binary files a/arch/img/dma2.png and /dev/null differ diff --git a/arch/project/plugins.sbt b/arch/project/plugins.sbt index 87f092c7..1dd5989c 100644 --- a/arch/project/plugins.sbt +++ b/arch/project/plugins.sbt @@ -1,2 +1,2 @@ -addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.1") +addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.1") addSbtPlugin("ch.epfl.scala" % "sbt-scalafix" % "0.10.4") diff
--git a/arch/scripts/bdb_ndjson_annotate.py b/arch/scripts/bdb_ndjson_annotate.py new file mode 100644 index 00000000..eeaa1df4 --- /dev/null +++ b/arch/scripts/bdb_ndjson_annotate.py @@ -0,0 +1,78 @@ +#!/usr/bin/env python3 +""" +Lookup instruction name by funct id. + +Example: + python3 arch/scripts/bdb_ndjson_annotate.py 0x20 +""" + +from __future__ import annotations + +import argparse +import re +from pathlib import Path +from typing import Any + +FILE_RE = re.compile(r"^(\d+)_([a-z0-9_]+)\.c$") + + +def parse_funct_id(v: Any) -> int: + try: + if isinstance(v, int): + return v + if isinstance(v, str): + return int(v, 0) + except ValueError as e: + raise SystemExit(f"invalid funct id: {v!r}") from e + raise SystemExit(f"invalid funct id type: {type(v)}") + + +def build_funct_map(isa_dir: Path) -> dict[int, str]: + if not isa_dir.is_dir(): + raise SystemExit(f"ISA dir not found: {isa_dir}") + mp: dict[int, str] = {} + for p in isa_dir.glob("*.c"): + m = FILE_RE.match(p.name) + if not m: + continue + fid = int(m.group(1)) + name = m.group(2) + mp[fid] = name + if not mp: + raise SystemExit(f"no ISA funct mapping found under: {isa_dir}") + return mp + + +def resolve_funct_name(fid: int, funct_map: dict[int, str]) -> str: + if fid not in funct_map: + raise SystemExit(f"unknown funct id: 0x{fid:x}") + return funct_map[fid] + + +def default_isa_dir() -> Path: + return ( + Path(__file__).resolve().parents[2] + / "bb-tests" + / "workloads" + / "lib" + / "bbhw" + / "isa" + ) + + +def main() -> None: + p = argparse.ArgumentParser(description="Lookup instruction name by funct id") + p.add_argument("funct_id", help="funct id, e.g. 
0x20 or 32") + p.add_argument( + "--isa-dir", type=Path, default=default_isa_dir(), help="ISA macro dir" + ) + args = p.parse_args() + + fid = parse_funct_id(args.funct_id) + fmap = build_funct_map(args.isa_dir) + name = resolve_funct_name(fid, fmap) + print(name) + + +if __name__ == "__main__": + main() diff --git a/arch/scripts/bdb_ndjson_viz.py b/arch/scripts/bdb_ndjson_viz.py new file mode 100644 index 00000000..4fb2406f --- /dev/null +++ b/arch/scripts/bdb_ndjson_viz.py @@ -0,0 +1,245 @@ +#!/usr/bin/env python3 +""" +Parse BDB NDJSON trace and draw a single RoB-state figure. + +Only itrace issue->complete intervals are plotted. +Idle gaps (no RoB active) are not shown. + +Requires: matplotlib (pip install matplotlib) +""" + +from __future__ import annotations + +import argparse +import json +from pathlib import Path +from typing import Any + +from bdb_ndjson_annotate import build_funct_map, default_isa_dir, parse_funct_id + + +def load_records(path: Path) -> list[dict[str, Any]]: + rows: list[dict[str, Any]] = [] + with path.open(encoding="utf-8", errors="replace") as f: + for i, line in enumerate(f, 1): + line = line.strip() + if not line: + continue + try: + rows.append(json.loads(line)) + except json.JSONDecodeError as e: + raise SystemExit(f"{path}:{i}: invalid JSON: {e}") from e + return rows + + +def clk_of(rec: dict[str, Any], fallback: int | None) -> int | None: + if "clk" in rec: + return int(rec["clk"]) + return fallback + + +def build_rob_intervals( + records: list[dict[str, Any]], +) -> tuple[list[tuple[int, int, int, int, str]], list[tuple[int, int, int, int]], bool]: + """ + Build RoB intervals from itrace events. 
+ Returns: + - active windows: issue->complete (t0, t1, domain_id, rob_id, funct_label) + - in-rob windows: alloc->complete (t0, t1, domain_id, rob_id) + - used_real_clk + """ + fmap = build_funct_map(default_isa_dir()) + issue_open: dict[tuple[int, int], tuple[int, str]] = {} + alloc_open: dict[tuple[int, int], int] = {} + active_out: list[tuple[int, int, int, int, str]] = [] + inrob_out: list[tuple[int, int, int, int]] = [] + used_clk = False + seq = 0 + + for rec in records: + if rec.get("type") != "itrace": + seq += 1 + continue + + ev = str(rec.get("event", "")) + dom = int(rec.get("domain_id", 0)) + rid = int(rec.get("rob_id", -1)) + if rid < 0: + seq += 1 + continue + t = clk_of(rec, seq) + if t is None: + t = seq + else: + used_clk = True + + key = (dom, rid) + funct_lbl = "" + if "funct" in rec: + try: + fid = parse_funct_id(rec["funct"]) + funct_lbl = fmap.get(fid, f"0x{fid:x}") + except SystemExit: + funct_lbl = str(rec["funct"]) + + if ev == "alloc": + alloc_open[key] = int(t) + elif ev == "issue": + issue_open[key] = (int(t), funct_lbl) + elif ev == "complete": + if key in issue_open: + t0, flbl = issue_open.pop(key) + t1 = int(t) + if t1 < t0: + t0, t1 = t1, t0 + if t1 == t0: + t1 += 1 + active_out.append((t0, t1, dom, rid, flbl)) + if key in alloc_open: + t0 = alloc_open.pop(key) + t1 = int(t) + if t1 < t0: + t0, t1 = t1, t0 + if t1 == t0: + t1 += 1 + inrob_out.append((t0, t1, dom, rid)) + seq += 1 + + return active_out, inrob_out, used_clk + + +def plot_timeline( + records: list[dict[str, Any]], + path: Path, + out_path: Path | None, + show: bool, +) -> None: + try: + import matplotlib.pyplot as plt + except ImportError as e: + raise SystemExit("matplotlib is required: pip install matplotlib") from e + + if not records: + raise SystemExit("empty trace") + + rob_itv, inrob_itv, used_clk = build_rob_intervals(records) + if not rob_itv and not inrob_itv: + raise SystemExit("no itrace issue/complete pairs found") + all_t0: list[int] = [] + all_t1: 
list[int] = [] + if rob_itv: + all_t0.extend(t0 for t0, _, _, _, _ in rob_itv) + all_t1.extend(t1 for _, t1, _, _, _ in rob_itv) + if inrob_itv: + all_t0.extend(t0 for t0, _, _, _ in inrob_itv) + all_t1.extend(t1 for _, t1, _, _ in inrob_itv) + g0 = min(all_t0) + g1 = max(all_t1) + if g1 <= g0: + g1 = g0 + 1 + + fig, ax = plt.subplots(1, 1, figsize=(14, 6)) + rows = sorted( + {(dom, rid) for _, _, dom, rid, _ in rob_itv} + | {(dom, rid) for _, _, dom, rid in inrob_itv} + ) + y_of = {k: i for i, k in enumerate(rows)} + + # Draw in-ROB occupancy (alloc->complete) as light gray background. + for t0, t1, dom, rid in inrob_itv: + y = y_of[(dom, rid)] + ax.barh( + y, + width=t1 - t0, + left=t0, + height=0.80, + color="#d9d9d9", + alpha=0.55, + edgecolor="none", + zorder=0, + ) + + for t0, t1, dom, rid, flbl in rob_itv: + y = y_of[(dom, rid)] + ax.vlines( + t0, + ymin=-0.5, + ymax=y, + colors="0.45", + linestyles="-", + linewidth=0.6, + alpha=0.6, + zorder=1, + ) + ax.barh( + y, + width=t1 - t0, + left=t0, + height=0.65, + color=f"C{dom % 10}", + alpha=0.9, + edgecolor="black", + linewidth=0.3, + zorder=2, + ) + if (t1 - t0) > 10 and flbl: + ax.text( + (t0 + t1) * 0.5, + y, + flbl, + ha="center", + va="center", + fontsize=7, + color="white", + zorder=3, + ) + + ax.set_xlim(g0, g1) + ax.set_yticks(range(len(rows))) + ax.set_yticklabels([f"d{dom}:r{rid}" for dom, rid in rows], fontsize=8) + ax.set_xlabel("clk" if used_clk else "record order") + ax.set_ylabel("RoB entry (active only)") + title_clk = "harness clk" if used_clk else "fallback order index" + ax.set_title(f"RoB active timeline — {path.name} ({title_clk})") + + plt.tight_layout() + if out_path: + out_path.parent.mkdir(parents=True, exist_ok=True) + fig.savefig(out_path, dpi=150, bbox_inches="tight") + if show: + plt.show() + else: + plt.close(fig) + + +def main() -> None: + p = argparse.ArgumentParser(description="Parse and visualize BDB NDJSON trace") + p.add_argument("ndjson", type=Path, help="Path to 
bdb.ndjson") + p.add_argument( + "-o", + "--output", + type=Path, + default=None, + help="Output PNG path (default: the input path with its suffix replaced by .timeline.png, unless --show is given)", + ) + p.add_argument( + "--show", + action="store_true", + help="Open interactive window (requires display)", + ) + args = p.parse_args() + + if not args.ndjson.is_file(): + raise SystemExit(f"not a file: {args.ndjson}") + + records = load_records(args.ndjson) + + out = args.output + if out is None and not args.show: + out = args.ndjson.with_suffix(".timeline.png") + + plot_timeline(records, args.ndjson, out, args.show) + + +if __name__ == "__main__": + main() diff --git a/arch/src/csrc/include/bdb.h b/arch/src/csrc/include/bdb.h index e85ba802..3b688b02 100644 --- a/arch/src/csrc/include/bdb.h +++ b/arch/src/csrc/include/bdb.h @@ -2,30 +2,38 @@ #define _BDB_H_ // DPI-C -#include "verilated_dpi.h" -// #include "Vtop__Dpi.h" #include "svdpi.h" +#include "verilated_dpi.h" // verilator #include "verilated.h" -// #include "verilated_vcd_c.h" -#include "VTestHarness.h" -#include "verilated_fst_c.h" -// ================ DataType ==================== +#include "VBBSimHarness.h" +#include "monitor/trace_cfg.h" +#include "verilated_fst_c.h" +#if VM_COVERAGE +#include "verilated_cov.h" +#endif -// ================ RISCV CPU =================== +extern VBBSimHarness *top; // ================ BDB Config =================== -// VCD file path -extern const char *vcd_path; // Log file path extern const char *log_path; // FST file path extern const char *fst_path; +// UART stdout file path +extern const char *stdout_path; +// Raw stdout fd saved before dup2 — UART writes here for real-time display +extern int raw_stdout_fd; + +// If set (bbdev sim), NDJSON banner goes here; stdout may be piped to +// spike-dasm.
+const char *bdb_sim_meta_path(void); void init_monitor(int argc, char *argv[]); void bdb_mainloop(); void ball_exec_once(); void bdb_set_batch_mode(); +void sim_exit(); #endif // _BDB_H_ diff --git a/arch/src/csrc/include/ioe/mmio.h b/arch/src/csrc/include/ioe/mmio.h new file mode 100644 index 00000000..a31aba00 --- /dev/null +++ b/arch/src/csrc/include/ioe/mmio.h @@ -0,0 +1,13 @@ +#ifndef _MMIO_H_ +#define _MMIO_H_ + +// mmio_tick: called once per posedge after eval(). +// io_mmio_fire is a 1-cycle register pulse; addr/data are stable latched +// registers. +// +// Address map: +// 0x6000_0000 : simulation exit — write triggers sim_exit() +// 0x6002_0000 : UART0 TX — write low byte → putchar +void mmio_tick(); + +#endif // _MMIO_H_ diff --git a/arch/src/csrc/include/monitor/halt.h b/arch/src/csrc/include/monitor/halt.h new file mode 100644 index 00000000..c336cf9f --- /dev/null +++ b/arch/src/csrc/include/monitor/halt.h @@ -0,0 +1,16 @@ +#ifndef MONITOR_HALT_H_ +#define MONITOR_HALT_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +// DPI-C function called by HaltDPI.sv when ebreak is detected +// Triggers bdb sim_exit() from within the simulation loop +void dpi_sim_halt(void); + +#ifdef __cplusplus +} +#endif + +#endif // MONITOR_HALT_H_ diff --git a/arch/src/csrc/include/monitor/trace.h b/arch/src/csrc/include/monitor/trace.h new file mode 100644 index 00000000..afe79e5b --- /dev/null +++ b/arch/src/csrc/include/monitor/trace.h @@ -0,0 +1,62 @@ +#ifndef MONITOR_TRACE_H_ +#define MONITOR_TRACE_H_ + +#include + +#ifdef __cplusplus +extern "C" { +#endif + +// Harness clock cycle index (BBSimHarness posedge); must run before other trace +// DPI in same eval. 
+void dpi_bdb_set_clk(unsigned long long c); + +// DPI-C function for instruction trace (itrace) +// Called from GlobalROB when instructions are allocated/issued/completed +void dpi_itrace(unsigned char is_issue, // 2 = alloc, 1 = issue, 0 = complete + unsigned int rob_id, unsigned int domain_id, unsigned int funct, + unsigned long long pc, unsigned long long rs1, + unsigned long long rs2, unsigned char bank_enable); + +// DPI-C function for memory trace (mtrace) +// Called from MemBackend when read/write requests are made +void dpi_mtrace(unsigned char is_write, // 1 = write, 0 = read + unsigned char is_shared, unsigned int channel, + unsigned long long hart_id, unsigned int vbank_id, + unsigned int group_id, unsigned int addr, + unsigned long long data_lo, unsigned long long data_hi); + +// DPI-C function for Ball PMC trace (pmctrace) +// Called from BallCyclePMC when a Ball completes a task +void dpi_pmctrace(unsigned int ball_id, unsigned int rob_id, + unsigned long long elapsed); + +// DPI-C function for memory PMC trace (pmctrace) +// Called from MemCyclePMC when a load/store completes +void dpi_mem_pmctrace(unsigned char is_store, // 1 = store, 0 = load + unsigned int rob_id, unsigned long long elapsed); + +// DPI-C function for cycle counter trace (ctrace) +// Called from TraceBall when counter commands are executed +void dpi_ctrace(unsigned char subcmd, // 0=START, 1=STOP, 2=READ + unsigned int ctr_id, unsigned long long tag, + unsigned long long elapsed, unsigned long long cycle); + +// DPI-C functions for bank backdoor (TraceBall) +// RTL calls these to get parameters from C++ testbench +unsigned long long dpi_backdoor_get_read_addr(void); +unsigned long long dpi_backdoor_get_write_addr(void); +void dpi_backdoor_get_write_data(unsigned long long *data_lo, + unsigned long long *data_hi); +void dpi_backdoor_put_read_data(unsigned int bank_id, unsigned int row, + unsigned long long data_lo, + unsigned long long data_hi); +void 
dpi_backdoor_put_write_done(unsigned int bank_id, unsigned int row, + unsigned long long data_lo, + unsigned long long data_hi); + +#ifdef __cplusplus +} +#endif + +#endif // MONITOR_TRACE_H_ diff --git a/arch/src/csrc/include/monitor/trace_cfg.h b/arch/src/csrc/include/monitor/trace_cfg.h new file mode 100644 index 00000000..6625e453 --- /dev/null +++ b/arch/src/csrc/include/monitor/trace_cfg.h @@ -0,0 +1,23 @@ +#ifndef MONITOR_TRACE_CFG_H_ +#define MONITOR_TRACE_CFG_H_ + +#include + +enum { + BDB_TR_ITRACE = 1u << 0, + BDB_TR_MTRACE = 1u << 1, + BDB_TR_PMCTRACE = 1u << 2, + BDB_TR_CTRACE = 1u << 3, + BDB_TR_BANKTRACE = 1u << 4, + BDB_TR_ALL = BDB_TR_ITRACE | BDB_TR_MTRACE | BDB_TR_PMCTRACE | BDB_TR_CTRACE | + BDB_TR_BANKTRACE +}; + +extern uint32_t bdb_trace_mask; +extern uint64_t bdb_rtl_clk; + +static inline int bdb_trace_on(uint32_t bit) { + return (bdb_trace_mask & bit) != 0; +} + +#endif diff --git a/arch/src/csrc/include/utils/debug.h b/arch/src/csrc/include/utils/debug.h index cae3b151..156c1bf1 100644 --- a/arch/src/csrc/include/utils/debug.h +++ b/arch/src/csrc/include/utils/debug.h @@ -4,98 +4,27 @@ #include "utils/macro.h" #include -// Without formatting -// #define COLOR(a, b) "\033[" #b "m" a "\033[0m" -// #define GREEN(a) COLOR(a, 32) -// #define RED(a) COLOR(a, 31) -// #define BLUE(a) COLOR(a, 34) - -// Printf parameters for convenience, default is FG -#define BLACK "\33[1;30m" -#define RED "\33[1;31m" -#define GREEN "\33[1;32m" -#define YELLOW "\33[1;33m" -#define BLUE "\33[1;34m" -#define MAGENTA "\33[1;35m" -#define CYAN "\33[1;36m" -#define WHITE "\33[1;37m" - -// Interface with original Debug -// Foreground color -#define ASNI_FG_BLACK "\33[1;30m" #define ASNI_FG_RED "\33[1;31m" #define ASNI_FG_GREEN "\33[1;32m" #define ASNI_FG_YELLOW "\33[1;33m" #define ASNI_FG_BLUE "\33[1;34m" -#define ASNI_FG_MAGENTA "\33[1;35m" -#define ASNI_FG_CYAN "\33[1;36m" -#define ASNI_FG_WHITE "\33[1;37m" -#define ASNI_BG_BLACK "\33[1;40m" #define ASNI_BG_RED 
"\33[1;41m" -#define ASNI_BG_GREEN "\33[1;42m" -#define ASNI_BG_YELLOW "\33[1;43m" -#define ASNI_BG_BLUE "\33[1;44m" -#define ASNI_BG_MAGENTA "\33[1;35m" -#define ASNI_BG_CYAN "\33[1;46m" -#define ASNI_BG_WHITE "\33[1;47m" -// Reset color to "\033[0m" #define ASNI_NONE "\33[0m" #define ASNI_FMT(str, fmt) fmt str ASNI_NONE -#define log_write(...) \ - IFDEF( \ - CONFIG_TARGET_NATIVE_ELF, do { \ - extern FILE *log_fp; \ - extern bool log_enable(); \ - if (log_enable()) { \ - fprintf(log_fp, __VA_ARGS__); \ - fflush(log_fp); \ - } \ - } while (0)) - -#define _Log(...) \ - do { \ - printf(__VA_ARGS__); \ - log_write(__VA_ARGS__); \ - } while (0) - #define Log(format, ...) \ - _Log(ASNI_FMT("[%s:%d %s] " format, ASNI_FG_BLUE) "\n", __FILE__, __LINE__, \ - __func__, ##__VA_ARGS__) - -// Printf usage: first parameter is color, subsequent parameters (if any) follow -// in order -#define Printf(format, color, ...) _Log(ASNI_FMT(format, color), ##__VA_ARGS__) - -// #define assert(cond) \ -// do { \ -// if (!(cond)) { \ -// printf("Assertion fail at %s:%d\n", __FILE__, __LINE__); \ -// halt(0); \ -// } \ -// } while (0) + printf(ASNI_FMT("[%s:%d %s] " format, ASNI_FG_BLUE) "\n", __FILE__, \ + __LINE__, __func__, ##__VA_ARGS__) #define Assert(cond, format, ...) \ do { \ if (!(cond)) { \ - fflush(stdout), \ - fprintf(stderr, ASNI_FMT(format, ASNI_FG_RED) "\n", ##__VA_ARGS__); \ + fflush(stdout); \ + fprintf(stderr, ASNI_FMT(format, ASNI_FG_RED) "\n", ##__VA_ARGS__); \ } \ } while (0) -// #define Assert(cond, format, ...) \ -// do { \ -// if (!(cond)) { \ -// printf(ASNI_FMT(format, ASNI_FG_RED) "\n", ## __VA_ARGS__), \ -// fflush(stdout), fprintf(stderr, ASNI_FMT(format, ASNI_FG_RED) "\n", ## __VA_ARGS__); \ -// extern FILE* log_fp; fflush(log_fp); \ -// extern void assert_fail_msg(); \ -// assert_fail_msg(); \ -// assert(cond); \ -// } \ -// } while (0) - #define panic(format, ...) 
Assert(0, format, ##__VA_ARGS__) #endif diff --git a/arch/src/csrc/include/utils/macro.h b/arch/src/csrc/include/utils/macro.h index 1551febd..7f5ab44c 100644 --- a/arch/src/csrc/include/utils/macro.h +++ b/arch/src/csrc/include/utils/macro.h @@ -1,26 +1,24 @@ #ifndef __MACRO_H__ #define __MACRO_H__ -#include - // macro stringizing #define str_temp(x) #x #define str(x) str_temp(x) +#define concat_temp(x, y) x##y +#define concat(x, y) concat_temp(x, y) -// strlen() for string constant -#define STRLEN(CONST_STR) (sizeof(CONST_STR) - 1) - -// calculate the length of an array +// array length #define ARRLEN(arr) (int)(sizeof(arr) / sizeof(arr[0])) -// macro concatenation -#define concat_temp(x, y) x##y -#define concat(x, y) concat_temp(x, y) -#define concat3(x, y, z) concat(concat(x, y), z) -#define concat4(x, y, z, w) concat3(concat(x, y), z, w) -#define concat5(x, y, z, v, w) concat4(concat(x, y), z, v, w) +// bit manipulation +#define BITMASK(bits) ((1ull << (bits)) - 1) +#define BITS(x, hi, lo) (((x) >> (lo)) & BITMASK((hi) - (lo) + 1)) + +#if !defined(likely) +#define likely(cond) __builtin_expect(cond, 1) +#define unlikely(cond) __builtin_expect(cond, 0) +#endif -// macro testing // See // https://stackoverflow.com/questions/26099745/test-if-preprocessor-symbol-is-defined-inside-macro #define CHOOSE2nd(a, b, ...) 
b @@ -34,24 +32,11 @@ #define __P_ZERO_0 X, // define some selection functions based on the properties of BOOLEAN macro #define MUXDEF(macro, X, Y) MUX_MACRO_PROPERTY(__P_DEF_, macro, X, Y) -// If defined, then X, otherwise Y +// if defined, then X, otherwise Y #define MUXNDEF(macro, X, Y) MUX_MACRO_PROPERTY(__P_DEF_, macro, Y, X) #define MUXONE(macro, X, Y) MUX_MACRO_PROPERTY(__P_ONE_, macro, X, Y) #define MUXZERO(macro, X, Y) MUX_MACRO_PROPERTY(__P_ZERO_, macro, X, Y) -// test if a boolean macro is defined -#define ISDEF(macro) MUXDEF(macro, 1, 0) -// test if a boolean macro is undefined -#define ISNDEF(macro) MUXNDEF(macro, 1, 0) -// test if a boolean macro is defined to 1 -#define ISONE(macro) MUXONE(macro, 1, 0) -// test if a boolean macro is defined to 0 -#define ISZERO(macro) MUXZERO(macro, 1, 0) -// test if a macro of ANY type is defined -// NOTE1: it ONLY works inside a function, since it calls `strcmp()` -// NOTE2: macros defined to themselves (#define A A) will get wrong results -#define isdef(macro) (strcmp("" #macro, "" str(macro)) != 0) - // simplification for conditional compilation #define __IGNORE(...) #define __KEEP(...) __VA_ARGS__ @@ -64,55 +49,4 @@ // keep the code if a boolean macro is defined to 0 #define IFZERO(macro, ...) MUXZERO(macro, __KEEP, __IGNORE)(__VA_ARGS__) -// functional-programming-like macro (X-macro) -// apply the function `f` to each element in the container `c` -// NOTE1: `c` should be defined as a list like: -// f(a0) f(a1) f(a2) ... 
-// NOTE2: each element in the container can be a tuple -#define MAP(c, f) c(f) - -#define BITMASK(bits) ((1ull << (bits)) - 1) -#define BITS(x, hi, lo) \ - (((x) >> (lo)) & BITMASK((hi) - (lo) + 1)) // similar to x[hi:lo] in verilog - // Extract bits hi to lo from x -#define SEXT(x, len) \ - ({ \ - struct { \ - int64_t n : len; \ - } __x = {.n = x}; \ - (int64_t)__x.n; \ - }) - -// #define ROUNDUP(a, sz) ((((uintptr_t)a) + (sz) - 1) & ~((sz) - 1)) -// #define ROUNDDOWN(a, sz) ((((uintptr_t)a)) & ~((sz) - 1)) - -// #define PG_ALIGN __attribute((aligned(4096))) - -#if !defined(likely) -#define likely(cond) __builtin_expect(cond, 1) -#define unlikely(cond) __builtin_expect(cond, 0) -#endif - -// // for AM IOE -// #define io_read(reg) \ -// ({ reg##_T __io_param; \ -// ioe_read(reg, &__io_param); \ -// __io_param; }) - -// #define io_write(reg, ...) \ -// ({ reg##_T __io_param = (reg##_T) { __VA_ARGS__ }; \ -// ioe_write(reg, &__io_param); }) - -// Custom macros -#define MUX(v, p, a, b) v == p ? 
a : b -// value, p possible value, then a, otherwise b -#define SWAP(a, b) \ - a ^= b; \ - b ^= a; \ - a ^= b; - -// Convert x to string -#define STRINGIFY(x) #x -#define TOSTRING(x) STRINGIFY(x) - #endif diff --git a/arch/src/csrc/src/main.cc b/arch/src/csrc/src/main.cc index 97c14f35..49c3fae1 100644 --- a/arch/src/csrc/src/main.cc +++ b/arch/src/csrc/src/main.cc @@ -1,21 +1,32 @@ #include "bdb.h" +#include "ioe/mmio.h" #include "utils/debug.h" #include "utils/macro.h" -// #include "../build/obj_dir/VTestHarness___024root.h" +#include +#include -// #define MAX_SIM_TIME 50 Maximum simulation cycles vluint64_t sim_time = 0; - VerilatedContext *contextp = NULL; -// VerilatedVcdC *tfp = NULL; VerilatedFstC *tfp = NULL; -static VTestHarness *top; -// Record how many steps taken, useful for debugging when errors occur +VBBSimHarness *top; + int bb_step = 1; -//================ SIM FUNCTION =====================// +#if VM_COVERAGE +static void coverage_atexit() { + if (contextp) { + contextp->coveragep()->write(); + } +} + +static void coverage_signal_handler(int sig) { + coverage_atexit(); + _exit(128 + sig); +} +#endif + void step_and_dump_wave() { top->eval(); contextp->timeInc(1); @@ -26,17 +37,17 @@ void step_and_dump_wave() { void sim_init(int argc, char **argv) { contextp = new VerilatedContext; contextp->commandArgs(argc, argv); - // tfp = new VerilatedVcdC; tfp = new VerilatedFstC; - top = new VTestHarness{contextp}; + + top = new VBBSimHarness{contextp}; contextp->traceEverOn(true); top->trace(tfp, 0); - // tfp->open(vcd_path); - // Log("The waveform will be saved to the VCD file: %s", vcd_path); tfp->open(fst_path); - Log("The waveform will be saved to the FST file: %s", fst_path); + if (bdb_sim_meta_path() == nullptr) { + Log("The waveform will be saved to the FST file: %s", fst_path); + } top->reset = 1; top->clock = 0; @@ -48,49 +59,39 @@ void sim_init(int argc, char **argv) { top->clock = 0; step_and_dump_wave(); - // 
top->rootp->TestHarness__DOT__chiptop0__DOT__system__DOT__pbus__DOT__bootAddrReg - // = 0x80000000ULL; - // Low-level reset +#if VM_COVERAGE + atexit(coverage_atexit); + signal(SIGTERM, coverage_signal_handler); + signal(SIGINT, coverage_signal_handler); +#endif } void sim_exit() { contextp->timeInc(1); tfp->dump(contextp->time()); tfp->close(); - // printf("The wave data has been saved to the VCD file: %s\n", vcd_path); - printf("The wave data has been saved to the FST file: %s\n", fst_path); + if (bdb_sim_meta_path() == nullptr) { + printf("The wave data has been saved to the FST file: %s\n", fst_path); + } exit(0); } void ball_exec_once() { - top->clock ^= 1; - step_and_dump_wave(); - top->clock ^= 1; + // posedge: clock=1, eval (FF outputs settle), read fire pulse from RTL slave + top->clock = 1; + top->eval(); + mmio_tick(); // read io_mmio_fire; all AXI4 handshaking done in RTL + contextp->timeInc(1); + tfp->dump(contextp->time()); + sim_time++; + + // negedge: clock=0, eval + top->clock = 0; step_and_dump_wave(); - // top->rootp->TestHarness__DOT__chiptop0__DOT__system__DOT__pbus__DOT__bootAddrReg - // = 0x80000000ULL; - if (top->io_success == 1) { - printf("simulation success\n"); - sim_exit(); - } - // dump_gpr(); - // npc_step++; - // printf("bootAddrReg = 0x%x\n", - // top->rootp->TestHarness__DOT__chiptop0__DOT__system__DOT__pbus__DOT__bootAddrReg); - // Toggle twice to execute one instruction } -// void init_tet() { -// while (cpu_npc.pc != MEM_BASE) { -// // printf("%ld\n", cpu_npc.pc); -// npc_exec_once(); -// // npc_step--; -// } // PC advances until first instruction execution completes -// } - //================ main =====================// int main(int argc, char *argv[]) { - // Parse parameters here, including VCD path init_monitor(argc, argv); sim_init(argc, argv); bdb_mainloop(); diff --git a/arch/src/csrc/src/monitor/ioe/SimDRAM_bb.cc b/arch/src/csrc/src/monitor/ioe/SimDRAM_bb.cc new file mode 100644 index 00000000..70959f2a --- /dev/null +++ 
b/arch/src/csrc/src/monitor/ioe/SimDRAM_bb.cc @@ -0,0 +1,183 @@ +// SimDRAM_bb.cc — Override memory_init from testchipip's SimDRAM.cc +// Replaces fesvr load_elf with libelf-based ELF loader. +// This file is compiled into the simulation and takes precedence over +// the version embedded in the SimDRAM Verilog blackbox resource. + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "mm_dramsim2.h" + +// Global state — defined here since testchipip's SimDRAM.cc is excluded from +// build. +bool use_dramsim = false; +std::string ini_dir = "dramsim2_ini"; +std::vector<std::map<long long int, backing_data_t>> backing_mem_data = {}; + +static std::string elf_file = ""; + +// --------------------------------------------------------------------------- +// load_elf_to_mem: parse ELF64 and copy PT_LOAD segments into backing data +// --------------------------------------------------------------------------- +static void load_elf_to_mem(const char *path, uint8_t *data, uint64_t mem_base, + uint64_t mem_size) { + int fd = open(path, O_RDONLY); + if (fd < 0) { + fprintf(stderr, "[SimDRAM_bb] Cannot open ELF: %s\n", path); + abort(); + } + + struct stat st; + fstat(fd, &st); + size_t file_size = st.st_size; + + uint8_t *file_buf = + (uint8_t *)mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0); + close(fd); + if (file_buf == MAP_FAILED) { + fprintf(stderr, "[SimDRAM_bb] mmap failed for ELF: %s\n", path); + abort(); + } + + Elf64_Ehdr *ehdr = (Elf64_Ehdr *)file_buf; + if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0) { + fprintf(stderr, "[SimDRAM_bb] Not a valid ELF file: %s\n", path); + abort(); + } + if (ehdr->e_ident[EI_CLASS] != ELFCLASS64) { + fprintf(stderr, "[SimDRAM_bb] Only ELF64 supported\n"); + abort(); + } + + Elf64_Phdr *phdrs = (Elf64_Phdr *)(file_buf + ehdr->e_phoff); + size_t loaded = 0; + for (int i = 0; i < ehdr->e_phnum; i++) { + Elf64_Phdr *ph = &phdrs[i]; + if (ph->p_type != PT_LOAD) +
continue; + if (ph->p_filesz == 0) + continue; + + uint64_t vaddr = ph->p_paddr; + if (vaddr < mem_base || vaddr + ph->p_memsz > mem_base + mem_size) { + fprintf(stderr, + "[SimDRAM_bb] Segment paddr=0x%lx size=0x%lx outside mem [0x%lx, " + "0x%lx)\n", + vaddr, ph->p_memsz, mem_base, mem_base + mem_size); + abort(); + } + uint64_t offset = vaddr - mem_base; + memcpy(data + offset, file_buf + ph->p_offset, ph->p_filesz); + if (ph->p_memsz > ph->p_filesz) + memset(data + offset + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz); + loaded += ph->p_filesz; + } + + munmap(file_buf, file_size); + printf("[SimDRAM_bb] Loaded ELF '%s': %zu bytes\n", path, loaded); +} + +// --------------------------------------------------------------------------- +// memory_init — DPI-C from SimDRAM.v, called once at simulation start +// --------------------------------------------------------------------------- +extern "C" void *memory_init(int chip_id, long long int mem_size, + long long int word_size, long long int line_size, + long long int id_bits, long long int clock_hz, + long long int mem_base) { + mm_t *mm; + s_vpi_vlog_info info; + + std::string memory_ini = "DDR3_micron_64M_8B_x4_sg15.ini"; + std::string system_ini = "system.ini"; + std::string local_ini_dir = "dramsim2_ini"; + + if (!vpi_get_vlog_info(&info)) + abort(); + + for (int i = 1; i < info.argc; i++) { + std::string arg(info.argv[i]); + if (arg.find("+elf=") == 0) + elf_file = arg.substr(strlen("+elf=")); + if (arg == "+dramsim") + use_dramsim = true; + if (arg.find("+dramsim_ini_dir=") == 0) + local_ini_dir = arg.substr(strlen("+dramsim_ini_dir=")); + } + + while (chip_id >= (int)backing_mem_data.size()) + backing_mem_data.push_back(std::map<long long int, backing_data_t>()); + + if (backing_mem_data[chip_id].find(mem_base) != + backing_mem_data[chip_id].end()) { + assert(backing_mem_data[chip_id][mem_base].size == (size_t)mem_size); + } else { + uint8_t *data = (uint8_t *)mmap(NULL, mem_size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS, -1, 
0); + if (data == MAP_FAILED) { + fprintf(stderr, "[SimDRAM_bb] mmap for backing store failed\n"); + abort(); + } + memset(data, 0, mem_size); + + if (!elf_file.empty()) + load_elf_to_mem(elf_file.c_str(), data, (uint64_t)mem_base, + (uint64_t)mem_size); + + backing_mem_data[chip_id][mem_base] = {data, (size_t)mem_size}; + } + + if (use_dramsim) { + mm = (mm_t *)(new mm_dramsim2_t(mem_base, mem_size, word_size, line_size, + backing_mem_data[chip_id][mem_base], + memory_ini, system_ini, local_ini_dir, + 1 << id_bits, clock_hz)); + } else { + mm = (mm_t *)(new mm_magic_t(mem_base, mem_size, word_size, line_size, + backing_mem_data[chip_id][mem_base])); + } + + return mm; +} + +// --------------------------------------------------------------------------- +// memory_tick — DPI-C from SimDRAM.v / SimAXIMem.v, called each cycle +// --------------------------------------------------------------------------- +extern "C" void memory_tick( + void *channel, unsigned char reset, unsigned char ar_valid, + unsigned char *ar_ready, long long int ar_addr, int ar_id, int ar_size, + int ar_len, unsigned char aw_valid, unsigned char *aw_ready, + long long int aw_addr, int aw_id, int aw_size, int aw_len, + unsigned char w_valid, unsigned char *w_ready, int w_strb, long long w_data, + unsigned char w_last, unsigned char *r_valid, unsigned char r_ready, + int *r_id, int *r_resp, long long *r_data, unsigned char *r_last, + unsigned char *b_valid, unsigned char b_ready, int *b_id, int *b_resp) { + mm_t *mm = (mm_t *)channel; + mm->tick(reset, ar_valid, ar_addr, ar_id, ar_size, ar_len, aw_valid, aw_addr, + aw_id, aw_size, aw_len, w_valid, w_strb, &w_data, w_last, r_ready, + b_ready); + *ar_ready = mm->ar_ready(); + *aw_ready = mm->aw_ready(); + *w_ready = mm->w_ready(); + *r_valid = mm->r_valid(); + *r_id = mm->r_id(); + *r_resp = mm->r_resp(); + *r_data = *((long *)mm->r_data()); + *r_last = mm->r_last(); + *b_valid = mm->b_valid(); + *b_id = mm->b_id(); + *b_resp = mm->b_resp(); +} diff 
--git a/arch/src/csrc/src/monitor/ioe/mem.cc b/arch/src/csrc/src/monitor/ioe/mem.cc deleted file mode 100644 index e3bbce1e..00000000 --- a/arch/src/csrc/src/monitor/ioe/mem.cc +++ /dev/null @@ -1,41 +0,0 @@ -// #include "utils/mem.h" -#include "utils/debug.h" -#include -#include -#include -#include -#include -#include - -#define MEM_SIZE 100000 - -static uint8_t mem[MEM_SIZE] = {0}; - -// Load image from am-kernels (Makefile -> ./image.bin) -static long load_image(char const *img_file) { - if (img_file == NULL) { - printf("No image is given. Use the default build-in image."); - return 4096; // built-in image size - } - FILE *fp = fopen(img_file, "rb"); - Assert(fp, "Can not open '%s'", img_file); - - // fseek: set file position pointer related to fp to a specified position - // fseek(fp, 0, SEEK_END) positions file pointer to end of file, offset 0 - // bytes - fseek(fp, 0, SEEK_END); - // ftell: returns file size - static long img_size = ftell(fp); - - printf("The image is %s, size = %ld\n", img_file, img_size); - - // fseek(fp, 0, SEEK_SET) positions file pointer to beginning of file, offset - // 0 bytes - fseek(fp, 0, SEEK_SET); - // Read img_size from fp to mem - int ret = fread(mem, img_size, 1, fp); - assert(ret == 1); - - fclose(fp); - return img_size; -} diff --git a/arch/src/csrc/src/monitor/ioe/mmio.cc b/arch/src/csrc/src/monitor/ioe/mmio.cc new file mode 100644 index 00000000..f7ff0560 --- /dev/null +++ b/arch/src/csrc/src/monitor/ioe/mmio.cc @@ -0,0 +1,53 @@ +#include "ioe/mmio.h" +#include "bdb.h" + +#include +#include +#include + +#define SIM_EXIT_ADDR 0x60000000ULL +#define UART_TX_ADDR 0x60020000ULL + +static FILE *uart_fp = nullptr; + +static void uart_putchar(char ch) { + if (!uart_fp) { + const char *path = stdout_path ? 
stdout_path : "stdout.log"; + uart_fp = fopen(path, "w"); + } + if (uart_fp) { + fputc(ch, uart_fp); + fflush(uart_fp); + } + if (raw_stdout_fd >= 0) { + write(raw_stdout_fd, &ch, 1); + } +} + +// Called once per posedge after eval(). +// Rising-edge detection guards against MMIO clock being slower than the +// sim fast-clock — without it each character would be repeated multiple times. +void mmio_tick() { + static uint8_t prev_fire = 0; + uint8_t cur_fire = top->io_mmio_fire ? 1 : 0; + bool rising = (!prev_fire && cur_fire); + prev_fire = cur_fire; + if (!rising) + return; + + uint64_t addr = top->io_mmio_fire_addr; + uint64_t data = top->io_mmio_fire_data; + + if (addr == SIM_EXIT_ADDR) { + int code = (int)(data & 0xFFFFFFFF); + if (code == 0) + fprintf(stderr, "[MMIO] simulation success\n"); + else + fprintf(stderr, "[MMIO] simulation exit code %d\n", code); + if (uart_fp) + fclose(uart_fp); + sim_exit(); + } else if (addr == UART_TX_ADDR) { + uart_putchar((char)(data & 0xFF)); + } +} diff --git a/arch/src/csrc/src/monitor/ioe/tsi_stub.cc b/arch/src/csrc/src/monitor/ioe/tsi_stub.cc new file mode 100644 index 00000000..36647b0f --- /dev/null +++ b/arch/src/csrc/src/monitor/ioe/tsi_stub.cc @@ -0,0 +1,12 @@ +// tsi_stub.cc — stub for SimTSI DPI-C symbol. +// SimTSI is instantiated by chipyard's AbstractConfig via +// WithSimTSIOverSerialTL, but we don't use TSI. This stub satisfies the linker +// without pulling in fesvr. 
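Review note on `mmio_tick` above: the rising-edge guard can be checked without a Verilated model. The following is a minimal sketch, assuming a hypothetical `EdgeDetector` helper and `count_events` function (neither is part of this patch) that mirror the `prev_fire`/`cur_fire` logic. It shows that a fire signal held high across several fast-clock samples yields a single event.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not in the patch): mirrors the prev_fire/cur_fire
// logic of mmio_tick so it can be exercised on a plain sample vector.
struct EdgeDetector {
  uint8_t prev = 0;
  // Returns true only on a 0 -> 1 transition of `level`.
  bool rising(bool level) {
    uint8_t cur = level ? 1 : 0;
    bool edge = (!prev && cur);
    prev = cur;
    return edge;
  }
};

// Counts how many MMIO events a sampled fire waveform would produce.
int count_events(const std::vector<int> &samples) {
  EdgeDetector det;
  int events = 0;
  for (int s : samples) {
    if (det.rising(s != 0))
      events++;
  }
  return events;
}
```

For example, `count_events({0, 1, 1, 1, 0, 0, 1, 1, 1, 0})` counts two events even though fire is sampled high six times. One consequence of this scheme: two distinct writes sampled on consecutive cycles with no low sample in between are counted once, so it relies on fire dropping between transactions.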
+extern "C" int tsi_tick(int chip_id, unsigned char out_valid, + unsigned char *out_ready, int out_bits, + unsigned char *in_valid, unsigned char in_ready, + int *in_bits) { + *out_ready = 0; + *in_valid = 0; + return 0; +} diff --git a/arch/src/csrc/src/monitor/monitor.cc b/arch/src/csrc/src/monitor/monitor.cc index 57edb3b2..cd3bc981 100644 --- a/arch/src/csrc/src/monitor/monitor.cc +++ b/arch/src/csrc/src/monitor/monitor.cc @@ -1,95 +1,156 @@ #include "bdb.h" #include "utils/debug.h" #include "utils/macro.h" -#include #include #include - +#include #include #include -#include "ioe/mem.cc" #include "utils/welcome.cc" -// Define global VCD path variable -const char *vcd_path = nullptr; - -// Define global log path variable +// Define global path variables const char *log_path = nullptr; const char *fst_path = nullptr; +const char *stdout_path = nullptr; +uint32_t bdb_trace_mask = BDB_TR_ALL; + +// Raw stdout fd saved before dup2 redirect — used by UART putchar for real-time +// display. 
+int raw_stdout_fd = -1; + +const char *bdb_sim_meta_path(void) { + const char *m = getenv("BDB_SIM_META"); + if (m != nullptr && m[0] != '\0') { + return m; + } + return nullptr; +} static int parse_args(int argc, char *argv[]) { - // const struct option table[] = { - // {"batch" , no_argument , NULL, 'b'}, - // {"vcd" , required_argument, NULL, 'v'}, - // {"log" , required_argument, NULL, 'l'}, - // {"help" , no_argument , NULL, 'h'}, - // }; - // int o; - // while ( (o = getopt_long(argc, argv, "bv:l:", table, NULL)) != -1) { - // switch (o) { - // case 'b': bdb_set_batch_mode(); break; - // case 'v': vcd_path = optarg; break; - // case 'l': log_path = optarg; break; - // default: - // printf("\t-b,--batch run with batch mode\n"); - // printf("\t-v,--vcd specify VCD file path\n"); - // printf("\t-l,--log specify log file path\n"); + auto parse_trace_list = [](const char *list) { + if (list == nullptr || *list == '\0') { + return; + } + char buf[256]; + int n = snprintf(buf, sizeof(buf), "%s", list); + Assert(n >= 0 && n < (int)sizeof(buf), "trace option too long: %s", list); + char *tok = strtok(buf, ","); + while (tok != NULL) { + if (strcmp(tok, "all") == 0) { + bdb_trace_mask = BDB_TR_ALL; + } else if (strcmp(tok, "none") == 0) { + bdb_trace_mask = 0; + } else if (strcmp(tok, "itrace") == 0) { + bdb_trace_mask |= BDB_TR_ITRACE; + } else if (strcmp(tok, "mtrace") == 0) { + bdb_trace_mask |= BDB_TR_MTRACE; + } else if (strcmp(tok, "pmctrace") == 0) { + bdb_trace_mask |= BDB_TR_PMCTRACE; + } else if (strcmp(tok, "ctrace") == 0) { + bdb_trace_mask |= BDB_TR_CTRACE; + } else if (strcmp(tok, "banktrace") == 0) { + bdb_trace_mask |= BDB_TR_BANKTRACE; + } else { + panic("Unknown +trace item: %s", tok); + } + tok = strtok(NULL, ","); + } + }; - // Manually parse Verilator-style parameters (+vcd=path, +log=path) for (int i = 1; i < argc; i++) { - if (strncmp(argv[i], "+vcd=", 5) == 0) { - // Skip "+vcd=" - vcd_path = argv[i] + 5; - } else if (strncmp(argv[i], "+fst=", 
5) == 0) { - // Skip "+fst=" + if (strncmp(argv[i], "+fst=", 5) == 0) { fst_path = argv[i] + 5; } else if (strncmp(argv[i], "+log=", 5) == 0) { - // Skip "+log=" log_path = argv[i] + 5; + } else if (strncmp(argv[i], "+stdout=", 8) == 0) { + stdout_path = argv[i] + 8; + } else if (strncmp(argv[i], "+trace_mask=", 12) == 0) { + char *end = NULL; + unsigned long v = strtoul(argv[i] + 12, &end, 0); + Assert(end && *end == '\0', "Invalid +trace_mask value: %s", + argv[i] + 12); + bdb_trace_mask = (uint32_t)v; + } else if (strncmp(argv[i], "+trace=", 7) == 0) { + bdb_trace_mask = 0; + parse_trace_list(argv[i] + 7); } else if (strcmp(argv[i], "+batch") == 0) { bdb_set_batch_mode(); } else if (strcmp(argv[i], "+help") == 0) { printf("\t+batch run with batch mode\n"); - printf("\t+vcd= specify VCD file path\n"); + printf("\t+elf= specify ELF binary to load into DRAM\n"); printf("\t+log= specify log file path\n"); - printf("\t+fst= specify FST file path\n"); + printf("\t+stdout= specify UART output file path\n"); + printf("\t+fst= specify FST waveform file path\n"); + printf("\t+trace= trace list: " + "none|all|itrace,mtrace,pmctrace,ctrace,banktrace\n"); + printf("\t+trace_mask= bitfield itrace=1 mtrace=2 pmctrace=4 " + "ctrace=8 banktrace=16\n"); printf("\n"); exit(0); } + // +elf= is parsed by SimDRAM_bb.cc via vpi_get_vlog_info (Verilator + // plusargs) } - // Assert(vcd_path, "VCD file path is required. Use +vcd= to specify."); Assert(log_path, "Log file path is required. Use +log= to specify."); Assert(fst_path, "FST file path is required. Use +fst= to specify."); return 0; } static void init_log(const char *log_file) { - FILE *log_fp = NULL; - log_fp = stdout; if (log_file != NULL) { + // Keep a dedicated fd for UART real-time display. + // Do not redirect stdout to trace file, otherwise NDJSON is polluted. 
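Review note on the `+trace=` handling above: the list form starts from an empty mask and applies tokens left to right, so token order matters. The sketch below is a stand-alone approximation, not the patch's code. The bit values come from the `+help` text; `kTrAll = 0x1F` is an assumption, since `BDB_TR_ALL` is defined in a header that this diff does not show, and `trace_mask_from_list` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Bit values as printed by +help. kTrAll is an assumed value for BDB_TR_ALL.
enum : uint32_t {
  kTrItrace = 1,
  kTrMtrace = 2,
  kTrPmctrace = 4,
  kTrCtrace = 8,
  kTrBanktrace = 16,
  kTrAll = 0x1F
};

// Hypothetical stand-alone version of the +trace= list handling:
// start from 0, then apply each comma-separated token in order.
uint32_t trace_mask_from_list(const std::string &list) {
  uint32_t mask = 0;
  std::stringstream ss(list);
  std::string tok;
  while (std::getline(ss, tok, ',')) {
    if (tok.empty())
      continue; // strtok in the patch also skips empty fields
    if (tok == "all")
      mask = kTrAll;
    else if (tok == "none")
      mask = 0;
    else if (tok == "itrace")
      mask |= kTrItrace;
    else if (tok == "mtrace")
      mask |= kTrMtrace;
    else if (tok == "pmctrace")
      mask |= kTrPmctrace;
    else if (tok == "ctrace")
      mask |= kTrCtrace;
    else if (tok == "banktrace")
      mask |= kTrBanktrace;
    else
      throw std::invalid_argument("Unknown +trace item: " + tok);
  }
  return mask;
}
```

With these values, `itrace,mtrace` gives 3, the same as `+trace_mask=3`, while `itrace,none` gives 0 because `none` resets whatever came before it.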
+ raw_stdout_fd = dup(STDOUT_FILENO); + Assert(raw_stdout_fd >= 0, "dup(STDOUT_FILENO) failed"); FILE *fp = fopen(log_file, "w"); Assert(fp, "Can not open '%s'", log_file); - log_fp = fp; + fclose(fp); // truncate/create file; trace writers append NDJSON later + } + const char *meta = bdb_sim_meta_path(); + if (log_file) { + if (meta != nullptr) { + FILE *mf = fopen(meta, "w"); + Assert(mf, "Can not open BDB_SIM_META '%s'", meta); + fprintf(mf, "NDJSON trace is written to %s\n", log_file); + fprintf(mf, + "Trace mask=0x%X [itrace=%d mtrace=%d pmctrace=%d ctrace=%d " + "banktrace=%d]\n", + bdb_trace_mask, bdb_trace_on(BDB_TR_ITRACE), + bdb_trace_on(BDB_TR_MTRACE), bdb_trace_on(BDB_TR_PMCTRACE), + bdb_trace_on(BDB_TR_CTRACE), bdb_trace_on(BDB_TR_BANKTRACE)); + fclose(mf); + } else { + fprintf(stderr, "NDJSON trace is written to %s\n", log_file); + fprintf(stderr, + "Trace mask=0x%X [itrace=%d mtrace=%d pmctrace=%d ctrace=%d " + "banktrace=%d]\n", + bdb_trace_mask, bdb_trace_on(BDB_TR_ITRACE), + bdb_trace_on(BDB_TR_MTRACE), bdb_trace_on(BDB_TR_PMCTRACE), + bdb_trace_on(BDB_TR_CTRACE), bdb_trace_on(BDB_TR_BANKTRACE)); + } + } else { + if (meta != nullptr) { + FILE *mf = fopen(meta, "w"); + Assert(mf, "Can not open BDB_SIM_META '%s'", meta); + fprintf(mf, "NDJSON trace path is not set\n"); + fclose(mf); + } else { + fprintf(stderr, "NDJSON trace path is not set\n"); + } } - Log("Log is written to %s", log_file ? 
log_file : "stdout"); } static void init_io() { - // Force flush all output buffers fflush(stdout); fflush(stderr); - // Restore terminal echo functionality struct termios tty; if (tcgetattr(STDIN_FILENO, &tty) == 0) { - // Enable echo tty.c_lflag |= ECHO; - // Enable line buffered mode tty.c_lflag |= ICANON; tcsetattr(STDIN_FILENO, TCSANOW, &tty); - Log("Terminal echo restored"); } } @@ -97,7 +158,7 @@ void init_monitor(int argc, char *argv[]) { parse_args(argc, argv); init_log(log_path); init_io(); - // long img_size = - // load_image("/home/mio/Code/buckyball/arch/hello-baremetal"); - welcome(); + if (bdb_sim_meta_path() == nullptr) { + welcome(); + } } diff --git a/arch/src/csrc/src/monitor/trace/bdb_clk.cc b/arch/src/csrc/src/monitor/trace/bdb_clk.cc new file mode 100644 index 00000000..b11a72f7 --- /dev/null +++ b/arch/src/csrc/src/monitor/trace/bdb_clk.cc @@ -0,0 +1,9 @@ +#include "monitor/trace_cfg.h" + +// Harness reference clock cycle index (updated from RTL via dpi_bdb_set_clk +// each posedge). +uint64_t bdb_rtl_clk = 0; + +extern "C" void dpi_bdb_set_clk(unsigned long long c) { + bdb_rtl_clk = (uint64_t)c; +} diff --git a/arch/src/csrc/src/monitor/trace/ctrace.cc b/arch/src/csrc/src/monitor/trace/ctrace.cc new file mode 100644 index 00000000..039411c8 --- /dev/null +++ b/arch/src/csrc/src/monitor/trace/ctrace.cc @@ -0,0 +1,164 @@ +#include "monitor/trace.h" +#include "monitor/trace_cfg.h" +#include "utils/debug.h" +#include +#include +#include + +// Global log file pointer (shared with monitor.cc) +extern const char *log_path; +static FILE *ctrace_fp = NULL; + +// Backdoor write state +static uint32_t bd_write_row_cursor = 0; +static uint64_t bd_write_data_lo = 0; +static uint64_t bd_write_data_hi = 0; + +// Backdoor read state +static uint32_t bd_read_row_cursor = 0; + +// Backdoor test data generator +// Generates a simple incremental pattern for each row. 
+// 128-bit = 16 bytes, each byte = (row * 16 + byte_offset) & 0xFF
+static void generate_test_data(uint32_t row, uint64_t *data_lo,
+                               uint64_t *data_hi) {
+  uint8_t data[16];
+  for (int i = 0; i < 16; i++) {
+    data[i] = (uint8_t)((row * 16 + i) & 0xFF);
+  }
+  *data_lo = 0;
+  *data_hi = 0;
+  for (int i = 0; i < 8; i++) {
+    *data_lo |= ((uint64_t)data[i]) << (i * 8);
+    *data_hi |= ((uint64_t)data[i + 8]) << (i * 8);
+  }
+}
+
+static void init_ctrace() {
+  if (ctrace_fp == NULL && log_path != NULL) {
+    ctrace_fp = fopen(log_path, "a");
+    if (ctrace_fp == NULL) {
+      panic("Failed to open ctrace log file: %s", log_path);
+    }
+  }
+}
+
+static void u128_hex(char *buf, size_t n, unsigned long long hi,
+                     unsigned long long lo) {
+  int ret = snprintf(buf, n, "0x%016llX%016llX", hi, lo);
+  if (ret < 0 || (size_t)ret >= n) {
+    panic("snprintf failed in ctrace u128_hex");
+  }
+}
+
+// ================================================================
+// Cycle counter DPI-C
+// ================================================================
+
+extern "C" void dpi_ctrace(unsigned char subcmd, unsigned int ctr_id,
+                           unsigned long long tag, unsigned long long elapsed,
+                           unsigned long long cycle) {
+  if (!bdb_trace_on(BDB_TR_CTRACE)) {
+    return;
+  }
+  init_ctrace();
+
+  if (ctrace_fp) {
+    switch (subcmd) {
+    case 0: // CTR_START
+      fprintf(ctrace_fp,
+              "{\"type\":\"ctrace\",\"clk\":%llu,\"event\":\"ctr_start\",\"ctr_"
+              "id\":%u,"
+              "\"tag\":\"0x%llX\",\"cycle\":%llu}\n",
+              (unsigned long long)bdb_rtl_clk, ctr_id, tag, cycle);
+      break;
+    case 1: // CTR_STOP
+      fprintf(ctrace_fp,
+              "{\"type\":\"ctrace\",\"clk\":%llu,\"event\":\"ctr_stop\",\"ctr_"
+              "id\":%u,"
+              "\"tag\":\"0x%llX\",\"elapsed\":%llu,\"cycle\":%llu}\n",
+              (unsigned long long)bdb_rtl_clk, ctr_id, tag, elapsed, cycle);
+      break;
+    case 2: // CTR_READ
+      fprintf(ctrace_fp,
+              "{\"type\":\"ctrace\",\"clk\":%llu,\"event\":\"ctr_read\",\"ctr_"
+              "id\":%u,"
+              "\"current\":%llu,\"cycle\":%llu}\n",
+              (unsigned long long)bdb_rtl_clk,
ctr_id, elapsed, cycle);
+      break;
+    }
+    fflush(ctrace_fp);
+  }
+}
+
+// ================================================================
+// Backdoor DPI-C functions
+// RTL calls these each iteration. C++ auto-increments row.
+// bank_id comes from RTL (instruction encoding), not from C++.
+// ================================================================
+
+// RTL calls this to get the next row to read.
+// Returns row in [31:0], upper bits = 0.
+extern "C" unsigned long long dpi_backdoor_get_read_addr(void) {
+  uint32_t row = bd_read_row_cursor;
+  bd_read_row_cursor++;
+  return (uint64_t)row;
+}
+
+// RTL calls this to get the next row to write.
+// Also pre-generates test data for this row.
+// Returns row in [31:0], upper bits = 0.
+extern "C" unsigned long long dpi_backdoor_get_write_addr(void) {
+  uint32_t row = bd_write_row_cursor;
+  generate_test_data(row, &bd_write_data_lo, &bd_write_data_hi);
+  bd_write_row_cursor++;
+  return (uint64_t)row;
+}
+
+// RTL calls this to get write data (128-bit as lo/hi).
+// Data was pre-generated by the preceding get_write_addr call.
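Review note on the backdoor write pattern: the data handed to RTL is a pure function of the row index, so a checker can recompute it when reading `backdoor_write` records from the NDJSON trace. The sketch below assumes two hypothetical helpers, `expected_row` and `expected_row_hex` (not part of this patch), that reproduce the byte pattern of `generate_test_data` and the `0x%016llX%016llX` formatting used by ctrace's `u128_hex`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical checker helper (not in the patch). Byte i of a row is
// (row * 16 + i) & 0xFF; bytes 0..7 go to lo and bytes 8..15 to hi,
// least significant byte first, as in generate_test_data.
static void expected_row(uint32_t row, uint64_t *lo, uint64_t *hi) {
  *lo = 0;
  *hi = 0;
  for (int i = 0; i < 8; i++) {
    *lo |= (uint64_t)((row * 16 + i) & 0xFF) << (i * 8);
    *hi |= (uint64_t)((row * 16 + i + 8) & 0xFF) << (i * 8);
  }
}

// Formats the pattern as "0x" + hi + lo, 16 uppercase hex digits each,
// matching the "data" field layout of the banktrace records.
std::string expected_row_hex(uint32_t row) {
  uint64_t lo = 0, hi = 0;
  expected_row(row, &lo, &hi);
  char buf[35]; // "0x" + 32 hex digits + NUL
  snprintf(buf, sizeof(buf), "0x%016llX%016llX", (unsigned long long)hi,
           (unsigned long long)lo);
  return std::string(buf);
}
```

Row 1 comes out as `0x1F1E1D1C1B1A19181716151413121110`. Because each byte is masked to 8 bits, the pattern repeats every 16 rows: row 16 carries the same data as row 0, so the data alone cannot distinguish rows that are 16 apart.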
+extern "C" void dpi_backdoor_get_write_data(unsigned long long *data_lo, + unsigned long long *data_hi) { + *data_lo = bd_write_data_lo; + *data_hi = bd_write_data_hi; +} + +// RTL calls this after reading SRAM — logs the data to NDJSON trace file +extern "C" void dpi_backdoor_put_read_data(unsigned int bank_id, + unsigned int row, + unsigned long long data_lo, + unsigned long long data_hi) { + if (!bdb_trace_on(BDB_TR_BANKTRACE)) { + return; + } + init_ctrace(); + if (ctrace_fp) { + char data_hex[35]; + u128_hex(data_hex, sizeof(data_hex), data_hi, data_lo); + fprintf(ctrace_fp, + "{\"type\":\"banktrace\",\"clk\":%llu,\"event\":\"backdoor_read\"," + "\"bank_id\":%u,\"row\":%u,\"data\":\"%s\"}\n", + (unsigned long long)bdb_rtl_clk, bank_id, row, data_hex); + fflush(ctrace_fp); + } +} + +// RTL calls this after writing SRAM — logs the write to NDJSON trace file +extern "C" void dpi_backdoor_put_write_done(unsigned int bank_id, + unsigned int row, + unsigned long long data_lo, + unsigned long long data_hi) { + if (!bdb_trace_on(BDB_TR_BANKTRACE)) { + return; + } + init_ctrace(); + if (ctrace_fp) { + char data_hex[35]; + u128_hex(data_hex, sizeof(data_hex), data_hi, data_lo); + fprintf(ctrace_fp, + "{\"type\":\"banktrace\",\"clk\":%llu,\"event\":\"backdoor_write\"," + "\"bank_id\":%u,\"row\":%u,\"data\":\"%s\"}\n", + (unsigned long long)bdb_rtl_clk, bank_id, row, data_hex); + fflush(ctrace_fp); + } +} diff --git a/arch/src/csrc/src/monitor/trace/halt.cc b/arch/src/csrc/src/monitor/trace/halt.cc new file mode 100644 index 00000000..49799d7f --- /dev/null +++ b/arch/src/csrc/src/monitor/trace/halt.cc @@ -0,0 +1,9 @@ +#include "monitor/halt.h" +#include "bdb.h" +#include + +// Called from HaltDPI.sv via DPI-C when ebreak exception is detected +void dpi_sim_halt(void) { + printf("simulation success\n"); + sim_exit(); +} diff --git a/arch/src/csrc/src/monitor/trace/itrace.cc b/arch/src/csrc/src/monitor/trace/itrace.cc new file mode 100644 index 00000000..4262d322 --- 
/dev/null +++ b/arch/src/csrc/src/monitor/trace/itrace.cc @@ -0,0 +1,94 @@ +#include "monitor/trace.h" +#include "monitor/trace_cfg.h" +#include "utils/debug.h" +#include +#include + +// Global log file pointer (shared with monitor.cc) +extern const char *log_path; +static FILE *itrace_fp = NULL; + +// Initialize itrace logging +static void init_itrace() { + if (itrace_fp == NULL && log_path != NULL) { + itrace_fp = fopen(log_path, "a"); + if (itrace_fp == NULL) { + panic("Failed to open itrace log file: %s", log_path); + } + } +} + +// bank_enable encoding (funct7[6:4]): +// 000 = none, 001 = 1rd, 010 = 1wr, 011 = 1rd+1wr, 100 = 2rd+1wr +// 101/110/111 = none (extended opcode space) +static const char *bank_enable_str(unsigned char enable) { + switch (enable) { + case 0: + return "---"; + case 1: + return "R--"; + case 2: + return "--W"; + case 3: + return "R-W"; + case 4: + return "RRW"; + default: + return "---"; // 5,6,7 = no bank access (extended) + } +} + +static void u64_hex(char *buf, size_t n, unsigned long long v) { + int ret = snprintf(buf, n, "0x%016llx", v); + if (ret < 0 || (size_t)ret >= n) { + panic("snprintf failed in itrace u64_hex"); + } +} + +// DPI-C function for instruction trace (itrace) +// Called when an instruction is allocated/issued/completed in GlobalROB +extern "C" void +dpi_itrace(unsigned char is_issue, // 2 = alloc, 1 = issue, 0 = complete + unsigned int rob_id, unsigned int domain_id, unsigned int funct, + unsigned long long pc, unsigned long long rs1, + unsigned long long rs2, unsigned char bank_enable) { + if (!bdb_trace_on(BDB_TR_ITRACE)) { + return; + } + init_itrace(); + + if (itrace_fp) { + char pc_hex[19]; + char rs1_hex[19]; + char rs2_hex[19]; + u64_hex(pc_hex, sizeof(pc_hex), pc); + u64_hex(rs1_hex, sizeof(rs1_hex), rs1); + u64_hex(rs2_hex, sizeof(rs2_hex), rs2); + if (is_issue == 2) { + fprintf( + itrace_fp, + "{\"type\":\"itrace\",\"clk\":%llu,\"event\":\"alloc\",\"rob_id\":%u," + 
"\"domain_id\":%u,\"funct\":\"0x%02x\",\"bank_enable\":%u," + "\"bank\":\"%s\",\"pc\":\"%s\",\"rs1\":\"%s\",\"rs2\":\"%s\"}\n", + (unsigned long long)bdb_rtl_clk, rob_id, domain_id, funct, + bank_enable, bank_enable_str(bank_enable), pc_hex, rs1_hex, rs2_hex); + } else if (is_issue == 1) { + fprintf( + itrace_fp, + "{\"type\":\"itrace\",\"clk\":%llu,\"event\":\"issue\",\"rob_id\":%u," + "\"domain_id\":%u,\"funct\":\"0x%02x\",\"bank_enable\":%u," + "\"bank\":\"%s\",\"pc\":\"%s\",\"rs1\":\"%s\",\"rs2\":\"%s\"}\n", + (unsigned long long)bdb_rtl_clk, rob_id, domain_id, funct, + bank_enable, bank_enable_str(bank_enable), pc_hex, rs1_hex, rs2_hex); + } else { + fprintf(itrace_fp, + "{\"type\":\"itrace\",\"clk\":%llu,\"event\":\"complete\",\"rob_" + "id\":%u," + "\"domain_id\":%u,\"funct\":\"0x%02x\",\"bank_enable\":%u," + "\"bank\":\"%s\",\"pc\":\"%s\"}\n", + (unsigned long long)bdb_rtl_clk, rob_id, domain_id, funct, + bank_enable, bank_enable_str(bank_enable), pc_hex); + } + fflush(itrace_fp); + } +} diff --git a/arch/src/csrc/src/monitor/trace/mtrace.cc b/arch/src/csrc/src/monitor/trace/mtrace.cc new file mode 100644 index 00000000..9e339e1e --- /dev/null +++ b/arch/src/csrc/src/monitor/trace/mtrace.cc @@ -0,0 +1,64 @@ +#include "monitor/trace.h" +#include "monitor/trace_cfg.h" +#include "utils/debug.h" +#include +#include + +// Global log file pointer (shared with monitor.cc) +extern const char *log_path; +static FILE *mtrace_fp = NULL; + +// Initialize mtrace logging +static void init_mtrace() { + if (mtrace_fp == NULL && log_path != NULL) { + mtrace_fp = fopen(log_path, "a"); + if (mtrace_fp == NULL) { + panic("Failed to open mtrace log file: %s", log_path); + } + } +} + +static void u128_hex(char *buf, size_t n, unsigned long long hi, + unsigned long long lo) { + int ret = snprintf(buf, n, "0x%016llx%016llx", hi, lo); + if (ret < 0 || (size_t)ret >= n) { + panic("snprintf failed in mtrace u128_hex"); + } +} + +// DPI-C function for memory trace (mtrace) +// Called 
when MemBackend performs read/write operations
+extern "C" void dpi_mtrace(unsigned char is_write, // 1 = write, 0 = read
+                           unsigned char is_shared, unsigned int channel,
+                           unsigned long long hart_id, unsigned int vbank_id,
+                           unsigned int group_id, unsigned int addr,
+                           unsigned long long data_lo,
+                           unsigned long long data_hi) {
+  if (!bdb_trace_on(BDB_TR_MTRACE)) {
+    return;
+  }
+  init_mtrace();
+
+  if (mtrace_fp) {
+    char data_hex[35];
+    if (is_write) {
+      u128_hex(data_hex, sizeof(data_hex), data_hi, data_lo);
+      fprintf(mtrace_fp,
+              "{\"type\":\"mtrace\",\"clk\":%llu,\"event\":\"write\","
+              "\"channel\":%u,"
+              "\"hart_id\":%llu,\"is_shared\":%u,\"vbank_id\":%u,"
+              "\"group_id\":%u,\"addr\":\"0x%08x\",\"data\":\"%s\"}\n",
+              (unsigned long long)bdb_rtl_clk, channel, hart_id, is_shared,
+              vbank_id, group_id, addr, data_hex);
+    } else {
+      fprintf(
+          mtrace_fp,
+          "{\"type\":\"mtrace\",\"clk\":%llu,\"event\":\"read\",\"channel\":%u,"
+          "\"hart_id\":%llu,\"is_shared\":%u,\"vbank_id\":%u,"
+          "\"group_id\":%u,\"addr\":\"0x%08x\"}\n",
+          (unsigned long long)bdb_rtl_clk, channel, hart_id, is_shared,
+          vbank_id, group_id, addr);
+    }
+    fflush(mtrace_fp);
+  }
+}
diff --git a/arch/src/csrc/src/monitor/trace/pmctrace.cc b/arch/src/csrc/src/monitor/trace/pmctrace.cc
new file mode 100644
index 00000000..62fee501
--- /dev/null
+++ b/arch/src/csrc/src/monitor/trace/pmctrace.cc
@@ -0,0 +1,63 @@
+#include "monitor/trace.h"
+#include "monitor/trace_cfg.h"
+#include "utils/debug.h"
+#include
+#include
+
+// Global log file pointer (shared with monitor.cc)
+extern const char *log_path;
+static FILE *pmctrace_fp = NULL;
+
+// Initialize pmctrace logging
+static void init_pmctrace() {
+  if (pmctrace_fp == NULL && log_path != NULL) {
+    pmctrace_fp = fopen(log_path, "a");
+    if (pmctrace_fp == NULL) {
+      panic("Failed to open pmctrace log file: %s", log_path);
+    }
+  }
+}
+
+// DPI-C function for Ball PMC trace
+// Called when a Ball completes a task, reports elapsed cycles
+extern "C" void
dpi_pmctrace(unsigned int ball_id, unsigned int rob_id, + unsigned long long elapsed) { + if (!bdb_trace_on(BDB_TR_PMCTRACE)) { + return; + } + init_pmctrace(); + + if (pmctrace_fp) { + fprintf( + pmctrace_fp, + "{\"type\":\"pmctrace\",\"clk\":%llu,\"event\":\"ball\",\"ball_id\":%u," + "\"rob_id\":%u,\"elapsed\":%llu}\n", + (unsigned long long)bdb_rtl_clk, ball_id, rob_id, elapsed); + fflush(pmctrace_fp); + } +} + +// DPI-C function for Memory PMC trace +// Called when a load/store completes, reports elapsed cycles +extern "C" void dpi_mem_pmctrace(unsigned char is_store, unsigned int rob_id, + unsigned long long elapsed) { + if (!bdb_trace_on(BDB_TR_PMCTRACE)) { + return; + } + init_pmctrace(); + + if (pmctrace_fp) { + if (is_store) { + fprintf(pmctrace_fp, + "{\"type\":\"pmctrace\",\"clk\":%llu,\"event\":\"store\"," + "\"rob_id\":%u,\"elapsed\":%llu}\n", + (unsigned long long)bdb_rtl_clk, rob_id, elapsed); + } else { + fprintf(pmctrace_fp, + "{\"type\":\"pmctrace\",\"clk\":%llu,\"event\":\"load\"," + "\"rob_id\":%u,\"elapsed\":%llu}\n", + (unsigned long long)bdb_rtl_clk, rob_id, elapsed); + } + fflush(pmctrace_fp); + } +} diff --git a/arch/src/csrc/src/utils/verilated_vpi.cpp b/arch/src/csrc/src/utils/verilated_vpi.cpp deleted file mode 100644 index e0437522..00000000 --- a/arch/src/csrc/src/utils/verilated_vpi.cpp +++ /dev/null @@ -1,3087 +0,0 @@ -// -*- mode: C++; c-file-style: "cc-mode" -*- -//************************************************************************* -// -// Code available from: https://verilator.org -// -// Copyright 2009-2024 by Wilson Snyder. This program is free software; you can -// redistribute it and/or modify it under the terms of either the GNU -// Lesser General Public License Version 3 or the Perl Artistic License -// Version 2.0. 
-// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0 -// -//========================================================================= -/// -/// \file -/// \brief Verilated VPI implementation code -/// -/// This file must be compiled and linked against all Verilated objects -/// that use the VPI. -/// -/// Use "verilator --vpi" to add this to the Makefile for the linker. -/// -/// For documentation on the exported functions (named vpi_*) that are -/// implemented here, refer to the IEEE DPI chapter. -/// -//========================================================================= - -#define VERILATOR_VERILATED_VPI_CPP_ - -#include "verilated_vpi.h" - -#include "verilated.h" -#include "verilated_imp.h" - -#include -#include -#include -#include -#include -#include -#include - -//====================================================================== -// Internal constants - -#define VL_DEBUG_IF_PLI VL_DEBUG_IF -constexpr unsigned VL_VPI_LINE_SIZE_ = 8192; - -//====================================================================== -// Internal macros - -#define VL_VPI_INTERNAL_ \ - VerilatedVpiImp::error_info()->setMessage(vpiInternal)->setMessage -#define VL_VPI_SYSTEM_ \ - VerilatedVpiImp::error_info()->setMessage(vpiSystem)->setMessage -#define VL_VPI_ERROR_ \ - VerilatedVpiImp::error_info()->setMessage(vpiError)->setMessage -#define VL_VPI_WARNING_ \ - VerilatedVpiImp::error_info()->setMessage(vpiWarning)->setMessage -#define VL_VPI_NOTICE_ \ - VerilatedVpiImp::error_info()->setMessage(vpiNotice)->setMessage -#define VL_VPI_ERROR_RESET_ VerilatedVpiImp::error_info()->resetError - -// Not supported yet -#define VL_VPI_UNIMP_() \ - (VL_VPI_ERROR_(__FILE__, __LINE__, \ - Verilated::catName("Unsupported VPI function: ", __func__))) - -//====================================================================== -// Implementation - -// Base VPI handled object -class VerilatedVpio VL_NOT_FINAL { - // CONSTANTS - // Magic value stored in front of object to detect double 
free etc - // Must be odd, as aligned pointer can never be odd - static constexpr uint32_t activeMagic() VL_PURE { return 0xfeed100f; } - - // MEM MANGLEMENT - // Internal note: Globals may multi-construct, see verilated.cpp top. - static thread_local uint8_t *t_freeHeadp; - -public: - // CONSTRUCTORS - VerilatedVpio() = default; - virtual ~VerilatedVpio() = default; - static void *operator new(size_t size) VL_MT_SAFE { - // We new and delete tons of vpi structures, so keep them around - // To simplify our free list, we use a size large enough for all derived - // types We reserve word zero for the next pointer, as that's safer in case - // a dangling reference to the original remains around. - static constexpr size_t CHUNK_SIZE = 96; - if (VL_UNCOVERABLE(size > CHUNK_SIZE)) - VL_FATAL_MT(__FILE__, __LINE__, "", "increase CHUNK_SIZE"); - if (VL_LIKELY(t_freeHeadp)) { - uint8_t *const newp = t_freeHeadp; - t_freeHeadp = *(reinterpret_cast(newp)); - *(reinterpret_cast(newp)) = activeMagic(); - return newp + 8; - } - // +8: 8 bytes for next - uint8_t *newp = reinterpret_cast(::operator new(CHUNK_SIZE + 8)); - *(reinterpret_cast(newp)) = activeMagic(); - return newp + 8; - } - static void operator delete(void *obj, size_t /*size*/) VL_MT_SAFE { - uint8_t *const oldp = (static_cast(obj)) - 8; - if (VL_UNLIKELY(*(reinterpret_cast(oldp)) != activeMagic())) { - VL_FATAL_MT(__FILE__, __LINE__, "", - "vpi_release_handle() called on same object twice, or on " - "non-Verilator " - "VPI object"); - } -#ifdef VL_VPI_IMMEDIATE_FREE // Define to aid in finding leaky handles - ::operator delete(oldp); -#else - *(reinterpret_cast(oldp)) = t_freeHeadp; - t_freeHeadp = oldp; -#endif - } - // MEMBERS - static VerilatedVpio *castp(vpiHandle h) { - return dynamic_cast(reinterpret_cast(h)); - } - vpiHandle castVpiHandle() { return reinterpret_cast(this); } - // ACCESSORS - virtual const char *name() const { return ""; } - virtual const char *fullname() const { return ""; } - virtual const 
char *defname() const { return ""; } - virtual uint32_t type() const { return 0; } - virtual uint32_t constType() const { return vpiUndefined; } - virtual uint32_t size() const { return 0; } - virtual const VerilatedRange *rangep() const { return nullptr; } - virtual vpiHandle dovpi_scan() { return nullptr; } - virtual PLI_INT32 dovpi_remove_cb() { return 0; } -}; - -class VerilatedVpioReasonCb final : public VerilatedVpio { - // A handle to a timed or non-timed callback created with vpi_register_cb - // User can call vpi_remove_cb or vpi_release_handle on it - const uint64_t m_id; // Unique id/sequence number to find schedule's event - const QData m_time; // Scheduled time, or 0 = not timed - const PLI_INT32 m_reason; // VPI callback reason code - -public: - // cppcheck-suppress uninitVar // m_value - VerilatedVpioReasonCb(uint64_t id, QData time, PLI_INT32 reason) - : m_id{id}, m_time{time}, m_reason{reason} {} - ~VerilatedVpioReasonCb() override = default; - static VerilatedVpioReasonCb *castp(vpiHandle h) { - return dynamic_cast( - reinterpret_cast(h)); - } - uint32_t type() const override { return vpiCallback; } - PLI_INT32 dovpi_remove_cb() override; -}; - -class VerilatedVpioConst final : public VerilatedVpio { - const int32_t m_num; - -public: - explicit VerilatedVpioConst(int32_t num) : m_num{num} {} - ~VerilatedVpioConst() override = default; - static VerilatedVpioConst *castp(vpiHandle h) { - return dynamic_cast( - reinterpret_cast(h)); - } - uint32_t type() const override { return vpiConstant; } - uint32_t constType() const override { return vpiDecConst; } - int32_t num() const { return m_num; } -}; - -class VerilatedVpioVarBase VL_NOT_FINAL : public VerilatedVpio { -protected: - const VerilatedVar *m_varp = nullptr; - const VerilatedScope *m_scopep = nullptr; - std::string m_fullname; - const VerilatedRange &get_range() const { - // Determine number of dimensions and return outermost - return (m_varp->dims() > 1) ? 
m_varp->unpacked() : m_varp->packed(); - } - -public: - VerilatedVpioVarBase(const VerilatedVar *varp, const VerilatedScope *scopep) - : m_varp{varp}, m_scopep{scopep}, - m_fullname{std::string{m_scopep->name()} + '.' + m_varp->name()} {} - explicit VerilatedVpioVarBase(const VerilatedVpioVarBase *varp) { - if (varp) { - m_varp = varp->m_varp; - m_scopep = varp->m_scopep; - m_fullname = varp->m_fullname; - } - } - static VerilatedVpioVarBase *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioVarBase *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - const VerilatedVar *varp() const { return m_varp; } - const VerilatedScope *scopep() const { return m_scopep; } - uint32_t size() const override { return get_range().elements(); } - const VerilatedRange *rangep() const override { return &get_range(); } - const char *name() const override { return m_varp->name(); } - const char *fullname() const override { return m_fullname.c_str(); } -}; - -class VerilatedVpioParam final : public VerilatedVpioVarBase { -public: - VerilatedVpioParam(const VerilatedVar *varp, const VerilatedScope *scopep) - : VerilatedVpioVarBase{varp, scopep} {} - ~VerilatedVpioParam() override = default; - - static VerilatedVpioParam *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioParam *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiParameter; } - uint32_t constType() const override { - switch (m_varp->vltype()) { - case VLVT_UINT8: - case VLVT_UINT16: - case VLVT_UINT32: - case VLVT_UINT64: - case VLVT_WDATA: - return vpiDecConst; - case VLVT_STRING: - return vpiStringConst; - case VLVT_REAL: - return vpiRealConst; - default: - return vpiUndefined; - } - } - void *varDatap() const { return m_varp->datap(); } -}; - -class VerilatedVpioRange final : public VerilatedVpio { - const VerilatedRange *const m_rangep; - -public: - explicit VerilatedVpioRange(const VerilatedRange *rangep) - : m_rangep{rangep} {} - ~VerilatedVpioRange() override = default; - static VerilatedVpioRange *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioRange *>( - 
reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiRange; } - uint32_t size() const override { return m_rangep->elements(); } - const VerilatedRange *rangep() const override { return m_rangep; } -}; - -class VerilatedVpioRangeIter final : public VerilatedVpio { - // Only supports 1 dimension - const VerilatedRange *const m_rangep; - bool m_done = false; - -public: - explicit VerilatedVpioRangeIter(const VerilatedRange *rangep) - : m_rangep{rangep} {} - ~VerilatedVpioRangeIter() override = default; - static VerilatedVpioRangeIter *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioRangeIter *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiIterator; } - vpiHandle dovpi_scan() override { - if (VL_UNLIKELY(m_done)) { - delete this; // IEEE 37.2.2 vpi_scan at end does a vpi_release_handle - return nullptr; - } - m_done = true; - return ((new VerilatedVpioRange{m_rangep})->castVpiHandle()); - } -}; - -class VerilatedVpioScope VL_NOT_FINAL : public VerilatedVpio { -protected: - const VerilatedScope *const m_scopep; - bool m_toplevel = false; - -public: - explicit VerilatedVpioScope(const VerilatedScope *scopep) : m_scopep{scopep} { - std::string scopename = m_scopep->name(); - std::string::size_type pos = std::string::npos; - // Look for '.' not inside escaped identifier - size_t i = 0; - while (i < scopename.length()) { - if (scopename[i] == '\\') { - while (i < scopename.length() && scopename[i] != ' ') - ++i; - ++i; // Proc ' ', it should always be there. Then grab '.' 
on next cycle - } else { - while (i < scopename.length() && scopename[i] != '.') - ++i; - if (i < scopename.length()) - pos = i++; - } - } - if (VL_UNLIKELY(pos == std::string::npos)) - m_toplevel = true; - } - ~VerilatedVpioScope() override = default; - static VerilatedVpioScope *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioScope *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiScope; } - const VerilatedScope *scopep() const { return m_scopep; } - const char *name() const override { return m_scopep->name(); } - const char *fullname() const override { return m_scopep->name(); } - bool toplevel() const { return m_toplevel; } -}; - -class VerilatedVpioVar VL_NOT_FINAL : public VerilatedVpioVarBase { - uint8_t *m_prevDatap = nullptr; // Previous value of data, for cbValueChange - union { - uint8_t u8[4]; - uint32_t u32; - } m_mask; // memoized variable mask - uint32_t m_entSize = 0; // memoized variable size -protected: - void *m_varDatap = nullptr; // varp()->datap() adjusted for array entries - int32_t m_index = 0; - -public: - VerilatedVpioVar(const VerilatedVar *varp, const VerilatedScope *scopep) - : VerilatedVpioVarBase{varp, scopep} { - m_mask.u32 = VL_MASK_I(varp->packed().elements()); - m_entSize = varp->entSize(); - m_varDatap = varp->datap(); - } - explicit VerilatedVpioVar(const VerilatedVpioVar *varp) - : VerilatedVpioVarBase{varp} { - if (varp) { - m_mask.u32 = varp->m_mask.u32; - m_entSize = varp->m_entSize; - m_varDatap = varp->m_varDatap; - m_index = varp->m_index; - // Not copying m_prevDatap, must be nullptr - } else { - m_mask.u32 = 0; - } - } - ~VerilatedVpioVar() override { - if (m_prevDatap) - VL_DO_CLEAR(delete[] m_prevDatap, m_prevDatap = nullptr); - } - static VerilatedVpioVar *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioVar *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t mask() const { return m_mask.u32; } - uint8_t mask_byte(int idx) const { return m_mask.u8[idx & 3]; } - uint32_t entSize() const { return m_entSize; } - uint32_t index() 
const { return m_index; } - uint32_t type() const override { - uint32_t type = vpiReg; - switch (varp()->vltype()) { - case VLVT_REAL: - type = vpiRealVar; - break; - case VLVT_STRING: - type = vpiStringVar; - break; - default: - break; - } - return (varp()->dims() > 1) ? vpiMemory : type; // but might be wire, logic - } - void *prevDatap() const { return m_prevDatap; } - void *varDatap() const { return m_varDatap; } - void createPrevDatap() { - if (VL_UNLIKELY(!m_prevDatap)) { - m_prevDatap = new uint8_t[entSize()]; - std::memcpy(prevDatap(), varp()->datap(), entSize()); - } - } -}; - -class VerilatedVpioMemoryWord final : public VerilatedVpioVar { -public: - VerilatedVpioMemoryWord(const VerilatedVar *varp, - const VerilatedScope *scopep, int32_t index, - int offset) - : VerilatedVpioVar{varp, scopep} { - m_index = index; - m_varDatap = (static_cast<uint8_t *>(varp->datap())) + entSize() * offset; - } - ~VerilatedVpioMemoryWord() override = default; - static VerilatedVpioMemoryWord *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioMemoryWord *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiMemoryWord; } - uint32_t size() const override { return varp()->packed().elements(); } - const VerilatedRange *rangep() const override { return &(varp()->packed()); } - const char *fullname() const override { - static thread_local std::string t_out; - constexpr size_t LEN_MAX_INDEX = 25; - char num[LEN_MAX_INDEX]; - VL_SNPRINTF(num, LEN_MAX_INDEX, "%d", m_index); - t_out = std::string{scopep()->name()} + "." 
+ name() + "[" + num + "]"; - return t_out.c_str(); - } -}; - -class VerilatedVpioVarIter final : public VerilatedVpio { - const VerilatedScope *const m_scopep; - VerilatedVarNameMap::const_iterator m_it; - bool m_started = false; - const VerilatedScope *m_topscopep = nullptr; - bool m_onlyParams; - -public: - explicit VerilatedVpioVarIter(const VerilatedVpioScope *vop, - bool onlyParams = false) - : m_scopep{vop->scopep()}, m_onlyParams{onlyParams} { - if (VL_UNLIKELY(vop->toplevel())) - // This is a toplevel, so get TOP scope to search for ports during - // vpi_scan. - m_topscopep = Verilated::threadContextp()->scopeFind("TOP"); - } - ~VerilatedVpioVarIter() override = default; - static VerilatedVpioVarIter *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioVarIter *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiIterator; } - vpiHandle dovpi_scan() override { - if (VL_UNLIKELY(!m_scopep->varsp())) { - delete this; // IEEE 37.2.2 vpi_scan at end does a vpi_release_handle - return nullptr; // End of list - only one deep - } - while (true) { - const VerilatedVarNameMap *const varsp = m_scopep->varsp(); - if (VL_UNLIKELY(!m_started)) { - m_it = varsp->begin(); - m_started = true; - } else if (VL_UNLIKELY(m_it == varsp->end())) { - delete this; // IEEE 37.2.2 vpi_scan at end does a vpi_release_handle - return nullptr; - } else { - ++m_it; - } - if (VL_UNLIKELY(m_it == varsp->end())) { - delete this; // IEEE 37.2.2 vpi_scan at end does a vpi_release_handle - return nullptr; - } - if (m_onlyParams && !m_it->second.isParam()) - continue; - if (VL_UNLIKELY(m_topscopep)) { - if (const VerilatedVar *topvarp = - m_topscopep->varFind(m_it->second.name())) { - if (topvarp->isParam()) { - return ((new VerilatedVpioParam{topvarp, m_topscopep}) - ->castVpiHandle()); - } else { - return ( - (new VerilatedVpioVar{topvarp, m_topscopep})->castVpiHandle()); - } - } - } - if (m_it->second.isParam()) { - return ((new VerilatedVpioParam{&(m_it->second), m_scopep}) - 
->castVpiHandle()); - } else { - return ( - (new VerilatedVpioVar{&(m_it->second), m_scopep})->castVpiHandle()); - } - } - } -}; - -class VerilatedVpioMemoryWordIter final : public VerilatedVpio { - const vpiHandle m_handle; - const VerilatedVar *const m_varp; - int32_t m_iteration; - const int32_t m_direction; - bool m_done = false; - -public: - VerilatedVpioMemoryWordIter(const vpiHandle handle, const VerilatedVar *varp) - : m_handle{handle}, m_varp{varp}, m_iteration{varp->unpacked().right()}, - m_direction{ - VL_LIKELY(varp->unpacked().left() > varp->unpacked().right()) - ? 1 - : -1} {} - ~VerilatedVpioMemoryWordIter() override = default; - static VerilatedVpioMemoryWordIter *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioMemoryWordIter *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiIterator; } - void iterationInc() { - if (!(m_done = (m_iteration == m_varp->unpacked().left()))) - m_iteration += m_direction; - } - vpiHandle dovpi_scan() override { - if (VL_UNLIKELY(m_done)) { - delete this; // IEEE 37.2.2 vpi_scan at end does a vpi_release_handle - return nullptr; - } - const vpiHandle result = vpi_handle_by_index(m_handle, m_iteration); - iterationInc(); - return result; - } -}; - -class VerilatedVpioModule final : public VerilatedVpioScope { - const char *m_name; - const char *m_fullname; - -public: - explicit VerilatedVpioModule(const VerilatedScope *modulep) - : VerilatedVpioScope{modulep} { - m_fullname = m_scopep->name(); - if (std::strncmp(m_fullname, "TOP.", 4) == 0) - m_fullname += 4; - m_name = m_scopep->identifier(); - } - static VerilatedVpioModule *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioModule *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiModule; } - const char *name() const override { return m_name; } - const char *fullname() const override { return m_fullname; } -}; - -class VerilatedVpioModuleIter final : public VerilatedVpio { - const std::vector<const VerilatedScope *> *m_vec; - std::vector<const VerilatedScope *>::const_iterator m_it; - -public: - explicit 
VerilatedVpioModuleIter( - const std::vector<const VerilatedScope *> &vec) - : m_vec{&vec} { - m_it = m_vec->begin(); - } - ~VerilatedVpioModuleIter() override = default; - static VerilatedVpioModuleIter *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioModuleIter *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiIterator; } - vpiHandle dovpi_scan() override { - while (true) { - if (m_it == m_vec->end()) { - delete this; // IEEE 37.2.2 vpi_scan at end does a vpi_release_handle - return nullptr; - } - const VerilatedScope::Type itype = (*m_it)->type(); - const VerilatedScope *const modp = *m_it++; - if (itype == VerilatedScope::SCOPE_MODULE) { - return (new VerilatedVpioModule{modp})->castVpiHandle(); - } - } - } -}; - -class VerilatedVpioPackage final : public VerilatedVpioScope { - std::string m_name; - std::string m_fullname; - -public: - explicit VerilatedVpioPackage(const VerilatedScope *modulep) - : VerilatedVpioScope{modulep} { - const char *sfullname = m_scopep->name(); - if (std::strncmp(sfullname, "TOP.", 4) == 0) - sfullname += 4; - m_fullname = std::string{sfullname} + "::"; - if (m_fullname == "\\$unit ::") - m_fullname = "$unit::"; - m_name = std::string(m_scopep->identifier()); - if (m_name == "\\$unit ") - m_name = "$unit"; - } - static VerilatedVpioPackage *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioPackage *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return vpiPackage; } - const char *name() const override { return m_name.c_str(); } - const char *fullname() const override { return m_fullname.c_str(); } -}; - -class VerilatedVpioInstanceIter final : public VerilatedVpio { - const std::vector<const VerilatedScope *> *m_vec; - std::vector<const VerilatedScope *>::const_iterator m_it; - -public: - explicit VerilatedVpioInstanceIter( - const std::vector<const VerilatedScope *> &vec) - : m_vec{&vec} { - m_it = m_vec->begin(); - } - ~VerilatedVpioInstanceIter() override = default; - static VerilatedVpioInstanceIter *castp(vpiHandle h) { - return dynamic_cast<VerilatedVpioInstanceIter *>( - reinterpret_cast<VerilatedVpio *>(h)); - } - uint32_t type() const override { return 
vpiIterator; } - vpiHandle dovpi_scan() override { - while (true) { - if (m_it == m_vec->end()) { - delete this; // IEEE 37.2.2 vpi_scan at end does a vpi_release_handle - return nullptr; - } - const VerilatedScope::Type itype = (*m_it)->type(); - const VerilatedScope *const modp = *m_it++; - if (itype == VerilatedScope::SCOPE_MODULE) { - return (new VerilatedVpioModule{modp})->castVpiHandle(); - } - if (itype == VerilatedScope::SCOPE_PACKAGE) { - return (new VerilatedVpioPackage{modp})->castVpiHandle(); - } - } - } -}; - -//====================================================================== - -using VerilatedPliCb = PLI_INT32 (*)(struct t_cb_data *); - -class VerilatedVpiCbHolder final { - // Holds information needed to call a callback - uint64_t - m_id; // Unique id/sequence number to find schedule's event, 0 = invalid - s_cb_data m_cbData; - s_vpi_value m_value; - VerilatedVpioVar - m_varo; // If a cbValueChange callback, the object we will return - -public: - // cppcheck-suppress uninitVar // m_value - VerilatedVpiCbHolder(uint64_t id, const s_cb_data *cbDatap, - const VerilatedVpioVar *varop) - : m_id{id}, m_cbData{*cbDatap}, m_varo{varop} { - m_value.format = cbDatap->value ? 
cbDatap->value->format : vpiSuppressVal; - m_cbData.value = &m_value; - if (varop) { - m_cbData.obj = m_varo.castVpiHandle(); - m_varo.createPrevDatap(); - } else { - m_cbData.obj = nullptr; - } - } - ~VerilatedVpiCbHolder() = default; - VerilatedPliCb cb_rtnp() const { return m_cbData.cb_rtn; } - s_cb_data *cb_datap() { return &m_cbData; } - uint64_t id() const { return m_id; } - bool invalid() const { return !m_id; } - void invalidate() { m_id = 0; } -}; - -struct VerilatedVpiTimedCbsCmp final { - // Ordering sets keyed by time, then callback unique id - bool operator()(const std::pair<QData, uint64_t> &a, - const std::pair<QData, uint64_t> &b) const { - if (a.first < b.first) - return true; - if (a.first > b.first) - return false; - return a.second < b.second; - } -}; - -class VerilatedVpiError; - -class VerilatedVpiImp final { - enum { CB_ENUM_MAX_VALUE = cbAtEndOfSimTime + 1 }; // Maximum callback reason - using VpioCbList = std::list<VerilatedVpiCbHolder>; - using VpioFutureCbs = - std::map<std::pair<QData, uint64_t>, VerilatedVpiCbHolder>; - - // All only medium-speed, so use singleton function - // Callbacks that are past or at current timestamp - std::array<VpioCbList, CB_ENUM_MAX_VALUE> m_cbCurrentLists; - VpioFutureCbs m_futureCbs; // Time based callbacks for future timestamps - VpioFutureCbs m_nextCbs; // cbNextSimTime callbacks - VerilatedVpiError *m_errorInfop = nullptr; // Container for vpi error info - VerilatedAssertOneThread m_assertOne; // Assert only called from single thread - uint64_t m_nextCallbackId = 1; // Id to identify callback - - static VerilatedVpiImp &s() { // Singleton - static VerilatedVpiImp s_s; - return s_s; - } - -public: - static void assertOneCheck() { s().m_assertOne.check(); } - static uint64_t nextCallbackId() { return ++s().m_nextCallbackId; } - - static void cbCurrentAdd(uint64_t id, const s_cb_data *cb_data_p) { - // The passed cb_data_p was property of the user, so need to recreate - if (VL_UNCOVERABLE(cb_data_p->reason >= CB_ENUM_MAX_VALUE)) { - VL_FATAL_MT(__FILE__, __LINE__, "", "vpi bb reason too large"); - } - 
VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_register_cb reason=%d id=%" PRId64 - " obj=%p\n", - cb_data_p->reason, id, cb_data_p->obj);); - VerilatedVpioVar *varop = nullptr; - if (cb_data_p->reason == cbValueChange) - varop = VerilatedVpioVar::castp(cb_data_p->obj); - s().m_cbCurrentLists[cb_data_p->reason].emplace_back(id, cb_data_p, varop); - } - static void cbFutureAdd(uint64_t id, const s_cb_data *cb_data_p, QData time) { - // The passed cb_data_p was property of the user, so need to recreate - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_register_cb reason=%d id=%" PRId64 - " time=%" PRIu64 " obj=%p\n", - cb_data_p->reason, id, time, cb_data_p->obj);); - s().m_futureCbs.emplace(std::piecewise_construct, - std::forward_as_tuple(time, id), - std::forward_as_tuple(id, cb_data_p, nullptr)); - } - static void cbNextAdd(uint64_t id, const s_cb_data *cb_data_p, QData time) { - // The passed cb_data_p was property of the user, so need to recreate - VL_DEBUG_IF_PLI( - VL_DBG_MSGF("- vpi: vpi_register_cb reason=%d(NEXT) id=%" PRId64 - " time=%" PRIu64 " obj=%p\n", - cb_data_p->reason, id, time, cb_data_p->obj);); - s().m_nextCbs.emplace(std::piecewise_construct, - std::forward_as_tuple(time, id), - std::forward_as_tuple(id, cb_data_p, nullptr)); - } - static void cbReasonRemove(uint64_t id, uint32_t reason, QData time) { - // Id might no longer exist, if already removed due to call after event, or - // teardown We do not remove it now as we may be iterating the list, instead - // set to nullptr and will cleanup later Remove from cbCurrent queue - for (auto &ir : s().m_cbCurrentLists[reason]) { - if (ir.id() == id) { - ir.invalidate(); - return; // Once found, it won't also be in m_futureCbs - } - } - { // Remove from cbFuture queue - const auto it = s().m_futureCbs.find(std::make_pair(time, id)); - if (it != s().m_futureCbs.end()) { - it->second.invalidate(); - return; - } - } - { // Remove from cbNext - const auto it = s().m_nextCbs.find(std::make_pair(time, id)); - if (it != 
s().m_nextCbs.end()) { - it->second.invalidate(); - return; - } - } - } - static void moveFutureCbs() VL_MT_UNSAFE_ONE { - // For any events past current time, move from cbFuture queue to cbCurrent - // queue - if (s().m_futureCbs.empty() && s().m_nextCbs.empty()) - return; - // VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: moveFutureCbs\n"); dumpCbs(); ); - const QData time = VL_TIME_Q(); - for (auto it = s().m_futureCbs.begin(); // - VL_UNLIKELY(it != s().m_futureCbs.end() && it->first.first <= time);) { - VerilatedVpiCbHolder &hor = it->second; - const auto last_it = it; - ++it; - if (VL_UNLIKELY(!hor.invalid())) { - VL_DEBUG_IF_PLI( - VL_DBG_MSGF("- vpi: moveFutureCbs id=%" PRId64 "\n", hor.id());); - s().m_cbCurrentLists[hor.cb_datap()->reason].emplace_back(hor); - } - s().m_futureCbs.erase(last_it); - } - for (auto it = s().m_nextCbs.begin(); // - VL_UNLIKELY(it != s().m_nextCbs.end() && it->first.first < time);) { - VerilatedVpiCbHolder &hor = it->second; - const auto last_it = it; - ++it; - if (VL_UNLIKELY(!hor.invalid())) { - VL_DEBUG_IF_PLI( - VL_DBG_MSGF("- vpi: moveFutureCbs id=%" PRId64 "\n", hor.id());); - s().m_cbCurrentLists[hor.cb_datap()->reason].emplace_back(hor); - } - s().m_nextCbs.erase(last_it); - } - } - static QData cbNextDeadline() { - const auto it = s().m_futureCbs.cbegin(); - if (VL_LIKELY(it != s().m_futureCbs.cend())) - return it->first.first; - return ~0ULL; // maxquad - } - static bool callCbs(const uint32_t reason) VL_MT_UNSAFE_ONE { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: callCbs reason=%u\n", reason);); - assertOneCheck(); - moveFutureCbs(); - if (s().m_cbCurrentLists[reason].empty()) - return false; - // Iterate on old list, making new list empty, to prevent looping over newly - // added elements - VpioCbList cbObjList; - std::swap(s().m_cbCurrentLists[reason], cbObjList); - bool called = false; - for (VerilatedVpiCbHolder &ihor : cbObjList) { - // cbReasonRemove sets to nullptr, so we know on removal the old end() - // will still exist - 
if (VL_LIKELY(!ihor.invalid())) { // Not deleted earlier - VL_DEBUG_IF_PLI( - VL_DBG_MSGF("- vpi: reason_callback reason=%d id=%" PRId64 "\n", - reason, ihor.id());); - ihor.invalidate(); // Timed callbacks are one-shot - (ihor.cb_rtnp())(ihor.cb_datap()); - called = true; - } - } - return called; - } - static bool callValueCbs() VL_MT_UNSAFE_ONE { - assertOneCheck(); - VpioCbList &cbObjList = s().m_cbCurrentLists[cbValueChange]; - bool called = false; - std::unordered_set<VerilatedVpioVar *> - update; // set of objects to update after callbacks - if (cbObjList.empty()) - return called; - const auto last = - std::prev(cbObjList.end()); // prevent looping over newly added elements - for (auto it = cbObjList.begin(); true;) { - // cbReasonRemove sets to nullptr, so we know on removal the old end() - // will still exist - const bool was_last = it == last; - if (VL_UNLIKELY(it->invalid())) { // Deleted earlier, cleanup - it = cbObjList.erase(it); - if (was_last) - break; - continue; - } - VerilatedVpiCbHolder &ho = *it++; - VerilatedVpioVar *const varop = - reinterpret_cast<VerilatedVpioVar *>(ho.cb_datap()->obj); - void *const newDatap = varop->varDatap(); - void *const prevDatap = - varop->prevDatap(); // Was malloced when we added the callback - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: value_test %s v[0]=%d/%d %p %p\n", - varop->fullname(), - *(static_cast<CData *>(newDatap)), - *(static_cast<CData *>(prevDatap)), newDatap, - prevDatap);); - if (std::memcmp(prevDatap, newDatap, varop->entSize()) != 0) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: value_callback %" PRId64 - " %s v[0]=%d\n", - ho.id(), varop->fullname(), - *(static_cast<CData *>(newDatap)));); - update.insert(varop); - vpi_get_value(ho.cb_datap()->obj, ho.cb_datap()->value); - (ho.cb_rtnp())(ho.cb_datap()); - called = true; - } - if (was_last) - break; - } - for (const auto &ip : update) { - std::memcpy(ip->prevDatap(), ip->varDatap(), ip->entSize()); - } - return called; - } - static void dumpCbs() VL_MT_UNSAFE_ONE; - static VerilatedVpiError * - error_info() VL_MT_UNSAFE_ONE; 
// getter for vpi error info -}; - -//====================================================================== -// Statics -// Internal note: Globals may multi-construct, see verilated.cpp top. - -thread_local uint8_t *VerilatedVpio::t_freeHeadp = nullptr; - -//====================================================================== -// VerilatedVpiError -// Internal container for vpi error info - -class VerilatedVpiError final { - t_vpi_error_info m_errorInfo; - bool m_flag = false; - char m_buff[VL_VPI_LINE_SIZE_]; - void setError(PLI_BYTE8 *message, PLI_BYTE8 *code, PLI_BYTE8 *file, - PLI_INT32 line) { - m_errorInfo.message = message; - m_errorInfo.file = file; - m_errorInfo.line = line; - m_errorInfo.code = code; - do_callbacks(); - } - void do_callbacks() { - if (getError()->level >= vpiError && - Verilated::threadContextp()->fatalOnVpiError()) { - // Stop on vpi error/unsupported - vpi_unsupported(); - } - // We need to run above code first because in the case that the - // callback executes further vpi functions we will loose the error - // as it will be overwritten. - VerilatedVpiImp::callCbs(cbPLIError); - } - -public: - VerilatedVpiError() { - m_buff[0] = '\0'; - m_errorInfo.product = const_cast<PLI_BYTE8 *>(Verilated::productName()); - } - ~VerilatedVpiError() = default; - static void selfTest() VL_MT_UNSAFE_ONE; - VerilatedVpiError *setMessage(PLI_INT32 level) { - m_flag = true; - m_errorInfo.level = level; - return this; - } - void setMessage(const std::string &file, PLI_INT32 line, const char *message, - ...) 
{ - // message cannot be a const string& as va_start cannot use a reference - static thread_local std::string t_filehold; - va_list args; - va_start(args, message); - VL_VSNPRINTF(m_buff, sizeof(m_buff), message, args); - va_end(args); - m_errorInfo.state = vpiPLI; - t_filehold = file; - setError(static_cast<PLI_BYTE8 *>(m_buff), nullptr, - const_cast<PLI_BYTE8 *>(t_filehold.c_str()), line); - } - p_vpi_error_info getError() { - if (m_flag) - return &m_errorInfo; - return nullptr; - } - void resetError() { m_flag = false; } - static void vpi_unsupported() { - // Not supported yet - const p_vpi_error_info error_info_p = - VerilatedVpiImp::error_info()->getError(); - if (error_info_p) { - VL_FATAL_MT(error_info_p->file, error_info_p->line, "", - error_info_p->message); - return; - } - VL_FATAL_MT(__FILE__, __LINE__, "", - "vpi_unsupported called without error info set"); - } - static const char *strFromVpiVal(PLI_INT32 vpiVal) VL_PURE; - static const char *strFromVpiObjType(PLI_INT32 vpiVal) VL_PURE; - static const char *strFromVpiMethod(PLI_INT32 vpiVal) VL_PURE; - static const char *strFromVpiCallbackReason(PLI_INT32 vpiVal) VL_PURE; - static const char *strFromVpiProp(PLI_INT32 vpiVal) VL_PURE; - static const char *strFromVpiConstType(PLI_INT32 vpiVal) VL_PURE; -}; - -//====================================================================== -// VerilatedVpi implementation - -bool VerilatedVpi::callCbs(uint32_t reason) VL_MT_UNSAFE_ONE { - return VerilatedVpiImp::callCbs(reason); -} - -// Historical, before we had multiple kinds of timed callbacks -void VerilatedVpi::callTimedCbs() VL_MT_UNSAFE_ONE { - VerilatedVpiImp::callCbs(cbAfterDelay); -} - -bool VerilatedVpi::callValueCbs() VL_MT_UNSAFE_ONE { - return VerilatedVpiImp::callValueCbs(); -} - -QData VerilatedVpi::cbNextDeadline() VL_MT_UNSAFE_ONE { - return VerilatedVpiImp::cbNextDeadline(); -} - -void VerilatedVpi::dumpCbs() VL_MT_UNSAFE_ONE { VerilatedVpiImp::dumpCbs(); } - -PLI_INT32 VerilatedVpioReasonCb::dovpi_remove_cb() { - 
VerilatedVpiImp::cbReasonRemove(m_id, m_reason, m_time); - delete this; // IEEE 37.2.2 a vpi_remove_cb does a vpi_release_handle - return 1; -} - -//====================================================================== -// VerilatedVpiImp implementation - -void VerilatedVpiImp::dumpCbs() VL_MT_UNSAFE_ONE { - assertOneCheck(); - VL_DBG_MSGF("- vpi: dumpCbs\n"); - for (uint32_t reason = 0; reason < CB_ENUM_MAX_VALUE; ++reason) { - VpioCbList &cbObjList = s().m_cbCurrentLists[reason]; - for (auto &ho : cbObjList) { - if (VL_UNLIKELY(!ho.invalid())) { - VL_DBG_MSGF("- vpi: reason=%d=%s id=%" PRId64 "\n", reason, - VerilatedVpiError::strFromVpiCallbackReason(reason), - ho.id()); - } - } - } - for (auto &ifuture : s().m_nextCbs) { - const QData time = ifuture.first.first; - VerilatedVpiCbHolder &ho = ifuture.second; - if (VL_UNLIKELY(!ho.invalid())) { - VL_DBG_MSGF( - "- vpi: time=%" PRId64 "(NEXT) reason=%d=%s id=%" PRId64 "\n", - time, ho.cb_datap()->reason, - VerilatedVpiError::strFromVpiCallbackReason(ho.cb_datap()->reason), - ho.id()); - } - } - for (auto &ifuture : s().m_futureCbs) { - const QData time = ifuture.first.first; - VerilatedVpiCbHolder &ho = ifuture.second; - if (VL_UNLIKELY(!ho.invalid())) { - VL_DBG_MSGF( - "- vpi: time=%" PRId64 " reason=%d=%s id=%" PRId64 "\n", time, - ho.cb_datap()->reason, - VerilatedVpiError::strFromVpiCallbackReason(ho.cb_datap()->reason), - ho.id()); - } - } -} - -VerilatedVpiError *VerilatedVpiImp::error_info() VL_MT_UNSAFE_ONE { - VerilatedVpiImp::assertOneCheck(); - if (VL_UNLIKELY(!s().m_errorInfop)) - s().m_errorInfop = new VerilatedVpiError; - return s().m_errorInfop; -} - -//====================================================================== -// VerilatedVpiError Methods - -const char *VerilatedVpiError::strFromVpiVal(PLI_INT32 vpiVal) VL_PURE { - // clang-format off - static const char* const names[] = { - "*undefined*", - "vpiBinStrVal", - "vpiOctStrVal", - "vpiDecStrVal", - "vpiHexStrVal", - "vpiScalarVal", - 
"vpiIntVal", - "vpiRealVal", - "vpiStringVal", - "vpiVectorVal", - "vpiStrengthVal", - "vpiTimeVal", - "vpiObjTypeVal", - "vpiSuppressVal", - "vpiShortIntVal", - "vpiLongIntVal", - "vpiShortRealVal", - "vpiRawTwoStateVal", - "vpiRawFourStateVal", - }; - // clang-format on - if (VL_UNCOVERABLE(vpiVal < 0)) - return names[0]; - return names[(vpiVal <= vpiRawFourStateVal) ? vpiVal : 0]; -} -const char *VerilatedVpiError::strFromVpiObjType(PLI_INT32 vpiVal) VL_PURE { - // clang-format off - static const char* const names[] = { - "*undefined*", - "vpiAlways", - "vpiAssignStmt", - "vpiAssignment", - "vpiBegin", - "vpiCase", - "vpiCaseItem", - "vpiConstant", - "vpiContAssign", - "vpiDeassign", - "vpiDefParam", - "vpiDelayControl", - "vpiDisable", - "vpiEventControl", - "vpiEventStmt", - "vpiFor", - "vpiForce", - "vpiForever", - "vpiFork", - "vpiFuncCall", - "vpiFunction", - "vpiGate", - "vpiIf", - "vpiIfElse", - "vpiInitial", - "vpiIntegerVar", - "vpiInterModPath", - "vpiIterator", - "vpiIODecl", - "vpiMemory", - "vpiMemoryWord", - "vpiModPath", - "vpiModule", - "vpiNamedBegin", - "vpiNamedEvent", - "vpiNamedFork", - "vpiNet", - "vpiNetBit", - "vpiNullStmt", - "vpiOperation", - "vpiParamAssign", - "vpiParameter", - "vpiPartSelect", - "vpiPathTerm", - "vpiPort", - "vpiPortBit", - "vpiPrimTerm", - "vpiRealVar", - "vpiReg", - "vpiRegBit", - "vpiRelease", - "vpiRepeat", - "vpiRepeatControl", - "vpiSchedEvent", - "vpiSpecParam", - "vpiSwitch", - "vpiSysFuncCall", - "vpiSysTaskCall", - "vpiTableEntry", - "vpiTask", - "vpiTaskCall", - "vpiTchk", - "vpiTchkTerm", - "vpiTimeVar", - "vpiTimeQueue", - "vpiUdp", - "vpiUdpDefn", - "vpiUserSystf", - "vpiVarSelect", - "vpiWait", - "vpiWhile", - "vpiCondition", - "vpiDelay", - "vpiElseStmt", - "vpiForIncStmt", - "vpiForInitStmt", - "vpiHighConn", - "vpiLhs", - "vpiIndex", - "vpiLeftRange", - "vpiLowConn", - "vpiParent", - "vpiRhs", - "vpiRightRange", - "vpiScope", - "vpiSysTfCall", - "vpiTchkDataTerm", - "vpiTchkNotifier", - 
"vpiTchkRefTerm", - "vpiArgument", - "vpiBit", - "vpiDriver", - "vpiInternalScope", - "vpiLoad", - "vpiModDataPathIn", - "vpiModPathIn", - "vpiModPathOut", - "vpiOperand", - "vpiPortInst", - "vpiProcess", - "vpiVariables", - "vpiUse", - "vpiExpr", - "vpiPrimitive", - "vpiStmt", - "vpiAttribute", - "vpiBitSelect", - "vpiCallback", - "vpiDelayTerm", - "vpiDelayDevice", - "vpiFrame", - "vpiGateArray", - "vpiModuleArray", - "vpiPrimitiveArray", - "vpiNetArray", - "vpiRange", - "vpiRegArray", - "vpiSwitchArray", - "vpiUdpArray", - "vpiActiveTimeFormat", - "vpiInTerm", - "vpiInstanceArray", - "vpiLocalDriver", - "vpiLocalLoad", - "vpiOutTerm", - "vpiPorts", - "vpiSimNet", - "vpiTaskFunc", - "vpiContAssignBit", - "vpiNamedEventArray", - "vpiIndexedPartSelect", - "vpiBaseExpr", - "vpiWidthExpr", - "vpiGenScopeArray", - "vpiGenScope", - "vpiGenVar", - "vpiAutomatics" - }; - static const char* const sv_names1[] = { - "vpiPackage", - "vpiInterface", - "vpiProgram", - "vpiInterfaceArray", - "vpiProgramArray", - "vpiTypespec", - "vpiModport", - "vpiInterfaceTfDecl", - "vpiRefObj", - "vpiTypeParameter", - "vpiLongIntVar", - "vpiShortIntVar", - "vpiIntVar", - "vpiShortRealVar", - "vpiByteVar", - "vpiClassVar", - "vpiStringVar", - "vpiEnumVar", - "vpiStructVar", - "vpiUnionVar", - "vpiBitVar", - "vpiClassObj", - "vpiChandleVar", - "vpiPackedArrayVar", - "*undefined*", // 624 is not defined for object types - "vpiLongIntTypespec", - "vpiShortRealTypespec", - "vpiByteTypespec", - "vpiShortIntTypespec", - "vpiIntTypespec", - "vpiClassTypespec", - "vpiStringTypespec", - "vpiChandleTypespec", - "vpiEnumTypespec", - "vpiEnumConst", - "vpiIntegerTypespec", - "vpiTimeTypespec", - "vpiRealTypespec", - "vpiStructTypespec", - "vpiUnionTypespec", - "vpiBitTypespec", - "vpiLogicTypespec", - "vpiArrayTypespec", - "vpiVoidTypespec", - "vpiTypespecMember", - "vpiDistItem", - "vpiAliasStmt", - "vpiThread", - "vpiMethodFuncCall", - "vpiMethodTaskCall", - "vpiClockingBlock", - "vpiClockingIODecl", - 
"vpiClassDefn", - "vpiConstraint", - "vpiConstraintOrdering", - "vpiPropertyDecl", - "vpiPropertySpec", - "vpiPropertyExpr", - "vpiMulticlockSequenceExpr", - "vpiClockedSeq", - "vpiPropertyInst", - "vpiSequenceDecl", - "vpiCaseProperty", - "*undefined*", // 663 is not defined for object types - "vpiSequenceInst", - "vpiImmediateAssert", - "vpiReturn", - "vpiAnyPattern", - "vpiTaggedPattern", - "vpiStructPattern", - "vpiDoWhile", - "vpiOrderedWait", - "vpiWaitFork", - "vpiDisableFork", - "vpiExpectStmt", - "vpiForeachStmt", - "vpiFinal", - "vpiExtends", - "vpiDistribution", - "vpiSeqFormalDecl", - "vpiEnumNet", - "vpiIntegerNet", - "vpiTimeNet", - "vpiStructNet", - "vpiBreak", - "vpiContinue", - "vpiAssert", - "vpiAssume", - "vpiCover", - "vpiDisableCondition", - "vpiClockingEvent", - "vpiReturnStmt", - "vpiPackedArrayTypespec", - "vpiPackedArrayNet", - "vpiImmediateAssume", - "vpiImmediateCover", - "vpiSequenceTypespec", - "vpiPropertyTypespec", - "vpiEventTypespec", - "vpiPropFormalDecl", - }; - // clang-format on - if (VL_UNCOVERABLE(vpiVal < 0)) - return names[0]; - else if (vpiVal <= vpiAutomatics) - return names[vpiVal]; - else if (vpiVal >= vpiPackage && vpiVal <= vpiPropFormalDecl) - return sv_names1[(vpiVal - vpiPackage)]; - else - return names[0]; -} -const char *VerilatedVpiError::strFromVpiMethod(PLI_INT32 vpiVal) VL_PURE { - // clang-format off - static const char* const names[] = { - "vpiCondition", - "vpiDelay", - "vpiElseStmt", - "vpiForIncStmt", - "vpiForInitStmt", - "vpiHighConn", - "vpiLhs", - "vpiIndex", - "vpiLeftRange", - "vpiLowConn", - "vpiParent", - "vpiRhs", - "vpiRightRange", - "vpiScope", - "vpiSysTfCall", - "vpiTchkDataTerm", - "vpiTchkNotifier", - "vpiTchkRefTerm", - "vpiArgument", - "vpiBit", - "vpiDriver", - "vpiInternalScope", - "vpiLoad", - "vpiModDataPathIn", - "vpiModPathIn", - "vpiModPathOut", - "vpiOperand", - "vpiPortInst", - "vpiProcess", - "vpiVariables", - "vpiUse", - "vpiExpr", - "vpiPrimitive", - "vpiStmt" - }; - // 
clang-format on - if (vpiVal > vpiStmt || vpiVal < vpiCondition) - return "*undefined*"; - return names[vpiVal - vpiCondition]; -} - -const char * -VerilatedVpiError::strFromVpiCallbackReason(PLI_INT32 vpiVal) VL_PURE { - // clang-format off - static const char* const names[] = { - "*undefined*", - "cbValueChange", - "cbStmt", - "cbForce", - "cbRelease", - "cbAtStartOfSimTime", - "cbReadWriteSynch", - "cbReadOnlySynch", - "cbNextSimTime", - "cbAfterDelay", - "cbEndOfCompile", - "cbStartOfSimulation", - "cbEndOfSimulation", - "cbError", - "cbTchkViolation", - "cbStartOfSave", - "cbEndOfSave", - "cbStartOfRestart", - "cbEndOfRestart", - "cbStartOfReset", - "cbEndOfReset", - "cbEnterInteractive", - "cbExitInteractive", - "cbInteractiveScopeChange", - "cbUnresolvedSystf", - "cbAssign", - "cbDeassign", - "cbDisable", - "cbPLIError", - "cbSignal", - "cbNBASynch", - "cbAtEndOfSimTime" - }; - // clang-format on - if (VL_UNCOVERABLE(vpiVal < 0)) - return names[0]; - return names[(vpiVal <= cbAtEndOfSimTime) ? 
vpiVal : 0]; -} - -const char *VerilatedVpiError::strFromVpiProp(PLI_INT32 vpiVal) VL_PURE { - // clang-format off - static const char* const names[] = { - "*undefined or other*", - "vpiType", - "vpiName", - "vpiFullName", - "vpiSize", - "vpiFile", - "vpiLineNo", - "vpiTopModule", - "vpiCellInstance", - "vpiDefName", - "vpiProtected", - "vpiTimeUnit", - "vpiTimePrecision", - "vpiDefNetType", - "vpiUnconnDrive", - "vpiDefFile", - "vpiDefLineNo", - "vpiScalar", - "vpiVector", - "vpiExplicitName", - "vpiDirection", - "vpiConnByName", - "vpiNetType", - "vpiExplicitScalared", - "vpiExplicitVectored", - "vpiExpanded", - "vpiImplicitDecl", - "vpiChargeStrength", - "vpiArray", - "vpiPortIndex", - "vpiTermIndex", - "vpiStrength0", - "vpiStrength1", - "vpiPrimType", - "vpiPolarity", - "vpiDataPolarity", - "vpiEdge", - "vpiPathType", - "vpiTchkType", - "vpiOpType", - "vpiConstType", - "vpiBlocking", - "vpiCaseType", - "vpiFuncType", - "vpiNetDeclAssign", - "vpiUserDefn", - "vpiScheduled", - "*undefined*", - "*undefined*", - "vpiActive", - "vpiAutomatic", - "vpiCell", - "vpiConfig", - "vpiConstantSelect", - "vpiDecompile", - "vpiDefAttribute", - "vpiDelayType", - "vpiIteratorType", - "vpiLibrary", - "*undefined*", - "vpiOffset", - "vpiResolvedNetType", - "vpiSaveRestartID", - "vpiSaveRestartLocation", - "vpiValid", - "vpiSigned", - "vpiStop", - "vpiFinish", - "vpiReset", - "vpiSetInteractiveScope", - "vpiLocalParam", - "vpiModPathHasIfNone", - "vpiIndexedPartSelectType", - "vpiIsMemory", - "vpiIsProtected" - }; - // clang-format on - if (vpiVal == vpiUndefined) - return "vpiUndefined"; - return names[(vpiVal <= vpiIsProtected) ? 
vpiVal : 0]; -} -const char * -VerilatedVpiError::strFromVpiConstType(PLI_INT32 constType) VL_PURE { - // clang-format off - static const char* const names[] = { - "*undefined*", - "vpiDecConst", - "vpiRealConst", - "vpiBinaryConst", - "vpiOctConst", - "vpiHexConst", - "vpiStringConst", - "vpiIntConst", - "vpiTimeConst", - }; - // clang-format on - if (VL_UNCOVERABLE(constType < 0)) - return names[0]; - return names[(constType <= vpiTimeConst) ? constType : 0]; -} - -#define SELF_CHECK_RESULT_CSTR(got, exp) \ - if (0 != std::strcmp((got), (exp))) { \ - const std::string msg = \ - "%Error: GOT = '"s + (got) + "'" + " EXP = '" + (exp) + "'"; \ - VL_FATAL_MT(__FILE__, __LINE__, "", msg.c_str()); \ - } - -#define SELF_CHECK_ENUM_STR(fn, enumn) \ - do { \ - const char *const strVal = VerilatedVpiError::fn(enumn); \ - SELF_CHECK_RESULT_CSTR(strVal, #enumn); \ - } while (0) - -void VerilatedVpi::selfTest() VL_MT_UNSAFE_ONE { - VerilatedVpiError::selfTest(); -} -void VerilatedVpiError::selfTest() VL_MT_UNSAFE_ONE { - VerilatedVpiImp::assertOneCheck(); - - SELF_CHECK_ENUM_STR(strFromVpiVal, vpiBinStrVal); - SELF_CHECK_ENUM_STR(strFromVpiVal, vpiRawFourStateVal); - - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAlways); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAssignStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAssignment); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiBegin); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiCase); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiCaseItem); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiConstant); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiContAssign); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDeassign); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDefParam); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDelayControl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDisable); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiEventControl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiEventStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiFor); 
- SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiForce); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiForever); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiFork); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiFuncCall); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiFunction); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiGate); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIf); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIfElse); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiInitial); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIntegerVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiInterModPath); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIterator); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIODecl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiMemory); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiMemoryWord); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiModPath); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiModule); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiNamedBegin); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiNamedEvent); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiNamedFork); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiNet); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiNetBit); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiNullStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiOperation); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiParamAssign); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiParameter); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPartSelect); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPathTerm); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPort); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPortBit); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPrimTerm); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRealVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiReg); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRegBit); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRelease); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRepeat); - SELF_CHECK_ENUM_STR(strFromVpiObjType, 
vpiRepeatControl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSchedEvent); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSpecParam); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSwitch); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSysFuncCall); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSysTaskCall); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTableEntry); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTask); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTaskCall); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTchk); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTchkTerm); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTimeVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTimeQueue); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiUdp); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiUdpDefn); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiUserSystf); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiVarSelect); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiWait); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiWhile); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiCondition); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDelay); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiElseStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiForIncStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiForInitStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiHighConn); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLhs); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIndex); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLeftRange); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLowConn); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiParent); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRhs); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRightRange); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiScope); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSysTfCall); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTchkDataTerm); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTchkNotifier); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTchkRefTerm); - 
SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiArgument); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiBit); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDriver); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiInternalScope); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLoad); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiModDataPathIn); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiModPathIn); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiModPathOut); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiOperand); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPortInst); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiProcess); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiVariables); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiUse); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiExpr); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPrimitive); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAttribute); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiBitSelect); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiCallback); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDelayTerm); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDelayDevice); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiFrame); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiGateArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiModuleArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPrimitiveArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiNetArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRange); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRegArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSwitchArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiUdpArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiActiveTimeFormat); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiInTerm); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiInstanceArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLocalDriver); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLocalLoad); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiOutTerm); - 
SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPorts); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSimNet); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTaskFunc); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiContAssignBit); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiNamedEventArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIndexedPartSelect); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiBaseExpr); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiWidthExpr); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiGenScopeArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiGenScope); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiGenVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAutomatics); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPackage); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiInterface); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiProgram); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiInterfaceArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiProgramArray); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiModport); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiInterfaceTfDecl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRefObj); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTypeParameter); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLongIntVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiShortIntVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIntVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiShortRealVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiByteVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiClassVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiStringVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiEnumVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiStructVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiUnionVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiBitVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiClassObj); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiChandleVar); - 
SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPackedArrayVar); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLongIntTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiShortRealTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiByteTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiShortIntTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIntTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiClassTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiStringTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiChandleTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiEnumTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiEnumConst); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIntegerTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTimeTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiRealTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiStructTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiUnionTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiBitTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiLogicTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiArrayTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiVoidTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTypespecMember); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDistItem); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAliasStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiThread); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiMethodFuncCall); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiMethodTaskCall); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiClockingBlock); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiClockingIODecl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiClassDefn); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiConstraint); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiConstraintOrdering); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPropertyDecl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPropertySpec); - 
SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPropertyExpr); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiMulticlockSequenceExpr); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiClockedSeq); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPropertyInst); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSequenceDecl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiCaseProperty); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSequenceInst); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiImmediateAssert); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiReturn); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAnyPattern); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTaggedPattern); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiStructPattern); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDoWhile); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiOrderedWait); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiWaitFork); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDisableFork); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiExpectStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiForeachStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiFinal); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiExtends); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDistribution); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSeqFormalDecl); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiEnumNet); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiIntegerNet); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiTimeNet); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiStructNet); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiBreak); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiContinue); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAssert); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiAssume); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiCover); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiDisableCondition); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiClockingEvent); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiReturnStmt); - SELF_CHECK_ENUM_STR(strFromVpiObjType, 
vpiPackedArrayTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPackedArrayNet); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiImmediateAssume); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiImmediateCover); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiSequenceTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPropertyTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiEventTypespec); - SELF_CHECK_ENUM_STR(strFromVpiObjType, vpiPropFormalDecl); - - SELF_CHECK_ENUM_STR(strFromVpiMethod, vpiCondition); - SELF_CHECK_ENUM_STR(strFromVpiMethod, vpiStmt); - - SELF_CHECK_ENUM_STR(strFromVpiCallbackReason, cbValueChange); - SELF_CHECK_ENUM_STR(strFromVpiCallbackReason, cbAtEndOfSimTime); - - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiType); - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiProtected); - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiDirection); - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiTermIndex); - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiConstType); - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiAutomatic); - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiOffset); - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiStop); - SELF_CHECK_ENUM_STR(strFromVpiProp, vpiIsProtected); - - SELF_CHECK_ENUM_STR(strFromVpiConstType, vpiDecConst); - SELF_CHECK_ENUM_STR(strFromVpiConstType, vpiRealConst); - SELF_CHECK_ENUM_STR(strFromVpiConstType, vpiBinaryConst); - SELF_CHECK_ENUM_STR(strFromVpiConstType, vpiOctConst); - SELF_CHECK_ENUM_STR(strFromVpiConstType, vpiHexConst); - SELF_CHECK_ENUM_STR(strFromVpiConstType, vpiStringConst); - SELF_CHECK_ENUM_STR(strFromVpiConstType, vpiIntConst); - SELF_CHECK_ENUM_STR(strFromVpiConstType, vpiTimeConst); -} - -#undef SELF_CHECK_ENUM_STR -#undef SELF_CHECK_RESULT_CSTR - -//====================================================================== -// callback related - -vpiHandle vpi_register_cb(p_cb_data cb_data_p) { - // Returns handle so user can remove the callback, user must - // vpi_release_handle it Don't confuse with the callback-activated t_cb_data - // object 
handle which is the object causing the callback rather than the - // callback itself - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - // cppcheck-suppress nullPointer - if (VL_UNLIKELY(!cb_data_p)) { - VL_VPI_WARNING_(__FILE__, __LINE__, "%s : callback data pointer is null", - __func__); - return nullptr; - } - const PLI_INT32 reason = cb_data_p->reason; - switch (reason) { - case cbAfterDelay: // FALLTHRU // One-shot; time relative - case cbAtEndOfSimTime: // FALLTHRU // One-shot; time absolute; supported via - // vlt_main.cpp - case cbAtStartOfSimTime: // FALLTHRU // One-shot; time absolute; supported via - // vlt_main.cpp - case cbReadOnlySynch: // FALLTHRU // One-shot; time relative; supported via - // vlt_main.cpp - case cbReadWriteSynch: { // One-shot; time relative; supported via - // vlt_main.cpp - const bool abs = reason == cbAtStartOfSimTime || reason == cbAtEndOfSimTime; - const QData time = VL_TIME_Q(); - QData abstime = 0; - if (cb_data_p->time) { - if (abs) { - abstime = VL_SET_QII(cb_data_p->time->high, cb_data_p->time->low); - } else { - abstime = - time + VL_SET_QII(cb_data_p->time->high, cb_data_p->time->low); - } - } - const uint64_t id = VerilatedVpiImp::nextCallbackId(); - VerilatedVpioReasonCb *const vop = - new VerilatedVpioReasonCb{id, abstime, reason}; - if (abstime <= time) { - VerilatedVpiImp::cbCurrentAdd(id, cb_data_p); - } else { - VerilatedVpiImp::cbFutureAdd(id, cb_data_p, abstime); - } - return vop->castVpiHandle(); - } - case cbNextSimTime: { // One-shot; time always next; supported via - // vlt_main.cpp - const QData time = VL_TIME_Q(); - const uint64_t id = VerilatedVpiImp::nextCallbackId(); - VerilatedVpioReasonCb *const vop = new VerilatedVpioReasonCb{id, 0, reason}; - VerilatedVpiImp::cbNextAdd(id, cb_data_p, time); - return vop->castVpiHandle(); - } - case cbEndOfSimulation: // FALLTHRU // One-shot; time ignored; supported via - // vlt_main.cpp - case cbEnterInteractive: // FALLTHRU // NOP, but need to return 
handle, so - // make object - case cbExitInteractive: // FALLTHRU // NOP, but need to return handle, so make - // object - case cbInteractiveScopeChange: // FALLTHRU // NOP, but need to return handle, - // so make object - case cbPLIError: // FALLTHRU // NOP, but need to return handle, so make object - case cbStartOfSimulation: // FALLTHRU // One-shot; time ignored; supported via - // vlt_main.cpp - case cbValueChange: { // Multi-shot; supported via vlt_main.cpp - const uint64_t id = VerilatedVpiImp::nextCallbackId(); - VerilatedVpioReasonCb *const vop = new VerilatedVpioReasonCb{id, 0, reason}; - VerilatedVpiImp::cbCurrentAdd(id, cb_data_p); - return vop->castVpiHandle(); - } - default: - VL_VPI_WARNING_(__FILE__, __LINE__, "%s: Unsupported callback type %s", - __func__, - VerilatedVpiError::strFromVpiCallbackReason(reason)); - return nullptr; - } -} - -PLI_INT32 vpi_remove_cb(vpiHandle cb_obj) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_remove_cb %p\n", cb_obj);); - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - VerilatedVpio *const vop = VerilatedVpio::castp(cb_obj); - if (VL_UNLIKELY(!vop)) - return 0; - return vop->dovpi_remove_cb(); -} - -void vpi_get_cb_info(vpiHandle /*object*/, p_cb_data /*cb_data_p*/) { - VL_VPI_UNIMP_(); -} -vpiHandle vpi_register_systf(p_vpi_systf_data /*systf_data_p*/) { - VL_VPI_UNIMP_(); - return nullptr; -} -void vpi_get_systf_info(vpiHandle /*object*/, - p_vpi_systf_data /*systf_data_p*/) { - VL_VPI_UNIMP_(); -} - -// for obtaining handles - -vpiHandle vpi_handle_by_name(PLI_BYTE8 *namep, vpiHandle scope) { - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - if (VL_UNLIKELY(!namep)) - return nullptr; - VL_DEBUG_IF_PLI( - VL_DBG_MSGF("- vpi: vpi_handle_by_name %s %p\n", namep, scope);); - const VerilatedVar *varp = nullptr; - const VerilatedScope *scopep; - const VerilatedVpioScope *const voScopep = VerilatedVpioScope::castp(scope); - std::string scopeAndName = namep; - if (voScopep) { - const bool 
scopeIsPackage = VerilatedVpioPackage::castp(scope) != nullptr; - scopeAndName = - std::string{voScopep->fullname()} + (scopeIsPackage ? "" : ".") + namep; - namep = const_cast<PLI_BYTE8 *>(scopeAndName.c_str()); - } - { - // This doesn't yet follow the hierarchy in the proper way - bool isPackage = false; - scopep = Verilated::threadContextp()->scopeFind(namep); - if (scopep) { // Whole thing found as a scope - if (scopep->type() == VerilatedScope::SCOPE_MODULE) { - return (new VerilatedVpioModule{scopep})->castVpiHandle(); - } else if (scopep->type() == VerilatedScope::SCOPE_PACKAGE) { - return (new VerilatedVpioPackage{scopep})->castVpiHandle(); - } else { - return (new VerilatedVpioScope{scopep})->castVpiHandle(); - } - } - std::string basename = scopeAndName; - std::string scopename; - std::string::size_type prevpos = std::string::npos; - std::string::size_type pos = std::string::npos; - // Split hierarchical names at last '.' not inside escaped identifier - size_t i = 0; - while (i < scopeAndName.length()) { - if (scopeAndName[i] == '\\') { - while (i < scopeAndName.length() && scopeAndName[i] != ' ') - ++i; - ++i; // Proc ' ', it should always be there. Then grab '.' on next cycle - } else { - while (i < scopeAndName.length() && - (scopeAndName[i] != '.' && - (i + 1 >= scopeAndName.length() || scopeAndName[i] != ':' || - scopeAndName[i + 1] != ':'))) - ++i; - if (i < scopeAndName.length()) { - prevpos = pos; - pos = i++; - if (scopeAndName[i - 1] == ':') - isPackage = true; - } - } - } - // Do the split - if (VL_LIKELY(pos != std::string::npos)) { - basename.erase(0, pos + (isPackage ? 2 : 1)); - scopename = scopeAndName.substr(0, pos); - if (scopename == "$unit") - scopename = "\\$unit "; - } - if (prevpos == std::string::npos) { - // scopename is a toplevel (no '.' separator), so search in our TOP ports - // first.
- scopep = Verilated::threadContextp()->scopeFind("TOP"); - if (scopep) - varp = scopep->varFind(basename.c_str()); - } - if (!varp) { - scopep = Verilated::threadContextp()->scopeFind(scopename.c_str()); - if (!scopep) - return nullptr; - varp = scopep->varFind(basename.c_str()); - } - } - if (!varp) - return nullptr; - - if (varp->isParam()) { - return (new VerilatedVpioParam{varp, scopep})->castVpiHandle(); - } else { - return (new VerilatedVpioVar{varp, scopep})->castVpiHandle(); - } -} - -vpiHandle vpi_handle_by_index(vpiHandle object, PLI_INT32 indx) { - // Used to get array entries - VL_DEBUG_IF_PLI( - VL_DBG_MSGF("- vpi: vpi_handle_by_index %p %d\n", object, indx);); - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - // Memory words are not indexable - const VerilatedVpioMemoryWord *const vop = - VerilatedVpioMemoryWord::castp(object); - if (VL_UNLIKELY(vop)) - return nullptr; - const VerilatedVpioVar *const varop = VerilatedVpioVar::castp(object); - if (VL_LIKELY(varop)) { - if (varop->varp()->dims() < 2) - return nullptr; - if (VL_LIKELY(varop->varp()->unpacked().left() >= - varop->varp()->unpacked().right())) { - if (VL_UNLIKELY(indx > varop->varp()->unpacked().left() || - indx < varop->varp()->unpacked().right())) - return nullptr; - return (new VerilatedVpioMemoryWord{ - varop->varp(), varop->scopep(), indx, - indx - varop->varp()->unpacked().right()}) - ->castVpiHandle(); - } - if (VL_UNLIKELY(indx < varop->varp()->unpacked().left() || - indx > varop->varp()->unpacked().right())) - return nullptr; - return (new VerilatedVpioMemoryWord{varop->varp(), varop->scopep(), indx, - indx - - varop->varp()->unpacked().left()}) - ->castVpiHandle(); - } - VL_VPI_INTERNAL_(__FILE__, __LINE__, "%s : can't resolve handle", __func__); - return nullptr; -} - -// for traversing relationships - -vpiHandle vpi_handle(PLI_INT32 type, vpiHandle object) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_handle %d %p\n", type, object);); - 
VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - switch (type) { - case vpiLeftRange: { - if (const VerilatedVpioVarBase *const vop = - VerilatedVpioVarBase::castp(object)) { - if (VL_UNLIKELY(!vop->rangep())) - return nullptr; - return (new VerilatedVpioConst{vop->rangep()->left()})->castVpiHandle(); - } else if (const VerilatedVpioRange *const vop = - VerilatedVpioRange::castp(object)) { - if (VL_UNLIKELY(!vop->rangep())) - return nullptr; - return (new VerilatedVpioConst{vop->rangep()->left()})->castVpiHandle(); - } - VL_VPI_WARNING_( - __FILE__, __LINE__, - "%s: Unsupported vpiHandle (%p) for type %s, nothing will be returned", - __func__, object, VerilatedVpiError::strFromVpiMethod(type)); - return nullptr; - } - case vpiRightRange: { - if (const VerilatedVpioVarBase *const vop = - VerilatedVpioVarBase::castp(object)) { - if (VL_UNLIKELY(!vop->rangep())) - return nullptr; - return (new VerilatedVpioConst{vop->rangep()->right()})->castVpiHandle(); - } else if (const VerilatedVpioRange *const vop = - VerilatedVpioRange::castp(object)) { - if (VL_UNLIKELY(!vop->rangep())) - return nullptr; - return (new VerilatedVpioConst{vop->rangep()->right()})->castVpiHandle(); - } - VL_VPI_WARNING_( - __FILE__, __LINE__, - "%s: Unsupported vpiHandle (%p) for type %s, nothing will be returned", - __func__, object, VerilatedVpiError::strFromVpiMethod(type)); - return nullptr; - } - case vpiIndex: { - const VerilatedVpioVar *const vop = VerilatedVpioVar::castp(object); - if (VL_UNLIKELY(!vop)) - return nullptr; - const int32_t val = vop->index(); - return (new VerilatedVpioConst{val})->castVpiHandle(); - } - case vpiScope: { - const VerilatedVpioVarBase *const vop = VerilatedVpioVarBase::castp(object); - if (VL_UNLIKELY(!vop)) - return nullptr; - return (new VerilatedVpioScope{vop->scopep()})->castVpiHandle(); - } - case vpiParent: { - const VerilatedVpioMemoryWord *const vop = - VerilatedVpioMemoryWord::castp(object); - if (VL_UNLIKELY(!vop)) - return nullptr; - 
return (new VerilatedVpioVar{vop->varp(), vop->scopep()})->castVpiHandle(); - } - default: - VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Unsupported type %s, nothing will be returned", - __func__, VerilatedVpiError::strFromVpiMethod(type)); - return nullptr; - } -} - -vpiHandle vpi_handle_multi(PLI_INT32 /*type*/, vpiHandle /*refHandle1*/, - vpiHandle /*refHandle2*/, ...) { - VL_VPI_UNIMP_(); - return nullptr; -} - -vpiHandle vpi_iterate(PLI_INT32 type, vpiHandle object) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_iterate %d %p\n", type, object);); - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - switch (type) { - case vpiMemoryWord: { - const VerilatedVpioVar *const vop = VerilatedVpioVar::castp(object); - if (VL_UNLIKELY(!vop)) - return nullptr; - if (vop->varp()->dims() < 2) - return nullptr; - if (vop->varp()->dims() > 2) { - VL_VPI_WARNING_( - __FILE__, __LINE__, - "%s: %s, object %s has unsupported number of indices (%d)", __func__, - VerilatedVpiError::strFromVpiMethod(type), vop->fullname(), - vop->varp()->dims()); - } - return (new VerilatedVpioMemoryWordIter{object, vop->varp()}) - ->castVpiHandle(); - } - case vpiRange: { - const VerilatedVpioVar *const vop = VerilatedVpioVar::castp(object); - if (VL_UNLIKELY(!vop)) - return nullptr; - if (vop->varp()->dims() < 2) - return nullptr; - // Unsupported is multidim list - if (vop->varp()->dims() > 2) { - VL_VPI_WARNING_( - __FILE__, __LINE__, - "%s: %s, object %s has unsupported number of indices (%d)", __func__, - VerilatedVpiError::strFromVpiMethod(type), vop->fullname(), - vop->varp()->dims()); - } - return ((new VerilatedVpioRangeIter{vop->rangep()})->castVpiHandle()); - } - case vpiReg: { - const VerilatedVpioScope *const vop = VerilatedVpioScope::castp(object); - if (VL_UNLIKELY(!vop)) - return nullptr; - return ((new VerilatedVpioVarIter{vop})->castVpiHandle()); - } - case vpiParameter: { - const VerilatedVpioScope *const vop = VerilatedVpioScope::castp(object); - if (VL_UNLIKELY(!vop)) - 
return nullptr; - return ((new VerilatedVpioVarIter{vop, true})->castVpiHandle()); - } - case vpiModule: { - const VerilatedVpioModule *const vop = VerilatedVpioModule::castp(object); - const VerilatedHierarchyMap *const map = VerilatedImp::hierarchyMap(); - const VerilatedScope *const modp = vop ? vop->scopep() : nullptr; - const auto it = - vlstd::as_const(map)->find(const_cast<VerilatedScope *>(modp)); - if (it == map->end()) - return nullptr; - return ((new VerilatedVpioModuleIter{it->second})->castVpiHandle()); - } - case vpiInstance: { - if (object) - return nullptr; - const VerilatedHierarchyMap *const map = VerilatedImp::hierarchyMap(); - const auto it = vlstd::as_const(map)->find(nullptr); - if (it == map->end()) - return nullptr; - return ((new VerilatedVpioInstanceIter{it->second})->castVpiHandle()); - } - default: - VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Unsupported type %s, nothing will be returned", - __func__, VerilatedVpiError::strFromVpiObjType(type)); - return nullptr; - } -} -vpiHandle vpi_scan(vpiHandle object) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_scan %p\n", object);); - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - VerilatedVpio *const vop = VerilatedVpio::castp(object); - if (VL_UNLIKELY(!vop)) - return nullptr; - return vop->dovpi_scan(); -} - -// for processing properties - -PLI_INT32 vpi_get(PLI_INT32 property, vpiHandle object) { - // Leave this in the header file - in many cases the compiler can constant - // propagate "object" - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_get %d %p\n", property, object);); - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - switch (property) { - case vpiTimePrecision: { - return Verilated::threadContextp()->timeprecision(); - } - case vpiTimeUnit: { - const VerilatedVpioScope *const vop = VerilatedVpioScope::castp(object); - if (!vop) - return Verilated::threadContextp() - ->timeunit(); // Null asks for global, not unlikely - return vop->scopep()->timeunit(); - } - case vpiType: { -
const VerilatedVpio *const vop = VerilatedVpio::castp(object); - if (VL_UNLIKELY(!vop)) - return vpiUndefined; - return vop->type(); - } - case vpiConstType: { - const VerilatedVpio *const vop = VerilatedVpio::castp(object); - if (VL_UNLIKELY(!vop)) - return vpiUndefined; - return vop->constType(); - } - case vpiDirection: { - // By forethought, the directions already are vpi enumerated - const VerilatedVpioVarBase *const vop = VerilatedVpioVarBase::castp(object); - if (VL_UNLIKELY(!vop)) - return vpiUndefined; - return vop->varp()->vldir(); - } - case vpiScalar: // FALLTHRU - case vpiVector: { - const VerilatedVpioVarBase *const vop = VerilatedVpioVarBase::castp(object); - if (VL_UNLIKELY(!vop)) - return vpiUndefined; - return (property == vpiVector) ^ (vop->varp()->dims() == 0); - } - case vpiSize: { - const VerilatedVpioVarBase *const vop = VerilatedVpioVarBase::castp(object); - if (VL_UNLIKELY(!vop)) - return vpiUndefined; - return vop->size(); - } - default: - VL_VPI_ERROR_(__FILE__, __LINE__, - "%s: Unsupported property %s, nothing will be returned", - __func__, VerilatedVpiError::strFromVpiProp(property)); - return vpiUndefined; - } -} - -PLI_INT64 vpi_get64(PLI_INT32 /*property*/, vpiHandle /*object*/) { - VL_VPI_UNIMP_(); - return vpiUndefined; -} - -PLI_BYTE8 *vpi_get_str(PLI_INT32 property, vpiHandle object) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_get_str %d %p\n", property, object);); - VerilatedVpiImp::assertOneCheck(); - const VerilatedVpio *const vop = VerilatedVpio::castp(object); - VL_VPI_ERROR_RESET_(); - if (VL_UNLIKELY(!vop)) - return nullptr; - switch (property) { - case vpiName: { - return const_cast(vop->name()); - } - case vpiFullName: { - return const_cast(vop->fullname()); - } - case vpiDefName: { - return const_cast(vop->defname()); - } - case vpiType: { - return const_cast( - VerilatedVpiError::strFromVpiObjType(vop->type())); - } - case vpiConstType: { - const PLI_INT32 constType = vpi_get(vpiConstType, object); - 
VL_VPI_ERROR_RESET_(); - return const_cast( - VerilatedVpiError::strFromVpiConstType(constType)); - } - default: - VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Unsupported type %s, nothing will be returned", - __func__, VerilatedVpiError::strFromVpiProp(property)); - return nullptr; - } -} - -// delay processing - -void vpi_get_delays(vpiHandle /*object*/, p_vpi_delay /*delay_p*/) { - VL_VPI_UNIMP_(); -} -void vpi_put_delays(vpiHandle /*object*/, p_vpi_delay /*delay_p*/) { - VL_VPI_UNIMP_(); -} - -// value processing -bool vl_check_format(const VerilatedVar *varp, const p_vpi_value valuep, - const char *fullname, bool isGetValue) { - bool status = true; - if ((valuep->format == vpiVectorVal) || (valuep->format == vpiBinStrVal) || - (valuep->format == vpiOctStrVal) || (valuep->format == vpiHexStrVal)) { - switch (varp->vltype()) { - case VLVT_UINT8: - case VLVT_UINT16: - case VLVT_UINT32: - case VLVT_UINT64: - case VLVT_WDATA: - return status; - default: - status = false; - } - } else if (valuep->format == vpiDecStrVal) { - switch (varp->vltype()) { - case VLVT_UINT8: - case VLVT_UINT16: - case VLVT_UINT32: - case VLVT_UINT64: - return status; - default: - status = false; - } - } else if (valuep->format == vpiStringVal) { - switch (varp->vltype()) { - case VLVT_UINT8: - case VLVT_UINT16: - case VLVT_UINT32: - case VLVT_UINT64: - case VLVT_WDATA: - return status; - case VLVT_STRING: - // string parameter values can't be changed - if (isGetValue || !varp->isParam()) { - return status; - } else { - status = false; - break; - } - default: - status = false; - } - } else if (valuep->format == vpiIntVal) { - switch (varp->vltype()) { - case VLVT_UINT8: - case VLVT_UINT16: - case VLVT_UINT32: - return status; - default: - status = false; - } - } else if (valuep->format == vpiRealVal) { - switch (varp->vltype()) { - case VLVT_REAL: - return status; - default: - status = false; - } - } else if (valuep->format == vpiSuppressVal) { - return status; - } else { - status = false; - 
} - VL_VPI_ERROR_(__FILE__, __LINE__, "%s: Unsupported format (%s) for %s", - __func__, VerilatedVpiError::strFromVpiVal(valuep->format), - fullname); - return status; -} - -void vl_get_value(const VerilatedVar *varp, void *varDatap, p_vpi_value valuep, - const char *fullname) { - if (!vl_check_format(varp, valuep, fullname, true)) - return; - // Maximum required size is for binary string, one byte per bit plus null - // termination - static thread_local char - t_outStr[VL_VALUE_STRING_MAX_WORDS * VL_EDATASIZE + 1]; - // cppcheck-suppress variableScope - static const thread_local int t_outStrSz = sizeof(t_outStr) - 1; - // string data type is dynamic and may vary in size during simulation - static thread_local std::string t_outDynamicStr; - // We used to presume vpiValue.format = vpiIntVal or if single bit - // vpiScalarVal This may cause backward compatibility issues with older code. - if (valuep->format == vpiVectorVal) { - // Vector pointer must come from our memory pool - // It only needs to persist until the next vpi_get_value - static thread_local t_vpi_vecval t_out[VL_VALUE_STRING_MAX_WORDS * 2]; - valuep->value.vector = t_out; - if (varp->vltype() == VLVT_UINT8) { - t_out[0].aval = *(reinterpret_cast(varDatap)); - t_out[0].bval = 0; - return; - } else if (varp->vltype() == VLVT_UINT16) { - t_out[0].aval = *(reinterpret_cast(varDatap)); - t_out[0].bval = 0; - return; - } else if (varp->vltype() == VLVT_UINT32) { - t_out[0].aval = *(reinterpret_cast(varDatap)); - t_out[0].bval = 0; - return; - } else if (varp->vltype() == VLVT_UINT64) { - const QData data = *(reinterpret_cast(varDatap)); - t_out[1].aval = static_cast(data >> 32ULL); - t_out[1].bval = 0; - t_out[0].aval = static_cast(data); - t_out[0].bval = 0; - return; - } else if (varp->vltype() == VLVT_WDATA) { - const int words = VL_WORDS_I(varp->packed().elements()); - if (VL_UNCOVERABLE(words >= VL_VALUE_STRING_MAX_WORDS)) { - VL_FATAL_MT(__FILE__, __LINE__, "", - "vpi_get_value with more than 
VL_VALUE_STRING_MAX_WORDS; " - "increase and " - "recompile"); - } - const WDataInP datap = (reinterpret_cast(varDatap)); - for (int i = 0; i < words; ++i) { - t_out[i].aval = datap[i]; - t_out[i].bval = 0; - } - return; - } - } else if (valuep->format == vpiBinStrVal) { - valuep->value.str = t_outStr; - int bits = varp->packed().elements(); - const CData *datap = (reinterpret_cast(varDatap)); - int i; - if (bits > t_outStrSz) { - // limit maximum size of output to size of buffer to prevent overrun. - VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Truncating string value of %s for %s" - " as buffer size (%d, VL_VALUE_STRING_MAX_WORDS=%d) is " - "less than required (%d)", - __func__, - VerilatedVpiError::strFromVpiVal(valuep->format), - fullname, t_outStrSz, VL_VALUE_STRING_MAX_WORDS, bits); - bits = t_outStrSz; - } - for (i = 0; i < bits; ++i) { - const char val = (datap[i >> 3] >> (i & 7)) & 1; - t_outStr[bits - i - 1] = val ? '1' : '0'; - } - t_outStr[i] = '\0'; - return; - } else if (valuep->format == vpiOctStrVal) { - valuep->value.str = t_outStr; - int chars = (varp->packed().elements() + 2) / 3; - const int bytes = VL_BYTES_I(varp->packed().elements()); - const CData *datap = (reinterpret_cast(varDatap)); - int i; - if (chars > t_outStrSz) { - // limit maximum size of output to size of buffer to prevent overrun. 
- VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Truncating string value of %s for %s" - " as buffer size (%d, VL_VALUE_STRING_MAX_WORDS=%d) is " - "less than required (%d)", - __func__, - VerilatedVpiError::strFromVpiVal(valuep->format), - fullname, t_outStrSz, VL_VALUE_STRING_MAX_WORDS, chars); - chars = t_outStrSz; - } - for (i = 0; i < chars; ++i) { - const div_t idx = div(i * 3, 8); - int val = datap[idx.quot]; - if ((idx.quot + 1) < bytes) { - // if the next byte is valid or that in - // for when the required 3 bits straddle adjacent bytes - val |= datap[idx.quot + 1] << 8; - } - // align so least significant 3 bits represent octal char - val >>= idx.rem; - if (i == (chars - 1)) { - // most significant char, mask off nonexistent bits when vector - // size is not a multiple of 3 - const unsigned int rem = varp->packed().elements() % 3; - if (rem) { - // generate bit mask & zero nonexistent bits - val &= (1 << rem) - 1; - } - } - t_outStr[chars - i - 1] = '0' + (val & 7); - } - t_outStr[i] = '\0'; - return; - } else if (valuep->format == vpiDecStrVal) { - valuep->value.str = t_outStr; - // outStrSz does not include nullptr termination so add one - if (varp->vltype() == VLVT_UINT8) { - VL_SNPRINTF( - t_outStr, t_outStrSz + 1, "%hhu", - static_cast(*(reinterpret_cast(varDatap)))); - return; - } else if (varp->vltype() == VLVT_UINT16) { - VL_SNPRINTF( - t_outStr, t_outStrSz + 1, "%hu", - static_cast(*(reinterpret_cast(varDatap)))); - return; - } else if (varp->vltype() == VLVT_UINT32) { - VL_SNPRINTF( - t_outStr, t_outStrSz + 1, "%u", - static_cast(*(reinterpret_cast(varDatap)))); - return; - } else if (varp->vltype() == VLVT_UINT64) { - VL_SNPRINTF(t_outStr, t_outStrSz + 1, "%llu", - static_cast( - *(reinterpret_cast(varDatap)))); - return; - } - } else if (valuep->format == vpiHexStrVal) { - valuep->value.str = t_outStr; - int chars = (varp->packed().elements() + 3) >> 2; - const CData *datap = (reinterpret_cast(varDatap)); - int i; - if (chars > t_outStrSz) { - // 
limit maximum size of output to size of buffer to prevent overrun. - VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Truncating string value of %s for %s" - " as buffer size (%d, VL_VALUE_STRING_MAX_WORDS=%d) is " - "less than required (%d)", - __func__, - VerilatedVpiError::strFromVpiVal(valuep->format), - fullname, t_outStrSz, VL_VALUE_STRING_MAX_WORDS, chars); - chars = t_outStrSz; - } - for (i = 0; i < chars; ++i) { - char val = (datap[i >> 1] >> ((i & 1) << 2)) & 15; - if (i == (chars - 1)) { - // most significant char, mask off nonexistent bits when vector - // size is not a multiple of 4 - const unsigned int rem = varp->packed().elements() & 3; - if (rem) { - // generate bit mask & zero nonexistent bits - val &= (1 << rem) - 1; - } - } - t_outStr[chars - i - 1] = "0123456789abcdef"[static_cast(val)]; - } - t_outStr[i] = '\0'; - return; - } else if (valuep->format == vpiStringVal) { - if (varp->vltype() == VLVT_STRING) { - if (varp->isParam()) { - valuep->value.str = reinterpret_cast(varDatap); - return; - } else { - t_outDynamicStr = *(reinterpret_cast(varDatap)); - valuep->value.str = const_cast(t_outDynamicStr.c_str()); - return; - } - } else { - valuep->value.str = t_outStr; - int bytes = VL_BYTES_I(varp->packed().elements()); - const CData *datap = (reinterpret_cast(varDatap)); - int i; - if (bytes > t_outStrSz) { - // limit maximum size of output to size of buffer to prevent overrun. - VL_VPI_WARNING_( - __FILE__, __LINE__, - "%s: Truncating string value of %s for %s" - " as buffer size (%d, VL_VALUE_STRING_MAX_WORDS=%d) is less than " - "required (%d)", - __func__, VerilatedVpiError::strFromVpiVal(valuep->format), - fullname, t_outStrSz, VL_VALUE_STRING_MAX_WORDS, bytes); - bytes = t_outStrSz; - } - for (i = 0; i < bytes; ++i) { - const char val = datap[bytes - i - 1]; - // other simulators replace [leading?] zero chars with spaces, replicate - // here. - t_outStr[i] = val ? 
val : ' '; - } - t_outStr[i] = '\0'; - return; - } - } else if (valuep->format == vpiIntVal) { - if (varp->vltype() == VLVT_UINT8) { - valuep->value.integer = *(reinterpret_cast(varDatap)); - return; - } else if (varp->vltype() == VLVT_UINT16) { - valuep->value.integer = *(reinterpret_cast(varDatap)); - return; - } else if (varp->vltype() == VLVT_UINT32) { - valuep->value.integer = *(reinterpret_cast(varDatap)); - return; - } - } else if (valuep->format == vpiRealVal) { - valuep->value.real = *(reinterpret_cast(varDatap)); - return; - } else if (valuep->format == vpiSuppressVal) { - return; - } - VL_VPI_ERROR_(__FILE__, __LINE__, - "%s: Unsupported format (%s) as requested for %s", __func__, - VerilatedVpiError::strFromVpiVal(valuep->format), fullname); -} - -void vpi_get_value(vpiHandle object, p_vpi_value valuep) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_get_value %p\n", object);); - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - if (VL_UNLIKELY(!valuep)) - return; - - if (const VerilatedVpioVar *const vop = VerilatedVpioVar::castp(object)) { - vl_get_value(vop->varp(), vop->varDatap(), valuep, vop->fullname()); - return; - } else if (const VerilatedVpioParam *const vop = - VerilatedVpioParam::castp(object)) { - vl_get_value(vop->varp(), vop->varDatap(), valuep, vop->fullname()); - return; - } else if (const VerilatedVpioConst *const vop = - VerilatedVpioConst::castp(object)) { - if (valuep->format == vpiIntVal) { - valuep->value.integer = vop->num(); - return; - } - VL_VPI_ERROR_(__FILE__, __LINE__, "%s: Unsupported format (%s) for %s", - __func__, VerilatedVpiError::strFromVpiVal(valuep->format), - vop->fullname()); - return; - } - VL_VPI_ERROR_(__FILE__, __LINE__, "%s: Unsupported vpiHandle (%p)", __func__, - object); -} - -vpiHandle vpi_put_value(vpiHandle object, p_vpi_value valuep, - p_vpi_time /*time_p*/, PLI_INT32 /*flags*/) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_put_value %p %p\n", object, valuep);); - 
VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - if (VL_UNLIKELY(!valuep)) { - VL_VPI_WARNING_(__FILE__, __LINE__, - "Ignoring vpi_put_value with nullptr value pointer"); - return nullptr; - } - if (const VerilatedVpioVar *const vop = VerilatedVpioVar::castp(object)) { - VL_DEBUG_IF_PLI( - VL_DBG_MSGF("- vpi: vpi_put_value name=%s fmt=%d vali=%d\n", - vop->fullname(), valuep->format, valuep->value.integer); - VL_DBG_MSGF("- vpi: varp=%p putatp=%p\n", vop->varp()->datap(), - vop->varDatap());); - - if (VL_UNLIKELY(!vop->varp()->isPublicRW())) { - VL_VPI_WARNING_(__FILE__, __LINE__, - "Ignoring vpi_put_value to signal marked read-only," - " use public_flat_rw instead: %s", - vop->fullname()); - return nullptr; - } - if (!vl_check_format(vop->varp(), valuep, vop->fullname(), false)) - return nullptr; - if (valuep->format == vpiVectorVal) { - if (VL_UNLIKELY(!valuep->value.vector)) - return nullptr; - if (vop->varp()->vltype() == VLVT_UINT8) { - *(reinterpret_cast(vop->varDatap())) = - valuep->value.vector[0].aval & vop->mask(); - return object; - } else if (vop->varp()->vltype() == VLVT_UINT16) { - *(reinterpret_cast(vop->varDatap())) = - valuep->value.vector[0].aval & vop->mask(); - return object; - } else if (vop->varp()->vltype() == VLVT_UINT32) { - *(reinterpret_cast(vop->varDatap())) = - valuep->value.vector[0].aval & vop->mask(); - return object; - } else if (vop->varp()->vltype() == VLVT_UINT64) { - *(reinterpret_cast(vop->varDatap())) = - VL_SET_QII(valuep->value.vector[1].aval & vop->mask(), - valuep->value.vector[0].aval); - return object; - } else if (vop->varp()->vltype() == VLVT_WDATA) { - const int words = VL_WORDS_I(vop->varp()->packed().elements()); - WDataOutP datap = (reinterpret_cast(vop->varDatap())); - for (int i = 0; i < words; ++i) { - datap[i] = valuep->value.vector[i].aval; - if (i == (words - 1)) - datap[i] &= vop->mask(); - } - return object; - } - } else if (valuep->format == vpiBinStrVal) { - const int bits = 
vop->varp()->packed().elements(); - const int len = std::strlen(valuep->value.str); - CData *const datap = (reinterpret_cast(vop->varDatap())); - for (int i = 0; i < bits; ++i) { - const char set = - (i < len) ? (valuep->value.str[len - i - 1] == '1') : 0; - // zero bits 7:1 of byte when assigning to bit 0, else - // or in 1 if bit set - if (i & 7) { - datap[i >> 3] |= set << (i & 7); - } else { - datap[i >> 3] = set; - } - } - return object; - } else if (valuep->format == vpiOctStrVal) { - const int chars = (vop->varp()->packed().elements() + 2) / 3; - const int bytes = VL_BYTES_I(vop->varp()->packed().elements()); - const int len = std::strlen(valuep->value.str); - CData *const datap = (reinterpret_cast(vop->varDatap())); - div_t idx; - datap[0] = 0; // reset zero'th byte - for (int i = 0; i < chars; ++i) { - union { - char byte[2]; - uint16_t half; - } val; - idx = div(i * 3, 8); - if (i < len) { - // ignore illegal chars - const char digit = valuep->value.str[len - i - 1]; - if (digit >= '0' && digit <= '7') { - val.half = digit - '0'; - } else { - VL_VPI_WARNING_( - __FILE__, __LINE__, - "%s: Non octal character '%c' in '%s' as value %s for %s", - __func__, digit, valuep->value.str, - VerilatedVpiError::strFromVpiVal(valuep->format), - vop->fullname()); - val.half = 0; - } - } else { - val.half = 0; - } - // align octal character to position within vector, note that - // the three bits may straddle a byte boundary so two byte wide - // assignments are made to adjacent bytes - but not if the least - // significant byte of the aligned value is the most significant - // byte of the destination. 
- val.half <<= idx.rem; - datap[idx.quot] |= val.byte[0]; // or in value - if ((idx.quot + 1) < bytes) { - datap[idx.quot + 1] = val.byte[1]; // this also resets - // all bits to 0 prior to or'ing above - } - } - // mask off non-existent bits in the most significant byte - if (idx.quot == (bytes - 1)) { - datap[idx.quot] &= vop->mask_byte(idx.quot); - } else if (idx.quot + 1 == (bytes - 1)) { - datap[idx.quot + 1] &= vop->mask_byte(idx.quot + 1); - } - // zero off remaining top bytes - for (int i = idx.quot + 2; i < bytes; ++i) - datap[i] = 0; - return object; - } else if (valuep->format == vpiDecStrVal) { - char remainder[16]; - unsigned long long val; - const int success = - std::sscanf(valuep->value.str, "%30llu%15s", &val, remainder); - if (success < 1) { - VL_VPI_ERROR_(__FILE__, __LINE__, - "%s: Parsing failed for '%s' as value %s for %s", - __func__, valuep->value.str, - VerilatedVpiError::strFromVpiVal(valuep->format), - vop->fullname()); - return nullptr; - } - if (success > 1) { - VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Trailing garbage '%s' in '%s' as value %s for %s", - __func__, remainder, valuep->value.str, - VerilatedVpiError::strFromVpiVal(valuep->format), - vop->fullname()); - } - if (vop->varp()->vltype() == VLVT_UINT8) { - *(reinterpret_cast(vop->varDatap())) = val & vop->mask(); - return object; - } else if (vop->varp()->vltype() == VLVT_UINT16) { - *(reinterpret_cast(vop->varDatap())) = val & vop->mask(); - return object; - } else if (vop->varp()->vltype() == VLVT_UINT32) { - *(reinterpret_cast(vop->varDatap())) = val & vop->mask(); - return object; - } else if (vop->varp()->vltype() == VLVT_UINT64) { - *(reinterpret_cast(vop->varDatap())) = val; - (reinterpret_cast(vop->varDatap()))[1] &= vop->mask(); - return object; - } - } else if (valuep->format == vpiHexStrVal) { - const int chars = (vop->varp()->packed().elements() + 3) >> 2; - CData *const datap = (reinterpret_cast(vop->varDatap())); - const char *val = valuep->value.str; - // skip 
hex ident if one is detected at the start of the string - if (val[0] == '0' && (val[1] == 'x' || val[1] == 'X')) - val += 2; - const int len = std::strlen(val); - for (int i = 0; i < chars; ++i) { - char hex; - // compute hex digit value - if (i < len) { - const char digit = val[len - i - 1]; - if (digit >= '0' && digit <= '9') { - hex = digit - '0'; - } else if (digit >= 'a' && digit <= 'f') { - hex = digit - 'a' + 10; - } else if (digit >= 'A' && digit <= 'F') { - hex = digit - 'A' + 10; - } else { - VL_VPI_WARNING_( - __FILE__, __LINE__, - "%s: Non hex character '%c' in '%s' as value %s for %s", - __func__, digit, valuep->value.str, - VerilatedVpiError::strFromVpiVal(valuep->format), - vop->fullname()); - hex = 0; - } - } else { - hex = 0; - } - // assign hex digit value to destination - if (i & 1) { - datap[i >> 1] |= hex << 4; - } else { - datap[i >> 1] = hex; // this also resets all - // bits to 0 prior to or'ing above of the msb - } - } - // apply bit mask to most significant byte - datap[(chars - 1) >> 1] &= vop->mask_byte((chars - 1) >> 1); - return object; - } else if (valuep->format == vpiStringVal) { - if (vop->varp()->vltype() == VLVT_STRING) { - *(reinterpret_cast(vop->varDatap())) = valuep->value.str; - return object; - } else { - const int bytes = VL_BYTES_I(vop->varp()->packed().elements()); - const int len = std::strlen(valuep->value.str); - CData *const datap = (reinterpret_cast(vop->varDatap())); - for (int i = 0; i < bytes; ++i) { - // prepend with 0 values before placing string the least significant - // bytes - datap[i] = (i < len) ? 
valuep->value.str[len - i - 1] : 0; - } - } - return object; - } else if (valuep->format == vpiIntVal) { - if (vop->varp()->vltype() == VLVT_UINT8) { - *(reinterpret_cast(vop->varDatap())) = - vop->mask() & valuep->value.integer; - return object; - } else if (vop->varp()->vltype() == VLVT_UINT16) { - *(reinterpret_cast(vop->varDatap())) = - vop->mask() & valuep->value.integer; - return object; - } else if (vop->varp()->vltype() == VLVT_UINT32) { - *(reinterpret_cast(vop->varDatap())) = - vop->mask() & valuep->value.integer; - return object; - } - } else if (valuep->format == vpiRealVal) { - if (vop->varp()->vltype() == VLVT_REAL) { - *(reinterpret_cast(vop->varDatap())) = valuep->value.real; - return object; - } - } - VL_VPI_ERROR_(__FILE__, __LINE__, - "%s: Unsupported format (%s) as requested for %s", __func__, - VerilatedVpiError::strFromVpiVal(valuep->format), - vop->fullname()); - return nullptr; - } else if (const VerilatedVpioParam *const vop = - VerilatedVpioParam::castp(object)) { - VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Ignoring vpi_put_value to vpiParameter: %s", __func__, - vop->fullname()); - return nullptr; - } else if (const VerilatedVpioConst *const vop = - VerilatedVpioConst::castp(object)) { - VL_VPI_WARNING_(__FILE__, __LINE__, - "%s: Ignoring vpi_put_value to vpiConstant: %s", __func__, - vop->fullname()); - return nullptr; - } - VL_VPI_ERROR_(__FILE__, __LINE__, "%s: Unsupported vpiHandle (%p)", __func__, - object); - return nullptr; -} - -void vpi_get_value_array(vpiHandle /*object*/, - p_vpi_arrayvalue /*arrayvalue_p*/, - PLI_INT32 * /*index_p*/, PLI_UINT32 /*num*/) { - VL_VPI_UNIMP_(); -} -void vpi_put_value_array(vpiHandle /*object*/, - p_vpi_arrayvalue /*arrayvalue_p*/, - PLI_INT32 * /*index_p*/, PLI_UINT32 /*num*/) { - VL_VPI_UNIMP_(); -} - -// time processing - -void vpi_get_time(vpiHandle object, p_vpi_time time_p) { - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - // cppcheck-suppress nullPointer - if 
(VL_UNLIKELY(!time_p)) { - VL_VPI_WARNING_(__FILE__, __LINE__, - "Ignoring vpi_get_time with nullptr value pointer"); - return; - } - if (time_p->type == vpiSimTime) { - const QData qtime = VL_TIME_Q(); - VlWide<2> itime; - VL_SET_WQ(itime, qtime); - time_p->low = itime[0]; - time_p->high = itime[1]; - return; - } else if (time_p->type == vpiScaledRealTime) { - double dtime = VL_TIME_D(); - if (const VerilatedVpioScope *const vop = - VerilatedVpioScope::castp(object)) { - const int scalePow10 = Verilated::threadContextp()->timeprecision() - - vop->scopep()->timeunit(); - const double scale = vl_time_multiplier(scalePow10); // e.g. 0.0001 - dtime *= scale; - } - time_p->real = dtime; - return; - } - VL_VPI_ERROR_(__FILE__, __LINE__, "%s: Unsupported type (%d)", __func__, - time_p->type); -} - -// I/O routines - -PLI_UINT32 vpi_mcd_open(PLI_BYTE8 *filenamep) { - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - return VL_FOPEN_NN(filenamep, "wb"); -} - -PLI_UINT32 vpi_mcd_close(PLI_UINT32 mcd) { - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - VL_FCLOSE_I(mcd); - return 0; -} - -PLI_BYTE8 *vpi_mcd_name(PLI_UINT32 /*mcd*/) { - VL_VPI_UNIMP_(); - return nullptr; -} - -PLI_INT32 vpi_mcd_printf(PLI_UINT32 mcd, PLI_BYTE8 *formatp, ...) { - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - va_list ap; - va_start(ap, formatp); - const int chars = vpi_mcd_vprintf(mcd, formatp, ap); - va_end(ap); - return chars; -} - -PLI_INT32 vpi_printf(PLI_BYTE8 *formatp, ...) 
{ - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - va_list ap; - va_start(ap, formatp); - const int chars = vpi_vprintf(formatp, ap); - va_end(ap); - return chars; -} - -PLI_INT32 vpi_vprintf(PLI_BYTE8 *formatp, va_list ap) { - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - return VL_VPRINTF(formatp, ap); -} - -PLI_INT32 vpi_mcd_vprintf(PLI_UINT32 mcd, PLI_BYTE8 *format, va_list ap) { - VerilatedVpiImp::assertOneCheck(); - FILE *const fp = VL_CVT_I_FP(mcd); - VL_VPI_ERROR_RESET_(); - // cppcheck-suppress nullPointer - if (VL_UNLIKELY(!fp)) - return 0; - const int chars = vfprintf(fp, format, ap); - return chars; -} - -PLI_INT32 vpi_flush(void) { - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - Verilated::runFlushCallbacks(); - return 0; // Gcc coverage bug // LCOV_EXCL_LINE -} - -PLI_INT32 vpi_mcd_flush(PLI_UINT32 mcd) { - VerilatedVpiImp::assertOneCheck(); - FILE *const fp = VL_CVT_I_FP(mcd); - VL_VPI_ERROR_RESET_(); - if (VL_UNLIKELY(!fp)) - return 1; - std::fflush(fp); - return 0; -} - -// utility routines - -PLI_INT32 vpi_compare_objects(vpiHandle /*object1*/, vpiHandle /*object2*/) { - VL_VPI_UNIMP_(); - return 0; -} -PLI_INT32 vpi_chk_error(p_vpi_error_info error_info_p) { - // executing vpi_chk_error does not reset error - // error_info_p can be nullptr, so only return level in that case - VerilatedVpiImp::assertOneCheck(); - p_vpi_error_info const _error_info_p = - VerilatedVpiImp::error_info()->getError(); - if (error_info_p && _error_info_p) - *error_info_p = *_error_info_p; - if (!_error_info_p) - return 0; // no error occurred - return _error_info_p->level; // return error severity level -} - -#ifndef VL_NO_LEGACY -PLI_INT32 vpi_free_object(vpiHandle object) { - // vpi_free_object is IEEE deprecated, use vpi_release_handle - return vpi_release_handle(object); -} -#endif - -PLI_INT32 vpi_release_handle(vpiHandle object) { - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_release_handle %p\n", object);); - 
VerilatedVpiImp::assertOneCheck(); - VerilatedVpio *const vop = VerilatedVpio::castp(object); - VL_VPI_ERROR_RESET_(); - if (VL_UNLIKELY(!vop)) - return 0; - VL_DO_DANGLING(delete vop, vop); - return 1; -} - -PLI_INT32 vpi_get_vlog_info(p_vpi_vlog_info vlog_info_p) { - // This is VL_MT_SAFE, but not marked as can't indicate it in the standardized - // header file - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - const auto argc_argv = Verilated::threadContextp()->impp()->argc_argv(); - vlog_info_p->argc = argc_argv.first; - vlog_info_p->argv = argc_argv.second; - vlog_info_p->product = const_cast(Verilated::productName()); - vlog_info_p->version = const_cast(Verilated::productVersion()); - return 1; -} - -// routines added with 1364-2001 - -PLI_INT32 vpi_get_data(PLI_INT32 /*id*/, PLI_BYTE8 * /*dataLoc*/, - PLI_INT32 /*numOfBytes*/) { - VL_VPI_UNIMP_(); - return 0; -} -PLI_INT32 vpi_put_data(PLI_INT32 /*id*/, PLI_BYTE8 * /*dataLoc*/, - PLI_INT32 /*numOfBytes*/) { - VL_VPI_UNIMP_(); - return 0; -} -void *vpi_get_userdata(vpiHandle /*obj*/) { - VL_VPI_UNIMP_(); - return nullptr; -} -PLI_INT32 vpi_put_userdata(vpiHandle /*obj*/, void * /*userdata*/) { - VL_VPI_UNIMP_(); - return 0; -} - -PLI_INT32 vpi_control(PLI_INT32 operation, ...) 
{ - VL_DEBUG_IF_PLI(VL_DBG_MSGF("- vpi: vpi_control %d\n", operation);); - VerilatedVpiImp::assertOneCheck(); - VL_VPI_ERROR_RESET_(); - switch (operation) { - case vpiFinish: { - VL_FINISH_MT("", 0, "*VPI*"); - return 1; - } - case vpiStop: { - VL_STOP_MT("", 0, "*VPI*"); - return 1; // LCOV_EXCL_LINE - } - default: { - VL_VPI_WARNING_(__FILE__, __LINE__, "%s: Unsupported type %s, ignoring", - __func__, VerilatedVpiError::strFromVpiProp(operation)); - return 0; - } - } -} - -vpiHandle vpi_handle_by_multi_index(vpiHandle /*obj*/, PLI_INT32 /*num_index*/, - PLI_INT32 * /*index_array*/) { - VL_VPI_UNIMP_(); - return nullptr; -} diff --git a/arch/src/main/scala/README.md b/arch/src/main/scala/README.md index 19c450ef..28cc9d5e 100644 --- a/arch/src/main/scala/README.md +++ b/arch/src/main/scala/README.md @@ -18,7 +18,7 @@ Main functional modules include: ``` scala/ ├── framework/ - Buckyball core framework -│ ├── blink/ - Blink communication components +│ ├── blink/ - blink communication components │ ├── builtin/ - Built-in hardware components │ │ ├── frontend/ - Frontend processing components │ │ ├── memdomain/ - Memory domain implementation diff --git a/arch/src/main/scala/Util/Pipeline.scala b/arch/src/main/scala/Util/Pipeline.scala deleted file mode 100644 index 714b00ed..00000000 --- a/arch/src/main/scala/Util/Pipeline.scala +++ /dev/null @@ -1,79 +0,0 @@ -package Util - -import chisel3._ -import chisel3.util._ - -class Pipeline[T <: Data] (gen: T, latency: Int)(comb: Seq[T => T] = Seq.fill(latency+1)((x: T) => x)) extends Module { - val io = IO(new Bundle { - val in = Flipped(Decoupled(gen)) - val out = Decoupled(gen) - val busy = Output(Bool()) - }) - - require(comb.size == latency+1, "length of combinational is incorrect") - - if (latency == 0) { - io.in.ready := io.out.ready - io.out.valid := io.in.valid - io.out.bits := comb.head(io.in.bits) - io.busy := io.in.valid - } else { - val stages = Reg(Vec(latency, gen)) - val valids = 
RegInit(VecInit(Seq.fill(latency)(false.B))) - val stalling = VecInit(Seq.fill(latency)(false.B)) - io.busy := io.in.valid || valids.reduce(_||_) - - // Stall signals - io.in.ready := !stalling.head - stalling.last := valids.last && !io.out.ready - (stalling.init, stalling.tail, valids.init).zipped.foreach { case (s1, s2, v1) => - s1 := v1 && s2 - } - - // Valid signals - // When the pipeline stage ahead of you isn't stalling, then make yourself invalid - io.out.valid := valids.last - when(io.out.ready) { - valids.last := false.B - } - (valids.init, stalling.tail).zipped.foreach { case (v1, s2) => - when(!s2) { - v1 := false.B - } - } - // When the pipeline stage behind you is valid then become true - when(io.in.fire) { - valids.head := true.B - } - (valids.tail, valids.init).zipped.foreach { case (v2, v1) => - when(v1) { - v2 := true.B - } - } - - // Stages - when(io.in.fire) { - stages.head := comb.head(io.in.bits) - } - io.out.bits := comb.last(stages.last) - ((stages.tail zip stages.init) zip (stalling.tail zip comb.tail.init)).foreach { case ((st2, st1), (s2, c1)) => - when(!s2) { - st2 := c1(st1) - } - } - } -} - -object Pipeline { - def apply[T <: Data](in: ReadyValidIO[T], latency: Int, comb: Seq[T => T]): DecoupledIO[T] = { - val p = Module(new Pipeline(in.bits.cloneType, latency)(comb)) - p.io.in <> in - p.io.out - } - - def apply[T <: Data](in: ReadyValidIO[T], latency: Int): DecoupledIO[T] = { - val p = Module(new Pipeline(in.bits.cloneType, latency)()) - p.io.in <> in - p.io.out - } -} diff --git a/arch/src/main/scala/Util/README.md b/arch/src/main/scala/Util/README.md deleted file mode 100644 index 17a5e607..00000000 --- a/arch/src/main/scala/Util/README.md +++ /dev/null @@ -1,107 +0,0 @@ -# Buckyball Utility Library - -## Overview - -This directory contains general utility functions and helper modules in the Buckyball framework, primarily providing reusable hardware design components. 
Located at `arch/src/main/scala/Util`, it serves as the base utility layer throughout the architecture, providing common hardware building blocks for other modules. - -Main functionality includes: -- **Pipeline**: Pipeline control and management tools -- Common hardware design pattern implementations - -## Code Structure - -``` -Util/ -└── Pipeline.scala - Pipeline control implementation -``` - -### File Dependencies - -**Pipeline.scala** (Base utility layer) -- Provides general pipeline control logic -- Referenced by other modules requiring pipeline functionality -- Implements standard pipeline interfaces and control signals - -## Module Description - -### Pipeline.scala - -**Main functionality**: Provides general pipeline control and management functionality - -**Key components**: - -```scala -class Pipeline extends Module { - val io = IO(new Bundle { - val flush = Input(Bool()) - val stall = Input(Bool()) - val valid_in = Input(Bool()) - val ready_out = Output(Bool()) - val valid_out = Output(Bool()) - }) - - // Pipeline control logic - val pipeline_valid = RegInit(false.B) - - when(io.flush) { - pipeline_valid := false.B - }.elsewhen(!io.stall) { - pipeline_valid := io.valid_in - } - - io.ready_out := !io.stall - io.valid_out := pipeline_valid && !io.flush -} -``` - -**Pipeline control signals**: -- **flush**: Pipeline flush signal, clears all pipeline stages -- **stall**: Pipeline stall signal, maintains current state -- **valid_in**: Input data valid signal -- **ready_out**: Ready to receive new data signal -- **valid_out**: Output data valid signal - -**Inputs/Outputs**: -- Input: Control signals (flush, stall) and data valid signal -- Output: Pipeline state and data valid indication -- Edge cases: flush has higher priority than stall, ensuring correct pipeline behavior - -**Dependencies**: Chisel3 base library, standard Module and Bundle interfaces - -## Usage - -### Usage - -**Integrating pipeline control**: -```scala -class MyModule extends Module { - val 
pipeline = Module(new Pipeline) - - // Connect control signals - pipeline.io.flush := flush_condition - pipeline.io.stall := stall_condition - pipeline.io.valid_in := input_valid - - // Use pipeline output - val output_enable = pipeline.io.valid_out -} -``` - -### Design Patterns - -**Pipeline cascading**: -- Supports cascaded connection of multi-stage pipelines -- Provides standard ready/valid handshake protocol -- Ensures correctness and timing of data flow - -**Backpressure handling**: -- Implements standard backpressure propagation mechanism -- Supports pause and resume of upstream modules -- Guarantees no data loss or duplication - -### Notes - -1. **Timing constraints**: flush signal should be asserted synchronously at clock rising edge -2. **Reset behavior**: Pipeline should clear all valid bits on reset -3. **Combinational logic**: ready signal is combinational logic, avoid timing path issues -4. **Extensibility**: Design supports parameterized pipeline depth and data width diff --git a/arch/src/main/scala/examples/BuckyBallConfig.scala b/arch/src/main/scala/examples/BuckyBallConfig.scala deleted file mode 100644 index 3b87a774..00000000 --- a/arch/src/main/scala/examples/BuckyBallConfig.scala +++ /dev/null @@ -1,22 +0,0 @@ -package examples - -import chisel3._ -import framework.builtin.BaseConfig -import examples.toy.BuckyballToyConfig - -object BuckyballConfigs { - val defaultConfig = BaseConfig - val toyConfig = BuckyballToyConfig.defaultConfig - - // Actually used configuration - val customConfig = toyConfig - - type CustomBuckyballConfig = BaseConfig -} - - -// Get currently selected configuration -object CustomBuckyballConfig { - import BuckyballConfigs._ - def apply(): CustomBuckyballConfig = BuckyballConfigs.customConfig -} diff --git a/arch/src/main/scala/examples/README.md b/arch/src/main/scala/examples/README.md deleted file mode 100644 index c9cc7c49..00000000 --- a/arch/src/main/scala/examples/README.md +++ /dev/null @@ -1,346 +0,0 @@ -# 
Buckyball Example Configurations - -## Overview - -This directory contains example configurations and reference implementations of the Buckyball framework, demonstrating how to configure and extend Buckyball systems. Located at `arch/src/main/scala/examples`, it serves as the configuration layer, providing configuration templates and system instances for developers. - -Main components include: -- **BuckyballConfig.scala**: Global configuration parameter definitions -- **toy/**: Complete example system implementation with custom coprocessor and CSR extensions - -## Code Structure - -``` -examples/ -├── BuckyballConfig.scala - Global configuration definitions -└── toy/ - Complete example system - ├── balldomain/ - Ball domain component implementation - │ ├── BallDomain.scala - Ball domain top-level - │ ├── bbus/ - Ball bus registration - │ │ └── busRegister.scala - │ ├── rs/ - Ball RS registration - │ │ └── rsRegister.scala - │ └── decoder/ - Ball decoder (if exists) - ├── CustomConfigs.scala - System configuration composition - └── ToyBuckyball.scala - System top-level module -``` - -### File Dependencies - -**BuckyballConfig.scala** (Base Configuration Layer) -- Defines global configuration parameters and defaults -- Inherited and extended by all other configuration files -- Provides system-level configuration interface - -**toy/CustomConfigs.scala** (Configuration Composition Layer) -- Inherits from BuckyballConfig and adds custom parameters -- Composes multiple configuration fragments into complete configuration -- Provides configuration support for ToyBuckyball - -**toy/ToyBuckyball.scala** (System Instantiation Layer) -- Uses CustomConfigs to instantiate complete system -- Serves as entry point for Mill build -- Generates final Verilog code - -## Module Details - -### BuckyballConfig.scala - -**Main Function**: Define global configuration parameters for the Buckyball framework - -**Key Components**: - -```scala -object BuckyballConfigs { - val defaultConfig = 
BaseConfig - val toyConfig = BuckyballToyConfig.defaultConfig - - // Actually used configuration - val customConfig = toyConfig - - type CustomBuckyballConfig = BaseConfig -} -``` - -**Configuration Selection**: -The framework uses `customConfig` to select the active configuration. This allows easy switching between different system configurations. - -**Input/Output**: -- Input: No direct input, parameters passed through configuration system -- Output: Configuration parameters for use by other modules -- Edge cases: Configuration conflicts resolved by priority-based overriding - -### toy/ - Example System - -The toy system demonstrates a complete Buckyball implementation with various Ball devices. - -#### toy/ToyBuckyball.scala - -**Main Function**: System top-level module, instantiates complete toy system - -**Key Components**: - -```scala -class ToyBuckyball(implicit b: CustomBuckyballConfig, p: Parameters) extends LazyRoCC { - override lazy val module = new ToyBuckyballModuleImp(this) -} - -class ToyBuckyballModuleImp(outer: ToyBuckyball) extends LazyRoCCModuleImp(outer) { - // Global Decoder - val globalDecoder = Module(new GlobalDecoder) - - // Global Reservation Station (with ROB) - val globalRS = Module(new GlobalReservationStation) - - // Ball Domain (regular Module, not LazyModule) - val ballDomain = Module(new BallDomain) - - // Memory Domain (complete domain with DMA+TLB+SRAM) - val memDomain = LazyModule(new MemDomain) - - // Connect components - globalDecoder.io.rocc <> io.cmd - globalRS.io.decode <> globalDecoder.io.issue - ballDomain.io.issue <> globalRS.io.ballIssue - memDomain.module.io.issue <> globalRS.io.memIssue - // ... more connections -} -``` - -**Build Flow**: -1. Load configuration from BuckyballConfig -2. Instantiate ToyBuckyball LazyRoCC module -3. Generate Verilog through ChiselStage -4. 
Output to generated-src directory - -**Input/Output**: -- Input: RoCC interface commands from Rocket core -- Output: RoCC interface responses, busy signals -- Edge cases: Configuration errors cause build failure - -#### toy/balldomain/ - Ball Domain Components - -**BallDomain.scala**: Ball domain top-level module -- Integrates Ball Decoder, local Ball RS, and BBus -- Provides single-channel interface to Global RS -- Routes commands to appropriate Ball devices - -**bbus/busRegister.scala**: Ball bus registration -```scala -class BBusModule extends BBus { - // Register Ball device generators - registerBall(() => new VecBall, ballId = 0.U) - registerBall(() => new MatrixBall, ballId = 1.U) - registerBall(() => new TransposeBall, ballId = 2.U) - registerBall(() => new Im2colBall, ballId = 3.U) - registerBall(() => new ReluBall, ballId = 4.U) -} -``` - -**rs/rsRegister.scala**: Ball RS device registration -```scala -class BallRSModule extends BallReservationStation { - // Register Ball device information - registerBallInfo(name = "VecBall", bid = 0, latency = 10) - registerBallInfo(name = "MatrixBall", bid = 1, latency = 20) - registerBallInfo(name = "TransposeBall", bid = 2, latency = 15) - registerBallInfo(name = "Im2colBall", bid = 3, latency = 15) - registerBallInfo(name = "ReluBall", bid = 4, latency = 10) -} -``` - -#### toy/CustomConfigs.scala - -**Main Function**: Compose multiple configuration fragments for the toy system - -**Configuration Composition**: - -```scala -object BuckyballToyConfig { - val defaultConfig = BaseConfig( - opcodes = OpcodeSet.custom3, - inputType = UInt(8.W), // INT8 input - accType = UInt(32.W), // INT32 accumulator - veclane = 16, // 16-element vectors - accveclane = 4, // 4-element accumulator vectors - rob_entries = 16, // 16 ROB entries - sp_banks = 4, // 4 scratchpad banks - acc_banks = 8, // 8 accumulator banks - sp_capacity = CapacityInKilobytes(256), // 256KB scratchpad - acc_capacity = CapacityInKilobytes(64), // 64KB 
accumulator - numVecPE = 16, // 16 vector PEs - numVecThread = 16 // 16 vector threads - ) -} -``` - -**Configuration Parameters**: -- **opcodes**: Custom instruction opcode set (custom3 = 0x7b) -- **inputType**: Data type for input operands -- **accType**: Data type for accumulator -- **veclane**: Number of elements per vector lane -- **rob_entries**: Reorder buffer depth -- **Memory configuration**: Bank counts and capacities -- **Vector configuration**: PE count and thread count - -## Usage Guide - -### Building the Toy System - -**Generate Verilog**: -```bash -cd arch -mill arch.runMain examples.toy.ToyBuckyball -``` - -**Generated Files**: -- Location: `arch/generated-src/toy/` -- Files: Verilog (.v), FIRRTL (.fir), annotation (.anno.json) - -### Custom Configuration Development - -**Steps**: -1. Copy `CustomConfigs.scala` as template -2. Modify configuration parameters to meet requirements -3. Implement necessary custom components -4. Update top-level module to reference new configuration -5. Register Ball devices in BBus and Ball RS - -**Example: Adding New Ball Device**: - -1. Implement Ball device: -```scala -class MyCustomBall(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module with BallRegist { - // Implement Ball interfaces - val io = IO(new BlinkIO) - def ballId = 6.U // Assign unique Ball ID - // ... implementation -} -``` - -2. Register in BBusModule: -```scala -registerBall(() => new MyCustomBall, ballId = 6.U) -``` - -3. Register in BallRSModule: -```scala -registerBallInfo(name = "MyCustomBall", bid = 6, latency = 12) -``` - -### Configuration Best Practices - -**Parameter Selection**: -1. **Memory Sizes**: Balance capacity vs. area - - Scratchpad: Main working memory for data - - Accumulator: Smaller, used for accumulation results - -2. **ROB Depth**: Impacts instruction-level parallelism - - Larger ROB: More in-flight instructions, higher parallelism - - Smaller ROB: Lower area, simpler control logic - -3. 
**Bank Counts**: Affects memory bandwidth - - More banks: Higher parallel access bandwidth - - Fewer banks: Simpler arbitration, lower area - -4. **Vector Configuration**: Depends on workload - - Vector lane width: Match data parallelism - - PE/Thread count: Balance performance vs. area - -**Common Configurations**: - -```scala -// High-performance configuration -val highPerfConfig = BaseConfig( - veclane = 32, // Wider vectors - rob_entries = 32, // Deeper ROB - sp_banks = 8, // More banks - sp_capacity = CapacityInKilobytes(512) -) - -// Area-optimized configuration -val smallConfig = BaseConfig( - veclane = 8, - rob_entries = 8, - sp_banks = 2, - sp_capacity = CapacityInKilobytes(64) -) -``` - -### Important Notes - -1. **Configuration Priority**: Later configurations in the chain override earlier ones with same parameter names -2. **Dependency Management**: Ensure custom component dependencies are correctly declared in configuration -3. **Build Path**: Generated file paths specified by TargetDirAnnotation -4. **Parameter Validation**: Configuration parameters validated during instantiation; invalid configurations cause build failure -5. **Ball ID Uniqueness**: Each Ball device must have unique ID across the system -6. 
**Bank Access Rules**: Remember op1 and op2 cannot access same bank simultaneously - -## System Architecture - -The toy system implements the complete Buckyball architecture: - -``` -┌─────────────────────────────────────────────────────────┐ -│ Rocket Core (via RoCC) │ -└────────────────────┬────────────────────────────────────┘ - │ - ┌────────▼────────┐ - │ Global Decoder │ - └────────┬────────┘ - │ - ┌────────▼────────┐ - │ Global RS │ - │ (with ROB) │ - └────┬──────┬─────┘ - │ │ - ┌───────▼──┐ ┌▼──────────┐ - │ Ball │ │ Mem │ - │ Domain │ │ Domain │ - │ │ │ │ - │ ┌─────┐ │ │ ┌──────┐ │ - │ │BBus │ │ │ │ DMA │ │ - │ └──┬──┘ │ │ │+TLB │ │ - │ │ │ │ └───┬──┘ │ - │ ┌──▼───┐│ │ │ │ - │ │Balls ││ │ ┌──▼──┐ │ - │ └──────┘│ │ │Mem │ │ - └──────┬───┘ │ │Ctrl │ │ - │ │ └─────┘ │ - │ └─────┬────┘ - │ │ - ┌───▼───────────▼───┐ - │ Memory Controller│ - │ (Scratchpad+Acc) │ - └───────────────────┘ -``` - -**Supported Ball Devices**: -- **VecBall** (ID=0): Vector operations -- **MatrixBall** (ID=1): Matrix multiplication (various formats) -- **TransposeBall** (ID=2): Matrix transpose -- **Im2colBall** (ID=3): Im2col transformation for convolution -- **ReluBall** (ID=4): ReLU activation function - -## Related Documentation - -- [Framework Overview](../framework/README.md) - Core framework architecture -- [Ball Domain Details](toy/balldomain/README.md) - Ball domain implementation -- [Prototype Ball Devices](../prototype/README.md) - Ball device implementations -- [Memory Domain](../framework/builtin/memdomain/README.md) - Memory subsystem -- [Simulation Guide](../sims/README.md) - Running simulations - -## Troubleshooting - -**Issue**: Build fails with "Ball ID conflict" -- **Solution**: Ensure each Ball device has unique ID in both BBus and RS registration - -**Issue**: Generated Verilog has timing violations -- **Solution**: Reduce clock frequency or optimize critical paths - -**Issue**: Simulation shows incorrect results -- **Solution**: Verify Ball device implementation 
and memory access patterns - -**Issue**: Configuration parameter not taking effect -- **Solution**: Check configuration priority and ensure parameter is in correct config fragment diff --git a/arch/src/main/scala/examples/goban/CustomConfigs.scala b/arch/src/main/scala/examples/goban/CustomConfigs.scala new file mode 100644 index 00000000..578e6c57 --- /dev/null +++ b/arch/src/main/scala/examples/goban/CustomConfigs.scala @@ -0,0 +1,44 @@ +package examples.goban + +import org.chipsalliance.cde.config.Config +import framework.core.bbtile.WithNBBTiles +import framework.top.GlobalConfig +import framework.top.configs.TopConfig + +import freechips.rocketchip.subsystem.InSubsystem + +/** + * Goban: multi-core BBTile configuration. + * + * Each BBTile contains nCores Rocket cores, each paired with a BuckyballAccelerator. + * All accelerators within the tile share a single SharedMemBackend and BarrierUnit. + * The ISA, Ball operators, and memory layout are identical to the toy configuration. + */ +object GobanConfig { + + /** Number of cores inside each BBTile. */ + val nCores: Int = 4 + + /** GlobalConfig for goban: same as toy but with nCores set. 
*/ + def apply(): GlobalConfig = { + val base = GlobalConfig() + base.copy(top = base.top.copy(nCores = nCores)) + } + +} + +/** 1 BBTile × 4 cores (4 Rocket + 4 Buckyball, shared SharedMem + BarrierUnit) */ +class BuckyballGobanConfig + extends Config( + new WithNBBTiles(1, buckyballConfig = GobanConfig()) ++ + new chipyard.config.WithSystemBusWidth(128) ++ + new chipyard.config.AbstractConfig + ) + +/** 2 BBTiles × 4 cores = 8 total cores */ +class BuckyballGoban2TileConfig + extends Config( + new WithNBBTiles(2, buckyballConfig = GobanConfig()) ++ + new chipyard.config.WithSystemBusWidth(128) ++ + new chipyard.config.AbstractConfig + ) diff --git a/arch/src/main/scala/examples/toy/CustomConfigs.scala b/arch/src/main/scala/examples/toy/CustomConfigs.scala index 878bc86a..13bcd042 100644 --- a/arch/src/main/scala/examples/toy/CustomConfigs.scala +++ b/arch/src/main/scala/examples/toy/CustomConfigs.scala @@ -1,70 +1,10 @@ package examples.toy -import org.chipsalliance.cde.config.{Config, Parameters, Field} -import chisel3._ -import freechips.rocketchip.diplomacy.LazyModule -import freechips.rocketchip.subsystem.SystemBusKey -// import freechips.rocketchip.tile.{BuildRoCC, OpcodeSet} -import freechips.rocketchip.tile._ -import examples.toy.ToyBuckyball -import framework.builtin.BaseConfig -import examples.BuckyballConfigs.CustomBuckyballConfig -import examples.CustomBuckyballConfig +import org.chipsalliance.cde.config.Config +import framework.core.bbtile.WithNBBTiles -// Use standard BuildRoCC instead of BuildRoCCBB -// import framework.rocket.BuildRoCCBB -// import framework.rocket.MultiRoCCKeyBB - -object BuckyballToyConfig { - val defaultConfig = new BaseConfig( - inputType = UInt(8.W), - accType = UInt(32.W), - sp_banks = 4, - acc_banks = 8 - ) -} - -class BuckyballCustomConfig( - buckyballConfig: CustomBuckyballConfig = CustomBuckyballConfig() -) extends Config((site, here, up) => { - case BuildRoCC => up(BuildRoCC) ++ Seq( - (p: Parameters) => { - implicit val 
q = p - val buckyball = LazyModule(new ToyBuckyball(buckyballConfig)) - buckyball - } - ) -}) - -// class WithMultiRoCCToyBuckyball(harts: Int*)( -// buckyballConfig: CustomBuckyballConfig = CustomBuckyballConfig() -// ) extends Config((site, here, up) => { -// case MultiRoCCKeyBB => up(MultiRoCCKeyBB, site) ++ harts.distinct.map { i => -// (i -> Seq((p: Parameters) => { -// implicit val q = p -// val buckyball = LazyModule(new ToyBuckyball(buckyballConfig)) -// buckyball -// })) -// } -// }) - -import framework.gendomain.common.VectorParams - -class BuckyballToyConfig extends Config( - new BuckyballCustomConfig ++ - new framework.rocket.WithNBuckyballCores(1) ++ - new chipyard.config.WithSystemBusWidth(128) ++ - new chipyard.config.AbstractConfig) - -class BuckyballToyVectorConfig extends Config( - new BuckyballCustomConfig ++ - new framework.gendomain.rocket.WithRocketVectorUnit(64, 64, VectorParams.minParams) ++ - new framework.rocket.WithNBuckyballCores(1) ++ - new chipyard.config.WithSystemBusWidth(128) ++ - new chipyard.config.AbstractConfig) - -import freechips.rocketchip.subsystem.{InCluster, InSubsystem, SBUS, MBUS} -import freechips.rocketchip.devices.tilelink.{BootROMParams, BootROMLocated} +import freechips.rocketchip.subsystem.{InCluster, InSubsystem} +import freechips.rocketchip.devices.tilelink.{BootROMLocated, BootROMParams} import constellation.channel._ import constellation.routing._ import constellation.router._ @@ -72,101 +12,25 @@ import constellation.topology._ import constellation.noc._ import scala.collection.immutable.ListMap -// Increase BootROM size for large core counts (device tree becomes very large) -// Note: For AddressSet(base, size-1), we need (base & (size-1)) == 0 -// This means base must be aligned to size (size must be power of 2) -// For 256 cores (8 clusters × 32 cores), 512KB should be sufficient -class WithLargeBootROM(address: BigInt = 0x80000, size: Int = 0x80000) extends Config((site, here, up) => { - case 
BootROMLocated(InSubsystem) => { - up(BootROMLocated(InSubsystem)).map(_.copy(address = address, size = size)) - } -}) - -// 4-core test configuration to understand NoC mapping -class BuckyballToy4Config extends Config( - new WithLargeBootROM(0x80000, 0x80000) ++ - new constellation.soc.WithSbusNoC(constellation.protocol.SimpleTLNoCParams( - constellation.protocol.DiplomaticNetworkNodeMapping( - inNodeMapping = ListMap( - "serial_tl" -> 0, - "Core 0 " -> 1, // Space after number for precise matching - "Core 1 " -> 2, - "Core 2 " -> 3, - "Core 3 " -> 4, - "debug" -> 5, - // buckyball-stream ports appear together, map them to same node - "buckyball-stream-reader[0],buckyball-stream-writer[0]" -> 6, - "buckyball-stream-reader[1],buckyball-stream-writer[1]" -> 7, - "buckyball-stream-reader[2],buckyball-stream-writer[2]" -> 8, - "buckyball-stream-reader[3],buckyball-stream-writer[3]" -> 9 - ), - outNodeMapping = ListMap( - "pbus" -> 10, - "system[0]" -> 11, - "system[1]" -> 12, - "system[2]" -> 13, - "system[3]" -> 14 - ) - ), - NoCParams( - topology = TerminalRouter(Mesh2D(4, 4)), // 4x4 mesh = 16 nodes (enough for 10 inputs + 5 outputs) - channelParamGen = (a, b) => UserChannelParams(Seq.fill(8) { UserVirtualChannelParams(4) }), - routingRelation = BlockingVirtualSubnetworksRouting(TerminalRouterRouting(Mesh2DEscapeRouting()), 5, 1)) - )) ++ - new BuckyballCustomConfig ++ - new framework.rocket.WithNBuckyballCores(4) ++ - new freechips.rocketchip.subsystem.WithNBanks(4) ++ - new chipyard.config.WithSystemBusWidth(128) ++ - new chipyard.config.AbstractConfig) - -// 64-core test configuration with 2 L2 banks -class BuckyballToy64Config extends Config( - new WithLargeBootROM(0x80000, 0x80000) ++ - new constellation.soc.WithSbusNoC(constellation.protocol.SimpleTLNoCParams( - constellation.protocol.DiplomaticNetworkNodeMapping( - inNodeMapping = ListMap( - "serial_tl" -> 0, - "debug" -> 1 - ) ++ (0 until 64).map(i => s"Core $i " -> (2 + i)).toMap // Note the space after 
number! - ++ (0 until 64).map(i => s"buckyball-stream-reader[$i],buckyball-stream-writer[$i]" -> (66 + i)).toMap, - outNodeMapping = ListMap( - "pbus" -> 130 - ) ++ (0 until 2).map(i => s"system[$i]" -> (131 + i)).toMap - ), - NoCParams( - // 12x12 mesh = 144 nodes (enough for 130 inputs + 3 outputs) - topology = TerminalRouter(Mesh2D(12, 12)), - channelParamGen = (a, b) => UserChannelParams(Seq.fill(8) { UserVirtualChannelParams(4) }), - routingRelation = BlockingVirtualSubnetworksRouting(TerminalRouterRouting(Mesh2DEscapeRouting()), 5, 1)) - )) ++ - new BuckyballCustomConfig ++ - new framework.rocket.WithNBuckyballCores(64) ++ - new freechips.rocketchip.subsystem.WithNBanks(2) ++ - new chipyard.config.WithSystemBusWidth(128) ++ - new chipyard.config.AbstractConfig) - -// 256-core test configuration with 32 L2 banks -class BuckyballToy256Config extends Config( - new WithLargeBootROM(0x80000, 0x80000) ++ - new constellation.soc.WithSbusNoC(constellation.protocol.SimpleTLNoCParams( - constellation.protocol.DiplomaticNetworkNodeMapping( - inNodeMapping = ListMap( - "serial_tl" -> 0, - "debug" -> 1 - ) ++ (0 until 256).map(i => s"Core $i " -> (2 + i)).toMap // Note the space after number! 
- ++ (0 until 256).map(i => s"buckyball-stream-reader[$i],buckyball-stream-writer[$i]" -> (258 + i)).toMap, - outNodeMapping = ListMap( - "pbus" -> 514 - ) ++ (0 until 32).map(i => s"system[$i]" -> (515 + i)).toMap - ), - NoCParams( - // 24x24 mesh = 576 nodes (enough for 514 inputs + 33 outputs) - topology = TerminalRouter(Mesh2D(24, 24)), - channelParamGen = (a, b) => UserChannelParams(Seq.fill(8) { UserVirtualChannelParams(4) }), - routingRelation = BlockingVirtualSubnetworksRouting(TerminalRouterRouting(Mesh2DEscapeRouting()), 5, 1)) - )) ++ - new BuckyballCustomConfig ++ - new framework.rocket.WithNBuckyballCores(256) ++ - new freechips.rocketchip.subsystem.WithNBanks(32) ++ - new chipyard.config.WithSystemBusWidth(128) ++ - new chipyard.config.AbstractConfig) +/** Single BBTile: 1 Rocket core + 1 Buckyball accelerator */ +class BuckyballToyConfig + extends Config( + new WithNBBTiles(1) ++ + new chipyard.config.WithSystemBusWidth(128) ++ + new chipyard.config.AbstractConfig + ) + +/** Single Rocket core only (no Buckyball) */ +class RocketOnlyConfig + extends Config( + new WithNBBTiles(1, withBuckyball = false) ++ + new chipyard.config.WithSystemBusWidth(128) ++ + new chipyard.config.AbstractConfig + ) + +// Increase BootROM size for large core counts +class WithLargeBootROM(address: BigInt = 0x80000, size: Int = 0x80000) + extends Config((site, here, up) => { + case BootROMLocated(InSubsystem) => + up(BootROMLocated(InSubsystem)).map(_.copy(address = address, size = size)) + }) diff --git a/arch/src/main/scala/examples/toy/README.md b/arch/src/main/scala/examples/toy/README.md deleted file mode 100644 index ecf88766..00000000 --- a/arch/src/main/scala/examples/toy/README.md +++ /dev/null @@ -1,167 +0,0 @@ -# Toy Buckyball Example Implementation - -## Overview - -This directory contains a complete example implementation of the Buckyball framework, demonstrating how to build a custom coprocessor based on the RoCC interface. 
Located in `arch/src/main/scala/examples/toy`, it serves as a reference implementation for the Buckyball system, integrating global decoder, Ball domain, and memory domain. - -Core components: -- **ToyBuckyball.scala**: Main RoCC coprocessor implementation -- **CustomConfigs.scala**: System configuration and RoCC integration -- **CSR.scala**: Custom control and status registers -- **balldomain/**: Ball domain related components - -## Code Structure - -``` -toy/ -├── ToyBuckyball.scala - Main coprocessor implementation -├── CustomConfigs.scala - Configuration definitions -├── CSR.scala - CSR implementation -└── balldomain/ - Ball domain components -``` - -### File Dependencies - -**ToyBuckyball.scala** (Core implementation layer) -- Extends LazyRoCCBB, implements RoCC coprocessor interface -- Integrates GlobalDecoder, BallDomain, MemDomain -- Manages TileLink connections and DMA components - -**CustomConfigs.scala** (Configuration layer) -- Defines BuckyballCustomConfig and BuckyballToyConfig -- Configures RoCC integration and system parameters -- Provides multi-core configuration support - -**CSR.scala** (Register layer) -- Implements FenceCSR control register -- Provides simple 64-bit register interface - -## Module Description - -### ToyBuckyball.scala - -**Main functionality**: Implements complete Buckyball RoCC coprocessor - -**Key components**: - -```scala -class ToyBuckyball(val b: CustomBuckyballConfig)(implicit p: Parameters) - extends LazyRoCCBB (opcodes = b.opcodes, nPTWPorts = 2) { - - val reader = LazyModule(new BBStreamReader(...)) - val writer = LazyModule(new BBStreamWriter(...)) - val xbar_node = TLXbar() -} -``` - -**System architecture**: -```scala -// Frontend: global decoder -val gDecoder = Module(new GlobalDecoder) - -// Backend: Ball domain and memory domain -val ballDomain = Module(new BallDomain) -val memDomain = Module(new MemDomain) - -// Response arbitration -val respArb = Module(new Arbiter(new RoCCResponseBB()(p), 2)) -``` - -**TileLink 
connections**: -```scala -xbar_node := TLBuffer() := reader.node -xbar_node := TLBuffer() := writer.node -id_node := TLWidthWidget(b.dma_buswidth/8) := TLBuffer() := xbar_node -``` - -**Inputs/Outputs**: -- Input: RoCC command interface, PTW interface -- Output: RoCC response, TileLink memory access -- Edge cases: Busy-wait handling during Fence operations - -### CustomConfigs.scala - -**Main functionality**: Defines system configuration and RoCC integration - -**Configuration class definition**: -```scala -class BuckyballCustomConfig( - buckyballConfig: CustomBuckyballConfig = CustomBuckyballConfig() -) extends Config((site, here, up) => { - case BuildRoCCBB => up(BuildRoCCBB) ++ Seq( - (p: Parameters) => { - val buckyball = LazyModule(new ToyBuckyball(buckyballConfig)) - buckyball - } - ) -}) -``` - -**System configuration**: -```scala -class BuckyballToyConfig extends Config( - new framework.rocket.WithNBuckyballCores(1) ++ - new BuckyballCustomConfig(CustomBuckyballConfig()) ++ - new chipyard.config.WithSystemBusWidth(128) ++ - new WithCustomBootROM ++ - new chipyard.config.AbstractConfig -) -``` - -**Multi-core support**: -```scala -class WithMultiRoCCToyBuckyball(harts: Int*) extends Config(...) 
-``` - -### CSR.scala - -**Main functionality**: Provides custom control and status registers - -**Implementation**: -```scala -object FenceCSR { - def apply(): UInt = RegInit(0.U(64.W)) -} -``` - -**Fence handling logic**: -```scala -val fenceCSR = FenceCSR() -val fenceSet = ballDomain.io.fence_o -val allDomainsIdle = !ballDomain.io.busy && !memDomain.io.busy - -when (fenceSet) { - fenceCSR := 1.U - io.cmd.ready := allDomainsIdle -} -``` - -## Usage - -### System Integration - -**RoCC interface integration**: -- Register coprocessor through BuildRoCCBB configuration key -- Support multi-core configuration -- Provide 2 PTW ports for address translation - -**Inter-domain communication**: -```scala -// BallDomain -> MemDomain bridge -ballDomain.io.sramRead <> memDomain.io.ballDomain.sramRead -ballDomain.io.sramWrite <> memDomain.io.ballDomain.sramWrite -``` - -**DMA connections**: -```scala -memDomain.io.dma.read.req <> outer.reader.module.io.req -memDomain.io.dma.write.req <> outer.writer.module.io.req -``` - -### Notes - -1. **Fence semantics**: Use CSR to implement Fence operation synchronization -2. **Busy-wait detection**: Assertion checks to prevent long simulation stalls -3. **TLB integration**: TLB functionality integrated in MemDomain -4. **Response arbitration**: BallDomain has higher priority than MemDomain -5. 
**Configuration dependencies**: Correctly configure CustomBuckyballConfig parameters diff --git a/arch/src/main/scala/examples/toy/ToyBuckyBall.scala b/arch/src/main/scala/examples/toy/ToyBuckyBall.scala deleted file mode 100644 index fb0bd3d9..00000000 --- a/arch/src/main/scala/examples/toy/ToyBuckyBall.scala +++ /dev/null @@ -1,145 +0,0 @@ -package examples.toy - -import java.nio.charset.StandardCharsets -import java.nio.file.{Files, Paths} - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ - -import freechips.rocketchip.diplomacy.LazyModule -import freechips.rocketchip.tile._ -import freechips.rocketchip.tilelink._ - -import freechips.rocketchip.tile.{LazyRoCC, LazyRoCCModuleImp} -import framework.frontend.decoder.GlobalDecoder -import framework.memdomain.dma.{BBStreamReader, BBStreamWriter} -import framework.memdomain.MemDomain -import examples.toy.balldomain.BallDomain -import examples.BuckyballConfigs.CustomBuckyballConfig - - -class ToyBuckyball(val b: CustomBuckyballConfig)(implicit p: Parameters) - extends LazyRoCC (opcodes = b.opcodes, nPTWPorts = 1) { - - val xLen = p(TileKey).core.xLen // the width of core's register file - - val id_node = TLIdentityNode() - val xbar_node = TLXbar() - - val spad_w = b.inputType.getWidth * b.veclane - val reader = LazyModule(new BBStreamReader(b.max_in_flight_mem_reqs, b.dma_buswidth, b.dma_maxbytes, spad_w)) - val writer = LazyModule(new BBStreamWriter(b.max_in_flight_mem_reqs, b.dma_buswidth, b.dma_maxbytes, spad_w)) - - // Note: BallDomain is now a regular Module, no longer a LazyModule - // Will be instantiated in module - - xbar_node := TLBuffer() := reader.node - xbar_node := TLBuffer() := writer.node - id_node := TLWidthWidget(b.dma_buswidth/8) := TLBuffer() := xbar_node - - override lazy val module = new ToyBuckyballModule(this) - - // The LazyRoCC class contains two TLOutputNode instances, atlNode and tlNode. 
- // atlNode connects into a tile-local arbiter along with the backside of the L1 instruction cache. - // tlNode connects directly to the L1-L2 crossbar. - // The corresponding Tilelink ports in the module implementation's IO bundle are atl and tl, respectively. - override val tlNode = id_node - override val atlNode = TLIdentityNode() - val node = tlNode -} - -class ToyBuckyballModule(outer: ToyBuckyball) extends LazyRoCCModuleImp(outer) - with HasCoreParameters { - import outer.b._ - - val tagWidth = 32 - -// ----------------------------------------------------------------------------- -// Frontend: TLB moved inside MemDomain -// ----------------------------------------------------------------------------- - implicit val edge: TLEdgeOut = outer.id_node.edges.out.head - -// ----------------------------------------------------------------------------- -// Frontend: Global Decoder + Global Reservation Station -// ----------------------------------------------------------------------------- - implicit val b: CustomBuckyballConfig = outer.b - - val gDecoder = Module(new GlobalDecoder) - gDecoder.io.id_i.valid := io.cmd.valid - gDecoder.io.id_i.bits.cmd := io.cmd.bits - io.cmd.ready := gDecoder.io.id_i.ready - - // Global reservation station - val globalRs = Module(new framework.frontend.globalrs.GlobalReservationStation) - globalRs.io.global_decode_cmd_i <> gDecoder.io.id_o - - - -// ----------------------------------------------------------------------------- -// Backend: Ball Domain -// ----------------------------------------------------------------------------- - // BallDomain is now a regular Module, instantiate directly - val ballDomain = Module(new BallDomain()(b, p)) - - // Global RS -> BallDomain - ballDomain.io.global_issue_i <> globalRs.io.ball_issue_o - globalRs.io.ball_complete_i <> ballDomain.io.global_complete_o - -// ----------------------------------------------------------------------------- -// Backend: Mem Domain - complete domain containing 
DMA+TLB+SRAM -// ----------------------------------------------------------------------------- - val memDomain = Module(new MemDomain) - - // Global RS -> MemDomain - memDomain.io.global_issue_i <> globalRs.io.mem_issue_o - globalRs.io.mem_complete_i <> memDomain.io.global_complete_o - -// ----------------------------------------------------------------------------- -// Backend: MemDomain Connections -// ----------------------------------------------------------------------------- - // MemDomain -> DMA - memDomain.io.dma.read.req <> outer.reader.module.io.req - outer.reader.module.io.resp <> memDomain.io.dma.read.resp - memDomain.io.dma.write.req <> outer.writer.module.io.req - outer.writer.module.io.resp <> memDomain.io.dma.write.resp - - // DMA -> TLB (now through MemDomain) - outer.reader.module.io.tlb <> memDomain.io.tlb(1) - outer.writer.module.io.tlb <> memDomain.io.tlb(0) - - // PTW connected to MemDomain's TLB (shared TLB has only 1 PTW port) - io.ptw(0) <> memDomain.io.ptw(0) - - // TLB exception handling - shared TLB has only 1 exception interface - // Set flush input signals - memDomain.io.tlbExp(0).flush_skip := false.B - memDomain.io.tlbExp(0).flush_retry := false.B - - // Flush signals to DMA components (obtained from MemDomain's TLB exceptions) - outer.reader.module.io.flush := memDomain.io.tlbExp(0).flush() - outer.writer.module.io.flush := memDomain.io.tlbExp(0).flush() - -// ----------------------------------------------------------------------------- -// Backend: Domain Bridge: BallDomain -> MemDomain -// ----------------------------------------------------------------------------- - ballDomain.io.sramRead <> memDomain.io.ballDomain.sramRead - ballDomain.io.sramWrite <> memDomain.io.ballDomain.sramWrite - ballDomain.io.accRead <> memDomain.io.ballDomain.accRead - ballDomain.io.accWrite <> memDomain.io.ballDomain.accWrite - -// --------------------------------------------------------------------------- -// RoCC response and status signals -// 
--------------------------------------------------------------------------- - io.resp <> globalRs.io.rs_rocc_o.resp - io.busy := globalRs.io.rs_rocc_o.busy - io.interrupt := memDomain.io.tlbExp(0).interrupt - -// --------------------------------------------------------------------------- -// Busy counter to prevent long simulation stalls -// --------------------------------------------------------------------------- - val busy_counter = RegInit(0.U(32.W)) - busy_counter := Mux(globalRs.io.rs_rocc_o.busy, busy_counter + 1.U, 0.U) - assert(busy_counter < 10000.U, "ToyBuckyball: busy for too long!") - -} diff --git a/arch/src/main/scala/examples/toy/balldomain/BallDomain.scala b/arch/src/main/scala/examples/toy/balldomain/BallDomain.scala index 9c87492b..9e9541e3 100644 --- a/arch/src/main/scala/examples/toy/balldomain/BallDomain.scala +++ b/arch/src/main/scala/examples/toy/balldomain/BallDomain.scala @@ -2,78 +2,85 @@ package examples.toy.balldomain import chisel3._ import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} import org.chipsalliance.cde.config.Parameters import freechips.rocketchip.tile._ -import freechips.rocketchip.diplomacy.{LazyModule, LazyModuleImp} -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import examples.BuckyballConfigs.CustomBuckyballConfig -import examples.toy.balldomain.rs.BallRSModule +import framework.top.GlobalConfig import examples.toy.balldomain.bbus.BBusModule -import framework.frontend.globalrs.GlobalRsIssue -import framework.frontend.globalrs.GlobalRsComplete - -// Ball Domain input/output interface -class BallDomainIO(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - // Issue interface from global RS (single channel) - val global_issue_i = Flipped(Decoupled(new GlobalRsIssue)) - - // Report completion to global RS (single channel) - val global_complete_o = Decoupled(new GlobalRsComplete) - - // Execution interface connected to Scratchpad - val 
sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, b.spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - val accRead = Vec(b.acc_banks, Flipped(new SramReadIO(b.acc_bank_entries, b.acc_w))) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) -} +import framework.frontend.globalrs.{GlobalSchedComplete, GlobalSchedIssue} +import framework.balldomain.blink.{BankRead, BankWrite, SubRobRow} +import framework.balldomain.rs.BallReservationStation + +@instantiable +class BallDomain(val b: GlobalConfig) extends Module { + val memChannel = b.top.ballMemChannelNum + val totalBallRead = b.ballDomain.ballIdMappings.map(_.inBW).sum + val totalBallWrite = b.ballDomain.ballIdMappings.map(_.outBW).sum + + @public + val global_issue_i = IO(Flipped(Decoupled(new GlobalSchedIssue(b)))) -// Ball Domain top level - uses new simplified BBus architecture -class BallDomain(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { + @public + val global_complete_o = IO(Decoupled(new GlobalSchedComplete(b))) - val io = IO(new BallDomainIO) + @public + val bankRead = IO(Vec(totalBallRead, Flipped(new BankRead(b)))) - // Create new BBus module - val bbus = Module(new BBusModule) + @public + val bankWrite = IO(Vec(totalBallWrite, Flipped(new BankWrite(b)))) + + @public + val subRobReq = IO(Vec(b.ballDomain.ballNum, Decoupled(new SubRobRow(b)))) + + val bbus: Instance[BBusModule] = Instantiate(new BBusModule(b)) + val ballDecoder: Instance[BallDomainDecoder] = Instantiate(new BallDomainDecoder(b)) + val ballRs: Instance[BallReservationStation] = Instantiate(new BallReservationStation(b)) //--------------------------------------------------------------------------- // Global RS -> Decoder (receive global issue and construct PostGDCmd) //--------------------------------------------------------------------------- - val ballDecoder = Module(new 
BallDomainDecoder) // Convert global RS issue to Decoder input format - ballDecoder.io.raw_cmd_i.valid := io.global_issue_i.valid - ballDecoder.io.raw_cmd_i.bits := io.global_issue_i.bits.cmd - io.global_issue_i.ready := ballDecoder.io.raw_cmd_i.ready + ballDecoder.cmd_i.valid := global_issue_i.valid + ballDecoder.cmd_i.bits := global_issue_i.bits.cmd + global_issue_i.ready := ballDecoder.cmd_i.ready //--------------------------------------------------------------------------- // Decoder -> Local BallRS (multi-channel issue to each Ball device) //--------------------------------------------------------------------------- - val ballRs = Module(new BallRSModule) // Connect decoded instruction and global rob_id - ballRs.io.ball_decode_cmd_i.valid := ballDecoder.io.ball_decode_cmd_o.valid - ballRs.io.ball_decode_cmd_i.bits.cmd := ballDecoder.io.ball_decode_cmd_o.bits - ballRs.io.ball_decode_cmd_i.bits.rob_id := io.global_issue_i.bits.rob_id - ballDecoder.io.ball_decode_cmd_o.ready := ballRs.io.ball_decode_cmd_i.ready + ballRs.ball_decode_cmd_i.valid := ballDecoder.ball_decode_cmd_o.valid + ballRs.ball_decode_cmd_i.bits.cmd := ballDecoder.ball_decode_cmd_o.bits + ballRs.ball_decode_cmd_i.bits.rob_id := global_issue_i.bits.rob_id + ballRs.ball_decode_cmd_i.bits.is_sub := global_issue_i.bits.is_sub + ballRs.ball_decode_cmd_i.bits.sub_rob_id := global_issue_i.bits.sub_rob_id + ballDecoder.ball_decode_cmd_o.ready := ballRs.ball_decode_cmd_i.ready //--------------------------------------------------------------------------- // Local BallRS -> BBus (multi-channel) //--------------------------------------------------------------------------- - bbus.io.cmdReq <> ballRs.io.issue_o.balls - ballRs.io.commit_i.balls <> bbus.io.cmdResp + bbus.cmdReq <> ballRs.issue_o.balls + ballRs.commit_i.balls <> bbus.cmdResp //--------------------------------------------------------------------------- // BBus -> Mem Domain 
//--------------------------------------------------------------------------- - bbus.io.sramRead <> io.sramRead - bbus.io.sramWrite <> io.sramWrite - bbus.io.accRead <> io.accRead - bbus.io.accWrite <> io.accWrite + bbus.bankRead <> bankRead + bbus.bankWrite <> bankWrite + + // BBus -> SubROB + for (i <- 0 until b.ballDomain.ballNum) { + subRobReq(i) <> bbus.subRobReq(i) + } //--------------------------------------------------------------------------- // Local RS completion signal -> Global RS (single channel, includes global rob_id) //--------------------------------------------------------------------------- - io.global_complete_o <> ballRs.io.complete_o + global_complete_o.valid := ballRs.complete_o.valid + global_complete_o.bits.rob_id := ballRs.complete_o.bits.rob_id + global_complete_o.bits.is_sub := ballRs.complete_o.bits.is_sub + global_complete_o.bits.sub_rob_id := ballRs.complete_o.bits.sub_rob_id + ballRs.complete_o.ready := global_complete_o.ready - override lazy val desiredName = "BallDomain" } diff --git a/arch/src/main/scala/examples/toy/balldomain/DISA.scala b/arch/src/main/scala/examples/toy/balldomain/DISA.scala index 394294ad..474dc645 100644 --- a/arch/src/main/scala/examples/toy/balldomain/DISA.scala +++ b/arch/src/main/scala/examples/toy/balldomain/DISA.scala @@ -2,26 +2,51 @@ package examples.toy.balldomain import chisel3._ import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.tile._ +// funct7 encoding: [6:4] = enable, [3:0] = opcode +// enable: 000=no_bank, 001=1rd, 010=1wr, 011=1rd+1wr, 100=2rd+1wr +// 101/110/111 = no_bank (extended space) +object DISA { + // enable=100 (2 read + 1 write): opcode 0-3 + val MATMUL_WARP16 = BitPat("b1000000") // 64 (0x40) + val SYSTOLIC = BitPat("b1000001") // 65 (0x41) + val GEMMINI_COMPUTE_PRELOADED = BitPat("b1000010") // 66 (0x42) + val GEMMINI_COMPUTE_ACCUMULATED = BitPat("b1000011") // 67 (0x43) -class BuckyballRawCmd(implicit p: Parameters) extends Bundle 
{ - val cmd = new RoCCCommand -} + // enable=011 (1 read + 1 write): opcode 0-7 + val IM2COL = BitPat("b0110000") // 48 (0x30) + val TRANSPOSE = BitPat("b0110001") // 49 (0x31) + val RELU = BitPat("b0110010") // 50 (0x32) + val QUANT = BitPat("b0110011") // 51 (0x33) + val DEQUANT = BitPat("b0110100") // 52 (0x34) + val GEMMINI_PRELOAD = BitPat("b0110101") // 53 (0x35) + val BDB_BACKDOOR = BitPat("b0110110") // 54 (0x36) + val MXFP = BitPat("b0110111") // 55 (0x37) -object DISA { - val BB_BBFP_MUL = BitPat("b0011010") // 26 - val MATMUL_WS = BitPat("b0011011") // 27 - val MATMUL_WARP16_BITPAT = BitPat("b0100000") // 32 - val IM2COL = BitPat("b0100001") // 33 - val TRANSPOSE = BitPat("b0100010") // 34 - val RELU = BitPat("b0100110") // 38 - val BBUS_CONFIG = BitPat("b0100111") // 39 - val NNLUT = BitPat("b0101000") // 40 - val SNN = BitPat("b0101001") // 41 - val ABFT_SYSTOLIC = BitPat("b0101010") // 42 - val CONV = BitPat("b0101011") // 43 - val CIM = BitPat("b0101100") // 44 - val TRANSFER = BitPat("b0101101") // 45 + // enable=000 (no bank access): opcode 2-4 + val GEMMINI_CONFIG = BitPat("b0000010") // 2 (0x02) + val GEMMINI_FLUSH = BitPat("b0000011") // 3 (0x03) + val BDB_COUNTER = BitPat("b0000100") // 4 (0x04) + + // enable=101 (no bank, extended): Loop WS config, opcode 0-7 + val GEMMINI_LOOP_WS_CONFIG_BOUNDS = BitPat("b1010000") // 80 (0x50) + val GEMMINI_LOOP_WS_CONFIG_ADDR_A = BitPat("b1010001") // 81 (0x51) + val GEMMINI_LOOP_WS_CONFIG_ADDR_B = BitPat("b1010010") // 82 (0x52) + val GEMMINI_LOOP_WS_CONFIG_ADDR_D = BitPat("b1010011") // 83 (0x53) + val GEMMINI_LOOP_WS_CONFIG_ADDR_C = BitPat("b1010100") // 84 (0x54) + val GEMMINI_LOOP_WS_CONFIG_STRIDES_AB = BitPat("b1010101") // 85 (0x55) + val GEMMINI_LOOP_WS_CONFIG_STRIDES_DC = BitPat("b1010110") // 86 (0x56) + val GEMMINI_LOOP_WS = BitPat("b1010111") // 87 (0x57) + + // enable=110 (no bank, extended): Loop Conv WS config, opcode 0-9 + val GEMMINI_LOOP_CONV_WS_CONFIG_1 = BitPat("b1100000") // 96 (0x60) + 
val GEMMINI_LOOP_CONV_WS_CONFIG_2 = BitPat("b1100001") // 97 (0x61) + val GEMMINI_LOOP_CONV_WS_CONFIG_3 = BitPat("b1100010") // 98 (0x62) + val GEMMINI_LOOP_CONV_WS_CONFIG_4 = BitPat("b1100011") // 99 (0x63) + val GEMMINI_LOOP_CONV_WS_CONFIG_5 = BitPat("b1100100") // 100 (0x64) + val GEMMINI_LOOP_CONV_WS_CONFIG_6 = BitPat("b1100101") // 101 (0x65) + val GEMMINI_LOOP_CONV_WS_CONFIG_7 = BitPat("b1100110") // 102 (0x66) + val GEMMINI_LOOP_CONV_WS_CONFIG_8 = BitPat("b1100111") // 103 (0x67) + val GEMMINI_LOOP_CONV_WS_CONFIG_9 = BitPat("b1101000") // 104 (0x68) + val GEMMINI_LOOP_CONV_WS = BitPat("b1101001") // 105 (0x69) } diff --git a/arch/src/main/scala/examples/toy/balldomain/DomainDecoder.scala b/arch/src/main/scala/examples/toy/balldomain/DomainDecoder.scala index 9fd970f5..27acc626 100644 --- a/arch/src/main/scala/examples/toy/balldomain/DomainDecoder.scala +++ b/arch/src/main/scala/examples/toy/balldomain/DomainDecoder.scala @@ -2,20 +2,18 @@ package examples.toy.balldomain import chisel3._ import chisel3.util._ -import framework.frontend.decoder.PostGDCmd -import examples.BuckyballConfigs.CustomBuckyballConfig -import examples.toy.balldomain.DISA._ -import framework.memdomain.dma.LocalAddr -import freechips.rocketchip.tile._ +import chisel3.experimental.hierarchy.{instantiable, public} import org.chipsalliance.cde.config.Parameters +import framework.frontend.decoder.{DomainId, PostGDCmd} +import examples.toy.balldomain.DISA._ +import framework.top.GlobalConfig // Detailed decode output for Ball domain -class BallDecodeCmd(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - // Ball ID - val bid = UInt(5.W) - +class BallDecodeCmd(numBanks: Int, iterLen: Int) extends Bundle { + val bid = UInt(5.W) + val funct7 = UInt(7.W) // raw funct7 for instruction routing // Iteration count - val iter = UInt(10.W) + val iter = UInt(iterLen.W) // Ball-specific fields val op1_en = Bool() @@ -23,26 +21,13 @@ class BallDecodeCmd(implicit b: CustomBuckyballConfig, 
p: Parameters) extends Bu val wr_spad_en = Bool() val op1_from_spad = Bool() val op2_from_spad = Bool() - // Instruction-specific subfield - val special = UInt(40.W) - - // Ball operand addresses - // 3 bits, supports 8 banks - val op1_bank = UInt(log2Up(b.sp_banks + b.acc_banks).W) - // 12 bits, uses SPAD row count - val op1_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - // 3 bits, supports 8 banks - val op2_bank = UInt(log2Up(b.sp_banks + b.acc_banks).W) - // 12 bits, uses SPAD row count - val op2_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - - // Write address and bank information - // 3 bits, supports 8 banks - val wr_bank = UInt(log2Up(b.sp_banks + b.acc_banks).W) - // 12 bits, uses SPAD row count - val wr_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - // Whether this is an acc bank operation - val is_acc = Bool() + // Instruction-specific subfield (full rs2) + val special = UInt(64.W) + + // Ball operand bank IDs + val op1_bank = UInt(log2Up(numBanks).W) + val op2_bank = UInt(log2Up(numBanks).W) + val wr_bank = UInt(log2Up(numBanks).W) val rs1 = UInt(64.W) val rs2 = UInt(64.W) @@ -51,101 +36,192 @@ class BallDecodeCmd(implicit b: CustomBuckyballConfig, p: Parameters) extends Bu // Ball decode fields object BallDecodeFields extends Enumeration { type Field = Value - val OP1_EN, OP2_EN, WR_SPAD, OP1_FROM_SPAD, OP2_FROM_SPAD, - OP1_SPADDR, OP2_SPADDR, WR_SPADDR, ITER, BID, SPECIAL = Value + val OP1_FROM_SPAD, OP2_FROM_SPAD, OP1_SPADDR, OP2_SPADDR, WR_SPADDR, BID, SPECIAL = + Value } - - // Default constants for EX decoder object BallDefaultConstants { - val Y = true.B - val N = false.B - val DADDR = 0.U(14.W) - val DITER = 0.U(10.W) - val DBID = 0.U(5.W) - val DSPECIAL = 0.U(40.W) + val Y = true.B + val N = false.B + val DADDR = 0.U(10.W) + val DBID = 0.U(5.W) + val DSPECIAL = 0.U(64.W) } -class BallDomainDecoder(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { +@instantiable +class BallDomainDecoder(val b: GlobalConfig) extends 
Module { import BallDefaultConstants._ - val io = IO(new Bundle { - val raw_cmd_i = Flipped(Decoupled(new PostGDCmd)) - val ball_decode_cmd_o = Decoupled(new BallDecodeCmd) - }) - - val spAddrLen = b.spAddrLen - - // Only process ball instructions - io.raw_cmd_i.ready := io.ball_decode_cmd_o.ready - - val func7 = io.raw_cmd_i.bits.raw_cmd.inst.funct - val rs1 = io.raw_cmd_i.bits.raw_cmd.rs1 - val rs2 = io.raw_cmd_i.bits.raw_cmd.rs2 + val bankIdLen = b.frontend.bank_id_len + val iterLen = b.frontend.iter_len + + @public + val cmd_i = IO(Flipped(Decoupled(new PostGDCmd(b)))) + @public + val ball_decode_cmd_o = IO(Decoupled(new BallDecodeCmd(b.memDomain.bankNum, iterLen))) + + cmd_i.ready := ball_decode_cmd_o.ready + + val func7 = cmd_i.bits.cmd.funct + val rs1 = cmd_i.bits.cmd.rs1Data + val rs2 = cmd_i.bits.cmd.rs2Data + + // Unified rs1 layout: + // rs1[9:0] = BANK0 (op1 bank / 1st read) + // rs1[19:10] = BANK1 (op2 bank / 2nd read) + // rs1[29:20] = BANK2 (wr bank) + // rs1[63:30] = ITER (34-bit) + // + // Enable encoding in funct7[6:4]: + // 000 = no bank access + // 001 = 1 read (bank0) + // 010 = 1 write (bank2) + // 011 = 1 read + 1 write (bank0 read, bank2 write) + // 100 = 2 read + 1 write (bank0+bank1 read, bank2 write) + // 101,110,111 = no bank access (extended opcode space) + // rs2 = special (full 64-bit) + + val op1_bank_raw = rs1(bankIdLen - 1, 0) + val op2_bank_raw = rs1(bankIdLen + 9, 10) + val wr_bank_raw = rs1(bankIdLen + 19, 20) + val iter_raw = rs1(63, 30) + + // Enable decoding from funct7[6:4] + val enableBits = func7(6, 4) + val hasRd0 = enableBits === 1.U || enableBits === 3.U || enableBits === 4.U + val hasRd1 = enableBits === 4.U + val hasWr = enableBits === 2.U || enableBits === 3.U || enableBits === 4.U // Ball instruction decoding import BallDecodeFields._ - val ball_default_decode = List(N,N,N,N,N,DADDR,DADDR,DADDR,DITER,DBID,DSPECIAL) - val ball_decode_list = ListLookup(func7, ball_default_decode, Array( - MATMUL_WARP16_BITPAT -> 
List(Y,Y,Y,Y,Y, rs1(spAddrLen-1,0), rs1(2*spAddrLen - 1,spAddrLen), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),0.U,rs2(63,spAddrLen + 10)), - BB_BBFP_MUL -> List(Y,Y,Y,Y,Y, rs1(spAddrLen-1,0), rs1(2*spAddrLen - 1,spAddrLen), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),1.U,rs2(63,spAddrLen + 10)), - MATMUL_WS -> List(Y,Y,Y,Y,Y, rs1(spAddrLen-1,0), rs1(2*spAddrLen - 1,spAddrLen), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),1.U,rs2(63,spAddrLen + 10)), - IM2COL -> List(Y,Y,Y,Y,Y, rs1(spAddrLen-1,0), rs1(2*spAddrLen - 1,spAddrLen), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),2.U,rs2(63,spAddrLen + 10)), - TRANSPOSE -> List(Y,Y,Y,Y,Y, rs1(spAddrLen-1,0), rs1(2*spAddrLen - 1,spAddrLen), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),3.U,rs2(63,spAddrLen + 10)), - RELU -> List(Y,N,Y,Y,N, rs1(spAddrLen-1,0), DADDR, rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),4.U,rs2(63,spAddrLen + 10)), - BBUS_CONFIG -> List(Y,N,Y,Y,N, rs1(spAddrLen-1,0), DADDR, rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),5.U,rs2(63,spAddrLen + 10)), - NNLUT -> List(Y,N,Y,Y,N, rs1(spAddrLen-1,0), DADDR, rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),31.U,rs2(63,spAddrLen + 10)), - SNN -> List(Y,N,Y,Y,N, rs1(spAddrLen-1,0), DADDR, rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),7.U,rs2(63,spAddrLen + 10)), - ABFT_SYSTOLIC -> List(Y,Y,Y,Y,Y, rs1(spAddrLen-1,0), rs1(2*spAddrLen - 1,spAddrLen), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),8.U,rs2(63,spAddrLen + 10)), - CONV -> List(Y,Y,Y,Y,Y, rs1(spAddrLen-1,0), rs1(2*spAddrLen - 1,spAddrLen), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),9.U,rs2(63,spAddrLen + 10)), - CIM -> List(Y,Y,Y,Y,Y, rs1(spAddrLen-1,0), rs1(2*spAddrLen - 1,spAddrLen), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),10.U,rs2(63,spAddrLen + 10)), - TRANSFER -> List(Y,N,Y,Y,N, rs1(spAddrLen-1,0), DADDR, rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen),6.U,rs2(63,spAddrLen + 10)) - )) - + val ball_default_decode = List(N, N, 0.U, 0.U, 0.U, DBID, 
DSPECIAL) + + val ball_decode_list = ListLookup( + func7, + ball_default_decode, + Array( + // enable=100 (2rd+1wr): bank0 read, bank1 read, bank2 write + // op1s op2s op1_bank op2_bank wr_bank bid special + MATMUL_WARP16 -> List(Y, Y, op1_bank_raw, op2_bank_raw, wr_bank_raw, 0.U, rs2), + SYSTOLIC -> List(Y, Y, op1_bank_raw, op2_bank_raw, wr_bank_raw, 4.U, rs2), + GEMMINI_COMPUTE_PRELOADED -> List( + Y, + Y, + op1_bank_raw, + op2_bank_raw, + wr_bank_raw, + 7.U, + Cat(rs2(63, 4), 2.U(4.W)) + ), + GEMMINI_COMPUTE_ACCUMULATED -> List( + Y, + Y, + op1_bank_raw, + op2_bank_raw, + wr_bank_raw, + 7.U, + Cat(rs2(63, 4), 3.U(4.W)) + ), + // enable=011 (1rd+1wr): bank0 read, bank2 write + RELU -> List(Y, N, op1_bank_raw, DADDR, wr_bank_raw, 1.U, rs2), + TRANSPOSE -> List(Y, N, op1_bank_raw, DADDR, wr_bank_raw, 2.U, rs2), + IM2COL -> List(Y, N, op1_bank_raw, DADDR, wr_bank_raw, 3.U, rs2), + QUANT -> List(Y, N, op1_bank_raw, DADDR, wr_bank_raw, 5.U, rs2), + DEQUANT -> List(Y, N, op1_bank_raw, DADDR, wr_bank_raw, 6.U, rs2), + GEMMINI_PRELOAD -> List(Y, N, op1_bank_raw, DADDR, wr_bank_raw, 7.U, Cat(rs2(63, 4), 1.U(4.W))), + BDB_BACKDOOR -> List(Y, N, op1_bank_raw, DADDR, wr_bank_raw, 8.U, rs2), + // enable=000 (no bank): config/flush/counter + GEMMINI_CONFIG -> List(N, N, DADDR, DADDR, DADDR, 7.U, Cat(rs2(63, 4), 0.U(4.W))), + GEMMINI_FLUSH -> List(N, N, DADDR, DADDR, DADDR, 7.U, Cat(0.U(60.W), 4.U(4.W))), + BDB_COUNTER -> List(N, N, DADDR, DADDR, DADDR, 8.U, rs2), + MXFP -> List(Y, N, op1_bank_raw, DADDR, wr_bank_raw, 9.U, rs2), + // enable=101 (no bank, extended): Loop WS config/trigger + GEMMINI_LOOP_WS_CONFIG_BOUNDS -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_WS_CONFIG_ADDR_A -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_WS_CONFIG_ADDR_B -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_WS_CONFIG_ADDR_D -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_WS_CONFIG_ADDR_C -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + 
GEMMINI_LOOP_WS_CONFIG_STRIDES_AB -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_WS_CONFIG_STRIDES_DC -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_WS -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + // enable=110 (no bank, extended): Loop Conv WS config/trigger + GEMMINI_LOOP_CONV_WS_CONFIG_1 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS_CONFIG_2 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS_CONFIG_3 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS_CONFIG_4 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS_CONFIG_5 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS_CONFIG_6 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS_CONFIG_7 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS_CONFIG_8 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS_CONFIG_9 -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2), + GEMMINI_LOOP_CONV_WS -> List(N, N, DADDR, DADDR, DADDR, 7.U, rs2) + ) + ) // ----------------------------------------------------------------------------- // Output assignment // ----------------------------------------------------------------------------- - io.ball_decode_cmd_o.valid := io.raw_cmd_i.valid && io.raw_cmd_i.bits.is_ball - - io.ball_decode_cmd_o.bits.bid := Mux(io.ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.BID.id).asUInt, DBID) - - io.ball_decode_cmd_o.bits.iter := Mux(io.ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.ITER.id).asUInt, 0.U(10.W)) - io.ball_decode_cmd_o.bits.special := Mux(io.ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.SPECIAL.id).asUInt, DSPECIAL) - io.ball_decode_cmd_o.bits.op1_en := Mux(io.ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.OP1_EN.id).asBool, false.B) - io.ball_decode_cmd_o.bits.op2_en := Mux(io.ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.OP2_EN.id).asBool, false.B) - 
io.ball_decode_cmd_o.bits.wr_spad_en := Mux(io.ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.WR_SPAD.id).asBool, false.B) - io.ball_decode_cmd_o.bits.op1_from_spad := Mux(io.ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.OP1_FROM_SPAD.id).asBool, false.B) - io.ball_decode_cmd_o.bits.op2_from_spad := Mux(io.ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.OP2_FROM_SPAD.id).asBool, false.B) - - // Address parsing - val op1_spaddr = ball_decode_list(BallDecodeFields.OP1_SPADDR.id).asUInt - val op2_spaddr = ball_decode_list(BallDecodeFields.OP2_SPADDR.id).asUInt - val wr_spaddr = ball_decode_list(BallDecodeFields.WR_SPADDR.id).asUInt - - val op1_laddr = LocalAddr.cast_to_sp_addr(b.local_addr_t, op1_spaddr) - val op2_laddr = LocalAddr.cast_to_sp_addr(b.local_addr_t, op2_spaddr) - val wr_laddr = LocalAddr.cast_to_sp_addr(b.local_addr_t, wr_spaddr) - - // Use mem_bank() and mem_row() to support ACC banks (bank 4+) - io.ball_decode_cmd_o.bits.op1_bank := Mux(io.ball_decode_cmd_o.valid, op1_laddr.mem_bank(), 0.U(log2Up(b.sp_banks + b.acc_banks).W)) - io.ball_decode_cmd_o.bits.op1_bank_addr := Mux(io.ball_decode_cmd_o.valid, op1_laddr.mem_row(), 0.U(log2Up(b.spad_bank_entries).W)) - io.ball_decode_cmd_o.bits.op2_bank := Mux(io.ball_decode_cmd_o.valid, op2_laddr.mem_bank(), 0.U(log2Up(b.sp_banks + b.acc_banks).W)) - io.ball_decode_cmd_o.bits.op2_bank_addr := Mux(io.ball_decode_cmd_o.valid, op2_laddr.mem_row(), 0.U(log2Up(b.spad_bank_entries).W)) - - io.ball_decode_cmd_o.bits.wr_bank := Mux(io.ball_decode_cmd_o.valid, wr_laddr.mem_bank(), 0.U(log2Up(b.sp_banks + b.acc_banks).W)) - io.ball_decode_cmd_o.bits.wr_bank_addr := Mux(io.ball_decode_cmd_o.valid, wr_laddr.mem_row(), 0.U(log2Up(b.spad_bank_entries).W)) - io.ball_decode_cmd_o.bits.is_acc := Mux(io.ball_decode_cmd_o.valid, (io.ball_decode_cmd_o.bits.wr_bank >= b.sp_banks.U), false.B) + ball_decode_cmd_o.valid := cmd_i.valid && (cmd_i.bits.domain_id === DomainId.BALL) + + 
ball_decode_cmd_o.bits.bid := Mux(ball_decode_cmd_o.valid, ball_decode_list(BallDecodeFields.BID.id).asUInt, DBID) + ball_decode_cmd_o.bits.funct7 := Mux(ball_decode_cmd_o.valid, func7, 0.U) + + // iter is always from rs1[63:30] + ball_decode_cmd_o.bits.iter := Mux( + ball_decode_cmd_o.valid, + iter_raw, + 0.U(iterLen.W) + ) + ball_decode_cmd_o.bits.special := Mux( + ball_decode_cmd_o.valid, + ball_decode_list(BallDecodeFields.SPECIAL.id).asUInt, + DSPECIAL + ) + + // Enable bits from funct7[6:4] + ball_decode_cmd_o.bits.op1_en := ball_decode_cmd_o.valid && hasRd0 + ball_decode_cmd_o.bits.op2_en := ball_decode_cmd_o.valid && hasRd1 + ball_decode_cmd_o.bits.wr_spad_en := ball_decode_cmd_o.valid && hasWr + + ball_decode_cmd_o.bits.op1_from_spad := Mux( + ball_decode_cmd_o.valid, + ball_decode_list(BallDecodeFields.OP1_FROM_SPAD.id).asBool, + false.B + ) + ball_decode_cmd_o.bits.op2_from_spad := Mux( + ball_decode_cmd_o.valid, + ball_decode_list(BallDecodeFields.OP2_FROM_SPAD.id).asBool, + false.B + ) + + // Directly assign bank IDs from decoded values + ball_decode_cmd_o.bits.op1_bank := Mux( + ball_decode_cmd_o.valid, + ball_decode_list(BallDecodeFields.OP1_SPADDR.id).asUInt, + 0.U + ) + ball_decode_cmd_o.bits.op2_bank := Mux( + ball_decode_cmd_o.valid, + ball_decode_list(BallDecodeFields.OP2_SPADDR.id).asUInt, + 0.U + ) + ball_decode_cmd_o.bits.wr_bank := Mux( + ball_decode_cmd_o.valid, + ball_decode_list(BallDecodeFields.WR_SPADDR.id).asUInt, + 0.U + ) // Assertion: OpA and OpB in execution instructions must access different banks - assert(!(io.ball_decode_cmd_o.valid && io.ball_decode_cmd_o.bits.op1_en && io.ball_decode_cmd_o.bits.op2_en && - io.ball_decode_cmd_o.bits.op1_bank === io.ball_decode_cmd_o.bits.op2_bank), - "BallDomainDecoder: Ball instruction OpA and OpB cannot access the same bank") + assert( + !(ball_decode_cmd_o.valid && ball_decode_cmd_o.bits.op1_en && ball_decode_cmd_o.bits.op2_en && + ball_decode_cmd_o.bits.op1_bank === 
ball_decode_cmd_o.bits.op2_bank), + "BallDomainDecoder: Ball instruction OpA and OpB cannot access the same bank" + ) // ----------------------------------------------------------------------------- // Continue passing rs1 and rs2 // ----------------------------------------------------------------------------- - io.ball_decode_cmd_o.bits.rs1 := rs1 - io.ball_decode_cmd_o.bits.rs2 := rs2 + ball_decode_cmd_o.bits.rs1 := rs1 + ball_decode_cmd_o.bits.rs2 := rs2 } diff --git a/arch/src/main/scala/examples/toy/balldomain/bbus/README.md b/arch/src/main/scala/examples/toy/balldomain/bbus/README.md index ad90e78d..9aca5eb0 100644 --- a/arch/src/main/scala/examples/toy/balldomain/bbus/README.md +++ b/arch/src/main/scala/examples/toy/balldomain/bbus/README.md @@ -6,7 +6,7 @@ This directory contains the implementation of Buckyball's ball domain bus system This directory implements two core components: - **BallBus**: Ball domain bus main module, manages SRAM access by multiple Ball nodes -- **BBusRouter**: Bus router, provides routing functionality for Blink interface +- **BBusRouter**: Bus router, provides routing functionality for blink interface ## Code Structure @@ -25,7 +25,7 @@ bbus/ **router.scala** (Routing module) - Implements routing functionality based on BBusNode -- Provides Blink protocol interface encapsulation +- Provides blink protocol interface encapsulation ## Module Description @@ -67,7 +67,7 @@ class BallBus(maxReadBW: Int, maxWriteBW: Int, numBalls: Int) extends LazyModule ### router.scala -**Main functionality**: Bus router, provides routing functionality for Blink protocol interface +**Main functionality**: Bus router, provides routing functionality for blink protocol interface **Key components**: @@ -86,11 +86,11 @@ class BBusRouter extends LazyModule { **Routing functionality**: - Implements standard Ball node interface based on BBusNode -- Provides Blink protocol encapsulation and conversion +- Provides blink protocol encapsulation and conversion - 
Supports configurable read/write bandwidth parameters **Inputs/Outputs**: -- Input: Blink protocol interface +- Input: blink protocol interface - Output: BBusNode standard interface - Edge cases: Depends on validity of node.edges.in.head diff --git a/arch/src/main/scala/examples/toy/balldomain/bbus/busRegister.scala b/arch/src/main/scala/examples/toy/balldomain/bbus/busRegister.scala index a0af441d..51b402fa 100644 --- a/arch/src/main/scala/examples/toy/balldomain/bbus/busRegister.scala +++ b/arch/src/main/scala/examples/toy/balldomain/bbus/busRegister.scala @@ -2,24 +2,41 @@ package examples.toy.balldomain.bbus import chisel3._ import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig +import chisel3.experimental.hierarchy.instantiable +import framework.top.GlobalConfig import framework.balldomain.bbus.BBus - +import framework.balldomain.blink.HasBlink +import framework.balldomain.prototype.vector.VecBall +import framework.balldomain.prototype.relu.ReluBall +import framework.balldomain.prototype.transpose.TransposeBall +import framework.balldomain.prototype.im2col.Im2colBall +import framework.balldomain.prototype.systolicarray.SystolicArrayBall +import framework.balldomain.prototype.quant.QuantBall +import framework.balldomain.prototype.dequant.DequantBall +import framework.balldomain.prototype.gemmini.GemminiBall +import framework.balldomain.prototype.trace.TraceBall +import framework.balldomain.prototype.mxfp.MxfpBall /** * BBusModule - Ball bus module that directly extends BBus */ -class BBusModule(implicit b: CustomBuckyballConfig, p: Parameters) extends BBus ( - // Define Ball device generators to register - Seq( - () => new prototype.vector.VecBall(0), - () => new prototype.matrix.MatrixBall(1), - () => new prototype.im2col.Im2colBall(2), - () => new prototype.transpose.TransposeBall(3), - () => new prototype.relu.ReluBall(4), - () => new examples.toy.balldomain.emptyball.EmptyBall(5), - () 
=> new prototype.transfer.TransferBall(6) - ) -) { - override lazy val desiredName = "BBusModule" -} +@instantiable +class BBusModule(b: GlobalConfig) + extends BBus( + b, + b.ballDomain.ballIdMappings.map { mapping => + val ballGenerator: () => HasBlink with Module = mapping.ballName match { + case "VecBall" => () => new VecBall(b) + case "ReluBall" => () => new ReluBall(b) + case "TransposeBall" => () => new TransposeBall(b) + case "Im2colBall" => () => new Im2colBall(b) + case "SystolicArrayBall" => () => new SystolicArrayBall(b) + case "QuantBall" => () => new QuantBall(b) + case "DequantBall" => () => new DequantBall(b) + case "GemminiBall" => () => new GemminiBall(b) + case "TraceBall" => () => new TraceBall(b) + case "MxfpBall" => () => new MxfpBall(b) + case name => throw new IllegalArgumentException(s"Unknown ball name: $name") + } + ballGenerator + } + ) {} diff --git a/arch/src/main/scala/examples/toy/balldomain/emptyball/EmptyBall.scala b/arch/src/main/scala/examples/toy/balldomain/emptyball/EmptyBall.scala deleted file mode 100644 index 5abcc76f..00000000 --- a/arch/src/main/scala/examples/toy/balldomain/emptyball/EmptyBall.scala +++ /dev/null @@ -1,57 +0,0 @@ -package examples.toy.balldomain.emptyball - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} - -class EmptyBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - def Blink: Blink = io - - io.cmdResp.valid := RegNext(io.cmdReq.valid) - io.cmdResp.bits.rob_id := RegNext(io.cmdReq.bits.rob_id) - io.cmdReq.ready := true.B - - for (i <- 0 until b.sp_banks) { - io.sramRead(i).io.req.valid := false.B - io.sramRead(i).io.req.bits.addr := 0.U - io.sramRead(i).io.req.bits.fromDMA := false.B - io.sramRead(i).io.resp.ready := false.B - io.sramRead(i).rob_id := 0.U - - 
io.sramWrite(i).io.req.valid := false.B - io.sramWrite(i).io.req.bits.addr := 0.U - io.sramWrite(i).io.req.bits.data := 0.U - io.sramWrite(i).io.req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - io.sramWrite(i).rob_id := 0.U - } - - // Handle Accumulator read interface - EmptyBall does not read accumulator, so tie off - for (i <- 0 until b.acc_banks) { - // For Flipped(SramReadIO), we need to drive req.valid, req.bits (outputs) and resp.ready (output) - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Handle Accumulator write interface - EmptyBall does not write accumulator, so tie off - for (i <- 0 until b.acc_banks) { - // For Flipped(SramWriteIO), we need to drive req.valid and req.bits (outputs) - io.accWrite(i).io.req.valid := false.B - io.accWrite(i).io.req.bits := DontCare - io.accWrite(i).rob_id := 0.U - } - io.status.ready := true.B - io.status.valid := io.cmdResp.valid - io.status.idle := false.B - io.status.init := false.B - io.status.running := false.B - io.status.iter := 0.U - io.status.complete := io.cmdResp.valid - -} diff --git a/arch/src/main/scala/examples/toy/balldomain/rs/README.md b/arch/src/main/scala/examples/toy/balldomain/rs/README.md deleted file mode 100644 index 605b935a..00000000 --- a/arch/src/main/scala/examples/toy/balldomain/rs/README.md +++ /dev/null @@ -1,214 +0,0 @@ -# Reservation Station & ROB - -## Overview - -This module implements the Reservation Station and Reorder Buffer (ROB) in the Buckyball system for out-of-order execution and instruction scheduling support. The reservation station manages instruction issue and completion, while ROB ensures instructions commit in program order, maintaining precise exception semantics. 
- -## File Structure - -``` -rs/ -├── reservationStation.scala - Reservation station implementation -└── rob.scala - Reorder buffer implementation -``` - -## Core Components - -### BallReservationStation - Ball Domain Reservation Station - -The reservation station is a key component connecting the instruction decoder and execution units, responsible for: - -**Main functionality**: -- Receives instructions from Ball domain decoder -- Dispatches to different execution units based on instruction type -- Manages instruction issue and completion status -- Generates RoCC responses - -**Supported execution units**: -- **ball1**: VecUnit (vector processing unit) -- **ball2**: BBFP (floating-point processing unit) -- **ball3**: im2col (image processing accelerator) -- **ball4**: transpose (matrix transpose accelerator) - -**Interface design**: -```scala -class BallReservationStation extends Module { - val io = IO(new Bundle { - // Instruction input - val ball_decode_cmd_i = Flipped(DecoupledIO(new BallDecodeCmd)) - - // RoCC response output - val rs_rocc_o = new Bundle { - val resp = DecoupledIO(new RoCCResponseBB) - val busy = Output(Bool()) - } - - // Execution unit interfaces - val issue_o = new BallIssueInterface // Issue interface - val commit_i = new BallCommitInterface // Commit interface - }) -} -``` - -**Instruction dispatch logic**: -```scala -// Dispatch instructions based on bid (Ball ID) -io.issue_o.ball1.valid := rob.io.issue.valid && rob.io.issue.bits.cmd.bid === 1.U // VecUnit -io.issue_o.ball2.valid := rob.io.issue.valid && rob.io.issue.bits.cmd.bid === 2.U // BBFP -io.issue_o.ball3.valid := rob.io.issue.valid && rob.io.issue.bits.cmd.bid === 3.U // im2col -io.issue_o.ball4.valid := rob.io.issue.valid && rob.io.issue.bits.cmd.bid === 4.U // transpose -``` - -### ROB - Reorder Buffer - -ROB implements sequential instruction management and out-of-order completion support: - -**Design features**: -- Uses FIFO queue to maintain instruction order -- Uses 
completion status table to track instruction execution status -- Supports out-of-order completion but in-order issue -- Provides ROB ID for instruction identification - -**Core data structures**: -```scala -class RobEntry extends Bundle { - val cmd = new BallDecodeCmd // Instruction content - val rob_id = UInt(log2Up(rob_entries).W) // ROB identifier -} -``` - -**State management**: -```scala -val robFifo = Module(new Queue(new RobEntry, rob_entries)) // Instruction queue -val robTable = Reg(Vec(rob_entries, Bool())) // Completion status table -val robIdCounter = RegInit(0.U(log2Up(rob_entries).W)) // ID counter -``` - -## Workflow - -### Instruction Allocation Flow -1. **Instruction enqueue**: Instructions from decoder enter ROB -2. **Assign ROB ID**: Allocate unique ROB ID to each instruction -3. **State initialization**: Mark as incomplete in completion status table - -```scala -when(io.alloc.fire) { - robIdCounter := robIdCounter + 1.U - robTable(robIdCounter) := false.B // Mark as incomplete -} -``` - -### Instruction Issue Flow -1. **Head check**: Check if ROB head instruction is incomplete -2. **Type dispatch**: Dispatch instruction to corresponding execution unit based on bid -3. **Ready control**: Only issue when target execution unit is ready - -```scala -val headEntry = robFifo.io.deq.bits -val headCompleted = robTable(headEntry.rob_id) -io.issue.valid := robFifo.io.deq.valid && !headCompleted -``` - -### Instruction Completion Flow -1. **Completion arbitration**: Multiple execution unit completion signals handled by arbiter -2. **State update**: Update completion status table based on ROB ID -3. 
**Queue dequeue**: Remove completed head instruction from ROB - -```scala -val completeArb = Module(new Arbiter(UInt(log2Up(rob_entries).W), 4)) -when(io.complete.fire) { - robTable(io.complete.bits) := true.B // Mark as completed -} -``` - -## Configuration Parameters - -### Key Configuration Items -- **rob_entries**: ROB entry count, affects out-of-order execution window size -- **Execution unit count**: Currently supports 4 Ball execution units -- **Arbitration strategy**: Uses round-robin arbitration for multiple completion signals - -### Performance Considerations -- **ROB size**: Larger ROB supports more out-of-order execution but increases hardware overhead -- **Issue bandwidth**: Currently maximum one instruction issued per cycle -- **Completion bandwidth**: Supports multiple instruction completions per cycle - -## Interface Protocol - -### BallIssueInterface - Issue Interface -```scala -class BallIssueInterface extends Bundle { - val ball1 = Decoupled(new BallRsIssue) // VecUnit issue - val ball2 = Decoupled(new BallRsIssue) // BBFP issue - val ball3 = Decoupled(new BallRsIssue) // im2col issue - val ball4 = Decoupled(new BallRsIssue) // transpose issue -} -``` - -### BallCommitInterface - Commit Interface -```scala -class BallCommitInterface extends Bundle { - val ball1 = Flipped(Decoupled(new BallRsComplete)) // VecUnit commit - val ball2 = Flipped(Decoupled(new BallRsComplete)) // BBFP commit - val ball3 = Flipped(Decoupled(new BallRsComplete)) // im2col commit - val ball4 = Flipped(Decoupled(new BallRsComplete)) // transpose commit -} -``` - -## Usage Examples - -### Basic Configuration -```scala -// Configure ROB size in CustomBuckyballConfig -class MyBuckyballConfig extends CustomBuckyballConfig { - override val rob_entries = 16 // 16-entry ROB -} - -// Instantiate reservation station -val reservationStation = Module(new BallReservationStation) -``` - -### Connecting Execution Units -```scala -// Connect VecUnit -vecUnit.io.cmd <> 
reservationStation.io.issue_o.ball1 -reservationStation.io.commit_i.ball1 <> vecUnit.io.resp - -// Connect BBFP -bbfp.io.cmd <> reservationStation.io.issue_o.ball2 -reservationStation.io.commit_i.ball2 <> bbfp.io.resp -``` - -## Debug and Monitoring - -### Status Signals -- **io.rs_rocc_o.busy**: Reservation station busy status -- **rob.io.empty**: ROB empty status -- **rob.io.full**: ROB full status - -### Performance Counters -The following performance counters can be added for monitoring: -- Instruction issue count -- Instruction completion count -- ROB utilization -- Load distribution across execution units - -## Extension Guide - -### Adding New Execution Units -1. Add new issue port in `BallIssueInterface` -2. Add corresponding commit port in `BallCommitInterface` -3. Add corresponding dispatch and arbitration logic in reservation station -4. Update completion signal arbiter port count - -### Optimization Suggestions -- **Multi-issue support**: Can be extended to issue multiple instructions per cycle -- **Dynamic scheduling**: Implement more complex scheduling algorithms -- **Load balancing**: Perform load balancing across multiple execution units of the same type - -## Related Documentation - -- [Ball Domain Overview](../README.md) -- [Ball Domain Bus](../bbus/README.md) -- [Image Processing Accelerator](../im2col/README.md) -- [Vector Processing Unit](../../../prototype/vector/README.md) diff --git a/arch/src/main/scala/examples/toy/balldomain/rs/rsRegister.scala b/arch/src/main/scala/examples/toy/balldomain/rs/rsRegister.scala deleted file mode 100644 index da797fc3..00000000 --- a/arch/src/main/scala/examples/toy/balldomain/rs/rsRegister.scala +++ /dev/null @@ -1,25 +0,0 @@ -package examples.toy.balldomain.rs - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallReservationStation, BallRsRegist} - -/** - * Ball RS module - 
references BBus mechanism, manages Ball device registration and connections - */ -class BallRSModule(implicit b: CustomBuckyballConfig, p: Parameters) extends BallReservationStation( - // Define Ball device information to register - Seq( - BallRsRegist(ballId = 0, ballName = "VecBall"), - BallRsRegist(ballId = 1, ballName = "MatrixBall"), - BallRsRegist(ballId = 2, ballName = "Im2colBall"), - BallRsRegist(ballId = 3, ballName = "TransposeBall"), - BallRsRegist(ballId = 4, ballName = "ReluBall"), - BallRsRegist(ballId = 5, ballName = "EmptyBall"), - BallRsRegist(ballId = 6, ballName = "TransferBall") - ) -) { - override lazy val desiredName = "BallRSModule" -} diff --git a/arch/src/main/scala/framework/README.md b/arch/src/main/scala/framework/README.md deleted file mode 100644 index ceb657b9..00000000 --- a/arch/src/main/scala/framework/README.md +++ /dev/null @@ -1,284 +0,0 @@ -# Buckyball Framework Core - -## Overview - -This directory contains the core implementation of the Buckyball framework, serving as the foundation layer for the entire hardware architecture. Located at `arch/src/main/scala/framework`, it provides a complete implementation of processor cores, built-in components, and system interconnects. 
- -Main functional modules include: -- **builtin**: Built-in hardware component library, including memory domain and frontend modules -- **blink**: System interconnect and communication framework - -## Code Structure - -``` -framework/ -├── builtin/ - Built-in component library -│ ├── memdomain/ - Memory domain implementation -│ │ ├── dma/ - DMA engines (BBStreamReader/Writer) -│ │ ├── mem/ - Memory components (Scratchpad, Accumulator, SRAM banks) -│ │ ├── rs/ - Memory domain reservation station -│ │ ├── tlb/ - TLB implementation -│ │ ├── MemController.scala - Memory controller -│ │ ├── MemDomain.scala - Memory domain top-level -│ │ ├── MemLoader.scala - Load instruction handler -│ │ └── MemStorer.scala - Store instruction handler -│ ├── frontend/ - Frontend components -│ │ ├── GobalDecoder.scala - Global instruction decoder -│ │ ├── globalrs/ - Global reservation station -│ │ │ ├── GlobalReservationStation.scala -│ │ │ └── GlobalROB.scala - Global reorder buffer -│ │ └── rs/ - Ball domain reservation station -│ ├── util/ - Framework utility functions -│ └── BaseConfigs.scala - Base configuration parameters -└── blink/ - System interconnect framework - ├── baseball.scala - Ball device base trait - ├── blink.scala - Blink protocol definitions - └── bbus.scala - Ball bus implementation -``` - -### Module Dependencies - -``` -Application Layer → builtin components → blink interconnect → Physical interface - ↓ ↓ - Memory domain Ball protocol - Frontend System bus -``` - -## Module Details - -### builtin/ - Built-in Component Library - -**Main Function**: Provides standardized hardware component implementations - -**Component Categories**: - -#### memdomain/ - Memory Domain -The memory domain encapsulates all memory-related functionality: - -**Key Components**: -- **MemDomain.scala**: Top-level memory domain module - - Integrates MemController, MemLoader, MemStorer, and TLB - - Provides unified interface to Global RS - - Handles both load and store operations - -- 
**MemController.scala**: Memory controller - - Encapsulates Scratchpad and Accumulator - - Provides DMA and Ball Domain interfaces - - Handles bank arbitration and routing - -- **MemLoader.scala**: Load instruction handler - - Receives load instructions from reservation station - - Issues DMA read requests - - Writes data to Scratchpad/Accumulator - -- **MemStorer.scala**: Store instruction handler - - Receives store instructions from reservation station - - Reads data from Scratchpad/Accumulator - - Issues DMA write requests with data alignment and masking - -- **dma/**: DMA engines - - **BBStreamReader**: Streaming DMA read with TLB support - - **BBStreamWriter**: Streaming DMA write with alignment handling - - Transaction ID management for multiple outstanding requests - -- **mem/**: Memory components - - **Scratchpad.scala**: 4-bank scratchpad memory (256KB total) - - **AccBank.scala**: Accumulator bank with accumulation pipeline - - **SramBank.scala**: Generic single-port SRAM bank implementation - -- **rs/**: Memory domain reservation station - - **reservationStation.scala**: Local FIFO-based scheduler - - **rob.scala**: Local reorder buffer for memory instructions - - **ringFifo.scala**: Circular FIFO implementation - -- **tlb/**: Translation Lookaside Buffer - - Virtual to physical address translation - - Integrated with DMA engines - -#### frontend/ - Frontend Components -The frontend handles global instruction management: - -**Key Components**: -- **GobalDecoder.scala**: Global instruction decoder - - Classifies instructions into Ball/Memory/Fence types - - Constructs PostGDCmd for domain-specific decoders - - Interfaces with Global RS - -- **globalrs/**: Global reservation station - - **GlobalReservationStation.scala**: Central instruction manager - - Allocates ROB entries - - Issues instructions to Ball and Memory domains - - Handles instruction completion from both domains - - Manages Fence instruction synchronization - - **GlobalROB.scala**: Global 
reorder buffer - - Tracks instruction state across domains - - Supports out-of-order completion - - Sequential commit of completed instructions - -- **rs/**: Ball domain reservation station - - **reservationStation.scala**: Ball-specific scheduler - - **rob.scala**: Local ROB for Ball instructions - -#### util/ - Framework Utilities -Common utility functions and helper modules - -#### BaseConfigs.scala -**Configuration Parameters**: -```scala -case class BaseConfig( - veclane: Int = 16, // Vector lane width - accveclane: Int = 4, // Accumulator vector lane width - rob_entries: Int = 16, // Number of ROB entries - rs_out_of_order_response: Boolean = true, // Out-of-order response support - sp_banks: Int = 4, // Scratchpad bank count - acc_banks: Int = 8, // Accumulator bank count - sp_capacity: BuckyballMemCapacity = CapacityInKilobytes(256), - acc_capacity: BuckyballMemCapacity = CapacityInKilobytes(64), - spAddrLen: Int = 15, // SPAD address length - memAddrLen: Int = 32, // Memory address length - numVecPE: Int = 16, // Vector PEs per thread - numVecThread: Int = 16, // Vector threads - emptyBallid: Int = 5 // Empty ball ID -) -``` - -### blink/ - System Interconnect - -**Main Function**: Implements system-level interconnect and Ball protocol - -**Key Components**: -- **baseball.scala**: Ball device base trait - - Defines `BallRegist` trait for Ball device registration - - Provides common interface for all Ball devices - -- **blink.scala**: Blink protocol definitions - - Command/response interfaces - - Status and control signals - - SRAM read/write interfaces - -- **bbus.scala**: Ball bus implementation (BBus) - - Manages multiple Ball device connections - - Command router: Routes commands to appropriate Ball devices - - Bus router: Arbitrates Ball device responses - - Memory router: Handles memory access arbitration - - Performance monitoring counters - -**Interconnect Features**: -- Support for multiple bus protocols -- Arbitration and routing functionality -- 
Latency and bandwidth management -- Dynamic Ball device registration - -## Usage Guide - -### Framework Integration - -**Configuration System**: -```scala -class BuckyballConfig extends Config( - new WithBuiltinComponents ++ - new WithBlinkInterconnect ++ - new BaseConfig -) -``` - -**Module Instantiation**: -```scala -class BuckyballSystem(implicit p: Parameters) extends LazyModule { - // Memory domain - val memdomain = Module(new MemDomain) - - // Ball domain - val balldomain = Module(new BallDomain) - - // Global RS - val globalRS = Module(new GlobalReservationStation) - - // Connect modules - balldomain.io.issue <> globalRS.io.ballIssue - memdomain.io.issue <> globalRS.io.memIssue - globalRS.io.ballComplete <> balldomain.io.complete - globalRS.io.memComplete <> memdomain.io.complete -} -``` - -### Extension Development - -**Adding New Components**: -1. Create new component module in builtin directory -2. Implement standard Module interface -3. Register in configuration system -4. Update interconnect and routing logic - -**Custom Ball Device**: -1. Extend `BallRegist` trait -2. Implement Blink protocol interfaces -3. Register in BBus -4. Add to Ball RS device list - -### Design Principles - -1. **Parameter Passing**: Use Chipyard's Parameters system for configuration -2. **Clock Domains**: Pay attention to clock domain crossing between modules -3. **Reset Strategy**: Ensure proper reset sequencing and dependencies -4. **Performance Optimization**: Focus on critical paths and timing constraints -5. **Debug Support**: Integrate necessary debug and monitoring interfaces -6. **Memory Access**: Respect bank access constraints (op1 and op2 cannot access same bank) -7. 
**Handshake Protocols**: Use ready/valid handshake for all data transfers - -## Architecture Highlights - -### Instruction Flow -``` -RoCC → Global Decoder → Global RS → Ball Domain / Mem Domain - ↓ ↓ ↓ - Global ROB Ball Decoder Mem Decoder - (tracks state) ↓ ↓ - Ball Devices Loader/Storer - ↓ ↓ - MemController ← → MemController -``` - -### Memory Access Flow -``` -Ball Devices ──→ MemController ──→ Scratchpad (4 banks) - │ └→ Accumulator (8 banks) - │ -Mem Domain ──→ MemController - (Loader/Storer) │ - ↓ - DMA + TLB - ↓ - Main Memory -``` - -## Related Documentation - -- [Blink Interconnect System](blink/README.md) - System interconnect implementation -- [Built-in Components](builtin/README.md) - Standard hardware components -- [Memory Domain](builtin/memdomain/README.md) - Memory subsystem details -- [Frontend Components](builtin/frontend/README.md) - Instruction management -- [Buckyball Source Overview](../README.md) - Upper-level architecture - -## Performance Considerations - -1. **ROB Size**: 16 entries support up to 16 in-flight instructions -2. **Bank Parallelism**: 4 scratchpad + 8 accumulator banks enable parallel access -3. **Out-of-Order Execution**: Global RS supports out-of-order completion when enabled -4. **DMA Bandwidth**: 128-bit bus width provides high memory bandwidth -5. 
**Pipeline Depth**: Multi-stage pipeline allows high clock frequency - -## Common Issues and Solutions - -**Issue**: Instructions stall in Global RS -- **Solution**: Check ROB capacity and completion signals from domains - -**Issue**: Memory access conflicts -- **Solution**: Ensure op1 and op2 don't access same bank, respect bank boundaries - -**Issue**: DMA timeout -- **Solution**: Verify TLB configuration and page table walker connectivity - -**Issue**: Ball device not responding -- **Solution**: Check Ball device registration in BBus and RS device list diff --git a/arch/src/main/scala/framework/balldomain/bbus/bbus.scala b/arch/src/main/scala/framework/balldomain/bbus/bbus.scala index 6b9e05ea..62391adb 100644 --- a/arch/src/main/scala/framework/balldomain/bbus/bbus.scala +++ b/arch/src/main/scala/framework/balldomain/bbus/bbus.scala @@ -2,135 +2,96 @@ package framework.balldomain.bbus import chisel3._ import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.blink.BallRegist +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.top.GlobalConfig +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.balldomain.blink.HasBlink import framework.balldomain.bbus.pmc.BallCyclePMC import framework.balldomain.bbus.cmdrouter.CmdRouter -import framework.balldomain.bbus.memrouter.MemRouter -import framework.switcher.{ToPhysicalLine, ToVirtualLine} +import framework.balldomain.blink.{BankRead, BankWrite, SubRobRow} - -class BBusConfigIO(numBalls: Int)extends Bundle { - val src_bid = UInt(log2Ceil(numBalls).W) - val dst_bid = UInt(log2Ceil(numBalls).W) - val set = Bool() -} /** * BBus - Ball bus, manages connections and arbitration of multiple Ball devices */ -class BBus(ballGenerators: 
Seq[() => BallRegist with Module]) - (implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val numBalls = ballGenerators.length - - val io = IO(new Bundle { - val cmdReq = Vec(numBalls, Flipped(Decoupled(new BallRsIssue))) - val cmdResp = Vec(numBalls, Decoupled(new BallRsComplete)) - - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, b.spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - val accRead = Vec(b.acc_banks, Flipped(new SramReadIO(b.acc_bank_entries, b.acc_w))) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - }) +@instantiable +class BBus(val b: GlobalConfig, ballGenerators: Seq[() => HasBlink with Module]) extends Module { + val numBalls = b.ballDomain.ballNum + val totalBallRead = b.ballDomain.ballIdMappings.map(_.inBW).sum + val totalBallWrite = b.ballDomain.ballIdMappings.map(_.outBW).sum + + // Rs - bbus - balls + @public + val cmdReq = IO(Vec(numBalls, Flipped(Decoupled(new BallRsIssue(b))))) + @public + val cmdResp = IO(Vec(numBalls, Decoupled(new BallRsComplete(b)))) + // balls - bbus + @public + val bankRead = IO(Vec(totalBallRead, Flipped(new BankRead(b)))) + @public + val bankWrite = IO(Vec(totalBallWrite, Flipped(new BankWrite(b)))) + // balls - bbus - SubROB + @public + val subRobReq = IO(Vec(numBalls, Decoupled(new SubRobRow(b)))) - // Instantiate all registered Balls val balls = ballGenerators.map(gen => Module(gen())) - + val cmdRouter: Instance[CmdRouter] = Instantiate(new CmdRouter(b)) + val pmc: Instance[BallCyclePMC] = Instantiate(new BallCyclePMC(b)) // ----------------------------------------------------------------------------- // cmd router // ----------------------------------------------------------------------------- - val cmdRouter = Module(new CmdRouter(numBalls)) - val idle_ball = Wire(Vec(numBalls, Bool())) - for (i <- 0 until numBalls) { - idle_ball(i) := 
balls(i).Blink.cmdReq.ready - } - cmdRouter.io.cmdReq_i <> io.cmdReq + val idle_ball = VecInit(balls.map(_.blink.cmdReq.ready)) + + cmdRouter.io.cmdReq_i <> cmdReq cmdRouter.io.ballIdle := idle_ball for (i <- 0 until numBalls) { - balls(i).Blink.cmdReq.valid := cmdRouter.io.cmdReq_o.valid && (cmdRouter.io.cmdReq_o.bits.cmd.bid === i.U) - balls(i).Blink.cmdReq.bits := cmdRouter.io.cmdReq_o.bits + balls(i).blink.cmdReq.valid := cmdRouter.io.cmdReq_o.valid && (cmdRouter.io.cmdReq_o.bits.cmd.bid === i.U) + balls(i).blink.cmdReq.bits := cmdRouter.io.cmdReq_o.bits + + cmdRouter.io.cmdResp_i(i) <> balls(i).blink.cmdResp } cmdRouter.io.cmdReq_o.ready := VecInit((0 until numBalls).map(i => - balls(i).Blink.cmdReq.ready && (cmdRouter.io.cmdReq_o.bits.cmd.bid === i.U) + balls(i).blink.cmdReq.ready && (cmdRouter.io.cmdReq_o.bits.cmd.bid === i.U) )).asUInt.orR - for (i <- 0 until numBalls) { - cmdRouter.io.cmdResp_i(i) <> balls(i).Blink.cmdResp - } - - io.cmdResp <> cmdRouter.io.cmdResp_o - -// ----------------------------------------------------------------------------- -// bus router -// ----------------------------------------------------------------------------- - - -// ----------------------------------------------------------------------------- -// memory router -// ----------------------------------------------------------------------------- - val memoryrouter = Module(new MemRouter(numBalls)(b, p)) - io.sramRead <> memoryrouter.io.sramRead_o - io.sramWrite <> memoryrouter.io.sramWrite_o - io.accRead <> memoryrouter.io.accRead_o - io.accWrite <> memoryrouter.io.accWrite_o - memoryrouter.io.bbusConfig_i <> cmdRouter.io.bbusConfig_o - - // be replaced by ToVirtualLine and ToPhysicalLine modules - //begin - // for(i <- 0 until numBalls){ - // memoryrouter.io.sramRead_i(i) <> balls(i).Blink.sramRead - // memoryrouter.io.sramWrite_i(i) <> balls(i).Blink.sramWrite - // memoryrouter.io.accRead_i(i) <> balls(i).Blink.accRead - // memoryrouter.io.accWrite_i(i) <> 
balls(i).Blink.accWrite - // } - //end + cmdResp <> cmdRouter.io.cmdResp_o // ----------------------------------------------------------------------------- // PMC - Performance Monitor Counter // ----------------------------------------------------------------------------- - val pmc = Module(new BallCyclePMC(numBalls)) - for (i <- 0 until numBalls) { - pmc.io.cmdReq_i(i).valid := cmdRouter.io.cmdReq_i(i).fire - pmc.io.cmdReq_i(i).bits := cmdRouter.io.cmdReq_i(i).bits - // Remove delay caused by RoB blocking preventing commit + pmc.io.cmdReq_i(i).valid := cmdRouter.io.cmdReq_i(i).fire + pmc.io.cmdReq_i(i).bits := cmdRouter.io.cmdReq_i(i).bits pmc.io.cmdResp_o(i).valid := cmdRouter.io.cmdResp_o(i).valid - pmc.io.cmdResp_o(i).bits := cmdRouter.io.cmdResp_o(i).bits + pmc.io.cmdResp_o(i).bits := cmdRouter.io.cmdResp_o(i).bits } -//----------------------------------------------------------------------------- -// ToVirtualLine - per-ball address to virtual line conversion -// ----------------------------------------------------------------------------- +// Connect balls' bankRead and bankWrite to memrouter + var readChannelIdx = 0 + var writeChannelIdx = 0 - val toVirtualLines = Seq.fill(numBalls){ Module(new ToVirtualLine()(b, p)) } - for (i <- 0 until numBalls) { - toVirtualLines(i).io.sramRead_i <> balls(i).Blink.sramRead - toVirtualLines(i).io.sramWrite_i <> balls(i).Blink.sramWrite - toVirtualLines(i).io.accRead_i <> balls(i).Blink.accRead - toVirtualLines(i).io.accWrite_i <> balls(i).Blink.accWrite - } + for (ball <- balls) { + val ballConfig = b.ballDomain.ballIdMappings.find(_.ballName == ball.getClass.getSimpleName) + val inBW = ballConfig.map(_.inBW).getOrElse(0) + val outBW = ballConfig.map(_.outBW).getOrElse(0) + for (i <- 0 until inBW) { + bankRead(readChannelIdx) <> ball.blink.bankRead(i) + readChannelIdx = readChannelIdx + 1 + } -// ----------------------------------------------------------------------------- -// ToPhysicalLine - per-ball conversion from 
virtual to physical line -// ----------------------------------------------------------------------------- + for (i <- 0 until outBW) { + bankWrite(writeChannelIdx) <> ball.blink.bankWrite(i) + writeChannelIdx = writeChannelIdx + 1 + } + } - val toPhysicalLines = Seq.fill(numBalls){ Module(new ToPhysicalLine()(b, p)) } + // Connect balls' subRobReq for (i <- 0 until numBalls) { - toPhysicalLines(i).io.sramRead_i <> toVirtualLines(i).io.sramRead_o - toPhysicalLines(i).io.sramWrite_i <> toVirtualLines(i).io.sramWrite_o - - memoryrouter.io.sramRead_i(i) <> toPhysicalLines(i).io.sramRead_o - memoryrouter.io.sramWrite_i(i) <> toPhysicalLines(i).io.sramWrite_o - memoryrouter.io.accRead_i(i) <> toPhysicalLines(i).io.accRead_o - memoryrouter.io.accWrite_i(i) <> toPhysicalLines(i).io.accWrite_o + subRobReq(i) <> balls(i).blink.subRobReq } - override lazy val desiredName = "BBus" } diff --git a/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdReqRouter.scala b/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdReqRouter.scala deleted file mode 100644 index a0b7388d..00000000 --- a/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdReqRouter.scala +++ /dev/null @@ -1,27 +0,0 @@ -package framework.balldomain.bbus.cmdrouter - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} - -class CmdReqRouter(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - val cmdReq_i = Vec(numBalls, Flipped(Decoupled(new BallRsIssue))) - val ballIdle = Input(Vec(numBalls, Bool())) - val cmdReq_o = Decoupled(new BallRsIssue) - }) - - val arbiter = Module(new RRArbiter(new BallRsIssue, numBalls)) - - for (i <- 0 until numBalls) { - arbiter.io.in(i).valid := io.cmdReq_i(i).valid && io.ballIdle(i) - arbiter.io.in(i).bits := io.cmdReq_i(i).bits - io.cmdReq_i(i).ready 
:= arbiter.io.in(i).ready && io.ballIdle(i) - } - - io.cmdReq_o <> arbiter.io.out - - override lazy val desiredName = "CmdReqRouter" -} diff --git a/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdRespRouter.scala b/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdRespRouter.scala deleted file mode 100644 index 1534418e..00000000 --- a/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdRespRouter.scala +++ /dev/null @@ -1,24 +0,0 @@ -package framework.balldomain.bbus.cmdrouter - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} - -class CmdRespRouter(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - val cmdResp_i = Vec(numBalls, Flipped(Decoupled(new BallRsComplete))) - val cmdResp_o = Decoupled(new BallRsComplete) - }) - - val arbiter = Module(new RRArbiter(new BallRsComplete, numBalls)) - - for (i <- 0 until numBalls) { - arbiter.io.in(i) <> io.cmdResp_i(i) - } - - io.cmdResp_o <> arbiter.io.out - - override lazy val desiredName = "CmdRespRouter" -} diff --git a/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdRouter.scala b/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdRouter.scala index 3bb11d4b..2bddde4b 100644 --- a/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdRouter.scala +++ b/arch/src/main/scala/framework/balldomain/bbus/cmdrouter/CmdRouter.scala @@ -2,39 +2,37 @@ package framework.balldomain.bbus.cmdrouter import chisel3._ import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import framework.balldomain.bbus.BBusConfigIO +import framework.top.GlobalConfig +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import 
chisel3.experimental.hierarchy.{instantiable, public} -class CmdRouter(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { +@instantiable +class CmdRouter(val b: GlobalConfig) extends Module { + val numBalls = b.ballDomain.ballNum + + @public val io = IO(new Bundle { - val cmdReq_i = Vec(numBalls, Flipped(Decoupled(new BallRsIssue))) - val cmdResp_i = Vec(numBalls, Flipped(Decoupled(new BallRsComplete))) + val cmdReq_i = Vec(numBalls, Flipped(Decoupled(new BallRsIssue(b)))) + val cmdResp_i = Vec(numBalls, Flipped(Decoupled(new BallRsComplete(b)))) + val cmdReq_o = Decoupled(new BallRsIssue(b)) + val cmdResp_o = Vec(numBalls, Decoupled(new BallRsComplete(b))) + val ballIdle = Input(Vec(numBalls, Bool())) - val cmdReq_o = Decoupled(new BallRsIssue) - val cmdResp_o = Vec(numBalls, Decoupled(new BallRsComplete)) - val bbusConfig_o = Decoupled(new BBusConfigIO(numBalls)) }) - val reqRouter = Module(new CmdReqRouter(numBalls)) + val arbiter = Module(new RRArbiter(new BallRsIssue(b), numBalls)) - reqRouter.io.cmdReq_i <> io.cmdReq_i - reqRouter.io.ballIdle := io.ballIdle - io.cmdReq_o <> reqRouter.io.cmdReq_o + val ballIdleR = RegNext(io.ballIdle, VecInit(Seq.fill(numBalls)(false.B))) for (i <- 0 until numBalls) { - io.cmdResp_o(i) <> io.cmdResp_i(i) + arbiter.io.in(i).valid := io.cmdReq_i(i).valid && ballIdleR(i) + arbiter.io.in(i).bits := io.cmdReq_i(i).bits + io.cmdReq_i(i).ready := arbiter.io.in(i).ready && ballIdleR(i) } - io.bbusConfig_o.valid := false.B - io.bbusConfig_o.bits.src_bid := 0.U - io.bbusConfig_o.bits.dst_bid := 0.U - io.bbusConfig_o.bits.set := false.B - when(io.cmdReq_i(b.emptyBallid).valid){ - io.bbusConfig_o.valid := true.B - io.bbusConfig_o.bits.src_bid := io.cmdReq_i(b.emptyBallid).bits.cmd.special(5,0) - io.bbusConfig_o.bits.dst_bid := io.cmdReq_i(b.emptyBallid).bits.cmd.special(11,6) - io.bbusConfig_o.bits.set := io.cmdReq_i(b.emptyBallid).bits.cmd.special(12,12) + + io.cmdReq_o <> arbiter.io.out + + for (i <- 0 
until numBalls) { + io.cmdResp_o(i) <> io.cmdResp_i(i) } - override lazy val desiredName = "CmdRouter" } diff --git a/arch/src/main/scala/framework/balldomain/bbus/memrouter/SramIOAdapter.scala b/arch/src/main/scala/framework/balldomain/bbus/memrouter/SramIOAdapter.scala deleted file mode 100644 index 690e52c2..00000000 --- a/arch/src/main/scala/framework/balldomain/bbus/memrouter/SramIOAdapter.scala +++ /dev/null @@ -1,21 +0,0 @@ -package framework.balldomain.bbus.memrouter - - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import framework.memdomain.mem.{SramReadIO, SramWriteIO, SramReadReq, SramReadResp, SramWriteReq} - -class SramIOAdapter(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - val sramWrite_i = new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len) - val sramRead_o = new SramReadIO(b.spad_bank_entries, b.spad_w) - }) - io.sramRead_o.req.ready := true.B - io.sramWrite_i.req.ready := io.sramRead_o.resp.ready - io.sramRead_o.resp.valid := io.sramWrite_i.req.valid - io.sramRead_o.resp.bits.data := io.sramWrite_i.req.bits.data - io.sramRead_o.resp.bits.fromDMA := false.B -} diff --git a/arch/src/main/scala/framework/balldomain/bbus/memrouter/memRouter.scala b/arch/src/main/scala/framework/balldomain/bbus/memrouter/memRouter.scala deleted file mode 100644 index c363a3ae..00000000 --- a/arch/src/main/scala/framework/balldomain/bbus/memrouter/memRouter.scala +++ /dev/null @@ -1,119 +0,0 @@ -package framework.balldomain.bbus.memrouter - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import framework.memdomain.mem.{SramReadIO, SramWriteIO, SramReadReq, SramReadResp, 
SramWriteReq} -import framework.balldomain.blink.{SramReadWithRobId, SramWriteWithRobId} -import framework.balldomain.bbus.BBusConfigIO - -class MemRouter(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - val sramRead_i = Vec(numBalls, Vec(b.sp_banks, new SramReadWithRobId(b.spad_bank_entries, b.spad_w))) - val sramWrite_i = Vec(numBalls, Vec(b.sp_banks, new SramWriteWithRobId(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - val accRead_i = Vec(numBalls, Vec(b.acc_banks, new SramReadWithRobId(b.acc_bank_entries, b.acc_w))) - val accWrite_i = Vec(numBalls, Vec(b.acc_banks, new SramWriteWithRobId(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - val bbusConfig_i = Flipped(Decoupled(new BBusConfigIO(numBalls))) - - val sramRead_o = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, b.spad_w))) - val sramWrite_o = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - val accRead_o = Vec(b.acc_banks, Flipped(new SramReadIO(b.acc_bank_entries, b.acc_w))) - val accWrite_o = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - }) - - val list_valid = RegInit(VecInit(Seq.fill(numBalls)(false.B))) - val list_dst_bid = RegInit(VecInit(Seq.fill(numBalls)(0.U(log2Ceil(numBalls).W)))) - - // Explicitly initialize to false - val memReq = WireInit(VecInit(Seq.fill(numBalls)(false.B))) - - - // Default assignment - io.sramRead_o := DontCare - io.sramWrite_o := DontCare - io.accRead_o := DontCare - io.accWrite_o := DontCare - for (i <- 0 until numBalls) { - io.sramRead_i(i).foreach(_.io.req.ready := false.B) - io.sramRead_i(i).foreach(_.io.resp.valid := false.B) - io.sramRead_i(i).foreach(_.io.resp.bits := DontCare) - io.sramWrite_i(i).foreach(_.io.req.ready := false.B) - io.accRead_i(i).foreach(_.io.req.ready := false.B) - io.accRead_i(i).foreach(_.io.resp.valid := false.B) - io.accRead_i(i).foreach(_.io.resp.bits := DontCare) - 
io.accWrite_i(i).foreach(_.io.req.ready := false.B) - } - - // Routing selection - for (i <- 0 until numBalls) { -/* - memReq(i) := io.sramRead_i(i).map(_.io.req.valid).reduce(_||_) || - io.sramWrite_i(i).map(_.io.req.valid).reduce(_||_) || - io.accRead_i(i).map(_.io.req.valid).reduce(_||_) || - io.accWrite_i(i).map(_.io.req.valid).reduce(_||_) - - when (memReq(i)) { - io.sramRead_o <> io.sramRead_i(i).io - io.sramWrite_o <> io.sramWrite_i(i).io - io.accRead_o <> io.accRead_i(i).io - io.accWrite_o <> io.accWrite_i(i).io - } - */ - - for(j <- 0 until b.sp_banks){ - when(io.sramRead_i(i)(j).io.req.valid){ - io.sramRead_o(j).req <> io.sramRead_i(i)(j).io.req - } - } - for(j <- 0 until b.sp_banks){ - when(io.sramRead_o(j).resp.valid){ - io.sramRead_i(i)(j).io.resp <> io.sramRead_o(j).resp - } - } - - for(j <- 0 until b.acc_banks){ - when(io.accRead_i(i)(j).io.req.valid){ - io.accRead_o(j).req <> io.accRead_i(i)(j).io.req - } - } - for(j <- 0 until b.acc_banks){ - when(io.accRead_o(j).resp.valid){ - io.accRead_i(i)(j).io.resp <> io.accRead_o(j).resp - } - } - for(j <- 0 until b.sp_banks){ - when(io.sramWrite_i(i)(j).io.req.valid){ - io.sramWrite_o(j)<> io.sramWrite_i(i)(j).io - } - } - for(j <- 0 until b.acc_banks){ - when(io.accWrite_i(i)(j).io.req.valid){ - io.accWrite_o(j) <> io.accWrite_i(i)(j).io - } - } - } - io.bbusConfig_i.ready := true.B - when (io.bbusConfig_i.valid) { - when(io.bbusConfig_i.bits.set === 1.U){ - list_valid(io.bbusConfig_i.bits.src_bid) := true.B - list_dst_bid(io.bbusConfig_i.bits.src_bid) := io.bbusConfig_i.bits.dst_bid - }.otherwise{ - list_valid(io.bbusConfig_i.bits.src_bid) := false.B - list_dst_bid(io.bbusConfig_i.bits.src_bid) := 0.U - } - } - - for(i <- 0 until numBalls){ - when(list_valid(i)){ - val sramIOadapter = Module(new SramIOAdapter(numBalls)(b, p)) - val dst_bid = list_dst_bid(i) - sramIOadapter.io.sramWrite_i <> io.sramWrite_i(i)(0).io - io.sramRead_i(dst_bid)(0).io <> sramIOadapter.io.sramRead_o - io.sramWrite_o(0).req.valid 
:= false.B - io.sramWrite_o(0).req.bits := DontCare - } - } - override lazy val desiredName = "MemRouter" -} diff --git a/arch/src/main/scala/framework/balldomain/bbus/pmc/BallCyclePMC.scala b/arch/src/main/scala/framework/balldomain/bbus/pmc/BallCyclePMC.scala index bef6b307..2ef55d4d 100644 --- a/arch/src/main/scala/framework/balldomain/bbus/pmc/BallCyclePMC.scala +++ b/arch/src/main/scala/framework/balldomain/bbus/pmc/BallCyclePMC.scala @@ -2,23 +2,37 @@ package framework.balldomain.bbus.pmc import chisel3._ import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} +import framework.top.GlobalConfig +import framework.balldomain.rs.BallRsIssue +import framework.balldomain.rs.BallRsComplete +import chisel3.experimental.hierarchy.{instantiable, public} -class BallCyclePMC(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { +@instantiable +class BallCyclePMC(val b: GlobalConfig) extends Module { + val numBalls = b.ballDomain.ballNum + + @public val io = IO(new Bundle { - val cmdReq_i = Input(Vec(numBalls, Valid(new BallRsIssue))) - val cmdResp_o = Input(Vec(numBalls, Valid(new BallRsComplete))) + val cmdReq_i = Input(Vec(numBalls, Valid(new BallRsIssue(b)))) + val cmdResp_o = Input(Vec(numBalls, Valid(new BallRsComplete(b)))) val totalCycles = Output(Vec(numBalls, UInt(64.W))) }) val cycleCounter = RegInit(0.U(64.W)) cycleCounter := cycleCounter + 1.U - val startTime = Reg(Vec(b.rob_entries, UInt(64.W))) + val startTime = Reg(Vec(b.frontend.rob_entries, UInt(64.W))) val ballTotalCycles = RegInit(VecInit(Seq.fill(numBalls)(0.U(64.W)))) + // Per-Ball DPI-C trace modules + val pmcTraces = Seq.fill(numBalls)(Module(new PMCTraceDPI)) + for (pt <- pmcTraces) { + pt.io.ball_id := 0.U + pt.io.rob_id := 0.U + pt.io.elapsed := 0.U + pt.io.enable := false.B + } + for (i <- 0 until numBalls) { when(io.cmdReq_i(i).valid) 
{ startTime(io.cmdReq_i(i).bits.rob_id) := cycleCounter @@ -27,10 +41,15 @@ class BallCyclePMC(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Paramete for (i <- 0 until numBalls) { when(io.cmdResp_o(i).valid) { - val robId = io.cmdResp_o(i).bits.rob_id + val robId = io.cmdResp_o(i).bits.rob_id val elapsed = cycleCounter - startTime(robId) ballTotalCycles(i) := ballTotalCycles(i) + elapsed - printf("[PMC] Ball %d completed task, elapsed: %d cycles\n", i.U, elapsed) + + // DPI-C trace output + pmcTraces(i).io.ball_id := i.U + pmcTraces(i).io.rob_id := robId + pmcTraces(i).io.elapsed := elapsed + pmcTraces(i).io.enable := true.B } } diff --git a/arch/src/main/scala/framework/balldomain/bbus/pmc/PMCTraceDPI.scala b/arch/src/main/scala/framework/balldomain/bbus/pmc/PMCTraceDPI.scala new file mode 100644 index 00000000..56514365 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/bbus/pmc/PMCTraceDPI.scala @@ -0,0 +1,39 @@ +package framework.balldomain.bbus.pmc + +import chisel3._ +import chisel3.util._ + +// DPI-C BlackBox for PMC trace +class PMCTraceDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val ball_id = Input(UInt(32.W)) + val rob_id = Input(UInt(32.W)) + val elapsed = Input(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "PMCTraceDPI.v", + """ + |import "DPI-C" function void dpi_pmctrace( + | input int unsigned ball_id, + | input int unsigned rob_id, + | input longint unsigned elapsed + |); + | + |module PMCTraceDPI( + | input [31:0] ball_id, + | input [31:0] rob_id, + | input [63:0] elapsed, + | input enable + |); + | always @(*) begin + | if (enable) begin + | dpi_pmctrace(ball_id, rob_id, elapsed); + | end + | end + |endmodule + """.stripMargin + ) +} diff --git a/arch/src/main/scala/framework/balldomain/blink/SubRobRow.scala b/arch/src/main/scala/framework/balldomain/blink/SubRobRow.scala new file mode 100644 index 00000000..288761b3 --- /dev/null +++ 
b/arch/src/main/scala/framework/balldomain/blink/SubRobRow.scala @@ -0,0 +1,51 @@ +package framework.balldomain.blink + +import chisel3._ +import chisel3.util._ +import framework.top.GlobalConfig +import framework.frontend.decoder.PostGDCmd +import framework.frontend.scoreboard.BankAccessInfo + +class SubRobSlot(val b: GlobalConfig) extends Bundle { + val valid = Bool() + val cmd = new PostGDCmd(b) +} + +class SubRobRow(val b: GlobalConfig) extends Bundle { + val slots = Vec(4, new SubRobSlot(b)) + val ball_id = UInt(log2Up(b.ballDomain.ballNum).W) + val master_rob_id = UInt(log2Up(b.frontend.rob_entries).W) +} + +/** Tie-off for subRobReq when a Ball does not use SubROB. */ +object SubRobRow { + + def tieOff(b: GlobalConfig): SubRobRow = { + val w = Wire(new SubRobRow(b)) + w.ball_id := 0.U + w.master_rob_id := 0.U + val bankIdLen = log2Up(b.memDomain.bankNum) + for (i <- 0 until 4) { + w.slots(i).valid := false.B + w.slots(i).cmd.domain_id := 0.U + w.slots(i).cmd.cmd.raw_inst := 0.U + w.slots(i).cmd.cmd.pc := 0.U + w.slots(i).cmd.cmd.funct := 0.U + w.slots(i).cmd.cmd.funct3 := 0.U + w.slots(i).cmd.cmd.rs2 := 0.U + w.slots(i).cmd.cmd.rs1 := 0.U + w.slots(i).cmd.cmd.xd := false.B + w.slots(i).cmd.cmd.xs1 := false.B + w.slots(i).cmd.cmd.xs2 := false.B + w.slots(i).cmd.cmd.rd := 0.U + w.slots(i).cmd.cmd.opcode := 0.U + w.slots(i).cmd.cmd.rs1Data := 0.U + w.slots(i).cmd.cmd.rs2Data := 0.U + w.slots(i).cmd.bankAccess := BankAccessInfo.none(bankIdLen) + w.slots(i).cmd.isFence := false.B + w.slots(i).cmd.isBarrier := false.B + } + w + } + +} diff --git a/arch/src/main/scala/framework/balldomain/blink/axis/axis.scala b/arch/src/main/scala/framework/balldomain/blink/axis/axis.scala new file mode 100644 index 00000000..347bc4fc --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/blink/axis/axis.scala @@ -0,0 +1,41 @@ +package framework.balldomain.blink.axis + +import chisel3._ +import chisel3.util._ +import framework.top.GlobalConfig + +/** + * AXI4-Stream style 
bundle. + * Source drives tvalid, tdata, tlast, tstrb, tid, tdest; sink drives tready. + * Use Flipped(AxisBundle(...)) for sink side. Transfer occurs when tvalid && tready. + * + * Signals: + * tvalid Source: this beat valid; tdata (and other payload) may be sampled. + * tready Sink: can accept this cycle; transfer completes when both high. + * tdata Payload, width dataWidth. + * tlast Packet end; high on final beat of a stream packet. + * tstrb Byte strobe: 1 bit per TDATA byte; 1 = valid, 0 = null/placeholder. + * tid Transaction ID: source/transaction identity for mux, reorder, or req–resp match. + * tdest Destination ID: routing target when multiplexing. + * + * @param dataWidth TDATA width (bits). tstrb length = (dataWidth/8).max(1). + * @param idWidth TID width; 0 = unused. + * @param destWidth TDEST width; 0 = unused. + */ +class AxisBundle( + val b: GlobalConfig, + val dataWidth: Int, + val idWidth: Int = 0, + val destWidth: Int = 0, + val offsetWidth: Int = 0) + extends Bundle { + val tdata = UInt(dataWidth.W) + val tlast = Bool() + val tstrb = UInt((dataWidth / 8).max(1).W) + val tid = UInt(idWidth.W) + val tdest = UInt(destWidth.W) + + val wmode = Bool() // true=accumulator mode, false=direct write mode + val offset = UInt(offsetWidth.W) + val rob_id = UInt(log2Ceil(b.frontend.rob_entries).W) +} diff --git a/arch/src/main/scala/framework/balldomain/blink/bank.scala b/arch/src/main/scala/framework/balldomain/blink/bank.scala new file mode 100644 index 00000000..692261d3 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/blink/bank.scala @@ -0,0 +1,35 @@ +package framework.balldomain.blink + +import chisel3._ +import chisel3.util._ +import framework.memdomain.backend.banks.{SramReadIO, SramWriteIO} +import framework.memdomain.backend.banks.{SramReadReq, SramWriteReq} +import framework.top.GlobalConfig + +trait HasBankId { + val b: GlobalConfig + val bank_id = Input(UInt(log2Up(b.memDomain.bankNum).W)) +} + +trait HasRobId { + val b: GlobalConfig 
+ val rob_id = Input(UInt(log2Up(b.frontend.rob_entries).W)) +} + +trait HasBallId { + val b: GlobalConfig + val ball_id = Input(UInt(log2Up(b.ballDomain.ballNum).W)) +} + +trait HasAccGroupId { + val b: GlobalConfig + val group_id = Input(UInt(3.W)) +} + +class BankRead(val b: GlobalConfig) extends Bundle with HasBankId with HasRobId with HasBallId with HasAccGroupId { + val io = new SramReadIO(b) +} + +class BankWrite(val b: GlobalConfig) extends Bundle with HasBankId with HasRobId with HasBallId with HasAccGroupId { + val io = new SramWriteIO(b) +} diff --git a/arch/src/main/scala/framework/balldomain/blink/baseball.scala b/arch/src/main/scala/framework/balldomain/blink/baseball.scala index 365cdc22..b00fc630 100644 --- a/arch/src/main/scala/framework/balldomain/blink/baseball.scala +++ b/arch/src/main/scala/framework/balldomain/blink/baseball.scala @@ -2,10 +2,13 @@ package framework.balldomain.blink import chisel3._ import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -// Base trait for Ball devices -trait BallRegist { - def Blink: Blink - def ballId: UInt +// all balls must have a blink interface +trait HasBlink { + def blink: BlinkIO +} + +// can be customized by user to add additional status signals +trait HasBallStatus { + def status: BallStatus } diff --git a/arch/src/main/scala/framework/balldomain/blink/blink.scala b/arch/src/main/scala/framework/balldomain/blink/blink.scala index f1d018b1..2f068ff5 100644 --- a/arch/src/main/scala/framework/balldomain/blink/blink.scala +++ b/arch/src/main/scala/framework/balldomain/blink/blink.scala @@ -2,72 +2,16 @@ package framework.balldomain.blink import chisel3._ import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import framework.memdomain.mem.{SramReadIO, SramWriteIO} - -// Ball device status bundle -class Status extends Bundle { - // device is ready to accept 
new input - val ready = Output(Bool()) - // device has valid output - val valid = Output(Bool()) - // no input and no output - val idle = Output(Bool()) - // has input but no output - val init = Output(Bool()) - // started producing output - val running = Output(Bool()) - // fully finished current batch - val complete = Output(Bool()) - // current batch iteration - val iter = Output(UInt(32.W)) -} - - -// SramReadIO with rob_id -class SramReadWithRobId(val n: Int, val w: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val io = new SramReadIO(n, w) - // Input because the outer layer has Flipped - val rob_id = Input(UInt(log2Up(b.rob_entries).W)) -} - -// SramWriteIO with rob_id -class SramWriteWithRobId(val n: Int, val w: Int, val mask_len: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val io = new SramWriteIO(n, w, mask_len) - // Input because theSramWriteIO outer layer has Flipped - val rob_id = Input(UInt(log2Up(b.rob_entries).W)) -} - -// SramReadIO with rob_id -class SramReadWithInfo(val n: Int, val w: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val io = new SramReadIO(n, w) - // Input because the outer layer has Flipped - val rob_id = Input(UInt(log2Up(b.rob_entries).W)) - val is_acc = Input(Bool()) - val bank_id = Input(UInt(log2Up(b.sp_banks+b.acc_banks).W)) -} - -// SramWriteIO with rob_id -class SramWriteWithInfo(val n: Int, val w: Int, val mask_len: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val io = new SramWriteIO(n, w, mask_len) - // Input because theSramWriteIO outer layer has Flipped - val rob_id = Input(UInt(log2Up(b.rob_entries).W)) - val is_acc = Input(Bool()) - val bank_id = Input(UInt(log2Up(b.sp_banks+b.acc_banks).W)) -} - - -// Standard interface for Ball devices -class Blink(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new 
BallRsComplete) - - val sramRead = Vec(b.sp_banks, Flipped(new SramReadWithRobId(b.spad_bank_entries, b.spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteWithRobId(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - val accRead = Vec(b.acc_banks, Flipped(new SramReadWithRobId(b.acc_bank_entries, b.acc_w))) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteWithRobId(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - - val status = new Status +import framework.top.GlobalConfig +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import chisel3.experimental.hierarchy.{instantiable, public} + +class BlinkIO(b: GlobalConfig, inBW: Int, outBW: Int) extends Bundle with HasBallStatus { + val status = new BallStatus() + + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val subRobReq = Decoupled(new SubRobRow(b)) } diff --git a/arch/src/main/scala/framework/balldomain/blink/status.scala b/arch/src/main/scala/framework/balldomain/blink/status.scala new file mode 100644 index 00000000..843444a3 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/blink/status.scala @@ -0,0 +1,9 @@ +package framework.balldomain.blink + +import chisel3._ +import chisel3.util._ + +class BallStatus extends Bundle { + val idle = Output(Bool()) + val running = Output(Bool()) +} diff --git a/arch/src/main/scala/framework/balldomain/configs/BallDomainParam.scala b/arch/src/main/scala/framework/balldomain/configs/BallDomainParam.scala new file mode 100644 index 00000000..92090ece --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/configs/BallDomainParam.scala @@ -0,0 +1,24 @@ +package framework.balldomain.configs + +import upickle.default._ + +case class BallIdMapping( + ballId: Int, + ballName: String, + inBW: Int, + outBW: Int) + +case class BallDomainParam( + ballNum: Int, + ballIdMappings: 
Seq[BallIdMapping]) + +object BallDomainParam { + implicit val ballIdMappingRW: ReadWriter[BallIdMapping] = macroRW + implicit val rw: ReadWriter[BallDomainParam] = macroRW + + def apply(): BallDomainParam = { + val jsonStr = scala.io.Source.fromFile("src/main/scala/framework/balldomain/configs/default.json").mkString + read[BallDomainParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/balldomain/configs/default.json b/arch/src/main/scala/framework/balldomain/configs/default.json new file mode 100644 index 00000000..9745567a --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/configs/default.json @@ -0,0 +1,15 @@ +{ + "ballNum": 10, + "ballIdMappings": [ + {"ballId": 0, "ballName": "VecBall", "inBW": 2, "outBW": 4}, + {"ballId": 1, "ballName": "ReluBall", "inBW": 1, "outBW": 1}, + {"ballId": 2, "ballName": "TransposeBall", "inBW": 1, "outBW": 1}, + {"ballId": 3, "ballName": "Im2colBall", "inBW": 1, "outBW": 1}, + {"ballId": 4, "ballName": "SystolicArrayBall", "inBW": 2, "outBW": 4}, + {"ballId": 5, "ballName": "QuantBall", "inBW": 1, "outBW": 1}, + {"ballId": 6, "ballName": "DequantBall", "inBW": 1, "outBW": 1}, + {"ballId": 7, "ballName": "GemminiBall", "inBW": 2, "outBW": 4}, + {"ballId": 8, "ballName": "TraceBall", "inBW": 1, "outBW": 1}, + {"ballId": 9, "ballName": "MxfpBall", "inBW": 1, "outBW": 1} + ] +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/README.md b/arch/src/main/scala/framework/balldomain/prototype/README.md new file mode 100644 index 00000000..8e4f4579 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/README.md @@ -0,0 +1,327 @@ +# Buckyball Prototype Accelerators + +This directory contains prototype implementations of various domain-specific computation accelerators in the Buckyball framework, covering hardware accelerator designs for machine learning, numerical computation, and data processing domains. 
+ +## Directory Structure + +``` +prototype/ +├── format/ - Data format conversion accelerators +├── im2col/ - Image-to-column transformation accelerator +├── matrix/ - Matrix computation accelerators +├── relu/ - ReLU activation accelerator +├── transpose/ - Matrix transpose accelerator +└── vector/ - Vector processing unit +``` + +## Accelerator Components + +### format/ - Data Format Processing +Implements hardware acceleration for various data format conversions and arithmetic operations: +- **Arithmetic.scala**: Custom arithmetic operation units +- **Dataformat.scala**: Data format conversion and encoding + +**Key Features**: +- Support for multiple data formats (INT8, FP16, FP32, BBFP) +- Abstract arithmetic interface for extensibility +- Concrete implementations for different data types + +**Use Cases**: +- Floating-point format conversion +- Fixed-point arithmetic optimization +- Data compression and decompression +- Mixed-precision computation + +### im2col/ - Image Processing Acceleration +Specialized accelerator for im2col operations in convolutional neural networks: +- **im2col.scala**: Hardware implementation of image-to-column matrix transformation + +**Key Features**: +- Configurable kernel size and stride +- Efficient data reorganization for convolution +- Pipeline-based processing for high throughput +- Support for different input dimensions + +**Use Cases**: +- CNN convolution layer acceleration +- Image preprocessing pipeline +- Feature extraction optimization +- Memory-efficient convolution implementation + +### matrix/ - Matrix Computation Engine +Matrix computation accelerator implementation with multiple modules: + +**Core Components**: +- **bbfpIns_decode.scala**: Instruction decoder for matrix operations +- **bbfp_load.scala**: Data loading unit for matrix operands +- **bbfp_ex.scala**: Execution unit for matrix multiplication +- **bbfp_pe.scala**: Processing Element (PE) array implementation +- **bbfp_control.scala**: Control logic for 
matrix operations + +**PE Array Architecture**: +- **BBFP_PE**: Individual processing element with weight stationary mode +- **BBFP_PE_Array2x2**: 2×2 PE array building block +- **BBFP_PE_Array16x16**: 16×16 PE array for high-performance computing +- Systolic array dataflow for efficient matrix multiplication + +**Supported Formats**: +- INT8 integer arithmetic +- FP16 half-precision floating-point +- FP32 single-precision floating-point +- BBFP (Brain Floating Point) custom format + +**Use Cases**: +- Deep learning training and inference +- Scientific computing acceleration +- Linear algebra operations +- High-performance GEMM operations + +### relu/ - ReLU Activation +Efficient hardware implementation of ReLU (Rectified Linear Unit) activation: +- **Relu.scala**: Pipelined ReLU accelerator + +**Key Features**: +- Element-wise ReLU computation +- Configurable tile size +- Pipeline-based processing +- Integrated with scratchpad memory + +**Use Cases**: +- Neural network activation layers +- Non-linear transformation +- Post-convolution activation + +### transpose/ - Matrix Transpose +Efficient hardware implementation for matrix transpose operations: +- **Transpose.scala**: Matrix transpose accelerator + +**Key Features**: +- Tile-based transpose for large matrices +- Optimized memory access patterns +- Configurable tile size +- Pipeline-based implementation + +**Use Cases**: +- Matrix operation preprocessing +- Data reorganization and transformation +- Memory access pattern optimization +- Transpose in GEMM operations + +### vector/ - Vector Processing Unit +Vector processing architecture supporting SIMD and multi-threading: + +**Core Components**: +- **VecUnit.scala**: Vector processor top-level module +- **VecCtrlUnit.scala**: Vector control unit for instruction dispatch +- **VecLoadUnit.scala**: Vector load unit for data fetching +- **VecEXUnit.scala**: Vector execution unit with multiple functional units +- **VecStoreUnit.scala**: Vector store unit for result 
write-back + +**Submodules**: +- **bond/**: Binding and synchronization mechanisms + - Various bond types (VSSBond, VVVBond, VSVBond, VVSBond, VVBond) + - Operand routing and data distribution + +- **op/**: Vector operation implementations + - AddOp, MulOp, CascadeOp, SelectOp, etc. + - Arithmetic and logical operations + +- **thread/**: Multi-threading support + - Thread-level parallelism + - Warp-based execution model + +- **warp/**: Thread bundle management (MeshWarp) + - 16×16 PE mesh for vector operations + - Parallel execution of vector instructions + +**Architecture Highlights**: +- Configurable number of PEs and threads +- Support for various vector operations (add, mul, cascade, select) +- Flexible data routing through bond mechanisms +- High parallelism with warp-level execution + +**Use Cases**: +- Parallel numerical computation +- Signal processing acceleration +- High-performance computing applications +- SIMD-style data processing + +## Design Features + +### Modular Design +Each accelerator adopts modular design for: +- Independent development and testing +- Flexible composition and configuration +- Performance tuning and extension +- Easy integration with Buckyball framework + +### Pipeline Architecture +Most accelerators use deep pipeline design: +- Improved throughput and frequency +- Support for continuous data stream processing +- Optimized resource utilization +- Latency hiding through pipelining + +### Configurable Parameters +Support rich configuration parameters: +- Data width and precision +- Parallelism and pipeline depth +- Cache size and organization +- Interface protocol and timing + +## Integration Method + +### blink Protocol Interface +All Ball accelerators implement the blink protocol interface: +```scala +class CustomBall(implicit b: CustomBuckyballConfig, p: Parameters) + extends Module with BallRegist { + val io = IO(new BlinkIO) + def ballId = .U + def blink = // Implement blink protocol +} +``` + +**blink Interface 
Components**: +- **cmdReq**: Command request interface with rob_id tracking +- **cmdResp**: Command response interface for completion signaling +- **status**: Status signals (ready, valid, idle, complete) +- **sramRead/Write**: SRAM interfaces for scratchpad and accumulator access + +### Memory Interface +Support multiple memory access patterns: +- DMA bulk transfer through MemDomain +- Scratchpad direct access for low-latency operations +- Accumulator access for result accumulation +- Bank-aware memory access (op1 and op2 must access different banks) + +### Configuration Integration +Parameterized through Buckyball configuration system: +```scala +case class BaseConfig( + veclane: Int = 16, // Vector lane width + numVecPE: Int = 16, // Number of vector PEs + numVecThread: Int = 16, // Number of vector threads + // ... more parameters +) +``` + +## Performance Optimization + +### Data Locality +- Optimize data access patterns for spatial and temporal locality +- Reduce memory bandwidth requirements through data reuse +- Improve cache hit rate with tile-based processing +- Scratchpad memory for frequently accessed data + +### Parallel Processing +- Multi-level parallelism design + - Instruction-level parallelism (ILP) through pipelining + - Data-level parallelism (DLP) through vector operations + - Thread-level parallelism (TLP) through multiple warps +- Pipeline parallelism for continuous data flow +- Data parallelism through PE arrays + +### Resource Sharing +- Arithmetic unit reuse across different operations +- Storage resource sharing between modules +- Control logic optimization for area efficiency +- Flexible routing for resource utilization + +## Verification and Testing + +Each accelerator comes with corresponding test cases: +- Functional correctness verification +- Performance benchmark testing +- Boundary condition checking +- Random test generation +- Integration testing with complete system + +## Development Guidelines + +### Adding New Accelerators + 
+**Steps**: +1. Implement Ball device with BallRegist trait +2. Define blink protocol interfaces +3. Implement computation logic +4. Add SRAM access logic (respect bank constraints) +5. Register in BBus and Ball RS + +**Example Template**: +```scala +class NewBall(implicit b: CustomBuckyballConfig, p: Parameters) + extends Module with BallRegist { + val io = IO(new BlinkIO) + + def ballId = .U + def blink = io + + // State machine + val sIdle :: sCompute :: sComplete :: Nil = Enum(3) + val state = RegInit(sIdle) + + // Computation logic + switch(state) { + is(sIdle) { + when(io.cmdReq.fire) { + state := sCompute + } + } + is(sCompute) { + // Perform computation + when(done) { + state := sComplete + } + } + is(sComplete) { + io.cmdResp.valid := true.B + state := sIdle + } + } +} +``` + +### Performance Optimization Tips + +1. **Memory Access**: + - Group memory accesses to same bank + - Use streaming access patterns + - Minimize random access + +2. **Pipeline Design**: + - Balance pipeline stages + - Add registers for timing closure + - Use buffering for throughput + +3. **Resource Utilization**: + - Share expensive resources (multipliers, dividers) + - Use LUTs for simple operations + - Optimize control logic + +### Common Pitfalls + +1. **Bank Conflict**: op1 and op2 accessing same bank - violates design constraint +2. **ROB ID Tracking**: Must forward rob_id from request to response +3. **Ready/Valid Protocol**: Carefully implement handshake to avoid deadlock +4. 
**Iteration Count**: Properly handle iteration for multi-row operations + +## Related Documentation + +- [Format Conversion](format/README.md) - Data format details +- [Im2col Implementation](im2col/README.md) - Im2col accelerator +- [Matrix Operations](matrix/README.md) - Matrix computation +- [ReLU Activation](relu/README.md) - ReLU implementation +- [Transpose Operations](transpose/README.md) - Matrix transpose +- [Vector Processing](vector/README.md) - Vector unit architecture +- [blink Protocol](../framework/blink/README.md) - Ball protocol specification + +## Future Enhancements + +Potential areas for extension: +- Support for additional data formats (INT4, BF16) +- Advanced matrix operations (SVD, QR decomposition) +- Fused operations (Conv+ReLU, GEMM+BiasAdd) +- Dynamic reconfiguration for different workloads +- Power management and clock gating +- Advanced synchronization mechanisms diff --git a/arch/src/main/scala/framework/balldomain/prototype/dequant/Dequant.scala b/arch/src/main/scala/framework/balldomain/prototype/dequant/Dequant.scala new file mode 100644 index 00000000..b231ebd2 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/dequant/Dequant.scala @@ -0,0 +1,240 @@ +package framework.balldomain.prototype.dequant + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} + +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.balldomain.blink.{BallStatus, BankRead, BankWrite} +import framework.top.GlobalConfig + +/** + * Dequant - Dequantization core logic. + * INT32 -> FP32: fp32_val = int32_val * scale + * Each 128-bit SRAM word = 4 x INT32. Output: 4 x FP32 = 128 bits. 1:1 read/write. + * Scale from cmd.special(31,0) as FP32 bit pattern. 
+ * + * FSM follows ReluBall pattern: + * idle -> sRead -> sWrite -> complete -> idle + */ +@instantiable +class Dequant(val b: GlobalConfig) extends Module { + val elemsPerWord = 4 + val bankWidth = b.memDomain.bankWidth + val InputNum = 16 + + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "DequantBall") + .getOrElse(throw new IllegalArgumentException("DequantBall not found in config")) + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + when(io.cmdReq.fire) { + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + } + + for (i <- 0 until inBW) { + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U + } + + val idle :: sRead :: sWrite :: complete :: Nil = Enum(4) + val state = RegInit(idle) + + val regArray = RegInit(VecInit(Seq.fill(InputNum)(0.U(bankWidth.W)))) + + val readCounter = RegInit(0.U(log2Ceil(InputNum + 1).W)) + val respCounter = RegInit(0.U(log2Ceil(InputNum + 1).W)) + val writeCounter = RegInit(0.U(log2Ceil(InputNum + 1).W)) + + val raddr_reg = RegInit(0.U(b.frontend.iter_len.W)) + val rbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val waddr_reg = RegInit(0.U(b.frontend.iter_len.W)) + val wbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val iter_reg = RegInit(0.U(b.frontend.iter_len.W)) + val scale_reg = RegInit(0.U(32.W)) + + // Default outputs + for (i <- 
0 until inBW) { + io.bankRead(i).io.req.valid := false.B + io.bankRead(i).io.req.bits.addr := 0.U + io.bankRead(i).io.resp.ready := false.B + io.bankRead(i).bank_id := rbank_reg + io.bankRead(i).group_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).io.req.valid := false.B + io.bankWrite(i).io.req.bits.addr := 0.U + io.bankWrite(i).io.req.bits.data := 0.U + io.bankWrite(i).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(0.U(1.W))) + io.bankWrite(i).io.req.bits.wmode := false.B + io.bankWrite(i).io.resp.ready := false.B + io.bankWrite(i).bank_id := wbank_reg + io.bankWrite(i).group_id := 0.U + } + + io.cmdReq.ready := state === idle + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + + // INT32 to FP32 + def int32ToFp32(intVal: UInt): UInt = { + val signed = intVal.asSInt + val is_zero = signed === 0.S + val sign = intVal(31) + val absVal = Wire(UInt(32.W)) + absVal := Mux(sign.asBool, (~intVal + 1.U), intVal) + + val leadingOne = Wire(UInt(5.W)) + leadingOne := (30.U - PriorityEncoder(Reverse(absVal(30, 0)))) + + val exponent = Wire(UInt(8.W)) + exponent := leadingOne +& 127.U + + val mantissa = Wire(UInt(23.W)) + when(leadingOne >= 23.U) { + mantissa := (absVal >> (leadingOne - 23.U))(22, 0) + }.otherwise { + mantissa := (absVal << (23.U - leadingOne))(22, 0) + } + + val result = Wire(UInt(32.W)) + when(is_zero) { + result := 0.U + }.otherwise { + result := Cat(sign, exponent, mantissa) + } + result + } + + // FP32 multiply + def fp32Multiply(a: UInt, bv: UInt): UInt = { + val a_sign = a(31) + val b_sign = bv(31) + val a_exp = a(30, 23) + val b_exp = bv(30, 23) + val a_mant = Cat(1.U(1.W), a(22, 0)) + val b_mant = Cat(1.U(1.W), bv(22, 0)) + val result_sign = a_sign ^ b_sign + val a_is_zero = a_exp === 0.U && a(22, 0) === 0.U + val b_is_zero = b_exp === 0.U && bv(22, 0) === 0.U + val mant_product = (a_mant * b_mant)(47, 0) + val 
mant_shifted = Wire(UInt(24.W)) + val exp_adjust = Wire(UInt(1.W)) + when(mant_product(47)) { + mant_shifted := mant_product(47, 24) + exp_adjust := 1.U + }.otherwise { + mant_shifted := mant_product(46, 23) + exp_adjust := 0.U + } + val result_exp_wide = a_exp +& b_exp +& exp_adjust - 127.U + val result_exp = result_exp_wide(7, 0) + val result = Wire(UInt(32.W)) + when(a_is_zero || b_is_zero) { + result := 0.U + }.elsewhen(result_exp_wide(9, 8) =/= 0.U && result_exp_wide(9)) { + result := 0.U + }.elsewhen(result_exp_wide(8) && !result_exp_wide(9)) { + result := Cat(result_sign, 255.U(8.W), 0.U(23.W)) + }.otherwise { + result := Cat(result_sign, result_exp, mant_shifted(22, 0)) + } + result + } + + // FSM + switch(state) { + is(idle) { + when(io.cmdReq.fire) { + state := sRead + readCounter := 0.U + respCounter := 0.U + writeCounter := 0.U + raddr_reg := 0.U + rbank_reg := io.cmdReq.bits.cmd.op1_bank + waddr_reg := 0.U + wbank_reg := io.cmdReq.bits.cmd.wr_bank + iter_reg := io.cmdReq.bits.cmd.iter + scale_reg := io.cmdReq.bits.cmd.special(31, 0) + } + } + + is(sRead) { + io.bankRead(0).io.resp.ready := true.B + + io.bankRead(0).io.req.valid := readCounter < iter_reg + io.bankRead(0).io.req.bits.addr := raddr_reg + readCounter + + when(io.bankRead(0).io.req.fire) { + readCounter := readCounter + 1.U + } + + val dataWord = io.bankRead(0).io.resp.bits.data + + when(io.bankRead(0).io.resp.fire) { + val results = Wire(Vec(elemsPerWord, UInt(32.W))) + for (i <- 0 until elemsPerWord) { + val int_elem = dataWord((i + 1) * 32 - 1, i * 32) + val fp_elem = int32ToFp32(int_elem) + results(i) := fp32Multiply(fp_elem, scale_reg) + } + regArray(respCounter) := Cat(results.reverse) + respCounter := respCounter + 1.U + + when(respCounter === (iter_reg - 1.U)) { + state := sWrite + } + } + } + + is(sWrite) { + val hasMore = writeCounter < iter_reg + + io.bankWrite(0).io.req.valid := hasMore + io.bankWrite(0).io.req.bits.addr := waddr_reg + writeCounter + 
io.bankWrite(0).io.req.bits.data := regArray(writeCounter) + io.bankWrite(0).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(1.U(1.W))) + io.bankWrite(0).io.resp.ready := true.B + + when(io.bankWrite(0).io.req.fire) { + when(writeCounter === (iter_reg - 1.U)) { + state := complete + }.otherwise { + writeCounter := writeCounter + 1.U + } + } + } + + is(complete) { + io.bankWrite(0).io.resp.ready := true.B + io.cmdResp.valid := true.B + io.cmdResp.bits.rob_id := rob_id_reg + when(io.cmdResp.fire) { + state := idle + } + } + } + + io.status.idle := state === idle + io.status.running := (state === sRead) || (state === sWrite) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/dequant/DequantBall.scala b/arch/src/main/scala/framework/balldomain/prototype/dequant/DequantBall.scala new file mode 100644 index 00000000..efd2d167 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/dequant/DequantBall.scala @@ -0,0 +1,39 @@ +package framework.balldomain.prototype.dequant + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.top.GlobalConfig + +@instantiable +class DequantBall(val b: GlobalConfig) extends Module with HasBlink { + + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "DequantBall") + .getOrElse(throw new IllegalArgumentException("DequantBall not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + + val dequantUnit: Instance[Dequant] = Instantiate(new Dequant(b)) + + dequantUnit.io.cmdReq <> io.cmdReq + dequantUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + dequantUnit.io.bankRead(i) <> io.bankRead(i) + } + + for (i <- 0 until outBW) { + dequantUnit.io.bankWrite(i) <> 
io.bankWrite(i) + } + + io.status <> dequantUnit.io.status + + io.subRobReq.valid := false.B + io.subRobReq.bits := SubRobRow.tieOff(b) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/dequant/configs/DequantBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/dequant/configs/DequantBallParam.scala new file mode 100644 index 00000000..96223f15 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/dequant/configs/DequantBallParam.scala @@ -0,0 +1,17 @@ +package framework.balldomain.prototype.dequant.configs + +import upickle.default._ + +case class DequantBallParam( + placeholder: Boolean) + +object DequantBallParam { + implicit val rw: ReadWriter[DequantBallParam] = macroRW + + def apply(): DequantBallParam = { + val jsonStr = + scala.io.Source.fromFile("src/main/scala/framework/balldomain/prototype/dequant/configs/default.json").mkString + read[DequantBallParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/dequant/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/dequant/configs/default.json new file mode 100644 index 00000000..97ac37b6 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/dequant/configs/default.json @@ -0,0 +1,3 @@ +{ + "placeholder": true +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiBall.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiBall.scala new file mode 100644 index 00000000..cfc26d46 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiBall.scala @@ -0,0 +1,233 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.balldomain.rs.BallRsComplete +import framework.top.GlobalConfig + 
+@instantiable +class GemminiBall(val b: GlobalConfig) extends Module with HasBlink with HasBallStatus { + + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "GemminiBall") + .getOrElse(throw new IllegalArgumentException("GemminiBall not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + def status: BallStatus = io.status + + // ========================================================================= + // Sub-modules + // ========================================================================= + val exCtrl: Instance[GemminiExCtrl] = Instantiate(new GemminiExCtrl(b)) + val matmulUnroller: Instance[LoopMatmulUnroller] = Instantiate(new LoopMatmulUnroller(b)) + val convUnroller: Instance[LoopConvUnroller] = Instantiate(new LoopConvUnroller(b)) + val encoder: Instance[LoopCmdEncoder] = Instantiate(new LoopCmdEncoder(b)) + + // ========================================================================= + // Config registers for Loop modes + // ========================================================================= + val loopWsConfig = Reg(new LoopWsConfig(b)) + val loopConvConfig = Reg(new LoopConvWsConfig(b)) + + // rob_id tracking for bank metadata + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + when(io.cmdReq.fire) { + rob_id_reg := io.cmdReq.bits.rob_id + } + + // ========================================================================= + // Instruction routing by funct7 + // ========================================================================= + val funct7 = io.cmdReq.bits.cmd.funct7 + + val rs2Data = io.cmdReq.bits.cmd.special + + val isConfig = funct7 === 0x02.U // GEMMINI_CONFIG (enable=000, opcode=2) + val isPreload = funct7 === 0x35.U // GEMMINI_PRELOAD (enable=011, opcode=5) + val isComputePre = funct7 === 0x42.U // GEMMINI_COMPUTE_PRELOADED (enable=100, opcode=2) + val isComputeAcc = 
funct7 === 0x43.U // GEMMINI_COMPUTE_ACCUMULATED (enable=100, opcode=3) + val isFlush = funct7 === 0x03.U // GEMMINI_FLUSH (enable=000, opcode=3) + val isExUnit = isConfig || isPreload || isComputePre || isComputeAcc || isFlush + + val isLoopWsConfig = funct7 >= 0x50.U && funct7 <= 0x56.U + val isLoopWsTrigger = funct7 === 0x57.U + val isLoopConvConfig = funct7 >= 0x60.U && funct7 <= 0x68.U + val isLoopConvTrigger = funct7 === 0x69.U + + // ========================================================================= + // ExUnit path (non-Loop: CONFIG/PRELOAD/COMPUTE/FLUSH) + // ========================================================================= + exCtrl.exio.cmdReq.valid := io.cmdReq.valid && isExUnit + exCtrl.exio.cmdReq.bits := io.cmdReq.bits + + // ========================================================================= + // Config latch path (immediate cmdResp) + // ========================================================================= + val configRespValid = RegInit(false.B) + val configRespBits = Reg(new BallRsComplete(b)) + configRespValid := false.B // default: pulse + + when(io.cmdReq.fire && isLoopWsConfig) { + configRespValid := true.B + configRespBits.rob_id := io.cmdReq.bits.rob_id + configRespBits.is_sub := io.cmdReq.bits.is_sub + configRespBits.sub_rob_id := io.cmdReq.bits.sub_rob_id + switch(funct7) { + is(0x50.U) { + loopWsConfig.max_k := rs2Data(15, 0) + loopWsConfig.max_j := rs2Data(31, 16) + loopWsConfig.max_i := rs2Data(47, 32) + } + is(0x51.U)(loopWsConfig.dram_addr_a := rs2Data(38, 0)) + is(0x52.U)(loopWsConfig.dram_addr_b := rs2Data(38, 0)) + is(0x53.U)(loopWsConfig.dram_addr_d := rs2Data(38, 0)) + is(0x54.U)(loopWsConfig.dram_addr_c := rs2Data(38, 0)) + is(0x55.U) { + loopWsConfig.stride_a := rs2Data(31, 0) + loopWsConfig.stride_b := rs2Data(63, 32) + } + is(0x56.U) { + loopWsConfig.stride_d := rs2Data(31, 0) + loopWsConfig.stride_c := rs2Data(63, 32) + } + } + } + + when(io.cmdReq.fire && isLoopConvConfig) { + configRespValid := 
true.B + configRespBits.rob_id := io.cmdReq.bits.rob_id + configRespBits.is_sub := io.cmdReq.bits.is_sub + configRespBits.sub_rob_id := io.cmdReq.bits.sub_rob_id + switch(funct7) { + is(0x60.U) { + loopConvConfig.batch_size := rs2Data(15, 0) + loopConvConfig.in_dim := rs2Data(31, 16) + loopConvConfig.in_channels := rs2Data(47, 32) + } + is(0x61.U) { + loopConvConfig.out_channels := rs2Data(15, 0) + loopConvConfig.out_dim := rs2Data(31, 16) + loopConvConfig.stride := rs2Data(39, 32) + loopConvConfig.padding := rs2Data(47, 40) + } + is(0x62.U) { + loopConvConfig.kernel_dim := rs2Data(7, 0) + loopConvConfig.pool_size := rs2Data(15, 8) + loopConvConfig.pool_stride := rs2Data(23, 16) + loopConvConfig.pool_padding := rs2Data(31, 24) + } + is(0x63.U)(loopConvConfig.dram_addr_bias := rs2Data(38, 0)) + is(0x64.U)(loopConvConfig.dram_addr_input := rs2Data(38, 0)) + is(0x65.U)(loopConvConfig.dram_addr_weight := rs2Data(38, 0)) + is(0x66.U)(loopConvConfig.dram_addr_output := rs2Data(38, 0)) + is(0x67.U) { + loopConvConfig.input_stride := rs2Data(31, 0) + loopConvConfig.weight_stride := rs2Data(63, 32) + } + is(0x68.U)(loopConvConfig.output_stride := rs2Data(31, 0)) + } + } + + // ========================================================================= + // Loop trigger: latch bank IDs and start unroller (no cmdResp) + // Bank values come from rs2Data in the trigger instruction, but loopWsConfig + // Reg won't update until next edge. Override start.bits combinationally. 
+ // ========================================================================= + matmulUnroller.io.start.valid := false.B + matmulUnroller.io.start.bits := loopWsConfig + + when(io.cmdReq.fire && isLoopWsTrigger) { + loopWsConfig.bank_a := rs2Data(9, 0) + loopWsConfig.bank_b := rs2Data(19, 10) + loopWsConfig.bank_c := rs2Data(29, 20) + loopWsConfig.low_d := rs2Data(30) + matmulUnroller.io.start.valid := true.B + matmulUnroller.io.start.bits.bank_a := rs2Data(9, 0) + matmulUnroller.io.start.bits.bank_b := rs2Data(19, 10) + matmulUnroller.io.start.bits.bank_c := rs2Data(29, 20) + matmulUnroller.io.start.bits.low_d := rs2Data(30) + } + + convUnroller.io.start.valid := false.B + convUnroller.io.start.bits := loopConvConfig + + when(io.cmdReq.fire && isLoopConvTrigger) { + loopConvConfig.bank_input := rs2Data(9, 0) + loopConvConfig.bank_weight := rs2Data(19, 10) + loopConvConfig.bank_output := rs2Data(29, 20) + loopConvConfig.no_bias := rs2Data(30) + convUnroller.io.start.valid := true.B + convUnroller.io.start.bits.bank_input := rs2Data(9, 0) + convUnroller.io.start.bits.bank_weight := rs2Data(19, 10) + convUnroller.io.start.bits.bank_output := rs2Data(29, 20) + convUnroller.io.start.bits.no_bias := rs2Data(30) + } + + // ========================================================================= + // LoopUnrollers → Arbiter → LoopCmdEncoder → io.subRobReq + // ========================================================================= + val cmdArb = Module(new Arbiter(new LoopCmd(b), 2)) + cmdArb.io.in(0) <> matmulUnroller.io.cmd + cmdArb.io.in(1) <> convUnroller.io.cmd + encoder.io.cmd <> cmdArb.io.out + encoder.io.subRobRow <> io.subRobReq + encoder.io.ballId := ballCommonConfig.ballId.U + encoder.io.masterRobId := rob_id_reg + + // ========================================================================= + // cmdReq.ready: route to correct consumer + // ========================================================================= + io.cmdReq.ready := Mux( + isExUnit, + 
exCtrl.exio.cmdReq.ready, + Mux( + isLoopWsConfig || isLoopConvConfig, + true.B, + Mux( + isLoopWsTrigger, + !matmulUnroller.io.busy, + Mux(isLoopConvTrigger, !convUnroller.io.busy, false.B) + ) + ) + ) + + // ========================================================================= + // cmdResp: mux between exUnit and config immediate response + // ========================================================================= + io.cmdResp <> exCtrl.exio.cmdResp + when(configRespValid) { + io.cmdResp.valid := true.B + io.cmdResp.bits := configRespBits + } + + // ========================================================================= + // Bank connections (unchanged from original) + // ========================================================================= + for (i <- 0 until inBW) { + io.bankRead(i).io.req <> exCtrl.exio.bankReadReq(i) + exCtrl.exio.bankReadResp(i) <> io.bankRead(i).io.resp + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + io.bankRead(i).group_id := 0.U + } + io.bankRead(0).bank_id := exCtrl.exio.op1_bank_o + if (inBW > 1) { + io.bankRead(1).bank_id := exCtrl.exio.op2_bank_o + } + + for (i <- 0 until outBW) { + io.bankWrite(i).io <> exCtrl.exio.bankWrite(i) + io.bankWrite(i).bank_id := exCtrl.exio.wr_bank_o + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U + io.bankWrite(i).group_id := i.U + } + + io.status <> exCtrl.exio.status +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrl.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrl.scala new file mode 100644 index 00000000..d23f1506 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrl.scala @@ -0,0 +1,25 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.top.GlobalConfig + +@instantiable +class GemminiExCtrl(val b: GlobalConfig) + extends Module + with 
GemminiExCtrlDefs + with GemminiExCtrlDefaults + with GemminiExCtrlCmdStates + with GemminiExCtrlPreloadStates + with GemminiExCtrlComputeReadState + with GemminiExCtrlComputeFeedState + with GemminiExCtrlStoreOps + with GemminiExCtrlFsm { + @public val exio = io + + applyDefaults() + runFsm() + + io.status.idle := state === sIdle + io.status.running := state =/= sIdle +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlCmdStates.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlCmdStates.scala new file mode 100644 index 00000000..e6273245 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlCmdStates.scala @@ -0,0 +1,73 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.util._ +import gemmini._ + +trait GemminiExCtrlCmdStates { this: GemminiExCtrl => + + protected def handleIdleState(): Unit = { + io.cmdReq.ready := true.B + when(io.cmdReq.fire) { + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + op1_bank := io.cmdReq.bits.cmd.op1_bank + op2_bank := io.cmdReq.bits.cmd.op2_bank + wr_bank := io.cmdReq.bits.cmd.wr_bank + total_rows := Mux(io.cmdReq.bits.cmd.iter === 0.U, DIM.U, io.cmdReq.bits.cmd.iter) + + when(sub_cmd === GemminiSubCmd.CONFIG) { + cfg_dataflow := io.cmdReq.bits.cmd.special(4) + cfg_a_transpose := io.cmdReq.bits.cmd.special(7) + cfg_bd_transpose := io.cmdReq.bits.cmd.special(8) + cfg_in_shift := io.cmdReq.bits.cmd.special(log2Up(config.accWidth) + 8, 9) + // Respond from sCommit only: in this cycle rob_id_reg still holds the previous command's id. + state := sCommit + }.elsewhen(sub_cmd === GemminiSubCmd.PRELOAD) { + read_row_cnt := 0.U + feed_row_cnt := 0.U + req_sent := false.B + state := sPreloadRead + }.elsewhen(sub_cmd === GemminiSubCmd.COMPUTE_PRELOADED || sub_cmd === GemminiSubCmd.COMPUTE_ACCUMULATED) { + read_row_cnt := 0.U + feed_row_cnt := 0.U + outBufRows := 0.U + outBufCollected := 0.U + 
req_sent := false.B + state := sComputeRead + }.elsewhen(sub_cmd === GemminiSubCmd.FLUSH) { + state := sFlush + } + } + } + + protected def handleCommitState(): Unit = { + io.cmdResp.valid := true.B + when(io.cmdResp.fire) { + state := sIdle + } + } + + protected def handleFlushState(): Unit = { + mesh.io.req.valid := true.B + mesh.io.req.bits.flush := 2.U + mesh.io.req.bits.total_rows := DIM.U + mesh.io.req.bits.tag.rob := robIdAsTag8(rob_id_reg) + + mesh.io.a.valid := true.B + mesh.io.a.bits := 0.U.asTypeOf(mesh.A_TYPE) + mesh.io.b.valid := true.B + mesh.io.b.bits := 0.U.asTypeOf(mesh.B_TYPE) + mesh.io.d.valid := true.B + mesh.io.d.bits := 0.U.asTypeOf(mesh.D_TYPE) + + when(mesh.io.req.ready) { + io.cmdResp.valid := true.B + when(io.cmdResp.fire) { + state := sIdle + } + } + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlComputeFeedState.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlComputeFeedState.scala new file mode 100644 index 00000000..3e77649e --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlComputeFeedState.scala @@ -0,0 +1,68 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.util._ +import gemmini._ + +trait GemminiExCtrlComputeFeedState { this: GemminiExCtrl => + + protected def handleComputeFeedState(): Unit = { + // OS: do not collect mesh rows here — preload matmul responses still drain during feed; + // sComputeFlush drops tag=0xff (garbage) resps and collects non-garbage rows only. 
+ + when(!req_sent) { + mesh.io.req.valid := true.B + mesh.io.req.bits.pe_control.dataflow := cfg_dataflow + mesh.io.req.bits.pe_control.propagate := 1.U + mesh.io.req.bits.pe_control.shift := cfg_in_shift + mesh.io.req.bits.a_transpose := Mux(cfg_dataflow === Dataflow.OS.id.U, !cfg_a_transpose, cfg_a_transpose) + mesh.io.req.bits.bd_transpose := cfg_bd_transpose + mesh.io.req.bits.total_rows := total_rows + mesh.io.req.bits.tag.rob := robIdAsTag8(rob_id_reg) + mesh.io.req.bits.flush := 0.U + when(mesh.io.req.fire) { + req_sent := true.B + } + } + + when(req_sent && feed_row_cnt < total_rows) { + when(rdQueue0.io.deq.valid && rdQueue1.io.deq.valid) { + val a_row = rdQueue0.io.deq.bits.data.asTypeOf(Vec(DIM, inputType)) + val x_row = rdQueue1.io.deq.bits.data.asTypeOf(Vec(DIM, inputType)) + mesh.io.a.valid := true.B + mesh.io.a.bits := VecInit(a_row.grouped(config.tileRows).map(g => VecInit(g)).toSeq) + when(cfg_dataflow === Dataflow.OS.id.U) { + // OS: stream A/B, D=0 + mesh.io.b.valid := true.B + mesh.io.b.bits := VecInit(x_row.grouped(config.tileColumns).map(g => VecInit(g)).toSeq) + mesh.io.d.valid := true.B + mesh.io.d.bits := 0.U.asTypeOf(mesh.D_TYPE) + }.otherwise { + // WS: stream A/D, B comes from preloaded weights + mesh.io.b.valid := true.B + mesh.io.b.bits := 0.U.asTypeOf(mesh.B_TYPE) + mesh.io.d.valid := true.B + mesh.io.d.bits := VecInit(x_row.grouped(config.tileColumns).map(g => VecInit(g)).toSeq) + } + when(mesh.io.a.ready && mesh.io.b.ready && mesh.io.d.ready) { + rdQueue0.io.deq.ready := true.B + rdQueue1.io.deq.ready := true.B + feed_row_cnt := feed_row_cnt + 1.U + } + } + } + + when(req_sent && feed_row_cnt >= total_rows) { + when(cfg_dataflow === Dataflow.OS.id.U) { + outBufRows := total_rows - 1.U + req_sent := false.B + feed_row_cnt := 0.U + state := sComputeFlush + }.otherwise { + outBufRows := 0.U + state := sDrain + } + } + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlComputeReadState.scala 
b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlComputeReadState.scala new file mode 100644 index 00000000..948ed2fb --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlComputeReadState.scala @@ -0,0 +1,27 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.util._ +import gemmini._ + +trait GemminiExCtrlComputeReadState { this: GemminiExCtrl => + + protected def handleComputeReadState(): Unit = { + // Drain stale preload data from shared queues before issuing compute reads + when(read_row_cnt === 0.U && (rdQueue0.io.deq.valid || rdQueue1.io.deq.valid)) { + rdQueue0.io.deq.ready := true.B + rdQueue1.io.deq.ready := true.B + }.elsewhen(read_row_cnt < total_rows) { + io.bankReadReq(0).valid := true.B + io.bankReadReq(0).bits.addr := read_row_cnt + io.bankReadReq(1).valid := true.B + io.bankReadReq(1).bits.addr := read_row_cnt + when(io.bankReadReq(0).ready && io.bankReadReq(1).ready) { + read_row_cnt := read_row_cnt + 1.U + } + }.otherwise { + state := sComputeFeed + } + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlDefaults.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlDefaults.scala new file mode 100644 index 00000000..08e1d393 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlDefaults.scala @@ -0,0 +1,44 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.util._ + +trait GemminiExCtrlDefaults { this: GemminiExCtrl => + + protected def applyDefaults(): Unit = { + io.cmdReq.ready := state === sIdle + + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + + for (i <- 0 until inBW) { + io.bankReadReq(i).valid := false.B + io.bankReadReq(i).bits.addr := 0.U + } + + rdQueue0.io.deq.ready := false.B + 
rdQueue1.io.deq.ready := false.B + + mesh.io.a.valid := false.B + mesh.io.a.bits := 0.U.asTypeOf(mesh.A_TYPE) + mesh.io.b.valid := false.B + mesh.io.b.bits := 0.U.asTypeOf(mesh.B_TYPE) + mesh.io.d.valid := false.B + mesh.io.d.bits := 0.U.asTypeOf(mesh.D_TYPE) + mesh.io.req.valid := false.B + mesh.io.req.bits := 0.U.asTypeOf(mesh.io.req.bits) + // mesh.io.resp is Valid-only (no ready); MeshWithDelays assumes immediate observation when valid + + io.bankWrite.foreach { bw => + bw.req.valid := false.B + bw.req.bits.addr := 0.U + bw.req.bits.data := 0.U + bw.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(false.B)) + bw.req.bits.wmode := true.B + bw.resp.ready := true.B + } + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlDefs.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlDefs.scala new file mode 100644 index 00000000..a10cdfaa --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlDefs.scala @@ -0,0 +1,103 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.public +import framework.balldomain.blink.BallStatus +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.memdomain.backend.banks.{SramReadReq, SramReadResp, SramWriteIO} +import framework.top.GlobalConfig +import framework.balldomain.prototype.gemmini.configs.GemminiBallParam +import gemmini._ +import gemmini.Util._ + +trait GemminiExCtrlDefs { this: GemminiExCtrl => + val config = GemminiBallParam() + val DIM = config.blockSize + + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "GemminiBall") + .getOrElse(throw new IllegalArgumentException("GemminiBall not found in config")) + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + val inputType = SInt(config.inputWidth.W) + val accType = SInt(config.accWidth.W) + val outputType = SInt(config.accWidth.W) + + 
val ctrlIo = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankReadReq = Vec(inBW, Decoupled(new SramReadReq(b))) + val bankReadResp = Vec(inBW, Flipped(Decoupled(new SramReadResp(b)))) + val bankWrite = Vec(outBW, Flipped(new SramWriteIO(b))) + val op1_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W)) + val op2_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W)) + val wr_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W)) + val status = new BallStatus + }) + + val io = ctrlIo + + val mesh = Module(new MeshWithDelays( + inputType = inputType, + outputType = outputType, + accType = accType, + tagType = new SimpleTag, + df = Dataflow.BOTH, + tree_reduction = false, + tile_latency = config.tileLatency, + output_delay = config.outputDelay, + tileRows = config.tileRows, + tileColumns = config.tileColumns, + meshRows = config.meshRows, + meshColumns = config.meshColumns, + leftBanks = 1, + upBanks = 1, + n_simultaneous_matmuls = 5 + )) + + val cfg_dataflow = RegInit(0.U(1.W)) + val cfg_in_shift = RegInit(0.U(log2Up(config.accWidth).W)) + val cfg_a_transpose = RegInit(false.B) + val cfg_bd_transpose = RegInit(false.B) + + val sIdle :: sPreloadRead :: sPreloadFeed :: sComputeRead :: sComputeFeed :: sComputeFlush :: sFlush :: sDrain :: sStore :: sCommit :: Nil = + Enum(10) + val state = RegInit(sIdle) + + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + + /** Match SimpleTag.rob (8b); must match mesh req tag bits */ + protected def robIdAsTag8(x: UInt): UInt = { + val w = x.getWidth + if (w >= 8) x(7, 0) else Cat(0.U((8 - w).W), x) + } + + val op1_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val op2_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val wr_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + io.op1_bank_o := op1_bank + io.op2_bank_o := 
op2_bank + io.wr_bank_o := wr_bank + + val read_row_cnt = RegInit(0.U(log2Up(DIM + 1).W)) + val feed_row_cnt = RegInit(0.U(log2Up(DIM + 1).W)) + val store_row_cnt = RegInit(0.U(log2Up(DIM + 1).W)) + val total_rows = RegInit(DIM.U(log2Up(DIM + 1).W)) + val req_sent = RegInit(false.B) + + val sub_cmd = io.cmdReq.bits.cmd.special(3, 0) + + val rdQueue0 = Module(new Queue(new SramReadResp(b), entries = DIM)) + val rdQueue1 = Module(new Queue(new SramReadResp(b), entries = DIM)) + rdQueue0.io.enq <> io.bankReadResp(0) + rdQueue1.io.enq <> io.bankReadResp(1) + + val outBuf = Reg(Vec(DIM, Vec(config.meshColumns, Vec(config.tileColumns, outputType)))) + val outBufRows = RegInit(0.U(log2Up(DIM + 1).W)) + val outBufCollected = RegInit(0.U(log2Up(DIM + 1).W)) + + val port_written = RegInit(VecInit(Seq.fill(outBW)(false.B))) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlFsm.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlFsm.scala new file mode 100644 index 00000000..90d12ffb --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlFsm.scala @@ -0,0 +1,43 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.util._ + +trait GemminiExCtrlFsm { this: GemminiExCtrl => + + protected def runFsm(): Unit = { + switch(state) { + is(sIdle) { + handleIdleState() + } + is(sPreloadRead) { + handlePreloadReadState() + } + is(sPreloadFeed) { + handlePreloadFeedState() + } + is(sComputeRead) { + handleComputeReadState() + } + is(sComputeFeed) { + handleComputeFeedState() + } + is(sComputeFlush) { + handleComputeFlushState() + } + is(sDrain) { + handleDrainState() + } + is(sStore) { + handleStoreState() + } + is(sCommit) { + handleCommitState() + } + is(sFlush) { + handleFlushState() + } + } + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlPreloadStates.scala 
b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlPreloadStates.scala new file mode 100644 index 00000000..22044514 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlPreloadStates.scala @@ -0,0 +1,84 @@ +package framework.balldomain.prototype.gemmini + +import chisel3._ +import chisel3.util._ +import gemmini._ + +trait GemminiExCtrlPreloadStates { this: GemminiExCtrl => + + protected def handlePreloadReadState(): Unit = { + when(cfg_dataflow === Dataflow.OS.id.U) { + when(read_row_cnt < total_rows) { + io.bankReadReq(0).valid := true.B + io.bankReadReq(0).bits.addr := read_row_cnt + when(io.bankReadReq(0).ready) { + read_row_cnt := read_row_cnt + 1.U + } + }.otherwise { + state := sPreloadFeed + } + }.otherwise { + when(read_row_cnt < total_rows) { + io.bankReadReq(0).valid := true.B + io.bankReadReq(0).bits.addr := total_rows - 1.U - read_row_cnt + when(io.bankReadReq(0).ready) { + read_row_cnt := read_row_cnt + 1.U + } + }.otherwise { + state := sPreloadFeed + } + } + } + + protected def handlePreloadFeedState(): Unit = { + when(!req_sent) { + mesh.io.req.valid := true.B + mesh.io.req.bits.pe_control.dataflow := cfg_dataflow + mesh.io.req.bits.pe_control.propagate := 1.U + mesh.io.req.bits.pe_control.shift := cfg_in_shift + // Buckyball sends preload/compute as separate commands, so the + // AlwaysOutTransposer inside MeshWithDelays cannot be primed with + // compute-A during preload (unlike Chipyard's interleaved flow). + // Negate here so MeshWithDelays' internal !a_transpose yields false, + // keeping the transposer inactive. Software must pre-transpose A. 
+      mesh.io.req.bits.a_transpose := Mux(cfg_dataflow === Dataflow.OS.id.U, !cfg_a_transpose, cfg_a_transpose)
+      mesh.io.req.bits.bd_transpose := cfg_bd_transpose
+      mesh.io.req.bits.total_rows := total_rows
+      mesh.io.req.bits.tag.rob := robIdAsTag8(rob_id_reg)
+      mesh.io.req.bits.flush := 0.U
+      when(mesh.io.req.fire) {
+        req_sent := true.B
+      }
+    }
+
+    when(req_sent && feed_row_cnt < total_rows) {
+      when(rdQueue0.io.deq.valid) {
+        val row_data = rdQueue0.io.deq.bits.data.asTypeOf(Vec(DIM, inputType))
+        mesh.io.a.valid := true.B
+        mesh.io.a.bits := 0.U.asTypeOf(mesh.A_TYPE)
+        mesh.io.b.valid := true.B
+        mesh.io.b.bits := 0.U.asTypeOf(mesh.B_TYPE)
+        mesh.io.d.valid := true.B
+        // OS preload in Buckyball is used to prime pipeline state before compute.
+        // Feed D=0 to avoid injecting bias-like data into the following matmul.
+        mesh.io.d.bits := Mux(
+          cfg_dataflow === Dataflow.OS.id.U,
+          0.U.asTypeOf(mesh.D_TYPE),
+          VecInit(row_data.grouped(config.tileColumns).map(g => VecInit(g)).toSeq)
+        )
+        when(mesh.io.a.ready && mesh.io.b.ready && mesh.io.d.ready) {
+          rdQueue0.io.deq.ready := true.B
+          feed_row_cnt := feed_row_cnt + 1.U
+        }
+      }
+    }
+
+    when(req_sent && feed_row_cnt >= total_rows) {
+      io.cmdResp.valid := true.B
+      when(io.cmdResp.fire) {
+        state := sIdle
+      }
+    }
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlStoreOps.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlStoreOps.scala
new file mode 100644
index 00000000..38dfcb80
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlStoreOps.scala
@@ -0,0 +1,104 @@
+package framework.balldomain.prototype.gemmini
+
+import chisel3._
+import chisel3.util._
+import gemmini._
+
+trait GemminiExCtrlStoreOps { this: GemminiExCtrl =>
+
+  protected def handleComputeFlushState(): Unit = {
+    when(mesh.io.resp.fire && mesh.io.resp.bits.total_rows === total_rows) {
+      when(outBufCollected < total_rows && mesh.io.resp.bits.tag.rob =/= 0xff.U) {
+        outBuf(total_rows - 1.U - outBufCollected) := mesh.io.resp.bits.data
+        outBufCollected := outBufCollected + 1.U
+      }.otherwise {}
+    }
+
+    when(outBufCollected >= total_rows) {
+      store_row_cnt := 0.U
+      port_written.foreach(_ := false.B)
+      state := sStore
+    }.otherwise {
+      // Flush once to drain remaining rows after feeding is complete.
+      when(!req_sent) {
+        mesh.io.req.valid := true.B
+        mesh.io.req.bits.pe_control.dataflow := cfg_dataflow
+        mesh.io.req.bits.pe_control.propagate := 1.U
+        mesh.io.req.bits.pe_control.shift := cfg_in_shift
+        mesh.io.req.bits.a_transpose := Mux(cfg_dataflow === Dataflow.OS.id.U, !cfg_a_transpose, cfg_a_transpose)
+        mesh.io.req.bits.bd_transpose := cfg_bd_transpose
+        mesh.io.req.bits.total_rows := total_rows
+        mesh.io.req.bits.tag.rob := robIdAsTag8(rob_id_reg)
+        mesh.io.req.bits.flush := 1.U
+        when(mesh.io.req.fire) {
+          req_sent := true.B
+        }
+      }
+    }
+  }
+
+  protected def handleDrainState(): Unit = {
+    when(mesh.io.resp.valid) {
+      when(cfg_dataflow === Dataflow.OS.id.U) {
+        when(mesh.io.resp.bits.total_rows === total_rows) {
+          outBuf(outBufRows) := mesh.io.resp.bits.data
+          outBufRows := outBufRows - 1.U
+          outBufCollected := outBufCollected + 1.U
+        }
+      }.otherwise {
+        outBuf(outBufRows) := mesh.io.resp.bits.data
+        outBufRows := outBufRows + 1.U
+      }
+    }
+
+    when(cfg_dataflow === Dataflow.OS.id.U) {
+      when(outBufCollected >= total_rows) {
+        store_row_cnt := 0.U
+        port_written.foreach(_ := false.B)
+        state := sStore
+      }
+    }.otherwise {
+      when(outBufRows >= total_rows) {
+        store_row_cnt := 0.U
+        port_written.foreach(_ := false.B)
+        state := sStore
+      }
+    }
+  }
+
+  protected def handleStoreState(): Unit = {
+    when(store_row_cnt < total_rows) {
+      // One row = DIM elements; split across outBW ports (group_id 0..outBW-1).
+      // mvout reads (addr, group 0), (addr, group 1), ... and concatenates as row.
+      // So port i (group_id=i) must get elements [i*elemsPerPort .. (i+1)*elemsPerPort).
+      val rowIdx = store_row_cnt
+      val row = outBuf(rowIdx)
+      val flat_raw = VecInit(row.flatten)
+      val row_bits = Cat(flat_raw.map(_.asUInt).reverse)
+
+      val bitsPerPort = b.memDomain.bankWidth
+
+      for (i <- 0 until outBW) {
+        when(!port_written(i)) {
+          val slice = row_bits((i + 1) * bitsPerPort - 1, i * bitsPerPort)
+          io.bankWrite(i).req.valid := true.B
+          io.bankWrite(i).req.bits.addr := store_row_cnt
+          io.bankWrite(i).req.bits.data := slice
+          io.bankWrite(i).req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(true.B))
+          io.bankWrite(i).req.bits.wmode := true.B
+          when(io.bankWrite(i).req.ready) {
+            port_written(i) := true.B
+          }
+        }
+      }
+
+      when(port_written.asUInt.andR) {
+        store_row_cnt := store_row_cnt + 1.U
+        port_written.foreach(_ := false.B)
+      }
+    }.otherwise {
+      state := sCommit
+    }
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlTypes.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlTypes.scala
new file mode 100644
index 00000000..710e7825
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/GemminiExCtrlTypes.scala
@@ -0,0 +1,19 @@
+package framework.balldomain.prototype.gemmini
+
+import chisel3._
+
+/** Minimal tag for MeshWithDelays that satisfies TagQueueTag */
+class SimpleTag extends Bundle with gemmini.TagQueueTag {
+  val rob = UInt(8.W)
+  override def make_this_garbage(dummy: Int = 0): Unit =
+    rob := 0xff.U
+}
+
+/** Sub-command encoding within the special field */
+object GemminiSubCmd {
+  val CONFIG = 0.U(4.W)
+  val PRELOAD = 1.U(4.W)
+  val COMPUTE_PRELOADED = 2.U(4.W)
+  val COMPUTE_ACCUMULATED = 3.U(4.W)
+  val FLUSH = 4.U(4.W)
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/LICENSE.gemmini b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LICENSE.gemmini
new file mode 100644
index 00000000..17824de7
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LICENSE.gemmini
@@ -0,0 +1,24 @@
+Copyright (c) 2018-2019, The Regents of the University of California
+(Regents). All Rights Reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+3. Neither the name of the Regents nor the
+   names of its contributors may be used to endorse or promote products
+   derived from this software without specific prior written permission.
+
+IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
+SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING
+OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS
+BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED
+HEREUNDER IS PROVIDED "AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE
+MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopCmd.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopCmd.scala
new file mode 100644
index 00000000..e4076235
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopCmd.scala
@@ -0,0 +1,74 @@
+package framework.balldomain.prototype.gemmini
+
+import chisel3._
+import chisel3.util._
+import framework.top.GlobalConfig
+
+object LoopSubCmdType {
+  val MSET_ALLOC = 0.U(3.W)
+  val MSET_FREE = 1.U(3.W)
+  val MVIN = 2.U(3.W)
+  val MVOUT = 3.U(3.W)
+  val PRELOAD = 4.U(3.W)
+  val COMPUTE = 5.U(3.W)
+}
+
+class LoopSubCmd(val b: GlobalConfig) extends Bundle {
+  val cmdType = UInt(3.W)
+  val bank_id = UInt(log2Up(b.memDomain.bankNum).W)
+  val dram_addr = UInt(b.memDomain.memAddrLen.W)
+  val iter = UInt(b.frontend.iter_len.W)
+  val bank_row = UInt(5.W)
+  val bank_col = UInt(5.W)
+  val op1_bank = UInt(log2Up(b.memDomain.bankNum).W)
+  val op2_bank = UInt(log2Up(b.memDomain.bankNum).W)
+  val wr_bank = UInt(log2Up(b.memDomain.bankNum).W)
+  val compute_mode = UInt(2.W)
+}
+
+class LoopCmd(val b: GlobalConfig) extends Bundle {
+  val slots = Vec(4, Valid(new LoopSubCmd(b)))
+}
+
+class LoopWsConfig(val b: GlobalConfig) extends Bundle {
+  val max_i = UInt(16.W)
+  val max_j = UInt(16.W)
+  val max_k = UInt(16.W)
+  val dram_addr_a = UInt(b.memDomain.memAddrLen.W)
+  val dram_addr_b = UInt(b.memDomain.memAddrLen.W)
+  val dram_addr_d = UInt(b.memDomain.memAddrLen.W)
+  val dram_addr_c = UInt(b.memDomain.memAddrLen.W)
+  val stride_a = UInt(32.W)
+  val stride_b = UInt(32.W)
+  val stride_d = UInt(32.W)
+  val stride_c = UInt(32.W)
+  val bank_a = UInt(log2Up(b.memDomain.bankNum).W)
+  val bank_b = UInt(log2Up(b.memDomain.bankNum).W)
+  val bank_c = UInt(log2Up(b.memDomain.bankNum).W)
+  val low_d = Bool()
+}
+
+class LoopConvWsConfig(val b: GlobalConfig) extends Bundle {
+  val batch_size = UInt(16.W)
+  val in_dim = UInt(16.W)
+  val in_channels = UInt(16.W)
+  val out_channels = UInt(16.W)
+  val out_dim = UInt(16.W)
+  val stride = UInt(8.W)
+  val padding = UInt(8.W)
+  val kernel_dim = UInt(8.W)
+  val pool_size = UInt(8.W)
+  val pool_stride = UInt(8.W)
+  val pool_padding = UInt(8.W)
+  val dram_addr_bias = UInt(b.memDomain.memAddrLen.W)
+  val dram_addr_input = UInt(b.memDomain.memAddrLen.W)
+  val dram_addr_weight = UInt(b.memDomain.memAddrLen.W)
+  val dram_addr_output = UInt(b.memDomain.memAddrLen.W)
+  val input_stride = UInt(32.W)
+  val weight_stride = UInt(32.W)
+  val output_stride = UInt(32.W)
+  val bank_input = UInt(log2Up(b.memDomain.bankNum).W)
+  val bank_weight = UInt(log2Up(b.memDomain.bankNum).W)
+  val bank_output = UInt(log2Up(b.memDomain.bankNum).W)
+  val no_bias = Bool()
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopCmdEncoder.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopCmdEncoder.scala
new file mode 100644
index 00000000..96f21030
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopCmdEncoder.scala
@@ -0,0 +1,126 @@
+package framework.balldomain.prototype.gemmini
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.top.GlobalConfig
+import framework.balldomain.blink.SubRobRow
+import framework.frontend.decoder.PostGDCmd
+import framework.frontend.scoreboard.BankAccessInfo
+import framework.frontend.decoder.DomainId
+
+/** Converts LoopCmd (logical) → SubRobRow (hardware PostGDCmd) for SubROB. */
+@instantiable
+class LoopCmdEncoder(val b: GlobalConfig) extends Module {
+  val bankIdLen = log2Up(b.memDomain.bankNum)
+
+  @public
+  val io = IO(new Bundle {
+    val cmd = Flipped(Decoupled(new LoopCmd(b)))
+    val subRobRow = Decoupled(new SubRobRow(b))
+    val ballId = Input(UInt(log2Up(b.ballDomain.ballNum).W))
+    val masterRobId = Input(UInt(log2Up(b.frontend.rob_entries).W))
+  })
+
+  // Passthrough handshake
+  io.subRobRow.valid := io.cmd.valid
+  io.cmd.ready := io.subRobRow.ready
+
+  io.subRobRow.bits.ball_id := io.ballId
+  io.subRobRow.bits.master_rob_id := io.masterRobId
+
+  for (i <- 0 until 4) {
+    val slot = io.subRobRow.bits.slots(i)
+    val lsub = io.cmd.bits.slots(i)
+    slot.valid := lsub.valid
+
+    // Defaults — all unused RoCCCommandBB fields zero
+    slot.cmd.domain_id := 0.U
+    slot.cmd.cmd.raw_inst := 0.U
+    slot.cmd.cmd.pc := 0.U
+    slot.cmd.cmd.funct := 0.U
+    slot.cmd.cmd.funct3 := 0.U
+    slot.cmd.cmd.rs2 := 0.U
+    slot.cmd.cmd.rs1 := 0.U
+    slot.cmd.cmd.xd := false.B
+    slot.cmd.cmd.xs1 := false.B
+    slot.cmd.cmd.xs2 := false.B
+    slot.cmd.cmd.rd := 0.U
+    slot.cmd.cmd.opcode := 0.U
+    slot.cmd.cmd.rs1Data := 0.U
+    slot.cmd.cmd.rs2Data := 0.U
+    slot.cmd.bankAccess := BankAccessInfo.none(bankIdLen)
+    slot.cmd.isFence := false.B
+    slot.cmd.isBarrier := false.B
+
+    when(lsub.valid) {
+      switch(lsub.bits.cmdType) {
+        is(LoopSubCmdType.MSET_ALLOC) {
+          slot.cmd.domain_id := DomainId.MEM
+          slot.cmd.cmd.funct := 0x20.U // MSET (enable=010, opcode=0)
+          slot.cmd.cmd.rs1Data := lsub.bits.bank_id
+          slot.cmd.cmd.rs2Data := lsub.bits.bank_row.asUInt |
+            (lsub.bits.bank_col.asUInt << 5) |
+            (1.U << 10) // alloc=1
+          slot.cmd.bankAccess.wr_bank_valid := true.B
+          slot.cmd.bankAccess.wr_bank_id := lsub.bits.bank_id
+        }
+        is(LoopSubCmdType.MSET_FREE) {
+          slot.cmd.domain_id := DomainId.MEM
+          slot.cmd.cmd.funct := 0x20.U // MSET
+          slot.cmd.cmd.rs1Data := lsub.bits.bank_id
+          slot.cmd.cmd.rs2Data := 0.U // alloc=0
+          slot.cmd.bankAccess.wr_bank_valid := true.B
+          slot.cmd.bankAccess.wr_bank_id := lsub.bits.bank_id
+        }
+        is(LoopSubCmdType.MVIN) {
+          slot.cmd.domain_id := DomainId.MEM
+          slot.cmd.cmd.funct := 0x21.U // MVIN (enable=010, opcode=1)
+          slot.cmd.cmd.rs1Data := lsub.bits.bank_id | (lsub.bits.iter << 30)
+          slot.cmd.cmd.rs2Data := lsub.bits.dram_addr
+          slot.cmd.bankAccess.wr_bank_valid := true.B
+          slot.cmd.bankAccess.wr_bank_id := lsub.bits.bank_id
+        }
+        is(LoopSubCmdType.MVOUT) {
+          slot.cmd.domain_id := DomainId.MEM
+          slot.cmd.cmd.funct := 0x10.U // MVOUT (enable=001, opcode=0)
+          slot.cmd.cmd.rs1Data := lsub.bits.bank_id | (lsub.bits.iter << 30)
+          slot.cmd.cmd.rs2Data := lsub.bits.dram_addr
+          slot.cmd.bankAccess.rd_bank_0_valid := true.B
+          slot.cmd.bankAccess.rd_bank_0_id := lsub.bits.bank_id
+        }
+        is(LoopSubCmdType.PRELOAD) {
+          slot.cmd.domain_id := DomainId.BALL
+          slot.cmd.cmd.funct := 0x35.U // GEMMINI_PRELOAD (enable=011, opcode=5)
+          slot.cmd.cmd.rs1Data := lsub.bits.op1_bank |
+            (lsub.bits.wr_bank << 20) |
+            (lsub.bits.iter << 30)
+          slot.cmd.cmd.rs2Data := 0.U
+          slot.cmd.bankAccess.rd_bank_0_valid := true.B
+          slot.cmd.bankAccess.rd_bank_0_id := lsub.bits.op1_bank
+          slot.cmd.bankAccess.wr_bank_valid := true.B
+          slot.cmd.bankAccess.wr_bank_id := lsub.bits.wr_bank
+        }
+        is(LoopSubCmdType.COMPUTE) {
+          slot.cmd.domain_id := DomainId.BALL
+          slot.cmd.cmd.funct := Mux(
+            lsub.bits.compute_mode === 0.U,
+            0x42.U, // COMPUTE_PRELOADED (enable=100, opcode=2)
+            0x43.U // COMPUTE_ACCUMULATED (enable=100, opcode=3)
+          )
+          slot.cmd.cmd.rs1Data := lsub.bits.op1_bank |
+            (lsub.bits.op2_bank << 10) |
+            (lsub.bits.wr_bank << 20) |
+            (lsub.bits.iter << 30)
+          slot.cmd.cmd.rs2Data := 0.U
+          slot.cmd.bankAccess.rd_bank_0_valid := true.B
+          slot.cmd.bankAccess.rd_bank_0_id := lsub.bits.op1_bank
+          slot.cmd.bankAccess.rd_bank_1_valid := true.B
+          slot.cmd.bankAccess.rd_bank_1_id := lsub.bits.op2_bank
+          slot.cmd.bankAccess.wr_bank_valid := true.B
+          slot.cmd.bankAccess.wr_bank_id := lsub.bits.wr_bank
+        }
+      }
+    }
+  }
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopConvAddrGen.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopConvAddrGen.scala
new file mode 100644
index 00000000..209522e8
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopConvAddrGen.scala
@@ -0,0 +1,77 @@
+package framework.balldomain.prototype.gemmini
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.top.GlobalConfig
+import framework.balldomain.prototype.gemmini.configs.GemminiBallParam
+
+/**
+ * LoopConvAddrGen — combinational im2col address computation.
+ *
+ * Given loop indices and conv config, computes:
+ *   - Input DRAM address (with padding detection)
+ *   - Weight DRAM address
+ *   - Output DRAM address
+ *   - Bias DRAM address
+ */
+@instantiable
+class LoopConvAddrGen(val b: GlobalConfig) extends Module {
+  val config = GemminiBallParam()
+  val DIM = config.blockSize
+  val elemSize = config.inputWidth / 8
+  val accBytes = config.accWidth / 8
+
+  @public
+  val io = IO(new Bundle {
+    // Conv config
+    val cfg = Input(new LoopConvWsConfig(b))
+    // Loop indices
+    val batch = Input(UInt(16.W))
+    val orow = Input(UInt(16.W))
+    val ocol = Input(UInt(16.W))
+    val och = Input(UInt(16.W))
+    val krow = Input(UInt(16.W))
+    val kcol = Input(UInt(16.W))
+    val kch = Input(UInt(16.W))
+    // Outputs
+    val inputAddr = Output(UInt(b.memDomain.memAddrLen.W))
+    val weightAddr = Output(UInt(b.memDomain.memAddrLen.W))
+    val outputAddr = Output(UInt(b.memDomain.memAddrLen.W))
+    val biasAddr = Output(UInt(b.memDomain.memAddrLen.W))
+    val isPadding = Output(Bool())
+  })
+
+  val cfg = io.cfg
+
+  // im2col: irow = orow * stride + krow - padding
+  val irow = (io.orow * cfg.stride).asSInt + io.krow.asSInt - cfg.padding.asSInt
+  val icol = (io.ocol * cfg.stride).asSInt + io.kcol.asSInt - cfg.padding.asSInt
+
+  // Padding detection
+  io.isPadding := irow < 0.S || irow >= cfg.in_dim.asSInt ||
+    icol < 0.S || icol >= cfg.in_dim.asSInt
+
+  // Input address: base + ((batch * in_dim * in_dim + irow * in_dim + icol) * in_channels + kch) * elemSize
+  val inputOffset =
+    (io.batch * cfg.in_dim * cfg.in_dim +
+      irow.asUInt * cfg.in_dim + icol.asUInt) * cfg.in_channels + io.kch
+
+  io.inputAddr := cfg.dram_addr_input + inputOffset * elemSize.U
+
+  // Weight address: base + ((krow * kernel_dim + kcol) * in_channels + kch) * out_channels * elemSize
+  //                 + och * elemSize
+  val weightOffset = ((io.krow * cfg.kernel_dim + io.kcol) * cfg.in_channels + io.kch) *
+    cfg.out_channels + io.och
+  io.weightAddr := cfg.dram_addr_weight + weightOffset * elemSize.U
+
+  // Output address: base + ((batch * out_dim * out_dim + orow * out_dim + ocol) * out_channels + och) * accBytes
+  val outputOffset =
+    (io.batch * cfg.out_dim * cfg.out_dim +
+      io.orow * cfg.out_dim + io.ocol) * cfg.out_channels + io.och
+
+  io.outputAddr := cfg.dram_addr_output + outputOffset * accBytes.U
+
+  // Bias address: base + och * accBytes
+  io.biasAddr := cfg.dram_addr_bias + io.och * accBytes.U
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopConvUnroller.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopConvUnroller.scala
new file mode 100644
index 00000000..a16a3265
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopConvUnroller.scala
@@ -0,0 +1,256 @@
+package framework.balldomain.prototype.gemmini
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate}
+import framework.top.GlobalConfig
+import framework.balldomain.prototype.gemmini.configs.GemminiBallParam
+
+/**
+ * LoopConvUnroller — expands LOOP_CONV_WS into LoopCmd rows.
+ *
+ * Simplified iteration:
+ *   batch → orow → ocol → och → krow → kcol → kch (innermost)
+ *
+ * Each (orow, ocol, och, krow, kcol, kch) tile generates:
+ *   MVIN input + MVIN weight → PRELOAD + COMPUTE → (on last k-iter) MVOUT
+ *
+ * First implementation: no pooling, no im2col tiling, single DIM×DIM tiles.
+ */
+@instantiable
+class LoopConvUnroller(val b: GlobalConfig) extends Module {
+  val config = GemminiBallParam()
+  val DIM = config.blockSize
+  val elemSize = config.inputWidth / 8
+  val accBytes = config.accWidth / 8
+
+  @public
+  val io = IO(new Bundle {
+    val start = Flipped(Decoupled(new LoopConvWsConfig(b)))
+    val cmd = Decoupled(new LoopCmd(b))
+    val busy = Output(Bool())
+  })
+
+  val addrGen: Instance[LoopConvAddrGen] = Instantiate(new LoopConvAddrGen(b))
+
+  // FSM states
+  val sIdle :: sAlloc :: sMvinInput :: sMvinWeight :: sCompute :: sMvout :: sFree :: sDone :: Nil = Enum(8)
+  val state = RegInit(sIdle)
+
+  val cfg = Reg(new LoopConvWsConfig(b))
+
+  // Iterator registers
+  val batch_reg = RegInit(0.U(16.W))
+  val orow_reg = RegInit(0.U(16.W))
+  val ocol_reg = RegInit(0.U(16.W))
+  val och_reg = RegInit(0.U(16.W))
+  val krow_reg = RegInit(0.U(16.W))
+  val kcol_reg = RegInit(0.U(16.W))
+  val kch_reg = RegInit(0.U(16.W))
+
+  // Is this the first k-iteration for this (orow, ocol, och)?
+  val isFirstK = krow_reg === 0.U && kcol_reg === 0.U && kch_reg === 0.U
+
+  // Is this the last k-iteration?
+  val isLastK = (krow_reg === cfg.kernel_dim - 1.U) &&
+    (kcol_reg === cfg.kernel_dim - 1.U) &&
+    (kch_reg + DIM.U >= cfg.in_channels) // Tile boundary
+
+  // Connect address generator
+  addrGen.io.cfg := cfg
+  addrGen.io.batch := batch_reg
+  addrGen.io.orow := orow_reg
+  addrGen.io.ocol := ocol_reg
+  addrGen.io.och := och_reg
+  addrGen.io.krow := krow_reg
+  addrGen.io.kcol := kcol_reg
+  addrGen.io.kch := kch_reg
+
+  // Defaults
+  io.start.ready := state === sIdle
+  io.cmd.valid := false.B
+  io.cmd.bits := 0.U.asTypeOf(new LoopCmd(b))
+  io.busy := state =/= sIdle
+
+  def emptySlot(): Valid[LoopSubCmd] = {
+    val v = Wire(Valid(new LoopSubCmd(b)))
+    v.valid := false.B
+    v.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+    v
+  }
+
+  def msetSlot(
+    bankId: UInt,
+    alloc: Boolean,
+    row: Int,
+    col: Int
+  ): Valid[LoopSubCmd] = {
+    val v = Wire(Valid(new LoopSubCmd(b)))
+    v.valid := true.B
+    v.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+    v.bits.cmdType := (if (alloc) LoopSubCmdType.MSET_ALLOC else LoopSubCmdType.MSET_FREE)
+    v.bits.bank_id := bankId
+    v.bits.bank_row := row.U
+    v.bits.bank_col := col.U
+    v
+  }
+
+  def mvinSlot(bankId: UInt, addr: UInt): Valid[LoopSubCmd] = {
+    val v = Wire(Valid(new LoopSubCmd(b)))
+    v.valid := true.B
+    v.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+    v.bits.cmdType := LoopSubCmdType.MVIN
+    v.bits.bank_id := bankId
+    v.bits.dram_addr := addr
+    v.bits.iter := DIM.U
+    v
+  }
+
+  // Advance k-iterators (kch → kcol → krow), then output iterators (och → ocol → orow → batch)
+  def advanceIter(): Unit = {
+    when(kch_reg + DIM.U < cfg.in_channels) {
+      kch_reg := kch_reg + DIM.U
+    }.elsewhen(kcol_reg + 1.U < cfg.kernel_dim) {
+      kch_reg := 0.U; kcol_reg := kcol_reg + 1.U
+    }.elsewhen(krow_reg + 1.U < cfg.kernel_dim) {
+      kch_reg := 0.U; kcol_reg := 0.U; krow_reg := krow_reg + 1.U
+    }.elsewhen(och_reg + DIM.U < cfg.out_channels) {
+      kch_reg := 0.U; kcol_reg := 0.U; krow_reg := 0.U; och_reg := och_reg + DIM.U
+    }.elsewhen(ocol_reg + 1.U < cfg.out_dim) {
+      kch_reg := 0.U; kcol_reg := 0.U; krow_reg := 0.U; och_reg := 0.U
+      ocol_reg := ocol_reg + 1.U
+    }.elsewhen(orow_reg + 1.U < cfg.out_dim) {
+      kch_reg := 0.U; kcol_reg := 0.U; krow_reg := 0.U; och_reg := 0.U; ocol_reg := 0.U
+      orow_reg := orow_reg + 1.U
+    }.elsewhen(batch_reg + 1.U < cfg.batch_size) {
+      kch_reg := 0.U; kcol_reg := 0.U; krow_reg := 0.U; och_reg := 0.U; ocol_reg := 0.U; orow_reg := 0.U
+      batch_reg := batch_reg + 1.U
+    }.otherwise {
+      state := sFree // All iterations done
+    }
+  }
+
+  // Track whether all iterations are done (used to decide next state)
+  val allDone = (batch_reg === cfg.batch_size - 1.U) &&
+    (orow_reg === cfg.out_dim - 1.U) && (ocol_reg === cfg.out_dim - 1.U) &&
+    (och_reg + DIM.U >= cfg.out_channels) && isLastK
+
+  switch(state) {
+    is(sIdle) {
+      io.start.ready := true.B
+      when(io.start.fire) {
+        cfg := io.start.bits
+        batch_reg := 0.U; orow_reg := 0.U; ocol_reg := 0.U; och_reg := 0.U
+        krow_reg := 0.U; kcol_reg := 0.U; kch_reg := 0.U
+        state := sAlloc
+      }
+    }
+
+    // [MSET_alloc(input), MSET_alloc(weight), MSET_alloc(output), --]
+    is(sAlloc) {
+      io.cmd.valid := true.B
+      io.cmd.bits.slots(0) := msetSlot(cfg.bank_input, alloc = true, row = 1, col = 1)
+      io.cmd.bits.slots(1) := msetSlot(cfg.bank_weight, alloc = true, row = 1, col = 1)
+      io.cmd.bits.slots(2) := msetSlot(cfg.bank_output, alloc = true, row = 1, col = 4)
+      io.cmd.bits.slots(3) := emptySlot()
+      when(io.cmd.fire) {
+        state := sMvinInput
+      }
+    }
+
+    // [MVIN_input, MVIN_weight, --, --]
+    is(sMvinInput) {
+      io.cmd.valid := true.B
+      val inputSlot = Mux(
+        addrGen.io.isPadding,
+        emptySlot(), // Skip MVIN if padding (load zeros implicitly)
+        mvinSlot(cfg.bank_input, addrGen.io.inputAddr)
+      )
+      io.cmd.bits.slots(0) := inputSlot
+      io.cmd.bits.slots(1) := mvinSlot(cfg.bank_weight, addrGen.io.weightAddr)
+      io.cmd.bits.slots(2) := emptySlot()
+      io.cmd.bits.slots(3) := emptySlot()
+      when(io.cmd.fire) {
+        state := sCompute
+      }
+    }
+
+    // [PRELOAD, COMPUTE, --, --]
+    is(sCompute) {
+      io.cmd.valid := true.B
+      val preSlot = Wire(Valid(new LoopSubCmd(b)))
+      preSlot.valid := true.B
+      preSlot.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+      preSlot.bits.cmdType := LoopSubCmdType.PRELOAD
+      preSlot.bits.op1_bank := cfg.bank_weight
+      preSlot.bits.wr_bank := cfg.bank_output
+      preSlot.bits.iter := DIM.U
+
+      val compSlot = Wire(Valid(new LoopSubCmd(b)))
+      compSlot.valid := true.B
+      compSlot.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+      compSlot.bits.cmdType := LoopSubCmdType.COMPUTE
+      compSlot.bits.op1_bank := cfg.bank_weight
+      compSlot.bits.op2_bank := cfg.bank_input
+      compSlot.bits.wr_bank := cfg.bank_output
+      compSlot.bits.compute_mode := Mux(isFirstK, 0.U, 1.U) // PRELOADED first, then ACCUMULATED
+      compSlot.bits.iter := DIM.U
+
+      io.cmd.bits.slots(0) := preSlot
+      io.cmd.bits.slots(1) := compSlot
+      io.cmd.bits.slots(2) := emptySlot()
+      io.cmd.bits.slots(3) := emptySlot()
+
+      when(io.cmd.fire) {
+        when(isLastK) {
+          state := sMvout // Need to output results after last k iteration
+        }.otherwise {
+          advanceIter()
+          when(state =/= sFree)(state := sMvinInput)
+        }
+      }
+    }
+
+    // [MVOUT, --, --, --]
+    is(sMvout) {
+      io.cmd.valid := true.B
+      val mvSlot = Wire(Valid(new LoopSubCmd(b)))
+      mvSlot.valid := true.B
+      mvSlot.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+      mvSlot.bits.cmdType := LoopSubCmdType.MVOUT
+      mvSlot.bits.bank_id := cfg.bank_output
+      mvSlot.bits.dram_addr := addrGen.io.outputAddr
+      mvSlot.bits.iter := DIM.U
+
+      io.cmd.bits.slots(0) := mvSlot
+      io.cmd.bits.slots(1) := emptySlot()
+      io.cmd.bits.slots(2) := emptySlot()
+      io.cmd.bits.slots(3) := emptySlot()
+
+      when(io.cmd.fire) {
+        when(allDone) {
+          state := sFree
+        }.otherwise {
+          advanceIter()
+          when(state =/= sFree)(state := sMvinInput)
+        }
+      }
+    }
+
+    // [MSET_free(input), MSET_free(weight), MSET_free(output), --]
+    is(sFree) {
+      io.cmd.valid := true.B
+      io.cmd.bits.slots(0) := msetSlot(cfg.bank_input, alloc = false, row = 0, col = 0)
+      io.cmd.bits.slots(1) := msetSlot(cfg.bank_weight, alloc = false, row = 0, col = 0)
+      io.cmd.bits.slots(2) := msetSlot(cfg.bank_output, alloc = false, row = 0, col = 0)
+      io.cmd.bits.slots(3) := emptySlot()
+      when(io.cmd.fire) {
+        state := sDone
+      }
+    }
+
+    is(sDone) {
+      state := sIdle
+    }
+  }
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopMatmulUnroller.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopMatmulUnroller.scala
new file mode 100644
index 00000000..453dd074
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/LoopMatmulUnroller.scala
@@ -0,0 +1,262 @@
+package framework.balldomain.prototype.gemmini
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.top.GlobalConfig
+import framework.balldomain.prototype.gemmini.configs.GemminiBallParam
+
+/**
+ * LoopMatmulUnroller — expands LOOP_WS into a stream of LoopCmd rows.
+ *
+ * FSM: sIdle → sAllocBanks → sPrimeLoad → sMainLoop (Row1/Row2) → sDrainLast → sFreeBanks → sDone
+ *
+ * Row layout per iteration:
+ *   Row 1: [PRELOAD(cur), COMPUTE(cur)]
+ *   Row 2: [MVOUT(cur), MVIN_A(next), MVIN_B(next)]
+ * Last iteration: only Row 1, then sDrainLast handles final MVOUT.
+ */
+@instantiable
+class LoopMatmulUnroller(val b: GlobalConfig) extends Module {
+  val config = GemminiBallParam()
+  val DIM = config.blockSize
+  val elemSize = config.inputWidth / 8
+  val accBytes = config.accWidth / 8
+  val bankIdLen = log2Up(b.memDomain.bankNum)
+
+  @public
+  val io = IO(new Bundle {
+    val start = Flipped(Decoupled(new LoopWsConfig(b)))
+    val cmd = Decoupled(new LoopCmd(b))
+    val busy = Output(Bool())
+  })
+
+  // FSM states
+  val sIdle :: sAllocBanks :: sPrimeLoad :: sMainRow1 :: sMainRow2 :: sDrainLast :: sFreeBanks :: sDone :: Nil =
+    Enum(8)
+  val state = RegInit(sIdle)
+
+  // Config registers (latched at start)
+  val cfg = Reg(new LoopWsConfig(b))
+
+  // Loop iterator registers
+  val i_reg = RegInit(0.U(16.W))
+  val j_reg = RegInit(0.U(16.W))
+  val k_reg = RegInit(0.U(16.W))
+
+  // Total iterations and current iteration index
+  val totalIter = cfg.max_i * cfg.max_j * cfg.max_k
+  val curIter = RegInit(0.U(32.W))
+
+  // Whether this is the last iteration
+  val isLastIter = curIter === (totalIter - 1.U)
+
+  // Defaults
+  io.start.ready := state === sIdle
+  io.cmd.valid := false.B
+  io.cmd.bits := 0.U.asTypeOf(new LoopCmd(b))
+  io.busy := state =/= sIdle
+
+  // Helper to build an invalid LoopSubCmd slot
+  def emptySlot(): Valid[LoopSubCmd] = {
+    val v = Wire(Valid(new LoopSubCmd(b)))
+    v.valid := false.B
+    v.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+    v
+  }
+
+  // Helper to build an MSET slot
+  def msetSlot(
+    bankId: UInt,
+    alloc: Boolean,
+    row: Int,
+    col: Int
+  ): Valid[LoopSubCmd] = {
+    val v = Wire(Valid(new LoopSubCmd(b)))
+    v.valid := true.B
+    v.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+    v.bits.cmdType := (if (alloc) LoopSubCmdType.MSET_ALLOC else LoopSubCmdType.MSET_FREE)
+    v.bits.bank_id := bankId
+    v.bits.bank_row := row.U
+    v.bits.bank_col := col.U
+    v
+  }
+
+  // Helper to build an MVIN slot
+  def mvinSlot(bankId: UInt, addr: UInt, iter: UInt): Valid[LoopSubCmd] = {
+    val v = Wire(Valid(new LoopSubCmd(b)))
+    v.valid := true.B
+    v.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+    v.bits.cmdType := LoopSubCmdType.MVIN
+    v.bits.bank_id := bankId
+    v.bits.dram_addr := addr
+    v.bits.iter := iter
+    v
+  }
+
+  // Helper to build an MVOUT slot
+  def mvoutSlot(bankId: UInt, addr: UInt, iter: UInt): Valid[LoopSubCmd] = {
+    val v = Wire(Valid(new LoopSubCmd(b)))
+    v.valid := true.B
+    v.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+    v.bits.cmdType := LoopSubCmdType.MVOUT
+    v.bits.bank_id := bankId
+    v.bits.dram_addr := addr
+    v.bits.iter := iter
+    v
+  }
+
+  // Address computation
+  // addr_a(i,k) = dram_addr_a + i * stride_a + k * DIM * elemSize
+  // addr_b(k,j) = dram_addr_b + k * stride_b + j * DIM * elemSize
+  // addr_c(i,j) = dram_addr_c + i * stride_c + j * DIM * accBytes
+  def addrA(i: UInt, k: UInt): UInt =
+    cfg.dram_addr_a + i * cfg.stride_a + k * (DIM * elemSize).U
+
+  def addrB(k: UInt, j: UInt): UInt =
+    cfg.dram_addr_b + k * cfg.stride_b + j * (DIM * elemSize).U
+
+  def addrC(i: UInt, j: UInt): UInt =
+    cfg.dram_addr_c + i * cfg.stride_c + j * (DIM * accBytes).U
+
+  // Next iterator values (advance k, then j, then i)
+  val next_k = Wire(UInt(16.W))
+  val next_j = Wire(UInt(16.W))
+  val next_i = Wire(UInt(16.W))
+
+  when(k_reg + 1.U < cfg.max_k) {
+    next_k := k_reg + 1.U
+    next_j := j_reg
+    next_i := i_reg
+  }.elsewhen(j_reg + 1.U < cfg.max_j) {
+    next_k := 0.U
+    next_j := j_reg + 1.U
+    next_i := i_reg
+  }.otherwise {
+    next_k := 0.U
+    next_j := 0.U
+    next_i := i_reg + 1.U
+  }
+
+  // Compute mode: first k iteration uses PRELOADED (0), rest use ACCUMULATED (1)
+  val computeMode = Mux(k_reg === 0.U, 0.U(2.W), 1.U(2.W))
+
+  switch(state) {
+    is(sIdle) {
+      io.start.ready := true.B
+      when(io.start.fire) {
+        cfg := io.start.bits
+        i_reg := 0.U
+        j_reg := 0.U
+        k_reg := 0.U
+        curIter := 0.U
+        state := sAllocBanks
+      }
+    }
+
+    // Row 0: [MSET_alloc(A), MSET_alloc(B), MSET_alloc(C), --]
+    is(sAllocBanks) {
+      io.cmd.valid := true.B
+      io.cmd.bits.slots(0) := msetSlot(cfg.bank_a, alloc = true, row = 1, col = 1)
+      io.cmd.bits.slots(1) := msetSlot(cfg.bank_b, alloc = true, row = 1, col = 1)
+      io.cmd.bits.slots(2) := msetSlot(cfg.bank_c, alloc = true, row = 1, col = 4) // accWidth = 4x inputWidth
+      io.cmd.bits.slots(3) := emptySlot()
+      when(io.cmd.fire) {
+        state := sPrimeLoad
+      }
+    }
+
+    // Row 1: [MVIN_A(0), MVIN_B(0), --, --]
+    is(sPrimeLoad) {
+      io.cmd.valid := true.B
+      io.cmd.bits.slots(0) := mvinSlot(cfg.bank_a, addrA(0.U, 0.U), DIM.U)
+      io.cmd.bits.slots(1) := mvinSlot(cfg.bank_b, addrB(0.U, 0.U), DIM.U)
+      io.cmd.bits.slots(2) := emptySlot()
+      io.cmd.bits.slots(3) := emptySlot()
+      when(io.cmd.fire) {
+        state := sMainRow1
+      }
+    }
+
+    // Main loop Row 1: [PRELOAD(cur), COMPUTE(cur), --, --]
+    is(sMainRow1) {
+      io.cmd.valid := true.B
+      val preSlot = Wire(Valid(new LoopSubCmd(b)))
+      preSlot.valid := true.B
+      preSlot.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+      preSlot.bits.cmdType := LoopSubCmdType.PRELOAD
+      preSlot.bits.op1_bank := cfg.bank_a // preload A (activations) into systolic array
+      preSlot.bits.wr_bank := cfg.bank_c
+      preSlot.bits.iter := DIM.U
+
+      val compSlot = Wire(Valid(new LoopSubCmd(b)))
+      compSlot.valid := true.B
+      compSlot.bits := 0.U.asTypeOf(new LoopSubCmd(b))
+      compSlot.bits.cmdType := LoopSubCmdType.COMPUTE
+      compSlot.bits.op1_bank := cfg.bank_a
+      compSlot.bits.op2_bank := cfg.bank_b
+      compSlot.bits.wr_bank := cfg.bank_c
+      compSlot.bits.compute_mode := computeMode
+      compSlot.bits.iter := DIM.U
+
+      io.cmd.bits.slots(0) := preSlot
+      io.cmd.bits.slots(1) := compSlot
+      io.cmd.bits.slots(2) := emptySlot()
+      io.cmd.bits.slots(3) := emptySlot()
+
+      when(io.cmd.fire) {
+        when(isLastIter) {
+          state := sDrainLast // Skip Row 2 for last iteration
+        }.otherwise {
+          state := sMainRow2
+        }
+      }
+    }
+
+    // Main loop Row 2: [MVOUT(cur), MVIN_A(next), MVIN_B(next), --]
+    is(sMainRow2) {
+      io.cmd.valid := true.B
+      io.cmd.bits.slots(0) := mvoutSlot(cfg.bank_c, addrC(i_reg, j_reg), DIM.U)
+      io.cmd.bits.slots(1) := mvinSlot(cfg.bank_a, addrA(next_i, next_k), DIM.U)
+      io.cmd.bits.slots(2) := mvinSlot(cfg.bank_b, addrB(next_k, next_j), DIM.U)
+      io.cmd.bits.slots(3) := emptySlot()
+
+      when(io.cmd.fire) {
+        // Advance iterators
+        i_reg := next_i
+        j_reg := next_j
+        k_reg := next_k
+        curIter := curIter + 1.U
+        state := sMainRow1
+      }
+    }
+
+    // Drain: emit final MVOUT for last iteration
+    is(sDrainLast) {
+      io.cmd.valid := true.B
+      io.cmd.bits.slots(0) := mvoutSlot(cfg.bank_c, addrC(i_reg, j_reg), DIM.U)
+      io.cmd.bits.slots(1) := emptySlot()
+      io.cmd.bits.slots(2) := emptySlot()
+      io.cmd.bits.slots(3) := emptySlot()
+      when(io.cmd.fire) {
+        state := sFreeBanks
+      }
+    }
+
+    // Free: [MSET_free(A), MSET_free(B), MSET_free(C), --]
+    is(sFreeBanks) {
+      io.cmd.valid := true.B
+      io.cmd.bits.slots(0) := msetSlot(cfg.bank_a, alloc = false, row = 0, col = 0)
+      io.cmd.bits.slots(1) := msetSlot(cfg.bank_b, alloc = false, row = 0, col = 0)
+      io.cmd.bits.slots(2) := msetSlot(cfg.bank_c, alloc = false, row = 0, col = 0)
+      io.cmd.bits.slots(3) := emptySlot()
+      when(io.cmd.fire) {
+        state := sDone
+      }
+    }
+
+    is(sDone) {
+      state := sIdle
+    }
+  }
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/configs/GemminiBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/gemmini/configs/GemminiBallParam.scala
new file mode 100644
index 00000000..db5015c4
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/configs/GemminiBallParam.scala
@@ -0,0 +1,30 @@
+package framework.balldomain.prototype.gemmini.configs
+
+import upickle.default._
+
+case class GemminiBallParam(
+  meshRows: Int,
+  meshColumns: Int,
+  tileRows: Int,
+  tileColumns: Int,
+  inputWidth: Int,
+  accWidth: Int,
+  tileLatency: Int,
+  outputDelay: Int) {
+
+  val totalRows: Int = meshRows * tileRows
+  val totalColumns: Int = meshColumns * tileColumns
+  val blockSize: Int = totalRows // == totalColumns (must be square)
+}
+
+object GemminiBallParam {
+  implicit val rw: ReadWriter[GemminiBallParam] = macroRW
+
+  def apply(): GemminiBallParam = {
+    val jsonStr = scala.io.Source.fromFile(
+      "src/main/scala/framework/balldomain/prototype/gemmini/configs/default.json"
+    ).mkString
+    read[GemminiBallParam](jsonStr)
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/gemmini/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/gemmini/configs/default.json
new file mode 100644
index 00000000..44eea666
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/gemmini/configs/default.json
@@ -0,0 +1,10 @@
+{
+  "meshRows": 16,
+  "meshColumns": 16,
+  "tileRows": 1,
+  "tileColumns": 1,
+  "inputWidth": 8,
+  "accWidth": 32,
+  "tileLatency": 0,
+  "outputDelay": 0
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/im2col/FIFO.scala b/arch/src/main/scala/framework/balldomain/prototype/im2col/FIFO.scala
new file mode 100644
index 00000000..97971d52
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/im2col/FIFO.scala
@@ -0,0 +1,39 @@
+package framework.balldomain.prototype.im2col
+
+import chisel3._
+import chisel3.util._
+
+class RowSlotFIFO(maxRows: Int) extends Module {
+
+  val io = IO(new Bundle {
+    val kRows = Input(UInt(log2Ceil(maxRows + 1).W))
+    val init = Input(Bool())
+    val advance = Input(Bool())
+    val head = Output(UInt(log2Ceil(maxRows).W))
+    val slotToOverwrite = Output(UInt(log2Ceil(maxRows).W))
+  })
+
+  private val headReg = RegInit(0.U(log2Ceil(maxRows).W))
+
+  io.head := headReg
+  io.slotToOverwrite := headReg
+
+  when(io.init) {
+    headReg := 0.U
+  }.elsewhen(io.advance && (io.kRows > 0.U)) {
+    when(headReg + 1.U === io.kRows) {
+      headReg := 0.U
+    }.otherwise {
+      headReg := headReg + 1.U
+    }
+  }
+}
+
+object RowSlotFIFO {
+
+  def logicalToPhysical(head: UInt, logicalRow: UInt, kRows: UInt): UInt = {
+    val sum = head + logicalRow
+    Mux(sum >= kRows, sum - kRows, sum)
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/im2col/Im2col.scala b/arch/src/main/scala/framework/balldomain/prototype/im2col/Im2col.scala
b/arch/src/main/scala/framework/balldomain/prototype/im2col/Im2col.scala new file mode 100644 index 00000000..97e67d7d --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/im2col/Im2col.scala @@ -0,0 +1,239 @@ +package framework.balldomain.prototype.im2col + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} + +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.balldomain.blink.{BallStatus, BankRead, BankWrite} +import framework.top.GlobalConfig +import framework.balldomain.prototype.im2col.configs.Im2colBallParam + +/** + * Im2col — FSM scheduler that coordinates LineBufferManager and StreamWriter. + * + * Optimizations applied: + * A) Eliminated elemBuffer — elements stream directly from lineBuffer to pack register + * B) Eliminated hardware divider — dual counters (kRowIdx, kColIdx) replace t/kCol, t%kCol + */ +@instantiable +class Im2col(val b: GlobalConfig) extends Module { + private val maxK = Im2colBallParam().InputNum + + private val mapping = b.ballDomain.ballIdMappings + .find(_.ballName == "Im2colBall") + .getOrElse(throw new IllegalArgumentException("Im2colBall not found in config")) + + private val inBW = mapping.inBW + private val outBW = mapping.outBW + + @public val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + require(inBW >= 1, "[Im2col] inBW must be >= 1") + require(outBW >= 1, "[Im2col] outBW must be >= 1") + + // --- Sub-modules --- + val lineBuf: Instance[LineBufferManager] = Instantiate(new LineBufferManager(b)) + val writer: Instance[StreamWriter] = Instantiate(new StreamWriter(b)) + + // --- FSM --- + val idle :: preload :: stream :: flushing :: loadNext :: complete :: Nil = Enum(6) + val state = 
RegInit(idle) + + // --- Registers --- + private val robIdReg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + private val isSubReg = RegInit(false.B) + private val subRobIdReg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + private val rBankReg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + private val wBankReg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + private val rBaseBeatReg = RegInit(0.U(32.W)) + private val wBaseBeatReg = RegInit(0.U(32.W)) + + private val kRowReg = RegInit(0.U(log2Ceil(maxK + 1).W)) + private val kColReg = RegInit(0.U(log2Ceil(maxK + 1).W)) + private val inRowReg = RegInit(0.U(16.W)) + private val inColReg = RegInit(0.U(16.W)) + private val startRowReg = RegInit(0.U(16.W)) + private val startColReg = RegInit(0.U(16.W)) + private val rowPtrReg = RegInit(0.U(16.W)) + private val colPtrReg = RegInit(0.U(16.W)) + + // Dual counters (optimization B — replaces hardware divider) + private val kRowIdxReg = RegInit(0.U(log2Ceil(maxK + 1).W)) + private val kColIdxReg = RegInit(0.U(log2Ceil(maxK + 1).W)) + private val elemDoneReg = RegInit(false.B) + + // --- Derived signals --- + private val rowMax = inRowReg - kRowReg + private val colMax = inColReg - kColReg + private val rowEnd = rowPtrReg === (startRowReg + rowMax) + private val colEnd = colPtrReg === (startColReg + colMax) + private val isLastWindow = rowEnd && colEnd + + // --- Top-level IO defaults --- + io.cmdReq.ready := (state === idle) + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := robIdReg + io.cmdResp.bits.is_sub := isSubReg + io.cmdResp.bits.sub_rob_id := subRobIdReg + io.status.idle := (state === idle) + io.status.running := (state =/= idle) && (state =/= complete) + + // --- Wire up LineBufferManager --- + for (i <- 0 until inBW) { + lineBuf.io.bankRead(i) <> io.bankRead(i) + } + lineBuf.io.startPreload := false.B + lineBuf.io.startLoadNext := false.B + lineBuf.io.kRow := kRowReg + lineBuf.io.inCol := inColReg + lineBuf.io.rowPtr := rowPtrReg + 
lineBuf.io.rBaseBeat := rBaseBeatReg + lineBuf.io.rBankId := rBankReg + lineBuf.io.robId := robIdReg + lineBuf.io.elemReq.kRowIdx := kRowIdxReg + lineBuf.io.elemReq.kColIdx := kColIdxReg + lineBuf.io.elemReq.colPtr := colPtrReg + + // --- Wire up StreamWriter --- + for (i <- 0 until outBW) { + writer.io.bankWrite(i) <> io.bankWrite(i) + } + writer.io.start := false.B + writer.io.init := false.B + writer.io.flush := false.B + writer.io.wBaseBeat := wBaseBeatReg + writer.io.wBankId := wBankReg + writer.io.robId := robIdReg + + // Element stream: connect lineBuffer output to writer input + writer.io.elemIn.valid := false.B + writer.io.elemIn.bits := lineBuf.io.elemData + + // --- FSM --- + switch(state) { + is(idle) { + when(io.cmdReq.fire) { + robIdReg := io.cmdReq.bits.rob_id + isSubReg := io.cmdReq.bits.is_sub + subRobIdReg := io.cmdReq.bits.sub_rob_id + rBankReg := io.cmdReq.bits.cmd.op1_bank + wBankReg := io.cmdReq.bits.cmd.wr_bank + + kColReg := io.cmdReq.bits.cmd.special(3, 0) + kRowReg := io.cmdReq.bits.cmd.special(7, 4) + inColReg := io.cmdReq.bits.cmd.special(12, 8) + inRowReg := io.cmdReq.bits.cmd.special(22, 13) + startColReg := io.cmdReq.bits.cmd.special(27, 23) + startRowReg := io.cmdReq.bits.cmd.special(37, 28) + + rowPtrReg := io.cmdReq.bits.cmd.special(37, 28) + colPtrReg := io.cmdReq.bits.cmd.special(27, 23) + + rBaseBeatReg := 0.U + wBaseBeatReg := 0.U + kRowIdxReg := 0.U + kColIdxReg := 0.U + elemDoneReg := false.B + + val cmdKCol = io.cmdReq.bits.cmd.special(3, 0) + val cmdKRow = io.cmdReq.bits.cmd.special(7, 4) + val cmdInCol = io.cmdReq.bits.cmd.special(12, 8) + val cmdInRow = io.cmdReq.bits.cmd.special(22, 13) + val invalidShape = (cmdKCol === 0.U) || (cmdKRow === 0.U) || (cmdInCol === 0.U) || (cmdInRow === 0.U) || + (cmdInCol < cmdKCol) || (cmdInRow < cmdKRow) + + when(invalidShape) { + state := complete + }.otherwise { + lineBuf.io.startPreload := true.B + state := preload + } + } + } + + is(preload) { + when(lineBuf.io.loadDone) { + 
kRowIdxReg := 0.U + kColIdxReg := 0.U + elemDoneReg := false.B + writer.io.init := true.B + state := stream + } + } + + is(stream) { + // Stream elements from lineBuffer through writer + when(!elemDoneReg && writer.io.elemIn.ready) { + writer.io.elemIn.valid := true.B + + // Advance dual counters + val isLastElem = (kRowIdxReg === (kRowReg - 1.U)) && (kColIdxReg === (kColReg - 1.U)) + when(isLastElem) { + elemDoneReg := true.B + }.otherwise { + when(kColIdxReg === (kColReg - 1.U)) { + kColIdxReg := 0.U + kRowIdxReg := kRowIdxReg + 1.U + }.otherwise { + kColIdxReg := kColIdxReg + 1.U + } + } + } + + // Window done — transition without flushing (pack continues across windows) + when(elemDoneReg && !writer.io.busy) { + when(isLastWindow) { + // Last window: flush remaining partial pack + writer.io.flush := true.B + state := flushing + }.elsewhen(colEnd) { + // Row boundary: need to load next row + colPtrReg := startColReg + rowPtrReg := rowPtrReg + 1.U + kRowIdxReg := 0.U + kColIdxReg := 0.U + elemDoneReg := false.B + lineBuf.io.startLoadNext := true.B + state := loadNext + }.otherwise { + // Column slide: directly start next window + colPtrReg := colPtrReg + 1.U + kRowIdxReg := 0.U + kColIdxReg := 0.U + elemDoneReg := false.B + state := stream + } + } + } + + is(flushing) { + // Wait for writer to finish flushing (write request must fire) + when(!writer.io.busy) { + state := complete + } + } + + is(loadNext) { + when(lineBuf.io.loadDone) { + kRowIdxReg := 0.U + kColIdxReg := 0.U + elemDoneReg := false.B + state := stream + } + } + + is(complete) { + io.cmdResp.valid := true.B + when(io.cmdResp.fire) { + state := idle + } + } + } +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/im2col/Im2colBall.scala b/arch/src/main/scala/framework/balldomain/prototype/im2col/Im2colBall.scala new file mode 100644 index 00000000..0a47fd06 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/im2col/Im2colBall.scala @@ -0,0 +1,46 @@ +package 
framework.balldomain.prototype.im2col + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.balldomain.prototype.im2col.Im2col +import framework.top.GlobalConfig + +/** + * Im2colBall - An Im2col computation Ball that complies with the Blink protocol + */ +class Im2colBall(val b: GlobalConfig) extends Module with HasBlink { + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "Im2colBall") + .getOrElse(throw new IllegalArgumentException("Im2colBall not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + + // Instantiate Im2col + val im2colUnit: Instance[Im2col] = Instantiate(new Im2col(b)) + + // Connect command interface + im2colUnit.io.cmdReq <> io.cmdReq + im2colUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + im2colUnit.io.bankRead(i) <> io.bankRead(i) + } + + // Connect SRAM write interface - Im2col needs to write to scratchpad + for (i <- 0 until outBW) { + im2colUnit.io.bankWrite(i) <> io.bankWrite(i) + } + + // Connect Status signals - directly obtained from internal unit + io.status <> im2colUnit.io.status + + // Ball does not use SubROB: tie off subRobReq + io.subRobReq.valid := false.B + io.subRobReq.bits := SubRobRow.tieOff(b) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/im2col/LineBufferManager.scala b/arch/src/main/scala/framework/balldomain/prototype/im2col/LineBufferManager.scala new file mode 100644 index 00000000..0561ad94 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/im2col/LineBufferManager.scala @@ -0,0 +1,179 @@ +package framework.balldomain.prototype.im2col + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} + 
+import framework.balldomain.blink.BankRead +import framework.top.GlobalConfig +import framework.balldomain.prototype.im2col.configs.Im2colBallParam + +/** + * LineBufferManager — manages lineBuffer loading and element extraction. + * + * Handles preload (loading kRow rows) and load_next_row (loading 1 new row + * with FIFO rotation). Provides a combinational element read port that + * extracts a single element from the lineBuffer given (kRowIdx, kColIdx). + */ +@instantiable +class LineBufferManager(val b: GlobalConfig) extends Module { + private val maxK = Im2colBallParam().InputNum + private val elemWidth = Im2colBallParam().inputWidth + private val bankWidth = b.memDomain.bankWidth + private val lanesPerBeat = 16 + private val maxInCol = 32 + private val maxInColWords = (maxInCol + lanesPerBeat - 1) / lanesPerBeat + + private val mapping = b.ballDomain.ballIdMappings + .find(_.ballName == "Im2colBall") + .getOrElse(throw new IllegalArgumentException("Im2colBall not found in config")) + + private val inBW = mapping.inBW + + @public val io = IO(new Bundle { + // SRAM read port (directly connected to Ball's bankRead) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + + // Control inputs + val startPreload = Input(Bool()) // pulse: begin preloading kRow rows + val startLoadNext = Input(Bool()) // pulse: begin loading 1 new row + + // Configuration (latched by Im2col on cmdReq.fire) + val kRow = Input(UInt(log2Ceil(maxK + 1).W)) + val inCol = Input(UInt(16.W)) + val rowPtr = Input(UInt(16.W)) + val rBaseBeat = Input(UInt(32.W)) + val rBankId = Input(UInt(log2Up(b.memDomain.bankNum).W)) + val robId = Input(UInt(log2Up(b.frontend.rob_entries).W)) + + // Status outputs + val loadDone = Output(Bool()) // high when load operation is complete + + // Element read port (combinational) + val elemReq = new Bundle { + val kRowIdx = Input(UInt(log2Ceil(maxK + 1).W)) + val kColIdx = Input(UInt(log2Ceil(maxK + 1).W)) + val colPtr = Input(UInt(16.W)) + } + + val elemData = 
Output(UInt(elemWidth.W)) + }) + + private def ceilDiv(a: UInt, d: Int): UInt = (a + (d - 1).U) / d.U + private val inColWords = ceilDiv(io.inCol, lanesPerBeat) + + // Line buffer storage + private val lineBuffer = RegInit(VecInit(Seq.fill(maxK)(VecInit(Seq.fill(maxInColWords)(0.U(bankWidth.W)))))) + + // Row slot FIFO for circular buffer management + private val rowFifo = Module(new RowSlotFIFO(maxK)) + rowFifo.io.kRows := io.kRow + rowFifo.io.init := false.B + rowFifo.io.advance := false.B + + // Load state machine + val sIdle :: sPreload :: sLoadNext :: Nil = Enum(3) + val loadState = RegInit(sIdle) + + private val ldRowIdxReg = RegInit(0.U(log2Ceil(maxK + 1).W)) + private val ldBeatIdxReg = RegInit(0.U(log2Ceil(maxInColWords + 1).W)) + private val ldOutstandingReg = RegInit(false.B) + + io.loadDone := (loadState === sIdle) + + // Default bankRead signals + for (i <- 0 until inBW) { + io.bankRead(i).io.req.valid := false.B + io.bankRead(i).io.req.bits.addr := 0.U + io.bankRead(i).io.resp.ready := false.B + io.bankRead(i).bank_id := io.rBankId + io.bankRead(i).rob_id := io.robId + io.bankRead(i).ball_id := 0.U + io.bankRead(i).group_id := 0.U + } + + // Element extraction (combinational) — no division needed + private val startLane = io.elemReq.colPtr % lanesPerBeat.U + private val physicalSlot = RowSlotFIFO.logicalToPhysical(rowFifo.io.head, io.elemReq.kRowIdx, io.kRow) + private val laneSum = startLane + io.elemReq.kColIdx + private val beatIdx = laneSum / lanesPerBeat.U + private val laneIdx = laneSum % lanesPerBeat.U + private val beatWord = lineBuffer(physicalSlot)(beatIdx) + private val lanes = beatWord.asTypeOf(Vec(lanesPerBeat, UInt(elemWidth.W))) + io.elemData := lanes(laneIdx) + + switch(loadState) { + is(sIdle) { + when(io.startPreload) { + ldRowIdxReg := 0.U + ldBeatIdxReg := 0.U + ldOutstandingReg := false.B + rowFifo.io.init := true.B + loadState := sPreload + }.elsewhen(io.startLoadNext) { + ldBeatIdxReg := 0.U + ldOutstandingReg := false.B + 
loadState := sLoadNext + } + } + + is(sPreload) { + val doneRows = ldRowIdxReg === io.kRow + val canIssue = !doneRows && !ldOutstandingReg && (ldBeatIdxReg < inColWords) + val rowElem = io.rowPtr + ldRowIdxReg + val reqAddr = io.rBaseBeat + rowElem * inColWords + ldBeatIdxReg + + io.bankRead(0).io.req.valid := canIssue + io.bankRead(0).io.req.bits.addr := reqAddr + io.bankRead(0).io.resp.ready := ldOutstandingReg + + when(io.bankRead(0).io.req.fire) { + ldOutstandingReg := true.B + } + + when(io.bankRead(0).io.resp.fire) { + lineBuffer(ldRowIdxReg)(ldBeatIdxReg) := io.bankRead(0).io.resp.bits.data.asUInt + ldOutstandingReg := false.B + + when(ldBeatIdxReg + 1.U === inColWords) { + ldBeatIdxReg := 0.U + ldRowIdxReg := ldRowIdxReg + 1.U + }.otherwise { + ldBeatIdxReg := ldBeatIdxReg + 1.U + } + } + + when(doneRows && !ldOutstandingReg) { + loadState := sIdle + } + } + + is(sLoadNext) { + val canIssue = !ldOutstandingReg && (ldBeatIdxReg < inColWords) + val rowElem = io.rowPtr + io.kRow - 1.U + val reqAddr = io.rBaseBeat + rowElem * inColWords + ldBeatIdxReg + val targetSlot = rowFifo.io.slotToOverwrite + + io.bankRead(0).io.req.valid := canIssue + io.bankRead(0).io.req.bits.addr := reqAddr + io.bankRead(0).io.resp.ready := ldOutstandingReg + + when(io.bankRead(0).io.req.fire) { + ldOutstandingReg := true.B + } + + when(io.bankRead(0).io.resp.fire) { + lineBuffer(targetSlot)(ldBeatIdxReg) := io.bankRead(0).io.resp.bits.data.asUInt + ldOutstandingReg := false.B + + when(ldBeatIdxReg + 1.U === inColWords) { + ldBeatIdxReg := 0.U + rowFifo.io.advance := true.B + loadState := sIdle + }.otherwise { + ldBeatIdxReg := ldBeatIdxReg + 1.U + } + } + } + } +} diff --git a/arch/src/main/scala/prototype/im2col/README.md b/arch/src/main/scala/framework/balldomain/prototype/im2col/README.md similarity index 100% rename from arch/src/main/scala/prototype/im2col/README.md rename to arch/src/main/scala/framework/balldomain/prototype/im2col/README.md diff --git 
a/arch/src/main/scala/framework/balldomain/prototype/im2col/StreamWriter.scala b/arch/src/main/scala/framework/balldomain/prototype/im2col/StreamWriter.scala new file mode 100644 index 00000000..6ae89606 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/im2col/StreamWriter.scala @@ -0,0 +1,123 @@ +package framework.balldomain.prototype.im2col + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} + +import framework.balldomain.blink.BankWrite +import framework.top.GlobalConfig +import framework.balldomain.prototype.im2col.configs.Im2colBallParam + +/** + * StreamWriter — packs elements into beats and writes to SRAM. + * + * Accepts one element per cycle via elemIn, packs lanesPerBeat elements + * into a full beat, then issues a write request. Handles partial flush + * at window end. + */ +@instantiable +class StreamWriter(val b: GlobalConfig) extends Module { + private val maxK = Im2colBallParam().InputNum + private val elemWidth = Im2colBallParam().inputWidth + private val bankWidth = b.memDomain.bankWidth + private val lanesPerBeat = 16 + + private val mapping = b.ballDomain.ballIdMappings + .find(_.ballName == "Im2colBall") + .getOrElse(throw new IllegalArgumentException("Im2colBall not found in config")) + + private val outBW = mapping.outBW + + @public val io = IO(new Bundle { + // SRAM write port + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + + // Element input + val elemIn = Flipped(Decoupled(UInt(elemWidth.W))) + + // Control + val start = Input(Bool()) // pulse: reset pack state for new window (does NOT reset write address) + val init = Input(Bool()) // pulse: initialize write address for new operation + val flush = Input(Bool()) // pulse: flush partial pack at window end + + // Configuration + val wBaseBeat = Input(UInt(32.W)) // initial write address (used only on init) + val wBankId = Input(UInt(log2Up(b.memDomain.bankNum).W)) + val robId = 
Input(UInt(log2Up(b.frontend.rob_entries).W)) + + // Status + val busy = Output(Bool()) // actively writing or have pending data + val wBaseBeatOut = Output(UInt(32.W)) // updated write address + }) + + private val packCntReg = RegInit(0.U(log2Ceil(lanesPerBeat + 1).W)) + private val packReg = RegInit(VecInit(Seq.fill(lanesPerBeat)(0.U(elemWidth.W)))) + private val wrPendingReg = RegInit(false.B) + private val wAddrReg = RegInit(0.U(32.W)) + private val flushingReg = RegInit(false.B) + + io.wBaseBeatOut := wAddrReg + io.busy := wrPendingReg || flushingReg + + // Default bankWrite signals + for (i <- 0 until outBW) { + io.bankWrite(i).io.req.valid := false.B + io.bankWrite(i).io.req.bits.addr := 0.U + io.bankWrite(i).io.req.bits.data := 0.U + io.bankWrite(i).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(false.B)) + io.bankWrite(i).io.req.bits.wmode := false.B + io.bankWrite(i).io.resp.ready := false.B + io.bankWrite(i).bank_id := io.wBankId + io.bankWrite(i).rob_id := io.robId + io.bankWrite(i).ball_id := 0.U + io.bankWrite(i).group_id := 0.U + } + + io.bankWrite(0).io.resp.ready := true.B + + // Write request when pack full or flushing + io.bankWrite(0).io.req.valid := wrPendingReg + io.bankWrite(0).io.req.bits.addr := wAddrReg + io.bankWrite(0).io.req.bits.data := Cat(packReg.reverse) + io.bankWrite(0).io.req.bits.wmode := true.B + io.bankWrite(0).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(true.B)) + + // Accept elements when not pending a write + io.elemIn.ready := !wrPendingReg && !flushingReg + + when(io.init) { + wAddrReg := io.wBaseBeat + packCntReg := 0.U + wrPendingReg := false.B + flushingReg := false.B + } + + when(io.start) { + packCntReg := 0.U + wrPendingReg := false.B + flushingReg := false.B + } + + when(io.bankWrite(0).io.req.fire) { + wAddrReg := wAddrReg + 1.U + packCntReg := 0.U + wrPendingReg := false.B + flushingReg := false.B + } + + when(io.elemIn.fire) { + packReg(packCntReg) := io.elemIn.bits + val nextCnt 
= packCntReg + 1.U + packCntReg := nextCnt + when(nextCnt === lanesPerBeat.U) { + wrPendingReg := true.B + } + } + + when(io.flush && !wrPendingReg) { + when(packCntReg > 0.U) { + wrPendingReg := true.B + flushingReg := true.B + } + } +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/im2col/configs/Im2colBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/im2col/configs/Im2colBallParam.scala new file mode 100644 index 00000000..e468c565 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/im2col/configs/Im2colBallParam.scala @@ -0,0 +1,21 @@ +package framework.balldomain.prototype.im2col.configs + +import upickle.default._ + +/** + * Im2colBall Parameter + */ +case class Im2colBallParam( + InputNum: Int, + inputWidth: Int) + +object Im2colBallParam { + implicit val rw: ReadWriter[Im2colBallParam] = macroRW + + def apply(): Im2colBallParam = { + val jsonStr = + scala.io.Source.fromFile("src/main/scala/framework/balldomain/prototype/im2col/configs/default.json").mkString + read[Im2colBallParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/im2col/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/im2col/configs/default.json new file mode 100644 index 00000000..a4552216 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/im2col/configs/default.json @@ -0,0 +1,4 @@ +{ + "InputNum": 16, + "inputWidth": 8 +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/mxfp/Mxfp.scala b/arch/src/main/scala/framework/balldomain/prototype/mxfp/Mxfp.scala new file mode 100644 index 00000000..95aa3ab3 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/mxfp/Mxfp.scala @@ -0,0 +1,299 @@ +package framework.balldomain.prototype.mxfp + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} + +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import 
framework.balldomain.blink.{BallStatus, BankRead, BankWrite} +import framework.top.GlobalConfig +import framework.balldomain.prototype.mxfp.configs.MxfpBallParam + +@instantiable +class PipelinedMxfp(val b: GlobalConfig) extends Module { + val ballConfig = MxfpBallParam() + val InputNum = ballConfig.InputNum + val inputWidth = ballConfig.inputWidth + val bankWidth = b.memDomain.bankWidth + + require(InputNum == 16, s"MxfpBall v1 requires InputNum = 16, got $InputNum") + require(inputWidth == 32, s"MxfpBall v1 requires inputWidth = 32, got $inputWidth") + require(bankWidth % inputWidth == 0, s"bankWidth must be divisible by inputWidth, got bankWidth=$bankWidth inputWidth=$inputWidth") + require(bankWidth >= 96, s"MxfpBall requires bankWidth >= 96, got $bankWidth") + + val elemsPerWord = bankWidth / inputWidth + require(InputNum % elemsPerWord == 0, s"InputNum must be divisible by elemsPerWord, got InputNum=$InputNum elemsPerWord=$elemsPerWord") + val wordsPerBlock = InputNum / elemsPerWord + + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "MxfpBall") + .getOrElse(throw new IllegalArgumentException("MxfpBall not found in config")) + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + // --------------------------------------------------------------------------- + // FP32 helper functions + // --------------------------------------------------------------------------- + + private def fpSign(x: UInt): UInt = x(31) + private def fpExp(x: UInt): UInt = x(30, 23) + private def fpFrac(x: UInt): UInt = x(22, 0) + + private def isZero(x: UInt): Bool = fpExp(x) === 0.U && fpFrac(x) === 0.U + private def isSubnormal(x: UInt): Bool = fpExp(x) === 0.U && fpFrac(x) =/= 
0.U + private def isSpecial(x: UInt): Bool = fpExp(x) === "hff".U + + private def normalExpOrZero(x: UInt): UInt = + Mux(isZero(x) || isSubnormal(x) || isSpecial(x), 0.U(8.W), fpExp(x)) + + // v1 approximation: + // 4-bit magnitude under shared exponent + private def quantizeMag4(x: UInt, sharedExp: UInt): UInt = { + val exp = fpExp(x) + val frac = fpFrac(x) + + val sig24 = Cat(1.U(1.W), frac) // 24 bits + val shiftAmt = (20.U(8.W) + sharedExp - exp)(5, 0) + + val shifted = sig24 >> shiftAmt + val mag = Wire(UInt(4.W)) + mag := 0.U + + when(isZero(x) || isSubnormal(x)) { + mag := 0.U + }.elsewhen(isSpecial(x)) { + mag := 15.U + }.elsewhen(exp > sharedExp) { + mag := 15.U + }.otherwise { + mag := Mux(shifted >= 15.U, 15.U, shifted(3, 0)) + } + + mag + } + + // Pack 16 FP32 values into one MX6 block + // + // Layout (LSB first): + // [7:0] : global exponent + // [15:8] : 8 micro bits + // [95:16] : 16 * (sign + 4-bit mag) = 80 bits + // [bankWidth-1:96] : zero + private def packMx6Block(elems: Seq[UInt]): UInt = { + require(elems.length == InputNum, s"packMx6Block expects $InputNum elements, got ${elems.length}") + + val exps = elems.map(normalExpOrZero) + val globalExp = exps.reduce((a, b) => Mux(a > b, a, b)) + + val microBits = (0 until InputNum / 2).map { p => + val e0 = exps(2 * p) + val e1 = exps(2 * p + 1) + val pairMax = Mux(e0 > e1, e0, e1) + (globalExp =/= 0.U) && (pairMax + 1.U <= globalExp) + } + + val elemPayloads = (0 until InputNum).map { i => + val pairIdx = i / 2 + val localExp = Mux(microBits(pairIdx), globalExp - 1.U, globalExp) + val signBit = fpSign(elems(i)) + val mag4 = quantizeMag4(elems(i), localExp) + Cat(signBit, mag4) // 5 bits + } + + val microPacked = Cat(microBits.reverse.map(_.asUInt)) // 8 bits + val elemPacked = Cat(elemPayloads.reverse) // 80 bits + val packed96 = Cat(elemPacked, microPacked, globalExp) + + if (bankWidth > 96) { + Cat(0.U((bankWidth - 96).W), packed96) + } else { + packed96(bankWidth - 1, 0) + } + } + + // 
--------------------------------------------------------------------------- + // ROB bookkeeping + // --------------------------------------------------------------------------- + + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + + when(io.cmdReq.fire) { + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + } + + for (i <- 0 until inBW) { + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U + } + + // --------------------------------------------------------------------------- + // State machine + // iter is interpreted as number of MX blocks + // Each block consumes wordsPerBlock input words and produces 1 output word + // --------------------------------------------------------------------------- + + val idle :: sRead :: sPack :: sWrite :: complete :: Nil = Enum(5) + val state = RegInit(idle) + + val readCntWidth = log2Ceil(wordsPerBlock + 1) + + val fp32Buf = RegInit(VecInit(Seq.fill(InputNum)(0.U(inputWidth.W)))) + val packedBlockReg = RegInit(0.U(bankWidth.W)) + + val readReqCounter = RegInit(0.U(readCntWidth.W)) + val readRespCounter = RegInit(0.U(readCntWidth.W)) + + val raddr_reg = RegInit(0.U(b.frontend.iter_len.W)) + val waddr_reg = RegInit(0.U(b.frontend.iter_len.W)) + val rbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val wbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val remainingBlocks = RegInit(0.U(b.frontend.iter_len.W)) + + val writeMaskReg = RegInit(VecInit(Seq.fill(b.memDomain.bankMaskLen)(0.U(1.W)))) + + // --------------------------------------------------------------------------- + // Default IO + // --------------------------------------------------------------------------- + + for (i <- 0 until inBW) { + 
io.bankRead(i).io.req.valid := false.B + io.bankRead(i).io.req.bits.addr := 0.U + io.bankRead(i).io.resp.ready := false.B + } + + for (i <- 0 until outBW) { + io.bankWrite(i).io.req.valid := false.B + io.bankWrite(i).io.req.bits.addr := 0.U + io.bankWrite(i).io.req.bits.data := 0.U + io.bankWrite(i).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(0.U(1.W))) + io.bankWrite(i).io.req.bits.wmode := false.B + io.bankWrite(i).io.resp.ready := false.B + } + + for (i <- 0 until inBW) { + io.bankRead(i).bank_id := rbank_reg + io.bankRead(i).group_id := 0.U + } + + for (i <- 0 until outBW) { + io.bankWrite(i).bank_id := wbank_reg + io.bankWrite(i).group_id := 0.U + } + + io.cmdReq.ready := state === idle + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + + // --------------------------------------------------------------------------- + // Main FSM + // --------------------------------------------------------------------------- + + switch(state) { + is(idle) { + when(io.cmdReq.fire) { + readReqCounter := 0.U + readRespCounter := 0.U + raddr_reg := 0.U + waddr_reg := 0.U + rbank_reg := io.cmdReq.bits.cmd.op1_bank + wbank_reg := io.cmdReq.bits.cmd.wr_bank + remainingBlocks := io.cmdReq.bits.cmd.iter + + for (i <- 0 until b.memDomain.bankMaskLen) { + writeMaskReg(i) := 1.U + } + + when(io.cmdReq.bits.cmd.iter === 0.U) { + state := complete + }.otherwise { + state := sRead + } + } + } + + is(sRead) { + io.bankRead(0).io.resp.ready := true.B + + io.bankRead(0).io.req.valid := readReqCounter < wordsPerBlock.U + io.bankRead(0).io.req.bits.addr := raddr_reg + readReqCounter + + when(io.bankRead(0).io.req.fire) { + readReqCounter := readReqCounter + 1.U + } + + when(io.bankRead(0).io.resp.fire) { + val dataWord = io.bankRead(0).io.resp.bits.data + + for (w <- 0 until wordsPerBlock) { + when(readRespCounter === w.U) { + for (i <- 0 until elemsPerWord) { + val hi = (i 
+ 1) * inputWidth - 1 + val lo = i * inputWidth + fp32Buf(w * elemsPerWord + i) := dataWord(hi, lo) + } + } + } + + when(readRespCounter === (wordsPerBlock - 1).U) { + state := sPack + } + + readRespCounter := readRespCounter + 1.U + } + } + + is(sPack) { + packedBlockReg := packMx6Block((0 until InputNum).map(i => fp32Buf(i))) + state := sWrite + } + + is(sWrite) { + io.bankWrite(0).io.req.valid := true.B + io.bankWrite(0).io.req.bits.addr := waddr_reg + io.bankWrite(0).io.req.bits.data := packedBlockReg + io.bankWrite(0).io.req.bits.mask := writeMaskReg + io.bankWrite(0).io.resp.ready := true.B + + when(io.bankWrite(0).io.req.fire) { + when(remainingBlocks > 1.U) { + remainingBlocks := remainingBlocks - 1.U + raddr_reg := raddr_reg + wordsPerBlock.U + waddr_reg := waddr_reg + 1.U + readReqCounter := 0.U + readRespCounter := 0.U + state := sRead + }.otherwise { + state := complete + } + } + } + + is(complete) { + io.cmdResp.valid := true.B + when(io.cmdResp.fire) { + state := idle + } + } + } + + io.status.idle := state === idle + io.status.running := (state === sRead) || (state === sPack) || (state === sWrite) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/mxfp/MxfpBall.scala b/arch/src/main/scala/framework/balldomain/prototype/mxfp/MxfpBall.scala new file mode 100644 index 00000000..25785ec8 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/mxfp/MxfpBall.scala @@ -0,0 +1,47 @@ +package framework.balldomain.prototype.mxfp + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} + +import framework.balldomain.blink.{BlinkIO, HasBlink, SubRobRow} +import framework.top.GlobalConfig + +/** + * MxfpBall - MXFP conversion ball wrapper. + * + * This module only wraps the inner PipelinedMxfp execution unit + * and connects it to the blink protocol. 
+ */ +@instantiable +class MxfpBall(val b: GlobalConfig) extends Module with HasBlink { + + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "MxfpBall") + .getOrElse(throw new IllegalArgumentException("MxfpBall not found in config")) + + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + + val mxfpUnit: Instance[PipelinedMxfp] = Instantiate(new PipelinedMxfp(b)) + + mxfpUnit.io.cmdReq <> io.cmdReq + mxfpUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + mxfpUnit.io.bankRead(i) <> io.bankRead(i) + } + + for (i <- 0 until outBW) { + mxfpUnit.io.bankWrite(i) <> io.bankWrite(i) + } + + io.status <> mxfpUnit.io.status + + io.subRobReq.valid := false.B + io.subRobReq.bits := SubRobRow.tieOff(b) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/mxfp/README.md b/arch/src/main/scala/framework/balldomain/prototype/mxfp/README.md new file mode 100644 index 00000000..67460e1f --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/mxfp/README.md @@ -0,0 +1,398 @@ +# MXFP Format Conversion Accelerator + +## Overview + +This directory implements Buckyball's **MXFP (Mixed Floating-Point) format conversion accelerator**, located at `arch/src/main/scala/framework/balldomain/prototype/mxfp`. +The module reads FP32 data blocks from the Scratchpad, converts them to the currently defined MXFP packed format, and writes the results back to the destination Scratchpad bank. + +The current implementation focuses on validating the following end-to-end path: + +- Custom MXFP instruction integration +- Ball domain instruction decode and dispatch +- Scratchpad read/write data flow +- RTL packing logic +- Comparison against a C software golden model +- Verilator simulation + +This version is a **V1 prototype**; its goal is to first validate the **FP32 block -> MXFP packed block** function. + +Core components: + +- **Mxfp.scala**: MXFP main execution logic +- **MxfpBall.scala**: MXFP Ball outer wrapper + +## Code Structure + +```text +mxfp/ +├── Mxfp.scala - MXFP main execution logic +├── MxfpBall.scala - MXFP Ball outer wrapper +└── configs/ + ├── MxfpBallParam.scala - parameter definition and loading + └── default.json - default parameter configuration +``` + +### Module Responsibilities + +**Mxfp.scala** (accelerator implementation layer) + +* Read FP32 data from the Scratchpad block by block +* Collect the 16 FP32 elements that make up one block +* Compute the global exponent and micro bits +* Perform 
4-bit magnitude quantization +* Generate 16 payloads and pack them into one 128-bit MXFP block +* Write the result back to the destination Scratchpad bank +* Provide the Ball domain command interface and return the completion response and status + +**MxfpBall.scala** (Ball wrapper layer) + +* Instantiate `PipelinedMxfp` +* Connect the blink interface +* Pass through the bank read / bank write ports +* Export status information + +## Module Description + +### Mxfp.scala + +**Main functionality**: + +The module processes data block by block: + +**Read 4 consecutive input words from the Scratchpad -> assemble 16 FP32 elements -> perform MXFP packing -> write back one 128-bit output word** + +The fixed assumptions in V1 are: + +* `InputNum = 16` +* `inputWidth = 32` +* `bankWidth = 128` +* One bank word holds `4` FP32 values +* One MXFP block holds `16` FP32 values +* One block corresponds to: + + * `4` bank reads + * `1` bank write + +### State machine definition + +```scala +val idle :: sRead :: sPack :: sWrite :: complete :: Nil = Enum(5) +val state = RegInit(idle) +``` + +State descriptions: + +* **idle**: wait for a command +* **sRead**: read the 4 input words that make up one block +* **sPack**: pack the 16 collected FP32 values into one MXFP block +* **sWrite**: write the packed block to the destination bank; return to sRead if blocks remain +* **complete**: send the completion response, then return to idle + +### Key registers + +```scala +// Input buffer: holds the 16 FP32 values of one block +val fp32Buf = RegInit(VecInit(Seq.fill(InputNum)(0.U(inputWidth.W)))) + +// Packed output of the current block +val packedBlockReg = RegInit(0.U(bankWidth.W)) + +// Read request/response counters +val readReqCounter = RegInit(0.U(...)) +val readRespCounter = RegInit(0.U(...)) + +// Address and control registers +val raddr_reg = RegInit(0.U(...)) +val waddr_reg = RegInit(0.U(...)) +val rbank_reg = RegInit(0.U(...)) +val wbank_reg = RegInit(0.U(...)) +val remainingBlocks = RegInit(0.U(...)) +``` + +### Command parsing + +When a command arrives: + +* Record `rob_id / sub_rob_id` +* Record the source and destination banks +* Initialize the read and write addresses +* Interpret `iter` as the **number of blocks to process** + +In the current implementation: + +* `op1_bank`: input source bank +* `wr_bank`: output destination bank +* `iter`: number of blocks; it no longer means "number of rows" + +### Data conversion logic (MXFP) + +V1 implements a fixed **MX6-like packed format**. + +#### 1. Global exponent + +For the 16 FP32 elements in a block: + +* Extract the exponent +* Exclude zero / subnormal / special values from the normal exponent comparison +* Take the maximum exponent as `globalExp` + +#### 2. 
Micro bit + +Every 2 elements form a pair: + +* If the pair's maximum exponent is at least 1 less than `globalExp` +* the pair's micro bit is set to `1` +* otherwise it is set to `0` + +If the micro bit is `1`, the pair uses: + +* `localExp = globalExp - 1` + +Otherwise it uses: + +* `localExp = globalExp` + +#### 3. Payload + +Each element produces one 5-bit payload: + +* `1 bit sign` +* `4 bit magnitude` + +That is: + +```text +payload = sign[1] ++ mag[4] +``` + +#### 4. Packed layout + +The output block is laid out as: + +* byte 0: global exponent +* byte 1: 8 micro bits +* byte 2 ~ byte 11: the 16 payloads packed into 80 bits +* byte 12 ~ byte 15: zero padding + +Total: + +* `8 + 8 + 80 = 96 bit` +* zero-padded to `128 bit` + +### Scratchpad interface + +Data is exchanged through the Ball domain bank read / bank write interface: + +* Input: read 4 consecutive words from the source bank +* Output: write 1 packed word to the destination bank + +Per-block address stepping: + +* The read address advances by `+1` per read +* One block is complete after 4 reads +* The write address advances by `+1` per block + +### Processing flow + +1. **idle** + + * Wait for `cmdReq` + * Parse the source bank, destination bank, and iter + * Initialize the block counters + +2. **sRead** + + * Issue 4 consecutive bank reads + * Each read returns one 128-bit word + * Unpack 4 FP32 values into `fp32Buf` + +3. **sPack** + + * Once all 16 FP32 values are collected + * Compute the global exponent, micro bits, and payloads + * Produce one packed MXFP block + +4. **sWrite** + + * Write the packed result of the current block to the destination bank; if blocks remain, return to **sRead** + +5. **complete** + + * Reached after the last block has been written + * Send `cmdResp`, then return to `idle` + +## ISA Structure + +This module corresponds to one Ball instruction that performs: + +**Read an FP32 block from the source Scratchpad bank, convert it to an MXFP packed block, and write it back to the destination bank** + +### Function + +Performs block-level MXFP format conversion of Scratchpad data. + +### funct7 + +```text +55 +``` + +### Instruction + +```c +bb_mxfp(op1_bank_id, wr_bank_id, iter) +``` + +### Parameters + +* `op1_bank_id`: bank holding the input data +* `wr_bank_id`: bank the output is written to +* `iter`: number of blocks to process + +In the current implementation: + +* 1 block = 16 FP32 values +* 1 block requires 4 input words +* 1 block produces 1 output word + +## Usage + +### Basic flow + +1. Prepare the FP32 input data as raw IEEE754 bit patterns +2. Use `bb_mvin` to move the input into the source bank +3. Call `bb_mxfp(...)` +4. Use `bb_mvout` to move the packed output out +5. 
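Compare the result against the software golden model + +### Example workflow + +```text +Input FP32 bit patterns +-> bb_mvin +-> bb_mxfp +-> bb_mvout +-> software golden compare +``` + +## Test Strategy + +A C test program, `mxfp_test.c`, verifies that: + +* The Scratchpad input is moved in correctly +* The MXFP Ball performs block packing correctly +* The Scratchpad output matches the software golden model + +### Current validation method + +The current test uses: + +* A fixed 16-element block input +* Inputs built from raw IEEE754 bit patterns +* A software reimplementation of the RTL's current packing logic +* A byte-by-byte comparison with the hardware output + +To avoid nondeterminism from floating-point execution in the baremetal environment, test inputs are expressed as **raw 32-bit IEEE754 bit patterns** rather than computed with `float` arithmetic. + +## Current Validation Result + +The current version has completed the following validation: + +* The MXFP instruction path is integrated +* RTL elaboration succeeds +* Verilator simulation runs successfully +* The Scratchpad input -> MXFP conversion -> Scratchpad output path works end to end +* The software golden model and the hardware output match + +Under the current test configuration: + +* The output prefix matches the software golden model +* One complete 128-bit output block matches the software golden model +* The simulation test returns `PASSED` + +This indicates that, in this version: + +* global exponent packing +* micro bit generation +* payload packing +* the bank read/write flow + +are consistent with the golden model in the tests run so far. + +## Notes + +1. **This is a prototype** + This is a V1 prototype whose main goal is to validate the data path and packing logic; it is not a final, complete MX floating-point implementation. + +2. **Input is fixed to FP32** + The input element width is fixed at 32 bits, and test inputs are raw IEEE754 bit patterns. + +3. **Block size is fixed** + Only `16`-element blocks are supported. + +4. **bankWidth is fixed at 128 bit** + One block therefore takes 4 reads to collect. + +5. **Special values use a simplified policy** + + * zero -> treated as 0 + * subnormal -> treated as 0 + * Inf / NaN -> saturated to the maximum magnitude + +6. **The focus is format conversion** + No more complex MX arithmetic is involved; only FP32 -> MXFP packed format is implemented. + +7. **Tests are mainly for bring-up** + Only basic functional validation is complete; more input patterns and corner-case tests can be added later. + +## Files Modified / Added + +This prototype typically involves the following added or modified files: + +### RTL / Config + +* `Mxfp.scala` +* `MxfpBall.scala` +* `configs/MxfpBallParam.scala` +* `configs/default.json` + +### ISA / Test + +* `55_mxfp.c` +* `mxfp_test.c` + +### Integration points + +* Ball registration +* decoder mapping +* ISA include +* ball config mapping + +## Future Work + +Possible next steps: + +1. Cover more input data patterns +2. Add corner-case tests +3. Refine the handling of NaN / Inf / subnormal +4. Support more MX formats +5. Improve the packing scheme and parameterization +6. 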
Add more systematic performance test results + +## Summary + +This MXFP Ball prototype covers: + +* the custom instruction +* Ball domain RTL +* Scratchpad read/write +* FP32 block collection +* MXFP packed block generation +* comparison against the C golden model + +and this entire path has been validated. + +The current version can serve as the base implementation for future MXFP extensions and optimizations. + +``` +``` diff --git a/arch/src/main/scala/framework/balldomain/prototype/mxfp/configs/MxfpBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/mxfp/configs/MxfpBallParam.scala new file mode 100644 index 00000000..cb7bed67 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/mxfp/configs/MxfpBallParam.scala @@ -0,0 +1,20 @@ +package framework.balldomain.prototype.mxfp.configs + +import upickle.default._ + +case class MxfpBallParam( + InputNum: Int, + inputWidth: Int +) + +object MxfpBallParam { + implicit val rw: ReadWriter[MxfpBallParam] = macroRW + + def apply(): MxfpBallParam = { + val jsonStr = + scala.io.Source + .fromFile("src/main/scala/framework/balldomain/prototype/mxfp/configs/default.json") + .mkString + read[MxfpBallParam](jsonStr) + } +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/mxfp/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/mxfp/configs/default.json new file mode 100644 index 00000000..51676872 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/mxfp/configs/default.json @@ -0,0 +1,4 @@ +{ + "InputNum": 16, + "inputWidth": 32 +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/quant/Quant.scala b/arch/src/main/scala/framework/balldomain/prototype/quant/Quant.scala new file mode 100644 index 00000000..d3627d5b --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/quant/Quant.scala @@ -0,0 +1,312 @@ +package framework.balldomain.prototype.quant + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} + +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.balldomain.blink.{BallStatus, BankRead, BankWrite} +import framework.top.GlobalConfig 
+import framework.balldomain.prototype.quant.configs.QuantBallParam + +/** + * Quant - Quantization core logic. + * + * INT32 mode: FP32 -> INT32 + * Each 128-bit SRAM word = 4 x FP32. + * Output: 4 x INT32 = 128 bits. 1:1 read/write. + * + * INT8 mode: FP32 -> INT8 + * Read 4 words (16 FP32), pack into 1 output word (16 x INT8 = 128 bits). 4:1 read/write. + * + * Scale factor from cmd.special(31,0) as FP32 bit pattern. + * + * FSM follows ReluBall pattern: + * idle -> sRead (pipelined read all words, process on resp) -> sWrite (write all results) -> complete -> idle + */ +@instantiable +class Quant(val b: GlobalConfig) extends Module { + val ballConfig = QuantBallParam() + val isInt8 = ballConfig.targetType == "INT8" + val elemsPerWord = 4 + val bankWidth = b.memDomain.bankWidth + // For INT32: InputNum = iter (max 16), for INT8: InputNum = iter (output words, max 16) + val InputNum = 16 // max elements per dimension, matching ReluBall + + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "QuantBall") + .getOrElse(throw new IllegalArgumentException("QuantBall not found in config")) + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + when(io.cmdReq.fire) { + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + } + + for (i <- 0 until inBW) { + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U 
+ } + + // FSM + val idle :: sRead :: sWrite :: complete :: Nil = Enum(4) + val state = RegInit(idle) + + // Result storage: up to InputNum output words, each 128 bits + val regArray = RegInit(VecInit(Seq.fill(InputNum)(0.U(bankWidth.W)))) + + val readCounter = RegInit(0.U(log2Ceil(InputNum * 4 + 1).W)) // request counter + val respCounter = RegInit(0.U(log2Ceil(InputNum * 4 + 1).W)) // response counter + val writeCounter = RegInit(0.U(log2Ceil(InputNum + 1).W)) + + val raddr_reg = RegInit(0.U(b.frontend.iter_len.W)) + val rbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val waddr_reg = RegInit(0.U(b.frontend.iter_len.W)) + val wbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val iter_reg = RegInit(0.U(b.frontend.iter_len.W)) + val scale_reg = RegInit(0.U(32.W)) + + // For INT8: accumulate 4 responses into one output word + val int8AccumIdx = RegInit(0.U(2.W)) // 0..3 within current output word + val int8OutIdx = RegInit(0.U(log2Ceil(InputNum + 1).W)) // output word index + + // Total read requests needed + val totalReads = Wire(UInt(b.frontend.iter_len.W)) + if (isInt8) { + totalReads := iter_reg << 2 + } else { + totalReads := iter_reg + } + + val writeDataReg = Reg(UInt(bankWidth.W)) + val writeMaskReg = Reg(Vec(b.memDomain.bankMaskLen, UInt(1.W))) + + // Default outputs + for (i <- 0 until inBW) { + io.bankRead(i).io.req.valid := false.B + io.bankRead(i).io.req.bits.addr := 0.U + io.bankRead(i).io.resp.ready := false.B + io.bankRead(i).bank_id := rbank_reg + io.bankRead(i).group_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).io.req.valid := false.B + io.bankWrite(i).io.req.bits.addr := 0.U + io.bankWrite(i).io.req.bits.data := 0.U + io.bankWrite(i).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(0.U(1.W))) + io.bankWrite(i).io.req.bits.wmode := false.B + io.bankWrite(i).io.resp.ready := false.B + io.bankWrite(i).bank_id := wbank_reg + io.bankWrite(i).group_id := 0.U + } + + io.cmdReq.ready := state === idle + 
io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + + // ---- FP32 multiply ---- + def fp32Multiply(a: UInt, bv: UInt): UInt = { + val a_sign = a(31) + val b_sign = bv(31) + val a_exp = a(30, 23) + val b_exp = bv(30, 23) + val a_mant = Cat(1.U(1.W), a(22, 0)) + val b_mant = Cat(1.U(1.W), bv(22, 0)) + val result_sign = a_sign ^ b_sign + val a_is_zero = a_exp === 0.U && a(22, 0) === 0.U + val b_is_zero = b_exp === 0.U && bv(22, 0) === 0.U + val mant_product = (a_mant * b_mant)(47, 0) + val mant_shifted = Wire(UInt(24.W)) + val exp_adjust = Wire(UInt(1.W)) + when(mant_product(47)) { + mant_shifted := mant_product(47, 24) + exp_adjust := 1.U + }.otherwise { + mant_shifted := mant_product(46, 23) + exp_adjust := 0.U + } + val result_exp_wide = a_exp +& b_exp +& exp_adjust - 127.U + val result_exp = result_exp_wide(7, 0) + val result = Wire(UInt(32.W)) + when(a_is_zero || b_is_zero) { + result := 0.U + }.elsewhen(result_exp_wide(9, 8) =/= 0.U && result_exp_wide(9)) { + result := 0.U + }.elsewhen(result_exp_wide(8) && !result_exp_wide(9)) { + result := Cat(result_sign, 255.U(8.W), 0.U(23.W)) + }.otherwise { + result := Cat(result_sign, result_exp, mant_shifted(22, 0)) + } + result + } + + def fp32ToInt32(fp: UInt): UInt = { + val sign = fp(31) + val exponent = fp(30, 23) + val mantissa = Cat(1.U(1.W), fp(22, 0)) + val is_zero = exponent === 0.U && fp(22, 0) === 0.U + val exp_val = exponent.asSInt - 127.S + val result = Wire(SInt(32.W)) + when(is_zero) { + result := 0.S + }.elsewhen(exp_val >= 31.S) { + result := Mux(sign.asBool, -2147483648L.S(32.W), 2147483647.S(32.W)) + }.elsewhen(exp_val < 0.S) { + when(exp_val === -1.S) { + result := Mux(sign.asBool, -1.S(32.W), 1.S(32.W)) + }.otherwise { + result := 0.S + } + }.otherwise { + val shift_amount = exp_val.asUInt(4, 0) + val magnitude = Wire(UInt(32.W)) + when(shift_amount >= 23.U) { + magnitude := mantissa << 
(shift_amount - 23.U) + }.otherwise { + magnitude := mantissa >> (23.U - shift_amount) + } + result := Mux(sign.asBool, -(magnitude.asSInt), magnitude.asSInt) + } + result.asUInt + } + + def fp32ToInt8(fp: UInt): UInt = { + val int32val = fp32ToInt32(fp).asSInt + val clamped = Wire(SInt(8.W)) + when(int32val > 127.S) { + clamped := 127.S(8.W) + }.elsewhen(int32val < -128.S) { + clamped := -128.S(8.W) + }.otherwise { + clamped := int32val(7, 0).asSInt + } + clamped.asUInt + } + + // ---- FSM ---- + switch(state) { + is(idle) { + when(io.cmdReq.fire) { + state := sRead + readCounter := 0.U + respCounter := 0.U + writeCounter := 0.U + raddr_reg := 0.U + rbank_reg := io.cmdReq.bits.cmd.op1_bank + waddr_reg := 0.U + wbank_reg := io.cmdReq.bits.cmd.wr_bank + iter_reg := io.cmdReq.bits.cmd.iter + scale_reg := io.cmdReq.bits.cmd.special(31, 0) + if (isInt8) { + int8AccumIdx := 0.U + int8OutIdx := 0.U + } + } + } + + is(sRead) { + io.bankRead(0).io.resp.ready := true.B + + // Send read requests (pipelined) + io.bankRead(0).io.req.valid := readCounter < totalReads + io.bankRead(0).io.req.bits.addr := raddr_reg + readCounter + + when(io.bankRead(0).io.req.fire) { + readCounter := readCounter + 1.U + } + + // Process responses (1 cycle latency after req) + val dataWord = io.bankRead(0).io.resp.bits.data + + when(io.bankRead(0).io.resp.fire) { + if (isInt8) { + // Quantize 4 FP32 -> 4 INT8, store in current output word + val packedBytes = Wire(Vec(4, UInt(8.W))) + for (i <- 0 until elemsPerWord) { + val fp_elem = dataWord((i + 1) * 32 - 1, i * 32) + val scaled = fp32Multiply(fp_elem, scale_reg) + packedBytes(i) := fp32ToInt8(scaled) + } + // Write 4 bytes into the correct position of regArray(int8OutIdx) + // Position: int8AccumIdx * 32 bits + val shift = int8AccumIdx * 32.U + val mask = ~(Fill(32, 1.U(1.W)) << shift) + val newBits = Cat(packedBytes(3), packedBytes(2), packedBytes(1), packedBytes(0)) + regArray(int8OutIdx) := (regArray(int8OutIdx) & mask) | (newBits.asUInt << 
shift) + + when(int8AccumIdx === 3.U) { + int8AccumIdx := 0.U + int8OutIdx := int8OutIdx + 1.U + }.otherwise { + int8AccumIdx := int8AccumIdx + 1.U + } + } else { + // INT32 mode: quantize 4 FP32 -> 4 INT32, pack into 128-bit word + val results = Wire(Vec(elemsPerWord, UInt(32.W))) + for (i <- 0 until elemsPerWord) { + val fp_elem = dataWord((i + 1) * 32 - 1, i * 32) + val scaled = fp32Multiply(fp_elem, scale_reg) + results(i) := fp32ToInt32(scaled) + } + regArray(respCounter) := Cat(results.reverse) + } + respCounter := respCounter + 1.U + + // Check if all responses received + val totalResps = if (isInt8) iter_reg << 2 else iter_reg + when(respCounter === (totalResps - 1.U)) { + state := sWrite + } + } + } + + is(sWrite) { + val hasMore = writeCounter < iter_reg + + io.bankWrite(0).io.req.valid := hasMore + io.bankWrite(0).io.req.bits.addr := waddr_reg + writeCounter + io.bankWrite(0).io.req.bits.data := regArray(writeCounter) + io.bankWrite(0).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(1.U(1.W))) + io.bankWrite(0).io.resp.ready := true.B + + when(io.bankWrite(0).io.req.fire) { + when(writeCounter === (iter_reg - 1.U)) { + state := complete + }.otherwise { + writeCounter := writeCounter + 1.U + } + } + } + + is(complete) { + io.bankWrite(0).io.resp.ready := true.B + io.cmdResp.valid := true.B + io.cmdResp.bits.rob_id := rob_id_reg + when(io.cmdResp.fire) { + state := idle + } + } + } + + io.status.idle := state === idle + io.status.running := (state === sRead) || (state === sWrite) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/quant/QuantBall.scala b/arch/src/main/scala/framework/balldomain/prototype/quant/QuantBall.scala new file mode 100644 index 00000000..37c412b1 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/quant/QuantBall.scala @@ -0,0 +1,39 @@ +package framework.balldomain.prototype.quant + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, 
Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.top.GlobalConfig + +@instantiable +class QuantBall(val b: GlobalConfig) extends Module with HasBlink { + + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "QuantBall") + .getOrElse(throw new IllegalArgumentException("QuantBall not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + + val quantUnit: Instance[Quant] = Instantiate(new Quant(b)) + + quantUnit.io.cmdReq <> io.cmdReq + quantUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + quantUnit.io.bankRead(i) <> io.bankRead(i) + } + + for (i <- 0 until outBW) { + quantUnit.io.bankWrite(i) <> io.bankWrite(i) + } + + io.status <> quantUnit.io.status + + io.subRobReq.valid := false.B + io.subRobReq.bits := SubRobRow.tieOff(b) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/quant/configs/QuantBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/quant/configs/QuantBallParam.scala new file mode 100644 index 00000000..20dac444 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/quant/configs/QuantBallParam.scala @@ -0,0 +1,18 @@ +package framework.balldomain.prototype.quant.configs + +import upickle.default._ + +case class QuantBallParam( + targetType: String // "INT32" or "INT8" +) + +object QuantBallParam { + implicit val rw: ReadWriter[QuantBallParam] = macroRW + + def apply(): QuantBallParam = { + val jsonStr = + scala.io.Source.fromFile("src/main/scala/framework/balldomain/prototype/quant/configs/default.json").mkString + read[QuantBallParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/quant/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/quant/configs/default.json new file mode 100644 index 00000000..46314b30 --- 
/dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/quant/configs/default.json @@ -0,0 +1,3 @@ +{ + "targetType": "INT32" +} diff --git a/arch/src/main/scala/prototype/relu/README.md b/arch/src/main/scala/framework/balldomain/prototype/relu/README.md similarity index 100% rename from arch/src/main/scala/prototype/relu/README.md rename to arch/src/main/scala/framework/balldomain/prototype/relu/README.md diff --git a/arch/src/main/scala/framework/balldomain/prototype/relu/Relu.scala b/arch/src/main/scala/framework/balldomain/prototype/relu/Relu.scala new file mode 100644 index 00000000..fd889310 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/relu/Relu.scala @@ -0,0 +1,212 @@ +package framework.balldomain.prototype.relu + +import chisel3._ +import chisel3.util._ +import chisel3.stage._ +import chisel3.experimental.hierarchy.{instantiable, public} + +import framework.balldomain.prototype.vector._ +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.balldomain.blink.{BallStatus, BankRead, BankWrite} +import framework.top.GlobalConfig +import framework.balldomain.prototype.relu.configs.ReluBallParam + +@instantiable +class PipelinedRelu(val b: GlobalConfig) extends Module { + val ballConfig = ReluBallParam() + val InputNum = ballConfig.InputNum + val inputWidth = ballConfig.inputWidth + val bankWidth = b.memDomain.bankWidth + + // Get bandwidth from config + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "ReluBall") + .getOrElse(throw new IllegalArgumentException("ReluBall not found in config")) + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + val rob_id_reg = 
RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + when(io.cmdReq.fire) { + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + } + + for (i <- 0 until inBW) { + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U + } + + val idle :: sRead :: sWrite :: complete :: Nil = Enum(4) + val state = RegInit(idle) + + val regArray = RegInit( + VecInit(Seq.fill(InputNum)( + VecInit(Seq.fill(InputNum)(0.U(inputWidth.W))) + )) + ) + + val readCounter = RegInit(0.U(log2Ceil(InputNum + 1).W)) + val respCounter = RegInit(0.U(log2Ceil(InputNum + 1).W)) + val writeCounter = RegInit(0.U(log2Ceil(InputNum + 1).W)) + + val waddr_reg = RegInit(0.U(b.frontend.iter_len.W)) + val wbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val raddr_reg = RegInit(0.U(b.frontend.iter_len.W)) + val rbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val iter_reg = RegInit(0.U(b.frontend.iter_len.W)) + val cycle_reg = RegInit(0.U(6.W)) + val iterCnt = RegInit(0.U(32.W)) + val writeDataReg = Reg(UInt(bankWidth.W)) + val writeMaskReg = Reg(Vec(b.memDomain.bankMaskLen, UInt(1.W))) + + for (i <- 0 until inBW) { + io.bankRead(i).io.req.valid := false.B + io.bankRead(i).io.req.bits.addr := 0.U + io.bankRead(i).io.resp.ready := false.B + } + + for (i <- 0 until outBW) { + io.bankWrite(i).io.req.valid := false.B + io.bankWrite(i).io.req.bits.addr := 0.U + io.bankWrite(i).io.req.bits.data := 0.U + io.bankWrite(i).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(0.U(1.W))) + io.bankWrite(i).io.req.bits.wmode := false.B + io.bankWrite(i).io.resp.ready := false.B + } + + for (i <- 0 until inBW) { + io.bankRead(i).bank_id := rbank_reg + io.bankRead(i).group_id := 0.U + } + for (i <- 0 until 
outBW) { + io.bankWrite(i).bank_id := wbank_reg + io.bankWrite(i).group_id := 0.U + } + + io.cmdReq.ready := state === idle + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + + switch(state) { + is(idle) { + when(io.cmdReq.fire) { + state := sRead + readCounter := 0.U + respCounter := 0.U + writeCounter := 0.U + waddr_reg := 0.U + wbank_reg := io.cmdReq.bits.cmd.wr_bank + raddr_reg := 0.U + rbank_reg := io.cmdReq.bits.cmd.op1_bank + iter_reg := io.cmdReq.bits.cmd.iter + cycle_reg := (io.cmdReq.bits.cmd.iter +& (InputNum.U - 1.U)) / InputNum.U - 1.U + } + when(cycle_reg =/= 0.U) { + state := sRead + readCounter := 0.U + writeCounter := 0.U + respCounter := 0.U + waddr_reg := waddr_reg + InputNum.U + raddr_reg := raddr_reg + InputNum.U + cycle_reg := cycle_reg - 1.U + } + } + + is(sRead) { + io.bankRead(0).io.resp.ready := true.B + + io.bankRead(0).io.req.valid := (readCounter < InputNum.U) + io.bankRead(0).io.req.bits.addr := raddr_reg + readCounter + + when(io.bankRead(0).io.req.fire) { + readCounter := readCounter + 1.U + } + + val dataWord = io.bankRead(0).io.resp.bits.data + + when(io.bankRead(0).io.resp.fire) { + for (col <- 0 until InputNum) { + val hi = (col + 1) * inputWidth - 1 + val lo = col * inputWidth + val raw = dataWord(hi, lo) + val signed = raw.asSInt + val relu = Mux(signed < 0.S, 0.S(inputWidth.W), signed) + regArray(respCounter)(col) := relu.asUInt + } + respCounter := respCounter + 1.U + } + + when(respCounter === (InputNum - 1).U) { + state := sWrite + writeDataReg := Cat((0 until InputNum).reverse.map(j => regArray(0)(j))) + for (i <- 0 until b.memDomain.bankMaskLen) { + writeMaskReg(i) := 1.U(1.W) + } + } + } + + is(sWrite) { + val hasMore = writeCounter < InputNum.U + + io.bankWrite(0).io.req.valid := hasMore + io.bankWrite(0).io.req.bits.addr := waddr_reg + writeCounter + io.bankWrite(0).io.req.bits.data := writeDataReg + 
io.bankWrite(0).io.req.bits.mask := writeMaskReg + io.bankWrite(0).io.resp.ready := true.B + + when(io.bankWrite(0).io.req.fire) { + when(writeCounter === (InputNum - 1).U) { + state := complete + }.otherwise { + val nextCnt = writeCounter + 1.U + writeCounter := nextCnt + writeDataReg := Cat((0 until InputNum).reverse.map(j => regArray(nextCnt)(j))) + } + } + } + + is(complete) { + io.bankWrite(0).io.resp.ready := true.B + when(cycle_reg === 0.U) { + io.cmdResp.valid := true.B + io.cmdResp.bits.rob_id := rob_id_reg + when(io.cmdResp.fire) { + iterCnt := iterCnt + 1.U + } + } + state := idle + } + } + + io.status.idle := (state === idle) + io.status.running := (state === sRead) || (state === sWrite) + + when(reset.asBool) { + for (i <- 0 until InputNum) { + for (j <- 0 until InputNum) { + regArray(i)(j) := 0.U + } + } + writeDataReg := 0.U + for (i <- 0 until b.memDomain.bankMaskLen) { + writeMaskReg(i) := 0.U + } + } +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/relu/ReluBall.scala b/arch/src/main/scala/framework/balldomain/prototype/relu/ReluBall.scala new file mode 100644 index 00000000..d1322e9a --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/relu/ReluBall.scala @@ -0,0 +1,45 @@ +package framework.balldomain.prototype.relu + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.balldomain.prototype.relu.PipelinedRelu +import framework.top.GlobalConfig + +/** + * ReluBall - A ReLU computation Ball that complies with the blink protocol. + * Behavior: Read data from Scratchpad, perform element-wise ReLU (set negative values to 0), + * then write back to Scratchpad. 
+ */ +@instantiable +class ReluBall(val b: GlobalConfig) extends Module with HasBlink { + + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "ReluBall") + .getOrElse(throw new IllegalArgumentException("ReluBall not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + + val reluUnit: Instance[PipelinedRelu] = Instantiate(new PipelinedRelu(b)) + + reluUnit.io.cmdReq <> io.cmdReq + reluUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + reluUnit.io.bankRead(i) <> io.bankRead(i) + } + + for (i <- 0 until outBW) { + reluUnit.io.bankWrite(i) <> io.bankWrite(i) + } + + io.status <> reluUnit.io.status + + io.subRobReq.valid := false.B + io.subRobReq.bits := SubRobRow.tieOff(b) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/relu/configs/ReluBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/relu/configs/ReluBallParam.scala new file mode 100644 index 00000000..e8948dc1 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/relu/configs/ReluBallParam.scala @@ -0,0 +1,21 @@ +package framework.balldomain.prototype.relu.configs + +import upickle.default._ + +/** + * ReluBall Parameter + */ +case class ReluBallParam( + InputNum: Int, + inputWidth: Int) + +object ReluBallParam { + implicit val rw: ReadWriter[ReluBallParam] = macroRW + + def apply(): ReluBallParam = { + val jsonStr = + scala.io.Source.fromFile("src/main/scala/framework/balldomain/prototype/relu/configs/default.json").mkString + read[ReluBallParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/relu/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/relu/configs/default.json new file mode 100644 index 00000000..a4552216 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/relu/configs/default.json @@ -0,0 +1,4 @@ +{ + "InputNum": 16, + 
"inputWidth": 8 +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayBall.scala b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayBall.scala new file mode 100644 index 00000000..85fba75b --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayBall.scala @@ -0,0 +1,40 @@ +package framework.balldomain.prototype.systolicarray + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.top.GlobalConfig + +@instantiable +class SystolicArrayBall(val b: GlobalConfig) extends Module with HasBlink with HasBallStatus { + + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "SystolicArrayBall") + .getOrElse(throw new IllegalArgumentException("SystolicArrayBall not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + def status: BallStatus = io.status + + val systolicArrayUnit: Instance[SystolicArrayUnit] = Instantiate(new SystolicArrayUnit(b)) + + systolicArrayUnit.io.cmdReq <> io.cmdReq + systolicArrayUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + systolicArrayUnit.io.bankRead(i) <> io.bankRead(i) + } + + for (i <- 0 until outBW) { + systolicArrayUnit.io.bankWrite(i) <> io.bankWrite(i) + } + + io.status <> systolicArrayUnit.io.status + + io.subRobReq.valid := false.B + io.subRobReq.bits := SubRobRow.tieOff(b) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayCtrl.scala b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayCtrl.scala new file mode 100644 index 00000000..da7ba33f --- /dev/null +++ 
b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayCtrl.scala @@ -0,0 +1,103 @@ +package framework.balldomain.prototype.systolicarray + +import chisel3._ +import chisel3.util._ +import chisel3.stage._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.top.GlobalConfig + +@instantiable +class SystolicArrayCtrl(val b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp_o = Decoupled(new BallRsComplete(b)) + + val ctrl_ld_o = Decoupled(new ctrl_ld_req(b)) + val ctrl_st_o = Decoupled(new ctrl_st_req(b)) + val ctrl_ex_o = Decoupled(new ctrl_ex_req(b)) + + val cmdResp_i = Flipped(Valid(new Bundle { val commit = Bool() })) + }) + + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + val iter = RegInit(0.U(b.frontend.iter_len.W)) + val op1_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val op1_bank_addr = RegInit(0.U(12.W)) + val op2_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val op2_bank_addr = RegInit(0.U(12.W)) + val wr_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val wr_bank_addr = RegInit(0.U(12.W)) + val has_send = RegInit(false.B) + + val idle :: busy :: Nil = Enum(2) + val state = RegInit(idle) + + io.cmdReq.ready := state === idle + + when(io.cmdReq.fire) { + iter := io.cmdReq.bits.cmd.iter + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + op1_bank := io.cmdReq.bits.cmd.op1_bank + op1_bank_addr := 0.U + op2_bank := io.cmdReq.bits.cmd.op2_bank + op2_bank_addr := 0.U + wr_bank := io.cmdReq.bits.cmd.wr_bank + wr_bank_addr := 0.U + state := busy + } + + when(state === busy && !has_send) { + io.ctrl_ld_o.valid := true.B + 
io.ctrl_ld_o.bits.op1_bank := op1_bank + io.ctrl_ld_o.bits.op1_bank_addr := op1_bank_addr + io.ctrl_ld_o.bits.op2_bank := op2_bank + io.ctrl_ld_o.bits.op2_bank_addr := op2_bank_addr + io.ctrl_ld_o.bits.iter := iter + + io.ctrl_ex_o.valid := true.B + io.ctrl_ex_o.bits.iter := iter + + io.ctrl_st_o.valid := true.B + io.ctrl_st_o.bits.wr_bank := wr_bank + io.ctrl_st_o.bits.wr_bank_addr := wr_bank_addr + io.ctrl_st_o.bits.iter := iter + + has_send := true.B + }.otherwise { + io.ctrl_ld_o.valid := false.B + io.ctrl_ld_o.bits.op1_bank := 0.U + io.ctrl_ld_o.bits.op1_bank_addr := 0.U + io.ctrl_ld_o.bits.op2_bank := 0.U + io.ctrl_ld_o.bits.op2_bank_addr := 0.U + io.ctrl_ld_o.bits.iter := 0.U + + io.ctrl_ex_o.valid := false.B + io.ctrl_ex_o.bits.iter := 0.U + + io.ctrl_st_o.valid := false.B + io.ctrl_st_o.bits.wr_bank := 0.U + io.ctrl_st_o.bits.wr_bank_addr := 0.U + io.ctrl_st_o.bits.iter := 0.U + } + + when(io.cmdResp_i.valid) { + io.cmdResp_o.valid := true.B + io.cmdResp_o.bits.rob_id := rob_id_reg + io.cmdResp_o.bits.is_sub := is_sub_reg + io.cmdResp_o.bits.sub_rob_id := sub_rob_id_reg + state := idle + has_send := false.B + }.otherwise { + io.cmdResp_o.valid := false.B + io.cmdResp_o.bits.rob_id := 0.U + io.cmdResp_o.bits.is_sub := false.B + io.cmdResp_o.bits.sub_rob_id := 0.U + } +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayEX.scala b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayEX.scala new file mode 100644 index 00000000..a854051f --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayEX.scala @@ -0,0 +1,197 @@ +package framework.balldomain.prototype.systolicarray + +import chisel3._ +import chisel3.util._ +import chisel3.stage._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.top.GlobalConfig +import framework.balldomain.prototype.systolicarray.configs.SystolicBallParam + +class ctrl_ex_req(b: GlobalConfig) 
extends Bundle { + val iter = UInt(b.frontend.iter_len.W) +} + +class ld_ex_req(b: GlobalConfig) extends Bundle { + val config = SystolicBallParam() + val op1 = Vec(config.lane, UInt(config.inputWidth.W)) + val op2 = Vec(config.lane, UInt(config.inputWidth.W)) + val iter = UInt(b.frontend.iter_len.W) +} + +class ex_st_req(b: GlobalConfig) extends Bundle { + val config = SystolicBallParam() + val result = Vec(config.lane, UInt(config.outputWidth.W)) +} + +class PE(val inputWidth: Int, val outputWidth: Int) extends Module { + + val io = IO(new Bundle { + val in_a = Flipped(Decoupled(UInt(inputWidth.W))) + val in_b = Flipped(Decoupled(UInt(inputWidth.W))) + val out_a = Decoupled(UInt(inputWidth.W)) + val out_b = Decoupled(UInt(inputWidth.W)) + val out_c = Output(UInt(outputWidth.W)) + val clear = Input(Bool()) + }) + + io.out_a.valid := RegNext(io.in_a.valid) + io.out_a.bits := RegNext(io.in_a.bits) + io.in_a.ready := io.out_a.ready + + io.out_b.valid := RegNext(io.in_b.valid) + io.out_b.bits := RegNext(io.in_b.bits) + io.in_b.ready := io.out_b.ready + + val acc_reg = RegInit(0.U(outputWidth.W)) + + when(io.clear) { + acc_reg := 0.U + }.elsewhen(io.in_a.fire && io.in_b.fire) { + acc_reg := io.in_a.bits * io.in_b.bits + acc_reg + } + + io.out_c := acc_reg + +} + +@instantiable +class SystolicArrayEX(val b: GlobalConfig) extends Module { + val config = SystolicBallParam() + val inputWidth = config.inputWidth + val outputWidth = config.outputWidth + val arraySize = config.lane + + @public + val io = IO(new Bundle { + val ctrl_ex_i = Flipped(Decoupled(new ctrl_ex_req(b))) + val ld_ex_i = Flipped(Decoupled(new ld_ex_req(b))) + + val ex_st_o = Decoupled(new ex_st_req(b)) + }) + + val idle :: busy :: Nil = Enum(2) + val state = RegInit(idle) + + val iter_counter = RegInit(0.U(b.frontend.iter_len.W)) + val store_counter = RegInit(0.U(6.W)) + val in_counter = RegInit(0.U(b.frontend.iter_len.W)) + + // Use Reg with Vec type for proper register behavior + val in_a_buffer = 
Reg(Vec(arraySize, Vec(arraySize, UInt(inputWidth.W)))) + val in_b_buffer = Reg(Vec(arraySize, Vec(arraySize, UInt(inputWidth.W)))) + val pes = VecInit(Seq.fill(arraySize)(VecInit(Seq.fill(arraySize)(Module(new PE(inputWidth, outputWidth)).io)))) + + // default values + io.ctrl_ex_i.ready := io.ex_st_o.ready + io.ld_ex_i.ready := io.ex_st_o.ready + + io.ex_st_o.valid := false.B + io.ex_st_o.bits.result := VecInit(Seq.fill(arraySize)(0.U(inputWidth.W))) + + for (row <- 0 until arraySize) { + for (col <- 0 until arraySize) { + pes(row)(col).clear := false.B + } + } + + // input data to buffer + when(io.ld_ex_i.fire) { + in_counter := in_counter + 1.U + when(in_counter < arraySize.U) { + for (i <- 0 until arraySize) { + in_a_buffer(in_counter)(i) := io.ld_ex_i.bits.op1(i) + in_b_buffer(in_counter)(i) := io.ld_ex_i.bits.op2(i) + } + } + } + + // PEs connection + when(in_counter === arraySize.U) { + iter_counter := iter_counter + 1.U + + for (row <- 0 until arraySize) { + for (col <- 0 until arraySize) { + + if (row == 0 && col == 0) { + when(iter_counter < arraySize.U) { + pes(row)(col).in_a.valid := true.B + pes(row)(col).in_a.bits := in_a_buffer(0)(iter_counter) + pes(row)(col).in_b.valid := true.B + pes(row)(col).in_b.bits := in_b_buffer(iter_counter)(0) + }.otherwise { + pes(row)(col).in_a.valid := false.B + pes(row)(col).in_a.bits := 0.U + pes(row)(col).in_b.valid := false.B + pes(row)(col).in_b.bits := 0.U + } + + } else if (row == 0 && col > 0) { + when((iter_counter >= col.U) && (iter_counter < arraySize.U + col.U)) { + pes(row)(col).in_b.valid := true.B + pes(row)(col).in_b.bits := in_b_buffer(iter_counter - col.U)(col) + pes(row)(col).in_a <> pes(row)(col - 1).out_a + }.otherwise { + pes(row)(col).in_b.valid := false.B + pes(row)(col).in_b.bits := 0.U + pes(row)(col).in_a <> pes(row)(col - 1).out_a + } + + } else if (col == 0 && row > 0) { + when((iter_counter >= row.U) && (iter_counter < arraySize.U + row.U)) { + pes(row)(col).in_a.valid := true.B + 
pes(row)(col).in_a.bits := in_a_buffer(row)(iter_counter - row.U) + pes(row)(col).in_b <> pes(row - 1)(col).out_b + }.otherwise { + pes(row)(col).in_a.valid := false.B + pes(row)(col).in_a.bits := 0.U + pes(row)(col).in_b <> pes(row - 1)(col).out_b + } + + } else if (row > 0 && col > 0) { + pes(row)(col).in_a <> pes(row)(col - 1).out_a + pes(row)(col).in_b <> pes(row - 1)(col).out_b + } + + if (row == arraySize - 1 || col == arraySize - 1) { + pes(row)(col).out_a.ready := io.ex_st_o.ready + pes(row)(col).out_b.ready := io.ex_st_o.ready + } + } + } + + }.otherwise { + for (row <- 0 until arraySize) { + for (col <- 0 until arraySize) { + pes(row)(col).in_a.valid := false.B + pes(row)(col).in_a.bits := 0.U + pes(row)(col).in_b.valid := false.B + pes(row)(col).in_b.bits := 0.U + pes(row)(col).out_a.ready := io.ex_st_o.ready + pes(row)(col).out_b.ready := io.ex_st_o.ready + } + } + } + + // output data from PEs + when(iter_counter >= 40.U) { + when(store_counter < arraySize.U) { + io.ex_st_o.valid := true.B + io.ex_st_o.bits.result := VecInit(pes(store_counter).map(_.out_c)) + store_counter := store_counter + 1.U + }.otherwise { + // back to idle + iter_counter := 0.U + store_counter := 0.U + in_counter := 0.U + + // clear PEs + for (row <- 0 until arraySize) { + for (col <- 0 until arraySize) { + pes(row)(col).clear := true.B + } + } + + } + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayLoad.scala b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayLoad.scala new file mode 100644 index 00000000..d086e8c6 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayLoad.scala @@ -0,0 +1,114 @@ +package framework.balldomain.prototype.systolicarray + +import chisel3._ +import chisel3.util._ +import chisel3.stage._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.memdomain.backend.banks.{SramReadReq, SramReadResp} +import 
framework.top.GlobalConfig +import framework.balldomain.prototype.systolicarray.configs.SystolicBallParam + +@instantiable +class SystolicArrayLoad(val b: GlobalConfig) extends Module { + val config = SystolicBallParam() + val InputNum = 16 + val bankWidth = b.memDomain.bankWidth + val inputWidth = config.inputWidth + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "SystolicArrayBall") + .getOrElse(throw new IllegalArgumentException("SystolicArrayBall not found in config")) + val inBW = ballMapping.inBW + + @public + val io = IO(new Bundle { + val bankReadReq = Vec(inBW, Decoupled(new SramReadReq(b))) + val bankReadResp = Vec(inBW, Flipped(Decoupled(new SramReadResp(b)))) + val ctrl_ld_i = Flipped(Decoupled(new ctrl_ld_req(b))) + val ld_ex_o = Decoupled(new ld_ex_req(b)) + val op1_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W)) + val op2_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W)) + }) + + val op1_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val op2_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val op1_addr = RegInit(0.U(log2Up(b.memDomain.bankEntries).W)) + val op2_addr = RegInit(0.U(log2Up(b.memDomain.bankEntries).W)) + val iter = RegInit(0.U(b.frontend.iter_len.W)) + val op1_iter_counter = RegInit(0.U(b.frontend.iter_len.W)) + val op2_iter_counter = RegInit(0.U(b.frontend.iter_len.W)) + val idle :: busy :: Nil = Enum(2) + val state = RegInit(idle) + val ld_ex_iter_reg = RegInit(0.U(b.frontend.iter_len.W)) + + val bankRespQueue0 = Module(new Queue(new SramReadResp(b), entries = 8)) + val bankRespQueue1 = Module(new Queue(new SramReadResp(b), entries = 8)) + + for (i <- 0 until inBW) { + io.bankReadReq(i).valid := false.B + io.bankReadReq(i).bits.addr := 0.U + } + + io.op1_bank_o := op1_bank + io.op2_bank_o := op2_bank + io.ctrl_ld_i.ready := state === idle + + bankRespQueue0.io.enq <> io.bankReadResp(0) + bankRespQueue1.io.enq <> io.bankReadResp(1) + + when(io.ctrl_ld_i.fire) { + op1_bank := io.ctrl_ld_i.bits.op1_bank 
+ op2_bank := io.ctrl_ld_i.bits.op2_bank + op1_addr := io.ctrl_ld_i.bits.op1_bank_addr + op2_addr := io.ctrl_ld_i.bits.op2_bank_addr + iter := io.ctrl_ld_i.bits.iter + op1_iter_counter := 0.U + op2_iter_counter := 0.U + state := busy + } + + when(state === busy && io.ld_ex_o.ready) { + io.bankReadReq(0).valid := op1_iter_counter < iter + io.bankReadReq(0).bits.addr := op1_addr + op1_iter_counter + op1_iter_counter := Mux(io.bankReadReq(0).ready, op1_iter_counter + 1.U, op1_iter_counter) + } + + when(state === busy && io.ld_ex_o.ready) { + io.bankReadReq(1).valid := op2_iter_counter < iter + io.bankReadReq(1).bits.addr := op2_addr + op2_iter_counter + op2_iter_counter := Mux(io.bankReadReq(1).ready, op2_iter_counter + 1.U, op2_iter_counter) + } + +// ----------------------------------------------------------------------------- +// SRAM returns data and passes to EX unit +// ----------------------------------------------------------------------------- + val both_valid = bankRespQueue0.io.deq.valid && bankRespQueue1.io.deq.valid + + io.ld_ex_o.valid := both_valid + when(both_valid) { + io.ld_ex_o.bits.op1 := bankRespQueue0.io.deq.bits.data.asTypeOf(Vec(InputNum, UInt(inputWidth.W))) + io.ld_ex_o.bits.op2 := bankRespQueue1.io.deq.bits.data.asTypeOf(Vec(InputNum, UInt(inputWidth.W))) + io.ld_ex_o.bits.iter := ld_ex_iter_reg + }.otherwise { + io.ld_ex_o.bits.iter := 0.U + io.ld_ex_o.bits.op1 := VecInit(Seq.fill(InputNum)(0.U(inputWidth.W))) + io.ld_ex_o.bits.op2 := VecInit(Seq.fill(InputNum)(0.U(inputWidth.W))) + } + + // Only dequeue and advance iter counter on successful handshake + bankRespQueue0.io.deq.ready := io.ld_ex_o.fire + bankRespQueue1.io.deq.ready := io.ld_ex_o.fire + + when(io.ld_ex_o.fire) { + ld_ex_iter_reg := ld_ex_iter_reg + 1.U + } + +// ----------------------------------------------------------------------------- +// Reset op1_iter_counter and return to idle state +// ----------------------------------------------------------------------------- + + 
when(state === busy && ld_ex_iter_reg === iter) { + state := idle + op1_iter_counter := 0.U + op2_iter_counter := 0.U + ld_ex_iter_reg := 0.U + } +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayStore.scala b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayStore.scala new file mode 100644 index 00000000..c930a4d3 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayStore.scala @@ -0,0 +1,134 @@ +package framework.balldomain.prototype.systolicarray + +import chisel3._ +import chisel3.util._ +import chisel3.stage._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.memdomain.backend.banks.SramWriteIO +import framework.top.GlobalConfig + +class ctrl_st_req(b: GlobalConfig) extends Bundle { + val wr_bank = UInt(log2Up(b.memDomain.bankNum).W) + val wr_bank_addr = UInt(log2Up(b.memDomain.bankEntries).W) + val iter = UInt(b.frontend.iter_len.W) +} + +class BankWriteEntry(b: GlobalConfig) extends Bundle { + val addr = UInt(log2Ceil(b.memDomain.bankEntries).W) + val data = UInt(b.memDomain.bankWidth.W) + val mask = Vec(b.memDomain.bankMaskLen, Bool()) + val wmode = Bool() +} + +@instantiable +class SystolicArrayStore(val b: GlobalConfig) extends Module { + val accWidth = 32 + val InputNum = 16 + + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "SystolicArrayBall") + .getOrElse(throw new IllegalArgumentException("SystolicArrayBall not found in config")) + val outBW = ballMapping.outBW + + @public + val io = IO(new Bundle { + val ctrl_st_i = Flipped(Decoupled(new ctrl_st_req(b))) + val ex_st_i = Flipped(Decoupled(new ex_st_req(b))) + val bankWrite = Vec(outBW, Flipped(new SramWriteIO(b))) + val wr_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W)) + val cmdResp_o = Valid(new Bundle { val commit = Bool() }) + }) + + val wr_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val wr_bank_addr = 
RegInit(0.U(log2Up(b.memDomain.bankEntries).W)) + val iter = RegInit(0.U(b.frontend.iter_len.W)) + val iter_counter = RegInit(0.U(b.frontend.iter_len.W)) + val idle :: busy :: Nil = Enum(2) + val state = RegInit(idle) + + val writeQueues = VecInit(Seq.fill(outBW)(Module(new Queue(new BankWriteEntry(b), 16)).io)) + +// ----------------------------------------------------------------------------- +// Set registers when Ctrl instruction arrives +// ----------------------------------------------------------------------------- + io.ctrl_st_i.ready := state === idle + + when(io.ctrl_st_i.fire) { + wr_bank := io.ctrl_st_i.bits.wr_bank + wr_bank_addr := io.ctrl_st_i.bits.wr_bank_addr + iter := (io.ctrl_st_i.bits.iter + 15.U) & (~15.U(b.frontend.iter_len.W)) + iter_counter := 0.U + state := busy + } + +// ----------------------------------------------------------------------------- +// Accept computation results from EX unit and push to write queues +// ----------------------------------------------------------------------------- + io.ex_st_i.ready := state === busy && writeQueues.forall(_.enq.ready) + + for (i <- 0 until outBW) { + writeQueues(i).enq.valid := false.B + writeQueues(i).enq.bits := DontCare + } + + when(io.ex_st_i.fire) { + for (i <- 0 until outBW) { + val elementsPerChannel = InputNum / outBW + val startIdx = i * elementsPerChannel + val endIdx = startIdx + elementsPerChannel - 1 + + val entry = Wire(new BankWriteEntry(b)) + entry.addr := wr_bank_addr + iter_counter(log2Ceil(InputNum) - 1, 0) + entry.data := Cat(io.ex_st_i.bits.result.slice(startIdx, endIdx + 1).reverse) + entry.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(true.B)) + entry.wmode := true.B + + writeQueues(i).enq.valid := true.B + writeQueues(i).enq.bits := entry + } + iter_counter := iter_counter + 1.U + } + +// ----------------------------------------------------------------------------- +// Drain write queues to bankWrite interface +// 
----------------------------------------------------------------------------- + io.bankWrite.foreach { acc => + acc.req.valid := false.B + acc.req.bits.addr := 0.U + acc.req.bits.data := Cat(Seq.fill(accWidth / 8)(0.U(8.W))) + acc.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(false.B)) + acc.req.bits.wmode := false.B + acc.resp.ready := false.B + } + + for (i <- 0 until outBW) { + writeQueues(i).deq.ready := false.B + + when(writeQueues(i).deq.valid) { + io.bankWrite(i).req.valid := true.B + io.bankWrite(i).req.bits.addr := writeQueues(i).deq.bits.addr + io.bankWrite(i).req.bits.data := writeQueues(i).deq.bits.data + io.bankWrite(i).req.bits.mask := writeQueues(i).deq.bits.mask + io.bankWrite(i).req.bits.wmode := writeQueues(i).deq.bits.wmode + writeQueues(i).deq.ready := io.bankWrite(i).req.ready + } + } + + // Output wr_bank for bank_id setting + io.wr_bank_o := wr_bank + +// ----------------------------------------------------------------------------- +// Reset iter counter, commit cmdResp, return to idle state +// ----------------------------------------------------------------------------- + val allQueuesEmpty = writeQueues.forall(q => !q.deq.valid) + val allDataEnqueued = state === busy && iter_counter >= iter + + when(allDataEnqueued && allQueuesEmpty) { + state := idle + io.cmdResp_o.valid := true.B + io.cmdResp_o.bits.commit := true.B + }.otherwise { + io.cmdResp_o.valid := false.B + io.cmdResp_o.bits.commit := false.B + } + +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayUnit.scala b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayUnit.scala new file mode 100644 index 00000000..f3ac5bfd --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/SystolicArrayUnit.scala @@ -0,0 +1,105 @@ +package framework.balldomain.prototype.systolicarray + +import chisel3._ +import chisel3.util._ +import chisel3.stage._ +import 
chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.balldomain.blink.{BallStatus, BankRead, BankWrite} +import framework.top.GlobalConfig +import framework.memdomain.backend.banks.{SramReadReq, SramReadResp, SramWriteIO} + +class ctrl_ld_req(b: GlobalConfig) extends Bundle { + val op1_bank = UInt(log2Up(b.memDomain.bankNum).W) + val op1_bank_addr = UInt(log2Up(b.memDomain.bankEntries).W) + val op2_bank = UInt(log2Up(b.memDomain.bankNum).W) + val op2_bank_addr = UInt(log2Up(b.memDomain.bankEntries).W) + val iter = UInt(b.frontend.iter_len.W) +} + +@instantiable +class SystolicArrayUnit(val b: GlobalConfig) extends Module { + val InputNum = 16 + val inputWidth = 8 + val accWidth = 32 + + val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "SystolicArrayBall") + .getOrElse(throw new IllegalArgumentException("SystolicArrayBall not found in config")) + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + when(io.cmdReq.fire) { + rob_id_reg := io.cmdReq.bits.rob_id + } + + for (i <- 0 until inBW) { + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U + } + + val ctrl: Instance[SystolicArrayCtrl] = Instantiate(new SystolicArrayCtrl(b)) + val load: Instance[SystolicArrayLoad] = Instantiate(new SystolicArrayLoad(b)) + val ex: Instance[SystolicArrayEX] = Instantiate(new SystolicArrayEX(b)) + val store: Instance[SystolicArrayStore] = Instantiate(new SystolicArrayStore(b)) + + 
ctrl.io.cmdReq <> io.cmdReq + io.cmdResp <> ctrl.io.cmdResp_o + + ctrl.io.ctrl_ld_o <> load.io.ctrl_ld_i + ctrl.io.ctrl_ex_o <> ex.io.ctrl_ex_i + ctrl.io.ctrl_st_o <> store.io.ctrl_st_i + + for (i <- 0 until inBW) { + io.bankRead(i).io.req <> load.io.bankReadReq(i) + load.io.bankReadResp(i) <> io.bankRead(i).io.resp + if (i == 0) { + io.bankRead(i).bank_id := load.io.op1_bank_o + io.bankRead(i).group_id := 0.U + } else if (i == 1) { + io.bankRead(i).bank_id := load.io.op2_bank_o + io.bankRead(i).group_id := 0.U + } + } + + load.io.ld_ex_o <> ex.io.ld_ex_i + ex.io.ex_st_o <> store.io.ex_st_i + ctrl.io.cmdResp_i <> store.io.cmdResp_o + + for (i <- 0 until outBW) { + io.bankWrite(i).io <> store.io.bankWrite(i) + io.bankWrite(i).bank_id := store.io.wr_bank_o + io.bankWrite(i).io.req.bits.wmode := true.B + io.bankWrite(i).group_id := i.U + } + + val hasInput = RegInit(false.B) + val hasOutput = RegInit(false.B) + + when(io.cmdReq.fire) { + hasInput := true.B + } + when(io.cmdResp.fire) { + hasOutput := false.B + hasInput := false.B + } + when(io.cmdResp.valid && !hasOutput) { + hasOutput := true.B + } + + io.status.idle := !hasInput && !hasOutput + io.status.running := hasOutput +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/systolicarray/configs/SystolicBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/configs/SystolicBallParam.scala new file mode 100644 index 00000000..035e901e --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/configs/SystolicBallParam.scala @@ -0,0 +1,26 @@ +package framework.balldomain.prototype.systolicarray.configs + +import upickle.default._ + +/** + * SystolicBall Parameter + */ +case class SystolicBallParam( + InputNum: Int, + inputWidth: Int, + lane: Int, + outputWidth: Int, + numMulThreads: Int, + numCasThreads: Int) + +object SystolicBallParam { + implicit val rw: ReadWriter[SystolicBallParam] = macroRW + + def apply(): SystolicBallParam = { + val jsonStr 
= scala.io.Source.fromFile(
+      "src/main/scala/framework/balldomain/prototype/systolicarray/configs/default.json"
+    ).mkString
+    read[SystolicBallParam](jsonStr)
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/systolicarray/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/configs/default.json
new file mode 100644
index 00000000..d5ad5924
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/systolicarray/configs/default.json
@@ -0,0 +1,8 @@
+{
+  "InputNum": 16,
+  "inputWidth": 8,
+  "lane": 16,
+  "outputWidth": 32,
+  "numMulThreads": 16,
+  "numCasThreads": 16
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/trace/README.md b/arch/src/main/scala/framework/balldomain/prototype/trace/README.md
new file mode 100644
index 00000000..21e085ee
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/trace/README.md
@@ -0,0 +1,132 @@
+# TraceBall: Debug Trace Ball
+
+## Overview
+
+TraceBall is a special Ball that performs no computation; it provides runtime debugging through the Buckyball instruction channel. It has two core functions:
+
+1. **Cycle Counter (counter management)**: set/release multiple independent cycle counters via instructions, to measure the execution cycles of any code region
+2. **Bank Backdoor (SRAM backdoor read/write)**: inject data into an SRAM bank through DPI-C, or read SRAM bank data and output it through DPI-C
+
+All DPI-C interfaces exist only inside TraceBall and do not affect other modules.
+
+---
+
+## Instruction Encoding
+
+TraceBall uses **two funct7 encodings**.
+
+### Instruction 1: `bdb_counter` (funct7 = 48, 0x30)
+
+Cycle counter management. **Does not access SRAM, needs no bank port, and completes in 1 cycle.**
+
+rs1 layout:
+- rs1 = any value (unused; the BB_RD0/BB_WR flags need not be set)
+
+rs2 layout (64-bit):
+```
+rs2[3:0]   = subcmd    subcommand (0=START, 1=STOP, 2=READ)
+rs2[7:4]   = ctr_id    counter index (0-15, up to 16 independent counters)
+rs2[63:8]  = payload
+```
+
+Subcommand definitions:
+
+| subcmd | Name | Behavior | payload meaning |
+|--------|------|----------|-----------------|
+| 0 | `CTR_START` | Start counter ctr_id and record the current cycle as the start point | payload = tag (user-defined label, emitted in the trace) |
+| 1 | `CTR_STOP` | Stop counter ctr_id, output elapsed cycles to the DPI-C trace, then release the counter | payload = ignored |
+| 2 | `CTR_READ` | Read the current value of counter ctr_id (without stopping it) and output it to the DPI-C trace | payload = ignored |
+
+DPI-C output format (written to bdb.log):
+```
+[CTRACE] CTR_START ctr=0 tag=0xDEAD cycle=10042
+[CTRACE] CTR_STOP  ctr=0 tag=0xDEAD elapsed=387 cycle=10429
+[CTRACE] CTR_READ  ctr=0 current=200 cycle=10242
+```
+
+### Instruction 2: `bdb_backdoor` (funct7 = 49, 0x31)
+
+SRAM backdoor read/write. **All parameters (bank_id, row, data) are supplied by DPI-C.** **Requires bank ports (inBW=1, outBW=1).**
+
+rs1 layout:
+```
+rs1[45]    = BB_RD0    read mode: get the address from DPI-C, read SRAM, output the data to DPI-C
+rs1[47]    = BB_WR     write mode: get the address and data from DPI-C, write SRAM
+rs1[63:48] = iter      operation count (0 = single operation, >0 = repeat iter times)
+```
+
+rs2 layout:
+- rs2 = any value (unused)
+
+Operating modes:
+
+| rs1 flag | Behavior | DPI-C interaction |
+|----------|----------|-------------------|
+| BB_RD0 | Read mode | RTL calls `dpi_backdoor_get_read_addr()` to get (bank_id, row), reads SRAM, then calls `dpi_backdoor_put_read_data()` to output the data |
+| BB_WR | Write mode | RTL calls `dpi_backdoor_get_write_req()` to get (bank_id, row, data), then writes SRAM |
+
+DPI-C output format:
+```
+[BANK-TRACE] BACKDOOR_READ  bank=2 row=5 data=0x00010002000300040005000600070008
+[BANK-TRACE] BACKDOOR_WRITE bank=3 row=10 data=0xDEADBEEFDEADBEEFDEADBEEFDEADBEEF
+```
+
+
+## Usage Examples
+
+### Measuring operator execution cycles
+
+```c
+// Measure the cycles of one matmul
+bdb_counter_start(0, 0xA001);  // counter 0, tag=matmul
+bb_mul_warp16(A, B, C, 16);
+bb_fence();
+bdb_counter_stop(0);           // outputs elapsed
+
+// Measure nested regions
+bdb_counter_start(0, 0xB001);  // outer: the whole conv
+  bdb_counter_start(1, 0xB002);  // inner: im2col
+  bb_im2col(...);
+  bb_fence();
+  bdb_counter_stop(1);
+
+  bdb_counter_start(2, 0xB003);  // inner: matmul
+  bb_mul_warp16(...);
+  bb_fence();
+  bdb_counter_stop(2);
+bdb_counter_stop(0);           // end of the outer region
+```
+
+bdb.log output:
+```
+[CTRACE] CTR_START ctr=0 tag=0xB001 cycle=0
+[CTRACE] CTR_START ctr=1 tag=0xB002 cycle=0
+[CTRACE] CTR_STOP  ctr=1 tag=0xB002 elapsed=150 cycle=150
+[CTRACE] CTR_START ctr=2 tag=0xB003 cycle=0
+[CTRACE] CTR_STOP  ctr=2 tag=0xB003 elapsed=300 cycle=300
+[CTRACE] CTR_STOP  ctr=0 tag=0xB001 elapsed=456 cycle=456
+```
+
+### Injecting test data through the SRAM backdoor
+
+TraceBall contains a private bank that is not visible to the configuration.
+
+```c
+// Bypass DMA and inject test data into bank 0 directly through DPI-C
+// (C++ first injects the data into TraceBall's private bank through DPI-C)
+bb_alloc(0, 1, 1);
+bdb_backdoor_mvin(16);         // inject 16 rows into the private bank
+bdb_backdoor_write(0, 16);     // write 16 rows from the private bank into bank 0
+
+// Run Transpose
+bb_transpose(0, 1, 16);
+// Check a partial result
+bdb_backdoor_peek(0, 15);      // check the last row
+
+// Read out the full result
+bdb_backdoor_read(1, 16);      // dump 16 rows of bank 1 to the trace
+```
+
+
+
+Note: all RTL changes are confined to TraceBall; external banks and other modules must not be modified.
diff --git a/arch/src/main/scala/framework/balldomain/prototype/trace/Trace.scala b/arch/src/main/scala/framework/balldomain/prototype/trace/Trace.scala
new file mode 100644
index 00000000..6ee8fa89
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/trace/Trace.scala
@@ -0,0 +1,374 @@
+package framework.balldomain.prototype.trace
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.balldomain.rs.{BallRsComplete, BallRsIssue}
+import framework.balldomain.blink.{BallStatus, BankRead, BankWrite}
+import framework.memdomain.backend.banks.SramBank
+import framework.top.GlobalConfig
+
+/**
+ * Trace - TraceBall inner processing unit.
+ * + * Handles two instruction types: + * - bdb_counter (funct7=48): cycle counter management (START/STOP/READ) + * - bdb_backdoor (funct7=49): SRAM backdoor read/write via DPI-C + * + * Backdoor write: C++ generates (row, data) via DPI-C each iteration, + * RTL writes to external bank at (wbank_reg, row). + * Backdoor read: RTL reads external bank at (rbank_reg, row from DPI-C), + * sends data back to C++ via DPI-C for logging. + * + * bank_id always comes from the instruction encoding (rs1), + * row and data come from DPI-C (C++ auto-increments row each call). + */ +@instantiable +class Trace(val b: GlobalConfig) extends Module { + + val ballMapping = b.ballDomain.ballIdMappings + .find(_.ballName == "TraceBall") + .getOrElse(throw new IllegalArgumentException("TraceBall not found in config")) + + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + val bankWidth = b.memDomain.bankWidth + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + // ============================================================ + // Constants + // ============================================================ + val CTR_START = 0.U(4.W) + val CTR_STOP = 1.U(4.W) + val CTR_READ = 2.U(4.W) + + val NUM_COUNTERS = 16 + + // ============================================================ + // State machine + // ============================================================ + val idle :: sCounter :: sBdReadExt :: sBdReadExtResp :: sBdGetWriteData :: sBdWriteExt :: sBdWriteExtResp :: complete :: Nil = + Enum(8) + val state = RegInit(idle) + + // ============================================================ + // Registers + // ============================================================ + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + val 
is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + + // Command decode registers + val isRead_reg = RegInit(false.B) // BB_RD0 flag + val isWrite_reg = RegInit(false.B) // BB_WR flag + val iter_reg = RegInit(0.U(16.W)) + val iterCnt = RegInit(0.U(16.W)) + + // Counter-specific registers + val subcmd_reg = RegInit(0.U(4.W)) + val ctr_id_reg = RegInit(0.U(4.W)) + val payload_reg = RegInit(0.U(56.W)) + + // Cycle counters: 16 independent counters + val cycleCounter = RegInit(0.U(64.W)) + cycleCounter := cycleCounter + 1.U + + val ctrStartCycle = RegInit(VecInit(Seq.fill(NUM_COUNTERS)(0.U(64.W)))) + val ctrTag = RegInit(VecInit(Seq.fill(NUM_COUNTERS)(0.U(56.W)))) + val ctrActive = RegInit(VecInit(Seq.fill(NUM_COUNTERS)(false.B))) + + // Bank metadata registers (from instruction encoding) + val rbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val wbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + + // Backdoor address/data (row from DPI-C, data from DPI-C for writes) + val bd_addr_reg = RegInit(0.U(32.W)) + val bd_data_reg = RegInit(0.U(bankWidth.W)) + + // ============================================================ + // Private SramBank (staging buffer for future use) + // ============================================================ + val privBank = Module(new SramBank(b)) + + // Default: private bank idle + privBank.io.sramRead.req.valid := false.B + privBank.io.sramRead.req.bits.addr := 0.U + privBank.io.sramRead.resp.ready := false.B + privBank.io.sramWrite.req.valid := false.B + privBank.io.sramWrite.req.bits.addr := 0.U + privBank.io.sramWrite.req.bits.data := 0.U + privBank.io.sramWrite.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(true.B)) + privBank.io.sramWrite.req.bits.wmode := false.B + privBank.io.sramWrite.resp.ready := true.B + + // ============================================================ + // DPI-C modules + // 
============================================================ + val ctraceDpi = Module(new CTraceDPI) + ctraceDpi.io.subcmd := 0.U + ctraceDpi.io.ctr_id := 0.U + ctraceDpi.io.tag := 0.U + ctraceDpi.io.elapsed := 0.U + ctraceDpi.io.cycle := cycleCounter + ctraceDpi.io.enable := false.B + + val bdGetReadAddr = Module(new BackdoorGetReadAddrDPI) + val bdGetWriteAddr = Module(new BackdoorGetWriteAddrDPI) + val bdGetWriteData = Module(new BackdoorGetWriteDataDPI) + val bdPutReadData = Module(new BackdoorPutReadDataDPI) + val bdPutWriteDone = Module(new BackdoorPutWriteDoneDPI) + + bdGetReadAddr.io.enable := false.B + bdGetWriteAddr.io.enable := false.B + bdGetWriteData.io.enable := false.B + + bdPutReadData.io.bank_id := 0.U + bdPutReadData.io.row := 0.U + bdPutReadData.io.data_lo := 0.U + bdPutReadData.io.data_hi := 0.U + bdPutReadData.io.enable := false.B + + bdPutWriteDone.io.bank_id := 0.U + bdPutWriteDone.io.row := 0.U + bdPutWriteDone.io.data_lo := 0.U + bdPutWriteDone.io.data_hi := 0.U + bdPutWriteDone.io.enable := false.B + + // ============================================================ + // External bank port defaults + // ============================================================ + for (i <- 0 until inBW) { + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + io.bankRead(i).bank_id := rbank_reg + io.bankRead(i).group_id := 0.U + io.bankRead(i).io.req.valid := false.B + io.bankRead(i).io.req.bits.addr := 0.U + io.bankRead(i).io.resp.ready := false.B + } + + for (i <- 0 until outBW) { + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U + io.bankWrite(i).bank_id := wbank_reg + io.bankWrite(i).group_id := 0.U + io.bankWrite(i).io.req.valid := false.B + io.bankWrite(i).io.req.bits.addr := 0.U + io.bankWrite(i).io.req.bits.data := 0.U + io.bankWrite(i).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(0.U(1.W))) + io.bankWrite(i).io.req.bits.wmode := false.B + io.bankWrite(i).io.resp.ready := false.B + } + + // 
============================================================ + // Command interface + // ============================================================ + io.cmdReq.ready := state === idle + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + + // ============================================================ + // State machine + // ============================================================ + switch(state) { + is(idle) { + when(io.cmdReq.fire) { + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + + val cmd = io.cmdReq.bits.cmd + val rs2 = cmd.rs2 + + // Distinguish counter vs backdoor by op1_en / wr_spad_en + val isBackdoor = cmd.op1_en || cmd.wr_spad_en + + isRead_reg := cmd.op1_en // BB_RD0 + isWrite_reg := cmd.wr_spad_en // BB_WR + iter_reg := cmd.iter + iterCnt := 0.U + + // Counter fields from rs2 + subcmd_reg := rs2(3, 0) + ctr_id_reg := rs2(7, 4) + payload_reg := rs2(63, 8) + + // Bank IDs from decoded cmd + rbank_reg := cmd.op1_bank + wbank_reg := cmd.wr_bank + + when(!isBackdoor) { + state := sCounter + }.elsewhen(cmd.wr_spad_en) { + // Backdoor write: get row+data from DPI-C, write external bank + state := sBdGetWriteData + }.otherwise { + // Backdoor read: get row from DPI-C, read external bank + state := sBdReadExt + } + } + } + + // ---------------------------------------------------------- + // Counter instruction: execute and complete in 1 cycle + // ---------------------------------------------------------- + is(sCounter) { + val cid = ctr_id_reg + + ctraceDpi.io.subcmd := subcmd_reg + ctraceDpi.io.ctr_id := cid + + switch(subcmd_reg) { + is(CTR_START) { + ctrStartCycle(cid) := cycleCounter + ctrTag(cid) := payload_reg + ctrActive(cid) := true.B + + ctraceDpi.io.tag := payload_reg + ctraceDpi.io.elapsed := 0.U + ctraceDpi.io.enable := true.B + } + is(CTR_STOP) { + val elapsed = 
cycleCounter - ctrStartCycle(cid) + ctrActive(cid) := false.B + + ctraceDpi.io.tag := ctrTag(cid) + ctraceDpi.io.elapsed := elapsed + ctraceDpi.io.enable := true.B + } + is(CTR_READ) { + val current = cycleCounter - ctrStartCycle(cid) + + ctraceDpi.io.tag := ctrTag(cid) + ctraceDpi.io.elapsed := current + ctraceDpi.io.enable := true.B + } + } + + state := complete + } + + // ---------------------------------------------------------- + // Backdoor read: get row from DPI-C, read external bank + // ---------------------------------------------------------- + is(sBdReadExt) { + // Get row from DPI-C (C++ auto-increments) + bdGetReadAddr.io.enable := true.B + val row = bdGetReadAddr.io.result(31, 0) + + bd_addr_reg := row + + // Issue read to external bank (bank_id from instruction) + io.bankRead(0).io.req.valid := true.B + io.bankRead(0).io.req.bits.addr := row + io.bankRead(0).io.resp.ready := true.B + + when(io.bankRead(0).io.req.fire) { + state := sBdReadExtResp + } + } + + is(sBdReadExtResp) { + io.bankRead(0).io.resp.ready := true.B + + when(io.bankRead(0).io.resp.valid) { + val data = io.bankRead(0).io.resp.bits.data + + // Output data via DPI-C + bdPutReadData.io.bank_id := rbank_reg + bdPutReadData.io.row := bd_addr_reg + bdPutReadData.io.data_lo := data(63, 0) + bdPutReadData.io.data_hi := data(127, 64) + bdPutReadData.io.enable := true.B + + // Done after iter_reg rows, assuming iter is a row count (iterCnt is still the pre-increment value here) + iterCnt := iterCnt + 1.U + when(iterCnt + 1.U >= iter_reg) { + state := complete + }.otherwise { + state := sBdReadExt + } + } + } + + // ---------------------------------------------------------- + // Backdoor write: get row+data from DPI-C, write external bank + // ---------------------------------------------------------- + is(sBdGetWriteData) { + // Get row from DPI-C (C++ auto-increments and pre-generates data) + bdGetWriteAddr.io.enable := true.B + val row = bdGetWriteAddr.io.result(31, 0) + bd_addr_reg := row + + // Get data from DPI-C + bdGetWriteData.io.enable := true.B + val fullData = 
Cat(bdGetWriteData.io.data_hi, bdGetWriteData.io.data_lo) + bd_data_reg := fullData + + state := sBdWriteExt + } + + is(sBdWriteExt) { + // Write to external bank (bank_id from instruction) + io.bankWrite(0).io.req.valid := true.B + io.bankWrite(0).io.req.bits.addr := bd_addr_reg + io.bankWrite(0).io.req.bits.data := bd_data_reg + io.bankWrite(0).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(1.U(1.W))) + io.bankWrite(0).io.req.bits.wmode := false.B + io.bankWrite(0).io.resp.ready := true.B + + when(io.bankWrite(0).io.req.fire) { + state := sBdWriteExtResp + } + } + + is(sBdWriteExtResp) { + io.bankWrite(0).io.resp.ready := true.B + + when(io.bankWrite(0).io.resp.valid) { + // Log the write via DPI-C + bdPutWriteDone.io.bank_id := wbank_reg + bdPutWriteDone.io.row := bd_addr_reg + bdPutWriteDone.io.data_lo := bd_data_reg(63, 0) + bdPutWriteDone.io.data_hi := bd_data_reg(127, 64) + bdPutWriteDone.io.enable := true.B + + // Done after iter_reg rows, assuming iter is a row count (iterCnt is still the pre-increment value here) + iterCnt := iterCnt + 1.U + when(iterCnt + 1.U >= iter_reg) { + state := complete + }.otherwise { + state := sBdGetWriteData + } + } + } + + // ---------------------------------------------------------- + // Complete: fire cmdResp + // ---------------------------------------------------------- + is(complete) { + io.cmdResp.valid := true.B + io.cmdResp.bits.rob_id := rob_id_reg + when(io.cmdResp.fire) { + state := idle + } + } + } + + // ============================================================ + // Status + // ============================================================ + io.status.idle := state === idle + io.status.running := state =/= idle && state =/= complete +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/trace/TraceBall.scala b/arch/src/main/scala/framework/balldomain/prototype/trace/TraceBall.scala new file mode 100644 index 00000000..fe41b2be --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/trace/TraceBall.scala @@ -0,0 +1,48 @@ +package 
framework.balldomain.prototype.trace + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.top.GlobalConfig + +/** + * TraceBall — Debug trace Ball providing cycle counters and SRAM backdoor. + * + * Uses two funct7 encodings: + * - bdb_counter (funct7=48): cycle counter management + * - bdb_backdoor (funct7=49): SRAM backdoor read/write via DPI-C + */ +@instantiable +class TraceBall(val b: GlobalConfig) extends Module with HasBlink { + + val ballCommonConfig = b.ballDomain.ballIdMappings + .find(_.ballName == "TraceBall") + .getOrElse(throw new IllegalArgumentException("TraceBall not found in config")) + + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + + val traceUnit: Instance[Trace] = Instantiate(new Trace(b)) + + traceUnit.io.cmdReq <> io.cmdReq + traceUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + traceUnit.io.bankRead(i) <> io.bankRead(i) + } + + for (i <- 0 until outBW) { + traceUnit.io.bankWrite(i) <> io.bankWrite(i) + } + + io.status <> traceUnit.io.status + + io.subRobReq.valid := false.B + io.subRobReq.bits := SubRobRow.tieOff(b) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/trace/TraceDPI.scala b/arch/src/main/scala/framework/balldomain/prototype/trace/TraceDPI.scala new file mode 100644 index 00000000..79d6327e --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/trace/TraceDPI.scala @@ -0,0 +1,237 @@ +package framework.balldomain.prototype.trace + +import chisel3._ +import chisel3.util._ + +/** + * DPI-C BlackBox for cycle counter trace. + * Outputs [CTRACE] lines to bdb.log. 
+ */ +class CTraceDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val subcmd = Input(UInt(8.W)) + val ctr_id = Input(UInt(32.W)) + val tag = Input(UInt(64.W)) + val elapsed = Input(UInt(64.W)) + val cycle = Input(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "CTraceDPI.v", + """ + |import "DPI-C" function void dpi_ctrace( + | input byte unsigned subcmd, + | input int unsigned ctr_id, + | input longint unsigned tag, + | input longint unsigned elapsed, + | input longint unsigned cycle + |); + | + |module CTraceDPI( + | input [7:0] subcmd, + | input [31:0] ctr_id, + | input [63:0] tag, + | input [63:0] elapsed, + | input [63:0] cycle, + | input enable + |); + | always @(*) begin + | if (enable) begin + | dpi_ctrace(subcmd, ctr_id, tag, elapsed, cycle); + | end + | end + |endmodule + """.stripMargin + ) +} + +/** + * DPI-C BlackBox for backdoor get_read_addr. + * Returns packed [63:32]=bank_id, [31:0]=row. + */ +class BackdoorGetReadAddrDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val result = Output(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "BackdoorGetReadAddrDPI.v", + """ + |import "DPI-C" function longint unsigned dpi_backdoor_get_read_addr(); + | + |module BackdoorGetReadAddrDPI( + | output [63:0] result, + | input enable + |); + | reg [63:0] result_reg; + | assign result = result_reg; + | always @(*) begin + | result_reg = 64'd0; + | if (enable) begin + | result_reg = dpi_backdoor_get_read_addr(); + | end + | end + |endmodule + """.stripMargin + ) +} + +/** + * DPI-C BlackBox for backdoor get_write_addr. + * Returns packed [63:32]=bank_id, [31:0]=row. 
+ */ +class BackdoorGetWriteAddrDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val result = Output(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "BackdoorGetWriteAddrDPI.v", + """ + |import "DPI-C" function longint unsigned dpi_backdoor_get_write_addr(); + | + |module BackdoorGetWriteAddrDPI( + | output [63:0] result, + | input enable + |); + | reg [63:0] result_reg; + | assign result = result_reg; + | always @(*) begin + | result_reg = 64'd0; + | if (enable) begin + | result_reg = dpi_backdoor_get_write_addr(); + | end + | end + |endmodule + """.stripMargin + ) +} + +/** + * DPI-C BlackBox for backdoor get_write_data. + * Returns 128-bit data as two 64-bit outputs. + */ +class BackdoorGetWriteDataDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val data_lo = Output(UInt(64.W)) + val data_hi = Output(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "BackdoorGetWriteDataDPI.v", + """ + |import "DPI-C" function void dpi_backdoor_get_write_data( + | output longint unsigned data_lo, + | output longint unsigned data_hi + |); + | + |module BackdoorGetWriteDataDPI( + | output [63:0] data_lo, + | output [63:0] data_hi, + | input enable + |); + | reg [63:0] data_lo_reg; + | reg [63:0] data_hi_reg; + | assign data_lo = data_lo_reg; + | assign data_hi = data_hi_reg; + | always @(*) begin + | data_lo_reg = 64'd0; + | data_hi_reg = 64'd0; + | if (enable) begin + | dpi_backdoor_get_write_data(data_lo_reg, data_hi_reg); + | end + | end + |endmodule + """.stripMargin + ) +} + +/** + * DPI-C BlackBox for backdoor put_read_data. + * Reports read data back to C++ for logging. 
+ */ +class BackdoorPutReadDataDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val bank_id = Input(UInt(32.W)) + val row = Input(UInt(32.W)) + val data_lo = Input(UInt(64.W)) + val data_hi = Input(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "BackdoorPutReadDataDPI.v", + """ + |import "DPI-C" function void dpi_backdoor_put_read_data( + | input int unsigned bank_id, + | input int unsigned row, + | input longint unsigned data_lo, + | input longint unsigned data_hi + |); + | + |module BackdoorPutReadDataDPI( + | input [31:0] bank_id, + | input [31:0] row, + | input [63:0] data_lo, + | input [63:0] data_hi, + | input enable + |); + | always @(*) begin + | if (enable) begin + | dpi_backdoor_put_read_data(bank_id, row, data_lo, data_hi); + | end + | end + |endmodule + """.stripMargin + ) +} + +/** + * DPI-C BlackBox for backdoor put_write_done. + * Reports write completion back to C++ for logging. + */ +class BackdoorPutWriteDoneDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val bank_id = Input(UInt(32.W)) + val row = Input(UInt(32.W)) + val data_lo = Input(UInt(64.W)) + val data_hi = Input(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "BackdoorPutWriteDoneDPI.v", + """ + |import "DPI-C" function void dpi_backdoor_put_write_done( + | input int unsigned bank_id, + | input int unsigned row, + | input longint unsigned data_lo, + | input longint unsigned data_hi + |); + | + |module BackdoorPutWriteDoneDPI( + | input [31:0] bank_id, + | input [31:0] row, + | input [63:0] data_lo, + | input [63:0] data_hi, + | input enable + |); + | always @(*) begin + | if (enable) begin + | dpi_backdoor_put_write_done(bank_id, row, data_lo, data_hi); + | end + | end + |endmodule + """.stripMargin + ) +} diff --git a/arch/src/main/scala/prototype/transpose/README.md b/arch/src/main/scala/framework/balldomain/prototype/transpose/README.md similarity index 100% rename from 
arch/src/main/scala/prototype/transpose/README.md rename to arch/src/main/scala/framework/balldomain/prototype/transpose/README.md diff --git a/arch/src/main/scala/framework/balldomain/prototype/transpose/Transpose.scala b/arch/src/main/scala/framework/balldomain/prototype/transpose/Transpose.scala new file mode 100644 index 00000000..8ec247a0 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/transpose/Transpose.scala @@ -0,0 +1,219 @@ +package framework.balldomain.prototype.transpose + +import chisel3._ +import chisel3.util._ +import chisel3.stage._ +import chisel3.experimental.hierarchy.{instantiable, public} + +import framework.balldomain.prototype.vector._ +import framework.balldomain.rs.{BallRsComplete, BallRsIssue} +import framework.balldomain.blink.{BallStatus, BankRead, BankWrite} +import framework.top.GlobalConfig +import framework.balldomain.prototype.transpose.configs.TransposeBallParam + +@instantiable +class Transpose(val b: GlobalConfig) extends Module { + val ballConfig = TransposeBallParam() + val InputNum = ballConfig.InputNum + val inputWidth = ballConfig.inputWidth + val bankWidth = b.memDomain.bankWidth + + val ballMapping = b.ballDomain.ballIdMappings + .find(_.ballName == "TransposeBall") + .getOrElse(throw new IllegalArgumentException("TransposeBall not found in config")) + + val inBW = ballMapping.inBW + val outBW = ballMapping.outBW + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new BallRsIssue(b))) + val cmdResp = Decoupled(new BallRsComplete(b)) + val bankRead = Vec(inBW, Flipped(new BankRead(b))) + val bankWrite = Vec(outBW, Flipped(new BankWrite(b))) + val status = new BallStatus + }) + + // ------------------------------- + // ROB / IDs + // ------------------------------- + val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + when(io.cmdReq.fire) { + rob_id_reg := 
io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + } + + for (i <- 0 until inBW) { + io.bankRead(i).rob_id := rob_id_reg + io.bankRead(i).ball_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).rob_id := rob_id_reg + io.bankWrite(i).ball_id := 0.U + } + + // ------------------------------- + // State: idle -> fill -> drain -> (fill or idle) + // For 16xN with stride = N/16: + // fill(16) -> drain(16) -> fill(16) -> drain(16) -> ... -> idle + // Total: 32 * stride cycles + // ------------------------------- + val idle :: fill :: drain :: Nil = Enum(3) + val state = RegInit(idle) + + // ------------------------------- + // Single 16x16 register array + // ------------------------------- + val regArray = Reg(Vec(InputNum, Vec(InputNum, UInt(inputWidth.W)))) + + // Command fields + val rbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val wbank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val iter_reg = RegInit(0.U(32.W)) + val stride = RegInit(1.U(32.W)) // iter / InputNum = addresses per row + + // Counters + val fillIdx = RegInit(0.U(log2Ceil(InputNum + 1).W)) // row being filled (0..15) + val drainIdx = RegInit(0.U(log2Ceil(InputNum + 1).W)) // column being drained (0..16) + val round = RegInit(0.U(32.W)) // which column-group (0..stride-1) + + // Read request tracking + val readReqCnt = RegInit(0.U(32.W)) + val readRespCnt = RegInit(0.U(32.W)) + + // ------------------------------- + // Default IO assignments + // ------------------------------- + for (i <- 0 until inBW) { + io.bankRead(i).io.req.valid := false.B + io.bankRead(i).io.req.bits.addr := 0.U + io.bankRead(i).io.resp.ready := false.B + io.bankRead(i).bank_id := rbank_reg + io.bankRead(i).group_id := 0.U + } + for (i <- 0 until outBW) { + io.bankWrite(i).io.req.valid := false.B + io.bankWrite(i).io.req.bits.addr := 0.U + io.bankWrite(i).io.req.bits.data := 0.U + io.bankWrite(i).io.req.bits.mask := 
VecInit(Seq.fill(b.memDomain.bankMaskLen)(0.U(1.W))) + io.bankWrite(i).io.req.bits.wmode := false.B + io.bankWrite(i).io.resp.ready := false.B + io.bankWrite(i).bank_id := wbank_reg + io.bankWrite(i).group_id := 0.U + } + + io.cmdReq.ready := (state === idle) + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + + io.bankRead(0).io.resp.ready := (state =/= idle) + io.bankWrite(0).io.resp.ready := (state =/= idle) + + // ------------------------------- + // Helpers + // ------------------------------- + // Read address: strided to gather one column-group + // For round r, row i: addr = i * stride + r + def readAddr(row: UInt, r: UInt): UInt = row * stride + r + + // Pack one column of regArray into a data word + def packColumn(col: UInt): UInt = { + Cat((0 until InputNum).reverse.map { r => + regArray(r.U)(col) + }) + } + + // ------------------------------- + // Main FSM + // ------------------------------- + switch(state) { + is(idle) { + when(io.cmdReq.fire) { + val iterVal = io.cmdReq.bits.cmd.iter + val strideVal = iterVal >> log2Ceil(InputNum) + + rbank_reg := io.cmdReq.bits.cmd.op1_bank + wbank_reg := io.cmdReq.bits.cmd.wr_bank + iter_reg := iterVal + stride := Mux(strideVal === 0.U, 1.U, strideVal) + round := 0.U + fillIdx := 0.U + drainIdx := 0.U + readReqCnt := 0.U + readRespCnt := 0.U + state := fill + } + } + + // ------------------------------------------------------- + // FILL: read 16 rows into regArray for current round + // ------------------------------------------------------- + is(fill) { + // Send read requests + io.bankRead(0).io.req.valid := (fillIdx < InputNum.U) + io.bankRead(0).io.req.bits.addr := readAddr(fillIdx, round) + when(io.bankRead(0).io.req.fire) { + readReqCnt := readReqCnt + 1.U + fillIdx := fillIdx + 1.U + } + + // Handle responses: fill regArray row by row + when(io.bankRead(0).io.resp.fire) { + val dataWord = 
io.bankRead(0).io.resp.bits.data + val respRow = readRespCnt(log2Ceil(InputNum) - 1, 0) + for (col <- 0 until InputNum) { + val hi = (col + 1) * inputWidth - 1 + val lo = col * inputWidth + regArray(respRow)(col) := dataWord(hi, lo) + } + readRespCnt := readRespCnt + 1.U + + // All 16 responses received → buffer full, go to drain + when(readRespCnt(log2Ceil(InputNum) - 1, 0) === (InputNum - 1).U) { + drainIdx := 0.U + state := drain + } + } + } + + // ------------------------------------------------------- + // DRAIN: write out 16 transposed columns, then fill next + // round or signal completion + // ------------------------------------------------------- + is(drain) { + when(drainIdx < InputNum.U) { + io.bankWrite(0).io.req.valid := true.B + io.bankWrite(0).io.req.bits.addr := round * InputNum.U + drainIdx + io.bankWrite(0).io.req.bits.data := packColumn(drainIdx) + io.bankWrite(0).io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(1.U(1.W))) + + when(io.bankWrite(0).io.req.fire) { + drainIdx := drainIdx + 1.U + } + }.otherwise { + // All 16 columns written + when(round === stride - 1.U) { + // Last round → signal completion + io.cmdResp.valid := true.B + io.cmdResp.bits.rob_id := rob_id_reg + when(io.cmdResp.fire) { + state := idle + } + }.otherwise { + // More rounds → advance round and fill next + round := round + 1.U + fillIdx := 0.U + state := fill + } + } + } + } + + io.status.idle := (state === idle) + io.status.running := (state =/= idle) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/transpose/TransposeBall.scala b/arch/src/main/scala/framework/balldomain/prototype/transpose/TransposeBall.scala new file mode 100644 index 00000000..2d8a8b32 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/transpose/TransposeBall.scala @@ -0,0 +1,40 @@ +package framework.balldomain.prototype.transpose + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, 
Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.balldomain.prototype.transpose.Transpose +import framework.top.GlobalConfig + +@instantiable +class TransposeBall(val b: GlobalConfig) extends Module with HasBlink { + + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "TransposeBall") + .getOrElse(throw new IllegalArgumentException("TransposeBall not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + + val transposeUnit: Instance[Transpose] = Instantiate(new Transpose(b)) + + transposeUnit.io.cmdReq <> io.cmdReq + transposeUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + transposeUnit.io.bankRead(i) <> io.bankRead(i) + } + + for (i <- 0 until outBW) { + transposeUnit.io.bankWrite(i) <> io.bankWrite(i) + } + + io.status <> transposeUnit.io.status + + io.subRobReq.valid := false.B + io.subRobReq.bits := SubRobRow.tieOff(b) +} diff --git a/arch/src/main/scala/framework/balldomain/prototype/transpose/configs/TransposeBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/transpose/configs/TransposeBallParam.scala new file mode 100644 index 00000000..a11a7343 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/transpose/configs/TransposeBallParam.scala @@ -0,0 +1,21 @@ +package framework.balldomain.prototype.transpose.configs + +import upickle.default._ + +/** + * TransposeBall Parameter + */ +case class TransposeBallParam( + InputNum: Int, + inputWidth: Int) + +object TransposeBallParam { + implicit val rw: ReadWriter[TransposeBallParam] = macroRW + + def apply(): TransposeBallParam = { + val jsonStr = + scala.io.Source.fromFile("src/main/scala/framework/balldomain/prototype/transpose/configs/default.json").mkString + read[TransposeBallParam](jsonStr) + } + +} diff --git 
a/arch/src/main/scala/framework/balldomain/prototype/transpose/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/transpose/configs/default.json new file mode 100644 index 00000000..a4552216 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/transpose/configs/default.json @@ -0,0 +1,4 @@ +{ + "InputNum": 16, + "inputWidth": 8 +} diff --git a/arch/src/main/scala/prototype/vector/README.md b/arch/src/main/scala/framework/balldomain/prototype/vector/README.md similarity index 100% rename from arch/src/main/scala/prototype/vector/README.md rename to arch/src/main/scala/framework/balldomain/prototype/vector/README.md diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/VecBall.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/VecBall.scala new file mode 100644 index 00000000..f9d6b820 --- /dev/null +++ b/arch/src/main/scala/framework/balldomain/prototype/vector/VecBall.scala @@ -0,0 +1,44 @@ +package framework.balldomain.prototype.vector + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.balldomain.blink.{BallStatus, BlinkIO, HasBallStatus, HasBlink, SubRobRow} +import framework.balldomain.prototype.vector.VecUnit +import framework.top.GlobalConfig + +/** + * VecBall + */ +@instantiable +class VecBall(val b: GlobalConfig) extends Module with HasBlink with HasBallStatus { + + val ballCommonConfig = b.ballDomain.ballIdMappings.find(_.ballName == "VecBall") + .getOrElse(throw new IllegalArgumentException("VecBall not found in config")) + val inBW = ballCommonConfig.inBW + val outBW = ballCommonConfig.outBW + + @public + val io = IO(new BlinkIO(b, inBW, outBW)) + + def blink: BlinkIO = io + def status: BallStatus = io.status + + val vecUnit: Instance[VecUnit] = Instantiate(new VecUnit(b)) + + vecUnit.io.cmdReq <> io.cmdReq + vecUnit.io.cmdResp <> io.cmdResp + + for (i <- 0 until inBW) { + 
vecUnit.io.bankRead(i) <> io.bankRead(i)
+  }
+
+  for (i <- 0 until outBW) {
+    vecUnit.io.bankWrite(i) <> io.bankWrite(i)
+  }
+
+  io.status <> vecUnit.io.status
+
+  io.subRobReq.valid := false.B
+  io.subRobReq.bits := SubRobRow.tieOff(b)
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/VecCtrlUnit.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/VecCtrlUnit.scala
new file mode 100644
index 00000000..44af26da
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/VecCtrlUnit.scala
@@ -0,0 +1,123 @@
+package framework.balldomain.prototype.vector
+
+import chisel3._
+import chisel3.util._
+import chisel3.stage._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.balldomain.rs.{BallRsComplete, BallRsIssue}
+import framework.top.GlobalConfig
+
+@instantiable
+class VecCtrlUnit(val b: GlobalConfig) extends Module {
+
+  @public
+  val io = IO(new Bundle {
+    val cmdReq = Flipped(Decoupled(new BallRsIssue(b)))
+    val cmdResp_o = Decoupled(new BallRsComplete(b))
+
+    val ctrl_ld_o = Decoupled(new ctrl_ld_req(b))
+    val ctrl_st_o = Decoupled(new ctrl_st_req(b))
+    val ctrl_ex_o = Decoupled(new ctrl_ex_req(b))
+
+    val cmdResp_i = Flipped(Valid(new Bundle { val commit = Bool() })) // from store unit
+  })
+
+  val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W))
+  val is_sub_reg = RegInit(false.B)
+  val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W))
+  val iter = RegInit(0.U(b.frontend.iter_len.W))
+  val op1_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W))
+  val op1_bank_addr = RegInit(0.U(12.W)) // New ISA: always 0, but keep for compatibility
+  val op2_bank_addr = RegInit(0.U(12.W)) // New ISA: always 0, but keep for compatibility
+  val op2_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W))
+  val wr_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W))
+  val wr_bank_addr = RegInit(0.U(12.W)) // New ISA: always 0, but keep for compatibility
+  val is_acc = RegInit(false.B) // Deprecated: use wmode instead
+  val has_send = RegInit(false.B)
+  val mode = RegInit(0.U(1.W))
+
+  val idle :: busy :: Nil = Enum(2)
+  val state = RegInit(idle)
+
+// -----------------------------------------------------------------------------
+// Set registers when EX instruction arrives
+// -----------------------------------------------------------------------------
+
+  when(io.cmdReq.fire) {
+    iter := io.cmdReq.bits.cmd.iter
+    rob_id_reg := io.cmdReq.bits.rob_id
+    is_sub_reg := io.cmdReq.bits.is_sub
+    sub_rob_id_reg := io.cmdReq.bits.sub_rob_id
+    op1_bank := io.cmdReq.bits.cmd.op1_bank
+    op1_bank_addr := 0.U // New ISA: all operations start from row 0
+    op2_bank := io.cmdReq.bits.cmd.op2_bank
+    op2_bank_addr := 0.U // New ISA: all operations start from row 0
+    wr_bank := io.cmdReq.bits.cmd.wr_bank
+    wr_bank_addr := 0.U // New ISA: all operations start from row 0
+    is_acc := false.B // Deprecated: use wmode instead
+    mode := io.cmdReq.bits.cmd.special(0)
+
+    state := busy
+  }
+
+// -----------------------------------------------------------------------------
+// Send control signals to VecUnit's load/store/ex units
+// -----------------------------------------------------------------------------
+
+  when(state === busy && !has_send) {
+    io.ctrl_ld_o.valid := true.B
+    io.ctrl_ld_o.bits.op1_bank := op1_bank
+    io.ctrl_ld_o.bits.op1_bank_addr := op1_bank_addr
+    io.ctrl_ld_o.bits.op2_bank := op2_bank
+    io.ctrl_ld_o.bits.op2_bank_addr := op2_bank_addr
+    io.ctrl_ld_o.bits.iter := iter
+    io.ctrl_ld_o.bits.mode := mode
+
+    io.ctrl_ex_o.valid := true.B
+    io.ctrl_ex_o.bits.iter := iter
+
+    io.ctrl_st_o.valid := true.B
+    io.ctrl_st_o.bits.wr_bank := wr_bank
+    io.ctrl_st_o.bits.wr_bank_addr := wr_bank_addr
+    io.ctrl_st_o.bits.iter := iter
+
+    has_send := true.B
+  }.otherwise {
+    io.ctrl_ld_o.valid := false.B
+    io.ctrl_ld_o.bits.op1_bank := 0.U
+    io.ctrl_ld_o.bits.op1_bank_addr := 0.U
+    io.ctrl_ld_o.bits.op2_bank := 0.U
+    io.ctrl_ld_o.bits.op2_bank_addr := 0.U
+    io.ctrl_ld_o.bits.iter := 0.U
+    io.ctrl_ld_o.bits.mode := 0.U
+
+    io.ctrl_ex_o.valid := false.B
+    io.ctrl_ex_o.bits.iter := 0.U
+
+    io.ctrl_st_o.valid := false.B
+    io.ctrl_st_o.bits.wr_bank := 0.U
+    io.ctrl_st_o.bits.wr_bank_addr := 0.U
+    io.ctrl_st_o.bits.iter := 0.U
+  }
+
+// -----------------------------------------------------------------------------
+// Wait for VecUnit's final write-back to complete
+// -----------------------------------------------------------------------------
+
+  when(io.cmdResp_i.valid) {
+    io.cmdResp_o.valid := true.B
+    io.cmdResp_o.bits.rob_id := rob_id_reg
+    io.cmdResp_o.bits.is_sub := is_sub_reg
+    io.cmdResp_o.bits.sub_rob_id := sub_rob_id_reg
+    state := idle
+    has_send := false.B
+  }.otherwise {
+    io.cmdResp_o.valid := false.B
+    io.cmdResp_o.bits.rob_id := 0.U
+    io.cmdResp_o.bits.is_sub := false.B
+    io.cmdResp_o.bits.sub_rob_id := 0.U
+  }
+
+  io.cmdReq.ready := state === idle
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/VecEXUnit.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/VecEXUnit.scala
new file mode 100644
index 00000000..50f81ed3
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/VecEXUnit.scala
@@ -0,0 +1,92 @@
+package framework.balldomain.prototype.vector
+
+import chisel3._
+import chisel3.util._
+import chisel3.stage._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.top.GlobalConfig
+import framework.balldomain.prototype.vector.warp.MeshWarp
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+
+class ctrl_ex_req(b: GlobalConfig) extends Bundle {
+  val iter = UInt(b.frontend.iter_len.W)
+}
+
+class ld_ex_req(b: GlobalConfig) extends Bundle {
+  val config = VectorBallParam()
+  val op1 = Vec(config.lane, UInt(config.inputWidth.W))
+  val op2 = Vec(config.lane, UInt(config.inputWidth.W))
+  val iter = UInt(b.frontend.iter_len.W)
+}
+
+@instantiable
+class VecEXUnit(val b: GlobalConfig) extends Module {
+  val config = VectorBallParam()
+  val InputNum = config.lane
+  val inputWidth = config.inputWidth
+  val accWidth = config.outputWidth
+
+  @public
+  val io = IO(new Bundle {
+    val ctrl_ex_i = Flipped(Decoupled(new ctrl_ex_req(b)))
+    val ld_ex_i = Flipped(Decoupled(new ld_ex_req(b)))
+
+    val ex_st_o = Decoupled(new ex_st_req(b))
+  })
+
+  val idle :: busy :: Nil = Enum(2)
+  val state = RegInit(idle)
+
+  val meshWarp = Module(new MeshWarp(config))
+
+  // Thread ID for MeshWarp (always use thread 0 for now)
+  val threadId = RegInit(0.U(10.W))
+
+  // Initialize default values for all signals
+  io.ctrl_ex_i.ready := false.B
+  io.ex_st_o.valid := false.B
+  io.ex_st_o.bits.rst := VecInit(Seq.fill(InputNum)(0.U(accWidth.W)))
+  io.ex_st_o.bits.iter := 0.U
+
+  // Initialize MeshWarp input signals with default values
+  meshWarp.io.in.valid := false.B
+  meshWarp.io.in.bits.op1 := VecInit(Seq.fill(InputNum)(0.U(inputWidth.W)))
+  meshWarp.io.in.bits.op2 := VecInit(Seq.fill(InputNum)(0.U(inputWidth.W)))
+  meshWarp.io.in.bits.thread_id := threadId
+  meshWarp.io.out.ready := false.B
+
+// -----------------------------------------------------------------------------
+// Set registers when Ctrl instruction arrives
+// -----------------------------------------------------------------------------
+  io.ctrl_ex_i.ready := state === idle
+  when(io.ctrl_ex_i.fire) {
+    threadId := 0.U // Use thread 0 for computation
+    state := busy
+  }
+  when(io.ld_ex_i.fire) {
+    threadId := Mux(threadId === (config.numMulThreads - 1).U, 0.U, threadId + 1.U)
+  }
+
+// -----------------------------------------------------------------------------
+// Accept read results from load unit and perform computation
+// -----------------------------------------------------------------------------
+  io.ld_ex_i.ready := state === busy && meshWarp.io.in.ready
+  when(io.ld_ex_i.valid) {
+    meshWarp.io.in.valid := true.B
+    meshWarp.io.in.bits.op1 := io.ld_ex_i.bits.op1
+    meshWarp.io.in.bits.op2 := io.ld_ex_i.bits.op2
+    meshWarp.io.in.bits.thread_id := threadId
+  }
+
+// -----------------------------------------------------------------------------
+// Send computation results to store unit for write-back
+// -----------------------------------------------------------------------------
+  io.ex_st_o.valid := meshWarp.io.out.valid
+  meshWarp.io.out.ready := io.ex_st_o.ready
+
+  when(io.ex_st_o.fire) {
+    io.ex_st_o.bits.rst := meshWarp.io.out.bits.res
+    io.ex_st_o.bits.iter := io.ld_ex_i.bits.iter // Use iter from ld_ex_i
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/VecLoadUnit.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/VecLoadUnit.scala
new file mode 100644
index 00000000..5ea42d23
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/VecLoadUnit.scala
@@ -0,0 +1,164 @@
+package framework.balldomain.prototype.vector
+
+import chisel3._
+import chisel3.util._
+import chisel3.stage._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.memdomain.backend.banks.{SramReadReq, SramReadResp}
+import framework.top.GlobalConfig
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+
+class ctrl_ld_req(b: GlobalConfig) extends Bundle {
+  val op1_bank = UInt(log2Up(b.memDomain.bankNum).W)
+  val op1_bank_addr = UInt(log2Up(b.memDomain.bankEntries).W)
+  val op2_bank = UInt(log2Up(b.memDomain.bankNum).W)
+  val op2_bank_addr = UInt(log2Up(b.memDomain.bankEntries).W)
+  val iter = UInt(b.frontend.iter_len.W)
+  val mode = UInt(1.W)
+}
+
+@instantiable
+class VecLoadUnit(val b: GlobalConfig) extends Module {
+  val config = VectorBallParam()
+  val InputNum = config.lane
+  val inputWidth = config.inputWidth
+  val bankWidth = b.memDomain.bankWidth
+  val rob_id_width = log2Up(b.frontend.rob_entries)
+
+  // Get bandwidth from config (use first VecBall mapping)
+  val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "VecBall")
+    .getOrElse(throw new IllegalArgumentException("VecBall not found in config"))
+  val inBW = ballMapping.inBW
+
+  @public
+  val io = IO(new Bundle {
+    val bankReadReq = Vec(inBW, Decoupled(new SramReadReq(b)))
+    val bankReadResp = Vec(inBW, Flipped(Decoupled(new SramReadResp(b))))
+    val ctrl_ld_i = Flipped(Decoupled(new ctrl_ld_req(b)))
+    val ld_ex_o = Decoupled(new ld_ex_req(b))
+    val op1_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W))
+    val op2_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W))
+  })
+
+  val op1_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W))
+  val op2_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W))
+  val op1_addr = RegInit(0.U(log2Up(b.memDomain.bankEntries).W))
+  val op2_addr = RegInit(0.U(log2Up(b.memDomain.bankEntries).W))
+  val iter = RegInit(0.U(b.frontend.iter_len.W))
+  val op1_iter_counter = RegInit(0.U(b.frontend.iter_len.W))
+  val op2_iter_counter = RegInit(0.U(b.frontend.iter_len.W))
+  val idle :: busy :: Nil = Enum(2)
+  val state = RegInit(idle)
+  val ld_ex_op1_reg = Reg(Vec(InputNum, UInt(inputWidth.W)))
+  val ld_ex_op2_reg = Reg(Vec(InputNum, UInt(inputWidth.W)))
+  val ld_ex_iter_reg = RegInit(0.U(b.frontend.iter_len.W))
+  val wait1_reg = RegInit(false.B)
+  val wait2_reg = RegInit(false.B)
+  val wait1_cnt = RegInit(0.U(7.W))
+  val wait2_cnt = RegInit(0.U(7.W))
+
+  val bankRespQueue0 = Module(new Queue(new SramReadResp(b), entries = 8))
+  val bankRespQueue1 = Module(new Queue(new SramReadResp(b), entries = 8))
+
+  for (i <- 0 until inBW) {
+    io.bankReadReq(i).valid := false.B
+    io.bankReadReq(i).bits.addr := 0.U
+  }
+
+  io.op1_bank_o := op1_bank
+  io.op2_bank_o := op2_bank
+  io.ctrl_ld_i.ready := state === idle
+
+  bankRespQueue0.io.enq <> io.bankReadResp(0)
+  bankRespQueue1.io.enq <> io.bankReadResp(1)
+
+// -----------------------------------------------------------------------------
+// Set registers when Ctrl instruction arrives
+// -----------------------------------------------------------------------------
+  when(io.ctrl_ld_i.fire) {
+    op1_bank := io.ctrl_ld_i.bits.op1_bank
+    op2_bank := io.ctrl_ld_i.bits.op2_bank
+    op1_addr := io.ctrl_ld_i.bits.op1_bank_addr
+    op2_addr := io.ctrl_ld_i.bits.op2_bank_addr
+    iter := io.ctrl_ld_i.bits.iter
+    op1_iter_counter := 0.U
+    op2_iter_counter := 0.U
+    state := busy
+    assert(io.ctrl_ld_i.bits.iter > 0.U, "iter should be greater than 0")
+  }
+
+// -----------------------------------------------------------------------------
+// Send SRAM read request
+// -----------------------------------------------------------------------------
+  //wait1_reg := Mux(io.run , 0.U, wait1_reg)
+
+  //wait2_reg := Mux(io.run , 0.U, wait2_reg)
+
+  when(state === busy && io.ld_ex_o.ready && !wait1_reg) {
+    io.bankReadReq(0).valid := op1_iter_counter < iter
+    io.bankReadReq(0).bits.addr := op1_addr + op1_iter_counter
+    op1_iter_counter := Mux(io.bankReadReq(0).ready, op1_iter_counter + 1.U, op1_iter_counter)
+    wait1_reg := Mux((op1_iter_counter + 1.U) % 16.U === 0.U, 1.U, 0.U)
+    when(io.bankReadReq(0).fire) {}
+  }
+
+  when(state === busy && io.ld_ex_o.ready && !wait2_reg) {
+    io.bankReadReq(1).valid := op2_iter_counter < iter
+    io.bankReadReq(1).bits.addr := op2_addr + op2_iter_counter
+    op2_iter_counter := Mux(io.bankReadReq(1).ready, op2_iter_counter + 1.U, op2_iter_counter)
+    wait2_reg := Mux((op2_iter_counter + 1.U) % 16.U === 0.U, 1.U, 0.U)
+    when(io.bankReadReq(1).fire) {}
+  }
+
+  when(wait1_reg) {
+    wait1_cnt := wait1_cnt + 1.U
+    when(wait1_cnt === 32.U) {
+      wait1_reg := false.B
+      wait1_cnt := 0.U
+    }
+  }
+
+  when(wait2_reg) {
+    wait2_cnt := wait2_cnt + 1.U
+    when(wait2_cnt === 32.U) {
+      wait2_reg := false.B
+      wait2_cnt := 0.U
+    }
+  }
+
+// -----------------------------------------------------------------------------
+// SRAM returns data and passes to EX unit
+// -----------------------------------------------------------------------------
+  val both_valid = bankRespQueue0.io.deq.valid && bankRespQueue1.io.deq.valid
+
+  io.ld_ex_o.valid := both_valid
+  when(both_valid) {
+    io.ld_ex_o.bits.op1 := bankRespQueue0.io.deq.bits.data.asTypeOf(Vec(InputNum, UInt(inputWidth.W)))
+    io.ld_ex_o.bits.op2 := bankRespQueue1.io.deq.bits.data.asTypeOf(Vec(InputNum, UInt(inputWidth.W)))
+    io.ld_ex_o.bits.iter := ld_ex_iter_reg
+  }.otherwise {
+    io.ld_ex_o.bits.iter := 0.U
+    io.ld_ex_o.bits.op1 := VecInit(Seq.fill(InputNum)(0.U(inputWidth.W)))
+    io.ld_ex_o.bits.op2 := VecInit(Seq.fill(InputNum)(0.U(inputWidth.W)))
+  }
+
+  // Only dequeue and advance iter counter on successful handshake
+  bankRespQueue0.io.deq.ready := io.ld_ex_o.fire
+  bankRespQueue1.io.deq.ready := io.ld_ex_o.fire
+
+  when(io.ld_ex_o.fire) {
+    ld_ex_iter_reg := ld_ex_iter_reg + 1.U
+  }
+
+// -----------------------------------------------------------------------------
+// Reset op1_iter_counter and return to idle state
+// -----------------------------------------------------------------------------
+
+  when(state === busy && ld_ex_iter_reg === iter) {
+    state := idle
+    op1_iter_counter := 0.U
+    op2_iter_counter := 0.U
+    ld_ex_iter_reg := 0.U
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/VecStoreUnit.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/VecStoreUnit.scala
new file mode 100644
index 00000000..9dc711dd
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/VecStoreUnit.scala
@@ -0,0 +1,145 @@
+package framework.balldomain.prototype.vector
+
+import chisel3._
+import chisel3.util._
+import chisel3.stage._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.memdomain.backend.banks.SramWriteIO
+import framework.top.GlobalConfig
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+
+class ctrl_st_req(b: GlobalConfig) extends Bundle {
+  val wr_bank = UInt(log2Up(b.memDomain.bankNum).W)
+  val wr_bank_addr = UInt(log2Up(b.memDomain.bankEntries).W)
+  val iter = UInt(b.frontend.iter_len.W)
+}
+
+class ex_st_req(b: GlobalConfig) extends Bundle {
+  val config = VectorBallParam()
+  val InputNum = config.lane
+  val accWidth = config.outputWidth
+  val rst = Vec(InputNum, UInt(accWidth.W))
+  val iter = UInt(b.frontend.iter_len.W)
+}
+
+class BankWriteEntry(b: GlobalConfig) extends Bundle {
+  val addr = UInt(log2Ceil(b.memDomain.bankEntries).W)
+  val data = UInt(b.memDomain.bankWidth.W)
+  val mask = Vec(b.memDomain.bankMaskLen, Bool())
+  val wmode = Bool()
+}
+
+@instantiable
+class VecStoreUnit(val b: GlobalConfig) extends Module {
+  val config = VectorBallParam()
+  val InputNum = config.lane
+  val accWidth = config.outputWidth
+
+  // Get bandwidth from config (use first VecBall mapping)
+  val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "VecBall")
+    .getOrElse(throw new IllegalArgumentException("VecBall not found in config"))
+  val outBW = ballMapping.outBW
+
+  @public
+  val io = IO(new Bundle {
+    val ctrl_st_i = Flipped(Decoupled(new ctrl_st_req(b)))
+    val ex_st_i = Flipped(Decoupled(new ex_st_req(b)))
+    val bankWrite = Vec(outBW, Flipped(new SramWriteIO(b)))
+    val wr_bank_o = Output(UInt(log2Up(b.memDomain.bankNum).W))
+    val cmdResp_o = Valid(new Bundle { val commit = Bool() })
+  })
+
+  val wr_bank = RegInit(0.U(log2Up(b.memDomain.bankNum).W))
+  val wr_bank_addr = RegInit(0.U(log2Up(b.memDomain.bankEntries).W))
+  val iter = RegInit(0.U(b.frontend.iter_len.W))
+  val iter_counter = RegInit(0.U(b.frontend.iter_len.W))
+  val idle :: busy :: Nil = Enum(2)
+  val state = RegInit(idle)
+
+  val writeQueues = VecInit(Seq.fill(outBW)(Module(new Queue(new BankWriteEntry(b), 16)).io))
+
+// -----------------------------------------------------------------------------
+// Set registers when Ctrl instruction arrives
+// -----------------------------------------------------------------------------
+  io.ctrl_st_i.ready := state === idle
+
+  when(io.ctrl_st_i.fire) {
+    wr_bank := io.ctrl_st_i.bits.wr_bank
+    wr_bank_addr := io.ctrl_st_i.bits.wr_bank_addr
+    iter := (io.ctrl_st_i.bits.iter + 15.U) & (~15.U(b.frontend.iter_len.W))
+    iter_counter := 0.U
+    state := busy
+  }
+
+// -----------------------------------------------------------------------------
+// Accept computation results from EX unit and push to write queues
+// -----------------------------------------------------------------------------
+  io.ex_st_i.ready := state === busy && writeQueues.forall(_.enq.ready)
+
+  for (i <- 0 until outBW) {
+    writeQueues(i).enq.valid := false.B
+    writeQueues(i).enq.bits := DontCare
+  }
+
+  when(io.ex_st_i.fire) {
+    for (i <- 0 until outBW) {
+      val elementsPerChannel = InputNum / outBW
+      val startIdx = i * elementsPerChannel
+      val endIdx = startIdx + elementsPerChannel - 1
+
+      val entry = Wire(new BankWriteEntry(b))
+      entry.addr := wr_bank_addr + iter_counter(log2Ceil(InputNum) - 1, 0)
+      entry.data := Cat(io.ex_st_i.bits.rst.slice(startIdx, endIdx + 1).reverse)
+      entry.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(true.B))
+      entry.wmode := true.B
+
+      writeQueues(i).enq.valid := true.B
+      writeQueues(i).enq.bits := entry
+    }
+    iter_counter := iter_counter + 1.U
+  }
+
+// -----------------------------------------------------------------------------
+// Drain write queues to bankWrite interface
+// -----------------------------------------------------------------------------
+  io.bankWrite.foreach { acc =>
+    acc.req.valid := false.B
+    acc.req.bits.addr := 0.U
+    acc.req.bits.data := Cat(Seq.fill(accWidth / 8)(0.U(8.W)))
+    acc.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(false.B))
+    acc.req.bits.wmode := false.B
+    acc.resp.ready := false.B
+  }
+
+  for (i <- 0 until outBW) {
+    writeQueues(i).deq.ready := false.B
+
+    when(writeQueues(i).deq.valid) {
+      io.bankWrite(i).req.valid := true.B
+      io.bankWrite(i).req.bits.addr := writeQueues(i).deq.bits.addr
+      io.bankWrite(i).req.bits.data := writeQueues(i).deq.bits.data
+      io.bankWrite(i).req.bits.mask := writeQueues(i).deq.bits.mask
+      io.bankWrite(i).req.bits.wmode := writeQueues(i).deq.bits.wmode
+      writeQueues(i).deq.ready := io.bankWrite(i).req.ready
+    }
+  }
+
+  // Output wr_bank for bank_id setting
+  io.wr_bank_o := wr_bank
+
+// -----------------------------------------------------------------------------
+// Reset iter counter, commit cmdResp, return to idle state
+// -----------------------------------------------------------------------------
+  val allQueuesEmpty = writeQueues.forall(q => !q.deq.valid)
+  val allDataEnqueued = state === busy && iter_counter >= iter
+
+  when(allDataEnqueued && allQueuesEmpty) {
+    state := idle
+    io.cmdResp_o.valid := true.B
+    io.cmdResp_o.bits.commit := true.B
+  }.otherwise {
+    io.cmdResp_o.valid := false.B
+    io.cmdResp_o.bits.commit := false.B
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/VecUnit.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/VecUnit.scala
new file mode 100644
index 00000000..4c67005c
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/VecUnit.scala
@@ -0,0 +1,141 @@
+package framework.balldomain.prototype.vector
+
+import chisel3._
+import chisel3.util._
+import chisel3.stage._
+import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate}
+
+import framework.balldomain.prototype.vector._
+import framework.balldomain.rs.{BallRsComplete, BallRsIssue}
+import framework.top.GlobalConfig
+import framework.balldomain.blink.{BallStatus, BankRead, BankWrite}
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+
+@instantiable
+class VecUnit(val b: GlobalConfig) extends Module {
+  val ballConfig = VectorBallParam()
+  val InputNum = ballConfig.lane
+  val inputWidth = ballConfig.inputWidth
+  val accWidth = ballConfig.outputWidth
+  val bankWidth = b.memDomain.bankWidth
+
+  // Get bandwidth from config (use first VecBall mapping)
+  val ballMapping = b.ballDomain.ballIdMappings.find(_.ballName == "VecBall")
+    .getOrElse(throw new IllegalArgumentException("VecBall not found in config"))
+  val inBW = ballMapping.inBW
+  val outBW = ballMapping.outBW
+
+  @public
+  val io = IO(new Bundle {
+    val cmdReq = Flipped(Decoupled(new BallRsIssue(b)))
+    val cmdResp = Decoupled(new BallRsComplete(b))
+    val bankRead = Vec(inBW, Flipped(new BankRead(b)))
+    val bankWrite = Vec(outBW, Flipped(new BankWrite(b)))
+    val status = new BallStatus
+  })
+
+  // Register to store rob_id when command is received
+  val rob_id_reg = RegInit(0.U(log2Up(b.frontend.rob_entries).W))
+  when(io.cmdReq.fire) {
+    rob_id_reg := io.cmdReq.bits.rob_id
+  }
+
+  // Set rob_id and ball_id for all bankRead and bankWrite channels from register
+  for (i <- 0 until inBW) {
+    io.bankRead(i).rob_id := rob_id_reg
+    io.bankRead(i).ball_id := 0.U
+  }
+  for (i <- 0 until outBW) {
+    io.bankWrite(i).rob_id := rob_id_reg
+    io.bankWrite(i).ball_id := 0.U
+  }
+
+  val VecCtrlUnit: Instance[VecCtrlUnit] = Instantiate(new VecCtrlUnit(b))
+  val VecLoadUnit: Instance[VecLoadUnit] = Instantiate(new VecLoadUnit(b))
+  val VecEX: Instance[VecEXUnit] = Instantiate(new VecEXUnit(b))
+  val VecStoreUnit: Instance[VecStoreUnit] = Instantiate(new VecStoreUnit(b))
+
+// -----------------------------------------------------------------------------
+// VECCTRLUNIT
+// -----------------------------------------------------------------------------
+
+  VecCtrlUnit.io.cmdReq <> io.cmdReq
+  io.cmdResp <> VecCtrlUnit.io.cmdResp_o
+
+// -----------------------------------------------------------------------------
+// VECLOADUNIT
+// -----------------------------------------------------------------------------
+
+  VecLoadUnit.io.ctrl_ld_i <> VecCtrlUnit.io.ctrl_ld_o
+  for (i <- 0 until inBW) {
+    io.bankRead(i).io.req <> VecLoadUnit.io.bankReadReq(i)
+    VecLoadUnit.io.bankReadResp(i) <> io.bankRead(i).io.resp
+    if (i == 0) {
+      io.bankRead(i).bank_id := VecLoadUnit.io.op1_bank_o
+      io.bankRead(i).group_id := 0.U
+    } else if (i == 1) {
+      io.bankRead(i).bank_id := VecLoadUnit.io.op2_bank_o
+      io.bankRead(i).group_id := 0.U
+    }
+  }
+
+// -----------------------------------------------------------------------------
+// VECEX
+// -----------------------------------------------------------------------------
+
+  VecEX.io.ctrl_ex_i <> VecCtrlUnit.io.ctrl_ex_o
+  VecEX.io.ld_ex_i <> VecLoadUnit.io.ld_ex_o
+
+// -----------------------------------------------------------------------------
+// VECSTOREUNIT
+// -----------------------------------------------------------------------------
+  VecStoreUnit.io.ctrl_st_i <> VecCtrlUnit.io.ctrl_st_o
+  VecStoreUnit.io.ex_st_i <> VecEX.io.ex_st_o
+
+  // Debug: print wr_bank_o
+  when(VecStoreUnit.io.ctrl_st_i.fire) {
+    printf("[VecUnit] VecStoreUnit wr_bank_o=%d\n", VecStoreUnit.io.wr_bank_o)
+  }
+
+  for (i <- 0 until outBW) {
+    io.bankWrite(i).io <> VecStoreUnit.io.bankWrite(i)
+    io.bankWrite(i).bank_id := VecStoreUnit.io.wr_bank_o
+    io.bankWrite(i).io.req.bits.wmode := true.B
+    io.bankWrite(i).group_id := i.U
+
+    // Debug: print all channels
+    when(io.bankWrite(i).io.req.valid) {
+      printf(
+        "[VecUnit] bankWrite[%d]: bank_id=%d group_id=%d valid=%d ready=%d\n",
+        i.U,
+        io.bankWrite(i).bank_id,
+        io.bankWrite(i).group_id,
+        io.bankWrite(i).io.req.valid,
+        io.bankWrite(i).io.req.ready
+      )
+    }
+  }
+  VecCtrlUnit.io.cmdResp_i <> VecStoreUnit.io.cmdResp_o
+
+// -----------------------------------------------------------------------------
+// Status tracking
+// -----------------------------------------------------------------------------
+  val iterCnt = RegInit(0.U(32.W))
+  val hasInput = RegInit(false.B)
+  val hasOutput = RegInit(false.B)
+
+  when(io.cmdReq.fire) {
+    hasInput := true.B
+  }
+  when(io.cmdResp.fire) {
+    hasOutput := false.B
+    hasInput := false.B
+    iterCnt := iterCnt + 1.U
+  }
+  when(io.cmdResp.valid && !hasOutput) {
+    hasOutput := true.B
+  }
+
+  io.status.idle := !hasInput && !hasOutput
+  io.status.running := hasOutput
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/bond/BondWrapper.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/bond/BondWrapper.scala
new file mode 100644
index 00000000..ba46b3da
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/bond/BondWrapper.scala
@@ -0,0 +1,14 @@
+package framework.balldomain.prototype.vector.bond
+
+import chisel3._
+import chisel3.util._
+
+abstract class BondWrapper {
+  val bondName = "vvv"
+
+  def to[T](name: String)(body: => T): T =
+    body
+
+  def from[T](name: String)(body: => T): T =
+    body
+}
diff --git a/arch/src/main/scala/prototype/vector/bond/README.md b/arch/src/main/scala/framework/balldomain/prototype/vector/bond/README.md
similarity index 100%
rename from arch/src/main/scala/prototype/vector/bond/README.md
rename to arch/src/main/scala/framework/balldomain/prototype/vector/bond/README.md
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/bond/vvv.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/bond/vvv.scala
new file mode 100644
index 00000000..5dcfffea
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/bond/vvv.scala
@@ -0,0 +1,29 @@
+//===----------------------------------------------------------------------===//
+// VVV Bond:
+// Input: Vec, Vec
+// Output: Vec
+//===----------------------------------------------------------------------===//
+
+package framework.balldomain.prototype.vector.bond
+
+import chisel3._
+import chisel3.util._
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+import chisel3.experimental.hierarchy.{instantiable, public}
+
+@instantiable
+class VVV(val config: VectorBallParam, val inputWidth: Int, val outputWidth: Int) extends Bundle {
+  val lane = config.lane
+
+  @public
+  val in = Flipped(Decoupled(new Bundle {
+    val in1 = Vec(lane, UInt(inputWidth.W))
+    val in2 = Vec(lane, UInt(inputWidth.W))
+  }))
+
+  @public
+  val out = Decoupled(new Bundle {
+    val out = Vec(lane, UInt(outputWidth.W))
+  })
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/configs/VectorBallParam.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/configs/VectorBallParam.scala
new file mode 100644
index 00000000..12da7338
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/configs/VectorBallParam.scala
@@ -0,0 +1,26 @@
+package framework.balldomain.prototype.vector.configs
+
+import upickle.default._
+
+/**
+ * VectorBall Parameter
+ */
+case class VectorBallParam(
+  InputNum: Int,
+  inputWidth: Int,
+  lane: Int,
+  outputWidth: Int,
+  numMulThreads: Int,
+  numCasThreads: Int)
+
+object VectorBallParam {
+  implicit val rw: ReadWriter[VectorBallParam] = macroRW
+
+  def apply(): VectorBallParam = {
+    val jsonStr = scala.io.Source.fromFile(
+      "src/main/scala/framework/balldomain/prototype/vector/configs/default.json"
+    ).mkString
+    read[VectorBallParam](jsonStr)
+  }
+
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/configs/default.json b/arch/src/main/scala/framework/balldomain/prototype/vector/configs/default.json
new file mode 100644
index 00000000..d5ad5924
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/configs/default.json
@@ -0,0 +1,8 @@
+{
+  "InputNum": 16,
+  "inputWidth": 8,
+  "lane": 16,
+  "outputWidth": 32,
+  "numMulThreads": 16,
+  "numCasThreads": 16
+}
diff --git a/arch/src/main/scala/prototype/vector/op/README.md b/arch/src/main/scala/framework/balldomain/prototype/vector/op/README.md
similarity index 100%
rename from arch/src/main/scala/prototype/vector/op/README.md
rename to arch/src/main/scala/framework/balldomain/prototype/vector/op/README.md
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/op/cascade.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/op/cascade.scala
new file mode 100644
index 00000000..f0805623
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/op/cascade.scala
@@ -0,0 +1,40 @@
+package framework.balldomain.prototype.vector.op
+
+import chisel3._
+import chisel3.util._
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+import framework.balldomain.prototype.vector.bond.VVV
+
+class CascadeOp(val config: VectorBallParam) extends Module {
+  val lane = config.lane
+  val inputWidth = config.outputWidth // Cascade uses outputWidth as input
+  val outputWidth = config.outputWidth
+
+  val io = IO(new VVV(config, inputWidth, outputWidth))
+
+  val reg1 = RegInit(VecInit(Seq.fill(lane)(0.U(outputWidth.W))))
+  val reg2 = RegInit(VecInit(Seq.fill(lane)(0.U(outputWidth.W))))
+  val valid1 = RegInit(false.B)
+  val valid2 = RegInit(false.B)
+
+  io.in.ready := io.out.ready
+
+  when(io.in.valid) {
+    valid1 := true.B
+    reg1 := io.in.bits.in1.zip(io.in.bits.in2).map { case (a, b) => a + b }
+  }.elsewhen(!io.in.ready) {
+    valid1 := valid1
+  }.otherwise {
+    valid1 := false.B
+  }
+
+  val valid = valid1
+
+  when(io.out.ready && valid) {
+    io.out.valid := true.B
+    io.out.bits.out := reg1
+  }.otherwise {
+    io.out.valid := false.B
+    io.out.bits.out := VecInit(Seq.fill(lane)(0.U(outputWidth.W)))
+  }
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/op/mul.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/op/mul.scala
new file mode 100644
index 00000000..03e9fe58
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/op/mul.scala
@@ -0,0 +1,37 @@
+package framework.balldomain.prototype.vector.op
+
+import chisel3._
+import chisel3.util._
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+import framework.balldomain.prototype.vector.bond.VVV
+
+class MulOp(val config: VectorBallParam) extends Module {
+  val lane = config.lane
+  val inputWidth = config.inputWidth
+  val outputWidth = config.outputWidth
+
+  val io = IO(new VVV(config, inputWidth, outputWidth))
+  val reg1 = RegInit(VecInit(Seq.fill(lane)(0.U(inputWidth.W))))
+  val reg2 = RegInit(VecInit(Seq.fill(lane)(0.U(inputWidth.W))))
+  val cnt = RegInit(0.U(log2Ceil(lane).W))
+  val active = RegInit(false.B)
+
+  io.out.valid := active && io.out.ready
+  io.in.ready := io.out.ready
+
+  when(io.in.valid) {
+    reg1 := io.in.bits.in1
+    reg2 := io.in.bits.in2
+    cnt := 0.U
+    active := true.B
+  }.elsewhen(active && io.out.ready) {
+    cnt := cnt + 1.U
+    when(cnt === (lane - 1).U) {
+      active := false.B
+    }
+  }
+
+  for (i <- 0 until lane) {
+    io.out.bits.out(i) := Mux(io.out.valid, reg1(cnt) * reg2(i), 0.U)
+  }
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/thread/BaseThread.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/thread/BaseThread.scala
new file mode 100644
index 00000000..cf7469b7
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/thread/BaseThread.scala
@@ -0,0 +1,13 @@
+//===- BaseThread.scala - Level 1: Thread ---===//
+package framework.balldomain.prototype.vector.thread
+
+import chisel3._
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+
+//===----------------------------------------------------------------------===//
+// BaseThread base class
+//===----------------------------------------------------------------------===//
+class BaseThread(val config: VectorBallParam, val opType: String) extends Module {
+  val io = IO(new Bundle {})
+  val lane = config.lane
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/thread/CasThread.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/thread/CasThread.scala
new file mode 100644
index 00000000..fbd130f3
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/thread/CasThread.scala
@@ -0,0 +1,16 @@
+package framework.balldomain.prototype.vector.thread
+
+import chisel3._
+import chisel3.util._
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+import framework.balldomain.prototype.vector.bond.VVV
+import framework.balldomain.prototype.vector.op.CascadeOp
+
+class CasThread(config: VectorBallParam) extends BaseThread(config, "cascade") {
+  val cascadeOp = Module(new CascadeOp(this.config))
+  val vvvBond = IO(new VVV(this.config, this.config.outputWidth, this.config.outputWidth))
+
+  // Connect CascadeOp and VVVBond
+  cascadeOp.io.in <> vvvBond.in
+  cascadeOp.io.out <> vvvBond.out
+}
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/thread/MulThread.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/thread/MulThread.scala
new file mode 100644
index 00000000..dc9b0025
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/thread/MulThread.scala
@@ -0,0 +1,16 @@
+package framework.balldomain.prototype.vector.thread
+
+import chisel3._
+import chisel3.util._
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+import framework.balldomain.prototype.vector.bond.VVV
+import framework.balldomain.prototype.vector.op.MulOp
+
+class MulThread(config: VectorBallParam) extends BaseThread(config, "mul") {
+  val mulOp = Module(new MulOp(this.config))
+  val vvvBond = IO(new VVV(this.config, this.config.inputWidth, this.config.outputWidth))
+
+  // Connect MulOp and VVVBond
+  mulOp.io.in <> vvvBond.in
+  mulOp.io.out <> vvvBond.out
+}
diff --git a/arch/src/main/scala/prototype/vector/thread/README.md b/arch/src/main/scala/framework/balldomain/prototype/vector/thread/README.md
similarity index 100%
rename from arch/src/main/scala/prototype/vector/thread/README.md
rename to arch/src/main/scala/framework/balldomain/prototype/vector/thread/README.md
diff --git a/arch/src/main/scala/framework/balldomain/prototype/vector/warp/MeshWarp.scala b/arch/src/main/scala/framework/balldomain/prototype/vector/warp/MeshWarp.scala
new file mode 100644
index 00000000..dc07b947
--- /dev/null
+++ b/arch/src/main/scala/framework/balldomain/prototype/vector/warp/MeshWarp.scala
@@ -0,0 +1,84 @@
+package framework.balldomain.prototype.vector.warp
+
+import chisel3._
+import chisel3.util._
+import chisel3.stage._
+import framework.balldomain.prototype.vector.configs.VectorBallParam
+import framework.balldomain.prototype.vector.thread._
+
+class MeshWarpInput(val config: VectorBallParam) extends Bundle {
+  val op1 = Vec(config.lane, UInt(config.inputWidth.W))
+  val op2 = Vec(config.lane, UInt(config.inputWidth.W))
+  val thread_id = UInt(10.W)
+}
+
+class MeshWarpOutput(val config: VectorBallParam) extends Bundle {
+  val res = Vec(config.lane, UInt(config.outputWidth.W))
+}
+
+class MeshWarp(val config: VectorBallParam) extends Module {
+
+  val io = IO(new Bundle {
+    val in = Flipped(Decoupled(new MeshWarpInput(config)))
+    val out = Decoupled(new MeshWarpOutput(config))
+  })
+
+  val mulThreads = (0 until config.numMulThreads).map { i =>
+    Module(new MulThread(config))
+  }
+
+  val casThreads = (0 until config.numCasThreads).map { i =>
+    Module(new CasThread(config))
+  }
+
+  io.in.ready := mulThreads(0).vvvBond.in.ready
+
+  for (i <- 0 until config.numMulThreads) {
+    val mulThread = mulThreads(i)
+    val casThread = casThreads(i)
+    val mulBond = mulThread.vvvBond
+    val casBond = casThread.vvvBond
+
+    // Connect mul thread output to cascade thread input
+    casBond.in.bits.in1 := mulBond.out.bits.out
+    mulBond.out.ready := casBond.in.ready
+
+    // Connect cascade thread's second input and output ready signal
+    if (i == 0) {
+      casBond.in.bits.in2 := VecInit(Seq.fill(config.lane)(0.U(config.outputWidth.W)))
+      // First cascade thread's valid is determined by mulBond's output valid
+      casBond.in.valid := mulBond.out.valid
+      // First cascade thread's output ready is connected to next cascade thread's input ready
+      if (i < config.numCasThreads - 1) {
+        casBond.out.ready := casThreads(i + 1).vvvBond.in.ready
+      }
+    } else {
+      // Directly connect output to input
+      casBond.in.bits.in2 := casThreads(i - 1).vvvBond.out.bits.out
+      // casBond's valid is jointly determined by previous casBond's output valid and current mulBond's output valid
+      casBond.in.valid := casThreads(i - 1).vvvBond.out.valid || mulBond.out.valid
+      // Middle cascade thread's output ready is connected to next cascade thread's input ready
+      if (i < config.numCasThreads - 1) {
+        casBond.out.ready := casThreads(i + 1).vvvBond.in.ready
+      }
+    }
+
+    // Only allow mulOp corresponding to thread_id to drive input
+    when(i.U === io.in.bits.thread_id && io.in.valid) {
+      mulBond.in.valid := true.B
+      mulBond.in.bits.in1 := io.in.bits.op1
+      mulBond.in.bits.in2 := io.in.bits.op2
+      io.in.ready := mulBond.in.ready
+    }.otherwise {
+      mulBond.in.valid := false.B
+      mulBond.in.bits.in1 := VecInit(Seq.fill(config.lane)(0.U(config.inputWidth.W)))
+      mulBond.in.bits.in2 := VecInit(Seq.fill(config.lane)(0.U(config.inputWidth.W)))
+    }
+  }
+
+  // Connect output
+  val finalCasBond = casThreads(config.numCasThreads - 1).vvvBond
+  io.out.valid := finalCasBond.out.valid
+  io.out.bits.res := finalCasBond.out.bits.out
+  finalCasBond.out.ready := io.out.ready
+}
diff --git a/arch/src/main/scala/prototype/vector/warp/README.md b/arch/src/main/scala/framework/balldomain/prototype/vector/warp/README.md
similarity index 100%
rename from arch/src/main/scala/prototype/vector/warp/README.md
rename to arch/src/main/scala/framework/balldomain/prototype/vector/warp/README.md
diff --git a/arch/src/main/scala/framework/balldomain/rs/reservationStation.scala b/arch/src/main/scala/framework/balldomain/rs/reservationStation.scala
index f461309f..35cb7f67 100644
--- a/arch/src/main/scala/framework/balldomain/rs/reservationStation.scala
+++ b/arch/src/main/scala/framework/balldomain/rs/reservationStation.scala
@@ -2,107 +2,133 @@ package framework.balldomain.rs

 import chisel3._
 import chisel3.util._
-import chisel3.experimental._
-import org.chipsalliance.cde.config.Parameters
-import examples.BuckyballConfigs.CustomBuckyballConfig
-import examples.toy.balldomain._
-import framework.balldomain.blink.BallRegist
-
-// Ball device information - configuration information for registering Ball devices
-case class BallRsRegist(
-  ballId: Int,
-  ballName: String
-)
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.top.GlobalConfig
+import examples.toy.balldomain.BallDecodeCmd

 // Ball domain issue interface - includes global rob_id
-class BallRsIssue(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle {
-  val cmd = new BallDecodeCmd
+class BallRsIssue(b: GlobalConfig) extends Bundle {
+  val cmd = new BallDecodeCmd(b.memDomain.bankNum, b.frontend.iter_len)
   // Global ROB ID
-  val rob_id = UInt(log2Up(b.rob_entries).W)
+  val rob_id = UInt(log2Up(b.frontend.rob_entries).W)
+  val is_sub = Bool()
+  val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W)
 }

 // Ball domain completion interface
-class BallRsComplete(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle {
-  val rob_id = UInt(log2Up(b.rob_entries).W)
+class BallRsComplete(b: GlobalConfig) extends Bundle {
+  val rob_id = UInt(log2Up(b.frontend.rob_entries).W)
+  val is_sub = Bool()
+  val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W)
 }

 // Generic Ball domain issue interface - supports dynamic number of Ball devices
-class BallIssueInterface(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle {
-  val balls = Vec(numBalls, Decoupled(new BallRsIssue))
+class BallIssueInterface(b: GlobalConfig) extends Bundle {
+  val balls = Vec(b.ballDomain.ballNum, Decoupled(new BallRsIssue(b)))
 }

 // Generic Ball domain completion interface - supports dynamic number of Ball devices
-class BallCommitInterface(numBalls: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle {
-  val balls = Vec(numBalls, Flipped(Decoupled(new BallRsComplete)))
+class BallCommitInterface(b: GlobalConfig) extends Bundle {
+  val balls = Vec(b.ballDomain.ballNum, Flipped(Decoupled(new BallRsComplete(b))))
 }

 // Local Ball reservation station - simple FIFO scheduler
-class BallReservationStation(BallRsRegists: Seq[BallRsRegist]) - (implicit b: CustomBuckyballConfig, p: Parameters) extends Module { +@instantiable +class BallReservationStation(val b: GlobalConfig) extends Module { - val numBalls = BallRsRegists.length + @public + val ball_decode_cmd_i = IO(Flipped(new DecoupledIO(new Bundle { + val cmd = new BallDecodeCmd(b.memDomain.bankNum, b.frontend.iter_len) + // Global ROB ID + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) + val is_sub = Bool() + val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W) + }))) - val io = IO(new Bundle { - // Decoded instruction input (with global rob_id) - val ball_decode_cmd_i = Flipped(new DecoupledIO(new Bundle { - val cmd = new BallDecodeCmd - // Global ROB ID - val rob_id = UInt(log2Up(b.rob_entries).W) - })) + // Rs -> BallController (multi-channel issue) + @public + val issue_o = IO(new BallIssueInterface(b)) - // Rs -> BallController (multi-channel issue) - val issue_o = new BallIssueInterface(numBalls) - val commit_i = new BallCommitInterface(numBalls) + @public + val commit_i = IO(new BallCommitInterface(b)) - // Output completion signal (with global rob_id, single channel) - val complete_o = Decoupled(new BallRsComplete) - }) + // Output completion signal (with global rob_id, single channel) + @public + val complete_o = IO(Decoupled(new BallRsComplete(b))) // Simple FIFO queue, only for buffering - val fifo = Module(new Queue(new Bundle { - val cmd = new BallDecodeCmd - val rob_id = UInt(log2Up(b.rob_entries).W) - }, entries = 4)) // Small buffer is sufficient + val fifo = Module(new Queue( + new Bundle { + val cmd = new BallDecodeCmd(b.memDomain.bankNum, b.frontend.iter_len) + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) + val is_sub = Bool() + val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W) + }, + entries = 4 + )) // Small buffer is sufficient // ----------------------------------------------------------------------------- // Inbound - FIFO 
enqueue // ----------------------------------------------------------------------------- - fifo.io.enq <> io.ball_decode_cmd_i + fifo.io.enq <> ball_decode_cmd_i // ----------------------------------------------------------------------------- // Outbound - instruction issue (dispatch to corresponding Ball device based on bid) // ----------------------------------------------------------------------------- val headEntry = fifo.io.deq.bits + // Build ballId to index mapping from config + // Config order should match the order in busRegister.scala + // Each index in issue_o.balls corresponds to a ball registered in BBus + val numConfiguredBalls = b.ballDomain.ballIdMappings.length + // Set issue signals for each Ball device - for (i <- 0 until numBalls) { - val ballId = BallRsRegists(i).ballId.U - io.issue_o.balls(i).valid := fifo.io.deq.valid && headEntry.cmd.bid === ballId - io.issue_o.balls(i).bits.cmd := headEntry.cmd - io.issue_o.balls(i).bits.rob_id := headEntry.rob_id + // Use configured ball id mappings: index i in issue_o.balls corresponds to ballIdMappings(i).ballId + for (i <- 0 until b.ballDomain.ballNum) { + if (i < numConfiguredBalls) { + val configuredBallId = b.ballDomain.ballIdMappings(i).ballId.U + issue_o.balls(i).valid := fifo.io.deq.valid && headEntry.cmd.bid === configuredBallId + issue_o.balls(i).bits.cmd := headEntry.cmd + issue_o.balls(i).bits.rob_id := headEntry.rob_id + issue_o.balls(i).bits.is_sub := headEntry.is_sub + issue_o.balls(i).bits.sub_rob_id := headEntry.sub_rob_id + } else { + // Unused slots - no valid signal + issue_o.balls(i).valid := false.B + issue_o.balls(i).bits.cmd := DontCare + issue_o.balls(i).bits.rob_id := DontCare + issue_o.balls(i).bits.is_sub := DontCare + issue_o.balls(i).bits.sub_rob_id := DontCare + } } // FIFO deq.ready - can only dequeue when target Ball device is ready + // Find which index corresponds to the requested ballId fifo.io.deq.ready := VecInit( - BallRsRegists.zipWithIndex.map { case (info, idx) => 
- (headEntry.cmd.bid === info.ballId.U) && io.issue_o.balls(idx).ready + (0 until b.ballDomain.ballNum).map { idx => + if (idx < numConfiguredBalls) { + val configuredBallId = b.ballDomain.ballIdMappings(idx).ballId.U + (headEntry.cmd.bid === configuredBallId) && issue_o.balls(idx).ready + } else { + false.B + } } ).asUInt.orR // ----------------------------------------------------------------------------- // Completion signal processing - directly forward to global RS // ----------------------------------------------------------------------------- - val completeArb = Module(new Arbiter(UInt(log2Up(b.rob_entries).W), numBalls)) + val completeArb = Module(new Arbiter(new BallRsComplete(b), b.ballDomain.ballNum)) // Connect completion signals from all Ball devices to arbiter - for (i <- 0 until numBalls) { - completeArb.io.in(i).valid := io.commit_i.balls(i).valid - completeArb.io.in(i).bits := io.commit_i.balls(i).bits.rob_id - io.commit_i.balls(i).ready := completeArb.io.in(i).ready + for (i <- 0 until b.ballDomain.ballNum) { + completeArb.io.in(i).valid := commit_i.balls(i).valid + completeArb.io.in(i).bits := commit_i.balls(i).bits + commit_i.balls(i).ready := completeArb.io.in(i).ready } // Forward completion signal (with global rob_id) - io.complete_o.valid := completeArb.io.out.valid - io.complete_o.bits.rob_id := completeArb.io.out.bits - completeArb.io.out.ready := io.complete_o.ready + complete_o.valid := completeArb.io.out.valid + complete_o.bits := completeArb.io.out.bits + completeArb.io.out.ready := complete_o.ready } diff --git a/arch/src/main/scala/framework/balldomain/rs/rob.scala b/arch/src/main/scala/framework/balldomain/rs/rob.scala deleted file mode 100644 index 25f7bff6..00000000 --- a/arch/src/main/scala/framework/balldomain/rs/rob.scala +++ /dev/null @@ -1,197 +0,0 @@ -package framework.balldomain.rs - -import chisel3._ -import chisel3.util._ -import chisel3.experimental._ -import org.chipsalliance.cde.config.Parameters -import 
examples.BuckyballConfigs.CustomBuckyballConfig -import examples.toy.balldomain.BallDecodeCmd - -// ROB entry data structure - preserves ROB ID to support out-of-order completion -class RobEntry(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val cmd = new BallDecodeCmd - val rob_id = UInt(log2Up(b.rob_entries).W) -} - -class ROB (implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - // Allocation interface - val alloc = Flipped(new DecoupledIO(new BallDecodeCmd)) - // Externally specified rob_id - val alloc_rob_id = Input(UInt(log2Up(b.rob_entries).W)) - - // Issue interface - issue uncompleted head instruction - val issue = new DecoupledIO(new RobEntry) - - // Completion interface - report instruction completion - val complete = Flipped(new DecoupledIO(UInt(log2Up(b.rob_entries).W))) - - // Commit interface - commit completed head instruction - // val commit = new DecoupledIO(new RobEntry) - - // Status signals - exposed to reservation station for decision making - val empty = Output(Bool()) - val full = Output(Bool()) - // head pointer position - val head_ptr = Output(UInt(log2Up(b.rob_entries).W)) - // Number of issued but uncompleted instructions - val issued_count = Output(UInt(log2Up(b.rob_entries + 1).W)) - // Whether each entry is valid - val entry_valid = Output(Vec(b.rob_entries, Bool())) - // Whether each entry is complete - val entry_complete = Output(Vec(b.rob_entries, Bool())) - }) - - // Circular ROB structure - // Initialize to zero to avoid X states in FPGA - val robEntries = RegInit(VecInit(Seq.fill(b.rob_entries)(0.U.asTypeOf(new RobEntry)))) - // Whether entry is valid - val robValid = RegInit(VecInit(Seq.fill(b.rob_entries)(false.B))) - // Whether entry is issued - val robIssued = RegInit(VecInit(Seq.fill(b.rob_entries)(false.B))) - // Whether entry is complete - val robComplete = RegInit(VecInit(Seq.fill(b.rob_entries)(false.B))) - - // Circular queue pointers - // Points to 
oldest uncommitted instruction - val headPtr = RegInit(0.U(log2Up(b.rob_entries).W)) - // Points to next position to allocate - val tailPtr = RegInit(0.U(log2Up(b.rob_entries).W)) - // ROB ID circular counter - val robIdCounter = RegInit(0.U(log2Up(b.rob_entries).W)) - - // Number of issued but uncompleted instructions (used to limit issue) - val issuedCount = RegInit(0.U(log2Up(b.rob_entries + 1).W)) - // Maximum issue limit: half of ROB depth - val maxIssueLimit = (b.rob_entries / 2).U - - // Queue status - val isEmpty = headPtr === tailPtr && !robValid(headPtr) - val isFull = headPtr === tailPtr && robValid(headPtr) - val count = Mux(isFull, b.rob_entries.U, - Mux(tailPtr >= headPtr, tailPtr - headPtr, - b.rob_entries.U + tailPtr - headPtr)) - -// ----------------------------------------------------------------------------- -// Inbound - instruction allocation -// ----------------------------------------------------------------------------- - io.alloc.ready := !isFull - - when(io.alloc.fire) { - robEntries(tailPtr).cmd := io.alloc.bits - robEntries(tailPtr).rob_id := robIdCounter - robValid(tailPtr) := true.B - robIssued(tailPtr) := false.B - robComplete(tailPtr) := false.B - - // Update tail pointer and rob_id counter (circular) - tailPtr := Mux(tailPtr === (b.rob_entries - 1).U, 0.U, tailPtr + 1.U) - robIdCounter := Mux(robIdCounter === (b.rob_entries - 1).U, 0.U, robIdCounter + 1.U) - } - -// ----------------------------------------------------------------------------- -// Completion signal processing -// ----------------------------------------------------------------------------- - io.complete.ready := true.B - when(io.complete.fire) { - val completeId = io.complete.bits - robComplete(completeId) := true.B - // When complete, decrement issued count - when(robIssued(completeId)) { - issuedCount := issuedCount - 1.U - } - } - -// ----------------------------------------------------------------------------- -// Outbound - issue instructions in order (starting 
from head) -// ----------------------------------------------------------------------------- - // Find first valid and unissued instruction starting from head - val canIssue = Wire(Bool()) - val issuePtr = Wire(UInt(log2Up(b.rob_entries).W)) - - // Default values - canIssue := false.B - issuePtr := headPtr - - // Scan from head to find first issuable instruction - val scanValid = Wire(Vec(b.rob_entries, Bool())) - for (i <- 0 until b.rob_entries) { - val ptr = Mux(headPtr + i.U >= b.rob_entries.U, - headPtr + i.U - b.rob_entries.U, - headPtr + i.U) - scanValid(i) := robValid(ptr) && !robIssued(ptr) && !robComplete(ptr) - } - - // Find first issuable position - val firstValid = PriorityEncoder(scanValid.asUInt) - val hasValid = scanValid.asUInt.orR - - val actualIssuePtr = Mux(headPtr + firstValid >= b.rob_entries.U, - headPtr + firstValid - b.rob_entries.U, - headPtr + firstValid) - - // Can only issue if issue limit is not reached - val canIssueMore = issuedCount < maxIssueLimit - canIssue := hasValid && canIssueMore - issuePtr := actualIssuePtr - - io.issue.valid := canIssue - io.issue.bits := robEntries(issuePtr) - - when(io.issue.fire) { - robIssued(issuePtr) := true.B - issuedCount := issuedCount + 1.U - } - -// ----------------------------------------------------------------------------- -// Instruction commit - commit all completed instructions out-of-order -// ----------------------------------------------------------------------------- - // When head instruction completes, automatically commit and move head pointer - // when(robValid(headPtr) && robComplete(headPtr)) { - // robValid(headPtr) := false.B - // robIssued(headPtr) := false.B - // robComplete(headPtr) := false.B - // headPtr := Mux(headPtr === (b.rob_entries - 1).U, 0.U, headPtr + 1.U) - // } // Sequential commit version - - // Commit all completed instructions - for (i <- 0 until b.rob_entries) { - when(robValid(i.U) && robComplete(i.U)) { - robValid(i.U) := false.B - robIssued(i.U) := false.B 
- robComplete(i.U) := false.B - } - } - - // Update head pointer: skip all completed (about to be cleared) positions - // Find first "valid and incomplete" instruction position starting from head - val nextHeadCandidates = Wire(Vec(b.rob_entries, Bool())) - for (i <- 0 until b.rob_entries) { - val ptr = Mux(headPtr + i.U >= b.rob_entries.U, - headPtr + i.U - b.rob_entries.U, - headPtr + i.U) - // Entry is valid and incomplete (will not be committed) - nextHeadCandidates(i) := robValid(ptr) && !robComplete(ptr) - } - - val hasUncommitted = nextHeadCandidates.asUInt.orR - val nextHeadOffset = PriorityEncoder(nextHeadCandidates.asUInt) - val nextHeadPtr = Mux(headPtr + nextHeadOffset >= b.rob_entries.U, - headPtr + nextHeadOffset - b.rob_entries.U, - headPtr + nextHeadOffset) - - // Update head pointer: - // - If there are uncompleted instructions, move head to first uncompleted position - // - If there are no uncompleted instructions (all complete), move head to tail (ROB is empty) - headPtr := Mux(hasUncommitted, nextHeadPtr, tailPtr) - -// ----------------------------------------------------------------------------- -// Status signals - exposed to reservation station -// ----------------------------------------------------------------------------- - io.empty := isEmpty - io.full := isFull - io.head_ptr := headPtr - io.issued_count := issuedCount - io.entry_valid := robValid - io.entry_complete := robComplete -} diff --git a/arch/src/main/scala/framework/builtin/BaseConfigs.scala b/arch/src/main/scala/framework/builtin/BaseConfigs.scala deleted file mode 100644 index 64298454..00000000 --- a/arch/src/main/scala/framework/builtin/BaseConfigs.scala +++ /dev/null @@ -1,79 +0,0 @@ -package framework.builtin - -import scala.math.{max, pow, sqrt} -import chisel3._ -import chisel3.util._ -import freechips.rocketchip.tile._ -import org.chipsalliance.cde.config._ - -import framework.memdomain.dma.LocalAddr - -sealed abstract trait BuckyballMemCapacity -case class 
CapacityInKilobytes(kilobytes: Int) extends BuckyballMemCapacity -case class CapacityInVectors(vectors: Int) extends BuckyballMemCapacity -// case class CapacityInMatrices(matrices: Int) extends BuckyballMemCapacity - -case class BaseConfig( - opcodes: OpcodeSet = OpcodeSet.custom3, - - inputType: Data, - accType: Data, - - veclane: Int = 16, - accveclane: Int = 4, - - tlb_size: Int = 4, - // Number of RoB entries - rob_entries: Int = 16, - // Whether reservation station responds out-of-order (false = wait for ROB to be empty before responding) - rs_out_of_order_response: Boolean = true, - - // Unused - dma_maxbytes: Int = 64, - dma_buswidth: Int = 128, - - sp_banks: Int = 4, - acc_banks: Int = 8, - - sp_singleported: Boolean = true, - - sp_capacity: BuckyballMemCapacity = CapacityInKilobytes(256), - acc_capacity: BuckyballMemCapacity = CapacityInKilobytes(64), - - max_in_flight_mem_reqs: Int = 16, // Unused - aligned_to: Int = 1, - spad_read_delay: Int = 0, - - // Index length supporting SPAD (16384 rows) + ACC (4096 rows) - spAddrLen: Int = 15, - // Index length for 4GB - memAddrLen: Int = 32, - - // Number of vector PEs per thread - numVecPE: Int = 16, - // Number of vector threads per thread - numVecThread: Int = 16, - - // Empty ball id - emptyBallid: Int = 5, - -) { - val spad_w = veclane * inputType.getWidth - val spad_mask_len = (spad_w / (aligned_to * 8)) max 1 - val spad_bank_entries = sp_capacity match { - case CapacityInKilobytes(kb) => kb * 1024 * 8 / (sp_banks * spad_w) - case CapacityInVectors(vs) => vs * veclane / sp_banks - } - val acc_w = accveclane * accType.getWidth - val acc_mask_len = (acc_w / (aligned_to * 8)) max 1 - val acc_bank_entries = acc_capacity match { - case CapacityInKilobytes(kb) => kb * 1024 * 8 / (acc_banks * acc_w) - case CapacityInVectors(vs) => vs * accveclane / acc_banks - } - - - val local_addr_t = new LocalAddr(sp_banks, spad_bank_entries, acc_banks, acc_bank_entries) - - - -} diff --git 
a/arch/src/main/scala/framework/builtin/README.md b/arch/src/main/scala/framework/builtin/README.md deleted file mode 100644 index de47673d..00000000 --- a/arch/src/main/scala/framework/builtin/README.md +++ /dev/null @@ -1,340 +0,0 @@ -# Buckyball Built-in Component Library - -## Overview - -This directory contains the built-in hardware component implementations of the Buckyball framework, providing standardized and reusable hardware modules. Located at `arch/src/main/scala/framework/builtin`, it serves as the component library layer, offering verified hardware building blocks for upper-level systems. - -Main component modules: -- **memdomain**: Memory domain components, including storage and DMA engines -- **frontend**: Frontend processing components for instruction decode and scheduling -- **util**: Framework-level utility functions -- **BaseConfigs.scala**: Base configuration definitions - -## Code Structure - -``` -builtin/ -├── BaseConfigs.scala - Base configuration parameter definitions -├── memdomain/ - Memory domain implementation -│ ├── dma/ - DMA engines (BBStreamReader/Writer) -│ ├── mem/ - Memory components (Scratchpad, Accumulator) -│ ├── rs/ - Memory domain reservation station -│ ├── tlb/ - TLB implementation -│ ├── MemController.scala - Memory controller -│ ├── MemDomain.scala - Memory domain top-level -│ ├── MemLoader.scala - Load instruction handler -│ ├── MemStorer.scala - Store instruction handler -│ └── DomainDecoder.scala - Memory domain decoder -├── frontend/ - Frontend components -│ ├── GobalDecoder.scala - Global instruction decoder -│ ├── globalrs/ - Global reservation station -│ │ ├── GlobalReservationStation.scala -│ │ └── GlobalROB.scala - Global reorder buffer -│ └── rs/ - Ball domain reservation station -│ ├── reservationStation.scala -│ └── rob.scala -└── util/ - Utility function library -``` - -### Module Dependencies - -``` -Configuration Layer (BaseConfigs.scala) - ↓ -Component Layer (memdomain, frontend, util) - ↓ -Application 
Layer (examples, prototypes) -``` - -**BaseConfigs.scala** (Configuration Base Layer) -- Defines base configuration parameters for all built-in components -- Provides default configuration and parameter validation -- Referenced by all sub-modules as configuration source - -**memdomain/** (Memory Subsystem) -- Depends on BaseConfigs for memory-related configuration -- Implements storage, DMA, address management, etc. -- Provides memory access services for other components - -**frontend/** (Frontend Processing) -- Uses frontend configuration parameters from BaseConfigs -- Implements instruction fetch, decode, and scheduling -- Tightly integrated with processor core - -**util/** (Utility Library) -- Provides common hardware design utility functions -- Widely used by other components -- Independent of specific configuration parameters - -## Module Details - -### BaseConfigs.scala - -**Main Function**: Define base configuration parameters and defaults for built-in components - -**Key Components**: - -```scala -case class BaseConfig( - opcodes: OpcodeSet = OpcodeSet.custom3, - - inputType: Data, // Input data type - accType: Data, // Accumulator data type - - veclane: Int = 16, // Vector lane width - accveclane: Int = 4, // Accumulator vector lane - - tlb_size: Int = 4, // TLB size - rob_entries: Int = 16, // Number of ROB entries - rs_out_of_order_response: Boolean = true, // Out-of-order response support - - dma_maxbytes: Int = 64, // Unused - dma_buswidth: Int = 128, // DMA bus width - - sp_banks: Int = 4, // Scratchpad bank count - acc_banks: Int = 8, // Accumulator bank count - - sp_capacity: BuckyballMemCapacity = CapacityInKilobytes(256), - acc_capacity: BuckyballMemCapacity = CapacityInKilobytes(64), - - spAddrLen: Int = 15, // SPAD address length - memAddrLen: Int = 32, // Memory address length - - numVecPE: Int = 16, // Vector PEs per thread - numVecThread: Int = 16, // Vector threads - - emptyBallid: Int = 5 // Empty ball ID -) -``` - -**Configuration 
Parameters**: -- **Memory Domain**: Bank counts, capacities, address lengths -- **Frontend**: ROB entries, out-of-order response -- **Vector Unit**: PE count, thread count, lane width -- **Data Types**: Input and accumulator data types - -**Parameter Validation**: -```scala -require(sp_banks > 0, "SP banks must be positive") -require(acc_banks > 0, "ACC banks must be positive") -require(rob_entries > 0 && isPow2(rob_entries), "ROB entries must be power of 2") -``` - -**Input/Output**: -- Input: User-defined configuration overrides -- Output: Validated complete configuration parameters -- Edge cases: Parameter conflict detection and error reporting - -### memdomain/ Submodule - -**Main Function**: Implement complete memory domain functionality - -**Key Components**: -- **MemDomain.scala**: Memory domain top-level module - - Integrates all memory subsystem components - - Provides unified external interface - - Manages DMA and Ball domain access coordination - -- **MemController.scala**: Memory controller - - Encapsulates Scratchpad and Accumulator - - Dual-port design for DMA and Ball domain - - Bank arbitration and routing - -- **MemLoader.scala**: Load instruction handler - - Receives load instructions from reservation station - - Issues DMA read requests - - Writes data to Scratchpad/Accumulator - -- **MemStorer.scala**: Store instruction handler - - Reads data from Scratchpad/Accumulator - - Issues DMA write requests with alignment - - Handles byte masking - -- **dma/**: DMA engines - - **BBStreamReader**: Streaming read with TLB support - - **BBStreamWriter**: Streaming write with alignment - - Transaction ID management - -- **mem/**: Memory components - - **Scratchpad**: Multi-bank scratchpad memory - - **AccBank**: Accumulator bank with accumulation pipeline - - **SramBank**: Generic SRAM bank - -- **tlb/**: Translation Lookaside Buffer - - Virtual to physical address translation - - Integrated with DMA engines - -- **rs/**: Memory domain reservation station - 
- FIFO-based instruction scheduler - - Local ROB for memory instructions - -**Interface Definition**: -```scala -class MemDomainIO(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - // From Global RS - val issue = Flipped(Decoupled(new MemRsIssue)) - - // To Global RS - val complete = Decoupled(new MemRsComplete) - - // Ball domain SRAM interface - val sramRead = Vec(sp_banks, Flipped(new SramReadIO)) - val sramWrite = Vec(sp_banks, Flipped(new SramWriteIO)) - val accRead = Vec(acc_banks, Flipped(new SramReadIO)) - val accWrite = Vec(acc_banks, Flipped(new SramWriteIO)) - - // DMA interface - val dma = new Bundle { - val read = Decoupled(new BBReadRequest) - val write = Decoupled(new BBWriteRequest) - } - - // TLB interface - val tlb = Vec(2, new BBTLBIO) - val ptw = Vec(2, Flipped(new TLBPTWIO)) - val tlbExp = Output(Vec(2, new BBTLBExceptionIO)) -} -``` - -### frontend/ Submodule - -**Main Function**: Implement processor frontend functionality for instruction decode and scheduling - -**Core Components**: -- **GobalDecoder.scala**: Global instruction decoder - - Classifies instructions into Ball/Memory/Fence types - - Constructs PostGDCmd for domain-specific decoders - - Interfaces with Global RS - -- **globalrs/**: Global reservation station - - **GlobalReservationStation.scala**: Central instruction manager - - Allocates ROB entries - - Issues to Ball and Memory domains - - Handles completion from both domains - - Manages Fence synchronization - - **GlobalROB.scala**: Global reorder buffer - - Tracks instruction state across domains - - Supports out-of-order completion - - Sequential commit - -- **rs/**: Ball domain reservation station - - **reservationStation.scala**: Ball-specific scheduler - - **rob.scala**: Local ROB for Ball instructions - -**Data Flow**: -``` -RoCC → Global Decoder → Global RS → Ball Domain / Mem Domain - ↓ ↓ ↓ - Global ROB Ball Decoder Mem Decoder - (tracks state) ↓ ↓ - Ball Devices Loader/Storer -``` - -### util/ 
Submodule - -**Main Function**: Provide common utility functions - -**Utility Categories**: -- Mathematical operation tools -- Interface conversion tools -- Debug and monitoring tools -- Common hardware patterns - -## Usage Guide - -### Configuration Usage - -**Basic Configuration Inheritance**: -```scala -class MySystemConfig extends Config( - new BaseConfig ++ - new WithCustomMemDomain(spBanks = 8) ++ - new WithCustomFrontend(robEntries = 32) -) -``` - -**Parameter Access**: -```scala -class MyModule(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val spBanks = b.sp_banks - val accBanks = b.acc_banks - val robEntries = b.rob_entries -} -``` - -### Extension Development - -**Adding New Components**: -1. Create new module in corresponding subdirectory -2. Add configuration parameters in BaseConfigs.scala -3. Implement standard Module interface -4. Add corresponding test cases -5. Update documentation - -**Custom Configuration**: -```scala -case class MyComponentParams( - param1: Int = 16, - param2: Boolean = true -) - -class WithMyComponent(param1: Int, param2: Boolean) extends Config((site, here, up) => { - case MyComponentKey => MyComponentParams(param1, param2) -}) -``` - -### Important Notes - -1. **Configuration Consistency**: Ensure related component configurations are compatible -2. **Resource Constraints**: Pay attention to reasonable hardware resource allocation -3. **Timing Optimization**: Focus on timing paths across components -4. **Interface Standards**: Follow unified interface design specifications -5. **Test Coverage**: Provide sufficient test cases for each component -6. **Memory Access**: Respect bank access constraints (op1 and op2 cannot access same bank) -7. 
**ROB Management**: Coordinate between Global ROB and local ROBs - -## Architecture Highlights - -### Instruction Pipeline -``` -RoCC Interface - ↓ -Global Decoder (classify instruction type) - ↓ -Global RS (with ROB) ← tracks all in-flight instructions - ↓ ↓ -Ball Domain Mem Domain - ↓ ↓ -Ball Devices Loader/Storer - ↓ ↓ - MemController - ↓ ↓ -Scratchpad Accumulator -``` - -### Memory Architecture -``` -Main Memory - ↓ (DMA + TLB) -MemController -├─→ Scratchpad (4 banks × 64KB = 256KB) -└─→ Accumulator (8 banks × 8KB = 64KB) - ↑ -Ball Devices (read/write access) -``` - -## Performance Considerations - -1. **ROB Depth**: 16 entries support up to 16 in-flight instructions -2. **Memory Banks**: 4 scratchpad + 8 accumulator banks enable parallel access -3. **Out-of-Order Execution**: Global RS supports OOO when enabled -4. **DMA Bandwidth**: 128-bit bus provides high memory throughput -5. **Pipeline Depth**: Multi-stage pipeline for high clock frequency - -## Related Documentation - -- [Memory Domain Details](memdomain/README.md) - Memory subsystem implementation -- [Frontend Components](frontend/README.md) - Instruction decode and scheduling -- [DMA Engines](memdomain/dma/README.md) - DMA implementation -- [TLB Management](memdomain/tlb/README.md) - Address translation -- [Framework Overview](../README.md) - Upper-level architecture diff --git a/arch/src/main/scala/framework/builtin/pipeline/Pipeline.scala b/arch/src/main/scala/framework/builtin/pipeline/Pipeline.scala new file mode 100644 index 00000000..783612bb --- /dev/null +++ b/arch/src/main/scala/framework/builtin/pipeline/Pipeline.scala @@ -0,0 +1,86 @@ +package framework.builtin.pipeline + +import chisel3._ +import chisel3.util._ + +class Pipeline[T <: Data](gen: T, latency: Int)(comb: Seq[T => T] = Seq.fill(latency + 1)((x: T) => x)) extends Module { + + val io = IO(new Bundle { + val in = Flipped(Decoupled(gen)) + val out = Decoupled(gen) + val busy = Output(Bool()) + }) + + require(comb.size == latency + 
1, "length of combinational is incorrect") + + if (latency == 0) { + io.in.ready := io.out.ready + io.out.valid := io.in.valid + io.out.bits := comb.head(io.in.bits) + io.busy := io.in.valid + } else { + val stages = Reg(Vec(latency, gen)) + val valids = RegInit(VecInit(Seq.fill(latency)(false.B))) + val stalling = VecInit(Seq.fill(latency)(false.B)) + io.busy := io.in.valid || valids.reduce(_ || _) + + // Stall signals + io.in.ready := !stalling.head + stalling.last := valids.last && !io.out.ready + (stalling.init, stalling.tail, valids.init).zipped.foreach { + case (s1, s2, v1) => + s1 := v1 && s2 + } + + // Valid signals + // When the pipeline stage ahead of you isn't stalling, then make yourself invalid + io.out.valid := valids.last + when(io.out.ready) { + valids.last := false.B + } + (valids.init, stalling.tail).zipped.foreach { + case (v1, s2) => + when(!s2) { + v1 := false.B + } + } + // When the pipeline stage behind you is valid then become true + when(io.in.fire) { + valids.head := true.B + } + (valids.tail, valids.init).zipped.foreach { + case (v2, v1) => + when(v1) { + v2 := true.B + } + } + + // Stages + when(io.in.fire) { + stages.head := comb.head(io.in.bits) + } + io.out.bits := comb.last(stages.last) + ((stages.tail zip stages.init) zip (stalling.tail zip comb.tail.init)).foreach { + case ((st2, st1), (s2, c1)) => + when(!s2) { + st2 := c1(st1) + } + } + } +} + +object Pipeline { + + def apply[T <: Data](in: ReadyValidIO[T], latency: Int, comb: Seq[T => T]): DecoupledIO[T] = { + val p = Module(new Pipeline(in.bits.cloneType, latency)(comb)) + p.io.in <> in + p.io.out + } + + def apply[T <: Data](in: ReadyValidIO[T], latency: Int): DecoupledIO[T] = { + val p = Module(new Pipeline(in.bits.cloneType, latency)()) + p.io.in <> in + p.io.out + } + +} diff --git a/arch/src/main/scala/framework/builtin/util/README.md b/arch/src/main/scala/framework/builtin/util/README.md deleted file mode 100644 index b748af2d..00000000 --- 
a/arch/src/main/scala/framework/builtin/util/README.md +++ /dev/null @@ -1,225 +0,0 @@ -# Framework Utility Library - -## Overview - -This directory contains framework-level utility functions and helper modules for Buckyball, providing general-purpose hardware design tools. Located in `arch/src/main/scala/framework/builtin/util`, it serves as a utility layer, offering reusable hardware building blocks and utility functions for other framework components. - -Main utility categories: -- Mathematical operations and bit manipulation tools -- Interface conversion and adapters -- Debug and performance monitoring tools -- Common hardware pattern implementations - -## Code Structure - -``` -util/ -└── (specific utility files to be analyzed) -``` - -### Utility Categories - -**Math Tools** -- Bit width calculation and logarithm functions -- Numeric conversion and formatting -- Optimized arithmetic operation implementations - -**Interface Tools** -- Protocol conversion adapters -- Signal synchronization and clock domain crossing -- Handshake protocol implementations - -**Debug Tools** -- Performance counter templates -- Debug signal output -- State monitoring interfaces - -## Module Description - -### Mathematical Operations Tools - -**Main functionality**: Provides common mathematical operations and bit manipulation functions - -**Key functions**: - -```scala -object MathUtils { - // Calculate log2 ceiling - def log2Ceil(x: Int): Int = { - require(x > 0) - (log(x) / log(2)).ceil.toInt - } - - // Check if power of 2 - def isPow2(x: Int): Boolean = x > 0 && (x & (x - 1)) == 0 - - // Calculate smallest power of 2 >= x - def nextPow2(x: Int): Int = { - if (isPow2(x)) x else 1 << log2Ceil(x) - } -} -``` - -**Bit manipulation tools**: -```scala -object BitUtils { - // Bit reversal - def reverseBits(data: UInt, width: Int): UInt = { - VecInit((0 until width).map(i => data(i))).asUInt - } - - // Hamming weight (count of 1s) - def popCount(data: UInt): UInt = { - PopCount(data) - } 
- - // Leading zero count - def leadingZeros(data: UInt, width: Int): UInt = { - PriorityEncoder(Reverse(data)) - } -} -``` - -### Interface Conversion Tools - -**Main functionality**: Provides common interface conversion and adaptation - -**Protocol converters**: -```scala -class DecoupledToValid[T <: Data](gen: T) extends Module { - val io = IO(new Bundle { - val in = Flipped(Decoupled(gen)) - val out = Valid(gen) - }) - - io.out.valid := io.in.valid - io.out.bits := io.in.bits - io.in.ready := true.B -} - -class ValidToDecoupled[T <: Data](gen: T) extends Module { - val io = IO(new Bundle { - val in = Flipped(Valid(gen)) - val out = Decoupled(gen) - }) - - io.out.valid := io.in.valid - io.out.bits := io.in.bits -} -``` - -**Clock domain crossing**: -```scala -class AsyncFIFO[T <: Data](gen: T, depth: Int) extends Module { - val io = IO(new Bundle { - val enq_clock = Input(Clock()) - val enq_reset = Input(Bool()) - val enq = Flipped(Decoupled(gen)) - - val deq_clock = Input(Clock()) - val deq_reset = Input(Bool()) - val deq = Decoupled(gen) - }) - - // Async FIFO implementation - // Uses Gray code pointers to avoid metastability -} -``` - -### Debug and Monitoring Tools - -**Main functionality**: Provides general debugging and performance monitoring tools - -**Performance counter**: -```scala -class PerfCounter(name: String) extends Module { - val io = IO(new Bundle { - val inc = Input(Bool()) - val value = Output(UInt(64.W)) - }) - - val counter = RegInit(0.U(64.W)) - when(io.inc) { - counter := counter + 1.U - } - io.value := counter - - // Optional debug output - when(io.inc) { - printf(s"[PerfCounter] $name: %d\n", counter + 1.U) - } -} -``` - -**Debug signal output**: -```scala -object DebugUtils { - def debugPrint(cond: Bool, fmt: String, args: Bits*): Unit = { - when(cond) { - printf(fmt, args: _*) - } - } - - def assert(cond: Bool, msg: String): Unit = { - chisel3.assert(cond, msg) - } - - def cover(cond: Bool, msg: String): Unit = { - chisel3.cover(cond, 
msg) - } -} -``` - -## Usage - -### Usage Examples - -**Using math tools**: -```scala -import util.MathUtils._ - -class MyModule extends Module { - val addrBits = log2Ceil(entries) - val bankSize = nextPow2(requestedSize) - - require(isPow2(bankSize), "Bank size must be power of 2") -} -``` - -**Using interface conversion**: -```scala -val converter = Module(new DecoupledToValid(UInt(32.W))) -converter.io.in <> some_decoupled_signal -val valid_signal = converter.io.out -``` - -**Using performance monitoring**: -```scala -val hit_counter = Module(new PerfCounter("cache_hits")) -hit_counter.io.inc := cache_hit - -val miss_counter = Module(new PerfCounter("cache_misses")) -miss_counter.io.inc := cache_miss -``` - -### Extension Development - -**Adding new tools**: -1. Determine utility's generality and reuse value -2. Implement standard Chisel module interfaces -3. Add sufficient parameterization support -4. Provide usage examples and test cases - -**Tool design principles**: -- Keep interfaces concise and clear -- Support parameterized configuration -- Provide good error checking -- Consider hardware implementation efficiency - -### Notes - -1. **Hardware overhead**: Utility functions should consider hardware implementation costs -2. **Timing impact**: Avoid using complex tools on critical paths -3. **Parameter validation**: Perform sufficient parameter checking at compile time -4. **Complete documentation**: Provide clear usage instructions for each tool -5. 
**Test coverage**: Ensure correctness and boundary case handling of utility functions diff --git a/arch/src/main/scala/framework/builtin/util/Util.scala b/arch/src/main/scala/framework/builtin/util/Util.scala deleted file mode 100644 index 513b5b1a..00000000 --- a/arch/src/main/scala/framework/builtin/util/Util.scala +++ /dev/null @@ -1,157 +0,0 @@ -package framework.builtin.util - -import chisel3._ -import chisel3.util._ - -object Util { - def wrappingAdd(u: UInt, n: UInt, max_plus_one: Int): UInt = { - val max = max_plus_one - 1 - if (max == 0) { - 0.U - } else { - assert(n <= max.U, "cannot wrapAdd when n is larger than max") - Mux(u >= max.U - n + 1.U && n =/= 0.U, n - (max.U - u) - 1.U, u + n) - } - } - - def wrappingAdd(u: UInt, n: UInt, max_plus_one: UInt, en: Bool = true.B): UInt = { - val max = max_plus_one - 1.U - assert(n <= max || max === 0.U, "cannot wrapAdd when n is larger than max, unless max is 0") - - /* - Mux(!en, u, - Mux (max === 0.U, 0.U, - Mux(u >= max - n + 1.U && n =/= 0.U, n - (max - u) - 1.U, u + n))) - */ - - MuxCase(u + n, Seq( - (!en) -> u, - (max === 0.U) -> 0.U, - (u >= max - n + 1.U && n =/= 0.U) -> (n - (max - u) - 1.U) - )) - } - - def satAdd(u: UInt, v: UInt, max: UInt): UInt = { - Mux(u +& v > max, max, u + v) - } - - def floorAdd(u: UInt, n: UInt, max_plus_one: UInt, en: Bool = true.B): UInt = { - val max = max_plus_one - 1.U - - MuxCase(u + n, Seq( - (!en) -> u, - ((u +& n) > max) -> 0.U - )) - } - - def sFloorAdd(s: SInt, n: UInt, max_plus_one: SInt, min: SInt, en: Bool = true.B): SInt = { - val max = max_plus_one - 1.S - - MuxCase(s + n.zext, Seq( - (!en) -> s, - ((s +& n.zext) > max) -> min - )) - } - - def wrappingSub(u: UInt, n: UInt, max_plus_one: Int): UInt = { - val max = max_plus_one - 1 - assert(n <= max.U, "cannot wrapSub when n is larger than max") - Mux(u < n, max.U - (n-u) + 1.U, u - n) - } - - def ceilingDivide(numer: Int, denom: Int): Int = { - if (numer % denom == 0) { numer / denom } - else { numer / denom + 
1} - } - - def closestLowerPowerOf2(u: UInt): UInt = { - // TODO figure out a more efficient way of doing this. Is this many muxes really necessary? - val exp = u.asBools.zipWithIndex.map { case (b, i) => - Mux(b, i.U, 0.U) - }.reduce((acc, u) => Mux(acc > u, acc, u)) - - (1.U << exp).asUInt - } - - def closestAlignedLowerPowerOf2(u: UInt, addr: UInt, stride: UInt, rowBytes: Int): UInt = { - val lgRowBytes = log2Ceil(rowBytes) - - // TODO figure out a more efficient way of doing this. Is this many muxes really necessary? - val exp = u.asBools.zipWithIndex.map { case (b, i) => - Mux(b && addr(i + lgRowBytes - 1, 0) === 0.U && stride(i + lgRowBytes - 1, 0) === 0.U, i.U, 0.U) - }.reduce((acc, u) => Mux(acc > u, acc, u)) - - (1.U << exp).asUInt - } - - // This function will return "next" with a 0-cycle delay when the "enable" signal is high. It's like a queue with - // the "pipe" and "flow" parameters set to "true" - def RegEnableThru[T <: Data](next: T, enable: Bool): T = { - val buf = RegEnable(next, enable) - Mux(enable, next, buf) - } - - def RegEnableThru[T <: Data](next: T, init: T, enable: Bool): T = { - val buf = RegEnable(next, init, enable) - Mux(enable, next, buf) - } - - def maxOf(u1: UInt, u2: UInt): UInt = { - Mux(u1 > u2, u1, u2) - } - - // def maxOf[T <: Data](x: T, y: T)(implicit ev: Arithmetic[T]): T = { - // import ev._ - // Mux(x > y, x, y) - // } - - def minOf(u1: UInt, u2: UInt): UInt = { - Mux(u1 < u2, u1, u2) - } - - // def accumulateTree[T <: Data](xs: Seq[T])(implicit ev: Arithmetic[T]): T = { - def accumulateTree(xs: Seq[UInt]): UInt = { - - assert(xs.nonEmpty, "can't accumulate 0 elements") - - if (xs.length == 1) { - xs.head - } else { - val upperRowLen = 1 << log2Ceil(xs.length) - val upperRow = xs.padTo(upperRowLen, 0.U(xs.head.getWidth.W)) - val pairs = upperRow.grouped(2) - val lowerRow = pairs.map { case Seq(a, b) => a + b } - accumulateTree(lowerRow.toSeq) - } - } - - // An undirectioned Valid bundle - class UDValid[T <: Data](t: T) 
extends Bundle { - val valid = Bool() - val bits = t.cloneType - - def push(b: T): Unit = { - valid := true.B - bits := b - } - - def pop(dummy: Int = 0): T = { - valid := false.B - bits - } - - } - - object UDValid { - def apply[T <: Data](t: T): UDValid[T] = new UDValid(t) - } - - // creates a Reg and the next-state Wire, and returns both - def regwire(bits: Int) = { - val wire = Wire(UInt(bits.W)) - val reg = RegNext(wire) - wire := reg // default wire to read from reg - (reg, wire) - } - -} diff --git a/arch/src/main/scala/framework/builtin/utils/Util.scala b/arch/src/main/scala/framework/builtin/utils/Util.scala new file mode 100644 index 00000000..e0fd4dca --- /dev/null +++ b/arch/src/main/scala/framework/builtin/utils/Util.scala @@ -0,0 +1,182 @@ +package framework.builtin.utils + +import chisel3._ +import chisel3.util._ + +object Util { + + def wrappingAdd(u: UInt, n: UInt, max_plus_one: Int): UInt = { + val max = max_plus_one - 1 + if (max == 0) { + 0.U + } else { + assert(n <= max.U, "cannot wrapAdd when n is larger than max") + Mux(u >= max.U - n + 1.U && n =/= 0.U, n - (max.U - u) - 1.U, u + n) + } + } + + def wrappingAdd( + u: UInt, + n: UInt, + max_plus_one: UInt, + en: Bool = true.B + ): UInt = { + val max = max_plus_one - 1.U + assert(n <= max || max === 0.U, "cannot wrapAdd when n is larger than max, unless max is 0") + + /* Mux(!en, u, Mux (max === 0.U, 0.U, Mux(u >= max - n + 1.U && n =/= 0.U, n - (max - u) - 1.U, u + n))) */ + + MuxCase( + u + n, + Seq( + (!en) -> u, + (max === 0.U) -> 0.U, + (u >= max - n + 1.U && n =/= 0.U) -> (n - (max - u) - 1.U) + ) + ) + } + + def satAdd(u: UInt, v: UInt, max: UInt): UInt = + Mux(u +& v > max, max, u + v) + + def floorAdd( + u: UInt, + n: UInt, + max_plus_one: UInt, + en: Bool = true.B + ): UInt = { + val max = max_plus_one - 1.U + + MuxCase( + u + n, + Seq( + (!en) -> u, + ((u +& n) > max) -> 0.U + ) + ) + } + + def sFloorAdd( + s: SInt, + n: UInt, + max_plus_one: SInt, + min: SInt, + en: Bool = true.B + 
): SInt = { + val max = max_plus_one - 1.S + + MuxCase( + s + n.zext, + Seq( + (!en) -> s, + ((s +& n.zext) > max) -> min + ) + ) + } + + def wrappingSub(u: UInt, n: UInt, max_plus_one: Int): UInt = { + val max = max_plus_one - 1 + assert(n <= max.U, "cannot wrapSub when n is larger than max") + Mux(u < n, max.U - (n - u) + 1.U, u - n) + } + + def ceilingDivide(numer: Int, denom: Int): Int = + if (numer % denom == 0) { numer / denom } + else { numer / denom + 1 } + + def closestLowerPowerOf2(u: UInt): UInt = { + // TODO figure out a more efficient way of doing this. Is this many muxes really necessary? + val exp = u.asBools.zipWithIndex.map { + case (b, i) => + Mux(b, i.U, 0.U) + }.reduce((acc, u) => Mux(acc > u, acc, u)) + + (1.U << exp).asUInt + } + + def closestAlignedLowerPowerOf2( + u: UInt, + addr: UInt, + stride: UInt, + rowBytes: Int + ): UInt = { + val lgRowBytes = log2Ceil(rowBytes) + + // TODO figure out a more efficient way of doing this. Is this many muxes really necessary? + val exp = u.asBools.zipWithIndex.map { + case (b, i) => + Mux(b && addr(i + lgRowBytes - 1, 0) === 0.U && stride(i + lgRowBytes - 1, 0) === 0.U, i.U, 0.U) + }.reduce((acc, u) => Mux(acc > u, acc, u)) + + (1.U << exp).asUInt + } + + // This function will return "next" with a 0-cycle delay when the "enable" signal is high. 
It's like a queue with + // the "pipe" and "flow" parameters set to "true" + def RegEnableThru[T <: Data](next: T, enable: Bool): T = { + val buf = RegEnable(next, enable) + Mux(enable, next, buf) + } + + def RegEnableThru[T <: Data](next: T, init: T, enable: Bool): T = { + val buf = RegEnable(next, init, enable) + Mux(enable, next, buf) + } + + def maxOf(u1: UInt, u2: UInt): UInt = + Mux(u1 > u2, u1, u2) + + // def maxOf[T <: Data](x: T, y: T)(implicit ev: Arithmetic[T]): T = { + // import ev._ + // Mux(x > y, x, y) + // } + + def minOf(u1: UInt, u2: UInt): UInt = + Mux(u1 < u2, u1, u2) + + // def accumulateTree[T <: Data](xs: Seq[T])(implicit ev: Arithmetic[T]): T = { + def accumulateTree(xs: Seq[UInt]): UInt = { + + assert(xs.nonEmpty, "can't accumulate 0 elements") + + if (xs.length == 1) { + xs.head + } else { + val upperRowLen = 1 << log2Ceil(xs.length) + val upperRow = xs.padTo(upperRowLen, 0.U(xs.head.getWidth.W)) + val pairs = upperRow.grouped(2) + val lowerRow = pairs.map { case Seq(a, b) => a + b } + accumulateTree(lowerRow.toSeq) + } + } + + // An undirectioned Valid bundle + class UDValid[T <: Data](t: T) extends Bundle { + val valid = Bool() + val bits = t.cloneType + + def push(b: T): Unit = { + valid := true.B + bits := b + } + + def pop(dummy: Int = 0): T = { + valid := false.B + bits + } + + } + + object UDValid { + def apply[T <: Data](t: T): UDValid[T] = new UDValid(t) + } + + // creates a Reg and the next-state Wire, and returns both + def regwire(bits: Int) = { + val wire = Wire(UInt(bits.W)) + val reg = RegNext(wire) + wire := reg // default wire to read from reg + (reg, wire) + } + +} diff --git a/arch/src/main/scala/framework/core/bbtile/BBTile.scala b/arch/src/main/scala/framework/core/bbtile/BBTile.scala new file mode 100644 index 00000000..0cc82421 --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/BBTile.scala @@ -0,0 +1,472 @@ +package framework.core.bbtile + +import chisel3._ +import chisel3.util._ +import 
chisel3.experimental.hierarchy.{Instance, Instantiate} + +import org.chipsalliance.cde.config._ +import org.chipsalliance.diplomacy.lazymodule._ + +import freechips.rocketchip.rocket._ +import freechips.rocketchip.tile._ +import freechips.rocketchip.devices.tilelink.{BasicBusBlocker, BasicBusBlockerParams} +import freechips.rocketchip.diplomacy.{AddressSet, BufferParams, DisableMonitors} +import freechips.rocketchip.resources.{ + Description, + Resource, + ResourceAddress, + ResourceAnchors, + ResourceBinding, + ResourceBindings, + SimpleDevice +} +import freechips.rocketchip.interrupts.IntIdentityNode +import freechips.rocketchip.tilelink.{ + TLBuffer, + TLClientNode, + TLClientParameters, + TLIdentityNode, + TLMasterPortParameters, + TLWidthWidget, + TLXbar +} +import freechips.rocketchip.subsystem.HierarchicalElementCrossingParamsLike +import freechips.rocketchip.prci.{ClockCrossingType, ClockSinkParameters, RationalCrossing} +import freechips.rocketchip.util.{Annotated, InOrderArbiter} +import freechips.rocketchip.util.BooleanToAugmentedBoolean + +import framework.top.GlobalConfig +import framework.core.bbtile.id.RVVRoCCDecode +import framework.memdomain.backend.MemRequestIO +import framework.memdomain.backend.shared.SharedMemBackend +import framework.memdomain.frontend.outside_channel.MemConfigerIO + +/** + * BBTile — a composable tile containing Rocket core(s) + optional Buckyball accelerator(s). + * + * When nCores=1 (default), behaviour is identical to the original single-core tile. + * When nCores>1, the tile contains N (RocketBB + BuckyballAccelerator) pairs that share + * a single SharedMemBackend and BarrierUnit. + * + * The trait-provided DCache/ICache/PTW serve core-0. Cores 1..N-1 are wired entirely + * inside BBTileModuleImp (no extra diplomacy DCache/ICache — they share core-0's cache + * hierarchy via the same TL xbar for now; independent caches are a future enhancement). 
+ */ +class BBTile private ( + val bbParams: BBTileParams, + crossing: ClockCrossingType, + lookup: LookupByHartIdImpl, + q: Parameters) + extends BaseTile(bbParams, crossing, lookup, q) + with SinksExternalInterrupts + with SourcesExternalNotifications + with HasHellaCache + with HasICacheFrontend { + + def this( + params: BBTileParams, + crossing: HierarchicalElementCrossingParamsLike, + lookup: LookupByHartIdImpl + )( + implicit p: Parameters + ) = + this(params, crossing.crossingType, lookup, BBTile.injectBuildRoCC(p, params.withBuckyball)) + + val nCores = bbParams.nCores + + // RoCC CSRs — Buckyball doesn't use custom CSRs, so this is always empty + val roccCSRs: Seq[Seq[CustomCSR]] = Nil + + // --------------------------------------------------------------------------- + // Diplomacy nodes — tile boundary + // --------------------------------------------------------------------------- + val intOutwardNode = bbParams.beuAddr.map(_ => IntIdentityNode()) + val slaveNode = TLIdentityNode() + val masterNode = visibilityNode + + // Scratchpad (DTIM) + val dtim_adapter = bbParams.dcache.flatMap { d => + d.scratch.map { s => + LazyModule(new ScratchpadSlavePort( + AddressSet.misaligned(s, d.dataScratchpadBytes), + lazyCoreParamsView.coreDataBytes, + bbParams.core.useAtomics && !bbParams.core.useAtomicsOnlyForIO + )) + } + } + + dtim_adapter.foreach(lm => connectTLSlave(lm.node, lm.node.portParams.head.beatBytes)) + + // Bus error unit + val bus_error_unit = bbParams.beuAddr.map { a => + val beu = LazyModule(new BusErrorUnit(new L1BusErrors, BusErrorUnitParams(a), xLen / 8)) + intOutwardNode.get := beu.intNode + connectTLSlave(beu.node, xBytes) + beu + } + + // Master port blocker + val tile_master_blocker = + bbParams.blockerCtrlAddr + .map(BasicBusBlockerParams(_, xBytes, masterPortBeatBytes, deadlock = true)) + .map(bp => LazyModule(new BasicBusBlocker(bp))) + + tile_master_blocker.foreach(lm => connectTLSlave(lm.controlNode, xBytes)) + + // 
--------------------------------------------------------------------------- + // Buckyball accelerator TileLink nodes (diplomacy layer) — N pairs of DMA + // --------------------------------------------------------------------------- + val bbConfig = bbParams.buckyballConfig + + val bb_reader_nodes: Seq[Option[TLClientNode]] = (0 until nCores).map { i => + if (bbParams.withBuckyball) Some(TLClientNode(Seq(TLMasterPortParameters.v1(Seq(TLClientParameters( + name = s"bb-dma-reader-$i", + sourceId = freechips.rocketchip.diplomacy.IdRange(0, bbConfig.memDomain.dma_n_xacts) + )))))) + else None + } + + val bb_writer_nodes: Seq[Option[TLClientNode]] = (0 until nCores).map { i => + if (bbParams.withBuckyball) Some(TLClientNode(Seq(TLMasterPortParameters.v1(Seq(TLClientParameters( + name = s"bb-dma-writer-$i", + sourceId = freechips.rocketchip.diplomacy.IdRange(0, bbConfig.memDomain.dma_n_xacts) + )))))) + else None + } + + // Gather all DMA nodes into one xbar + if (bbParams.withBuckyball) { + val bb_xbar = TLXbar() + for (i <- 0 until nCores) { + bb_xbar := TLBuffer() := bb_reader_nodes(i).get + bb_xbar := TLBuffer() := bb_writer_nodes(i).get + } + tlOtherMastersNode :=* TLWidthWidget(bbConfig.memDomain.dma_buswidth / 8) := TLBuffer() := bb_xbar + } + + // --------------------------------------------------------------------------- + // TileLink topology + // --------------------------------------------------------------------------- + tlOtherMastersNode := tile_master_blocker.map(_.node := tlMasterXbar.node).getOrElse(tlMasterXbar.node) + masterNode :=* tlOtherMastersNode + DisableMonitors(implicit p => tlSlaveXbar.node :*= slaveNode) + + // DCache port count: core + PTW(via usingVM) + DTIM + vector + RoCC tieoff + nDCachePorts += 1 + (dtim_adapter.isDefined).toInt + + bbParams.core.vector.map(_.useDCache.toInt).getOrElse(0) + + bbParams.withBuckyball.toInt + + // --------------------------------------------------------------------------- + // Device tree properties + // 
--------------------------------------------------------------------------- + val dtimProperty = dtim_adapter.map(d => Map("sifive,dtim" -> d.device.asProperty)).getOrElse(Nil) + val itimProperty = frontend.icache.itimProperty.toSeq.flatMap(p => Map("sifive,itim" -> p)) + val beuProperty = bus_error_unit.map(d => Map("sifive,buserror" -> d.device.asProperty)).getOrElse(Nil) + + val cpuDevice: SimpleDevice = new SimpleDevice("cpu", Seq("sifive,rocket0", "riscv")) { + override def parent = Some(ResourceAnchors.cpus) + + override def describe(resources: ResourceBindings): Description = { + val Description(name, mapping) = super.describe(resources) + Description( + name, + mapping ++ cpuProperties ++ nextLevelCacheProperty + ++ tileProperties ++ dtimProperty ++ itimProperty ++ beuProperty + ) + } + + } + + // Vector unit (optional) + val vector_unit = bbParams.core.vector.map(v => LazyModule(v.build(p))) + vector_unit.foreach(vu => tlMasterXbar.node :=* vu.atlNode) + vector_unit.foreach(vu => tlOtherMastersNode :=* vu.tlNode) + + ResourceBinding { + Resource(cpuDevice, "reg").bind(ResourceAddress(bbParams.tileId)) + } + + // Buckyball needs one PTW port per accelerator + if (bbParams.withBuckyball) { + nPTWPorts += nCores + } + + override lazy val module = new BBTileModuleImp(this) + + override def makeMasterBoundaryBuffers(crossing: ClockCrossingType)(implicit p: Parameters) = + (bbParams.boundaryBuffers, crossing) match { + case (Some(RocketTileBoundaryBufferParams(true)), _) => TLBuffer() + case (Some(RocketTileBoundaryBufferParams(false)), _: RationalCrossing) => + TLBuffer(BufferParams.none, BufferParams.flow, BufferParams.none, BufferParams.flow, BufferParams(1)) + case _ => TLBuffer(BufferParams.none) + } + + override def makeSlaveBoundaryBuffers(crossing: ClockCrossingType)(implicit p: Parameters) = + (bbParams.boundaryBuffers, crossing) match { + case (Some(RocketTileBoundaryBufferParams(true)), _) => TLBuffer() + case 
(Some(RocketTileBoundaryBufferParams(false)), _: RationalCrossing) => + TLBuffer(BufferParams.flow, BufferParams.none, BufferParams.none, BufferParams.none, BufferParams.none) + case _ => TLBuffer(BufferParams.none) + } + +} + +// ============================================================================= +// Module implementation (Chisel layer) +// ============================================================================= +class BBTileModuleImp(outer: BBTile) extends BaseTileModuleImp(outer) with HasICacheFrontendModule { + + Annotated.params(this, outer.bbParams) + val nCores = outer.nCores + + // --- FPU (optional) --- + val fpuOpt = outer.bbParams.core.fpu.map(params => Module(new FPU(params)(outer.p))) + + // --- Rocket core (using our fork that accepts BBTile) --- + val core = Module(new RocketBB(outer)(outer.p)) + + // Vector unit connections + outer.vector_unit.foreach { v => + core.io.vector.get <> v.module.io.core + v.module.io.tlb <> outer.dcache.module.io.tlb_port + } + + core.io.reset_vector := DontCare + + // Report conditions + outer.reportHalt(List(outer.dcache.module.io.errors)) + outer.reportCease(outer.bbParams.core.clockGate.option( + !outer.dcache.module.io.cpu.clock_enabled && + !outer.frontend.module.io.cpu.clock_enabled && + !ptw.io.dpath.clock_enabled && + core.io.cease + )) + outer.reportWFI(Some(core.io.wfi)) + + // Interrupts + outer.decodeCoreInterrupts(core.io.interrupts) + outer.bus_error_unit.foreach { beu => + core.io.interrupts.buserror.get := beu.module.io.interrupt + beu.module.io.errors.dcache := outer.dcache.module.io.errors + beu.module.io.errors.icache := outer.frontend.module.io.errors + } + core.io.interrupts.nmi.foreach(nmi => nmi := outer.nmiSinkNode.get.bundle) + + // Trace and misc + outer.traceSourceNode.bundle <> core.io.trace + core.io.traceStall := outer.traceAuxSinkNode.bundle.stall + outer.bpwatchSourceNode.bundle <> core.io.bpwatch + core.io.hartid := outer.hartIdSinkNode.bundle + + // Core pipeline 
connections + outer.frontend.module.io.cpu <> core.io.imem + dcachePorts += core.io.dmem + + // FPU + fpuOpt.foreach { fpu => + core.io.fpu :<>= fpu.io.waiveAs[FPUCoreIO](_.cp_req, _.cp_resp) + fpu.io.cp_req.valid := false.B + fpu.io.cp_req.bits := DontCare + fpu.io.cp_resp.ready := false.B + } + if (fpuOpt.isEmpty) { + core.io.fpu := DontCare + } + + // Vector unit DCache port + outer.vector_unit.foreach { v => + if (outer.bbParams.core.vector.get.useDCache) { + dcachePorts += v.module.io.dmem + } else { + v.module.io.dmem := DontCare + } + } + + core.io.ptw <> ptw.io.dpath + + // DTIM adapter + outer.dtim_adapter.foreach(lm => dcachePorts += lm.module.io.dmem) + + // --------------------------------------------------------------------------- + // Helper: wire a BuckyballAccelerator's PTW to tile's PTW subsystem + // --------------------------------------------------------------------------- + def wireBBPtw(buckyball: BuckyballAccelerator): Unit = { + val bbPtw = Wire(new TLBPTWIO) + ptwPorts += bbPtw + bbPtw.req.valid := buckyball.io.ptw(0).req.valid + bbPtw.req.bits.valid := buckyball.io.ptw(0).req.bits.valid + bbPtw.req.bits.bits.addr := buckyball.io.ptw(0).req.bits.bits.addr + bbPtw.req.bits.bits.need_gpa := buckyball.io.ptw(0).req.bits.bits.need_gpa + bbPtw.req.bits.bits.vstage1 := buckyball.io.ptw(0).req.bits.bits.vstage1 + bbPtw.req.bits.bits.stage2 := buckyball.io.ptw(0).req.bits.bits.stage2 + buckyball.io.ptw(0).req.ready := bbPtw.req.ready + + buckyball.io.ptw(0).resp.valid := bbPtw.resp.valid + buckyball.io.ptw(0).resp.bits.ae_ptw := bbPtw.resp.bits.ae_ptw + buckyball.io.ptw(0).resp.bits.ae_final := bbPtw.resp.bits.ae_final + buckyball.io.ptw(0).resp.bits.pf := bbPtw.resp.bits.pf + buckyball.io.ptw(0).resp.bits.gf := bbPtw.resp.bits.gf + buckyball.io.ptw(0).resp.bits.hr := bbPtw.resp.bits.hr + buckyball.io.ptw(0).resp.bits.hw := bbPtw.resp.bits.hw + buckyball.io.ptw(0).resp.bits.hx := bbPtw.resp.bits.hx + buckyball.io.ptw(0).resp.bits.pte.ppn := 
bbPtw.resp.bits.pte.ppn + buckyball.io.ptw(0).resp.bits.pte.reserved_for_future := bbPtw.resp.bits.pte.reserved_for_future + buckyball.io.ptw(0).resp.bits.pte.reserved_for_software := bbPtw.resp.bits.pte.reserved_for_software + buckyball.io.ptw(0).resp.bits.pte.d := bbPtw.resp.bits.pte.d + buckyball.io.ptw(0).resp.bits.pte.a := bbPtw.resp.bits.pte.a + buckyball.io.ptw(0).resp.bits.pte.g := bbPtw.resp.bits.pte.g + buckyball.io.ptw(0).resp.bits.pte.u := bbPtw.resp.bits.pte.u + buckyball.io.ptw(0).resp.bits.pte.x := bbPtw.resp.bits.pte.x + buckyball.io.ptw(0).resp.bits.pte.w := bbPtw.resp.bits.pte.w + buckyball.io.ptw(0).resp.bits.pte.r := bbPtw.resp.bits.pte.r + buckyball.io.ptw(0).resp.bits.pte.v := bbPtw.resp.bits.pte.v + buckyball.io.ptw(0).resp.bits.level := bbPtw.resp.bits.level + buckyball.io.ptw(0).resp.bits.fragmented_superpage := bbPtw.resp.bits.fragmented_superpage + buckyball.io.ptw(0).resp.bits.homogeneous := bbPtw.resp.bits.homogeneous + buckyball.io.ptw(0).resp.bits.gpa.valid := bbPtw.resp.bits.gpa.valid + buckyball.io.ptw(0).resp.bits.gpa.bits := bbPtw.resp.bits.gpa.bits + buckyball.io.ptw(0).resp.bits.gpa_is_pte := bbPtw.resp.bits.gpa_is_pte + + buckyball.io.ptw(0).ptbr.mode := bbPtw.ptbr.mode + buckyball.io.ptw(0).ptbr.asid := bbPtw.ptbr.asid + buckyball.io.ptw(0).ptbr.ppn := bbPtw.ptbr.ppn + buckyball.io.ptw(0).hgatp.mode := bbPtw.hgatp.mode + buckyball.io.ptw(0).hgatp.asid := bbPtw.hgatp.asid + buckyball.io.ptw(0).hgatp.ppn := bbPtw.hgatp.ppn + buckyball.io.ptw(0).vsatp.mode := bbPtw.vsatp.mode + buckyball.io.ptw(0).vsatp.asid := bbPtw.vsatp.asid + buckyball.io.ptw(0).vsatp.ppn := bbPtw.vsatp.ppn + buckyball.io.ptw(0).status := bbPtw.status + buckyball.io.ptw(0).hstatus := bbPtw.hstatus + buckyball.io.ptw(0).gstatus := bbPtw.gstatus + buckyball.io.ptw(0).pmp.zipWithIndex.foreach { case (pmpPort, i) => + pmpPort.cfg.l := bbPtw.pmp(i).cfg.l + pmpPort.cfg.res := bbPtw.pmp(i).cfg.res + pmpPort.cfg.a := bbPtw.pmp(i).cfg.a + pmpPort.cfg.x := 
bbPtw.pmp(i).cfg.x + pmpPort.cfg.w := bbPtw.pmp(i).cfg.w + pmpPort.cfg.r := bbPtw.pmp(i).cfg.r + pmpPort.addr := bbPtw.pmp(i).addr + pmpPort.mask := bbPtw.pmp(i).mask + } + buckyball.io.ptw(0).customCSRs := DontCare + bbPtw.customCSRs := DontCare + } + + // --------------------------------------------------------------------------- + // Buckyball accelerators — N instances sharing SharedMemBackend + BarrierUnit + // --------------------------------------------------------------------------- + if (outer.bbParams.withBuckyball) { + val bankChannel = outer.bbConfig.memDomain.bankChannel + + // Instantiate N accelerators + val accelerators = (0 until nCores).map { i => + val (tl_reader, edge) = outer.bb_reader_nodes(i).get.out(0) + val (tl_writer, _) = outer.bb_writer_nodes(i).get.out(0) + val acc = Module(new BuckyballAccelerator(outer.bbConfig)(edge)) + acc.io.hartid := outer.hartIdSinkNode.bundle + i.U + + // DMA TileLink + tl_reader <> acc.io.tl_reader + tl_writer <> acc.io.tl_writer + + // PTW + wireBBPtw(acc) + + // TLB exception + acc.io.tlbExp(0).flush_skip := false.B + acc.io.tlbExp(0).flush_retry := false.B + + // CPU sfence → Buckyball TLB flush + acc.io.sfence := ptw.io.dpath.sfence.valid + + acc + } + + // Core-0 RoCC wiring (the single Rocket core drives accelerator 0) + accelerators(0).io.cmd <> core.io.rocc.cmd + core.io.rocc.resp <> accelerators(0).io.resp + core.io.rocc.busy := accelerators(0).io.busy + core.io.rocc.interrupt := accelerators(0).io.interrupt + + // RoCC mem: tied-off HellaCacheIF for the DCache arbiter port count + val roccMemIF = Module(new SimpleHellaCacheIF()) + roccMemIF.io.requestor.req.valid := false.B + roccMemIF.io.requestor.req.bits := DontCare + roccMemIF.io.requestor.s1_kill := false.B + roccMemIF.io.requestor.s1_data := DontCare + roccMemIF.io.requestor.s2_kill := false.B + roccMemIF.io.requestor.keep_clock_enabled := false.B + dcachePorts += roccMemIF.io.cache + core.io.rocc.mem := DontCare + + // SharedMemBackend 
(tile-level singleton) + val sharedBackend = Module(new SharedMemBackend(outer.bbConfig)) + + // Connect each accelerator's shared ports to the SharedMemBackend + for (i <- 0 until nCores) { + for (ch <- 0 until bankChannel) { + val slot = i * bankChannel + ch + sharedBackend.io.mem_req(slot) <> accelerators(i).io.shared_mem_req(ch) + } + // Shared query: connect per-core query to shared backend + // (only one query port on SharedMemBackend — for now use accelerator 0's query; + // each accelerator's MemBackend routes shared queries through its IO) + } + + // Shared config arbiter: N accelerators → 1 SharedMemBackend config port + val cfgArb = Module(new Arbiter(new MemConfigerIO(outer.bbConfig), nCores)) + for (i <- 0 until nCores) { + cfgArb.io.in(i) <> accelerators(i).io.shared_config + } + sharedBackend.io.config <> cfgArb.io.out + + // Shared query — simplified: each accelerator queries independently, + // but SharedMemBackend has one query port. Use accelerator 0's for now. + sharedBackend.io.query_vbank_id := accelerators(0).io.shared_query_vbank_id + accelerators(0).io.shared_query_group_count := sharedBackend.io.query_group_count + for (i <- 1 until nCores) { + accelerators(i).io.shared_query_group_count := sharedBackend.io.query_group_count + } + + // BarrierUnit (tile-level singleton) + val barrierUnit = Module(new BarrierUnit(nCores)) + for (i <- 0 until nCores) { + barrierUnit.io.arrive(i) := accelerators(i).io.barrier_arrive + accelerators(i).io.barrier_release := barrierUnit.io.release(i) + } + + } else { + // No accelerator — tie off RoCC + core.io.rocc.cmd.ready := false.B + core.io.rocc.resp.valid := false.B + core.io.rocc.resp.bits := DontCare + core.io.rocc.busy := DontCare + core.io.rocc.interrupt := DontCare + core.io.rocc.mem := DontCare + } + + // --- Finalize DCache arbiter and PTW connections (after all ports added) --- + val h = dcachePorts.size + val c = core.dcacheArbPorts + val o = outer.nDCachePorts + require(h == c, s"port list size 
was $h, core expected $c") + require(h == o, s"port list size was $h, outer counted $o") + + dcacheArb.io.requestor <> dcachePorts.toSeq + ptw.io.requestor <> ptwPorts.toSeq +} + +object BBTile { + + /** + * Inject a dummy BuildRoCC entry so that usingRoCC=true throughout all + * HasRocketCoreParameters mixins (CSR, decode, etc.), without actually + * using the LazyRoCC mechanism. + */ + def injectBuildRoCC(p: Parameters, withBuckyball: Boolean): Parameters = + if (withBuckyball) + p.alterPartial { case BuildRoCC => Seq((_: Parameters) => null.asInstanceOf[LazyRoCC]) } + else p + +} diff --git a/arch/src/main/scala/framework/core/bbtile/BBTileAttach.scala b/arch/src/main/scala/framework/core/bbtile/BBTileAttach.scala new file mode 100644 index 00000000..cd3ead9a --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/BBTileAttach.scala @@ -0,0 +1,10 @@ +package framework.core.bbtile + +import freechips.rocketchip.subsystem.{CanAttachTile, RocketCrossingParams} +import freechips.rocketchip.tile.RocketTile + +/** Attach parameters for BBTile — used in chipyard Config system via TilesLocated. 
*/ +case class BBTileAttachParams( + tileParams: BBTileParams, + crossingParams: RocketCrossingParams) + extends CanAttachTile { type TileType = BBTile } diff --git a/arch/src/main/scala/framework/core/bbtile/BBTileParams.scala b/arch/src/main/scala/framework/core/bbtile/BBTileParams.scala new file mode 100644 index 00000000..6b4c628f --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/BBTileParams.scala @@ -0,0 +1,46 @@ +package framework.core.bbtile + +import freechips.rocketchip.rocket.{BTBParams, DCacheParams, ICacheParams, RocketCoreParams} +import freechips.rocketchip.tile.{InstantiableTileParams, RocketTileBoundaryBufferParams} +import freechips.rocketchip.subsystem.HierarchicalElementCrossingParamsLike +import freechips.rocketchip.prci.ClockSinkParameters +import org.chipsalliance.cde.config.Parameters +import freechips.rocketchip.tile.LookupByHartIdImpl +import framework.top.GlobalConfig + +/** + * Parameters for a BBTile. + * + * A BBTile contains N Rocket cores, each optionally paired with a Buckyball accelerator. + * Internal modules use @instantiable + config style; diplomacy is only used for TileLink ports. 
+ */ +case class BBTileParams( + nCores: Int = 1, + withBuckyball: Boolean = true, + core: RocketCoreParams = RocketCoreParams(), + icache: Option[ICacheParams] = Some(ICacheParams()), + dcache: Option[DCacheParams] = Some(DCacheParams()), + btb: Option[BTBParams] = Some(BTBParams()), + buckyballConfig: GlobalConfig = GlobalConfig(), + tileId: Int = 0, + beuAddr: Option[BigInt] = None, + blockerCtrlAddr: Option[BigInt] = None, + clockSinkParams: ClockSinkParameters = ClockSinkParameters(), + boundaryBuffers: Option[RocketTileBoundaryBufferParams] = None) + extends InstantiableTileParams[BBTile] { + require(icache.isDefined) + require(dcache.isDefined) + require(nCores >= 1) + + val baseName = "bbtile" + val uniqueName = s"${baseName}_$tileId" + + def instantiate( + crossing: HierarchicalElementCrossingParamsLike, + lookup: LookupByHartIdImpl + )( + implicit p: Parameters + ): BBTile = + new BBTile(this, crossing, lookup) + +} diff --git a/arch/src/main/scala/framework/core/bbtile/BarrierUnit.scala b/arch/src/main/scala/framework/core/bbtile/BarrierUnit.scala new file mode 100644 index 00000000..63f7ea19 --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/BarrierUnit.scala @@ -0,0 +1,35 @@ +package framework.core.bbtile + +import chisel3._ +import chisel3.experimental.hierarchy.{instantiable, public} + +/** + * BarrierUnit — tile-level hardware barrier for multi-core synchronization. + * + * Each core's BuckyballAccelerator raises arrive(i) after its ROB drains. + * When all N cores have arrived, release(i) is asserted for one cycle, + * allowing all cores to proceed simultaneously. 
+ * + * @param nCores number of cores in this tile + */ +@instantiable +class BarrierUnit(val nCores: Int) extends Module { + + @public + val io = IO(new Bundle { + val arrive = Input(Vec(nCores, Bool())) + val release = Output(Vec(nCores, Bool())) + }) + + val arrived = RegInit(VecInit(Seq.fill(nCores)(false.B))) + val allArrived = arrived.asUInt.andR + + for (i <- 0 until nCores) { + when(io.arrive(i))(arrived(i) := true.B) + io.release(i) := allArrived + } + + when(allArrived) { + for (i <- 0 until nCores) { arrived(i) := false.B } + } +} diff --git a/arch/src/main/scala/framework/core/bbtile/BuckyballAccelerator.scala b/arch/src/main/scala/framework/core/bbtile/BuckyballAccelerator.scala new file mode 100644 index 00000000..81c1901c --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/BuckyballAccelerator.scala @@ -0,0 +1,165 @@ +package framework.core.bbtile + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} + +import freechips.rocketchip.tilelink.{TLBundle, TLEdgeOut} +import framework.top.GlobalConfig +import framework.frontend.Frontend +import framework.gpdomain.GpDomain +import framework.memdomain.MemDomain +import framework.memdomain.backend.MemRequestIO +import framework.memdomain.frontend.outside_channel.{MemConfigerIO} +import framework.memdomain.frontend.outside_channel.tlb.{BBTLBExceptionIO, BBTLBPTWIO} +import examples.toy.balldomain.BallDomain + +/** + * Standalone Buckyball accelerator module. + * + * Decoupled from the LazyRoCCBB inheritance chain. + * Uses @instantiable + GlobalConfig pattern. + * TileLink bundles are passed in from the tile's diplomacy shell. 
+ * + * @param b GlobalConfig for the accelerator + * @param edge TLEdgeOut from the DMA TileLink nodes (for TLBundle sizing) + */ +@instantiable +class BuckyballAccelerator(val b: GlobalConfig)(edge: TLEdgeOut) extends Module { + val totalBallRead = b.ballDomain.ballIdMappings.map(_.inBW).sum + val totalBallWrite = b.ballDomain.ballIdMappings.map(_.outBW).sum + + @public + val io = IO(new Bundle { + // RoCC command/response (connected to Rocket core inside tile) + val cmd = Flipped(Decoupled(new RoCCCommandBB(b.core.xLen))) + val resp = Decoupled(new RoCCResponseBB(b.core.xLen)) + val busy = Output(Bool()) + val interrupt = Output(Bool()) + val hartid = Input(UInt(b.core.xLen.W)) + + // PTW interface (shared with Rocket core's PTW) + val ptw = Vec(1, new BBTLBPTWIO(b)) + // TLB exception interface + val tlbExp = Vec(1, new BBTLBExceptionIO) + // CPU sfence signal — flushes Buckyball's TLB + val sfence = Input(Bool()) + + // TileLink DMA bundles (from tile's diplomacy nodes) + val tl_reader = new TLBundle(edge.bundle) + val tl_writer = new TLBundle(edge.bundle) + + // Shared memory path — exposed to tile level for multi-core SharedMemBackend + val shared_mem_req = Vec(b.memDomain.bankChannel, new MemRequestIO(b)) + val shared_config = Decoupled(new MemConfigerIO(b)) + val shared_query_vbank_id = Output(UInt(8.W)) + val shared_query_group_count = Input(UInt(4.W)) + + // Barrier interface — connected to tile-level BarrierUnit + val barrier_arrive = Output(Bool()) + val barrier_release = Input(Bool()) + }) + + // --- Instantiate domains --- + val frontend: Instance[Frontend] = Instantiate(new Frontend(b)) + val ballDomain: Instance[BallDomain] = Instantiate(new BallDomain(b)) + val memDomain: Instance[MemDomain] = Instantiate(new MemDomain(b)(edge)) + val gpDomain: Instance[GpDomain] = Instantiate(new GpDomain(b)) + + // --- Frontend <- cmd --- + frontend.io.cmd.valid := io.cmd.valid + frontend.io.cmd.bits.cmd := io.cmd.bits + io.cmd.ready := frontend.io.cmd.ready + + 
// --- Frontend -> BallDomain --- + ballDomain.global_issue_i <> frontend.io.ball_issue_o + frontend.io.ball_complete_i <> ballDomain.global_complete_o + + // --- BallDomain -> Frontend (SubROB requests) --- + for (i <- 0 until b.ballDomain.ballNum) { + frontend.io.ball_subrob_req_i(i) <> ballDomain.subRobReq(i) + } + + // --- Frontend -> MemDomain --- + memDomain.io.global_issue_i <> frontend.io.mem_issue_o + frontend.io.mem_complete_i <> memDomain.io.global_complete_o + memDomain.io.hartid := io.hartid + + // --- Frontend -> GpDomain --- + gpDomain.io.global_issue_i <> frontend.io.gp_issue_o + frontend.io.gp_complete_i <> gpDomain.io.global_complete_o + + // --- BallDomain <-> MemDomain (bankRead with pipeline register to break comb loops) --- + for (i <- 0 until totalBallRead) { + val bankReadReqWithIds = Wire(Decoupled(new Bundle { + val bank_id = chiselTypeOf(ballDomain.bankRead(i).bank_id) + val rob_id = chiselTypeOf(ballDomain.bankRead(i).rob_id) + val ball_id = chiselTypeOf(ballDomain.bankRead(i).ball_id) + val group_id = chiselTypeOf(ballDomain.bankRead(i).group_id) + val req = chiselTypeOf(ballDomain.bankRead(i).io.req.bits) + })) + + bankReadReqWithIds.valid := ballDomain.bankRead(i).io.req.valid + bankReadReqWithIds.bits.bank_id := ballDomain.bankRead(i).bank_id + bankReadReqWithIds.bits.rob_id := ballDomain.bankRead(i).rob_id + bankReadReqWithIds.bits.ball_id := ballDomain.bankRead(i).ball_id + bankReadReqWithIds.bits.group_id := ballDomain.bankRead(i).group_id + bankReadReqWithIds.bits.req := ballDomain.bankRead(i).io.req.bits + ballDomain.bankRead(i).io.req.ready := bankReadReqWithIds.ready + + val bankReadReqQ = Queue(bankReadReqWithIds, 8) + + memDomain.io.ballDomain.bankRead(i).io.req.valid := bankReadReqQ.valid + memDomain.io.ballDomain.bankRead(i).io.req.bits := bankReadReqQ.bits.req + memDomain.io.ballDomain.bankRead(i).bank_id := bankReadReqQ.bits.bank_id + memDomain.io.ballDomain.bankRead(i).rob_id := bankReadReqQ.bits.rob_id + 
memDomain.io.ballDomain.bankRead(i).ball_id := bankReadReqQ.bits.ball_id + memDomain.io.ballDomain.bankRead(i).group_id := bankReadReqQ.bits.group_id + bankReadReqQ.ready := memDomain.io.ballDomain.bankRead(i).io.req.ready + + ballDomain.bankRead(i).io.resp <> memDomain.io.ballDomain.bankRead(i).io.resp + } + + ballDomain.bankWrite <> memDomain.io.ballDomain.bankWrite + + // --- PTW --- + io.ptw(0).req <> memDomain.io.ptw(0).req + memDomain.io.ptw(0).resp <> io.ptw(0).resp + memDomain.io.ptw(0).ptbr <> io.ptw(0).ptbr + memDomain.io.ptw(0).hgatp <> io.ptw(0).hgatp + memDomain.io.ptw(0).vsatp <> io.ptw(0).vsatp + memDomain.io.ptw(0).status <> io.ptw(0).status + memDomain.io.ptw(0).hstatus <> io.ptw(0).hstatus + memDomain.io.ptw(0).gstatus <> io.ptw(0).gstatus + memDomain.io.ptw(0).pmp <> io.ptw(0).pmp + memDomain.io.ptw(0).customCSRs := DontCare + + // --- TLB exception --- + memDomain.io.tlbExp(0).flush_skip := false.B + memDomain.io.tlbExp(0).flush_retry := io.sfence + io.tlbExp(0) <> memDomain.io.tlbExp(0) + + // --- TileLink DMA --- + io.tl_reader <> memDomain.io.tl_reader + io.tl_writer <> memDomain.io.tl_writer + + // --- Shared memory passthrough --- + io.shared_mem_req <> memDomain.io.shared_mem_req + io.shared_config <> memDomain.io.shared_config + io.shared_query_vbank_id := memDomain.io.shared_query_vbank_id + memDomain.io.shared_query_group_count := io.shared_query_group_count + + // --- Barrier passthrough --- + io.barrier_arrive := frontend.io.barrier_arrive + frontend.io.barrier_release := io.barrier_release + + // --- Response & status --- + io.resp <> frontend.io.resp + io.busy := frontend.io.busy + io.interrupt := memDomain.io.tlbExp(0).interrupt + + // --- Busy watchdog --- + val busy_counter = RegInit(0.U(32.W)) + busy_counter := Mux(frontend.io.busy, busy_counter + 1.U, 0.U) + assert(busy_counter < 100000.U, "BuckyballAccelerator: busy for too long!") +} diff --git a/arch/src/main/scala/framework/core/bbtile/Configs.scala 
b/arch/src/main/scala/framework/core/bbtile/Configs.scala new file mode 100644 index 00000000..43f7b911 --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/Configs.scala @@ -0,0 +1,74 @@ +package framework.core.bbtile + +import org.chipsalliance.cde.config._ +import freechips.rocketchip.rocket.{BTBParams, DCacheParams, ICacheParams, MulDivParams, RocketCoreParams} +import freechips.rocketchip.subsystem._ +import freechips.rocketchip.tile.{FPUParams, RocketTileBoundaryBufferParams} +import framework.top.GlobalConfig + +/** + * Config fragment to add N BBTiles. + * + * Each BBTile contains one Rocket core + optional Buckyball accelerator. + */ +object WithNBBTiles { + + private def defaultCrossing(location: HierarchicalLocation): RocketCrossingParams = + RocketCrossingParams( + master = HierarchicalElementMasterPortParams.locationDefault(location), + slave = HierarchicalElementSlavePortParams.locationDefault(location), + mmioBaseAddressPrefixWhere = location match { + case InSubsystem => CBUS + case InCluster(clusterId) => CCBUS(clusterId) + } + ) + +} + +class WithNBBTiles( + n: Int, + location: HierarchicalLocation = InSubsystem, + withBuckyball: Boolean = true, + buckyballConfig: GlobalConfig = GlobalConfig(), + crossing: Option[RocketCrossingParams] = None) + extends Config((site, here, up) => { + case TilesLocated(`location`) => + val prev = up(TilesLocated(`location`), site) + val idOffset = up(NumTiles) + val actualCrossing = crossing.getOrElse(WithNBBTiles.defaultCrossing(location)) + val tileParams = BBTileParams( + withBuckyball = withBuckyball, + buckyballConfig = buckyballConfig, + core = RocketCoreParams( + mulDiv = Some(MulDivParams( + mulUnroll = 8, + mulEarlyOut = true, + divEarlyOut = true + )), + useZba = true, + useZbb = true, + useZbs = true, + fpu = Some(FPUParams(minFLen = 16)) + ), + dcache = Some(DCacheParams( + nSets = 64, + nWays = 8, + rowBits = site(SystemBusKey).beatBits, + nMSHRs = 0, + blockBytes = site(CacheBlockBytes) + )), 
+ icache = Some(ICacheParams( + nSets = 64, + nWays = 8, + rowBits = site(SystemBusKey).beatBits, + blockBytes = site(CacheBlockBytes) + )) + ) + List.tabulate(n)(i => + BBTileAttachParams( + tileParams.copy(tileId = i + idOffset), + actualCrossing + ) + ) ++ prev + case NumTiles => up(NumTiles) + n + }) diff --git a/arch/src/main/scala/framework/core/bbtile/RoCCTypes.scala b/arch/src/main/scala/framework/core/bbtile/RoCCTypes.scala new file mode 100644 index 00000000..b77c8077 --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/RoCCTypes.scala @@ -0,0 +1,50 @@ +package framework.core.bbtile + +import chisel3._ +import chisel3.util._ +import org.chipsalliance.cde.config.Parameters +import freechips.rocketchip.tile.{CoreBundle, CustomCSRIO} +import freechips.rocketchip.rocket.HellaCacheIO + +/** RoCC command bundle */ +class RoCCCommandBB(xLen: Int = 64) extends Bundle { + val raw_inst = UInt(32.W) + val pc = UInt(xLen.W) + val funct = UInt(7.W) + val funct3 = UInt(3.W) + val rs2 = Bits(5.W) + val rs1 = Bits(5.W) + val xd = Bool() + val xs1 = Bool() + val xs2 = Bool() + val rd = Bits(5.W) + val opcode = UInt(7.W) + val rs1Data = UInt(xLen.W) + val rs2Data = UInt(xLen.W) +} + +/** RoCC response bundle */ +class RoCCResponseBB(xLen: Int = 64) extends Bundle { + val rd = Bits(5.W) + val data = Bits(xLen.W) +} + +/** RoCC interface between a core and an accelerator. */ +class RoCCIO(xLen: Int = 64) extends Bundle { + val cmd = Flipped(Decoupled(new RoCCCommandBB(xLen))) + val resp = Decoupled(new RoCCResponseBB(xLen)) + val busy = Output(Bool()) + val interrupt = Output(Bool()) + val exception = Input(Bool()) +} + +/** RoCC core IO — used inside Rocket core. 
*/ +class RoCCCoreIOBB(val nRoCCCSRs: Int = 0)(implicit p: Parameters) extends CoreBundle()(p) { + val cmd = Flipped(Decoupled(new RoCCCommandBB)) + val resp = Decoupled(new RoCCResponseBB) + val mem = new HellaCacheIO + val busy = Output(Bool()) + val interrupt = Output(Bool()) + val exception = Input(Bool()) + val csrs = Flipped(Vec(nRoCCCSRs, new CustomCSRIO)) +} diff --git a/arch/src/main/scala/framework/core/bbtile/RocketCoreBB.scala b/arch/src/main/scala/framework/core/bbtile/RocketCoreBB.scala new file mode 100644 index 00000000..bb742f56 --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/RocketCoreBB.scala @@ -0,0 +1,1472 @@ +// See LICENSE.Berkeley for license details. +// See LICENSE.SiFive for license details. + +package framework.core.bbtile + +import chisel3._ +import chisel3.util._ +import chisel3.withClock +import org.chipsalliance.cde.config.Parameters +import freechips.rocketchip.tile._ +import freechips.rocketchip.util._ +import freechips.rocketchip.util.property +import scala.collection.mutable.ArrayBuffer +import freechips.rocketchip.rocket._ + +import framework.core.bbtile.id.RVVRoCCDecode + +trait HasRocketCoreIOBB extends HasRocketCoreParameters { + implicit val p: Parameters + def nTotalRoCCCSRs: Int + + val io = IO(new CoreBundle()(p) { + val hartid = Input(UInt(hartIdLen.W)) + val reset_vector = Input(UInt(resetVectorLen.W)) + val interrupts = Input(new CoreInterrupts(tileParams.asInstanceOf[BBTileParams].beuAddr.isDefined)) + val imem = new FrontendIO + val dmem = new HellaCacheIO + val ptw = Flipped(new DatapathPTWIO()) + val fpu = Flipped(new FPUCoreIO()) + val rocc = Flipped(new RoCCCoreIOBB(nTotalRoCCCSRs)) + val trace = Output(new TraceBundle) + val bpwatch = Output(Vec(coreParams.nBreakpoints, new BPWatch(coreParams.retireWidth))) + val cease = Output(Bool()) + val wfi = Output(Bool()) + val traceStall = Input(Bool()) + val vector = if (usingVector) Some(Flipped(new VectorCoreIO)) else None + }) + +} + +class 
RocketBB(tile: BBTile)(implicit p: Parameters) + extends CoreModule()(p) + with HasRocketCoreParameters + with HasRocketCoreIOBB { + def nTotalRoCCCSRs = tile.roccCSRs.flatten.size + + // Override usingRoCC: BBTile doesn't use BuildRoCC/LazyRoCC, but when + // withBuckyball is enabled the core must still treat custom instructions as + // RoCC commands. + override val usingRoCC = tile.bbParams.withBuckyball + + import ALU._ + + val clock_en_reg = RegInit(true.B) + val long_latency_stall = Reg(Bool()) + val id_reg_pause = Reg(Bool()) + val imem_might_request_reg = Reg(Bool()) + val clock_en = WireDefault(true.B) + + val gated_clock = + if (!rocketParams.clockGate) clock + else ClockGate(clock, clock_en, "rocket_clock_gate") + + class RocketImpl { // entering gated-clock domain + + // performance counters + def pipelineIDToWB[T <: Data](x: T): T = + RegEnable(RegEnable(RegEnable(x, !ctrl_killd), ex_pc_valid), mem_pc_valid) + + val perfEvents = new EventSets(Seq( + new EventSet( + (mask, hits) => Mux(wb_xcpt, mask(0), wb_valid && pipelineIDToWB((mask & hits).orR)), + Seq( + ("exception", () => false.B), + ("load", () => id_ctrl.mem && id_ctrl.mem_cmd === M_XRD && !id_ctrl.fp), + ("store", () => id_ctrl.mem && id_ctrl.mem_cmd === M_XWR && !id_ctrl.fp), + ("amo", () => usingAtomics.B && id_ctrl.mem && (isAMO(id_ctrl.mem_cmd) || id_ctrl.mem_cmd.isOneOf(M_XLR, M_XSC))), + ("system", () => id_ctrl.csr =/= CSR.N), + ( + "arith", + () => + id_ctrl.wxd && !(id_ctrl.jal || id_ctrl.jalr || id_ctrl.mem || id_ctrl.fp || id_ctrl.mul || id_ctrl.div || id_ctrl.csr =/= CSR.N) + ), + ("branch", () => id_ctrl.branch), + ("jal", () => id_ctrl.jal), + ("jalr", () => id_ctrl.jalr) + ) + ++ (if (!usingMulDiv) Seq() + else + Seq( + ("mul", () => if (pipelinedMul) id_ctrl.mul else id_ctrl.div && (id_ctrl.alu_fn & FN_DIV) =/= FN_DIV), + ("div", () => if (pipelinedMul) id_ctrl.div else id_ctrl.div && (id_ctrl.alu_fn & FN_DIV) === FN_DIV) + )) + ++ (if (!usingFPU) Seq() + else + Seq( + ("fp 
load", () => id_ctrl.fp && io.fpu.dec.ldst && io.fpu.dec.wen), + ("fp store", () => id_ctrl.fp && io.fpu.dec.ldst && !io.fpu.dec.wen), + ("fp add", () => id_ctrl.fp && io.fpu.dec.fma && io.fpu.dec.swap23), + ("fp mul", () => id_ctrl.fp && io.fpu.dec.fma && !io.fpu.dec.swap23 && !io.fpu.dec.ren3), + ("fp mul-add", () => id_ctrl.fp && io.fpu.dec.fma && io.fpu.dec.ren3), + ("fp div/sqrt", () => id_ctrl.fp && (io.fpu.dec.div || io.fpu.dec.sqrt)), + ( + "fp other", + () => id_ctrl.fp && !(io.fpu.dec.ldst || io.fpu.dec.fma || io.fpu.dec.div || io.fpu.dec.sqrt) + ) + )) + ), + new EventSet( + (mask, hits) => (mask & hits).orR, + Seq( + ( + "load-use interlock", + () => id_ex_hazard && ex_ctrl.mem || id_mem_hazard && mem_ctrl.mem || id_wb_hazard && wb_ctrl.mem + ), + ("long-latency interlock", () => id_sboard_hazard), + ( + "csr interlock", + () => + id_ex_hazard && ex_ctrl.csr =/= CSR.N || id_mem_hazard && mem_ctrl.csr =/= CSR.N || id_wb_hazard && wb_ctrl.csr =/= CSR.N + ), + ("I$ blocked", () => icache_blocked), + ("D$ blocked", () => id_ctrl.mem && dcache_blocked), + ("branch misprediction", () => take_pc_mem && mem_direction_misprediction), + ( + "control-flow target misprediction", + () => take_pc_mem && mem_misprediction && mem_cfi && !mem_direction_misprediction && !icache_blocked + ), + ("flush", () => wb_reg_flush_pipe), + ("replay", () => replay_wb) + ) + ++ (if (!usingMulDiv) Seq() + else + Seq( + ( + "mul/div interlock", + () => + id_ex_hazard && (ex_ctrl.mul || ex_ctrl.div) || id_mem_hazard && (mem_ctrl.mul || mem_ctrl.div) || id_wb_hazard && wb_ctrl.div + ) + )) + ++ (if (!usingFPU) Seq() + else + Seq( + ( + "fp interlock", + () => + id_ex_hazard && ex_ctrl.fp || id_mem_hazard && mem_ctrl.fp || id_wb_hazard && wb_ctrl.fp || id_ctrl.fp && id_stall_fpu + ) + )) + ), + new EventSet( + (mask, hits) => (mask & hits).orR, + Seq( + ("I$ miss", () => io.imem.perf.acquire), + ("D$ miss", () => io.dmem.perf.acquire), + ("D$ release", () => io.dmem.perf.release), + 
("ITLB miss", () => io.imem.perf.tlbMiss), + ("DTLB miss", () => io.dmem.perf.tlbMiss), + ("L2 TLB miss", () => io.ptw.perf.l2miss) + ) + ) + )) + + val pipelinedMul = usingMulDiv && mulDivParams.mulUnroll == xLen + + val usingRVVRoCC = tile.bbParams.withBuckyball + + // Ensure usingVector and usingRVVRoCC are mutually exclusive + require( + !usingVector || !usingRVVRoCC, + "usingVector and usingRVVRoCC cannot both be enabled. " + + "Use usingVector for built-in vector unit, or usingRVVRoCC to route vector instructions to RoCC." + ) + + val decode_table = { + (if (usingMulDiv) new MDecode(pipelinedMul) +: (xLen > 32).option(new M64Decode(pipelinedMul)).toSeq else Nil) ++: + (if (usingAtomics) new ADecode +: (xLen > 32).option(new A64Decode).toSeq else Nil) ++: + (usingRVVRoCC.option(new RVVRoCCDecode)) ++: + (if (fLen >= 32) new FDecode +: (xLen > 32).option(new F64Decode).toSeq else Nil) ++: + (if (fLen >= 64) new DDecode +: (xLen > 32).option(new D64Decode).toSeq else Nil) ++: + (if (minFLen == 16) + new HDecode +: (xLen > 32).option(new H64Decode).toSeq ++: (fLen >= 64).option(new HDDecode).toSeq + else Nil) ++: + (usingRoCC.option(new RoCCDecode)) ++: + (if (xLen == 32) new I32Decode else new I64Decode) +: + (usingVM.option(new SVMDecode)) ++: + (usingSupervisor.option(new SDecode)) ++: + (usingHypervisor.option(new HypervisorDecode)) ++: + ((usingHypervisor && (xLen == 64)).option(new Hypervisor64Decode)) ++: + (usingDebug.option(new DebugDecode)) ++: + (usingNMI.option(new NMIDecode)) ++: + (usingConditionalZero.option(new ConditionalZeroDecode)) ++: + Seq(new FenceIDecode(tile.dcache.flushOnFenceI)) ++: + coreParams.haveCFlush.option(new CFlushDecode(tile.dcache.canSupportCFlushLine)) ++: + rocketParams.haveCease.option(new CeaseDecode) ++: + usingVector.option(new VCFGDecode) ++: + (if (coreParams.useZba) new ZbaDecode +: (xLen > 32).option(new Zba64Decode).toSeq else Nil) ++: + (if (coreParams.useZbb) Seq(new ZbbDecode, if (xLen == 32) new Zbb32Decode else 
new Zbb64Decode) else Nil) ++: + coreParams.useZbs.option(new ZbsDecode) ++: + Seq(new IDecode) + } flatMap (_.table) + + val ex_ctrl = Reg(new IntCtrlSigs) + val mem_ctrl = Reg(new IntCtrlSigs) + val wb_ctrl = Reg(new IntCtrlSigs) + + val ex_reg_xcpt_interrupt = Reg(Bool()) + val ex_reg_valid = Reg(Bool()) + val ex_reg_rvc = Reg(Bool()) + val ex_reg_btb_resp = Reg(new BTBResp) + val ex_reg_xcpt = Reg(Bool()) + val ex_reg_flush_pipe = Reg(Bool()) + val ex_reg_load_use = Reg(Bool()) + val ex_reg_cause = Reg(UInt()) + val ex_reg_replay = Reg(Bool()) + val ex_reg_pc = Reg(UInt()) + val ex_reg_mem_size = Reg(UInt()) + val ex_reg_hls = Reg(Bool()) + val ex_reg_inst = Reg(Bits()) + val ex_reg_raw_inst = Reg(UInt()) + val ex_reg_wphit = Reg(Vec(nBreakpoints, Bool())) + val ex_reg_set_vconfig = Reg(Bool()) + + val mem_reg_xcpt_interrupt = Reg(Bool()) + val mem_reg_valid = Reg(Bool()) + val mem_reg_rvc = Reg(Bool()) + val mem_reg_btb_resp = Reg(new BTBResp) + val mem_reg_xcpt = Reg(Bool()) + val mem_reg_replay = Reg(Bool()) + val mem_reg_flush_pipe = Reg(Bool()) + val mem_reg_cause = Reg(UInt()) + val mem_reg_slow_bypass = Reg(Bool()) + val mem_reg_load = Reg(Bool()) + val mem_reg_store = Reg(Bool()) + val mem_reg_set_vconfig = Reg(Bool()) + val mem_reg_sfence = Reg(Bool()) + val mem_reg_pc = Reg(UInt()) + val mem_reg_inst = Reg(Bits()) + val mem_reg_mem_size = Reg(UInt()) + val mem_reg_hls_or_dv = Reg(Bool()) + val mem_reg_raw_inst = Reg(UInt()) + val mem_reg_wdata = Reg(Bits()) + val mem_reg_rs2 = Reg(Bits()) + val mem_br_taken = Reg(Bool()) + val take_pc_mem = Wire(Bool()) + val mem_reg_wphit = Reg(Vec(nBreakpoints, Bool())) + + val wb_reg_valid = Reg(Bool()) + val wb_reg_xcpt = Reg(Bool()) + val wb_reg_replay = Reg(Bool()) + val wb_reg_flush_pipe = Reg(Bool()) + val wb_reg_cause = Reg(UInt()) + val wb_reg_set_vconfig = Reg(Bool()) + val wb_reg_sfence = Reg(Bool()) + val wb_reg_pc = Reg(UInt()) + val wb_reg_mem_size = Reg(UInt()) + val wb_reg_hls_or_dv = Reg(Bool()) + 
val wb_reg_hfence_v = Reg(Bool()) + val wb_reg_hfence_g = Reg(Bool()) + val wb_reg_inst = Reg(Bits()) + val wb_reg_raw_inst = Reg(UInt()) + val wb_reg_wdata = Reg(Bits()) + val wb_reg_rs2 = Reg(Bits()) + val take_pc_wb = Wire(Bool()) + val wb_reg_wphit = Reg(Vec(nBreakpoints, Bool())) + + val take_pc_mem_wb = take_pc_wb || take_pc_mem + val take_pc = take_pc_mem_wb + + // decode stage + val ibuf = Module(new IBuf) + val id_expanded_inst = ibuf.io.inst.map(_.bits.inst) + val id_raw_inst = ibuf.io.inst.map(_.bits.raw) + val id_inst = id_expanded_inst.map(_.bits) + ibuf.io.imem <> io.imem.resp + ibuf.io.kill := take_pc + + require(decodeWidth == 1 /* TODO */ && retireWidth == decodeWidth) + require(!(coreParams.useRVE && coreParams.fpu.nonEmpty), "Can't select both RVE and floating-point") + require(!(coreParams.useRVE && coreParams.useHypervisor), "Can't select both RVE and Hypervisor") + val id_ctrl = Wire(new IntCtrlSigs).decode(id_inst(0), decode_table) + + val lgNXRegs = if (coreParams.useRVE) 4 else 5 + val regAddrMask = (1 << lgNXRegs) - 1 + + def decodeReg(x: UInt) = (x.extract(x.getWidth - 1, lgNXRegs).asBool, x(lgNXRegs - 1, 0)) + val (id_raddr3_illegal, id_raddr3) = decodeReg(id_expanded_inst(0).rs3) + val (id_raddr2_illegal, id_raddr2) = decodeReg(id_expanded_inst(0).rs2) + val (id_raddr1_illegal, id_raddr1) = decodeReg(id_expanded_inst(0).rs1) + val (id_waddr_illegal, id_waddr) = decodeReg(id_expanded_inst(0).rd) + + val id_load_use = Wire(Bool()) + val id_reg_fence = RegInit(false.B) + val id_ren = IndexedSeq(id_ctrl.rxs1, id_ctrl.rxs2) + val id_raddr = IndexedSeq(id_raddr1, id_raddr2) + val rf = new RegFile(regAddrMask, xLen) + val id_rs = id_raddr.map(rf.read _) + val ctrl_killd = Wire(Bool()) + val id_npc = (ibuf.io.pc.asSInt + ImmGen(IMM_UJ, id_inst(0))).asUInt + + val csr = Module(new CSRFile( + perfEvents, + coreParams.customCSRs.decls, + tile.roccCSRs.flatten, + tile.bbParams.beuAddr.isDefined + )) + + val id_csr_en = id_ctrl.csr.isOneOf(CSR.S, 
CSR.C, CSR.W) + val id_system_insn = id_ctrl.csr === CSR.I + val id_csr_ren = id_ctrl.csr.isOneOf(CSR.S, CSR.C) && id_expanded_inst(0).rs1 === 0.U + val id_csr = Mux(id_system_insn && id_ctrl.mem, CSR.N, Mux(id_csr_ren, CSR.R, id_ctrl.csr)) + val id_csr_flush = id_system_insn || (id_csr_en && !id_csr_ren && csr.io.decode(0).write_flush) + val id_set_vconfig = + Seq(Instructions.VSETVLI, Instructions.VSETIVLI, Instructions.VSETVL).map(_ === id_inst(0)).orR && usingVector.B + + id_ctrl.vec := false.B + if (usingVector) { + val v_decode = rocketParams.vector.get.decoder(p) + v_decode.io.inst := id_inst(0) + v_decode.io.vconfig := csr.io.vector.get.vconfig + when(v_decode.io.legal) { + id_ctrl.legal := !csr.io.vector.get.vconfig.vtype.vill + id_ctrl.fp := v_decode.io.fp + id_ctrl.rocc := false.B + id_ctrl.branch := false.B + id_ctrl.jal := false.B + id_ctrl.jalr := false.B + id_ctrl.rxs2 := v_decode.io.read_rs2 + id_ctrl.rxs1 := v_decode.io.read_rs1 + id_ctrl.mem := false.B + id_ctrl.rfs1 := v_decode.io.read_frs1 + id_ctrl.rfs2 := false.B + id_ctrl.rfs3 := false.B + id_ctrl.wfd := v_decode.io.write_frd + id_ctrl.mul := false.B + id_ctrl.div := false.B + id_ctrl.wxd := v_decode.io.write_rd + id_ctrl.csr := CSR.N + id_ctrl.fence_i := false.B + id_ctrl.fence := false.B + id_ctrl.amo := false.B + id_ctrl.dp := false.B + id_ctrl.vec := true.B + } + } + + val id_illegal_insn = !id_ctrl.legal || + (id_ctrl.mul || id_ctrl.div) && !csr.io.status.isa('m' - 'a') || + id_ctrl.amo && !csr.io.status.isa('a' - 'a') || + id_ctrl.fp && (csr.io.decode(0).fp_illegal || (io.fpu.illegal_rm && !id_ctrl.vec)) || + (id_ctrl.vec) && (csr.io.decode(0).vector_illegal || csr.io.vector.map(_.vconfig.vtype.vill).getOrElse( + false.B + )) || + id_ctrl.dp && !csr.io.status.isa('d' - 'a') || + ibuf.io.inst(0).bits.rvc && !csr.io.status.isa('c' - 'a') || + id_raddr2_illegal && id_ctrl.rxs2 || + id_raddr1_illegal && id_ctrl.rxs1 || + id_waddr_illegal && id_ctrl.wxd || + id_ctrl.rocc && 
csr.io.decode(0).rocc_illegal || + id_csr_en && (csr.io.decode(0).read_illegal || !id_csr_ren && csr.io.decode(0).write_illegal) || + !ibuf.io.inst(0).bits.rvc && (id_system_insn && csr.io.decode(0).system_illegal) + + val id_virtual_insn = id_ctrl.legal && + ((id_csr_en && !(!id_csr_ren && csr.io.decode(0).write_illegal) && csr.io.decode(0).virtual_access_illegal) || + (!ibuf.io.inst(0).bits.rvc && id_system_insn && csr.io.decode(0).virtual_system_illegal)) + + // stall decode for fences (now, for AMO.rl; later, for AMO.aq and FENCE) + val id_amo_aq = id_inst(0)(26) + val id_amo_rl = id_inst(0)(25) + val id_fence_pred = id_inst(0)(27, 24) + val id_fence_succ = id_inst(0)(23, 20) + val id_fence_next = id_ctrl.fence || id_ctrl.amo && id_amo_aq + val id_mem_busy = !io.dmem.ordered || io.dmem.req.valid + when(!id_mem_busy)(id_reg_fence := false.B) + + val id_rocc_busy = usingRoCC.B && + (io.rocc.busy || ex_reg_valid && ex_ctrl.rocc || + mem_reg_valid && mem_ctrl.rocc || wb_reg_valid && wb_ctrl.rocc) + + val id_csr_rocc_write = tile.roccCSRs.flatten.map(_.id.U === id_inst(0)(31, 20)).orR && id_csr_en && !id_csr_ren + val id_vec_busy = io.vector.map(v => v.backend_busy || v.trap_check_busy).getOrElse(false.B) + + val id_do_fence = WireDefault(id_rocc_busy && (id_ctrl.fence || id_csr_rocc_write) || + id_vec_busy && id_ctrl.fence || + id_mem_busy && (id_ctrl.amo && id_amo_rl || id_ctrl.fence_i || id_reg_fence && (id_ctrl.mem || id_ctrl.rocc))) + + val bpu = Module(new BreakpointUnit(nBreakpoints)) + bpu.io.status := csr.io.status + bpu.io.bp := csr.io.bp + bpu.io.pc := ibuf.io.pc + bpu.io.ea := mem_reg_wdata + bpu.io.mcontext := csr.io.mcontext + bpu.io.scontext := csr.io.scontext + + val id_xcpt0 = ibuf.io.inst(0).bits.xcpt0 + val id_xcpt1 = ibuf.io.inst(0).bits.xcpt1 + + val (id_xcpt, id_cause) = checkExceptions(List( + (csr.io.interrupt, csr.io.interrupt_cause), + (bpu.io.debug_if, CSR.debugTriggerCause.U), + (bpu.io.xcpt_if, Causes.breakpoint.U), + (id_xcpt0.pf.inst, 
Causes.fetch_page_fault.U), + (id_xcpt0.gf.inst, Causes.fetch_guest_page_fault.U), + (id_xcpt0.ae.inst, Causes.fetch_access.U), + (id_xcpt1.pf.inst, Causes.fetch_page_fault.U), + (id_xcpt1.gf.inst, Causes.fetch_guest_page_fault.U), + (id_xcpt1.ae.inst, Causes.fetch_access.U), + (id_virtual_insn, Causes.virtual_instruction.U), + (id_illegal_insn, Causes.illegal_instruction.U) + )) + + val idCoverCauses = List( + (CSR.debugTriggerCause, "DEBUG_TRIGGER"), + (Causes.breakpoint, "BREAKPOINT"), + (Causes.fetch_access, "FETCH_ACCESS"), + (Causes.illegal_instruction, "ILLEGAL_INSTRUCTION") + ) ++ (if (usingVM) + List( + (Causes.fetch_page_fault, "FETCH_PAGE_FAULT") + ) + else Nil) + + coverExceptions(id_xcpt, id_cause, "DECODE", idCoverCauses) + + val dcache_bypass_data = + if (fastLoadByte) io.dmem.resp.bits.data(xLen - 1, 0) + else if (fastLoadWord) io.dmem.resp.bits.data_word_bypass(xLen - 1, 0) + else wb_reg_wdata + + // detect bypass opportunities + val ex_waddr = ex_reg_inst(11, 7) & regAddrMask.U + val mem_waddr = mem_reg_inst(11, 7) & regAddrMask.U + val wb_waddr = wb_reg_inst(11, 7) & regAddrMask.U + + val bypass_sources = IndexedSeq( + (true.B, 0.U, 0.U), // treat reading x0 as a bypass + (ex_reg_valid && ex_ctrl.wxd, ex_waddr, mem_reg_wdata), + (mem_reg_valid && mem_ctrl.wxd && !mem_ctrl.mem, mem_waddr, wb_reg_wdata), + (mem_reg_valid && mem_ctrl.wxd, mem_waddr, dcache_bypass_data) + ) + + val id_bypass_src = id_raddr.map(raddr => bypass_sources.map(s => s._1 && s._2 === raddr)) + + // execute stage + val bypass_mux = bypass_sources.map(_._3) + val ex_reg_rs_bypass = Reg(Vec(id_raddr.size, Bool())) + val ex_reg_rs_lsb = Reg(Vec(id_raddr.size, UInt(log2Ceil(bypass_sources.size).W))) + val ex_reg_rs_msb = Reg(Vec(id_raddr.size, UInt())) + + val ex_rs = + for (i <- 0 until id_raddr.size) + yield Mux(ex_reg_rs_bypass(i), bypass_mux(ex_reg_rs_lsb(i)), Cat(ex_reg_rs_msb(i), ex_reg_rs_lsb(i))) + + val ex_imm = ImmGen(ex_ctrl.sel_imm, ex_reg_inst) + val ex_rs1shl = 
Mux(ex_reg_inst(3), ex_rs(0)(31, 0), ex_rs(0)) << ex_reg_inst(14, 13) + + val ex_op1 = MuxLookup(ex_ctrl.sel_alu1, 0.S)(Seq( + A1_RS1 -> ex_rs(0).asSInt, + A1_PC -> ex_reg_pc.asSInt, + A1_RS1SHL -> (if (rocketParams.useZba) ex_rs1shl.asSInt else 0.S) + )) + + val ex_op2_oh = + UIntToOH(Mux(ex_ctrl.sel_alu2(0), (ex_reg_inst >> 20).asUInt, ex_rs(1))(log2Ceil(xLen) - 1, 0)).asSInt + + val ex_op2 = MuxLookup(ex_ctrl.sel_alu2, 0.S)(Seq( + A2_RS2 -> ex_rs(1).asSInt, + A2_IMM -> ex_imm, + A2_SIZE -> Mux(ex_reg_rvc, 2.S, 4.S) + ) ++ (if (coreParams.useZbs) + Seq( + A2_RS2OH -> ex_op2_oh, + A2_IMMOH -> ex_op2_oh + ) + else Nil)) + + val (ex_new_vl, ex_new_vconfig) = + if (usingVector) { + val ex_new_vtype = VType.fromUInt(MuxCase( + ex_rs(1), + Seq( + ex_reg_inst(31, 30).andR -> ex_reg_inst(29, 20), + !ex_reg_inst(31) -> ex_reg_inst(30, 20) + ) + )) + val ex_avl = Mux( + ex_ctrl.rxs1, + Mux( + ex_reg_inst(19, 15) === 0.U, + Mux(ex_reg_inst(11, 7) === 0.U, csr.io.vector.get.vconfig.vl, ex_new_vtype.vlMax), + ex_rs(0) + ), + ex_reg_inst(19, 15) + ) + val ex_new_vl = ex_new_vtype.vl(ex_avl, csr.io.vector.get.vconfig.vl, false.B, false.B, false.B) + val ex_new_vconfig = Wire(new VConfig) + ex_new_vconfig.vtype := ex_new_vtype + ex_new_vconfig.vl := ex_new_vl + (Some(ex_new_vl), Some(ex_new_vconfig)) + } else { (None, None) } + + val alu = Module(new ALU) + alu.io.dw := ex_ctrl.alu_dw + alu.io.fn := ex_ctrl.alu_fn + alu.io.in2 := ex_op2.asUInt + alu.io.in1 := ex_op1.asUInt + + // multiplier and divider + val div = Module(new MulDiv(if (pipelinedMul) mulDivParams.copy(mulUnroll = 0) else mulDivParams, width = xLen)) + div.io.req.valid := ex_reg_valid && ex_ctrl.div + div.io.req.bits.dw := ex_ctrl.alu_dw + div.io.req.bits.fn := ex_ctrl.alu_fn + div.io.req.bits.in1 := ex_rs(0) + div.io.req.bits.in2 := ex_rs(1) + div.io.req.bits.tag := ex_waddr + + val mul = pipelinedMul.option { + val m = Module(new PipelinedMultiplier(xLen, 2)) + m.io.req.valid := ex_reg_valid && ex_ctrl.mul + 
m.io.req.bits := div.io.req.bits + m + } + + ex_reg_valid := !ctrl_killd + ex_reg_replay := !take_pc && ibuf.io.inst(0).valid && ibuf.io.inst(0).bits.replay + ex_reg_xcpt := !ctrl_killd && id_xcpt + ex_reg_xcpt_interrupt := !take_pc && ibuf.io.inst(0).valid && csr.io.interrupt + + when(!ctrl_killd) { + ex_ctrl := id_ctrl + ex_reg_rvc := ibuf.io.inst(0).bits.rvc + ex_ctrl.csr := id_csr + when(id_ctrl.fence && id_fence_succ === 0.U)(id_reg_pause := true.B) + when(id_fence_next)(id_reg_fence := true.B) + when(id_xcpt) { // pass PC down ALU writeback pipeline for badaddr + ex_ctrl.alu_fn := FN_ADD + ex_ctrl.alu_dw := DW_XPR + ex_ctrl.sel_alu1 := A1_RS1 // badaddr := instruction + ex_ctrl.sel_alu2 := A2_ZERO + when(id_xcpt1.asUInt.orR) { // badaddr := PC+2 + ex_ctrl.sel_alu1 := A1_PC + ex_ctrl.sel_alu2 := A2_SIZE + ex_reg_rvc := true.B + } + when(bpu.io.xcpt_if || id_xcpt0.asUInt.orR) { // badaddr := PC + ex_ctrl.sel_alu1 := A1_PC + ex_ctrl.sel_alu2 := A2_ZERO + } + } + ex_reg_flush_pipe := id_ctrl.fence_i || id_csr_flush + ex_reg_load_use := id_load_use + ex_reg_hls := usingHypervisor.B && id_system_insn && id_ctrl.mem_cmd.isOneOf(M_XRD, M_XWR, M_HLVX) + ex_reg_mem_size := Mux(usingHypervisor.B && id_system_insn, id_inst(0)(27, 26), id_inst(0)(13, 12)) + when(id_ctrl.mem_cmd.isOneOf(M_SFENCE, M_HFENCEV, M_HFENCEG, M_FLUSH_ALL)) { + ex_reg_mem_size := Cat(id_raddr2 =/= 0.U, id_raddr1 =/= 0.U) + } + when(id_ctrl.mem_cmd === M_SFENCE && csr.io.status.v) { + ex_ctrl.mem_cmd := M_HFENCEV + } + if (tile.dcache.flushOnFenceI) { + when(id_ctrl.fence_i) { + ex_reg_mem_size := 0.U + } + } + + for (i <- 0 until id_raddr.size) { + val do_bypass = id_bypass_src(i).reduce(_ || _) + val bypass_src = PriorityEncoder(id_bypass_src(i)) + ex_reg_rs_bypass(i) := do_bypass + ex_reg_rs_lsb(i) := bypass_src + when(id_ren(i) && !do_bypass) { + ex_reg_rs_lsb(i) := id_rs(i)(log2Ceil(bypass_sources.size) - 1, 0) + ex_reg_rs_msb(i) := id_rs(i) >> log2Ceil(bypass_sources.size) + } + } + 
when(id_illegal_insn || id_virtual_insn) { + val inst = Mux(ibuf.io.inst(0).bits.rvc, id_raw_inst(0)(15, 0), id_raw_inst(0)) + ex_reg_rs_bypass(0) := false.B + ex_reg_rs_lsb(0) := inst(log2Ceil(bypass_sources.size) - 1, 0) + ex_reg_rs_msb(0) := inst >> log2Ceil(bypass_sources.size) + } + } + when(!ctrl_killd || csr.io.interrupt || ibuf.io.inst(0).bits.replay) { + ex_reg_cause := id_cause + ex_reg_inst := id_inst(0) + ex_reg_raw_inst := id_raw_inst(0) + ex_reg_pc := ibuf.io.pc + ex_reg_btb_resp := ibuf.io.btb_resp + ex_reg_wphit := bpu.io.bpwatch.map(bpw => bpw.ivalid(0)) + ex_reg_set_vconfig := id_set_vconfig && !id_xcpt + } + + // replay inst in ex stage? + val ex_pc_valid = ex_reg_valid || ex_reg_replay || ex_reg_xcpt_interrupt + val wb_dcache_miss = wb_ctrl.mem && !io.dmem.resp.valid + + val replay_ex_structural = ex_ctrl.mem && !io.dmem.req.ready || + ex_ctrl.div && !div.io.req.ready || + ex_ctrl.vec && !io.vector.map(_.ex.ready).getOrElse(true.B) + + val replay_ex_load_use = wb_dcache_miss && ex_reg_load_use + val replay_ex = ex_reg_replay || (ex_reg_valid && (replay_ex_structural || replay_ex_load_use)) + val ctrl_killx = take_pc_mem_wb || replay_ex || !ex_reg_valid + // detect 2-cycle load-use delay for LB/LH/SC + val ex_slow_bypass = ex_ctrl.mem_cmd === M_XSC || ex_reg_mem_size < 2.U + val ex_sfence = + usingVM.B && ex_ctrl.mem && (ex_ctrl.mem_cmd === M_SFENCE || ex_ctrl.mem_cmd === M_HFENCEV || ex_ctrl.mem_cmd === M_HFENCEG) + + val (ex_xcpt, ex_cause) = checkExceptions(List( + (ex_reg_xcpt_interrupt || ex_reg_xcpt, ex_reg_cause) + )) + + val exCoverCauses = idCoverCauses + coverExceptions(ex_xcpt, ex_cause, "EXECUTE", exCoverCauses) + + // memory stage + val mem_pc_valid = mem_reg_valid || mem_reg_replay || mem_reg_xcpt_interrupt + + val mem_br_target = mem_reg_pc.asSInt + + Mux( + mem_ctrl.branch && mem_br_taken, + ImmGen(IMM_SB, mem_reg_inst), + Mux(mem_ctrl.jal, ImmGen(IMM_UJ, mem_reg_inst), Mux(mem_reg_rvc, 2.S, 4.S)) + ) + + val mem_npc = (Mux( + 
mem_ctrl.jalr || mem_reg_sfence, + encodeVirtualAddress(mem_reg_wdata, mem_reg_wdata).asSInt, + mem_br_target + ) & -2.S).asUInt + + val mem_wrong_npc = + Mux( + ex_pc_valid, + mem_npc =/= ex_reg_pc, + Mux(ibuf.io.inst(0).valid || ibuf.io.imem.valid, mem_npc =/= ibuf.io.pc, true.B) + ) + + val mem_npc_misaligned = !csr.io.status.isa('c' - 'a') && mem_npc(1) && !mem_reg_sfence + val mem_int_wdata = + Mux(!mem_reg_xcpt && (mem_ctrl.jalr ^ mem_npc_misaligned), mem_br_target, mem_reg_wdata.asSInt).asUInt + val mem_cfi = mem_ctrl.branch || mem_ctrl.jalr || mem_ctrl.jal + val mem_cfi_taken = (mem_ctrl.branch && mem_br_taken) || mem_ctrl.jalr || mem_ctrl.jal + val mem_direction_misprediction = mem_ctrl.branch && mem_br_taken =/= (usingBTB.B && mem_reg_btb_resp.taken) + val mem_misprediction = if (usingBTB) mem_wrong_npc else mem_cfi_taken + take_pc_mem := mem_reg_valid && !mem_reg_xcpt && (mem_misprediction || mem_reg_sfence) + + mem_reg_valid := !ctrl_killx + mem_reg_replay := !take_pc_mem_wb && replay_ex + mem_reg_xcpt := !ctrl_killx && ex_xcpt + mem_reg_xcpt_interrupt := !take_pc_mem_wb && ex_reg_xcpt_interrupt + + // on pipeline flushes, cause mem_npc to hold the sequential npc, which + // will drive the W-stage npc mux + when(mem_reg_valid && mem_reg_flush_pipe) { + mem_reg_sfence := false.B + }.elsewhen(ex_pc_valid) { + mem_ctrl := ex_ctrl + mem_reg_rvc := ex_reg_rvc + mem_reg_load := ex_ctrl.mem && isRead(ex_ctrl.mem_cmd) + mem_reg_store := ex_ctrl.mem && isWrite(ex_ctrl.mem_cmd) + mem_reg_sfence := ex_sfence + mem_reg_btb_resp := ex_reg_btb_resp + mem_reg_flush_pipe := ex_reg_flush_pipe + mem_reg_slow_bypass := ex_slow_bypass + mem_reg_wphit := ex_reg_wphit + mem_reg_set_vconfig := ex_reg_set_vconfig + + mem_reg_cause := ex_cause + mem_reg_inst := ex_reg_inst + mem_reg_raw_inst := ex_reg_raw_inst + mem_reg_mem_size := ex_reg_mem_size + mem_reg_hls_or_dv := io.dmem.req.bits.dv + mem_reg_pc := ex_reg_pc + // IDecode ensured they are 1H + mem_reg_wdata := 
Mux(ex_reg_set_vconfig, ex_new_vl.getOrElse(alu.io.out), alu.io.out) + mem_br_taken := alu.io.cmp_out + + when(ex_ctrl.rxs2 && (ex_ctrl.mem || ex_ctrl.rocc || ex_sfence)) { + val size = Mux(ex_ctrl.rocc, log2Ceil(xLen / 8).U, ex_reg_mem_size) + mem_reg_rs2 := new StoreGen(size, 0.U, ex_rs(1), coreDataBytes).data + } + if (usingVector) { + when(ex_reg_set_vconfig) { + mem_reg_rs2 := ex_new_vconfig.get.asUInt + } + } + when(ex_ctrl.jalr && csr.io.status.debug) { + // flush I$ on D-mode JALR to effect uncached fetch without D$ flush + mem_ctrl.fence_i := true.B + mem_reg_flush_pipe := true.B + } + } + + val mem_breakpoint = (mem_reg_load && bpu.io.xcpt_ld) || (mem_reg_store && bpu.io.xcpt_st) + val mem_debug_breakpoint = (mem_reg_load && bpu.io.debug_ld) || (mem_reg_store && bpu.io.debug_st) + + val (mem_ldst_xcpt, mem_ldst_cause) = checkExceptions(List( + (mem_debug_breakpoint, CSR.debugTriggerCause.U), + (mem_breakpoint, Causes.breakpoint.U) + )) + + val (mem_xcpt, mem_cause) = checkExceptions(List( + (mem_reg_xcpt_interrupt || mem_reg_xcpt, mem_reg_cause), + (mem_reg_valid && mem_npc_misaligned, Causes.misaligned_fetch.U), + (mem_reg_valid && mem_ldst_xcpt, mem_ldst_cause) + )) + + val memCoverCauses = (exCoverCauses ++ List( + (CSR.debugTriggerCause, "DEBUG_TRIGGER"), + (Causes.breakpoint, "BREAKPOINT"), + (Causes.misaligned_fetch, "MISALIGNED_FETCH") + )).distinct + + coverExceptions(mem_xcpt, mem_cause, "MEMORY", memCoverCauses) + + val dcache_kill_mem = mem_reg_valid && mem_ctrl.wxd && io.dmem.replay_next // structural hazard on writeback port + val fpu_kill_mem = mem_reg_valid && mem_ctrl.fp && io.fpu.nack_mem + val vec_kill_mem = mem_reg_valid && mem_ctrl.mem && io.vector.map(_.mem.block_mem).getOrElse(false.B) + val vec_kill_all = mem_reg_valid && io.vector.map(_.mem.block_all).getOrElse(false.B) + val replay_mem = dcache_kill_mem || mem_reg_replay || fpu_kill_mem || vec_kill_mem || vec_kill_all + val killm_common = dcache_kill_mem || take_pc_wb || 
mem_reg_xcpt || !mem_reg_valid + div.io.kill := killm_common && RegNext(div.io.req.fire) + val ctrl_killm = killm_common || mem_xcpt || fpu_kill_mem || vec_kill_mem + + // writeback stage + wb_reg_valid := !ctrl_killm + wb_reg_replay := replay_mem && !take_pc_wb + wb_reg_xcpt := mem_xcpt && !take_pc_wb && !io.vector.map(_.mem.block_all).getOrElse(false.B) + wb_reg_flush_pipe := !ctrl_killm && mem_reg_flush_pipe + when(mem_pc_valid) { + wb_ctrl := mem_ctrl + wb_reg_sfence := mem_reg_sfence + wb_reg_wdata := Mux(!mem_reg_xcpt && mem_ctrl.fp && mem_ctrl.wxd, io.fpu.toint_data, mem_int_wdata) + when(mem_ctrl.rocc || mem_reg_sfence || mem_reg_set_vconfig) { + wb_reg_rs2 := mem_reg_rs2 + } + wb_reg_cause := mem_cause + wb_reg_inst := mem_reg_inst + wb_reg_raw_inst := mem_reg_raw_inst + wb_reg_mem_size := mem_reg_mem_size + wb_reg_hls_or_dv := mem_reg_hls_or_dv + wb_reg_hfence_v := mem_ctrl.mem_cmd === M_HFENCEV + wb_reg_hfence_g := mem_ctrl.mem_cmd === M_HFENCEG + wb_reg_pc := mem_reg_pc + wb_reg_wphit := mem_reg_wphit | bpu.io.bpwatch.map { bpw => + (bpw.rvalid(0) && mem_reg_load) || (bpw.wvalid(0) && mem_reg_store) + } + wb_reg_set_vconfig := mem_reg_set_vconfig + } + + val (wb_xcpt, wb_cause) = checkExceptions(List( + (wb_reg_xcpt, wb_reg_cause), + (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.pf.st, Causes.store_page_fault.U), + (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.pf.ld, Causes.load_page_fault.U), + (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.gf.st, Causes.store_guest_page_fault.U), + (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.gf.ld, Causes.load_guest_page_fault.U), + (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.ae.st, Causes.store_access.U), + (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.ae.ld, Causes.load_access.U), + (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.ma.st, Causes.misaligned_store.U), + (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.ma.ld, Causes.misaligned_load.U) + )) + + val wbCoverCauses = List( + 
(Causes.misaligned_store, "MISALIGNED_STORE"), + (Causes.misaligned_load, "MISALIGNED_LOAD"), + (Causes.store_access, "STORE_ACCESS"), + (Causes.load_access, "LOAD_ACCESS") + ) ++ (if (usingVM) + List( + (Causes.store_page_fault, "STORE_PAGE_FAULT"), + (Causes.load_page_fault, "LOAD_PAGE_FAULT") + ) + else Nil) ++ (if (usingHypervisor) + List( + (Causes.store_guest_page_fault, "STORE_GUEST_PAGE_FAULT"), + (Causes.load_guest_page_fault, "LOAD_GUEST_PAGE_FAULT") + ) + else Nil) + + coverExceptions(wb_xcpt, wb_cause, "WRITEBACK", wbCoverCauses) + + val wb_pc_valid = wb_reg_valid || wb_reg_replay || wb_reg_xcpt + val wb_wxd = wb_reg_valid && wb_ctrl.wxd + val wb_set_sboard = wb_ctrl.div || wb_dcache_miss || wb_ctrl.rocc || wb_ctrl.vec + val replay_wb_common = io.dmem.s2_nack || wb_reg_replay + val replay_wb_rocc = wb_reg_valid && wb_ctrl.rocc && !io.rocc.cmd.ready + val replay_wb_csr: Bool = wb_reg_valid && csr.io.rw_stall + val replay_wb_vec = wb_reg_valid && io.vector.map(_.wb.replay).getOrElse(false.B) + val replay_wb = replay_wb_common || replay_wb_rocc || replay_wb_csr || replay_wb_vec + take_pc_wb := replay_wb || wb_xcpt || csr.io.eret || wb_reg_flush_pipe + + // writeback arbitration + val dmem_resp_xpu = !io.dmem.resp.bits.tag(0).asBool + val dmem_resp_fpu = io.dmem.resp.bits.tag(0).asBool + val dmem_resp_waddr = io.dmem.resp.bits.tag(5, 1) + val dmem_resp_valid = io.dmem.resp.valid && io.dmem.resp.bits.has_data + val dmem_resp_replay = dmem_resp_valid && io.dmem.resp.bits.replay + + class LLWB extends Bundle { + val data = UInt(xLen.W) + val tag = UInt(5.W) + } + + val ll_arb = Module(new Arbiter(new LLWB, 3)) // div, rocc, vec + ll_arb.io.in.foreach(_.valid := false.B) + ll_arb.io.in.foreach(_.bits := DontCare) + val ll_wdata = WireInit(ll_arb.io.out.bits.data) + val ll_waddr = WireInit(ll_arb.io.out.bits.tag) + val ll_wen = WireInit(ll_arb.io.out.fire) + ll_arb.io.out.ready := !wb_wxd + + div.io.resp.ready := ll_arb.io.in(0).ready + ll_arb.io.in(0).valid := 
div.io.resp.valid + ll_arb.io.in(0).bits.data := div.io.resp.bits.data + ll_arb.io.in(0).bits.tag := div.io.resp.bits.tag + + if (usingRoCC) { + io.rocc.resp.ready := ll_arb.io.in(1).ready + ll_arb.io.in(1).valid := io.rocc.resp.valid + ll_arb.io.in(1).bits.data := io.rocc.resp.bits.data + ll_arb.io.in(1).bits.tag := io.rocc.resp.bits.rd + } else { + // tie off RoCC + io.rocc.resp.ready := false.B + io.rocc.mem.req.ready := false.B + } + + io.vector.map { v => + v.resp.ready := Mux(v.resp.bits.fp, !(dmem_resp_valid && dmem_resp_fpu), ll_arb.io.in(2).ready) + ll_arb.io.in(2).valid := v.resp.valid && !v.resp.bits.fp + ll_arb.io.in(2).bits.data := v.resp.bits.data + ll_arb.io.in(2).bits.tag := v.resp.bits.rd + } + // Leave mem as DontCare, since not every RoCC accelerator needs to access memory + io.rocc.mem := DontCare + + when(dmem_resp_replay && dmem_resp_xpu) { + ll_arb.io.out.ready := false.B + ll_waddr := dmem_resp_waddr + ll_wen := true.B + } + + val wb_valid = wb_reg_valid && !replay_wb && !wb_xcpt + val wb_wen = wb_valid && wb_ctrl.wxd + val rf_wen = wb_wen || ll_wen + val rf_waddr = Mux(ll_wen, ll_waddr, wb_waddr) + + val rf_wdata = Mux( + dmem_resp_valid && dmem_resp_xpu, + io.dmem.resp.bits.data(xLen - 1, 0), + Mux( + ll_wen, + ll_wdata, + Mux( + wb_ctrl.csr =/= CSR.N, + csr.io.rw.rdata, + Mux(wb_ctrl.mul, mul.map(_.io.resp.bits.data).getOrElse(wb_reg_wdata), wb_reg_wdata) + ) + ) + ) + + when(rf_wen)(rf.write(rf_waddr, rf_wdata)) + + // hook up control/status regfile + csr.io.ungated_clock := clock + csr.io.decode(0).inst := id_inst(0) + csr.io.exception := wb_xcpt + csr.io.cause := wb_cause + csr.io.retire := wb_valid + csr.io.inst(0) := (if (usingCompressed) + Cat(Mux(wb_reg_raw_inst(1, 0).andR, wb_reg_inst >> 16, 0.U), wb_reg_raw_inst(15, 0)) + else wb_reg_inst) + csr.io.interrupts := io.interrupts + csr.io.hartid := io.hartid + io.fpu.fcsr_rm := csr.io.fcsr_rm + val vector_fcsr_flags = io.vector.map(_.set_fflags.bits).getOrElse(0.U(5.W)) + val vector_fcsr_flags_valid = 
io.vector.map(_.set_fflags.valid).getOrElse(false.B) + csr.io.fcsr_flags.valid := io.fpu.fcsr_flags.valid | vector_fcsr_flags_valid + csr.io.fcsr_flags.bits := (io.fpu.fcsr_flags.bits & Fill(5, io.fpu.fcsr_flags.valid)) | (vector_fcsr_flags & Fill( + 5, + vector_fcsr_flags_valid + )) + io.fpu.time := csr.io.time(31, 0) + io.fpu.hartid := io.hartid + csr.io.rocc_interrupt := io.rocc.interrupt + csr.io.pc := wb_reg_pc + + val tval_dmem_addr = !wb_reg_xcpt + + val tval_any_addr = tval_dmem_addr || + wb_reg_cause.isOneOf( + Causes.breakpoint.U, + Causes.fetch_access.U, + Causes.fetch_page_fault.U, + Causes.fetch_guest_page_fault.U + ) + + val tval_inst = wb_reg_cause === Causes.illegal_instruction.U + val tval_valid = wb_xcpt && (tval_any_addr || tval_inst) + csr.io.gva := wb_xcpt && (tval_any_addr && csr.io.status.v || tval_dmem_addr && wb_reg_hls_or_dv) + csr.io.tval := Mux(tval_valid, encodeVirtualAddress(wb_reg_wdata, wb_reg_wdata), 0.U) + + val (htval, mhtinst_read_pseudo) = { + val htval_valid_imem = wb_reg_xcpt && wb_reg_cause === Causes.fetch_guest_page_fault.U + val htval_imem = Mux(htval_valid_imem, io.imem.gpa.bits, 0.U) + assert(!htval_valid_imem || io.imem.gpa.valid) + + val htval_valid_dmem = + wb_xcpt && tval_dmem_addr && io.dmem.s2_xcpt.gf.asUInt.orR && !io.dmem.s2_xcpt.pf.asUInt.orR + val htval_dmem = Mux(htval_valid_dmem, io.dmem.s2_gpa, 0.U) + + val htval = (htval_dmem | htval_imem) >> hypervisorExtraAddrBits + // read pseudoinstruction if a guest-page fault is caused by an implicit memory access for VS-stage address + // translation + val mhtinst_read_pseudo = (io.imem.gpa_is_pte && htval_valid_imem) || (io.dmem.s2_gpa_is_pte && htval_valid_dmem) + (htval, mhtinst_read_pseudo) + } + + csr.io.vector.foreach { v => + v.set_vconfig.valid := wb_reg_set_vconfig && wb_reg_valid + v.set_vconfig.bits := wb_reg_rs2.asTypeOf(new VConfig) + v.set_vs_dirty := wb_valid && wb_ctrl.vec + v.set_vstart.valid := wb_valid && wb_reg_set_vconfig + v.set_vstart.bits := 
0.U + } + + io.vector.foreach { v => + when(v.wb.retire || v.wb.xcpt || wb_ctrl.vec) { + csr.io.pc := v.wb.pc + csr.io.retire := v.wb.retire + csr.io.inst(0) := v.wb.inst + when(v.wb.xcpt && !wb_reg_xcpt) { + wb_xcpt := true.B + wb_cause := v.wb.cause + csr.io.tval := v.wb.tval + } + } + v.wb.store_pending := io.dmem.store_pending + v.wb.vxrm := csr.io.vector.get.vxrm + v.wb.frm := csr.io.fcsr_rm + csr.io.vector.get.set_vxsat := v.set_vxsat + when(v.set_vconfig.valid) { + csr.io.vector.get.set_vconfig.valid := true.B + csr.io.vector.get.set_vconfig.bits := v.set_vconfig.bits + } + when(v.set_vstart.valid) { + csr.io.vector.get.set_vstart.valid := true.B + csr.io.vector.get.set_vstart.bits := v.set_vstart.bits + } + } + + csr.io.htval := htval + csr.io.mhtinst_read_pseudo := mhtinst_read_pseudo + io.ptw.ptbr := csr.io.ptbr + io.ptw.hgatp := csr.io.hgatp + io.ptw.vsatp := csr.io.vsatp + (io.ptw.customCSRs.csrs zip csr.io.customCSRs).map { case (lhs, rhs) => lhs <> rhs } + io.ptw.status := csr.io.status + io.ptw.hstatus := csr.io.hstatus + io.ptw.gstatus := csr.io.gstatus + io.ptw.pmp := csr.io.pmp + csr.io.rw.addr := wb_reg_inst(31, 20) + csr.io.rw.cmd := CSR.maskCmd(wb_reg_valid, wb_ctrl.csr) + csr.io.rw.wdata := wb_reg_wdata + + io.rocc.csrs <> csr.io.roccCSRs + io.trace.time := csr.io.time + io.trace.insns := csr.io.trace + if (rocketParams.debugROB.isDefined) { + val sz = rocketParams.debugROB.get.size + if (sz < 1) { // use unsynthesizable ROB + val csr_trace_with_wdata = WireInit(csr.io.trace(0)) + csr_trace_with_wdata.wdata.get := rf_wdata + val should_wb = WireInit((wb_ctrl.wfd || (wb_ctrl.wxd && wb_waddr =/= 0.U)) && !csr.io.trace(0).exception) + val has_wb = WireInit(wb_ctrl.wxd && wb_wen && !wb_set_sboard) + val wb_addr = WireInit(wb_waddr + Mux(wb_ctrl.wfd, 32.U, 0.U)) + + io.vector.foreach { v => + when(v.wb.retire) { + should_wb := v.wb.rob_should_wb + has_wb := false.B + wb_addr := Cat(v.wb.rob_should_wb_fp, csr_trace_with_wdata.insn(11, 7)) + } + } + 
+ DebugROB.pushTrace(clock, reset, io.hartid, csr_trace_with_wdata, should_wb, has_wb, wb_addr) + + io.trace.insns(0) := DebugROB.popTrace(clock, reset, io.hartid) + + DebugROB.pushWb(clock, reset, io.hartid, ll_wen, rf_waddr, rf_wdata) + } else { // synthesizable ROB (no FPRs) + require(!usingVector, "Synthesizable ROB does not support vector implementations") + val csr_trace_with_wdata = WireInit(csr.io.trace(0)) + csr_trace_with_wdata.wdata.get := rf_wdata + + val debug_rob = Module(new HardDebugROB(sz, 32)) + debug_rob.io.i_insn := csr_trace_with_wdata + debug_rob.io.should_wb := (wb_ctrl.wfd || (wb_ctrl.wxd && wb_waddr =/= 0.U)) && + !csr.io.trace(0).exception + debug_rob.io.has_wb := wb_ctrl.wxd && wb_wen && !wb_set_sboard + debug_rob.io.tag := wb_waddr + Mux(wb_ctrl.wfd, 32.U, 0.U) + + debug_rob.io.wb_val := ll_wen + debug_rob.io.wb_tag := rf_waddr + debug_rob.io.wb_data := rf_wdata + + io.trace.insns(0) := debug_rob.io.o_insn + } + } else { + io.trace.insns := csr.io.trace + } + for (((iobpw, wphit), bp) <- io.bpwatch zip wb_reg_wphit zip csr.io.bp) { + iobpw.valid(0) := wphit + iobpw.action := bp.control.action + // tie off bpwatch valids + iobpw.rvalid.foreach(_ := false.B) + iobpw.wvalid.foreach(_ := false.B) + iobpw.ivalid.foreach(_ := false.B) + } + + val hazard_targets = Seq( + (id_ctrl.rxs1 && id_raddr1 =/= 0.U, id_raddr1), + (id_ctrl.rxs2 && id_raddr2 =/= 0.U, id_raddr2), + (id_ctrl.wxd && id_waddr =/= 0.U, id_waddr) + ) + + val fp_hazard_targets = Seq( + (io.fpu.dec.ren1, id_raddr1), + (io.fpu.dec.ren2, id_raddr2), + (io.fpu.dec.ren3, id_raddr3), + (io.fpu.dec.wen, id_waddr) + ) + + val sboard = new Scoreboard(32, true) + sboard.clear(ll_wen, ll_waddr) + + def id_sboard_clear_bypass(r: UInt) = + // ll_waddr arrives late when D$ has ECC, so reshuffle the hazard check + if (!tileParams.dcache.get.dataECC.isDefined) ll_wen && ll_waddr === r + else div.io.resp.fire && div.io.resp.bits.tag === r || dmem_resp_replay && dmem_resp_xpu && dmem_resp_waddr 
=== r + + val id_sboard_hazard = checkHazards(hazard_targets, rd => sboard.read(rd) && !id_sboard_clear_bypass(rd)) + sboard.set(wb_set_sboard && wb_wen, wb_waddr) + + // stall for RAW/WAW hazards on CSRs, loads, AMOs, and mul/div in execute stage. + val ex_cannot_bypass = + ex_ctrl.csr =/= CSR.N || ex_ctrl.jalr || ex_ctrl.mem || ex_ctrl.mul || ex_ctrl.div || ex_ctrl.fp || ex_ctrl.rocc || ex_ctrl.vec + val data_hazard_ex = ex_ctrl.wxd && checkHazards(hazard_targets, _ === ex_waddr) + val fp_data_hazard_ex = id_ctrl.fp && ex_ctrl.wfd && checkHazards(fp_hazard_targets, _ === ex_waddr) + val id_ex_hazard = ex_reg_valid && (data_hazard_ex && ex_cannot_bypass || fp_data_hazard_ex) + + // stall for RAW/WAW hazards on CSRs, LB/LH, and mul/div in memory stage. + val mem_mem_cmd_bh = + if (fastLoadWord) (!fastLoadByte).B && mem_reg_slow_bypass + else true.B + + val mem_cannot_bypass = + mem_ctrl.csr =/= CSR.N || mem_ctrl.mem && mem_mem_cmd_bh || mem_ctrl.mul || mem_ctrl.div || mem_ctrl.fp || mem_ctrl.rocc || mem_ctrl.vec + val data_hazard_mem = mem_ctrl.wxd && checkHazards(hazard_targets, _ === mem_waddr) + val fp_data_hazard_mem = id_ctrl.fp && mem_ctrl.wfd && checkHazards(fp_hazard_targets, _ === mem_waddr) + val id_mem_hazard = mem_reg_valid && (data_hazard_mem && mem_cannot_bypass || fp_data_hazard_mem) + id_load_use := mem_reg_valid && data_hazard_mem && mem_ctrl.mem + + val id_vconfig_hazard = id_ctrl.vec && ( + (ex_reg_valid && ex_reg_set_vconfig) || + (mem_reg_valid && mem_reg_set_vconfig) || + (wb_reg_valid && wb_reg_set_vconfig) + ) + + // stall for RAW/WAW hazards on load/AMO misses and mul/div in writeback. 
+ val data_hazard_wb = wb_ctrl.wxd && checkHazards(hazard_targets, _ === wb_waddr) + val fp_data_hazard_wb = id_ctrl.fp && wb_ctrl.wfd && checkHazards(fp_hazard_targets, _ === wb_waddr) + val id_wb_hazard = wb_reg_valid && (data_hazard_wb && wb_set_sboard || fp_data_hazard_wb) + + val id_stall_fpu = + if (usingFPU) { + val fp_sboard = new Scoreboard(32) + fp_sboard.set(((wb_dcache_miss || wb_ctrl.vec) && wb_ctrl.wfd || io.fpu.sboard_set) && wb_valid, wb_waddr) + val v_ll = io.vector.map(v => v.resp.fire && v.resp.bits.fp).getOrElse(false.B) + fp_sboard.clear((dmem_resp_replay && dmem_resp_fpu) || v_ll, io.fpu.ll_resp_tag) + fp_sboard.clear(io.fpu.sboard_clr, io.fpu.sboard_clra) + + checkHazards(fp_hazard_targets, fp_sboard.read _) + } else false.B + + val dcache_blocked = { + // speculate that a blocked D$ will unblock the cycle after a Grant + val blocked = Reg(Bool()) + blocked := !io.dmem.req.ready && io.dmem.clock_enabled && !io.dmem.perf.grant && (blocked || io.dmem.req.valid || io.dmem.s2_nack) + blocked && !io.dmem.perf.grant + } + + val rocc_blocked = Reg(Bool()) + rocc_blocked := !wb_xcpt && !io.rocc.cmd.ready && (io.rocc.cmd.valid || rocc_blocked) + + val ctrl_stalld = + id_ex_hazard || id_mem_hazard || id_wb_hazard || id_sboard_hazard || + id_vconfig_hazard || + csr.io.singleStep && (ex_reg_valid || mem_reg_valid || wb_reg_valid) || + id_csr_en && csr.io.decode(0).fp_csr && !io.fpu.fcsr_rdy || + id_csr_en && csr.io.decode(0).vector_csr && id_vec_busy || + id_ctrl.fp && id_stall_fpu || + id_ctrl.mem && dcache_blocked || // reduce activity during D$ misses + id_ctrl.rocc && rocc_blocked || // reduce activity while RoCC is busy + id_ctrl.div && (!(div.io.req.ready || (div.io.resp.valid && !wb_wxd)) || div.io.req.valid) || // reduce odds of replay + !clock_en || + id_do_fence || io.rocc.busy || + csr.io.csr_stall || + id_reg_pause || + io.traceStall + + ctrl_killd := !ibuf.io.inst(0).valid || ibuf.io.inst( + 0 + ).bits.replay || take_pc_mem_wb || ctrl_stalld 
|| csr.io.interrupt + + io.imem.req.valid := take_pc + io.imem.req.bits.speculative := !take_pc_wb + io.imem.req.bits.pc := + Mux( + wb_xcpt || csr.io.eret, + csr.io.evec, // exception or [m|s]ret + Mux( + replay_wb, + wb_reg_pc, // replay + mem_npc + ) + ) // flush or branch misprediction + io.imem.flush_icache := wb_reg_valid && wb_ctrl.fence_i && !io.dmem.s2_nack + io.imem.might_request := { + imem_might_request_reg := ex_pc_valid || mem_pc_valid || io.ptw.customCSRs.disableICacheClockGate || io.vector.map( + _.trap_check_busy + ).getOrElse(false.B) + imem_might_request_reg + } + io.imem.progress := RegNext(wb_reg_valid && !replay_wb_common) + io.imem.sfence.valid := wb_reg_valid && wb_reg_sfence + io.imem.sfence.bits.rs1 := wb_reg_mem_size(0) + io.imem.sfence.bits.rs2 := wb_reg_mem_size(1) + io.imem.sfence.bits.addr := wb_reg_wdata + io.imem.sfence.bits.asid := wb_reg_rs2 + io.imem.sfence.bits.hv := wb_reg_hfence_v + io.imem.sfence.bits.hg := wb_reg_hfence_g + io.ptw.sfence := io.imem.sfence + + ibuf.io.inst(0).ready := !ctrl_stalld + + io.imem.btb_update.valid := mem_reg_valid && !take_pc_wb && mem_wrong_npc && (!mem_cfi || mem_cfi_taken) + io.imem.btb_update.bits.isValid := mem_cfi + io.imem.btb_update.bits.cfiType := + Mux( + (mem_ctrl.jal || mem_ctrl.jalr) && mem_waddr(0), + CFIType.call, + Mux( + mem_ctrl.jalr && (mem_reg_inst(19, 15) & regAddrMask.U) === BitPat("b00?01"), + CFIType.ret, + Mux(mem_ctrl.jal || mem_ctrl.jalr, CFIType.jump, CFIType.branch) + ) + ) + io.imem.btb_update.bits.target := io.imem.req.bits.pc + io.imem.btb_update.bits.br_pc := (if (usingCompressed) mem_reg_pc + Mux(mem_reg_rvc, 0.U, 2.U) else mem_reg_pc) + io.imem.btb_update.bits.pc := ~(~io.imem.btb_update.bits.br_pc | (coreInstBytes * fetchWidth - 1).U) + io.imem.btb_update.bits.prediction := mem_reg_btb_resp + io.imem.btb_update.bits.taken := DontCare + + io.imem.bht_update.valid := mem_reg_valid && !take_pc_wb + io.imem.bht_update.bits.pc := io.imem.btb_update.bits.pc + 
io.imem.bht_update.bits.taken := mem_br_taken + io.imem.bht_update.bits.mispredict := mem_wrong_npc + io.imem.bht_update.bits.branch := mem_ctrl.branch + io.imem.bht_update.bits.prediction := mem_reg_btb_resp.bht + + // Connect RAS in Frontend + io.imem.ras_update := DontCare + + io.fpu.valid := !ctrl_killd && id_ctrl.fp + io.fpu.killx := ctrl_killx + io.fpu.killm := killm_common + io.fpu.inst := id_inst(0) + io.fpu.fromint_data := ex_rs(0) + io.fpu.ll_resp_val := dmem_resp_valid && dmem_resp_fpu + io.fpu.ll_resp_data := (if (minFLen == 32) io.dmem.resp.bits.data_word_bypass else io.dmem.resp.bits.data) + io.fpu.ll_resp_type := io.dmem.resp.bits.size + io.fpu.ll_resp_tag := dmem_resp_waddr + io.fpu.keep_clock_enabled := io.ptw.customCSRs.disableCoreClockGate + + io.fpu.v_sew := csr.io.vector.map(_.vconfig.vtype.vsew).getOrElse(0.U) + + io.vector.map { v => + when(!(dmem_resp_valid && dmem_resp_fpu)) { + io.fpu.ll_resp_val := v.resp.valid && v.resp.bits.fp + io.fpu.ll_resp_data := v.resp.bits.data + io.fpu.ll_resp_type := v.resp.bits.size + io.fpu.ll_resp_tag := v.resp.bits.rd + } + } + + io.vector.foreach { v => + v.ex.valid := ex_reg_valid && (ex_ctrl.vec || rocketParams.vector.get.issueVConfig.B && ex_reg_set_vconfig) && !ctrl_killx + v.ex.inst := ex_reg_inst + v.ex.vconfig := csr.io.vector.get.vconfig + v.ex.vstart := Mux(mem_reg_valid && mem_ctrl.vec || wb_reg_valid && wb_ctrl.vec, 0.U, csr.io.vector.get.vstart) + v.ex.rs1 := ex_rs(0) + v.ex.rs2 := ex_rs(1) + v.ex.pc := ex_reg_pc + v.mem.frs1 := io.fpu.store_data + v.killm := killm_common + v.status := csr.io.status + } + + io.dmem.req.valid := ex_reg_valid && ex_ctrl.mem + val ex_dcache_tag = Cat(ex_waddr, ex_ctrl.fp) + require(coreParams.dcacheReqTagBits >= ex_dcache_tag.getWidth) + io.dmem.req.bits.tag := ex_dcache_tag + io.dmem.req.bits.cmd := ex_ctrl.mem_cmd + io.dmem.req.bits.size := ex_reg_mem_size + io.dmem.req.bits.signed := !Mux(ex_reg_hls, ex_reg_inst(20), ex_reg_inst(14)) + io.dmem.req.bits.phys := 
false.B + io.dmem.req.bits.addr := encodeVirtualAddress(ex_rs(0), alu.io.adder_out) + io.dmem.req.bits.idx.foreach(_ := io.dmem.req.bits.addr) + io.dmem.req.bits.dprv := Mux(ex_reg_hls, csr.io.hstatus.spvp, csr.io.status.dprv) + io.dmem.req.bits.dv := ex_reg_hls || csr.io.status.dv + io.dmem.req.bits.no_resp := !isRead(ex_ctrl.mem_cmd) || (!ex_ctrl.fp && ex_waddr === 0.U) + io.dmem.req.bits.no_alloc := DontCare + io.dmem.req.bits.no_xcpt := DontCare + io.dmem.req.bits.data := DontCare + io.dmem.req.bits.mask := DontCare + + io.dmem.s1_data.data := (if (fLen == 0) mem_reg_rs2 + else Mux(mem_ctrl.fp, Fill(coreDataBits / fLen, io.fpu.store_data), mem_reg_rs2)) + io.dmem.s1_data.mask := DontCare + + io.dmem.s1_kill := killm_common || mem_ldst_xcpt || fpu_kill_mem || vec_kill_mem + io.dmem.s2_kill := false.B + // don't let D$ go to sleep if we're probably going to use it soon + io.dmem.keep_clock_enabled := ibuf.io.inst(0).valid && id_ctrl.mem && !csr.io.csr_stall + + io.rocc.cmd.valid := wb_reg_valid && wb_ctrl.rocc && !replay_wb_common + io.rocc.exception := wb_xcpt && csr.io.status.xs.orR + io.rocc.cmd.bits.raw_inst := wb_reg_inst + val inst_bits = wb_reg_inst.asTypeOf(new RoCCInstruction()) + io.rocc.cmd.bits.funct := inst_bits.funct + io.rocc.cmd.bits.funct3 := wb_reg_inst(14, 12) + io.rocc.cmd.bits.rs2 := inst_bits.rs2 + io.rocc.cmd.bits.rs1 := inst_bits.rs1 + io.rocc.cmd.bits.pc := wb_reg_pc + io.rocc.cmd.bits.xd := inst_bits.xd + io.rocc.cmd.bits.xs1 := inst_bits.xs1 + io.rocc.cmd.bits.xs2 := inst_bits.xs2 + io.rocc.cmd.bits.rd := inst_bits.rd + io.rocc.cmd.bits.opcode := inst_bits.opcode + io.rocc.cmd.bits.rs1Data := wb_reg_wdata + io.rocc.cmd.bits.rs2Data := wb_reg_rs2 + + // gate the clock + val unpause = + csr.io.time(rocketParams.lgPauseCycles - 1, 0) === 0.U || csr.io.inhibit_cycle || io.dmem.perf.release || take_pc + when(unpause)(id_reg_pause := false.B) + io.cease := csr.io.status.cease && !clock_en_reg + io.wfi := csr.io.status.wfi + if 
(rocketParams.clockGate) { + long_latency_stall := csr.io.csr_stall || io.dmem.perf.blocked || id_reg_pause && !unpause + clock_en := clock_en_reg || ex_pc_valid || (!long_latency_stall && io.imem.resp.valid) + clock_en_reg := + ex_pc_valid || mem_pc_valid || wb_pc_valid || // instruction in flight + io.ptw.customCSRs.disableCoreClockGate || // chicken bit + !div.io.req.ready || // mul/div in flight + usingFPU.B && !io.fpu.fcsr_rdy || // long-latency FPU in flight + io.dmem.replay_next || // long-latency load replaying + (!long_latency_stall && (ibuf.io.inst(0).valid || io.imem.resp.valid)) // instruction pending + + assert(!(ex_pc_valid || mem_pc_valid || wb_pc_valid) || clock_en) + } + + // evaluate performance counters + val icache_blocked = !(io.imem.resp.valid || RegNext(io.imem.resp.valid)) + csr.io.counters foreach { c => c.inc := RegNext(perfEvents.evaluate(c.eventSel)) } + + val coreMonitorBundle = Wire(new CoreMonitorBundle(xLen, fLen)) + + coreMonitorBundle.clock := clock + coreMonitorBundle.reset := reset + coreMonitorBundle.hartid := io.hartid + coreMonitorBundle.timer := csr.io.time(31, 0) + coreMonitorBundle.valid := csr.io.trace(0).valid && !csr.io.trace(0).exception + coreMonitorBundle.pc := csr.io.trace(0).iaddr(vaddrBitsExtended - 1, 0).sextTo(xLen) + coreMonitorBundle.wrenx := wb_wen && !wb_set_sboard + coreMonitorBundle.wrenf := false.B + coreMonitorBundle.wrdst := wb_waddr + coreMonitorBundle.wrdata := rf_wdata + coreMonitorBundle.rd0src := wb_reg_inst(19, 15) + coreMonitorBundle.rd0val := RegNext(RegNext(ex_rs(0))) + coreMonitorBundle.rd1src := wb_reg_inst(24, 20) + coreMonitorBundle.rd1val := RegNext(RegNext(ex_rs(1))) + coreMonitorBundle.inst := csr.io.trace(0).insn + coreMonitorBundle.excpt := csr.io.trace(0).exception + coreMonitorBundle.priv_mode := csr.io.trace(0).priv + + if (enableCommitLog) { + val t = csr.io.trace(0) + val rd = wb_waddr + val wfd = wb_ctrl.wfd + val wxd = wb_ctrl.wxd + val has_data = wb_wen && !wb_set_sboard + + 
when(t.valid && !t.exception) { + when(wfd) { + printf("%d 0x%x (0x%x) f%d p%d 0xXXXXXXXXXXXXXXXX\n", t.priv, t.iaddr, t.insn, rd, rd + 32.U) + } + .elsewhen(wxd && rd =/= 0.U && has_data) { + printf("%d 0x%x (0x%x) x%d 0x%x\n", t.priv, t.iaddr, t.insn, rd, rf_wdata) + } + .elsewhen(wxd && rd =/= 0.U && !has_data) { + printf("%d 0x%x (0x%x) x%d p%d 0xXXXXXXXXXXXXXXXX\n", t.priv, t.iaddr, t.insn, rd, rd) + } + .otherwise { + printf("%d 0x%x (0x%x)\n", t.priv, t.iaddr, t.insn) + } + } + + when(ll_wen && rf_waddr =/= 0.U) { + printf("x%d p%d 0x%x\n", rf_waddr, rf_waddr, rf_wdata) + } + } else { + when(csr.io.trace(0).valid) { + printf( + "C%d: %d [%d] pc=[%x] W[r%d=%x][%d] R[r%d=%x] R[r%d=%x] inst=[%x] DASM(%x) wb_xcpt:%d\n", + io.hartid, + coreMonitorBundle.timer, + coreMonitorBundle.valid, + coreMonitorBundle.pc, + Mux(wb_ctrl.wxd || wb_ctrl.wfd, coreMonitorBundle.wrdst, 0.U), + Mux(coreMonitorBundle.wrenx, coreMonitorBundle.wrdata, 0.U), + coreMonitorBundle.wrenx, + Mux(wb_ctrl.rxs1 || wb_ctrl.rfs1, coreMonitorBundle.rd0src, 0.U), + Mux(wb_ctrl.rxs1 || wb_ctrl.rfs1, coreMonitorBundle.rd0val, 0.U), + Mux(wb_ctrl.rxs2 || wb_ctrl.rfs2, coreMonitorBundle.rd1src, 0.U), + Mux(wb_ctrl.rxs2 || wb_ctrl.rfs2, coreMonitorBundle.rd1val, 0.U), + coreMonitorBundle.inst, + coreMonitorBundle.inst, + wb_xcpt + ) + } + } + + // CoreMonitorBundle for late latency writes + val xrfWriteBundle = Wire(new CoreMonitorBundle(xLen, fLen)) + + xrfWriteBundle.clock := clock + xrfWriteBundle.reset := reset + xrfWriteBundle.hartid := io.hartid + xrfWriteBundle.timer := csr.io.time(31, 0) + xrfWriteBundle.valid := false.B + xrfWriteBundle.pc := 0.U + xrfWriteBundle.wrdst := rf_waddr + xrfWriteBundle.wrenx := rf_wen && !(csr.io.trace(0).valid && wb_wen && (wb_waddr === rf_waddr)) + xrfWriteBundle.wrenf := false.B + xrfWriteBundle.wrdata := rf_wdata + xrfWriteBundle.rd0src := 0.U + xrfWriteBundle.rd0val := 0.U + xrfWriteBundle.rd1src := 0.U + xrfWriteBundle.rd1val := 0.U + xrfWriteBundle.inst := 
0.U + xrfWriteBundle.excpt := false.B + xrfWriteBundle.priv_mode := csr.io.trace(0).priv + + if (rocketParams.haveSimTimeout) + PlusArg.timeout( + name = "max_core_cycles", + docstring = "Kill the emulation after INT rdtime cycles. Off if 0." + )(csr.io.time) + + } // leaving gated-clock domain + val rocketImpl = withClock(gated_clock)(new RocketImpl) + + def checkExceptions(x: Seq[(Bool, UInt)]) = + (WireInit(x.map(_._1).reduce(_ || _)), WireInit(PriorityMux(x))) + + def coverExceptions( + exceptionValid: Bool, + cause: UInt, + labelPrefix: String, + coverCausesLabels: Seq[(Int, String)] + ): Unit = { + for ((coverCause, label) <- coverCausesLabels) { + property.cover(exceptionValid && (cause === coverCause.U), s"${labelPrefix}_$label") + } + } + + def checkHazards(targets: Seq[(Bool, UInt)], cond: UInt => Bool) = + targets.map(h => h._1 && cond(h._2)).reduce(_ || _) + + def encodeVirtualAddress(a0: UInt, ea: UInt) = + if (vaddrBitsExtended == vaddrBits) ea + else { + // efficient means to compress 64-bit VA into vaddrBits+1 bits + // (VA is bad if VA(vaddrBits) != VA(vaddrBits-1)) + val b = vaddrBitsExtended - 1 + val a = (a0 >> b).asSInt + val msb = Mux(a === 0.S || a === -1.S, ea(b), !ea(b - 1)) + Cat(msb, ea(b - 1, 0)) + } + + class Scoreboard(n: Int, zero: Boolean = false) { + def set(en: Bool, addr: UInt): Unit = update(en, _next | mask(en, addr)) + def clear(en: Bool, addr: UInt): Unit = update(en, _next & ~mask(en, addr)) + def read(addr: UInt): Bool = r(addr) + def readBypassed(addr: UInt): Bool = _next(addr) + + private val _r = RegInit(0.U(n.W)) + private val r = if (zero) (_r >> 1 << 1) else _r + private var _next = r + private var ens = false.B + private def mask(en: Bool, addr: UInt) = Mux(en, 1.U << addr, 0.U) + + private def update(en: Bool, update: UInt) = { + _next = update + ens = ens || en + when(ens)(_r := _next) + } + + } + +} + +class RegFile(n: Int, w: Int, zero: Boolean = false) { + val rf = Mem(n, UInt(w.W)) + private def access(addr: 
UInt) = rf(~addr(log2Up(n) - 1, 0)) + private val reads = ArrayBuffer[(UInt, UInt)]() + private var canRead = true + + def read(addr: UInt) = { + require(canRead) + reads += addr -> Wire(UInt()) + reads.last._2 := Mux(zero.B && addr === 0.U, 0.U, access(addr)) + reads.last._2 + } + + def write(addr: UInt, data: UInt) = { + canRead = false + when(addr =/= 0.U) { + access(addr) := data + for ((raddr, rdata) <- reads) + when(addr === raddr)(rdata := data) + } + } + +} + +object ImmGen { + + def apply(sel: UInt, inst: UInt) = { + val sign = Mux(sel === IMM_Z, 0.S, inst(31).asSInt) + val b30_20 = Mux(sel === IMM_U, inst(30, 20).asSInt, sign) + val b19_12 = Mux(sel =/= IMM_U && sel =/= IMM_UJ, sign, inst(19, 12).asSInt) + val b11 = Mux( + sel === IMM_U || sel === IMM_Z, + 0.S, + Mux(sel === IMM_UJ, inst(20).asSInt, Mux(sel === IMM_SB, inst(7).asSInt, sign)) + ) + val b10_5 = Mux(sel === IMM_U || sel === IMM_Z, 0.U, inst(30, 25)) + val b4_1 = Mux( + sel === IMM_U, + 0.U, + Mux(sel === IMM_S || sel === IMM_SB, inst(11, 8), Mux(sel === IMM_Z, inst(19, 16), inst(24, 21))) + ) + val b0 = Mux(sel === IMM_S, inst(7), Mux(sel === IMM_I, inst(20), Mux(sel === IMM_Z, inst(15), 0.U))) + + Cat(sign, b30_20, b19_12, b11, b10_5, b4_1, b0).asSInt + } + +} diff --git a/arch/src/main/scala/framework/core/bbtile/id/RVVRoCCDecode.scala b/arch/src/main/scala/framework/core/bbtile/id/RVVRoCCDecode.scala new file mode 100644 index 00000000..c5c6e430 --- /dev/null +++ b/arch/src/main/scala/framework/core/bbtile/id/RVVRoCCDecode.scala @@ -0,0 +1,31 @@ +package framework.core.bbtile.id + +import chisel3._ +import chisel3.util._ +import org.chipsalliance.cde.config.Parameters +import freechips.rocketchip.rocket._ +import freechips.rocketchip.rocket.Instructions._ +import freechips.rocketchip.rocket.ALU._ +import freechips.rocketchip.rocket.constants.ScalarOpConstants +import freechips.rocketchip.rocket.constants.MemoryOpConstants +import freechips.rocketchip.util._ + +/** RVV -> RoCC decode 
table. Marks RVV instructions as RoCC so they route to Buckyball. */ +class RVVRoCCDecode(implicit val p: Parameters) extends DecodeConstants with ScalarOpConstants with MemoryOpConstants { + + private val rvvRoCCBase: List[BitPat] = + List(Y, N, Y, N, N, N, N, N, A2_X, A1_X, IMM_X, DW_X, FN_X, N, M_X, N, N, N, N, N, N, N, CSR.N, N, N, N, N) + + val table: Array[(BitPat, List[BitPat])] = Array( + BitPat("b?????????????????????????????????1010111") -> rvvRoCCBase, + BitPat("b?????????????????000?????0000111") -> rvvRoCCBase, + BitPat("b?????????????????101?????0000111") -> rvvRoCCBase, + BitPat("b?????????????????110?????0000111") -> rvvRoCCBase, + BitPat("b?????????????????111?????0000111") -> rvvRoCCBase, + BitPat("b?????????????????000?????0100111") -> rvvRoCCBase, + BitPat("b?????????????????101?????0100111") -> rvvRoCCBase, + BitPat("b?????????????????110?????0100111") -> rvvRoCCBase, + BitPat("b?????????????????111?????0100111") -> rvvRoCCBase + ) + +} diff --git a/arch/src/main/scala/framework/core/configs/CoreParam.scala b/arch/src/main/scala/framework/core/configs/CoreParam.scala new file mode 100644 index 00000000..9686f33a --- /dev/null +++ b/arch/src/main/scala/framework/core/configs/CoreParam.scala @@ -0,0 +1,24 @@ +package framework.core.configs + +import upickle.default._ + +/** + * Core Parameter + */ +case class CoreParam( + coreDataBytes: Int, + xLen: Int, + vaddrBits: Int, + paddrBits: Int, + pgIdxBits: Int, + nPMPs: Int) // Physical Memory Protection entries, typically 8 or 16 + +object CoreParam { + implicit val rw: ReadWriter[CoreParam] = macroRW + + def apply(): CoreParam = { + val jsonStr = scala.io.Source.fromFile("src/main/scala/framework/core/configs/default.json").mkString + read[CoreParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/core/configs/default.json b/arch/src/main/scala/framework/core/configs/default.json new file mode 100644 index 00000000..d3ba0329 --- /dev/null +++ 
b/arch/src/main/scala/framework/core/configs/default.json @@ -0,0 +1,8 @@ +{ + "coreDataBytes": 64, + "xLen": 64, + "vaddrBits": 39, + "paddrBits": 56, + "pgIdxBits": 12, + "nPMPs": 8 +} diff --git a/arch/src/main/scala/framework/frontend/Frontend.scala b/arch/src/main/scala/framework/frontend/Frontend.scala new file mode 100644 index 00000000..85e5053f --- /dev/null +++ b/arch/src/main/scala/framework/frontend/Frontend.scala @@ -0,0 +1,77 @@ +package framework.frontend + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.frontend.decoder.{GlobalDecoder, PostGDCmd} +import framework.frontend.globalrs.{GlobalSchedComplete, GlobalSchedIssue, GlobalScheduler} +import framework.top.GlobalConfig +import framework.core.bbtile.{RoCCCommandBB, RoCCResponseBB} +import framework.balldomain.blink.SubRobRow + +/** + * Frontend Module + * Encapsulates GlobalDecoder and global scheduler + */ +@instantiable +class Frontend(val b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + + // RoCC command input + val cmd = Flipped(Decoupled(new Bundle { + val cmd = new RoCCCommandBB(b.core.xLen) + })) + + // Issue to domains + val ball_issue_o = Decoupled(new GlobalSchedIssue(b)) + val mem_issue_o = Decoupled(new GlobalSchedIssue(b)) + val gp_issue_o = Decoupled(new GlobalSchedIssue(b)) + // Complete from domains + val ball_complete_i = Flipped(Decoupled(new GlobalSchedComplete(b))) + val mem_complete_i = Flipped(Decoupled(new GlobalSchedComplete(b))) + val gp_complete_i = Flipped(Decoupled(new GlobalSchedComplete(b))) + + // Ball -> SubROB request passthrough + val ball_subrob_req_i = Flipped(Vec(b.ballDomain.ballNum, Decoupled(new SubRobRow(b)))) + + // RoCC response + val resp = Decoupled(new RoCCResponseBB(b.core.xLen)) + val busy = Output(Bool()) + + // Barrier interface — passthrough to GlobalRS + val barrier_arrive = Output(Bool()) + val barrier_release = Input(Bool()) 
+ }) + + val gDecoder: Instance[GlobalDecoder] = Instantiate(new GlobalDecoder(b)) + val scheduler: Instance[GlobalScheduler] = Instantiate(new GlobalScheduler(b)) + + gDecoder.io.id_i.valid := io.cmd.valid + gDecoder.io.id_i.bits.cmd := io.cmd.bits.cmd + io.cmd.ready := gDecoder.io.id_i.ready + + scheduler.io.decode_cmd_i <> gDecoder.io.id_o + + io.ball_issue_o <> scheduler.io.ball_issue_o + io.mem_issue_o <> scheduler.io.mem_issue_o + io.gp_issue_o <> scheduler.io.gp_issue_o + + scheduler.io.ball_complete_i <> io.ball_complete_i + scheduler.io.mem_complete_i <> io.mem_complete_i + scheduler.io.gp_complete_i <> io.gp_complete_i + + // Wire SubROB request from BallDomain through to scheduler + for (i <- 0 until b.ballDomain.ballNum) { + scheduler.io.ball_subrob_req_i(i) <> io.ball_subrob_req_i(i) + } + + io.resp <> scheduler.io.scheduler_rocc_o.resp + io.busy := scheduler.io.scheduler_rocc_o.busy + + // Barrier passthrough + io.barrier_arrive := scheduler.io.barrier_arrive + scheduler.io.barrier_release := io.barrier_release + +} diff --git a/arch/src/main/scala/framework/frontend/configs/FrontendParam.scala b/arch/src/main/scala/framework/frontend/configs/FrontendParam.scala new file mode 100644 index 00000000..fbfcab33 --- /dev/null +++ b/arch/src/main/scala/framework/frontend/configs/FrontendParam.scala @@ -0,0 +1,24 @@ +package framework.frontend.configs + +import upickle.default._ + +/** + * Frontend Parameter - contains all frontend configuration + */ +case class FrontendParam( + rob_entries: Int, + rs_out_of_order_response: Boolean, + bank_id_len: Int, + vbank_id_upper_bound: Int, + iter_len: Int, + sub_rob_depth: Int) + +object FrontendParam { + implicit val rw: ReadWriter[FrontendParam] = macroRW + + def apply(): FrontendParam = { + val jsonStr = scala.io.Source.fromFile("src/main/scala/framework/frontend/configs/default.json").mkString + read[FrontendParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/frontend/configs/default.json
b/arch/src/main/scala/framework/frontend/configs/default.json new file mode 100644 index 00000000..7612f4c0 --- /dev/null +++ b/arch/src/main/scala/framework/frontend/configs/default.json @@ -0,0 +1,8 @@ +{ + "rob_entries": 16, + "rs_out_of_order_response": true, + "bank_id_len": 10, + "vbank_id_upper_bound": 31, + "iter_len": 34, + "sub_rob_depth": 64 +} diff --git a/arch/src/main/scala/framework/frontend/decoder/GISA.scala b/arch/src/main/scala/framework/frontend/decoder/GISA.scala index d71436bf..ec0b1616 100644 --- a/arch/src/main/scala/framework/frontend/decoder/GISA.scala +++ b/arch/src/main/scala/framework/frontend/decoder/GISA.scala @@ -4,5 +4,15 @@ import chisel3._ import chisel3.util._ object GISA { - val FENCE_BITPAT = BitPat("b0011111") // 31 (0x1F) + // enable=000, opcode group for no-bank-access instructions + val FENCE_BITPAT = BitPat("b0000000") // 0 (0x00) — enable=000, opcode=0 + val BARRIER_BITPAT = BitPat("b0000001") // 1 (0x01) — enable=000, opcode=1 +} + +// Domain ID constants +object DomainId { + val FRONTEND = 0.U(4.W) // Frontend (fence), does not enter ROB queue + val MEM = 1.U(4.W) // Memory domain + val GP = 2.U(4.W) // General purpose domain (T1 vector processor) + val BALL = 3.U(4.W) // Ball domain } diff --git a/arch/src/main/scala/framework/frontend/decoder/GobalDecoder.scala b/arch/src/main/scala/framework/frontend/decoder/GobalDecoder.scala index db909317..46cd70c2 100644 --- a/arch/src/main/scala/framework/frontend/decoder/GobalDecoder.scala +++ b/arch/src/main/scala/framework/frontend/decoder/GobalDecoder.scala @@ -3,54 +3,118 @@ package framework.frontend.decoder import chisel3._ import chisel3.util._ import chisel3.stage._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig +import framework.top.GlobalConfig import freechips.rocketchip.tile._ -import framework.memdomain.DISA._ + import 
framework.frontend.decoder.GISA._ +import framework.memdomain.frontend.cmd_channel.decoder.DISA._ +import framework.gpdomain.sequencer.decoder.DISA._ +import framework.frontend.scoreboard.BankAccessInfo -class BuckyballRawCmd(implicit p: Parameters) extends Bundle { - val cmd = new RoCCCommand -} +import framework.core.bbtile.RoCCCommandBB +class BuckyballRawCmd(val b: GlobalConfig) extends Bundle { + val cmd = new RoCCCommandBB(b.core.xLen) +} +class PostGDCmd(val b: GlobalConfig) extends Bundle { + val domain_id = UInt(4.W) + val cmd = new RoCCCommandBB(b.core.xLen) + val bankAccess = new BankAccessInfo(log2Up(b.memDomain.bankNum)) + val isFence = Bool() + val isBarrier = Bool() +} -class PostGDCmd(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - // Instruction type determination - // Ball instruction (excluding FENCE) - val is_ball = Bool() - // Memory instruction (load/store) - val is_mem = Bool() - // Fence instruction - val is_fence = Bool() +@instantiable +class GlobalDecoder(val b: GlobalConfig) extends Module { - // Raw instruction information, passed to corresponding domain decoder - val raw_cmd = new RoCCCommand -} + val bankIdLen = b.frontend.bank_id_len -class GlobalDecoder(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { + @public val io = IO(new Bundle { + val id_i = Flipped(Decoupled(new Bundle { - val cmd = new RoCCCommand + val cmd = new RoCCCommandBB(b.core.xLen) })) - val id_o = Decoupled(new PostGDCmd) + + val id_o = Decoupled(new PostGDCmd(b)) }) // If reservation station is blocked, id_i is also blocked io.id_i.ready := io.id_o.ready - val func7 = io.id_i.bits.cmd.inst.funct + val func7 = io.id_i.bits.cmd.funct + val funct3 = io.id_i.bits.cmd.funct3 + val opcode = io.id_i.bits.cmd.opcode + val rs1 = io.id_i.bits.cmd.rs1Data - // Instruction type determination: distinguish Ball, Mem, Fence instructions - val is_mem_instr = (func7 === MVIN_BITPAT) || (func7 === MVOUT_BITPAT) - val is_fence_instr = (func7 
=== FENCE_BITPAT) - val is_ball_instr = !is_mem_instr && !is_fence_instr + // Instruction type determination: distinguish Ball, Mem, Fence, GP (RVV) instructions + val is_mem_inst = (func7 === MVIN_BITPAT) || + (func7 === MVOUT_BITPAT) || + (func7 === MSET_BITPAT) - // Output control - io.id_o.valid := io.id_i.valid + val is_frontend_inst = func7 === FENCE_BITPAT + val is_barrier_inst = func7 === BARRIER_BITPAT - io.id_o.bits.is_ball := is_ball_instr - io.id_o.bits.is_mem := is_mem_instr - io.id_o.bits.is_fence := is_fence_instr - io.id_o.bits.raw_cmd := io.id_i.bits.cmd + // RVV instructions: opcode 0x57 (vector compute), 0x07 (vector load), 0x27 (vector store) + val is_gp_inst = (opcode === RVV_OPCODE_V) || + (opcode === RVV_OPCODE_VL) || + (opcode === RVV_OPCODE_VS) + + val is_ball_inst = !is_mem_inst && !is_frontend_inst && !is_barrier_inst && !is_gp_inst + + // Encode domain ID + val domain_id = MuxCase( + DomainId.BALL, + Seq( + is_frontend_inst -> DomainId.FRONTEND, + is_mem_inst -> DomainId.MEM, + is_gp_inst -> DomainId.GP, + is_ball_inst -> DomainId.BALL + ) + ) + + // ------------------------------------------------------------------------- + // Bank access info extraction — enable flags from funct7[6:4] + // + // Unified rs1 layout (defined in isa.h): + // rs1[9:0] = bank_0 (rd_bank_0 or wr_bank for MVIN/MSET) + // rs1[19:10] = bank_1 (rd_bank_1, dual-operand only) + // rs1[29:20] = bank_2 (wr_bank for Ball instructions) + // rs1[63:30] = iter (34-bit) + // + // funct7[6:4] enable encoding: + // 000 = no bank access + // 001 = 1 read (bank0) + // 010 = 1 write (bank2) + // 011 = 1 read + 1 write (bank0 read, bank2 write) + // 100 = 2 read + 1 write (bank0+bank1 read, bank2 write) + // 101,110,111 = no bank access (extended opcode space) + // ------------------------------------------------------------------------- + val bankAccess = Wire(new BankAccessInfo(bankIdLen)) + val enableBits = func7(6, 4) + + // Decode enable from funct7[6:4] + val hasRd0 = 
enableBits === 1.U || enableBits === 3.U || enableBits === 4.U + val hasRd1 = enableBits === 4.U + val hasWr = enableBits === 2.U || enableBits === 3.U || enableBits === 4.U + + bankAccess.rd_bank_0_valid := hasRd0 + bankAccess.rd_bank_0_id := rs1(bankIdLen - 1, 0) + bankAccess.rd_bank_1_valid := hasRd1 + bankAccess.rd_bank_1_id := rs1(bankIdLen + 9, 10) + bankAccess.wr_bank_valid := hasWr + // For Mem instructions (MVIN/MSET), wr_bank is bank_0 (rs1[9:0]) + // For Ball instructions, wr_bank is bank_2 (rs1[29:20]) + bankAccess.wr_bank_id := Mux(is_mem_inst, rs1(bankIdLen - 1, 0), rs1(bankIdLen + 19, 20)) + + // Output control + io.id_o.valid := io.id_i.valid + io.id_o.bits.domain_id := domain_id + io.id_o.bits.cmd := io.id_i.bits.cmd + io.id_o.bits.bankAccess := bankAccess + io.id_o.bits.isFence := is_frontend_inst + io.id_o.bits.isBarrier := is_barrier_inst } diff --git a/arch/src/main/scala/framework/frontend/globalrs/GlobalROB.scala b/arch/src/main/scala/framework/frontend/globalrs/GlobalROB.scala index ec719cad..8545d583 100644 --- a/arch/src/main/scala/framework/frontend/globalrs/GlobalROB.scala +++ b/arch/src/main/scala/framework/frontend/globalrs/GlobalROB.scala @@ -3,173 +3,228 @@ package framework.frontend.globalrs import chisel3._ import chisel3.util._ import chisel3.experimental._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.top.GlobalConfig import framework.frontend.decoder.PostGDCmd +import framework.frontend.scoreboard.{BankAliasTable, BankScoreboard} -class GlobalROB(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { +@instantiable +class GlobalROB(val b: GlobalConfig) extends Module { + + val robDepth = b.frontend.rob_entries + val idWidth = log2Up(robDepth) + val scoreBankNum = 1 << b.frontend.bank_id_len + + require( + b.frontend.vbank_id_upper_bound < b.memDomain.bankNum, 
+ s"vbank_id_upper_bound(${b.frontend.vbank_id_upper_bound}) must be < memDomain.bankNum(${b.memDomain.bankNum})" + ) + + @public val io = IO(new Bundle { - // Allocation interface - val alloc = Flipped(new DecoupledIO(new PostGDCmd)) - - // Issue interface - issue uncompleted head instruction - val issue = new DecoupledIO(new GlobalRobEntry) - - // Completion interface - report instruction completion - val complete = Flipped(new DecoupledIO(UInt(log2Up(b.rob_entries).W))) - - // Status signals - exposed to reservation station for decision making - val empty = Output(Bool()) - val full = Output(Bool()) - // head pointer position - val head_ptr = Output(UInt(log2Up(b.rob_entries).W)) - // Number of issued but uncompleted instructions - val issued_count = Output(UInt(log2Up(b.rob_entries + 1).W)) - // Whether each entry is valid - val entry_valid = Output(Vec(b.rob_entries, Bool())) - // Whether each entry is complete - val entry_complete = Output(Vec(b.rob_entries, Bool())) + val alloc = Flipped(new DecoupledIO(new PostGDCmd(b))) + val issue = new DecoupledIO(new GlobalRobEntry(b)) + val complete = Flipped(new DecoupledIO(UInt(idWidth.W))) + + val empty = Output(Bool()) + val full = Output(Bool()) + val head_ptr = Output(UInt(idWidth.W)) + val issued_count = Output(UInt(log2Up(robDepth + 1).W)) + val entry_valid = Output(Vec(robDepth, Bool())) + val entry_complete = Output(Vec(robDepth, Bool())) + + val subRobActive = Input(Bool()) }) - // Circular ROB structure - // Initialize to zero to avoid X states in FPGA (critical for FireSim) - val robEntries = RegInit(VecInit(Seq.fill(b.rob_entries)(0.U.asTypeOf(new GlobalRobEntry)))) - // Whether entry is valid - val robValid = RegInit(VecInit(Seq.fill(b.rob_entries)(false.B))) - // Whether entry is issued - val robIssued = RegInit(VecInit(Seq.fill(b.rob_entries)(false.B))) - // Whether entry is complete - val robComplete = RegInit(VecInit(Seq.fill(b.rob_entries)(false.B))) - - // Circular queue pointers - // Points to 
oldest uncommitted instruction - val headPtr = RegInit(0.U(log2Up(b.rob_entries).W)) - // Points to next position to allocate - val tailPtr = RegInit(0.U(log2Up(b.rob_entries).W)) - // ROB ID circular counter - val robIdCounter = RegInit(0.U(log2Up(b.rob_entries).W)) - - // Number of issued but uncompleted instructions (used to limit issue) - val issuedCount = RegInit(0.U(log2Up(b.rob_entries + 1).W)) - // Maximum issue limit: half of ROB depth - val maxIssueLimit = (b.rob_entries / 2).U - - // Queue status + // --------------------------------------------------------------------------- + // BAT + Bank Scoreboard + // --------------------------------------------------------------------------- + val bat: Instance[BankAliasTable] = Instantiate( + new BankAliasTable( + bankIdLen = b.frontend.bank_id_len, + vbankUpper = b.frontend.vbank_id_upper_bound, + robEntries = robDepth + ) + ) + + val scoreboard: Instance[BankScoreboard] = Instantiate(new BankScoreboard(scoreBankNum, robDepth)) + + // --------------------------------------------------------------------------- + // Instruction trace (DPI-C, defined in ITraceDPI.scala) + // --------------------------------------------------------------------------- + val itraceAlloc = Module(new ITraceDPI) + val itraceIssue = Module(new ITraceDPI) + val itraceComp = Module(new ITraceDPI) + + for (t <- Seq(itraceAlloc, itraceIssue, itraceComp)) { + t.io.is_issue := 0.U + t.io.rob_id := 0.U + t.io.domain_id := 0.U + t.io.funct := 0.U + t.io.pc := 0.U + t.io.rs1 := 0.U + t.io.rs2 := 0.U + t.io.bank_enable := 0.U + t.io.enable := false.B + } + + // --------------------------------------------------------------------------- + // Storage + // --------------------------------------------------------------------------- + val robEntries = RegInit(VecInit(Seq.fill(robDepth)(0.U.asTypeOf(new GlobalRobEntry(b))))) + val robValid = RegInit(VecInit(Seq.fill(robDepth)(false.B))) + val robIssued = RegInit(VecInit(Seq.fill(robDepth)(false.B))) + 
val robComplete = RegInit(VecInit(Seq.fill(robDepth)(false.B))) + + val headPtr = RegInit(0.U(idWidth.W)) + val tailPtr = RegInit(0.U(idWidth.W)) + val issuedCount = RegInit(0.U(log2Up(robDepth + 1).W)) + val isEmpty = headPtr === tailPtr && !robValid(headPtr) val isFull = headPtr === tailPtr && robValid(headPtr) -// ----------------------------------------------------------------------------- -// Inbound - instruction allocation -// ----------------------------------------------------------------------------- - io.alloc.ready := !isFull + def nextPtr(p: UInt): UInt = Mux(p === (robDepth - 1).U, 0.U, p + 1.U) + def wrapPtr(v: UInt): UInt = Mux(v >= robDepth.U, v - robDepth.U, v) + + // --------------------------------------------------------------------------- + // Allocate: enqueue decoded instruction into ROB + // rob_id == tailPtr at allocation time (no separate counter needed) + // --------------------------------------------------------------------------- + io.alloc.ready := !isFull + bat.io.alloc.valid := io.alloc.fire + bat.io.alloc.rob_id := tailPtr + bat.io.alloc.raw := io.alloc.bits.bankAccess + + val commitMask = Wire(Vec(robDepth, Bool())) + for (i <- 0 until robDepth) { + commitMask(i) := false.B + } + bat.io.free.valid := commitMask.asUInt.orR + bat.io.free.mask := commitMask when(io.alloc.fire) { - robEntries(tailPtr).cmd := io.alloc.bits - robEntries(tailPtr).rob_id := robIdCounter - robValid(tailPtr) := true.B - robIssued(tailPtr) := false.B - robComplete(tailPtr) := false.B - - // Update tail pointer and rob_id counter (circular) - tailPtr := Mux(tailPtr === (b.rob_entries - 1).U, 0.U, tailPtr + 1.U) - robIdCounter := Mux(robIdCounter === (b.rob_entries - 1).U, 0.U, robIdCounter + 1.U) + itraceAlloc.io.is_issue := 2.U + itraceAlloc.io.rob_id := tailPtr + itraceAlloc.io.domain_id := io.alloc.bits.domain_id + itraceAlloc.io.funct := io.alloc.bits.cmd.funct + itraceAlloc.io.pc := io.alloc.bits.cmd.pc + itraceAlloc.io.rs1 := io.alloc.bits.cmd.rs1 + 
itraceAlloc.io.rs2 := io.alloc.bits.cmd.rs2 + itraceAlloc.io.bank_enable := io.alloc.bits.cmd.funct(6, 4) + itraceAlloc.io.enable := true.B + + robEntries(tailPtr).cmd := io.alloc.bits + robEntries(tailPtr).renamedBankAccess := bat.io.alloc_renamed + robEntries(tailPtr).rob_id := tailPtr + robValid(tailPtr) := true.B + robIssued(tailPtr) := false.B + robComplete(tailPtr) := false.B + tailPtr := nextPtr(tailPtr) } -// ----------------------------------------------------------------------------- -// Completion signal processing -// ----------------------------------------------------------------------------- + // --------------------------------------------------------------------------- + // Complete: mark entry as completed, release scoreboard resources + // --------------------------------------------------------------------------- io.complete.ready := true.B + + scoreboard.complete.valid := false.B + scoreboard.complete.bits := 0.U.asTypeOf(scoreboard.complete.bits) + when(io.complete.fire) { - val completeId = io.complete.bits - robComplete(completeId) := true.B - // When complete, decrement issued count - when(robIssued(completeId)) { + val cid = io.complete.bits + robComplete(cid) := true.B + when(robIssued(cid)) { issuedCount := issuedCount - 1.U } + scoreboard.complete.valid := true.B + scoreboard.complete.bits := robEntries(cid).renamedBankAccess + + itraceComp.io.is_issue := 0.U + itraceComp.io.rob_id := cid + itraceComp.io.domain_id := robEntries(cid).cmd.domain_id + itraceComp.io.funct := robEntries(cid).cmd.cmd.funct + itraceComp.io.pc := robEntries(cid).cmd.cmd.pc + itraceComp.io.rs1 := robEntries(cid).cmd.cmd.rs1 + itraceComp.io.rs2 := robEntries(cid).cmd.cmd.rs2 + itraceComp.io.bank_enable := robEntries(cid).cmd.cmd.funct(6, 4) + itraceComp.io.enable := true.B } -// ----------------------------------------------------------------------------- -// Outbound - issue instructions in order (starting from head) -// 
----------------------------------------------------------------------------- - // Find first valid and unissued instruction starting from head - val canIssue = Wire(Bool()) - val issuePtr = Wire(UInt(log2Up(b.rob_entries).W)) - - // Default values - canIssue := false.B - issuePtr := headPtr - - // Scan from head to find first issuable instruction - val scanValid = Wire(Vec(b.rob_entries, Bool())) - for (i <- 0 until b.rob_entries) { - val ptr = Mux(headPtr + i.U >= b.rob_entries.U, - headPtr + i.U - b.rob_entries.U, - headPtr + i.U) - scanValid(i) := robValid(ptr) && !robIssued(ptr) && !robComplete(ptr) + // --------------------------------------------------------------------------- + // Issue: scan from head for first issuable entry (valid && !issued && !complete) + // --------------------------------------------------------------------------- + val scanValid = Wire(Vec(robDepth, Bool())) + val scanReady = Wire(Vec(robDepth, Bool())) + for (i <- 0 until robDepth) { + val ptr = wrapPtr(headPtr + i.U) + scanValid(i) := robValid(ptr) && !robIssued(ptr) && !robComplete(ptr) + scoreboard.queryVec(i) := robEntries(ptr).renamedBankAccess + scanReady(i) := scanValid(i) && !scoreboard.hazardVec(i) } - // Find first issuable position - val firstValid = PriorityEncoder(scanValid.asUInt) - val hasValid = scanValid.asUInt.orR + val hasReady = scanReady.asUInt.orR + val firstReady = PriorityEncoder(scanReady.asUInt) + val actualIssuePtr = wrapPtr(headPtr + firstReady) - val actualIssuePtr = Mux(headPtr + firstValid >= b.rob_entries.U, - headPtr + firstValid - b.rob_entries.U, - headPtr + firstValid) + scoreboard.query := robEntries(actualIssuePtr).renamedBankAccess + val canIssue = hasReady - // Can only issue if issue limit is not reached - val canIssueMore = issuedCount < maxIssueLimit - canIssue := hasValid && canIssueMore - issuePtr := actualIssuePtr + io.issue.valid := canIssue && !io.subRobActive + io.issue.bits := robEntries(actualIssuePtr) - io.issue.valid := canIssue 
- io.issue.bits := robEntries(issuePtr) + scoreboard.issue.valid := false.B + scoreboard.issue.bits := 0.U.asTypeOf(scoreboard.issue.bits) when(io.issue.fire) { - robIssued(issuePtr) := true.B - issuedCount := issuedCount + 1.U + robIssued(actualIssuePtr) := true.B + issuedCount := issuedCount + 1.U + scoreboard.issue.valid := true.B + scoreboard.issue.bits := robEntries(actualIssuePtr).renamedBankAccess + + itraceIssue.io.is_issue := 1.U + itraceIssue.io.rob_id := robEntries(actualIssuePtr).rob_id + itraceIssue.io.domain_id := robEntries(actualIssuePtr).cmd.domain_id + itraceIssue.io.funct := robEntries(actualIssuePtr).cmd.cmd.funct + itraceIssue.io.pc := robEntries(actualIssuePtr).cmd.cmd.pc + itraceIssue.io.rs1 := robEntries(actualIssuePtr).cmd.cmd.rs1 + itraceIssue.io.rs2 := robEntries(actualIssuePtr).cmd.cmd.rs2 + itraceIssue.io.bank_enable := robEntries(actualIssuePtr).cmd.cmd.funct(6, 4) + itraceIssue.io.enable := true.B } -// ----------------------------------------------------------------------------- -// Instruction commit - commit all completed instructions out-of-order -// ----------------------------------------------------------------------------- - // Commit all completed instructions - for (i <- 0 until b.rob_entries) { - when(robValid(i.U) && robComplete(i.U)) { - robValid(i.U) := false.B - robIssued(i.U) := false.B - robComplete(i.U) := false.B + // --------------------------------------------------------------------------- + // Commit: clear completed entries. + // Explicitly skip entries being allocated or completed this cycle. 
+ // --------------------------------------------------------------------------- + for (i <- 0 until robDepth) { + val beingAllocated = io.alloc.fire && (tailPtr === i.U) + val beingCompleted = io.complete.fire && (io.complete.bits === i.U) + commitMask(i) := robValid(i) && robComplete(i) && !beingAllocated && !beingCompleted + when(commitMask(i)) { + robValid(i) := false.B + robIssued(i) := false.B + robComplete(i) := false.B } } - // Update head pointer: skip all completed (about to be cleared) positions - // Find first "valid and incomplete" instruction position starting from head - val nextHeadCandidates = Wire(Vec(b.rob_entries, Bool())) - for (i <- 0 until b.rob_entries) { - val ptr = Mux(headPtr + i.U >= b.rob_entries.U, - headPtr + i.U - b.rob_entries.U, - headPtr + i.U) - // Entry is valid and incomplete (will not be committed) + // Update head pointer: advance past all committed entries + val nextHeadCandidates = Wire(Vec(robDepth, Bool())) + for (i <- 0 until robDepth) { + val ptr = wrapPtr(headPtr + i.U) nextHeadCandidates(i) := robValid(ptr) && !robComplete(ptr) } val hasUncommitted = nextHeadCandidates.asUInt.orR val nextHeadOffset = PriorityEncoder(nextHeadCandidates.asUInt) - val nextHeadPtr = Mux(headPtr + nextHeadOffset >= b.rob_entries.U, - headPtr + nextHeadOffset - b.rob_entries.U, - headPtr + nextHeadOffset) - - // Update head pointer: - // - If there are uncompleted instructions, move head to first uncompleted position - // - If there are no uncompleted instructions (all complete), move head to tail (ROB is empty) - headPtr := Mux(hasUncommitted, nextHeadPtr, tailPtr) - -// ----------------------------------------------------------------------------- -// Status signals - exposed to reservation station -// ----------------------------------------------------------------------------- - io.empty := isEmpty - io.full := isFull - io.head_ptr := headPtr - io.issued_count := issuedCount - io.entry_valid := robValid + headPtr := Mux(hasUncommitted, 
wrapPtr(headPtr + nextHeadOffset), tailPtr) + + // --------------------------------------------------------------------------- + // Status outputs + // --------------------------------------------------------------------------- + io.empty := isEmpty + io.full := isFull + io.head_ptr := headPtr + io.issued_count := issuedCount + io.entry_valid := robValid io.entry_complete := robComplete } diff --git a/arch/src/main/scala/framework/frontend/globalrs/GlobalReservationStation.scala b/arch/src/main/scala/framework/frontend/globalrs/GlobalReservationStation.scala deleted file mode 100644 index 7a028fe2..00000000 --- a/arch/src/main/scala/framework/frontend/globalrs/GlobalReservationStation.scala +++ /dev/null @@ -1,134 +0,0 @@ -package framework.frontend.globalrs - -import chisel3._ -import chisel3.util._ -import chisel3.experimental._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.frontend.decoder.PostGDCmd -import freechips.rocketchip.tile.RoCCResponse - -// Global ROB entry - only contains basic information, does not include specific instruction decoding -class GlobalRobEntry(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val cmd = new PostGDCmd - val rob_id = UInt(log2Up(b.rob_entries).W) -} - -// Global RS issue interface -class GlobalRsIssue(implicit b: CustomBuckyballConfig, p: Parameters) extends GlobalRobEntry - -// Global RS completion interface -class GlobalRsComplete(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val rob_id = UInt(log2Up(b.rob_entries).W) -} - -// No additional interface Bundle needed, defined directly in IO - -// Global reservation station - between GlobalDecoder and each Domain -class GlobalReservationStation(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - // GlobalDecoder -> Global RS - val global_decode_cmd_i = Flipped(new DecoupledIO(new PostGDCmd)) - - // Global RS -> 
BallDomain (single channel) - val ball_issue_o = Decoupled(new GlobalRsIssue) - - // Global RS -> MemDomain (single channel) - val mem_issue_o = Decoupled(new GlobalRsIssue) - - // BallDomain -> Global RS (single channel) - val ball_complete_i = Flipped(Decoupled(new GlobalRsComplete)) - - // MemDomain -> Global RS (single channel) - val mem_complete_i = Flipped(Decoupled(new GlobalRsComplete)) - - // RoCC response - val rs_rocc_o = new Bundle { - val resp = new DecoupledIO(new RoCCResponse()(p)) - val busy = Output(Bool()) - } - }) - - val rob = Module(new GlobalROB) - -// ----------------------------------------------------------------------------- -// Fence handling -// ----------------------------------------------------------------------------- - val fenceActive = RegInit(false.B) - // Cannot use fire, would form a loop - val isFenceCmd = io.global_decode_cmd_i.valid && io.global_decode_cmd_i.bits.is_fence - val robEmpty = rob.io.empty - - // Fence state machine: only activate when fence instruction is accepted (fire) - when (io.global_decode_cmd_i.fire && isFenceCmd && !fenceActive) { - fenceActive := true.B - } - when (fenceActive && robEmpty) { - fenceActive := false.B - } - -// ----------------------------------------------------------------------------- -// Inbound - instruction allocation (Fence instructions do not enter ROB) -// ----------------------------------------------------------------------------- - // Filter out fence instructions - rob.io.alloc.valid := io.global_decode_cmd_i.valid && !isFenceCmd - rob.io.alloc.bits := io.global_decode_cmd_i.bits - - // Backpressure logic: - // - Normal instructions: wait for ROB ready - // - Fence instructions: wait for ROB empty - io.global_decode_cmd_i.ready := Mux(isFenceCmd, robEmpty, rob.io.alloc.ready) - -// ----------------------------------------------------------------------------- -// Outbound - instruction issue (dispatch to corresponding domain based on is_ball/is_mem) -// 
----------------------------------------------------------------------------- - // Ball domain issue - io.ball_issue_o.valid := rob.io.issue.valid && rob.io.issue.bits.cmd.is_ball - io.ball_issue_o.bits := rob.io.issue.bits - - // Mem domain issue - io.mem_issue_o.valid := rob.io.issue.valid && rob.io.issue.bits.cmd.is_mem - io.mem_issue_o.bits := rob.io.issue.bits - - // Set ROB ready signal - can only issue when target domain is ready - rob.io.issue.ready := - (rob.io.issue.bits.cmd.is_ball && io.ball_issue_o.ready) || - (rob.io.issue.bits.cmd.is_mem && io.mem_issue_o.ready) - -// ----------------------------------------------------------------------------- -// Completion signal processing -// ----------------------------------------------------------------------------- - val completeArb = Module(new Arbiter(UInt(log2Up(b.rob_entries).W), 2)) - - // Connect Ball and Mem domain completion signals to arbiter - completeArb.io.in(0).valid := io.ball_complete_i.valid - completeArb.io.in(0).bits := io.ball_complete_i.bits.rob_id - io.ball_complete_i.ready := completeArb.io.in(0).ready - - completeArb.io.in(1).valid := io.mem_complete_i.valid - completeArb.io.in(1).bits := io.mem_complete_i.bits.rob_id - io.mem_complete_i.ready := completeArb.io.in(1).ready - - // Decide whether to filter completion signals based on configuration - if (b.rs_out_of_order_response) { - // Out-of-order mode: accept all completion signals, ROB commits out-of-order internally - rob.io.complete <> completeArb.io.out - } else { - // Sequential mode: only accept completion signals where rob_id == head_ptr - val isHeadComplete = completeArb.io.out.bits === rob.io.head_ptr - rob.io.complete.valid := completeArb.io.out.valid && isHeadComplete - rob.io.complete.bits := completeArb.io.out.bits - completeArb.io.out.ready := rob.io.complete.ready && isHeadComplete - } - -// ----------------------------------------------------------------------------- -// Response generation -// 
----------------------------------------------------------------------------- - // Buckyball does not generate RoCC responses for normal instructions - // Only performance counter or special commands would generate responses - // This matches Gemmini's behavior where io.resp is only connected to counters - io.rs_rocc_o.resp.valid := false.B - io.rs_rocc_o.resp.bits.rd := 0.U - io.rs_rocc_o.resp.bits.data := 0.U - io.rs_rocc_o.busy := !rob.io.empty || fenceActive -} diff --git a/arch/src/main/scala/framework/frontend/globalrs/GlobalScheduler.scala b/arch/src/main/scala/framework/frontend/globalrs/GlobalScheduler.scala new file mode 100644 index 00000000..427edde9 --- /dev/null +++ b/arch/src/main/scala/framework/frontend/globalrs/GlobalScheduler.scala @@ -0,0 +1,194 @@ +package framework.frontend.globalrs + +import chisel3._ +import chisel3.util._ +import chisel3.experimental._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.top.GlobalConfig +import framework.frontend.decoder.{DomainId, PostGDCmd} +import framework.frontend.decoder.GISA._ +import framework.frontend.scoreboard.BankAccessInfo +import framework.core.bbtile.RoCCResponseBB +import framework.balldomain.blink.SubRobRow + +class GlobalRobEntry(val b: GlobalConfig) extends Bundle { + val cmd = new PostGDCmd(b) + val renamedBankAccess = new BankAccessInfo(b.frontend.bank_id_len) + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) +} + +class GlobalSchedIssue(b: GlobalConfig) extends GlobalRobEntry(b) { + val is_sub = Bool() + val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W) +} + +class GlobalSchedComplete(b: GlobalConfig) extends Bundle { + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) + val is_sub = Bool() + val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W) +} + +@instantiable +class GlobalScheduler(val b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + val decode_cmd_i = Flipped(new 
DecoupledIO(new PostGDCmd(b))) + val ball_issue_o = Decoupled(new GlobalSchedIssue(b)) + val mem_issue_o = Decoupled(new GlobalSchedIssue(b)) + val gp_issue_o = Decoupled(new GlobalSchedIssue(b)) + val ball_complete_i = Flipped(Decoupled(new GlobalSchedComplete(b))) + val mem_complete_i = Flipped(Decoupled(new GlobalSchedComplete(b))) + val gp_complete_i = Flipped(Decoupled(new GlobalSchedComplete(b))) + val ball_subrob_req_i = Flipped(Vec(b.ballDomain.ballNum, Decoupled(new SubRobRow(b)))) + + val scheduler_rocc_o = new Bundle { + val resp = new DecoupledIO(new RoCCResponseBB(b.core.xLen)) + val busy = Output(Bool()) + } + + val barrier_arrive = Output(Bool()) + val barrier_release = Input(Bool()) + }) + + val rob: Instance[GlobalROB] = Instantiate(new GlobalROB(b)) + val subRob: Instance[SubROB] = Instantiate(new SubROB(b)) + + val isFenceCmd = io.decode_cmd_i.valid && io.decode_cmd_i.bits.isFence + val fenceActive = RegInit(false.B) + when(isFenceCmd && !fenceActive) { + fenceActive := true.B + } + when(fenceActive && rob.io.empty) { + fenceActive := false.B + } + + val isBarrierCmd = io.decode_cmd_i.valid && io.decode_cmd_i.bits.isBarrier + val barrierWaitROB = RegInit(false.B) + val barrierWaitRelease = RegInit(false.B) + when(isBarrierCmd && !barrierWaitROB && !barrierWaitRelease && !fenceActive) { + barrierWaitROB := true.B + } + when(barrierWaitROB && rob.io.empty) { + barrierWaitROB := false.B + barrierWaitRelease := true.B + } + when(barrierWaitRelease && io.barrier_release) { + barrierWaitRelease := false.B + } + io.barrier_arrive := barrierWaitRelease + + val isFrontendCmd = io.decode_cmd_i.bits.isFence || io.decode_cmd_i.bits.isBarrier + val anyStall = fenceActive || barrierWaitROB || barrierWaitRelease + rob.io.alloc.valid := io.decode_cmd_i.valid && !isFrontendCmd && !anyStall + rob.io.alloc.bits := io.decode_cmd_i.bits + io.decode_cmd_i.ready := Mux( + isFrontendCmd, + !anyStall, + rob.io.alloc.ready && !anyStall + ) + + val subRobWriteArb = 
Module(new Arbiter(new SubRobRow(b), b.ballDomain.ballNum)) + for (i <- 0 until b.ballDomain.ballNum) { + subRobWriteArb.io.in(i) <> io.ball_subrob_req_i(i) + } + subRob.io.write <> subRobWriteArb.io.out + + val is_ball_domain = rob.io.issue.bits.cmd.domain_id === DomainId.BALL + val is_mem_domain = rob.io.issue.bits.cmd.domain_id === DomainId.MEM + val is_gp_domain = rob.io.issue.bits.cmd.domain_id === DomainId.GP + + val subRobIssueValid = subRob.io.issue.valid + val subRobCmd = subRob.io.issue.bits + + val subRobIssueEntry = Wire(new GlobalSchedIssue(b)) + subRobIssueEntry.cmd := subRobCmd + subRobIssueEntry.renamedBankAccess := 0.U.asTypeOf(subRobIssueEntry.renamedBankAccess) + subRobIssueEntry.rob_id := subRob.io.issueMasterRobId + subRobIssueEntry.is_sub := true.B + subRobIssueEntry.sub_rob_id := subRob.io.issueSubId + + val subRobIssBall = subRobCmd.domain_id === DomainId.BALL + val subRobIssMem = subRobCmd.domain_id === DomainId.MEM + val subRobIssGp = subRobCmd.domain_id === DomainId.GP + + val mainIssueEntry = Wire(new GlobalSchedIssue(b)) + mainIssueEntry.cmd := rob.io.issue.bits.cmd + mainIssueEntry.renamedBankAccess := rob.io.issue.bits.renamedBankAccess + mainIssueEntry.rob_id := rob.io.issue.bits.rob_id + mainIssueEntry.is_sub := false.B + mainIssueEntry.sub_rob_id := 0.U + + io.ball_issue_o.valid := Mux( + subRobIssueValid && subRobIssBall, + true.B, + rob.io.issue.valid && is_ball_domain && !subRobIssueValid + ) + io.ball_issue_o.bits := Mux(subRobIssueValid && subRobIssBall, subRobIssueEntry, mainIssueEntry) + + io.mem_issue_o.valid := Mux( + subRobIssueValid && subRobIssMem, + true.B, + rob.io.issue.valid && is_mem_domain && !subRobIssueValid + ) + io.mem_issue_o.bits := Mux(subRobIssueValid && subRobIssMem, subRobIssueEntry, mainIssueEntry) + + io.gp_issue_o.valid := Mux( + subRobIssueValid && subRobIssGp, + true.B, + rob.io.issue.valid && is_gp_domain && !subRobIssueValid + ) + io.gp_issue_o.bits := Mux(subRobIssueValid && subRobIssGp, 
subRobIssueEntry, mainIssueEntry) + + subRob.io.issue.ready := + (subRobIssBall && io.ball_issue_o.ready) || + (subRobIssMem && io.mem_issue_o.ready) || + (subRobIssGp && io.gp_issue_o.ready) + + rob.io.issue.ready := !subRobIssueValid && ( + (is_ball_domain && io.ball_issue_o.ready) || + (is_mem_domain && io.mem_issue_o.ready) || + (is_gp_domain && io.gp_issue_o.ready) + ) + rob.io.subRobActive := subRobIssueValid + + val completeArb = Module(new Arbiter(new GlobalSchedComplete(b), 3)) + completeArb.io.in(0).valid := io.ball_complete_i.valid + completeArb.io.in(0).bits := io.ball_complete_i.bits + io.ball_complete_i.ready := completeArb.io.in(0).ready + completeArb.io.in(1).valid := io.mem_complete_i.valid + completeArb.io.in(1).bits := io.mem_complete_i.bits + io.mem_complete_i.ready := completeArb.io.in(1).ready + completeArb.io.in(2).valid := io.gp_complete_i.valid + completeArb.io.in(2).bits := io.gp_complete_i.bits + io.gp_complete_i.ready := completeArb.io.in(2).ready + + val completeBits = completeArb.io.out.bits + subRob.io.subComplete.valid := completeArb.io.out.valid && completeBits.is_sub + subRob.io.subComplete.bits := completeBits.sub_rob_id + subRob.io.masterComplete.ready := true.B + + val normalComplete = completeArb.io.out.valid && !completeBits.is_sub + if (b.frontend.rs_out_of_order_response) { + rob.io.complete.valid := normalComplete || subRob.io.masterComplete.valid + rob.io.complete.bits := Mux(subRob.io.masterComplete.valid, subRob.io.masterComplete.bits, completeBits.rob_id) + } else { + val isHeadComplete = Mux( + subRob.io.masterComplete.valid, + subRob.io.masterComplete.bits === rob.io.head_ptr, + completeBits.rob_id === rob.io.head_ptr + ) + rob.io.complete.valid := (normalComplete || subRob.io.masterComplete.valid) && isHeadComplete + rob.io.complete.bits := Mux(subRob.io.masterComplete.valid, subRob.io.masterComplete.bits, completeBits.rob_id) + } + completeArb.io.out.ready := Mux( + completeBits.is_sub, + 
subRob.io.subComplete.ready, + rob.io.complete.ready + ) + + io.scheduler_rocc_o.resp.valid := false.B + io.scheduler_rocc_o.resp.bits.rd := 0.U + io.scheduler_rocc_o.resp.bits.data := 0.U + io.scheduler_rocc_o.busy := !rob.io.empty || fenceActive || barrierWaitROB || barrierWaitRelease +} diff --git a/arch/src/main/scala/framework/frontend/globalrs/ITraceDPI.scala b/arch/src/main/scala/framework/frontend/globalrs/ITraceDPI.scala new file mode 100644 index 00000000..f93975cf --- /dev/null +++ b/arch/src/main/scala/framework/frontend/globalrs/ITraceDPI.scala @@ -0,0 +1,54 @@ +package framework.frontend.globalrs + +import chisel3._ +import chisel3.util._ + +// DPI-C BlackBox for instruction trace (issue / complete events) +class ITraceDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val is_issue = Input(UInt(8.W)) + val rob_id = Input(UInt(32.W)) + val domain_id = Input(UInt(32.W)) + val funct = Input(UInt(32.W)) + val pc = Input(UInt(64.W)) + val rs1 = Input(UInt(64.W)) + val rs2 = Input(UInt(64.W)) + val bank_enable = Input(UInt(8.W)) + val enable = Input(Bool()) + }) + + setInline( + "ITraceDPI.v", + """ + |import "DPI-C" function void dpi_itrace( + | input byte unsigned is_issue, + | input int unsigned rob_id, + | input int unsigned domain_id, + | input int unsigned funct, + | input longint unsigned pc, + | input longint unsigned rs1, + | input longint unsigned rs2, + | input byte unsigned bank_enable + |); + | + |module ITraceDPI( + | input [7:0] is_issue, + | input [31:0] rob_id, + | input [31:0] domain_id, + | input [31:0] funct, + | input [63:0] pc, + | input [63:0] rs1, + | input [63:0] rs2, + | input [7:0] bank_enable, + | input enable + |); + | always @(*) begin + | if (enable) begin + | dpi_itrace(is_issue, rob_id, domain_id, funct, pc, rs1, rs2, bank_enable); + | end + | end + |endmodule + """.stripMargin + ) +} diff --git a/arch/src/main/scala/framework/frontend/globalrs/SubROB.scala 
b/arch/src/main/scala/framework/frontend/globalrs/SubROB.scala new file mode 100644 index 00000000..13eb888b --- /dev/null +++ b/arch/src/main/scala/framework/frontend/globalrs/SubROB.scala @@ -0,0 +1,195 @@ +package framework.frontend.globalrs + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.top.GlobalConfig +import framework.balldomain.blink.SubRobRow +import framework.frontend.decoder.PostGDCmd + +@instantiable +class SubROB(val b: GlobalConfig) extends Module { + val subRobDepth = b.frontend.sub_rob_depth + val subIdBits = log2Up(subRobDepth * 4) + val robIdBits = log2Up(b.frontend.rob_entries) + val ballIdBits = log2Up(b.ballDomain.ballNum) + val rowIdBits = log2Up(subRobDepth) + + @public + val io = IO(new Bundle { + // Ball → SubROB: write a row of sub-instructions + val write = Flipped(Decoupled(new SubRobRow(b))) + // SubROB → GlobalRS: issue one slot at a time (PostGDCmd, domain_id determines target) + val issue = Decoupled(new PostGDCmd(b)) + val issueSubId = Output(UInt(subIdBits.W)) + val issueMasterRobId = Output(UInt(robIdBits.W)) + // GlobalRS → SubROB: a sub-instruction completed (sub_rob_id) + val subComplete = Flipped(Decoupled(UInt(subIdBits.W))) + // SubROB → main ROB: all sub-instructions done + val masterComplete = Decoupled(UInt(robIdBits.W)) + // Status + val occupied = Output(Bool()) + val lockedBallId = Output(UInt(ballIdBits.W)) + }) + + // ------------------------------------------------------------------------- + // Storage + // ------------------------------------------------------------------------- + val sram = SyncReadMem(subRobDepth, new SubRobRow(b)) + + val writePtr = RegInit(0.U(rowIdBits.W)) + val readPtr = RegInit(0.U(rowIdBits.W)) + val rowCount = RegInit(0.U((log2Up(subRobDepth) + 1).W)) + val lockedBallId = RegInit(0.U(ballIdBits.W)) + val masterRobId = RegInit(0.U(robIdBits.W)) + + def nextPtr(p: UInt): UInt = Mux(p === (subRobDepth - 1).U, 0.U, p + 
1.U) + + val occupied = rowCount > 0.U + val isFull = rowCount === subRobDepth.U + val ballMatch = !occupied || (io.write.bits.ball_id === lockedBallId) + + // Write path: accept if not full AND ball_id matches (or not occupied yet) + io.write.ready := !isFull && ballMatch + + when(io.write.fire) { + when(occupied) { + assert(io.write.bits.master_rob_id === masterRobId, "SubROB row master_rob_id mismatch") + } + sram.write(writePtr, io.write.bits) + writePtr := nextPtr(writePtr) + rowCount := rowCount + 1.U + when(!occupied) { + lockedBallId := io.write.bits.ball_id + masterRobId := io.write.bits.master_rob_id + } + } + + // ------------------------------------------------------------------------- + // FSM + // ------------------------------------------------------------------------- + // States: + // sIdle - nothing to do (rowCount == 0) + // sReadReq - issue SRAM read request for readPtr row + // sReadWait - wait 1 cycle for sramRaw to be latched into sramData + // sReadResp - sramData valid; dispatch valid slots one per cycle + // sWaitSlots - all slots issued; wait for all subCompletes + // sWaitMaster - all rows done; fire masterComplete then return to sIdle + val sIdle :: sReadReq :: sReadWait :: sReadResp :: sWaitSlots :: sWaitMaster :: Nil = Enum(6) + val state = RegInit(sIdle) + + // SyncReadMem: read issued in sReadReq (cycle N), data valid on cycle N+1. + // We latch it into sramData with RegEnable so it stays stable during sReadResp + // and sWaitSlots, even when the read port is not enabled. + val sramReadEn = state === sReadReq + val sramRaw = sram.read(readPtr, sramReadEn) + val dataFresh = RegNext(sramReadEn, false.B) // pulses during sReadWait, when sramRaw is valid + val sramData = RegEnable(sramRaw, dataFresh) + val readPtrReg = RegEnable(readPtr, sramReadEn) + + // Per-row tracking registers (reset when moving to next row) + // slotIssued: have we already sent this slot to a domain? + // slotDone: have we received subComplete for this slot?
+ val slotIssued = RegInit(VecInit(Seq.fill(4)(false.B))) + val slotDone = RegInit(VecInit(Seq.fill(4)(false.B))) + + // Combinational: which slots still need to be issued? + val slotNeedsIssue = VecInit((0 until 4).map(i => sramData.slots(i).valid && !slotIssued(i))) + val hasSlotToIssue = slotNeedsIssue.asUInt.orR + val firstSlotIdx = PriorityEncoder(slotNeedsIssue.asUInt) + + // Combinational: are all valid slots done? + val allSlotsDone = VecInit((0 until 4).map(i => !sramData.slots(i).valid || slotDone(i))).asUInt.andR + + // Issue output (valid only in sReadResp when there's a slot to issue) + io.issue.valid := (state === sReadResp) && hasSlotToIssue + io.issue.bits := sramData.slots(firstSlotIdx).cmd + io.issueSubId := readPtrReg * 4.U + firstSlotIdx + io.issueMasterRobId := masterRobId + + // subComplete: always ready, mark the corresponding slot done + io.subComplete.ready := true.B + when(io.subComplete.fire) { + val subId = io.subComplete.bits + val subRow = subId / 4.U + val slotIdx = subId(1, 0) + assert( + state === sReadResp || state === sWaitSlots || state === sWaitMaster, + "SubROB subComplete arrived in invalid state" + ) + assert(subRow === readPtrReg, "SubROB subComplete points to non-active row") + slotDone(slotIdx) := true.B + } + + // masterComplete + io.masterComplete.valid := state === sWaitMaster + io.masterComplete.bits := masterRobId + + // Status outputs + io.occupied := occupied + io.lockedBallId := lockedBallId + + // ------------------------------------------------------------------------- + // FSM transitions + // ------------------------------------------------------------------------- + // Helper: advance past current row (called when row is complete) + def advanceRow(): Unit = { + readPtr := nextPtr(readPtr) + rowCount := rowCount - 1.U + slotIssued.foreach(_ := false.B) + slotDone.foreach(_ := false.B) + } + + switch(state) { + is(sIdle) { + when(occupied)(state := sReadReq) + } + is(sReadReq) { + // SRAM read issued, data 
available on sramRaw next cycle + state := sReadWait + } + is(sReadWait) { + // sramRaw is valid; RegEnable latches it into sramData at this clock edge. + // sramData will be stable starting next cycle (sReadResp). + state := sReadResp + } + is(sReadResp) { + // Fire issued slot + when(io.issue.fire) { + slotIssued(firstSlotIdx) := true.B + } + // Check if all slots have been issued + // After marking current slot issued, recalculate + val allIssued = VecInit((0 until 4).map(i => + !sramData.slots(i).valid || slotIssued(i) || + (io.issue.fire && firstSlotIdx === i.U) + )).asUInt.andR + + when(allIssued) { + // Check fast path: all already done (e.g., empty row or immediate completion) + val allDoneNow = VecInit((0 until 4).map(i => !sramData.slots(i).valid || slotDone(i))).asUInt.andR + when(allDoneNow) { + // Fast path: skip sWaitSlots + advanceRow() + state := Mux(rowCount === 1.U, sWaitMaster, sReadReq) + }.otherwise { + state := sWaitSlots + } + } + // else: stay in sReadResp to dispatch remaining slots + } + is(sWaitSlots) { + when(allSlotsDone) { + advanceRow() + state := Mux(rowCount === 1.U, sWaitMaster, sReadReq) + } + } + is(sWaitMaster) { + when(io.masterComplete.fire) { + lockedBallId := 0.U + masterRobId := 0.U + state := sIdle + } + } + } +} diff --git a/arch/src/main/scala/framework/frontend/scoreboard/BankAliasTable.scala b/arch/src/main/scala/framework/frontend/scoreboard/BankAliasTable.scala new file mode 100644 index 00000000..8c6dd1a3 --- /dev/null +++ b/arch/src/main/scala/framework/frontend/scoreboard/BankAliasTable.scala @@ -0,0 +1,128 @@ +package framework.frontend.scoreboard + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} + +/** + * BAT (Bank Alias Table) + * + * Rename virtual bank IDs into the high-ID alias namespace for scoreboard. + * Lifetime policy: per ROB entry. 
+ * + * ID partition: + * [0, vbankUpper] : virtual/architected bank IDs + * [vbankUpper + 1, maxBankId] : renamed alias IDs (one alias per ROB entry) + */ +@instantiable +class BankAliasTable(val bankIdLen: Int, val vbankUpper: Int, val robEntries: Int) extends Module { + private val aliasIdLen = bankIdLen + private val robIdLen = log2Up(robEntries) + private val vbankNum = vbankUpper + 1 + private val vbankIdLen = log2Up(vbankNum) + private val maxBankId = (1 << bankIdLen) - 1 + private val aliasBase = vbankNum + + @public + val io = IO(new Bundle { + + val alloc = Input(new Bundle { + val valid = Bool() + val rob_id = UInt(robIdLen.W) + val raw = new BankAccessInfo(aliasIdLen) + }) + + val alloc_renamed = Output(new BankAccessInfo(aliasIdLen)) + + val free = Input(new Bundle { + val valid = Bool() + val mask = Vec(robEntries, Bool()) + }) + + }) + + // Current architectural alias for each virtual bank. + val v2a = RegInit(VecInit((0 until vbankNum).map(_.U(aliasIdLen.W)))) + + // Extra aliases [aliasBase, aliasBase + robEntries - 1] are one-per-ROB. + val aliasInUse = RegInit(VecInit(Seq.fill(robEntries)(false.B))) + + // Per-entry metadata for commit-time free. 
+ val entHasWrite = RegInit(VecInit(Seq.fill(robEntries)(false.B))) + val entOldAlias = RegInit(VecInit(Seq.fill(robEntries)(0.U(aliasIdLen.W)))) + val entNewAlias = RegInit(VecInit(Seq.fill(robEntries)(0.U(aliasIdLen.W)))) + val entWrVbank = RegInit(VecInit(Seq.fill(robEntries)(0.U(vbankIdLen.W)))) + + require(vbankUpper >= 0, s"BAT vbankUpper must be non-negative, got $vbankUpper") + require(vbankUpper < maxBankId, s"BAT vbankUpper must be < maxBankId($maxBankId), got $vbankUpper") + require( + aliasBase + robEntries - 1 <= maxBankId, + s"BAT alias range exceeds bank_id_len space: base=$aliasBase entries=$robEntries max=$maxBankId" + ) + + private def extraAlias(robId: UInt): UInt = aliasBase.U(aliasIdLen.W) + robId + private def toVbankIdx(v: UInt): UInt = v(vbankIdLen - 1, 0) + + private def mapVbank(v: UInt): UInt = { + val idx = toVbankIdx(v) + val out = WireDefault(0.U(aliasIdLen.W)) + for (i <- 0 until vbankNum) { + when(idx === i.U(vbankIdLen.W)) { + out := v2a(i) + } + } + out + } + + val q = io.alloc.raw + io.alloc_renamed.rd_bank_0_valid := q.rd_bank_0_valid + io.alloc_renamed.rd_bank_1_valid := q.rd_bank_1_valid + io.alloc_renamed.wr_bank_valid := q.wr_bank_valid + io.alloc_renamed.rd_bank_0_id := Mux(q.rd_bank_0_valid, mapVbank(q.rd_bank_0_id), 0.U) + io.alloc_renamed.rd_bank_1_id := Mux(q.rd_bank_1_valid, mapVbank(q.rd_bank_1_id), 0.U) + io.alloc_renamed.wr_bank_id := Mux(q.wr_bank_valid, extraAlias(io.alloc.rob_id), 0.U) + + when(io.alloc.valid) { + val rid = io.alloc.rob_id + when(q.rd_bank_0_valid) { + assert(q.rd_bank_0_id <= vbankUpper.U, "BAT rd_bank_0_id exceeds virtual bank upper bound") + } + when(q.rd_bank_1_valid) { + assert(q.rd_bank_1_id <= vbankUpper.U, "BAT rd_bank_1_id exceeds virtual bank upper bound") + } + when(q.wr_bank_valid) { + assert(q.wr_bank_id <= vbankUpper.U, "BAT wr_bank_id exceeds virtual bank upper bound") + } + + entHasWrite(rid) := q.wr_bank_valid + entOldAlias(rid) := Mux(q.wr_bank_valid, mapVbank(q.wr_bank_id), 0.U) 
+ entNewAlias(rid) := Mux(q.wr_bank_valid, extraAlias(rid), 0.U) + entWrVbank(rid) := toVbankIdx(q.wr_bank_id) + + when(q.wr_bank_valid) { + assert(!aliasInUse(rid), "BAT alias reused before free") + aliasInUse(rid) := true.B + v2a(toVbankIdx(q.wr_bank_id)) := extraAlias(rid) + } + } + + when(io.free.valid) { + for (i <- 0 until robEntries) { + when(io.free.mask(i)) { + when(entHasWrite(i)) { + assert(aliasInUse(i), "BAT free on non-allocated alias") + aliasInUse(i) := false.B + + // Restore old alias only if mapping still points to this entry's alias. + when(v2a(entWrVbank(i)) === entNewAlias(i)) { + v2a(entWrVbank(i)) := entOldAlias(i) + } + } + entHasWrite(i) := false.B + entOldAlias(i) := 0.U + entNewAlias(i) := 0.U + entWrVbank(i) := 0.U + } + } + } +} diff --git a/arch/src/main/scala/framework/frontend/scoreboard/BankScoreboard.scala b/arch/src/main/scala/framework/frontend/scoreboard/BankScoreboard.scala new file mode 100644 index 00000000..6270d0aa --- /dev/null +++ b/arch/src/main/scala/framework/frontend/scoreboard/BankScoreboard.scala @@ -0,0 +1,113 @@ +package framework.frontend.scoreboard + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} + +/** + * Bank access information extracted from instruction encoding. + * This is instruction-agnostic — the scoreboard only sees read/write bank sets. 
+ */ +class BankAccessInfo(val bankIdLen: Int) extends Bundle { + val rd_bank_0_valid = Bool() + val rd_bank_0_id = UInt(bankIdLen.W) + val rd_bank_1_valid = Bool() + val rd_bank_1_id = UInt(bankIdLen.W) + val wr_bank_valid = Bool() + val wr_bank_id = UInt(bankIdLen.W) +} + +object BankAccessInfo { + + /** Create a zero-valued (no access) BankAccessInfo */ + def none(bankIdLen: Int): BankAccessInfo = { + val w = Wire(new BankAccessInfo(bankIdLen)) + w.rd_bank_0_valid := false.B + w.rd_bank_0_id := 0.U + w.rd_bank_1_valid := false.B + w.rd_bank_1_id := 0.U + w.wr_bank_valid := false.B + w.wr_bank_id := 0.U + w + } + +} + +/** + * Bank Scoreboard — tracks in-flight read/write operations per bank. + * + * Hazard rules: + * - Read bank X → requires bankWrBusy(X) == false (RAW) + * - Write bank X → requires bankRdCount(X) == 0 (WAR) + * AND bankWrBusy(X) == false (WAW) + * + * bankRdCount: multi-bit counter (multiple concurrent readers allowed, RR is OK) + * bankWrBusy: 1-bit flag (WAW rule guarantees at most 1 writer in-flight) + * + * Issue and complete may fire in the same cycle. Updates are computed + * combinationally so both increments and decrements take effect. 
+ */ +@instantiable +class BankScoreboard(val bankNum: Int, val robEntries: Int) extends Module { + + val bankIdLen = log2Up(bankNum) + val cntWidth = log2Ceil(robEntries + 1) + + @public + val issue = IO(Flipped(Valid(new BankAccessInfo(bankIdLen)))) + @public + val complete = IO(Flipped(Valid(new BankAccessInfo(bankIdLen)))) + @public + val query = IO(Input(new BankAccessInfo(bankIdLen))) + @public + val hasHazard = IO(Output(Bool())) + @public + val queryVec = IO(Input(Vec(robEntries, new BankAccessInfo(bankIdLen)))) + @public + val hazardVec = IO(Output(Vec(robEntries, Bool()))) + + val bankRdCount = RegInit(VecInit(Seq.fill(bankNum)(0.U(cntWidth.W)))) + val bankWrBusy = RegInit(VecInit(Seq.fill(bankNum)(false.B))) + + // --- Hazard detection (reads current register state) --- + private def hazardOf(q: BankAccessInfo): Bool = { + val rd0_hazard = q.rd_bank_0_valid && bankWrBusy(q.rd_bank_0_id) + val rd1_hazard = q.rd_bank_1_valid && bankWrBusy(q.rd_bank_1_id) + val wr_hazard = q.wr_bank_valid && ( + bankRdCount(q.wr_bank_id) =/= 0.U || + bankWrBusy(q.wr_bank_id) + ) + rd0_hazard || rd1_hazard || wr_hazard + } + + hasHazard := hazardOf(query) + for (i <- 0 until robEntries) { + hazardVec(i) := hazardOf(queryVec(i)) + } + + // --- Compute per-bank deltas to handle simultaneous issue + complete --- + for (bank <- 0 until bankNum) { + val bankU = bank.U(bankIdLen.W) + + // Read counter: +1 per issue read, -1 per complete read + val issRd0 = issue.valid && issue.bits.rd_bank_0_valid && (issue.bits.rd_bank_0_id === bankU) + val issRd1 = issue.valid && issue.bits.rd_bank_1_valid && (issue.bits.rd_bank_1_id === bankU) + val cmpRd0 = complete.valid && complete.bits.rd_bank_0_valid && (complete.bits.rd_bank_0_id === bankU) + val cmpRd1 = complete.valid && complete.bits.rd_bank_1_valid && (complete.bits.rd_bank_1_id === bankU) + + val rdInc = issRd0.asUInt +& issRd1.asUInt + val rdDec = cmpRd0.asUInt +& cmpRd1.asUInt + bankRdCount(bank) := bankRdCount(bank) + rdInc - 
rdDec + + // Write flag: issue sets, complete clears. If both happen to same bank, + // complete takes priority (the completing instruction frees the bank). + val issWr = issue.valid && issue.bits.wr_bank_valid && (issue.bits.wr_bank_id === bankU) + val cmpWr = complete.valid && complete.bits.wr_bank_valid && (complete.bits.wr_bank_id === bankU) + + when(issWr && !cmpWr) { + bankWrBusy(bank) := true.B + }.elsewhen(cmpWr) { + bankWrBusy(bank) := false.B + } + } +} diff --git a/arch/src/main/scala/framework/gendomain/LICENSE b/arch/src/main/scala/framework/gendomain/LICENSE deleted file mode 100644 index b81ee094..00000000 --- a/arch/src/main/scala/framework/gendomain/LICENSE +++ /dev/null @@ -1,33 +0,0 @@ -BSD 3-Clause License - -Copyright (c) 2024, The Regents of the University of California (Regents) -All Rights Reserved. - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - -1. Redistributions of source code must retain the above copyright notice, this - list of conditions and the following disclaimer. - -2. Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - -3. Neither the name of the copyright holder nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -The generic domain is built based on the Saturn, Berkeley's vector processor -design, which is available at https://github.com/ucb-bar/saturn-vectors. -Thanks for the great design! diff --git a/arch/src/main/scala/framework/gendomain/backend/Backend.scala b/arch/src/main/scala/framework/gendomain/backend/Backend.scala deleted file mode 100644 index bfafaaec..00000000 --- a/arch/src/main/scala/framework/gendomain/backend/Backend.scala +++ /dev/null @@ -1,446 +0,0 @@ -package framework.gendomain.backend - -import chisel3._ -import chisel3.util._ -import chisel3.experimental.dataview._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.util._ -import framework.gendomain.mem._ -import framework.gendomain.exu._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -class VectorBackend(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val dis = Flipped(Decoupled(new VectorIssueInst)) - - val vmu = Flipped(new VectorMemDatapathIO) - - val busy = Output(Bool()) - - val index_access = new VectorIndexAccessIO - val mask_access = new VectorMaskAccessIO - - val scalar_resp = Decoupled(new ScalarWrite) - - val set_vxsat = Output(Bool()) - val set_fflags = Output(Valid(UInt(5.W))) - - val fp_req = Decoupled(new FPInput()) - val fp_resp = Flipped(Valid(new FPResult())) - - val vat_tail = Input(UInt(vParams.vatSz.W)) - val vat_head = 
Input(UInt(vParams.vatSz.W)) - - val vat_release = Output(Vec(nRelease, Valid(UInt(vParams.vatSz.W)))) - }) - - require(vLen >= 64) - require(xLen == 64) - require(vLen >= dLen) - require(vLen % dLen == 0) - - def vatOlder(i0: UInt, i1: UInt) = cqOlder(i0, i1, io.vat_tail) - - val vdq = Module(new DCEQueue(new VectorIssueInst, vParams.vdqEntries)) - vdq.io.enq <> io.dis - - val perm_buffer = Module(new Compactor(dLenB, dLenB, UInt(8.W), false)) - - val xissParams = vParams.issStructure.generate(vParams) - val all_supported_insns = xissParams.map(_.insns).flatten - - val vlissq = Module(new IssueQueue(vParams.vlissqEntries, 1)) - val vsissq = Module(new IssueQueue(vParams.vsissqEntries, 1)) - val vpissq = Module(new IssueQueue(vParams.vpissqEntries, 1)) - val vxissqs = xissParams.map(q => Module(new IssueQueue(q.depth, q.seqs.size)).suggestName(s"vxissq_${q.name}")) - - val vls = Module(new LoadSequencer) - val vss = Module(new StoreSequencer) - val vps = Module(new PermuteSequencer(xissParams.map(_.insns).flatten)) - val vxs = xissParams.map(q => q.seqs.map(s => - Module(new ExecuteSequencer(s.insns)).suggestName(s"vxs${s.name}") - )) - - val allSeqs = Seq(vls, vss, vps) ++ vxs.flatten - val allIssQs = Seq(vlissq, vsissq, vpissq) ++ vxissqs - - val vxus = xissParams.map(_.seqs.map(s => Module(new ExecutionUnit(s.fus)).suggestName(s"vxu${s.name}"))) - - - io.fp_req.valid := false.B - io.fp_req.bits := DontCare - vxus.foreach(_.foreach(_.io.shared_fp_req := DontCare)) - vxus.foreach(_.foreach(_.io.shared_fp_resp := DontCare)) - - val shared_fp_vxu = vxus.flatten.filter(_.hasSharedFPUnits) - require(shared_fp_vxu.size <= 1) - shared_fp_vxu.headOption.foreach { vxu => - io.fp_req <> vxu.io.shared_fp_req - vxu.io.shared_fp_resp <> io.fp_resp - } - - case class IssueGroup( - issq: IssueQueue, - seqs: Seq[PipeSequencer[_]]) - - - val issGroups = Seq( - IssueGroup(vlissq, Seq(vls)), - IssueGroup(vsissq, Seq(vss)), - IssueGroup(vpissq, Seq(vps)) - ) ++ (vxissqs.zip(vxs).map 
{ case (q, seqs) => - IssueGroup(q, seqs) - }) - - vlissq.io.enq.bits.reduction := false.B - vlissq.io.enq.bits.wide_vd := false.B - vlissq.io.enq.bits.wide_vs2 := false.B - vlissq.io.enq.bits.writes_mask := false.B - vlissq.io.enq.bits.reads_vs1_mask := false.B - vlissq.io.enq.bits.reads_vs2_mask := false.B - vlissq.io.enq.bits.nf_log2 := log2_up(vdq.io.deq.bits.nf, 8) - vlissq.io.enq.bits.renv1 := false.B - vlissq.io.enq.bits.renv2 := false.B - vlissq.io.enq.bits.renvd := false.B - vlissq.io.enq.bits.renvm := !vdq.io.deq.bits.vm - vlissq.io.enq.bits.wvd := true.B - vlissq.io.enq.bits.scalar_to_vd0 := false.B - vlissq.io.enq.bits.rs1_is_rs2 := false.B - - vsissq.io.enq.bits.reduction := false.B - vsissq.io.enq.bits.wide_vd := false.B - vsissq.io.enq.bits.wide_vs2 := false.B - vsissq.io.enq.bits.writes_mask := false.B - vsissq.io.enq.bits.reads_vs1_mask := false.B - vsissq.io.enq.bits.reads_vs2_mask := false.B - vsissq.io.enq.bits.nf_log2 := log2_up(vdq.io.deq.bits.nf, 8) - vsissq.io.enq.bits.renv1 := false.B - vsissq.io.enq.bits.renv2 := false.B - vsissq.io.enq.bits.renvd := true.B - vsissq.io.enq.bits.renvm := !vdq.io.deq.bits.vm && vdq.io.deq.bits.mop === mopUnit - vsissq.io.enq.bits.wvd := false.B - vsissq.io.enq.bits.scalar_to_vd0 := false.B - vsissq.io.enq.bits.rs1_is_rs2 := false.B - - vpissq.io.enq.bits.reduction := false.B - vpissq.io.enq.bits.wide_vd := false.B - vpissq.io.enq.bits.wide_vs2 := false.B - vpissq.io.enq.bits.writes_mask := false.B - vpissq.io.enq.bits.reads_vs1_mask := false.B - vpissq.io.enq.bits.reads_vs2_mask := false.B - vpissq.io.enq.bits.nf_log2 := 0.U - vpissq.io.enq.bits.renv1 := false.B - vpissq.io.enq.bits.renv2 := vdq.io.deq.bits.mop(0) || !vdq.io.deq.bits.vmu - vpissq.io.enq.bits.renvd := true.B - vpissq.io.enq.bits.renvm := !vdq.io.deq.bits.vm && vdq.io.deq.bits.mop =/= mopUnit && vdq.io.deq.bits.vmu - vpissq.io.enq.bits.wvd := false.B - vpissq.io.enq.bits.scalar_to_vd0 := false.B - vpissq.io.enq.bits.rs1_is_rs2 := 
!vdq.io.deq.bits.vmu && (vdq.io.deq.bits.opif6 === OPIFunct6.rgather || (vdq.io.deq.bits.funct3 === OPIVV && vdq.io.deq.bits.opif6 === OPIFunct6.rgatherei16)) - - val xdis_ctrl = new VectorDecoder(vdq.io.deq.bits.funct3, vdq.io.deq.bits.funct6, vdq.io.deq.bits.rs1, vdq.io.deq.bits.rs2, all_supported_insns, - Seq(Reduction, Wide2VD, Wide2VS2, WritesAsMask, ReadsVS1AsMask, ReadsVS2AsMask, ReadsVS1, ReadsVS2, ReadsVD, - VMBitReadsVM, AlwaysReadsVM, WritesVD, WritesScalar, ScalarToVD0)) - vxissqs.foreach { vxissq => - vxissq.io.enq.bits.wide_vd := xdis_ctrl.bool(Wide2VD) - vxissq.io.enq.bits.wide_vs2 := xdis_ctrl.bool(Wide2VS2) - vxissq.io.enq.bits.writes_mask := xdis_ctrl.bool(WritesAsMask) - vxissq.io.enq.bits.reads_vs1_mask := xdis_ctrl.bool(ReadsVS1AsMask) - vxissq.io.enq.bits.reads_vs2_mask := xdis_ctrl.bool(ReadsVS2AsMask) - vxissq.io.enq.bits.nf_log2 := 0.U - vxissq.io.enq.bits.renv1 := xdis_ctrl.bool(ReadsVS1) - vxissq.io.enq.bits.renv2 := xdis_ctrl.bool(ReadsVS2) - vxissq.io.enq.bits.renvd := xdis_ctrl.bool(ReadsVD) - vxissq.io.enq.bits.renvm := (!vdq.io.deq.bits.vm && xdis_ctrl.bool(VMBitReadsVM)) || xdis_ctrl.bool(AlwaysReadsVM) - vxissq.io.enq.bits.wvd := !xdis_ctrl.bool(WritesScalar) - vxissq.io.enq.bits.scalar_to_vd0 := xdis_ctrl.bool(ScalarToVD0) - vxissq.io.enq.bits.reduction := xdis_ctrl.bool(Reduction) - vxissq.io.enq.bits.rs1_is_rs2 := false.B - } - - val issq_stall = Wire(Vec(issGroups.size, Bool())) - vdq.io.deq.ready := !issq_stall.orR - - for ((group, i) <- issGroups.zipWithIndex) { - val otherIssGroups = issGroups.zipWithIndex.filter(_._2 != i).map(_._1) - val otherIssqs = otherIssGroups.map(_.issq) - val otherIssqSeqs = otherIssGroups.map(_.seqs).flatten - - for ((seq, j) <- group.seqs.zipWithIndex) { - val otherSameIssqSeqs = group.seqs.zipWithIndex.filter(_._2 != j).map(_._1) - val otherSeqs = otherIssqSeqs ++ otherSameIssqSeqs - - val vat = seq.io.vat - - seq.io.rvs1 := DontCare - seq.io.rvs2 := DontCare - seq.io.rvd := DontCare - seq.io.rvm 
:= DontCare - seq.io.perm := DontCare - seq.io.acc.valid := false.B - seq.io.acc.bits := DontCare - seq.io.vat_head := io.vat_head - - val older_issq_wintents = FillInterleaved(egsPerVReg, otherIssqs.map { i => - i.io.hazards.map(h => Mux(vatOlder(h.bits.vat, vat) && h.valid, h.bits.wintent, 0.U)) - }.flatten.foldLeft(0.U)(_|_)) - val older_seq_wintents = otherSeqs.map { s => - Mux(vatOlder(s.io.seq_hazard.bits.vat, vat) && s.io.seq_hazard.valid, s.io.seq_hazard.bits.wintent, 0.U) - }.reduce(_|_) - val older_wintents = older_issq_wintents | older_seq_wintents - - val older_issq_rintents = FillInterleaved(egsPerVReg, otherIssqs.map { i => - i.io.hazards.map(h => Mux(vatOlder(h.bits.vat, vat) && h.valid, h.bits.rintent, 0.U)) - }.flatten.foldLeft(0.U)(_|_)) - val older_seq_rintents = otherSeqs.map { s => - Mux(vatOlder(s.io.seq_hazard.bits.vat, vat) && s.io.seq_hazard.valid, s.io.seq_hazard.bits.rintent, 0.U) - }.reduce(_|_) - val older_rintents = older_issq_rintents | older_seq_rintents - - val older_pipe_writes = vxus.flatten.map(_.io.pipe_hazards.toSeq).flatten.map { h => - Mux(h.valid, h.bits.eg_oh, 0.U) - }.reduce(_|_) - - val older_iter_writes = vxus.flatten.map(_.io.iter_hazards.toSeq).flatten.map { h => - Mux(h.valid, h.bits.eg_oh, 0.U) - }.reduce(_|_) - - seq.io.older_writes := older_pipe_writes | older_iter_writes | older_wintents - seq.io.older_reads := older_rintents - - if (!vParams.enableOOO) { - // stall dispatch if any other sequencers are at the head and stalled - seq.io.dis_stall := otherSeqs.map { s => - s.io.busy && s.io.head && !(s.io.iss.valid && s.io.iss.ready) - }.orR - } else { - seq.io.dis_stall := false.B // never stall dispatch - } - } - - val accepts = group.seqs.map(_.accepts(vdq.io.deq.bits)) - issq_stall(i) := !group.issq.io.enq.ready && accepts.orR - - group.issq.io.enq.valid := vdq.io.deq.valid && !issq_stall.orR && accepts.orR - group.issq.io.enq.bits.viewAsSupertype(new VectorIssueInst) := vdq.io.deq.bits - 
group.issq.io.enq.bits.seq := VecInit(accepts).asUInt - - // In case of multiple available sequencers, select the first ready one - val valid_seqs = group.issq.io.deq.bits.seq - val ready_seqs = VecInit(group.seqs.map(_.io.dis.ready)).asUInt - val chosen_seq = PriorityEncoder(valid_seqs & ready_seqs) - - group.seqs.zipWithIndex.foreach{ case(s, j) => - s.io.dis.valid := group.issq.io.deq.valid && chosen_seq === j.U - s.io.dis.bits := group.issq.io.deq.bits.viewAsSupertype(new BackendIssueInst) - } - group.issq.io.deq.ready := (valid_seqs & ready_seqs) =/= 0.U - } - - val flat_vxs = vxs.flatten - val flat_vxus = vxus.flatten - require(flat_vxs.size == flat_vxus.size) - - // Hazard checking for multi-VXS - // Check if there is a VRF write port hazard against the in-flight insns in other VXUs - // Check if there is a VRF write port hazard against a simultaneously issuing insn - // from another VXS (check that it's actually a valid hazard) - val inflight_hazards = WireInit(VecInit(Seq.fill(flat_vxs.length)(false.B))) - for (i <- 0 until flat_vxs.length) { - val other_vxu_idx = (0 until flat_vxs.length).filter(_ != i) - - val inflight_hazard = other_vxu_idx.map(flat_vxus(_).io.pipe_hazards).flatten.map { hazard => - hazard.valid && - (hazard.bits.latency === flat_vxus(i).io.issue_pipe_latency) && - (hazard.bits.eg(vrfBankBits-1,0) === flat_vxs(i).io.iss.bits.wvd_eg(vrfBankBits-1,0)) - }.reduceOption(_ || _).getOrElse(false.B) - - inflight_hazards(i) := inflight_hazard - - val issue_hazard = other_vxu_idx.map { other_iss => - (flat_vxus(other_iss).io.issue_pipe_latency === flat_vxus(i).io.issue_pipe_latency) && - (flat_vxs(other_iss).io.iss.bits.wvd_eg(vrfBankBits-1,0) === flat_vxs(i).io.iss.bits.wvd_eg(vrfBankBits-1,0)) && - vatOlder(flat_vxs(other_iss).io.iss.bits.vat, flat_vxs(i).io.iss.bits.vat) && - !inflight_hazards(other_iss) && - flat_vxs(other_iss).io.iss.valid && - flat_vxus(other_iss).io.iss.ready - }.reduceOption(_ || _).getOrElse(false.B) - - 
flat_vxus(i).io.iss.valid := flat_vxs(i).io.iss.valid && !inflight_hazard && !issue_hazard - flat_vxs(i).io.iss.ready := flat_vxus(i).io.iss.ready && !inflight_hazard && !issue_hazard - flat_vxus(i).io.iss.bits := flat_vxs(i).io.iss.bits - flat_vxs(i).io.acc := flat_vxus(i).io.acc_write - } - - // Read ports are - // vxs0-vrs1, vxs1-vrs1, vmu-index, frontend-index - // vxs0-vrs2, vxs1-vrs2 - // vxs0-vrs3, vxs1-vrs3, vss-vrd - // vxs0-mask, vxs1-mask, vls-mask, vss-mask, vps-mask, frontend-mask - // Mask ports are - // vxs0-mask, vxs1-mask, vls-mask, vss-mask, vps-mask, frontend-mask - val vrf = Module(new RegisterFile( - reads = Seq(2 + flat_vxs.size, flat_vxs.size, 1 + flat_vxs.size), - maskReads = Seq(4 + flat_vxs.size), - pipeWrites = flat_vxus.size, - llWrites = flat_vxus.size + 2 // vxus + load + reset - )) - - val load_write = Wire(Decoupled(new VectorWrite(dLen))) - io.vmu.lresp.ready := vls.io.iss.valid && load_write.ready - vls.io.iss.ready := io.vmu.lresp.valid && load_write.ready - load_write.valid := vls.io.iss.valid && io.vmu.lresp.valid - load_write.bits.eg := vls.io.iss.bits.wvd_eg - load_write.bits.data := io.vmu.lresp.bits.data - load_write.bits.mask := FillInterleaved(8, vls.io.iss.bits.wmask) - when (io.vmu.lresp.fire) { - assert(io.vmu.lresp.bits.debug_id === vls.io.iss.bits.debug_id) - } - - val resetting = RegInit(true.B) - val reset_ctr = RegInit(0.U(log2Ceil(egsTotal).W)) - when (resetting) { - reset_ctr := reset_ctr + 1.U - io.dis.ready := false.B - } - when (~reset_ctr === 0.U) { resetting := false.B } - - // Write ports - vrf.io.pipe_writes.zip(vxus.flatten).foreach { case (w,vxu) => - w := vxu.io.pipe_write - } - - vrf.io.ll_writes(0) <> load_write - vrf.io.ll_writes(1).valid := resetting - vrf.io.ll_writes(1).bits.eg := reset_ctr - vrf.io.ll_writes(1).bits.data := 0.U - vrf.io.ll_writes(1).bits.mask := ~(0.U(dLen.W)) - vxus.flatten.zipWithIndex.foreach { case (vxu,i) => - vrf.io.ll_writes(2+i) <> vxu.io.iter_write - } - - 
flat_vxs.zipWithIndex.foreach { case(xs, i) => - vrf.io.read(0)(i) <> xs.io.rvs1 - vrf.io.read(1)(i) <> xs.io.rvs2 - vrf.io.read(2)(i) <> xs.io.rvd - vrf.io.mask_read(0)(i) <> xs.io.rvm - } - - vrf.io.read(0)(flat_vxs.length) <> vps.io.rvs2 - vps.io.rvs1.req.ready := true.B - - val index_access_eg = getEgId(io.index_access.vrs, io.index_access.eidx, io.index_access.eew, false.B) - val index_access_eg_oh = UIntToOH(index_access_eg) - val index_access_hazard = (allSeqs.map(_.io.seq_hazard).map { h => - h.valid && ((h.bits.wintent & index_access_eg_oh) =/= 0.U) - } ++ allIssQs.map(_.io.hazards).flatten.map { h => - h.valid && h.bits.wintent(io.index_access.vrs) - } ++ vxus.flatten.map(_.io.pipe_hazards).flatten.map { h => - h.valid && h.bits.eg === index_access_eg - } ++ vxus.flatten.map(_.io.iter_hazards).flatten.map { h => - h.valid && h.bits.eg === index_access_eg - }).orR || vdq.io.peek.map(i => i.valid && !(i.bits.vmu && i.bits.store)).orR - // TODO: this conservatively assumes a index data hazard against anything in the vdq - - vrf.io.read(0)(flat_vxs.size+1).req.valid := io.index_access.valid && !index_access_hazard - io.index_access.ready := vrf.io.read(0)(flat_vxs.size+1).req.ready && !index_access_hazard - vrf.io.read(0)(flat_vxs.size+1).req.bits.eg := index_access_eg - vrf.io.read(0)(flat_vxs.size+1).req.bits.oldest := false.B - io.index_access.idx := vrf.io.read(0)(flat_vxs.size+1).resp >> ((io.index_access.eidx << io.index_access.eew)(dLenOffBits-1,0) << 3) & eewBitMask(io.index_access.eew) - - vrf.io.read(2)(flat_vxs.size) <> vss.io.rvd - io.vmu.sdata.valid := vss.io.iss.valid - io.vmu.sdata.bits := vss.io.iss.bits - vss.io.iss.ready := io.vmu.sdata.ready - - vrf.io.mask_read(0)(flat_vxs.length) <> vls.io.rvm - vrf.io.mask_read(0)(flat_vxs.length+1) <> vss.io.rvm - vrf.io.mask_read(0)(flat_vxs.length+2) <> vps.io.rvm - val vm_busy = Wire(Bool()) - vrf.io.mask_read(0)(flat_vxs.length+3).req.valid := io.mask_access.valid && !vm_busy - 
vrf.io.mask_read(0)(flat_vxs.length+3).req.bits.eg := getEgId(0.U, io.mask_access.eidx, 0.U, true.B) - vrf.io.mask_read(0)(flat_vxs.length+3).req.bits.oldest := false.B - io.mask_access.ready := vrf.io.mask_read(0)(flat_vxs.length+3).req.ready && !vm_busy - io.mask_access.mask := vrf.io.mask_read(0)(flat_vxs.length+3).resp >> io.mask_access.eidx(log2Ceil(dLen)-1,0) - - - val vmu_index_q = Module(new Compactor(dLenB, dLenB, UInt(8.W), false)) - val vmu_mask_q = Module(new Compactor(dLenB, dLenB, Bool(), false)) - val perm_q = Module(new DCEQueue(new PermuteMicroOp, 2)) - - vmu_index_q.io.push_data := vps.io.iss.bits.rvs2_data.asTypeOf(Vec(dLenB, UInt(8.W))) - vmu_index_q.io.push.bits.head := vps.io.iss.bits.eidx << vps.io.iss.bits.rvs2_eew - vmu_index_q.io.push.bits.tail := Mux(vps.io.iss.bits.tail, - vps.io.iss.bits.vl << vps.io.iss.bits.rvs2_eew, - 0.U) - - vmu_mask_q.io.push_data := (vps.io.iss.bits.rvm_data >> vps.io.iss.bits.eidx(log2Ceil(dLen)-1,0))(dLenB-1,0).asBools - vmu_mask_q.io.push.bits.head := 0.U - vmu_mask_q.io.push.bits.tail := Mux(vps.io.iss.bits.tail, vps.io.iss.bits.vl, 0.U) - vps.io.iss.bits.eidx - - - vps.io.iss.ready := Mux(vps.io.iss.bits.vmu, - vmu_index_q.io.push.ready && vmu_mask_q.io.push.ready, - perm_q.io.enq.ready) - - vmu_index_q.io.push.valid := vps.io.iss.valid && vps.io.iss.bits.vmu && vps.io.iss.bits.renv2 && vps.io.iss.ready - vmu_mask_q.io.push.valid := vps.io.iss.valid && vps.io.iss.bits.vmu && vps.io.iss.bits.renvm && vps.io.iss.ready - - io.vmu.mask_pop <> vmu_mask_q.io.pop - io.vmu.mask_data := vmu_mask_q.io.pop_data - io.vmu.index_pop <> vmu_index_q.io.pop - io.vmu.index_data := vmu_index_q.io.pop_data - - perm_q.io.enq.valid := vps.io.iss.valid && !vps.io.iss.bits.vmu - perm_q.io.enq.bits := vps.io.iss.bits - - perm_q.io.deq.ready := perm_buffer.io.push.ready - perm_buffer.io.push.valid := perm_q.io.deq.valid - perm_buffer.io.push.bits.head := perm_q.io.deq.bits.eidx << perm_q.io.deq.bits.rvs2_eew - 
perm_buffer.io.push.bits.tail := Mux(perm_q.io.deq.bits.tail, - perm_q.io.deq.bits.vl << perm_q.io.deq.bits.rvs2_eew, - 0.U) - perm_buffer.io.push_data := perm_q.io.deq.bits.rvs2_data.asTypeOf(Vec(dLenB, UInt(8.W))) - - perm_buffer.io.pop <> vxs.head.head.io.perm.req - vxs.head.head.io.perm.data := perm_buffer.io.pop_data.asUInt - - // Clear the age tags - var r_idx = 0 - def clearVat(fire: Bool, tag: UInt) = { - assert(r_idx < nRelease) - io.vat_release(r_idx).valid := fire - io.vat_release(r_idx).bits := tag - r_idx += 1 - } - - clearVat(vls.io.iss.fire && vls.io.iss.bits.tail, vls.io.iss.bits.vat) - clearVat(vss.io.iss.fire && vss.io.iss.bits.tail, vss.io.iss.bits.vat) - vxs.flatten.foreach(xs => clearVat(xs.io.iss.fire && xs.io.iss.bits.tail, xs.io.iss.bits.vat)) - - // Signalling to frontend - val seq_inflight_wv0 = (allSeqs.map(_.io.seq_hazard).map { h => - h.valid && ((h.bits.wintent & ~(0.U(egsPerVReg.W))) =/= 0.U) - } ++ allIssQs.map(_.io.hazards).flatten.map { h => - h.valid && h.bits.wintent(0) - } ++ vxus.flatten.map(_.io.pipe_hazards).flatten.map { h => - h.valid && (h.bits.eg < egsPerVReg.U) - } ++ vxus.flatten.map(_.io.iter_hazards).flatten.map { h => - h.valid && (h.bits.eg < egsPerVReg.U) - }).orR - val vdq_inflight_wv0 = vdq.io.peek.map { h => - h.valid && h.bits.may_write_v0 - }.orR - - vm_busy := seq_inflight_wv0 || vdq_inflight_wv0 - io.busy := vdq.io.deq.valid || allSeqs.map(_.io.busy).orR || vxus.flatten.map(_.io.busy).asUInt.orR || resetting - io.set_vxsat := vxus.flatten.map(_.io.set_vxsat).asUInt.orR - io.set_fflags.valid := vxus.flatten.map(_.io.set_fflags.valid).asUInt.orR - io.set_fflags.bits := vxus.flatten.map( xu => Mux(xu.io.set_fflags.valid, xu.io.set_fflags.bits, 0.U)).reduce(_|_) - - // Only one of these should actually be connected - val scalar_write_arb = Module(new Arbiter(new ScalarWrite, flat_vxus.size)) - vxus.flatten.map(_.io.scalar_write).zip(scalar_write_arb.io.in).foreach { case (i,o) => o <> i } - io.scalar_resp <> 
scalar_write_arb.io.out -} diff --git a/arch/src/main/scala/framework/gendomain/backend/ExecuteSequencer.scala b/arch/src/main/scala/framework/gendomain/backend/ExecuteSequencer.scala deleted file mode 100644 index f4553d23..00000000 --- a/arch/src/main/scala/framework/gendomain/backend/ExecuteSequencer.scala +++ /dev/null @@ -1,360 +0,0 @@ -package framework.gendomain.backend - -import chisel3._ -import chisel3.util._ -import chisel3.experimental.dataview._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -class ExecuteSequencer(supported_insns: Seq[VectorInstruction])(implicit p: Parameters) extends PipeSequencer(new ExecuteMicroOp)(p) { - def accepts(inst: VectorIssueInst) = !inst.vmu && new VectorDecoder(inst.funct3, inst.funct6, inst.rs1, inst.rs2, supported_insns, Nil).matched - - val valid = RegInit(false.B) - val inst = Reg(new BackendIssueInst) - val head = Reg(Bool()) - val reduction_head = Reg(Bool()) - val wvd_mask = Reg(UInt(egsTotal.W)) - val rvs1_mask = Reg(UInt(egsTotal.W)) - val rvs2_mask = Reg(UInt(egsTotal.W)) - val rvd_mask = Reg(UInt(egsTotal.W)) - val rvm_mask = Reg(UInt(egsPerVReg.W)) - val slide = Reg(Bool()) - val slide_up = Reg(Bool()) - val slide1 = Reg(Bool()) - val slide_offset = Reg(UInt((1+log2Ceil(maxVLMax)).W)) - val perm_head = Reg(UInt(dLenOffBits.W)) - val perm_tail = Reg(UInt(dLenOffBits.W)) - - val acc = Reg(Vec(dLenB, UInt(8.W))) - val acc_ready = Reg(Bool()) - val acc_tail = Reg(Bool()) - val acc_tail_id = Reg(UInt(log2Ceil(dLenB).W)) - - val ctrl = new VectorDecoder(inst.funct3, inst.funct6, inst.rs1, inst.rs2, supported_insns, - Seq(SetsWMask, UsesPermuteSeq, FPAdd, FPComp, Elementwise, UsesNarrowingSext, ZextImm5)) - - val mvnrr = inst.funct3 === OPIVI && inst.opif6 === OPIFunct6.mvnrr - val rgatherei16 = inst.funct3 === OPIVV && inst.opif6 === 
OPIFunct6.rgatherei16 - val compress = inst.opmf6 === OPMFunct6.compress - val vs1_eew = Mux(rgatherei16, 1.U, inst.vconfig.vtype.vsew) - val vs2_eew = inst.vconfig.vtype.vsew + inst.wide_vs2 - Mux(ctrl.bool(UsesNarrowingSext), ~inst.rs1(2,1) + 1.U, 0.U) - val vs3_eew = inst.vconfig.vtype.vsew + inst.wide_vd - val vd_eew = inst.vconfig.vtype.vsew + inst.wide_vd - val incr_eew = Seq( - Mux(inst.renv1, vs1_eew, 0.U), - Mux(inst.renv2, vs2_eew, 0.U), - Mux(inst.renvd, vs3_eew, 0.U), - vd_eew).foldLeft(0.U(2.W)) { case (b, a) => Mux(a > b, a, b) } - val acc_elementwise_opcodes = (Seq(OPFFunct6.fredosum, OPFFunct6.fwredosum) ++ - (if (vParams.useScalarFPMisc) Seq(OPFFunct6.fredmax, OPFFunct6.fredmin) else Nil) ++ - (if (vParams.useScalarFPFMA) Seq(OPFFunct6.fredusum, OPFFunct6.fwredusum) else Nil) - ) - val acc_copy = (vd_eew === 3.U && (dLenB == 8).B) || inst.opff6.isOneOf(acc_elementwise_opcodes) - val acc_last = acc_tail_id + 1.U === log2Ceil(dLenB).U - vd_eew || acc_copy - val uscalar = Mux(inst.funct3(2), inst.rs1_data, inst.imm5) - val sscalar = Mux(inst.funct3(2), inst.rs1_data, inst.imm5_sext) - val rgather = inst.opif6 === OPIFunct6.rgather - val rgather_ix = rgather && inst.funct3.isOneOf(OPIVX, OPIVI) - val rgather_v = rgather && inst.funct3.isOneOf(OPIVV) - val renv1 = Mux(inst.reduction, reduction_head, inst.renv1) - val renv2 = Mux(rgather_ix, head, Mux(inst.reduction, !reduction_head && !acc_tail, inst.renv2)) - val renvd = inst.renvd - val renvm = inst.renvm - val renacc = inst.reduction - - val use_wmask = !inst.vm && ctrl.bool(SetsWMask) - val eidx = Reg(UInt(log2Ceil(maxVLMax).W)) - val eff_vl = Mux(mvnrr, ((vLen/8).U >> vd_eew) << inst.emul, Mux(inst.scalar_to_vd0, 1.U, inst.vconfig.vl)) - val increments_as_mask = (!inst.renv1 || inst.reads_vs1_mask) && (!inst.renv2 || inst.reads_vs2_mask) && (!inst.wvd || inst.writes_mask) - val next_eidx = get_next_eidx(eff_vl, eidx, incr_eew, 0.U, increments_as_mask, ctrl.bool(Elementwise)) - val eidx_tail = 
next_eidx === eff_vl - val tail = Mux(inst.reduction, acc_tail && acc_last, eidx_tail) - - io.dis.ready := (!valid || (tail && io.iss.fire)) && !io.dis_stall - - when (io.dis.fire) { - val dis_inst = io.dis.bits - valid := true.B - inst := io.dis.bits - assert(dis_inst.vstart === 0.U) - eidx := 0.U - - val vd_arch_mask = get_arch_mask(dis_inst.rd , dis_inst.emul +& dis_inst.wide_vd) - val vs1_arch_mask = get_arch_mask(dis_inst.rs1, Mux(dis_inst.reads_vs1_mask, 0.U, dis_inst.emul)) - val vs2_arch_mask = get_arch_mask(dis_inst.rs2, Mux(dis_inst.reads_vs2_mask, 0.U, dis_inst.emul +& dis_inst.wide_vs2)) - - wvd_mask := Mux(dis_inst.wvd , FillInterleaved(egsPerVReg, vd_arch_mask), 0.U) - rvs1_mask := Mux(dis_inst.renv1, FillInterleaved(egsPerVReg, vs1_arch_mask), 0.U) - rvs2_mask := Mux(dis_inst.renv2, FillInterleaved(egsPerVReg, vs2_arch_mask), 0.U) - rvd_mask := Mux(dis_inst.renvd, FillInterleaved(egsPerVReg, vd_arch_mask), 0.U) - rvm_mask := Mux(dis_inst.renvm, ~(0.U(egsPerVReg.W)), 0.U) - head := true.B - reduction_head := true.B - acc_tail := false.B - acc_tail_id := 0.U - acc_ready := true.B - - val dis_slide = (dis_inst.funct6.isOneOf(OPIFunct6.slideup.litValue.U, OPIFunct6.slidedown.litValue.U) - && dis_inst.funct3 =/= OPIVV) - val dis_slide_up = !dis_inst.funct6(0) - val dis_vl = dis_inst.vconfig.vl - val dis_sew = dis_inst.vconfig.vtype.vsew - val dis_vlmax = dis_inst.vconfig.vtype.vlMax - val dis_next_eidx = get_next_eidx(dis_vl, 0.U, dis_sew, 0.U, false.B, false.B) - val dis_slide1 = !dis_inst.isOpi - val dis_uscalar = Mux(dis_inst.funct3(2), dis_inst.rs1_data, dis_inst.imm5) - val dis_slide_offset = Mux(!dis_slide1, get_max_offset(dis_uscalar), 1.U) - val dis_tail = dis_next_eidx === dis_vl - val dis_rgather_eew = Mux(dis_inst.opif6 === OPIFunct6.rgatherei16, 1.U, dis_sew) - slide := dis_slide - when (dis_slide) { - slide_up := dis_slide_up - slide1 := dis_slide1 - slide_offset := dis_slide_offset - } - perm_head := Mux(dis_slide && dis_slide_up, - 
(dis_slide_offset << dis_sew)(dLenOffBits-1,0), - 0.U) - perm_tail := Mux(dis_slide, - Mux(dis_slide_up, - Mux(dis_tail, dis_vl << dis_sew, 0.U), - (Mux(dis_next_eidx + dis_slide_offset <= dis_vlmax, dis_next_eidx, dis_vlmax - dis_slide_offset) << dis_sew)(dLenOffBits-1,0) - ), - 1.U << dis_rgather_eew) - } .elsewhen (io.iss.fire) { - valid := !tail - head := false.B - } - - when (io.acc.valid) { - acc_ready := true.B - for (i <- 0 until dLenB) when (io.acc.bits.mask(i*8)) { acc(i) := io.acc.bits.data >> (i*8) } - } - - io.vat := inst.vat - io.seq_hazard.valid := valid - io.seq_hazard.bits.rintent := hazardMultiply(rvs1_mask | rvs2_mask | rvd_mask | rvm_mask) - io.seq_hazard.bits.wintent := hazardMultiply(wvd_mask) - io.seq_hazard.bits.vat := inst.vat - - val vs1_read_oh = Mux(renv1 , UIntToOH(io.rvs1.req.bits.eg), 0.U) - val vs2_read_oh = Mux(renv2 , UIntToOH(io.rvs2.req.bits.eg), 0.U) - val vd_read_oh = Mux(renvd , UIntToOH(io.rvd.req.bits.eg ), 0.U) - val vm_read_oh = Mux(renvm , UIntToOH(io.rvm.req.bits.eg ), 0.U) - val vd_write_oh = Mux(inst.wvd, UIntToOH(io.iss.bits.wvd_eg), 0.U) - - val raw_hazard = ((vs1_read_oh | vs2_read_oh | vd_read_oh | vm_read_oh) & io.older_writes) =/= 0.U - val waw_hazard = (vd_write_oh & io.older_writes) =/= 0.U - val war_hazard = (vd_write_oh & io.older_reads) =/= 0.U - val data_hazard = raw_hazard || waw_hazard || war_hazard - - val acc_insns = supported_insns.filter(_.props.contains(Reduction.Y)) - val acc_ctrl = new VectorDecoder(inst.funct3, inst.funct6, inst.rs1, inst.rs2, acc_insns, Seq(AccInitZeros, AccInitOnes, AccInitPos, AccInitNeg)) - val acc_init_fp_pos = inst.opff6 === OPFFunct6.fredmin - val acc_init_fp_neg = inst.opff6 === OPFFunct6.fredmax - - val acc_init = Mux1H(Seq( - (acc_ctrl.bool(AccInitZeros) , 0.U(dLen.W)), - (acc_ctrl.bool(AccInitOnes) , ~(0.U(dLen.W))), - (acc_ctrl.bool(AccInitPos) , VecInit.tabulate(4)({sew => Fill(dLenB >> sew, maxPosUInt(sew))})(vd_eew)), - (acc_ctrl.bool(AccInitNeg) , 
VecInit.tabulate(4)({sew => Fill(dLenB >> sew, minNegUInt(sew))})(vd_eew)), - (acc_init_fp_pos, VecInit.tabulate(4)({sew => Fill(dLenB >> sew, maxPosFPUInt(sew))})(vd_eew)), - (acc_init_fp_neg, VecInit.tabulate(4)({sew => Fill(dLenB >> sew, minNegFPUInt(sew))})(vd_eew)), - )) - - val rgather_eidx = get_max_offset(Mux(rgather_ix && rgather, uscalar, io.perm.data & eewBitMask(vs1_eew))) - val rgather_zero = rgather_eidx >= inst.vconfig.vtype.vlMax - val rvs2_eidx = Mux(rgather || rgatherei16, rgather_eidx, eidx) - io.rvs1.req.bits.eg := getEgId(inst.rs1, eidx , vs1_eew, inst.reads_vs1_mask) - io.rvs2.req.bits.eg := getEgId(inst.rs2, rvs2_eidx, vs2_eew, inst.reads_vs2_mask) - io.rvd.req.bits.eg := getEgId(inst.rd , eidx , vs3_eew, false.B) - io.rvm.req.bits.eg := getEgId(0.U , eidx , 0.U , true.B) - - io.rvs1.req.valid := valid && renv1 - io.rvs2.req.valid := valid && renv2 - io.rvd.req.valid := valid && renvd - io.rvm.req.valid := valid && renvm - - val oldest = inst.vat === io.vat_head - io.rvs1.req.bits.oldest := oldest - io.rvs2.req.bits.oldest := oldest - io.rvd.req.bits.oldest := oldest - io.rvm.req.bits.oldest := oldest - - val read_perm_buffer = ctrl.bool(UsesPermuteSeq) && (!slide || Mux(slide_up, - next_eidx > slide_offset, - eidx +& slide_offset < inst.vconfig.vtype.vlMax)) - - io.perm.req.bits.head := perm_head - io.perm.req.bits.tail := perm_tail - - val slide_down_byte_mask = Mux(slide && !slide_up && next_eidx + slide_offset > inst.vconfig.vtype.vlMax, - Mux(eidx +& slide_offset >= inst.vconfig.vtype.vlMax, - 0.U, - ~(0.U(dLenB.W)) >> (0.U(dLenOffBits.W) - ((inst.vconfig.vtype.vlMax - slide_offset) << vs2_eew))(dLenOffBits-1,0)), - ~(0.U(dLenB.W))) - val slide_down_bit_mask = FillInterleaved(8, slide_down_byte_mask) - val iss_valid = (valid && - !data_hazard && - !(renv1 && !io.rvs1.req.ready) && - !(renv2 && !io.rvs2.req.ready) && - !(renvd && !io.rvd.req.ready) && - !(renvm && !io.rvm.req.ready) && - !(read_perm_buffer && !io.perm.req.ready) && - 
!(renacc && !acc_ready) - ) - io.perm.req.valid := iss_valid && read_perm_buffer && io.iss.ready - io.iss.valid := iss_valid && !(inst.reduction && reduction_head) - - io.iss.bits.rvs1_data := io.rvs1.resp - io.iss.bits.rvs2_data := io.rvs2.resp - io.iss.bits.rvd_data := io.rvd.resp - io.iss.bits.rvs1_elem := extractElem(io.rvs1.resp, vs1_eew, eidx) - io.iss.bits.rvs2_elem := extractElem(io.rvs2.resp, vs2_eew, eidx) - io.iss.bits.rvd_elem := extractElem(io.rvd.resp , vs3_eew, eidx) - io.iss.bits.rvs1_eew := vs1_eew - io.iss.bits.rvs2_eew := vs2_eew - io.iss.bits.rvd_eew := vs3_eew - io.iss.bits.vd_eew := vd_eew - io.iss.bits.eidx := eidx - io.iss.bits.vl := inst.vconfig.vl - io.iss.bits.wvd_eg := getEgId(inst.rd, Mux(inst.reduction, 0.U, eidx), vd_eew, inst.writes_mask) - io.iss.bits.rs1 := inst.rs1 - io.iss.bits.rs2 := inst.rs2 - io.iss.bits.rd := inst.rd - io.iss.bits.funct3 := inst.funct3 - io.iss.bits.funct6 := inst.funct6 - io.iss.bits.tail := tail - io.iss.bits.head := head - io.iss.bits.acc := inst.reduction - io.iss.bits.vat := inst.vat - io.iss.bits.vm := inst.vm - io.iss.bits.rm := inst.rm - - val dlen_mask = ~(0.U(dLenB.W)) - val head_mask = dlen_mask << (eidx << vd_eew)(dLenOffBits-1,0) - val tail_mask = dlen_mask >> (0.U(dLenOffBits.W) - (next_eidx << vd_eew)(dLenOffBits-1,0)) - val slide1up_mask = Mux(head && !inst.isOpi, eewByteMask(vs2_eew), 0.U) - val slideup_mask = Mux(slide && slide_up && eidx < slide_offset, - Mux(next_eidx <= slide_offset, 0.U, dlen_mask << (slide_offset << vd_eew)(dLenOffBits-1,0)) | slide1up_mask, - dlen_mask) - val full_tail_mask = Mux(tail, - ~(0.U(dLen.W)) >> (0.U(log2Ceil(dLen).W) - eff_vl(log2Ceil(dLen)-1,0)), - ~(0.U(dLen.W)) - ) - val vm_off = ((1 << dLenOffBits) - 1).U(log2Ceil(dLen).W) - val vm_eidx = (eidx & ~(vm_off >> vd_eew))(log2Ceil(dLen)-1,0) - val vm_resp = (io.rvm.resp >> vm_eidx)(dLenB-1,0) - val vm_mask = Mux(use_wmask, - VecInit.tabulate(4)({ sew => FillInterleaved(1 << sew, vm_resp)(dLenB-1,0) 
})(vd_eew), - ~(0.U(dLenB.W)) - ) - val acc_mask = Mux(acc_last, - eewByteMask(vd_eew), - VecInit.tabulate(log2Ceil(dLenB))(i => ~(0.U((dLen>>i).W)))(acc_tail_id)) - io.iss.bits.wmask := Mux(inst.reduction && acc_tail, - acc_mask, - head_mask & tail_mask & vm_mask & slideup_mask) - - io.iss.bits.rmask := Mux(inst.vm, ~(0.U(dLenB.W)), vm_resp) - io.iss.bits.rvm_data := Mux(inst.vm, ~(0.U(dLen.W)), io.rvm.resp) - io.iss.bits.full_tail_mask := full_tail_mask - - when (inst.funct3.isOneOf(OPIVI, OPIVX, OPMVX, OPFVF)) { - io.iss.bits.rvs1_elem := sscalar - io.iss.bits.rvs1_data := dLenSplat(Mux(ctrl.bool(ZextImm5), uscalar, sscalar), vs1_eew) - } - - when (inst.reduction) { - val acc_bits = acc.asUInt - val elementwise_acc = inst.opff6.isOneOf(OPFFunct6.fredosum, OPFFunct6.fwredosum) || ( - vParams.useScalarFPMisc.B && ctrl.bool(FPComp) && inst.isOpf - ) || ( - vParams.useScalarFPFMA.B && ctrl.bool(FPAdd) && inst.isOpf - ) - - when (elementwise_acc && !acc_tail) { - io.iss.bits.rvs2_data := io.iss.bits.rvs2_elem - val mask_bit = Mux(use_wmask, (io.rvm.resp >> eidx(log2Ceil(dLen)-1,0))(0), true.B) - io.iss.bits.wmask := VecInit.tabulate(4)({sew => Fill(1 << sew, mask_bit)})(vd_eew) - } - when (acc_tail) { - val folded = VecInit.tabulate(log2Ceil(dLenB))(i => { - val start = dLen >> (1 + i) - acc_bits(2*start-1,start) - })(acc_tail_id) - io.iss.bits.rvs1_elem := Mux(acc_copy, acc_init, folded) - io.iss.bits.rvs1_data := Mux(acc_copy, acc_init, folded) - io.iss.bits.rvs1_eew := vd_eew - io.iss.bits.rvs2_elem := acc_bits - io.iss.bits.rvs2_data := acc_bits - io.iss.bits.rvs2_eew := vd_eew - } .otherwise { - io.iss.bits.rvs1_elem := acc_bits - io.iss.bits.rvs1_data := acc_bits - io.iss.bits.rvs1_eew := vd_eew - } - } - when (rgather_v || rgatherei16) { - io.iss.bits.rvs1_elem := rgather_eidx - io.iss.bits.rvs1_data := rgather_eidx - } - when (rgather_zero && (rgather || rgatherei16)) { - io.iss.bits.rvs2_elem := 0.U - io.iss.bits.rvs2_data := 0.U - } - when (slide) { - 
io.iss.bits.rvs2_elem := io.perm.data & slide_down_bit_mask
-    io.iss.bits.rvs2_data := io.perm.data & slide_down_bit_mask
-  }
-
-  when (iss_valid && inst.reduction && reduction_head) {
-    val v0_mask = eewBitMask(vd_eew)
-    acc := ((acc_init & ~v0_mask.pad(dLen)) | (io.rvs1.resp & v0_mask)).asTypeOf(Vec(dLenB, UInt(8.W)))
-    reduction_head := false.B
-  }
-
-  when (io.iss.fire && !tail) {
-    when (next_is_new_eg(eidx, next_eidx, vd_eew, inst.writes_mask) && !inst.reduction && !compress && vParams.enableChaining.B) {
-      val wvd_clr_mask = UIntToOH(io.iss.bits.wvd_eg)
-      wvd_mask := wvd_mask & ~wvd_clr_mask
-    }
-    when (next_is_new_eg(eidx, next_eidx, vs2_eew, inst.reads_vs2_mask) && !(inst.reduction && head) && !rgather_v && !rgatherei16 && vParams.enableChaining.B) {
-      rvs2_mask := rvs2_mask & ~UIntToOH(io.rvs2.req.bits.eg)
-    }
-    when (rgather_ix && vParams.enableChaining.B) {
-      rvs2_mask := 0.U
-    }
-    when (next_is_new_eg(eidx, next_eidx, vs1_eew, inst.reads_vs1_mask) && vParams.enableChaining.B) {
-      rvs1_mask := rvs1_mask & ~UIntToOH(io.rvs1.req.bits.eg)
-    }
-    when (next_is_new_eg(eidx, next_eidx, vs3_eew, false.B) && vParams.enableChaining.B) {
-      rvd_mask := rvd_mask & ~UIntToOH(io.rvd.req.bits.eg)
-    }
-    when (next_is_new_eg(eidx, next_eidx, 0.U , true.B) && vParams.enableChaining.B) {
-      rvm_mask := rvm_mask & ~UIntToOH(io.rvm.req.bits.eg)
-    }
-    acc_ready := false.B
-    when (eidx_tail) { acc_tail := true.B }
-    when (acc_tail) { acc_tail_id := acc_tail_id + 1.U }
-    eidx := next_eidx
-
-    when (ctrl.bool(UsesPermuteSeq) && slide) {
-      val next_next_eidx = get_next_eidx(eff_vl, next_eidx, incr_eew, 0.U, increments_as_mask, ctrl.bool(Elementwise))
-      val next_tail = next_next_eidx === eff_vl
-      perm_head := Mux(slide_up,
-        Mux(next_eidx < slide_offset, (slide_offset << vs2_eew)(dLenOffBits-1,0), 0.U),
-        next_eidx << vs2_eew)
-      perm_tail := Mux(slide_up,
-        Mux(next_tail, eff_vl << vs2_eew, 0.U),
-        (Mux(next_next_eidx + slide_offset <= inst.vconfig.vtype.vlMax, next_next_eidx,
inst.vconfig.vtype.vlMax - slide_offset) << vs2_eew)(dLenOffBits-1,0))
-    }
-  }
-
-  io.busy := valid
-  io.head := head
-}
diff --git a/arch/src/main/scala/framework/gendomain/backend/IssueQueue.scala b/arch/src/main/scala/framework/gendomain/backend/IssueQueue.scala
deleted file mode 100644
index 74973a9d..00000000
--- a/arch/src/main/scala/framework/gendomain/backend/IssueQueue.scala
+++ /dev/null
@@ -1,47 +0,0 @@
-package framework.gendomain.backend
-
-import chisel3._
-import chisel3.util._
-import chisel3.experimental.dataview._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.util._
-import freechips.rocketchip.tile._
-import framework.gendomain.common._
-
-class IssueQueue(depth: Int, nSeqs: Int)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams {
-
-  val io = IO(new Bundle {
-    val enq = Flipped(Decoupled(new IssueQueueInst(nSeqs)))
-    val deq = Decoupled(new IssueQueueInst(nSeqs))
-    val hazards = Output(Vec(depth, Valid(new InstructionHazard)))
-  })
-
-  if (depth > 0) {
-    val q = Module(new DCEQueue(new IssueQueueInst(nSeqs), depth, pipe=true))
-    q.io.enq <> io.enq
-    io.deq <> q.io.deq
-
-    q.io.peek.zip(io.hazards).foreach { case (e,h) =>
-      h.valid := e.valid
-      h.bits.vat := e.bits.vat
-      val rs2 = Mux(e.bits.rs1_is_rs2, e.bits.rs1, e.bits.rs2)
-      val only_writes_vd0 = e.bits.scalar_to_vd0 || e.bits.reduction
-      val vd_lmul = Mux(only_writes_vd0 , 0.U, e.bits.emul +& e.bits.wide_vd +& e.bits.nf_log2)
-      val vs1_lmul = Mux(e.bits.reads_vs1_mask, 0.U, e.bits.emul)
-      val vs2_lmul = Mux(e.bits.reads_vs2_mask, 0.U, e.bits.emul +& e.bits.wide_vs2 +& e.bits.nf_log2)
-      val vd_arch_mask = get_arch_mask(e.bits.rd , vd_lmul )
-      val vs1_arch_mask = get_arch_mask(e.bits.rs1, vs1_lmul)
-      val vs2_arch_mask = get_arch_mask(rs2 , vs2_lmul)
-      h.bits.rintent := Seq(
-        (e.bits.renv1, vs1_arch_mask),
-        (e.bits.renv2, vs2_arch_mask),
-        (e.bits.renvd, vd_arch_mask),
-        (e.bits.renvm, 1.U)
-      ).map(t => Mux(t._1,
t._2, 0.U)).reduce(_|_)
-      h.bits.wintent := Mux(e.bits.wvd, vd_arch_mask, 0.U)
-    }
-  } else {
-    io.deq <> io.enq
-  }
-}
diff --git a/arch/src/main/scala/framework/gendomain/backend/LoadSequencer.scala b/arch/src/main/scala/framework/gendomain/backend/LoadSequencer.scala
deleted file mode 100644
index 1a05321c..00000000
--- a/arch/src/main/scala/framework/gendomain/backend/LoadSequencer.scala
+++ /dev/null
@@ -1,92 +0,0 @@
-package framework.gendomain.backend
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import framework.gendomain.common._
-
-class LoadSequencer(implicit p: Parameters) extends PipeSequencer(new LoadRespMicroOp)(p) {
-  def accepts(inst: VectorIssueInst) = inst.vmu && !inst.opcode(5)
-
-  val valid = RegInit(false.B)
-  val inst = Reg(new BackendIssueInst)
-  val eidx = Reg(UInt(log2Ceil(maxVLMax).W))
-  val sidx = Reg(UInt(3.W))
-  val wvd_mask = Reg(UInt(egsTotal.W))
-  val rvm_mask = Reg(UInt(egsPerVReg.W))
-  val head = Reg(Bool())
-
-  val renvm = !inst.vm
-  val next_eidx = get_next_eidx(inst.vconfig.vl, eidx, inst.mem_elem_size, 0.U, false.B, false.B)
-  val tail = next_eidx === inst.vconfig.vl && sidx === inst.seg_nf
-
-  io.dis.ready := !valid || (tail && io.iss.fire) && !io.dis_stall
-
-  when (io.dis.fire) {
-    val iss_inst = io.dis.bits
-    valid := true.B
-    inst := iss_inst
-    eidx := iss_inst.vstart
-    sidx := iss_inst.segstart
-
-    val wvd_arch_mask = Wire(Vec(32, Bool()))
-    for (i <- 0 until 32) {
-      val group = i.U >> iss_inst.emul
-      val rd_group = iss_inst.rd >> iss_inst.emul
-      wvd_arch_mask(i) := group >= rd_group && group <= (rd_group + iss_inst.nf)
-    }
-    wvd_mask := FillInterleaved(egsPerVReg, wvd_arch_mask.asUInt)
-    rvm_mask := Mux(!iss_inst.vm, ~(0.U(egsPerVReg.W)), 0.U)
-    head := true.B
-  } .elsewhen (io.iss.fire) {
-    valid := !tail
-    head := false.B
-  }
-
-  io.vat := inst.vat
-  io.seq_hazard.valid := valid
-  io.seq_hazard.bits.rintent := hazardMultiply(rvm_mask)
-  io.seq_hazard.bits.wintent :=
hazardMultiply(wvd_mask)
-  io.seq_hazard.bits.vat := inst.vat
-
-  val vm_read_oh = Mux(renvm, UIntToOH(io.rvm.req.bits.eg), 0.U)
-  val vd_write_oh = UIntToOH(io.iss.bits.wvd_eg)
-
-  val raw_hazard = (vm_read_oh & io.older_writes) =/= 0.U
-  val waw_hazard = (vd_write_oh & io.older_writes) =/= 0.U
-  val war_hazard = (vd_write_oh & io.older_reads) =/= 0.U
-  val data_hazard = raw_hazard || waw_hazard || war_hazard
-
-  io.rvm.req.valid := valid && renvm
-  io.rvm.req.bits.eg := getEgId(0.U, eidx, 0.U, true.B)
-  io.rvm.req.bits.oldest := inst.vat === io.vat_head
-
-  io.iss.valid := valid && !data_hazard && (!renvm || io.rvm.req.ready)
-  io.iss.bits.wvd_eg := getEgId(inst.rd + (sidx << inst.emul), eidx, inst.mem_elem_size, false.B)
-  io.iss.bits.tail := tail
-  io.iss.bits.vat := inst.vat
-  io.iss.bits.debug_id := inst.debug_id
-
-  val head_mask = get_head_mask(~(0.U(dLenB.W)), eidx , inst.mem_elem_size)
-  val tail_mask = get_tail_mask(~(0.U(dLenB.W)), next_eidx, inst.mem_elem_size)
-  val vm_mask = Mux(!renvm, ~(0.U(dLenB.W)), get_vm_mask(io.rvm.resp, eidx, inst.mem_elem_size))
-  io.iss.bits.wmask := Mux(sidx > inst.segend && inst.seg_nf =/= 0.U, 0.U, head_mask & tail_mask & vm_mask)
-
-  when (io.iss.fire && !tail) {
-    when (next_is_new_eg(eidx, next_eidx, inst.mem_elem_size, false.B) && vParams.enableChaining.B) {
-      wvd_mask := wvd_mask & ~vd_write_oh
-    }
-    when (next_is_new_eg(eidx, next_eidx, 0.U, true.B) && vParams.enableChaining.B) {
-      rvm_mask := rvm_mask & ~UIntToOH(io.rvm.req.bits.eg)
-    }
-    when (sidx === inst.seg_nf) {
-      sidx := 0.U
-      eidx := next_eidx
-    } .otherwise {
-      sidx := sidx + 1.U
-    }
-  }
-
-  io.busy := valid
-  io.head := head
-}
diff --git a/arch/src/main/scala/framework/gendomain/backend/PermuteSequencer.scala b/arch/src/main/scala/framework/gendomain/backend/PermuteSequencer.scala
deleted file mode 100644
index c91311f9..00000000
--- a/arch/src/main/scala/framework/gendomain/backend/PermuteSequencer.scala
+++ /dev/null
@@ -1,111 +0,0 @@
-package
framework.gendomain.backend
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import framework.gendomain.common._
-import framework.gendomain.insns._
-
-class PermuteSequencer(exu_insns: Seq[VectorInstruction])(implicit p: Parameters) extends PipeSequencer(new PermuteMicroOp)(p) {
-  def accepts(inst: VectorIssueInst) = {
-    val needs_mask = inst.vmu && (!inst.vm && inst.mop =/= mopUnit)
-    val needs_index = inst.vmu && inst.mop(0)
-    val arith = !inst.vmu && new VectorDecoder(inst.funct3, inst.funct6, inst.rs1, inst.rs2, exu_insns.filter(_.props.contains(UsesPermuteSeq.Y)), Nil).matched
-    needs_mask || needs_index || arith
-  }
-
-  val valid = RegInit(false.B)
-  val inst = Reg(new BackendIssueInst)
-  val eidx = Reg(UInt(log2Ceil(maxVLMax).W))
-  val rvs2_mask = Reg(UInt(egsTotal.W))
-  val rvm_mask = Reg(UInt(egsPerVReg.W))
-  val head = Reg(Bool())
-  val slide_offset = Reg(UInt((1+log2Ceil(maxVLMax)).W))
-  val slide = !inst.vmu && inst.funct3 =/= OPIVV
-  val slide_up = !inst.funct6(0)
-  val rs2 = Mux(inst.rs1_is_rs2, inst.rs1, inst.rs2)
-  val gatherei16 = inst.funct3 === OPIVV && inst.opif6 === OPIFunct6.rgatherei16
-
-  val renvm = inst.renvm
-  val renv2 = inst.renv2
-  val incr_eew = Mux(inst.vmu, inst.mem_idx_size,
-    Mux(gatherei16, 1.U, inst.vconfig.vtype.vsew))
-  val eff_vl = Mux(slide,
-    Mux(slide_up, inst.vconfig.vl - slide_offset, min(inst.vconfig.vtype.vlMax, inst.vconfig.vl + slide_offset)),
-    inst.vconfig.vl
-  )(log2Ceil(maxVLMax),0)
-  val next_eidx = get_next_eidx(eff_vl, eidx, incr_eew, 0.U, false.B, false.B)
-  val tail = next_eidx === eff_vl
-
-  io.dis.ready := !valid || (tail && io.iss.fire) && !io.dis_stall
-
-  when (io.dis.fire) {
-    val iss_inst = io.dis.bits
-    val offset = Mux(iss_inst.isOpi, get_max_offset(Mux(iss_inst.funct3(2), iss_inst.rs1_data, iss_inst.imm5)), 1.U)
-    val slide = !iss_inst.vmu && iss_inst.funct3 =/= OPIVV
-    val slide_up = !iss_inst.funct6(0)
-    val slide_start = Mux(slide_up, 0.U, offset)
-    val vlmax =
iss_inst.vconfig.vtype.vlMax
-    val slide_no_read = Mux(slide_up,
-      iss_inst.vconfig.vl <= offset,
-      offset >= vlmax)
-    valid := Mux(!slide, true.B, !slide_no_read)
-    inst := iss_inst
-    eidx := Mux(!slide, iss_inst.vstart, slide_start)
-    slide_offset := offset
-
-    val rs2 = Mux(iss_inst.rs1_is_rs2, iss_inst.rs1, iss_inst.rs2)
-    val renv2_arch_mask = get_arch_mask(rs2, iss_inst.emul)
-    rvs2_mask := Mux(iss_inst.renv2, FillInterleaved(egsPerVReg, renv2_arch_mask), 0.U)
-    rvm_mask := Mux(iss_inst.renvm, ~(0.U(egsPerVReg.W)), 0.U)
-    head := true.B
-  } .elsewhen (io.iss.fire) {
-    valid := !tail
-    head := false.B
-  }
-
-  io.vat := inst.vat
-  io.seq_hazard.valid := valid
-  io.seq_hazard.bits.rintent := hazardMultiply(rvs2_mask | rvm_mask)
-  io.seq_hazard.bits.wintent := false.B
-  io.seq_hazard.bits.vat := inst.vat
-
-  val vs2_read_oh = Mux(renv2, UIntToOH(io.rvs2.req.bits.eg), 0.U)
-  val vm_read_oh = Mux(renvm, UIntToOH(io.rvm.req.bits.eg), 0.U)
-
-  val raw_hazard = ((vm_read_oh | vs2_read_oh) & io.older_writes) =/= 0.U
-  val data_hazard = raw_hazard
-
-  val oldest = inst.vat === io.vat_head
-
-  io.rvs2.req.valid := valid && renv2
-  io.rvs2.req.bits.eg := getEgId(rs2, eidx, incr_eew, false.B)
-  io.rvs2.req.bits.oldest := oldest
-  io.rvm.req.valid := valid && renvm
-  io.rvm.req.bits.eg := getEgId(0.U, eidx, 0.U, true.B)
-  io.rvm.req.bits.oldest := oldest
-
-  io.iss.valid := valid && !data_hazard && (!renvm || io.rvm.req.ready) && (!renv2 || io.rvs2.req.ready)
-  io.iss.bits.renv2 := renv2
-  io.iss.bits.renvm := renvm
-  io.iss.bits.rvs2_data := io.rvs2.resp
-  io.iss.bits.rvs2_eew := incr_eew
-  io.iss.bits.eidx := eidx
-  io.iss.bits.vl := eff_vl
-  io.iss.bits.rvm_data := Mux(renvm, io.rvm.resp, ~(0.U(dLen.W)))
-  io.iss.bits.vmu := inst.vmu
-  io.iss.bits.tail := tail
-
-  when (io.iss.fire && !tail) {
-    when (next_is_new_eg(eidx, next_eidx, incr_eew, false.B) && vParams.enableChaining.B) {
-      rvs2_mask := rvs2_mask & ~vs2_read_oh
-    }
-    when (next_is_new_eg(eidx, next_eidx, 0.U,
true.B) && vParams.enableChaining.B) {
-      rvm_mask := rvm_mask & ~UIntToOH(io.rvm.req.bits.eg)
-    }
-    eidx := next_eidx
-  }
-
-  io.busy := valid
-  io.head := head
-}
diff --git a/arch/src/main/scala/framework/gendomain/backend/PipeSequencer.scala b/arch/src/main/scala/framework/gendomain/backend/PipeSequencer.scala
deleted file mode 100644
index cd9db257..00000000
--- a/arch/src/main/scala/framework/gendomain/backend/PipeSequencer.scala
+++ /dev/null
@@ -1,74 +0,0 @@
-package framework.gendomain.common
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.tile.{CoreModule}
-import framework.gendomain.common._
-
-abstract class PipeSequencer[T <: Data](issType: T)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams {
-
-  val io = IO(new Bundle {
-    val dis = Flipped(Decoupled(new BackendIssueInst))
-    val dis_stall = Input(Bool()) // used to disable OOO
-
-    val seq_hazard = Output(Valid(new SequencerHazard))
-
-    val vat = Output(UInt(vParams.vatSz.W))
-    val vat_head = Input(UInt(vParams.vatSz.W))
-    val older_writes = Input(UInt(egsTotal.W))
-    val older_reads = Input(UInt(egsTotal.W))
-
-    val busy = Output(Bool())
-    val head = Output(Bool())
-
-    val rvs1 = new VectorReadIO
-    val rvs2 = new VectorReadIO
-    val rvd = new VectorReadIO
-    val rvm = new VectorReadIO
-    val perm = new Bundle {
-      val req = Decoupled(new CompactorReq(dLenB))
-      val data = Input(UInt(dLen.W))
-    }
-
-    val iss = Decoupled(issType)
-
-    val acc = Input(Valid(new VectorWrite(dLen)))
-  })
-  def accepts(inst: VectorIssueInst): Bool
-
-  def min(a: UInt, b: UInt) = Mux(a > b, b, a)
-  def get_max_offset(offset: UInt): UInt = min(offset, maxVLMax.U)(log2Ceil(maxVLMax),0)
-  def get_head_mask(bit_mask: UInt, eidx: UInt, eew: UInt) = bit_mask << (eidx << eew)(dLenOffBits-1,0)
-  def get_tail_mask(bit_mask: UInt, eidx: UInt, eew: UInt) = bit_mask >> (0.U(dLenOffBits.W) - (eidx << eew)(dLenOffBits-1,0))
-  def get_vm_mask(mask_resp: UInt, eidx:
UInt, eew: UInt) = {
-    val vm_off = ((1 << dLenOffBits) - 1).U(log2Ceil(dLen).W)
-    val vm_eidx = (eidx & ~(vm_off >> eew))(log2Ceil(dLen)-1,0)
-    val vm_resp = (mask_resp >> vm_eidx)(dLenB-1,0)
-    Mux1H(UIntToOH(eew), (0 until 4).map { w => FillInterleaved(1 << w, vm_resp) })
-  }
-  def get_next_eidx(vl: UInt, eidx: UInt, eew: UInt, sub_dlen: UInt, reads_mask: Bool, elementwise: Bool) = {
-    val next = Wire(UInt((1+log2Ceil(maxVLMax)).W))
-    next := Mux(elementwise, eidx +& 1.U, Mux(reads_mask,
-      eidx +& dLen.U,
-      (((eidx >> (dLenOffBits.U - eew - sub_dlen)) +& 1.U) << (dLenOffBits.U - eew - sub_dlen))
-    ))
-    min(vl, next)
-  }
-  def next_is_new_eg(eidx: UInt, next_eidx: UInt, eew: UInt, masked: Bool) = {
-    val offset = Mux(masked, log2Ceil(dLen).U, dLenOffBits.U - eew)
-    (next_eidx >> offset) =/= (eidx >> offset)
-  }
-
-
-  io.rvs1.req.valid := false.B
-  io.rvs1.req.bits := DontCare
-  io.rvs2.req.valid := false.B
-  io.rvs2.req.bits := DontCare
-  io.rvd.req.valid := false.B
-  io.rvd.req.bits := DontCare
-  io.rvm.req.valid := false.B
-  io.rvm.req.bits := DontCare
-  io.perm.req.valid := false.B
-  io.perm.req.bits := DontCare
-}
diff --git a/arch/src/main/scala/framework/gendomain/backend/RegisterFile.scala b/arch/src/main/scala/framework/gendomain/backend/RegisterFile.scala
deleted file mode 100644
index ccd37d1f..00000000
--- a/arch/src/main/scala/framework/gendomain/backend/RegisterFile.scala
+++ /dev/null
@@ -1,180 +0,0 @@
-package framework.gendomain.backend
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.tile.{CoreModule}
-import freechips.rocketchip.util._
-import framework.gendomain.common._
-
-class OldestRRArbiter(val n: Int)(implicit p: Parameters) extends Module {
-  val io = IO(new ArbiterIO(new VectorReadReq, n))
-
-  val arb = Module(new RRArbiter(new VectorReadReq, n))
-  io <> arb.io
-  val oldest_oh = io.in.map(i => i.valid && i.bits.oldest)
-  //assert(PopCount(oldest_oh) <= 1.U)
-  when
(oldest_oh.orR) {
-    io.chosen := VecInit(oldest_oh).asUInt
-    io.out.valid := true.B
-    io.out.bits := Mux1H(oldest_oh, io.in.map(_.bits))
-    for (i <- 0 until n) {
-      io.in(i).ready := oldest_oh(i) && io.out.ready
-    }
-  }
-}
-
-class RegisterReadXbar(n: Int, banks: Int)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams {
-  val io = IO(new Bundle {
-    val in = Vec(n, Flipped(new VectorReadIO))
-    val out = Vec(banks, new VectorReadIO)
-  })
-
-  val arbs = Seq.fill(banks) { Module(new OldestRRArbiter(n)) }
-  for (i <- 0 until banks) {
-    io.out(i).req <> arbs(i).io.out
-  }
-
-  val bankOffset = log2Ceil(banks)
-
-  for (i <- 0 until n) {
-    val bank_sel = if (bankOffset == 0) true.B else UIntToOH(io.in(i).req.bits.eg(bankOffset-1,0))
-    for (j <- 0 until banks) {
-      arbs(j).io.in(i).valid := io.in(i).req.valid && bank_sel(j)
-      arbs(j).io.in(i).bits.eg := io.in(i).req.bits.eg >> bankOffset
-      arbs(j).io.in(i).bits.oldest := io.in(i).req.bits.oldest
-    }
-    io.in(i).req.ready := Mux1H(bank_sel, arbs.map(_.io.in(i).ready))
-    io.in(i).resp := Mux1H(bank_sel, io.out.map(_.resp))
-  }
-}
-
-class RegisterFileBank(reads: Int, maskReads: Int, rows: Int, maskRows: Int)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams {
-  val io = IO(new Bundle {
-    val read = Vec(reads, Flipped(new VectorReadIO))
-    val mask_read = Vec(maskReads, Flipped(new VectorReadIO))
-    val write = Input(Valid(new VectorWrite(dLen)))
-    val ll_write = Flipped(Decoupled(new VectorWrite(dLen)))
-  })
-  val ll_write_valid = RegInit(false.B)
-  val ll_write_bits = Reg(new VectorWrite(dLen))
-
-  val vrf = Mem(rows, Vec(dLen, Bool()))
-  val v0_mask = Mem(maskRows, Vec(dLen, Bool()))
-  for (read <- io.read) {
-    read.req.ready := !(ll_write_valid && read.req.bits.eg === ll_write_bits.eg)
-    read.resp := DontCare
-    when (read.req.valid) {
-      read.resp := vrf.read(read.req.bits.eg).asUInt
-    }
-  }
-  for (mask_read <- io.mask_read) {
-    mask_read.req.ready := !(ll_write_valid && mask_read.req.bits.eg
=== ll_write_bits.eg)
-    mask_read.resp := DontCare
-    when (mask_read.req.valid) {
-      mask_read.resp := v0_mask.read(mask_read.req.bits.eg).asUInt
-    }
-  }
-
-  val write = WireInit(io.write)
-  io.ll_write.ready := false.B
-  if (vParams.vrfHiccupBuffer) {
-    when (!io.write.valid) { // drain hiccup buffer
-      write.valid := ll_write_valid || io.ll_write.valid
-      write.bits := Mux(ll_write_valid, ll_write_bits, io.ll_write.bits)
-      ll_write_valid := false.B
-      when (io.ll_write.valid && ll_write_valid) {
-        ll_write_valid := true.B
-        ll_write_bits := io.ll_write.bits
-      }
-      io.ll_write.ready := true.B
-    } .elsewhen (!ll_write_valid) { // fill hiccup buffer
-      when (io.ll_write.valid) {
-        ll_write_valid := true.B
-        ll_write_bits := io.ll_write.bits
-      }
-      io.ll_write.ready := true.B
-    }
-  } else {
-    when (!io.write.valid) {
-      io.ll_write.ready := true.B
-      write.valid := io.ll_write.valid
-      write.bits := io.ll_write.bits
-    }
-  }
-
-  when (write.valid) {
-    vrf.write(
-      write.bits.eg,
-      VecInit(write.bits.data.asBools),
-      write.bits.mask.asBools)
-    when (write.bits.eg < maskRows.U) {
-      v0_mask.write(
-        write.bits.eg,
-        VecInit(write.bits.data.asBools),
-        write.bits.mask.asBools)
-    }
-  }
-}
-
-class RegisterFile(reads: Seq[Int], maskReads: Seq[Int], pipeWrites: Int, llWrites: Int)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams {
-
-  val nBanks = vParams.vrfBanking
-  // Support 1, 2, and 4 banks for the VRF
-  require(nBanks == 1 || nBanks == 2 || nBanks == 4)
-
-  val io = IO(new Bundle {
-    val read = MixedVec(reads.map(rc => Vec(rc, Flipped(new VectorReadIO))))
-    val mask_read = MixedVec(maskReads.map(rc => Vec(rc, Flipped(new VectorReadIO))))
-
-    val pipe_writes = Vec(pipeWrites, Input(Valid(new VectorWrite(dLen))))
-    val ll_writes = Vec(llWrites, Flipped(Decoupled(new VectorWrite(dLen))))
-  })
-
-  val vrf = Seq.fill(nBanks) { Module(new RegisterFileBank(reads.size, maskReads.size, egsTotal/nBanks, if (egsPerVReg < nBanks) 1 else egsPerVReg / nBanks)) }
-
-
reads.zipWithIndex.foreach { case (rc, i) =>
-    val xbar = Module(new RegisterReadXbar(rc, nBanks))
-    vrf.zipWithIndex.foreach { case (bank, j) =>
-      bank.io.read(i) <> xbar.io.out(j)
-    }
-    xbar.io.in <> io.read(i)
-  }
-
-  maskReads.zipWithIndex.foreach { case (rc, i) =>
-    val mask_xbar = Module(new RegisterReadXbar(rc, nBanks))
-    vrf.zipWithIndex.foreach { case (bank, j) =>
-      bank.io.mask_read(i) <> mask_xbar.io.out(j)
-    }
-    mask_xbar.io.in <> io.mask_read(i)
-  }
-
-  io.ll_writes.foreach(_.ready := false.B)
-
-  vrf.zipWithIndex.foreach { case (rf, i) =>
-    val bank_match = io.pipe_writes.map { w => (w.bits.bankId === i.U) && w.valid }
-    val bank_write_data = Mux1H(bank_match, io.pipe_writes.map(_.bits.data))
-    val bank_write_mask = Mux1H(bank_match, io.pipe_writes.map(_.bits.mask))
-    val bank_write_eg = Mux1H(bank_match, io.pipe_writes.map(_.bits.eg))
-    val bank_write_valid = bank_match.orR
-
-    rf.io.write.valid := bank_write_valid
-    rf.io.write.bits.data := bank_write_data
-    rf.io.write.bits.mask := bank_write_mask
-    rf.io.write.bits.eg := bank_write_eg >> vrfBankBits
-    when (bank_write_valid) { PopCount(bank_match) === 1.U }
-
-    val ll_arb = Module(new Arbiter(new VectorWrite(dLen), llWrites))
-    rf.io.ll_write <> ll_arb.io.out
-
-    io.ll_writes.zipWithIndex.foreach { case (w, j) =>
-      ll_arb.io.in(j).valid := w.valid && w.bits.bankId === i.U
-      ll_arb.io.in(j).bits.eg := w.bits.eg >> vrfBankBits
-      ll_arb.io.in(j).bits.data := w.bits.data
-      ll_arb.io.in(j).bits.mask := w.bits.mask
-      when (ll_arb.io.in(j).ready && w.bits.bankId === i.U) {
-        w.ready := true.B
-      }
-    }
-  }
-}
diff --git a/arch/src/main/scala/framework/gendomain/backend/StoreSequencer.scala b/arch/src/main/scala/framework/gendomain/backend/StoreSequencer.scala
deleted file mode 100644
index 5f3bfbba..00000000
--- a/arch/src/main/scala/framework/gendomain/backend/StoreSequencer.scala
+++ /dev/null
@@ -1,98 +0,0 @@
-package framework.gendomain.backend
-
-import chisel3._
-import chisel3.util._
-import
org.chipsalliance.cde.config._
-import framework.gendomain.common._
-
-class StoreSequencer(implicit p: Parameters) extends PipeSequencer(new StoreDataMicroOp)(p) {
-  def accepts(inst: VectorIssueInst) = inst.vmu && inst.opcode(5)
-
-  val valid = RegInit(false.B)
-  val inst = Reg(new VectorIssueInst)
-  val eidx = Reg(UInt(log2Ceil(maxVLMax).W))
-  val sidx = Reg(UInt(3.W))
-  val rvd_mask = Reg(UInt(egsTotal.W))
-  val rvm_mask = Reg(UInt(egsPerVReg.W))
-  val sub_dlen = Reg(UInt(2.W))
-  val head = Reg(Bool())
-
-  val renvm = !inst.vm && inst.mop === mopUnit
-  val next_eidx = get_next_eidx(inst.vconfig.vl, eidx, inst.mem_elem_size, sub_dlen, false.B, false.B)
-  val tail = next_eidx === inst.vconfig.vl && sidx === inst.seg_nf
-
-  io.dis.ready := !valid || (tail && io.iss.fire) && !io.dis_stall
-
-  when (io.dis.fire) {
-    val iss_inst = io.dis.bits
-    valid := true.B
-    inst := iss_inst
-    eidx := iss_inst.vstart
-    sidx := 0.U
-
-    val rvd_arch_mask = Wire(Vec(32, Bool()))
-    for (i <- 0 until 32) {
-      val group = i.U >> iss_inst.emul
-      val rd_group = iss_inst.rd >> iss_inst.emul
-      rvd_arch_mask(i) := group >= rd_group && group <= (rd_group + iss_inst.nf)
-    }
-    rvd_mask := FillInterleaved(egsPerVReg, rvd_arch_mask.asUInt)
-    rvm_mask := Mux(!iss_inst.vm, ~(0.U(egsPerVReg.W)), 0.U)
-    sub_dlen := Mux(iss_inst.seg_nf =/= 0.U && (dLenOffBits.U > (3.U +& iss_inst.mem_elem_size)),
-      dLenOffBits.U - 3.U - iss_inst.mem_elem_size,
-      0.U)
-    head := true.B
-  } .elsewhen (io.iss.fire) {
-    valid := !tail
-    head := false.B
-  }
-
-  io.vat := inst.vat
-  io.seq_hazard.valid := valid
-  io.seq_hazard.bits.rintent := hazardMultiply(rvd_mask | rvm_mask)
-  io.seq_hazard.bits.wintent := 0.U
-  io.seq_hazard.bits.vat := inst.vat
-
-  val vd_read_oh = UIntToOH(io.rvd.req.bits.eg)
-  val vm_read_oh = Mux(renvm, UIntToOH(io.rvm.req.bits.eg), 0.U)
-
-  val raw_hazard = ((vm_read_oh | vd_read_oh) & io.older_writes) =/= 0.U
-  val data_hazard = raw_hazard
-
-  val oldest = inst.vat === io.vat_head
-
-
io.rvd.req.valid := valid && io.iss.ready
-  io.rvd.req.bits.eg := getEgId(inst.rd + (sidx << inst.emul), eidx, inst.mem_elem_size, false.B)
-  io.rvd.req.bits.oldest := oldest
-  io.rvm.req.valid := valid && renvm && io.iss.ready
-  io.rvm.req.bits.eg := getEgId(0.U, eidx, 0.U, true.B)
-  io.rvm.req.bits.oldest := oldest
-
-  io.iss.valid := valid && !data_hazard && (!renvm || io.rvm.req.ready) && io.rvd.req.ready
-  io.iss.bits.stdata := io.rvd.resp
-  val head_mask = get_head_mask(~(0.U(dLenB.W)), eidx , inst.mem_elem_size)
-  val tail_mask = get_tail_mask(~(0.U(dLenB.W)), next_eidx, inst.mem_elem_size)
-  val vm_mask = Mux(!renvm, ~(0.U(dLenB.W)), get_vm_mask(io.rvm.resp, eidx, inst.mem_elem_size))
-  io.iss.bits.stmask := vm_mask
-  io.iss.bits.debug_id := inst.debug_id
-  io.iss.bits.tail := tail
-  io.iss.bits.vat := inst.vat
-
-  when (io.iss.fire && !tail) {
-    when (next_is_new_eg(eidx, next_eidx, inst.mem_elem_size, false.B) && vParams.enableChaining.B) {
-      rvd_mask := rvd_mask & ~UIntToOH(io.rvd.req.bits.eg)
-    }
-    when (next_is_new_eg(eidx, next_eidx, 0.U, true.B) && vParams.enableChaining.B) {
-      rvm_mask := rvm_mask & ~UIntToOH(io.rvm.req.bits.eg)
-    }
-    when (sidx === inst.seg_nf) {
-      sidx := 0.U
-      eidx := next_eidx
-    } .otherwise {
-      sidx := sidx + 1.U
-    }
-  }
-
-  io.busy := valid
-  io.head := head
-}
diff --git a/arch/src/main/scala/framework/gendomain/common/Bundles.scala b/arch/src/main/scala/framework/gendomain/common/Bundles.scala
deleted file mode 100644
index b70fab7f..00000000
--- a/arch/src/main/scala/framework/gendomain/common/Bundles.scala
+++ /dev/null
@@ -1,265 +0,0 @@
-package framework.gendomain.common
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.util._
-import freechips.rocketchip.tile._
-
-class VectorMemMacroOp(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val debug_id = UInt(debugIdSz.W)
-
-  val base_offset =
UInt(pgIdxBits.W)
-  val page = UInt((paddrBits - pgIdxBits).W)
-  val stride = UInt(pgIdxBits.W)
-
-  val segstart = UInt(3.W)
-  val segend = UInt(3.W)
-  val vstart = UInt(log2Ceil(maxVLMax).W)
-  val vl = UInt((1+log2Ceil(maxVLMax)).W)
-
-  val mop = UInt(2.W)
-  val vm = Bool()
-  val nf = UInt(3.W)
-
-  val idx_size = UInt(2.W)
-  val elem_size = UInt(2.W)
-  val whole_reg = Bool()
-  val store = Bool()
-  val fast_sg = Bool()
-
-  def indexed = !mop.isOneOf(mopUnit, mopStrided)
-  def seg_nf = Mux(whole_reg, 0.U, nf)
-  def wr_nf = Mux(whole_reg, nf, 0.U)
-}
-
-
-class VectorIssueInst(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val pc = UInt(vaddrBitsExtended.W)
-  val bits = UInt(32.W)
-  val vconfig = new VConfig
-
-  val vstart = UInt(log2Ceil(maxVLMax).W)
-  val segstart = UInt(3.W)
-  val segend = UInt(3.W)
-  val rs1_data = UInt(xLen.W)
-  val rs2_data = UInt(xLen.W)
-  val page = UInt((paddrBits - pgIdxBits).W)
-  val vat = UInt(vParams.vatSz.W)
-  val rm = UInt(3.W)
-  val emul = UInt(2.W)
-  val fast_sg = Bool()
-  val debug_id = UInt(debugIdSz.W)
-  val mop = UInt(2.W) // stored separately from bits since dispatch may need to set this
-
-  def opcode = bits(6,0)
-  def store = opcode(5)
-  def mem_idx_size = bits(13,12)
-  def mem_elem_size = Mux(mop(0), vconfig.vtype.vsew, bits(13,12))
-  def vm = bits(25)
-  def orig_mop = bits(27,26)
-  def umop = bits(24,20)
-  def nf = bits(31,29)
-  def wr = orig_mop === mopUnit && umop === lumopWhole
-  def seg_nf = Mux(wr, 0.U, nf)
-  def wr_nf = Mux(wr, nf, 0.U)
-  def vmu = opcode.isOneOf(opcLoad, opcStore)
-  def rs1 = bits(19,15)
-  def rs2 = bits(24,20)
-  def rd = bits(11,7)
-  def may_write_v0 = rd === 0.U && opcode =/= opcStore
-  def funct3 = bits(14,12)
-  def imm5 = bits(19,15)
-  def imm5_sext = Cat(Fill(59, imm5(4)), imm5)
-  def funct6 = bits(31,26)
-  def writes_xrf = !vmu && ((funct3 === OPMVV && opmf6 === OPMFunct6.wrxunary0) || (funct3 === OPFVV && opff6 === OPFFunct6.wrfunary0))
-  def writes_frf = !vmu &&
(funct3 === OPFVV)
-
-  def isOpi = funct3.isOneOf(OPIVV, OPIVI, OPIVX)
-  def isOpm = funct3.isOneOf(OPMVV, OPMVX)
-  def isOpf = funct3.isOneOf(OPFVV, OPFVF)
-
-  def opmf6 = Mux(isOpm, OPMFunct6(funct6), OPMFunct6.illegal)
-  def opif6 = Mux(isOpi, OPIFunct6(funct6), OPIFunct6.illegal)
-  def opff6 = Mux(isOpf, OPFFunct6(funct6), OPFFunct6.illegal)
-}
-
-class BackendIssueInst(implicit p: Parameters) extends VectorIssueInst()(p) {
-  val reduction = Bool() // accumulates into vd[0]
-  val scalar_to_vd0 = Bool() // mv scalar to vd[0]
-  val wide_vd = Bool() // vd reads/writes at 2xSEW
-  val wide_vs2 = Bool() // vs2 reads at 2xSEW
-  val writes_mask = Bool() // writes dest as a mask
-  val reads_vs1_mask = Bool() // vs1 read as mask
-  val reads_vs2_mask = Bool() // vs2 read as mask
-  val rs1_is_rs2 = Bool()
-  val nf_log2 = UInt(2.W)
-
-  val renv1 = Bool()
-  val renv2 = Bool()
-  val renvd = Bool()
-  val renvm = Bool()
-  val wvd = Bool()
-}
-
-class IssueQueueInst(nSeqs: Int)(implicit p: Parameters) extends BackendIssueInst()(p) {
-  val seq = UInt(nSeqs.W)
-}
-
-class VectorWrite(writeBits: Int)(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val eg = UInt(log2Ceil(32 * vLen / writeBits).W)
-  def bankId = if (vrfBankBits == 0) 0.U else eg(vrfBankBits-1,0)
-  val data = UInt(writeBits.W)
-  val mask = UInt(writeBits.W)
-}
-
-class ScalarWrite extends Bundle {
-  val data = UInt(64.W)
-  val fp = Bool()
-  val size = UInt(2.W)
-  val rd = UInt(5.W)
-}
-
-class VectorReadReq(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val eg = UInt(log2Ceil(egsTotal).W)
-  val oldest = Bool()
-}
-
-class VectorReadIO(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val req = Decoupled(new VectorReadReq)
-  val resp = Input(UInt(dLen.W))
-}
-
-class VectorIndexAccessIO(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val ready = Output(Bool())
-  val valid = Input(Bool())
-  val vrs = Input(UInt(5.W))
-  val
eidx = Input(UInt((1+log2Ceil(maxVLMax)).W))
-  val eew = Input(UInt(2.W))
-  val idx = Output(UInt(64.W))
-}
-
-class VectorMaskAccessIO(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val ready = Output(Bool())
-  val valid = Input(Bool())
-  val eidx = Input(UInt((1+log2Ceil(maxVLMax)).W))
-  val mask = Output(Bool())
-}
-
-class MaskedByte(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val debug_id = UInt(debugIdSz.W)
-  val data = UInt(8.W)
-  val mask = Bool()
-}
-
-class ExecuteMicroOp(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val eidx = UInt(log2Ceil(maxVLMax).W)
-  val vl = UInt((1+log2Ceil(maxVLMax)).W)
-
-  val rvs1_data = UInt(dLen.W)
-  val rvs2_data = UInt(dLen.W)
-  val rvd_data = UInt(dLen.W)
-  val rvm_data = UInt(dLen.W)
-
-  val rvs1_elem = UInt(64.W)
-  val rvs2_elem = UInt(64.W)
-  val rvd_elem = UInt(64.W)
-
-  val rvs1_eew = UInt(2.W)
-  val rvs2_eew = UInt(2.W)
-  val rvd_eew = UInt(2.W)
-  val vd_eew = UInt(2.W)
-
-  val rmask = UInt(dLenB.W)
-  val wmask = UInt(dLenB.W)
-
-  val full_tail_mask = UInt(dLen.W)
-
-  val wvd_eg = UInt(log2Ceil(egsTotal).W)
-
-  val funct3 = UInt(3.W)
-  def isOpi = funct3.isOneOf(OPIVV, OPIVI, OPIVX)
-  def isOpm = funct3.isOneOf(OPMVV, OPMVX)
-  def isOpf = funct3.isOneOf(OPFVV, OPFVF)
-
-  def opmf6 = Mux(isOpm, OPMFunct6(funct6), OPMFunct6.illegal)
-  def opif6 = Mux(isOpi, OPIFunct6(funct6), OPIFunct6.illegal)
-  def opff6 = Mux(isOpf, OPFFunct6(funct6), OPFFunct6.illegal)
-
-  def vd_eew8 = vd_eew === 0.U
-  def vd_eew16 = vd_eew === 1.U
-  def vd_eew32 = vd_eew === 2.U
-  def vd_eew64 = vd_eew === 3.U
-
-  val funct6 = UInt(6.W)
-  val rs1 = UInt(5.W)
-  val rs2 = UInt(5.W)
-  val rd = UInt(5.W)
-  val vm = Bool()
-
-  val head = Bool()
-  val tail = Bool()
-  val vat = UInt(vParams.vatSz.W)
-  val acc = Bool()
-
-  val rm = UInt(3.W)
-  def vxrm = rm(1,0)
-  def frm = rm
-}
-
-class StoreDataMicroOp(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams
{
-  val stdata = UInt(dLen.W)
-  val stmask = UInt(dLenB.W)
-  val debug_id = UInt(debugIdSz.W)
-  val tail = Bool()
-  val vat = UInt(vParams.vatSz.W)
-  def asMaskedBytes = {
-    val bytes = Wire(Vec(dLenB, new MaskedByte))
-    for (i <- 0 until dLenB) {
-      bytes(i).data := stdata(((i+1)*8)-1,i*8)
-      bytes(i).mask := stmask(i)
-      bytes(i).debug_id := debug_id
-    }
-    bytes
-  }
-}
-
-class LoadRespMicroOp(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val wvd_eg = UInt(log2Ceil(egsTotal).W)
-  val wmask = UInt(dLenB.W)
-  val tail = Bool()
-  val debug_id = UInt(debugIdSz.W)
-  val vat = UInt(vParams.vatSz.W)
-}
-
-class PermuteMicroOp(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val renv2 = Bool()
-  val renvm = Bool()
-  val rvs2_data = UInt(dLen.W)
-  val eidx = UInt(log2Ceil(maxVLMax).W)
-  val rvs2_eew = UInt(2.W)
-  val rvm_data = UInt(dLen.W)
-  val vmu = Bool()
-  val vl = UInt((1+log2Ceil(maxVLMax)).W)
-  val tail = Bool()
-}
-
-class PipeHazard(pipe_depth: Int)(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val latency = UInt(log2Ceil(pipe_depth).W)
-  val eg = UInt(log2Ceil(egsTotal).W)
-  def eg_oh = UIntToOH(eg)
-}
-
-class SequencerHazard(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val vat = UInt(vParams.vatSz.W)
-  val rintent = UInt(egsTotal.W)
-  val wintent = UInt(egsTotal.W)
-}
-
-
-class InstructionHazard(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val vat = UInt(vParams.vatSz.W)
-  val rintent = UInt(32.W)
-  val wintent = UInt(32.W)
-}
diff --git a/arch/src/main/scala/framework/gendomain/common/Compactor.scala b/arch/src/main/scala/framework/gendomain/common/Compactor.scala
deleted file mode 100644
index 49ae19ba..00000000
--- a/arch/src/main/scala/framework/gendomain/common/Compactor.scala
+++ /dev/null
@@ -1,73 +0,0 @@
-package framework.gendomain.common
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ - -class CompactorReq(n: Int) extends Bundle { - val head = UInt(log2Ceil(n).W) - val tail = UInt(log2Ceil(n).W) - def count = Mux(tail === 0.U, n.U, tail) - head -} - -class Compactor[T <: Data](pushN: Int, popN: Int, gen: => T, forward: Boolean) extends Module { - require (pushN >= popN) - val io = IO(new Bundle { - val push = Flipped(Decoupled(new CompactorReq(pushN))) - val push_data = Input(Vec(pushN, gen)) - - val pop = Flipped(Decoupled(new CompactorReq(popN))) - val pop_data = Output(Vec(popN, gen)) - }) - - val (push, push_data) = if (forward) { - (io.push, io.push_data) - } else { - val push_q = Module(new Queue(new CompactorReq(pushN) { - val data = Vec(pushN, gen) - }, 2)) - push_q.io.enq.valid := io.push.valid - push_q.io.enq.bits.head := io.push.bits.head - push_q.io.enq.bits.tail := io.push.bits.tail - push_q.io.enq.bits.data := io.push_data - io.push.ready := push_q.io.enq.ready - (push_q.io.deq, push_q.io.deq.bits.data) - } - - def wshr(in: Seq[T], shamt: UInt): Seq[T] = - (0 until in.size).map { i => VecInit(in.drop(i))(shamt) } - def wshl(in: Seq[T], shamt: UInt): Seq[T] = - wshr(in.reverse, shamt).reverse - - val count = RegInit(0.U((1+log2Ceil(pushN)).W)) - val regs = Seq.fill(pushN) { Reg(gen) } - val valid = (1.U << count) - 1.U - - push.ready := pushN.U +& Mux(io.pop.valid, io.pop.bits.count, 0.U) >= count +& push.bits.count - io.pop.ready := count +& Mux(push.valid, push.bits.count, 0.U) >= io.pop.bits.count - - val regs_shr = wshr(regs, io.pop.bits.count) - val valid_shr = valid >> io.pop.bits.count - - when (push.fire || io.pop.fire) { - count := count +& Mux(push.fire, push.bits.count, 0.U) - Mux(io.pop.fire, io.pop.bits.count, 0.U) - } - - val push_elems = push_data - val push_shr = wshr((Seq.fill(pushN)(0.U.asTypeOf(gen)) ++ push_elems), pushN.U +& push.bits.head - count) - val push_shr_pop = 
wshr((Seq.fill(pushN)(0.U.asTypeOf(gen)) ++ push_elems), pushN.U +& push.bits.head +& io.pop.bits.count - count) - - when (io.pop.fire) { - for (i <- 0 until pushN) regs(i) := Mux(valid_shr(i), regs_shr(i), push_shr_pop(i)) - } .elsewhen (push.fire) { - for (i <- 0 until pushN) when (!valid(i)) { - regs(i) := push_shr(i) - } - } - - val out_data = (0 until popN).map { i => Mux(valid(i), regs(i), push_shr(i)) } - io.pop_data := VecInit(wshl(out_data, io.pop.bits.head).take(popN)) -} diff --git a/arch/src/main/scala/framework/gendomain/common/Consts.scala b/arch/src/main/scala/framework/gendomain/common/Consts.scala deleted file mode 100644 index e9efcd68..00000000 --- a/arch/src/main/scala/framework/gendomain/common/Consts.scala +++ /dev/null @@ -1,157 +0,0 @@ -package framework.gendomain.common - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import chisel3.util.experimental.decode._ - -object OPIFunct6 extends ChiselEnum { - val add = Value - val andn = Value - val sub, rsub, minu, min, maxu, max = Value - val _ = Value - val and, or, xor, rgather = Value - val _ = Value - val slideup, slidedown = Value - def rgatherei16 = slideup - - val adc = Value - val madc, sbc, msbc = Value - val ror, rol, _ = Value - val merge, mseq, msne, msltu, mslt, msleu, msle, msgtu, msgt = Value - - val saddu, sadd, ssubu, ssub = Value - val _ = Value - val sll = Value - val _ = Value - val smul = Value - def mvnrr = smul - val srl, sra, ssrl, ssra, nsrl, nsra, nclipu, nclip = Value - val wredsumu, wredsum = Value - val _, _, _ = Value - val wsll = Value - - val illegal = Value(0x40.U) -} - -object OPMFunct6 extends ChiselEnum { - val redsum, redand, redor, redxor, redminu, redmin, redmaxu, redmax, aaddu, aadd, asubu, asub = Value - val _, _ = Value - val slide1up, slide1down = Value - - val wrxunary0 = Value - val _ = Value - val xunary0 = Value - val _ = Value - val munary0 = 
Value - val _, _ = Value - val compress, mandnot, mand, mor, mxor, mornot, mnand, mnor, mxnor = Value - - val divu, div, remu, rem, mulhu, mul, mulhsu, mulh = Value - val _ = Value - val madd = Value - val _ = Value - val nmsub = Value - val _ = Value - val macc = Value - val _ = Value - val nmsac = Value - - val waddu, wadd, wsubu, wsub, wadduw, waddw, wsubuw, wsubw, wmulu = Value - val _ = Value - val wmulsu, wmul, wmaccu, wmacc, wmaccus, wmaccsu = Value - - val illegal = Value(0x40.U) -} - -object OPFFunct6 extends ChiselEnum { - val fadd, fredusum, fsub, fredosum, fmin, fredmin, fmax, fredmax, fsgnj, fsgnjn, fsgnjx = Value - val _, _, _ = Value - val fslide1up, fslide1down = Value - val wrfunary0 = Value - val _ = Value - val funary0, funary1 = Value - val _, _, _ = Value - val fmerge, mfeq, mfle = Value - val _ = Value - val mflt, mfne, mfgt = Value - val _ = Value - val mfge, fdiv, frdiv = Value - val _, _ = Value - val fmul = Value - val _, _ = Value - val frsub = Value - val fmadd, fnmadd, fmsub, fnmsub, fmacc, fnmacc, fmsac, fnmsac, fwadd, fwredusum, fwsub, fwredosum = Value - val fwaddw, _, fwsubw, _, fwmul, _, _, _, fwmacc, fwnmacc, fwmsac, fwnmsac = Value - val illegal = Value(0x40.U) -} - -trait HasVectorConsts { - def mopUnit = 0.U(2.W) - def mopUnordered = 1.U(2.W) - def mopStrided = 2.U(2.W) - def mopOrdered = 3.U(2.W) - - def lumopUnit = "b00000".U - def lumopWhole = "b01000".U - def lumopMask = "b01011".U - def lumopFF = "b10000".U - - - def sumopUnit = "b00000".U - def sumopWhole = "b01000".U - def sumopMask = "b01011".U - - def opcLoad = "b0000111".U - def opcStore = "b0100111".U - def opcVector = "b1010111".U - - def OPIVV = "b000".U(3.W) - def OPFVV = "b001".U(3.W) - def OPMVV = "b010".U(3.W) - def OPIVI = "b011".U(3.W) - def OPIVX = "b100".U(3.W) - def OPFVF = "b101".U(3.W) - def OPMVX = "b110".U(3.W) - def OPCFG = "b111".U(3.W) - - def X = BitPat("b?") - def N = BitPat("b0") - def Y = BitPat("b1") -} - -object VectorConsts extends 
HasVectorConsts - -object VecDecode extends HasVectorConsts { - def apply(funct3: UInt, funct6: UInt, default: Seq[BitPat], table: Seq[(EnumType, Seq[BitPat])]): Seq[UInt] = { - def enumToUInt(e: EnumType): Seq[UInt] = e match { - case v: OPIFunct6.Type => Seq(OPIVV, OPIVI, OPIVX).map { f3 => ((f3.litValue << 6) + v.litValue).U(9.W) } - case v: OPMFunct6.Type => Seq(OPMVV, OPMVX ).map { f3 => ((f3.litValue << 6) + v.litValue).U(9.W) } - case v: OPFFunct6.Type => Seq(OPFVV, OPFVF ).map { f3 => ((f3.litValue << 6) + v.litValue).U(9.W) } - } - val nElts = default.size - require(table.forall(_._2.size == nElts)) - - val elementsGrouped = table.map(_._2).transpose - val elementWidths = elementsGrouped.zip(default).map { case (elts, default) => - require(elts.forall(_.getWidth == default.getWidth)) - default.getWidth - } - val resultWidth = elementWidths.sum - val elementIndices = elementWidths.scan(resultWidth - 1) { case (l, r) => l - r } - val truthTable = TruthTable(table.map(e => enumToUInt(e._1).map(u => (BitPat(u), e._2.reduce(_ ## _)))).flatten, default.reduce(_ ## _)) - val decoded = chisel3.util.experimental.decode.decoder(Cat(funct3(2,0), funct6(5,0)), truthTable) - elementIndices.zip(elementIndices.tail).map { case (msb, lsb) => decoded(msb, lsb + 1) }.toSeq - } - - - def applyBools(funct3: UInt, funct6: UInt, default: Seq[BitPat], table: Seq[(EnumType, Seq[BitPat])]): Seq[Bool] = apply( - funct3, funct6, default, table).map(_(0)) - - def apply(funct3: UInt, funct6: UInt, trues: Seq[EnumType], falses: Seq[EnumType]): Bool = applyBools( - funct3, funct6, Seq(BitPat.dontCare(1)), trues.map(e => (e, Seq(BitPat(true.B)))) ++ falses.map(e => (e, Seq(BitPat(false.B)))))(0) - def apply(funct3: UInt, funct6: UInt, trues: Seq[EnumType]): Bool = applyBools( - funct3, funct6, Seq(BitPat(false.B)), trues.map(e => (e, Seq(BitPat(true.B)))))(0) -} diff --git a/arch/src/main/scala/framework/gendomain/common/DCEQueue.scala 
b/arch/src/main/scala/framework/gendomain/common/DCEQueue.scala deleted file mode 100644 index 73340d25..00000000 --- a/arch/src/main/scala/framework/gendomain/common/DCEQueue.scala +++ /dev/null @@ -1,83 +0,0 @@ -package framework.gendomain.common - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.tile._ -import freechips.rocketchip.util._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.rocket.Instructions._ - -class DCEQueue[T <: Data]( - val gen: T, - val entries: Int, - val pipe: Boolean = false, - val flow: Boolean = false)(implicit val p: Parameters) extends Module { - require(entries > -1, "Queue must have non-negative number of entries") - require(entries != 0, "Use companion object Queue.apply for zero entries") - - val io = IO(new QueueIO(gen, entries, false) { - val peek = Output(Vec(entries, Valid(gen))) - }) - val valids = RegInit(VecInit.fill(entries)(false.B)) - val ram = Reg(Vec(entries, gen)) - val enq_ptr = Counter(entries) - val deq_ptr = Counter(entries) - val maybe_full = RegInit(false.B) - val ptr_match = enq_ptr.value === deq_ptr.value - val empty = ptr_match && !maybe_full - val full = ptr_match && maybe_full - val do_enq = WireDefault(io.enq.fire) - val do_deq = WireDefault(io.deq.fire) - - for (i <- 0 until entries) { - io.peek(i).bits := ram(i) - io.peek(i).valid := valids(i) - } - - when(do_deq) { - deq_ptr.inc() - valids(deq_ptr.value) := false.B - } - - when(do_enq) { - ram(enq_ptr.value) := io.enq.bits - valids(enq_ptr.value) := true.B - enq_ptr.inc() - } - - when(do_enq =/= do_deq) { - maybe_full := do_enq - } - - io.deq.valid := !empty - io.enq.ready := !full - - io.deq.bits := ram(deq_ptr.value) - - if (flow) { - when(io.enq.valid) { io.deq.valid := true.B } - when(empty) { - io.deq.bits := io.enq.bits - do_deq := false.B - when(io.deq.ready) { do_enq := false.B } - } - } - - if (pipe) { - 
when(io.deq.ready) { io.enq.ready := true.B } - } - - val ptr_diff = enq_ptr.value - deq_ptr.value - - if (isPow2(entries)) { - io.count := Mux(maybe_full && ptr_match, entries.U, 0.U) | ptr_diff - } else { - io.count := Mux( - ptr_match, - Mux(maybe_full, entries.asUInt, 0.U), - Mux(deq_ptr.value > enq_ptr.value, entries.asUInt + ptr_diff, ptr_diff) - ) - } -} diff --git a/arch/src/main/scala/framework/gendomain/common/Parameters.scala b/arch/src/main/scala/framework/gendomain/common/Parameters.scala deleted file mode 100644 index 6c89a602..00000000 --- a/arch/src/main/scala/framework/gendomain/common/Parameters.scala +++ /dev/null @@ -1,404 +0,0 @@ -package framework.gendomain.common - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.diplomacy.{BufferParams} -import framework.gendomain.exu._ - -object VectorParams { - - // minParams: - // For a very small area-efficient vector unit with iterative - // and element-wise functional units - def minParams = VectorParams() - - // refParams - // For a standard modestly capable small vector unit with - // SIMD functional units - def refParams = minParams.copy( - vlrobEntries = 4, - vlissqEntries = 3, - vsissqEntries = 3, - vxissqEntries = 3, - vatSz = 5, - useSegmentedIMul = true, - doubleBufferSegments = true, - useScalarFPFMA = false, - vrfBanking = 4, - ) - - // dspParams - // For a wide high-performance vector unit with multi-issue - def dspParams = refParams.copy( - issStructure = VectorIssueStructure.Shared - ) - - // genParams: - // For a vector unit that performs better on less-optimized - // code sequences - def genParams = dspParams.copy( - issStructure = VectorIssueStructure.Split, - vlifqEntries = 16, - vlrobEntries = 16 - ) - - // multiFMAParams: - // Provides a second sequencer and set of functional units for FMA operations - def 
multiFMAParams = genParams.copy( - issStructure = VectorIssueStructure.MultiFMA - ) - - // multiMACParams: - // Provides a second sequencer and set of functional units for integer MAC operations - def multiMACParams = genParams.copy( - issStructure = VectorIssueStructure.MultiMAC - ) - - // dmaParams: - // For a vector unit that only does memcpys, and no arithmetic - def dmaParams = VectorParams( - vdqEntries = 2, - vliqEntries = 4, - vsiqEntries = 4, - vlifqEntries = 32, - vlrobEntries = 4, - vsifqEntries = 32, - vlissqEntries = 2, - vsissqEntries = 1, - vrfBanking = 1, - useIterativeIMul = true - ) - - // The parameters below are approximations - - // hwaParams - // For a vector unit with limited sequencer slots akin to Hwacha - def hwaParams = genParams.copy( - vatSz = 3, // 8 mseq Entries - vdqEntries = 1, - vlissqEntries = 8, - vsissqEntries = 8, - vxissqEntries = 8, - vpissqEntries = 8, - hwachaLimiter = Some(8), // sequencer slots - ) - - // lgvParams - // For a vector unit with very long vector lengths - def lgvParams = VectorParams( - vatSz = 5, - vlifqEntries = 32, - vsifqEntries = 32, - vlrobEntries = 32, - vlissqEntries = 8, - vsissqEntries = 8, - vxissqEntries = 8, - vpissqEntries = 8, - useSegmentedIMul = true, - useScalarFPMisc = false, - useScalarFPFMA = false, - vrfBanking = 4, - issStructure = VectorIssueStructure.Split - ) -} - -case class VXSequencerParams( - name: String, - fus: Seq[FunctionalUnitFactory] -) { - def insns = fus.map(_.insns).flatten -} - -case class VXIssuePathParams( - name: String, - depth: Int, - seqs: Seq[VXSequencerParams] -) { - def insns = seqs.map(_.insns).flatten -} - -object VXFunctionalUnitGroups { - def integerFUs(idivDoesImul: Boolean = false) = Seq( - IntegerPipeFactory, - ShiftPipeFactory, - BitwisePipeFactory, - IntegerDivideFactory(idivDoesImul), - MaskUnitFactory, - PermuteUnitFactory - ) - def integerMAC(pipeDepth: Int, useSegmented: Boolean) = Seq( - IntegerMultiplyFactory(pipeDepth, useSegmented) - ) - - def 
allIntegerFUs(idivDoesImul: Boolean, imaDepth: Int, useSegmentedImul: Boolean) = ( - integerFUs(idivDoesImul) ++ integerMAC(imaDepth, useSegmentedImul) - ) - - def sharedFPFMA(pipeDepth: Int) = Seq( - FPFMAFactory(pipeDepth, true) - ) - def sharedFPMisc = Seq( - SharedFPMiscFactory - ) - def fpFMA(pipeDepth: Int) = Seq( - FPFMAFactory(pipeDepth, false) - ) - def fpMisc = Seq( - FPDivSqrtFactory, - FPCmpFactory, - FPConvFactory - ) - - def allFPFUs(fmaPipeDepth: Int, useScalarFPFMA: Boolean, useScalarFPMisc: Boolean) = ( - (if (useScalarFPFMA) sharedFPFMA(fmaPipeDepth) else fpFMA(fmaPipeDepth)) ++ - (if (useScalarFPMisc) sharedFPMisc else fpMisc) - ) -} - -sealed trait VectorIssueStructure { - def generate(params: VectorParams): Seq[VXIssuePathParams] -} - -object VectorIssueStructure { - import VXFunctionalUnitGroups._ - - case object Unified extends VectorIssueStructure { - def generate(params: VectorParams) = { - val fp_int_path = VXIssuePathParams( - name = "fp_int", - depth = params.vxissqEntries, - seqs = Seq( - VXSequencerParams("fp_int", ( - allIntegerFUs(params.useIterativeIMul, params.imaPipeDepth, params.useSegmentedIMul) ++ - allFPFUs(params.fmaPipeDepth, params.useScalarFPFMA, params.useScalarFPMisc) - )) - ) - ) - Seq(fp_int_path) - } - } - - case object Shared extends VectorIssueStructure { - def generate(params: VectorParams) = { - val fp_int_path = VXIssuePathParams( - name = "fp_int", - depth = params.vxissqEntries, - seqs = Seq( - VXSequencerParams("int", allIntegerFUs(params.useIterativeIMul, params.imaPipeDepth, params.useSegmentedIMul)), - VXSequencerParams("fp", allFPFUs(params.fmaPipeDepth, params.useScalarFPFMA, params.useScalarFPMisc)) - ) - ) - Seq(fp_int_path) - } - } - - case object Split extends VectorIssueStructure { - def generate(params: VectorParams) = { - val int_path = VXIssuePathParams( - name = "int", - depth = params.vxissqEntries, - seqs = Seq( - VXSequencerParams("int", allIntegerFUs(params.useIterativeIMul, 
params.imaPipeDepth, params.useSegmentedIMul)), - ) - ) - val fp_path = VXIssuePathParams( - name = "fp", - depth = params.vxissqEntries, - seqs = Seq( - VXSequencerParams("fp", allFPFUs(params.fmaPipeDepth, params.useScalarFPFMA, params.useScalarFPMisc)) - ) - ) - Seq(int_path, fp_path) - } - } - - case object MultiFMA extends VectorIssueStructure { - def generate(params: VectorParams) = { - require(!params.useScalarFPFMA) - val int_path = VXIssuePathParams( - name = "int", - depth = params.vxissqEntries, - seqs = Seq( - VXSequencerParams("int", allIntegerFUs(params.useIterativeIMul, params.imaPipeDepth, params.useSegmentedIMul)), - ) - ) - val fp_path = VXIssuePathParams( - name = "fp", - depth = params.vxissqEntries, - seqs = Seq( - VXSequencerParams("fp0", allFPFUs(params.fmaPipeDepth, params.useScalarFPFMA, params.useScalarFPMisc)), - VXSequencerParams("fp1", fpFMA(params.fmaPipeDepth)) - ) - ) - Seq(int_path, fp_path) - } - } - - case object MultiMAC extends VectorIssueStructure { - def generate(params: VectorParams) = { - require(!params.useIterativeIMul && params.useSegmentedIMul) - val int_path = VXIssuePathParams( - name = "int", - depth = params.vxissqEntries, - seqs = Seq( - VXSequencerParams("int0", allIntegerFUs(params.useIterativeIMul, params.imaPipeDepth, params.useSegmentedIMul)), - VXSequencerParams("int1", integerMAC(params.imaPipeDepth, params.useSegmentedIMul)) - ) - ) - val fp_path = VXIssuePathParams( - name = "fp", - depth = params.vxissqEntries, - seqs = Seq( - VXSequencerParams("fp", allFPFUs(params.fmaPipeDepth, params.useScalarFPFMA, params.useScalarFPMisc)) - ) - ) - Seq(int_path, fp_path) - } - } -} - -case class VectorParams( - // In-order dispatch Queue - vdqEntries: Int = 4, - - // Load store instruction queues (in VLSU) - vliqEntries: Int = 4, - vsiqEntries: Int = 4, - - // Load store in-flight queues (in VLSU) - vlifqEntries: Int = 8, - vsifqEntries: Int = 16, - vlrobEntries: Int = 2, - - // Scatter-gather engine params - 
vsgPorts: Int = 8, - vsgifqEntries: Int = 4, - vsgBuffers: Int = 3, - - // Load/store/execute/permute/maskindex issue queues - vlissqEntries: Int = 0, - vsissqEntries: Int = 0, - vxissqEntries: Int = 0, - vpissqEntries: Int = 0, - - dLen: Int = 64, - vatSz: Int = 3, - - - useSegmentedIMul: Boolean = false, - useScalarFPFMA: Boolean = true, // Use shared scalar FPU all non-FMA FP instructions - useScalarFPMisc: Boolean = true, // Use shared scalar FPU all non-FMA FP instructions - useIterativeIMul: Boolean = false, - fmaPipeDepth: Int = 4, - imaPipeDepth: Int = 3, - - // for comparisons only - hazardingMultiplier: Int = 0, - hwachaLimiter: Option[Int] = None, - enableChaining: Boolean = true, - latencyInject: Boolean = false, - enableDAE: Boolean = true, - enableOOO: Boolean = true, - enableScalarVectorAddrDisambiguation: Boolean = true, - - doubleBufferSegments: Boolean = false, - - vrfBanking: Int = 2, - vrfHiccupBuffer: Boolean = true, - - issStructure: VectorIssueStructure = VectorIssueStructure.Unified, - - tlBuffer: BufferParams = BufferParams.default, -) { - def supported_ex_insns = issStructure.generate(this).map(_.insns).flatten -} - -case object VectorParamsKey extends Field[VectorParams] - -trait HasVectorParams extends HasVectorConsts { this: HasCoreParameters => - implicit val p: Parameters - def vParams: VectorParams = p(VectorParamsKey) - def dLen = vParams.dLen - def dLenB = dLen / 8 - def dLenOffBits = log2Ceil(dLenB) - def dmemTagBits = log2Ceil(vParams.vlifqEntries.max(vParams.vsifqEntries)) - def sgmemTagBits = log2Ceil(vParams.vsgifqEntries) - def egsPerVReg = vLen / dLen - def egsTotal = (vLen / dLen) * 32 - def vrfBankBits = log2Ceil(vParams.vrfBanking) - def lsiqIdBits = log2Ceil(vParams.vliqEntries.max(vParams.vsiqEntries)) - val debugIdSz = 16 - val nRelease = vParams.issStructure match { - case VectorIssueStructure.Unified => 3 - case VectorIssueStructure.Shared | VectorIssueStructure.Split => 4 - case VectorIssueStructure.MultiFMA | 
VectorIssueStructure.MultiMAC => 5 - } - - def getEgId(vreg: UInt, eidx: UInt, eew: UInt, bitwise: Bool): UInt = { - val base = vreg << log2Ceil(egsPerVReg) - val off = eidx >> Mux(bitwise, log2Ceil(dLen).U, (log2Ceil(dLenB).U - eew)) - base +& off - } - def getByteId(vreg: UInt, eidx: UInt, eew: UInt): UInt = { - Cat(getEgId(vreg, eidx, eew, false.B), (eidx << eew)(log2Ceil(dLenB)-1,0)) - } - - def eewByteMask(eew: UInt) = (0 until (1+log2Ceil(eLen/8))).map { e => - Mux(e.U === eew, ((1 << (1 << e)) - 1).U, 0.U) - }.reduce(_|_)((eLen/8)-1,0) - def eewBitMask(eew: UInt) = FillInterleaved(8, eewByteMask(eew)) - - - def cqOlder(i0: UInt, i1: UInt, tail: UInt) = (i0 < i1) ^ (i0 < tail) ^ (i1 < tail) - def dLenSplat(in: UInt, eew: UInt) = { - val v = Wire(UInt(64.W)) - v := in - Mux1H(UIntToOH(eew), (0 until 4).map { i => Fill(dLenB >> i, v((8< - Cat(in((8 << eew)-1), in((8 << eew)-1,0)).asSInt - })(in_eew)(64,0) - - def extractElem(in: UInt, in_eew: UInt, eidx: UInt): UInt = { - val bytes = in.asTypeOf(Vec(dLenB, UInt(8.W))) - VecInit.tabulate(4) { eew => - val elem = if (dLen == 64 && eew == 3) { - in - } else { - VecInit(bytes.grouped(1 << eew).map(g => VecInit(g).asUInt).toSeq)(eidx(log2Ceil(dLenB)-1-eew,0)) - } - elem((8 << eew)-1,0) - }(in_eew) - } - - def maxPosUInt(sew: Int) = Cat(0.U, ~(0.U(((8 << sew)-1).W))) - def minNegUInt(sew: Int) = Cat(1.U, 0.U(((8 << sew)-1).W)) - def maxPosSInt(sew: Int) = ((1 << ((8 << sew)-1))-1).S - def minNegSInt(sew: Int) = (-1 << ((8 << sew)-1)).S - def maxPosFPUInt(sew: Int) = { - val expBits = Seq(4, 5, 8, 11)(sew) - val fracBits = (8 << sew) - expBits - 1 - Cat(0.U, ~(0.U(expBits.W)), 0.U(fracBits.W)) - } - def minNegFPUInt(sew: Int) = { - val expBits = Seq(4, 5, 8, 11)(sew) - val fracBits = (8 << sew) - expBits - 1 - Cat(1.U, ~(0.U(expBits.W)), 0.U(fracBits.W)) - } - def get_arch_mask(reg: UInt, emul: UInt) = VecInit.tabulate(4)({ lmul => - FillInterleaved(1 << lmul, UIntToOH(reg >> lmul)((32>>lmul)-1,0)) - })(emul) - def 
log2_up(f: UInt, max: Int) = VecInit.tabulate(max)({nf => log2Ceil(nf+1).U})(f) - - def hazardMultiply(mask: UInt): UInt = if (vParams.hazardingMultiplier == 0) { mask } else { - require((1 << vParams.hazardingMultiplier) <= egsTotal) - VecInit(mask.asBools.grouped(1 << vParams.hazardingMultiplier).map { g => - Fill(1 << vParams.hazardingMultiplier, g.orR) - }.toSeq).asUInt - } -} diff --git a/arch/src/main/scala/framework/gendomain/common/ReorderBuffer.scala b/arch/src/main/scala/framework/gendomain/common/ReorderBuffer.scala deleted file mode 100644 index 005be980..00000000 --- a/arch/src/main/scala/framework/gendomain/common/ReorderBuffer.scala +++ /dev/null @@ -1,62 +0,0 @@ -package framework.gendomain.common - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.tile._ -import freechips.rocketchip.util._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.rocket.Instructions._ - -class ReorderBuffer[T <: Data]( - gen: => T, - val entries: Int)(implicit val p: Parameters) extends Module { - require(entries > -1, "Queue must have non-negative number of entries") - require(entries != 0, "Use companion object Queue.apply for zero entries") - - val tagBits = log2Ceil(entries) - val io = IO(new Bundle { - val reserve = Decoupled(UInt(tagBits.W)) - val push = Input(Valid(new Bundle { - val data = gen - val tag = UInt(tagBits.W) - })) - val deq = Decoupled(gen) - val busy = Output(Bool()) - }) - - val valids = RegInit(VecInit.fill(entries)(false.B)) - val ram = Reg(Vec(entries, gen)) - val enq_ptr = Counter(entries) - val deq_ptr = Counter(entries) - val maybe_full = RegInit(false.B) - val ptr_match = enq_ptr.value === deq_ptr.value - val empty = ptr_match && !maybe_full - val full = ptr_match && maybe_full - - io.busy := !empty - io.reserve.valid := !full - io.reserve.bits := enq_ptr.value - when (io.reserve.fire) { - enq_ptr.inc() - } - - when 
(io.push.fire) { - assert(!valids(io.push.bits.tag)) - valids(io.push.bits.tag) := !(io.deq.ready && deq_ptr.value === io.push.bits.tag) - ram(io.push.bits.tag) := io.push.bits.data - } - - io.deq.valid := !empty && (valids(deq_ptr.value) || (io.push.fire && io.push.bits.tag === deq_ptr.value)) - io.deq.bits := Mux(valids(deq_ptr.value), ram(deq_ptr.value), io.push.bits.data) - - when (io.deq.fire) { - deq_ptr.inc() - valids(deq_ptr.value) := false.B - } - - when (io.reserve.fire =/= io.deq.fire) { - maybe_full := io.reserve.fire - } -} diff --git a/arch/src/main/scala/framework/gendomain/common/ShiftPacker.scala b/arch/src/main/scala/framework/gendomain/common/ShiftPacker.scala deleted file mode 100644 index 81f2e030..00000000 --- a/arch/src/main/scala/framework/gendomain/common/ShiftPacker.scala +++ /dev/null @@ -1,59 +0,0 @@ -package framework.gendomain.common - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ - -class ShiftPackerReq(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams { - val head = UInt(log2Ceil(dLenB).W) - val tail = UInt(log2Ceil(dLenB).W) - def count = Mux(tail === 0.U, dLenB.U, tail) - head -} - -class ShiftPacker(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val push = Flipped(Decoupled(new ShiftPackerReq)) - val push_data = Input(Vec(dLenB, UInt(8.W))) - - val pop = Flipped(Decoupled(new ShiftPackerReq)) - val pop_data = Output(Vec(dLenB, UInt(8.W))) - }) - - def wshr(in: Seq[UInt], shamt: UInt): Seq[UInt] = - (0 until in.size).map { i => VecInit(in.drop(i))(shamt) } - def wshl(in: Seq[UInt], shamt: UInt): Seq[UInt] = - wshr(in.reverse, shamt).reverse - - val count = RegInit(0.U(log2Ceil(dLenB).W)) - val regs = Seq.fill(dLenB) { Reg(UInt(8.W)) } - val valid = (1.U << count) - 1.U - - val may_forward = io.pop.bits.count > count - - 
io.push.ready := dLenB.U +& Mux(io.pop.valid, io.pop.bits.count, 0.U) >= count +& io.push.bits.count - io.pop.ready := count +& Mux(io.push.valid, io.push.bits.count, 0.U) >= io.pop.bits.count - - val regs_shr = wshr(regs, io.pop.bits.count) - val valid_shr = valid >> io.pop.bits.count - - when (io.push.fire || io.pop.fire) { - count := count +& Mux(io.push.fire, io.push.bits.count, 0.U) - Mux(io.pop.fire, io.pop.bits.count, 0.U) - } - - val push_shr = wshr((Seq.fill(dLenB)(0.U(8.W)) ++ io.push_data), dLenB.U +& io.push.bits.head - count) - val push_shr_pop = wshr((Seq.fill(dLenB)(0.U(8.W)) ++ io.push_data), dLenB.U +& io.push.bits.head +& io.pop.bits.count - count) - - when (io.pop.fire) { - for (i <- 0 until dLenB) regs(i) := Mux(valid_shr(i), regs_shr(i), push_shr_pop(i)) - } .elsewhen (io.push.fire) { - for (i <- 0 until dLenB) when (!valid(i)) { - regs(i) := push_shr(i) - } - } - - val out_data = (0 until dLenB).map { i => Mux(valid(i), regs(i), push_shr(i)) } - io.pop_data := VecInit(wshl(out_data, io.pop.bits.head)) -} diff --git a/arch/src/main/scala/framework/gendomain/exu/ExecutionUnit.scala b/arch/src/main/scala/framework/gendomain/exu/ExecutionUnit.scala deleted file mode 100644 index 28dae4c8..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/ExecutionUnit.scala +++ /dev/null @@ -1,173 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ - -class ExecutionUnit(genFUs: Seq[FunctionalUnitFactory])(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val fus = genFUs.map(gen => Module(gen.generate(p))) - - val pipe_fus: Seq[PipelinedFunctionalUnit] = fus.collect { case p: PipelinedFunctionalUnit => p } - val iter_fus: Seq[IterativeFunctionalUnit] = fus.collect { case i: IterativeFunctionalUnit => i } - - val 
pipe_depth = (pipe_fus.map(_.depth) :+ 0).max - - val io = IO(new Bundle { - val iss = Flipped(Decoupled(new ExecuteMicroOp)) - val iter_hazards = Output(Vec(iter_fus.size, Valid(new PipeHazard(pipe_depth)))) - val iter_write = Decoupled(new VectorWrite(dLen)) - val pipe_write = Output(Valid(new VectorWrite(dLen))) - val acc_write = Output(Valid(new VectorWrite(dLen))) - val scalar_write = Decoupled(new ScalarWrite) - - val pipe_hazards = Output(Vec(pipe_depth, Valid(new PipeHazard(pipe_depth)))) - val issue_pipe_latency = Output(UInt((log2Ceil(pipe_depth) + 1).W)) - - val shared_fp_req = Decoupled(new FPInput()) - val shared_fp_resp = Flipped(Valid(new FPResult())) - - val set_vxsat = Output(Bool()) - val set_fflags = Output(Valid(UInt(5.W))) - val busy = Output(Bool()) - }) - - val sharedFPUnits = fus.collect { case fp: HasSharedFPUIO => fp } - val hasSharedFPUnits = sharedFPUnits.size > 0 - - io.shared_fp_req.valid := false.B - io.shared_fp_req.bits := DontCare - if (sharedFPUnits.size > 0) { - val shared_fp_arb = Module(new Arbiter(new FPInput, sharedFPUnits.size)) - for ((u, i) <- sharedFPUnits.zipWithIndex) { - val otherUnits = sharedFPUnits.zipWithIndex.filter(_._2 != i).map(_._1) - val other_busy = otherUnits.map(_.io_fp_active).orR - u.io_fp_req.ready := shared_fp_arb.io.in(i).ready && !other_busy - shared_fp_arb.io.in(i).valid := u.io_fp_req.valid && !other_busy - shared_fp_arb.io.in(i).bits := u.io_fp_req.bits - u.io_fp_resp := io.shared_fp_resp - } - io.shared_fp_req <> shared_fp_arb.io.out - } - - val pipe_stall = WireInit(false.B) - - fus.foreach { fu => - fu.io.iss.op := io.iss.bits - fu.io.iss.valid := io.iss.valid && !pipe_stall - } - - val pipe_write_hazard = WireInit(false.B) - val readies = fus.map(_.io.iss.ready) - io.iss.ready := readies.orR && !pipe_write_hazard && !pipe_stall - when (io.iss.valid) { assert(PopCount(readies) <= 1.U) } - - io.issue_pipe_latency := Mux1H(pipe_fus.map(_.io.iss.ready), pipe_fus.map(_.depth.U)) - - val pipe_write 
= WireInit(false.B) - - io.pipe_write.valid := false.B - io.pipe_write.bits := DontCare - io.iter_write.valid := false.B - io.iter_write.bits := DontCare - io.acc_write.valid := false.B - io.acc_write.bits := DontCare - io.busy := false.B - io.set_vxsat := fus.map(_.io.set_vxsat).orR - io.set_fflags.valid := fus.map(_.io.set_fflags.valid).orR - io.set_fflags.bits := fus.map(f => Mux(f.io.set_fflags.valid, f.io.set_fflags.bits, 0.U)).reduce(_|_) - - - val scalar_write_arb = Module(new Arbiter(new ScalarWrite, fus.size)) - scalar_write_arb.io.in.zip(fus.map(_.io.scalar_write)).foreach { case (l, r) => l <> r } - io.scalar_write <> scalar_write_arb.io.out - - if (pipe_fus.size > 0) { - val pipe_iss_depth = Mux1H(pipe_fus.map(_.io.iss.ready), pipe_fus.map(_.depth.U)) - - val pipe_valids = Seq.fill(pipe_depth)(RegInit(false.B)) - val pipe_sels = Seq.fill(pipe_depth)(Reg(UInt(pipe_fus.size.W))) - val pipe_bits = Seq.fill(pipe_depth)(Reg(new ExecuteMicroOp)) - val pipe_latencies = Seq.fill(pipe_depth)(Reg(UInt(log2Ceil(pipe_depth).W))) - - pipe_stall := Mux1H(pipe_sels.head, pipe_fus.map(_.io.pipe0_stall)) - - pipe_write_hazard := (0 until pipe_depth).map { i => - pipe_valids(i) && pipe_latencies(i) === pipe_iss_depth - }.orR - - val pipe_iss = io.iss.fire && pipe_fus.map(_.io.iss.ready).orR - when (!pipe_stall) { - pipe_valids.head := pipe_iss - when (pipe_iss) { - pipe_bits.head := io.iss.bits - pipe_latencies.head := pipe_iss_depth - 1.U - pipe_sels.head := VecInit(pipe_fus.map(_.io.iss.ready)).asUInt - } - } - for (i <- 1 until pipe_depth) { - val fire = pipe_valids(i-1) && pipe_latencies(i-1) =/= 0.U && !((i == 1).B && pipe_stall) - pipe_valids(i) := fire - when (fire) { - pipe_bits(i) := pipe_bits(i-1) - pipe_latencies(i) := pipe_latencies(i-1) - 1.U - pipe_sels(i) := pipe_sels(i-1) - } - } - for ((fu, j) <- pipe_fus.zipWithIndex) { - for (i <- 0 until fu.depth) { - fu.io.pipe(i).valid := pipe_valids(i) && pipe_sels(i)(j) - fu.io.pipe(i).bits := Mux(pipe_valids(i) 
&& pipe_sels(i)(j), - pipe_bits(i), 0.U.asTypeOf(new ExecuteMicroOp)) - } - } - - val write_sel = pipe_valids.zip(pipe_latencies).map { case (v,l) => v && l === 0.U } - val fu_sel = Mux1H(write_sel, pipe_sels) - pipe_write := write_sel.orR - when (write_sel.orR) { - val acc = Mux1H(write_sel, pipe_bits.map(_.acc)) - val tail = Mux1H(write_sel, pipe_bits.map(_.tail)) - io.pipe_write.valid := Mux1H(fu_sel, pipe_fus.map(_.io.write.valid)) && (!acc || tail) - io.pipe_write.bits := Mux1H(fu_sel, pipe_fus.map(_.io.write.bits)) - io.acc_write.valid := acc && !tail - io.acc_write.bits := Mux1H(fu_sel, pipe_fus.map(_.io.write.bits)) - } - - when (pipe_valids.orR) { io.busy := true.B } - for (i <- 0 until pipe_depth) { - io.pipe_hazards(i).valid := pipe_valids(i) - io.pipe_hazards(i).bits.eg := pipe_bits(i).wvd_eg - when (pipe_latencies(i) === 0.U) { // hack to deal with compress unit - io.pipe_hazards(i).bits.eg := Mux1H(pipe_sels(i), pipe_fus.map(_.io.write.bits.eg)) - } - io.pipe_hazards(i).bits.latency := pipe_latencies(i) - } - } - - if (iter_fus.size > 0) { - val iter_write_arb = Module(new Arbiter(new VectorWrite(dLen), iter_fus.size)) - iter_write_arb.io.in.zip(iter_fus.map(_.io.write)).foreach { case (l,r) => l <> r } - iter_write_arb.io.out.ready := !pipe_write && io.iter_write.ready - - val acc = Mux1H(iter_write_arb.io.in.map(_.fire), iter_fus.map(_.io.acc)) - val tail = Mux1H(iter_write_arb.io.in.map(_.fire), iter_fus.map(_.io.tail)) - io.iter_write.valid := iter_write_arb.io.out.valid && (!acc || tail) && !pipe_write - io.iter_write.bits.eg := iter_write_arb.io.out.bits.eg - io.iter_write.bits.mask := iter_write_arb.io.out.bits.mask - io.iter_write.bits.data := iter_write_arb.io.out.bits.data - when (!pipe_write) { - io.acc_write.valid := iter_write_arb.io.out.valid && acc - io.acc_write.bits.eg := Mux1H(iter_write_arb.io.in.map(_.fire), iter_fus.map(_.io.write.bits.eg)) - io.acc_write.bits.data := Mux1H(iter_write_arb.io.in.map(_.fire), 
iter_fus.map(_.io.write.bits.data))
-      io.acc_write.bits.mask := Mux1H(iter_write_arb.io.in.map(_.fire), iter_fus.map(_.io.write.bits.mask))
-    }
-    when (iter_fus.map(_.io.busy).orR) { io.busy := true.B }
-    for (i <- 0 until iter_fus.size) {
-      io.iter_hazards(i) := iter_fus(i).io.hazard
-    }
-  }
-}
diff --git a/arch/src/main/scala/framework/gendomain/exu/FunctionalUnit.scala b/arch/src/main/scala/framework/gendomain/exu/FunctionalUnit.scala
deleted file mode 100644
index 27b2f7f4..00000000
--- a/arch/src/main/scala/framework/gendomain/exu/FunctionalUnit.scala
+++ /dev/null
@@ -1,84 +0,0 @@
-package framework.gendomain.exu
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.util._
-import freechips.rocketchip.tile._
-import framework.gendomain.common._
-import framework.gendomain.insns.{VectorInstruction}
-
-abstract class FunctionalUnitIO(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams {
-  val iss = new Bundle {
-    val valid = Input(Bool())
-    val op = Input(new ExecuteMicroOp)
-    val ready = Output(Bool())
-  }
-
-  val scalar_write = Decoupled(new ScalarWrite)
-  val set_vxsat = Output(Bool())
-  val set_fflags = Output(Valid(UInt(5.W)))
-}
-
-class PipelinedFunctionalUnitIO(depth: Int)(implicit p: Parameters) extends FunctionalUnitIO {
-  val write = Valid(new VectorWrite(dLen))
-  val pipe = Input(Vec(depth, Valid(new ExecuteMicroOp)))
-  val pipe0_stall = Output(Bool())
-}
-
-class IterativeFunctionalUnitIO(implicit p: Parameters) extends FunctionalUnitIO {
-  val write = Decoupled(new VectorWrite(dLen))
-  val hazard = Output(Valid(new PipeHazard(10)))
-  val acc = Output(Bool())
-  val tail = Output(Bool())
-
-  val busy = Output(Bool())
-}
-
-trait FunctionalUnitFactory {
-  def insns: Seq[VectorInstruction]
-  def generate(implicit p: Parameters): FunctionalUnit
-}
-
-abstract class FunctionalUnit(implicit p: Parameters) extends CoreModule()(p) with
HasVectorParams {
-  val io: FunctionalUnitIO
-}
-
-abstract class PipelinedFunctionalUnit(val depth: Int)(implicit p: Parameters) extends FunctionalUnit()(p) {
-  val io = IO(new PipelinedFunctionalUnitIO(depth))
-
-  require (depth > 0)
-
-  def narrow2_expand(bits: Seq[UInt], eew: UInt, upper: Bool, sext: Bool): Vec[UInt] = {
-    val narrow_eew = (0 until 3).map { eew => Wire(Vec(dLenB >> (eew + 1), UInt((16 << eew).W))) }
-    for (eew <- 0 until 3) {
-      val in_vec = bits.grouped(1 << eew).map(g => VecInit(g).asUInt).toSeq
-      for (i <- 0 until dLenB >> (eew + 1)) {
-        val lo = Mux(upper, in_vec(i + (dLenB >> (eew + 1))), in_vec(i))
-        val hi = Fill(16 << eew, lo((8 << eew)-1) && sext)
-        narrow_eew(eew)(i) := Cat(hi, lo)
-      }
-    }
-    VecInit(narrow_eew.map(_.asUInt))(eew).asTypeOf(Vec(dLenB, UInt(8.W)))
-  }
-}
-
-abstract class IterativeFunctionalUnit(implicit p: Parameters) extends FunctionalUnit()(p) {
-  val io = IO(new IterativeFunctionalUnitIO)
-
-  val valid = RegInit(false.B)
-  val op = Reg(new ExecuteMicroOp)
-  val last = Wire(Bool())
-
-  io.busy := valid
-
-  io.hazard.bits.latency := DontCare
-
-  when (io.iss.valid && io.iss.ready) {
-    valid := true.B
-    op := io.iss.op
-  } .elsewhen (last) {
-    valid := false.B
-  }
-}
diff --git a/arch/src/main/scala/framework/gendomain/exu/MaskUnit.scala b/arch/src/main/scala/framework/gendomain/exu/MaskUnit.scala
deleted file mode 100644
index 97a5088d..00000000
--- a/arch/src/main/scala/framework/gendomain/exu/MaskUnit.scala
+++ /dev/null
@@ -1,134 +0,0 @@
-package framework.gendomain.exu
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.util._
-import freechips.rocketchip.tile._
-import framework.gendomain.common._
-import framework.gendomain.insns._
-
-case object MaskUnitFactory extends FunctionalUnitFactory {
-  def insns = Seq(MV_S_X, MV_X_S, POPC, FIRST, FMV_S_F, FMV_F_S, MSBF, MSOF, MSIF, IOTA, ID)
-  def generate(implicit p:
Parameters) = new MaskUnit()(p)
-}
-
-class MaskUnit(implicit p: Parameters) extends PipelinedFunctionalUnit(1)(p) {
-  val supported_insns = MaskUnitFactory.insns
-
-  val scalar_wb_busy = RegInit(false.B)
-  val scalar_wb_data = Reg(UInt(64.W))
-  val scalar_wb_rd = Reg(UInt(5.W))
-  val scalar_wb_fp = Reg(Bool())
-  val scalar_wb_size = Reg(UInt(2.W))
-  val found_first = Reg(Bool())
-
-  def accepts(op: ExecuteMicroOp): Bool = (op.opff6.isOneOf(OPFFunct6.wrfunary0) || op.opmf6.isOneOf(OPMFunct6.wrxunary0, OPMFunct6.munary0)) && !scalar_wb_busy
-
-  io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, io.iss.op.rs1, io.iss.op.rs2, supported_insns, Nil).matched && !scalar_wb_busy && !io.pipe(0).bits.tail
-
-  io.set_vxsat := false.B
-  io.set_fflags.valid := false.B
-  io.set_fflags.bits := DontCare
-
-  val op = io.pipe(0).bits
-  val opmvv = op.funct3 === OPMVV
-  val opmvx = op.funct3 === OPMVX
-  val opfvv = op.funct3 === OPFVV
-  val opfvf = op.funct3 === OPFVF
-
-  val wxunary0 = opmvv && !op.funct6(2)
-  val rxunary0 = opmvx
-  val wfunary0 = opfvv
-  val rfunary0 = opfvf
-  val munary0 = opmvv && op.funct6(2)
-
-  val set_before = op.rs1.isOneOf(1.U, 3.U)
-  val set_first = op.rs1.isOneOf(2.U, 3.U)
-
-  val elems = (op.rvs2_data & op.rvm_data & op.full_tail_mask)
-  val popc = PopCount(elems)
-  val ff = PriorityEncoder(elems)
-  val ff_oh = PriorityEncoderOH(elems)
-  val bf = ~((0 until dLen).map { i => (elems << i)(dLen-1,0) }.reduce(_|_))
-  val nonzero = elems =/= 0.U
-  val first_here = (!found_first || op.head) && nonzero
-  val before = Mux(found_first && !op.head, 0.U, Mux(nonzero, bf, ~(0.U(dLen.W))))
-  val first = Mux(first_here, ff_oh, 0.U)
-  val set = Mux(set_before, before, 0.U) | Mux(set_first, first, 0.U)
-  val sign = VecInit.tabulate(4)({sew => op.rvs2_data((8 << sew)-1)})(op.rvs2_eew)
-  val eew_mask = eewBitMask(op.rvs2_eew).pad(64)
-  val elem = (op.rvs2_data & eew_mask) | (Fill(64, sign && op.isOpm) & ~eew_mask)
-  val scalar_wb_rdata = Mux(op.head,
0.U, scalar_wb_data) - - val iota_dlenb = VecInit.tabulate(4)({sew => - val grouped = Mux(op.rs1(0), ~(0.U(dLen.W)), elems).asTypeOf(Vec(8 << sew, UInt((dLenB >> sew).W))) - grouped(op.eidx(log2Ceil(dLen)-1,log2Ceil(dLenB) - sew)) - })(op.rvd_eew) - val iota_sums = (0 until dLenB).map { i => - (PopCount(iota_dlenb & ((1< - val out = Wire(Vec(dLenB >> sew, UInt((8<> sew) - out.asUInt - })(op.vd_eew) - - when (io.pipe(0).valid) { - scalar_wb_rd := io.pipe(0).bits.rd - scalar_wb_size := io.pipe(0).bits.rvs2_eew - when (op.head) { - found_first := false.B - scalar_wb_data := 0.U - } - when (first_here) { found_first := true.B } - when (wxunary0) { - when (op.rs1 === 16.U) { // popc - scalar_wb_data := (scalar_wb_rdata + popc)(log2Ceil(maxVLMax),0) - } .elsewhen (op.rs1 === 17.U) { // first - when (first_here) { - scalar_wb_data := op.eidx + ff - } .elsewhen (!found_first || op.head) { - scalar_wb_data := ~(0.U(64.W)) - } - } .otherwise { // mv - scalar_wb_data := elem - } - } - when (wfunary0) { // fmv - scalar_wb_data := elem - } - when (munary0) { - val mask = VecInit.tabulate(4)({sew => ~(0.U((dLenB >> sew).W))})(op.vd_eew) - val incr = PopCount(iota_dlenb & mask) - scalar_wb_data := (scalar_wb_rdata + incr)(log2Ceil(maxVLMax),0) - } - when (op.tail) { - scalar_wb_busy := wxunary0 || wfunary0 - scalar_wb_fp := wfunary0 - } - } - - io.scalar_write.valid := scalar_wb_busy - io.scalar_write.bits.data := scalar_wb_data - io.scalar_write.bits.rd := scalar_wb_rd - io.scalar_write.bits.fp := scalar_wb_fp - io.scalar_write.bits.size := scalar_wb_size - - io.pipe0_stall := false.B - io.write.valid := io.pipe(0).valid && (rxunary0 || rfunary0 || munary0) - io.write.bits.eg := op.wvd_eg - io.write.bits.mask := Mux1H(Seq( - (rxunary0 || rfunary0 , eewBitMask(op.vd_eew)), - (munary0 && op.rs1(4) , FillInterleaved(8, op.wmask)), - (munary0 && !op.rs1(4), op.full_tail_mask & op.rvm_data) - )) - io.write.bits.data := Mux1H(Seq( - (rxunary0 || rfunary0 , op.rvs1_data(63,0)), - 
(munary0 && op.rs1(4) , iota_out),
-    (munary0 && !op.rs1(4), set)
-  ))
-
-  when (io.scalar_write.fire) { scalar_wb_busy := false.B }
-}
diff --git a/arch/src/main/scala/framework/gendomain/exu/PermuteUnit.scala b/arch/src/main/scala/framework/gendomain/exu/PermuteUnit.scala
deleted file mode 100644
index 41dd681d..00000000
--- a/arch/src/main/scala/framework/gendomain/exu/PermuteUnit.scala
+++ /dev/null
@@ -1,95 +0,0 @@
-package framework.gendomain.exu
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.util._
-import freechips.rocketchip.tile._
-import framework.gendomain.common._
-import framework.gendomain.insns._
-
-case object PermuteUnitFactory extends FunctionalUnitFactory {
-  def insns = Seq(
-    SLIDEUP.VI, SLIDEUP.VX, SLIDEDOWN.VI, SLIDEDOWN.VX,
-    SLIDE1UP.VX, SLIDE1DOWN.VX, FSLIDE1UP.VF, FSLIDE1DOWN.VF,
-    RGATHER_VV, RGATHER_VI, RGATHER_VX,
-    RGATHEREI16, COMPRESS.VV,
-    MVNRR
-  )
-
-  def generate(implicit p: Parameters) = new PermuteUnit()(p)
-}
-
-class PermuteUnit(implicit p: Parameters) extends PipelinedFunctionalUnit(1)(p) {
-  val supported_insns = PermuteUnitFactory.insns
-
-  io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, io.iss.op.rs1, io.iss.op.rs2,
-    supported_insns, Nil).matched
-
-  val wvd_reg = Reg(UInt(5.W))
-  val result_reg = Reg(UInt(64.W))
-
-  val mvnrr = io.pipe(0).bits.funct3 === OPIVV && io.pipe(0).bits.opif6 === OPIFunct6.mvnrr
-  val compress = io.pipe(0).bits.opmf6 === OPMFunct6.compress
-  val rgatherei16 = io.pipe(0).bits.funct3 === OPIVV && io.pipe(0).bits.opif6 === OPIFunct6.rgatherei16
-  val rgather = io.pipe(0).bits.opif6 === OPIFunct6.rgather || rgatherei16
-
-
-  val index_eew = Mux(rgatherei16, 1.U, io.pipe(0).bits.rvs2_eew)
-  val elem_eidx = Mux(rgather, io.pipe(0).bits.rvs1_data, io.pipe(0).bits.eidx)
-  val elem = VecInit.tabulate(4)({sew => if (sew == 3 && dLenB == 8) {
-    io.pipe(0).bits.rvs2_data
-  } else {
- io.pipe(0).bits.rvs2_data.asTypeOf(Vec(dLenB >> sew, UInt((8 << sew).W)))(elem_eidx) - }})(io.pipe(0).bits.rvs2_eew) - val rgather_elem = Mux(io.pipe(0).bits.head || io.pipe(0).bits.funct3 === OPIVV, elem, result_reg) - val splat = dLenSplat(Mux(compress, elem, rgather_elem), io.pipe(0).bits.rvs2_eew) - - val compress_wvd = Mux(io.pipe(0).bits.head, io.pipe(0).bits.wvd_eg >> log2Ceil(egsPerVReg), wvd_reg) - val compress_bit = (io.pipe(0).bits.rvs1_data >> io.pipe(0).bits.eidx(log2Ceil(dLen)-1,0))(0) - val compress_eidx = Mux(io.pipe(0).bits.head, 0.U, result_reg)(log2Ceil(maxVLMax),0) - - when (io.pipe(0).valid && io.pipe(0).bits.head && rgather) { - result_reg := elem - } - when (io.pipe(0).valid && io.pipe(0).bits.head) { - wvd_reg := io.pipe(0).bits.wvd_eg >> log2Ceil(egsPerVReg) - } - when (io.pipe(0).valid && compress) { - result_reg := (compress_eidx + compress_bit)(log2Ceil(maxVLMax),0) - } - - val shifted_mask_eidx = Mux(compress, compress_eidx, io.pipe(0).bits.vl - 1.U) - val shifted_mask = VecInit.tabulate(4)({sew => if (sew == 3 && dLenB == 8) { ~(0.U(8.W)) } else { - FillInterleaved(1 << sew, UIntToOH(shifted_mask_eidx(dLenOffBits-sew-1,0))) - }})(io.pipe(0).bits.rvs2_eew) - - - val slide_up = !io.pipe(0).bits.funct6(0) - val slide1 = !io.pipe(0).bits.isOpi - val slide1up_mask = eewByteMask(io.pipe(0).bits.rvs2_eew) - val slide1_mask = Mux(slide_up, - Mux(io.pipe(0).bits.head, slide1up_mask, 0.U), - Mux(io.pipe(0).bits.tail, shifted_mask, 0.U)) - val use_rvs1_mask = FillInterleaved(8, Mux(slide1, slide1_mask, 0.U).pad(dLenB)) - - val wmask = Mux(mvnrr, ~(0.U(dLenB.W)), - Mux(compress, Mux(compress_bit, shifted_mask, 0.U), io.pipe(0).bits.wmask)) - - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - io.set_vxsat := false.B - io.set_fflags.valid := false.B - io.set_fflags.bits := DontCare - - io.pipe0_stall := false.B - io.write.valid := io.pipe(0).valid && (!compress || compress_bit) - io.write.bits.eg := Mux(compress, - 
getEgId(compress_wvd, compress_eidx, io.pipe(0).bits.rvs2_eew, false.B),
-    io.pipe(0).bits.wvd_eg)
-  io.write.bits.mask := FillInterleaved(8, wmask)
-  io.write.bits.data := Mux(rgather || compress,
-    splat,
-    (io.pipe(0).bits.rvs2_data & ~use_rvs1_mask) | (io.pipe(0).bits.rvs1_data & use_rvs1_mask))
-}
diff --git a/arch/src/main/scala/framework/gendomain/exu/Rounding.scala b/arch/src/main/scala/framework/gendomain/exu/Rounding.scala
deleted file mode 100644
index a2373081..00000000
--- a/arch/src/main/scala/framework/gendomain/exu/Rounding.scala
+++ /dev/null
@@ -1,21 +0,0 @@
-package framework.gendomain.exu
-
-import chisel3._
-import chisel3.util._
-
-object RoundingIncrement {
-
-  def apply(vxrm: UInt, v_d: Bool, v_d1: Bool, v_d20: Option[UInt]): Bool = MuxLookup(vxrm(1,0), false.B)(Seq(
-    (0.U -> (v_d1)),
-    (1.U -> (v_d1 && (v_d20.map(_ =/= 0.U).getOrElse(false.B) || v_d))),
-    (2.U -> (false.B)),
-    (3.U -> (!v_d && Cat(v_d1, v_d20.getOrElse(false.B)) =/= 0.U))
-  ))
-
-  def apply(vxrm: UInt, bits: UInt): UInt = {
-    val w = bits.getWidth
-    val d = w - 1
-    require(w >= 2)
-    apply(vxrm, bits(d), bits(d-1), if (w > 2) Some(bits(d-2,0)) else None)
-  }
-}
diff --git a/arch/src/main/scala/framework/gendomain/exu/fp/ElementwiseFPU.scala b/arch/src/main/scala/framework/gendomain/exu/fp/ElementwiseFPU.scala
deleted file mode 100644
index 655e3b88..00000000
--- a/arch/src/main/scala/framework/gendomain/exu/fp/ElementwiseFPU.scala
+++ /dev/null
@@ -1,326 +0,0 @@
-package framework.gendomain.exu
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.util._
-import freechips.rocketchip.tile._
-import framework.gendomain.common._
-import framework.gendomain.insns._
-
-
-class ElementwiseFPUFMA(depth: Int)(implicit p: Parameters) extends PipelinedFunctionalUnit(depth)(p) with HasFPUParameters {
-  val io_fp_req = IO(Decoupled(new FPInput()))
-  val io_fp_resp = IO(Flipped(Valid(new
FPResult()))) - - val supported_insns = Seq( - FADD.VV, FADD.VF, FSUB.VV, FSUB.VF, FRSUB.VF, - FMUL.VV, FMUL.VF, - FMACC.VV, FMACC.VF, FNMACC.VV, FNMACC.VF, - FMSAC.VV, FMSAC.VF, FNMSAC.VV, FNMSAC.VF, - FMADD.VV, FMADD.VF, FNMADD.VV, FNMADD.VF, - FMSUB.VV, FMSUB.VF, FNMSUB.VV, FNMSUB.VF, - FWADD.VV, FWADD.VF, FWSUB.VV, FWSUB.VF, - FWADDW.VV, FWADDW.VF, FWSUBW.VV, FWSUBW.VF, - FWMUL.VV, FWMUL.VF, - FWMACC.VV, FWMACC.VF, FWNMACC.VV, FWNMACC.VF, - FWMSAC.VV, FWMSAC.VF, FWNMSAC.VV, FWNMSAC.VF, - FREDOSUM.VV, FREDUSUM.VV, FWREDOSUM.VV, FWREDUSUM.VV, - ).map(_.elementWise) - - val ctrl = new VectorDecoder(io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, 0.U, 0.U, supported_insns, Seq( - FPAdd, FPMul, FPSwapVdV2, FPFMACmd, ReadsVD, FPSpecRM, Wide2VD, Wide2VS2, Reduction)) - - val vs1_eew = io.pipe(0).bits.rvs1_eew - val vs2_eew = io.pipe(0).bits.rvs2_eew - val vd_eew = io.pipe(0).bits.vd_eew - val vd_eew64 = io.pipe(0).bits.vd_eew64 - val eidx = Mux(io.pipe(0).bits.acc, 0.U, io.pipe(0).bits.eidx) - - // Functional unit is ready if not currently running and the scalar FPU is available - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched && io_fp_req.ready - - // Create FPInput - val req = Wire(new FPInput) - req.ldst := false.B - req.wen := false.B - req.ren1 := true.B - req.ren2 := true.B - req.ren3 := ctrl.bool(ReadsVD) - req.swap12 := false.B - req.swap23 := ctrl.bool(FPAdd) && !ctrl.bool(FPMul) - req.typeTagIn := Mux(vd_eew64, D, S) - req.typeTagOut := Mux(vd_eew64, D, S) - req.fromint := false.B - req.toint := false.B - req.fastpipe := false.B - req.fma := true.B - req.div := false.B - req.sqrt := false.B - req.wflags := true.B - req.vec := true.B - req.rm := io.pipe(0).bits.frm - req.fmaCmd := ctrl.uint(FPFMACmd) - req.typ := 0.U - req.fmt := 0.U - - val rvs2_elem = io.pipe(0).bits.rvs2_elem - val rvs1_elem = io.pipe(0).bits.rvs1_elem - val rvd_elem = io.pipe(0).bits.rvd_elem - - val s_rvs2 = 
FType.S.recode(rvs2_elem(31,0)) - val s_rvs1 = FType.S.recode(rvs1_elem(31,0)) - val s_rvd = FType.S.recode(rvd_elem(31,0)) - - // For widening operations, widen the narrow operands to compute with the scalar FPU - val widen_rvs1 = Module(new hardfloat.RecFNToRecFN(8, 24, 11, 53)) - widen_rvs1.io.in := s_rvs1 - widen_rvs1.io.roundingMode := io.pipe(0).bits.frm - widen_rvs1.io.detectTininess := hardfloat.consts.tininess_afterRounding - - val widen_rvs2 = Module(new hardfloat.RecFNToRecFN(8, 24, 11, 53)) - widen_rvs2.io.in := s_rvs2 - widen_rvs2.io.roundingMode := io.pipe(0).bits.frm - widen_rvs2.io.detectTininess := hardfloat.consts.tininess_afterRounding - - val d_rvs2 = FType.D.recode(rvs2_elem) - val d_rvs1 = FType.D.recode(rvs1_elem) - val d_rvd = FType.D.recode(rvd_elem) - - val rvs2_recoded_elem = Mux(vd_eew64, d_rvs2, s_rvs2) - val rvs1_recoded_elem = Mux(vd_eew64, d_rvs1, s_rvs1) - val rvd_recoded_elem = Mux(vd_eew64, d_rvd, s_rvd) - - // Set req.in1 - when (ctrl.bool(FPSwapVdV2)) { - req.in1 := rvd_recoded_elem - } .elsewhen (vs2_eew === 3.U) { - req.in1 := d_rvs2 - } .elsewhen (ctrl.bool(Wide2VD)) { - req.in1 := widen_rvs2.io.out - } .otherwise { - req.in1 := s_rvs2 - } - - // Set req.in2 - when (vs1_eew === 3.U) { - req.in2 := d_rvs1 - } .elsewhen (ctrl.bool(Wide2VD) && !io.pipe(0).bits.acc) { - req.in2 := widen_rvs1.io.out - } .otherwise { - req.in2 := s_rvs1 - } - - // Set req.in3 - when (ctrl.bool(FPSwapVdV2)) { - req.in3 := rvs2_recoded_elem - } .otherwise { - req.in3 := rvd_recoded_elem - } - - io_fp_req.bits := req - io_fp_req.valid := io.pipe(0).valid - - io.write.valid := io.pipe(depth-1).valid - io.write.bits.eg := io.pipe(depth-1).bits.wvd_eg - io.write.bits.mask := FillInterleaved(8, io.pipe(depth-1).bits.wmask) - io.write.bits.data := Fill(dLenB >> 3, Mux(io.pipe(depth-1).bits.vd_eew === 3.U, FType.D.ieee(io_fp_resp.bits.data), Fill(2, FType.S.ieee(unbox(io_fp_resp.bits.data, 0.U, Some(FType.S)))))) - - io.set_fflags := DontCare - 
io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - io.set_vxsat := false.B - io.pipe0_stall := false.B -} - - -class ElementwiseFPU(implicit p: Parameters) extends IterativeFunctionalUnit()(p) with HasFPUParameters { - val io_fp_req = IO(Decoupled(new FPInput())) - val io_fp_resp = IO(Flipped(Valid(new FPResult()))) - - val supported_insns = Seq( - FDIV.VV, FDIV.VF, - FRDIV.VF, - FSQRT_V, - FRSQRT7_V, - FREC7_V, - FCLASS_V, - FMIN.VV, FMIN.VF, FMAX.VV, FMAX.VF, - FSGNJ.VV, FSGNJ.VF, FSGNJN.VV, FSGNJN.VF, FSGNJX.VV, FSGNJX.VF, - MFEQ.VV, MFEQ.VF, MFNE.VV, MFNE.VF, - MFLT.VV, MFLT.VF, MFLE.VV, MFLE.VF, - MFGT.VF, MFGE.VF, - FREDMIN.VV, FREDMAX.VV, - FCVT_SGL, FCVT_WID, FCVT_NRW - ).map(_.elementWise) - - val ctrl = new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Seq( - FPSwapVdV2, ReadsVD, WritesAsMask, FPSgnj, FPComp, FPSpecRM, FPMNE, FPMGT, Wide2VD, Wide2VS2, Reduction)) - - val vs1_eew = io.iss.op.rvs1_eew - val vs2_eew = io.iss.op.rvs2_eew - val vd_eew = io.iss.op.vd_eew - val vd_eew64 = io.iss.op.vd_eew64 - val eidx = Mux(io.iss.op.acc, 0.U, io.iss.op.eidx) - - val ctrl_isDiv = io.iss.op.opff6.isOneOf(OPFFunct6.fdiv, OPFFunct6.frdiv) - val ctrl_funary0 = io.iss.op.opff6.isOneOf(OPFFunct6.funary0) - val ctrl_funary1 = io.iss.op.opff6.isOneOf(OPFFunct6.funary1) - val ctrl_vfclass = ctrl_funary1 && (io.iss.op.rs1 === 16.U) - val ctrl_swap12 = io.iss.op.opff6.isOneOf(OPFFunct6.frdiv) - - val rs1 = io.iss.op.rs1 - val ctrl_widen = ctrl_funary0 && rs1(3) - val ctrl_narrow = rs1(4) - val ctrl_single_wide = ctrl_funary0 && !ctrl_widen && !ctrl_narrow - val ctrl_signed = rs1(0) - val ctrl_truncating = rs1(2) && rs1(1) - val ctrl_round_to_odd = rs1(0) - val ctrl_fptoint = ctrl_funary0 && ((!rs1(2) && !rs1(1)) || (rs1(2) && rs1(1))) - val ctrl_inttofp = ctrl_funary0 && (!rs1(2) && rs1(1)) - val ctrl_fptofp = ctrl_funary0 && (rs1(2) && !rs1(1)) - - val vfclass_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 16.U 
&& valid - val vfrsqrt7_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 4.U && valid - val vfrec7_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 5.U && valid - - // Functional unit is ready if not currently running and the scalar FPU is available - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched && !valid && io_fp_req.ready - - io.hazard.valid := valid - io.hazard.bits.eg := op.wvd_eg - - val has_wdata = Reg(Bool()) - val wdata = Reg(UInt(64.W)) - when (io.iss.valid && io.iss.ready) { - has_wdata := false.B - } - - // Create FPInput - val req = Wire(new FPInput) - req.ldst := false.B - req.wen := false.B - req.ren1 := true.B - req.ren2 := !(ctrl_funary0 || ctrl_funary1) - req.ren3 := false.B - req.swap12 := false.B - req.swap23 := false.B - req.typeTagIn := Mux(((ctrl_single_wide || !ctrl_funary0) && vd_eew64) || (ctrl_inttofp && ctrl_widen) || (ctrl_fptofp && ctrl_narrow), D, S) - req.typeTagOut := Mux(((ctrl_single_wide || !ctrl_funary0) && vd_eew64) || (ctrl_fptoint && ctrl_narrow) || (ctrl_fptofp && ctrl_widen) || (ctrl_inttofp && ctrl_widen), D, S) - req.fromint := ctrl_inttofp - req.toint := (ctrl_fptoint) || ctrl_vfclass || ctrl.bool(WritesAsMask) - req.fastpipe := ctrl_fptofp || ctrl.bool(FPSgnj) || ctrl.bool(FPComp) - req.fma := false.B - req.div := ctrl_isDiv - req.sqrt := ctrl_funary1 && (rs1 === 0.U) - req.wflags := !ctrl_vfclass && !ctrl.bool(FPSgnj) - req.vec := true.B - req.rm := Mux(ctrl_fptofp && ctrl_round_to_odd, "b110".U, Mux((!ctrl_isDiv && !ctrl_funary1 && !ctrl_funary0) || ctrl_vfclass, ctrl.uint(FPSpecRM), io.iss.op.frm)) - req.fmaCmd := 0.U - req.typ := Mux(ctrl_funary0, Cat((ctrl_inttofp && ctrl_narrow) || (ctrl_fptoint && ctrl_widen) || (ctrl_single_wide && vd_eew64), !ctrl_signed), 0.U) - req.fmt := 0.U - - val rvs2_elem = io.iss.op.rvs2_elem - val rvs1_elem = io.iss.op.rvs1_elem - val rvd_elem = io.iss.op.rvd_elem - - val s_rvs2_int = rvs2_elem(31,0) - val 
s_rvs2_fp = FType.S.recode(Mux(ctrl_funary0 && ctrl_truncating, rvs2_elem(31,22) << 22, rvs2_elem(31,0))) - val s_rvs2_unbox = unbox(box(s_rvs2_fp, FType.S), S, None) - - val s_rvs1 = FType.S.recode(rvs1_elem(31,0)) - val s_rvs1_unbox = unbox(box(s_rvs1, FType.S), S, None) - val s_rvd = FType.S.recode(rvd_elem(31,0)) - - val d_rvs2_int = rvs2_elem - val d_rvs2_fp = FType.D.recode(Mux(ctrl_funary0 && ctrl_truncating, rvs2_elem(63, 51) << 51, rvs2_elem)) - - val d_rvs1 = FType.D.recode(rvs1_elem) - val d_rvd = FType.D.recode(rvd_elem) - - val s_isNaN = FType.S.isNaN(s_rvs2_fp) || FType.S.isNaN(s_rvs1) - val d_isNaN = FType.D.isNaN(d_rvs2_fp) || FType.D.isNaN(d_rvs1) - - val mgt_NaN = ctrl.bool(WritesAsMask) && ctrl.bool(FPMGT) && ((vd_eew64 && d_isNaN) || (io.iss.op.vd_eew32 && s_isNaN)) - val mgt_NaN_reg = RegInit(false.B) - - when (io.iss.ready && io.iss.valid && mgt_NaN) { - mgt_NaN_reg := true.B - } .elsewhen (io.write.fire) { - mgt_NaN_reg := false.B - } - - // Set req.in1 - when (ctrl_swap12) { - req.in1 := Mux(vd_eew64, d_rvs1, s_rvs1_unbox) - } .elsewhen (ctrl_inttofp) { - req.in1 := Mux((vd_eew64 && !ctrl_widen) || (ctrl_funary0 && ctrl_narrow), d_rvs2_int, s_rvs2_int) - } .otherwise { - req.in1 := Mux((vd_eew64 && !ctrl_widen) || (ctrl_funary0 && ctrl_narrow), d_rvs2_fp, s_rvs2_unbox) - } - - // Set req.in2 - when (ctrl_swap12) { - req.in2 := Mux(vd_eew64, d_rvs2_fp, s_rvs2_unbox) - } .otherwise { - req.in2 := Mux(vd_eew64, d_rvs1, s_rvs1_unbox) - } - - // Set req.in3 - req.in3 := 0.U - - io_fp_req.bits := req - io_fp_req.valid := (io.iss.valid && io.iss.ready) && !vfrsqrt7_inst && !vfrec7_inst && !mgt_NaN - - // Approximation Instructions - val rvs2_op_bits = op.rvs2_elem - - // Reciprocal Sqrt Approximation - val recSqrt7 = Module(new VFRSQRT7) - recSqrt7.io.rvs2_input := rvs2_op_bits - recSqrt7.io.eew := op.rvs2_eew - - // Reciprocal Approximation - val rec7 = Module(new VFREC7) - rec7.io.rvs2_input := rvs2_op_bits - rec7.io.eew := op.rvs2_eew - 
rec7.io.frm := op.frm - - when (io_fp_resp.valid) { - when (ctrl.bool(WritesAsMask)) { - when (ctrl.bool(FPMNE) || (ctrl.bool(FPMGT) && !mgt_NaN_reg)) { - wdata := Fill(dLen, !io_fp_resp.bits.data(0)) - } .elsewhen (ctrl.bool(FPMGT) && mgt_NaN_reg) { - wdata := Fill(dLen, 0.U) - } .otherwise { - wdata := Fill(dLen, io_fp_resp.bits.data(0)) - } - } .elsewhen (vfclass_inst) { - wdata := Mux(vd_eew64, Cat(0.U(54.W), io_fp_resp.bits.data(9,0)), Fill(2, Cat(0.U(22.W), io_fp_resp.bits.data(9,0)))) - } .elsewhen (ctrl_fptoint) { - wdata := Mux(vd_eew64, io_fp_resp.bits.data(63,0), Fill(2, io_fp_resp.bits.data(31,0))) - } .otherwise { - wdata := Mux(vd_eew64, FType.D.ieee(io_fp_resp.bits.data), Fill(2, FType.S.ieee(unbox(io_fp_resp.bits.data, 0.U, Some(FType.S))))) - } - has_wdata := true.B - } - - val mask_bit = Mux(op.vd_eew64, op.wmask(0), Mux(op.eidx(0), op.wmask(4), op.wmask(0))) - - io.write.valid := (has_wdata || vfrsqrt7_inst || vfrec7_inst || mgt_NaN_reg) && valid - io.write.bits.eg := op.wvd_eg - io.write.bits.mask := Mux(ctrl.bool(WritesAsMask), ((1.U(dLen.W) & mask_bit) << (op.eidx % dLen.U)), FillInterleaved(8, op.wmask)) - io.write.bits.data := Mux1H(Seq(vfrsqrt7_inst, vfrec7_inst, io_fp_resp.fire), - Seq(Fill(dLenB >> 3, recSqrt7.io.out), Fill(dLenB >> 3, rec7.io.out), Fill(dLenB >> 3, wdata))) - - last := io.write.fire - - io.set_fflags := DontCare - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - io.set_vxsat := false.B - - io.acc := io.iss.op.acc - io.tail := io.iss.op.tail -} diff --git a/arch/src/main/scala/framework/gendomain/exu/fp/FPComp.scala b/arch/src/main/scala/framework/gendomain/exu/fp/FPComp.scala deleted file mode 100644 index ef348d57..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/fp/FPComp.scala +++ /dev/null @@ -1,155 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import 
freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -case object FPCmpFactory extends FunctionalUnitFactory { - def insns = Seq( - FMIN.VV, FMIN.VF, FMAX.VV, FMAX.VF, - FSGNJ.VV, FSGNJ.VF, FSGNJN.VV, FSGNJN.VF, FSGNJX.VV, FSGNJX.VF, - MFEQ.VV, MFEQ.VF, MFNE.VV, MFNE.VF, - MFLT.VV, MFLT.VF, MFLE.VV, MFLE.VF, - MFGT.VF, MFGE.VF, - FREDMIN.VV, FREDMAX.VV) - def generate(implicit p: Parameters) = new FPCompPipe()(p) -} - -class FPCompPipe(implicit p: Parameters) extends PipelinedFunctionalUnit(1)(p) with HasFPUParameters { - val supported_insns = FPCmpFactory.insns - - io.set_vxsat := false.B - - val ctrl = new VectorDecoder(io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, 0.U, 0.U, - supported_insns, Seq(WritesAsMask, FPComp, FPCompMin, FPMEQ, FPMNE, FPMLT, FPMGT, FPSgnj)) - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched - - val ctrl_sgnjn = io.pipe(0).bits.funct6(0) - val ctrl_sgnjx = io.pipe(0).bits.funct6(1) - val rvs1_eew = io.pipe(0).bits.rvs1_eew - val rvs2_eew = io.pipe(0).bits.rvs2_eew - val rvd_eew = io.pipe(0).bits.rvd_eew - val rvs2_data = io.pipe(0).bits.rvs2_data - val rvs1_data = io.pipe(0).bits.rvs1_data - - val fTypes = Seq(FType.H, FType.S, FType.D) - val minmax_results = Wire(Vec(3, UInt(dLen.W))) // results for vfmin/vfmax - val comp_results_eew = Seq.tabulate(4)({sew => WireInit(0.U((dLenB >> sew).W))}) - val comp_results = (0 until 4).map({sew => Mux(rvd_eew === sew.U, Fill(1 << sew, comp_results_eew(sew)), 0.U(dLenB.W))}).reduce(_|_) - val exceptions = Wire(Vec(3, UInt(5.W))) - - for (eew <- 1 until 4) { - val fType = fTypes(eew-1) - val num_chunks = dLen / fType.ieeeWidth - val compare_modules = Seq.fill(num_chunks)(Module(new hardfloat.CompareRecFN(fType.exp, fType.sig))) - - val zip_compares = rvs2_data.asTypeOf(Vec(num_chunks, UInt(fType.ieeeWidth.W))).zip(rvs1_data.asTypeOf(Vec(num_chunks, 
UInt(fType.ieeeWidth.W)))).zip(compare_modules) - - val gen_compares = zip_compares.map { case((rvs2, rvs1), comp) => - val rvs2_rec = fType.recode(rvs2) - val rvs1_rec = fType.recode(rvs1) - val rvs2_nan = fType.isNaN(rvs2_rec) - val rvs1_nan = fType.isNaN(rvs1_rec) - - comp.io.signaling := true.B - comp.io.a := rvs2_rec - comp.io.b := rvs1_rec - (comp.io, rvs2, rvs2_nan, rvs1, rvs1_nan) - } - - val minmax = gen_compares.map{ case(comp, rvs2, rvs2_nan, rvs1, rvs1_nan) => - val minmax_out = Wire(UInt(fType.ieeeWidth.W)) - when(rvs2_nan && rvs1_nan) { - minmax_out := fType.ieeeQNaN - } .elsewhen (rvs2_nan) { - minmax_out := rvs1 - } .elsewhen (rvs1_nan) { - minmax_out := rvs2 - } .otherwise { - minmax_out := Mux((!ctrl.bool(FPCompMin) && comp.gt) || (ctrl.bool(FPCompMin) && comp.lt), rvs2, rvs1) - } - minmax_out - } - minmax_results(eew - 1) := minmax.asUInt - - val comparisons = gen_compares.map{ case(comp, rvs2, rvs2_nan, rvs1, rvs1_nan) => - val comparison_out = Wire(UInt(1.W)) - when (ctrl.bool(FPMNE)) { - when (rvs2_nan || rvs1_nan) { - comparison_out := 1.U - } .otherwise { - comparison_out := !comp.eq - } - } .elsewhen (rvs2_nan || rvs1_nan) { - comparison_out := 0.U - } .otherwise { - comparison_out := (comp.eq && ctrl.bool(FPMEQ)) || (comp.lt && ctrl.bool(FPMLT)) || (comp.gt && ctrl.bool(FPMGT)) - } - comparison_out - } - comp_results_eew(eew) := comparisons.asUInt - - exceptions(eew - 1) := gen_compares.map {case(comp, rvs2, rvs2_nan, rvs1, rvs1_nan) => comp.exceptionFlags}.reduce(_ | _) - } - - - val rvs1_vals = io.pipe(0).bits.rvs1_data.asTypeOf(Vec(dLenB / 8, UInt(64.W))) - val rvs2_vals = io.pipe(0).bits.rvs2_data.asTypeOf(Vec(dLenB / 8, UInt(64.W))) - - // Sign-Injection Instructions - val sgnj = rvs2_vals.zip(rvs1_vals).map{ case(rvs2, rvs1) => - val d_bit = Wire(Bool()) - val s_bit = Wire(Bool()) - val h_bits = Wire(UInt(2.W)) - - when (ctrl_sgnjn) { - d_bit := !rvs1(63) - s_bit := Mux(rvd_eew <= 2.U, !rvs1(31), rvs2(31)) - h_bits := Mux(rvd_eew === 
1.U, Cat(!rvs1(47), !rvs1(15)), Cat(rvs2(47), rvs2(15))) - } .elsewhen (ctrl_sgnjx) { - d_bit := rvs1(63) ^ rvs2(63) - s_bit := Mux(rvd_eew <= 2.U, rvs1(31) ^ rvs2(31), rvs2(31)) - h_bits := Mux(rvd_eew === 1.U, Cat(rvs1(47) ^ rvs2(47), rvs1(15) ^ rvs2(15)), Cat(rvs2(47), rvs2(15))) - } .otherwise { - d_bit := rvs1(63) - s_bit := Mux(rvd_eew <= 2.U, rvs1(31), rvs2(31)) - h_bits := Mux(rvd_eew === 1.U, Cat(rvs1(47), rvs1(15)), Cat(rvs2(47), rvs2(15))) - } - d_bit ## rvs2(62,48) ## h_bits(1) ## rvs2(46, 32) ## s_bit ## rvs2(30,16) ## h_bits(0) ## rvs2(14,0) - } - - val out = Wire(UInt(dLen.W)) - when (ctrl.bool(FPComp)) { - out := Mux(rvs1_eew === 3.U, minmax_results(1), minmax_results(0)) - out := Mux1H(Seq(rvs1_eew === 3.U, rvs1_eew === 2.U, rvs1_eew === 1.U), - Seq(minmax_results(2), minmax_results(1), minmax_results(0))) - } .elsewhen (ctrl.bool(WritesAsMask)) { - out := Fill(8, comp_results) - } .otherwise { - out := sgnj.asUInt - } - - // Mask writing - val mask_write_offset = VecInit.tabulate(4)({ eew => - Cat(io.pipe(0).bits.eidx(log2Ceil(dLen)-1, dLenOffBits-eew), 0.U((dLenOffBits-eew).W)) - })(rvs1_eew) - val mask_write_mask = (VecInit.tabulate(4)({ eew => - VecInit(io.pipe(0).bits.wmask.asBools.grouped(1 << eew).map(_.head).toSeq).asUInt - })(rvs1_eew) << mask_write_offset)(dLen-1,0) - - io.write.valid := io.pipe(0).valid - io.write.bits.eg := io.pipe(0).bits.wvd_eg - io.write.bits.mask := Mux(ctrl.bool(WritesAsMask), mask_write_mask, FillInterleaved(8, io.pipe(0).bits.wmask)) - io.write.bits.data := out - - io.set_fflags.valid := io.write.valid - io.set_fflags.bits := Mux(rvd_eew === 3.U, exceptions(1), exceptions(0)) - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - io.pipe0_stall := false.B -} diff --git a/arch/src/main/scala/framework/gendomain/exu/fp/FPConv.scala b/arch/src/main/scala/framework/gendomain/exu/fp/FPConv.scala deleted file mode 100644 index 5d83c8f3..00000000 --- 
a/arch/src/main/scala/framework/gendomain/exu/fp/FPConv.scala +++ /dev/null @@ -1,176 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -case object FPConvFactory extends FunctionalUnitFactory { - def insns = Seq(FCVT_SGL, FCVT_NRW, FCVT_WID) - def generate(implicit p: Parameters) = new FPConvPipe()(p) -} - -class FPConvPipe(implicit p: Parameters) extends PipelinedFunctionalUnit(2)(p) with HasFPUParameters { - val supported_insns = FPConvFactory.insns - - io.set_vxsat := false.B - - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, - supported_insns, Nil).matched - - val rs1 = io.pipe(0).bits.rs1 - val ctrl_widen = rs1(3) - val ctrl_narrow = rs1(4) - val ctrl_signed = rs1(0) - val ctrl_out = !rs1(2) && rs1(1) - val ctrl_truncating = rs1(2) && rs1(1) - val ctrl_round_to_odd = rs1(0) - - val rvs2_data = io.pipe(0).bits.rvs2_data - val vd_eew = io.pipe(0).bits.vd_eew - val rvs2_eew = io.pipe(0).bits.rvs2_eew - val eew_select = Seq(rvs2_eew === 1.U, rvs2_eew === 2.U, rvs2_eew === 3.U) - - val fTypes = Seq(FType.H, FType.S, FType.D) - - // Single Width Conversions - val single_width_conversions = fTypes.map { fType => - val num_chunks = dLen / fType.ieeeWidth - val rvs2_chunks = rvs2_data.asTypeOf(Vec(num_chunks, UInt(fType.ieeeWidth.W))) - - // FP to Int - val fptoint_modules = Seq.fill(num_chunks)(Module(new hardfloat.RecFNToIN(fType.exp, fType.sig, fType.ieeeWidth))) - val gen_fptoint = rvs2_chunks.zip(fptoint_modules).map { case(rvs2, conv) => - conv.io.signedOut := ctrl_signed - conv.io.roundingMode := Mux(ctrl_truncating, 1.U, io.pipe(0).bits.frm) - conv.io.in := fType.recode(rvs2) - conv.io.out - } - - // Int to FP - val inttofp_modules = Seq.fill(num_chunks)(Module(new 
hardfloat.INToRecFN(fType.ieeeWidth, fType.exp, fType.sig))) - val gen_inttofp = rvs2_chunks.zip(inttofp_modules).map { case(rvs2, conv) => - conv.io.signedIn := ctrl_signed - conv.io.roundingMode := io.pipe(0).bits.frm - conv.io.detectTininess := hardfloat.consts.tininess_afterRounding - conv.io.in := rvs2 - fType.ieee(conv.io.out) - } - - Mux(ctrl_out, gen_inttofp.asUInt, gen_fptoint.asUInt) - } - - val single_width_out = Mux1H(eew_select, single_width_conversions) - - // Widening Conversions - val widening_conversions = fTypes.zipWithIndex.filter(_._1.ieeeWidth <= 32).map { case (fType, i) => - val num_converts = dLen / (2 * fType.ieeeWidth) - val in_sew = log2Ceil(fType.ieeeWidth / 8) - val wideType = fTypes(i+1) - - // Int to FP conversions - val wide_inttofp_modules = Seq.fill(num_converts)(Module(new hardfloat.INToRecFN(fType.ieeeWidth, wideType.exp, wideType.sig))) - val gen_inttofp = wide_inttofp_modules.zipWithIndex.map { case(wide, idx) => - wide.io.signedIn := ctrl_signed - wide.io.roundingMode := io.pipe(0).bits.frm - wide.io.detectTininess := hardfloat.consts.tininess_afterRounding - wide.io.in := extractElem(rvs2_data, in_sew.U, io.pipe(0).bits.eidx + idx.U)(fType.ieeeWidth-1,0) - wideType.ieee(wide.io.out) - }.asUInt - - // FP to FP conversions - val wide_fptofp_modules = Seq.fill(num_converts)(Module(new hardfloat.RecFNToRecFN(fType.exp, fType.sig, wideType.exp, wideType.sig))) - val gen_fptofp = wide_fptofp_modules.zipWithIndex.map{ case(wide, idx) => - wide.io.in := fType.recode(extractElem(rvs2_data, in_sew.U, io.pipe(0).bits.eidx + idx.U)(fType.ieeeWidth-1,0)) - wide.io.roundingMode := io.pipe(0).bits.frm - wide.io.detectTininess := hardfloat.consts.tininess_afterRounding - wideType.ieee(wide.io.out) - }.asUInt - - // FP to Int conversions - val wide_fptoint_modules = Seq.fill(num_converts)(Module(new hardfloat.RecFNToIN(fType.exp, fType.sig, wideType.ieeeWidth))) - val gen_fptoint = wide_fptoint_modules.zipWithIndex.map{ case(wide, idx) => - 
val extracted_rvs2_bits = extractElem(rvs2_data, in_sew.U, io.pipe(0).bits.eidx + idx.U)(fType.ieeeWidth-1,0) - wide.io.signedOut := ctrl_signed - wide.io.roundingMode := Mux(ctrl_truncating, 1.U, io.pipe(0).bits.frm) - wide.io.in := fType.recode(extracted_rvs2_bits) - wide.io.out - }.asUInt - - Mux1H(Seq(!rs1(2) && rs1(1), rs1(2) && !rs1(1), (!rs1(2) && !rs1(1)) || (rs1(2) && rs1(1))), - Seq(gen_inttofp , gen_fptofp , gen_fptoint)) - } - - val widening_out = Mux1H(Seq(vd_eew === 2.U, vd_eew === 3.U), widening_conversions) - - // Narrowing Conversions - // Just EEW of 32 and 64 - val narrowing_conversions = fTypes.zipWithIndex.filter(_._1.ieeeWidth >= 32).map { case (fType, i) => - val num_converts = dLen / fType.ieeeWidth - val in_sew = log2Ceil(fType.ieeeWidth / 8) - val narrowType = fTypes(i-1) - - // Int to FP Conversions - val narrow_inttofp_modules = Seq.fill(num_converts)(Module(new hardfloat.INToRecFN(fType.ieeeWidth, narrowType.exp, narrowType.sig))) - val gen_inttofp = narrow_inttofp_modules.zipWithIndex.map { case(narrow, idx) => - narrow.io.signedIn := ctrl_signed - narrow.io.roundingMode := io.pipe(0).bits.frm - narrow.io.detectTininess := hardfloat.consts.tininess_afterRounding - narrow.io.in := extractElem(rvs2_data, in_sew.U, io.pipe(0).bits.eidx + idx.U)(fType.ieeeWidth-1,0) - narrowType.ieee(narrow.io.out) - }.asUInt - - // FP to FP Conversions - val fptofp_modules = Seq.fill(num_converts)(Module(new hardfloat.RecFNToRecFN(fType.exp, fType.sig, narrowType.exp, narrowType.sig))) - val gen_fptofp = fptofp_modules.zipWithIndex.map{ case(narrow, idx) => - narrow.io.in := fType.recode(extractElem(rvs2_data, in_sew.U, io.pipe(0).bits.eidx + idx.U)(fType.ieeeWidth-1,0)) - narrow.io.roundingMode := Mux(ctrl_round_to_odd, "b110".U, io.pipe(0).bits.frm) - narrow.io.detectTininess := hardfloat.consts.tininess_afterRounding - narrowType.ieee(narrow.io.out) - }.asUInt - - // FP to Int Conversions - val fptoint_modules = Seq.fill(num_converts)(Module(new 
hardfloat.RecFNToIN(fType.exp, fType.sig, narrowType.ieeeWidth))) - val gen_fptoint = fptoint_modules.zipWithIndex.map { case(conv, idx) => - val extracted_rvs2_bits = extractElem(rvs2_data, in_sew.U, io.pipe(0).bits.eidx + idx.U)(fType.ieeeWidth-1,0) - conv.io.signedOut := ctrl_signed - conv.io.roundingMode := Mux(ctrl_truncating, 1.U, io.pipe(0).bits.frm) - conv.io.in := fType.recode(extracted_rvs2_bits) - conv.io.out - }.asUInt - - Mux1H(Seq(!rs1(2) && rs1(1), rs1(2) && !rs1(1), (!rs1(2) && !rs1(1)) || (rs1(2) && rs1(1))), - Seq(gen_inttofp , gen_fptofp , gen_fptoint)) - } - - // Special Case for FP16 narrowing converts - // Only narrowing FP to Int - val fp16_fptoint_modules = Seq.fill(dLen/16)(Module(new hardfloat.RecFNToIN(FType.H.exp, FType.H.sig, FType.H.ieeeWidth/2))) - val fp16_gen_fptoint = fp16_fptoint_modules.zipWithIndex.map { case(conv, idx) => - val extracted_rvs2_bits = extractElem(rvs2_data, 1.U, io.pipe(0).bits.eidx + idx.U)(FType.H.ieeeWidth-1,0) - conv.io.signedOut := ctrl_signed - conv.io.roundingMode := io.pipe(0).bits.frm - conv.io.in := FType.H.recode(Mux(ctrl_truncating, extracted_rvs2_bits(FType.H.ieeeWidth-1, FType.H.sig-2) << (FType.H.sig - 2), extracted_rvs2_bits)) - conv.io.out - }.asUInt - - - val narrowing_out = Fill(2, Mux1H(Seq(vd_eew === 0.U, vd_eew === 1.U, vd_eew === 2.U), Seq(fp16_gen_fptoint) ++ narrowing_conversions)) - - val pipe_out = Pipe(io.pipe(0).valid, Mux1H(Seq(!ctrl_widen && !ctrl_narrow, ctrl_widen, ctrl_narrow), - Seq(single_width_out, widening_out, narrowing_out))).bits - - io.write.valid := io.pipe(depth-1).valid - io.write.bits.eg := io.pipe(depth-1).bits.wvd_eg - io.write.bits.mask := FillInterleaved(8, io.pipe(depth-1).bits.wmask) - io.write.bits.data := pipe_out - - io.set_fflags := DontCare - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - io.pipe0_stall := false.B -} diff --git a/arch/src/main/scala/framework/gendomain/exu/fp/FPDiv.scala 
b/arch/src/main/scala/framework/gendomain/exu/fp/FPDiv.scala deleted file mode 100644 index a2124645..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/fp/FPDiv.scala +++ /dev/null @@ -1,370 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import chisel3.util.experimental.decode._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -class VFREC7(implicit p: Parameters) extends FPUModule()(p) { - val io = IO(new Bundle { - val rvs2_input = Input(UInt(64.W)) - val eew = Input(UInt(2.W)) - val frm = Input(UInt(3.W)) - val out = Output(UInt(64.W)) - val exc = Output(UInt(5.W)) - }) - - val table = Seq( - 127, 125, 123, 121, 119, 117, 116, 114, 112, 110, // 0-9 - 109, 107, 105, 104, 102, 100, 99, 97, 96, 94, - 93, 91, 90, 88, 87, 85, 84, 83, 81, 80, - 79, 77, 76, 75, 74, 72, 71, 70, 69, 68, - 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, - 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, - 46, 45, 44, 43, 42, 41, 40, 40, 39, 38, - 37, 36, 35, 35, 34, 33, 32, 31, 31, 30, - 29, 28, 28, 27, 26, 25, 25, 24, 23, 23, - 22, 21, 21, 20, 19, 19, 18, 17, 17, 16, - 15, 15, 14, 14, 13, 12, 12, 11, 11, 10, - 9, 9, 8, 8, 7, 7, 6, 5, 5, 4, - 4, 3, 3, 2, 2, 1, 1, 0) - - def count_leading_zeros(in: UInt): UInt = { - PriorityEncoder(Reverse(in)) - } - - val rvs2_bits = io.rvs2_input - val fTypes = Seq(FType.H, FType.S, FType.D) - - val eew_sel = (1 to 3).map(_.U === io.eew) - val classify = Mux1H(eew_sel, fTypes.map(f => f.classify(f.recode(rvs2_bits(f.ieeeWidth-1,0))))) - - val dz = WireInit(false.B) - val nv = WireInit(false.B) - val of = WireInit(false.B) - val nx = WireInit(false.B) - - val ret = Wire(UInt(64.W)) - ret := 0.U // it should not be possible to fall into this case - when (classify(0)) { // -inf - ret := Mux1H(eew_sel, fTypes.map(f => 1.U ## 0.U((f.ieeeWidth-1).W))) - } 
.elsewhen (classify(7)) { // +inf - ret := 0.U - } .elsewhen (classify(3)) { // -0 - ret := Mux1H(eew_sel, fTypes.map(f => 1.U(1.W) ## ~(0.U((f.exp).W)) ## 0.U((f.sig-1).W))) - dz := true.B - } .elsewhen (classify(4)) { // +0 - ret := Mux1H(eew_sel, fTypes.map(f => 0.U(1.W) ## ~(0.U((f.exp).W)) ## 0.U((f.sig-1).W))) - dz := true.B - } .elsewhen (classify(8)) { // sNaN - ret := Mux1H(eew_sel, fTypes.map(f => f.ieeeQNaN)) - nv := true.B - } .elsewhen (classify(9)) { // qNaN - ret := Mux1H(eew_sel, fTypes.map(f => f.ieeeQNaN)) - } .otherwise { - val sub = classify(2) || classify(5) - val exp = Mux1H(eew_sel, fTypes.map(f => (rvs2_bits >> (f.sig - 1))(f.exp-1,0))) - val sig = Mux1H(eew_sel, fTypes.map(f => rvs2_bits(f.sig-2,0))) - val sign = Mux1H(eew_sel, fTypes.map(f => rvs2_bits(f.ieeeWidth-1))) - - val norm_exp = WireInit(exp) - val norm_sig = WireInit(sig) - val round_abnormal = WireInit(false.B) - - when (sub) { - val leading_zeros = Mux1H(eew_sel, fTypes.map(f => count_leading_zeros(sig(f.sig-2,0)))) - - val exp_new = exp - leading_zeros - val sig_new = (sig << (leading_zeros +& 1.U)) & Mux1H(eew_sel, fTypes.map(f => ~(0.U((f.sig-1).W)))) - norm_exp := exp_new - norm_sig := sig_new - - when (exp_new =/= 0.U && ~exp_new =/= 0.U) { - round_abnormal := true.B - when (io.frm === 1.U || - (io.frm === 2.U && !sign) || - (io.frm === 3.U && sign)) { - ret := Mux1H(eew_sel, fTypes.map(f => (sign << (f.sig + f.exp - 1)) | (~(0.U(f.exp.W)) << (f.sig - 1)))) - 1.U - } .otherwise { - ret := Mux1H(eew_sel, fTypes.map(f => (sign << (f.sig + f.exp - 1)) | (~(0.U(f.exp.W)) << (f.sig - 1)))) - } - } - } - - when (!round_abnormal) { - val idx = Mux1H(eew_sel, fTypes.map(f => norm_sig >> (f.sig - 1 - 7)))(6,0) - val lookup = VecInit(table.map(_.U(7.W)))(idx) - val default_out_sig = Mux1H(eew_sel, fTypes.map(f => lookup << (f.sig - 1 - 7))) - val biases = fTypes.map(f => (1 << (f.exp - 1)) - 1) - val default_out_exp = Mux1H(eew_sel, fTypes.zip(biases).map { case (f, b) => - (2 * 
b).U + ~norm_exp - }) - - val out_sig = WireInit(default_out_sig) - val out_exp = WireInit(default_out_exp) - - when (default_out_exp === 0.U || (~default_out_exp === 0.U)) { - out_sig := (default_out_sig >> 1) | Mux1H(eew_sel, fTypes.map(f => 1.U << (f.sig - 1 - 1))) - when (~default_out_exp === 0.U) { - out_sig := default_out_sig >> 1; - out_exp := 0.U - } - } - ret := Mux1H(eew_sel, fTypes.map(f => sign ## out_exp(f.exp-1,0) ## out_sig(f.sig-2,0))) - } - - when (round_abnormal) { - of := true.B - nx := true.B - } - } - - io.out := Mux1H(eew_sel, fTypes.map(f => Fill(64 / f.ieeeWidth, ret(f.ieeeWidth-1,0)))) - io.exc := nv ## dz ## of ## false.B ## nx -} - - -class VFRSQRT7(implicit p: Parameters) extends FPUModule()(p) { - val io = IO(new Bundle { - val rvs2_input = Input(UInt(64.W)) - val eew = Input(UInt(2.W)) - val out = Output(UInt(64.W)) - val exc = Output(UInt(5.W)) - }) - - val table = Seq( - 52, 51, 50, 48, 47, 46, 44, 43, - 42, 41, 40, 39, 38, 36, 35, 34, - 33, 32, 31, 30, 30, 29, 28, 27, - 26, 25, 24, 23, 23, 22, 21, 20, - 19, 19, 18, 17, 16, 16, 15, 14, - 14, 13, 12, 12, 11, 10, 10, 9, - 9, 8, 7, 7, 6, 6, 5, 4, - 4, 3, 3, 2, 2, 1, 1, 0, - 127, 125, 123, 121, 119, 118, 116, 114, - 113, 111, 109, 108, 106, 105, 103, 102, - 100, 99, 97, 96, 95, 93, 92, 91, - 90, 88, 87, 86, 85, 84, 83, 82, - 80, 79, 78, 77, 76, 75, 74, 73, - 72, 71, 70, 70, 69, 68, 67, 66, - 65, 64, 63, 63, 62, 61, 60, 59, - 59, 58, 57, 56, 56, 55, 54, 53 - ) - - def count_leading_zeros(in: UInt): UInt = { - PriorityEncoder(Reverse(in)) - } - - - val rvs2_bits = io.rvs2_input - val fTypes = Seq(FType.H, FType.S, FType.D) - - val eew_sel = (1 to 3).map(_.U === io.eew) - val classify = Mux1H(eew_sel, fTypes.map(f => f.classify(f.recode(rvs2_bits(f.ieeeWidth-1,0))))) - - val dz = WireInit(false.B) - val nv = WireInit(false.B) - val of = WireInit(false.B) - val nx = WireInit(false.B) - - val ret = Wire(UInt(64.W)) - ret := 0.U // it should not be possible to fall into this case - - when 
(classify(0) || classify(1) || classify(2) || classify(8)) { // -inf, -normal, -subnormal, sNaN - nv := true.B - ret := Mux1H(eew_sel, fTypes.map(f => f.ieeeQNaN)) - } .elsewhen (classify(9)) { // qNaN - ret := Mux1H(eew_sel, fTypes.map(f => f.ieeeQNaN)) - } .elsewhen (classify(3)) { // -0 - ret := Mux1H(eew_sel, fTypes.map(f => 1.U(1.W) ## ~(0.U((f.exp).W)) ## 0.U((f.sig-1).W))) - dz := true.B - } .elsewhen (classify(4)) { // +0 - ret := Mux1H(eew_sel, fTypes.map(f => 0.U(1.W) ## ~(0.U((f.exp).W)) ## 0.U((f.sig-1).W))) - dz := true.B - } .elsewhen (classify(7)) { // +inf - ret := 0.U - } .otherwise { - val sub = classify(5) - - val exp = Mux1H(eew_sel, fTypes.map(f => (rvs2_bits >> (f.sig - 1))(f.exp-1,0))) - val sig = Mux1H(eew_sel, fTypes.map(f => rvs2_bits(f.sig-2,0))) - val sign = Mux1H(eew_sel, fTypes.map(f => rvs2_bits(f.ieeeWidth-1))) - - val norm_exp = Wire(UInt((1+fTypes.map(_.exp).max).W)) - norm_exp := exp - val norm_sig = WireInit(sig) - - when (sub) { - val leading_zeros = Mux1H(eew_sel, fTypes.map(f => count_leading_zeros(sig(f.sig-2,0)))) - val exp_new = (0.U(1.W) ## exp) - leading_zeros - val sig_new = (sig << (leading_zeros +& 1.U)) & Mux1H(eew_sel, fTypes.map(f => ~(0.U((f.sig-1).W)))) - norm_exp := exp_new - norm_sig := sig_new - } - - val idx = ((norm_exp(0) << 6) | Mux1H(eew_sel, fTypes.map(f => norm_sig >> (f.sig - 1 - 7 + 1))))(6,0) - val lookup = VecInit(table.map(_.U(7.W)))(idx) - val out_sig = Mux1H(eew_sel, fTypes.map(f => lookup << (f.sig - 1 - 7))) - val biases = fTypes.map(f => (1 << (f.exp - 1)) - 1) - val out_exp = Mux1H(eew_sel, fTypes.zip(biases).map { case (f, b) => - val bias3 = ((3 * b).S((f.exp + 2).W) - norm_exp.asSInt - 1.S).asUInt - bias3 >> 1 - }) - - ret := Mux1H(eew_sel, fTypes.map(f => sign ## out_exp(f.exp-1,0) ## out_sig(f.sig-2,0))) - } - io.out := Mux1H(eew_sel, fTypes.map(f => Fill(64 / f.ieeeWidth, ret(f.ieeeWidth-1,0)))) - io.exc := nv ## dz ## of ## false.B ## nx - -} - -case object FPDivSqrtFactory extends 
FunctionalUnitFactory { - def insns = Seq( - FDIV.VV, FDIV.VF, - FRDIV.VF, - FSQRT_V, - FRSQRT7_V, - FREC7_V, - FCLASS_V - ).map(_.elementWise) - - def generate(implicit p: Parameters) = new FPDivSqrt()(p) -} - -class FPDivSqrt(implicit p: Parameters) extends IterativeFunctionalUnit()(p) with HasFPUParameters { - val supported_insns = FPDivSqrtFactory.insns - io.set_vxsat := false.B - - val divSqrt = Module(new hardfloat.DivSqrtRecF64) - val divSqrt16 = Module(new hardfloat.DivSqrtRecFN_small(FType.H.exp, FType.H.sig, 0)) - - val accept_inst = new VectorDecoder( - io.iss.op.funct3, io.iss.op.funct6, io.iss.op.rs1, io.iss.op.rs2, - supported_insns, - Seq(FPSwapVdV2)) - - val ctrl = new VectorDecoder( - op.funct3, op.funct6, op.rs1, op.rs2, - supported_insns, - Seq(FPSwapVdV2)) - - val ctrl_isDiv = io.iss.op.opff6.isOneOf(OPFFunct6.fdiv, OPFFunct6.frdiv) - val divSqrt_ready = (ctrl_isDiv && divSqrt.io.inReady_div) || (!ctrl_isDiv && divSqrt.io.inReady_sqrt) - val divSqrt16_ready = divSqrt16.io.inReady - - val div_op = op.opff6.isOneOf(OPFFunct6.fdiv, OPFFunct6.frdiv) - - val rvs2_bits = op.rvs2_elem - val rvs1_bits = op.rvs1_elem - - divSqrt.io.detectTininess := hardfloat.consts.tininess_afterRounding - divSqrt.io.roundingMode := op.frm - divSqrt16.io.detectTininess := hardfloat.consts.tininess_afterRounding - divSqrt16.io.roundingMode := op.frm - - val iss_fire_pipe = Reg(Bool()) - iss_fire_pipe := io.iss.valid && io.iss.ready - - divSqrt.io.inValid := iss_fire_pipe && !(op.rvd_eew === 1.U) && (div_op || (op.opff6 === OPFFunct6.funary1 && op.rs1 === 0.U)) - divSqrt.io.sqrtOp := !div_op - divSqrt16.io.inValid := iss_fire_pipe && (op.rvd_eew === 1.U) && (div_op || (op.opff6 === OPFFunct6.funary1 && op.rs1 === 0.U)) - divSqrt16.io.sqrtOp := !div_op - - io.hazard.valid := valid - io.hazard.bits.eg := op.wvd_eg - - when (op.rvs1_eew === 3.U) { - divSqrt.io.a := Mux(ctrl.bool(FPSwapVdV2) && div_op, FType.D.recode(rvs1_bits), FType.D.recode(rvs2_bits)) - divSqrt.io.b := 
Mux(ctrl.bool(FPSwapVdV2) || !div_op, FType.D.recode(rvs2_bits), FType.D.recode(rvs1_bits)) - } .otherwise { - val narrow_rvs2_bits = rvs2_bits(31,0) - val narrow_rvs1_bits = rvs1_bits(31,0) - val widen = Seq(FType.S.recode(narrow_rvs2_bits), FType.S.recode(narrow_rvs1_bits)).zip( - Seq.fill(2)(Module(new hardfloat.RecFNToRecFN(8, 24, 11, 53)))).map { case(input, upconvert) => - upconvert.io.in := input - upconvert.io.roundingMode := op.frm - upconvert.io.detectTininess := hardfloat.consts.tininess_afterRounding - upconvert - } - - divSqrt.io.a := Mux(ctrl.bool(FPSwapVdV2) && div_op, widen(1).io.out, widen(0).io.out) - divSqrt.io.b := Mux(ctrl.bool(FPSwapVdV2) || !div_op, widen(0).io.out, widen(1).io.out) - } - - divSqrt16.io.a := Mux(ctrl.bool(FPSwapVdV2) && div_op, FType.H.recode(rvs1_bits), FType.H.recode(rvs2_bits)) - divSqrt16.io.b := Mux(ctrl.bool(FPSwapVdV2) || !div_op, FType.H.recode(rvs2_bits), FType.H.recode(rvs1_bits)) - - val divSqrt_out_valid = divSqrt.io.outValid_div || divSqrt.io.outValid_sqrt - val divSqrt16_out_valid = divSqrt16.io.outValid_div || divSqrt16.io.outValid_sqrt - - val narrow = Module(new hardfloat.RecFNToRecFN(11, 53, 8, 24)) - narrow.io.roundingMode := op.frm - narrow.io.detectTininess := hardfloat.consts.tininess_afterRounding - narrow.io.in := divSqrt.io.out - - val divSqrt_out = Mux(op.vd_eew === 3.U, FType.D.ieee(divSqrt.io.out), Fill(2, FType.S.ieee(narrow.io.out))) - - val out_buffer = RegEnable(divSqrt_out, divSqrt_out_valid) - val out_toWrite = RegInit(false.B) - val divSqrt_write = Mux(out_toWrite, out_buffer, divSqrt_out) - - val divSqrt16_out = FType.H.ieee(divSqrt16.io.out) - val out16_buffer = RegEnable(divSqrt16_out, divSqrt16_out_valid) - val out16_toWrite = RegInit(false.B) - val divSqrt16_write = Mux(out16_toWrite, out16_buffer, divSqrt16_out) - - // vfclass instruction - val gen_vfclass = Seq(FType.H, FType.S, FType.D).zipWithIndex.map { case(fType, i) => - Fill(2, Cat(0.U((fType.ieeeWidth-10).W), 
fType.classify(fType.recode(rvs2_bits(fType.ieeeWidth-1,0))))) - } - - val vfclass_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 16.U - val vfrsqrt7_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 4.U - val vfrec7_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 5.U - - // Reciprocal Sqrt Approximation - val recSqrt7 = Module(new VFRSQRT7) - recSqrt7.io.rvs2_input := Mux(valid && vfrsqrt7_inst, rvs2_bits, 0.U) - recSqrt7.io.eew := op.rvs2_eew - - // Reciprocal Approximation - val rec7 = Module(new VFREC7) - rec7.io.rvs2_input := Mux(valid && vfrec7_inst, rvs2_bits, 0.U) - rec7.io.eew := op.rvs2_eew - rec7.io.frm := op.frm - - // Capture result in case of write port backpressure - when (io.write.fire) { - out_toWrite := false.B - out16_toWrite := false.B - } .elsewhen (divSqrt_out_valid) { - out_toWrite := true.B - out16_toWrite := true.B - } - - val out = Mux1H( - Seq(vfclass_inst, vfrsqrt7_inst, vfrec7_inst, out_toWrite || divSqrt_out_valid || divSqrt16_out_valid), - Seq(Mux1H(Seq(op.rvs2_eew === 3.U, op.rvs2_eew === 2.U, op.rvs2_eew === 1.U), Seq(gen_vfclass(2), gen_vfclass(1), gen_vfclass(0))), recSqrt7.io.out, rec7.io.out, divSqrt_write) - )(63,0) - - io.write.valid := ((vfclass_inst || vfrsqrt7_inst || vfrec7_inst) && valid) || out_toWrite || divSqrt_out_valid - io.write.bits.eg := op.wvd_eg - io.write.bits.mask := FillInterleaved(8, op.wmask) - io.write.bits.data := Fill(dLenB >> 3, out) - io.iss.ready := accept_inst.matched && ((divSqrt_ready && io.iss.op.vd_eew >= 2.U) || (divSqrt16_ready && io.iss.op.vd_eew === 1.U)) && (!valid || last) - last := io.write.fire - - io.set_fflags.valid := divSqrt_out_valid || divSqrt16_out_valid || (vfrsqrt7_inst && io.write.fire) || (vfrec7_inst && io.write.fire) - io.set_fflags.bits := (divSqrt.io.exceptionFlags & Fill(5, divSqrt_out_valid)) | divSqrt16.io.exceptionFlags & Fill(5, divSqrt_out_valid) | (recSqrt7.io.exc & Fill(5, vfrsqrt7_inst)) | (rec7.io.exc & Fill(5, vfrec7_inst)) - - 
io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - - io.acc := false.B - io.tail := false.B -} diff --git a/arch/src/main/scala/framework/gendomain/exu/fp/FPFMAPipe.scala b/arch/src/main/scala/framework/gendomain/exu/fp/FPFMAPipe.scala deleted file mode 100644 index f0831052..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/fp/FPFMAPipe.scala +++ /dev/null @@ -1,205 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - - -class TandemFMAPipe(depth: Int)(implicit p: Parameters) extends FPUModule()(p) { - val io = IO(new Bundle { - val valid = Input(Bool()) - val frm = Input(UInt(3.W)) - val addsub = Input(Bool()) - val mul = Input(Bool()) - val op = Input(UInt(2.W)) - val a_eew = Input(UInt(2.W)) - val b_eew = Input(UInt(2.W)) - val c_eew = Input(UInt(2.W)) - val out_eew = Input(UInt(2.W)) - val a = Input(UInt(64.W)) - val b = Input(UInt(64.W)) - val c = Input(UInt(64.W)) - val mask = Input(UInt(4.W)) - val out = Output(UInt(64.W)) - val exc = Output(UInt(5.W)) - }) - - val out_eew_pipe = Pipe(io.valid, io.out_eew, depth-1) - val frm_pipe = Pipe(io.valid, io.frm, depth-1) - val mask_pipe = Pipe(io.valid, io.mask, depth-1) - - val fTypes = Seq(FType.D, FType.S, FType.H) - - val fma_results = fTypes.zipWithIndex.map { case(fType, j) => - val n = 64 / fType.ieeeWidth - val fma_eew = log2Ceil(fType.ieeeWidth >> 3) - - val results = (0 until n).map { i => - val fma = Module(new MulAddRecFNPipe((depth-1) min 2, fType.exp, fType.sig)) - val validin = io.valid && (io.out_eew === fma_eew.U) - val msb_idx = ((i + 1) * fType.ieeeWidth) - 1 - val lsb_idx = i * fType.ieeeWidth - - val inputs = Seq((io.a, io.a_eew), (io.b, io.b_eew), (io.c, io.c_eew)).map { case(in, eew) => - if (j <= 1) { - val widen 
= Module(new hardfloat.RecFNToRecFN(fTypes(j+1).exp, fTypes(j+1).sig, fType.exp, fType.sig)) - widen.io.in := fTypes(j+1).recode(Mux(validin && io.mask(i*(4/n)), in(fTypes(j+1).ieeeWidth-1,0), 0.U)) - widen.io.roundingMode := io.frm - widen.io.detectTininess := hardfloat.consts.tininess_afterRounding - - Mux(eew =/= fma_eew.U, widen.io.out, fType.recode(Mux(validin && io.mask(i*(4/n)), in(msb_idx, lsb_idx), 0.U))) - } else { - fType.recode(Mux(validin && io.mask(i*(4/n)), in(msb_idx, lsb_idx), 0.U)) - } - } - - fma.io.validin := validin - fma.io.op := Mux(validin, io.op, 0.U) - fma.io.roundingMode := Mux(validin, io.frm, 0.U) - fma.io.detectTininess := hardfloat.consts.tininess_afterRounding - fma.io.a := inputs(0) - fma.io.b := Mux(io.addsub, 1.U << (fType.ieeeWidth - 1), inputs(1)) - fma.io.c := Mux(io.mul, (inputs(0) ^ inputs(1)) & (1.U << fType.ieeeWidth), inputs(2)) - - val out = Pipe(fma.io.validout, fType.ieee(fma.io.out), (depth-3) max 0).bits - val exc = Pipe(fma.io.validout, fma.io.exceptionFlags, (depth-3) max 0).bits - (out, exc) - } - val out = results.map(_._1).asUInt - val exc = results.map(_._2).zipWithIndex.map{ case(e, i) => e & Fill(5, mask_pipe.bits(i)) }.reduce(_ | _) - (out, exc) - } - - val out_sel_oh = fTypes.map{ fType => log2Ceil(fType.ieeeWidth >> 3).U === out_eew_pipe.bits} - io.out := Mux1H(out_sel_oh, fma_results.map(_._1)) - io.exc := Mux1H(out_sel_oh, fma_results.map(_._2)) -} - -case class FPFMAFactory(depth: Int, sharedScalar: Boolean) extends FunctionalUnitFactory { - def base_insns = Seq( - FADD.VV, FADD.VF, FSUB.VV, FSUB.VF, FRSUB.VF, - FMUL.VV, FMUL.VF, - FMACC.VV, FMACC.VF, FNMACC.VV, FNMACC.VF, - FMSAC.VV, FMSAC.VF, FNMSAC.VV, FNMSAC.VF, - FMADD.VV, FMADD.VF, FNMADD.VV, FNMADD.VF, - FMSUB.VV, FMSUB.VF, FNMSUB.VV, FNMSUB.VF, - FWADD.VV, FWADD.VF, FWSUB.VV, FWSUB.VF, - FWADDW.VV, FWADDW.VF, FWSUBW.VV, FWSUBW.VF, - FWMUL.VV, FWMUL.VF, - FWMACC.VV, FWMACC.VF, FWNMACC.VV, FWNMACC.VF, - FWMSAC.VV, FWMSAC.VF, FWNMSAC.VV, FWNMSAC.VF, 
- FREDOSUM.VV, FREDUSUM.VV, FWREDOSUM.VV, FWREDUSUM.VV - ) - def insns = if (sharedScalar) base_insns.map(_.elementWise) else base_insns - def generate(implicit p: Parameters) = if (sharedScalar) { - new SharedScalarElementwiseFPFMA(depth)(p) - } else { - new FPFMAPipe(depth)(p) - } -} - -class FPFMAPipe(depth: Int)(implicit p: Parameters) extends PipelinedFunctionalUnit(depth)(p) with HasFPUParameters { - val supported_insns = FPFMAFactory(depth, false).insns - - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched - - io.set_vxsat := false.B - - val ctrl = new VectorDecoder(io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, 0.U, 0.U, supported_insns, Seq( - FPAdd, FPMul, FPSwapVdV2, FPFMACmd)) - - val vs1_eew = io.pipe(0).bits.rvs1_eew - val vs2_eew = io.pipe(0).bits.rvs2_eew - val vd_eew = io.pipe(0).bits.vd_eew - val ctrl_widen_vs2 = vs2_eew =/= vd_eew - val ctrl_widen_vs1 = vs1_eew =/= vd_eew - val wmask = io.pipe(0).bits.wmask - - val nTandemFMA = dLenB / 8 - - val eidx = Mux(io.pipe(0).bits.acc, 0.U, io.pipe(0).bits.eidx) - val one_bits = Mux1H(Seq(vd_eew === 3.U, vd_eew === 2.U, vd_eew === 1.U), - Seq("h3FF0000000000000".U, "h3F8000003F800000".U, "h3C003C003C003C00".U)) - val fmaCmd = ctrl.uint(FPFMACmd) - - val vec_rvs1 = io.pipe(0).bits.rvs1_data.asTypeOf(Vec(nTandemFMA, UInt(64.W))) - val vec_rvs2 = io.pipe(0).bits.rvs2_data.asTypeOf(Vec(nTandemFMA, UInt(64.W))) - val vec_rvd = io.pipe(0).bits.rvd_data.asTypeOf(Vec(nTandemFMA, UInt(64.W))) - - val fma_pipes = Seq.fill(nTandemFMA)(Module(new TandemFMAPipe(depth))).zipWithIndex.map { case(fma_pipe, i) => - val widening_vs1_bits = extractElem(io.pipe(0).bits.rvs1_data, 2.U, eidx + i.U)(31,0) - val rs1_bits = Mux(ctrl_widen_vs1, widening_vs1_bits, vec_rvs1(i)) - val widening_vs2_bits = extractElem(io.pipe(0).bits.rvs2_data, 2.U, eidx + i.U)(31,0) - val vs2_bits = Mux(ctrl_widen_vs2, widening_vs2_bits, vec_rvs2(i)) - - fma_pipe.io.mask := ((vs1_eew === 
1.U) && wmask((i*8)+6)) ## ((vs1_eew <= 2.U) && wmask((i*8)+4)) ## - ((vs1_eew === 1.U) && wmask((i*8)+2)) ## wmask(i*8) - fma_pipe.io.addsub := ctrl.bool(FPAdd) && !ctrl.bool(FPMul) - fma_pipe.io.mul := ctrl.bool(FPMul) && !ctrl.bool(FPAdd) - fma_pipe.io.out_eew := vd_eew - - // FMA - when (ctrl.bool(FPMul) && ctrl.bool(FPAdd)) { - fma_pipe.io.b := rs1_bits - fma_pipe.io.b_eew := vs1_eew - when (ctrl.bool(FPSwapVdV2)) { - fma_pipe.io.a := vec_rvd(i) - fma_pipe.io.a_eew := vd_eew - fma_pipe.io.c := vs2_bits - fma_pipe.io.c_eew := vs2_eew - } .otherwise { - fma_pipe.io.a := vs2_bits - fma_pipe.io.a_eew := vs2_eew - fma_pipe.io.c := vec_rvd(i) - fma_pipe.io.c_eew := vd_eew - } - } - // Multiply - .elsewhen (ctrl.bool(FPMul)) { - fma_pipe.io.a := vs2_bits - fma_pipe.io.a_eew := vs2_eew - fma_pipe.io.b := rs1_bits - fma_pipe.io.b_eew := vs1_eew - fma_pipe.io.c := 0.U - fma_pipe.io.c_eew := vs2_eew - } - // Add type - .elsewhen (ctrl.bool(FPAdd)) { - fma_pipe.io.a := vs2_bits - fma_pipe.io.a_eew := vs2_eew - fma_pipe.io.b := one_bits - fma_pipe.io.b_eew := vd_eew - fma_pipe.io.c := rs1_bits - fma_pipe.io.c_eew := vs1_eew - } .otherwise { - fma_pipe.io.a := 0.U - fma_pipe.io.a_eew := 0.U - fma_pipe.io.b := 0.U - fma_pipe.io.b_eew := 0.U - fma_pipe.io.c := 0.U - fma_pipe.io.c_eew := 0.U - } - - fma_pipe.io.valid := io.pipe(0).valid - fma_pipe.io.frm := io.pipe(0).bits.frm - fma_pipe.io.op := fmaCmd - - fma_pipe.io - } - - io.write.valid := io.pipe(depth-1).valid - io.write.bits.eg := io.pipe(depth-1).bits.wvd_eg - io.write.bits.mask := FillInterleaved(8, io.pipe(depth-1).bits.wmask) - io.write.bits.data := fma_pipes.map(pipe => pipe.out).asUInt - - io.set_fflags.valid := io.write.valid - io.set_fflags.bits := fma_pipes.map(pipe => pipe.exc).reduce(_ | _) - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - io.pipe0_stall := false.B -} diff --git a/arch/src/main/scala/framework/gendomain/exu/fp/SharedFPFMA.scala 
b/arch/src/main/scala/framework/gendomain/exu/fp/SharedFPFMA.scala deleted file mode 100644 index 6887beff..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/fp/SharedFPFMA.scala +++ /dev/null @@ -1,157 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -trait HasSharedFPUIO { - implicit val p: Parameters - val io_fp_req = IO(Decoupled(new FPInput())) - val io_fp_active = IO(Output(Bool())) - val io_fp_resp = IO(Flipped(Valid(new FPResult()))) -} - -class SharedScalarElementwiseFPFMA(depth: Int)(implicit p: Parameters) extends PipelinedFunctionalUnit(depth)(p) - with HasFPUParameters - with HasSharedFPUIO { - - val supported_insns = FPFMAFactory(depth, true).insns - - val ctrl = new VectorDecoder(io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, 0.U, 0.U, supported_insns, Seq( - FPAdd, FPMul, FPSwapVdV2, FPFMACmd, ReadsVD, FPSpecRM, Wide2VD, Wide2VS2, Reduction)) - - val vs1_eew = io.pipe(0).bits.rvs1_eew - val vs2_eew = io.pipe(0).bits.rvs2_eew - val vd_eew = io.pipe(0).bits.vd_eew - val vd_eew64 = io.pipe(0).bits.vd_eew64 - val vd_eew32 = io.pipe(0).bits.vd_eew32 - val vd_eew16 = io.pipe(0).bits.vd_eew16 - val eidx = Mux(io.pipe(0).bits.acc, 0.U, io.pipe(0).bits.eidx) - - // Functional unit is ready if not currently running and the scalar FPU is available - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched - - io_fp_active := io.pipe.tail.map(_.valid).orR // head is pipe0, issuing the request - - // Create FPInput - val req = Wire(new FPInput) - req.ldst := false.B - req.wen := false.B - req.ren1 := true.B - req.ren2 := true.B - req.ren3 := ctrl.bool(ReadsVD) - req.swap12 := false.B - req.swap23 := ctrl.bool(FPAdd) && !ctrl.bool(FPMul) - 
req.typeTagIn := Mux(vd_eew64, D, Mux(vd_eew32, S, H)) - req.typeTagOut := Mux(vd_eew64, D, Mux(vd_eew32, S, H)) - req.fromint := false.B - req.toint := false.B - req.fastpipe := false.B - req.fma := true.B - req.div := false.B - req.sqrt := false.B - req.wflags := true.B - req.vec := true.B - req.rm := io.pipe(0).bits.frm - req.fmaCmd := ctrl.uint(FPFMACmd) - req.typ := 0.U - req.fmt := 0.U - - val rvs2_elem = io.pipe(0).bits.rvs2_elem - val rvs1_elem = io.pipe(0).bits.rvs1_elem - val rvd_elem = io.pipe(0).bits.rvd_elem - - val h_rvs2 = FType.H.recode(rvs2_elem(15,0)) - val h_rvs1 = FType.H.recode(rvs1_elem(15,0)) - val h_rvd = FType.H.recode(rvd_elem(15,0)) - - // For widening operations, widen the narrow operands to compute with the scalar FPU - val h_widen_rvs1 = Module(new hardfloat.RecFNToRecFN(5, 11, 8, 24)) - h_widen_rvs1.io.in := h_rvs1 - h_widen_rvs1.io.roundingMode := io.pipe(0).bits.frm - h_widen_rvs1.io.detectTininess := hardfloat.consts.tininess_afterRounding - - val h_widen_rvs2 = Module(new hardfloat.RecFNToRecFN(5, 11, 8, 24)) - h_widen_rvs2.io.in := h_rvs2 - h_widen_rvs2.io.roundingMode := io.pipe(0).bits.frm - h_widen_rvs2.io.detectTininess := hardfloat.consts.tininess_afterRounding - - val s_rvs2 = FType.S.recode(rvs2_elem(31,0)) - val s_rvs1 = FType.S.recode(rvs1_elem(31,0)) - val s_rvd = FType.S.recode(rvd_elem(31,0)) - - // For widening operations, widen the narrow operands to compute with the scalar FPU - val s_widen_rvs1 = Module(new hardfloat.RecFNToRecFN(8, 24, 11, 53)) - s_widen_rvs1.io.in := s_rvs1 - s_widen_rvs1.io.roundingMode := io.pipe(0).bits.frm - s_widen_rvs1.io.detectTininess := hardfloat.consts.tininess_afterRounding - - val s_widen_rvs2 = Module(new hardfloat.RecFNToRecFN(8, 24, 11, 53)) - s_widen_rvs2.io.in := s_rvs2 - s_widen_rvs2.io.roundingMode := io.pipe(0).bits.frm - s_widen_rvs2.io.detectTininess := hardfloat.consts.tininess_afterRounding - - val d_rvs2 = FType.D.recode(rvs2_elem) - val d_rvs1 = 
FType.D.recode(rvs1_elem) - val d_rvd = FType.D.recode(rvd_elem) - - val rvs2_recoded = Mux(vd_eew64, d_rvs2, Mux(vd_eew32, s_rvs2, h_rvs2)) - val rvs1_recoded = Mux(vd_eew64, d_rvs1, Mux(vd_eew32, s_rvs1, h_rvs1)) - val rvd_recoded = Mux(vd_eew64, d_rvd, Mux(vd_eew32, s_rvd, h_rvd)) - - // Set req.in1 - when (ctrl.bool(FPSwapVdV2)) { - req.in1 := rvd_recoded - } .elsewhen (vs2_eew === 3.U) { - req.in1 := d_rvs2 - } .elsewhen (ctrl.bool(Wide2VD) && vd_eew64) { - req.in1 := s_widen_rvs2.io.out - } .elsewhen (ctrl.bool(Wide2VD) && vd_eew32) { - req.in1 := h_widen_rvs2.io.out - } .elsewhen (vs2_eew === 2.U) { - req.in1 := s_rvs2 - } .otherwise { - req.in1 := h_rvs2 - } - - // Set req.in2 - when (vs1_eew === 3.U) { - req.in2 := d_rvs1 - } .elsewhen (ctrl.bool(Wide2VD) && (vs1_eew === 2.U) && !io.pipe(0).bits.acc) { - req.in2 := s_widen_rvs1.io.out - } .elsewhen (ctrl.bool(Wide2VD) && (vs1_eew === 1.U) && !io.pipe(0).bits.acc) { - req.in2 := h_widen_rvs1.io.out - } .elsewhen (vs1_eew === 2.U) { - req.in2 := s_rvs1 - } .otherwise { - req.in2 := h_rvs1 - } - - // Set req.in3 - when (ctrl.bool(FPSwapVdV2)) { - req.in3 := rvs2_recoded - } .otherwise { - req.in3 := rvd_recoded - } - - io_fp_req.bits := req - io_fp_req.valid := io.pipe(0).valid - io.pipe0_stall := !io_fp_req.ready - - when (io.pipe(depth-1).valid) { assert(io_fp_resp.valid) } - io.write.valid := io.pipe(depth-1).valid - io.write.bits.eg := io.pipe(depth-1).bits.wvd_eg - io.write.bits.mask := FillInterleaved(8, io.pipe(depth-1).bits.wmask) - io.write.bits.data := Fill(dLenB >> 3, Mux(io.pipe(depth-1).bits.vd_eew === 3.U, FType.D.ieee(io_fp_resp.bits.data), Mux(io.pipe(depth-1).bits.vd_eew === 2.U, - Fill(2, FType.S.ieee(unbox(io_fp_resp.bits.data, S, Some(FType.S)))), Fill(4, FType.H.ieee(unbox(io_fp_resp.bits.data, H, Some(FType.H))))))) - - io.set_fflags := DontCare - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - io.set_vxsat := false.B -} diff --git 
a/arch/src/main/scala/framework/gendomain/exu/fp/SharedFPMisc.scala b/arch/src/main/scala/framework/gendomain/exu/fp/SharedFPMisc.scala deleted file mode 100644 index 37eb9f59..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/fp/SharedFPMisc.scala +++ /dev/null @@ -1,222 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -case object SharedFPMiscFactory extends FunctionalUnitFactory { - def insns = Seq( - FDIV.VV, FDIV.VF, - FRDIV.VF, - FSQRT_V, - FRSQRT7_V, - FREC7_V, - FCLASS_V, - FMIN.VV, FMIN.VF, FMAX.VV, FMAX.VF, - FSGNJ.VV, FSGNJ.VF, FSGNJN.VV, FSGNJN.VF, FSGNJX.VV, FSGNJX.VF, - MFEQ.VV, MFEQ.VF, MFNE.VV, MFNE.VF, - MFLT.VV, MFLT.VF, MFLE.VV, MFLE.VF, - MFGT.VF, MFGE.VF, - FREDMIN.VV, FREDMAX.VV, - FCVT_SGL, FCVT_WID, FCVT_NRW - ).map(_.elementWise) - def generate(implicit p: Parameters) = new SharedScalarElementwiseFPMisc()(p) -} - -class SharedScalarElementwiseFPMisc(implicit p: Parameters) extends IterativeFunctionalUnit()(p) - with HasFPUParameters - with HasSharedFPUIO { - - val fp_req = Wire(Decoupled(new FPInput)) - io_fp_req <> fp_req - - val supported_insns = SharedFPMiscFactory.insns - - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched && !valid && io_fp_req.ready - - val ctrl = new VectorDecoder(op.funct3, op.funct6, 0.U, 0.U, supported_insns, Seq( - FPSwapVdV2, ReadsVD, WritesAsMask, FPSgnj, FPComp, FPSpecRM, FPMNE, FPMGT, Wide2VD, Wide2VS2, Reduction)) - - val issued = Reg(Bool()) - val has_wdata = Reg(Bool()) - val wdata = Reg(UInt(64.W)) - when (io.iss.valid && io.iss.ready) { - issued := false.B - has_wdata := false.B - } - - val vs1_eew = op.rvs1_eew - val vs2_eew = op.rvs2_eew - val vd_eew = op.vd_eew - val vd_eew64 = 
op.vd_eew64 - val vd_eew32 = op.vd_eew32 - val vd_eew16 = op.vd_eew16 - val eidx = Mux(op.acc, 0.U, op.eidx) - - val ctrl_isDiv = op.opff6.isOneOf(OPFFunct6.fdiv, OPFFunct6.frdiv) - val ctrl_funary0 = op.opff6.isOneOf(OPFFunct6.funary0) - val ctrl_funary1 = op.opff6.isOneOf(OPFFunct6.funary1) - val ctrl_vfclass = ctrl_funary1 && (op.rs1 === 16.U) - val ctrl_swap12 = op.opff6.isOneOf(OPFFunct6.frdiv) - - val rs1 = op.rs1 - val ctrl_widen = ctrl_funary0 && rs1(3) - val ctrl_narrow = rs1(4) - val ctrl_single_wide = ctrl_funary0 && !ctrl_widen && !ctrl_narrow - val ctrl_signed = rs1(0) - val ctrl_truncating = rs1(2) && rs1(1) - val ctrl_round_to_odd = rs1(0) - val ctrl_fptoint = ctrl_funary0 && ((!rs1(2) && !rs1(1)) || (rs1(2) && rs1(1))) - val ctrl_inttofp = ctrl_funary0 && (!rs1(2) && rs1(1)) - val ctrl_fptofp = ctrl_funary0 && (rs1(2) && !rs1(1)) - - val vfclass_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 16.U && valid - val vfrsqrt7_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 4.U && valid - val vfrec7_inst = op.opff6.isOneOf(OPFFunct6.funary1) && op.rs1 === 5.U && valid - - io.hazard.valid := valid - io.hazard.bits.eg := op.wvd_eg - - io_fp_active := valid && issued - - // Create FPInput - val req = Wire(new FPInput) - req.ldst := false.B - req.wen := false.B - req.ren1 := true.B - req.ren2 := !(ctrl_funary0 || ctrl_funary1) - req.ren3 := false.B - req.swap12 := false.B - req.swap23 := false.B - req.typeTagIn := Mux1H(UIntToOH(op.rvs2_eew), Seq(S, H, S, D)) - req.typeTagOut := Mux1H(UIntToOH(op.rvd_eew), Seq(S, H, S, D)) - req.fromint := ctrl_inttofp - req.toint := (ctrl_fptoint) || ctrl_vfclass || ctrl.bool(WritesAsMask) - req.fastpipe := ctrl_fptofp || ctrl.bool(FPSgnj) || ctrl.bool(FPComp) - req.fma := false.B - req.div := ctrl_isDiv - req.sqrt := ctrl_funary1 && (rs1 === 0.U) - req.wflags := !ctrl_vfclass && !ctrl.bool(FPSgnj) - req.vec := true.B - req.rm := Mux(ctrl_fptofp && ctrl_round_to_odd, "b110".U, Mux(ctrl_fptoint && 
ctrl_truncating, 1.U, Mux((!ctrl_isDiv && !ctrl_funary1 && !ctrl_funary0) || ctrl_vfclass, ctrl.uint(FPSpecRM), op.frm))) - req.fmaCmd := 0.U - req.typ := Mux(ctrl_funary0, Cat((ctrl_inttofp && ctrl_narrow) || (ctrl_fptoint && ctrl_widen) || (ctrl_single_wide && vd_eew64), !ctrl_signed), 0.U) - req.fmt := 0.U - - val rvs2_elem = op.rvs2_elem - val rvs1_elem = op.rvs1_elem - val rvd_elem = op.rvd_elem - - val h_rvs2_int = rvs2_elem(15,0) - val h_rvs2_fp = FType.H.recode(Mux(ctrl_funary0 && ctrl_truncating, rvs2_elem(15,9) << 9, rvs2_elem(15,0))) - val h_rvs2_unbox = unbox(box(h_rvs2_fp, FType.H), H, None) - - val h_rvs1 = FType.H.recode(rvs1_elem(15,0)) - val h_rvs1_unbox = unbox(box(h_rvs1, FType.H), H, None) - val h_rvd = FType.H.recode(rvd_elem(15,0)) - - val s_rvs2_int = rvs2_elem(31,0) - val s_rvs2_fp = FType.S.recode(rvs2_elem(31,0)) - val s_rvs2_unbox = unbox(box(s_rvs2_fp, FType.S), S, None) - - val s_rvs1 = FType.S.recode(rvs1_elem(31,0)) - val s_rvs1_unbox = unbox(box(s_rvs1, FType.S), S, None) - val s_rvd = FType.S.recode(rvd_elem(31,0)) - - val d_rvs2_int = rvs2_elem - val d_rvs2_fp = FType.D.recode(rvs2_elem) - - val d_rvs1 = FType.D.recode(rvs1_elem) - val d_rvd = FType.D.recode(rvd_elem) - - val h_isNaN = FType.H.isNaN(h_rvs2_fp) || FType.H.isNaN(h_rvs1) - val s_isNaN = FType.S.isNaN(s_rvs2_fp) || FType.S.isNaN(s_rvs1) - val d_isNaN = FType.D.isNaN(d_rvs2_fp) || FType.D.isNaN(d_rvs1) - - val mgt_NaN = ctrl.bool(WritesAsMask) && ctrl.bool(FPMGT) && ((vd_eew64 && d_isNaN) || (vd_eew32 && s_isNaN) || (vd_eew16 && h_isNaN)) - val mgt_NaN_reg = RegInit(false.B) - - // Set req.in1 - when (ctrl_swap12) { - req.in1 := Mux(vd_eew64, d_rvs1, Mux(vd_eew32, s_rvs1_unbox, h_rvs1_unbox)) - } .elsewhen (ctrl_inttofp) { - req.in1 := rvs2_elem - } .otherwise { - req.in1 := Mux(vd_eew64 && (!ctrl_widen || (ctrl_funary0 && ctrl_narrow)), d_rvs2_fp, Mux(vd_eew32 && (!ctrl_widen || (ctrl_funary0 && ctrl_narrow)), s_rvs2_unbox, h_rvs2_unbox)) - } - - // Set req.in2 - when 
(ctrl_swap12) { - req.in2 := Mux(vd_eew64, d_rvs2_fp, Mux(vd_eew32, s_rvs2_unbox, h_rvs2_unbox)) - } .otherwise { - req.in2 := Mux(vd_eew64, d_rvs1, Mux(vd_eew32, s_rvs1_unbox, h_rvs1_unbox)) - } - - // Set req.in3 - req.in3 := 0.U - - fp_req.bits := req - fp_req.valid := valid && !issued && !vfrsqrt7_inst && !vfrec7_inst && !mgt_NaN - when (fp_req.fire) { issued := true.B } - - // Approximation Instructions - - // Reciprocal Sqrt Approximation - val recSqrt7 = Module(new VFRSQRT7) - recSqrt7.io.rvs2_input := rvs2_elem - recSqrt7.io.eew := op.rvs2_eew - - // Reciprocal Approximation - val rec7 = Module(new VFREC7) - rec7.io.rvs2_input := rvs2_elem - rec7.io.eew := op.rvs2_eew - rec7.io.frm := op.frm - - when (io_fp_resp.valid) { - has_wdata := true.B - when (ctrl.bool(WritesAsMask)) { - when (ctrl.bool(FPMNE) || (ctrl.bool(FPMGT) && !mgt_NaN)) { - wdata := Fill(dLen, !io_fp_resp.bits.data(0)) - } .elsewhen (ctrl.bool(FPMGT) && mgt_NaN) { - wdata := Fill(dLen, 0.U) - } .otherwise { - wdata := Fill(dLen, io_fp_resp.bits.data(0)) - } - } .elsewhen (vfclass_inst) { - wdata := Mux(vd_eew64, Cat(0.U(54.W), io_fp_resp.bits.data(9,0)), Fill(2, Cat(0.U(22.W), io_fp_resp.bits.data(9,0)))) - } .elsewhen (ctrl_fptoint) { - wdata := Mux(vd_eew64, io_fp_resp.bits.data(63,0), Fill(2, io_fp_resp.bits.data(31,0))) - } .otherwise { - wdata := Mux(vd_eew64, FType.D.ieee(io_fp_resp.bits.data), Mux(vd_eew32, Fill(2, FType.S.ieee(unbox(io_fp_resp.bits.data, 0.U, Some(FType.S)))), - Fill(4, FType.H.ieee(unbox(io_fp_resp.bits.data, H, Some(FType.H)))))) - } - } - - val mask_write_offset = VecInit.tabulate(4)({ eew => - Cat(op.eidx(log2Ceil(dLen)-1, dLenOffBits-eew), 0.U((dLenOffBits-eew).W)) - })(op.vd_eew) - val mask_write_mask = (VecInit.tabulate(4)({ eew => - VecInit(op.wmask.asBools.grouped(1 << eew).map(_.head).toSeq).asUInt - })(op.vd_eew) << mask_write_offset)(dLen-1,0) - - io.write.valid := (has_wdata || vfrsqrt7_inst || vfrec7_inst || mgt_NaN) && valid - io.write.bits.eg := 
op.wvd_eg - io.write.bits.mask := Mux(ctrl.bool(WritesAsMask), mask_write_mask, FillInterleaved(8, op.wmask)) - io.write.bits.data := Mux1H(Seq(vfrsqrt7_inst, vfrec7_inst, has_wdata), - Seq(Fill(dLenB >> 3, recSqrt7.io.out), Fill(dLenB >> 3, rec7.io.out), Fill(dLenB >> 3, wdata))) - - last := io.write.fire - - io.set_fflags := DontCare - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - io.set_vxsat := false.B - - io.acc := op.acc - io.tail := op.tail -} diff --git a/arch/src/main/scala/framework/gendomain/exu/int/BitwisePipe.scala b/arch/src/main/scala/framework/gendomain/exu/int/BitwisePipe.scala deleted file mode 100644 index c4cb9e06..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/int/BitwisePipe.scala +++ /dev/null @@ -1,52 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -case object BitwisePipeFactory extends FunctionalUnitFactory { - def insns = Seq( - AND.VV, AND.VX, AND.VI, OR.VV, OR.VX, OR.VI, XOR.VV, XOR.VX, XOR.VI, - MANDNOT.VV, MAND.VV, MOR.VV, MXOR.VV, MORNOT.VV, MNAND.VV, MNOR.VV, MXNOR.VV, - REDAND.VV, REDOR.VV, REDXOR.VV, - // Zvbb - ANDN.VV, ANDN.VX - ) - def generate(implicit p: Parameters) = new BitwisePipe()(p) -} - -class BitwisePipe(implicit p: Parameters) extends PipelinedFunctionalUnit(1)(p) { - val supported_insns = BitwisePipeFactory.insns - - val ctrl = new VectorDecoder(io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, 0.U, 0.U, supported_insns, - Seq(BWAnd, BWOr, BWXor, BWInvOut, BWInv1)) - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched - - val in1 = Mux(ctrl.bool(BWInv1), ~io.pipe(0).bits.rvs1_data, io.pipe(0).bits.rvs1_data) - val in2 = io.pipe(0).bits.rvs2_data - val op = Mux1H(Seq( - 
(ctrl.bool(BWAnd), (in1 & in2)), - (ctrl.bool(BWOr) , (in1 | in2)), - (ctrl.bool(BWXor), (in1 ^ in2)) - )) - val out = Mux(ctrl.bool(BWInvOut), ~op, op) - - io.pipe0_stall := false.B - io.write.valid := io.pipe(0).valid - io.write.bits.eg := io.pipe(0).bits.wvd_eg - io.write.bits.mask := Mux(io.pipe(0).bits.isOpm && !io.pipe(0).bits.acc, - io.pipe(0).bits.full_tail_mask, - FillInterleaved(8, io.pipe(0).bits.wmask)) - io.write.bits.data := out - - io.set_vxsat := false.B - io.set_fflags.valid := false.B - io.set_fflags.bits := DontCare - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare -} diff --git a/arch/src/main/scala/framework/gendomain/exu/int/ElementwiseMultiplyPipe.scala b/arch/src/main/scala/framework/gendomain/exu/int/ElementwiseMultiplyPipe.scala deleted file mode 100644 index 5e85599a..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/int/ElementwiseMultiplyPipe.scala +++ /dev/null @@ -1,68 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -class ElementwiseMultiplyPipe(depth: Int)(implicit p: Parameters) extends PipelinedFunctionalUnit(depth)(p) { - val supported_insns = IntegerMultiplyFactory(depth, false).insns - - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched - io.set_vxsat := false.B - io.set_fflags.valid := false.B - io.set_fflags.bits := DontCare - - val ctrl = new VectorDecoder(io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, 0.U, 0.U, supported_insns, Seq( - MULHi, MULSign1, MULSign2, MULSwapVdV2, MULAccumulate, MULSub)) - - val in_eew = io.pipe(0).bits.rvs1_eew - val eidx = io.pipe(0).bits.eidx - - val in_vs1 = Mux(ctrl.bool(MULSign1), sextElem(io.pipe(0).bits.rvs1_elem, in_eew), 
io.pipe(0).bits.rvs1_elem) - val in_vs2 = Mux(ctrl.bool(MULSign2), sextElem(io.pipe(0).bits.rvs2_elem, in_eew), io.pipe(0).bits.rvs2_elem) - val in_vd = io.pipe(0).bits.rvd_elem - - val prod = in_vs1.asSInt * Mux(ctrl.bool(MULSwapVdV2), in_vd, in_vs2).asSInt - ///////////////// pipe - val prod_pipe = Pipe(io.pipe(0).valid, prod, depth-2).bits - val in_vs2_pipe = Pipe(io.pipe(0).valid, in_vs2, depth-2).bits - val in_vd_pipe = Pipe(io.pipe(0).valid, in_vd, depth-2).bits - val ctrl_MULSub = Pipe(io.pipe(0).valid, ctrl.bool(MULSub), depth-2).bits - val ctrl_MULSwapVdV2 = Pipe(io.pipe(0).valid, ctrl.bool(MULSwapVdV2), depth-2).bits - val ctrl_MULAccumulate = Pipe(io.pipe(0).valid, ctrl.bool(MULAccumulate), depth-2).bits - val ctrl_MULHi = Pipe(io.pipe(0).valid, ctrl.bool(MULHi), depth-2).bits - val ctrl_smul = io.pipe(depth-2).bits.isOpi - val out_eew = io.pipe(depth-2).bits.vd_eew - - val hi = VecInit.tabulate(4)({ eew => prod_pipe >> (8 << eew) })(out_eew)(63,0) - val lo = VecInit.tabulate(4)({ eew => prod_pipe((8 << eew)-1,0)})(out_eew)(63,0) - val madd = Mux(ctrl_MULSub, ~lo, lo) + ctrl_MULSub + Mux(ctrl_MULSwapVdV2, in_vs2_pipe, in_vd_pipe) - val rounding_incr = VecInit.tabulate(4)({ eew => RoundingIncrement(io.pipe(depth-2).bits.vxrm, prod_pipe((8 << eew)-1,0)) })(out_eew) - val smul = VecInit.tabulate(4)({ eew => prod_pipe >> ((8 << eew) - 1) })(out_eew) + Cat(0.U(1.W), rounding_incr).asSInt - val smul_clip_neg = VecInit.tabulate(4)({ eew => (-1 << ((8 << eew)-1)).S })(out_eew) - val smul_clip_pos = VecInit.tabulate(4)({ eew => ((1 << ((8 << eew)-1)) - 1).S })(out_eew) - val smul_clip_hi = smul > smul_clip_pos - val smul_clip_lo = smul < smul_clip_neg - val smul_clipped = Mux(smul_clip_hi, smul_clip_pos, 0.S) | Mux(smul_clip_lo, smul_clip_neg, 0.S) | Mux(!smul_clip_hi && !smul_clip_lo, smul, 0.S) - val smul_sat = smul_clip_hi || smul_clip_lo - val out = Mux(ctrl_MULAccumulate, madd, 0.U) | Mux(ctrl_smul, smul_clipped.asUInt, 0.U) | Mux(!ctrl_MULAccumulate && 
!ctrl_smul, Mux(ctrl_MULHi, hi, lo), 0.U) - - ///////////////// pipe - val pipe_out = Pipe(io.pipe(depth-2).valid, out(63,0), 1).bits - val pipe_vxsat = Pipe(io.pipe(depth-2).valid, smul_sat && ctrl_smul, 1).bits - val wdata = VecInit.tabulate(4)({ eew => Fill(dLenB >> eew, pipe_out((8< Fill(dLenB >> eew, write_elem((8< prod >> (8 << eew) })(op.vd_eew)(63,0) - val lo = VecInit.tabulate(4)({ eew => prod((8 << eew)-1,0)})(op.vd_eew)(63,0) - val madd = Mux(mul_ctrl.bool(MULSub), ~lo, lo) + mul_ctrl.bool(MULSub) + Mux(mul_ctrl.bool(MULSwapVdV2), - op.rvs2_elem, op.rvd_elem) - val rounding_incr = VecInit.tabulate(4)({ eew => RoundingIncrement(op.vxrm, prod((8 << eew)-1,0)) })(op.vd_eew) - val smul = VecInit.tabulate(4)({ eew => prod >> ((8 << eew) - 1) })(op.vd_eew) + Cat(0.U(1.W), rounding_incr).asSInt - val smul_clip_neg = VecInit.tabulate(4)({ eew => (-1 << ((8 << eew)-1)).S })(op.vd_eew) - val smul_clip_pos = VecInit.tabulate(4)({ eew => ((1 << ((8 << eew)-1)) - 1).S })(op.vd_eew) - val smul_clip_hi = smul > smul_clip_pos - val smul_clip_lo = smul < smul_clip_neg - val smul_clipped = Mux(smul_clip_hi, smul_clip_pos, 0.S) | Mux(smul_clip_lo, smul_clip_neg, 0.S) | Mux(!smul_clip_hi && !smul_clip_lo, smul, 0.S) - val smul_sat = smul_clip_hi || smul_clip_lo - val mul_wdata = WireInit(Mux(mul_ctrl.bool(MULHi), hi, lo)) - when (mul_ctrl.bool(MULAccumulate) || is_smul) { - mul_wdata := Mux(mul_ctrl.bool(MULAccumulate), madd, 0.U) | Mux(is_smul, smul_clipped.asUInt, 0.U) - - } - when (mul_ctrl.matched) { - write_elem := mul_wdata - io.set_vxsat := is_smul && smul_sat && io.write.fire && op.wmask =/= 0.U - } - } - - div.io.resp.ready := io.write.ready - io.write.valid := div.io.resp.valid - io.write.bits.eg := op.wvd_eg - io.write.bits.mask := FillInterleaved(8, op.wmask) - io.write.bits.data := wdata - - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare - - last := io.write.fire - - io.acc := false.B - io.tail := false.B -} diff --git 
a/arch/src/main/scala/framework/gendomain/exu/int/IntegerPipe.scala b/arch/src/main/scala/framework/gendomain/exu/int/IntegerPipe.scala deleted file mode 100644 index 12470655..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/int/IntegerPipe.scala +++ /dev/null @@ -1,386 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -class AdderArray(dLenB: Int) extends Module { - val io = IO(new Bundle { - val in1 = Input(Vec(dLenB, UInt(8.W))) - val in2 = Input(Vec(dLenB, UInt(8.W))) - val incr = Input(Vec(dLenB, Bool())) - val mask_carry = Input(UInt(dLenB.W)) - - val signed = Input(Bool()) - val eew = Input(UInt(2.W)) - val avg = Input(Bool()) - val rm = Input(UInt(2.W)) - val sub = Input(Bool()) - val cmask = Input(Bool()) - - val out = Output(Vec(dLenB, UInt(8.W))) - val carry = Output(Vec(dLenB, Bool())) - }) - - val use_carry = VecInit.tabulate(4)({ eew => - Fill(dLenB >> eew, ~(1.U((1 << eew).W))) - })(io.eew) - val carry_clear = Mux(io.avg, use_carry.asBools.map(Cat(~(0.U(8.W)), _)).asUInt, ~(0.U(73.W))) - val carry_restore = Mux(io.avg, use_carry.asBools.map(Cat(0.U(8.W), _)).asUInt, 0.U(73.W)) - - val avg_in1 = VecInit.tabulate(4) { eew => - VecInit(io.in1.asTypeOf(Vec(dLenB >> eew, UInt((8 << eew).W))).map(e => Cat(io.signed && e((8<> 1)).asUInt - }(io.eew).asTypeOf(Vec(dLenB, UInt(8.W))) - val avg_in2 = VecInit.tabulate(4) { eew => - VecInit(io.in2.asTypeOf(Vec(dLenB >> eew, UInt((8 << eew).W))).map(e => Cat(io.signed && e((8<> 1)).asUInt - }(io.eew).asTypeOf(Vec(dLenB, UInt(8.W))) - - val in1 = Mux(io.avg, avg_in1, io.in1) - val in2 = Mux(io.avg, avg_in2, io.in2) - - for (i <- 0 until (dLenB >> 3)) { - val h = (i+1)*8-1 - val l = i*8 - val io_in1_slice = io.in1.slice(l,h+1) - val io_in2_slice = 
io.in2.slice(l,h+1) - val in1_slice = in1.slice(l,h+1) - val in2_slice = in2.slice(l,h+1) - val use_carry_slice = use_carry(h,l).asBools - val mask_carry_slice = io.mask_carry(h,l).asBools - val incr_slice = io.incr.slice(l,h+1) - - val in1_dummy_bits = (io_in1_slice - .zip(io_in2_slice) - .zip(use_carry_slice) - .zip(mask_carry_slice).map { case(((i1, i2), carry), mask_bit) => { - val avg_bit = ((io.sub ^ i1(0)) & i2(0)) | (((io.sub ^ i1(0)) ^ i2(0)) & io.sub) - val bit = (!io.cmask & io.sub) | (io.cmask & (io.sub ^ mask_bit)) - Mux(carry, 1.U(1.W), Mux(io.avg, avg_bit, bit)) - }}) - val in2_dummy_bits = (io_in1_slice - .zip(io_in2_slice) - .zip(use_carry_slice) - .zip(mask_carry_slice).map { case(((i1, i2), carry), mask_bit) => { - val avg_bit = ((io.sub ^ i1(0)) & i2(0)) | (((io.sub ^ i1(0)) ^ i2(0)) & io.sub) - val bit = (!io.cmask & io.sub) | (io.cmask & (io.sub ^ mask_bit)) - Mux(carry, 0.U(1.W), Mux(io.avg, avg_bit, bit)) - }}) - val round_incrs = (io_in1_slice - .zip(io_in2_slice) - .zipWithIndex.map { case((l, r), i) => { - val sum = r(1,0) +& ((l(1,0) ^ Fill(2, io.sub)) +& io.sub) - Cat(0.U(7.W), Cat(Mux(io.avg, RoundingIncrement(io.rm, sum(1), sum(0), None) & !use_carry_slice(i), 0.U), 0.U(1.W))) - }} - .asUInt) - - - val in1_constructed = in1_slice.zip(in1_dummy_bits).map { case(i1, dummy_bit) => (i1 ^ Fill(8, io.sub)) ## dummy_bit }.asUInt - val in2_constructed = in2_slice.zip(in2_dummy_bits).map { case(i2, dummy_bit) => i2 ## dummy_bit }.asUInt - - val incr_constructed = incr_slice.zip(use_carry_slice).map { case(incr, masking) => Cat(0.U(7.W), Cat(Mux(!masking, incr, 0.U(1.W)), 0.U(1.W))) }.asUInt - - val sum = (((in1_constructed +& in2_constructed) & carry_clear) | carry_restore) +& round_incrs +& incr_constructed - - for (j <- 0 until 8) { - io.out((i*8) + j) := sum(((j+1)*9)-1, (j*9) + 1) - io.carry((i*8) + j) := sum((j+1)*9) - } - } -} - -class CompareArray(dLenB: Int) extends Module { - val io = IO(new Bundle { - val in1 = Input(Vec(dLenB, 
UInt(8.W))) - val in2 = Input(Vec(dLenB, UInt(8.W))) - val eew = Input(UInt(2.W)) - val signed = Input(Bool()) - val less = Input(Bool()) - val sle = Input(Bool()) - val inv = Input(Bool()) - - val minmax = Output(UInt(dLenB.W)) - val result = Output(UInt(dLenB.W)) - }) - - val eq = io.in2.zip(io.in1).map { x => x._1 === x._2 } - val lt = io.in2.zip(io.in1).map { x => x._1 < x._2 } - - val minmax_bits = Wire(Vec(4, UInt(dLenB.W))) - val result_bits = Wire(Vec(4, UInt(dLenB.W))) - - io.minmax := minmax_bits(io.eew) - io.result := result_bits(io.eew) - - for (eew <- 0 until 4) { - val lts = lt.grouped(1 << eew) - val eqs = eq.grouped(1 << eew) - val bits = VecInit(lts.zip(eqs).zipWithIndex.map { case ((e_lts, e_eqs), i) => - val eq = e_eqs.andR - val in1_hi = io.in1((i+1)*(1< l || (e && p) } - Mux(io.less, lt || (io.sle && eq), io.inv ^ eq) - }.toSeq).asUInt - minmax_bits(eew) := FillInterleaved(1 << eew, bits) - result_bits(eew) := Fill(1 << eew, bits) - } -} - -class SaturatedSumArray(dLenB: Int) extends Module { - val dLen = dLenB * 8 - val io = IO(new Bundle { - val sum = Input(Vec(dLenB, UInt(8.W))) - val carry = Input(Vec(dLenB, Bool())) - val in1_sign = Input(Vec(dLenB, Bool())) - val in2_sign = Input(Vec(dLenB, Bool())) - val sub = Input(Bool()) - val eew = Input(UInt(2.W)) - val signed = Input(Bool()) - - val set_vxsat = Output(UInt(dLenB.W)) - val out = Output(Vec(dLenB, UInt(8.W))) - }) - - val unsigned_mask = VecInit.tabulate(4)({ eew => - FillInterleaved(1 << eew, VecInit.tabulate(dLenB >> eew)(i => io.sub ^ io.carry(((i+1) << eew)-1)).asUInt) - })(io.eew) - val unsigned_clip = Mux(io.sub, 0.U(dLen.W), ~(0.U(dLen.W))).asTypeOf(Vec(dLenB, UInt(8.W))) - - val (signed_masks, signed_clips): (Seq[UInt], Seq[UInt]) = Seq.tabulate(4)({ eew => - val out_sign = VecInit.tabulate(dLenB >> eew)(i => io.sum(((i+1)<> eew)(i => io.in2_sign(((i+1)<> eew)(i => io.in1_sign(((i+1)< Mux(sign, clip_neg, clip_pos))).asUInt - (FillInterleaved((1 << eew), clip), clip_value) - 
}).unzip - val signed_mask = VecInit(signed_masks)(io.eew) - val signed_clip = VecInit(signed_clips)(io.eew).asTypeOf(Vec(dLenB, UInt(8.W))) - - val mask = Mux(io.signed, signed_mask, unsigned_mask) - val clip = Mux(io.signed, signed_clip, unsigned_clip) - io.out := io.sum.zipWithIndex.map { case (o,i) => Mux(mask(i), clip(i), o) } - io.set_vxsat := mask -} - -case object IntegerPipeFactory extends FunctionalUnitFactory { - def insns = Seq( - ADD.VV, ADD.VX, ADD.VI, SUB.VV, SUB.VX, RSUB.VX, RSUB.VI, - WADDU.VV, WADDU.VX, WADD.VV, WADD.VX, WSUBU.VV, WSUBU.VX, WSUB.VV, WSUB.VX, - WADDUW.VV, WADDUW.VX, WADDW.VV, WADDW.VX, WSUBUW.VV, WSUBUW.VX, WSUBW.VV, WSUBW.VX, - ADC.VV, ADC.VX, ADC.VI, MADC.VV, MADC.VX, MADC.VI, - SBC.VV, SBC.VX, MSBC.VV, MSBC.VX, - NEXT.VV, - MSEQ.VV, MSEQ.VX, MSEQ.VI, MSNE.VV, MSNE.VX, MSNE.VI, - MSLTU.VV, MSLTU.VX, MSLT.VV, MSLT.VX, - MSLEU.VV, MSLEU.VX, MSLEU.VI, MSLE.VV, MSLE.VX, MSLE.VI, - MSGTU.VX, MSGTU.VI, MSGT.VX, MSGT.VI, - MINU.VV, MINU.VX, MIN.VV, MIN.VX, - MAXU.VV, MAXU.VX, MAX.VV, MAX.VX, - MERGE.VV, MERGE.VX, MERGE.VI, - SADDU.VV, SADDU.VX, SADDU.VI, SADD.VV, SADD.VX, SADD.VI, - SSUBU.VV, SSUBU.VX, SSUB.VV, SSUB.VX, - AADDU.VV, AADDU.VX, AADD.VV, AADD.VX, - ASUBU.VV, ASUBU.VX, ASUB.VV, ASUB.VX, - REDSUM.VV, WREDSUM.VV, WREDSUMU.VV, - REDMINU.VV, REDMIN.VV, REDMAXU.VV, REDMAX.VV, - FMERGE.VF, - // zvbb - BREV8.VV, BREV.VV, REV8.VV, CLZ.VV, CTZ.VV, CPOP.VV - ) - def generate(implicit p: Parameters) = new IntegerPipe()(p) -} - -class IntegerPipe(implicit p: Parameters) extends PipelinedFunctionalUnit(1)(p) { - val supported_insns = IntegerPipeFactory.insns - - val rvs1_eew = io.pipe(0).bits.rvs1_eew - val rvs2_eew = io.pipe(0).bits.rvs2_eew - val vd_eew = io.pipe(0).bits.vd_eew - - val ctrl = new VectorDecoder( - io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, io.pipe(0).bits.rs1, io.pipe(0).bits.rs2, - supported_insns, - Seq(UsesCmp, UsesNarrowingSext, UsesMinMax, UsesMerge, UsesSat, - DoSub, WideningSext, Averaging, - CarryIn, 
AlwaysCarryIn, CmpLess, Swap12, WritesAsMask, - UsesBitSwap, UsesCountZeros)) - - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched - - val carry_in = ctrl.bool(CarryIn) && (!io.pipe(0).bits.vm || ctrl.bool(AlwaysCarryIn)) - - val sat_signed = io.pipe(0).bits.funct6(0) - val sat_addu = io.pipe(0).bits.funct6(1,0) === 0.U - val sat_subu = io.pipe(0).bits.funct6(1,0) === 2.U - - val rvs1_bytes = io.pipe(0).bits.rvs1_data.asTypeOf(Vec(dLenB, UInt(8.W))) - val rvs2_bytes = io.pipe(0).bits.rvs2_data.asTypeOf(Vec(dLenB, UInt(8.W))) - - val in1_bytes = Mux(ctrl.bool(Swap12), rvs2_bytes, rvs1_bytes) - val in2_bytes = Mux(ctrl.bool(Swap12), rvs1_bytes, rvs2_bytes) - - val narrow_vs1 = narrow2_expand(rvs1_bytes, rvs1_eew, - (io.pipe(0).bits.eidx >> (dLenOffBits.U - vd_eew))(0), - ctrl.bool(WideningSext)) - val narrow_vs2 = narrow2_expand(rvs2_bytes, rvs2_eew, - (io.pipe(0).bits.eidx >> (dLenOffBits.U - vd_eew))(0), - ctrl.bool(WideningSext)) - - val add_mask_carry = VecInit.tabulate(4)({ eew => - VecInit((0 until dLenB >> eew).map { i => io.pipe(0).bits.rmask(i) | 0.U((1 << eew).W) }).asUInt - })(rvs2_eew) - val add_carry = Wire(Vec(dLenB, UInt(1.W))) - val add_out = Wire(Vec(dLenB, UInt(8.W))) - - val merge_mask = VecInit.tabulate(4)({eew => FillInterleaved(1 << eew, io.pipe(0).bits.rmask((dLenB >> eew)-1,0))})(rvs2_eew) - val merge_out = VecInit((0 until dLenB).map { i => Mux(merge_mask(i), rvs1_bytes(i), rvs2_bytes(i)) }).asUInt - - val carryborrow_res = VecInit.tabulate(4)({ eew => - Fill(1 << eew, VecInit(add_carry.grouped(1 << eew).map(_.last).toSeq).asUInt) - })(rvs1_eew) - - val adder_arr = Module(new AdderArray(dLenB)) - adder_arr.io.in1 := Mux(rvs1_eew < vd_eew, narrow_vs1, in1_bytes) - adder_arr.io.in2 := Mux(rvs2_eew < vd_eew, narrow_vs2, in2_bytes) - adder_arr.io.incr.foreach(_ := false.B) - adder_arr.io.avg := ctrl.bool(Averaging) - adder_arr.io.eew := vd_eew - adder_arr.io.rm := io.pipe(0).bits.vxrm - 
adder_arr.io.mask_carry := add_mask_carry - adder_arr.io.sub := ctrl.bool(DoSub) - adder_arr.io.cmask := carry_in - adder_arr.io.signed := io.pipe(0).bits.funct6(0) - add_out := adder_arr.io.out - add_carry := adder_arr.io.carry - - val cmp_arr = Module(new CompareArray(dLenB)) - cmp_arr.io.in1 := in1_bytes - cmp_arr.io.in2 := in2_bytes - cmp_arr.io.eew := rvs1_eew - cmp_arr.io.signed := io.pipe(0).bits.funct6(0) - cmp_arr.io.less := ctrl.bool(CmpLess) - cmp_arr.io.sle := io.pipe(0).bits.funct6(2,1) === 2.U - cmp_arr.io.inv := io.pipe(0).bits.funct6(0) - val minmax_out = VecInit(rvs1_bytes.zip(rvs2_bytes).zip(cmp_arr.io.minmax.asBools).map { case ((v1, v2), s) => Mux(s, v2, v1) }).asUInt - - val mask_out = Fill(8, Mux(ctrl.bool(UsesCmp), cmp_arr.io.result, carryborrow_res ^ Fill(dLenB, ctrl.bool(DoSub)))) - - val sat_arr = Module(new SaturatedSumArray(dLenB)) - sat_arr.io.sum := add_out - sat_arr.io.carry := add_carry - sat_arr.io.in1_sign := rvs1_bytes.map(_(7)) - sat_arr.io.in2_sign := rvs2_bytes.map(_(7)) - sat_arr.io.sub := ctrl.bool(DoSub) - sat_arr.io.eew := vd_eew - sat_arr.io.signed := io.pipe(0).bits.funct6(0) - val sat_out = sat_arr.io.out.asUInt - - val narrowing_ext_eew_mul = io.pipe(0).bits.vd_eew - rvs2_eew - val narrowing_ext_in = (1 until 4).map { m => - val w = dLen >> m - val in = Wire(UInt(w.W)) - val in_mul = io.pipe(0).bits.rvs2_data.asTypeOf(Vec(1 << m, UInt(w.W))) - val sel = (io.pipe(0).bits.eidx >> (dLenOffBits.U - vd_eew))(m-1,0) - in := in_mul(sel) - in - } - val narrowing_ext_out = Mux1H((1 until 4).map { eew => (0 until eew).map { vs2_eew => - (vd_eew === eew.U && rvs2_eew === vs2_eew.U) -> { - val mul = eew - vs2_eew - val in = narrowing_ext_in(mul-1).asTypeOf(Vec(dLenB >> eew, UInt((8 << vs2_eew).W))) - val out = Wire(Vec(dLenB >> eew, UInt((8 << eew).W))) - out.zip(in).foreach { case (l, r) => l := Cat( - Fill((8 << eew) - (8 << vs2_eew), io.pipe(0).bits.rs1(0) && r((8 << vs2_eew)-1)), - r) - } - out.asUInt - } - }}.flatten) - - val 
brev_bytes = VecInit(in2_bytes.map(b => Reverse(b))).asUInt - val brev_elements = VecInit((0 until 4).map { eew => - VecInit(in2_bytes.asTypeOf(Vec(dLenB >> eew, UInt((8 << eew).W))).map(b => Reverse(b))).asUInt - })(vd_eew) - val rev8_elements = VecInit((0 until 4).map { eew => - VecInit(in2_bytes.asTypeOf(Vec(dLenB >> eew, Vec(1 << eew, UInt(8.W)))).map(b => VecInit(b.reverse))).asUInt - })(vd_eew) - val swap_out = Mux1H(Seq( - (io.pipe(0).bits.rs1(1,0) === 0.U) -> brev_bytes, - (io.pipe(0).bits.rs1(1,0) === 1.U) -> rev8_elements, - (io.pipe(0).bits.rs1(1,0) === 2.U) -> brev_elements - )) - - val tz_in = Mux(io.pipe(0).bits.rs1(0), in2_bytes, brev_elements.asTypeOf(Vec(dLenB, UInt(8.W)))) - val tz_8b = tz_in.map(b => (b === 0.U, (PriorityEncoderOH(1.U ## b) - 1.U)(7,0))) - val tz_16b = tz_8b.grouped(2).toSeq.map(t => - (t.map(_._1).andR, Mux(t(0)._1, t(1)._2 ## ~(0.U(8.W)), t(0)._2)) - ) - val tz_32b = tz_16b.grouped(2).toSeq.map(t => - (t.map(_._1).andR, Mux(t(0)._1, t(1)._2 ## ~(0.U(16.W)), t(0)._2)) - ) - val tz_64b = tz_32b.grouped(2).toSeq.map(t => - (t.map(_._1).andR, Mux(t(0)._1, t(1)._2 ## ~(0.U(32.W)), t(0)._2)) - ) - val tz_out = WireInit(VecInit( - VecInit(tz_8b.map(_._2)).asUInt, - VecInit(tz_16b.map(_._2)).asUInt, - VecInit(tz_32b.map(_._2)).asUInt, - VecInit(tz_64b.map(_._2)).asUInt - )(vd_eew).asTypeOf(Vec(dLenB, UInt(8.W)))) - - val cpop_in = Mux(io.pipe(0).bits.rs1(1), in2_bytes, tz_out) - val cpop_8b = cpop_in.map(b => PopCount(b)) - val cpop_16b = cpop_8b.grouped(2).toSeq.map(_.reduce(_ +& _)) - val cpop_32b = cpop_16b.grouped(2).toSeq.map(_.reduce(_ +& _)) - val cpop_64b = cpop_32b.grouped(2).toSeq.map(_.reduce(_ +& _)) - val cpops = Seq(cpop_8b, cpop_16b, cpop_32b, cpop_64b) - val count_out = WireInit(VecInit((0 until 4).map { eew => - val out = Wire(Vec(dLenB >> eew, UInt((8 << eew).W))) - out := VecInit(cpops(eew)) - out.asUInt - })(vd_eew)) - - val outs = Seq( - (ctrl.bool(UsesNarrowingSext) , narrowing_ext_out), - (ctrl.bool(WritesAsMask) 
, mask_out), - (ctrl.bool(UsesMinMax) , minmax_out), - (ctrl.bool(UsesMerge) , merge_out), - (ctrl.bool(UsesSat) , sat_out), - (ctrl.bool(UsesBitSwap) , swap_out), - (ctrl.bool(UsesCountZeros) , count_out) - ) - val out = Mux(outs.map(_._1).orR, Mux1H(outs), add_out.asUInt) - - val mask_write_offset = VecInit.tabulate(4)({ eew => - Cat(io.pipe(0).bits.eidx(log2Ceil(dLen)-1, dLenOffBits-eew), 0.U((dLenOffBits-eew).W)) - })(rvs1_eew) - val mask_write_mask = (VecInit.tabulate(4)({ eew => - VecInit(io.pipe(0).bits.wmask.asBools.grouped(1 << eew).map(_.head).toSeq).asUInt - })(rvs1_eew) << mask_write_offset)(dLen-1,0) - - io.pipe0_stall := false.B - io.write.valid := io.pipe(0).valid - io.write.bits.eg := io.pipe(0).bits.wvd_eg - io.write.bits.mask := Mux(ctrl.bool(WritesAsMask), mask_write_mask, FillInterleaved(8, io.pipe(0).bits.wmask)) - io.write.bits.data := out - - val sat_vxsat = Mux(ctrl.bool(UsesSat) , sat_arr.io.set_vxsat , 0.U) & io.pipe(0).bits.wmask - io.set_vxsat := io.pipe(0).valid && (sat_vxsat =/= 0.U) - io.set_fflags.valid := false.B - io.set_fflags.bits := DontCare - - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare -} diff --git a/arch/src/main/scala/framework/gendomain/exu/int/SegmentedMultiplyPipe.scala b/arch/src/main/scala/framework/gendomain/exu/int/SegmentedMultiplyPipe.scala deleted file mode 100644 index e7a404fa..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/int/SegmentedMultiplyPipe.scala +++ /dev/null @@ -1,272 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -case class IntegerMultiplyFactory(depth: Int, segmented: Boolean) extends FunctionalUnitFactory { - def base_insns = Seq( - MUL.VV, MUL.VX, MULH.VV, MULH.VX, - MULHU.VV, MULHU.VX, MULHSU.VV, 
MULHSU.VX, - WMUL.VV, WMUL.VX, WMULU.VV, WMULU.VX, - WMULSU.VV, WMULSU.VX, - MACC.VV, MACC.VX, NMSAC.VV, NMSAC.VX, - MADD.VV, MADD.VX, NMSUB.VV, NMSUB.VX, - WMACC.VV, WMACC.VX, WMACCU.VV, WMACCU.VX, - WMACCSU.VV , WMACCSU.VX, WMACCUS.VV, WMACCUS.VX, - SMUL.VV, SMUL.VX) - def insns = if (segmented) base_insns else base_insns.map(_.elementWise) - def generate(implicit p: Parameters) = if (segmented) { - new SegmentedMultiplyPipe(depth)(p) - } else { - new ElementwiseMultiplyPipe(depth)(p) - } -} - -class SegmentedMultiplyPipe(depth: Int)(implicit p: Parameters) extends PipelinedFunctionalUnit(depth)(p) { - val supported_insns = IntegerMultiplyFactory(depth, true).insns - - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched - io.set_vxsat := false.B - io.set_fflags.valid := false.B - io.set_fflags.bits := DontCare - - val ctrl = new VectorDecoder(io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, 0.U, 0.U, supported_insns, Seq( - MULHi, MULSign1, MULSign2, MULSwapVdV2, MULAccumulate, MULSub)) - - val in_eew = io.pipe(0).bits.rvs1_eew - val out_eew = io.pipe(0).bits.vd_eew - - val in_vs1 = io.pipe(0).bits.rvs1_data - val in_vs2 = io.pipe(0).bits.rvs2_data - val in_vd = io.pipe(0).bits.rvd_data - - val mul_in1 = in_vs1 - val mul_in2 = Mux(ctrl.bool(MULSwapVdV2), in_vd, in_vs2) - - val multipliers = Seq.fill(dLenB >> 3)(Module(new MultiplyBlock)) - for (i <- 0 until (dLenB >> 3)) { - multipliers(i).io.in1_signed := ctrl.bool(MULSign1) - multipliers(i).io.in2_signed := ctrl.bool(MULSign2) - multipliers(i).io.eew := io.pipe(0).bits.rvs1_eew - multipliers(i).io.in1 := mul_in1.asTypeOf(Vec(dLenB >> 3, UInt(64.W)))(i) - multipliers(i).io.in2 := mul_in2.asTypeOf(Vec(dLenB >> 3, UInt(64.W)))(i) - } - val mul_out_comb = VecInit(multipliers.map(_.io.out_data)).asUInt - - //////////////////////////////////////////////////////////////////////////////////////////// - // Pipeline Stages Before Adder Array - 
//////////////////////////////////////////////////////////////////////////////////////////// - val mul_out = Pipe(io.pipe(0).valid, mul_out_comb, depth-2).bits - - val in_eew_pipe = io.pipe(depth-2).bits.rvs1_eew - val out_eew_pipe = io.pipe(depth-2).bits.vd_eew - val ctrl_wmul = out_eew_pipe > in_eew_pipe - val ctrl_smul = io.pipe(depth-2).bits.isOpi - val ctrl_MULSub = Pipe(io.pipe(0).valid, ctrl.bool(MULSub), depth-2).bits - val ctrl_MULSwapVdV2 = Pipe(io.pipe(0).valid, ctrl.bool(MULSwapVdV2), depth-2).bits - val ctrl_MULAccumulate = Pipe(io.pipe(0).valid, ctrl.bool(MULAccumulate), depth-2).bits - val ctrl_MULHi = Pipe(io.pipe(0).valid, ctrl.bool(MULHi), depth-2).bits - val in_vs2_pipe = io.pipe(depth-2).bits.rvs2_data - val in_vd_pipe = io.pipe(depth-2).bits.rvd_data - //////////////////////////////////////////////////////////////////////////////////////////// - - val hi = VecInit.tabulate(4)({sew => - VecInit(mul_out.asTypeOf(Vec((2*dLenB) >> sew, UInt((8 << sew).W))).grouped(2).map(_.last).toSeq).asUInt - })(in_eew_pipe) - val lo = VecInit.tabulate(4)({sew => - VecInit(mul_out.asTypeOf(Vec((2*dLenB) >> sew, UInt((8 << sew).W))).grouped(2).map(_.head).toSeq).asUInt - })(in_eew_pipe) - val half_sel = (io.pipe(depth-2).bits.eidx >> (dLenOffBits.U - out_eew_pipe))(0) - val wide = Mux(half_sel, mul_out >> dLen, mul_out)(dLen-1,0) - - val (smul_clipped, smul_sat) = { - val smul_arr = Module(new VectorSMul) - smul_arr.io.mul_in := mul_out - smul_arr.io.eew := out_eew_pipe - smul_arr.io.vxrm := io.pipe(depth-2).bits.vxrm - (smul_arr.io.clipped, smul_arr.io.sat) - } - - val adder_arr = Module(new AdderArray(dLenB)) - adder_arr.io.in1 := Mux(ctrl_wmul, wide, lo).asTypeOf(Vec(dLenB, UInt(8.W))) - adder_arr.io.in2 := Mux(ctrl_MULAccumulate, Mux(ctrl_MULSwapVdV2, in_vs2_pipe, in_vd_pipe), 0.U(dLen.W)).asTypeOf(Vec(dLenB, UInt(8.W))) - adder_arr.io.incr := VecInit.fill(dLenB)(false.B) - adder_arr.io.mask_carry := 0.U - adder_arr.io.signed := DontCare - adder_arr.io.eew := 
out_eew_pipe - adder_arr.io.avg := false.B - adder_arr.io.rm := DontCare - adder_arr.io.sub := ctrl_MULSub - adder_arr.io.cmask := false.B - - val add_out = adder_arr.io.out - - val out = Mux(ctrl_smul, smul_clipped, 0.U) | Mux(ctrl_MULHi, hi, 0.U) | Mux(!ctrl_smul && !ctrl_MULHi, add_out.asUInt, 0.U) - val pipe_out = Pipe(io.pipe(depth-2).valid, out, 1).bits - - val vxsat = Mux(ctrl_smul, smul_sat, 0.U) & io.pipe(depth-2).bits.wmask - val pipe_vxsat = Pipe(io.pipe(depth-2).valid, vxsat, 1).bits - - io.pipe0_stall := false.B - io.write.valid := io.pipe(depth-1).valid - io.write.bits.eg := io.pipe(depth-1).bits.wvd_eg - io.write.bits.data := pipe_out - io.write.bits.mask := FillInterleaved(8, io.pipe(depth-1).bits.wmask) - - io.set_vxsat := io.pipe(depth-1).valid && (pipe_vxsat =/= 0.U) - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare -} - -class MultiplyBlock extends Module { - val xLen = 64 - - val io = IO(new Bundle { - val in1_signed = Input(Bool()) - val in2_signed = Input(Bool()) - val eew = Input(UInt(2.W)) - - val in1 = Input(UInt(xLen.W)) - val in2 = Input(UInt(xLen.W)) - - val out_data = Output(UInt((2*xLen).W)) - }) - - val mul64 = Module(new Multiplier(64)) - mul64.io.in1_signed := io.in1_signed - mul64.io.in2_signed := io.in2_signed - mul64.io.in1 := io.in1 - mul64.io.in2 := io.in2 - - val mul32 = Module(new Multiplier(32)) - mul32.io.in1_signed := io.in1_signed - mul32.io.in2_signed := io.in2_signed - mul32.io.in1 := io.in1(63,32) - mul32.io.in2 := io.in2(63,32) - - val mul16 = Seq.tabulate(2) { i => - val indh = 32*(i+1) - 1 - val indl = 32*i + 16 - val in1 = io.in1(indh, indl) - val in2 = io.in2(indh, indl) - val mul = Module(new Multiplier(16)) - mul.io.in1_signed := io.in1_signed - mul.io.in2_signed := io.in2_signed - mul.io.in1 := in1 - mul.io.in2 := in2 - mul - } - val mul8 = Seq.tabulate(4) { i => - val indh = 16*(i+1) - 1 - val indl = 16*i + 8 - val in1 = io.in1(indh, indl) - val in2 = io.in2(indh, indl) - val mul = 
Module(new Multiplier(8)) - mul.io.in1_signed := io.in1_signed - mul.io.in2_signed := io.in2_signed - mul.io.in1 := in1 - mul.io.in2 := in2 - mul - } - when (io.eew === 0.U) { - mul16(1).io.in1 := Cat(Fill(8, io.in1_signed && io.in1(55)), io.in1(55, 48)) - mul16(1).io.in2 := Cat(Fill(8, io.in2_signed && io.in2(55)), io.in2(55, 48)) - mul32.io.in1 := Cat(Fill(8, io.in1_signed && io.in1(39)), io.in1(39, 32)) - mul32.io.in2 := Cat(Fill(8, io.in2_signed && io.in2(39)), io.in2(39, 32)) - mul16(0).io.in1 := Cat(Fill(8, io.in1_signed && io.in1(23)), io.in1(23, 16)) - mul16(0).io.in2 := Cat(Fill(8, io.in2_signed && io.in2(23)), io.in2(23, 16)) - mul64.io.in1 := Cat(Fill(8, io.in1_signed && io.in1(7)), io.in1(7,0)) - mul64.io.in2 := Cat(Fill(8, io.in2_signed && io.in2(7)), io.in2(7,0)) - - io.out_data := Cat(mul8(3).io.out_data, - mul16(1).io.out_data(15,0), - mul8(2).io.out_data, - mul32.io.out_data(15,0), - mul8(1).io.out_data, - mul16(0).io.out_data(15,0), - mul8(0).io.out_data, - mul64.io.out_data(15,0)) - } - .elsewhen (io.eew === 1.U) { - mul32.io.in1 := Cat(Fill(16, io.in1_signed && io.in1(47)), io.in1(47, 32)) - mul32.io.in2 := Cat(Fill(16, io.in2_signed && io.in2(47)), io.in2(47, 32)) - mul64.io.in1 := Cat(Fill(16, io.in1_signed && io.in1(15)), io.in1(15,0)) - mul64.io.in2 := Cat(Fill(16, io.in2_signed && io.in2(15)), io.in2(15,0)) - - io.out_data := Cat(mul16(1).io.out_data, - mul32.io.out_data(31,0), - mul16(0).io.out_data, - mul64.io.out_data(31,0)) - } - .elsewhen (io.eew === 2.U) { - mul64.io.in1 := Cat(Fill(32, io.in1_signed && io.in1(31)), io.in1(31,0)) - mul64.io.in2 := Cat(Fill(32, io.in2_signed && io.in2(31)), io.in2(31,0)) - - io.out_data := Cat(mul32.io.out_data, - mul64.io.out_data(63,0)) - } - .otherwise { - io.out_data := mul64.io.out_data - } -} - -class Multiplier(width: Int) extends Module { - val io = IO(new Bundle { - val in1_signed = Input(Bool()) - val in2_signed = Input(Bool()) - - val in1 = Input(UInt(width.W)) - val in2 = 
Input(UInt(width.W)) - - val out_data = Output(UInt((2*width).W)) - }) - - val lhs = Cat(io.in1_signed && io.in1(width-1), io.in1).asSInt - val rhs = Cat(io.in2_signed && io.in2(width-1), io.in2).asSInt - - val prod = lhs * rhs - - io.out_data := prod(2*width-1,0) -} - -class VectorSMul(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val eew = Input(UInt(2.W)) - val vxrm = Input(UInt(2.W)) - val mul_in = Input(UInt((2*dLen).W)) - - val clipped = Output(UInt((dLen).W)) - val sat = Output(UInt(dLenB.W)) - }) - val sat_sew = Wire(Vec(4, UInt(dLenB.W))) - val clipped_sew = Wire(Vec(4, UInt(dLen.W))) - for (sew <- 0 until 4) { - val wideProds = io.mul_in.asTypeOf(Vec(dLenB >> sew, SInt((16 << sew).W))) - val smul = wideProds.map { wideElem => - val rounding_incr = RoundingIncrement(io.vxrm, wideElem((8 << sew)-1, 0)) - (wideElem >> ((8 << sew) - 1)) + Cat(0.U(1.W), rounding_incr).asSInt - } - val clip_neg = (-1 << ((8 << sew)-1)).S - val clip_pos = ((1 << ((8 << sew)-1)) - 1).S - val clip_hi = smul.map{ _ > clip_pos } - val clip_lo = smul.map{ _ < clip_neg } - - clipped_sew(sew) := smul.zipWithIndex.map { case (sm, i) => - val long = Mux(clip_hi(i), clip_pos, 0.S) | Mux(clip_lo(i), clip_neg, 0.S) | Mux(!clip_hi(i) && !clip_lo(i), sm, 0.S) - long((8 << sew)-1, 0) - }.asUInt - val sat_vec_sew = smul.zipWithIndex.map { case (sm, i) => - clip_hi(i) || clip_lo(i) - } - sat_sew(sew) := FillInterleaved((1 << sew), sat_vec_sew) - } - - io.clipped := clipped_sew(io.eew) - io.sat := sat_sew(io.eew) -} diff --git a/arch/src/main/scala/framework/gendomain/exu/int/ShiftPipe.scala b/arch/src/main/scala/framework/gendomain/exu/int/ShiftPipe.scala deleted file mode 100644 index 251e2d9b..00000000 --- a/arch/src/main/scala/framework/gendomain/exu/int/ShiftPipe.scala +++ /dev/null @@ -1,240 +0,0 @@ -package framework.gendomain.exu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import 
freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ -import framework.gendomain.insns._ - -class ShiftBlock(w: Int) extends Module { - val io = IO(new Bundle { - val in = Input(UInt(w.W)) - val shamt = Input(UInt((1+log2Ceil(w)).W)) - val shl = Input(Bool()) - val sign = Input(Bool()) - val rm = Input(UInt(2.W)) - val out = Output(UInt(w.W)) - val round = Output(Bool()) - }) - val full_shifted = (Mux(io.shl, - Cat(false.B, Reverse(io.in), false.B), - Cat(io.sign, io.in, false.B)).asSInt >> io.shamt)(w,0).asUInt - - val shifted = full_shifted(w,1) - - io.out := Mux(io.shl, Reverse(shifted), shifted) - io.round := RoundingIncrement(io.rm, shifted(0), full_shifted(0), - Some(io.in & (((1.U << io.shamt) - 1.U) >> 1)(w-1,0))) -} - -class ShiftUnit extends Module { - val io = IO(new Bundle { - val in_eew = Input(UInt(2.W)) - val in = Input(UInt(64.W)) - val shamt = Input(UInt(64.W)) - val rot = Input(Bool()) - val shl = Input(Bool()) - val signed = Input(Bool()) - val rm = Input(UInt(2.W)) - - val out = Output(UInt(64.W)) - val round = Output(UInt(8.W)) - }) - - val shamt_mask = VecInit.tabulate(4)({eew => ~(0.U((log2Ceil(8) + eew).W))})(io.in_eew) - - val shift_out = Seq.tabulate(4) { sew => Wire(Vec(8 >> sew, UInt((8 << sew).W))) } - val round_out = Seq.tabulate(4) { sew => Wire(Vec(8 >> sew, UInt((1 << sew).W))) } - - def sextElem(in: UInt, sew: Int, sext: Bool): UInt = Cat(Fill(65 - (8 << sew), sext && in((8 << sew)-1)), in((8 << sew)-1,0))(63,0) - - for (i <- 0 until 8) { - val sews = (0 until 4).filter { sew => i < (8 >> sew) } - val shifter = Module(new ShiftBlock(8 << (sews.max))) - val rotator = Module(new ShiftBlock(8 << (sews.max))) - shifter.io.in := Mux1H(sews - .map { sew => (io.in_eew === sew.U, sextElem(io.in((i+1)*(8< (io.in_eew === sew.U, io.in((i+1)*(8< (io.in_eew === sew.U, io.shamt((i+1)*(8< (io.in_eew === sew.U, io.in((i+1)*(8< - shift_out(sew)(i) := 
shifter.io.out | Mux(io.rot, rotator.io.out, 0.U) - round_out(sew)(i) := shifter.io.round - } - } - io.out := Mux1H(UIntToOH(io.in_eew), shift_out.map(_.asUInt)) - io.round := Mux1H(UIntToOH(io.in_eew), round_out.map(_.asUInt)) -} - -class ShiftArray(dLenB: Int) extends Module { - val dLen = dLenB * 8 - val io = IO(new Bundle { - val in_eew = Input(UInt(2.W)) - val in = Input(UInt(dLen.W)) - val shamt = Input(UInt(dLen.W)) - val rori_hi = Input(Bool()) - val rot = Input(Bool()) - val shl = Input(Bool()) - val signed = Input(Bool()) - val scaling = Input(Bool()) - val rm = Input(UInt(2.W)) - val narrowing = Input(Bool()) - - val out = Output(Vec(dLenB, UInt(8.W))) - val set_vxsat = Output(UInt(dLenB.W)) - }) - - val shifted = Reg(Vec(dLenB, UInt(8.W))) - val rounding_incrs = Reg(Vec(dLenB, Bool())) - val in_eew_pipe = RegNext(io.in_eew) - val signed_pipe = RegNext(io.signed) - val scaling_pipe = RegNext(io.scaling) - val narrowing_pipe = RegNext(io.narrowing) - - for (i <- 0 until (dLenB >> 3)) { - val shifter = Module(new ShiftUnit) - shifter.io.in_eew := io.in_eew - shifter.io.in := io.in((i+1)*64-1,i*64) - shifter.io.shamt := io.shamt((i+1)*64-1,i*64) | Mux(io.rori_hi, 0x20.U, 0.U) - shifter.io.rot := io.rot - shifter.io.shl := io.shl - shifter.io.signed := io.signed - shifter.io.rm := io.rm - for (j <- 0 until 8) { - shifted(i*8+j) := shifter.io.out((j+1)*8-1,j*8) - rounding_incrs(i*8+j) := shifter.io.round(j) - } - } - - val scaling_array = Module(new AdderArray(dLenB)) - scaling_array.io.in1 := shifted - scaling_array.io.in2.foreach(_ := 0.U) - scaling_array.io.incr := Mux(scaling_pipe, rounding_incrs, VecInit.fill(dLenB)(false.B)) - scaling_array.io.signed := DontCare - scaling_array.io.eew := in_eew_pipe - scaling_array.io.avg := false.B - scaling_array.io.rm := DontCare - scaling_array.io.sub := false.B - scaling_array.io.cmask := false.B - scaling_array.io.mask_carry := DontCare - - val narrow_out_elems: Seq[Seq[UInt]] = Seq.tabulate(3)({eew => - 
scaling_array.io.out.grouped(2 << eew).map(e => VecInit(e.take(1 << eew)).asUInt).toSeq - }) - val narrow_out_his: Seq[Seq[UInt]] = Seq.tabulate(3)({eew => - scaling_array.io.out.grouped(2 << eew).map(e => VecInit(e.drop(1 << eew)).asUInt).toSeq - }) - val narrow_out_carries = Seq.tabulate(3)({eew => - scaling_array.io.carry.grouped(2 << eew).map(_.last).toSeq - }) - - val narrow_unsigned_mask = VecInit.tabulate(3)({ eew => - FillInterleaved(1 << eew, VecInit.tabulate(dLenB >> (eew + 1))(i => - Cat(narrow_out_carries(eew)(i), narrow_out_his(eew)(i)) =/= 0.U - ).asUInt) - })(in_eew_pipe - 1.U) - val narrow_unsigned_clip = (~(0.U((dLen >> 1).W))).asTypeOf(Vec(dLenB >> 1, UInt(8.W))) - - val (narrow_signed_masks, narrow_signed_clips): (Seq[UInt], Seq[UInt]) = Seq.tabulate(3)({ eew => - val signs = narrow_out_his(eew).map(_((8 << eew)-1)) - val his = narrow_out_his(eew).zip(narrow_out_elems(eew)).map({ case (h,e) => Cat(h((8 << eew)-2,0), e((8< s && h =/= ~0.U((8 << eew).W) }) - val clip_hi = signs.zip(his).map({ case (s,h) => !s && h =/= 0.U((8 << eew).W) }) - val clip_neg = Cat(1.U, 0.U(((8 << eew)-1).W)) - val clip_pos = ~clip_neg - val clip_value = VecInit(signs.map(s => Mux(s, clip_neg, clip_pos))).asUInt - val clip = clip_lo.zip(clip_hi).map(t => t._1 || t._2) - (FillInterleaved((1 << eew), clip), clip_value) - }).unzip - val narrow_signed_mask = VecInit(narrow_signed_masks)(in_eew_pipe - 1.U) - val narrow_signed_clip = VecInit(narrow_signed_clips)(in_eew_pipe - 1.U).asTypeOf(Vec(dLenB >> 1, UInt(8.W))) - - val narrow_mask = Mux(signed_pipe, narrow_signed_mask, narrow_unsigned_mask) - val narrow_clip = Mux(signed_pipe, narrow_signed_clip, narrow_unsigned_clip) - - val narrow_out_clipped = VecInit(narrow_out_elems.map(e => VecInit(e).asUInt))(in_eew_pipe - 1.U) - .asTypeOf(Vec(dLenB >> 1, UInt(8.W))) - .zip(narrow_mask.asBools) - .zip(narrow_clip).map ({ case ((o,s),c) => Mux(s && scaling_pipe, c, o) }) - val narrow_out = Fill(2, 
narrow_out_clipped.asUInt).asTypeOf(Vec(dLenB, UInt(8.W))) - - io.out := Mux(narrowing_pipe, narrow_out, scaling_array.io.out) - io.set_vxsat := Mux(narrowing_pipe && scaling_pipe, Fill(2, narrow_mask), 0.U) -} - -case object ShiftPipeFactory extends FunctionalUnitFactory { - def insns = Seq( - SLL.VV, SLL.VX, SLL.VI, SRL.VV, SRL.VX, SRL.VI, SRA.VV, SRA.VX, SRA.VI, - NSRA.VV, NSRA.VX, NSRA.VI, NSRL.VV, NSRL.VX, NSRL.VI, - NCLIPU.VV, NCLIPU.VX, NCLIPU.VI, NCLIP.VV, NCLIP.VX, NCLIP.VI, - SSRL.VV, SSRL.VX, SSRL.VI, SSRA.VV, SSRA.VX, SSRA.VI, - // Zvbb - ROL.VV, ROL.VX, ROR.VV, ROR.VX, ROR.VI, RORI.VI, WSLL.VV, WSLL.VX, WSLL.VI - ) - def generate(implicit p: Parameters) = new ShiftPipe()(p) -} - -class ShiftPipe(implicit p: Parameters) extends PipelinedFunctionalUnit(2)(p) { - val supported_insns = ShiftPipeFactory.insns - - val rvs1_eew = io.pipe(0).bits.rvs1_eew - val rvs2_eew = io.pipe(0).bits.rvs2_eew - val vd_eew = io.pipe(0).bits.vd_eew - - val ctrl = new VectorDecoder( - io.pipe(0).bits.funct3, io.pipe(0).bits.funct6, 0.U, 0.U, - supported_insns, - Seq(UsesShift, ShiftsLeft, ScalingShift)) - - io.iss.ready := new VectorDecoder(io.iss.op.funct3, io.iss.op.funct6, 0.U, 0.U, supported_insns, Nil).matched - - val shift_narrowing = vd_eew < rvs2_eew - val shift_widening = vd_eew > rvs2_eew - - val rvs1_bytes = io.pipe(0).bits.rvs1_data.asTypeOf(Vec(dLenB, UInt(8.W))) - val rvs2_bytes = io.pipe(0).bits.rvs2_data.asTypeOf(Vec(dLenB, UInt(8.W))) - val narrow_vs1 = narrow2_expand(rvs1_bytes, rvs1_eew, - (io.pipe(0).bits.eidx >> (dLenOffBits.U - Mux(shift_narrowing, rvs2_eew, vd_eew)))(0), false.B) - val narrow_vs2 = narrow2_expand(rvs2_bytes, rvs2_eew, - (io.pipe(0).bits.eidx >> (dLenOffBits.U - vd_eew))(0), false.B) - - val shift_arr = Module(new ShiftArray(dLenB)) - shift_arr.io.in_eew := Mux(shift_widening, vd_eew, rvs2_eew) - shift_arr.io.in := Mux(shift_widening, narrow_vs2, rvs2_bytes).asUInt - shift_arr.io.shamt := Mux(shift_narrowing || shift_widening, 
narrow_vs1, rvs1_bytes).asUInt - shift_arr.io.rori_hi := io.pipe(0).bits.opif6 === OPIFunct6.rol && io.pipe(0).bits.funct3 === OPIVI - shift_arr.io.rot := io.pipe(0).bits.opif6.isOneOf(OPIFunct6.rol, OPIFunct6.ror) - shift_arr.io.shl := ctrl.bool(ShiftsLeft) - shift_arr.io.signed := io.pipe(0).bits.funct6(0) - shift_arr.io.rm := io.pipe(0).bits.vxrm - shift_arr.io.scaling := ctrl.bool(ScalingShift) - shift_arr.io.narrowing := shift_narrowing - - io.pipe0_stall := false.B - io.write.valid := io.pipe(depth-1).valid - io.write.bits.eg := io.pipe(depth-1).bits.wvd_eg - io.write.bits.mask := FillInterleaved(8, io.pipe(depth-1).bits.wmask) - io.write.bits.data := shift_arr.io.out.asUInt - - val shift_vxsat = shift_arr.io.set_vxsat & io.pipe(depth-1).bits.wmask - io.set_vxsat := io.pipe(depth-1).valid && (shift_vxsat =/= 0.U) - io.set_fflags.valid := false.B - io.set_fflags.bits := DontCare - - io.scalar_write.valid := false.B - io.scalar_write.bits := DontCare -} diff --git a/arch/src/main/scala/framework/gendomain/frontend/Dispatch.scala b/arch/src/main/scala/framework/gendomain/frontend/Dispatch.scala deleted file mode 100644 index c249f8fd..00000000 --- a/arch/src/main/scala/framework/gendomain/frontend/Dispatch.scala +++ /dev/null @@ -1,129 +0,0 @@ -package framework.gendomain.frontend - -import chisel3._ -import chisel3.util._ -import chisel3.experimental.dataview._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.util._ - -import framework.gendomain.common._ -import framework.gendomain.insns._ - -class VectorDispatcher(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val issue = Flipped(Decoupled(new VectorIssueInst)) - - val mem = Decoupled(new VectorMemMacroOp) - val dis = Decoupled(new VectorIssueInst) - - val scalar_resp = Decoupled(new ScalarWrite) - - val vat_release = Input(Vec(nRelease, Valid(UInt(vParams.vatSz.W)))) - val vat_head = 
Output(UInt(vParams.vatSz.W)) - val vat_tail = Output(UInt(vParams.vatSz.W)) - }) - - val debug_id_ctr = RegInit(0.U(debugIdSz.W)) - val vat_valids = RegInit(VecInit.fill(1 << vParams.vatSz)(false.B)) - val vat_tail = RegInit(0.U(vParams.vatSz.W)) - val vat_head = RegInit(0.U(vParams.vatSz.W)) - def vatOlder(i0: UInt, i1: UInt) = cqOlder(i0, i1, vat_tail) - val vat_available = !vat_valids(vat_tail) - val vat_available_count = PopCount(~vat_valids.asUInt) - val vat_head_incr = WireInit(false.B) - - when (vat_head_incr) { - vat_head := vat_head + 1.U - } - - when (io.dis.fire) { - assert(!vat_valids(vat_tail)) - vat_valids(vat_tail) := true.B - vat_tail := vat_tail + 1.U - debug_id_ctr := debug_id_ctr + 1.U - } - - when (vat_tail =/= vat_head && !vat_valids(vat_head)) { - vat_head_incr := true.B - } - - val issue_inst = WireInit(io.issue.bits) - issue_inst.vat := vat_tail - issue_inst.debug_id := debug_id_ctr - - val hwacha_limiter = vParams.hwachaLimiter.map(n => Module(new HwachaLimiter(n))) - hwacha_limiter.foreach { h => - h.io.inst := issue_inst - h.io.fire := io.issue.fire - h.io.vat_release.foreach(_ := false.B) - } - val hwacha_block = hwacha_limiter.map(_.io.block).getOrElse(false.B) - - - io.issue.ready := vat_available && io.dis.ready && (!issue_inst.vmu || io.mem.ready) && !hwacha_block - io.dis.valid := vat_available && io.issue.valid && (!issue_inst.vmu || io.mem.ready) && !hwacha_block - io.mem.valid := vat_available && io.issue.valid && io.dis.ready && issue_inst.vmu && !hwacha_block - - io.vat_tail := vat_tail - io.vat_head := vat_head - - when ((io.issue.bits.funct3 === OPMVV && io.issue.bits.opmf6 === OPMFunct6.wrxunary0 && io.issue.bits.rs1 === 0.U) || - (io.issue.bits.funct3 === OPFVV && io.issue.bits.opff6 === OPFFunct6.wrfunary0 && io.issue.bits.rs1 === 0.U)) { - issue_inst.vconfig.vl := 1.U - issue_inst.vstart := 0.U - } - - // Strided with stride = 1 << eew is just unit-strided - when (io.issue.bits.mop === mopStrided && 
io.issue.bits.rs2_data === ((io.issue.bits.nf +& 1.U) << io.issue.bits.mem_elem_size)) { - issue_inst.mop := mopUnit - } - - io.scalar_resp.valid := false.B - io.scalar_resp.bits.fp := false.B - io.scalar_resp.bits.rd := io.issue.bits.rd - io.scalar_resp.bits.size := 3.U - io.scalar_resp.bits.data := Mux(io.issue.bits.rs1(0), - ~(0.U(xLen.W)), // vfirst - 0.U // vpopc - ) - - when (issue_inst.vconfig.vl <= issue_inst.vstart && !(issue_inst.funct3 === OPIVI && issue_inst.opif6 === OPIFunct6.mvnrr)) { - io.issue.ready := true.B - io.mem.valid := false.B - io.dis.valid := false.B - when (io.issue.bits.funct3 === OPMVV && io.issue.bits.opmf6 === OPMFunct6.wrxunary0) { - io.issue.ready := io.scalar_resp.ready - io.scalar_resp.valid := io.issue.valid - } - } - - io.dis.bits := issue_inst - - io.mem.bits.base_offset := issue_inst.rs1_data - io.mem.bits.stride := issue_inst.rs2_data - io.mem.bits.page := issue_inst.page - io.mem.bits.vstart := issue_inst.vstart - io.mem.bits.segstart := issue_inst.segstart - io.mem.bits.segend := issue_inst.segend - io.mem.bits.vl := issue_inst.vconfig.vl - io.mem.bits.mop := issue_inst.mop - io.mem.bits.vm := issue_inst.vm - io.mem.bits.nf := issue_inst.nf - io.mem.bits.idx_size := issue_inst.mem_idx_size - io.mem.bits.elem_size := issue_inst.mem_elem_size - io.mem.bits.whole_reg := issue_inst.umop === lumopWhole && issue_inst.orig_mop === mopUnit - io.mem.bits.store := issue_inst.bits(5) - io.mem.bits.fast_sg := issue_inst.fast_sg - io.mem.bits.debug_id := issue_inst.debug_id - - - for (r <- io.vat_release) { - when (r.valid) { - assert(vat_valids(r.bits)) - when (r.bits === vat_head) { vat_head_incr := true.B } - vat_valids(r.bits) := false.B - hwacha_limiter.foreach(_.io.vat_release(r.bits) := true.B) - } - } -} diff --git a/arch/src/main/scala/framework/gendomain/frontend/EarlyDecode.scala b/arch/src/main/scala/framework/gendomain/frontend/EarlyDecode.scala deleted file mode 100644 index be92da6c..00000000 --- 
a/arch/src/main/scala/framework/gendomain/frontend/EarlyDecode.scala +++ /dev/null @@ -1,56 +0,0 @@ -package framework.gendomain.frontend - -import chisel3._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import framework.gendomain.common._ -import framework.gendomain.insns.{VectorInstruction, VectorDecoder} - -class EarlyVectorDecode(supported_ex_insns: Seq[VectorInstruction])(implicit p: Parameters) extends RocketVectorDecoder()(p) with HasVectorConsts { - - io.legal := false.B - io.fp := false.B - io.read_rs1 := false.B - io.read_rs2 := false.B - io.read_frs1 := false.B - io.write_rd := false.B - io.write_frd := false.B - - val opcode = io.inst(6,0) - - val width = io.inst(14,12) - val lumop = io.inst(24,20) - val sumop = lumop - val vm = io.inst(25) - val mop = io.inst(27,26) - val mew = io.inst(28) - val nf = io.inst(31,29) - val funct3 = io.inst(14,12) - val funct6 = io.inst(31,26) - val rs1 = io.inst(19,15) - val rs2 = io.inst(24,20) - - val v_load = opcode === opcLoad - val v_store = opcode === opcStore - val v_arith = opcode === opcVector && funct3 =/= 7.U && new VectorDecoder(funct3, funct6, rs1, rs2, supported_ex_insns, Nil).matched - - when (v_load || v_store) { - io.legal := mew === 0.U && width.isOneOf(0.U, 5.U, 6.U, 7.U) - val unit = mop === 0.U - when (unit) { - when (v_load && !lumop.isOneOf(lumopUnit, lumopWhole, lumopMask, lumopFF)) { io.legal := false.B } - when (v_store && !sumop.isOneOf(sumopUnit, sumopWhole, sumopMask)) { io.legal := false.B } - } - when (mew === 1.U) { io.legal := false.B } - io.read_rs1 := true.B - io.read_rs2 := mop === mopStrided - } .elsewhen (v_arith) { - io.legal := true.B - io.read_rs1 := funct3.isOneOf(OPIVX, OPMVX) - io.read_frs1 := funct3 === OPFVF - io.write_rd := funct3 === OPMVV && OPMFunct6(funct6) === OPMFunct6.wrxunary0 - io.write_frd := funct3 === OPFVV && OPFFunct6(funct6) === OPFFunct6.wrfunary0 - io.fp := funct3 === OPFVF - } -} diff --git 
a/arch/src/main/scala/framework/gendomain/frontend/EarlyTrapCheck.scala b/arch/src/main/scala/framework/gendomain/frontend/EarlyTrapCheck.scala deleted file mode 100644 index eef168f5..00000000 --- a/arch/src/main/scala/framework/gendomain/frontend/EarlyTrapCheck.scala +++ /dev/null @@ -1,235 +0,0 @@ -package framework.gendomain.frontend - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.diplomacy._ - -import framework.gendomain.common._ -import framework.gendomain.backend.{VectorBackend} - -class EarlyTrapCheck(edge: TLEdge, sgSize: Option[BigInt])(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - - val unified_addresses = AddressSet.unify(edge.manager.managers.map(_.address).flatten) - require(unified_addresses.forall(_.alignment >= (1 << pgIdxBits)), - "Memory devices on this system must be at least page-aligned") - - val io = IO(new Bundle { - val sg_base = Input(UInt(coreMaxAddrBits.W)) - - val busy = Output(Bool()) - val s0 = new Bundle { - val in = Input(Valid(new Bundle { - val inst = UInt(32.W) - val pc = UInt(vaddrBitsExtended.W) - val status = new MStatus - val vconfig = new VConfig - val vstart = UInt(log2Ceil(maxVLMax).W) - val rs1 = UInt(xLen.W) - val rs2 = UInt(xLen.W) - val phys = Bool() - })) - val tlb_req = Valid(new TLBReq(3)) - } - - val s1 = new Bundle { - val inst = Output(new VectorIssueInst) - val rs1 = Input(Valid(UInt(xLen.W))) - val kill = Input(Bool()) - val tlb_req = Valid(new TLBReq(3)) - val tlb_resp = Input(new TLBResp) - } - - val s2 = new Bundle { - val scalar_store_pending = Input(Bool()) - val inst = Valid(new VectorIssueInst) - val replay = Output(Bool()) - val vstart = Valid(UInt(log2Ceil(maxVLMax).W)) - val retire = Output(Bool()) - val xcpt = Valid(new Bundle { - val cause = UInt(xLen.W) - val 
tval = UInt(coreMaxAddrBits.W) - }) - val pc = Output(UInt(vaddrBitsExtended.W)) - val internal_replay = Valid(new VectorIssueInst) - val issue = Decoupled(new VectorIssueInst) - val vxrm = Input(UInt(2.W)) - val frm = Input(UInt(3.W)) - } - }) - - val s1_valid = RegInit(false.B) - val s2_valid = RegInit(false.B) - io.busy := s1_valid || s2_valid - - val s0_inst = Wire(new VectorIssueInst) - s0_inst.pc := io.s0.in.bits.pc - s0_inst.bits := io.s0.in.bits.inst - s0_inst.vconfig := io.s0.in.bits.vconfig - s0_inst.vstart := Mux(s1_valid || s2_valid, 0.U, io.s0.in.bits.vstart) - s0_inst.segstart := 0.U - s0_inst.segend := s0_inst.seg_nf - s0_inst.rs1_data := io.s0.in.bits.rs1 - s0_inst.rs2_data := io.s0.in.bits.rs2 - s0_inst.emul := Mux(io.s0.in.bits.vconfig.vtype.vlmul_sign, 0.U, io.s0.in.bits.vconfig.vtype.vlmul_mag) - s0_inst.page := DontCare - s0_inst.vat := DontCare - s0_inst.debug_id := DontCare - s0_inst.rm := DontCare - s0_inst.fast_sg := false.B - s0_inst.mop := s0_inst.orig_mop - when (s0_inst.vmu && s0_inst.mop === mopUnit) { - val mask_vl = (io.s0.in.bits.vconfig.vl >> 3) + Mux(io.s0.in.bits.vconfig.vl(2,0) === 0.U, 0.U, 1.U) - val whole_vl = (vLen.U >> (s0_inst.mem_elem_size +& 3.U)) * (s0_inst.nf +& 1.U) - s0_inst.vconfig.vl := MuxLookup(s0_inst.umop, io.s0.in.bits.vconfig.vl)(Seq( - (lumopWhole -> whole_vl), (lumopMask -> mask_vl) - )) - when (s0_inst.umop === lumopWhole) { - s0_inst.emul := VecInit.tabulate(8)(nf => log2Ceil(nf+1).U)(s0_inst.nf) - } - } - when (!s0_inst.vmu && s0_inst.funct3 === OPIVI && s0_inst.funct6 === OPIFunct6.mvnrr.litValue.U) { - s0_inst.emul := log2_up(s0_inst.imm5, 8) - } - - val s0_unit = s0_inst.mop === mopUnit || (s0_inst.mop === mopStrided && io.s0.in.bits.rs2 === ((s0_inst.nf +& 1.U) << s0_inst.mem_elem_size)) - val s0_indexed = s0_inst.mop.isOneOf(mopOrdered, mopUnordered) - val s0_base = io.s0.in.bits.rs1 + (((s0_inst.seg_nf +& 1.U) * s0_inst.vstart ) << s0_inst.mem_elem_size) - val s0_bound = io.s0.in.bits.rs1 + 
(((s0_inst.seg_nf +& 1.U) * s0_inst.vconfig.vl) << s0_inst.mem_elem_size) - 1.U - val s0_single_page = (s0_base >> pgIdxBits) === (s0_bound >> pgIdxBits) - val s0_replay_next_page = s0_inst.vmu && s0_unit && s0_inst.nf === 0.U && !s0_single_page - val s0_iterative = (!s0_single_page || !s0_unit || s0_inst.umop === lumopFF) && !s0_replay_next_page - val s0_fast_sg = s0_iterative && io.s0.in.bits.phys && s0_inst.mop === mopUnordered && s0_inst.seg_nf === 0.U && sgSize.map { size => - s0_base >= io.sg_base && s0_base < (io.sg_base + size.U) - }.getOrElse(false.B) - - val s0_tlb_valid = !s0_iterative && s0_inst.vmu && s0_inst.vstart < s0_inst.vconfig.vl - - io.s0.tlb_req.valid := s0_tlb_valid && io.s0.in.valid - io.s0.tlb_req.bits.vaddr := s0_base - io.s0.tlb_req.bits.passthrough := false.B - io.s0.tlb_req.bits.size := s0_inst.mem_elem_size - io.s0.tlb_req.bits.cmd := Mux(s0_inst.opcode(5), M_XWR, M_XRD) - io.s0.tlb_req.bits.prv := io.s0.in.bits.status.prv - io.s0.tlb_req.bits.v := io.s0.in.bits.status.v - - // s1_stage - s1_valid := io.s0.in.fire - val s1_inst = RegEnable(s0_inst , io.s0.in.valid) - val s1_iterative = RegEnable(s0_iterative , io.s0.in.valid) - val s1_replay_next_page = RegEnable(s0_replay_next_page, io.s0.in.valid) - val s1_base = RegEnable(s0_base , io.s0.in.valid) - val s1_tlb_valid = RegEnable(s0_tlb_valid , io.s0.in.valid) - val s1_fast_sg = RegEnable(s0_fast_sg , io.s0.in.valid) - val s1_tlb_resp = WireInit(io.s1.tlb_resp) - - when (!s1_tlb_valid) { - s1_tlb_resp := 0.U.asTypeOf(new TLBResp) - when (s1_fast_sg) { - s1_tlb_resp.paddr := s1_base - } - } - - io.s1.inst := s1_inst - io.s1.tlb_req.valid := RegNext(io.s0.tlb_req.valid, false.B) - io.s1.tlb_req.bits := RegEnable(io.s0.tlb_req.bits, s0_tlb_valid) - - // s2 stage - s2_valid := s1_valid && !io.s1.kill - val s2_inst = Reg(new VectorIssueInst) - val s2_base = RegEnable(s1_base, s1_valid) - val s2_iterative = RegEnable(s1_iterative , s1_valid) - val s2_fast_sg = RegEnable(s1_fast_sg , 
s1_valid) - val s2_replay_next_page = RegEnable(s1_replay_next_page, s1_valid) - when (s1_valid) { - s2_inst := s1_inst - when (io.s1.rs1.valid) { s2_inst.rs1_data := io.s1.rs1.bits } - } - val s2_tlb_resp = RegEnable(s1_tlb_resp, s1_valid) - - val s2_xcpts = Seq( - (s2_tlb_resp.pf.st, Causes.store_page_fault.U), - (s2_tlb_resp.pf.ld, Causes.load_page_fault.U), - (s2_tlb_resp.gf.st, Causes.store_guest_page_fault.U), - (s2_tlb_resp.gf.ld, Causes.load_guest_page_fault.U), - (s2_tlb_resp.ae.st, Causes.store_access.U), - (s2_tlb_resp.ae.ld, Causes.load_access.U), - (s2_tlb_resp.ma.st, Causes.misaligned_store.U), - (s2_tlb_resp.ma.ld, Causes.misaligned_load.U) - ) - val s2_xcpt = s2_xcpts.map(_._1).orR - val s2_cause = PriorityMux(s2_xcpts) - val s2_go_to_itc = WireInit(s2_inst.vmu && s2_iterative) - val s2_generate_xcpt = WireInit(s2_xcpt) - - // masked checks, even in the fast case, need to - // to to ITC to get the precise element+address of the trap - when (s2_inst.vmu && s2_xcpt && !s2_inst.vm) { - s2_go_to_itc := true.B - s2_generate_xcpt := false.B - } - - io.s2.inst.valid := s2_valid - io.s2.inst.bits := s2_inst - io.s2.replay := false.B - io.s2.vstart.valid := false.B - io.s2.vstart.bits := 0.U - io.s2.retire := false.B - io.s2.internal_replay.valid := false.B - io.s2.internal_replay.bits := s2_inst - io.s2.internal_replay.bits.rm := Mux(s2_inst.isOpf, io.s2.frm, io.s2.vxrm) - io.s2.xcpt.valid := false.B - io.s2.xcpt.bits.cause := s2_cause - io.s2.xcpt.bits.tval := s2_base - io.s2.pc := s2_inst.pc - io.s2.issue.valid := false.B - io.s2.issue.bits := s2_inst - io.s2.issue.bits.segstart := 0.U - io.s2.issue.bits.segend := s2_inst.seg_nf - io.s2.issue.bits.rm := Mux(s2_inst.isOpf, io.s2.frm, io.s2.vxrm) - io.s2.issue.bits.page := s2_tlb_resp.paddr >> pgIdxBits - - val consumed = ((1 << pgIdxBits).U - s2_tlb_resp.paddr(pgIdxBits-1,0)) >> s2_inst.mem_elem_size - when (s2_inst.vmu && s2_replay_next_page) { - io.s2.issue.bits.vconfig.vl := s2_inst.vstart +& consumed - 
} - - when (s2_valid) { - when (!io.s2.issue.ready || (io.s2.scalar_store_pending && s2_inst.vmu)) { - io.s2.replay := true.B - } .elsewhen (s2_inst.vstart =/= 0.U && !s2_inst.vmu) { - io.s2.xcpt.valid := true.B - io.s2.xcpt.bits.cause := Causes.illegal_instruction.U - io.s2.xcpt.bits.tval := s2_inst.pc - } .elsewhen (s2_inst.vstart >= s2_inst.vconfig.vl) { - io.s2.retire := true.B - io.s2.issue.valid := true.B - io.s2.vstart.valid := true.B - } .elsewhen (s2_tlb_resp.miss) { - io.s2.replay := true.B - } .elsewhen (s2_generate_xcpt) { - io.s2.xcpt.valid := true.B - } .elsewhen (s2_inst.vmu && s2_fast_sg) { - io.s2.retire := true.B - io.s2.issue.valid := true.B - io.s2.issue.bits.fast_sg := true.B - io.s2.vstart.valid := true.B - } .elsewhen (s2_go_to_itc) { - io.s2.internal_replay.valid := true.B - } .elsewhen (s2_replay_next_page) { - io.s2.replay := true.B - io.s2.issue.valid := true.B - io.s2.vstart.valid := true.B - io.s2.vstart.bits := s2_inst.vstart +& consumed - } .otherwise { - io.s2.retire := true.B - io.s2.vstart.valid := true.B - io.s2.issue.valid := true.B - } - } - -} diff --git a/arch/src/main/scala/framework/gendomain/frontend/HwachaLimiter.scala b/arch/src/main/scala/framework/gendomain/frontend/HwachaLimiter.scala deleted file mode 100644 index a3b9b30b..00000000 --- a/arch/src/main/scala/framework/gendomain/frontend/HwachaLimiter.scala +++ /dev/null @@ -1,57 +0,0 @@ -package framework.gendomain.frontend - -import chisel3._ -import chisel3.util._ -import chisel3.experimental.dataview._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.util._ -import framework.gendomain.common._ - - -class HwachaLimiter(n: Int)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val vatSz = vParams.vatSz - val io = IO(new Bundle { - val block = Output(Bool()) - val fire = Input(Bool()) - val inst = Input(new VectorIssueInst) - - val vat_release = Input(Vec(1 << vatSz, Bool())) - }) - - val 
slot_valids = RegInit(VecInit.fill(n)(false.B)) - val slot_vats = Reg(Vec(n, UInt(vatSz.W))) - val head = RegInit(0.U(log2Ceil(n).W)) - val head_p1 = incr(head) - val head_p2 = incr(head_p1) - - def incr(x: UInt) = Mux(x +& 1.U === n.U, 0.U, x + 1.U) - - val issue_2 = io.inst.vmu - val issue_3 = io.inst.vmu && io.inst.mop =/= mopUnit - val slots_available = !slot_valids(head) && (!issue_2 || !slot_valids(head_p1)) && (!issue_3 || !slot_valids(head_p2)) - - io.block := !slots_available - - when (io.fire) { - slot_valids(head) := true.B - slot_vats(head) := io.inst.vat - head := incr(head) - when (issue_2) { - slot_valids(head_p1) := true.B - slot_vats(head_p1) := io.inst.vat - head := incr(head_p1) - } - when (issue_3) { - slot_valids(head_p2) := true.B - slot_vats(head_p2) := io.inst.vat - head := incr(head_p2) - } - } - - for (i <- 0 until n) { - when (io.vat_release(slot_vats(i))) { - slot_valids(i) := false.B - } - } -} diff --git a/arch/src/main/scala/framework/gendomain/frontend/IterativeTrapCheck.scala b/arch/src/main/scala/framework/gendomain/frontend/IterativeTrapCheck.scala deleted file mode 100644 index 98cf47b3..00000000 --- a/arch/src/main/scala/framework/gendomain/frontend/IterativeTrapCheck.scala +++ /dev/null @@ -1,270 +0,0 @@ -package framework.gendomain.frontend - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.diplomacy._ - -import framework.gendomain.common._ - -class IndexMaskAccess(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val in = Input(Bool()) - val inst = Input(new VectorIssueInst) - - val index_access = Flipped(new VectorIndexAccessIO) - val mask_access = Flipped(new VectorMaskAccessIO) - - val access = new Bundle { - val ready = Output(Bool()) - val eidx = 
Input(UInt(log2Ceil(maxVLMax).W)) - val index = Output(UInt(64.W)) - val mask = Output(Bool()) - } - - val pop = Input(Valid(UInt(log2Ceil(maxVLMax).W))) - val flush = Input(Bool()) - }) - - val valid = RegInit(false.B) - val eidx = Reg(UInt(log2Ceil(maxVLMax).W)) - // This all works only with pow2 buffers and eidx starting at 0 - val valids = Reg(Vec(4, Bool())) - val indices = Reg(Vec(4, UInt(64.W))) - val masks = Reg(Vec(4, Bool())) - when (io.in) { - assert(!valid) - valid := true.B - eidx := 0.U - valids.foreach(_ := false.B) - } - - val needs_index = io.inst.mop.isOneOf(mopOrdered, mopUnordered) - val needs_mask = !io.inst.vm - - val index_ready = io.index_access.ready || !needs_index - val mask_ready = io.mask_access.ready || !needs_mask - - io.index_access.valid := valid && needs_index && !valids(eidx(1,0)) - io.mask_access.valid := valid && needs_mask && !valids(eidx(1,0)) - - io.index_access.vrs := io.inst.rs2 - io.index_access.eidx := eidx - io.index_access.eew := io.inst.mem_idx_size - io.mask_access.eidx := eidx - - when (valid && index_ready && mask_ready && !valids(eidx(1,0))) { - val next_eidx = eidx +& 1.U - eidx := eidx + 1.U - when (next_eidx === io.inst.vconfig.vl) { - valid := false.B - } - valids(eidx(1,0)) := true.B - indices(eidx(1,0)) := io.index_access.idx - masks(eidx(1,0)) := io.mask_access.mask - } - - io.access.ready := valids(io.access.eidx(1,0)) - io.access.index := indices(io.access.eidx(1,0)) - io.access.mask := masks(io.access.eidx(1,0)) - - when (io.pop.fire) { - valids(io.pop.bits(1,0)) := false.B - } - when (io.flush) { - valid := false.B - } -} - -class IterativeTrapCheck(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val status = Input(new MStatus) - val in = Input(Valid(new VectorIssueInst)) - val busy = Output(Bool()) - val s0_tlb_req = Valid(new TLBReq(3)) - val s1_tlb_req = Valid(new TLBReq(3)) - val tlb_resp = Input(new TLBResp) - val retire = Output(Bool()) - val pc = 
Output(UInt(vaddrBitsExtended.W)) - val vstart = Valid(UInt(log2Ceil(maxVLMax).W)) - val vconfig = Valid(new VConfig) - val xcpt = Valid(new Bundle { - val cause = UInt(xLen.W) - val tval = UInt(coreMaxAddrBits.W) - }) - val inst = Output(new VectorIssueInst) - val issue = Decoupled(new VectorIssueInst) - - val index_access = Flipped(new VectorIndexAccessIO) - val mask_access = Flipped(new VectorMaskAccessIO) - }) - - val replay_kill = WireInit(false.B) - - def nextPage(addr: UInt) = ((addr + (1 << pgIdxBits).U) >> pgIdxBits) << pgIdxBits - - val valid = RegInit(false.B) - val seg_hi = Reg(Bool()) - val inst = Reg(new VectorIssueInst) - val eidx = Reg(UInt(log2Ceil(maxVLMax).W)) - val addr = Reg(UInt(vaddrBitsExtended.W)) - val tlb_backoff = RegInit(0.U(2.W)) - when (tlb_backoff =/= 0.U) { tlb_backoff := tlb_backoff - 1.U } - - val im_access = Module(new IndexMaskAccess) - im_access.io.in := io.in.valid - im_access.io.inst := inst - im_access.io.index_access <> io.index_access - im_access.io.mask_access <> io.mask_access - - when (io.in.valid) { - assert(!valid) - valid := true.B - seg_hi := false.B - inst := io.in.bits - eidx := 0.U - addr := io.in.bits.rs1_data - } - - - val stride = MuxLookup(inst.mop, 0.U)(Seq( - (mopUnit -> ((inst.seg_nf +& 1.U) << inst.mem_elem_size)), - (mopStrided -> inst.rs2_data) - )) - - val indexed = inst.mop.isOneOf(mopOrdered, mopUnordered) - val index_ready = !indexed || im_access.io.access.ready - val mask_ready = inst.vm || im_access.io.access.ready - val index = Mux(indexed, im_access.io.access.index & eewBitMask(inst.mem_idx_size), 0.U) - val base = Mux(indexed, inst.rs1_data, addr) - val indexaddr = base + index - val tlb_addr = Mux(seg_hi, nextPage(indexaddr), indexaddr) - val seg_nf_consumed = ((1 << pgIdxBits).U - Mux(seg_hi, indexaddr, tlb_addr)(pgIdxBits-1,0)) >> inst.mem_elem_size - val seg_single_page = seg_nf_consumed >= (inst.seg_nf +& 1.U) - val masked = !im_access.io.access.mask && !inst.vm - val tlb_valid = eidx < 
inst.vconfig.vl && eidx >= inst.vstart && !masked - val ff = inst.umop === lumopFF && inst.mop === mopUnit - - io.busy := valid - io.inst := inst - - im_access.io.access.eidx := eidx - - io.s0_tlb_req.valid := tlb_valid && tlb_backoff === 0.U && index_ready && mask_ready - io.s0_tlb_req.bits.vaddr := tlb_addr - io.s0_tlb_req.bits.passthrough := false.B - io.s0_tlb_req.bits.size := inst.mem_elem_size - io.s0_tlb_req.bits.cmd := Mux(inst.opcode(5), M_XWR, M_XRD) - io.s0_tlb_req.bits.prv := io.status.prv - io.s0_tlb_req.bits.v := io.status.v - - io.s1_tlb_req.valid := RegEnable(io.s0_tlb_req.valid, false.B, valid) - io.s1_tlb_req.bits := RegEnable(io.s0_tlb_req.bits, valid) - - val replay_fire = valid && eidx < inst.vconfig.vl && tlb_backoff === 0.U && index_ready && mask_ready - when (replay_fire) { - when (seg_hi || seg_single_page || inst.seg_nf === 0.U) { - eidx := eidx + 1.U - addr := addr + stride - seg_hi := false.B - } .otherwise { - seg_hi := true.B - } - } - - val s1_valid = RegNext(replay_fire && !replay_kill, false.B) - val s1_eidx = RegEnable(eidx, valid) - val s1_masked = RegEnable(masked, valid) - val s1_seg_hi = RegEnable(seg_hi, valid) - val s1_base = RegEnable(base, valid) - val s1_tlb_valid = RegEnable(tlb_valid, valid) - val s1_tlb_addr = RegEnable(tlb_addr, valid) - val s1_seg_nf_consumed = RegEnable(seg_nf_consumed, valid) - val s1_seg_single_page = RegEnable(seg_single_page, valid) - - when (io.tlb_resp.miss && s1_valid && tlb_backoff === 0.U) { tlb_backoff := 3.U } - - val tlb_resp = WireInit(io.tlb_resp) - when (!s1_tlb_valid) { - tlb_resp.miss := false.B - } - - val xcpts = Seq( - (tlb_resp.pf.st, Causes.store_page_fault.U), - (tlb_resp.pf.ld, Causes.load_page_fault.U), - (tlb_resp.gf.st, Causes.store_guest_page_fault.U), - (tlb_resp.gf.ld, Causes.load_guest_page_fault.U), - (tlb_resp.ae.st, Causes.store_access.U), - (tlb_resp.ae.ld, Causes.load_access.U), - (tlb_resp.ma.st, Causes.misaligned_store.U), - (tlb_resp.ma.ld, 
Causes.misaligned_load.U) - ) - val xcpt = xcpts.map(_._1).orR && s1_eidx >= inst.vstart && !s1_masked - val cause = PriorityMux(xcpts) - - - io.issue.valid := false.B - io.issue.bits := inst - io.issue.bits.vstart := s1_eidx - io.issue.bits.vconfig.vl := s1_eidx +& 1.U - io.issue.bits.segend := inst.seg_nf - io.issue.bits.segstart := 0.U - io.issue.bits.page := tlb_resp.paddr >> pgIdxBits - io.xcpt.valid := false.B - io.pc := inst.pc - io.xcpt.bits.cause := cause - io.xcpt.bits.tval := s1_tlb_addr - io.vstart.valid := false.B - io.vstart.bits := s1_eidx - io.retire := false.B - io.vconfig.valid := false.B - io.vconfig.bits := inst.vconfig - io.vconfig.bits.vl := s1_eidx - im_access.io.pop.valid := false.B - im_access.io.pop.bits := s1_eidx - im_access.io.flush := false.B - - when (s1_valid) { - io.issue.valid := !tlb_resp.miss && !xcpt && s1_eidx >= inst.vstart && !s1_masked - when (inst.seg_nf =/= 0.U && !s1_seg_single_page) { - when (!s1_seg_hi) { - io.issue.bits.segend := s1_seg_nf_consumed - 1.U - } .otherwise { - io.issue.bits.segstart := s1_seg_nf_consumed - } - } - - when (s1_seg_hi || s1_seg_single_page || inst.seg_nf === 0.U) { - im_access.io.pop.valid := true.B - } - - when (tlb_resp.miss || !io.issue.ready) { - tlb_backoff := 3.U - replay_kill := true.B - eidx := s1_eidx - addr := s1_base - seg_hi := s1_seg_hi - im_access.io.pop.valid := false.B - } .elsewhen (xcpt) { - val ff_nofault = ff && s1_eidx =/= 0.U - valid := false.B - replay_kill := true.B - io.retire := ff_nofault - io.xcpt.valid := !ff_nofault - io.vstart.valid := !ff_nofault - io.vconfig.valid := ff_nofault - im_access.io.flush := true.B - } .elsewhen ((s1_eidx +& 1.U) === inst.vconfig.vl && (s1_seg_hi || s1_seg_single_page || inst.seg_nf === 0.U)) { - valid := false.B - replay_kill := true.B - io.retire := true.B - io.vstart.valid := true.B - io.vstart.bits := 0.U - im_access.io.flush := true.B - } - } -} diff --git a/arch/src/main/scala/framework/gendomain/insns/Base.scala 
b/arch/src/main/scala/framework/gendomain/insns/Base.scala deleted file mode 100644 index ef4092b4..00000000 --- a/arch/src/main/scala/framework/gendomain/insns/Base.scala +++ /dev/null @@ -1,56 +0,0 @@ -package framework.gendomain.insns - -import chisel3._ -import chisel3.util._ - -import org.chipsalliance.cde.config._ -import freechips.rocketchip.subsystem._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.rocket.constants._ -import freechips.rocketchip.util._ - -trait InstructionField { - def default: BitPat - def width: Int - def Y = InstructionProperty(this, true.B) - def N = InstructionProperty(this, false.B) - def dontCare: BitPat = BitPat.dontCare(width) - def apply(v: UInt) = InstructionProperty(this, v(width-1,0)) - def apply(v: BitPat) = InstructionProperty(this, v) - def apply(e: EnumType) = InstructionProperty(this, e.litValue.U(width.W)) -} - -trait NDefaultInstructionField extends InstructionField { - val default: BitPat = false.B - val width: Int = 1 -} - -trait YDefaultInstructionField extends InstructionField { - val default: BitPat = true.B - val width: Int = 1 -} - -trait XDefaultInstructionField extends InstructionField { - def default: BitPat = BitPat.dontCare(width) - val width: Int = 1 -} - -case class InstructionProperty(val field: InstructionField, val value: BitPat) - -trait VectorInstruction { - val props: Seq[InstructionProperty] - def lookup(field: InstructionField) = { - val matches = props.collect { case InstructionProperty(`field`, value) => value } - if (matches.size > 0) { - require(matches.toSet.size <= 1, s"Field lookup for $field returned multiple results") - matches(0) - } else { - field.default - } - } - def elementWise: VectorInstruction = new ElementwiseVectorInstruction(props) -} - -class ElementwiseVectorInstruction(_props: Seq[InstructionProperty]) extends VectorInstruction { - val props = _props :+ Elementwise.Y -} diff --git a/arch/src/main/scala/framework/gendomain/insns/Control.scala 
b/arch/src/main/scala/framework/gendomain/insns/Control.scala deleted file mode 100644 index 45cd1bf7..00000000 --- a/arch/src/main/scala/framework/gendomain/insns/Control.scala +++ /dev/null @@ -1,96 +0,0 @@ -package framework.gendomain.insns - -import chisel3._ -import chisel3.util._ - -import org.chipsalliance.cde.config._ -import freechips.rocketchip.subsystem._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.rocket.constants._ -import freechips.rocketchip.util._ - -object F6 extends XDefaultInstructionField { override val width: Int = 6 } -object F3 extends XDefaultInstructionField { override val width: Int = 3 } -object RS1 extends XDefaultInstructionField { override val width: Int = 5 } -object RS2 extends XDefaultInstructionField { override val width: Int = 5 } - -object AlwaysReadsVM extends NDefaultInstructionField -object VMBitReadsVM extends YDefaultInstructionField -object ReadsVS1 extends NDefaultInstructionField -object ReadsVS2 extends YDefaultInstructionField -object ReadsVD extends NDefaultInstructionField -object WritesVD extends YDefaultInstructionField -object ScalarToVD0 extends NDefaultInstructionField -object Reduction extends NDefaultInstructionField -object Wide2VD extends NDefaultInstructionField -object Wide2VS2 extends NDefaultInstructionField -object WritesAsMask extends NDefaultInstructionField -object ReadsVS1AsMask extends NDefaultInstructionField -object ReadsVS2AsMask extends NDefaultInstructionField -object WritesScalar extends NDefaultInstructionField -object UsesPermuteSeq extends NDefaultInstructionField -object ZextImm5 extends NDefaultInstructionField - -// Execute Sequencer control -object Elementwise extends NDefaultInstructionField -object SetsWMask extends YDefaultInstructionField -object AccInitZeros extends NDefaultInstructionField -object AccInitOnes extends NDefaultInstructionField -object AccInitPos extends NDefaultInstructionField -object AccInitNeg extends NDefaultInstructionField - -// Integer 
Pipe control -object Swap12 extends NDefaultInstructionField -object WideningSext extends XDefaultInstructionField - -object DoSub extends XDefaultInstructionField -object Averaging extends XDefaultInstructionField -object CarryIn extends XDefaultInstructionField -object AlwaysCarryIn extends XDefaultInstructionField - -object UsesShift extends NDefaultInstructionField -object ShiftsLeft extends XDefaultInstructionField -object ScalingShift extends XDefaultInstructionField - -object UsesCmp extends NDefaultInstructionField -object CmpLess extends XDefaultInstructionField - -object UsesNarrowingSext extends NDefaultInstructionField -object UsesMinMax extends NDefaultInstructionField -object UsesMerge extends NDefaultInstructionField -object UsesSat extends NDefaultInstructionField -object UsesBitSwap extends NDefaultInstructionField -object UsesCountZeros extends NDefaultInstructionField - -// Bitwise pipe control -object BWAnd extends NDefaultInstructionField -object BWOr extends NDefaultInstructionField -object BWXor extends NDefaultInstructionField -object BWInvOut extends NDefaultInstructionField -object BWInv1 extends NDefaultInstructionField - -// Multiply control -object MULHi extends XDefaultInstructionField -object MULSign1 extends XDefaultInstructionField -object MULSign2 extends XDefaultInstructionField -object MULSwapVdV2 extends NDefaultInstructionField -object MULAccumulate extends NDefaultInstructionField -object MULSub extends XDefaultInstructionField - -// FPFMA control -object FPAdd extends XDefaultInstructionField -object FPMul extends XDefaultInstructionField -object FPSwapVdV2 extends XDefaultInstructionField -object FPFMACmd extends XDefaultInstructionField { override val width = 2 } - -// FPComp control -object FPComp extends NDefaultInstructionField -object FPCompMin extends XDefaultInstructionField - -object FPMEQ extends XDefaultInstructionField -object FPMNE extends XDefaultInstructionField -object FPMLT extends XDefaultInstructionField 
-object FPMGT extends XDefaultInstructionField - -object FPSgnj extends NDefaultInstructionField -object FPSpecRM extends XDefaultInstructionField { override val width = 3 } diff --git a/arch/src/main/scala/framework/gendomain/insns/Decode.scala b/arch/src/main/scala/framework/gendomain/insns/Decode.scala deleted file mode 100644 index 776bba06..00000000 --- a/arch/src/main/scala/framework/gendomain/insns/Decode.scala +++ /dev/null @@ -1,46 +0,0 @@ -package framework.gendomain.insns - -import chisel3._ -import chisel3.util._ -import chisel3.util.experimental.decode._ - -import org.chipsalliance.cde.config._ -import freechips.rocketchip.subsystem._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.rocket.constants._ -import freechips.rocketchip.util._ - - -class VectorDecoder( - funct3: UInt, funct6: UInt, rs1: UInt, rs2: UInt, - insns: Seq[VectorInstruction], - fields: Seq[InstructionField]) { - - val index = Cat(rs1(4,0), rs2(4,0), funct3(2,0), funct6(5,0)) - val lookups = insns.map { i => i.lookup(RS1) ## i.lookup(RS2) ## i.lookup(F3) ## i.lookup(F6) } - val duplicates = lookups.diff(lookups.distinct).distinct - val table = insns.map { i => fields.map(f => i.lookup(f)) :+ BitPat(true.B) } - - val elementsGrouped = table.transpose - val defaults = fields.map(_.dontCare) :+ BitPat(false.B) - val elementWidths = elementsGrouped.zip(defaults).map { case (elts, default) => - require(elts.forall(_.getWidth == default.getWidth)) - default.getWidth - } - val resultWidth = elementWidths.sum - val elementIndices = elementWidths.scan(resultWidth-1) { case (l,r) => l - r } - val truthTable = TruthTable(lookups.zip(table).map { case (l,r) => (l, r.reduce(_ ## _)) }, defaults.reduce(_ ## _)) - val decode = chisel3.util.experimental.decode.decoder(index, truthTable) - val decoded = elementIndices.zip(elementIndices.tail).map { case (msb, lsb) => decode(msb, lsb+1) }.toSeq - - def uint(field: InstructionField): UInt = { - val index = fields.indexOf(field) - 
require(index >= 0, s"Field $field not found in this decoder") - decoded(index) - } - def bool(field: InstructionField): Bool = { - require(field.width == 1) - uint(field)(0) - } - def matched: Bool = decoded.last(0) -} diff --git a/arch/src/main/scala/framework/gendomain/insns/Instructions.scala b/arch/src/main/scala/framework/gendomain/insns/Instructions.scala deleted file mode 100644 index 05728239..00000000 --- a/arch/src/main/scala/framework/gendomain/insns/Instructions.scala +++ /dev/null @@ -1,233 +0,0 @@ -package framework.gendomain.insns - -import chisel3._ -import chisel3.util._ - -import org.chipsalliance.cde.config._ -import freechips.rocketchip.subsystem._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.rocket.constants._ -import freechips.rocketchip.util._ -import framework.gendomain.common.{OPIFunct6, OPMFunct6, OPFFunct6, VectorConsts} - -class OPIVVInstruction(base: OPIInstruction) extends VectorInstruction { - val props = base.props ++ Seq(F3(VectorConsts.OPIVV), ReadsVS1.Y) -} -class OPIVXInstruction(base: OPIInstruction) extends VectorInstruction { - val props = base.props ++ Seq(F3(VectorConsts.OPIVX)) -} -class OPIVIInstruction(base: OPIInstruction) extends VectorInstruction { - val props = base.props ++ Seq(F3(VectorConsts.OPIVI)) -} -class OPMVVInstruction(base: OPMInstruction) extends VectorInstruction { - val props = base.props ++ Seq(F3(VectorConsts.OPMVV), ReadsVS1.Y) -} -class OPMVXInstruction(base: OPMInstruction) extends VectorInstruction { - val props = base.props ++ Seq(F3(VectorConsts.OPMVX)) -} -class OPFVVInstruction(base: OPFInstruction) extends VectorInstruction { - val props = base.props ++ Seq(F3(VectorConsts.OPFVV), ReadsVS1.Y) -} -class OPFVFInstruction(base: OPFInstruction) extends VectorInstruction { - val props = base.props ++ Seq(F3(VectorConsts.OPFVF)) -} - -trait OPIInstruction extends VectorInstruction { - def VV = new OPIVVInstruction(this) - def VX = new OPIVXInstruction(this) - def VI = new 
OPIVIInstruction(this) -} - -trait OPMInstruction extends VectorInstruction { - def VV = new OPMVVInstruction(this) - def VX = new OPMVXInstruction(this) -} - -trait OPFInstruction extends VectorInstruction { - def VV = new OPFVVInstruction(this) - def VF = new OPFVFInstruction(this) -} - -object ADD extends OPIInstruction { val props = Seq(F6(OPIFunct6.add) , DoSub.N, Averaging.N, CarryIn.N) } -object SUB extends OPIInstruction { val props = Seq(F6(OPIFunct6.sub) , DoSub.Y, Averaging.N, CarryIn.N) } -object RSUB extends OPIInstruction { val props = Seq(F6(OPIFunct6.rsub) , DoSub.Y, Averaging.N, CarryIn.N, Swap12.Y) } -object WADDU extends OPMInstruction { val props = Seq(F6(OPMFunct6.waddu) , DoSub.N, Averaging.N, CarryIn.N, WideningSext.N, Wide2VD.Y) } -object WADD extends OPMInstruction { val props = Seq(F6(OPMFunct6.wadd) , DoSub.N, Averaging.N, CarryIn.N, WideningSext.Y, Wide2VD.Y) } -object WSUBU extends OPMInstruction { val props = Seq(F6(OPMFunct6.wsubu) , DoSub.Y, Averaging.N, CarryIn.N, WideningSext.N, Wide2VD.Y) } -object WSUB extends OPMInstruction { val props = Seq(F6(OPMFunct6.wsub) , DoSub.Y, Averaging.N, CarryIn.N, WideningSext.Y, Wide2VD.Y) } -object WADDUW extends OPMInstruction { val props = Seq(F6(OPMFunct6.wadduw) , DoSub.N, Averaging.N, CarryIn.N, WideningSext.N, Wide2VD.Y, Wide2VS2.Y) } -object WADDW extends OPMInstruction { val props = Seq(F6(OPMFunct6.waddw) , DoSub.N, Averaging.N, CarryIn.N, WideningSext.Y, Wide2VD.Y, Wide2VS2.Y) } -object WSUBUW extends OPMInstruction { val props = Seq(F6(OPMFunct6.wsubuw) , DoSub.Y, Averaging.N, CarryIn.N, WideningSext.N, Wide2VD.Y, Wide2VS2.Y) } -object WSUBW extends OPMInstruction { val props = Seq(F6(OPMFunct6.wsubw) , DoSub.Y, Averaging.N, CarryIn.N, WideningSext.Y, Wide2VD.Y, Wide2VS2.Y) } -object ADC extends OPIInstruction { val props = Seq(F6(OPIFunct6.adc) , DoSub.N, Averaging.N, CarryIn.Y, AlwaysCarryIn.Y, SetsWMask.N) } -object MADC extends OPIInstruction { val props = Seq(F6(OPIFunct6.madc) , 
DoSub.N, Averaging.N, CarryIn.Y, AlwaysCarryIn.N, SetsWMask.N, WritesAsMask.Y) } -object SBC extends OPIInstruction { val props = Seq(F6(OPIFunct6.sbc) , DoSub.Y, Averaging.N, CarryIn.Y, AlwaysCarryIn.Y, SetsWMask.N) } -object MSBC extends OPIInstruction { val props = Seq(F6(OPIFunct6.msbc) , DoSub.Y, Averaging.N, CarryIn.Y, AlwaysCarryIn.N, SetsWMask.N, WritesAsMask.Y) } -object NEXT extends OPMInstruction { val props = Seq(F6(OPMFunct6.xunary0) , RS1(BitPat("b00???")), UsesNarrowingSext.Y) } -object SLL extends OPIInstruction { val props = Seq(F6(OPIFunct6.sll) , UsesShift.Y, ShiftsLeft.Y, ScalingShift.N) } -object SRA extends OPIInstruction { val props = Seq(F6(OPIFunct6.sra) , UsesShift.Y, ShiftsLeft.N, ScalingShift.N) } -object SRL extends OPIInstruction { val props = Seq(F6(OPIFunct6.srl) , UsesShift.Y, ShiftsLeft.N, ScalingShift.N, ZextImm5.Y) } -object NSRA extends OPIInstruction { val props = Seq(F6(OPIFunct6.nsra) , UsesShift.Y, ShiftsLeft.N, ScalingShift.N, WideningSext.N, Wide2VS2.Y, ZextImm5.Y) } -object NSRL extends OPIInstruction { val props = Seq(F6(OPIFunct6.nsrl) , UsesShift.Y, ShiftsLeft.N, ScalingShift.N, WideningSext.N, Wide2VS2.Y, ZextImm5.Y) } -object MSEQ extends OPIInstruction { val props = Seq(F6(OPIFunct6.mseq) , WritesAsMask.Y, UsesCmp.Y, CmpLess.N) } -object MSNE extends OPIInstruction { val props = Seq(F6(OPIFunct6.msne) , WritesAsMask.Y, UsesCmp.Y, CmpLess.N) } -object MSLTU extends OPIInstruction { val props = Seq(F6(OPIFunct6.msltu) , WritesAsMask.Y, UsesCmp.Y, CmpLess.Y) } -object MSLT extends OPIInstruction { val props = Seq(F6(OPIFunct6.mslt) , WritesAsMask.Y, UsesCmp.Y, CmpLess.Y) } -object MSLEU extends OPIInstruction { val props = Seq(F6(OPIFunct6.msleu) , WritesAsMask.Y, UsesCmp.Y, CmpLess.Y) } -object MSLE extends OPIInstruction { val props = Seq(F6(OPIFunct6.msle) , WritesAsMask.Y, UsesCmp.Y, CmpLess.Y) } -object MSGTU extends OPIInstruction { val props = Seq(F6(OPIFunct6.msgtu) , WritesAsMask.Y, UsesCmp.Y, CmpLess.Y, 
Swap12.Y) } -object MSGT extends OPIInstruction { val props = Seq(F6(OPIFunct6.msgt) , WritesAsMask.Y, UsesCmp.Y, CmpLess.Y, Swap12.Y) } -object MINU extends OPIInstruction { val props = Seq(F6(OPIFunct6.minu) , UsesMinMax.Y, CmpLess.Y) } -object MIN extends OPIInstruction { val props = Seq(F6(OPIFunct6.min) , UsesMinMax.Y, CmpLess.Y) } -object MAXU extends OPIInstruction { val props = Seq(F6(OPIFunct6.maxu) , UsesMinMax.Y, CmpLess.Y, Swap12.Y) } -object MAX extends OPIInstruction { val props = Seq(F6(OPIFunct6.max) , UsesMinMax.Y, CmpLess.Y, Swap12.Y) } -object MERGE extends OPIInstruction { val props = Seq(F6(OPIFunct6.merge) , AlwaysReadsVM.Y, UsesMerge.Y, SetsWMask.N) } -object SADDU extends OPIInstruction { val props = Seq(F6(OPIFunct6.saddu) , DoSub.N, Averaging.N, CarryIn.N, UsesSat.Y) } -object SADD extends OPIInstruction { val props = Seq(F6(OPIFunct6.sadd) , DoSub.N, Averaging.N, CarryIn.N, UsesSat.Y) } -object SSUBU extends OPIInstruction { val props = Seq(F6(OPIFunct6.ssubu) , DoSub.Y, Averaging.N, CarryIn.N, UsesSat.Y) } -object SSUB extends OPIInstruction { val props = Seq(F6(OPIFunct6.ssub) , DoSub.Y, Averaging.N, CarryIn.N, UsesSat.Y) } -object AADDU extends OPMInstruction { val props = Seq(F6(OPMFunct6.aaddu) , DoSub.N, Averaging.Y, CarryIn.N) } -object AADD extends OPMInstruction { val props = Seq(F6(OPMFunct6.aadd) , DoSub.N, Averaging.Y, CarryIn.N) } -object ASUBU extends OPMInstruction { val props = Seq(F6(OPMFunct6.asubu) , DoSub.Y, Averaging.Y, CarryIn.N) } -object ASUB extends OPMInstruction { val props = Seq(F6(OPMFunct6.asub) , DoSub.Y, Averaging.Y, CarryIn.N) } -object SSRL extends OPIInstruction { val props = Seq(F6(OPIFunct6.ssrl) , UsesShift.Y, ShiftsLeft.N, ScalingShift.Y, ZextImm5.Y) } -object SSRA extends OPIInstruction { val props = Seq(F6(OPIFunct6.ssra) , UsesShift.Y, ShiftsLeft.N, ScalingShift.Y) } -object NCLIPU extends OPIInstruction { val props = Seq(F6(OPIFunct6.nclipu) , UsesShift.Y, ShiftsLeft.N, ScalingShift.Y, 
Wide2VS2.Y, ZextImm5.Y) } -object NCLIP extends OPIInstruction { val props = Seq(F6(OPIFunct6.nclip) , UsesShift.Y, ShiftsLeft.N, ScalingShift.Y, Wide2VS2.Y, ZextImm5.Y) } -object REDSUM extends OPMInstruction { val props = Seq(F6(OPMFunct6.redsum) , Reduction.Y, AccInitZeros.Y, DoSub.N, Averaging.N, CarryIn.N) } -object WREDSUM extends OPIInstruction { val props = Seq(F6(OPIFunct6.wredsum) , Reduction.Y, AccInitZeros.Y, DoSub.N, Averaging.N, CarryIn.N, WideningSext.Y, Wide2VD.Y) } -object WREDSUMU extends OPIInstruction { val props = Seq(F6(OPIFunct6.wredsumu), Reduction.Y, AccInitZeros.Y, DoSub.N, Averaging.N, CarryIn.N, WideningSext.N, Wide2VD.Y) } -object REDMINU extends OPMInstruction { val props = Seq(F6(OPMFunct6.redminu) , Reduction.Y, AccInitOnes.Y , UsesMinMax.Y, CmpLess.Y) } -object REDMIN extends OPMInstruction { val props = Seq(F6(OPMFunct6.redmin) , Reduction.Y, AccInitPos.Y , UsesMinMax.Y, CmpLess.Y) } -object REDMAXU extends OPMInstruction { val props = Seq(F6(OPMFunct6.redmaxu) , Reduction.Y, AccInitZeros.Y, UsesMinMax.Y, CmpLess.Y, Swap12.Y) } -object REDMAX extends OPMInstruction { val props = Seq(F6(OPMFunct6.redmax) , Reduction.Y, AccInitNeg.Y , UsesMinMax.Y, CmpLess.Y, Swap12.Y) } -object FMERGE extends OPFInstruction { val props = Seq(F6(OPFFunct6.fmerge) , AlwaysReadsVM.Y, UsesMerge.Y, SetsWMask.N) } - - -object AND extends OPIInstruction { val props = Seq(F6(OPIFunct6.and) , BWAnd.Y) } -object OR extends OPIInstruction { val props = Seq(F6(OPIFunct6.or) , BWOr.Y) } -object XOR extends OPIInstruction { val props = Seq(F6(OPIFunct6.xor) , BWXor.Y) } -object MANDNOT extends OPMInstruction { val props = Seq(F6(OPMFunct6.mandnot) , WritesAsMask.Y, ReadsVS1AsMask.Y, ReadsVS2AsMask.Y, BWAnd.Y, BWInv1.Y) } -object MAND extends OPMInstruction { val props = Seq(F6(OPMFunct6.mand) , WritesAsMask.Y, ReadsVS1AsMask.Y, ReadsVS2AsMask.Y, BWAnd.Y) } -object MOR extends OPMInstruction { val props = Seq(F6(OPMFunct6.mor) , WritesAsMask.Y, ReadsVS1AsMask.Y, 
ReadsVS2AsMask.Y, BWOr.Y) } -object MXOR extends OPMInstruction { val props = Seq(F6(OPMFunct6.mxor) , WritesAsMask.Y, ReadsVS1AsMask.Y, ReadsVS2AsMask.Y, BWXor.Y) } -object MORNOT extends OPMInstruction { val props = Seq(F6(OPMFunct6.mornot) , WritesAsMask.Y, ReadsVS1AsMask.Y, ReadsVS2AsMask.Y, BWOr.Y, BWInv1.Y) } -object MNAND extends OPMInstruction { val props = Seq(F6(OPMFunct6.mnand) , WritesAsMask.Y, ReadsVS1AsMask.Y, ReadsVS2AsMask.Y, BWAnd.Y, BWInvOut.Y) } -object MNOR extends OPMInstruction { val props = Seq(F6(OPMFunct6.mnor) , WritesAsMask.Y, ReadsVS1AsMask.Y, ReadsVS2AsMask.Y, BWOr.Y, BWInvOut.Y) } -object MXNOR extends OPMInstruction { val props = Seq(F6(OPMFunct6.mxnor) , WritesAsMask.Y, ReadsVS1AsMask.Y, ReadsVS2AsMask.Y, BWXor.Y, BWInvOut.Y) } -object REDAND extends OPMInstruction { val props = Seq(F6(OPMFunct6.redand) , Reduction.Y, AccInitOnes.Y , BWAnd.Y) } -object REDOR extends OPMInstruction { val props = Seq(F6(OPMFunct6.redor) , Reduction.Y, AccInitZeros.Y, BWOr.Y) } -object REDXOR extends OPMInstruction { val props = Seq(F6(OPMFunct6.redxor) , Reduction.Y, AccInitZeros.Y, BWXor.Y) } - -object MUL extends OPMInstruction { val props = Seq(F6(OPMFunct6.mul) , MULHi.N) } -object MULH extends OPMInstruction { val props = Seq(F6(OPMFunct6.mulh) , MULHi.Y, MULSign1.Y, MULSign2.Y) } -object MULHU extends OPMInstruction { val props = Seq(F6(OPMFunct6.mulhu) , MULHi.Y, MULSign1.N, MULSign2.N) } -object MULHSU extends OPMInstruction { val props = Seq(F6(OPMFunct6.mulhsu) , MULHi.Y, MULSign1.N, MULSign2.Y) } -object WMUL extends OPMInstruction { val props = Seq(F6(OPMFunct6.wmul) , MULHi.N, MULSign1.Y, MULSign2.Y, Wide2VD.Y) } -object WMULU extends OPMInstruction { val props = Seq(F6(OPMFunct6.wmulu) , MULHi.N, MULSign1.N, MULSign2.N, Wide2VD.Y) } -object WMULSU extends OPMInstruction { val props = Seq(F6(OPMFunct6.wmulsu) , MULHi.N, MULSign1.N, MULSign2.Y, Wide2VD.Y) } -object MACC extends OPMInstruction { val props = Seq(F6(OPMFunct6.macc) , MULHi.N, 
ReadsVD.Y, MULAccumulate.Y, MULSub.N) } -object NMSAC extends OPMInstruction { val props = Seq(F6(OPMFunct6.nmsac) , MULHi.N, ReadsVD.Y, MULAccumulate.Y, MULSub.Y) } -object MADD extends OPMInstruction { val props = Seq(F6(OPMFunct6.madd) , MULHi.N, ReadsVD.Y, MULAccumulate.Y, MULSub.N, MULSwapVdV2.Y) } -object NMSUB extends OPMInstruction { val props = Seq(F6(OPMFunct6.nmsub) , MULHi.N, ReadsVD.Y, MULAccumulate.Y, MULSub.Y, MULSwapVdV2.Y) } -object WMACC extends OPMInstruction { val props = Seq(F6(OPMFunct6.wmacc) , MULHi.N, ReadsVD.Y, MULSign1.Y, MULSign2.Y, MULAccumulate.Y, MULSub.N, Wide2VD.Y) } -object WMACCU extends OPMInstruction { val props = Seq(F6(OPMFunct6.wmaccu) , MULHi.N, ReadsVD.Y, MULSign1.N, MULSign2.N, MULAccumulate.Y, MULSub.N, Wide2VD.Y) } -object WMACCUS extends OPMInstruction { val props = Seq(F6(OPMFunct6.wmaccus) , MULHi.N, ReadsVD.Y, MULSign1.N, MULSign2.Y, MULAccumulate.Y, MULSub.N, Wide2VD.Y) } -object WMACCSU extends OPMInstruction { val props = Seq(F6(OPMFunct6.wmaccsu) , MULHi.N, ReadsVD.Y, MULSign1.Y, MULSign2.N, MULAccumulate.Y, MULSub.N, Wide2VD.Y) } -object SMUL extends OPIInstruction { val props = Seq(F6(OPIFunct6.smul) , MULHi.N, MULSign1.Y, MULSign2.Y, MULSub.N) } - -object DIV extends OPMInstruction { val props = Seq(F6(OPMFunct6.div)) } -object DIVU extends OPMInstruction { val props = Seq(F6(OPMFunct6.divu)) } -object REM extends OPMInstruction { val props = Seq(F6(OPMFunct6.rem)) } -object REMU extends OPMInstruction { val props = Seq(F6(OPMFunct6.remu)) } - -object MV_S_X extends VectorInstruction { val props = Seq(F6(OPMFunct6.wrxunary0), F3(VectorConsts.OPMVX), RS2( 0.U(5.W)), ReadsVS2.N, VMBitReadsVM.N, ScalarToVD0.Y) } -object MV_X_S extends VectorInstruction { val props = Seq(F6(OPMFunct6.wrxunary0), F3(VectorConsts.OPMVV), RS1( 0.U(5.W)), ReadsVS2.Y, VMBitReadsVM.N, WritesScalar.Y, WritesVD.N) } -object POPC extends VectorInstruction { val props = Seq(F6(OPMFunct6.wrxunary0), F3(VectorConsts.OPMVV), RS1(16.U(5.W)), 
ReadsVS2.Y, VMBitReadsVM.Y, WritesScalar.Y, WritesVD.N, ReadsVS2AsMask.Y) } -object FIRST extends VectorInstruction { val props = Seq(F6(OPMFunct6.wrxunary0), F3(VectorConsts.OPMVV), RS1(17.U(5.W)), ReadsVS2.Y, VMBitReadsVM.Y, WritesScalar.Y, WritesVD.N, ReadsVS2AsMask.Y) } -object FMV_S_F extends VectorInstruction { val props = Seq(F6(OPFFunct6.wrfunary0), F3(VectorConsts.OPFVF), RS2( 0.U(5.W)), ReadsVS2.N, VMBitReadsVM.N, ScalarToVD0.Y) } -object FMV_F_S extends VectorInstruction { val props = Seq(F6(OPFFunct6.wrfunary0), F3(VectorConsts.OPFVV), RS1( 0.U(5.W)), ReadsVS2.Y, VMBitReadsVM.N, WritesScalar.Y, WritesVD.N) } -object MSBF extends VectorInstruction { val props = Seq(F6(OPMFunct6.munary0) , F3(VectorConsts.OPMVV), RS1( 1.U(5.W)), ReadsVS2AsMask.Y, WritesAsMask.Y) } -object MSOF extends VectorInstruction { val props = Seq(F6(OPMFunct6.munary0) , F3(VectorConsts.OPMVV), RS1( 2.U(5.W)), ReadsVS2AsMask.Y, WritesAsMask.Y) } -object MSIF extends VectorInstruction { val props = Seq(F6(OPMFunct6.munary0) , F3(VectorConsts.OPMVV), RS1( 3.U(5.W)), ReadsVS2AsMask.Y, WritesAsMask.Y) } -object IOTA extends VectorInstruction { val props = Seq(F6(OPMFunct6.munary0) , F3(VectorConsts.OPMVV), RS1(16.U(5.W)), ReadsVS2AsMask.Y) } -object ID extends VectorInstruction { val props = Seq(F6(OPMFunct6.munary0) , F3(VectorConsts.OPMVV), RS1(17.U(5.W)), ReadsVS2AsMask.Y, ReadsVS2.N) } - -object FADD extends OPFInstruction { val props = Seq(F6(OPFFunct6.fadd) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(0.U(2.W))) } -object FSUB extends OPFInstruction { val props = Seq(F6(OPFFunct6.fsub) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(1.U(2.W))) } -object FRSUB extends OPFInstruction { val props = Seq(F6(OPFFunct6.frsub) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(2.U(2.W))) } -object FMUL extends OPFInstruction { val props = Seq(F6(OPFFunct6.fmul) , FPAdd.N, FPMul.Y, FPSwapVdV2.N, FPFMACmd(0.U(2.W))) } -object FMACC extends OPFInstruction { val props = Seq(F6(OPFFunct6.fmacc) , FPAdd.Y, 
FPMul.Y, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), ReadsVD.Y) } -object FNMACC extends OPFInstruction { val props = Seq(F6(OPFFunct6.fnmacc) , FPAdd.Y, FPMul.Y, FPSwapVdV2.N, FPFMACmd(3.U(2.W)), ReadsVD.Y) } -object FMSAC extends OPFInstruction { val props = Seq(F6(OPFFunct6.fmsac) , FPAdd.Y, FPMul.Y, FPSwapVdV2.N, FPFMACmd(1.U(2.W)), ReadsVD.Y) } -object FNMSAC extends OPFInstruction { val props = Seq(F6(OPFFunct6.fnmsac) , FPAdd.Y, FPMul.Y, FPSwapVdV2.N, FPFMACmd(2.U(2.W)), ReadsVD.Y) } -object FMADD extends OPFInstruction { val props = Seq(F6(OPFFunct6.fmadd) , FPAdd.Y, FPMul.Y, FPSwapVdV2.Y, FPFMACmd(0.U(2.W)), ReadsVD.Y) } -object FNMADD extends OPFInstruction { val props = Seq(F6(OPFFunct6.fnmadd) , FPAdd.Y, FPMul.Y, FPSwapVdV2.Y, FPFMACmd(3.U(2.W)), ReadsVD.Y) } -object FMSUB extends OPFInstruction { val props = Seq(F6(OPFFunct6.fmsub) , FPAdd.Y, FPMul.Y, FPSwapVdV2.Y, FPFMACmd(1.U(2.W)), ReadsVD.Y) } -object FNMSUB extends OPFInstruction { val props = Seq(F6(OPFFunct6.fnmsub) , FPAdd.Y, FPMul.Y, FPSwapVdV2.Y, FPFMACmd(2.U(2.W)), ReadsVD.Y) } -object FWADD extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwadd) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), Wide2VD.Y) } -object FWSUB extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwsub) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(1.U(2.W)), Wide2VD.Y) } -object FWADDW extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwaddw) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), Wide2VD.Y, Wide2VS2.Y) } -object FWSUBW extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwsubw) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(1.U(2.W)), Wide2VD.Y, Wide2VS2.Y) } -object FWMUL extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwmul) , FPAdd.N, FPMul.Y, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), Wide2VD.Y) } -object FWMACC extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwmacc) , FPAdd.Y, FPMul.Y, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), Wide2VD.Y, ReadsVD.Y) } -object FWNMACC extends OPFInstruction { val 
props = Seq(F6(OPFFunct6.fwnmacc) , FPAdd.Y, FPMul.Y, FPSwapVdV2.N, FPFMACmd(3.U(2.W)), Wide2VD.Y, ReadsVD.Y) } -object FWMSAC extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwmsac) , FPAdd.Y, FPMul.Y, FPSwapVdV2.N, FPFMACmd(1.U(2.W)), Wide2VD.Y, ReadsVD.Y) } -object FWNMSAC extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwnmsac) , FPAdd.Y, FPMul.Y, FPSwapVdV2.N, FPFMACmd(2.U(2.W)), Wide2VD.Y, ReadsVD.Y) } -object FREDOSUM extends OPFInstruction { val props = Seq(F6(OPFFunct6.fredosum) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), Reduction.Y, AccInitZeros.Y, Elementwise.Y) } -object FREDUSUM extends OPFInstruction { val props = Seq(F6(OPFFunct6.fredusum) , FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), Reduction.Y, AccInitZeros.Y) } -object FWREDOSUM extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwredosum), FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), Wide2VD.Y, Reduction.Y, AccInitZeros.Y, Elementwise.Y) } -object FWREDUSUM extends OPFInstruction { val props = Seq(F6(OPFFunct6.fwredusum), FPAdd.Y, FPMul.N, FPSwapVdV2.N, FPFMACmd(0.U(2.W)), Wide2VD.Y, Reduction.Y, AccInitZeros.Y) } - -object FDIV extends OPFInstruction { val props = Seq(F6(OPFFunct6.fdiv) , FPSwapVdV2.N, FPAdd.N, FPMul.N) } -object FRDIV extends OPFInstruction { val props = Seq(F6(OPFFunct6.frdiv) , FPSwapVdV2.Y, FPAdd.N, FPMul.N) } -object FSQRT_V extends VectorInstruction { val props = Seq(F6(OPFFunct6.funary1) , F3(VectorConsts.OPFVV), RS1( 0.U(5.W)), FPSwapVdV2.N, FPAdd.N, FPMul.N) } -object FRSQRT7_V extends VectorInstruction { val props = Seq(F6(OPFFunct6.funary1) , F3(VectorConsts.OPFVV), RS1( 4.U(5.W)), FPSwapVdV2.N) } -object FREC7_V extends VectorInstruction { val props = Seq(F6(OPFFunct6.funary1) , F3(VectorConsts.OPFVV), RS1( 5.U(5.W)), FPSwapVdV2.N) } -object FCLASS_V extends VectorInstruction { val props = Seq(F6(OPFFunct6.funary1) , F3(VectorConsts.OPFVV), RS1(16.U(5.W)), FPSwapVdV2.N, FPAdd.N, FPMul.N, FPSpecRM(1.U(3.W))) } - -object FMIN 
extends OPFInstruction { val props = Seq(F6(OPFFunct6.fmin) , FPComp.Y, FPCompMin.Y, FPAdd.N, FPMul.N, FPSpecRM(0.U(3.W))) } -object FMAX extends OPFInstruction { val props = Seq(F6(OPFFunct6.fmax) , FPComp.Y, FPCompMin.N, FPAdd.N, FPMul.N, FPSpecRM(1.U(3.W))) } -object FSGNJ extends OPFInstruction { val props = Seq(F6(OPFFunct6.fsgnj) , FPSgnj.Y, FPAdd.N, FPMul.N, FPSpecRM(0.U(3.W))) } -object FSGNJN extends OPFInstruction { val props = Seq(F6(OPFFunct6.fsgnjn) , FPSgnj.Y, FPAdd.N, FPMul.N, FPSpecRM(1.U(2.W))) } -object FSGNJX extends OPFInstruction { val props = Seq(F6(OPFFunct6.fsgnjx) , FPSgnj.Y, FPAdd.N, FPMul.N, FPSpecRM(2.U(3.W))) } -object MFEQ extends OPFInstruction { val props = Seq(F6(OPFFunct6.mfeq) , WritesAsMask.Y, FPMEQ.Y, FPMNE.N, FPMLT.N, FPMGT.N, FPAdd.N, FPMul.N, FPSpecRM(2.U(3.W))) } -object MFNE extends OPFInstruction { val props = Seq(F6(OPFFunct6.mfne) , WritesAsMask.Y, FPMEQ.N, FPMNE.Y, FPMLT.N, FPMGT.N, FPAdd.N, FPMul.N, FPSpecRM(2.U(3.W))) } -object MFLT extends OPFInstruction { val props = Seq(F6(OPFFunct6.mflt) , WritesAsMask.Y, FPMEQ.N, FPMNE.N, FPMLT.Y, FPMGT.N, FPAdd.N, FPMul.N, FPSpecRM(1.U(3.W))) } -object MFLE extends OPFInstruction { val props = Seq(F6(OPFFunct6.mfle) , WritesAsMask.Y, FPMEQ.Y, FPMNE.N, FPMLT.Y, FPMGT.N, FPAdd.N, FPMul.N, FPSpecRM(0.U(3.W))) } -object MFGT extends OPFInstruction { val props = Seq(F6(OPFFunct6.mfgt) , WritesAsMask.Y, FPMEQ.N, FPMNE.N, FPMLT.N, FPMGT.Y, FPAdd.N, FPMul.N, FPSpecRM(0.U(3.W))) } -object MFGE extends OPFInstruction { val props = Seq(F6(OPFFunct6.mfge) , WritesAsMask.Y, FPMEQ.Y, FPMNE.N, FPMLT.N, FPMGT.Y, FPAdd.N, FPMul.N, FPSpecRM(1.U(3.W))) } -object FREDMIN extends OPFInstruction { val props = Seq(F6(OPFFunct6.fredmin) , FPComp.Y, FPCompMin.Y, Reduction.Y, FPAdd.N, FPMul.N, FPSpecRM(0.U(3.W))) } -object FREDMAX extends OPFInstruction { val props = Seq(F6(OPFFunct6.fredmax) , FPComp.Y, FPCompMin.N, Reduction.Y, FPAdd.N, FPMul.N, FPSpecRM(1.U(3.W))) } - -object FCVT_SGL extends 
VectorInstruction { val props = Seq(F6(OPFFunct6.funary0), F3(VectorConsts.OPFVV), RS1(BitPat("b00???")), FPAdd.N, FPMul.N) } -object FCVT_WID extends VectorInstruction { val props = Seq(F6(OPFFunct6.funary0), F3(VectorConsts.OPFVV), RS1(BitPat("b01???")), Wide2VD.Y, FPAdd.N, FPMul.N) } -object FCVT_NRW extends VectorInstruction { val props = Seq(F6(OPFFunct6.funary0), F3(VectorConsts.OPFVV), RS1(BitPat("b10???")), Wide2VD.N, Wide2VS2.Y, FPAdd.N, FPMul.N) } - -object SLIDEUP extends OPIInstruction { val props = Seq(F6(OPIFunct6.slideup) , UsesPermuteSeq.Y, ReadsVS2.N) } -object SLIDEDOWN extends OPIInstruction { val props = Seq(F6(OPIFunct6.slidedown) , UsesPermuteSeq.Y, ReadsVS2.N) } -object SLIDE1UP extends OPMInstruction { val props = Seq(F6(OPMFunct6.slide1up) , UsesPermuteSeq.Y, ReadsVS2.N) } -object SLIDE1DOWN extends OPMInstruction { val props = Seq(F6(OPMFunct6.slide1down) , UsesPermuteSeq.Y, ReadsVS2.N) } -object FSLIDE1UP extends OPFInstruction { val props = Seq(F6(OPFFunct6.fslide1up) , UsesPermuteSeq.Y, ReadsVS2.N) } -object FSLIDE1DOWN extends OPFInstruction { val props = Seq(F6(OPFFunct6.fslide1down), UsesPermuteSeq.Y, ReadsVS2.N) } - -object RGATHER_VX extends VectorInstruction { val props = Seq(F6(OPIFunct6.rgather) , F3(VectorConsts.OPIVX)) } -object RGATHER_VI extends VectorInstruction { val props = Seq(F6(OPIFunct6.rgather) , F3(VectorConsts.OPIVI)) } -object RGATHER_VV extends VectorInstruction { val props = Seq(F6(OPIFunct6.rgather) , F3(VectorConsts.OPIVV), UsesPermuteSeq.Y, Elementwise.Y) } -object RGATHEREI16 extends VectorInstruction { val props = Seq(F6(OPIFunct6.rgatherei16), F3(VectorConsts.OPIVV), UsesPermuteSeq.Y, Elementwise.Y) } -object COMPRESS extends OPMInstruction { val props = Seq(F6(OPMFunct6.compress) , ReadsVS1AsMask.Y, Elementwise.Y) } -object MVNRR extends VectorInstruction { val props = Seq(F6(OPIFunct6.mvnrr) , F3(VectorConsts.OPIVI)) } - - -// Zvbb instructions -object ANDN extends OPIInstruction { val props = 
Seq(F6(OPIFunct6.andn) , BWAnd.Y, BWInv1.Y) } -object BREV8 extends OPMInstruction { val props = Seq(F6(OPMFunct6.xunary0) , RS1(BitPat("b01000")), UsesBitSwap.Y) } -object REV8 extends OPMInstruction { val props = Seq(F6(OPMFunct6.xunary0) , RS1(BitPat("b01001")), UsesBitSwap.Y) } -object BREV extends OPMInstruction { val props = Seq(F6(OPMFunct6.xunary0) , RS1(BitPat("b01010")), UsesBitSwap.Y) } -object CLZ extends OPMInstruction { val props = Seq(F6(OPMFunct6.xunary0) , RS1(BitPat("b01100")), UsesCountZeros.Y) } -object CTZ extends OPMInstruction { val props = Seq(F6(OPMFunct6.xunary0) , RS1(BitPat("b01101")), UsesCountZeros.Y) } -object CPOP extends OPMInstruction { val props = Seq(F6(OPMFunct6.xunary0) , RS1(BitPat("b01110")), UsesCountZeros.Y) } -object ROL extends OPIInstruction { val props = Seq(F6(OPIFunct6.rol) , UsesShift.Y, ShiftsLeft.Y, ScalingShift.N) } -object RORI extends OPIInstruction { val props = Seq(F6(OPIFunct6.rol) , UsesShift.Y, ShiftsLeft.Y, ScalingShift.N) } -object ROR extends OPIInstruction { val props = Seq(F6(OPIFunct6.ror) , UsesShift.Y, ShiftsLeft.N, ScalingShift.N) } -object WSLL extends OPIInstruction { val props = Seq(F6(OPIFunct6.wsll) , UsesShift.Y, ShiftsLeft.Y, ScalingShift.N, Wide2VD.Y, ZextImm5.Y) } diff --git a/arch/src/main/scala/framework/gendomain/mem/AddrGen.scala b/arch/src/main/scala/framework/gendomain/mem/AddrGen.scala deleted file mode 100644 index 84e0367a..00000000 --- a/arch/src/main/scala/framework/gendomain/mem/AddrGen.scala +++ /dev/null @@ -1,127 +0,0 @@ -package framework.gendomain.mem - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ - -class AddrGen(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val valid = Input(Bool()) - val lsiq_id = Input(UInt(lsiqIdBits.W)) - val done = 
Output(Bool()) - val tag = Flipped(Decoupled(UInt(dmemTagBits.W))) - val op = Input(new VectorMemMacroOp) - val maskindex = new Bundle { - val index = Input(UInt(64.W)) - val mask = Input(Bool()) - val eew = Output(UInt(2.W)) - val needs_mask = Output(Bool()) - val needs_index = Output(Bool()) - val valid = Input(Bool()) - val ready = Output(Bool()) - } - val req = Decoupled(new MemRequest(dLenB, dmemTagBits)) - - val out = Decoupled(new IFQEntry) - }) - - def min(a: UInt, b: UInt) = Mux(a > b, b, a) - - def getElems(off: UInt, eew: UInt): UInt = { - (dLenB.U - off(dLenOffBits-1,0)) >> eew - } - - val r_eaddr = Reg(UInt(paddrBits.W)) - val r_saddr = Reg(UInt(paddrBits.W)) - val r_eidx = Reg(UInt((1+log2Ceil(8*maxVLMax)).W)) - val r_sidx = Reg(UInt(3.W)) - val r_head = RegInit(true.B) - - val fast_segmented = io.op.mop === mopUnit && io.op.segend === io.op.seg_nf && io.op.segstart === 0.U - val eidx = Mux(r_head, - io.op.vstart * (Mux(fast_segmented, io.op.seg_nf, 0.U) +& 1.U), - r_eidx) - val sidx = Mux(r_head, io.op.segstart , r_sidx) - val start_offset = (io.op.vstart * Mux(io.op.mop === mopStrided, - io.op.stride, - (io.op.seg_nf +& 1.U) << io.op.elem_size))(pgIdxBits-1,0) - val start_addr = io.op.base_offset + start_offset + (io.op.segstart << io.op.elem_size) - val index_offset = io.maskindex.index & eewBitMask(io.op.idx_size) - val eaddr = Mux(io.op.indexed, - io.op.base_offset + index_offset + Mux(r_head, io.op.segstart << io.op.elem_size, 0.U), - Mux(r_head, start_addr, r_eaddr)) - val saddr = Mux(io.op.seg_nf =/= 0.U && !fast_segmented, Mux(r_head, eaddr, r_saddr), eaddr) - - val mem_size = io.op.elem_size - val max_eidx = Mux(fast_segmented, - io.op.vl * (io.op.seg_nf +& 1.U), - io.op.vl) - - val next_max_elems = getElems(saddr, mem_size) - val next_contig_elems = Mux(fast_segmented, - max_eidx - eidx, - io.op.seg_nf +& 1.U - sidx) - val next_act_elems = min(next_contig_elems, next_max_elems)(dLenOffBits,0) - val next_act_bytes = next_act_elems << 
mem_size - - val next_sidx = sidx +& next_act_elems - val next_eidx = eidx +& Mux(fast_segmented, next_act_elems, 1.U) - - val next_eaddr = eaddr + Mux(io.op.mop === mopUnit, next_act_bytes, Mux(io.op.mop === mopStrided, io.op.stride, 0.U)) - val next_saddr = saddr + next_act_bytes - - val needs_mask = !io.op.vm && io.op.mop =/= mopUnit - val needs_index = io.op.mop(0) - val block_maskindex = (needs_mask || needs_index) && !io.maskindex.valid - - val masked = (needs_mask && !io.maskindex.mask) || (io.op.seg_nf > 0.U && sidx > io.op.segend) - val may_clear = (fast_segmented || next_sidx > io.op.seg_nf) && next_eidx >= max_eidx - - - io.done := false.B - io.maskindex.ready := false.B - io.maskindex.needs_mask := needs_mask - io.maskindex.needs_index := needs_index - io.maskindex.eew := io.op.idx_size - io.out.valid := io.valid && !block_maskindex && (masked || io.req.ready) && io.tag.valid - io.out.bits.head := saddr - io.out.bits.tail := saddr + next_act_bytes - io.out.bits.masked := masked - io.out.bits.last := may_clear - io.out.bits.lsiq_id := io.lsiq_id - io.out.bits.page_offset := saddr(pgIdxBits-1,0) - - io.req.valid := io.valid && io.out.ready && !block_maskindex && !masked && io.tag.valid - io.req.bits.addr := Cat(io.op.page, saddr(pgIdxBits-1,0)) - io.req.bits.data := DontCare - io.req.bits.mask := ((1.U << next_act_bytes) - 1.U) << saddr(dLenOffBits-1,0) - io.req.bits.tag := io.tag.bits - io.req.bits.store := DontCare - - io.tag.ready := io.valid && (io.req.ready || masked) && io.out.ready && !block_maskindex - - when (io.out.fire) { - when (next_sidx > io.op.seg_nf || fast_segmented) { - r_eaddr := next_eaddr - r_saddr := next_eaddr - r_eidx := next_eidx - r_sidx := 0.U - io.maskindex.ready := needs_mask || needs_index - } .otherwise { - r_eaddr := eaddr - r_saddr := next_saddr - r_eidx := io.op.vstart - r_sidx := next_sidx - } - r_head := false.B - when (may_clear) { - io.done := true.B - r_head := true.B - } - } - -} diff --git 
a/arch/src/main/scala/framework/gendomain/mem/LoadOrderBuffer.scala b/arch/src/main/scala/framework/gendomain/mem/LoadOrderBuffer.scala deleted file mode 100644 index d839ef92..00000000 --- a/arch/src/main/scala/framework/gendomain/mem/LoadOrderBuffer.scala +++ /dev/null @@ -1,112 +0,0 @@ -package framework.gendomain.mem - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.tile._ -import freechips.rocketchip.util._ -import framework.gendomain.common._ - -object AgePriorityEncoder -{ - def apply(in: Seq[Bool], head: UInt): UInt = { - val n = in.size - val width = log2Ceil(in.size) - val n_padded = 1 << width - val temp_vec = (0 until n_padded).map(i => if (i < n) in(i) && i.U >= head else false.B) ++ in - val idx = PriorityEncoder(temp_vec) - idx(width-1, 0) //discard msb - } -} - -class LoadOrderBuffer(nEntries: Int, nRobEntries: Int)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - require(nEntries > 0, "Queue must have non-negative number of entries") - require(nRobEntries <= nEntries) - - val tagBits = log2Ceil(nEntries) - val io = IO(new Bundle { - val reserve = Decoupled(UInt(tagBits.W)) - val entry = Input(new IFQEntry) - val push = Input(Valid(new Bundle { - val data = UInt(dLen.W) - val tag = UInt(tagBits.W) - })) - - val replay_liq_id = Output(UInt(log2Ceil(vParams.vliqEntries).W)) - val replay = Decoupled(new MemRequest(dLenB, dmemTagBits)) - val deq = Decoupled(new IFQEntry) - val deq_data = Output(UInt(dLen.W)) - val busy = Output(Bool()) - }) - - val simpleRob = nEntries == nRobEntries - - val valids = RegInit(VecInit.fill(nEntries)(false.B)) - val must_replay = RegInit(VecInit.fill(nEntries)(false.B)) - val entries = Reg(Vec(nEntries, new IFQEntry)) - val rob_idxs = Reg(Vec(nEntries, UInt(log2Ceil(nRobEntries).W))) - val rob = Reg(Vec(nRobEntries, UInt(dLen.W))) - val rob_valids = RegInit(VecInit.fill(nRobEntries)(false.B)) - - val enq_ptr = Counter(nEntries) - 
val deq_ptr = Counter(nEntries) - val maybe_full = RegInit(false.B) - val has_replay = must_replay.orR - val rob_full = rob_valids.andR && (!simpleRob).B - val rob_next = PriorityEncoder(~rob_valids) - val ptr_match = enq_ptr.value === deq_ptr.value - val empty = ptr_match && !maybe_full - val full = ptr_match && maybe_full - - io.busy := !empty - io.reserve.valid := !full && !has_replay - io.reserve.bits := enq_ptr.value - when (io.reserve.fire) { - entries(enq_ptr.value) := io.entry - valids(enq_ptr.value) := io.entry.masked - enq_ptr.inc() - } - - io.deq.valid := !empty && (valids(deq_ptr.value) || (io.push.fire && io.push.bits.tag === deq_ptr.value)) - io.deq.bits := entries(deq_ptr.value) - val rob_deq_idx = if (simpleRob) deq_ptr.value else rob_idxs(deq_ptr.value) - io.deq_data := Mux(valids(deq_ptr.value), rob(rob_deq_idx), io.push.bits.data) - - val rob_push_idx = if (simpleRob) io.push.bits.tag else rob_next - when (io.push.valid && !(deq_ptr.value === io.push.bits.tag && io.deq.ready)) { - when (rob_full) { - must_replay(io.push.bits.tag) := true.B - } .otherwise { - valids(io.push.bits.tag) := true.B - rob_idxs(io.push.bits.tag) := rob_next - rob_valids(rob_push_idx) := true.B - rob(rob_push_idx) := io.push.bits.data - } - } - - when (io.deq.fire) { - deq_ptr.inc() - valids(deq_ptr.value) := false.B - when (valids(deq_ptr.value) && !entries(deq_ptr.value).masked) { - rob_valids(rob_deq_idx) := false.B - } - } - - val replay_valid = must_replay.orR - val next_replay = AgePriorityEncoder(must_replay, deq_ptr.value) - io.replay_liq_id := entries(next_replay).lsiq_id - io.replay.valid := replay_valid - io.replay.bits.addr := entries(next_replay).page_offset - io.replay.bits.data := DontCare - io.replay.bits.mask := ~(0.U(dLenB.W)) - io.replay.bits.tag := next_replay - io.replay.bits.store := false.B - - when (io.replay.fire) { - must_replay(next_replay) := false.B - } - - when (io.reserve.fire =/= io.deq.fire) { - maybe_full := io.reserve.fire - } -} diff 
--git a/arch/src/main/scala/framework/gendomain/mem/LoadSegmenter.scala b/arch/src/main/scala/framework/gendomain/mem/LoadSegmenter.scala deleted file mode 100644 index e8927faa..00000000 --- a/arch/src/main/scala/framework/gendomain/mem/LoadSegmenter.scala +++ /dev/null @@ -1,93 +0,0 @@ -package framework.gendomain.mem - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ - -class LoadSegmenter(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val valid = Input(Bool()) - val done = Output(Bool()) - val op = Input(new VectorMemMacroOp) - - val compactor = Decoupled(new CompactorReq(dLenB)) - val compactor_data = Input(UInt(dLen.W)) - - val resp = Decoupled(new Bundle { - val data = UInt(dLen.W) - val debug_id = UInt(debugIdSz.W) - }) - }) - - val segbuf = Module(new LoadSegmentBuffer(vParams.doubleBufferSegments)) - - val r_eidx = Reg(UInt(log2Ceil(maxVLMax).W)) - val r_head = RegInit(true.B) - val r_sidx = Reg(UInt(3.W)) - val eidx = Mux(r_head, io.op.vstart, r_eidx) - val sidx = Mux(r_head, io.op.segstart, r_sidx) - - val mem_size = io.op.elem_size - val incr = (dLenB.U - (Mux(io.op.seg_nf === 0.U, eidx, sidx) << mem_size)(dLenOffBits-1,0)) >> mem_size - val eidx_incr = Mux(io.op.seg_nf =/= 0.U, 1.U, incr) - val sidx_incr = incr - val next_eidx = eidx +& eidx_incr - val next_sidx = sidx +& sidx_incr - - val sidx_tail = next_sidx > io.op.seg_nf - val eidx_tail = next_eidx >= io.op.vl - - when (io.op.seg_nf === 0.U) { - io.compactor.valid := io.valid && !segbuf.io.busy && io.resp.ready - io.compactor.bits.head := eidx << mem_size - io.compactor.bits.tail := Mux(eidx_tail, io.op.vl << mem_size, 0.U) - } .otherwise { - io.compactor.valid := io.valid && segbuf.io.in.ready - io.compactor.bits.head := sidx << mem_size - io.compactor.bits.tail := 
Mux(sidx_tail, (io.op.nf +& 1.U) << mem_size, 0.U) - } - - segbuf.io.in.valid := io.valid && io.op.seg_nf =/= 0.U && io.compactor.ready - segbuf.io.in.bits.eew := mem_size - segbuf.io.in.bits.nf := io.op.nf - segbuf.io.in.bits.data := io.compactor_data - segbuf.io.in.bits.eidx := eidx - segbuf.io.in.bits.sidx := sidx - segbuf.io.in.bits.sidx_tail := sidx_tail - segbuf.io.in.bits.tail := eidx_tail - segbuf.io.in.bits.segstart := io.op.segstart - segbuf.io.in.bits.debug_id := io.op.debug_id - - segbuf.io.out.ready := io.resp.ready - - io.resp.valid := Mux(segbuf.io.busy, - segbuf.io.out.valid, - io.compactor.ready && io.valid && io.op.seg_nf === 0.U) - io.resp.bits.data := Mux(segbuf.io.busy, segbuf.io.out.bits.data, io.compactor_data) - io.resp.bits.debug_id := Mux(segbuf.io.busy, segbuf.io.out.bits.debug_id, io.op.debug_id) - - - val seg_ready = Mux(io.op.seg_nf === 0.U, - !segbuf.io.busy && io.compactor.ready && io.resp.ready, - segbuf.io.in.ready && io.compactor.ready && sidx_tail) - - when (segbuf.io.in.fire) { - - r_head := false.B - when (r_head) { r_eidx := io.op.vstart } - r_sidx := next_sidx - when (next_sidx > io.op.nf) { - r_sidx := 0.U - } - } - io.done := false.B - when (seg_ready && io.valid) { - r_head := eidx_tail - r_eidx := next_eidx - io.done := eidx_tail - } -} diff --git a/arch/src/main/scala/framework/gendomain/mem/Mem.scala b/arch/src/main/scala/framework/gendomain/mem/Mem.scala deleted file mode 100644 index f158598f..00000000 --- a/arch/src/main/scala/framework/gendomain/mem/Mem.scala +++ /dev/null @@ -1,402 +0,0 @@ -package framework.gendomain.mem - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ - -class LSIQEntry(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams { - val op = new VectorMemMacroOp - def bound_all = op.mop =/= mopUnit - val 
bound_offset = UInt(pgIdxBits.W) - val ld_dep_mask = Vec(vParams.vliqEntries, Bool()) - val st_dep_mask = Vec(vParams.vsiqEntries, Bool()) - - def containsBlock(addr: UInt) = { - val cl = addr(pgIdxBits-1,lgCacheBlockBytes) - val base_cl = op.base_offset >> lgCacheBlockBytes - val bound_cl = bound_offset >> lgCacheBlockBytes - (((addr >> pgIdxBits) === op.page) && (base_cl <= cl && bound_cl >= cl)) || bound_all - - } - def overlaps(other: LSIQEntry) = { - (op.page === other.op.page) && (bound_all || other.bound_all || ( - (op.base_offset <= other.bound_offset && bound_offset >= other.op.base_offset) - )) - } -} - -class IFQEntry(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams { - val head = UInt(log2Ceil(dLenB).W) - val tail = UInt(log2Ceil(dLenB).W) - val masked = Bool() - val last = Bool() - val lsiq_id = UInt(lsiqIdBits.W) - val page_offset = UInt(pgIdxBits.W) -} - -class MemRequest(bytes: Int, tagBits: Int)(implicit p: Parameters) extends CoreBundle()(p) { - val addr = UInt(coreMaxAddrBits.W) - val data = UInt((bytes*8).W) - val mask = UInt(bytes.W) - val tag = UInt(tagBits.W) - val store = Bool() -} - -class MemResponse(bytes: Int, tagBits: Int)(implicit p: Parameters) extends CoreBundle()(p) { - val data = UInt((bytes*8).W) - val tag = UInt(tagBits.W) -} - -class ScalarMemOrderCheckIO(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams { - val addr = Input(UInt(coreMaxAddrBits.W)) - val size = Input(UInt(2.W)) - val store = Input(Bool()) - val conflict = Output(Bool()) -} - -class VectorMemIO(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams { - val load_req = Decoupled(new MemRequest(dLenB, dmemTagBits)) - val load_resp = Input(Valid(new MemResponse(dLenB, dmemTagBits))) - val store_req = Decoupled(new MemRequest(dLenB, dmemTagBits)) - val store_ack = Input(Valid(new MemResponse(dLenB, dmemTagBits))) -} - -class VectorSGMemIO(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams { - val 
req = Vec(vParams.vsgPorts, Decoupled(new MemRequest(1, sgmemTagBits))) - val resp = Vec(vParams.vsgPorts, Input(Valid(new MemResponse(1, sgmemTagBits)))) -} - -class VectorMemDatapathIO(implicit p: Parameters) extends CoreBundle()(p) with HasVectorParams { - val lresp = Decoupled(new Bundle { - val data = UInt(dLen.W) - val debug_id = UInt(debugIdSz.W) - }) - val sdata = Flipped(Decoupled(new StoreDataMicroOp)) - - val mask_pop = Decoupled(new CompactorReq(dLenB)) - val mask_data = Input(Vec(dLenB, Bool())) - val index_pop = Decoupled(new CompactorReq(dLenB)) - val index_data = Input(Vec(dLenB, UInt(8.W))) -} - -class VectorMemUnit(sgSize: Option[BigInt] = None)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val enq = Flipped(Decoupled(new VectorMemMacroOp)) - - val dmem = new VectorMemIO - val sgmem = sgSize.map(_ => new VectorSGMemIO) - val scalar_check = new ScalarMemOrderCheckIO - - val vu = new VectorMemDatapathIO - - val busy = Output(Bool()) - }) - - def ptrIncr(u: UInt, sz: Int): Unit = { - val n = u +& 1.U - u := Mux(n === sz.U, 0.U, n) - } - - val sgas = sgSize.map { size => Module(new ScatterGatherAddrGen(size)) } - - val las = Module(new AddrGen) - val lifq = Module(new LoadOrderBuffer(vParams.vlifqEntries, vParams.vlrobEntries)) - val lcu = Module(new Compactor(dLenB, dLenB, UInt(8.W), true)) - val lss = Module(new LoadSegmenter) - - val scu = Module(new Compactor(dLenB, dLenB, new MaskedByte, false)) - val sss = Module(new StoreSegmenter) - val sas = Module(new AddrGen) - val sifq = Module(new DCEQueue(new IFQEntry, vParams.vsifqEntries)) - - val liq = Reg(Vec(vParams.vliqEntries, new LSIQEntry)) - val liq_valids = RegInit(VecInit.fill(vParams.vliqEntries)(false.B)) - val liq_las = RegInit(VecInit.fill(vParams.vliqEntries)(false.B)) - val liq_enq_ptr = RegInit(0.U(log2Ceil(vParams.vliqEntries).W)) - val liq_las_ptr = RegInit(0.U(log2Ceil(vParams.vliqEntries).W)) - val liq_lss_ptr = 
RegInit(0.U(log2Ceil(vParams.vliqEntries).W)) - - val liq_enq_fire = Wire(Bool()) - val liq_las_fire = Wire(Bool()) - val liq_lss_fire = Wire(Bool()) - - val liq_enq_ready = !liq_valids(liq_enq_ptr) - val liq_las_valid = !liq_las(liq_las_ptr) && liq_valids(liq_las_ptr) - val liq_lss_valid = liq_valids(liq_lss_ptr) - - val siq = Reg(Vec(vParams.vsiqEntries, new LSIQEntry)) - val siq_valids = RegInit(VecInit.fill(vParams.vsiqEntries)(false.B)) - val siq_sss = RegInit(VecInit.fill(vParams.vsiqEntries)(false.B)) - val siq_sas = RegInit(VecInit.fill(vParams.vsiqEntries)(false.B)) - val siq_enq_ptr = RegInit(0.U(log2Ceil(vParams.vsiqEntries).W)) - val siq_sss_ptr = RegInit(0.U(log2Ceil(vParams.vsiqEntries).W)) - val siq_sas_ptr = RegInit(0.U(log2Ceil(vParams.vsiqEntries).W)) - val siq_deq_ptr = RegInit(0.U(log2Ceil(vParams.vsiqEntries).W)) - - val siq_enq_fire = Wire(Bool()) - val siq_sss_fire = Wire(Bool()) - val siq_sas_fire = Wire(Bool()) - val siq_deq_fire = Wire(Bool()) - - val siq_enq_ready = !siq_valids(siq_enq_ptr) - val siq_sss_valid = !siq_sss(siq_sss_ptr) && siq_valids(siq_sss_ptr) - val siq_sas_valid = !siq_sas(siq_sas_ptr) && siq_valids(siq_sas_ptr) - - val enq_bound_max = (((io.enq.bits.nf +& 1.U) * io.enq.bits.vl) << io.enq.bits.elem_size) + io.enq.bits.base_offset - 1.U - val enq_bound = Mux((enq_bound_max >> pgIdxBits) =/= 0.U, ~(0.U(pgIdxBits.W)), enq_bound_max) - - when (liq_enq_fire) { - liq(liq_enq_ptr).op := io.enq.bits - liq(liq_enq_ptr).bound_offset := enq_bound - liq(liq_enq_ptr).st_dep_mask := siq_valids - liq_las(liq_enq_ptr) := false.B - ptrIncr(liq_enq_ptr, vParams.vliqEntries) - liq_valids(liq_enq_ptr) := true.B - } - when (liq_las_fire) { - ptrIncr(liq_las_ptr, vParams.vliqEntries) - liq_las(liq_las_ptr) := true.B - } - when (liq_lss_fire) { - ptrIncr(liq_lss_ptr, vParams.vliqEntries) - liq_valids(liq_lss_ptr) := false.B - assert(liq_las(liq_lss_ptr) || (liq_lss_ptr === liq_las_ptr && liq_las_fire)) - } - - when (siq_enq_fire) { - 
siq(siq_enq_ptr).op := io.enq.bits - siq(siq_enq_ptr).bound_offset := enq_bound - siq(siq_enq_ptr).ld_dep_mask := liq_valids - siq_sss(siq_enq_ptr) := false.B - siq_sas(siq_enq_ptr) := false.B - ptrIncr(siq_enq_ptr, vParams.vsiqEntries) - siq_valids(siq_enq_ptr) := true.B - } - when (siq_sss_fire) { - ptrIncr(siq_sss_ptr, vParams.vsiqEntries) - siq_sss(siq_sss_ptr) := true.B - } - when (siq_sas_fire) { - ptrIncr(siq_sas_ptr, vParams.vsiqEntries) - siq_sas(siq_sas_ptr) := true.B - assert(siq_sss(siq_sas_ptr) || (siq_sss_fire && siq_sss_ptr === siq_sas_ptr)) - } - when (siq_deq_fire) { - ptrIncr(siq_deq_ptr, vParams.vsiqEntries) - siq_valids(siq_deq_ptr) := false.B - assert(siq_sas(siq_deq_ptr) || (siq_sas_fire && siq_sas_ptr === siq_deq_ptr)) - } - - io.enq.ready := Mux(io.enq.bits.store, siq_enq_ready, liq_enq_ready) - liq_enq_fire := io.enq.valid && liq_enq_ready && !io.enq.bits.store - siq_enq_fire := io.enq.valid && siq_enq_ready && io.enq.bits.store - - when (liq_lss_fire) { siq.foreach(_.ld_dep_mask(liq_lss_ptr) := false.B) } - when (siq_deq_fire) { liq.foreach(_.st_dep_mask(siq_deq_ptr) := false.B) } - - val scalar_store_conflict = (0 until vParams.vsiqEntries).map { i => - siq_valids(i) && (siq(i).containsBlock(io.scalar_check.addr) || !vParams.enableScalarVectorAddrDisambiguation.B) - }.orR - val scalar_load_conflict = (0 until vParams.vliqEntries).map { i => - liq_valids(i) && (liq(i).containsBlock(io.scalar_check.addr) || !vParams.enableScalarVectorAddrDisambiguation.B) - }.orR - io.scalar_check.conflict := scalar_store_conflict || (scalar_load_conflict && io.scalar_check.store) - - // Send indices/masks to las/sas - - val las_older_than_sas = (liq_las_valid && !liq(liq_las_ptr).st_dep_mask(siq_sas_ptr)) || !siq_sas_valid - val maskindex_load = liq_las_valid && las_older_than_sas && !liq(liq_las_ptr).op.fast_sg - val maskindex_store = siq_sas_valid && !las_older_than_sas && !siq(siq_sas_ptr).op.fast_sg - val maskindex_gather = liq_las_valid && 
las_older_than_sas && liq(liq_las_ptr).op.fast_sg - val maskindex_scatter = siq_sas_valid && !las_older_than_sas && siq(siq_sas_ptr).op.fast_sg - las.io.maskindex.index := io.vu.index_data.asUInt - sas.io.maskindex.index := io.vu.index_data.asUInt - las.io.maskindex.mask := io.vu.mask_data(0) - sas.io.maskindex.mask := io.vu.mask_data(0) - io.vu.mask_pop.valid := false.B - io.vu.mask_pop.bits.head := 0.U - io.vu.mask_pop.bits.tail := 1.U - io.vu.index_pop.valid := false.B - io.vu.index_pop.bits.head := 0.U - io.vu.index_pop.bits.tail := 1.U - - when (maskindex_load) { - io.vu.mask_pop.valid := las.io.maskindex.needs_mask && las.io.maskindex.ready - io.vu.index_pop.valid := las.io.maskindex.needs_index && las.io.maskindex.ready - io.vu.index_pop.bits.tail := 1.U << las.io.maskindex.eew - } - when (maskindex_store) { - io.vu.mask_pop.valid := sas.io.maskindex.needs_mask && sas.io.maskindex.ready - io.vu.index_pop.valid := sas.io.maskindex.needs_index && sas.io.maskindex.ready - io.vu.index_pop.bits.tail := 1.U << sas.io.maskindex.eew - } - - // scatter/gather paths - sgas.foreach { sgas => - sgas.io.index_pop.ready := false.B - sgas.io.mask_pop.ready := false.B - when (maskindex_gather || maskindex_scatter) { - io.vu.mask_pop <> sgas.io.mask_pop - io.vu.index_pop <> sgas.io.index_pop - } - sgas.io.index_data := io.vu.index_data - sgas.io.mask_data := io.vu.mask_data - sgas.io.valid := maskindex_gather || maskindex_scatter - sgas.io.lsiq_id := Mux(maskindex_gather, liq_las_ptr, siq_sas_ptr) - sgas.io.op := Mux(maskindex_gather, liq(liq_las_ptr).op, siq(siq_sas_ptr).op) - sgas.io.req <> io.sgmem.get.req - sgas.io.resp <> io.sgmem.get.resp - } - - las.io.maskindex.valid := maskindex_load && (io.vu.mask_pop.ready || !las.io.maskindex.needs_mask) && (io.vu.index_pop.ready || !las.io.maskindex.needs_index) - sas.io.maskindex.valid := !maskindex_load && (io.vu.mask_pop.ready || !sas.io.maskindex.needs_mask) && (io.vu.index_pop.ready || !sas.io.maskindex.needs_index) - - // 
Load Addr Sequencing - val las_order_block = (0 until vParams.vsiqEntries).map { i => - val addr_conflict = siq(i).overlaps(liq(liq_las_ptr)) - siq_valids(i) && addr_conflict && liq(liq_las_ptr).st_dep_mask(i) - }.orR - val dae_block = !vParams.enableDAE.B && (!io.vu.lresp.ready || - io.vu.lresp.bits.debug_id =/= liq(liq_las_ptr).op.debug_id) - las.io.valid := liq_las_valid && !las_order_block && !liq(liq_las_ptr).op.fast_sg && !dae_block - las.io.lsiq_id := liq_las_ptr - las.io.op := liq(liq_las_ptr).op - liq_las_fire := Mux(liq(liq_las_ptr).op.fast_sg, - sgas.map(_.io.done && maskindex_gather).getOrElse(false.B), las.io.done) - - las.io.tag <> lifq.io.reserve - las.io.out.ready := lifq.io.reserve.valid - lifq.io.entry := las.io.out.bits - - lifq.io.push.valid := io.dmem.load_resp.valid - lifq.io.push.bits.data := io.dmem.load_resp.bits.data - lifq.io.push.bits.tag := io.dmem.load_resp.bits.tag - - val load_arb = Module(new Arbiter(new MemRequest(dLenB, dmemTagBits), 2)) - load_arb.io.in(1) <> las.io.req - load_arb.io.in(1).bits.store := false.B - load_arb.io.in(0) <> lifq.io.replay - load_arb.io.in(0).bits.addr := Cat(liq(lifq.io.replay_liq_id).op.page, lifq.io.replay.bits.addr(pgIdxBits-1,0)) - when (io.dmem.store_req.valid) { - load_arb.io.in(0).valid := false.B - lifq.io.replay.ready := false.B - } - - // Load compacting - lcu.io.push.valid := lifq.io.deq.valid - lcu.io.push.bits.head := lifq.io.deq.bits.head - lcu.io.push.bits.tail := lifq.io.deq.bits.tail - lcu.io.push_data := lifq.io.deq_data.asTypeOf(Vec(dLenB, UInt(8.W))) - lifq.io.deq.ready := lcu.io.push.ready - - sgas.foreach { sgas => - sgas.io.load_resp.ready := false.B - when (maskindex_gather && !lifq.io.busy) { - sgas.io.load_resp.ready := lcu.io.push.ready - lcu.io.push.valid := sgas.io.load_resp.valid - lcu.io.push.bits := sgas.io.load_resp.bits - lcu.io.push_data := sgas.io.load_data - } - } - - // Load segment sequencing - lss.io.valid := liq_lss_valid - lss.io.op := liq(liq_lss_ptr).op - 
lcu.io.pop <> lss.io.compactor - lss.io.compactor_data := lcu.io.pop_data.asUInt - io.vu.lresp <> lss.io.resp - liq_lss_fire := lss.io.done - - // Store segment sequencing - sss.io.valid := siq_sss_valid - sss.io.op := siq(siq_sss_ptr).op - scu.io.push <> sss.io.compactor - scu.io.push_data := sss.io.compactor_data - sss.io.stdata <> io.vu.sdata - siq_sss_fire := sss.io.done - - // Store address sequencing - val sas_order_block = (0 until vParams.vliqEntries).map { i => - val addr_conflict = liq(i).overlaps(siq(siq_sas_ptr)) - liq_valids(i) && addr_conflict && siq(siq_sas_ptr).ld_dep_mask(i) - }.orR - sas.io.valid := siq_sas_valid && !sas_order_block && !siq(siq_sas_ptr).op.fast_sg - sas.io.lsiq_id := siq_sas_ptr - sas.io.op := siq(siq_sas_ptr).op - siq_sas_fire := Mux(siq(siq_sas_ptr).op.fast_sg, sgas.map(_.io.done && maskindex_scatter).getOrElse(false.B), sas.io.done) - - val store_req_q = Module(new DCEQueue(new MemRequest(dLenB, dmemTagBits), 2)) - store_req_q.io.enq <> sas.io.req - store_req_q.io.enq.bits.store := true.B - store_req_q.io.enq.bits.data := VecInit(scu.io.pop_data.map(_.data)).asUInt - store_req_q.io.enq.bits.mask := VecInit(scu.io.pop_data.map(_.mask)).asUInt & sas.io.req.bits.mask - - val store_rob = Module(new ReorderBuffer(Bool(), vParams.vsifqEntries)) - sas.io.tag <> store_rob.io.reserve - store_rob.io.reserve.ready := sas.io.tag.ready && sas.io.req.valid - - sas.io.out.ready := sifq.io.enq.ready && scu.io.pop.ready - sifq.io.enq.valid := sas.io.out.valid && scu.io.pop.ready - sifq.io.enq.bits := sas.io.out.bits - scu.io.pop.valid := sas.io.out.valid && sifq.io.enq.ready - when (scu.io.pop.fire) { - for (i <- 0 until dLenB) { - assert(scu.io.pop_data(i).debug_id === sas.io.op.debug_id || - i.U < scu.io.pop.bits.head || - (i.U >= scu.io.pop.bits.tail && scu.io.pop.bits.tail =/= 0.U)) - } - } - - scu.io.pop.bits.head := sas.io.out.bits.head - scu.io.pop.bits.tail := sas.io.out.bits.tail - - sgas.foreach { sgas => - sgas.io.store_pop.ready := 
false.B - sgas.io.store_data := scu.io.pop_data.map(_.data) - when (maskindex_scatter && !store_rob.io.busy) { - sgas.io.store_pop.ready := scu.io.pop.ready - scu.io.pop.valid := sgas.io.store_pop.valid - scu.io.pop.bits := sgas.io.store_pop.bits - } - } - - store_rob.io.push.valid := io.dmem.store_ack.valid - store_rob.io.push.bits.tag := io.dmem.store_ack.bits.tag - store_rob.io.push.bits.data := DontCare - - sifq.io.deq.ready := sifq.io.deq.bits.masked || store_rob.io.deq.valid - store_rob.io.deq.ready := !sifq.io.deq.bits.masked && sifq.io.deq.valid - when (store_rob.io.deq.valid) { assert(sifq.io.deq.valid) } - siq_deq_fire := sifq.io.deq.fire && sifq.io.deq.bits.last - - sgas.foreach { sgas => - when (maskindex_scatter && sgas.io.valid && sgas.io.done) { siq_deq_fire := true.B } - } - - if (vParams.latencyInject) { - val latency = Wire(UInt(32.W)) - latency := PlusArg("saturn_mem_latency") - val delay_timer = RegInit(0.U(64.W)) - delay_timer := delay_timer + 1.U - val load_delay = Module(new DelayQueue(new MemRequest(dLenB, dmemTagBits), 1024, 64)) - val store_delay = Module(new DelayQueue(new MemRequest(dLenB, dmemTagBits), 1024, 64)) - load_delay.io.timer := delay_timer - store_delay.io.timer := delay_timer - load_delay.io.delay := latency - store_delay.io.delay := latency - load_delay.io.enq <> load_arb.io.out - store_delay.io.enq <> store_req_q.io.deq - io.dmem.load_req <> load_delay.io.deq - io.dmem.store_req <> store_delay.io.deq - } else { - io.dmem.load_req <> load_arb.io.out - io.dmem.store_req <> store_req_q.io.deq - } - io.dmem.load_req.bits.mask := ~(0.U(dLenB.W)) - - io.busy := liq_valids.orR || siq_valids.orR -} diff --git a/arch/src/main/scala/framework/gendomain/mem/SGAddrGen.scala b/arch/src/main/scala/framework/gendomain/mem/SGAddrGen.scala deleted file mode 100644 index ee19338b..00000000 --- a/arch/src/main/scala/framework/gendomain/mem/SGAddrGen.scala +++ /dev/null @@ -1,140 +0,0 @@ -package framework.gendomain.mem - -import chisel3._ 
-import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ - - -class ScatterGatherAddrGen(sgSize: BigInt)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val sgPorts = vParams.vsgPorts - assert(sgPorts <= dLenB && sgPorts >= 8) - val io = IO(new Bundle { - val valid = Input(Bool()) - val lsiq_id = Input(UInt(lsiqIdBits.W)) - val done = Output(Bool()) - val op = Input(new VectorMemMacroOp) - val index_pop = Decoupled(new CompactorReq(dLenB)) - val index_data = Input(Vec(dLenB, UInt(8.W))) - val mask_pop = Decoupled(new CompactorReq(dLenB)) - val mask_data = Input(Vec(dLenB, Bool())) - val store_pop = Decoupled(new CompactorReq(dLenB)) - val store_data = Input(Vec(dLenB, UInt(8.W))) - - val req = Vec(sgPorts, Decoupled(new MemRequest(1, sgmemTagBits))) - val resp = Vec(sgPorts, Input(Valid(new MemResponse(1, sgmemTagBits)))) - - val load_resp = Decoupled(new CompactorReq(dLenB)) - val load_data = Output(Vec(dLenB, UInt(8.W))) - }) - val vsgifqEntries = vParams.vsgifqEntries - - def min(a: UInt, b: UInt) = Mux(a > b, b, a) - - val resp_buffer = Reg(Vec(vsgifqEntries, Vec(sgPorts, UInt(8.W)))) - val resp_busys = Reg(Vec(vsgifqEntries, Vec(sgPorts, Bool()))) - val resp_bytes = Reg(Vec(vsgifqEntries, UInt(log2Ceil(dLenB).W))) - val resp_valids = RegInit(VecInit.fill(vsgifqEntries)(false.B)) - val fired = RegInit(VecInit.fill(sgPorts)(false.B)) - - val r_eidx = Reg(UInt((1 + log2Ceil(8*maxVLMax)).W)) - val r_enq = RegInit(0.U(log2Ceil(vsgifqEntries).W)) - val r_deq = RegInit(0.U(log2Ceil(vsgifqEntries).W)) - val r_head = RegInit(true.B) - val r_done = RegInit(false.B) - - val eidx = Mux(r_head, io.op.vstart, r_eidx) - val idx_incr = dLenB.U >> io.op.idx_size - val elem_incr = sgPorts.U >> io.op.elem_size - val incr = min(idx_incr, elem_incr) - val next_act_elems = min(incr, io.op.vl - eidx) - val 
next_eidx = eidx +& incr - val next_row = Mux(r_enq +& 1.U === vsgifqEntries.U, 0.U, r_enq + 1.U) - val store = io.op.store - - val enq_stall = resp_valids(r_enq) - val port_stalls = Wire(Vec(sgPorts, Bool())) - val fire = io.valid && !port_stalls.orR && !enq_stall && io.index_pop.ready && (io.mask_pop.ready || io.op.vm) && !r_done && (io.store_pop.ready || !store) - - - val base = Cat(io.op.page, io.op.base_offset) - val addrs: Seq[Vec[UInt]] = (0 until 4).map { sew => - val offsets = io.index_data.asTypeOf(Vec(dLenB >> sew, UInt((8 << sew).W))) - VecInit(offsets.map(o => Cat(base >> log2Ceil(sgSize), (o +& base)(log2Ceil(sgSize)-1,0)))) - } - - - for (i <- 0 until sgPorts) { - val port_eidx_offset = (i.U >> io.op.elem_size) - val port_byte_offset = i.U & ((1.U << io.op.elem_size) - 1.U) - val port_eidx = eidx +& port_eidx_offset - val port_masked = !io.op.vm && !io.mask_data(port_eidx_offset) - val port_addr = VecInit((0 until 4).map { sew => addrs(sew)(port_eidx_offset) })(io.op.idx_size) - - val port_active = io.valid && !r_done && port_eidx < io.op.vl && !port_masked && !fired(i) - port_stalls(i) := port_active && !io.req(i).ready - - io.req(i).valid := port_active && !enq_stall && io.index_pop.ready && (io.mask_pop.ready || io.op.vm) && (io.store_pop.ready || !store) - io.req(i).bits.mask := true.B - io.req(i).bits.data := io.store_data(i) - io.req(i).bits.tag := r_enq - io.req(i).bits.addr := port_addr | port_byte_offset // this is broken if the addrs are misaligned - io.req(i).bits.store := io.op.store - - - when (io.req(i).fire) { - resp_busys(r_enq)(i) := true.B - fired(i) := true.B - } - } - - when (fire) { - r_head := false.B - r_eidx := next_eidx - r_enq := next_row - resp_valids(r_enq) := true.B - resp_bytes(r_enq) := next_act_elems << io.op.elem_size - fired := VecInit.fill(sgPorts)(false.B) - when (next_eidx >= io.op.vl) { - r_done := true.B - } - } - - io.index_pop.valid := io.valid && !r_done && !enq_stall && (io.mask_pop.ready || io.op.vm) && 
!port_stalls.orR && (io.store_pop.ready || !store) - io.index_pop.bits.head := 0.U - io.index_pop.bits.tail := next_act_elems << io.op.idx_size - - io.mask_pop.valid := io.valid && !r_done && !io.op.vm && io.index_pop.ready && !enq_stall && !port_stalls.orR && (io.store_pop.ready || !store) - io.mask_pop.bits.head := 0.U - io.mask_pop.bits.tail := next_act_elems - - io.store_pop.valid := io.valid && !r_done && (io.mask_pop.ready || io.op.vm) && io.index_pop.ready && !enq_stall && !port_stalls.orR - io.store_pop.bits.head := 0.U - io.store_pop.bits.tail := next_act_elems << io.op.elem_size - - for (i <- 0 until sgPorts) { - when (io.resp(i).valid) { - assert(resp_busys(io.resp(i).bits.tag)(i)) - resp_busys(io.resp(i).bits.tag)(i) := false.B - resp_buffer(io.resp(i).bits.tag)(i) := io.resp(i).bits.data - } - } - - val resp_fire = io.valid && resp_valids(r_deq) && !resp_busys(r_deq).orR && (store || io.load_resp.ready) - io.load_resp.valid := io.valid && resp_valids(r_deq) && !resp_busys(r_deq).orR && !store - io.load_resp.bits.head := 0.U - io.load_resp.bits.tail := Mux(resp_bytes(r_deq) === 0.U, sgSize.U, resp_bytes(r_deq)) - io.load_data := resp_buffer(r_deq).asUInt.asTypeOf(Vec(dLenB, UInt(8.W))) - - when (resp_fire) { - r_deq := Mux(r_deq === (sgSize-1).U, 0.U, r_deq + 1.U) - resp_valids(r_deq) := false.B - } - - io.done := r_done && resp_fire && (resp_valids.asUInt === UIntToOH(r_deq)) - when (io.done) { r_head := true.B; r_done := false.B } - -} diff --git a/arch/src/main/scala/framework/gendomain/mem/SGTLInterface.scala b/arch/src/main/scala/framework/gendomain/mem/SGTLInterface.scala deleted file mode 100644 index e4a256a8..00000000 --- a/arch/src/main/scala/framework/gendomain/mem/SGTLInterface.scala +++ /dev/null @@ -1,38 +0,0 @@ -package framework.gendomain.mem - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ 
-import freechips.rocketchip.tilelink._
-import freechips.rocketchip.diplomacy._
-
-import framework.gendomain.common._
-
-class SGTLInterface(implicit p: Parameters) extends LazyModule()(p) with HasCoreParameters with HasVectorParams {
-
-  require(vParams.vsgBuffers >= 2)
-  val accessors = Seq.tabulate(vParams.vsgPorts) { i => LazyModule(new TLInterface(sgmemTagBits)) }
-  val identityNode = TLEphemeralNode()
-  accessors.foreach { a => identityNode := TLBuffer(BufferParams(vParams.vsgBuffers), BufferParams.none) := a.node }
-
-  def node = TLWidthWidget(1) :=* identityNode
-
-  override lazy val module = new Impl
-  class Impl extends LazyModuleImp(this) {
-    val io = IO(new Bundle {
-      val vec = Flipped(new VectorSGMemIO)
-      val mem_busy = Output(Bool())
-    })
-
-    io.vec.base := DontCare // this should be set outside
-    io.mem_busy := false.B
-    for (i <- 0 until vParams.vsgPorts) {
-      accessors(i).module.io.req <> io.vec.req(i)
-      accessors(i).module.io.resp <> io.vec.resp(i)
-    }
-    io.mem_busy := accessors.map(_.module.io.busy).orR
-  }
-}
diff --git a/arch/src/main/scala/framework/gendomain/mem/SegmentBuffer.scala b/arch/src/main/scala/framework/gendomain/mem/SegmentBuffer.scala
deleted file mode 100644
index 59c8c102..00000000
--- a/arch/src/main/scala/framework/gendomain/mem/SegmentBuffer.scala
+++ /dev/null
@@ -1,235 +0,0 @@
-package framework.gendomain.mem
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.util._
-import freechips.rocketchip.tile._
-import framework.gendomain.common._
-
-class LoadSegmentBuffer(doubleBuffer: Boolean)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams {
-  val io = IO(new Bundle {
-    val in = Flipped(Decoupled(new Bundle {
-      val data = UInt(dLen.W)
-      val eew = UInt(2.W)
-      val nf = UInt(3.W)
-      val eidx = UInt(log2Ceil(maxVLMax).W)
-      val segstart = UInt(3.W)
-      val sidx = UInt(3.W)
-      val sidx_tail = Bool()
-      val tail = Bool()
-
val debug_id = UInt(debugIdSz.W) - })) - val out = Decoupled(new Bundle { - val data = UInt(dLen.W) - val debug_id = UInt(debugIdSz.W) - }) - val busy = Output(Bool()) - }) - - val nB = if (doubleBuffer) 2 else 1 - - val rows = 8 - val cols = dLenB - - val wdata = Wire(Vec(4, UInt((rows*8*8).W))) - val warr = wdata(io.in.bits.eew).asTypeOf(Vec(rows, Vec(8, UInt(8.W)))) - val wrow = WireInit(0.U(rows.W)) - val wcol = WireInit(0.U(cols.W)) - val wmode = Wire(Bool()) - val array = Seq.tabulate(rows, cols, nB) { case (_,_,_) => Reg(UInt(8.W)) } - - for (r <- 0 until 8) { - for (c <- 0 until cols) { - for (s <- 0 until nB) { - when (wrow(r) && wcol(c) && wmode === s.U) { - array(r)(c)(s) := warr(r)(c % 8) - } - } - } - } - - val modes = RegInit(VecInit.fill(nB)(false.B)) - val in_sel = RegInit(false.B) - val out_sel = RegInit(false.B) - val out_nf = Reg(Vec(nB, UInt(3.W))) - val out_row = Reg(Vec(nB, UInt(3.W))) - val out_id = Reg(Vec(nB, UInt(debugIdSz.W))) - - io.in.ready := !modes(in_sel) - io.out.valid := modes(out_sel) - io.out.bits.data := Mux1H(UIntToOH(out_row(out_sel)), array.map(row => VecInit(row.map(_(out_sel))).asUInt)) - io.out.bits.debug_id := out_id(out_sel) - - when (io.in.fire) { - wrow := ((1.U << (dLenB.U >> io.in.bits.eew)) - 1.U)(7,0) << io.in.bits.sidx - } - wcol := ((1.U << (1.U << io.in.bits.eew)) - 1.U)(7,0) << (io.in.bits.eidx(log2Ceil(dLenB)-1,0) << io.in.bits.eew)(log2Ceil(dLenB)-1,0) - wmode := in_sel - - for (eew <- 0 until 4) { - val in_rows = 8 min (dLenB >> eew) - val in_cols = 8 >> eew - val in_elems = dLenB >> eew - - val col = Wire(Vec(in_rows, UInt((8 << eew).W))) - val arr = Wire(Vec(in_rows, Vec(in_cols, UInt((8 << eew).W)))) - - col := io.in.bits.data.asTypeOf(Vec(in_rows, UInt((8 << eew).W))) - for (r <- 0 until in_rows) { - for (c <- 0 until in_cols) { - arr(r)(c) := col(r) - } - } - - wdata(eew) := Fill(8 / in_rows, arr.asUInt) - } - - when (io.in.fire && io.in.bits.sidx_tail && (wcol(cols-1) || io.in.bits.tail)) { - in_sel := 
(if (doubleBuffer) (!in_sel) else false.B) - modes(in_sel) := true.B - out_nf(in_sel) := io.in.bits.nf - out_row(in_sel) := io.in.bits.segstart - out_id(in_sel) := io.in.bits.debug_id - } - - when (io.out.fire) { - when (out_row(out_sel) === out_nf(out_sel)) { - out_sel := (if (doubleBuffer) (!out_sel) else false.B) - modes(out_sel) := false.B - } .otherwise { - out_row(out_sel) := out_row(out_sel) + 1.U - } - } - - io.busy := modes.orR -} - -class StoreSegmentBuffer(doubleBuffer: Boolean)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - - val io = IO(new Bundle { - val in = Flipped(Decoupled(new Bundle { - val data = UInt(dLen.W) - val mask = UInt(dLenB.W) - val debug_id = UInt(debugIdSz.W) - val eew = UInt(2.W) - val nf = UInt(3.W) - val rows = UInt(4.W) - val sidx = UInt(3.W) - val segstart = UInt(3.W) - val segend = UInt(3.W) - })) - - val out = Decoupled(new Bundle { - val data = new StoreDataMicroOp - val head = UInt(log2Ceil(dLenB).W) - val tail = UInt(log2Ceil(dLenB).W) - }) - val busy = Output(Bool()) - }) - - val nB = if (doubleBuffer) 2 else 1 - val rows = 8 - val cols = dLenB - - val wdata = Wire(Vec(4, UInt((rows*8*8).W))) - val warr = wdata(io.in.bits.eew).asTypeOf(Vec(rows, Vec(8, UInt(8.W)))) - val wrow = WireInit(0.U(rows.W)) - val wcol = WireInit(0.U(cols.W)) - val wmode = Wire(Bool()) - val array = Seq.tabulate(rows, cols, nB) { case (_,_,_) => Reg(UInt(8.W)) } - val mask = Seq.fill(nB) { Reg(UInt(dLenB.W)) } - - for (r <- 0 until 8) { - for (c <- 0 until cols) { - for (s <- 0 until nB) { - when (wrow(r) && wcol(c) && wmode === s.U) { - array(r)(c)(s) := warr(r)(c % 8) - } - } - } - } - val modes = RegInit(VecInit.fill(nB)(false.B)) - val in_sel = RegInit(false.B) - val out_sidx = Reg(Vec(nB, UInt(3.W))) - val out_row = RegInit(0.U(3.W)) - val out_sel = RegInit(false.B) - val out_nf = Reg(Vec(nB, UInt(3.W))) - val out_eew = Reg(Vec(nB, UInt(2.W))) - val out_rows = Reg(Vec(nB, UInt(4.W))) - val out_segstart = Reg(Vec(nB, 
UInt(3.W))) - val out_id = Reg(Vec(nB, UInt(debugIdSz.W))) - - - def sidxOff(sidx: UInt, eew: UInt) = sidx & ~((1.U << (log2Ceil(cols).U - eew)) - 1.U) - - io.in.ready := !modes(in_sel) - io.out.valid := modes(out_sel) - val row_sel = out_row + sidxOff(out_sidx(out_sel), out_eew(out_sel)) - io.out.bits.data.tail := DontCare - io.out.bits.data.vat := DontCare - io.out.bits.data.stdata := Mux1H(UIntToOH(row_sel), array.map(row => VecInit(row.map(_(out_sel))).asUInt)) - io.out.bits.data.stmask := Fill(dLenB, (Mux1H(UIntToOH(out_sel), mask) >> (out_row << out_eew(out_sel)))(0)) - io.out.bits.data.debug_id := out_id(out_sel) - io.out.bits.head := out_sidx(out_sel) << out_eew(out_sel) - val remaining_bytes = (out_nf(out_sel) +& 1.U - out_sidx(out_sel)) << out_eew(out_sel) - io.out.bits.tail := Mux((remaining_bytes +& io.out.bits.head) >= dLenB.U, dLenB.U, remaining_bytes + io.out.bits.head) - - when (io.in.fire) { - wrow := ((1.U << (1.U << (log2Ceil(cols).U - io.in.bits.eew))) - 1.U)(7,0) << sidxOff(io.in.bits.sidx, io.in.bits.eew) - for (s <- 0 until nB) { - when (wmode === s.U && io.in.bits.sidx === 0.U) { - mask(s) := io.in.bits.mask - } - } - } - wcol := ((1.U << (1.U << io.in.bits.eew)) - 1.U)(7,0) << (io.in.bits.sidx << io.in.bits.eew)(log2Ceil(cols)-1,0) - - wmode := in_sel - - for (eew <- 0 until 4) { - val in_rows = 8 min (dLenB >> eew) - val in_cols = 8 >> eew - val in_elems = cols >> eew - - val col = Wire(Vec(in_rows, UInt((8 << eew).W))) - val arr = Wire(Vec(in_rows, Vec(in_cols, UInt((8 << eew).W)))) - - col := io.in.bits.data.asTypeOf(Vec(in_rows, UInt((8 << eew).W))) - - for (r <- 0 until in_rows) { - for (c <- 0 until in_cols) { - arr(r)(c) := col(r) - } - } - wdata(eew) := Fill(8 / in_rows, arr.asUInt) - } - - when (io.in.fire && io.in.bits.sidx === io.in.bits.nf) { - in_sel := (if (doubleBuffer) (!in_sel) else false.B) - modes(in_sel) := true.B - out_sidx(in_sel) := io.in.bits.segstart - out_nf(in_sel) := io.in.bits.segend - out_eew(in_sel) := 
io.in.bits.eew - out_rows(in_sel) := io.in.bits.rows - out_segstart(in_sel) := io.in.bits.segstart - out_id(in_sel) := io.in.bits.debug_id - } - - when (io.out.fire) { - val sidx_tail = ((out_sidx(out_sel) +& (cols.U >> out_eew(out_sel))) > out_nf(out_sel)) - when ((out_row +& 1.U === out_rows(out_sel)) && sidx_tail) { - out_sel := (if (doubleBuffer) (!out_sel) else false.B) - out_row := 0.U - modes(out_sel) := false.B - } .elsewhen (sidx_tail) { - out_sidx(out_sel) := out_segstart(out_sel) - out_row := out_row + 1.U - } .otherwise { - out_sidx(out_sel) := out_sidx(out_sel) + (cols.U >> out_eew(out_sel)) - } - } - - io.busy := modes.orR -} diff --git a/arch/src/main/scala/framework/gendomain/mem/StoreSegmenter.scala b/arch/src/main/scala/framework/gendomain/mem/StoreSegmenter.scala deleted file mode 100644 index f0c4dcec..00000000 --- a/arch/src/main/scala/framework/gendomain/mem/StoreSegmenter.scala +++ /dev/null @@ -1,85 +0,0 @@ -package framework.gendomain.mem - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import framework.gendomain.common._ - -class StoreSegmenter(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val valid = Input(Bool()) - val done = Output(Bool()) - val op = Input(new VectorMemMacroOp) - - val compactor = Decoupled(new CompactorReq(dLenB)) - val compactor_data = Output(Vec(dLenB, new MaskedByte)) - val stdata = Flipped(Decoupled(new StoreDataMicroOp)) - }) - - val segbuf = Module(new StoreSegmentBuffer(vParams.doubleBufferSegments)) - - val r_eidx = Reg(UInt(log2Ceil(maxVLMax).W)) - val r_head = RegInit(true.B) - val r_sidx = Reg(UInt(3.W)) - val eidx = Mux(r_head, io.op.vstart, r_eidx) - val sidx = Mux(r_head, 0.U, r_sidx) - - val mem_size = io.op.elem_size - val sub_dlen = Mux(io.op.seg_nf =/= 0.U && (log2Ceil(dLenB).U > (3.U +& mem_size)), - 
log2Ceil(dLenB).U - 3.U - mem_size, - 0.U) - val eidx_incr = (dLenB.U - ((eidx << (mem_size +& sub_dlen))(dLenOffBits-1,0))) >> (mem_size +& sub_dlen) - val next_eidx = eidx +& eidx_incr - val next_sidx = sidx +& 1.U - - val sidx_tail = next_sidx > io.op.seg_nf - val eidx_tail = next_eidx >= io.op.vl - - when (io.valid && io.stdata.valid) { - assert(io.stdata.bits.debug_id === io.op.debug_id) - } - - io.stdata.ready := io.valid && Mux(io.op.seg_nf === 0.U, - !segbuf.io.busy && io.compactor.ready, - segbuf.io.in.ready) - - segbuf.io.in.valid := io.valid && io.op.seg_nf =/= 0.U && io.stdata.valid - segbuf.io.in.bits.data := io.stdata.bits.stdata >> ((eidx << mem_size)(dLenOffBits-1,0) << 3) - segbuf.io.in.bits.mask := io.stdata.bits.stmask >> (eidx << mem_size)(dLenOffBits-1,0) - segbuf.io.in.bits.eew := mem_size - segbuf.io.in.bits.nf := io.op.nf - segbuf.io.in.bits.rows := Mux(next_eidx >= io.op.vl, (io.op.vl - eidx), eidx_incr) - segbuf.io.in.bits.sidx := sidx - segbuf.io.in.bits.segstart := io.op.segstart - segbuf.io.in.bits.segend := io.op.seg_nf - segbuf.io.in.bits.debug_id := io.op.debug_id - - io.compactor.valid := Mux(segbuf.io.busy, - segbuf.io.out.valid, - io.stdata.valid && io.valid && io.op.seg_nf === 0.U) - io.compactor_data := Mux(segbuf.io.busy, - segbuf.io.out.bits.data, io.stdata.bits).asMaskedBytes - io.compactor.bits.head := Mux(segbuf.io.busy, - segbuf.io.out.bits.head, eidx << mem_size) - io.compactor.bits.tail := Mux(segbuf.io.busy, - segbuf.io.out.bits.tail, Mux(eidx_tail, io.op.vl << mem_size, 0.U)) - - segbuf.io.out.ready := io.compactor.ready - - io.done := false.B - when (io.stdata.fire) { - r_head := false.B - when (io.op.seg_nf =/= 0.U && !sidx_tail) { - when (r_head) { r_eidx := io.op.vstart } - r_sidx := next_sidx - } .otherwise { - r_eidx := next_eidx - r_sidx := 0.U - io.done := eidx_tail - r_head := eidx_tail - } - } -} diff --git a/arch/src/main/scala/framework/gendomain/mem/TLInterface.scala 
b/arch/src/main/scala/framework/gendomain/mem/TLInterface.scala
deleted file mode 100644
index 6fbe113b..00000000
--- a/arch/src/main/scala/framework/gendomain/mem/TLInterface.scala
+++ /dev/null
@@ -1,87 +0,0 @@
-package framework.gendomain.mem
-
-import chisel3._
-import chisel3.util._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.util._
-import freechips.rocketchip.tile._
-import freechips.rocketchip.tilelink._
-import freechips.rocketchip.diplomacy._
-
-import framework.gendomain.common._
-
-class TLInterface(tagBits: Int)(implicit p: Parameters) extends LazyModule()(p) with HasCoreParameters {
-  val node = TLClientNode(Seq(TLMasterPortParameters.v1(Seq(TLMasterParameters.v1(
-    name = s"Core ${tileId} Vector Load",
-    sourceId = IdRange(0, 1 << tagBits)
-  )))))
-  override lazy val module = new Impl
-  class Impl extends LazyModuleImp(this) {
-
-    val (out, edge) = node.out(0)
-
-    val widthBytes = edge.slave.beatBytes
-    val offBits = log2Ceil(widthBytes)
-
-    val io = IO(new Bundle {
-      val busy = Output(Bool())
-      val req = Flipped(Decoupled(new MemRequest(widthBytes, tagBits)))
-      val resp = Valid(new MemResponse(widthBytes, tagBits))
-    })
-
-    val inflights = RegInit(0.U(tagBits.W))
-    when (out.a.fire || out.d.fire) {
-      inflights := inflights + out.a.fire - out.d.fire
-    }
-    io.busy := inflights =/= 0.U
-
-    io.req.ready := out.a.ready
-    out.a.valid := io.req.valid
-    out.a.bits := Mux(io.req.bits.store,
-      edge.Put(
-        io.req.bits.tag,
-        (io.req.bits.addr >> offBits) << offBits,
-        log2Ceil(widthBytes).U,
-        io.req.bits.data,
-        io.req.bits.mask)._2,
-      edge.Get(
-        io.req.bits.tag,
-        (io.req.bits.addr >> offBits) << offBits,
-        log2Ceil(widthBytes).U)._2
-    )
-
-    out.d.ready := true.B
-    io.resp.valid := out.d.valid
-    io.resp.bits.data := out.d.bits.data
-    io.resp.bits.tag := out.d.bits.source
-  }
-}
-
-
-class TLSplitInterface(implicit p: Parameters) extends LazyModule()(p) with HasCoreParameters with HasVectorParams
{
-
-  val reader = LazyModule(new TLInterface(dmemTagBits))
-  val writer = LazyModule(new TLInterface(dmemTagBits))
-
-  val arb = LazyModule(new TLXbar)
-  def node = TLWidthWidget(dLenB) := arb.node
-  def edge = arb.node.edges.out(0)
-
-  arb.node := reader.node
-  arb.node := writer.node
-
-  override lazy val module = new Impl
-  class Impl extends LazyModuleImp(this) {
-    val io = IO(new Bundle {
-      val vec = Flipped(new VectorMemIO)
-      val mem_busy = Output(Bool())
-    })
-
-    reader.module.io.req <> io.vec.load_req
-    io.vec.load_resp <> reader.module.io.resp
-    writer.module.io.req <> io.vec.store_req
-    io.vec.store_ack <> writer.module.io.resp
-    io.mem_busy := reader.module.io.busy || writer.module.io.busy
-  }
-}
diff --git a/arch/src/main/scala/framework/gendomain/rocket/Configs.scala b/arch/src/main/scala/framework/gendomain/rocket/Configs.scala
deleted file mode 100644
index 3bf06704..00000000
--- a/arch/src/main/scala/framework/gendomain/rocket/Configs.scala
+++ /dev/null
@@ -1,83 +0,0 @@
-package framework.gendomain.rocket
-
-import chisel3._
-import org.chipsalliance.cde.config._
-import freechips.rocketchip.rocket._
-import freechips.rocketchip.subsystem._
-import freechips.rocketchip.tile._
-import freechips.rocketchip.diplomacy._
-import framework.gendomain.common._
-import framework.gendomain.frontend.{EarlyVectorDecode}
-
-class WithRocketVectorUnit(
-  vLen: Int = 128,
-  dLen: Int = 64,
-  params: VectorParams = VectorParams(),
-  cores: Option[Seq[Int]] = None,
-  useL1DCache: Boolean = true) extends Config((site, here, up) => {
-  case TilesLocated(InSubsystem) => up(TilesLocated(InSubsystem), site) map {
-    case tp: RocketTileAttachParams => {
-      val buildVector = cores.map(_.contains(tp.tileParams.tileId)).getOrElse(true)
-      if (buildVector) tp.copy(tileParams = tp.tileParams.copy(
-        core = tp.tileParams.core.copy(
-          vector = Some(RocketCoreVectorParams(
-            build = ((p: Parameters) => new SaturnRocketUnit()(p.alterPartial {
-              case VectorParamsKey =>
params.copy(dLen=dLen) - })), - vLen = vLen, - vfLen = 64, - vfh = true, - eLen = 64, - vMemDataBits = if (useL1DCache) dLen else 0, - decoder = ((p: Parameters) => { - val decoder = Module(new EarlyVectorDecode(params.supported_ex_insns)(p)) - decoder - }), - useDCache = true, - issueVConfig = false, - vExts = Seq("zvbb") - )), - fpu = (if (params.useScalarFPFMA || params.useScalarFPMisc) { tp.tileParams.core.fpu.map(_.copy( - sfmaLatency = params.fmaPipeDepth - 1, - dfmaLatency = params.fmaPipeDepth - 1, - ifpuLatency = params.fmaPipeDepth - 1, - fpmuLatency = params.fmaPipeDepth - 1, - )) } else { tp.tileParams.core.fpu }).map(_.copy(minFLen = 16)) - ), - dcache = if (useL1DCache) tp.tileParams.dcache.map(_.copy(rowBits = dLen)) else tp.tileParams.dcache - )) else tp - } - case tp: framework.rocket.RocketTileAttachParamsBB => { - val buildVector = cores.map(_.contains(tp.tileParams.tileId)).getOrElse(true) - if (buildVector) tp.copy(tileParams = tp.tileParams.copy( - core = tp.tileParams.core.copy( - vector = Some(RocketCoreVectorParams( - build = ((p: Parameters) => new SaturnRocketUnit()(p.alterPartial { - case VectorParamsKey => params.copy(dLen=dLen) - })), - vLen = vLen, - vfLen = 64, - vfh = true, - eLen = 64, - vMemDataBits = if (useL1DCache) dLen else 0, - decoder = ((p: Parameters) => { - val decoder = Module(new EarlyVectorDecode(params.supported_ex_insns)(p)) - decoder - }), - useDCache = true, - issueVConfig = false, - vExts = Seq("zvbb") - )), - fpu = (if (params.useScalarFPFMA || params.useScalarFPMisc) { tp.tileParams.core.fpu.map(_.copy( - sfmaLatency = params.fmaPipeDepth - 1, - dfmaLatency = params.fmaPipeDepth - 1, - ifpuLatency = params.fmaPipeDepth - 1, - fpmuLatency = params.fmaPipeDepth - 1, - )) } else { tp.tileParams.core.fpu }).map(_.copy(minFLen = 16)) - ), - dcache = if (useL1DCache) tp.tileParams.dcache.map(_.copy(rowBits = dLen)) else tp.tileParams.dcache - )) else tp - } - case other => other - } -}) diff --git 
a/arch/src/main/scala/framework/gendomain/rocket/Frontend.scala b/arch/src/main/scala/framework/gendomain/rocket/Frontend.scala deleted file mode 100644 index 5ad6c1f3..00000000 --- a/arch/src/main/scala/framework/gendomain/rocket/Frontend.scala +++ /dev/null @@ -1,94 +0,0 @@ -package framework.gendomain.rocket - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.diplomacy._ - -import framework.gendomain.common._ -import framework.gendomain.backend.{VectorBackend} -import framework.gendomain.mem.{ScalarMemOrderCheckIO, TLSplitInterface} -import framework.gendomain.frontend.{EarlyTrapCheck, IterativeTrapCheck} - -class SaturnRocketFrontend(edge: TLEdge)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val core = new VectorCoreIO - val tlb = Flipped(new DCacheTLBPort) - - val issue = Decoupled(new VectorIssueInst) - - val index_access = Flipped(new VectorIndexAccessIO) - val mask_access = Flipped(new VectorMaskAccessIO) - - val scalar_check = Flipped(new ScalarMemOrderCheckIO) - }) - - val ptc = Module(new EarlyTrapCheck(edge, None)) - val itc = Module(new IterativeTrapCheck) - - ptc.io.sg_base := DontCare - ptc.io.s0.in.valid := io.core.ex.valid && !itc.io.busy - ptc.io.s0.in.bits.inst := io.core.ex.inst - ptc.io.s0.in.bits.pc := io.core.ex.pc - ptc.io.s0.in.bits.status := io.core.status - ptc.io.s0.in.bits.vconfig := io.core.ex.vconfig - ptc.io.s0.in.bits.vstart := io.core.ex.vstart - ptc.io.s0.in.bits.rs1 := io.core.ex.rs1 - ptc.io.s0.in.bits.rs2 := io.core.ex.rs2 - ptc.io.s0.in.bits.phys := false.B - io.core.ex.ready := !itc.io.busy - - ptc.io.s1.rs1.valid := ptc.io.s1.inst.isOpf && !ptc.io.s1.inst.vmu - ptc.io.s1.rs1.bits := io.core.mem.frs1 - ptc.io.s1.kill := io.core.killm - io.core.mem.block_all := 
itc.io.busy || ptc.io.s2.internal_replay.valid - io.core.mem.block_mem := (ptc.io.s2.inst.valid && ptc.io.s2.inst.bits.vmu) || io.scalar_check.conflict - - io.tlb.req.valid := Mux(itc.io.busy, itc.io.s0_tlb_req.valid, ptc.io.s0.tlb_req.valid) - io.tlb.req.bits := Mux(itc.io.busy, itc.io.s0_tlb_req.bits , ptc.io.s0.tlb_req.bits) - ptc.io.s1.tlb_resp := io.tlb.s1_resp - when (RegEnable(itc.io.busy || !io.tlb.req.ready, ptc.io.s0.tlb_req.valid)) { ptc.io.s1.tlb_resp.miss := true.B } - itc.io.tlb_resp := io.tlb.s1_resp - when (RegEnable(!io.tlb.req.ready, itc.io.s0_tlb_req.valid)) { itc.io.tlb_resp.miss := true.B } - io.tlb.s2_kill := false.B - - ptc.io.s2.scalar_store_pending := io.core.wb.store_pending - - io.core.wb.replay := ptc.io.s2.replay - io.core.wb.xcpt := Mux(itc.io.busy, itc.io.xcpt.valid , ptc.io.s2.xcpt.valid) - io.core.wb.cause := Mux(itc.io.busy, itc.io.xcpt.bits.cause, ptc.io.s2.xcpt.bits.cause) - io.core.wb.pc := Mux(itc.io.busy, itc.io.pc , ptc.io.s2.pc) - io.core.wb.retire := Mux(itc.io.busy, itc.io.retire , ptc.io.s2.retire) - io.core.wb.inst := Mux(itc.io.busy, itc.io.inst.bits , ptc.io.s2.inst.bits.bits) - io.core.wb.tval := Mux(itc.io.busy, itc.io.xcpt.bits.tval , ptc.io.s2.xcpt.bits.tval) - io.core.wb.rob_should_wb := Mux(itc.io.busy, itc.io.inst.writes_xrf, ptc.io.s2.inst.bits.writes_xrf) - io.core.wb.rob_should_wb_fp := Mux(itc.io.busy, itc.io.inst.writes_frf, ptc.io.s2.inst.bits.writes_frf) - io.core.set_vstart := Mux(itc.io.busy, itc.io.vstart , ptc.io.s2.vstart) - io.core.set_vconfig := itc.io.vconfig - ptc.io.s2.vxrm := io.core.wb.vxrm - ptc.io.s2.frm := io.core.wb.frm - itc.io.in := ptc.io.s2.internal_replay - - io.issue.valid := Mux(itc.io.busy, itc.io.issue.valid, ptc.io.s2.issue.valid) - io.issue.bits := Mux(itc.io.busy, itc.io.issue.bits , ptc.io.s2.issue.bits) - itc.io.issue.ready := io.issue.ready - ptc.io.s2.issue.ready := !itc.io.busy && io.issue.ready - - io.core.trap_check_busy := ptc.io.busy || itc.io.busy - - itc.io.status := 
io.core.status - itc.io.index_access <> io.index_access - itc.io.mask_access <> io.mask_access - io.scalar_check.addr := io.tlb.s1_resp.paddr - io.scalar_check.size := io.tlb.s1_resp.size - io.scalar_check.store := isWrite(io.tlb.s1_resp.cmd) - - io.core.backend_busy := false.B // set externally - io.core.set_vxsat := false.B // set externally - io.core.set_fflags := DontCare // set externally - io.core.resp := DontCare -} diff --git a/arch/src/main/scala/framework/gendomain/rocket/HellaInterface.scala b/arch/src/main/scala/framework/gendomain/rocket/HellaInterface.scala deleted file mode 100644 index 798b02f3..00000000 --- a/arch/src/main/scala/framework/gendomain/rocket/HellaInterface.scala +++ /dev/null @@ -1,94 +0,0 @@ -package framework.gendomain.rocket - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.diplomacy._ -import framework.gendomain.common._ -import framework.gendomain.mem.{VectorMemIO} - -class HellaCacheInterface(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val status = Input(new MStatus) - val dmem = new HellaCacheIO - val vec = Flipped(new VectorMemIO) - val vec_busy = Input(Bool()) - val mem_busy = Output(Bool()) - }) - - val hella_simple = Module(new SimpleHellaCacheIF) - val hella_arb = Module(new HellaCacheArbiter(2)) - hella_simple.io.requestor <> hella_arb.io.mem - io.dmem <> hella_simple.io.cache - - val hella_load = hella_arb.io.requestor(1) - val hella_store = hella_arb.io.requestor(0) - - val hella_load_q = Module(new Queue(new HellaCacheReq, 2)) - hella_load.req <> hella_load_q.io.deq - val hella_store_q = Module(new Queue(new HellaCacheReq, 2)) - hella_store.req <> hella_store_q.io.deq - - hella_arb.io.requestor.foreach { h => - h.s1_kill := false.B - h.s1_data := DontCare - 
h.s2_kill := false.B - h.keep_clock_enabled := io.vec_busy - } - - val inflights = RegInit(0.U((1+dmemTagBits).W)) - - io.vec.load_req.ready := hella_load_q.io.enq.ready - hella_load_q.io.enq.valid := io.vec.load_req.valid - hella_load_q.io.enq.bits.addr := io.vec.load_req.bits.addr - hella_load_q.io.enq.bits.size := log2Ceil(dLenB).U - hella_load_q.io.enq.bits.tag := Cat(0.U, io.vec.load_req.bits.tag) - hella_load_q.io.enq.bits.cmd := M_XRD - hella_load_q.io.enq.bits.signed := false.B - hella_load_q.io.enq.bits.dprv := io.status.prv - hella_load_q.io.enq.bits.dv := io.status.dv - hella_load_q.io.enq.bits.data := DontCare - hella_load_q.io.enq.bits.mask := DontCare - hella_load_q.io.enq.bits.phys := false.B - hella_load_q.io.enq.bits.no_resp := false.B - hella_load_q.io.enq.bits.no_alloc := false.B - hella_load_q.io.enq.bits.no_xcpt := true.B - - io.vec.load_resp.valid := hella_load.resp.valid - io.vec.load_resp.bits.data := hella_load.resp.bits.data_raw - io.vec.load_resp.bits.tag := hella_load.resp.bits.tag - - io.vec.store_req.ready := hella_store_q.io.enq.ready - hella_store_q.io.enq.valid := io.vec.store_req.valid - hella_store_q.io.enq.bits.addr := io.vec.store_req.bits.addr - hella_store_q.io.enq.bits.tag := Cat(1.U, io.vec.store_req.bits.tag) - hella_store_q.io.enq.bits.cmd := M_PWR - hella_store_q.io.enq.bits.size := log2Ceil(dLenB).U - hella_store_q.io.enq.bits.signed := false.B - hella_store_q.io.enq.bits.dprv := io.status.prv - hella_store_q.io.enq.bits.dv := io.status.dv - hella_store_q.io.enq.bits.data := io.vec.store_req.bits.data - hella_store_q.io.enq.bits.mask := io.vec.store_req.bits.mask - hella_store_q.io.enq.bits.phys := false.B - hella_store_q.io.enq.bits.no_resp := false.B - hella_store_q.io.enq.bits.no_alloc := false.B - hella_store_q.io.enq.bits.no_xcpt := true.B - - io.vec.store_ack.valid := hella_store.resp.valid - io.vec.store_ack.bits.data := DontCare - io.vec.store_ack.bits.tag := hella_store.resp.bits.tag - - io.mem_busy := inflights 
=/= 0.U - - val load_enq = hella_load_q.io.enq.fire - val store_enq = hella_store_q.io.enq.fire - val load_deq = hella_load.resp.fire - val store_deq = hella_store.resp.fire - when (load_enq || store_enq || load_deq || store_deq) { - inflights := inflights + (load_enq +& store_enq) - (load_deq +& store_deq) - } -} diff --git a/arch/src/main/scala/framework/gendomain/rocket/Integration.scala b/arch/src/main/scala/framework/gendomain/rocket/Integration.scala deleted file mode 100644 index 13570524..00000000 --- a/arch/src/main/scala/framework/gendomain/rocket/Integration.scala +++ /dev/null @@ -1,119 +0,0 @@ -package framework.gendomain.rocket - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.diplomacy._ - -import framework.gendomain.common._ -import framework.gendomain.backend.{VectorBackend} -import framework.gendomain.mem.{TLSplitInterface, VectorMemUnit} -import framework.gendomain.frontend.{VectorDispatcher} - -class SaturnRocketUnit(implicit p: Parameters) extends RocketVectorUnit()(p) with HasVectorParams with HasCoreParameters { - - if (vParams.useScalarFPFMA || vParams.useScalarFPMisc) { - require(coreParams.fpu.isDefined) - if (vParams.useScalarFPFMA) { - require(coreParams.fpu.get.sfmaLatency == vParams.fmaPipeDepth - 1) - require(coreParams.fpu.get.dfmaLatency == vParams.fmaPipeDepth - 1) - } - } - - val tl_if = LazyModule(new TLSplitInterface) - atlNode := TLBuffer(vParams.tlBuffer) := TLWidthWidget(dLen/8) := tl_if.node - - override lazy val module = new SaturnRocketImpl - class SaturnRocketImpl extends RocketVectorUnitModuleImp(this) with HasVectorParams with HasCoreParameters { - - val useL1DCache = dLen == vMemDataBits - - val dis = Module(new VectorDispatcher) - val vfu = Module(new SaturnRocketFrontend(tl_if.edge)) - val vu = Module(new 
VectorBackend) - val vmu = Module(new VectorMemUnit) - - val hella_if = Module(new HellaCacheInterface) - val scalar_arb = Module(new Arbiter(new ScalarWrite, 2)) - - dis.io.issue <> vfu.io.issue - - vfu.io.core <> io.core - vfu.io.tlb <> io.tlb - - vu.io.index_access <> vfu.io.index_access - vu.io.mask_access <> vfu.io.mask_access - vu.io.vmu <> vmu.io.vu - vu.io.vat_tail := dis.io.vat_tail - vu.io.vat_head := dis.io.vat_head - vu.io.dis <> dis.io.dis - dis.io.vat_release := vu.io.vat_release - vmu.io.enq <> dis.io.mem - - vmu.io.scalar_check <> vfu.io.scalar_check - - io.core.backend_busy := vu.io.busy || tl_if.module.io.mem_busy || hella_if.io.mem_busy || vmu.io.busy - io.core.set_vxsat := vu.io.set_vxsat - io.core.set_fflags := vu.io.set_fflags - - scalar_arb.io.in(0) <> vu.io.scalar_resp - scalar_arb.io.in(1) <> dis.io.scalar_resp - io.core.resp <> Queue(scalar_arb.io.out) - - io.fp_req <> vu.io.fp_req - vu.io.fp_resp.valid := io.fp_resp.valid - vu.io.fp_resp.bits := io.fp_resp.bits - io.fp_resp.ready := true.B - - io.dmem <> hella_if.io.dmem - hella_if.io.vec_busy := vu.io.busy || vmu.io.busy - hella_if.io.status := io.core.status - - def block[T <: Data](in: DecoupledIO[T], block: Bool): DecoupledIO[T] = { - val out = Wire(Decoupled(in.bits.cloneType)) - out.bits := in.bits - out.valid := in.valid && !block - in.ready := out.ready && !block - out - } - - val load_use_tl_reg = RegInit(true.B) - val store_use_tl_reg = RegInit(true.B) - - // virtually-addressed requests must go through L1 - val load_use_tl = load_use_tl_reg || !useL1DCache.B - val store_use_tl = store_use_tl_reg || !useL1DCache.B - - vmu.io.dmem.load_resp.valid := tl_if.module.io.vec.load_resp.valid || hella_if.io.vec.load_resp.valid - vmu.io.dmem.load_resp.bits := Mux1H( - Seq(tl_if.module.io.vec.load_resp.valid, hella_if.io.vec.load_resp.valid), - Seq(tl_if.module.io.vec.load_resp.bits , hella_if.io.vec.load_resp.bits)) - vmu.io.dmem.store_ack.valid := tl_if.module.io.vec.store_ack.valid || 
hella_if.io.vec.store_ack.valid - vmu.io.dmem.store_ack.bits := Mux1H( - Seq(tl_if.module.io.vec.store_ack.valid, hella_if.io.vec.store_ack.valid), - Seq(tl_if.module.io.vec.store_ack.bits , hella_if.io.vec.store_ack.bits)) - - when (load_use_tl) { - tl_if.module.io.vec.load_req <> block(vmu.io.dmem.load_req, hella_if.io.mem_busy) - hella_if.io.vec.load_req.valid := false.B - hella_if.io.vec.load_req.bits := DontCare - } .otherwise { - hella_if.io.vec.load_req <> block(vmu.io.dmem.load_req, tl_if.module.io.mem_busy) - tl_if.module.io.vec.load_req.valid := false.B - tl_if.module.io.vec.load_req.bits := DontCare - } - when (store_use_tl) { - tl_if.module.io.vec.store_req <> block(vmu.io.dmem.store_req, hella_if.io.mem_busy) - hella_if.io.vec.store_req.valid := false.B - hella_if.io.vec.store_req.bits := DontCare - } .otherwise { - hella_if.io.vec.store_req <> block(vmu.io.dmem.store_req, tl_if.module.io.mem_busy) - tl_if.module.io.vec.store_req.valid := false.B - tl_if.module.io.vec.store_req.bits := DontCare - } - } -} diff --git a/arch/src/main/scala/framework/gendomain/shuttle/Configs.scala b/arch/src/main/scala/framework/gendomain/shuttle/Configs.scala deleted file mode 100644 index 07e5e89a..00000000 --- a/arch/src/main/scala/framework/gendomain/shuttle/Configs.scala +++ /dev/null @@ -1,49 +0,0 @@ -package framework.gendomain.shuttle - -import chisel3._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.subsystem._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.diplomacy._ -import framework.gendomain.common._ -import framework.gendomain.frontend.{EarlyVectorDecode} -import shuttle.common._ - -class WithShuttleVectorUnit( - vLen: Int = 128, - dLen: Int = 64, - params: VectorParams = VectorParams(), - cores: Option[Seq[Int]] = None, - location: HierarchicalLocation = InSubsystem -) extends Config((site, here, up) => { - case TilesLocated(InSubsystem) => up(TilesLocated(InSubsystem), site) map 
{ - case tp: ShuttleTileAttachParams => { - val buildVector = cores.map(_.contains(tp.tileParams.tileId)).getOrElse(true) - val vParams = params.copy( - dLen=dLen, - useScalarFPFMA = false, - useScalarFPMisc = false - ) - if (buildVector) tp.copy(tileParams = tp.tileParams.copy( - core = tp.tileParams.core.copy( - vector = Some(ShuttleCoreVectorParams( - build = ((p: Parameters) => new SaturnShuttleUnit()(p.alterPartial { - case VectorParamsKey => vParams - })), - vfLen = 64, - vfh = true, - vLen = vLen, - decoder = ((p: Parameters) => { - val decoder = Module(new EarlyVectorDecode(vParams.supported_ex_insns)(p)) - decoder - }), - issueVConfig = false, - vExts = Seq("zvbb") - )), - ) - )) else tp - } - case other => other - } -}) diff --git a/arch/src/main/scala/framework/gendomain/shuttle/Frontend.scala b/arch/src/main/scala/framework/gendomain/shuttle/Frontend.scala deleted file mode 100644 index f2bab271..00000000 --- a/arch/src/main/scala/framework/gendomain/shuttle/Frontend.scala +++ /dev/null @@ -1,112 +0,0 @@ -package framework.gendomain.shuttle - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.diplomacy._ - -import framework.gendomain.common._ -import framework.gendomain.backend.{VectorBackend} -import framework.gendomain.mem.{ScalarMemOrderCheckIO, MemRequest, TLSplitInterface, SGTLInterface} -import framework.gendomain.frontend.{EarlyTrapCheck, IterativeTrapCheck} -import shuttle.common._ - - -class SaturnShuttleFrontend(sgSize: Option[BigInt], edge: TLEdge)(implicit p: Parameters) extends CoreModule()(p) with HasVectorParams { - val io = IO(new Bundle { - val sg_base = Input(UInt(coreMaxAddrBits.W)) - val core = new ShuttleVectorCoreIO - - val issue = Decoupled(new VectorIssueInst) - - val index_access = Flipped(new VectorIndexAccessIO) - val 
mask_access = Flipped(new VectorMaskAccessIO) - - val scalar_check = Flipped(new ScalarMemOrderCheckIO) - }) - - val ptc = Module(new EarlyTrapCheck(edge, sgSize)) - val itc = Module(new IterativeTrapCheck) - - val replayed = RegInit(false.B) - - ptc.io.sg_base := io.sg_base - ptc.io.s0.in.valid := io.core.ex.valid && !itc.io.busy && !(replayed && !io.issue.ready) - ptc.io.s0.in.bits.inst := io.core.ex.uop.inst - ptc.io.s0.in.bits.pc := io.core.ex.uop.pc - ptc.io.s0.in.bits.status := io.core.status - ptc.io.s0.in.bits.vconfig := io.core.ex.vconfig - ptc.io.s0.in.bits.vstart := io.core.ex.vstart - ptc.io.s0.in.bits.rs1 := io.core.ex.uop.rs1_data - ptc.io.s0.in.bits.rs2 := io.core.ex.uop.rs2_data - ptc.io.s0.in.bits.phys := !(io.core.status.dprv <= PRV.S.U && io.core.satp.mode(io.core.satp.mode.getWidth-1)) - io.core.ex.ready := !itc.io.busy && !(replayed && !io.issue.ready) - - ptc.io.s1.rs1.valid := ptc.io.s1.inst.isOpf && !ptc.io.s1.inst.vmu - ptc.io.s1.rs1.bits := io.core.mem.frs1 - ptc.io.s1.kill := io.core.mem.kill || !RegEnable(io.core.ex.fire, io.core.ex.valid) - - io.core.mem.tlb_req.valid := Mux(itc.io.busy, itc.io.s1_tlb_req.valid, ptc.io.s1.tlb_req.valid) - io.core.mem.tlb_req.bits := Mux(itc.io.busy, itc.io.s1_tlb_req.bits, ptc.io.s1.tlb_req.bits) - val mem_tlb_resp = Wire(new TLBResp) - mem_tlb_resp.miss := io.core.mem.tlb_resp.miss || !io.core.mem.tlb_req.ready - mem_tlb_resp.paddr := io.core.mem.tlb_resp.paddr - mem_tlb_resp.pf := io.core.mem.tlb_resp.pf - mem_tlb_resp.ae := io.core.mem.tlb_resp.ae - mem_tlb_resp.ma := io.core.mem.tlb_resp.ma - mem_tlb_resp.gpa := DontCare - mem_tlb_resp.gpa_is_pte := DontCare - mem_tlb_resp.gf := 0.U.asTypeOf(new TLBExceptions) - mem_tlb_resp.cacheable := DontCare - mem_tlb_resp.must_alloc := DontCare - mem_tlb_resp.prefetchable := DontCare - mem_tlb_resp.size := DontCare - mem_tlb_resp.cmd := DontCare - ptc.io.s1.tlb_resp := mem_tlb_resp - itc.io.tlb_resp := mem_tlb_resp - - ptc.io.s2.scalar_store_pending := 
io.core.wb.store_pending - - io.core.wb.retire_late := itc.io.retire - io.core.wb.inst := Mux(itc.io.busy, itc.io.inst.bits , ptc.io.s2.inst.bits.bits) - io.core.wb.pc := Mux(itc.io.busy, itc.io.pc , ptc.io.s2.pc) - io.core.wb.xcpt := Mux(itc.io.busy, itc.io.xcpt.valid , ptc.io.s2.xcpt.valid) - io.core.wb.cause := Mux(itc.io.busy, itc.io.xcpt.bits.cause, ptc.io.s2.xcpt.bits.cause) - io.core.wb.tval := Mux(itc.io.busy, itc.io.xcpt.bits.tval , ptc.io.s2.xcpt.bits.tval) - io.core.wb.internal_replay := ptc.io.s2.internal_replay.valid - io.core.wb.block_all := itc.io.busy || (ptc.io.s2.inst.valid && !ptc.io.s2.retire && !ptc.io.s2.internal_replay.valid) - io.core.wb.rob_should_wb := Mux(itc.io.busy, itc.io.inst.writes_xrf, ptc.io.s2.inst.bits.writes_xrf) - io.core.wb.rob_should_wb_fp := Mux(itc.io.busy, itc.io.inst.writes_frf, ptc.io.s2.inst.bits.writes_frf) - io.core.set_vstart := Mux(itc.io.busy, itc.io.vstart, ptc.io.s2.vstart) - io.core.set_vconfig := itc.io.vconfig - ptc.io.s2.vxrm := io.core.wb.vxrm - ptc.io.s2.frm := io.core.wb.frm - itc.io.in := ptc.io.s2.internal_replay - - when (!io.issue.ready && ptc.io.s2.inst.valid) { replayed := true.B } - when (io.issue.ready) { replayed := false.B } - - io.issue.valid := Mux(itc.io.busy, itc.io.issue.valid, ptc.io.s2.issue.valid) - io.issue.bits := Mux(itc.io.busy, itc.io.issue.bits , ptc.io.s2.issue.bits) - itc.io.issue.ready := io.issue.ready - ptc.io.s2.issue.ready := !itc.io.busy && io.issue.ready - - io.core.trap_check_busy := ptc.io.busy || itc.io.busy - - itc.io.status := io.core.status - itc.io.index_access <> io.index_access - itc.io.mask_access <> io.mask_access - io.scalar_check.addr := io.core.wb.scalar_check.bits.addr - io.scalar_check.size := io.core.wb.scalar_check.bits.size - io.scalar_check.store := io.core.wb.scalar_check.bits.store - io.core.wb.scalar_check.ready := !io.scalar_check.conflict && !(ptc.io.s2.inst.valid && ptc.io.s2.inst.bits.vmu) - - io.core.backend_busy := false.B // set externally - 
io.core.set_vxsat := false.B // set externally - io.core.set_fflags := DontCare // set externally - io.core.resp := DontCare // set externally -} diff --git a/arch/src/main/scala/framework/gendomain/shuttle/Integration.scala b/arch/src/main/scala/framework/gendomain/shuttle/Integration.scala deleted file mode 100644 index bdc2b7c5..00000000 --- a/arch/src/main/scala/framework/gendomain/shuttle/Integration.scala +++ /dev/null @@ -1,77 +0,0 @@ -package framework.gendomain.shuttle - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.util._ -import freechips.rocketchip.tile._ -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.diplomacy._ - -import framework.gendomain.common._ -import framework.gendomain.backend.{VectorBackend} -import framework.gendomain.mem.{TLSplitInterface, SGTLInterface, VectorMemUnit} -import framework.gendomain.frontend.{VectorDispatcher} -import shuttle.common._ - - -class SaturnShuttleUnit(implicit p: Parameters) extends ShuttleVectorUnit()(p) with HasVectorParams with HasCoreParameters { - assert(!vParams.useScalarFPFMA && !vParams.useScalarFPMisc) - if (vParams.useScalarFPFMA) { - require(coreParams.fpu.get.dfmaLatency == vParams.fmaPipeDepth - 1) - } - - val tl_if = LazyModule(new TLSplitInterface) - atlNode := TLBuffer(vParams.tlBuffer) := TLWidthWidget(dLenB) := tl_if.node - - val sg_if = sgNode.map { n => - val sg_if = LazyModule(new SGTLInterface) - n :=* sg_if.node - sg_if - } - - override lazy val module = new SaturnShuttleImpl - class SaturnShuttleImpl extends ShuttleVectorUnitModuleImp(this) with HasVectorParams with HasCoreParameters { - - val dis = Module(new VectorDispatcher) - val scalar_arb = Module(new Arbiter(new ScalarWrite, 2)) - val vfu = Module(new SaturnShuttleFrontend(sgSize, tl_if.edge)) - val vu = Module(new VectorBackend) - val vmu = Module(new VectorMemUnit(sgSize)) - - sg_if.foreach { sg => - 
sg.module.io.vec <> vmu.io.sgmem.get - } - - dis.io.issue <> vfu.io.issue - vfu.io.core <> io - vfu.io.sg_base := io_sg_base - - vu.io.index_access <> vfu.io.index_access - vu.io.mask_access <> vfu.io.mask_access - vu.io.vmu <> vmu.io.vu - vu.io.vat_tail := dis.io.vat_tail - vu.io.vat_head := dis.io.vat_head - vu.io.dis <> dis.io.dis - dis.io.vat_release := vu.io.vat_release - vmu.io.enq <> dis.io.mem - - vmu.io.scalar_check <> vfu.io.scalar_check - - io.backend_busy := vu.io.busy || tl_if.module.io.mem_busy || sg_if.map(_.module.io.mem_busy).getOrElse(false.B) || vmu.io.busy - io.set_vxsat := vu.io.set_vxsat - io.set_fflags := vu.io.set_fflags - - - scalar_arb.io.in(0) <> vu.io.scalar_resp - scalar_arb.io.in(1) <> dis.io.scalar_resp - io.resp <> Queue(scalar_arb.io.out) - - tl_if.module.io.vec <> vmu.io.dmem - - vu.io.fp_req.ready := false.B - vu.io.fp_resp.valid := false.B - vu.io.fp_resp.bits := DontCare - } -} diff --git a/arch/src/main/scala/framework/gpdomain/GPDomain.scala b/arch/src/main/scala/framework/gpdomain/GPDomain.scala new file mode 100644 index 00000000..0cd62dfa --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/GPDomain.scala @@ -0,0 +1,38 @@ +package framework.gpdomain + +import chisel3._ +import chisel3.util._ +import framework.frontend.globalrs.{GlobalSchedComplete, GlobalSchedIssue} +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.top.GlobalConfig + +@instantiable +class GpDomain(val b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + val global_issue_i = Flipped(Decoupled(new GlobalSchedIssue(b))) + val global_complete_o = Decoupled(new GlobalSchedComplete(b)) + // Status signal + val busy = Output(Bool()) + }) + + io.global_issue_i.ready := io.global_complete_o.ready + +// ----------------------------------------------------------------------------- +// Decode Stage +// ----------------------------------------------------------------------------- + val 
decoder: Instance[framework.gpdomain.sequencer.decoder.DomainDecoder] = + Instantiate(new framework.gpdomain.sequencer.decoder.DomainDecoder(b)) + // Extract raw_inst from PostGDCmd + decoder.io.inst_i <> io.global_issue_i.bits.cmd.cmd + val decoded = decoder.io.decoded_o + + io.global_complete_o.valid := io.global_issue_i.valid + io.global_complete_o.bits.rob_id := io.global_issue_i.bits.rob_id + io.global_complete_o.bits.is_sub := io.global_issue_i.bits.is_sub + io.global_complete_o.bits.sub_rob_id := io.global_issue_i.bits.sub_rob_id + + io.busy := false.B + +} diff --git a/arch/src/main/scala/framework/gpdomain/LICENSE.t1 b/arch/src/main/scala/framework/gpdomain/LICENSE.t1 new file mode 100644 index 00000000..c6d9f269 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/LICENSE.t1 @@ -0,0 +1,183 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. 
+ + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. 
You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. 
Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + ------------------------------------------------------------------------ + Note: + Individual files contain the following tag instead of the full license text. 
+ + // SPDX-License-Identifier: Apache-2.0 diff --git a/arch/src/main/scala/framework/gpdomain/configs/GpDomainParam.scala b/arch/src/main/scala/framework/gpdomain/configs/GpDomainParam.scala new file mode 100644 index 00000000..620b0109 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/configs/GpDomainParam.scala @@ -0,0 +1,33 @@ +package framework.gpdomain.configs + +import upickle.default._ + +/** + * GpDomain Parameter + */ +case class GpDomainParam( + /** Number of lanes in the GP domain */ + laneNumber: Int, + /** Chaining size for instruction scheduling */ + chainingSize: Int, + /** Vector length in bits */ + vLen: Int, + /** Data length per lane in bits */ + dLen: Int, + /** Element length in bits */ + eLen: Int, + /** Lane scale factor */ + laneScale: Int) + +object GpDomainParam { + implicit val rw: ReadWriter[GpDomainParam] = macroRW + + /** + * Load from the default local JSON file + */ + def apply(): GpDomainParam = { + val jsonStr = scala.io.Source.fromFile("src/main/scala/framework/gpdomain/configs/default.json").mkString + read[GpDomainParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/configs/default.json b/arch/src/main/scala/framework/gpdomain/configs/default.json new file mode 100644 index 00000000..e8322624 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/configs/default.json @@ -0,0 +1,8 @@ +{ + "laneNumber": 4, + "chainingSize": 4, + "vLen": 1024, + "dLen": 128, + "eLen": 32, + "laneScale": 4 +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/Bundles.scala b/arch/src/main/scala/framework/gpdomain/sequencer/Bundles.scala new file mode 100644 index 00000000..db83becb --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/Bundles.scala @@ -0,0 +1,300 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu +// Port to buckyball framework + +package framework.gpdomain.sequencer + +import chisel3._ +import chisel3.util._ +import
chisel3.util.experimental.decode.DecodeBundle +import framework.gpdomain.sequencer.decoder.{Decoder, DomainDecoderParameter} + +/** CSR Interface from Scalar Core */ +class CSRInterface(vlWidth: Int) extends Bundle { + + /** Vector Length Register `vl` */ + val vl: UInt = UInt(vlWidth.W) + + /** Vector Start Index CSR `vstart` */ + val vStart: UInt = UInt(vlWidth.W) + + /** Vector Register Grouping `vlmul[2:0]` subfield of `vtype` */ + val vlmul: UInt = UInt(3.W) + + /** Vector Selected Element Width `vsew[2:0]` subfield of `vtype` */ + val vSew: UInt = UInt(2.W) + + /** Vector Fixed-Point Rounding Mode Register `vxrm` */ + val vxrm: UInt = UInt(2.W) + + /** Floating-point rounding mode */ + val frm: UInt = UInt(3.W) + + /** Vector Tail Agnostic */ + val vta: Bool = Bool() + + /** Vector Mask Agnostic */ + val vma: Bool = Bool() +} + +/** Instruction record for tracking issued instructions */ +class InstructionRecord(instructionIndexWidth: Int) extends Bundle { + + /** Index of this instruction, maintained by instruction counter */ + val instructionIndex: UInt = UInt(instructionIndexWidth.W) + + /** Whether instruction is load/store */ + val isLoadStore: Bool = Bool() + + /** Whether instruction is mask type */ + val maskType: Bool = Bool() + + val gather: Bool = Bool() + val pop: Bool = Bool() +} + +/** Instruction execution state */ +class InstructionState extends Bundle { + + /** Wait for last signal from each lane */ + val wLast: Bool = Bool() + + /** The slot is idle */ + val idle: Bool = Bool() + + /** Used for mask unit, schedule mask unit to execute */ + val wMaskUnitLast: Bool = Bool() + + /** Used for instruction commit */ + val sCommit: Bool = Bool() +} + +/** Instruction control state for each slot */ +class InstructionControl(instIndexWidth: Int, laneSize: Int) extends Bundle { + + /** Metadata for this instruction */ + val record: InstructionRecord = new InstructionRecord(instIndexWidth) + + /** Control state to record the current execution state */ + 
val state: InstructionState = new InstructionState + + /** Tag for recording each lane is finished for this instruction */ + val endTag: Vec[Bool] = Vec(laneSize + 1, Bool()) + + /** Fixed-point saturation flag */ + val vxsat: Bool = Bool() +} + +/** Issue token for token manager */ +class IssueToken(instructionIndexBits: Int) extends Bundle { + val instructionIndex: UInt = UInt(instructionIndexBits.W) + val writeV0: Bool = Bool() + val useV0AsMask: Bool = Bool() + val isLoadStore: Bool = Bool() + val toLane: Bool = Bool() + val toMask: Bool = Bool() +} + +/** Instruction pipe bundle for queuing */ +class InstructionPipeBundle( + xLen: Int, + vLen: Int, + instructionIndexBits: Int, + vlMaxBits: Int) + extends Bundle { + val instruction: UInt = UInt(32.W) + val rs1Data: UInt = UInt(xLen.W) + val rs2Data: UInt = UInt(xLen.W) + val vl: UInt = UInt(32.W) + val vstart: UInt = UInt(32.W) + val vtype: UInt = UInt(32.W) + val decodeResult: DecodeBundle = Decoder.bundle(DomainDecoderParameter.decoderParam).cloneType + val instructionIndex: UInt = UInt(instructionIndexBits.W) + val vdIsV0: Bool = Bool() + val writeByte: UInt = UInt(vlMaxBits.W) +} + +// ============================================================================ +// VRF Related Bundles +// ============================================================================ + +/** Request to access VRF in each lanes */ +class VRFReadRequest(regNumBits: Int, offsetBits: Int, instructionIndexBits: Int) extends Bundle { + + /** address to access VRF (v0, v1, v2, ...) */ + val vs: UInt = UInt(regNumBits.W) + + /** read vs1 vs2 vd? 
*/ + val readSource: UInt = UInt(2.W) + + /** the offset of VRF access */ + val offset: UInt = UInt(offsetBits.W) + + /** index for record the age of instruction, designed for handling RAW hazard */ + val instructionIndex: UInt = UInt(instructionIndexBits.W) +} + +class VRFWriteRequest( + regNumBits: Int, + offsetBits: Int, + instructionIndexSize: Int, + dataPathWidth: Int) + extends Bundle { + + /** address to access VRF (v0, v1, v2, ...) */ + val vd: UInt = UInt(regNumBits.W) + + /** the offset of VRF access */ + val offset: UInt = UInt(offsetBits.W) + + /** write mask in byte */ + val mask: UInt = UInt((dataPathWidth / 8).W) + + /** data to write to VRF */ + val data: UInt = UInt(dataPathWidth.W) + + /** this is the last write of this instruction */ + val last: Bool = Bool() + + /** used to update the record in VRF */ + val instructionIndex: UInt = UInt(instructionIndexSize.W) +} + +class V0Update(datapathWidth: Int, vrfOffsetBits: Int) extends Bundle { + val data: UInt = UInt(datapathWidth.W) + val offset: UInt = UInt(vrfOffsetBits.W) + val mask: UInt = UInt((datapathWidth / 8).W) +} + +// ============================================================================ +// Lane Related Bundles +// ============================================================================ + +/** Request from sequencer to lane */ +class LaneRequest( + instructionIndexBits: Int, + datapathWidth: Int, + vlMaxBits: Int, + laneNumber: Int, + dataPathByteWidth: Int) + extends Bundle { + val instructionIndex: UInt = UInt(instructionIndexBits.W) + + // decode + val decodeResult: DecodeBundle = Decoder.bundle(DomainDecoderParameter.decoderParam).cloneType + val loadStore: Bool = Bool() + val issueInst: Bool = Bool() + val store: Bool = Bool() + val special: Bool = Bool() + val lsWholeReg: Bool = Bool() + + // instruction + val vs1: UInt = UInt(5.W) + val vs2: UInt = UInt(5.W) + val vd: UInt = UInt(5.W) + + val loadStoreEEW: UInt = UInt(2.W) + val mask: Bool = Bool() + val segment: UInt = 
UInt(3.W) + + /** data of rs1 */ + val readFromScalar: UInt = UInt(datapathWidth.W) + + val csrInterface: CSRInterface = new CSRInterface(vlMaxBits) + + val writeCount: UInt = UInt((vlMaxBits - log2Ceil(laneNumber) - log2Ceil(dataPathByteWidth)).W) + + val maskE0: Bool = Bool() +} + +class LaneResponse(chaining1HBits: Int, vlMaxBits: Int) extends Bundle { + val instructionFinished: UInt = UInt(chaining1HBits.W) + val vxsatReport: UInt = UInt(chaining1HBits.W) + val popCount: UInt = UInt(vlMaxBits.W) +} + +class LaneResponseFeedback(instructionIndexBits: Int) extends Bundle { + + /** which instruction is the source of this transaction */ + val instructionIndex: UInt = UInt(instructionIndexBits.W) + + /** for instructions that might finish in other lanes */ + val complete: Bool = Bool() +} + +// ============================================================================ +// Mask Related Bundles +// ============================================================================ + +class MaskRequest(maskGroupSizeBits: Int) extends Bundle { + + /** select which mask group */ + val maskSelect: UInt = UInt(maskGroupSizeBits.W) + + /** The sew of instruction which is requesting for mask */ + val maskSelectSew: UInt = UInt(2.W) + + val slide: Bool = Bool() +} + +class MaskRequestAck(maskGroupWidth: Int) extends Bundle { + val data: UInt = UInt(maskGroupWidth.W) +} + +class MaskUnitExeReq( + eLen: Int, + datapathWidth: Int, + instructionIndexBits: Int, + fpuEnable: Boolean) + extends Bundle { + val source1: UInt = UInt(datapathWidth.W) + val source2: UInt = UInt(datapathWidth.W) + val index: UInt = UInt(instructionIndexBits.W) + val ffo: UInt = UInt((datapathWidth / eLen).W) + val fpReduceValid: Option[UInt] = Option.when(fpuEnable)(UInt((datapathWidth / eLen).W)) + val maskRequestToLSU: Bool = Bool() +} + +// ============================================================================ +// LSU Related Bundles +// 
============================================================================ + +class LSUInstructionInformation extends Bundle { + val nf: UInt = UInt(3.W) + val mew: Bool = Bool() + val mop: UInt = UInt(2.W) + val lumop: UInt = UInt(5.W) + val eew: UInt = UInt(2.W) + val vs3: UInt = UInt(5.W) + val isStore: Bool = Bool() + val maskedLoadStore: Bool = Bool() + + def fof: Bool = mop === 0.U && lumop(4) && !isStore +} + +class LSURequest(dataWidth: Int, chainingSize: Int) extends Bundle { + val instructionInformation: LSUInstructionInformation = new LSUInstructionInformation + val rs1Data: UInt = UInt(dataWidth.W) + val rs2Data: UInt = UInt(dataWidth.W) + val instructionIndex: UInt = UInt((log2Ceil(chainingSize) + 1).W) +} + +class LSURequestInterface(dataWidth: Int, chainingSize: Int, vlWidth: Int) extends Bundle { + val request: LSURequest = new LSURequest(dataWidth, chainingSize) + val csrInterface: CSRInterface = new CSRInterface(vlWidth) +} + +// ============================================================================ +// Common Bundles +// ============================================================================ + +class LastReportBundle(chaining1HBits: Int) extends Bundle { + val last: UInt = UInt(chaining1HBits.W) +} + +class WriteCountReport(vLen: Int, laneNumber: Int, instSize: Int) extends Bundle { + val count: UInt = UInt(log2Ceil(vLen / laneNumber).W) + val instructionIndex: UInt = UInt(instSize.W) +} + +final class EmptyBundle extends Bundle diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/Sequencer.scala b/arch/src/main/scala/framework/gpdomain/sequencer/Sequencer.scala new file mode 100644 index 00000000..b1c6c1c8 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/Sequencer.scala @@ -0,0 +1,259 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu +// Port to buckyball framework + +package framework.gpdomain.sequencer + +import chisel3._ +import 
chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import chisel3.util._ +import chisel3.util.experimental.decode.DecodeBundle +import framework.top.GlobalConfig +import framework.frontend.globalrs.{GlobalSchedComplete, GlobalSchedIssue} +import framework.gpdomain.sequencer.decoder.{Decoder, DomainDecoder} + +/** + * Sequencer for GP Domain + * + * Handles instruction issue, scheduling, token management, and coordination + * between lanes, LSU, and mask unit. + * + * Ported from T1's sequencer logic + */ +@instantiable +class Sequencer(val b: GlobalConfig) extends Module { + val chainingSize = b.gpDomain.chainingSize + val laneNumber = b.gpDomain.laneNumber + val vLen = b.gpDomain.vLen + val dLen = b.gpDomain.dLen + val eLen = b.gpDomain.eLen + val xLen = 32 // Fixed to 32 for now + + val instructionIndexBits = log2Ceil(chainingSize) + 1 + val chaining1HBits = 2 << log2Ceil(chainingSize) + val datapathWidth = b.gpDomain.laneScale * eLen + val vlMaxBits = log2Ceil(vLen * 8 / 8) + 1 // vlMax = vLen * 8 (LMUL max) / 8 (sewMin); +1 bit so vl can equal vlMax + val regNumBits = 5 // 32 vector registers + val vrfOffsetBits = log2Ceil(vLen / datapathWidth / laneNumber) + + @public + val io = IO(new Bundle { + // Interface with global RS + val global_issue_i = Flipped(Decoupled(new GlobalSchedIssue(b))) + val global_complete_o = Decoupled(new GlobalSchedComplete(b)) + + // TODO: Interface with lanes (placeholder) + // val laneRequest = Vec(laneNumber, Decoupled(new LaneRequest(...))) + // val laneResponse = Vec(laneNumber, Flipped(Decoupled(new LaneResponse(...)))) + + // TODO: Interface with LSU (placeholder) + // val lsuRequest = Decoupled(new LSURequestInterface(...)) + // val lsuReport = Flipped(Decoupled(new LastReportBundle(...))) + + // TODO: Interface with mask unit (placeholder) + // val maskUnitRequest = Decoupled(new MaskUnitExeReq(...)) + // val maskUnitReport = Flipped(Decoupled(new LastReportBundle(...))) + + // Status + val busy = Output(Bool()) + }) + + //
=========================================================================== + // Decoder + // =========================================================================== + val decoder: Instance[DomainDecoder] = Instantiate(new DomainDecoder(b)) + decoder.io.inst_i := io.global_issue_i.bits.cmd.cmd + val decoded = decoder.io.decoded_o + + // =========================================================================== + // Token Manager + // =========================================================================== + val tokenManager: Instance[TokenManager] = Instantiate(new TokenManager(b)) + + // =========================================================================== + // Instruction Counter + // =========================================================================== + val instructionCounter: UInt = RegInit(0.U(instructionIndexBits.W)) + val nextInstructionCounter: UInt = instructionCounter + 1.U + + // =========================================================================== + // Request Register (1-deep queue) + // =========================================================================== + val requestReg: ValidIO[InstructionPipeBundle] = RegInit( + 0.U.asTypeOf(Valid(new InstructionPipeBundle(xLen, vLen, instructionIndexBits, vlMaxBits))) + ) + + val requestRegDequeue = Wire(Decoupled(io.global_issue_i.bits.cloneType)) + + // Latch instruction when issued + when(io.global_issue_i.fire) { + requestReg.bits.instruction := io.global_issue_i.bits.cmd.cmd.raw_inst + requestReg.bits.rs1Data := io.global_issue_i.bits.cmd.cmd.rs1Data + requestReg.bits.rs2Data := io.global_issue_i.bits.cmd.cmd.rs2Data + requestReg.bits.vl := 0.U // TODO: extract from CSR + requestReg.bits.vstart := 0.U // TODO: extract from CSR + requestReg.bits.vtype := 0.U // TODO: extract from CSR + requestReg.bits.decodeResult := decoded + requestReg.bits.instructionIndex := instructionCounter + requestReg.bits.vdIsV0 := (io.global_issue_i.bits.cmd.cmd.raw_inst(11, 7) === 0.U) && + 
(io.global_issue_i.bits.cmd.cmd.raw_inst(6) || !io.global_issue_i.bits.cmd.cmd.raw_inst(5)) + requestReg.bits.writeByte := 0.U // TODO: calculate based on vl and sew + + instructionCounter := nextInstructionCounter + } + + // Request register valid update + requestReg.valid := Mux( + io.global_issue_i.fire ^ requestRegDequeue.fire, + io.global_issue_i.fire, + requestReg.valid + ) + + // Manually maintain dequeue interface + requestRegDequeue.bits := io.global_issue_i.bits + requestRegDequeue.valid := requestReg.valid + + // Decode result alias + val decodeResult: DecodeBundle = requestReg.bits.decodeResult + + // Instruction type detection + val isLoadStoreType: Bool = !requestRegDequeue.bits.cmd.cmd.raw_inst(6) && requestRegDequeue.valid + val isStoreType: Bool = !requestRegDequeue.bits.cmd.cmd.raw_inst(6) && requestRegDequeue.bits.cmd.cmd.raw_inst(5) + val maskType: Bool = !requestRegDequeue.bits.cmd.cmd.raw_inst(25) + + // =========================================================================== + // Instruction Slots (State Machine Registers) + // =========================================================================== + val instructionFinished: Vec[Vec[Bool]] = Wire(Vec(laneNumber, Vec(chainingSize, Bool()))) + val vxsatReportVec: Vec[UInt] = Wire(Vec(laneNumber, UInt(chainingSize.W))) + val vxsatReport = vxsatReportVec.reduce(_ | _) + + // Initialize dummy values (will be connected to lanes later) + instructionFinished.foreach(_.foreach(_ := false.B)) + vxsatReportVec.foreach(_ := 0.U) + + val slots: Seq[InstructionControl] = Seq.tabulate(chainingSize) { index => + val control = RegInit( + (-1.S(new InstructionControl(instructionIndexBits, laneNumber).getWidth.W)) + .asTypeOf(new InstructionControl(instructionIndexBits, laneNumber)) + ) + + // Execution finished check + val laneAndLSUFinish: Bool = control.endTag.asUInt.andR + val v0WriteFinish = !ohCheck(tokenManager.v0WriteValid, control.record.instructionIndex, chainingSize) + + // LSU finished 
(placeholder) + val lsuFinished: Bool = false.B // TODO: connect to LSU + val vxsatUpdate = ohCheck(vxsatReport, control.record.instructionIndex, chainingSize) + + // Instruction allocation to this slot + val instructionToSlotOH: UInt = Wire(UInt(chainingSize.W)) + when(instructionToSlotOH(index)) { + // Instruction metadata + control.record.instructionIndex := requestReg.bits.instructionIndex + control.record.isLoadStore := isLoadStoreType + control.record.maskType := maskType + control.record.gather := false.B // TODO: decode + control.record.pop := false.B // TODO: decode + + // Control signals + control.state.idle := false.B + control.state.wLast := false.B + control.state.sCommit := false.B + control.state.wMaskUnitLast := true.B // TODO: check maskUnit requirement + + control.vxsat := false.B + + // Initialize endTag + control.endTag := VecInit(Seq.fill(laneNumber)(false.B) :+ !isLoadStoreType) + }.otherwise { + // State machine updates + when(laneAndLSUFinish && v0WriteFinish) { + control.state.wLast := true.B + } + + // TODO: Add retire logic + // when(responseCounter === control.record.instructionIndex && retire) { + // control.state.sCommit := true.B + // } + + when(control.state.sCommit && control.state.wMaskUnitLast) { + control.state.idle := true.B + } + + // Update endTag from lanes + control.endTag.zip(instructionFinished.map(_(index)) :+ lsuFinished).foreach { + case (d, c) => + d := d || c + } + + when(vxsatUpdate) { + control.vxsat := true.B + } + } + + control + } + + // =========================================================================== + // Slot Allocation Logic + // =========================================================================== + val slotFree: Vec[Bool] = VecInit(slots.map(_.state.idle)) + val allSlotFree: Bool = slotFree.asUInt.andR + val freeOR: Bool = slotFree.asUInt.orR + + // Special instructions go to last slot + val specialInstruction: Bool = false.B // TODO: decode special instructions + val slotReady: Bool = 
Mux(specialInstruction, slots.last.state.idle, freeOR) + + // Select slot for new instruction + val instructionToSlotOH: UInt = Mux( + specialInstruction, + UIntToOH(chainingSize.U), + PriorityEncoderOH(slotFree.asUInt) + ) + + // =========================================================================== + // Token Manager Connections + // =========================================================================== + tokenManager.instructionIssue.valid := requestRegDequeue.valid + tokenManager.instructionIssue.bits.instructionIndex := requestReg.bits.instructionIndex + tokenManager.instructionIssue.bits.writeV0 := requestReg.bits.vdIsV0 + tokenManager.instructionIssue.bits.useV0AsMask := maskType + tokenManager.instructionIssue.bits.isLoadStore := isLoadStoreType + tokenManager.instructionIssue.bits.toLane := !isLoadStoreType + tokenManager.instructionIssue.bits.toMask := false.B // TODO: detect mask unit instructions + + // LSU write v0 connections (placeholder) + tokenManager.lsuWriteV0.foreach { port => + port.valid := false.B + port.bits := 0.U + } + + // Instruction finish signals (placeholder) + tokenManager.instructionFinish.foreach(_ := 0.U) + tokenManager.maskUnitFree := true.B + + // =========================================================================== + // Issue Logic + // =========================================================================== + // Can issue when: + // 1. There's a free slot + // 2. Token manager allows (no v0 conflicts) + // 3. 
All lanes are ready (TODO) + val canIssue: Bool = slotReady && tokenManager.issueAllow + + io.global_issue_i.ready := canIssue && io.global_complete_o.ready + requestRegDequeue.ready := canIssue + + // =========================================================================== + // Completion Logic (Placeholder) + // =========================================================================== + io.global_complete_o.valid := io.global_issue_i.valid + io.global_complete_o.bits.rob_id := io.global_issue_i.bits.rob_id + + // =========================================================================== + // Status + // =========================================================================== + io.busy := !allSlotFree +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/TokenManager.scala b/arch/src/main/scala/framework/gpdomain/sequencer/TokenManager.scala new file mode 100644 index 00000000..84897bd4 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/TokenManager.scala @@ -0,0 +1,106 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu +// Port to buckyball framework + +package framework.gpdomain.sequencer + +import chisel3._ +import chisel3.experimental.hierarchy.{instantiable, public} +import chisel3.util._ +import framework.top.GlobalConfig + +object TokenManagerUtil { + + /** Convert index to one-hot encoding */ + def indexToOH(index: UInt, chainingSize: Int): UInt = + UIntToOH(index(log2Ceil(chainingSize), 0)) + + /** Conditional mask application */ + def maskAnd(mask: Bool, data: Data): Data = + Mux(mask, data, 0.U.asTypeOf(data)) +} + +@instantiable +class TokenManager(b: GlobalConfig) extends Module { + import TokenManagerUtil._ + + val chainingSize = b.gpDomain.chainingSize + val laneNumber = b.gpDomain.laneNumber + val instructionIndexBits = log2Ceil(chainingSize) + 1 + val chaining1HBits = 2 << log2Ceil(chainingSize) + + @public + val instructionIssue: ValidIO[IssueToken] = IO(Flipped(Valid(new 
IssueToken(instructionIndexBits)))) + + @public + val lsuWriteV0: Vec[ValidIO[UInt]] = IO( + Vec(laneNumber, Flipped(Valid(UInt(instructionIndexBits.W)))) + ) + + @public + val issueAllow: Bool = IO(Output(Bool())) + + @public + val instructionFinish: Vec[UInt] = IO(Vec(laneNumber, Input(UInt(chaining1HBits.W)))) + + @public + val v0WriteValid = IO(Output(UInt(chaining1HBits.W))) + + @public + val maskUnitFree: Bool = IO(Input(Bool())) + + val issueIndex1H: UInt = indexToOH(instructionIssue.bits.instructionIndex, chainingSize) + + // Boolean type token clear & set + def updateBooleanToken(set: UInt, clear: UInt): UInt = { + VecInit(Seq.tabulate(chaining1HBits) { chainingIndex => + val res = RegInit(false.B) + when(set(chainingIndex) || clear(chainingIndex)) { + res := set(chainingIndex) + } + res + }).asUInt + } + + // v0 write token + val v0WriteValidVec: Seq[UInt] = Seq.tabulate(laneNumber) { laneIndex => + val lsuWriteSet = maskAnd( + lsuWriteV0(laneIndex).valid, + indexToOH(lsuWriteV0(laneIndex).bits, chainingSize) + ).asUInt + val v0WriteIssue = + instructionIssue.valid && instructionIssue.bits.writeV0 && (instructionIssue.bits.toLane || instructionIssue.bits.isLoadStore) + val clear: UInt = instructionFinish(laneIndex) + val updateOH = maskAnd(v0WriteIssue, issueIndex1H).asUInt + updateBooleanToken(updateOH | lsuWriteSet, clear) + } + + val useV0AsMaskToken: UInt = Seq + .tabulate(laneNumber) { laneIndex => + val useV0Issue = instructionIssue.valid && instructionIssue.bits.useV0AsMask && + instructionIssue.bits.toLane + val clear: UInt = instructionFinish(laneIndex) + val updateOH = maskAnd(useV0Issue, issueIndex1H).asUInt + updateBooleanToken(updateOH, clear) + } + .reduce(_ | _) + + val maskUnitWriteV0: Bool = { + val set = instructionIssue.valid && instructionIssue.bits.writeV0 && instructionIssue.bits.toMask + val clear = maskUnitFree + val res = RegInit(false.B) + when(set || clear) { + res := set + } + res + } + + v0WriteValid := 
v0WriteValidVec.reduce(_ | _) + + // v0 read-write conflict + val v0Conflict: Bool = + (instructionIssue.bits.writeV0 && useV0AsMaskToken.orR) || + (instructionIssue.bits.useV0AsMask && (v0WriteValid.orR || maskUnitWriteV0)) + + issueAllow := !v0Conflict +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/DISA.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/DISA.scala new file mode 100644 index 00000000..87995075 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/DISA.scala @@ -0,0 +1,13 @@ +package framework.gpdomain.sequencer.decoder + +import chisel3._ +import chisel3.util._ +import framework.core.bbtile.RoCCCommandBB +import framework.top.GlobalConfig + +object DISA { + // RVV Instruction Opcodes + val RVV_OPCODE_V = "b1010111".U // 0x57: OP-V (vector compute) + val RVV_OPCODE_VL = "b0000111".U // 0x07: LOAD-FP (vector load) + val RVV_OPCODE_VS = "b0100111".U // 0x27: STORE-FP (vector store) +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/Decoder.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/Decoder.scala new file mode 100644 index 00000000..c33559f5 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/Decoder.scala @@ -0,0 +1,561 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder + +import chisel3._ +import chisel3.experimental.SerializableModuleParameter +import chisel3.util.BitPat +import chisel3.util.experimental.decode._ + +import framework.gpdomain.sequencer.decoder.InstructionEncoding.Instruction +import framework.gpdomain.sequencer.decoder.attribute._ + +object DecoderParam { + implicit def rwP: upickle.default.ReadWriter[DecoderParam] = upickle.default.macroRW +} + +case class DecoderParam( + fpuEnable: Boolean, + zvbbEnable: Boolean, + useXsfmm: Boolean, + allInstructions: Seq[Instruction]) + extends SerializableModuleParameter + +trait 
T1DecodeFiled[D <: Data] extends DecodeField[T1DecodePattern, D] with FieldName + +trait BoolField extends T1DecodeFiled[Bool] with BoolDecodeField[T1DecodePattern] { + def getTriState(pattern: T1DecodePattern): TriState + + override def genTable(pattern: T1DecodePattern): BitPat = + getTriState(pattern) match { + case attribute.Y => y + case attribute.N => n + case attribute.DC => dc + } + +} + +trait T1UopField extends T1DecodeFiled[UInt] with FieldName { + def chiselType: UInt = UInt(4.W) +} + +trait T1TopUopField extends T1DecodeFiled[UInt] with FieldName { + def chiselType: UInt = UInt(5.W) +} + +trait T1fpExecutionTypeUopField extends T1DecodeFiled[UInt] with FieldName { + def chiselType: UInt = UInt(2.W) +} + +object Decoder { + + object logic extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isLogic.value + } + + object adder extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isAdder.value + } + + object shift extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isShift.value + } + + object multiplier extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isMultiplier.value + } + + object divider extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isDivider.value + } + + object multiCycle extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isMulticycle.value + } + + object other extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isOther.value + } + + object unsigned0 extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isUnsigned0.value + } + + object unsigned1 extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isUnsigned1.value + } + + object itype extends BoolField { + override def 
getTriState(pattern: T1DecodePattern): TriState = pattern.isItype.value + } + + object nr extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isNr.value + } + + object red extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isRed.value + } + + object widenReduce extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isWidenreduce.value + } + + object targetRd extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isTargetrd.value + } + + object slid extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isSlid.value + } + + object gather extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isGather.value + } + + object gather16 extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isGather16.value + } + + object compress extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isCompress.value + } + + object unOrderWrite extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isUnorderwrite.value + } + + object extend extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isExtend.value + } + + object mv extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isMv.value + } + + object iota extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isIota.value + } + + object maskLogic extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isMasklogic.value + } + + object maskDestination extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isMaskdestination.value + } + + object maskSource extends BoolField { + override def 
getTriState(pattern: T1DecodePattern): TriState = pattern.isMasksource.value + } + + object readOnly extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isReadonly.value + } + + object vwmacc extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isVwmacc.value + } + + object saturate extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isSaturate.value + } + + object special extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isSpecial.value + } + + object maskUnit extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isMaskunit.value + } + + object crossWrite extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isCrosswrite.value + } + + object crossRead extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isCrossread.value + } + + object sWrite extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isSwrite.value + } + + object vtype extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isVtype.value + } + + object sReadVD extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isSreadvd.value + } + + object scheduler extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isScheduler.value + } + + object dontNeedExecuteInLane extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isDontneedexecuteinlane.value + } + + object reverse extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isReverse.value + } + + object average extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isAverage.value + } + + object ffo extends 
BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isFfo.value + } + + object popCount extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isPopcount.value + } + + object specialSlot extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isSpecialslot.value + } + + object float extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isFloat.value + } + + object floatMul extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isFloatmul.value + } + + object orderReduce extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isOrderreduce.value + } + + object zvbb extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isZvbb.value + } + + object zvma extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isZvma.value + } + + object topUop extends T1TopUopField { + + override def genTable(pattern: T1DecodePattern): BitPat = pattern.topUop.value match { + case _: TopT0.type => BitPat("b00000") + case _: TopT1.type => BitPat("b00001") + case _: TopT2.type => BitPat("b00010") + case _: TopT3.type => BitPat("b00011") + case _: TopT4.type => BitPat("b00100") + case _: TopT5.type => BitPat("b00101") + case _: TopT6.type => BitPat("b00110") + case _: TopT7.type => BitPat("b00111") + case _: TopT8.type => BitPat("b01000") + case _: TopT9.type => BitPat("b01001") + case _: TopT10.type => BitPat("b01010") + case _: TopT11.type => BitPat("b01011") + case _: TopT12.type => BitPat("b01100") + case _: TopT13.type => BitPat("b01101") + case _: TopT14.type => BitPat("b01110") + case _: TopT15.type => BitPat("b01111") + case _: TopT16.type => BitPat("b10000") + case _: TopT17.type => BitPat("b10001") + case _: TopT18.type => BitPat("b10010") + case _: TopT19.type => BitPat("b10011") + case 
_: TopT20.type => BitPat("b10100") + case _: TopT21.type => BitPat("b10101") + case _: TopT22.type => BitPat("b10110") + case _: TopT23.type => BitPat("b10111") + case _: TopT24.type => BitPat("b11000") + case _: TopT25.type => BitPat("b11001") + case _: TopT26.type => BitPat("b11010") + case _: TopT27.type => BitPat("b11011") + case _: TopT28.type => BitPat("b11100") + case _: TopT29.type => BitPat("b11101") + case _: TopT30.type => BitPat("b11110") + case _: TopT31.type => BitPat("b11111") + case _ => BitPat.dontCare(5) + } + + } + + object uop extends T1UopField { + + override def genTable(pattern: T1DecodePattern): BitPat = pattern.decoderUop.value match { + case addCase: AdderUOPType => + addCase match { + case _: addUop0.type => BitPat("b0000") + case _: addUop1.type => BitPat("b0001") + case _: addUop10.type => BitPat("b1010") + case _: addUop11.type => BitPat("b1011") + case _: addUop2.type => BitPat("b0010") + case _: addUop3.type => BitPat("b0011") + case _: addUop4.type => BitPat("b0100") + case _: addUop6.type => BitPat("b0110") + case _: addUop7.type => BitPat("b0111") + case _: addUop8.type => BitPat("b1000") + case _: addUop9.type => BitPat("b1001") + case _ => BitPat.dontCare(4) + } + case divCase: DivUOPType => + divCase match { + case _: divUop0.type => BitPat("b0000") + case _: divUop1.type => BitPat("b0001") + case _: divUop10.type => BitPat("b1010") + case _: divUop8.type => BitPat("b1000") + case _: divUop9.type => BitPat("b1001") + case _ => BitPat.dontCare(4) + } + case floatCase: FloatUopType => + floatCase match { + case _: FUT0.type => BitPat("b0000") + case _: FUT1.type => BitPat("b0001") + case _: FUT10.type => BitPat("b1010") + case _: FUT12.type => BitPat("b1100") + case _: FUT13.type => BitPat("b1101") + case _: FUT14.type => BitPat("b1110") + case _: FUT2.type => BitPat("b0010") + case _: FUT3.type => BitPat("b0011") + case _: FUT4.type => BitPat("b0100") + case _: FUT5.type => BitPat("b0101") + case _: FUT6.type => BitPat("b0110") 
+ case _: FUT7.type => BitPat("b0111") + case _: FUT8.type => BitPat("b1000") + case _: FUT9.type => BitPat("b1001") + case _ => BitPat.dontCare(4) + } + case logicCase: LogicUopType => + logicCase match { + case _: logicUop0.type => BitPat("b0000") + case _: logicUop1.type => BitPat("b0001") + case _: logicUop2.type => BitPat("b0010") + case _: logicUop4.type => BitPat("b0100") + case _: logicUop5.type => BitPat("b0101") + case _: logicUop6.type => BitPat("b0110") + case _: logicUop8.type => BitPat("b1000") + case _: logicUop9.type => BitPat("b1001") + case _ => BitPat.dontCare(4) + } + case mulCase: MulUOPType => + mulCase match { + case _: mulUop0.type => BitPat("b0000") + case _: mulUop1.type => BitPat("b0001") + case _: mulUop10.type => BitPat("b1010") + case _: mulUop14.type => BitPat("b1110") + case _: mulUop3.type => BitPat("b0011") + case _: mulUop5.type => BitPat("b0101") + case _ => BitPat.dontCare(4) + } + case otherCase: OtherUopType => + otherCase match { + case _: otherUop0.type => BitPat("b0000") + case _: otherUop1.type => BitPat("b0001") + case _: otherUop2.type => BitPat("b0010") + case _: otherUop3.type => BitPat("b0011") + case _: otherUop4.type => BitPat("b0100") + case _: otherUop5.type => BitPat("b0101") + case _: otherUop6.type => BitPat("b0110") + case _: otherUop7.type => BitPat("b0111") + case _: otherUop8.type => BitPat("b1000") + case _: otherUop9.type => BitPat("b1001") + case _ => BitPat.dontCare(4) + } + case shiftCase: ShiftUopType => + shiftCase match { + case _: shiftUop0.type => BitPat("b0000") + case _: shiftUop1.type => BitPat("b0001") + case _: shiftUop2.type => BitPat("b0010") + case _: shiftUop4.type => BitPat("b0100") + case _: shiftUop6.type => BitPat("b0110") + case _ => BitPat.dontCare(4) + } + case zeroCase: ZeroUOPType => + zeroCase match { + case _: zeroUop0.type => BitPat("b0000") + case _ => BitPat.dontCare(4) + } + case zvbbCase: ZvbbUOPType => + zvbbCase match { + case _: zvbbUop0.type => BitPat("b0000") // brev 
+ case _: zvbbUop1.type => BitPat("b0001") // brev8 + case _: zvbbUop2.type => BitPat("b0010") // rev8 + case _: zvbbUop3.type => BitPat("b0011") // clz + case _: zvbbUop4.type => BitPat("b0100") // ctz + case _: zvbbUop5.type => BitPat("b0101") // rol + case _: zvbbUop6.type => BitPat("b0110") // ror + case _: zvbbUop7.type => BitPat("b0111") // wsll + case _: zvbbUop8.type => BitPat("b1000") // andn + case _: zvbbUop9.type => BitPat("b1001") // pop + case _ => BitPat.dontCare(4) + } + case _ => BitPat.dontCare(4) + } + + } + + object fpExecutionType extends T1fpExecutionTypeUopField { + + override def genTable(pattern: T1DecodePattern): BitPat = pattern.fpExecutionType match { + case FpExecutionType.Compare => BitPat("b10") + case FpExecutionType.MA => BitPat("b00") + case FpExecutionType.Other => BitPat("b11") + case FpExecutionType.Nil => BitPat.dontCare(2) + } + + } + + object maskPipeType extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isMaskPip.value + } + + object writeCount extends BoolField { + override def getTriState(pattern: T1DecodePattern): TriState = pattern.isWriteCount.value + } + + object maskPipeUop extends T1TopUopField { + + override def genTable(pattern: T1DecodePattern): BitPat = pattern.maskPipeUop.value match { + case _: MaskUop0.type => BitPat("b00000") + case _: MaskUop1.type => BitPat("b00001") + case _: MaskUop2.type => BitPat("b00010") + case _: MaskUop3.type => BitPat("b00011") + case _: MaskUop4.type => BitPat("b00100") + case _: MaskUop5.type => BitPat("b00101") + case _: MaskUop6.type => BitPat("b00110") + case _: MaskUop7.type => BitPat("b00111") + case _: MaskUop8.type => BitPat("b01000") + case _: MaskUop9.type => BitPat("b01001") + case _: MaskUop10.type => BitPat("b01010") + case _: MaskUop11.type => BitPat("b01011") + case _ => BitPat.dontCare(chiselType.getWidth) + } + + } + + def allFields(param: DecoderParam): Seq[T1DecodeFiled[_ >: Bool <: UInt]] = Seq( + logic, + adder, + 
shift, + multiplier, + divider, + multiCycle, + other, + maskPipeType, + unsigned0, + unsigned1, + itype, + nr, + red, + // top only + widenReduce, + targetRd, + slid, + gather, + gather16, + compress, + unOrderWrite, + // top uop + extend, // top uop + mv, // top uop + iota, // top uop + uop, + maskLogic, + maskDestination, + maskSource, + readOnly, + vwmacc, + saturate, + special, + maskUnit, + crossWrite, + crossRead, + // state + sWrite, + // sRead1 -> vType + vtype, + sReadVD, + scheduler, + dontNeedExecuteInLane, + reverse, // uop + average, // uop + ffo, // todo: add mask select -> top uop + popCount, // top uop add, red, uop popCount + topUop, + maskPipeUop, + writeCount, + specialSlot + ) ++ { + if (param.fpuEnable) + Seq( + float, + fpExecutionType, + floatMul, + orderReduce + ) + else Seq() + } ++ { + if (param.zvbbEnable) + Seq( + zvbb + ) + else Seq() + } ++ { + if (param.useXsfmm) + Seq( + zvma + ) + else Seq() + } + + def allDecodePattern(param: DecoderParam): Seq[T1DecodePattern] = + param.allInstructions.map(T1DecodePattern(_, param)).toSeq.sortBy(_.instruction.name) + + def decodeTable(param: DecoderParam): DecodeTable[T1DecodePattern] = + new DecodeTable[T1DecodePattern](allDecodePattern(param), allFields(param)) + + def decode(param: DecoderParam): UInt => DecodeBundle = decodeTable(param).decode + def bundle(param: DecoderParam): DecodeBundle = decodeTable(param).bundle +} + +trait FieldName { + def name: String = this.getClass.getSimpleName.replace("$", "") +} + +case class SpecialAux(name: String, vs: Int, value: String) +case class SpecialMap(name: String, vs: Int, data: Map[String, String]) + +case class SpecialAuxInstr( + instrName: String, + vs: Int, + value: String, + name: String) + +case class Op( + tpe: String, + funct6: String, + tpeOp2: String, + funct3: String, + name: String, + special: Option[SpecialAux], + notLSU: Boolean, + vd: String, + opcode: String) + extends DecodePattern { + + // include 32 bits: funct6 + vm + vs2 + vs1 + 
funct3 + vd + opcode + def bitPat: BitPat = BitPat( + "b" + + // funct6 + funct6 + + // ? for vm + "?" + + // vs2 + (if (special.isEmpty || special.get.vs == 1) "?????" else special.get.value) + + // vs1 + (if (special.isEmpty || special.get.vs == 2) "?????" else special.get.value) + + // funct3 + funct3 + + vd + + opcode + ) + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/DomainDecoder.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/DomainDecoder.scala new file mode 100644 index 00000000..bf4b7dba --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/DomainDecoder.scala @@ -0,0 +1,56 @@ +package framework.gpdomain.sequencer.decoder + +import chisel3._ +import chisel3.util._ +import framework.top.GlobalConfig +import framework.core.bbtile.RoCCCommandBB +import framework.gpdomain.sequencer.decoder.{Decoder, DecoderParam} +import chisel3.experimental.hierarchy.{instantiable, public} + +/** + * Domain Decoder Parameter Object + * Contains the configuration for RVV instruction decoding + */ +object DomainDecoderParameter { + + lazy val allInstructions: Seq[InstructionEncoding.Instruction] = { + RVVInstructions.allInstructions + } + + lazy val decoderParam: DecoderParam = DecoderParam( + fpuEnable = true, // Enable floating-point vector instructions + zvbbEnable = true, // Enable vector bit manipulation + useXsfmm = false, // Disable xsfmm extension for now + allInstructions = allInstructions + ) + +} + +/** + * Domain Decoder IO + */ +class DomainDecoderIO(b: GlobalConfig) extends Bundle { + val inst_i = Input(new RoCCCommandBB(b.core.xLen)) + val decoded_o = Decoder.bundle(DomainDecoderParameter.decoderParam).cloneType +} + +/** + * Domain Decoder Module + * Encapsulates the T1 decoder logic with local instruction database + */ +@instantiable +class DomainDecoder(val b: GlobalConfig) extends Module { + @public + val io = IO(new DomainDecoderIO(b)) + + import DomainDecoderParameter._ + + // Instantiate 
the T1 decoder with our local instructions + val decode = Decoder.decode(decoderParam) + + // Decode the incoming instruction + // RoCCCommandBB has 'raw_inst' field containing the raw 32-bit instruction + val inst = io.inst_i.raw_inst + io.decoded_o := decode(inst) + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/InstructionDatabase.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/InstructionDatabase.scala new file mode 100644 index 00000000..6de76028 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/InstructionDatabase.scala @@ -0,0 +1,371 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2024 Mio + +package framework.gpdomain.sequencer.decoder + +/** + * Local Instruction Database + * Provides instruction definitions without external dependencies + */ +object InstructionEncoding { + + /** Like chisel3.BitPat, this stores the instruction encoding */ + case class Encoding(value: BigInt, mask: BigInt) { + def toBitMask(bit_pat: String) = + Seq.tabulate(32)(i => if (!mask.testBit(i)) bit_pat else if (value.testBit(i)) "1" else "0").reverse.mkString + override def toString: String = toBitMask("?") + } + + object Encoding { + implicit val rw: upickle.default.ReadWriter[Encoding] = upickle.default.macroRW + + def fromString(str: String): Encoding = { + require(str.length == 32, s"Encoding string must be 32 bits, got ${str.length}") + Encoding( + str.reverse.zipWithIndex.map { + case (c, i) => + c match { + case '1' => BigInt(1) << i + case '0' => BigInt(0) + case '?' => BigInt(0) + case _ => throw new IllegalArgumentException(s"Invalid encoding character: $c") + } + }.sum, + str.reverse.zipWithIndex.map { + case (c, i) => + c match { + case '1' => BigInt(1) << i + case '0' => BigInt(1) << i + case '?' 
=> BigInt(0) + case _ => throw new IllegalArgumentException(s"Invalid encoding character: $c") + } + }.sum + ) + } + + } + + case class Arg(name: String, msb: Int, lsb: Int) { + override def toString: String = name + } + + object Arg { + implicit val rw: upickle.default.ReadWriter[Arg] = upickle.default.macroRW + } + + case class InstructionSet(name: String) + + object InstructionSet { + implicit val rw: upickle.default.ReadWriter[InstructionSet] = upickle.default.macroRW + } + + case class Instruction( + name: String, + encoding: Encoding, + args: Seq[Arg], + instructionSets: Seq[InstructionSet], + pseudoFrom: Option[Instruction], + ratified: Boolean, + custom: Boolean) { + def instructionSet: InstructionSet = instructionSets.head + } + + object Instruction { + implicit val rw: upickle.default.ReadWriter[Instruction] = upickle.default.macroRW + } + +} + +/** + * Manual RVV Instruction Database + * Contains commonly used RVV instructions with their encodings + */ +object RVVInstructions { + import InstructionEncoding._ + + // Common argument definitions + val vd = Arg("vd", 11, 7) + val vs1 = Arg("vs1", 19, 15) + val vs2 = Arg("vs2", 24, 20) + val vs3 = Arg("vs3", 11, 7) + val vm = Arg("vm", 25, 25) + val rs1 = Arg("rs1", 19, 15) + val rs2 = Arg("rs2", 24, 20) + val rd = Arg("rd", 11, 7) + val zimm5 = Arg("zimm5", 19, 15) + val zimm10 = Arg("zimm10", 29, 20) + val zimm11 = Arg("zimm11", 30, 20) + val simm5 = Arg("simm5", 19, 15) + val nf = Arg("nf", 31, 29) + + // Instruction set definitions + val rv_v = InstructionSet("rv_v") + val rv_zvbb = InstructionSet("rv_zvbb") + + /** + * Create all RVV instructions + * Returns a sequence of Instructions for the decoder + */ + def allInstructions: Seq[Instruction] = { + configInstructions ++ + integerArithmeticInstructions ++ + loadStoreInstructions ++ + maskInstructions ++ + permuteInstructions + } + + // ============================================================================ + // Vector Configuration Instructions + 
// ============================================================================ + def configInstructions: Seq[Instruction] = Seq( + Instruction( + name = "vsetvli", + encoding = + Encoding.fromString( + "0????????????????111?????1010111" + ), // 31=0, zimm11(30..20), rs1(19..15), funct3=111(14..12), rd(11..7), opcode=1010111(6..0) + args = Seq(rd, rs1, zimm11), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vsetivli", + encoding = + Encoding.fromString( + "11???????????????111?????1010111" + ), // 31..30=11, zimm10(29..20), zimm5(19..15), funct3=111(14..12), rd(11..7), opcode=1010111(6..0) + args = Seq(rd, zimm10, zimm5), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vsetvl", + encoding = + Encoding.fromString( + "1000000??????????111?????1010111" + ), // 31=1, 30..25=000000, rs2(24..20), rs1(19..15), funct3=111(14..12), rd(11..7), opcode=1010111(6..0) + args = Seq(rd, rs1, rs2), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ) + ) + + // ============================================================================ + // Integer Arithmetic Instructions (Common ones) + // ============================================================================ + def integerArithmeticInstructions: Seq[Instruction] = Seq( + // VADD variants + Instruction( + name = "vadd.vv", + encoding = Encoding.fromString("000000???????????000?????1010111"), // funct6=000000, funct3=000 (OPIVV) + args = Seq(vd, vs2, vs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vadd.vx", + encoding = Encoding.fromString("000000???????????100?????1010111"), // funct6=000000, funct3=100 (OPIVX) + args = Seq(vd, vs2, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vadd.vi", + 
encoding = Encoding.fromString("000000???????????011?????1010111"), // funct6=000000, funct3=011 (OPIVI) + args = Seq(vd, vs2, simm5, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + // VSUB variants + Instruction( + name = "vsub.vv", + encoding = Encoding.fromString("000010???????????000?????1010111"), // funct6=000010, funct3=000 + args = Seq(vd, vs2, vs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vsub.vx", + encoding = Encoding.fromString("000010???????????100?????1010111"), // funct6=000010, funct3=100 + args = Seq(vd, vs2, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + // VMUL variants + Instruction( + name = "vmul.vv", + encoding = Encoding.fromString("100101???????????010?????1010111"), // funct6=100101, funct3=010 (OPMVV) + args = Seq(vd, vs2, vs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vmul.vx", + encoding = Encoding.fromString("100101???????????110?????1010111"), // funct6=100101, funct3=110 (OPMVX) + args = Seq(vd, vs2, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ) + ) + + // ============================================================================ + // Load/Store Instructions + // ============================================================================ + def loadStoreInstructions: Seq[Instruction] = Seq( + // Unit-stride loads + Instruction( + name = "vle8.v", + encoding = + Encoding.fromString( + "???000?00000?????000?????0000111" + ), // nf, mew=0, mop=00, lumop=00000, funct3=000, opcode=0000111 + args = Seq(vd, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vle16.v", + encoding = Encoding.fromString("???000?00000?????101?????0000111"), // 
funct3=101 + args = Seq(vd, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vle32.v", + encoding = Encoding.fromString("???000?00000?????110?????0000111"), // funct3=110 + args = Seq(vd, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vle64.v", + encoding = Encoding.fromString("???000?00000?????111?????0000111"), // funct3=111 + args = Seq(vd, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + // Unit-stride stores + Instruction( + name = "vse8.v", + encoding = Encoding.fromString("???000?00000?????000?????0100111"), // opcode=0100111 (VS) + args = Seq(vs3, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vse16.v", + encoding = Encoding.fromString("???000?00000?????101?????0100111"), + args = Seq(vs3, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vse32.v", + encoding = Encoding.fromString("???000?00000?????110?????0100111"), + args = Seq(vs3, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vse64.v", + encoding = Encoding.fromString("???000?00000?????111?????0100111"), + args = Seq(vs3, rs1, vm), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ) + ) + + // ============================================================================ + // Mask Instructions + // ============================================================================ + def maskInstructions: Seq[Instruction] = Seq( + Instruction( + name = "vmand.mm", + encoding = Encoding.fromString("011001???????????010?????1010111"), // funct6=011001, funct3=010 + args = Seq(vd, vs2, vs1), + instructionSets = Seq(rv_v), + 
pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vmor.mm", + encoding = Encoding.fromString("011010???????????010?????1010111"), // funct6=011010, funct3=010 + args = Seq(vd, vs2, vs1), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ) + ) + + // ============================================================================ + // Permute Instructions + // ============================================================================ + def permuteInstructions: Seq[Instruction] = Seq( + Instruction( + name = "vmv.v.v", + encoding = Encoding.fromString("010111?00000?????000?????1010111"), // funct6=010111, vs2=00000, funct3=000 + args = Seq(vd, vs1), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vmv.v.x", + encoding = Encoding.fromString("010111?00000?????100?????1010111"), // funct6=010111, vs2=00000, funct3=100 + args = Seq(vd, rs1), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ), + Instruction( + name = "vmv.v.i", + encoding = Encoding.fromString("010111?00000?????011?????1010111"), // funct6=010111, vs2=00000, funct3=011 + args = Seq(vd, simm5), + instructionSets = Seq(rv_v), + pseudoFrom = None, + ratified = true, + custom = false + ) + ) + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/InstructionDocumentation.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/InstructionDocumentation.scala new file mode 100644 index 00000000..4ba60527 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/InstructionDocumentation.scala @@ -0,0 +1,468 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder + +import framework.gpdomain.sequencer.decoder.InstructionEncoding.Instruction + +/** + * Generate documentation for each instruction for T1.
The documentation should contain the behavior for instruction + * in a specific configuration in T1. + * @todo + * should it be a post process at omreader? + */ +case class InstructionDocumentation(instruction: Instruction, param: DecoderParam) { + + override def toString: String = instruction.name match { + case "vaadd.vv" => "TODO!" + case "vaadd.vx" => "TODO!" + case "vaaddu.vv" => "TODO!" + case "vaaddu.vx" => "TODO!" + case "vadc.vim" => "TODO!" + case "vadc.vvm" => "TODO!" + case "vadc.vxm" => "TODO!" + case "vadd.vi" => "TODO!" + case "vadd.vv" => "TODO!" + case "vadd.vx" => "TODO!" + case "vand.vi" => "TODO!" + case "vand.vv" => "TODO!" + case "vand.vx" => "TODO!" + case "vasub.vv" => "TODO!" + case "vasub.vx" => "TODO!" + case "vasubu.vv" => "TODO!" + case "vasubu.vx" => "TODO!" + case "vcompress.vm" => "TODO!" + case "vcpop.m" => "TODO!" + case "vdiv.vv" => "TODO!" + case "vdiv.vx" => "TODO!" + case "vdivu.vv" => "TODO!" + case "vdivu.vx" => "TODO!" + case "vfadd.vf" => "TODO!" + case "vfadd.vv" => "TODO!" + case "vfclass.v" => "TODO!" + case "vfcvt.f.x.v" => "TODO!" + case "vfcvt.f.xu.v" => "TODO!" + case "vfcvt.rtz.x.f.v" => "TODO!" + case "vfcvt.rtz.xu.f.v" => "TODO!" + case "vfcvt.x.f.v" => "TODO!" + case "vfcvt.xu.f.v" => "TODO!" + case "vfdiv.vf" => "TODO!" + case "vfdiv.vv" => "TODO!" + case "vfirst.m" => "TODO!" + case "vfmacc.vf" => "TODO!" + case "vfmacc.vv" => "TODO!" + case "vfmadd.vf" => "TODO!" + case "vfmadd.vv" => "TODO!" + case "vfmax.vf" => "TODO!" + case "vfmax.vv" => "TODO!" + case "vfmerge.vfm" => "TODO!" + case "vfmin.vf" => "TODO!" + case "vfmin.vv" => "TODO!" + case "vfmsac.vf" => "TODO!" + case "vfmsac.vv" => "TODO!" + case "vfmsub.vf" => "TODO!" + case "vfmsub.vv" => "TODO!" + case "vfmul.vf" => "TODO!" + case "vfmul.vv" => "TODO!" + case "vfmv.f.s" => "TODO!" + case "vfmv.s.f" => "TODO!" + case "vfmv.v.f" => "TODO!" + case "vfncvt.f.f.w" => "TODO!" + case "vfncvt.f.x.w" => "TODO!" + case "vfncvt.f.xu.w" => "TODO!" 
+ case "vfncvt.rod.f.f.w" => "TODO!" + case "vfncvt.rtz.x.f.w" => "TODO!" + case "vfncvt.rtz.xu.f.w" => "TODO!" + case "vfncvt.x.f.w" => "TODO!" + case "vfncvt.xu.f.w" => "TODO!" + case "vfnmacc.vf" => "TODO!" + case "vfnmacc.vv" => "TODO!" + case "vfnmadd.vf" => "TODO!" + case "vfnmadd.vv" => "TODO!" + case "vfnmsac.vf" => "TODO!" + case "vfnmsac.vv" => "TODO!" + case "vfnmsub.vf" => "TODO!" + case "vfnmsub.vv" => "TODO!" + case "vfrdiv.vf" => "TODO!" + case "vfrec7.v" => "TODO!" + case "vfredmax.vs" => "TODO!" + case "vfredmin.vs" => "TODO!" + case "vfredosum.vs" => "TODO!" + case "vfredusum.vs" => "TODO!" + case "vfrsqrt7.v" => "TODO!" + case "vfrsub.vf" => "TODO!" + case "vfsgnj.vf" => "TODO!" + case "vfsgnj.vv" => "TODO!" + case "vfsgnjn.vf" => "TODO!" + case "vfsgnjn.vv" => "TODO!" + case "vfsgnjx.vf" => "TODO!" + case "vfsgnjx.vv" => "TODO!" + case "vfslide1down.vf" => "TODO!" + case "vfslide1up.vf" => "TODO!" + case "vfsqrt.v" => "TODO!" + case "vfsub.vf" => "TODO!" + case "vfsub.vv" => "TODO!" + case "vfwadd.vf" => "TODO!" + case "vfwadd.vv" => "TODO!" + case "vfwadd.wf" => "TODO!" + case "vfwadd.wv" => "TODO!" + case "vfwcvt.f.f.v" => "TODO!" + case "vfwcvt.f.x.v" => "TODO!" + case "vfwcvt.f.xu.v" => "TODO!" + case "vfwcvt.rtz.x.f.v" => "TODO!" + case "vfwcvt.rtz.xu.f.v" => "TODO!" + case "vfwcvt.x.f.v" => "TODO!" + case "vfwcvt.xu.f.v" => "TODO!" + case "vfwmacc.vf" => "TODO!" + case "vfwmacc.vv" => "TODO!" + case "vfwmsac.vf" => "TODO!" + case "vfwmsac.vv" => "TODO!" + case "vfwmul.vf" => "TODO!" + case "vfwmul.vv" => "TODO!" + case "vfwnmacc.vf" => "TODO!" + case "vfwnmacc.vv" => "TODO!" + case "vfwnmsac.vf" => "TODO!" + case "vfwnmsac.vv" => "TODO!" + case "vfwredosum.vs" => "TODO!" + case "vfwredusum.vs" => "TODO!" + case "vfwsub.vf" => "TODO!" + case "vfwsub.vv" => "TODO!" + case "vfwsub.wf" => "TODO!" + case "vfwsub.wv" => "TODO!" + case "vid.v" => "TODO!" + case "viota.m" => "TODO!" + case "vl1re16.v" => "TODO!" + case "vl1re32.v" => "TODO!" 
+ case "vl1re64.v" => "TODO!" + case "vl1re8.v" => "TODO!" + case "vl2re16.v" => "TODO!" + case "vl2re32.v" => "TODO!" + case "vl2re64.v" => "TODO!" + case "vl2re8.v" => "TODO!" + case "vl4re16.v" => "TODO!" + case "vl4re32.v" => "TODO!" + case "vl4re64.v" => "TODO!" + case "vl4re8.v" => "TODO!" + case "vl8re16.v" => "TODO!" + case "vl8re32.v" => "TODO!" + case "vl8re64.v" => "TODO!" + case "vl8re8.v" => "TODO!" + case "vle1024.v" => "TODO!" + case "vle1024ff.v" => "TODO!" + case "vle128.v" => "TODO!" + case "vle128ff.v" => "TODO!" + case "vle16.v" => "TODO!" + case "vle16ff.v" => "TODO!" + case "vle256.v" => "TODO!" + case "vle256ff.v" => "TODO!" + case "vle32.v" => "TODO!" + case "vle32ff.v" => "TODO!" + case "vle512.v" => "TODO!" + case "vle512ff.v" => "TODO!" + case "vle64.v" => "TODO!" + case "vle64ff.v" => "TODO!" + case "vle8.v" => "TODO!" + case "vle8ff.v" => "TODO!" + case "vlm.v" => "TODO!" + case "vloxei1024.v" => "TODO!" + case "vloxei128.v" => "TODO!" + case "vloxei16.v" => "TODO!" + case "vloxei256.v" => "TODO!" + case "vloxei32.v" => "TODO!" + case "vloxei512.v" => "TODO!" + case "vloxei64.v" => "TODO!" + case "vloxei8.v" => "TODO!" + case "vlse1024.v" => "TODO!" + case "vlse128.v" => "TODO!" + case "vlse16.v" => "TODO!" + case "vlse256.v" => "TODO!" + case "vlse32.v" => "TODO!" + case "vlse512.v" => "TODO!" + case "vlse64.v" => "TODO!" + case "vlse8.v" => "TODO!" + case "vluxei1024.v" => "TODO!" + case "vluxei128.v" => "TODO!" + case "vluxei16.v" => "TODO!" + case "vluxei256.v" => "TODO!" + case "vluxei32.v" => "TODO!" + case "vluxei512.v" => "TODO!" + case "vluxei64.v" => "TODO!" + case "vluxei8.v" => "TODO!" + case "vmacc.vv" => "TODO!" + case "vmacc.vx" => "TODO!" + case "vmadc.vi" => "TODO!" + case "vmadc.vim" => "TODO!" + case "vmadc.vv" => "TODO!" + case "vmadc.vvm" => "TODO!" + case "vmadc.vx" => "TODO!" + case "vmadc.vxm" => "TODO!" + case "vmadd.vv" => "TODO!" + case "vmadd.vx" => "TODO!" + case "vmand.mm" => "TODO!" 
+ case "vmandn.mm" => "TODO!" + case "vmax.vv" => "TODO!" + case "vmax.vx" => "TODO!" + case "vmaxu.vv" => "TODO!" + case "vmaxu.vx" => "TODO!" + case "vmerge.vim" => "TODO!" + case "vmerge.vvm" => "TODO!" + case "vmerge.vxm" => "TODO!" + case "vmfeq.vf" => "TODO!" + case "vmfeq.vv" => "TODO!" + case "vmfge.vf" => "TODO!" + case "vmfgt.vf" => "TODO!" + case "vmfle.vf" => "TODO!" + case "vmfle.vv" => "TODO!" + case "vmflt.vf" => "TODO!" + case "vmflt.vv" => "TODO!" + case "vmfne.vf" => "TODO!" + case "vmfne.vv" => "TODO!" + case "vmin.vv" => "TODO!" + case "vmin.vx" => "TODO!" + case "vminu.vv" => "TODO!" + case "vminu.vx" => "TODO!" + case "vmnand.mm" => "TODO!" + case "vmnor.mm" => "TODO!" + case "vmor.mm" => "TODO!" + case "vmorn.mm" => "TODO!" + case "vmsbc.vv" => "TODO!" + case "vmsbc.vvm" => "TODO!" + case "vmsbc.vx" => "TODO!" + case "vmsbc.vxm" => "TODO!" + case "vmsbf.m" => "TODO!" + case "vmseq.vi" => "TODO!" + case "vmseq.vv" => "TODO!" + case "vmseq.vx" => "TODO!" + case "vmsgt.vi" => "TODO!" + case "vmsgt.vx" => "TODO!" + case "vmsgtu.vi" => "TODO!" + case "vmsgtu.vx" => "TODO!" + case "vmsif.m" => "TODO!" + case "vmsle.vi" => "TODO!" + case "vmsle.vv" => "TODO!" + case "vmsle.vx" => "TODO!" + case "vmsleu.vi" => "TODO!" + case "vmsleu.vv" => "TODO!" + case "vmsleu.vx" => "TODO!" + case "vmslt.vv" => "TODO!" + case "vmslt.vx" => "TODO!" + case "vmsltu.vv" => "TODO!" + case "vmsltu.vx" => "TODO!" + case "vmsne.vi" => "TODO!" + case "vmsne.vv" => "TODO!" + case "vmsne.vx" => "TODO!" + case "vmsof.m" => "TODO!" + case "vmul.vv" => "TODO!" + case "vmul.vx" => "TODO!" + case "vmulh.vv" => "TODO!" + case "vmulh.vx" => "TODO!" + case "vmulhsu.vv" => "TODO!" + case "vmulhsu.vx" => "TODO!" + case "vmulhu.vv" => "TODO!" + case "vmulhu.vx" => "TODO!" + case "vmv.s.x" => "TODO!" + case "vmv.v.i" => "TODO!" + case "vmv.v.v" => "TODO!" + case "vmv.v.x" => "TODO!" + case "vmv.x.s" => "TODO!" + case "vmv1r.v" => "TODO!" + case "vmv2r.v" => "TODO!" 
+ case "vmv4r.v" => "TODO!" + case "vmv8r.v" => "TODO!" + case "vmxnor.mm" => "TODO!" + case "vmxor.mm" => "TODO!" + case "vnclip.wi" => "TODO!" + case "vnclip.wv" => "TODO!" + case "vnclip.wx" => "TODO!" + case "vnclipu.wi" => "TODO!" + case "vnclipu.wv" => "TODO!" + case "vnclipu.wx" => "TODO!" + case "vnmsac.vv" => "TODO!" + case "vnmsac.vx" => "TODO!" + case "vnmsub.vv" => "TODO!" + case "vnmsub.vx" => "TODO!" + case "vnsra.wi" => "TODO!" + case "vnsra.wv" => "TODO!" + case "vnsra.wx" => "TODO!" + case "vnsrl.wi" => "TODO!" + case "vnsrl.wv" => "TODO!" + case "vnsrl.wx" => "TODO!" + case "vor.vi" => "TODO!" + case "vor.vv" => "TODO!" + case "vor.vx" => "TODO!" + case "vredand.vs" => "TODO!" + case "vredmax.vs" => "TODO!" + case "vredmaxu.vs" => "TODO!" + case "vredmin.vs" => "TODO!" + case "vredminu.vs" => "TODO!" + case "vredor.vs" => "TODO!" + case "vredsum.vs" => "TODO!" + case "vredxor.vs" => "TODO!" + case "vrem.vv" => "TODO!" + case "vrem.vx" => "TODO!" + case "vremu.vv" => "TODO!" + case "vremu.vx" => "TODO!" + case "vrgather.vi" => "TODO!" + case "vrgather.vv" => "TODO!" + case "vrgather.vx" => "TODO!" + case "vrgatherei16.vv" => "TODO!" + case "vrsub.vi" => "TODO!" + case "vrsub.vx" => "TODO!" + case "vs1r.v" => "TODO!" + case "vs2r.v" => "TODO!" + case "vs4r.v" => "TODO!" + case "vs8r.v" => "TODO!" + case "vsadd.vi" => "TODO!" + case "vsadd.vv" => "TODO!" + case "vsadd.vx" => "TODO!" + case "vsaddu.vi" => "TODO!" + case "vsaddu.vv" => "TODO!" + case "vsaddu.vx" => "TODO!" + case "vsbc.vvm" => "TODO!" + case "vsbc.vxm" => "TODO!" + case "vse1024.v" => "TODO!" + case "vse128.v" => "TODO!" + case "vse16.v" => "TODO!" + case "vse256.v" => "TODO!" + case "vse32.v" => "TODO!" + case "vse512.v" => "TODO!" + case "vse64.v" => "TODO!" + case "vse8.v" => "TODO!" + case "vsetivli" => "TODO!" + case "vsetvl" => "TODO!" + case "vsetvli" => "TODO!" + case "vsext.vf2" => "TODO!" + case "vsext.vf4" => "TODO!" + case "vsext.vf8" => "TODO!" 
+ case "vslide1down.vx" => "TODO!" + case "vslide1up.vx" => "TODO!" + case "vslidedown.vi" => "TODO!" + case "vslidedown.vx" => "TODO!" + case "vslideup.vi" => "TODO!" + case "vslideup.vx" => "TODO!" + case "vsll.vi" => "TODO!" + case "vsll.vv" => "TODO!" + case "vsll.vx" => "TODO!" + case "vsm.v" => "TODO!" + case "vsmul.vv" => "TODO!" + case "vsmul.vx" => "TODO!" + case "vsoxei1024.v" => "TODO!" + case "vsoxei128.v" => "TODO!" + case "vsoxei16.v" => "TODO!" + case "vsoxei256.v" => "TODO!" + case "vsoxei32.v" => "TODO!" + case "vsoxei512.v" => "TODO!" + case "vsoxei64.v" => "TODO!" + case "vsoxei8.v" => "TODO!" + case "vsra.vi" => "TODO!" + case "vsra.vv" => "TODO!" + case "vsra.vx" => "TODO!" + case "vsrl.vi" => "TODO!" + case "vsrl.vv" => "TODO!" + case "vsrl.vx" => "TODO!" + case "vsse1024.v" => "TODO!" + case "vsse128.v" => "TODO!" + case "vsse16.v" => "TODO!" + case "vsse256.v" => "TODO!" + case "vsse32.v" => "TODO!" + case "vsse512.v" => "TODO!" + case "vsse64.v" => "TODO!" + case "vsse8.v" => "TODO!" + case "vssra.vi" => "TODO!" + case "vssra.vv" => "TODO!" + case "vssra.vx" => "TODO!" + case "vssrl.vi" => "TODO!" + case "vssrl.vv" => "TODO!" + case "vssrl.vx" => "TODO!" + case "vssub.vv" => "TODO!" + case "vssub.vx" => "TODO!" + case "vssubu.vv" => "TODO!" + case "vssubu.vx" => "TODO!" + case "vsub.vv" => "TODO!" + case "vsub.vx" => "TODO!" + case "vsuxei1024.v" => "TODO!" + case "vsuxei128.v" => "TODO!" + case "vsuxei16.v" => "TODO!" + case "vsuxei256.v" => "TODO!" + case "vsuxei32.v" => "TODO!" + case "vsuxei512.v" => "TODO!" + case "vsuxei64.v" => "TODO!" + case "vsuxei8.v" => "TODO!" + case "vwadd.vv" => "TODO!" + case "vwadd.vx" => "TODO!" + case "vwadd.wv" => "TODO!" + case "vwadd.wx" => "TODO!" + case "vwaddu.vv" => "TODO!" + case "vwaddu.vx" => "TODO!" + case "vwaddu.wv" => "TODO!" + case "vwaddu.wx" => "TODO!" + case "vwmacc.vv" => "TODO!" + case "vwmacc.vx" => "TODO!" + case "vwmaccsu.vv" => "TODO!" + case "vwmaccsu.vx" => "TODO!" 
+ case "vwmaccu.vv" => "TODO!" + case "vwmaccu.vx" => "TODO!" + case "vwmaccus.vx" => "TODO!" + case "vwmul.vv" => "TODO!" + case "vwmul.vx" => "TODO!" + case "vwmulsu.vv" => "TODO!" + case "vwmulsu.vx" => "TODO!" + case "vwmulu.vv" => "TODO!" + case "vwmulu.vx" => "TODO!" + case "vwredsum.vs" => "TODO!" + case "vwredsumu.vs" => "TODO!" + case "vwsub.vv" => "TODO!" + case "vwsub.vx" => "TODO!" + case "vwsub.wv" => "TODO!" + case "vwsub.wx" => "TODO!" + case "vwsubu.vv" => "TODO!" + case "vwsubu.vx" => "TODO!" + case "vwsubu.wv" => "TODO!" + case "vwsubu.wx" => "TODO!" + case "vxor.vi" => "TODO!" + case "vxor.vv" => "TODO!" + case "vxor.vx" => "TODO!" + case "vzext.vf2" => "TODO!" + case "vzext.vf4" => "TODO!" + case "vzext.vf8" => "TODO!" + // rv_zvbb + case "vandn.vv" => "TODO!" + case "vandn.vx" => "TODO!" + case "vbrev.v" => "TODO!" + case "vbrev8.v" => "TODO!" + case "vrev8.v" => "TODO!" + case "vclz.v" => "TODO!" + case "vctz.v" => "TODO!" + case "vcpop.v" => "TODO!" + case "vrol.vv" => "TODO!" + case "vrol.vx" => "TODO!" + case "vror.vv" => "TODO!" + case "vror.vx" => "TODO!" + case "vror.vi" => "TODO!" + case "vwsll.vv" => "TODO!" + case "vwsll.vx" => "TODO!" + case "vwsll.vi" => "TODO!" + // + case "vlte8" => "TODO!" + case "vlte16" => "TODO!" + case "vlte32" => "TODO!" + case "vste8" => "TODO!" + case "vste16" => "TODO!" + case "vste32" => "TODO!" + case "vtmv.v.t" => "TODO!" + case "vtmv.t.v" => "TODO!" + case "mm.u.u" => "TODO!" + case "mm.u.s" => "TODO!" + case "mm.s.u" => "TODO!" + case "mm.s.s" => "TODO!" + case "mm.e5m2.e4m3" => "TODO!" + case "mm.e5m2.e5m2" => "TODO!" + case "mm.e4m3.e4m3" => "TODO!" + case "mm.e4m3.e5m2" => "TODO!" + case "vtzero.t" => "TODO!" + case "p2mm.f.f" => "TODO!" + case "vtdiscard" => "TODO!" 
+ case _ => "TODO" + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/T1DecodePattern.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/T1DecodePattern.scala new file mode 100644 index 00000000..bd7c3f4c --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/T1DecodePattern.scala @@ -0,0 +1,194 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder + +import chisel3._ +import chisel3.experimental.hierarchy.core.Definition +import chisel3.experimental.hierarchy.{instantiable, public, Instantiate} +import chisel3.properties.{AnyClassType, Class, ClassType, Property} +import chisel3.util.BitPat +import chisel3.util.experimental.decode.DecodePattern +import framework.gpdomain.sequencer.decoder.InstructionEncoding.Instruction +import framework.gpdomain.sequencer.decoder.attribute._ + +@instantiable +class T1DecodeAttributeOM( + _identifier: String, + _description: String, + _value: String) + extends Class { + val identifier = IO(Output(Property[String]())) + val description = IO(Output(Property[String]())) + val value = IO(Output(Property[String]())) + identifier := Property(_identifier) + description := Property(_description) + value := Property(_value) +} + +@instantiable +class T1InstructionOM( + _instructionName: String, + _documentation: String, + _bitPat: String) + extends Class { + val instructionName = IO(Output(Property[String]())) + val documentation = IO(Output(Property[String]())) + val bitPat = IO(Output(Property[String]())) + val attributes = IO(Output(Property[Seq[AnyClassType]])) + @public + val attributesIn = IO(Input(Property[Seq[AnyClassType]])) + + instructionName := Property(_instructionName) + documentation := Property(_documentation) + bitPat := Property(_bitPat) + attributes := attributesIn +} + +/** + * A case class that should wrap all Vector Instructions. 
This is used to store the attribute for Vector Instruction + * under the T1 uArch. It generates [[chisel3.util.experimental.decode.TruthTable]], as well as documentation field. + */ +case class T1DecodePattern(instruction: Instruction, param: DecoderParam) extends DecodePattern { + override def bitPat: BitPat = BitPat("b" + instruction.encoding.toString) + + // use the attribute w/ [[isVector.value]] + def isVector: isVector = attribute.isVector(this) + def isAdder: isAdder = attribute.isAdder(this) + def isAverage: isAverage = attribute.isAverage(this) + def isCompress: isCompress = attribute.isCompress(this) + def isCrossread: isCrossread = attribute.isCrossread(this) + def isCrosswrite: isCrosswrite = attribute.isCrosswrite(this) + def isDivider: isDivider = attribute.isDivider(this) + def isDontneedexecuteinlane: isDontneedexecuteinlane = attribute.isDontneedexecuteinlane(this) + def isExtend: isExtend = attribute.isExtend(this) + def isFcompare: isFcompare = attribute.isFcompare(this) + def isFfo: isFfo = attribute.isFfo(this) + def isFirstwiden: isFirstwiden = attribute.isFirstwiden(this) + def isFloatmul: isFloatmul = attribute.isFloatmul(this) + def isFloat: isFloat = attribute.isFloat(this) + def isFloattype: isFloattype = attribute.isFloattype(this) + def isFma: isFma = attribute.isFma(this) + def isFother: isFother = attribute.isFother(this) + def isGather16: isGather16 = attribute.isGather16(this) + def isGather: isGather = attribute.isGather(this) + def isId: isId = attribute.isId(this) + def isIndextype: isIndextype = attribute.isIndextype(this) + def isIota: isIota = attribute.isIota(this) + def isItype: isItype = attribute.isItype(this) + def isLogic: isLogic = attribute.isLogic(this) + def isMaskdestination: isMaskdestination = attribute.isMaskdestination(this) + def isMasklogic: isMasklogic = attribute.isMasklogic(this) + def isMasksource: isMasksource = attribute.isMasksource(this) + def isMaskunit: isMaskunit = attribute.isMaskunit(this) + def 
isMulticycle: isMulticycle = attribute.isMulticycle(this) + def isMultiplier: isMultiplier = attribute.isMultiplier(this) + def isMv: isMv = attribute.isMv(this) + def isNarrow: isNarrow = attribute.isNarrow(this) + def isNr: isNr = attribute.isNr(this) + def isOrderreduce: isOrderreduce = attribute.isOrderreduce(this) + def isOther: isOther = attribute.isOther(this) + def isZero: isZero = attribute.isZero(this) + def isPopcount: isPopcount = attribute.isPopcount(this) + def isReadonly: isReadonly = attribute.isReadonly(this) + def isRed: isRed = attribute.isRed(this) + def isReverse: isReverse = attribute.isReverse(this) + def isSaturate: isSaturate = attribute.isSaturate(this) + def isScheduler: isScheduler = attribute.isScheduler(this) + def isShift: isShift = attribute.isShift(this) + def isSlid: isSlid = attribute.isSlid(this) + def isSpecial: isSpecial = attribute.isSpecial(this) + def isSpecialslot: isSpecialslot = attribute.isSpecialslot(this) + def isSreadvd: isSreadvd = attribute.isSreadvd(this) + def isSwrite: isSwrite = attribute.isSwrite(this) + def isTargetrd: isTargetrd = attribute.isTargetrd(this) + def isUnorderwrite: isUnorderwrite = attribute.isUnorderwrite(this) + def isUnsigned0: isUnsigned0 = attribute.isUnsigned0(this) + def isUnsigned1: isUnsigned1 = attribute.isUnsigned1(this) + def isVtype: isVtype = attribute.isVtype(this) + def isVwmacc: isVwmacc = attribute.isVwmacc(this) + def isWidenreduce: isWidenreduce = attribute.isWidenreduce(this) + def isZvbb: isZvbb = attribute.isZvbb(this) + def isZvma: isZvma = attribute.isZvma(this) + def isMaskPip: isMaskPipeType = attribute.isMaskPipeType(this) + def isWriteCount: isWriteCount = attribute.isWriteCount(this) + def fpExecutionType: FpExecutionType.Type = attribute.FpExecutionType(this) + def topUop: TopUop = attribute.TopUop(this) + def decoderUop: DecoderUop = attribute.DecoderUop(this) + def maskPipeUop: MaskPipeOpcode = attribute.MaskPipeOpcode(this) + + private def documentation: String 
= InstructionDocumentation(instruction, param).toString + + // This is the OM for this instruction + def om: Property[ClassType] = { + val obj = Instantiate( + new T1InstructionOM( + instruction.name, + bitPat.rawString, + documentation + ) + ) + // convert in-memory attributes to Chisel Property + // get type of [[T1DecodeAttributeOM]] + obj.attributesIn :#= Property( + Seq( + isVector, + isAdder, + isAverage, + isCompress, + isCrossread, + isCrosswrite, + isDivider, + isDontneedexecuteinlane, + isExtend, + isFcompare, + isFfo, + isFirstwiden, + isFloatmul, + isFloat, + isFloattype, + isFma, + isFother, + isGather16, + isGather, + isId, + isIndextype, + isIota, + isItype, + isLogic, + isMaskdestination, + isMasklogic, + isMasksource, + isMaskunit, + isMulticycle, + isMultiplier, + isMv, + isNarrow, + isNr, + isOrderreduce, + isOther, + isPopcount, + isReadonly, + isRed, + isReverse, + isSaturate, + isScheduler, + isShift, + isSlid, + isSpecial, + isSpecialslot, + isSreadvd, + isSwrite, + isTargetrd, + isUnorderwrite, + isUnsigned0, + isUnsigned1, + isVtype, + isVwmacc, + isWidenreduce + ).map(_.om.asAnyClassType) + ) + obj.getPropertyReference + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/TableGenerator.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/TableGenerator.scala new file mode 100644 index 00000000..753a2570 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/TableGenerator.scala @@ -0,0 +1,182 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder + +import chisel3.util._ +import chisel3.util.experimental.decode.TruthTable + +import scala.language.postfixOps + +object TableGenerator extends App { + + implicit class CrossAble[X](xs: Traversable[X]) { + + def cross[Y](ys: Traversable[Y]): Traversable[(X, Y)] = for { + x <- xs + y <- ys + } yield (x, y) + + } + + implicit def bool2str(b: Boolean): String 
= if (b) "b1" else "b0" + + object LogicTable { + + object LogicOpcode { + var index = 0 + } + + sealed trait LogicOpcode { + val value: Int = LogicOpcode.index + LogicOpcode.index = LogicOpcode.index + 1 + } + + trait Operand + + trait BinaryOperand extends Operand { + def op(op0: Boolean, op1: Boolean): Boolean + } + + case object and extends BinaryOperand with LogicOpcode { + override def op(op0: Boolean, op1: Boolean): Boolean = op0 && op1 + } + + case object or extends BinaryOperand with LogicOpcode { + override def op(op0: Boolean, op1: Boolean): Boolean = op0 || op1 + } + + case object xor extends BinaryOperand with LogicOpcode { + override def op(op0: Boolean, op1: Boolean): Boolean = op0 != op1 + } + + val opList: Seq[BinaryOperand with LogicOpcode] = Seq(and, or, xor) + val bitValue: Seq[Boolean] = Seq(true, false) + + val table: List[(BitPat, BitPat)] = bitValue + .cross(bitValue) + .cross(opList) + .map { + case ((op0, op1), op) => + BitPat(toBinary(op.value)) ## BitPat(op0) ## BitPat(op1) -> BitPat(op.op(op0, op1)) + } + .toList + + } + + object LaneDecodeTable { + + object LaneUop { + var index = 0 + } + + sealed trait LaneUop { + val value: Int = LaneUop.index + LaneUop.index = LaneUop.index + 1 + } + + /* object SubUnitCode { var index = 1 } + * + * sealed trait SubUnitCode { val value: Int = SubUnitCode.index SubUnitCode.index = SubUnitCode.index << 1 } */ + trait BaseObject + trait SubUnit extends BaseObject + + trait LogicUnit extends SubUnit + trait Arithmetic extends SubUnit + trait Shift extends SubUnit + trait Mul extends SubUnit + trait Div extends SubUnit + trait PopCount extends SubUnit + trait FFO extends SubUnit + trait GetIndex extends SubUnit + trait DataProcess extends SubUnit + + def subUnitCode(in: SubUnit): Int = { + in match { + case unit: LogicUnit => 1 + case arithmetic: Arithmetic => 2 + case shift: Shift => 4 + case mul: Mul => 8 + case div: Div => 16 + case count: PopCount => 32 + case ffo: FFO => 64 + case index: GetIndex => 
128 + case process: DataProcess => 256 + case _ => 0 + } + } + + case object and extends LogicUnit with LaneUop + case object nand extends LogicUnit with LaneUop + case object andn extends LogicUnit with LaneUop + case object or extends LogicUnit with LaneUop + case object nor extends LogicUnit with LaneUop + case object orn extends LogicUnit with LaneUop + case object xor extends LogicUnit with LaneUop + case object xnor extends LogicUnit with LaneUop + case object add extends Arithmetic with LaneUop + case object sub extends Arithmetic with LaneUop + case object adc extends Arithmetic with LaneUop + case object madc extends Arithmetic with LaneUop + case object sbc extends Arithmetic with LaneUop + case object msbc extends Arithmetic with LaneUop + case object slt extends Arithmetic with LaneUop + case object sltu extends Arithmetic with LaneUop + case object sle extends Arithmetic with LaneUop + case object sleu extends Arithmetic with LaneUop + case object sgt extends Arithmetic with LaneUop + case object sgtu extends Arithmetic with LaneUop + case object sge extends Arithmetic with LaneUop + case object sgeu extends Arithmetic with LaneUop + case object max extends Arithmetic with LaneUop + case object maxu extends Arithmetic with LaneUop + case object min extends Arithmetic with LaneUop + case object minu extends Arithmetic with LaneUop + case object sll extends Shift with LaneUop + case object srl extends Shift with LaneUop + case object sra extends Shift with LaneUop + case object ssrl extends Shift with LaneUop + case object ssra extends Shift with LaneUop + case object mul extends Mul with LaneUop + case object mulh extends Mul with LaneUop + case object mulhu extends Mul with LaneUop + case object mulhsu extends Mul with LaneUop + case object ma extends Mul with LaneUop + case object ms extends Mul with LaneUop + case object div extends Div with LaneUop + case object divu extends Div with LaneUop + case object rem extends Div with LaneUop + case object 
remu extends Div with LaneUop + case object popCount extends PopCount with LaneUop + case object ffo extends FFO with LaneUop + case object ffB extends FFO with LaneUop + case object ffInc extends FFO with LaneUop + case object ffID extends FFO with LaneUop + case object getID extends GetIndex with LaneUop + } + + object BankEnableTable { + // TODO + val maskList: Seq[Int] = Seq(1, 3, 15) + val maskSizeList: Seq[Int] = Seq(1, 2, 4) + var table: List[(BitPat, BitPat)] = List.empty + for { + eew <- 0 to 2 + vs <- 0 to 3 + groupId <- 0 to 3 + v <- Seq(true, false) + } { + var mask = if (v) maskList(eew) else 0 + val maskSize = maskSizeList(eew) + val index = (maskSize * (vs + groupId)) % 4 + mask <<= index + table :+= BitPat(v) ## BitPat(toBinary(eew, 2)) ## BitPat(toBinary(vs, 2)) ## BitPat( + toBinary(groupId, 2) + ) -> BitPat(toBinary(index, 2)) ## BitPat(toBinary(mask, 4)) + } + val res: TruthTable = TruthTable(table, BitPat.dontCare(6)) + } + + def toBinary(i: Int, digits: Int = 3): String = + String.format("b%" + digits + "s", i.toBinaryString).replace(' ', '0') +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/adderUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/adderUop.scala new file mode 100644 index 00000000..a928020e --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/adderUop.scala @@ -0,0 +1,198 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait AdderUOPType extends Uop +object addUop0 extends AdderUOPType +object addUop1 extends AdderUOPType +object addUop2 extends AdderUOPType +object addUop3 extends AdderUOPType +object addUop4 extends AdderUOPType +object addUop6 extends AdderUOPType +object addUop7 extends AdderUOPType +object addUop8 extends AdderUOPType +object addUop9 extends 
AdderUOPType +object addUop10 extends AdderUOPType +object addUop11 extends AdderUOPType + +object AdderUOP { + + def apply(t1DecodePattern: T1DecodePattern): Uop = { + Seq( + t0 _ -> addUop0, + t1 _ -> addUop1, + t2 _ -> addUop2, + t3 _ -> addUop3, + t4 _ -> addUop4, + t6 _ -> addUop6, + t7 _ -> addUop7, + t8 _ -> addUop8, + t9 _ -> addUop9, + t10 _ -> addUop10, + t11 _ -> addUop11 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => tpe + }.getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vaadd.vv", + "vaadd.vx", + "vaaddu.vv", + "vaaddu.vx", + "vadd.vi", + "vadd.vv", + "vadd.vx", + "vredsum.vs", + "vsadd.vi", + "vsadd.vv", + "vsadd.vx", + "vsaddu.vi", + "vsaddu.vv", + "vsaddu.vx", + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwredsum.vs", + "vwredsumu.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vasub.vv", + "vasub.vx", + "vasubu.vv", + "vasubu.vx", + "vrsub.vi", + "vrsub.vx", + "vssub.vv", + "vssub.vx", + "vssubu.vv", + "vssubu.vx", + "vsub.vv", + "vsub.vx", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t2(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmslt.vv", + "vmslt.vx", + "vmsltu.vv", + "vmsltu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t3(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmsle.vi", + "vmsle.vv", + "vmsle.vx", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t4(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmsgt.vi", 
+ "vmsgt.vx", + "vmsgtu.vi", + "vmsgtu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t6(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmax.vv", + "vmax.vx", + "vmaxu.vv", + "vmaxu.vx", + "vredmax.vs", + "vredmaxu.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t7(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmin.vv", + "vmin.vx", + "vminu.vv", + "vminu.vx", + "vredmin.vs", + "vredminu.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t8(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmseq.vi", + "vmseq.vv", + "vmseq.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t9(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmsne.vi", + "vmsne.vv", + "vmsne.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t10(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vadc.vim", + "vadc.vvm", + "vadc.vxm", + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t11(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vsbc.vvm", + "vsbc.vxm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/divUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/divUop.scala new file mode 100644 index 00000000..13d2e875 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/divUop.scala @@ -0,0 +1,71 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package 
framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait DivUOPType extends Uop +object divUop0 extends DivUOPType +object divUop1 extends DivUOPType +object divUop8 extends DivUOPType +object divUop9 extends DivUOPType +object divUop10 extends DivUOPType + +object DivUOP { + + def apply(t1DecodePattern: T1DecodePattern): Uop = { + Seq( + t0 _ -> divUop0, + t1 _ -> divUop1, + t8 _ -> divUop8, + t9 _ -> divUop9, + t10 _ -> divUop10 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => tpe + }.getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vdiv.vv", + "vdiv.vx", + "vdivu.vv", + "vdivu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vrem.vv", + "vrem.vx", + "vremu.vv", + "vremu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t8(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfdiv.vf", + "vfdiv.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t9(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfsqrt.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t10(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfrdiv.vf" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/floatUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/floatUop.scala new file mode 100644 index 00000000..9ba81d4d --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/floatUop.scala @@ -0,0 +1,194 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package 
framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait FloatUopType extends Uop +object FUT0 extends FloatUopType +object FUT1 extends FloatUopType +object FUT2 extends FloatUopType +object FUT3 extends FloatUopType +object FUT4 extends FloatUopType +object FUT5 extends FloatUopType +object FUT6 extends FloatUopType +object FUT7 extends FloatUopType +object FUT8 extends FloatUopType +object FUT9 extends FloatUopType +object FUT10 extends FloatUopType +object FUT12 extends FloatUopType +object FUT13 extends FloatUopType +object FUT14 extends FloatUopType + +object FloatUop { + + def apply(t1DecodePattern: T1DecodePattern) = { + Seq( + t0 _ -> FUT0, + t1 _ -> FUT1, + t2 _ -> FUT2, + t3 _ -> FUT3, + t4 _ -> FUT4, + t5 _ -> FUT5, + t6 _ -> FUT6, + t7 _ -> FUT7, + t8 _ -> FUT8, + t9 _ -> FUT9, + t10 _ -> FUT10, + t12 _ -> FUT12, + t13 _ -> FUT13, + t14 _ -> FUT14 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => tpe + }.getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => + !(t1(t1DecodePattern) + || t2(t1DecodePattern) + || t3(t1DecodePattern) + || t4(t1DecodePattern) + || t5(t1DecodePattern) + || t6(t1DecodePattern) + || t7(t1DecodePattern) + || t8(t1DecodePattern) + || t9(t1DecodePattern) + || t10(t1DecodePattern) + || t12(t1DecodePattern) + || t13(t1DecodePattern) + || t14(t1DecodePattern)) + ) + allMatched.contains(t1DecodePattern.instruction) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfmsac.vf", + "vfmsac.vv", + "vfsgnj.vf", + "vfsgnj.vv", + "vmfeq.vf", + "vmfeq.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t2(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfnmsac.vf", + "vfnmsac.vv", + "vfsgnjn.vf", + "vfsgnjn.vv", + "vmflt.vf", + "vmflt.vv" + ) + 
allMatched.contains(t1DecodePattern.instruction.name) + } + + def t3(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfnmacc.vf", + "vfnmacc.vv", + "vfsgnjx.vf", + "vfsgnjx.vv", + "vmfle.vf", + "vmfle.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t4(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfclass.v", + "vfmadd.vf", + "vfmadd.vv", + "vmfgt.vf" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t5(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfmsub.vf", + "vfmsub.vv", + "vmfge.vf" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t6(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfnmsub.vf", + "vfnmsub.vv", + "vfrec7.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t7(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfnmadd.vf", + "vfnmadd.vv", + "vfrsqrt7.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t8(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfadd.vf", + "vfadd.vv", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfmin.vf", + "vfmin.vv", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t9(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfcvt.xu.f.v", + "vfsub.vf", + "vfsub.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t10(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfcvt.x.f.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t12(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfmax.vf", + "vfmax.vv", + "vfredmax.vs" + ) + 
allMatched.contains(t1DecodePattern.instruction.name) + } + + def t13(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfcvt.rtz.xu.f.v", + "vfrsub.vf" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t14(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfcvt.rtz.x.f.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/fpExecutionType.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/fpExecutionType.scala new file mode 100644 index 00000000..1fe108a2 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/fpExecutionType.scala @@ -0,0 +1,46 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object FpExecutionType { + + trait Type extends Uop { + def apply(t1DecodePattern: T1DecodePattern): Boolean + } + + case object Compare extends Type { + def apply(t1DecodePattern: T1DecodePattern): Boolean = isFcompare.y(t1DecodePattern) + } + + case object Other extends Type { + def apply(t1DecodePattern: T1DecodePattern): Boolean = isFother.y(t1DecodePattern) + } + + case object MA extends Type { + def apply(t1DecodePattern: T1DecodePattern): Boolean = + !(isFcompare.y(t1DecodePattern) || isFother.y(t1DecodePattern)) + } + + case object Nil extends Type { + + def apply(t1DecodePattern: T1DecodePattern): Boolean = { + require(requirement = false, "unreachable") + false + } + + } + + def apply(t1DecodePattern: T1DecodePattern): Type = { + val tpe = Seq(Compare, Other, MA).filter(tpe => tpe(t1DecodePattern)) + require(tpe.size <= 1) + tpe.headOption.getOrElse(Nil) + } + +} + +case class FpExecutionType(value: FpExecutionType.Type) extends 
UopDecodeAttribute[FpExecutionType.Type] { + override val description: String = "float uop, goes to [[org.chipsalliance.t1.rtl.LaneFloatRequest.unitSelet]]" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isAdder.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isAdder.scala new file mode 100644 index 00000000..1a848b1f --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isAdder.scala @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isAdder { + + def apply(t1DecodePattern: T1DecodePattern): isAdder = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isAdder(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vaadd.vv", + "vaadd.vx", + "vaaddu.vv", + "vaaddu.vx", + "vadc.vim", + "vadc.vvm", + "vadc.vxm", + "vadd.vi", + "vadd.vv", + "vadd.vx", + "vasub.vv", + "vasub.vx", + "vasubu.vv", + "vasubu.vx", + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmax.vv", + "vmax.vx", + "vmaxu.vv", + "vmaxu.vx", + "vmin.vv", + "vmin.vx", + "vminu.vv", + "vminu.vx", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmseq.vi", + "vmseq.vv", + "vmseq.vx", + "vmsgt.vi", + "vmsgt.vx", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsle.vi", + "vmsle.vv", + "vmsle.vx", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmslt.vv", + "vmslt.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsne.vi", + "vmsne.vv", + "vmsne.vx", + "vredmax.vs", + "vredmaxu.vs", + "vredmin.vs", + "vredminu.vs", + "vredsum.vs", + "vrsub.vi", + "vrsub.vx", + "vsadd.vi", + "vsadd.vv", + "vsadd.vx", + "vsaddu.vi", + "vsaddu.vv", + "vsaddu.vx", + "vsbc.vvm", + "vsbc.vxm", + "vssub.vv", + "vssub.vx", + "vssubu.vv", 
+ "vssubu.vx", + "vsub.vv", + "vsub.vx", + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwredsum.vs", + "vwredsumu.vs", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isAdder(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "goes to [[org.chipsalliance.t1.rtl.LaneAdder]]." +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isAverage.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isAverage.scala new file mode 100644 index 00000000..23b1c3b6 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isAverage.scala @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isAverage { + + def apply(t1DecodePattern: T1DecodePattern): isAverage = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isAverage(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vaadd.vv", + "vaadd.vx", + "vaaddu.vv", + "vaaddu.vx", + "vasub.vv", + "vasub.vx", + "vasubu.vv", + "vasubu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || 
dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfirst.m", + "vfmv.f.s", + "vfmv.s.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfrec7.v", + "vfrsqrt7.v", + "vfsqrt.v", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vid.v", + "viota.m", + "vmsbf.m", + "vmsif.m", + "vmsof.m", + "vmv.s.x", + "vmv.x.s", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} + +case class isAverage(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "For adder, does it need to take care of saturate. 
TODO: add to uop " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCompress.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCompress.scala new file mode 100644 index 00000000..9c723142 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCompress.scala @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isCompress { + + def apply(t1DecodePattern: T1DecodePattern): isCompress = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isCompress(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcompress.vm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isCompress(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "lane will read data from vs2, send to Sequencer. then Sequencer will read vs1 for mask. use mask to compress data in vs2. and write to vd at last. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCrossread.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCrossread.scala new file mode 100644 index 00000000..7768adfc --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCrossread.scala @@ -0,0 +1,74 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isCrossread { + + def apply(t1DecodePattern: T1DecodePattern): isCrossread = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isCrossread(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfwadd.wf", + "vfwadd.wv", + "vfwsub.wf", + "vfwsub.wv", + "vnclip.wi", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx", + "vnsra.wi", + "vnsra.wv", + "vnsra.wx", + "vnsrl.wi", + "vnsrl.wv", + "vnsrl.wx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.wv", + "vwsubu.wx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isCrossread(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "Read vs2 or vd with cross read channel. 
crossRead -> narrow || firstWiden " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCrosswrite.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCrosswrite.scala new file mode 100644 index 00000000..ab957ddc --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isCrosswrite.scala @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isCrosswrite { + + def apply(t1DecodePattern: T1DecodePattern): isCrosswrite = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isCrosswrite(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx", + // rv_zvbb + "vwsll.vv", + "vwsll.vx", + "vwsll.vi" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfirst.m", + "vfmv.f.s", + "vfmv.s.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + 
"vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfrec7.v", + "vfrsqrt7.v", + "vfsqrt.v", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vid.v", + "viota.m", + "vmsbf.m", + "vmsif.m", + "vmsof.m", + "vmv.s.x", + "vmv.x.s", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} + +case class isCrosswrite(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "lane should write to another lane" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isDivider.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isDivider.scala new file mode 100644 index 00000000..1452d516 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isDivider.scala @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isDivider { + + def apply(t1DecodePattern: T1DecodePattern): isDivider = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isDivider(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vdiv.vv", + "vdiv.vx", + "vdivu.vv", + "vdivu.vx", + "vfdiv.vf", + "vfdiv.vv", + "vfrdiv.vf", + "vfsqrt.v", + "vrem.vv", + "vrem.vx", + "vremu.vv", + "vremu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) 
+ } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isDivider(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "goes to [[org.chipsalliance.t1.rtl.LaneDiv]] or [[org.chipsalliance.t1.rtl.LaneDivFP]]. if FP exist, all div goes to [[org.chipsalliance.t1.rtl.LaneDivFP]]" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isDontneedexecuteinlane.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isDontneedexecuteinlane.scala new file mode 100644 index 00000000..0d17e7fd --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isDontneedexecuteinlane.scala @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isDontneedexecuteinlane { + + def apply(t1DecodePattern: T1DecodePattern): isDontneedexecuteinlane = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isDontneedexecuteinlane(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfslide1down.vf", + "vfslide1up.vf", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx", + "vcompress.vm", + "viota.m", + "vl1re16.v", + "vl1re32.v", + "vl1re64.v", + "vl1re8.v", + "vl2re16.v", + "vl2re32.v", + "vl2re64.v", + "vl2re8.v", + "vl4re16.v", + "vl4re32.v", + "vl4re64.v", + "vl4re8.v", + "vl8re16.v", + "vl8re32.v", + "vl8re64.v", + "vl8re8.v", + "vle1024.v", + "vle1024ff.v", + "vle128.v", + "vle128ff.v", + "vle16.v", + "vle16ff.v", + "vle256.v", + "vle256ff.v", + "vle32.v", + "vle32ff.v", + "vle512.v", + "vle512ff.v", + "vle64.v", + "vle64ff.v", + "vle8.v", + "vle8ff.v", + "vlm.v", + "vloxei1024.v", + "vloxei128.v", + "vloxei16.v", + 
"vloxei256.v", + "vloxei32.v", + "vloxei512.v", + "vloxei64.v", + "vloxei8.v", + "vlse1024.v", + "vlse128.v", + "vlse16.v", + "vlse256.v", + "vlse32.v", + "vlse512.v", + "vlse64.v", + "vlse8.v", + "vluxei1024.v", + "vluxei128.v", + "vluxei16.v", + "vluxei256.v", + "vluxei32.v", + "vluxei512.v", + "vluxei64.v", + "vluxei8.v", + "vmv1r.v", + "vmv2r.v", + "vmv4r.v", + "vmv8r.v", + "vrgather.vv", + "vrgatherei16.vv", + "vs1r.v", + "vs2r.v", + "vs4r.v", + "vs8r.v", + "vse1024.v", + "vse128.v", + "vse16.v", + "vse256.v", + "vse32.v", + "vse512.v", + "vse64.v", + "vse8.v", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vsm.v", + "vsoxei1024.v", + "vsoxei128.v", + "vsoxei16.v", + "vsoxei256.v", + "vsoxei32.v", + "vsoxei512.v", + "vsoxei64.v", + "vsoxei8.v", + "vsse1024.v", + "vsse128.v", + "vsse16.v", + "vsse256.v", + "vsse32.v", + "vsse512.v", + "vsse64.v", + "vsse8.v", + "vsuxei1024.v", + "vsuxei128.v", + "vsuxei16.v", + "vsuxei256.v", + "vsuxei32.v", + "vsuxei512.v", + "vsuxei64.v", + "vsuxei8.v", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isDontneedexecuteinlane(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "It is no longer executed in the execution unit, but it may pass through the pipe (expected to pass through the pipe in the future)" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isExtend.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isExtend.scala new file mode 100644 index 00000000..ad126d1b --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isExtend.scala 
@@ -0,0 +1,41 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isExtend { + + def apply(t1DecodePattern: T1DecodePattern): isExtend = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isExtend(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isExtend(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "send element to MaskUnit at top, extend and broadcast to multiple Lanes." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFcompare.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFcompare.scala new file mode 100644 index 00000000..fdad714d --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFcompare.scala @@ -0,0 +1,51 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isFcompare { + + def apply(t1DecodePattern: T1DecodePattern): isFcompare = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isFcompare(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfmax.vf", + "vfmax.vv", + "vfmin.vf", + "vfmin.vv", + "vfredmax.vs", + "vfredmin.vs", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isFcompare(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "TODO: remove it, but remains attribute." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFfo.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFfo.scala new file mode 100644 index 00000000..93342100 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFfo.scala @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isFfo { + + def apply(t1DecodePattern: T1DecodePattern): isFfo = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isFfo(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfirst.m", + "vmsbf.m", + "vmsif.m", + "vmsof.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isFfo(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "find first one, lane will report if 1 is found, Sequencer should decide which is exactly the first 1 in lanes. after 1 is found, tell each lane, 1 has been found at which the corresponding location. lane will stall at stage2. 
TODO: should split into lane control uop " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFirstwiden.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFirstwiden.scala new file mode 100644 index 00000000..8ba3a76f --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFirstwiden.scala @@ -0,0 +1,71 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isFirstwiden { + + def apply(t1DecodePattern: T1DecodePattern): isFirstwiden = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isFirstwiden(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfwadd.wf", + "vfwadd.wv", + "vfwsub.wf", + "vfwsub.wv", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wv", + "vnclipu.wx", + "vnsra.wv", + "vnsra.wx", + "vnsrl.wv", + "vnsrl.wx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.wv", + "vwsubu.wx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isFirstwiden(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "There are two types of widen: - vd -> widen. 
- vs2, vd -> widen. This op will widen vs2. TODO: remove it as attribute." +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloat.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloat.scala new file mode 100644 index 00000000..47dbb423 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloat.scala @@ -0,0 +1,130 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isFloat { + + def apply(t1DecodePattern: T1DecodePattern): isFloat = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isFloat(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = + if (t1DecodePattern.param.fpuEnable) + Seq( + "vfadd.vf", + "vfadd.vv", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfmacc.vf", + "vfmacc.vv", + "vfmadd.vf", + "vfmadd.vv", + "vfmax.vf", + "vfmax.vv", + "vfmin.vf", + "vfmin.vv", + "vfmsac.vf", + "vfmsac.vv", + "vfmsub.vf", + "vfmsub.vv", + "vfmul.vf", + "vfmul.vv", + "vfmv.f.s", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfnmacc.vf", + "vfnmacc.vv", + "vfnmadd.vf", + "vfnmadd.vv", + "vfnmsac.vf", + "vfnmsac.vv", + "vfnmsub.vf", + "vfnmsub.vv", + "vfrec7.v", + "vfredmax.vs", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs", + "vfrsqrt7.v", + "vfrsub.vf", + "vfsgnj.vf", + "vfsgnj.vv", + "vfsgnjn.vf", + "vfsgnjn.vv", + "vfsgnjx.vf", + "vfsgnjx.vv", + "vfsub.vf", + "vfsub.vv", + "vfwadd.vf", + "vfwadd.vv", + "vfwadd.wf", + "vfwadd.wv", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + 
"vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vfwmacc.vf", + "vfwmacc.vv", + "vfwmsac.vf", + "vfwmsac.vv", + "vfwmul.vf", + "vfwmul.vv", + "vfwnmacc.vf", + "vfwnmacc.vv", + "vfwnmsac.vf", + "vfwnmsac.vv", + "vfwredosum.vs", + "vfwredusum.vs", + "vfwsub.vf", + "vfwsub.vv", + "vfwsub.wf", + "vfwsub.wv", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv" + ) + else Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isFloat(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "goes to [[org.chipsalliance.t1.rtl.LaneFloat]]." +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloatmul.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloatmul.scala new file mode 100644 index 00000000..03c3dd6c --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloatmul.scala @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isFloatmul { + + def apply(t1DecodePattern: T1DecodePattern): isFloatmul = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isFloatmul(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = + if (t1DecodePattern.param.fpuEnable) + Seq( + "vfmul.vf", + "vfmul.vv" + ) + else Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def 
n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isFloatmul(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "TODO: add op." +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloattype.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloattype.scala new file mode 100644 index 00000000..eabfca86 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFloattype.scala @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isFloattype { + + def apply(t1DecodePattern: T1DecodePattern): isFloattype = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isFloattype(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = + if (t1DecodePattern.param.fpuEnable) + Seq( + "vfadd.vf", + "vfadd.vv", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfdiv.vf", + "vfdiv.vv", + "vfmacc.vf", + "vfmacc.vv", + "vfmadd.vf", + "vfmadd.vv", + "vfmax.vf", + "vfmax.vv", + "vfmerge.vfm", + "vfmv.v.f", + "vfmin.vf", + "vfmin.vv", + "vfmsac.vf", + "vfmsac.vv", + "vfmsub.vf", + "vfmsub.vv", + "vfmul.vf", + "vfmul.vv", + "vfmv.f.s", + "vfmv.s.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfnmacc.vf", + "vfnmacc.vv", + "vfnmadd.vf", + "vfnmadd.vv", + 
"vfnmsac.vf", + "vfnmsac.vv", + "vfnmsub.vf", + "vfnmsub.vv", + "vfrdiv.vf", + "vfrec7.v", + "vfredmax.vs", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs", + "vfrsqrt7.v", + "vfrsub.vf", + "vfsgnj.vf", + "vfsgnj.vv", + "vfsgnjn.vf", + "vfsgnjn.vv", + "vfsgnjx.vf", + "vfsgnjx.vv", + "vfslide1down.vf", + "vfslide1up.vf", + "vfsqrt.v", + "vfsub.vf", + "vfsub.vv", + "vfwadd.vf", + "vfwadd.vv", + "vfwadd.wf", + "vfwadd.wv", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vfwmacc.vf", + "vfwmacc.vv", + "vfwmsac.vf", + "vfwmsac.vv", + "vfwmul.vf", + "vfwmul.vv", + "vfwnmacc.vf", + "vfwnmacc.vv", + "vfwnmsac.vf", + "vfwnmsac.vv", + "vfwredosum.vs", + "vfwredusum.vs", + "vfwsub.vf", + "vfwsub.vv", + "vfwsub.wf", + "vfwsub.wv", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv" + ) + else Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isFloattype(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "is a float type. TODO: remove it." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFma.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFma.scala new file mode 100644 index 00000000..373e7885 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFma.scala @@ -0,0 +1,60 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isFma { + + def apply(t1DecodePattern: T1DecodePattern): isFma = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isFma(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfadd.vf", + "vfadd.vv", + "vfmacc.vf", + "vfmacc.vv", + "vfmadd.vf", + "vfmadd.vv", + "vfmsac.vf", + "vfmsac.vv", + "vfmsub.vf", + "vfmsub.vv", + "vfmul.vf", + "vfmul.vv", + "vfnmacc.vf", + "vfnmacc.vv", + "vfnmadd.vf", + "vfnmadd.vv", + "vfnmsac.vf", + "vfnmsac.vv", + "vfnmsub.vf", + "vfnmsub.vv", + "vfredosum.vs", + "vfredusum.vs", + "vfrsub.vf", + "vfsub.vf", + "vfsub.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isFma(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "uop of FMA, goes to [[org.chipsalliance.t1.rtl.LaneFloat]] FMA unit." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFother.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFother.scala new file mode 100644 index 00000000..4f9db018 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isFother.scala @@ -0,0 +1,51 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isFother { + + def apply(t1DecodePattern: T1DecodePattern): isFother = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isFother(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfrec7.v", + "vfrsqrt7.v", + "vfsgnj.vf", + "vfsgnj.vv", + "vfsgnjn.vf", + "vfsgnjn.vv", + "vfsgnjx.vf", + "vfsgnjx.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isFother(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "designed for Other Unit in FP. goes to [[org.chipsalliance.t1.rtl.LaneFloat]] OtherUnit. TODO: perf it." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isGather.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isGather.scala new file mode 100644 index 00000000..a771d262 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isGather.scala @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isGather { + + def apply(t1DecodePattern: T1DecodePattern): isGather = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isGather(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vrgather.vi", + "vrgather.vv", + "vrgather.vx", + "vrgatherei16.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isGather(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "lane will read index from vs1, send to Sequencer. mask unit will calculate vrf address based on the vs1 from lane, and send read request to lanes, lanes should read it and send vs2 to Sequencer. Sequencer will write vd at last. address: 0 -> vlmax(sew decide address width.) 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isGather16.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isGather16.scala new file mode 100644 index 00000000..1867d225 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isGather16.scala @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isGather16 { + + def apply(t1DecodePattern: T1DecodePattern): isGather16 = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isGather16(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vrgatherei16.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isGather16(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "same with [[gather]] ignore sew, address width is fixed to 16. @note When SEW=8, vrgather.vv can only reference vector elements 0-255. The vrgatherei16 form can index 64K elements, and can also be used to reduce the register capacity needed to hold indices when SEW > 16. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isId.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isId.scala new file mode 100644 index 00000000..3c6e40f6 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isId.scala @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isId { + + def apply(t1DecodePattern: T1DecodePattern): isId = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isId(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vdiv.vv", + "vdiv.vx", + "vdivu.vv", + "vdivu.vx", + "vfdiv.vf", + "vfdiv.vv", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfrdiv.vf", + "vfslide1down.vf", + "vfslide1up.vf", + "vfsqrt.v", + "vfwadd.wf", + "vfwadd.wv", + "vfwsub.wf", + "vfwsub.wv", + "vid.v", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wv", + "vnclipu.wx", + "vnsra.wv", + "vnsra.wx", + "vnsrl.wv", + "vnsrl.wx", + "vrem.vv", + "vrem.vx", + "vremu.vv", + "vremu.vx", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwredsum.vs", + "vwredsumu.vs", + "vwsub.wv", + "vwsub.wx", + "vwsubu.wv", + "vwsubu.wx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + 
allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isId(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "write 0...vlmax to VRF. Lane other unit should handle it. TODO: remove it, it's a uop. " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isIndextype.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isIndextype.scala new file mode 100644 index 00000000..95ca2064 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isIndextype.scala @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isIndextype { + + def apply(t1DecodePattern: T1DecodePattern): isIndextype = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isIndextype(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vloxei1024.v", + "vloxei128.v", + "vloxei16.v", + "vloxei256.v", + "vloxei32.v", + "vloxei512.v", + "vloxei64.v", + "vloxei8.v", + "vluxei1024.v", + "vluxei128.v", + "vluxei16.v", + "vluxei256.v", + "vluxei32.v", + "vluxei512.v", + "vluxei64.v", + "vluxei8.v", + "vsoxei1024.v", + "vsoxei128.v", + "vsoxei16.v", + "vsoxei256.v", + "vsoxei32.v", + "vsoxei512.v", + "vsoxei64.v", + "vsoxei8.v", + "vsuxei1024.v", + "vsuxei128.v", + "vsuxei16.v", + "vsuxei256.v", + "vsuxei32.v", + "vsuxei512.v", + "vsuxei64.v", + "vsuxei8.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + 
def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isIndextype(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "TODO: remove it." +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isIota.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isIota.scala new file mode 100644 index 00000000..346a0ff7 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isIota.scala @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isIota { + + def apply(t1DecodePattern: T1DecodePattern): isIota = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isIota(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "viota.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isIota(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "lane will read vs2 from VRF, send to Top. 
Top read v0(at register) calculate the result and write back to VRFs " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isItype.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isItype.scala new file mode 100644 index 00000000..6d6c0b54 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isItype.scala @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isItype { + + def apply(t1DecodePattern: T1DecodePattern): isItype = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isItype(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vadc.vim", + "vadd.vi", + "vand.vi", + "vmadc.vi", + "vmadc.vim", + "vmerge.vim", + "vmv.v.i", + "vmseq.vi", + "vmsgt.vi", + "vmsgtu.vi", + "vmsle.vi", + "vmsleu.vi", + "vmsne.vi", + "vmv1r.v", + "vmv2r.v", + "vmv4r.v", + "vmv8r.v", + "vnclip.wi", + "vnclipu.wi", + "vnsra.wi", + "vnsrl.wi", + "vor.vi", + "vrgather.vi", + "vrsub.vi", + "vsadd.vi", + "vsaddu.vi", + "vslidedown.vi", + "vslideup.vi", + "vsll.vi", + "vsra.vi", + "vsrl.vi", + "vssra.vi", + "vssrl.vi", + "vxor.vi", + // rv_zvbb + "vror.vi", + "vwsll.vi" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isItype(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "src is imm." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isLogic.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isLogic.scala new file mode 100644 index 00000000..34d50375 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isLogic.scala @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isLogic { + + def apply(t1DecodePattern: T1DecodePattern): isLogic = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isLogic(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vand.vi", + "vand.vv", + "vand.vx", + "vmand.mm", + "vmandn.mm", + "vmnand.mm", + "vmnor.mm", + "vmor.mm", + "vmorn.mm", + "vmxnor.mm", + "vmxor.mm", + "vor.vi", + "vor.vv", + "vor.vx", + "vredand.vs", + "vredor.vs", + "vredxor.vs", + "vxor.vi", + "vxor.vv", + "vxor.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isLogic(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "Instruction should use [[org.chipsalliance.t1.rtl.decoder.TableGenerator.LaneDecodeTable.LogicUnit]]." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskPipeType.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskPipeType.scala new file mode 100644 index 00000000..60b7382d --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskPipeType.scala @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isMaskPipeType { + + def apply(t1DecodePattern: T1DecodePattern): isMaskPipeType = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isMaskPipeType(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val isExtend = Seq( + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + + val isCrossWrite = Seq( + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx", + // rv_zvbb + "vwsll.vv", + "vwsll.vx", + "vwsll.vi" + ) + + val isSlide = Seq( + "vfslide1down.vf", + "vfslide1up.vf", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx" + ) + + val isGather = Seq( + "vrgather.vv", + "vrgatherei16.vv" + ) + + val isReduce = Seq( + "vcpop.m", + "vfredmax.vs", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs", + "vfwredosum.vs", + "vfwredusum.vs", + "vredand.vs", + "vredmax.vs", + "vredmaxu.vs", + "vredmin.vs", + "vredminu.vs", + "vredor.vs", + "vredsum.vs", + 
"vredxor.vs", + "vwredsum.vs", + "vwredsumu.vs" + ) + + val allMatched = isExtend ++ isCrossWrite ++ isSlide ++ isGather ++ isReduce + + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isMaskPipeType(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskdestination.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskdestination.scala new file mode 100644 index 00000000..caa580d7 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskdestination.scala @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isMaskdestination { + + def apply(t1DecodePattern: T1DecodePattern): isMaskdestination = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isMaskdestination(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmseq.vi", + "vmseq.vv", + "vmseq.vx", + "vmsgt.vi", + "vmsgt.vx", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsle.vi", + "vmsle.vv", + "vmsle.vx", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmslt.vv", + 
"vmslt.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsne.vi", + "vmsne.vv", + "vmsne.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isMaskdestination(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "vd is mask format. execute at lane, send result to Sequencer, regroup it and write to vd. if datapath is unaligned, need to take care the tail. " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMasklogic.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMasklogic.scala new file mode 100644 index 00000000..fe8b5ef1 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMasklogic.scala @@ -0,0 +1,50 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isMasklogic { + + def apply(t1DecodePattern: T1DecodePattern): isMasklogic = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isMasklogic(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfirst.m", + "viota.m", + "vmand.mm", + "vmandn.mm", + "vmnand.mm", + "vmnor.mm", + "vmor.mm", + "vmorn.mm", + "vmsbf.m", + "vmsif.m", + "vmsof.m", + "vmxnor.mm", + "vmxor.mm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + 
allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isMasklogic(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "only one or two operators src is mask format(one element one bit). vl should align with src. if datapath is unaligned, need to take care the tail. " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMasksource.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMasksource.scala new file mode 100644 index 00000000..5da843cf --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMasksource.scala @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isMasksource { + + def apply(t1DecodePattern: T1DecodePattern): isMasksource = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isMasksource(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vadc.vim", + "vadc.vvm", + "vadc.vxm", + "vfmerge.vfm", + "vfmv.v.f", + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmerge.vim", + "vmv.v.i", + "vmerge.vvm", + "vmv.v.v", + "vmerge.vxm", + "vmv.v.x", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vsbc.vvm", + "vsbc.vxm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isMasksource(value: TriState) extends 
BooleanDecodeAttribute { + override val description: String = + "three ops. these ops don't use mask. use v0 as third op, read it from duplicate V0. it will read use mask(v0) in mask format as source. " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskunit.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskunit.scala new file mode 100644 index 00000000..70d0dd51 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMaskunit.scala @@ -0,0 +1,96 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isMaskunit { + + def apply(t1DecodePattern: T1DecodePattern): isMaskunit = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isMaskunit(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val mvType = Seq( + "vfmv.f.s", + "vfmv.s.f", + "vmv.s.x", + "vmv.x.s" + ) + val compress = Seq( + "vcompress.vm", + "viota.m" + ) + val maskDestination = Seq( + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmsbf.m", + "vmseq.vi", + "vmseq.vv", + "vmseq.vx", + "vmsgt.vi", + "vmsgt.vx", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsif.m", + "vmsle.vi", + "vmsle.vv", + "vmsle.vx", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmslt.vv", + "vmslt.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsne.vi", + "vmsne.vv", + "vmsne.vx", + "vmsof.m" + ) + val isFFO = Seq( + "vfirst.m", + "vmsbf.m", + "vmsif.m", + "vmsof.m" + ) + val allMatched = mvType ++ compress ++ maskDestination ++ isFFO + 
allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isMaskunit(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "mask unit -> red || compress || viota || ffo || slid || maskDestination || gather(v) || mv || popCount || extend all instruction in Sequencer mask unit. " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMulticycle.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMulticycle.scala new file mode 100644 index 00000000..a050fccc --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMulticycle.scala @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isMulticycle { + + def apply(t1DecodePattern: T1DecodePattern): isMulticycle = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isMulticycle(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vdiv.vv", + "vdiv.vx", + "vdivu.vv", + "vdivu.vx", + "vfadd.vf", + "vfadd.vv", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfdiv.vf", + "vfdiv.vv", + "vfmacc.vf", + "vfmacc.vv", + "vfmadd.vf", + "vfmadd.vv", + "vfmax.vf", + "vfmax.vv", + "vfmin.vf", + "vfmin.vv", + "vfmsac.vf", + "vfmsac.vv", + "vfmsub.vf", + "vfmsub.vv", + "vfmul.vf", + "vfmul.vv", + "vfmv.f.s", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", 
+ "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfnmacc.vf", + "vfnmacc.vv", + "vfnmadd.vf", + "vfnmadd.vv", + "vfnmsac.vf", + "vfnmsac.vv", + "vfnmsub.vf", + "vfnmsub.vv", + "vfrdiv.vf", + "vfrec7.v", + "vfredmax.vs", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs", + "vfrsqrt7.v", + "vfrsub.vf", + "vfsgnj.vf", + "vfsgnj.vv", + "vfsgnjn.vf", + "vfsgnjn.vv", + "vfsgnjx.vf", + "vfsgnjx.vv", + "vfsqrt.v", + "vfsub.vf", + "vfsub.vv", + "vfwadd.vf", + "vfwadd.vv", + "vfwadd.wf", + "vfwadd.wv", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vfwmacc.vf", + "vfwmacc.vv", + "vfwmsac.vf", + "vfwmsac.vv", + "vfwmul.vf", + "vfwmul.vv", + "vfwnmacc.vf", + "vfwnmacc.vv", + "vfwnmsac.vf", + "vfwnmsac.vv", + "vfwredosum.vs", + "vfwredusum.vs", + "vfwsub.vf", + "vfwsub.vv", + "vfwsub.wf", + "vfwsub.wv", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv", + "vrem.vv", + "vrem.vx", + "vremu.vv", + "vremu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isMulticycle(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "TODO: remove? 
only Div or customer" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMultiplier.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMultiplier.scala new file mode 100644 index 00000000..80d80466 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMultiplier.scala @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isMultiplier { + + def apply(t1DecodePattern: T1DecodePattern): isMultiplier = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isMultiplier(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vmacc.vv", + "vmacc.vx", + "vmadd.vv", + "vmadd.vx", + "vmul.vv", + "vmul.vx", + "vmulh.vv", + "vmulh.vx", + "vmulhsu.vv", + "vmulhsu.vx", + "vmulhu.vv", + "vmulhu.vx", + "vnmsac.vv", + "vnmsac.vx", + "vnmsub.vv", + "vnmsub.vx", + "vsmul.vv", + "vsmul.vx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isMultiplier(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "goes to [[org.chipsalliance.t1.rtl.LaneMul]]. 
(only apply to int mul)" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMv.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMv.scala new file mode 100644 index 00000000..ab1e5e36 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isMv.scala @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isMv { + + def apply(t1DecodePattern: T1DecodePattern): isMv = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isMv(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfmv.f.s", + "vfmv.s.f", + "vmv.s.x", + "vmv.x.s" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isMv(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "move instruction, v->v s->v x->v, single element move. 
TODO: split them into multiple op since datapath differs " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isNarrow.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isNarrow.scala new file mode 100644 index 00000000..9e98c73f --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isNarrow.scala @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isNarrow { + + def apply(t1DecodePattern: T1DecodePattern): isNarrow = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isNarrow(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vnclip.wi", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx", + "vnsra.wi", + "vnsra.wv", + "vnsra.wx", + "vnsrl.wi", + "vnsrl.wv", + "vnsrl.wx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isNarrow(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + " dual width of src will be converted to single width to dst. narrow can be the src of chain. as the dst of chain, it can only be fed with Load. TODO: remove it as attribute. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isNr.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isNr.scala new file mode 100644 index 00000000..1f7782f9 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isNr.scala @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isNr { + + def apply(t1DecodePattern: T1DecodePattern): isNr = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isNr(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vmv1r.v", + "vmv2r.v", + "vmv4r.v", + "vmv8r.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isNr(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "for vmvnr, move vreg group to another vreg group. it will ignore lmul, use from instr. 
chainable" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isOrderreduce.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isOrderreduce.scala new file mode 100644 index 00000000..26e24f72 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isOrderreduce.scala @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isOrderreduce { + + def apply(t1DecodePattern: T1DecodePattern): isOrderreduce = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isOrderreduce(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfredosum.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isOrderreduce(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "don't use it, it's slow, lane read all elements from VRF, send to Top." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isOther.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isOther.scala new file mode 100644 index 00000000..60759f93 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isOther.scala @@ -0,0 +1,62 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isOther { + + def apply(t1DecodePattern: T1DecodePattern): isOther = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isOther(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfirst.m", + "vfmerge.vfm", + "vfmv.v.f", + "vfmv.s.f", + "vid.v", + "vmerge.vim", + "vmv.v.i", + "vmerge.vvm", + "vmv.v.v", + "vmerge.vxm", + "vmv.v.x", + "vmsbf.m", + "vmsif.m", + "vmsof.m", + "vmv.s.x", + "vmv.x.s", + "vnclip.wi", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx", + "vrgather.vi", + "vrgather.vv", + "vrgather.vx", + "vrgatherei16.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isOther(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "goes to [[org.chipsalliance.t1.rtl.OtherUnit]]" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isPopcount.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isPopcount.scala new file mode 100644 index 00000000..35989575 --- /dev/null +++ 
b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isPopcount.scala @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isPopcount { + + def apply(t1DecodePattern: T1DecodePattern): isPopcount = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isPopcount(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isPopcount(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + " count how many 1s in VS2. lane will use [[org.chipsalliance.t1.rtl.OtherUnit]] to count how many 1s locally; use reduce datapath to accumulate, send total result to top; top will send result to vd. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isReadonly.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isReadonly.scala new file mode 100644 index 00000000..f4ed3059 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isReadonly.scala @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isReadonly { + + def apply(t1DecodePattern: T1DecodePattern): isReadonly = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isReadonly(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcompress.vm", + "viota.m", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isReadonly(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "lane read only instructions. for these instructions lane will only read vrf and send data back to Sequencer. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isRed.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isRed.scala new file mode 100644 index 00000000..2d6f293e --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isRed.scala @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isRed { + + def apply(t1DecodePattern: T1DecodePattern): isRed = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isRed(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfredmax.vs", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs", + "vfwredosum.vs", + "vfwredusum.vs", + "vredand.vs", + "vredmax.vs", + "vredmaxu.vs", + "vredmin.vs", + "vredminu.vs", + "vredor.vs", + "vredsum.vs", + "vredxor.vs", + "vwredsum.vs", + "vwredsumu.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isRed(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "do reduce in each lane. each element will sequentially executed in each lanes. after finishing, pop it to Top, and use ALU at top to get the final result and send to element0 TODO: better name. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isReverse.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isReverse.scala new file mode 100644 index 00000000..a7825d39 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isReverse.scala @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isReverse { + + def apply(t1DecodePattern: T1DecodePattern): isReverse = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isReverse(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vrsub.vi", + "vrsub.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isReverse(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "only instruction will switch src. TODO: send it to uop. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSaturate.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSaturate.scala new file mode 100644 index 00000000..be1dd104 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSaturate.scala @@ -0,0 +1,100 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isSaturate { + + def apply(t1DecodePattern: T1DecodePattern): isSaturate = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isSaturate(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vsadd.vi", + "vsadd.vv", + "vsadd.vx", + "vsaddu.vi", + "vsaddu.vv", + "vsaddu.vx", + "vsmul.vv", + "vsmul.vx", + "vssra.vi", + "vssra.vv", + "vssra.vx", + "vssrl.vi", + "vssrl.vv", + "vssrl.vx", + "vssub.vv", + "vssub.vx", + "vssubu.vv", + "vssubu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfirst.m", + "vfmv.f.s", + "vfmv.s.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfrec7.v", + "vfrsqrt7.v", + "vfsqrt.v", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + 
"vfwcvt.xu.f.v", + "vid.v", + "viota.m", + "vmsbf.m", + "vmsif.m", + "vmsof.m", + "vmv.s.x", + "vmv.x.s", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} + +case class isSaturate(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "For adder, does it need to take care of saturate. TODO: add to uop " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isScheduler.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isScheduler.scala new file mode 100644 index 00000000..39f3dbf5 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isScheduler.scala @@ -0,0 +1,279 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isScheduler { + + def apply(t1DecodePattern: T1DecodePattern): isScheduler = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isScheduler(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vaadd.vv", + "vaadd.vx", + "vaaddu.vv", + "vaaddu.vx", + "vadc.vim", + "vadc.vvm", + "vadc.vxm", + "vadd.vi", + "vadd.vv", + "vadd.vx", + "vand.vi", + "vand.vv", + "vand.vx", + "vasub.vv", + "vasub.vx", + "vasubu.vv", + "vasubu.vx", + "vdiv.vv", + "vdiv.vx", + "vdivu.vv", + "vdivu.vx", + "vfadd.vf", + "vfadd.vv", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfdiv.vf", + "vfdiv.vv", + "vfmacc.vf", + "vfmacc.vv", + "vfmadd.vf", + "vfmadd.vv", + "vfmax.vf", + "vfmax.vv", + "vfmerge.vfm", + "vfmv.v.f", + "vfmin.vf", + "vfmin.vv", + "vfmsac.vf", + "vfmsac.vv", + "vfmsub.vf", + "vfmsub.vv", + 
"vfmul.vf", + "vfmul.vv", + "vfmv.f.s", + "vfmv.s.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfnmacc.vf", + "vfnmacc.vv", + "vfnmadd.vf", + "vfnmadd.vv", + "vfnmsac.vf", + "vfnmsac.vv", + "vfnmsub.vf", + "vfnmsub.vv", + "vfrdiv.vf", + "vfrec7.v", + "vfrsqrt7.v", + "vfrsub.vf", + "vfsgnj.vf", + "vfsgnj.vv", + "vfsgnjn.vf", + "vfsgnjn.vv", + "vfsgnjx.vf", + "vfsgnjx.vv", + "vfslide1down.vf", + "vfslide1up.vf", + "vfsqrt.v", + "vfsub.vf", + "vfsub.vv", + "vfwadd.vf", + "vfwadd.vv", + "vfwadd.wf", + "vfwadd.wv", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vfwmacc.vf", + "vfwmacc.vv", + "vfwmsac.vf", + "vfwmsac.vv", + "vfwmul.vf", + "vfwmul.vv", + "vfwnmacc.vf", + "vfwnmacc.vv", + "vfwnmsac.vf", + "vfwnmsac.vv", + "vfwsub.vf", + "vfwsub.vv", + "vfwsub.wf", + "vfwsub.wv", + "vid.v", + "vmacc.vv", + "vmacc.vx", + "vmadd.vv", + "vmadd.vx", + "vmand.mm", + "vmandn.mm", + "vmax.vv", + "vmax.vx", + "vmaxu.vv", + "vmaxu.vx", + "vmerge.vim", + "vmv.v.i", + "vmerge.vvm", + "vmv.v.v", + "vmerge.vxm", + "vmv.v.x", + "vmin.vv", + "vmin.vx", + "vminu.vv", + "vminu.vx", + "vmnand.mm", + "vmnor.mm", + "vmor.mm", + "vmorn.mm", + "vmul.vv", + "vmul.vx", + "vmulh.vv", + "vmulh.vx", + "vmulhsu.vv", + "vmulhsu.vx", + "vmulhu.vv", + "vmulhu.vx", + "vmv.s.x", + "vmv.x.s", + "vmv1r.v", + "vmv2r.v", + "vmv4r.v", + "vmv8r.v", + "vmxnor.mm", + "vmxor.mm", + "vnclip.wi", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx", + "vnmsac.vv", + "vnmsac.vx", + "vnmsub.vv", + "vnmsub.vx", + "vnsra.wi", + "vnsra.wv", + "vnsra.wx", + "vnsrl.wi", + "vnsrl.wv", + "vnsrl.wx", + "vor.vi", + "vor.vv", + "vor.vx", + "vrem.vv", + "vrem.vx", + "vremu.vv", + "vremu.vx", + "vrgather.vi", + "vrgather.vx", + "vrsub.vi", + "vrsub.vx", + "vsadd.vi", + "vsadd.vv", + "vsadd.vx", + 
"vsaddu.vi", + "vsaddu.vv", + "vsaddu.vx", + "vsbc.vvm", + "vsbc.vxm", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx", + "vsll.vi", + "vsll.vv", + "vsll.vx", + "vsmul.vv", + "vsmul.vx", + "vsra.vi", + "vsra.vv", + "vsra.vx", + "vsrl.vi", + "vsrl.vv", + "vsrl.vx", + "vssra.vi", + "vssra.vv", + "vssra.vx", + "vssrl.vi", + "vssrl.vv", + "vssrl.vx", + "vssub.vv", + "vssub.vx", + "vssubu.vv", + "vssubu.vx", + "vsub.vv", + "vsub.vx", + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx", + "vxor.vi", + "vxor.vv", + "vxor.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isScheduler(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "lane will send request to Sequencer and wait ack from Sequencer. 
Instructions that will communicate with T1 top module. " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isShift.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isShift.scala new file mode 100644 index 00000000..1b8122a5 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isShift.scala @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isShift { + + def apply(t1DecodePattern: T1DecodePattern): isShift = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isShift(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vnsra.wi", + "vnsra.wv", + "vnsra.wx", + "vnsrl.wi", + "vnsrl.wv", + "vnsrl.wx", + "vsll.vi", + "vsll.vv", + "vsll.vx", + "vsra.vi", + "vsra.vv", + "vsra.vx", + "vsrl.vi", + "vsrl.vv", + "vsrl.vx", + "vssra.vi", + "vssra.vv", + "vssra.vx", + "vssrl.vi", + "vssrl.vv", + "vssrl.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isShift(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "goes to [[org.chipsalliance.t1.rtl.LaneShifter]]."
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSlid.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSlid.scala new file mode 100644 index 00000000..ecf2f4a7 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSlid.scala @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isSlid { + + def apply(t1DecodePattern: T1DecodePattern): isSlid = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isSlid(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfslide1down.vf", + "vfslide1up.vf", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isSlid(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "used in Sequencer mask unit, decide which vrf should be read. send read request to corresponding lane, lane will respond data to Sequencer. Sequencer will write data to VD. the mask unit works as the router here. TODO: optimize mask unit. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSpecial.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSpecial.scala new file mode 100644 index 00000000..dfa9045e --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSpecial.scala @@ -0,0 +1,151 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isSpecial { + + def apply(t1DecodePattern: T1DecodePattern): isSpecial = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isSpecial(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcompress.vm", + "vcpop.m", + "vfirst.m", + "vfmv.f.s", + "vfmv.s.f", + "vfredmax.vs", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs", + "vfslide1down.vf", + "vfslide1up.vf", + "vfwredosum.vs", + "vfwredusum.vs", + "viota.m", + "vloxei1024.v", + "vloxei128.v", + "vloxei16.v", + "vloxei256.v", + "vloxei32.v", + "vloxei512.v", + "vloxei64.v", + "vloxei8.v", + "vluxei1024.v", + "vluxei128.v", + "vluxei16.v", + "vluxei256.v", + "vluxei32.v", + "vluxei512.v", + "vluxei64.v", + "vluxei8.v", + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmsbf.m", + "vmseq.vi", + "vmseq.vv", + "vmseq.vx", + "vmsgt.vi", + "vmsgt.vx", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsif.m", + "vmsle.vi", + "vmsle.vv", + "vmsle.vx", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmslt.vv", + "vmslt.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsne.vi", + "vmsne.vv", + "vmsne.vx", + "vmsof.m", + "vmv.s.x", + "vmv.x.s", + 
"vredand.vs", + "vredmax.vs", + "vredmaxu.vs", + "vredmin.vs", + "vredminu.vs", + "vredor.vs", + "vredsum.vs", + "vredxor.vs", + "vrgather.vv", + "vrgatherei16.vv", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx", + "vsoxei1024.v", + "vsoxei128.v", + "vsoxei16.v", + "vsoxei256.v", + "vsoxei32.v", + "vsoxei512.v", + "vsoxei64.v", + "vsoxei8.v", + "vsuxei1024.v", + "vsuxei128.v", + "vsuxei16.v", + "vsuxei256.v", + "vsuxei32.v", + "vsuxei512.v", + "vsuxei64.v", + "vsuxei8.v", + "vwredsum.vs", + "vwredsumu.vs", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isSpecial(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "if Sequencer is the router for data from Lane to LSU or Sequencer mask unit. 
special -> maskUnit || index type load store " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSpecialslot.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSpecialslot.scala new file mode 100644 index 00000000..54516774 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSpecialslot.scala @@ -0,0 +1,155 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isSpecialslot { + + def apply(t1DecodePattern: T1DecodePattern): isSpecialslot = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isSpecialslot(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vadc.vim", + "vadc.vvm", + "vadc.vxm", + "vcpop.m", + "vfirst.m", + "vfmerge.vfm", + "vfmv.v.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfwadd.wf", + "vfwadd.wv", + "vfwsub.wf", + "vfwsub.wv", + "viota.m", + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmand.mm", + "vmandn.mm", + "vmerge.vim", + "vmv.v.i", + "vmerge.vvm", + "vmv.v.v", + "vmerge.vxm", + "vmv.v.x", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv", + "vmnand.mm", + "vmnor.mm", + "vmor.mm", + "vmorn.mm", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmsbf.m", + "vmseq.vi", + "vmseq.vv", + "vmseq.vx", + "vmsgt.vi", + "vmsgt.vx", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsif.m", + "vmsle.vi", + "vmsle.vv", + "vmsle.vx", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmslt.vv", + "vmslt.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsne.vi", + 
"vmsne.vv", + "vmsne.vx", + "vmsof.m", + "vmxnor.mm", + "vmxor.mm", + "vnclip.wi", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx", + "vnsra.wi", + "vnsra.wv", + "vnsra.wx", + "vnsrl.wi", + "vnsrl.wv", + "vnsrl.wx", + "vsbc.vvm", + "vsbc.vxm", + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isSpecialslot(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "special instructions schedule to slot0." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSreadvd.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSreadvd.scala new file mode 100644 index 00000000..741d2f78 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSreadvd.scala @@ -0,0 +1,312 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isSreadvd { + + def apply(t1DecodePattern: T1DecodePattern): isSreadvd = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isSreadvd(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vaadd.vv", + "vaadd.vx", + "vaaddu.vv", + "vaaddu.vx", + "vadc.vim", + "vadc.vvm", + "vadc.vxm", + "vadd.vi", + "vadd.vv", + "vadd.vx", + "vand.vi", + "vand.vv", + "vand.vx", + "vasub.vv", + "vasub.vx", + "vasubu.vv", + "vasubu.vx", + "vcompress.vm", + "vdiv.vv", + "vdiv.vx", + "vdivu.vv", + "vdivu.vx", + "vfadd.vf", + "vfadd.vv", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfdiv.vf", + "vfdiv.vv", + "vfmax.vf", + "vfmax.vv", + "vfmerge.vfm", + "vfmv.v.f", + "vfmin.vf", + "vfmin.vv", + "vfmul.vf", + "vfmul.vv", + "vfmv.f.s", + "vfmv.s.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfrdiv.vf", + "vfrec7.v", + "vfredmax.vs", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs", + "vfrsqrt7.v", + "vfrsub.vf", + "vfsgnj.vf", + "vfsgnj.vv", + "vfsgnjn.vf", + "vfsgnjn.vv", + "vfsgnjx.vf", + "vfsgnjx.vv", + "vfslide1down.vf", + "vfslide1up.vf", + "vfsqrt.v", + "vfsub.vf", + "vfsub.vv", + "vfwadd.vf", + "vfwadd.vv", + "vfwadd.wf", + 
"vfwadd.wv", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vfwmacc.vf", + "vfwmacc.vv", + "vfwmsac.vf", + "vfwmsac.vv", + "vfwmul.vf", + "vfwmul.vv", + "vfwnmacc.vf", + "vfwnmacc.vv", + "vfwnmsac.vf", + "vfwnmsac.vv", + "vfwredosum.vs", + "vfwredusum.vs", + "vfwsub.vf", + "vfwsub.vv", + "vfwsub.wf", + "vfwsub.wv", + "vid.v", + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmax.vv", + "vmax.vx", + "vmaxu.vv", + "vmaxu.vx", + "vmerge.vim", + "vmv.v.i", + "vmerge.vvm", + "vmv.v.v", + "vmerge.vxm", + "vmv.v.x", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv", + "vmin.vv", + "vmin.vx", + "vminu.vv", + "vminu.vx", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmseq.vi", + "vmseq.vv", + "vmseq.vx", + "vmsgt.vi", + "vmsgt.vx", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsle.vi", + "vmsle.vv", + "vmsle.vx", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmslt.vv", + "vmslt.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsne.vi", + "vmsne.vv", + "vmsne.vx", + "vmul.vv", + "vmul.vx", + "vmulh.vv", + "vmulh.vx", + "vmulhsu.vv", + "vmulhsu.vx", + "vmulhu.vv", + "vmulhu.vx", + "vmv.s.x", + "vmv.x.s", + "vmv1r.v", + "vmv2r.v", + "vmv4r.v", + "vmv8r.v", + "vnclip.wi", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx", + "vnsra.wi", + "vnsra.wv", + "vnsra.wx", + "vnsrl.wi", + "vnsrl.wv", + "vnsrl.wx", + "vor.vi", + "vor.vv", + "vor.vx", + "vredand.vs", + "vredmax.vs", + "vredmaxu.vs", + "vredmin.vs", + "vredminu.vs", + "vredor.vs", + "vredsum.vs", + "vredxor.vs", + "vrem.vv", + "vrem.vx", + "vremu.vv", + "vremu.vx", + "vrgather.vi", + "vrgather.vv", + "vrgather.vx", + "vrgatherei16.vv", + "vrsub.vi", + "vrsub.vx", + "vsadd.vi", + "vsadd.vv", + "vsadd.vx", + "vsaddu.vi", + "vsaddu.vv", + "vsaddu.vx", + "vsbc.vvm", + "vsbc.vxm", + 
"vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx", + "vsll.vi", + "vsll.vv", + "vsll.vx", + "vsmul.vv", + "vsmul.vx", + "vsra.vi", + "vsra.vv", + "vsra.vx", + "vsrl.vi", + "vsrl.vv", + "vsrl.vx", + "vssra.vi", + "vssra.vv", + "vssra.vx", + "vssrl.vi", + "vssrl.vv", + "vssrl.vx", + "vssub.vv", + "vssub.vx", + "vssubu.vv", + "vssubu.vx", + "vsub.vv", + "vsub.vx", + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwredsum.vs", + "vwredsumu.vs", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx", + "vxor.vi", + "vxor.vv", + "vxor.vx", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isSreadvd(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "sReadVD -> !(ma || maskLogic): instructions that need to read vd as the operator. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSwrite.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSwrite.scala new file mode 100644 index 00000000..49243c44 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isSwrite.scala @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isSwrite { + + def apply(t1DecodePattern: T1DecodePattern): isSwrite = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isSwrite(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfirst.m", + "vfmv.f.s", + "vl1re16.v", + "vl1re32.v", + "vl1re64.v", + "vl1re8.v", + "vl2re16.v", + "vl2re32.v", + "vl2re64.v", + "vl2re8.v", + "vl4re16.v", + "vl4re32.v", + "vl4re64.v", + "vl4re8.v", + "vl8re16.v", + "vl8re32.v", + "vl8re64.v", + "vl8re8.v", + "vle1024.v", + "vle1024ff.v", + "vle128.v", + "vle128ff.v", + "vle16.v", + "vle16ff.v", + "vle256.v", + "vle256ff.v", + "vle32.v", + "vle32ff.v", + "vle512.v", + "vle512ff.v", + "vle64.v", + "vle64ff.v", + "vle8.v", + "vle8ff.v", + "vlm.v", + "vloxei1024.v", + "vloxei128.v", + "vloxei16.v", + "vloxei256.v", + "vloxei32.v", + "vloxei512.v", + "vloxei64.v", + "vloxei8.v", + "vlse1024.v", + "vlse128.v", + "vlse16.v", + "vlse256.v", + "vlse32.v", + "vlse512.v", + "vlse64.v", + "vlse8.v", + "vluxei1024.v", + "vluxei128.v", + "vluxei16.v", + "vluxei256.v", + "vluxei32.v", + "vluxei512.v", + "vluxei64.v", + "vluxei8.v", + "vmv.x.s", + "vs1r.v", + "vs2r.v", + "vs4r.v", + "vs8r.v", + "vse1024.v", + "vse128.v", + "vse16.v", + "vse256.v", + "vse32.v", + "vse512.v", + "vse64.v", + "vse8.v", + "vsm.v", + "vsoxei1024.v", + "vsoxei128.v", + "vsoxei16.v", + "vsoxei256.v", + 
"vsoxei32.v", + "vsoxei512.v", + "vsoxei64.v", + "vsoxei8.v", + "vsse1024.v", + "vsse128.v", + "vsse16.v", + "vsse256.v", + "vsse32.v", + "vsse512.v", + "vsse64.v", + "vsse8.v", + "vsuxei1024.v", + "vsuxei128.v", + "vsuxei16.v", + "vsuxei256.v", + "vsuxei32.v", + "vsuxei512.v", + "vsuxei64.v", + "vsuxei8.v", + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwredsum.vs", + "vwredsumu.vs", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx", + // rv_zvbb + "vwsll.vv", + "vwsll.vx", + "vwsll.vi" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isSwrite(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "sWrite -> targetRd || readOnly || crossWrite || maskDestination || reduce || loadStore instruction will write vd or rd(scalar) from outside of lane. It will request vrf wait, and lane will not write. No write to vd when isSwrite is True!!!" 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isTargetrd.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isTargetrd.scala new file mode 100644 index 00000000..29dde00e --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isTargetrd.scala @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isTargetrd { + + def apply(t1DecodePattern: T1DecodePattern): isTargetrd = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isTargetrd(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfirst.m", + "vfmv.f.s", + "vmv.x.s" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isTargetrd(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "write rd/fd at scalar core." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnorderwrite.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnorderwrite.scala new file mode 100644 index 00000000..bbf09061 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnorderwrite.scala @@ -0,0 +1,50 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isUnorderwrite { + + def apply(t1DecodePattern: T1DecodePattern): isUnorderwrite = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isUnorderwrite(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vfmv.f.s", + "vfmv.s.f", + "vfredosum.vs", + "vfslide1down.vf", + "vfslide1up.vf", + "viota.m", + "vmv.s.x", + "vmv.x.s", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isUnorderwrite(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "unmanaged write for VRF. these instructions cannot be chained as source. TODO: add an attribute marking that these instructions cannot be the source of chaining. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnsigned0.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnsigned0.scala new file mode 100644 index 00000000..dda6ce36 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnsigned0.scala @@ -0,0 +1,173 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isUnsigned0 { + + def apply(t1DecodePattern: T1DecodePattern): isUnsigned0 = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isUnsigned0(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vaaddu.vv", + "vaaddu.vx", + "vasubu.vv", + "vasubu.vx", + "vcpop.m", + "vdivu.vv", + "vdivu.vx", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfirst.m", + "vfmv.f.s", + "vfmv.s.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfrec7.v", + "vfrsqrt7.v", + "vfsqrt.v", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vid.v", + "viota.m", + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmaxu.vv", + "vmaxu.vx", + "vminu.vv", + "vminu.vx", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmsbf.m", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsif.m", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsof.m", + "vmulhsu.vv", + "vmulhsu.vx", + "vmulhu.vv", + "vmulhu.vx", + "vmv.s.x", + "vmv.x.s", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx", + "vnsrl.wi", + 
"vnsrl.wv", + "vnsrl.wx", + "vredmaxu.vs", + "vredminu.vs", + "vremu.vv", + "vremu.vx", + "vsaddu.vi", + "vsaddu.vv", + "vsaddu.vx", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vsll.vi", + "vsll.vv", + "vsll.vx", + "vsrl.vi", + "vsrl.vv", + "vsrl.vx", + "vssrl.vi", + "vssrl.vv", + "vssrl.vx", + "vssubu.vv", + "vssubu.vx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwredsumu.vs", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8", + // rv_zvbb + "vandn.vv", + "vandn.vx", + "vbrev.v", + "vbrev8.v", + "vrev8.v", + "vclz.v", + "vctz.v", + "vcpop.v", + "vrol.vv", + "vrol.vx", + "vror.vv", + "vror.vx", + "vror.vi", + "vwsll.vv", + "vwsll.vx", + "vwsll.vi", + "vfslide1down.vf", + "vfslide1up.vf", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isUnsigned0(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "is src0 unsigned? used everywhere in Lane and VFU. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnsigned1.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnsigned1.scala new file mode 100644 index 00000000..283aa5e1 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isUnsigned1.scala @@ -0,0 +1,134 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isUnsigned1 { + + def apply(t1DecodePattern: T1DecodePattern): isUnsigned1 = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isUnsigned1(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vaaddu.vv", + "vaaddu.vx", + "vasubu.vv", + "vasubu.vx", + "vcpop.m", + "vdivu.vv", + "vdivu.vx", + "vfcvt.f.xu.v", + "vfcvt.rtz.xu.f.v", + "vfirst.m", + "vid.v", + "viota.m", + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmaxu.vv", + "vmaxu.vx", + "vminu.vv", + "vminu.vx", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmsbf.m", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsif.m", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsof.m", + "vmulhu.vv", + "vmulhu.vx", + "vmv.s.x", + "vmv.x.s", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx", + "vnsrl.wi", + "vnsrl.wv", + "vnsrl.wx", + "vredmaxu.vs", + "vredminu.vs", + "vremu.vv", + "vremu.vx", + "vsaddu.vi", + "vsaddu.vv", + "vsaddu.vx", + "vsll.vi", + "vsll.vv", + "vsll.vx", + "vsrl.vi", + "vsrl.vv", + "vsrl.vx", + "vssrl.vi", + "vssrl.vv", + "vssrl.vx", + "vssubu.vv", + "vssubu.vx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwredsumu.vs", + "vwsubu.vv", + "vwsubu.vx", + 
"vwsubu.wv", + "vwsubu.wx", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8", + // rv_zvbb + "vandn.vv", + "vandn.vx", + "vbrev.v", + "vbrev8.v", + "vrev8.v", + "vclz.v", + "vctz.v", + "vcpop.v", + "vrol.vv", + "vrol.vx", + "vror.vv", + "vror.vx", + "vror.vi", + "vwsll.vv", + "vwsll.vx", + "vwsll.vi" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isUnsigned1(value: TriState) extends BooleanDecodeAttribute { + override val description: String = " is src1 unsigned? used everywhere in Lane and VFU. " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVector.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVector.scala new file mode 100644 index 00000000..4d7c5fe6 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVector.scala @@ -0,0 +1,34 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isVector { + + def apply(t1DecodePattern: T1DecodePattern): isVector = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isVector(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => i.instructionSet.name == "rv_v") + allMatched.contains(t1DecodePattern.instruction) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + 
allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isVector(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "This instruction should be decoded by T1." +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVtype.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVtype.scala new file mode 100644 index 00000000..89d4988f --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVtype.scala @@ -0,0 +1,204 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isVtype { + + def apply(t1DecodePattern: T1DecodePattern): isVtype = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isVtype(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vaadd.vv", + "vaaddu.vv", + "vadc.vvm", + "vadd.vv", + "vand.vv", + "vasub.vv", + "vasubu.vv", + "vcompress.vm", + "vcpop.m", + "vdiv.vv", + "vdivu.vv", + "vfadd.vv", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfdiv.vv", + "vfirst.m", + "vfmacc.vv", + "vfmadd.vv", + "vfmax.vv", + "vfmin.vv", + "vfmsac.vv", + "vfmsub.vv", + "vfmul.vv", + "vfmv.f.s", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfnmacc.vv", + "vfnmadd.vv", + "vfnmsac.vv", + "vfnmsub.vv", + "vfrec7.v", + "vfredmax.vs", + "vfredmin.vs", + "vfredosum.vs", + "vfredusum.vs", + "vfrsqrt7.v", + "vfsgnj.vv", + "vfsgnjn.vv", + "vfsgnjx.vv", + "vfsqrt.v", + "vfsub.vv", + "vfwadd.vv", + "vfwadd.wv", + "vfwcvt.f.f.v", +
"vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vfwmacc.vv", + "vfwmsac.vv", + "vfwmul.vv", + "vfwnmacc.vv", + "vfwnmsac.vv", + "vfwredosum.vs", + "vfwredusum.vs", + "vfwsub.vv", + "vfwsub.wv", + "vid.v", + "viota.m", + "vmacc.vv", + "vmadc.vv", + "vmadc.vvm", + "vmadd.vv", + "vmand.mm", + "vmandn.mm", + "vmax.vv", + "vmaxu.vv", + "vmerge.vvm", + "vmv.v.v", + "vmfeq.vv", + "vmfle.vv", + "vmflt.vv", + "vmfne.vv", + "vmin.vv", + "vminu.vv", + "vmnand.mm", + "vmnor.mm", + "vmor.mm", + "vmorn.mm", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbf.m", + "vmseq.vv", + "vmsif.m", + "vmsle.vv", + "vmsleu.vv", + "vmslt.vv", + "vmsltu.vv", + "vmsne.vv", + "vmsof.m", + "vmul.vv", + "vmulh.vv", + "vmulhsu.vv", + "vmulhu.vv", + "vmv.x.s", + "vmxnor.mm", + "vmxor.mm", + "vnclip.wv", + "vnclipu.wv", + "vnmsac.vv", + "vnmsub.vv", + "vnsra.wv", + "vnsrl.wv", + "vor.vv", + "vredand.vs", + "vredmax.vs", + "vredmaxu.vs", + "vredmin.vs", + "vredminu.vs", + "vredor.vs", + "vredsum.vs", + "vredxor.vs", + "vrem.vv", + "vremu.vv", + "vrgather.vv", + "vrgatherei16.vv", + "vsadd.vv", + "vsaddu.vv", + "vsbc.vvm", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vsll.vv", + "vsmul.vv", + "vsra.vv", + "vsrl.vv", + "vssra.vv", + "vssrl.vv", + "vssub.vv", + "vssubu.vv", + "vsub.vv", + "vwadd.vv", + "vwadd.wv", + "vwaddu.vv", + "vwaddu.wv", + "vwmacc.vv", + "vwmaccsu.vv", + "vwmaccu.vv", + "vwmul.vv", + "vwmulsu.vv", + "vwmulu.vv", + "vwredsum.vs", + "vwredsumu.vs", + "vwsub.vv", + "vwsub.wv", + "vwsubu.vv", + "vwsubu.wv", + "vxor.vv", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8", + // rv_zvbb + "vandn.vv", + "vrol.vv", + "vror.vv", + "vwsll.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def 
dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isVtype(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "src1 is vtype." +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVwmacc.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVwmacc.scala new file mode 100644 index 00000000..fb3d1a77 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isVwmacc.scala @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isVwmacc { + + def apply(t1DecodePattern: T1DecodePattern): isVwmacc = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isVwmacc(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isVwmacc(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "special MAC instruction: the MAC uses vd as a source and cross-reads vd. TODO: cross read vd + mac uop. 
" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isWidenreduce.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isWidenreduce.scala new file mode 100644 index 00000000..ffc295bb --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isWidenreduce.scala @@ -0,0 +1,85 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isWidenreduce { + + def apply(t1DecodePattern: T1DecodePattern): isWidenreduce = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isWidenreduce(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vwredsum.vs", + "vwredsumu.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcpop.m", + "vfclass.v", + "vfcvt.f.x.v", + "vfcvt.f.xu.v", + "vfcvt.rtz.x.f.v", + "vfcvt.rtz.xu.f.v", + "vfcvt.x.f.v", + "vfcvt.xu.f.v", + "vfirst.m", + "vfmv.f.s", + "vfmv.s.f", + "vfncvt.f.f.w", + "vfncvt.f.x.w", + "vfncvt.f.xu.w", + "vfncvt.rod.f.f.w", + "vfncvt.rtz.x.f.w", + "vfncvt.rtz.xu.f.w", + "vfncvt.x.f.w", + "vfncvt.xu.f.w", + "vfrec7.v", + "vfrsqrt7.v", + "vfsqrt.v", + "vfwcvt.f.f.v", + "vfwcvt.f.x.v", + "vfwcvt.f.xu.v", + "vfwcvt.rtz.x.f.v", + "vfwcvt.rtz.xu.f.v", + "vfwcvt.x.f.v", + "vfwcvt.xu.f.v", + "vid.v", + "viota.m", + "vmsbf.m", + "vmsif.m", + "vmsof.m", + "vmv.s.x", + "vmv.x.s", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + 
allMatched.contains(t1DecodePattern.instruction.name) + } + +} + +case class isWidenreduce(value: TriState) extends BooleanDecodeAttribute { + override val description: String = + "a special widen, only write dual vd from Top to element0 it doesn't cross. TODO: better name. " +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isWriteCount.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isWriteCount.scala new file mode 100644 index 00000000..b352a175 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isWriteCount.scala @@ -0,0 +1,96 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isWriteCount { + + def apply(t1DecodePattern: T1DecodePattern): isWriteCount = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isWriteCount(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val isExtend = Seq( + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + + val isCrossWrite = Seq( + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx", + // rv_zvbb + "vwsll.vv", + "vwsll.vx", + "vwsll.vi" + ) + + val isSlide = Seq( + "vfslide1down.vf", + "vfslide1up.vf", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx" + ) + + val isGather = Seq( + "vrgather.vv", + "vrgatherei16.vv" + ) + 
+ val allMatched = isExtend ++ isCrossWrite ++ isSlide ++ isGather + + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isWriteCount(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZero.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZero.scala new file mode 100644 index 00000000..5c88ea6f --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZero.scala @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isZero { + + def apply(t1DecodePattern: T1DecodePattern): isZero = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isZero(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = Seq( + "vcompress.vm", + "vfslide1down.vf", + "vfslide1up.vf", + "viota.m", + "vmv1r.v", + "vmv2r.v", + "vmv4r.v", + "vmv8r.v", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def 
dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isZero(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "goes to [[org.chipsalliance.t1.rtl.OtherUnit]]" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZvbb.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZvbb.scala new file mode 100644 index 00000000..53d6e5be --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZvbb.scala @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isZvbb { + + def apply(t1DecodePattern: T1DecodePattern): isZvbb = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isZvbb(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = + if (t1DecodePattern.param.zvbbEnable) + Seq( + "vandn.vv", + "vandn.vx", + "vbrev.v", + "vbrev8.v", + "vrev8.v", + "vclz.v", + "vctz.v", + "vcpop.v", + "vrol.vv", + "vrol.vx", + "vror.vv", + "vror.vx", + "vror.vi", + "vwsll.vv", + "vwsll.vx", + "vwsll.vi" + ) + else Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isZvbb(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "goes to [[org.chipsalliance.t1.rtl.LaneZvbb]]." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZvma.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZvma.scala new file mode 100644 index 00000000..45045e51 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/isZvma.scala @@ -0,0 +1,57 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object isZvma { + + def apply(t1DecodePattern: T1DecodePattern): isZvma = + Seq( + y _ -> Y, + n _ -> N, + dc _ -> DC + ).collectFirst { + case (fn, tri) if fn(t1DecodePattern) => isZvma(tri) + }.get + + def y(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = + if (t1DecodePattern.param.useXsfmm) + Seq( + "vlte8", + "vlte16", + "vlte32", + "vste8", + "vste16", + "vste32", + "vtmv.v.t", + "vtmv.t.v", + "mm.u.u", + "mm.u.s", + "mm.s.u", + "mm.s.s", + "mm.e5m2.e4m3", + "mm.e5m2.e5m2", + "mm.e4m3.e4m3", + "mm.e4m3.e5m2", + "vtzero.t", + "p2mm.f.f", + "vtdiscard" + ) + else Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def n(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched = t1DecodePattern.param.allInstructions.filter(i => !(y(t1DecodePattern) || dc(t1DecodePattern))) + allMatched.contains(t1DecodePattern.instruction) + } + + def dc(t1DecodePattern: T1DecodePattern): Boolean = false +} + +case class isZvma(value: TriState) extends BooleanDecodeAttribute { + override val description: String = "goes to [[org.chipsalliance.t1.rtl.LaneZvma]]." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/logicUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/logicUop.scala new file mode 100644 index 00000000..22685ae6 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/logicUop.scala @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait LogicUopType extends Uop +object logicUop0 extends LogicUopType +object logicUop1 extends LogicUopType +object logicUop2 extends LogicUopType +object logicUop4 extends LogicUopType +object logicUop5 extends LogicUopType +object logicUop6 extends LogicUopType +object logicUop8 extends LogicUopType +object logicUop9 extends LogicUopType + +object LogicUop { + + def apply(t1DecodePattern: T1DecodePattern): Uop = { + Seq( + t0 _ -> logicUop0, + t1 _ -> logicUop1, + t2 _ -> logicUop2, + t4 _ -> logicUop4, + t5 _ -> logicUop5, + t6 _ -> logicUop6, + t8 _ -> logicUop8, + t9 _ -> logicUop9 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => tpe + }.getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vand.vi", + "vand.vv", + "vand.vx", + "vmand.mm", + "vredand.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmor.mm", + "vor.vi", + "vor.vv", + "vor.vx", + "vredor.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t2(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmxor.mm", + "vredxor.vs", + "vxor.vi", + "vxor.vv", + "vxor.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t4(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: 
Seq[String] = Seq( + "vmandn.mm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t5(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmorn.mm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t6(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmxnor.mm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t8(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmnand.mm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t9(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmnor.mm" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/maskPipeOpcode.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/maskPipeOpcode.scala new file mode 100644 index 00000000..668c8c58 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/maskPipeOpcode.scala @@ -0,0 +1,190 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait MaskPipeUop extends Uop +object MaskUop0 extends MaskPipeUop +object MaskUop1 extends MaskPipeUop +object MaskUop2 extends MaskPipeUop +object MaskUop3 extends MaskPipeUop +object MaskUop4 extends MaskPipeUop +object MaskUop5 extends MaskPipeUop +object MaskUop6 extends MaskPipeUop +object MaskUop7 extends MaskPipeUop +object MaskUop8 extends MaskPipeUop +object MaskUop9 extends MaskPipeUop +object MaskUop10 extends MaskPipeUop +object MaskUop11 extends MaskPipeUop + +// 0000 x => extend x?4:2 [0,1] +// 0001 x => gather x?16:sew [2,3] +// 001 xy => slide x?up:down y?s:1 [4,7] +// 010 xx => 0: add 1: logic 2: float 3: 
order [8,11] +object MaskPipeOpcode { + + def apply(t1DecodePattern: T1DecodePattern): MaskPipeOpcode = { + Seq( + t0 _ -> MaskUop0, + t1 _ -> MaskUop1, + t2 _ -> MaskUop2, + t3 _ -> MaskUop3, + t4 _ -> MaskUop4, + t5 _ -> MaskUop5, + t6 _ -> MaskUop6, + t7 _ -> MaskUop7, + t8 _ -> MaskUop8, + t9 _ -> MaskUop9, + t10 _ -> MaskUop10, + t11 _ -> MaskUop11 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => MaskPipeOpcode(tpe) + }.getOrElse(MaskPipeOpcode(MaskUop0)) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val isCrossWrite = Seq( + "vwadd.vv", + "vwadd.vx", + "vwadd.wv", + "vwadd.wx", + "vwaddu.vv", + "vwaddu.vx", + "vwaddu.wv", + "vwaddu.wx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx", + "vwsub.vv", + "vwsub.vx", + "vwsub.wv", + "vwsub.wx", + "vwsubu.vv", + "vwsubu.vx", + "vwsubu.wv", + "vwsubu.wx", + // rv_zvbb + "vwsll.vv", + "vwsll.vx", + "vwsll.vi" + ) + val extend = Seq( + "vsext.vf2", + "vzext.vf2" + ) + val allMatched: Seq[String] = extend ++ isCrossWrite + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vsext.vf4", + "vzext.vf4" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t2(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vrgather.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t3(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vrgatherei16.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t4(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfslide1down.vf", + "vslide1down.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t5(t1DecodePattern: 
T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vslidedown.vi", + "vslidedown.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t6(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfslide1up.vf", + "vslide1up.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t7(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vslideup.vi", + "vslideup.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t8(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vcpop.m", + "vredmax.vs", + "vredmaxu.vs", + "vredmin.vs", + "vredminu.vs", + "vredsum.vs", + "vwredsum.vs", + "vwredsumu.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t9(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vredand.vs", + "vredor.vs", + "vredxor.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t10(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfredmax.vs", + "vfredmin.vs", + "vfredusum.vs", + "vfwredusum.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t11(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfredosum.vs", + "vfwredosum.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} + +case class MaskPipeOpcode(value: MaskPipeUop) extends UopDecodeAttribute[MaskPipeUop] { + override val description: String = "" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/mulUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/mulUop.scala new file mode 100644 index 00000000..89f14874 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/mulUop.scala @@ -0,0 +1,102 @@ +// SPDX-License-Identifier: Apache-2.0 +// 
SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait MulUOPType extends Uop +object mulUop0 extends MulUOPType +object mulUop1 extends MulUOPType +object mulUop3 extends MulUOPType +object mulUop5 extends MulUOPType +object mulUop10 extends MulUOPType +object mulUop14 extends MulUOPType + +object MulUOP { + + def apply(t1DecodePattern: T1DecodePattern): Uop = { + Seq( + t0 _ -> mulUop0, + t1 _ -> mulUop1, + t3 _ -> mulUop3, + t5 _ -> mulUop5, + t10 _ -> mulUop10, + t14 _ -> mulUop14 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => tpe + }.getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmul.vv", + "vmul.vx", + "vsmul.vv", + "vsmul.vx", + "vwmul.vv", + "vwmul.vx", + "vwmulsu.vv", + "vwmulsu.vx", + "vwmulu.vv", + "vwmulu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmadd.vv", + "vmadd.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t3(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmulh.vv", + "vmulh.vx", + "vmulhsu.vv", + "vmulhsu.vx", + "vmulhu.vv", + "vmulhu.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t5(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmacc.vv", + "vmacc.vx", + "vwmacc.vv", + "vwmacc.vx", + "vwmaccsu.vv", + "vwmaccsu.vx", + "vwmaccu.vv", + "vwmaccu.vx", + "vwmaccus.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t10(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vnmsub.vv", + "vnmsub.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t14(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: 
Seq[String] = Seq( + "vnmsac.vv", + "vnmsac.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} + +case class MulUOP(value: MulUOPType) extends UopDecodeAttribute[MulUOPType] { + override val description: String = "" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/otherUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/otherUop.scala new file mode 100644 index 00000000..8dc4eeeb --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/otherUop.scala @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait OtherUopType extends Uop +object otherUop0 extends OtherUopType +object otherUop1 extends OtherUopType +object otherUop2 extends OtherUopType +object otherUop3 extends OtherUopType +object otherUop4 extends OtherUopType +object otherUop5 extends OtherUopType +object otherUop6 extends OtherUopType +object otherUop7 extends OtherUopType +object otherUop8 extends OtherUopType +object otherUop9 extends OtherUopType + +object OtherUop { + + def apply(t1DecodePattern: T1DecodePattern): Uop = { + Seq( + t0 _ -> otherUop0, + t1 _ -> otherUop1, + t2 _ -> otherUop2, + t3 _ -> otherUop3, + t4 _ -> otherUop4, + t5 _ -> otherUop5, + t6 _ -> otherUop6, + t7 _ -> otherUop7, + t8 _ -> otherUop8, + t9 _ -> otherUop9 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => tpe + }.getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfirst.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmsbf.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t2(t1DecodePattern: T1DecodePattern): Boolean = 
{ + val allMatched: Seq[String] = Seq( + "vmsof.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t3(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmsif.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t4(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vrgather.vi", + "vrgather.vv", + "vrgather.vx", + "vrgatherei16.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t5(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfmerge.vfm", + "vfmv.v.f", + "vmerge.vim", + "vmv.v.i", + "vmerge.vvm", + "vmv.v.v", + "vmerge.vxm", + "vmv.v.x" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t6(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vnclip.wi", + "vnclip.wv", + "vnclip.wx", + "vnclipu.wi", + "vnclipu.wv", + "vnclipu.wx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t7(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfmv.s.f", + "vmv.s.x", + "vmv.x.s" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t8(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vcpop.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t9(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vid.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/package.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/package.scala new file mode 100644 index 00000000..75e371b2 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/package.scala @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 
Jiuyang Liu + +package framework.gpdomain.sequencer.decoder + +import chisel3.experimental.hierarchy.Instantiate +import chisel3.properties.{ClassType, Property} + +package object attribute { + + /** + * Attribute that encodes a property of an instruction in the uarch and is additionally encoded into the + * object model (OM), which is used to provide metadata for verification. + */ + trait DecodeAttribute[T] { + val identifier: String = this.getClass.getSimpleName.replace("$", "") + val value: T + val description: String + + // Property of this attribute + def om: Property[ClassType] = { + val obj = Instantiate(new T1DecodeAttributeOM(identifier, description, value.toString)) + obj.getPropertyReference + } + + } + + sealed trait TriState + case object Y extends TriState + case object N extends TriState + case object DC extends TriState + + trait Uop + object UopDC extends Uop + trait UopDecodeAttribute[T <: Uop] extends DecodeAttribute[T] + + trait BooleanDecodeAttribute extends DecodeAttribute[TriState] + // TODO: add more Scala types to avoid relying on the String type. 
+ trait StringDecodeAttribute extends DecodeAttribute[String] +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/shiftUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/shiftUop.scala new file mode 100644 index 00000000..22f7246f --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/shiftUop.scala @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait ShiftUopType extends Uop +object shiftUop0 extends ShiftUopType +object shiftUop1 extends ShiftUopType +object shiftUop2 extends ShiftUopType +object shiftUop4 extends ShiftUopType +object shiftUop6 extends ShiftUopType + +object ShiftUop { + + def apply(t1DecodePattern: T1DecodePattern): Uop = { + Seq( + t0 _ -> shiftUop0, + t1 _ -> shiftUop1, + t2 _ -> shiftUop2, + t4 _ -> shiftUop4, + t6 _ -> shiftUop6 + ).collectFirst { case (fn, tpe) if fn(t1DecodePattern) => tpe } + .getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vnsrl.wi", + "vnsrl.wv", + "vnsrl.wx", + "vsrl.vi", + "vsrl.vv", + "vsrl.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vsll.vi", + "vsll.vv", + "vsll.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t2(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vnsra.wi", + "vnsra.wv", + "vnsra.wx", + "vsra.vi", + "vsra.vv", + "vsra.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t4(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vssrl.vi", + "vssrl.vv", + "vssrl.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + 
+ def t6(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vssra.vi", + "vssra.vv", + "vssra.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} + +case class ShiftUop(value: ShiftUopType) extends UopDecodeAttribute[ShiftUopType] { + override val description: String = "" +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/topUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/topUop.scala new file mode 100644 index 00000000..8615238b --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/topUop.scala @@ -0,0 +1,331 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait TopUopType extends Uop +object TopT0 extends TopUopType +object TopT1 extends TopUopType +object TopT2 extends TopUopType +object TopT3 extends TopUopType +object TopT4 extends TopUopType +object TopT5 extends TopUopType +object TopT6 extends TopUopType +object TopT7 extends TopUopType +object TopT8 extends TopUopType +object TopT9 extends TopUopType +object TopT10 extends TopUopType +object TopT11 extends TopUopType +object TopT12 extends TopUopType +object TopT13 extends TopUopType +object TopT14 extends TopUopType +object TopT15 extends TopUopType +object TopT16 extends TopUopType +object TopT17 extends TopUopType +object TopT18 extends TopUopType +object TopT19 extends TopUopType +object TopT20 extends TopUopType +object TopT21 extends TopUopType +object TopT22 extends TopUopType +object TopT23 extends TopUopType +object TopT24 extends TopUopType +object TopT25 extends TopUopType +object TopT26 extends TopUopType +object TopT27 extends TopUopType +object TopT28 extends TopUopType +object TopT29 extends TopUopType +object TopT30 extends TopUopType +object TopT31 extends TopUopType + +object 
TopUop { + + def apply(t1DecodePattern: T1DecodePattern): TopUop = { + Seq( + t0 _ -> TopT0, + t1 _ -> TopT1, + t2 _ -> TopT2, + t3 _ -> TopT3, + t4 _ -> TopT4, + t5 _ -> TopT5, + t6 _ -> TopT6, + t7 _ -> TopT7, + t8 _ -> TopT8, + t9 _ -> TopT9, + t10 _ -> TopT10, + t11 _ -> TopT11, + t12 _ -> TopT12, + t13 _ -> TopT13, + t14 _ -> TopT14, + t15 _ -> TopT15, + t16 _ -> TopT16, + t17 _ -> TopT17, + t18 _ -> TopT18, + t19 _ -> TopT19, + t20 _ -> TopT20, + t21 _ -> TopT21, + t22 _ -> TopT22, + t23 _ -> TopT23, + t24 _ -> TopT24, + t25 _ -> TopT25, + t26 _ -> TopT26, + t27 _ -> TopT27, + t28 _ -> TopT28, + t29 _ -> TopT29, + t30 _ -> TopT30, + t31 _ -> TopT31 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => TopUop(tpe) + }.getOrElse(TopUop(TopT0)) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vslidedown.vi", + "vslidedown.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vslideup.vi", + "vslideup.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t2(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vslide1down.vx", + "vfslide1down.vf" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t3(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vslide1up.vx", + "vfslide1up.vf" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t4(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vrgather.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t5(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vrgatherei16.vv" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t6(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = 
Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t7(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t8(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("viota.m") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t9(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vcompress.vm") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t10(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfmv.s.f", + "vmv.s.x" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t11(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfmv.f.s", + "vmv.x.s" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t12(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t13(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq() + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t14(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmsbf.m", + "vmsif.m", + "vmsof.m" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t15(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vfirst.m") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t16(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vcpop.m", + "vredmax.vs", + "vredmaxu.vs", + "vredmin.vs", + "vredminu.vs", + "vredsum.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t17(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vwredsum.vs", + "vwredsumu.vs" + ) + 
allMatched.contains(t1DecodePattern.instruction.name) + } + + def t18(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vredand.vs", + "vredor.vs", + "vredxor.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t19(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vfredmax.vs", + "vfredmin.vs" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t20(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vfredusum.vs") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t21(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vfredosum.vs") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t22(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vfwredusum.vs") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t23(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vfwredosum.vs") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t24(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vmadc.vi", + "vmadc.vim", + "vmadc.vv", + "vmadc.vvm", + "vmadc.vx", + "vmadc.vxm", + "vmfeq.vf", + "vmfeq.vv", + "vmfge.vf", + "vmfgt.vf", + "vmfle.vf", + "vmfle.vv", + "vmflt.vf", + "vmflt.vv", + "vmfne.vf", + "vmfne.vv", + "vmsbc.vv", + "vmsbc.vvm", + "vmsbc.vx", + "vmsbc.vxm", + "vmseq.vi", + "vmseq.vv", + "vmseq.vx", + "vmsgt.vi", + "vmsgt.vx", + "vmsgtu.vi", + "vmsgtu.vx", + "vmsle.vi", + "vmsle.vv", + "vmsle.vx", + "vmsleu.vi", + "vmsleu.vv", + "vmsleu.vx", + "vmslt.vv", + "vmslt.vx", + "vmsltu.vv", + "vmsltu.vx", + "vmsne.vi", + "vmsne.vv", + "vmsne.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t25(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq() + 
allMatched.contains(t1DecodePattern.instruction.name) + } + + def t26(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vzext.vf2") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t27(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vsext.vf2") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t28(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vzext.vf4") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t29(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vsext.vf4") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t30(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vzext.vf8") + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t31(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq("vsext.vf8") + allMatched.contains(t1DecodePattern.instruction.name) + } + +} + +case class TopUop(value: TopUopType) extends UopDecodeAttribute[TopUopType] { + override val description: String = "uop for mask unit." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/uop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/uop.scala new file mode 100644 index 00000000..0673d65d --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/uop.scala @@ -0,0 +1,32 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +object DecoderUop { + + def apply(t1DecodePattern: T1DecodePattern): DecoderUop = { + val tpe: Option[DecoderUop] = Seq( + isDivider.y(t1DecodePattern) -> DivUOP(t1DecodePattern), + isFloat.y(t1DecodePattern) -> FloatUop(t1DecodePattern), + isMultiplier.y(t1DecodePattern) -> MulUOP(t1DecodePattern), + isAdder.y(t1DecodePattern) -> AdderUOP(t1DecodePattern), + isLogic.y(t1DecodePattern) -> LogicUop(t1DecodePattern), + isShift.y(t1DecodePattern) -> ShiftUop(t1DecodePattern), + isOther.y(t1DecodePattern) -> OtherUop(t1DecodePattern), + isZero.y(t1DecodePattern) -> ZeroUOP(t1DecodePattern), + isZvbb.y(t1DecodePattern) -> ZvbbUOP(t1DecodePattern) + ).collectFirst { + case (fn, tpe) if fn => DecoderUop(tpe) + } + require(tpe.size <= 1) + tpe.getOrElse(DecoderUop(UopDC)) + } + +} + +case class DecoderUop(value: Uop) extends UopDecodeAttribute[Uop] { + override val description: String = "uop for mask unit." 
+} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/zeroUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/zeroUop.scala new file mode 100644 index 00000000..7b153009 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/zeroUop.scala @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait ZeroUOPType extends Uop +object zeroUop0 extends ZeroUOPType + +object ZeroUOP { + + def apply(t1DecodePattern: T1DecodePattern): Uop = { + Seq( + t0 _ -> zeroUop0 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => tpe + }.getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vcompress.vm", + "vfslide1down.vf", + "vfslide1up.vf", + "viota.m", + "vmv1r.v", + "vmv2r.v", + "vmv4r.v", + "vmv8r.v", + "vsext.vf2", + "vsext.vf4", + "vsext.vf8", + "vslide1down.vx", + "vslide1up.vx", + "vslidedown.vi", + "vslidedown.vx", + "vslideup.vi", + "vslideup.vx", + "vzext.vf2", + "vzext.vf4", + "vzext.vf8" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/zvbbUop.scala b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/zvbbUop.scala new file mode 100644 index 00000000..9361933f --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/decoder/attribute/zvbbUop.scala @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu + +package framework.gpdomain.sequencer.decoder.attribute + +import framework.gpdomain.sequencer.decoder.T1DecodePattern + +trait ZvbbUOPType extends Uop +object zvbbUop0 extends ZvbbUOPType // brev +object zvbbUop1 extends ZvbbUOPType // brev8 +object zvbbUop2 extends 
ZvbbUOPType // rev8 +object zvbbUop3 extends ZvbbUOPType // clz +object zvbbUop4 extends ZvbbUOPType // ctz +object zvbbUop5 extends ZvbbUOPType // rol +object zvbbUop6 extends ZvbbUOPType // ror +object zvbbUop7 extends ZvbbUOPType // wsll +object zvbbUop8 extends ZvbbUOPType // andn +object zvbbUop9 extends ZvbbUOPType // pop + +object ZvbbUOP { + + def apply(t1DecodePattern: T1DecodePattern): Uop = { + Seq( + t0 _ -> zvbbUop0, + t1 _ -> zvbbUop1, + t2 _ -> zvbbUop2, + t3 _ -> zvbbUop3, + t4 _ -> zvbbUop4, + t5 _ -> zvbbUop5, + t6 _ -> zvbbUop6, + t7 _ -> zvbbUop7, + t8 _ -> zvbbUop8, + t9 _ -> zvbbUop9 + ).collectFirst { + case (fn, tpe) if fn(t1DecodePattern) => tpe + }.getOrElse(UopDC) + } + + def t0(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vbrev.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t1(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vbrev8.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t2(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vrev8.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t3(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vclz.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t4(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vctz.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t5(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vrol.vv", + "vrol.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t6(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vror.vv", + "vror.vx", + "vror.vi" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t7(t1DecodePattern: T1DecodePattern): Boolean = { + 
val allMatched: Seq[String] = Seq( + "vwsll.vv", + "vwsll.vx", + "vwsll.vi" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t8(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vandn.vv", + "vandn.vx" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + + def t9(t1DecodePattern: T1DecodePattern): Boolean = { + val allMatched: Seq[String] = Seq( + "vcpop.v" + ) + allMatched.contains(t1DecodePattern.instruction.name) + } + +} diff --git a/arch/src/main/scala/framework/gpdomain/sequencer/package.scala b/arch/src/main/scala/framework/gpdomain/sequencer/package.scala new file mode 100644 index 00000000..4b0f9fd7 --- /dev/null +++ b/arch/src/main/scala/framework/gpdomain/sequencer/package.scala @@ -0,0 +1,96 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: 2022 Jiuyang Liu +// Port to buckyball framework + +package framework.gpdomain + +import chisel3._ +import chisel3.util._ + +package object sequencer { + + /** Find first one */ + def ffo(input: UInt): UInt = + ((~(scanLeftOr(input) << 1)).asUInt & input)(input.getWidth - 1, 0) + + /** Conditional mask application */ + def maskAnd(mask: Bool, data: Data): Data = + Mux(mask, data, 0.U.asTypeOf(data)) + + /** Enable/disable mask */ + def maskEnable(enable: Bool, mask: UInt): UInt = + Mux(enable, mask, (-1.S(mask.getWidth.W)).asUInt.asTypeOf(mask)) + + /** Convert index to one-hot encoding */ + def indexToOH(index: UInt, chainingSize: Int): UInt = + UIntToOH(index(log2Ceil(chainingSize), 0)) + + /** Check if index matches in one-hot encoded lastReport */ + def ohCheck(lastReport: UInt, index: UInt, chainingSize: Int): Bool = + (indexToOH(index, chainingSize) & lastReport).orR + + /** Instruction index comparison: a < b */ + def instIndexL(a: UInt, b: UInt): Bool = { + require(a.getWidth == b.getWidth) + (a(a.getWidth - 2, 0) < b(b.getWidth - 2, 0)) ^ a(a.getWidth - 1) ^ b(b.getWidth - 1) + } + + /** Instruction index 
comparison: a <= b */ + def instIndexLE(a: UInt, b: UInt): Bool = { + require(a.getWidth == b.getWidth) + a === b || instIndexL(a, b) + } + + /** Cut UInt into equal width pieces */ + def cutUInt(data: UInt, width: Int): Vec[UInt] = { + require(data.getWidth % width == 0) + VecInit(Seq.tabulate(data.getWidth / width) { groupIndex => + data(groupIndex * width + width - 1, groupIndex * width) + }) + } + + /** Cut UInt into specified number of pieces */ + def cutUIntBySize(data: UInt, size: Int): Vec[UInt] = { + require(data.getWidth % size == 0) + val width: Int = data.getWidth / size + cutUInt(data, width) + } + + /** Change UInt size with optional sign extension */ + def changeUIntSize(data: UInt, size: Int, sign: Boolean = false): UInt = { + if (data.getWidth >= size) { + data(size - 1, 0) + } else { + val extend = if (sign) data(data.getWidth - 1) else false.B + Fill(size - data.getWidth, extend) ## data + } + } + + /** Carry-save adder 3:2 compressor */ + def csa32(s: UInt, c: UInt, a: UInt): (UInt, UInt) = { + val xor = s ^ c + val so = xor ^ a + val co = (xor & a) | (s & c) + (so, co) + } + + /** Multi-lane shifter */ + def multiShifter(right: Boolean, multiSize: Int)(data: UInt, shifterSize: UInt): UInt = { + VecInit( + data.asBools + .grouped(multiSize) + .toSeq + .transpose + .map { dataGroup => + if (right) { + (VecInit(dataGroup).asUInt >> shifterSize).asBools + } else { + (VecInit(dataGroup).asUInt << shifterSize).asBools + } + } + .transpose + .map(VecInit(_).asUInt) + ).asUInt + } + +} diff --git a/arch/src/main/scala/framework/memdomain/DISA.scala b/arch/src/main/scala/framework/memdomain/DISA.scala deleted file mode 100644 index c4365448..00000000 --- a/arch/src/main/scala/framework/memdomain/DISA.scala +++ /dev/null @@ -1,17 +0,0 @@ -package framework.memdomain - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -// import framework.ballcore.ballcore.RoCCCommandBB -import freechips.rocketchip.tile._ - - -class 
BuckyballRawCmd(implicit p: Parameters) extends Bundle { - val cmd = new RoCCCommand -} - -object DISA { - val MVIN_BITPAT = BitPat("b0011000") // 24 - val MVOUT_BITPAT = BitPat("b0011001") // 25 -} diff --git a/arch/src/main/scala/framework/memdomain/DomainDecoder.scala b/arch/src/main/scala/framework/memdomain/DomainDecoder.scala deleted file mode 100644 index bdd56df2..00000000 --- a/arch/src/main/scala/framework/memdomain/DomainDecoder.scala +++ /dev/null @@ -1,95 +0,0 @@ -package framework.memdomain - -import chisel3._ -import chisel3.util._ -import framework.frontend.decoder.PostGDCmd -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain.DISA._ -import framework.memdomain.dma.LocalAddr -import freechips.rocketchip.tile._ -import org.chipsalliance.cde.config.Parameters - -// Detailed decode output for Mem domain -class MemDecodeCmd(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val is_load = Bool() - val is_store = Bool() - - // Memory address - val mem_addr = UInt(b.memAddrLen.W) - - // Iteration count - val iter = UInt(10.W) - - // Scratchpad address and bank information - // 3 bits, supports 8 banks (SPAD+ACC) - val sp_bank = UInt(log2Up(b.sp_banks + b.acc_banks).W) - // 12 bits, uses SPAD row count (sufficient to accommodate ACC's 10-bit address) - val sp_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - - val special = UInt(40.W) -} - - -// LS decode fields -object LSDecodeFields extends Enumeration { - type Field = Value - val LD_EN, ST_EN, MEMADDR, SPADDR, ITER, SPECIAL, VALID = Value -} - -// Default constants for Mem decoder -object MemDefaultConstants { - val Y = true.B - val N = false.B - val DADDR = 0.U(14.W) - val DITER = 0.U(10.W) - val DSPECIAL = 0.U(40.W) -} - -class MemDomainDecoder(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - import MemDefaultConstants._ - - val io = IO(new Bundle { - val raw_cmd_i = Flipped(Decoupled(new PostGDCmd)) - val mem_decode_cmd_o = 
Decoupled(new MemDecodeCmd) - }) - - val spAddrLen = b.spAddrLen - val memAddrLen = b.memAddrLen - - // Only process Mem instructions - io.raw_cmd_i.ready := io.mem_decode_cmd_o.ready - - val func7 = io.raw_cmd_i.bits.raw_cmd.inst.funct - val rs1 = io.raw_cmd_i.bits.raw_cmd.rs1 - val rs2 = io.raw_cmd_i.bits.raw_cmd.rs2 - - // Load/Store instruction decoding - import LSDecodeFields._ - val ls_default_decode = List(N,N,DADDR,DADDR,DITER,DSPECIAL,N) - val ls_decode_list = ListLookup(func7, ls_default_decode, Array( - MVIN_BITPAT -> List(Y,N,rs1(memAddrLen-1,0),rs2(spAddrLen-1,0),rs2(spAddrLen+9,spAddrLen),rs2(63,spAddrLen + 10),Y), // mvin - MVOUT_BITPAT -> List(N,Y,rs1(memAddrLen-1,0),rs2(spAddrLen-1,0),rs2(spAddrLen+9,spAddrLen),rs2(63,spAddrLen + 10),Y) // mvout - )) - - assert(!(io.raw_cmd_i.fire && !ls_decode_list(LSDecodeFields.VALID.id).asBool), - s"MemDomainDecoder: Invalid command opcode, func7 = 0x%x\n", func7) - -// ----------------------------------------------------------------------------- -// Output assignment -// ----------------------------------------------------------------------------- - io.mem_decode_cmd_o.valid := io.raw_cmd_i.valid && io.raw_cmd_i.bits.is_mem - - io.mem_decode_cmd_o.bits.is_load := Mux(io.mem_decode_cmd_o.valid, ls_decode_list(LSDecodeFields.LD_EN.id).asBool, false.B) - io.mem_decode_cmd_o.bits.is_store := Mux(io.mem_decode_cmd_o.valid, ls_decode_list(LSDecodeFields.ST_EN.id).asBool, false.B) - io.mem_decode_cmd_o.bits.mem_addr := Mux(io.mem_decode_cmd_o.valid, ls_decode_list(LSDecodeFields.MEMADDR.id).asUInt, 0.U(b.memAddrLen.W)) - io.mem_decode_cmd_o.bits.iter := Mux(io.mem_decode_cmd_o.valid, ls_decode_list(LSDecodeFields.ITER.id).asUInt, 0.U(10.W)) - - - // Address parsing - val ls_spaddr = ls_decode_list(LSDecodeFields.SPADDR.id).asUInt - val ls_laddr = LocalAddr.cast_to_sp_addr(b.local_addr_t, ls_spaddr) - - io.mem_decode_cmd_o.bits.sp_bank := Mux(io.mem_decode_cmd_o.valid, ls_laddr.mem_bank(), 0.U(log2Up(b.sp_banks + 
b.acc_banks).W)) - io.mem_decode_cmd_o.bits.sp_bank_addr := Mux(io.mem_decode_cmd_o.valid, ls_laddr.mem_row(), 0.U(log2Up(b.spad_bank_entries + b.acc_bank_entries).W)) - io.mem_decode_cmd_o.bits.special := Mux(io.mem_decode_cmd_o.valid, ls_decode_list(LSDecodeFields.SPECIAL.id).asUInt, 0.U(40.W)) -} diff --git a/arch/src/main/scala/framework/memdomain/MemController.scala b/arch/src/main/scala/framework/memdomain/MemController.scala deleted file mode 100644 index 891c802b..00000000 --- a/arch/src/main/scala/framework/memdomain/MemController.scala +++ /dev/null @@ -1,45 +0,0 @@ -package framework.memdomain - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain.mem.{SramReadIO, SramWriteIO, Scratchpad} - -/** - * MemController: Controller that encapsulates scratchpad and accumulator - * Provides DMA interface and Ball Domain interface - */ -class MemController(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - // DMA interface - for MemLoader and MemStorer access - val dma = new Bundle { - val sramRead = Vec(b.sp_banks, new SramReadIO(b.spad_bank_entries, b.spad_w)) - val sramWrite = Vec(b.sp_banks, new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len)) - val accRead = Vec(b.acc_banks, new SramReadIO(b.acc_bank_entries, b.acc_w)) - val accWrite = Vec(b.acc_banks, new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len)) - } - - // Ball Domain interface - for BallController access - val ballDomain = new Bundle { - val sramRead = Vec(b.sp_banks, new SramReadIO(b.spad_bank_entries, b.spad_w)) - val sramWrite = Vec(b.sp_banks, new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len)) - val accRead = Vec(b.acc_banks, new SramReadIO(b.acc_bank_entries, b.acc_w)) - val accWrite = Vec(b.acc_banks, new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len)) - } - }) - - val spad = Module(new 
Scratchpad(b)) - - // Connect DMA interface to Scratchpad's DMA ports - io.dma.sramRead <> spad.io.dma.sramread - io.dma.sramWrite <> spad.io.dma.sramwrite - io.dma.accRead <> spad.io.dma.accread - io.dma.accWrite <> spad.io.dma.accwrite - - // Connect Ball Domain interface to Scratchpad's execution ports - io.ballDomain.sramRead <> spad.io.exec.sramread - io.ballDomain.sramWrite <> spad.io.exec.sramwrite - io.ballDomain.accRead <> spad.io.exec.accread - io.ballDomain.accWrite <> spad.io.exec.accwrite -} diff --git a/arch/src/main/scala/framework/memdomain/MemDomain.scala b/arch/src/main/scala/framework/memdomain/MemDomain.scala index 4efc9b33..56cf7eb4 100644 --- a/arch/src/main/scala/framework/memdomain/MemDomain.scala +++ b/arch/src/main/scala/framework/memdomain/MemDomain.scala @@ -2,138 +2,106 @@ package framework.memdomain import chisel3._ import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.frontend.decoder.PostGDCmd +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} import freechips.rocketchip.tile._ -import framework.memdomain.dma.{BBReadRequest, BBReadResponse, BBWriteRequest, BBWriteResponse} -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.memdomain.{MemLoader, MemStorer, MemController} -import framework.memdomain.rs.MemReservationStation -import framework.memdomain.tlb.{BBTLBCluster, BBTLBIO, BBTLBExceptionIO} -import framework.memdomain.pmc.MemCyclePMC -import freechips.rocketchip.tilelink.TLEdgeOut -import freechips.rocketchip.rocket.TLBPTWIO -import framework.frontend.globalrs.{GlobalRsIssue, GlobalRsComplete} - -class MemDomain(implicit b: CustomBuckyballConfig, p: Parameters, edge: TLEdgeOut) extends Module { +import framework.balldomain.blink.{BankRead, BankWrite} +import freechips.rocketchip.tilelink.{TLBundle, TLEdgeOut} +import framework.frontend.globalrs.{GlobalSchedComplete, GlobalSchedIssue} 
+import framework.top.GlobalConfig +import framework.memdomain.backend.MemRequestIO +import framework.memdomain.frontend.MemFrontend +import framework.memdomain.frontend.outside_channel.{MemConfigerIO} +import framework.memdomain.frontend.outside_channel.tlb.{BBTLBExceptionIO, BBTLBPTWIO} +import framework.memdomain.midend.MemMidend +import framework.memdomain.backend.MemBackend + +@instantiable +class MemDomain(val b: GlobalConfig)(edge: TLEdgeOut) extends Module { + val totalBallRead = b.ballDomain.ballIdMappings.map(_.inBW).sum + val totalBallWrite = b.ballDomain.ballIdMappings.map(_.outBW).sum + + @public val io = IO(new Bundle { - // Issue interface from global RS (single channel) - val global_issue_i = Flipped(Decoupled(new GlobalRsIssue)) - - // Report completion to global RS (single channel) - val global_complete_o = Decoupled(new GlobalRsComplete) - - // SRAM interface for interaction with Ball Domain + // ------------------------------------------------- + // Command Channel + // ------------------------------------------------- + val global_issue_i = Flipped(Decoupled(new GlobalSchedIssue(b))) + val global_complete_o = Decoupled(new GlobalSchedComplete(b)) + val busy = Output(Bool()) + + // ------------------------------------------------- + // Inside Channel + // ------------------------------------------------- val ballDomain = new Bundle { - val sramRead = Vec(b.sp_banks, new SramReadIO(b.spad_bank_entries, b.spad_w)) - val sramWrite = Vec(b.sp_banks, new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len)) - val accRead = Vec(b.acc_banks, new SramReadIO(b.acc_bank_entries, b.acc_w)) - val accWrite = Vec(b.acc_banks, new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len)) + val bankRead = Vec(totalBallRead, new BankRead(b)) + val bankWrite = Vec(totalBallWrite, new BankWrite(b)) } - // DMA interface - val dma = new Bundle { - val read = new Bundle { - val req = Decoupled(new BBReadRequest()) - val resp = Flipped(Decoupled(new 
BBReadResponse(b.spad_w))) - } - val write = new Bundle { - val req = Decoupled(new BBWriteRequest(b.spad_w)) - val resp = Flipped(Decoupled(new BBWriteResponse)) - } - } - - // TLB interface - exposed externally for DMA use - val tlb = Vec(2, Flipped(new BBTLBIO)) - - // PTW interface - needs to connect to upper level PTW (shared TLB has only 1 PTW) - val ptw = Vec(1, new TLBPTWIO) - - // TLB exception interface - exposed to upper level for handling flush, etc. (shared TLB has only 1 exp) - val tlbExp = Vec(1, new BBTLBExceptionIO) - - // Busy signal - val busy = Output(Bool()) + // ------------------------------------------------- + // Outside Channel + // ------------------------------------------------- + val ptw = Vec(1, new BBTLBPTWIO(b)) + val tlbExp = Vec(1, new BBTLBExceptionIO) + val tl_reader = new TLBundle(edge.bundle) + val tl_writer = new TLBundle(edge.bundle) + val hartid = Input(UInt(b.core.xLen.W)) + + // ------------------------------------------------- + // Shared memory path — exposed for tile-level sharing + // ------------------------------------------------- + val shared_mem_req = Vec(b.memDomain.bankChannel, new MemRequestIO(b)) + val shared_config = Decoupled(new MemConfigerIO(b)) + val shared_query_vbank_id = Output(UInt(8.W)) + val shared_query_group_count = Input(UInt(4.W)) }) - val memDecoder = Module(new MemDomainDecoder) - val memRs = Module(new MemReservationStation) - val memLoader = Module(new MemLoader) - val memStorer = Module(new MemStorer) - - // Internal MemController (encapsulates spad and acc) - val memController = Module(new MemController) - - // TLB cluster - use shared TLB like Gemmini - val tlbCluster = Module(new BBTLBCluster(2, b.tlb_size, b.dma_maxbytes, use_shared_tlb = true)) - -// ----------------------------------------------------------------------------- -// Global RS -> MemDecoder -// ----------------------------------------------------------------------------- - memDecoder.io.raw_cmd_i.valid := 
io.global_issue_i.valid - memDecoder.io.raw_cmd_i.bits := io.global_issue_i.bits.cmd - io.global_issue_i.ready := memDecoder.io.raw_cmd_i.ready - -// ----------------------------------------------------------------------------- -// MemDecoder -> MemReservationStation -// ----------------------------------------------------------------------------- - // Connect decoded instruction and global rob_id - memRs.io.mem_decode_cmd_i.valid := memDecoder.io.mem_decode_cmd_o.valid - memRs.io.mem_decode_cmd_i.bits.cmd := memDecoder.io.mem_decode_cmd_o.bits - memRs.io.mem_decode_cmd_i.bits.rob_id := io.global_issue_i.bits.rob_id - memDecoder.io.mem_decode_cmd_o.ready := memRs.io.mem_decode_cmd_i.ready - -// ----------------------------------------------------------------------------- -// MemReservationStation -> MemLoader/MemStorer -// ----------------------------------------------------------------------------- - memLoader.io.cmdReq <> memRs.io.issue_o.ld - memStorer.io.cmdReq <> memRs.io.issue_o.st - memRs.io.commit_i.ld <> memLoader.io.cmdResp - memRs.io.commit_i.st <> memStorer.io.cmdResp - -// ----------------------------------------------------------------------------- -// PMC - Performance Monitor Counter -// ----------------------------------------------------------------------------- - val pmc = Module(new MemCyclePMC) - pmc.io.ldReq_i.valid := memRs.io.issue_o.ld.fire - pmc.io.ldReq_i.bits := memRs.io.issue_o.ld.bits - pmc.io.stReq_i.valid := memRs.io.issue_o.st.fire - pmc.io.stReq_i.bits := memRs.io.issue_o.st.bits - pmc.io.ldResp_o.valid := memLoader.io.cmdResp.fire - pmc.io.ldResp_o.bits := memLoader.io.cmdResp.bits - pmc.io.stResp_o.valid := memStorer.io.cmdResp.fire - pmc.io.stResp_o.bits := memStorer.io.cmdResp.bits - - // Connect MemLoader and MemStorer to DMA - memLoader.io.dmaReq <> io.dma.read.req - io.dma.read.resp <> memLoader.io.dmaResp - memStorer.io.dmaReq <> io.dma.write.req - io.dma.write.resp <> memStorer.io.dmaResp - - // Connect TLB - now using 
internal BBTLBCluster - io.tlb <> tlbCluster.io.clients - io.ptw <> tlbCluster.io.ptw - - // Connect exception interface - note direction: internal TLB's exp is Output, external interface is Input - tlbCluster.io.exp <> io.tlbExp - - // Connect MemLoader and MemStorer to MemController's DMA interface - memLoader.io.sramWrite <> memController.io.dma.sramWrite - memLoader.io.accWrite <> memController.io.dma.accWrite - memStorer.io.sramRead <> memController.io.dma.sramRead - memStorer.io.accRead <> memController.io.dma.accRead - - // Ball Domain SRAM interface connected to MemController's Ball Domain interface - io.ballDomain.sramRead <> memController.io.ballDomain.sramRead - io.ballDomain.sramWrite <> memController.io.ballDomain.sramWrite - io.ballDomain.accRead <> memController.io.ballDomain.accRead - io.ballDomain.accWrite <> memController.io.ballDomain.accWrite - - // Completion signal connected to global RS - io.global_complete_o <> memRs.io.complete_o - - // Busy signal - // Simple busy signal - io.busy := !memRs.io.complete_o.ready + val frontend: Instance[MemFrontend] = Instantiate(new MemFrontend(b)(edge)) + val midend: Instance[MemMidend] = Instantiate(new MemMidend(b)) + val backend: Instance[MemBackend] = Instantiate(new MemBackend(b)) + + // Connect query interface from frontend to backend + backend.io.query_vbank_id := frontend.io.query_vbank_id + backend.io.query_is_shared := frontend.io.query_is_shared + frontend.io.query_group_count := backend.io.query_group_count + frontend.io.hartid := io.hartid + + // Shared query: backend delegates shared query to external SharedMemBackend + backend.io.shared_query_group_count := io.shared_query_group_count + io.shared_query_vbank_id := backend.io.shared_query_vbank_id + + // ------------------------------------------------- + // Connection with outside (all in frontend) + // ------------------------------------------------- + frontend.io.global_issue_i <> io.global_issue_i + frontend.io.global_complete_o <> 
io.global_complete_o + io.busy := frontend.io.busy + + frontend.io.ptw <> io.ptw + frontend.io.tlbExp <> io.tlbExp + + io.tl_reader <> frontend.io.tl_reader + io.tl_writer <> frontend.io.tl_writer + + // Ball Domain interface connects to midend unified bankRead/bankWrite + // Indices [0, totalBallRead) are balldomain; last index is frontend (DMA) + for (i <- 0 until totalBallRead) { + midend.io.bankRead(i).bankRead <> io.ballDomain.bankRead(i) + midend.io.bankRead(i).is_shared := false.B + } + for (i <- 0 until totalBallWrite) { + midend.io.bankWrite(i).bankWrite <> io.ballDomain.bankWrite(i) + midend.io.bankWrite(i).is_shared := false.B + } + midend.io.bankRead(totalBallRead).bankRead <> frontend.io.interdma.bankRead + midend.io.bankRead(totalBallRead).is_shared := frontend.io.interdma.read_is_shared + midend.io.bankWrite(totalBallWrite).bankWrite <> frontend.io.interdma.bankWrite + midend.io.bankWrite(totalBallWrite).is_shared := frontend.io.interdma.write_is_shared + midend.io.hartid := io.hartid + + midend.io.mem_req <> backend.io.mem_req + backend.io.config <> frontend.io.config + + // Shared path passthrough + io.shared_mem_req <> backend.io.shared_mem_req + io.shared_config <> backend.io.shared_config } diff --git a/arch/src/main/scala/framework/memdomain/MemLoader.scala b/arch/src/main/scala/framework/memdomain/MemLoader.scala deleted file mode 100644 index c55b63ff..00000000 --- a/arch/src/main/scala/framework/memdomain/MemLoader.scala +++ /dev/null @@ -1,130 +0,0 @@ -package framework.memdomain - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain.rs.{MemRsIssue, MemRsComplete} -import framework.memdomain.mem.SramWriteIO -import framework.memdomain.dma.{BBReadRequest, BBReadResponse, LocalAddr} -import freechips.rocketchip.rocket.MStatus - -class MemLoader(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val 
rob_id_width = log2Up(b.rob_entries) - - val io = IO(new Bundle { - // Load instruction from ReservationStation - val cmdReq = Flipped(Decoupled(new MemRsIssue)) - // Completion signal sent to ReservationStation - val cmdResp = Decoupled(new MemRsComplete) - // Direct connection to DMA read interface - val dmaReq = Decoupled(new BBReadRequest()) - val dmaResp = Flipped(Decoupled(new BBReadResponse(b.spad_w))) - // Connected to Scratchpad SRAM write interface - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - }) - - val s_idle :: s_dma_req :: s_dma_wait :: Nil = Enum(3) - val state = RegInit(s_idle) - - val rob_id_reg = RegInit(0.U(rob_id_width.W)) - // Cache mem_addr - val mem_addr_reg = Reg(UInt(b.memAddrLen.W)) - // Cache iteration count - val iter_reg = Reg(UInt(10.W)) - // Count number of responses received, supports up to 16 responses - val resp_count = Reg(UInt(log2Up(16).W)) - - // Cache decoded bank information - val wr_bank_reg = Reg(UInt(log2Up(b.sp_banks + b.acc_banks).W)) - val wr_bank_addr_reg = Reg(UInt(log2Up(b.spad_bank_entries).W)) - // Whether this is an acc bank operation - val is_acc_reg = RegInit(false.B) - // Cache stride - val stride_reg = Reg(UInt(10.W)) - - // Receive load instruction - io.cmdReq.ready := state === s_idle - - when (io.cmdReq.fire && io.cmdReq.bits.cmd.is_load) { - state := s_dma_req - rob_id_reg := io.cmdReq.bits.rob_id - mem_addr_reg := io.cmdReq.bits.cmd.mem_addr - iter_reg := io.cmdReq.bits.cmd.iter - wr_bank_reg := io.cmdReq.bits.cmd.sp_bank - wr_bank_addr_reg := io.cmdReq.bits.cmd.sp_bank_addr - // Determine if acc based on bank - is_acc_reg := (io.cmdReq.bits.cmd.sp_bank >= b.sp_banks.U) - stride_reg := io.cmdReq.bits.cmd.special(10,0) - resp_count := 0.U - } - - // Issue DMA read request - read iter_reg rows of data - io.dmaReq.valid := state === s_dma_req 
- io.dmaReq.bits.vaddr := mem_addr_reg - // Byte count of iter rows of data - io.dmaReq.bits.len := iter_reg * (b.veclane * b.inputType.getWidth / 8).U - // Simplified: use default status - io.dmaReq.bits.status := 0.U.asTypeOf(new MStatus) - io.dmaReq.bits.stride := stride_reg - - when (io.dmaReq.fire) { - state := s_dma_wait - // Reset response counter - resp_count := 0.U - } - - // Wait for DMA response - io.dmaResp.ready := state === s_dma_wait - - when (io.dmaResp.fire) { - resp_count := resp_count + 1.U - // Return to idle state when last response is received - when (io.dmaResp.bits.last) { - state := s_idle - } - } - - // Stream write to SRAM - write immediately upon receiving each response - // Calculate current write bank and address - // Use address counter from DMA response - val current_bank_addr = wr_bank_addr_reg + io.dmaResp.bits.addrcounter - // All responses write to the same bank - val target_bank = wr_bank_reg - val target_row = current_bank_addr - - for (i <- 0 until b.sp_banks) { - io.sramWrite(i).req.valid := io.dmaResp.fire && (target_bank === i.U) - io.sramWrite(i).req.bits.addr := target_row - io.sramWrite(i).req.bits.data := io.dmaResp.bits.data - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(true.B)) - } - // Default assignment - for (i <- 0 until b.acc_banks) { - io.accWrite(i).req.valid := false.B - io.accWrite(i).req.bits.addr := 0.U - io.accWrite(i).req.bits.data := 0.U - io.accWrite(i).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(false.B)) - } - - for (i <- 0 until b.acc_banks/2) { - when(io.dmaResp.fire && is_acc_reg){ - when(io.dmaResp.bits.addrcounter(2)){ - io.accWrite(i).req.valid := target_row(log2Ceil(b.acc_banks/2) - 1, 0) === i.U - io.accWrite(i).req.bits.addr := wr_bank_addr_reg + (io.dmaResp.bits.addrcounter >> (log2Ceil(b.acc_banks/2) + 1)) - io.accWrite(i).req.bits.data := io.dmaResp.bits.data - io.accWrite(i).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(true.B)) - }.otherwise{ - 
io.accWrite(i + b.acc_banks/2).req.valid := target_row(log2Ceil(b.acc_banks/2) - 1, 0) === i.U - io.accWrite(i + b.acc_banks/2).req.bits.addr := wr_bank_addr_reg + (io.dmaResp.bits.addrcounter >> (log2Ceil(b.acc_banks/2) + 1)) - io.accWrite(i + b.acc_banks/2).req.bits.data := io.dmaResp.bits.data - io.accWrite(i + b.acc_banks/2).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(true.B)) - } - } - } - - // Send completion signal - only send when last response is received - io.cmdResp.valid := io.dmaResp.fire && io.dmaResp.bits.last - io.cmdResp.bits.rob_id := rob_id_reg -} diff --git a/arch/src/main/scala/framework/memdomain/MemStorer.scala b/arch/src/main/scala/framework/memdomain/MemStorer.scala deleted file mode 100644 index fb3c5a61..00000000 --- a/arch/src/main/scala/framework/memdomain/MemStorer.scala +++ /dev/null @@ -1,308 +0,0 @@ -package framework.memdomain - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain.rs.{MemRsIssue, MemRsComplete} -import freechips.rocketchip.rocket.MStatus -import framework.memdomain.mem.SramReadIO -import framework.memdomain.dma.{BBWriteRequest, BBWriteResponse, LocalAddr} - -class MemStorer(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val rob_id_width = log2Up(b.rob_entries) - // Byte count of one row of data - val line_bytes = b.spad_w / 8 - // 16-byte alignment - val align_bytes = 16 - - val io = IO(new Bundle { - // Store instruction from ReservationStation - val cmdReq = Flipped(Decoupled(new MemRsIssue)) - // Completion signal sent to ReservationStation - val cmdResp = Decoupled(new MemRsComplete) - // Direct connection to DMA write interface - val dmaReq = Decoupled(new BBWriteRequest(b.spad_w)) - val dmaResp = Flipped(Decoupled(new BBWriteResponse)) - // Connected to Scratchpad SRAM read interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, 
b.spad_w))) - val accRead = Vec(b.acc_banks, Flipped(new SramReadIO(b.acc_bank_entries, b.acc_w))) - }) - - val s_idle :: s_sram_req :: s_dma_wait :: Nil = Enum(3) - val state = RegInit(s_idle) - - val rob_id_reg = RegInit(0.U(rob_id_width.W)) - val mem_addr_reg = Reg(UInt(b.memAddrLen.W)) - val iter_reg = Reg(UInt(10.W)) - val sram_count = Reg(UInt(10.W)) - // Whether this is an acc bank operation - val acc_reg = RegInit(false.B) - // Used to alternately read two acc banks - val acc_flip_reg = RegInit(false.B) - // Cache stride - val stride_reg = Reg(UInt(10.W)) - // Cache decoded bank information - // Need 3 bits to support 8 banks (SPAD+ACC) - val rd_bank_reg = Reg(UInt(log2Up(b.sp_banks + b.acc_banks).W)) - val rd_bank_addr_reg = Reg(UInt(log2Up(b.spad_bank_entries).W)) - - // Data buffer related registers - // 16-byte buffer - val data_buffer = Reg(UInt((align_bytes * 8).W)) - // Number of valid bytes in buffer - val buffer_valid_bytes = Reg(UInt(log2Ceil(align_bytes + 1).W)) - // Starting address corresponding to buffer - val buffer_start_addr = Reg(UInt(b.memAddrLen.W)) - - // Receive store instruction - io.cmdReq.ready := state === s_idle - - when (io.cmdReq.fire && io.cmdReq.bits.cmd.is_store) { - state := s_sram_req - rob_id_reg := io.cmdReq.bits.rob_id - mem_addr_reg := io.cmdReq.bits.cmd.mem_addr - iter_reg := io.cmdReq.bits.cmd.iter - rd_bank_reg := io.cmdReq.bits.cmd.sp_bank - rd_bank_addr_reg := io.cmdReq.bits.cmd.sp_bank_addr - sram_count := 0.U - // Determine if acc based on bank - acc_reg := (io.cmdReq.bits.cmd.sp_bank >= b.sp_banks.U) - // Reset flip register - acc_flip_reg := true.B - stride_reg := io.cmdReq.bits.cmd.special(10,0) - // Initialize buffer state - buffer_valid_bytes := 0.U - } - - // Stream read SRAM data - // Calculate current read bank and address - val current_bank_addr = rd_bank_addr_reg + sram_count - // All reads come from the same bank - val target_bank = rd_bank_reg - val target_row = current_bank_addr - - for (i <- 0 until 
b.sp_banks) { - io.sramRead(i).req.valid := (state === s_sram_req) && (target_bank === i.U) && !acc_reg - io.sramRead(i).req.bits.addr := target_row - io.sramRead(i).req.bits.fromDMA := true.B - } - - // Default assignment - for (i <- 0 until b.acc_banks) { - io.accRead(i).req.valid := false.B - io.accRead(i).req.bits.addr := 0.U - io.accRead(i).req.bits.fromDMA := true.B - } - - for (i <- 0 until b.acc_banks/2){ - when((state === s_sram_req) && acc_reg){ - when(sram_count(2) === 0.U){ - io.accRead(i).req.valid := i.U === target_row(log2Ceil(b.acc_banks/2) - 1, 0) - io.accRead(i).req.bits.addr := rd_bank_addr_reg + (sram_count >> (log2Ceil(b.acc_banks/2) + 1)) - io.accRead(i).req.bits.fromDMA := true.B - }.otherwise{ - io.accRead(i + b.acc_banks/2).req.valid := i.U === target_row(log2Ceil(b.acc_banks/2) - 1, 0) - io.accRead(i + b.acc_banks/2).req.bits.addr := rd_bank_addr_reg + (sram_count >> (log2Ceil(b.acc_banks/2) + 1)) - io.accRead(i + b.acc_banks/2).req.bits.fromDMA := true.B - } - } - } - - // SRAM response processing - val sram_resp_valid = io.sramRead.map(_.resp.valid).reduce(_ || _) - val sram_resp_data = Mux1H(io.sramRead.map(_.resp.valid), io.sramRead.map(_.resp.bits.data)) - val acc_resp_valid = io.accRead.map(_.resp.valid).reduce(_ || _) - val acc_resp_data = Mux1H(io.accRead.map(_.resp.valid), io.accRead.map(_.resp.bits.data)) - - // Calculate memory address corresponding to current row - val current_mem_addr = mem_addr_reg + sram_count(1,0) * line_bytes.U + ((sram_count >> 2) << 2) * stride_reg * line_bytes.U - // Lower 4 bits of address, 0 when 16-byte aligned - val addr_offset = current_mem_addr(log2Ceil(align_bytes) - 1, 0) - val aligned_addr = Cat(current_mem_addr(b.memAddrLen - 1, log2Ceil(align_bytes)), 0.U(log2Ceil(align_bytes).W)) - val is_aligned = addr_offset === 0.U - dontTouch(is_aligned) - dontTouch(aligned_addr) - - // Data merge logic (line_bytes = 16 bytes) - val incoming_data = Mux(sram_resp_valid, sram_resp_data.asUInt, 
acc_resp_data.asUInt) - // Always 16 bytes - val incoming_bytes = 16.U - - // Data merged into buffer - val merged_data = Wire(UInt((align_bytes * 8).W)) - val total_valid_bytes = Wire(UInt(log2Ceil(align_bytes * 2).W)) - val is_last_iter = (sram_count >= (iter_reg - 1.U) && iter_reg > 0.U) || iter_reg === 0.U - - when (buffer_valid_bytes === 0.U) { - // Buffer is empty - when (addr_offset === 0.U) { - // Address is aligned, use data directly - merged_data := incoming_data - total_valid_bytes := incoming_bytes - }.otherwise { - // Address not aligned, first time: use low bits of new data as high bits of send data, pad low bits with 0 - val new_data_low = incoming_data & ((1.U << (addr_offset * 8.U)) - 1.U) - merged_data := new_data_low << (addr_offset * 8.U) - total_valid_bytes := align_bytes.U - } - }.otherwise { - // Buffer has data, concatenate: low bits of new data as high bits + buffer data as low bits - val new_data_low = incoming_data & ((1.U << (addr_offset * 8.U)) - 1.U) - merged_data := (new_data_low << (addr_offset * 8.U)) | data_buffer - // Always 16 bytes - total_valid_bytes := align_bytes.U - } - - // Send logic: except for last iteration, can always fill 16 bytes - val can_send_full_line = total_valid_bytes >= align_bytes.U - val send_bytes = Mux(can_send_full_line, align_bytes.U, total_valid_bytes) - - // Determine send address - always use aligned address - val send_addr = Mux(buffer_valid_bytes === 0.U, aligned_addr, - Cat(buffer_start_addr(b.memAddrLen - 1, log2Ceil(align_bytes)), 0.U(log2Ceil(align_bytes).W))) - - // DMA request logic - val should_send_normal = (sram_resp_valid || acc_resp_valid) && can_send_full_line - val should_send_first_unaligned = (sram_resp_valid || acc_resp_valid) && (buffer_valid_bytes === 0.U && addr_offset =/= 0.U) - val should_send_last = (sram_resp_valid || acc_resp_valid) && is_last_iter && !can_send_full_line - val should_send = should_send_normal || should_send_first_unaligned || should_send_last - - // Add a 
flag to track whether all data has been processed - // Completion detection logic - supports two cases: - // 1. Clean data completion (fully aligned case) - val aligned_completion = buffer_valid_bytes === 0.U && is_last_iter - // 2. Remaining data needs to be sent (unaligned case) - // Has remaining data - val has_remaining_data_completion = buffer_valid_bytes > 0.U && is_last_iter - val unaligned_completion_final_send = has_remaining_data_completion && !(sram_resp_valid || acc_resp_valid) - - // Generate mask - val send_mask = Wire(UInt(align_bytes.W)) - when (buffer_valid_bytes === 0.U && addr_offset =/= 0.U) { - // First unaligned: send high bits of new data, mask on high bits - val valid_bytes = align_bytes.U - addr_offset - // 0xFF00 (if addr_offset=8) - send_mask := ((1.U << valid_bytes) - 1.U) << addr_offset - }.elsewhen (buffer_valid_bytes > 0.U && can_send_full_line) { - // Middle concatenation: send full 16 bytes - send_mask := ~0.U(align_bytes.W) // 0xFFFF - }.elsewhen (unaligned_completion_final_send) { - // Last send remaining buffer data: buffer data in low bits - send_mask := (1.U << buffer_valid_bytes) - 1.U // 0x00FF - }.otherwise { - // Aligned case: full data - send_mask := ~0.U(align_bytes.W) // 0xFFFF - } - - // DMA request signal control logic - can only update when DMA is ready - val dma_req_valid_reg = RegInit(false.B) - val dma_req_vaddr_reg = RegInit(0.U(b.memAddrLen.W)) - val dma_req_data_reg = RegInit(0.U((align_bytes * 8).W)) - val dma_req_len_reg = RegInit(0.U(8.W)) - val dma_req_mask_reg = RegInit(0.U(align_bytes.W)) - val dma_req_status_reg = RegInit(0.U.asTypeOf(new MStatus)) - - // Calculate DMA request signals - val dma_req_valid_next = (should_send || unaligned_completion_final_send) && (state === s_sram_req || state === s_dma_wait) - val dma_req_vaddr_next = Mux(unaligned_completion_final_send, buffer_start_addr, send_addr) - val dma_req_data_next = Mux(unaligned_completion_final_send, data_buffer, merged_data) - val 
dma_req_len_next = align_bytes.U - val dma_req_mask_next = Mux(unaligned_completion_final_send, (1.U << buffer_valid_bytes) - 1.U, send_mask) - val dma_req_status_next = 0.U.asTypeOf(new MStatus) - - // Only update registers when DMA is ready - when (io.dmaReq.ready) { - dma_req_valid_reg := dma_req_valid_next - dma_req_vaddr_reg := dma_req_vaddr_next - dma_req_data_reg := dma_req_data_next - dma_req_len_reg := dma_req_len_next - dma_req_mask_reg := dma_req_mask_next - dma_req_status_reg := dma_req_status_next - } - - // Connect to DMA interface - io.dmaReq.valid := dma_req_valid_reg - io.dmaReq.bits.vaddr := dma_req_vaddr_reg - io.dmaReq.bits.data := dma_req_data_reg - io.dmaReq.bits.len := dma_req_len_reg - io.dmaReq.bits.mask := dma_req_mask_reg - io.dmaReq.bits.status := dma_req_status_reg - - // Connect SRAM response ready signal - based on DMA ready state - io.sramRead.foreach(_.resp.ready := io.dmaReq.ready && (state === s_sram_req || state === s_dma_wait)) - io.accRead.foreach(_.resp.ready := io.dmaReq.ready && (state === s_sram_req || state === s_dma_wait)) - // State transition and counter update - when (io.sramRead.map(_.req.fire).reduce(_ || _)) { - state := s_dma_wait - } - - when (io.dmaReq.fire) { - when (!unaligned_completion_final_send) { - sram_count := sram_count + 1.U - } - - // Update buffer state - when (addr_offset =/= 0.U && (sram_resp_valid || acc_resp_valid)) { - // Unaligned case: cache high bits of new data - // Cache is the high bits part - val remaining_bytes = align_bytes.U - addr_offset - data_buffer := incoming_data >> (addr_offset * 8.U) - buffer_valid_bytes := remaining_bytes - // Update buffer corresponding address (point to next 16-byte aligned address) - when (buffer_valid_bytes === 0.U) { - buffer_start_addr := aligned_addr + align_bytes.U - }.otherwise { - buffer_start_addr := buffer_start_addr + align_bytes.U - } - }.elsewhen (unaligned_completion_final_send) { - // Sent final remaining data, clear buffer - 
buffer_valid_bytes := 0.U - }.otherwise { - // In aligned case, if previous buffer data was merged and sent, need to clear buffer - when (buffer_valid_bytes > 0.U && can_send_full_line && sram_resp_valid) { - buffer_valid_bytes := 0.U - } - } - - // Fix state transition logic - when (unaligned_completion_final_send) { - // Only return to idle after unaligned_completion_final_send completes - state := s_idle - }.elsewhen (aligned_completion) { - // All data has been sent - state := s_idle - }.elsewhen (sram_count + 1.U >= iter_reg && iter_reg > 0.U) { - // Iteration ended, but there may still be buffer data to send - when (buffer_valid_bytes > 0.U) { - // Maintain state, wait for unaligned_completion_final_send - state := s_dma_wait - }.otherwise { - state := s_idle - } - }.elsewhen (iter_reg === 0.U) { - state := s_idle - }.otherwise { - state := s_sram_req - } - } - - // Wait for DMA to truly complete - io.dmaResp.ready := true.B - - // Fix completion signal logic - only issue completion signal after all data transfer is truly complete - val task_complete = RegInit(false.B) - when (io.cmdReq.fire && io.cmdReq.bits.cmd.is_store) { - task_complete := false.B - }.elsewhen (io.dmaReq.fire && (unaligned_completion_final_send || aligned_completion)) { - task_complete := true.B - } - - io.cmdResp.valid := task_complete && (state === s_idle) - io.cmdResp.bits.rob_id := rob_id_reg - - // Reset flag after sending completion signal - when (io.cmdResp.fire) { - task_complete := false.B - } -} diff --git a/arch/src/main/scala/framework/memdomain/README.md b/arch/src/main/scala/framework/memdomain/README.md index 5e7f8b30..4a3ebf47 100644 --- a/arch/src/main/scala/framework/memdomain/README.md +++ b/arch/src/main/scala/framework/memdomain/README.md @@ -126,12 +126,12 @@ Main Memory ←→ DMA Engine ←→ MemController ←→ Scratchpad/Accumulator ### dma/ - DMA Engines -**BBStreamReader**: Streaming data reader +**StreamReader**: Streaming data reader - TileLink interface for memory 
access - TLB support for virtual addressing - Transaction ID management for multiple outstanding requests -**BBStreamWriter**: Streaming data writer +**StreamWriter**: Streaming data writer - Handles data alignment (16-byte aligned) - Generates byte masks for partial writes - TLB support diff --git a/arch/src/main/scala/framework/memdomain/backend/MTrace.scala b/arch/src/main/scala/framework/memdomain/backend/MTrace.scala new file mode 100644 index 00000000..769d0677 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/backend/MTrace.scala @@ -0,0 +1,57 @@ +package framework.memdomain.backend + +import chisel3._ +import chisel3.util._ + +// DPI-C BlackBox for memory trace +class MTraceDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val is_write = Input(UInt(8.W)) + val is_shared = Input(UInt(8.W)) + val channel = Input(UInt(32.W)) + val hart_id = Input(UInt(64.W)) + val vbank_id = Input(UInt(32.W)) + val group_id = Input(UInt(32.W)) + val addr = Input(UInt(32.W)) + val data_lo = Input(UInt(64.W)) + val data_hi = Input(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "MTraceDPI.v", + """ + |import "DPI-C" function void dpi_mtrace( + | input byte unsigned is_write, + | input byte unsigned is_shared, + | input int unsigned channel, + | input longint unsigned hart_id, + | input int unsigned vbank_id, + | input int unsigned group_id, + | input int unsigned addr, + | input longint unsigned data_lo, + | input longint unsigned data_hi + |); + | + |module MTraceDPI( + | input [7:0] is_write, + | input [7:0] is_shared, + | input [31:0] channel, + | input [63:0] hart_id, + | input [31:0] vbank_id, + | input [31:0] group_id, + | input [31:0] addr, + | input [63:0] data_lo, + | input [63:0] data_hi, + | input enable + |); + | always @(*) begin + | if (enable) begin + | dpi_mtrace(is_write, is_shared, channel, hart_id, vbank_id, group_id, addr, data_lo, data_hi); + | end + | end + |endmodule + """.stripMargin + ) +} diff --git 
a/arch/src/main/scala/framework/memdomain/backend/MemBackend.scala b/arch/src/main/scala/framework/memdomain/backend/MemBackend.scala new file mode 100644 index 00000000..fc71fc6d --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/backend/MemBackend.scala @@ -0,0 +1,188 @@ +package framework.memdomain.backend + +import chisel3._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import chisel3.util._ +import framework.memdomain.frontend.outside_channel.MemConfigerIO +import framework.top.GlobalConfig +import framework.memdomain.backend.privatepath.PrivateMemBackend + +@instantiable +class MemBackend(val b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + val mem_req = Vec(b.memDomain.bankChannel, Flipped(new MemRequestIO(b))) + val config = Flipped(Decoupled(new MemConfigerIO(b))) + + // Shared path — exposed to tile level for multi-core sharing. + // bankChannel ports: one per midend channel of this core. + val shared_mem_req = Vec(b.memDomain.bankChannel, new MemRequestIO(b)) + val shared_config = Decoupled(new MemConfigerIO(b)) + + // Query interface: shared query goes out, private query handled internally. + val shared_query_vbank_id = Output(UInt(8.W)) + val shared_query_group_count = Input(UInt(4.W)) + + // Original query interface from frontend + val query_vbank_id = Input(UInt(8.W)) + val query_is_shared = Input(Bool()) + val query_group_count = Output(UInt(4.W)) + }) + + // Keep the private backend datapath unchanged and isolate it in a dedicated module. + val privateBackend: Instance[PrivateMemBackend] = Instantiate(new PrivateMemBackend(b)) + + // Route config to the selected backend only. 
+ val cfgToShared = io.config.bits.is_shared + privateBackend.io.config.valid := io.config.valid && !cfgToShared + privateBackend.io.config.bits := io.config.bits + io.shared_config.valid := io.config.valid && cfgToShared + io.shared_config.bits := io.config.bits + io.config.ready := Mux(cfgToShared, io.shared_config.ready, privateBackend.io.config.ready) + + // Query routing + privateBackend.io.query_vbank_id := io.query_vbank_id + io.shared_query_vbank_id := io.query_vbank_id + io.query_group_count := Mux(io.query_is_shared, io.shared_query_group_count, privateBackend.io.query_group_count) + + // Track whether a vbank is currently allocated in shared backend. + // Ball requests do not carry explicit shared/private info, so they are routed by this table. + private val vbankIdxWidth = log2Up(b.memDomain.bankNum) + val privateAllocByVbank = RegInit(VecInit(Seq.fill(b.memDomain.bankNum)(false.B))) + val sharedAllocByVbank = RegInit(VecInit(Seq.fill(b.memDomain.bankNum)(false.B))) + val cfgVbankIdx = io.config.bits.vbank_id(vbankIdxWidth - 1, 0) + when(io.config.fire) { + when(io.config.bits.alloc) { + when(io.config.bits.is_shared) { + sharedAllocByVbank(cfgVbankIdx) := true.B + }.otherwise { + privateAllocByVbank(cfgVbankIdx) := true.B + } + }.otherwise { + when(io.config.bits.is_shared) { + sharedAllocByVbank(cfgVbankIdx) := false.B + }.otherwise { + privateAllocByVbank(cfgVbankIdx) := false.B + } + } + } + + // Per-channel request routing: is_shared=0 -> private, is_shared=1 -> shared IO. + // Route selection is latched at request fire to keep response demux stable. 
+ val readPending = RegInit(VecInit(Seq.fill(b.memDomain.bankChannel)(false.B))) + val writePending = RegInit(VecInit(Seq.fill(b.memDomain.bankChannel)(false.B))) + val readRouteShared = RegInit(VecInit(Seq.fill(b.memDomain.bankChannel)(false.B))) + val writeRouteShared = RegInit(VecInit(Seq.fill(b.memDomain.bankChannel)(false.B))) + + // Shared IO defaults + for (i <- 0 until b.memDomain.bankChannel) { + io.shared_mem_req(i).bank_id := 0.U + io.shared_mem_req(i).group_id := 0.U + io.shared_mem_req(i).is_shared := false.B + io.shared_mem_req(i).hart_id := 0.U + + io.shared_mem_req(i).read.req.valid := false.B + io.shared_mem_req(i).read.req.bits := DontCare + io.shared_mem_req(i).read.resp.ready := false.B + + io.shared_mem_req(i).write.req.valid := false.B + io.shared_mem_req(i).write.req.bits := DontCare + io.shared_mem_req(i).write.resp.ready := false.B + } + + for (i <- 0 until b.memDomain.bankChannel) { + val isBallChannel = i < b.top.memBallChannelNum + val hasPrivateAlloc = privateAllocByVbank(io.mem_req(i).bank_id) + val hasSharedAlloc = sharedAllocByVbank(io.mem_req(i).bank_id) + val hasBallReq = io.mem_req(i).read.req.valid || io.mem_req(i).write.req.valid + when(isBallChannel.B && hasBallReq) { + assert( + !(hasPrivateAlloc && hasSharedAlloc), + "MemBackend ambiguous Ball route: idx=%d has both private and shared allocations\n", + io.mem_req(i).bank_id + ) + } + val ballRouteShared = hasSharedAlloc && !hasPrivateAlloc + val useSharedReq = Mux(isBallChannel.B, ballRouteShared, io.mem_req(i).is_shared) + val useSharedReadResp = Mux(readPending(i), readRouteShared(i), useSharedReq) + val useSharedWriteResp = Mux(writePending(i), writeRouteShared(i), useSharedReq) + + when(io.mem_req(i).read.req.fire) { + readPending(i) := true.B + readRouteShared(i) := useSharedReq + } + when(io.mem_req(i).read.resp.fire) { + readPending(i) := false.B + } + + when(io.mem_req(i).write.req.fire) { + writePending(i) := true.B + writeRouteShared(i) := useSharedReq + } + 
when(io.mem_req(i).write.resp.fire) { + writePending(i) := false.B + } + + // Metadata is passed to both backends; only selected backend receives valid req/resp-ready. + privateBackend.io.mem_req(i).bank_id := io.mem_req(i).bank_id + privateBackend.io.mem_req(i).group_id := io.mem_req(i).group_id + privateBackend.io.mem_req(i).is_shared := useSharedReq + privateBackend.io.mem_req(i).hart_id := io.mem_req(i).hart_id + io.shared_mem_req(i).bank_id := io.mem_req(i).bank_id + io.shared_mem_req(i).group_id := io.mem_req(i).group_id + io.shared_mem_req(i).is_shared := useSharedReq + io.shared_mem_req(i).hart_id := io.mem_req(i).hart_id + + // Read request route + privateBackend.io.mem_req(i).read.req.valid := io.mem_req(i).read.req.valid && !useSharedReq + privateBackend.io.mem_req(i).read.req.bits := io.mem_req(i).read.req.bits + io.shared_mem_req(i).read.req.valid := io.mem_req(i).read.req.valid && useSharedReq + io.shared_mem_req(i).read.req.bits := io.mem_req(i).read.req.bits + io.mem_req(i).read.req.ready := Mux( + useSharedReq, + io.shared_mem_req(i).read.req.ready, + privateBackend.io.mem_req(i).read.req.ready + ) + + // Write request route + privateBackend.io.mem_req(i).write.req.valid := io.mem_req(i).write.req.valid && !useSharedReq + privateBackend.io.mem_req(i).write.req.bits := io.mem_req(i).write.req.bits + io.shared_mem_req(i).write.req.valid := io.mem_req(i).write.req.valid && useSharedReq + io.shared_mem_req(i).write.req.bits := io.mem_req(i).write.req.bits + io.mem_req(i).write.req.ready := Mux( + useSharedReq, + io.shared_mem_req(i).write.req.ready, + privateBackend.io.mem_req(i).write.req.ready + ) + + // Response ready route (selected by latched request route when pending). 
+ privateBackend.io.mem_req(i).read.resp.ready := io.mem_req(i).read.resp.ready && !useSharedReadResp + io.shared_mem_req(i).read.resp.ready := io.mem_req(i).read.resp.ready && useSharedReadResp + privateBackend.io.mem_req(i).write.resp.ready := io.mem_req(i).write.resp.ready && !useSharedWriteResp + io.shared_mem_req(i).write.resp.ready := io.mem_req(i).write.resp.ready && useSharedWriteResp + + // Response valid/bits mux back to midend. + io.mem_req(i).read.resp.valid := Mux( + useSharedReadResp, + io.shared_mem_req(i).read.resp.valid, + privateBackend.io.mem_req(i).read.resp.valid + ) + io.mem_req(i).read.resp.bits := Mux( + useSharedReadResp, + io.shared_mem_req(i).read.resp.bits, + privateBackend.io.mem_req(i).read.resp.bits + ) + + io.mem_req(i).write.resp.valid := Mux( + useSharedWriteResp, + io.shared_mem_req(i).write.resp.valid, + privateBackend.io.mem_req(i).write.resp.valid + ) + io.mem_req(i).write.resp.bits := Mux( + useSharedWriteResp, + io.shared_mem_req(i).write.resp.bits, + privateBackend.io.mem_req(i).write.resp.bits + ) + } +} diff --git a/arch/src/main/scala/framework/memdomain/backend/MemBackendTypes.scala b/arch/src/main/scala/framework/memdomain/backend/MemBackendTypes.scala new file mode 100644 index 00000000..7f6201c8 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/backend/MemBackendTypes.scala @@ -0,0 +1,15 @@ +package framework.memdomain.backend + +import chisel3._ +import chisel3.util._ +import framework.memdomain.backend.banks.{SramReadIO, SramWriteIO} +import framework.top.GlobalConfig + +class MemRequestIO(b: GlobalConfig) extends Bundle { + val write = Flipped(new SramWriteIO(b)) // midend sends write req into backend + val read = Flipped(new SramReadIO(b)) // midend sends read req into backend + val bank_id = Output(UInt(log2Up(b.memDomain.bankNum).W)) + val group_id = Output(UInt(3.W)) + val is_shared = Output(Bool()) + val hart_id = Output(UInt(b.core.xLen.W)) +} diff --git 
a/arch/src/main/scala/framework/memdomain/backend/accpipe/AccPipe.scala b/arch/src/main/scala/framework/memdomain/backend/accpipe/AccPipe.scala new file mode 100644 index 00000000..9422637c --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/backend/accpipe/AccPipe.scala @@ -0,0 +1,103 @@ +package framework.memdomain.backend.accpipe + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} + +import framework.top.GlobalConfig +import framework.memdomain.backend.banks.{SramReadIO, SramWriteIO} +import framework.memdomain.backend.MemRequestIO + +/** + * AccPipe: Accumulator Pipeline + * - Direct write (wmode=0): write.req -> bank write -> forward resp + * - Accum write (wmode=1): bank read -> (old + new with mask) -> bank write -> forward resp + * - Read: bank read -> forward resp + * + * This version: + * - Uses correct IO directions based on your SramReadIO/SramWriteIO definitions + * - Uses strict Decoupled handshakes + * - Latches op type/address/data/mask + * - Latches old_data on read resp fire (no cross-state resp.bits usage) + */ +@instantiable +class AccPipe(val b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + // Interface to SramBank + // Your SramReadIO/SramWriteIO are SLAVE-shaped (req is Flipped), so master must Flipped(...) + val sramRead = Flipped(new SramReadIO(b)) // AccPipe -> bank: req out, resp in + val sramWrite = Flipped(new SramWriteIO(b)) // AccPipe -> bank: req out, resp in + + val mem_req = Flipped(new MemRequestIO(b)) + val is_multi = Input(Bool()) + + val busy = Output(Bool()) + val group_id = Output(UInt(3.W)) + val bank_id = Output(UInt(b.memDomain.bankNum.W)) + }) + + val read :: write :: Nil = Enum(2) + val state = RegInit(read) + + io.sramRead <> io.mem_req.read + io.sramWrite <> io.mem_req.write + + // Each group has its own physical bank, so no address shifting is needed. 
+ // The previous is_multi shift (addr >> 2) was incorrect: it caused mvout reads + // to access wrong physical addresses while matmul writes used unshifted addresses. + + //group_id output + val group_id_reg = RegInit(0.U(3.W)) + when(io.mem_req.read.req.valid || io.mem_req.write.req.valid) { + io.group_id := io.mem_req.group_id + group_id_reg := io.mem_req.group_id + }.otherwise { + io.group_id := group_id_reg + } + + //Bank_id output + val bank_id_reg = RegInit(0.U(b.memDomain.bankNum.W)) + when(io.mem_req.read.req.valid || io.mem_req.write.req.valid) { + bank_id_reg := io.mem_req.bank_id + io.bank_id := io.mem_req.bank_id + }.otherwise { + io.bank_id := bank_id_reg + } + + val acc_data_reg = RegInit(0.U(b.memDomain.bankWidth.W)) + val acc_mask_reg = RegInit(VecInit(Seq.fill(b.memDomain.bankMaskLen)(false.B))) + val acc_addr_reg = RegInit(0.U(b.memDomain.memAddrLen.W)) + + switch(state) { + is(read) { //Stage 1: Read Acc Data + when(io.mem_req.write.req.valid && io.mem_req.write.req.bits.wmode) { + state := write + acc_data_reg := io.mem_req.write.req.bits.data + acc_mask_reg := io.mem_req.write.req.bits.mask + acc_addr_reg := io.mem_req.write.req.bits.addr + io.sramRead.req.bits.addr := io.mem_req.write.req.bits.addr + io.sramRead.req.valid := true.B + + io.sramWrite.req.valid := false.B + io.sramWrite.req.bits := DontCare + } + } + is(write) { //Stage 2: Write Acc Data + when(io.sramRead.resp.valid) { + state := read + io.sramWrite.req.bits.addr := acc_addr_reg + io.sramWrite.req.bits.data := acc_data_reg + io.sramRead.resp.bits.data + io.sramWrite.req.bits.mask := acc_mask_reg + io.sramWrite.req.bits.wmode := true.B + io.sramWrite.req.valid := true.B + + io.sramRead.req.valid := false.B + io.sramRead.req.bits := DontCare + } + } + } + + io.busy := false.B +} diff --git a/arch/src/main/scala/framework/memdomain/backend/banks/SramBank.scala b/arch/src/main/scala/framework/memdomain/backend/banks/SramBank.scala new file mode 100644 index 00000000..bf25b6a4 --- 
/dev/null +++ b/arch/src/main/scala/framework/memdomain/backend/banks/SramBank.scala @@ -0,0 +1,53 @@ +package framework.memdomain.backend.banks + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.top.GlobalConfig + +/** + * SramBank: Pure SRAM bank + * Simple read/write memory without any accumulation logic + * Each bank is a single-port SRAM + */ +@instantiable +class SramBank(val b: GlobalConfig) extends Module { + val mask_len = b.memDomain.bankMaskLen + val mask_elem = UInt((b.memDomain.bankWidth / mask_len).W) + + @public + val io = IO(new Bundle { + val sramRead = new SramReadIO(b) + val sramWrite = new SramWriteIO(b) + }) + + val mem = SyncReadMem(b.memDomain.bankEntries, Vec(mask_len, mask_elem)) + + // ----------------------------------------------------------------------------- + // Read path + // ----------------------------------------------------------------------------- + io.sramRead.req.ready := !io.sramWrite.req.valid + + val raddr = io.sramRead.req.bits.addr + val ren = io.sramRead.req.fire + val rdata = mem.read(raddr, ren) + + io.sramRead.resp.valid := RegNext(ren) + io.sramRead.resp.bits.data := rdata.asUInt + + // ----------------------------------------------------------------------------- + // Write path + // ----------------------------------------------------------------------------- + io.sramWrite.req.ready := !io.sramRead.req.valid + + when(io.sramWrite.req.fire) { + mem.write( + io.sramWrite.req.bits.addr, + io.sramWrite.req.bits.data.asTypeOf(Vec(mask_len, mask_elem)), + io.sramWrite.req.bits.mask + ) + } + + io.sramWrite.resp.valid := RegNext(io.sramWrite.req.fire) + io.sramWrite.resp.bits.ok := RegNext(io.sramWrite.req.fire) +} diff --git a/arch/src/main/scala/framework/memdomain/backend/banks/SramIO.scala b/arch/src/main/scala/framework/memdomain/backend/banks/SramIO.scala new file mode 100644 index 00000000..b566ee84 --- /dev/null +++ 
b/arch/src/main/scala/framework/memdomain/backend/banks/SramIO.scala @@ -0,0 +1,37 @@ +package framework.memdomain.backend.banks + +import chisel3._ +import chisel3.util._ +import framework.top.GlobalConfig + +/** + * Generic SRAM interface definitions + */ +class SramReadReq(val b: GlobalConfig) extends Bundle { + val addr = UInt(log2Ceil(b.memDomain.bankEntries).W) +} + +class SramReadResp(val b: GlobalConfig) extends Bundle { + val data = UInt(b.memDomain.bankWidth.W) +} + +class SramReadIO(val b: GlobalConfig) extends Bundle { + val req = Flipped(Decoupled(new SramReadReq(b))) + val resp = Decoupled(new SramReadResp(b)) +} + +class SramWriteReq(val b: GlobalConfig) extends Bundle { + val addr = UInt(log2Ceil(b.memDomain.bankEntries).W) + val mask = Vec(b.memDomain.bankMaskLen, Bool()) + val data = UInt(b.memDomain.bankWidth.W) + val wmode = Bool() // true=accumulator mode, false=direct write mode +} + +class SramWriteIO(val b: GlobalConfig) extends Bundle { + val req = Flipped(Decoupled(new SramWriteReq(b))) + val resp = Decoupled(new SramWriteResp(b)) +} + +class SramWriteResp(val b: GlobalConfig) extends Bundle { + val ok = Bool() +} diff --git a/arch/src/main/scala/framework/memdomain/backend/privatepath/PrivateMemBackend.scala b/arch/src/main/scala/framework/memdomain/backend/privatepath/PrivateMemBackend.scala new file mode 100644 index 00000000..389616a9 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/backend/privatepath/PrivateMemBackend.scala @@ -0,0 +1,204 @@ +package framework.memdomain.backend.privatepath + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.memdomain.frontend.outside_channel.MemConfigerIO +import framework.memdomain.backend.{MTraceDPI, MemRequestIO} +import framework.memdomain.backend.accpipe.AccPipe +import framework.memdomain.backend.banks.SramBank +import framework.top.GlobalConfig + +@instantiable +class PrivateMemBackend(val 
b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + val mem_req = Vec(b.memDomain.bankChannel, Flipped(new MemRequestIO(b))) + val config = Flipped(Decoupled(new MemConfigerIO(b))) + + // Query interface for frontend to get group count + val query_vbank_id = Input(UInt(8.W)) + val query_group_count = Output(UInt(4.W)) + }) + + val banks: Seq[Instance[SramBank]] = Seq.fill(b.memDomain.bankNum)(Instantiate(new SramBank(b))) + val accPipes: Seq[Instance[AccPipe]] = Seq.fill(b.memDomain.bankChannel)(Instantiate(new AccPipe(b))) + + // Per-channel memory trace DPI-C modules to avoid losing simultaneous events + val mtraces = Seq.fill(b.memDomain.bankChannel)(Module(new MTraceDPI)) + for (mt <- mtraces) { + mt.io.is_write := 0.U + mt.io.is_shared := 0.U + mt.io.channel := 0.U + mt.io.hart_id := 0.U + mt.io.vbank_id := 0.U + mt.io.group_id := 0.U + mt.io.addr := 0.U + mt.io.data_lo := 0.U + mt.io.data_hi := 0.U + mt.io.enable := false.B + } + + // ----------------------------------------------------------------------------- + // Mapping table + // ----------------------------------------------------------------------------- + class MappingTableEntry extends Bundle { + val valid = Bool() + val vbank_id = UInt(5.W) + val is_multi = Bool() + val group_id = UInt(3.W) + } + + val mappingTable = RegInit(VecInit(Seq.fill(b.memDomain.bankNum)(0.U.asTypeOf(new MappingTableEntry)))) + + def isAcc(vbank_id: UInt): Bool = + mappingTable.map(entry => entry.valid && (entry.vbank_id === vbank_id) && entry.is_multi).reduce(_ || _) + + def addEntry( + vbank_id: UInt, + pbank_id: UInt, + is_multi: Bool, + group_id: UInt + ): Unit = { + val entry = mappingTable(pbank_id) + entry.valid := true.B + entry.vbank_id := vbank_id + entry.is_multi := is_multi + entry.group_id := group_id + } + + def deleteEntry(vbank_id: UInt): Unit = { + for (i <- 0 until b.memDomain.bankNum) { + when(mappingTable(i).valid && mappingTable(i).vbank_id === vbank_id) { + mappingTable(i).valid 
:= false.B + mappingTable(i).vbank_id := 0.U + mappingTable(i).is_multi := false.B + mappingTable(i).group_id := 0.U + } + } + } + + def getFreePbankId(): UInt = { + val freePbankId = mappingTable.indexWhere(_.valid === false.B) + freePbankId + } + + // ----------------------------------------------------------------------------- + // Default Value + // ----------------------------------------------------------------------------- + + for (i <- 0 until b.memDomain.bankChannel) { + accPipes(i).io.mem_req.write <> io.mem_req(i).write + accPipes(i).io.mem_req.read <> io.mem_req(i).read + accPipes(i).io.mem_req.bank_id := io.mem_req(i).bank_id + accPipes(i).io.mem_req.group_id := io.mem_req(i).group_id + accPipes(i).io.mem_req.is_shared := io.mem_req(i).is_shared + accPipes(i).io.mem_req.hart_id := io.mem_req(i).hart_id + + // Bank-side defaults (only driven when a bank is actually connected) + accPipes(i).io.sramRead.req.ready := false.B + accPipes(i).io.sramRead.resp.valid := false.B + accPipes(i).io.sramRead.resp.bits := DontCare + + accPipes(i).io.sramWrite.req.ready := false.B + accPipes(i).io.sramWrite.resp.valid := false.B + accPipes(i).io.sramWrite.resp.bits := DontCare + + accPipes(i).io.is_multi := isAcc(io.mem_req(i).bank_id) + } + + io.config.ready := true.B + + banks.zipWithIndex.foreach { + case (bank, _) => + bank.io.sramRead.req.valid := false.B + bank.io.sramRead.req.bits := DontCare + bank.io.sramRead.resp.ready := true.B + + bank.io.sramWrite.req.valid := false.B + bank.io.sramWrite.req.bits := DontCare + bank.io.sramWrite.resp.ready := true.B + } + + // ----------------------------------------------------------------------------- + // Bank Alloc/Release + // ----------------------------------------------------------------------------- + + when(io.config.valid) { + when(io.config.bits.alloc) { + val pbank_id = getFreePbankId() + addEntry(io.config.bits.vbank_id, pbank_id, io.config.bits.is_multi, io.config.bits.group_id) + }.otherwise { + 
deleteEntry(io.config.bits.vbank_id) + } + } + + // ----------------------------------------------------------------------------- + // Query interface: return group count for a given vbank_id + // ----------------------------------------------------------------------------- + val groupCounts = mappingTable.map { entry => + val matches = entry.valid && (entry.vbank_id === io.query_vbank_id) + val count = Mux(entry.is_multi, entry.group_id + 1.U, 1.U) + Mux(matches, count, 0.U) + } + + io.query_group_count := groupCounts.reduce((a, b) => Mux(a > b, a, b)) + + // ----------------------------------------------------------------------------- + // Connect AccPipe and Banks + // ----------------------------------------------------------------------------- + private def emitTrace( + ch: Int, + isWrite: UInt, + addr: UInt, + dataLo: UInt, + dataHi: UInt, + en: Bool + ): Unit = { + mtraces(ch).io.is_write := isWrite + mtraces(ch).io.is_shared := io.mem_req(ch).is_shared.asUInt + mtraces(ch).io.channel := ch.U + mtraces(ch).io.hart_id := io.mem_req(ch).hart_id + mtraces(ch).io.vbank_id := io.mem_req(ch).bank_id + mtraces(ch).io.group_id := io.mem_req(ch).group_id + mtraces(ch).io.addr := addr + mtraces(ch).io.data_lo := dataLo + mtraces(ch).io.data_hi := dataHi + mtraces(ch).io.enable := en + } + + for (i <- 0 until b.memDomain.bankChannel) { + val req_valid = io.mem_req(i).read.req.valid || io.mem_req(i).write.req.valid + + // Memory trace: read request + when(io.mem_req(i).read.req.fire) { + emitTrace(i, 0.U, io.mem_req(i).read.req.bits.addr, 0.U, 0.U, true.B) + } + + // Memory trace: write request + when(io.mem_req(i).write.req.fire) { + emitTrace( + i, + 1.U, + io.mem_req(i).write.req.bits.addr, + io.mem_req(i).write.req.bits.data(63, 0), + io.mem_req(i).write.req.bits.data(127, 64), + true.B + ) + } + + for (j <- 0 until b.memDomain.bankNum) { + val hit_bank = mappingTable(j).valid && (mappingTable(j).vbank_id === io.mem_req(i).bank_id) && + (!mappingTable(j).is_multi || 
+ (mappingTable(j).is_multi && (mappingTable(j).group_id === io.mem_req(i).group_id))) + + val hold_one = RegNext(hit_bank && req_valid, init = false.B) + + when((hit_bank && req_valid) || hold_one) { + banks(j).io.sramRead <> accPipes(i).io.sramRead + banks(j).io.sramWrite <> accPipes(i).io.sramWrite + } + } + } +} diff --git a/arch/src/main/scala/framework/memdomain/backend/shared/SharedMem.scala b/arch/src/main/scala/framework/memdomain/backend/shared/SharedMem.scala new file mode 100644 index 00000000..81caefc9 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/backend/shared/SharedMem.scala @@ -0,0 +1,106 @@ +package framework.memdomain.backend.shared + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.top.GlobalConfig +import framework.memdomain.backend.banks.{SramReadResp, SramWriteResp} + +object SharedMemLayout { + def bankPerHart(b: GlobalConfig): Int = b.memDomain.bankNum + def maxHart(b: GlobalConfig): Int = b.top.nCores + def totalBank(b: GlobalConfig): Int = bankPerHart(b) * maxHart(b) + def channelPerHart(b: GlobalConfig): Int = b.memDomain.bankChannel + def totalChannel(b: GlobalConfig): Int = channelPerHart(b) * maxHart(b) +} + +class SharedMemReadReq(val b: GlobalConfig) extends Bundle { + val hartid = UInt(b.core.xLen.W) + val pbank_id = UInt(log2Ceil(SharedMemLayout.totalBank(b)).W) + val group_id = UInt(3.W) + val addr = UInt(log2Ceil(b.memDomain.bankEntries).W) +} + +class SharedMemWriteReq(val b: GlobalConfig) extends Bundle { + val hartid = UInt(b.core.xLen.W) + val pbank_id = UInt(log2Ceil(SharedMemLayout.totalBank(b)).W) + val group_id = UInt(3.W) + val addr = UInt(log2Ceil(b.memDomain.bankEntries).W) + val mask = Vec(b.memDomain.bankMaskLen, Bool()) + val data = UInt(b.memDomain.bankWidth.W) + val wmode = Bool() +} + +class SharedMemReadIO(val b: GlobalConfig) extends Bundle { + val req = Flipped(Decoupled(new SharedMemReadReq(b))) + val resp = Decoupled(new 
SramReadResp(b)) +} + +class SharedMemWriteIO(val b: GlobalConfig) extends Bundle { + val req = Flipped(Decoupled(new SharedMemWriteReq(b))) + val resp = Decoupled(new SramWriteResp(b)) +} + +@instantiable +class SharedMem(val b: GlobalConfig) extends Module { + private val maskLen = b.memDomain.bankMaskLen + private val maskElem = UInt((b.memDomain.bankWidth / maskLen).W) + private val totalBanks = SharedMemLayout.totalBank(b) + private val minSharedLines = totalBanks * b.memDomain.bankEntries + + require( + b.memDomain.sharedEntries >= minSharedLines, + s"sharedEntries=${b.memDomain.sharedEntries} is too small, minimum=$minSharedLines" + ) + + @public + val io = IO(new Bundle { + val read = new SharedMemReadIO(b) + val write = new SharedMemWriteIO(b) + }) + + val mem = SyncReadMem(b.memDomain.sharedEntries, Vec(maskLen, maskElem)) + + // Shared memory address mapping (group_id intentionally ignored): + // shared_addr = pbank_id * bankEntries + local_addr. + private def toSharedAddr( + pbank_id: UInt, + _group_id: UInt, + local_addr: UInt + ): UInt = { + val pbankPart = pbank_id * b.memDomain.bankEntries.U + pbankPart + local_addr + } + + io.read.req.ready := !io.write.req.valid + io.write.req.ready := !io.read.req.valid + + val readReqFire = io.read.req.fire + val readAddr = toSharedAddr(io.read.req.bits.pbank_id, io.read.req.bits.group_id, io.read.req.bits.addr) + + when(readReqFire) { + assert(io.read.req.bits.pbank_id < totalBanks.U, "SharedMem: pbank_id out of range") + assert(io.read.req.bits.addr < b.memDomain.bankEntries.U, "SharedMem: local addr out of range") + } + + val readData = mem.read(readAddr, readReqFire) + + io.read.resp.valid := RegNext(readReqFire, init = false.B) + io.read.resp.bits.data := readData.asUInt + + val writeReqFire = io.write.req.fire + val writeAddr = toSharedAddr(io.write.req.bits.pbank_id, io.write.req.bits.group_id, io.write.req.bits.addr) + + when(writeReqFire) { + assert(io.write.req.bits.pbank_id < totalBanks.U, "SharedMem: 
pbank_id out of range") + assert(io.write.req.bits.addr < b.memDomain.bankEntries.U, "SharedMem: local addr out of range") + mem.write( + writeAddr, + io.write.req.bits.data.asTypeOf(Vec(maskLen, maskElem)), + io.write.req.bits.mask + ) + } + + io.write.resp.valid := RegNext(writeReqFire, init = false.B) + io.write.resp.bits.ok := RegNext(writeReqFire, init = false.B) +} diff --git a/arch/src/main/scala/framework/memdomain/backend/shared/SharedMemBackend.scala b/arch/src/main/scala/framework/memdomain/backend/shared/SharedMemBackend.scala new file mode 100644 index 00000000..c93da78e --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/backend/shared/SharedMemBackend.scala @@ -0,0 +1,220 @@ +package framework.memdomain.backend.shared + +import chisel3._ +import chisel3.util._ +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.memdomain.backend.{MTraceDPI, MemRequestIO} +import framework.memdomain.backend.accpipe.AccPipe +import framework.memdomain.backend.banks.SramBank +import framework.memdomain.frontend.outside_channel.MemConfigerIO +import framework.top.GlobalConfig + +@instantiable +class SharedMemBackend(val b: GlobalConfig) extends Module { + private val nCores = b.top.nCores + private val totalBanks = SharedMemLayout.totalBank(b) + private val totalChannel = SharedMemLayout.totalChannel(b) + + @public + val io = IO(new Bundle { + val mem_req = Vec(totalChannel, Flipped(new MemRequestIO(b))) + val config = Flipped(Decoupled(new MemConfigerIO(b))) + + // Query interface for frontend to get group count + val query_vbank_id = Input(UInt(8.W)) + val query_group_count = Output(UInt(4.W)) + }) + + val banks: Seq[Instance[SramBank]] = Seq.fill(totalBanks)(Instantiate(new SramBank(b))) + val accPipes: Seq[Instance[AccPipe]] = Seq.fill(totalChannel)(Instantiate(new AccPipe(b))) + + // Per-channel memory trace DPI-C modules to avoid losing simultaneous events + val mtraces = Seq.fill(totalChannel)(Module(new 
MTraceDPI)) + for (mt <- mtraces) { + mt.io.is_write := 0.U + mt.io.is_shared := 0.U + mt.io.channel := 0.U + mt.io.hart_id := 0.U + mt.io.vbank_id := 0.U + mt.io.group_id := 0.U + mt.io.addr := 0.U + mt.io.data_lo := 0.U + mt.io.data_hi := 0.U + mt.io.enable := false.B + } + + // ----------------------------------------------------------------------------- + // Mapping table + // ----------------------------------------------------------------------------- + class MappingTableEntry extends Bundle { + val valid = Bool() + val hart_id = UInt(b.core.xLen.W) + val vbank_id = UInt(5.W) + val is_multi = Bool() + val group_id = UInt(3.W) + } + + val mappingTable = RegInit(VecInit(Seq.fill(totalBanks)(0.U.asTypeOf(new MappingTableEntry)))) + + def isAcc(hart_id: UInt, vbank_id: UInt): Bool = + mappingTable.map(entry => + entry.valid && (entry.vbank_id === vbank_id) && (entry.hart_id === hart_id) && entry.is_multi + ).reduce(_ || _) + + def addEntry( + hart_id: UInt, + vbank_id: UInt, + pbank_id: UInt, + is_multi: Bool, + group_id: UInt + ): Unit = { + val entry = mappingTable(pbank_id) + entry.valid := true.B + entry.hart_id := hart_id + entry.vbank_id := vbank_id + entry.is_multi := is_multi + entry.group_id := group_id + } + + def deleteEntry(hart_id: UInt, vbank_id: UInt): Unit = { + for (i <- 0 until totalBanks) { + when(mappingTable(i).valid && mappingTable(i).vbank_id === vbank_id && mappingTable(i).hart_id === hart_id) { + mappingTable(i).valid := false.B + mappingTable(i).vbank_id := 0.U + mappingTable(i).is_multi := false.B + mappingTable(i).group_id := 0.U + } + } + } + + def getFreePbankId(): UInt = { + val freePbankId = mappingTable.indexWhere(_.valid === false.B) + freePbankId + } + + // ----------------------------------------------------------------------------- + // Default Value + // ----------------------------------------------------------------------------- + + for (i <- 0 until totalChannel) { + accPipes(i).io.mem_req.write <> io.mem_req(i).write + 
accPipes(i).io.mem_req.read <> io.mem_req(i).read + accPipes(i).io.mem_req.bank_id := io.mem_req(i).bank_id + accPipes(i).io.mem_req.group_id := io.mem_req(i).group_id + accPipes(i).io.mem_req.is_shared := io.mem_req(i).is_shared + accPipes(i).io.mem_req.hart_id := io.mem_req(i).hart_id + + // Bank-side defaults (only driven when a bank is actually connected) + accPipes(i).io.sramRead.req.ready := false.B + accPipes(i).io.sramRead.resp.valid := false.B + accPipes(i).io.sramRead.resp.bits := DontCare + + accPipes(i).io.sramWrite.req.ready := false.B + accPipes(i).io.sramWrite.resp.valid := false.B + accPipes(i).io.sramWrite.resp.bits := DontCare + + accPipes(i).io.is_multi := isAcc(io.mem_req(i).hart_id, io.mem_req(i).bank_id) + } + + io.config.ready := true.B + + banks.zipWithIndex.foreach { + case (bank, _) => + bank.io.sramRead.req.valid := false.B + bank.io.sramRead.req.bits := DontCare + bank.io.sramRead.resp.ready := true.B + + bank.io.sramWrite.req.valid := false.B + bank.io.sramWrite.req.bits := DontCare + bank.io.sramWrite.resp.ready := true.B + } + + // ----------------------------------------------------------------------------- + // Bank Alloc/Release + // ----------------------------------------------------------------------------- + + when(io.config.valid) { + when(io.config.bits.alloc) { + val pbank_id = getFreePbankId() + addEntry( + io.config.bits.hart_id, + io.config.bits.vbank_id, + pbank_id, + io.config.bits.is_multi, + io.config.bits.group_id + ) + }.otherwise { + deleteEntry(io.config.bits.hart_id, io.config.bits.vbank_id) + } + } + + // ----------------------------------------------------------------------------- + // Query interface: return group count for a given vbank_id + // ----------------------------------------------------------------------------- + val groupCounts = mappingTable.map { entry => + val matches = entry.valid && (entry.vbank_id === io.query_vbank_id) + val count = Mux(entry.is_multi, entry.group_id + 1.U, 1.U) + 
Mux(matches, count, 0.U) + } + + io.query_group_count := groupCounts.reduce((a, b) => Mux(a > b, a, b)) + + // ----------------------------------------------------------------------------- + // Connect AccPipe and Banks + // ----------------------------------------------------------------------------- + private def emitTrace( + ch: Int, + isWrite: UInt, + addr: UInt, + dataLo: UInt, + dataHi: UInt, + en: Bool + ): Unit = { + mtraces(ch).io.is_write := isWrite + mtraces(ch).io.is_shared := io.mem_req(ch).is_shared.asUInt + mtraces(ch).io.channel := ch.U + mtraces(ch).io.hart_id := io.mem_req(ch).hart_id + mtraces(ch).io.vbank_id := io.mem_req(ch).bank_id + mtraces(ch).io.group_id := io.mem_req(ch).group_id + mtraces(ch).io.addr := addr + mtraces(ch).io.data_lo := dataLo + mtraces(ch).io.data_hi := dataHi + mtraces(ch).io.enable := en + } + + for (i <- 0 until totalChannel) { + val req_valid = io.mem_req(i).read.req.valid || io.mem_req(i).write.req.valid + + // Memory trace: read request + when(io.mem_req(i).read.req.fire) { + emitTrace(i, 0.U, io.mem_req(i).read.req.bits.addr, 0.U, 0.U, true.B) + } + + // Memory trace: write request + when(io.mem_req(i).write.req.fire) { + emitTrace( + i, + 1.U, + io.mem_req(i).write.req.bits.addr, + io.mem_req(i).write.req.bits.data(63, 0), + io.mem_req(i).write.req.bits.data(127, 64), + true.B + ) + } + + for (j <- 0 until totalBanks) { + val hit_bank = mappingTable(j).valid && + (mappingTable(j).hart_id === io.mem_req(i).hart_id) && + (mappingTable(j).vbank_id === io.mem_req(i).bank_id) && + (!mappingTable(j).is_multi || + (mappingTable(j).is_multi && (mappingTable(j).group_id === io.mem_req(i).group_id))) + + val hold_one = RegNext(hit_bank && req_valid, init = false.B) + + when((hit_bank && req_valid) || hold_one) { + banks(j).io.sramRead <> accPipes(i).io.sramRead + banks(j).io.sramWrite <> accPipes(i).io.sramWrite + } + } + } +} diff --git a/arch/src/main/scala/framework/memdomain/configs/MemDomainParam.scala 
b/arch/src/main/scala/framework/memdomain/configs/MemDomainParam.scala new file mode 100644 index 00000000..63c03cd4 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/configs/MemDomainParam.scala @@ -0,0 +1,34 @@ +package framework.memdomain.configs + +import upickle.default._ + +/** + * MemDomain Parameter + */ +case class MemDomainParam( + bankNum: Int, + bankWidth: Int, + bankEntries: Int, + bankMaskLen: Int, + sharedEnable: Boolean, + sharedEntries: Int, + sharedDefaultGroupCount: Int, + tlb_size: Int, + dma_n_xacts: Int, + dma_maxbytes: Int, + bankChannel: Int, + max_in_flight_mem_reqs: Int, + dma_buswidth: Int, + memAddrLen: Int, + tmaReadChannel: Int, + tmaWriteChannel: Int) + +object MemDomainParam { + implicit val rw: ReadWriter[MemDomainParam] = macroRW + + def apply(): MemDomainParam = { + val jsonStr = scala.io.Source.fromFile("src/main/scala/framework/memdomain/configs/default.json").mkString + read[MemDomainParam](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/memdomain/configs/default.json b/arch/src/main/scala/framework/memdomain/configs/default.json new file mode 100644 index 00000000..bf7223da --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/configs/default.json @@ -0,0 +1,19 @@ +{ + "bankNum": 32, + "bankWidth": 128, + "bankEntries": 128, + "bankMaskLen": 16, + "sharedEnable": true, + "sharedEntries": 16384, + "sharedDefaultGroupCount": 1, + "tlb_size": 4, + "dma_n_xacts": 8, + "dma_maxbytes": 64, + "bankChannel": 7, + "max_in_flight_mem_reqs": 16, + "dma_buswidth": 128, + "memAddrLen": 39, + "balldomainChannel": 6, + "tmaReadChannel": 4, + "tmaWriteChannel": 4 +} diff --git a/arch/src/main/scala/framework/memdomain/dma/DMA.scala b/arch/src/main/scala/framework/memdomain/dma/DMA.scala deleted file mode 100644 index 48231f89..00000000 --- a/arch/src/main/scala/framework/memdomain/dma/DMA.scala +++ /dev/null @@ -1,303 +0,0 @@ -package framework.memdomain.dma - -import chisel3._ -import chisel3.util._ - -import 
org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.diplomacy.{IdRange, LazyModule, LazyModuleImp} -import freechips.rocketchip.tile.{CoreBundle, HasCoreParameters} -import freechips.rocketchip.tilelink._ -import freechips.rocketchip.rocket.MStatus -import freechips.rocketchip.rocket.constants.MemoryOpConstants - -import framework.builtin.util.Util._ -import framework.memdomain.tlb.BBTLBIO -import framework.memdomain.dma.LocalAddr - - -class BBReadRequest()(implicit p: Parameters) extends CoreBundle { - val vaddr = UInt(coreMaxAddrBits.W) - // Read length (bytes) - val len = UInt(16.W) - val status = new MStatus - // Stride (bytes) - val stride = UInt(10.W) -} - -class BBReadResponse(dataWidth: Int) extends Bundle { - val data = UInt(dataWidth.W) - val last = Bool() - val addrcounter = UInt(10.W) -} - -class BBWriteRequest(dataWidth: Int)(implicit p: Parameters) extends CoreBundle { - val vaddr = UInt(coreMaxAddrBits.W) - val data = UInt(dataWidth.W) - // Write length (bytes) - val len = UInt(16.W) - // Byte mask - val mask = UInt((dataWidth / 8).W) - val status = new MStatus -} - -class BBWriteResponse extends Bundle { - val done = Bool() -} - -class BBStreamReader(nXacts: Int, beatBits: Int, maxBytes: Int, dataWidth: Int) - (implicit p: Parameters) extends LazyModule { - val node = TLClientNode(Seq(TLMasterPortParameters.v1(Seq(TLClientParameters( - name = "buckyball-stream-reader", sourceId = IdRange(0, nXacts)))))) - - lazy val module = new Impl - class Impl extends LazyModuleImp(this) with HasCoreParameters with MemoryOpConstants { - val (tl, edge) = node.out(0) - val beatBytes = beatBits / 8 - - val io = IO(new Bundle { - val req = Flipped(Decoupled(new BBReadRequest())) - val resp = Decoupled(new BBReadResponse(dataWidth)) - val tlb = new BBTLBIO - val busy = Output(Bool()) - val flush = Input(Bool()) - }) - - val s_idle :: s_req_new_block :: Nil = Enum(2) - val state = RegInit(s_idle) - - val req = Reg(new BBReadRequest()) - // Number of 
bytes requested - val bytesRequested = Reg(UInt(16.W)) - // Number of bytes received - val bytesReceived = Reg(UInt(16.W)) - val bytesLeft = req.len - bytesRequested - - // Select request size - simplified version, fixed use of beatBytes - val read_size = minOf(beatBytes.U, bytesLeft) - val read_vaddr = req.vaddr + bytesRequested * req.stride - - // Track byte range corresponding to each request for correct last signal calculation - // Starting byte position of current request - val req_byte_start = Reg(UInt(16.W)) - // Ending byte position of current request - val req_byte_end = Wire(UInt(16.W)) - req_byte_end := req_byte_start + read_size - - // Transaction ID management - val xactBusy = RegInit(0.U(nXacts.W)) - val xactOnehot = PriorityEncoderOH(~xactBusy) - val xactId = OHToUInt(xactOnehot) - - val xactBusy_fire = WireInit(false.B) - val xactBusy_add = Mux(xactBusy_fire, (1.U << xactId).asUInt, 0.U) - val xactBusy_remove = ~Mux(tl.d.fire, (1.U << tl.d.bits.source).asUInt, 0.U) - xactBusy := (xactBusy | xactBusy_add) & xactBusy_remove.asUInt - - // TileLink request construction - return to single beat requests to avoid address alignment issues - val get = edge.Get( - fromSource = xactId, - toAddress = 0.U, - // Request only one beat each time - lgSize = log2Ceil(beatBytes).U - )._2 - - // TLB processing pipeline - simplified based on Gemmini - class TLBundleAWithInfo extends Bundle { - val tl_a = tl.a.bits.cloneType - val vaddr = Output(UInt(vaddrBits.W)) - val status = Output(new MStatus) - } - - val untranslated_a = Wire(Decoupled(new TLBundleAWithInfo)) - xactBusy_fire := untranslated_a.fire && state === s_req_new_block - untranslated_a.valid := state === s_req_new_block && !xactBusy.andR - untranslated_a.bits.tl_a := get - untranslated_a.bits.vaddr := read_vaddr - untranslated_a.bits.status := req.status - - - - // Simplified: no retry mechanism, direct connection - val tlb_q = Module(new Queue(new TLBundleAWithInfo, 1, pipe=true)) - tlb_q.io.enq <> 
untranslated_a - - io.tlb.req.valid := tlb_q.io.deq.fire - io.tlb.req.bits := DontCare - io.tlb.req.bits.tlb_req.vaddr := tlb_q.io.deq.bits.vaddr - io.tlb.req.bits.tlb_req.passthrough := false.B - io.tlb.req.bits.tlb_req.size := 0.U - io.tlb.req.bits.tlb_req.cmd := M_XRD - io.tlb.req.bits.status := tlb_q.io.deq.bits.status - - val translate_q = Module(new Queue(new TLBundleAWithInfo, 1, pipe=true)) - translate_q.io.enq <> tlb_q.io.deq - translate_q.io.deq.ready := tl.a.ready || io.tlb.resp.miss - - // TileLink connection - tl.a.valid := translate_q.io.deq.valid - tl.a.bits := translate_q.io.deq.bits.tl_a - tl.a.bits.address := io.tlb.resp.paddr - - // Iteration counter for tracking number of requests - val iter_counter = RegInit(0.U(10.W)) - // Table for managing iteration counts - val iter_mangage_table = RegInit(VecInit(Seq.fill(16)(0.U(10.W)))) - // Iteration count management - update after each request - when (tl.a.fire) { - iter_counter := iter_counter + 1.U - iter_mangage_table(tl.a.bits.source) := iter_counter - } - - // Response processing - io.resp.valid := tl.d.valid - io.resp.bits.data := tl.d.bits.data - // Use source as address counter - io.resp.bits.addrcounter := iter_mangage_table(tl.d.bits.source) - // Fix last signal: calculate using received byte count - // Total byte count after receiving current beat - val resp_bytes_end = bytesReceived + beatBytes.U - io.resp.bits.last := edge.last(tl.d) && (resp_bytes_end >= req.len) - tl.d.ready := io.resp.ready - - // Update received byte count - when (tl.d.fire) { - bytesReceived := bytesReceived + beatBytes.U - } - - // State machine - io.req.ready := state === s_idle - io.busy := xactBusy.orR || (state =/= s_idle) - - when (io.req.fire) { - req := io.req.bits - bytesRequested := 0.U - // Reset received byte count - bytesReceived := 0.U - // Reset iteration counter - iter_counter := 0.U - state := s_req_new_block - } - - when (untranslated_a.fire) { - // Use actual requested byte count - bytesRequested := 
bytesRequested + read_size - // Check if more requests need to be sent - when (bytesRequested + read_size >= req.len) { - // All requests sent - state := s_idle - }.otherwise { - // Continue sending next request - state := s_req_new_block - } - } - } -} - -// Convention: data is already aligned and has mask -class BBStreamWriter(nXacts: Int, beatBits: Int, maxBytes: Int, dataWidth: Int) - (implicit p: Parameters) extends LazyModule { - val node = TLClientNode(Seq(TLMasterPortParameters.v1(Seq(TLClientParameters( - name = "buckyball-stream-writer", sourceId = IdRange(0, nXacts)))))) - - lazy val module = new Impl - class Impl extends LazyModuleImp(this) with HasCoreParameters with MemoryOpConstants { - val (tl, edge) = node.out(0) - val beatBytes = beatBits / 8 - - val io = IO(new Bundle { - val req = Flipped(Decoupled(new BBWriteRequest(dataWidth))) - val resp = Decoupled(new BBWriteResponse) - val tlb = new BBTLBIO - val busy = Output(Bool()) - val flush = Input(Bool()) - }) - - val s_idle :: s_writing :: Nil = Enum(2) - val state = RegInit(s_idle) - - val req = Reg(new BBWriteRequest(dataWidth)) - - val xactBusy = RegInit(0.U(nXacts.W)) - val xactOnehot = PriorityEncoderOH(~xactBusy) - val xactId = OHToUInt(xactOnehot) - - val xactBusy_fire = WireInit(false.B) - val xactBusy_add = Mux(xactBusy_fire, (1.U << xactId).asUInt, 0.U) - val xactBusy_remove = ~Mux(tl.d.fire, (1.U << tl.d.bits.source).asUInt, 0.U) - xactBusy := (xactBusy | xactBusy_add) & xactBusy_remove.asUInt - - // Simplified: data is already aligned, directly construct TileLink request - val lg_beat_bytes = log2Ceil(beatBytes) - val use_put_full = req.mask === ~0.U(beatBytes.W) - - val putFull = edge.Put( - fromSource = xactId, - toAddress = 0.U, - lgSize = lg_beat_bytes.U, - data = req.data - )._2 - - val putPartial = edge.Put( - fromSource = xactId, - toAddress = 0.U, - lgSize = lg_beat_bytes.U, - data = req.data, - mask = req.mask - )._2 - - val selected_put = Mux(use_put_full, putFull, putPartial) 
- - // TLB processing pipeline - class TLBundleAWithInfo extends Bundle { - val tl_a = tl.a.bits.cloneType - val vaddr = Output(UInt(vaddrBits.W)) - val status = Output(new MStatus) - } - - val untranslated_a = Wire(Decoupled(new TLBundleAWithInfo)) - xactBusy_fire := untranslated_a.fire - untranslated_a.valid := state === s_writing && !xactBusy.andR - untranslated_a.bits.tl_a := selected_put - untranslated_a.bits.vaddr := req.vaddr - untranslated_a.bits.status := req.status - - val tlb_q = Module(new Queue(new TLBundleAWithInfo, 1, pipe=true)) - tlb_q.io.enq <> untranslated_a - - io.tlb.req.valid := tlb_q.io.deq.valid - io.tlb.req.bits := DontCare - io.tlb.req.bits.tlb_req.vaddr := tlb_q.io.deq.bits.vaddr - io.tlb.req.bits.tlb_req.passthrough := false.B - io.tlb.req.bits.tlb_req.size := 0.U - io.tlb.req.bits.tlb_req.cmd := M_XWR - io.tlb.req.bits.status := tlb_q.io.deq.bits.status - - val translate_q = Module(new Queue(new TLBundleAWithInfo, 1, pipe=true)) - translate_q.io.enq <> tlb_q.io.deq - translate_q.io.deq.ready := tl.a.ready || io.tlb.resp.miss - - // TileLink connection - tl.a.valid := translate_q.io.deq.valid && !io.tlb.resp.miss - tl.a.bits := translate_q.io.deq.bits.tl_a - tl.a.bits.address := io.tlb.resp.paddr - - tl.d.ready := true.B - - // Response processing - io.resp.valid := tl.d.valid && edge.last(tl.d) - io.resp.bits.done := true.B - - // State machine - io.req.ready := state === s_idle - io.busy := xactBusy.orR || (state =/= s_idle) - - when (io.req.fire) { - req := io.req.bits - state := s_writing - } - - when (untranslated_a.fire) { - state := s_idle - } - } -} diff --git a/arch/src/main/scala/framework/memdomain/dma/LocalAddr.scala b/arch/src/main/scala/framework/memdomain/dma/LocalAddr.scala deleted file mode 100644 index 31b9b0ab..00000000 --- a/arch/src/main/scala/framework/memdomain/dma/LocalAddr.scala +++ /dev/null @@ -1,147 +0,0 @@ -package framework.memdomain.dma - -import chisel3._ -import chisel3.util._ - -class LocalAddr(sp_banks: 
Int, sp_bank_entries: Int, acc_banks: Int, acc_bank_entries: Int) extends Bundle { - private val localAddrBits = 32 // TODO magic number - - private val spAddrBits = log2Ceil(sp_banks * sp_bank_entries) - private val accAddrBits = log2Ceil(acc_banks * acc_bank_entries) - private val maxAddrBits = spAddrBits max accAddrBits - private val memAddrBits = log2Ceil(sp_banks * sp_bank_entries + acc_banks * acc_bank_entries) - - private val spBankBits = log2Up(sp_banks) - private val spBankRowBits = log2Up(sp_bank_entries) - - private val accBankBits = log2Up(acc_banks) - val accBankRowBits = log2Up(acc_bank_entries) - - val spRows = sp_banks * sp_bank_entries - - val is_acc_addr = Bool() - val accumulate = Bool() - val read_full_acc_row = Bool() - - private val metadata_w = is_acc_addr.getWidth + accumulate.getWidth + read_full_acc_row.getWidth - assert(maxAddrBits + metadata_w < 32) - - val garbage = UInt(((localAddrBits - maxAddrBits - metadata_w - 1) max 0).W) - val garbage_bit = if (localAddrBits - maxAddrBits >= metadata_w + 1) UInt(1.W) else UInt(0.W) - val data = UInt(memAddrBits.W) - - def sp_bank(dummy: Int = 0) = if (spAddrBits == spBankRowBits) 0.U else data(spAddrBits - 1, spBankRowBits) - def sp_row(dummy: Int = 0) = data(spBankRowBits - 1, 0) - def acc_bank(dummy: Int = 0) = if (accAddrBits == accBankRowBits) 0.U else data(accAddrBits - 1, accBankRowBits) - def acc_row(dummy: Int = 0) = data(accBankRowBits - 1, 0) - def mem_bank(dummy: Int = 0) = data(memAddrBits - 1, spBankRowBits) - def mem_row(dummy: Int = 0) = data(spBankRowBits - 1, 0) - - def full_sp_addr(dummy: Int = 0) = data(spAddrBits - 1, 0) - def full_acc_addr(dummy: Int = 0) = data(accAddrBits - 1, 0) - - def is_same_address(other: LocalAddr): Bool = is_acc_addr === other.is_acc_addr && data === other.data - def is_same_address(other: UInt): Bool = is_same_address(other.asTypeOf(this)) - def is_garbage(dummy: Int = 0) = is_acc_addr && accumulate && read_full_acc_row && data.andR && - (if 
(garbage_bit.getWidth > 0) garbage_bit.asBool else true.B) - - def +(other: UInt) = { - require(isPow2(sp_bank_entries)) // TODO remove this requirement - require(isPow2(acc_bank_entries)) // TODO remove this requirement - - val result = WireInit(this) - result.data := data + other - result - } - - def <=(other: LocalAddr) = - is_acc_addr === other.is_acc_addr && - Mux(is_acc_addr, full_acc_addr() <= other.full_acc_addr(), full_sp_addr() <= other.full_sp_addr()) - - def <(other: LocalAddr) = - is_acc_addr === other.is_acc_addr && - Mux(is_acc_addr, full_acc_addr() < other.full_acc_addr(), full_sp_addr() < other.full_sp_addr()) - - def >(other: LocalAddr) = - is_acc_addr === other.is_acc_addr && - Mux(is_acc_addr, full_acc_addr() > other.full_acc_addr(), full_sp_addr() > other.full_sp_addr()) - - def add_with_overflow(other: UInt): Tuple2[LocalAddr, Bool] = { - require(isPow2(sp_bank_entries)) // TODO remove this requirement - require(isPow2(acc_bank_entries)) // TODO remove this requirement - - val sum = data +& other - - val overflow = Mux(is_acc_addr, sum(accAddrBits), sum(spAddrBits)) - - val result = WireInit(this) - result.data := sum(maxAddrBits - 1, 0) - - (result, overflow) - } - - // This function can only be used with non-accumulator addresses. 
Returns both new address and underflow - def floorSub(other: UInt, floor: UInt): (LocalAddr, Bool) = { - require(isPow2(sp_bank_entries)) // TODO remove this requirement - require(isPow2(acc_bank_entries)) // TODO remove this requirement - - val underflow = data < (floor +& other) - - val result = WireInit(this) - result.data := Mux(underflow, floor, data - other) - - (result, underflow) - } - - def make_this_garbage(dummy: Int = 0): Unit = { - is_acc_addr := true.B - accumulate := true.B - read_full_acc_row := true.B - garbage_bit := 1.U - data := ~(0.U(maxAddrBits.W)) - } - -} - -object LocalAddr { - def cast_to_local_addr[T <: Data](local_addr_t: LocalAddr, t: T): LocalAddr = { - // This convenience function is basically the same as calling "asTypeOf(local_addr_t)". However, this convenience - // function will also cast unnecessary garbage bits to 0, which may help reduce multiplier/adder bitwidths - val result = WireInit(t.asTypeOf(local_addr_t)) - if (result.garbage_bit.getWidth > 0) result.garbage := 0.U - result - } - - def cast_to_sp_addr[T <: Data](local_addr_t: LocalAddr, t: T): LocalAddr = { - // This function is a wrapper around cast_to_local_addr, but it assumes that the input will not be the garbage - // address - val result = WireInit(cast_to_local_addr(local_addr_t, t)) - result.is_acc_addr := false.B - result.accumulate := false.B - result.read_full_acc_row := false.B - - // assert(!result.garbage_bit, "cast_to_sp_addr doesn't work on garbage addresses") - - result - } - - def cast_to_acc_addr[T <: Data](local_addr_t: LocalAddr, t: T, accumulate: Bool, read_full: Bool): LocalAddr = { - // This function is a wrapper around cast_to_local_addr, but it assumes that the input will not be the garbage - // address - val result = WireInit(cast_to_local_addr(local_addr_t, t)) - result.is_acc_addr := true.B - result.accumulate := accumulate - result.read_full_acc_row := read_full - - // assert(!result.garbage_bit, "cast_to_acc_addr doesn't work on garbage 
addresses") - - result - } - - def garbage_addr(local_addr_t: LocalAddr): LocalAddr = { - val result = Wire(chiselTypeOf(local_addr_t)) - result := DontCare - result.make_this_garbage() - result - } -} diff --git a/arch/src/main/scala/framework/memdomain/dma/README.md b/arch/src/main/scala/framework/memdomain/dma/README.md deleted file mode 100644 index 4aa264f3..00000000 --- a/arch/src/main/scala/framework/memdomain/dma/README.md +++ /dev/null @@ -1,138 +0,0 @@ -# DMA Engine Implementation - -## Overview - -DMA engine implementation for Buckyball's memory domain, located at `arch/src/main/scala/framework/builtin/memdomain/dma`. Provides high-performance memory data transfer services between main memory and on-chip storage. - -Main components: -- **BBStreamReader**: Streaming data reader for bulk reads from external memory -- **BBStreamWriter**: Streaming data writer for bulk writes to external memory -- **LocalAddr**: Local address management for Scratchpad and Accumulator mapping - -## File Structure - -``` -dma/ -├── DMA.scala - Streaming DMA read/write implementation -└── LocalAddr.scala - Local address management -``` - -## DMA.scala - -### Request/Response Interfaces - -```scala -class BBReadRequest()(implicit p: Parameters) extends CoreBundle { - val vaddr = UInt(coreMaxAddrBits.W) // Virtual address - val len = UInt(16.W) // Read length (bytes) - val status = new MStatus // Processor status -} - -class BBWriteRequest(dataWidth: Int)(implicit p: Parameters) extends CoreBundle { - val vaddr = UInt(coreMaxAddrBits.W) // Virtual address - val data = UInt(dataWidth.W) // Write data - val len = UInt(16.W) // Write length (bytes) - val mask = UInt((dataWidth / 8).W) // Byte mask - val status = new MStatus // Processor status -} -``` - -### BBStreamReader - -**State Machine**: -```scala -val s_idle :: s_req_new_block :: Nil = Enum(2) -val state = RegInit(s_idle) -``` - -**Byte Counting**: -```scala -val bytesRequested = Reg(UInt(16.W)) // Bytes requested -val 
bytesReceived = Reg(UInt(16.W)) // Bytes received -val bytesLeft = req.len - bytesRequested -``` - -**TileLink Request**: -```scala -val get = edge.Get( - fromSource = xactId, - toAddress = 0.U, - lgSize = log2Ceil(beatBytes).U -)._2 -``` - -**TLB Integration**: -```scala -io.tlb.req.bits.tlb_req.vaddr := tlb_q.io.deq.bits.vaddr -io.tlb.req.bits.tlb_req.cmd := M_XRD // Read operation -io.tlb.req.bits.status := tlb_q.io.deq.bits.status -``` - -### BBStreamWriter - -**Put Operation Selection**: -```scala -val use_put_full = req.mask === ~0.U(beatBytes.W) -val selected_put = Mux(use_put_full, putFull, putPartial) -``` - -**Response Handling**: -```scala -io.resp.valid := tl.d.valid && edge.last(tl.d) -io.resp.bits.done := true.B -``` - -## LocalAddr.scala - -### Address Structure - -```scala -class LocalAddr(sp_banks: Int, sp_bank_entries: Int, acc_banks: Int, acc_bank_entries: Int) extends Bundle { - val is_acc_addr = Bool() // Is accumulator address - val accumulate = Bool() // Perform accumulation - val read_full_acc_row = Bool() // Read full accumulator row - val data = UInt(memAddrBits.W) // Actual address data -} -``` - -### Address Decomposition - -```scala -// Scratchpad address decomposition -def sp_bank(dummy: Int = 0) = if (spAddrBits == spBankRowBits) 0.U - else data(spAddrBits - 1, spBankRowBits) -def sp_row(dummy: Int = 0) = data(spBankRowBits - 1, 0) - -// Accumulator address decomposition -def acc_bank(dummy: Int = 0) = if (accAddrBits == accBankRowBits) 0.U - else data(accAddrBits - 1, accBankRowBits) -def acc_row(dummy: Int = 0) = data(accBankRowBits - 1, 0) -``` - -### Address Operations - -```scala -// Address addition -def +(other: UInt) = { - val result = WireInit(this) - result.data := data + other - result -} - -// Addition with overflow check -def add_with_overflow(other: UInt): Tuple2[LocalAddr, Bool] = { - val sum = data +& other - val overflow = Mux(is_acc_addr, sum(accAddrBits), sum(spAddrBits)) - (result, overflow) -} -``` - -## Important 
Notes - -1. **Alignment Requirements**: DMA operations consider TileLink protocol alignment requirements -2. **Transaction ID Management**: Both engines implement transaction ID allocation and recycling for concurrent requests -3. **TLB Integration**: Full virtual address translation support for user and kernel mode -4. **Pipeline Design**: Multiple pipeline stages including address translation, TileLink request, and response handling -5. **Error Handling**: TLB miss handling implemented, relies on upper layer software for access failures -6. **Performance**: BBStreamWriter supports full and partial write modes, automatically selects optimal TileLink operation based on mask -7. **Configuration**: DMA engines support parametrized configuration of concurrent transactions, data width, max transfer bytes diff --git a/arch/src/main/scala/framework/memdomain/frontend/MemFrontend.scala b/arch/src/main/scala/framework/memdomain/frontend/MemFrontend.scala new file mode 100644 index 00000000..ac6ba3c2 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/frontend/MemFrontend.scala @@ -0,0 +1,185 @@ +package framework.memdomain.frontend + +import chisel3._ +import chisel3.util._ +import freechips.rocketchip.tile._ +import framework.memdomain.frontend.outside_channel.dma.{StreamReader, StreamWriter} +import framework.memdomain.frontend.outside_channel.{MemConfiger, MemConfigerIO, MemLoader, MemStorer} +import framework.memdomain.frontend.outside_channel.tlb.{BBTLBCluster, BBTLBExceptionIO, BBTLBIO, BBTLBPTWIO} +import freechips.rocketchip.tilelink.{TLBundle, TLEdgeOut} +import framework.frontend.globalrs.{GlobalSchedComplete, GlobalSchedIssue} +import framework.balldomain.blink.{BankRead, BankWrite} +import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate} +import framework.top.GlobalConfig +import framework.memdomain.frontend.cmd_channel.decoder.MemDomainDecoder +import framework.memdomain.frontend.cmd_channel.rs.MemReservationStation +import 
framework.memdomain.utils.pmc.MemCyclePMC + +/** + * MemFrontend: + * Provides DMA interface and Ball Domain interface + */ +@instantiable +class MemFrontend(val b: GlobalConfig)(edge: TLEdgeOut) extends Module { + + @public + val io = IO(new Bundle { + // Issue interface from global RS (single channel) + val global_issue_i = Flipped(Decoupled(new GlobalSchedIssue(b))) + // Report completion to global RS (single channel) + val global_complete_o = Decoupled(new GlobalSchedComplete(b)) + + // Bank read/write interface - used by load/store + val interdma = new Bundle { + val bankRead = Flipped(new BankRead(b)) + val bankWrite = Flipped(new BankWrite(b)) + val read_is_shared = Output(Bool()) + val write_is_shared = Output(Bool()) + } + + // TLB interfaces for internal DMA modules (Reader/Writer) + // These are NOT exposed to outside - only PTW and TLB exception are exposed + // PTW interface - needs to connect to upper level PTW (shared TLB has only 1 PTW) + val ptw = Vec(1, new BBTLBPTWIO(b)) + // TLB exception interface - exposed to upper level for handling flush, etc. 
(shared TLB has only 1 exp) + val tlbExp = Vec(1, new BBTLBExceptionIO) + + // TileLink physical connections for DMA (Reader/Writer) + val tl_reader = new TLBundle(edge.bundle) + val tl_writer = new TLBundle(edge.bundle) + + val config = Decoupled(new MemConfigerIO(b)) + + // Query interface to backend for group count + val query_vbank_id = Output(UInt(8.W)) + val query_is_shared = Output(Bool()) + val query_group_count = Input(UInt(4.W)) + + val hartid = Input(UInt(b.core.xLen.W)) + + // Busy signal + val busy = Output(Bool()) + }) + + val memDecoder: Instance[MemDomainDecoder] = Instantiate(new MemDomainDecoder(b)) + val memRs: Instance[MemReservationStation] = Instantiate(new MemReservationStation(b)) + val memLoader: Instance[MemLoader] = Instantiate(new MemLoader(b)) + val memStorer: Instance[MemStorer] = Instantiate(new MemStorer(b)) + val pmc: Instance[MemCyclePMC] = Instantiate(new MemCyclePMC(b)) + + // TLB cluster - internal TLB management for DMA modules + // Supports 2 clients: StreamReader (client 1) and StreamWriter (client 0) + val tlbCluster = Instantiate(new BBTLBCluster(b)(edge)) + + // DMA Reader and Writer modules - handle actual DMA transfers + val reader: Instance[StreamReader] = Instantiate(new StreamReader(b)(edge)) + val writer: Instance[StreamWriter] = Instantiate(new StreamWriter(b)(edge)) + val configer: Instance[MemConfiger] = Instantiate(new MemConfiger(b)) + +// ----------------------------------------------------------------------------- +// Global RS -> MemDecoder +// ----------------------------------------------------------------------------- + memDecoder.io.cmd_i.valid := io.global_issue_i.valid + memDecoder.io.cmd_i.bits := io.global_issue_i.bits.cmd + io.global_issue_i.ready := memDecoder.io.cmd_i.ready + + // Config signal goes to backend + io.config <> configer.io.config + + // Connect query interfaces + // Use memLoader's query by default, memStorer will override when active + io.query_vbank_id := 
Mux(memStorer.io.cmdReq.valid, memStorer.io.query_vbank_id, memLoader.io.query_vbank_id) + io.query_is_shared := Mux(memStorer.io.cmdReq.valid, memStorer.io.query_is_shared, memLoader.io.query_is_shared) + memLoader.io.query_group_count := io.query_group_count + memStorer.io.query_group_count := io.query_group_count + +// ----------------------------------------------------------------------------- +// MemDecoder -> MemReservationStation +// ----------------------------------------------------------------------------- + // Connect decoded instruction and global rob_id + memRs.io.mem_decode_cmd_i.valid := memDecoder.io.mem_decode_cmd_o.valid + memRs.io.mem_decode_cmd_i.bits.cmd := memDecoder.io.mem_decode_cmd_o.bits + memRs.io.mem_decode_cmd_i.bits.rob_id := io.global_issue_i.bits.rob_id + memRs.io.mem_decode_cmd_i.bits.is_sub := io.global_issue_i.bits.is_sub + memRs.io.mem_decode_cmd_i.bits.sub_rob_id := io.global_issue_i.bits.sub_rob_id + memDecoder.io.mem_decode_cmd_o.ready := memRs.io.mem_decode_cmd_i.ready + +// ----------------------------------------------------------------------------- +// MemReservationStation -> MemLoader/MemStorer +// ----------------------------------------------------------------------------- + memLoader.io.cmdReq <> memRs.io.issue_o.ld + memStorer.io.cmdReq <> memRs.io.issue_o.st + configer.io.cmdReq <> memRs.io.issue_o.cf + configer.io.hartid := io.hartid + memRs.io.commit_i.ld <> memLoader.io.cmdResp + memRs.io.commit_i.st <> memStorer.io.cmdResp + memRs.io.commit_i.cf <> configer.io.cmdResp + +//----------------------------------------------------------------------------- +// PMC - Performance Monitor Counter +// ----------------------------------------------------------------------------- + pmc.io.ldReq_i.valid := memRs.io.issue_o.ld.fire + pmc.io.ldReq_i.bits := memRs.io.issue_o.ld.bits + pmc.io.stReq_i.valid := memRs.io.issue_o.st.fire + pmc.io.stReq_i.bits := memRs.io.issue_o.st.bits + pmc.io.ldResp_o.valid := 
memLoader.io.cmdResp.fire + pmc.io.ldResp_o.bits := memLoader.io.cmdResp.bits + pmc.io.stResp_o.valid := memStorer.io.cmdResp.fire + pmc.io.stResp_o.bits := memStorer.io.cmdResp.bits + + // Connect Reader and Writer to MemLoader and MemStorer + memLoader.io.dmaReq <> reader.io.req + reader.io.resp <> memLoader.io.dmaResp + memStorer.io.dmaReq <> writer.io.req + writer.io.resp <> memStorer.io.dmaResp + + // TLB connection - internal TLB cluster connected to DMA modules + // Client 0: StreamWriter, Client 1: StreamReader + // Insert pipeline registers to break combinational loops + tlbCluster.io.clients(1).req.valid := reader.io.tlb.req.valid + tlbCluster.io.clients(1).req.bits := reader.io.tlb.req.bits + reader.io.tlb.req.ready := tlbCluster.io.clients(1).req.ready + + reader.io.tlb.resp.valid := tlbCluster.io.clients(1).resp.valid + reader.io.tlb.resp.bits := tlbCluster.io.clients(1).resp.bits + tlbCluster.io.clients(1).resp.ready := reader.io.tlb.resp.ready + + tlbCluster.io.clients(0).req.valid := writer.io.tlb.req.valid + tlbCluster.io.clients(0).req.bits := writer.io.tlb.req.bits + writer.io.tlb.req.ready := tlbCluster.io.clients(0).req.ready + + writer.io.tlb.resp.valid := tlbCluster.io.clients(0).resp.valid + writer.io.tlb.resp.bits := tlbCluster.io.clients(0).resp.bits + tlbCluster.io.clients(0).resp.ready := writer.io.tlb.resp.ready + + // Connect DMA flush signals to TLB exceptions + reader.io.flush := io.tlbExp(0).flush() + writer.io.flush := io.tlbExp(0).flush() + + // PTW interface - connect to upper level page table walker + io.ptw <> tlbCluster.io.ptw + + // TLB exception interface - connect to upper level for flush handling + tlbCluster.io.exp <> io.tlbExp + + // Connect TileLink physical ports from Reader/Writer to external interface + io.tl_reader <> reader.io.tl + io.tl_writer <> writer.io.tl + + // Connect MemLoader and MemStorer to MemController's DMA interface + memLoader.io.bankWrite <> io.interdma.bankWrite + memStorer.io.bankRead <> 
io.interdma.bankRead + io.interdma.read_is_shared := memStorer.io.is_shared + io.interdma.write_is_shared := memLoader.io.is_shared + + // Completion signal connected to global RS + io.global_complete_o.valid := memRs.io.complete_o.valid + io.global_complete_o.bits.rob_id := memRs.io.complete_o.bits.rob_id + io.global_complete_o.bits.is_sub := memRs.io.complete_o.bits.is_sub + io.global_complete_o.bits.sub_rob_id := memRs.io.complete_o.bits.sub_rob_id + memRs.io.complete_o.ready := io.global_complete_o.ready + + // Busy signal + // Simple busy signal + io.busy := !memRs.io.complete_o.ready +} diff --git a/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/decoder/DISA.scala b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/decoder/DISA.scala new file mode 100644 index 00000000..4d3e073d --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/decoder/DISA.scala @@ -0,0 +1,12 @@ +package framework.memdomain.frontend.cmd_channel.decoder + +import chisel3._ +import chisel3.util._ + +object DISA { + // enable=010 (1 write), opcode group + val MSET_BITPAT = BitPat("b0100000") // 32 (0x20) — enable=010, opcode=0 + val MVIN_BITPAT = BitPat("b0100001") // 33 (0x21) — enable=010, opcode=1 + // enable=001 (1 read), opcode group + val MVOUT_BITPAT = BitPat("b0010000") // 16 (0x10) — enable=001, opcode=0 +} diff --git a/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/decoder/DomainDecoder.scala b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/decoder/DomainDecoder.scala new file mode 100644 index 00000000..8890a58a --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/decoder/DomainDecoder.scala @@ -0,0 +1,151 @@ +package framework.memdomain.frontend.cmd_channel.decoder + +import chisel3._ +import chisel3.util._ +import framework.frontend.decoder.{DomainId, PostGDCmd} +import framework.top.GlobalConfig +import framework.memdomain.frontend.cmd_channel.decoder.DISA._ +import 
freechips.rocketchip.tile._ +import chisel3.experimental.hierarchy.{instantiable, public} + +// Detailed decode output for Mem domain +class MemDecodeCmd(b: GlobalConfig) extends Bundle { + val is_shared = Bool() // Shared memory access marker derived from bank_id threshold. + val is_load = Bool() + val is_store = Bool() + val is_config = Bool() + + // Memory address + val mem_addr = UInt(b.memDomain.memAddrLen.W) + // Iteration count + val iter = UInt(b.frontend.iter_len.W) + // Bank information + val bank_id = UInt(log2Up(b.memDomain.bankNum).W) + val special = UInt(64.W) +} + +// LS decode fields (iter is not part of the decode table; it is always taken from rs1[63:30]) +object LSDecodeFields extends Enumeration { + type Field = Value + val LD_EN, ST_EN, MEMADDR, BANK_ID, SPECIAL, VALID = Value +} + +// Default constants for Mem decoder +object MemDefaultConstants { + val Y = true.B + val N = false.B + val DADDR = 0.U(15.W) + val DSPECIAL = 0.U(64.W) +} + +@instantiable +class MemDomainDecoder(val b: GlobalConfig) extends Module { + import MemDefaultConstants._ + + @public + val io = IO(new Bundle { + val cmd_i = Flipped(Decoupled(new PostGDCmd(b))) + val mem_decode_cmd_o = Decoupled(new MemDecodeCmd(b)) + }) + + val bankAddrLen = log2Up(b.memDomain.bankEntries) + val memAddrLen = b.memDomain.memAddrLen + val bankIdLen = b.frontend.bank_id_len + val iterLen = b.frontend.iter_len + + // Only process Mem instructions + io.cmd_i.ready := io.mem_decode_cmd_o.ready + + val func7 = io.cmd_i.bits.cmd.funct + val rs1 = io.cmd_i.bits.cmd.rs1Data + val rs2 = io.cmd_i.bits.cmd.rs2Data + + // Unified encoding: + // rs1[9:0] = bank_id (BANK0) + // rs1[63:30] = iter (34-bit) + // funct7[6:4] = enable (bank access flags, decoded by GlobalDecoder) + // rs2[38:0] = mem_addr (for MVIN/MVOUT, 39-bit) + // rs2[63:0] = special (full 64-bit) + import LSDecodeFields._ + val ls_default_decode = List(N, N, DADDR, DADDR, DSPECIAL, N) + + val ls_decode_list = ListLookup( + func7, + ls_default_decode, +
Array( + MSET_BITPAT -> List( + N, + N, + 0.U(memAddrLen.W), // mem_addr: not used for MSET + rs1(bankIdLen - 1, 0), // bank_id from rs1[9:0] + rs2, // special = full rs2 + Y + ), // mset + MVIN_BITPAT -> List( + Y, + N, + rs2(memAddrLen - 1, 0), // mem_addr from rs2[38:0] + rs1(bankIdLen - 1, 0), // bank_id from rs1[9:0] + rs2, // special = full rs2 + Y + ), // mvin + MVOUT_BITPAT -> List( + N, + Y, + rs2(memAddrLen - 1, 0), // mem_addr from rs2[38:0] + rs1(bankIdLen - 1, 0), // bank_id from rs1[9:0] + rs2, // special = full rs2 + Y + ) // mvout + ) + ) + + assert( + !(io.cmd_i.fire && !ls_decode_list(LSDecodeFields.VALID.id).asBool), + s"MemDomainDecoder: Invalid command opcode, func7 = 0x%x\n", + func7 + ) + +// ----------------------------------------------------------------------------- +// Output assignment +// ----------------------------------------------------------------------------- + io.mem_decode_cmd_o.valid := io.cmd_i.valid && (io.cmd_i.bits.domain_id === DomainId.MEM) + + val raw_bank_id = rs1(bankIdLen - 1, 0) + io.mem_decode_cmd_o.bits.is_shared := io.mem_decode_cmd_o.valid && (raw_bank_id > 31.U) + io.mem_decode_cmd_o.bits.is_load := Mux( + io.mem_decode_cmd_o.valid, + ls_decode_list(LSDecodeFields.LD_EN.id).asBool, + false.B + ) + io.mem_decode_cmd_o.bits.is_store := Mux( + io.mem_decode_cmd_o.valid, + ls_decode_list(LSDecodeFields.ST_EN.id).asBool, + false.B + ) + io.mem_decode_cmd_o.bits.is_config := Mux( + io.mem_decode_cmd_o.valid, + func7 === MSET_BITPAT, + false.B + ) + io.mem_decode_cmd_o.bits.mem_addr := Mux( + io.mem_decode_cmd_o.valid, + ls_decode_list(LSDecodeFields.MEMADDR.id).asUInt, + 0.U(b.memDomain.memAddrLen.W) + ) + // iter is always from rs1[63:30] + io.mem_decode_cmd_o.bits.iter := Mux( + io.mem_decode_cmd_o.valid, + rs1(63, 30), + 0.U(iterLen.W) + ) + + // Address parsing + val ls_bank_id = ls_decode_list(LSDecodeFields.BANK_ID.id).asUInt + io.mem_decode_cmd_o.bits.bank_id := Mux(io.mem_decode_cmd_o.valid, ls_bank_id, 
0.U(log2Up(b.memDomain.bankNum).W)) + io.mem_decode_cmd_o.bits.special := Mux( + io.mem_decode_cmd_o.valid, + ls_decode_list(LSDecodeFields.SPECIAL.id).asUInt, + 0.U(64.W) + ) +} diff --git a/arch/src/main/scala/framework/memdomain/rs/README.md b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/README.md similarity index 100% rename from arch/src/main/scala/framework/memdomain/rs/README.md rename to arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/README.md diff --git a/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/reservationStation.scala b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/reservationStation.scala new file mode 100644 index 00000000..81343d23 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/reservationStation.scala @@ -0,0 +1,133 @@ +package framework.memdomain.frontend.cmd_channel.rs + +import chisel3._ +import chisel3.util._ +import chisel3.experimental._ +import framework.memdomain.frontend.cmd_channel.decoder.MemDecodeCmd +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.top.GlobalConfig + +// Mem domain issue interface - includes global rob_id +class MemRsIssue(val b: GlobalConfig) extends Bundle { + val cmd = new MemDecodeCmd(b) + // Global ROB ID + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) + val is_sub = Bool() + val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W) +} + +// Mem domain completion interface +class MemRsComplete(val b: GlobalConfig) extends Bundle { + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) + val is_sub = Bool() + val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W) +} + +// Mem domain issue interface combination (Load + Store + Config) +class MemIssueInterface(val b: GlobalConfig) extends Bundle { + val ld = Decoupled(new MemRsIssue(b)) + val st = Decoupled(new MemRsIssue(b)) + val cf = Decoupled(new MemRsIssue(b)) +} + +// Mem domain completion interface combination 
(Load + Store + Config) +class MemCommitInterface(val b: GlobalConfig) extends Bundle { + val ld = Flipped(Decoupled(new MemRsComplete(b))) + val st = Flipped(Decoupled(new MemRsComplete(b))) + val cf = Flipped(Decoupled(new MemRsComplete(b))) +} + +// Local Mem reservation station - simple FIFO scheduler +@instantiable +class MemReservationStation(val b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + + // Decoded instruction input (with global rob_id) + val mem_decode_cmd_i = Flipped(new DecoupledIO(new Bundle { + val cmd = new MemDecodeCmd(b) + // Global ROB ID + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) + val is_sub = Bool() + val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W) + })) + + // Rs -> MemLoader/MemStorer/MemConfiger + val issue_o = new MemIssueInterface(b) + val commit_i = new MemCommitInterface(b) + + // Output completion signal (with global rob_id, single channel) + val complete_o = Decoupled(new MemRsComplete(b)) + }) + + // Simple FIFO queue, only for buffering + val fifo = Module(new Queue( + new Bundle { + val cmd = new MemDecodeCmd(b) + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) + val is_sub = Bool() + val sub_rob_id = UInt(log2Up(b.frontend.sub_rob_depth * 4).W) + }, + entries = 4 + )) // Small buffer is sufficient + +// ----------------------------------------------------------------------------- +// Inbound - FIFO enqueue +// ----------------------------------------------------------------------------- + fifo.io.enq <> io.mem_decode_cmd_i + +// ----------------------------------------------------------------------------- +// Outbound - instruction issue (dispatch based on is_load/is_store/is_config) +// ----------------------------------------------------------------------------- + val headEntry = fifo.io.deq.bits + + // Load issue + io.issue_o.ld.valid := fifo.io.deq.valid && headEntry.cmd.is_load + io.issue_o.ld.bits.cmd := headEntry.cmd + io.issue_o.ld.bits.rob_id := headEntry.rob_id + io.issue_o.ld.bits.is_sub := 
headEntry.is_sub + io.issue_o.ld.bits.sub_rob_id := headEntry.sub_rob_id + + // Store issue + io.issue_o.st.valid := fifo.io.deq.valid && headEntry.cmd.is_store + io.issue_o.st.bits.cmd := headEntry.cmd + io.issue_o.st.bits.rob_id := headEntry.rob_id + io.issue_o.st.bits.is_sub := headEntry.is_sub + io.issue_o.st.bits.sub_rob_id := headEntry.sub_rob_id + + // Config issue + io.issue_o.cf.valid := fifo.io.deq.valid && headEntry.cmd.is_config + io.issue_o.cf.bits.cmd := headEntry.cmd + io.issue_o.cf.bits.rob_id := headEntry.rob_id + io.issue_o.cf.bits.is_sub := headEntry.is_sub + io.issue_o.cf.bits.sub_rob_id := headEntry.sub_rob_id + + // FIFO deq.ready - can only dequeue when target unit is ready + fifo.io.deq.ready := + (headEntry.cmd.is_load && io.issue_o.ld.ready) || + (headEntry.cmd.is_store && io.issue_o.st.ready) || + (headEntry.cmd.is_config && io.issue_o.cf.ready) + +// ----------------------------------------------------------------------------- +// Completion signal processing - directly forward to global RS +// ----------------------------------------------------------------------------- + val completeArb = Module(new Arbiter(new MemRsComplete(b), 3)) + + completeArb.io.in(0).valid := io.commit_i.ld.valid + completeArb.io.in(0).bits := io.commit_i.ld.bits + io.commit_i.ld.ready := completeArb.io.in(0).ready + + completeArb.io.in(1).valid := io.commit_i.st.valid + completeArb.io.in(1).bits := io.commit_i.st.bits + io.commit_i.st.ready := completeArb.io.in(1).ready + + completeArb.io.in(2).valid := io.commit_i.cf.valid + completeArb.io.in(2).bits := io.commit_i.cf.bits + io.commit_i.cf.ready := completeArb.io.in(2).ready + + // Forward completion signal (with global rob_id) + io.complete_o.valid := completeArb.io.out.valid + io.complete_o.bits := completeArb.io.out.bits + completeArb.io.out.ready := io.complete_o.ready +} diff --git a/arch/src/main/scala/framework/memdomain/rs/ringFifo.scala 
b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/ringFifo.scala similarity index 82% rename from arch/src/main/scala/framework/memdomain/rs/ringFifo.scala rename to arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/ringFifo.scala index 1d4cc1fb..15d37e94 100644 --- a/arch/src/main/scala/framework/memdomain/rs/ringFifo.scala +++ b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/ringFifo.scala @@ -1,4 +1,4 @@ -package framework.memdomain.rs +package framework.memdomain.frontend.cmd_channel.rs import chisel3._ import chisel3.util._ @@ -10,7 +10,7 @@ import chisel3.util._ class RingFifo[T <: Data](gen: T, n: Int) extends Module { require(n > 0, "FIFO size must be greater than 0") - val io = IO(new Bundle{ + val io = IO(new Bundle { // Flipped reverses the interface val enq = Flipped(new DecoupledIO(gen)) val deq = new DecoupledIO(gen) @@ -36,9 +36,16 @@ class RingFifo[T <: Data](gen: T, n: Int) extends Module { // Determine if it will be full next // Enqueue, and no dequeue, and stack will be full next - val isFullNext = Mux(doEnq && !doDeq && (enqPtrInc === deqPtr), - true.B , Mux(doDeq && isFull, // Dequeue, and full - false.B, isFull)) + val isFullNext = Mux( + doEnq && !doDeq && (enqPtrInc === deqPtr), + true.B, + Mux( + doDeq && isFull, // Dequeue, and full + false.B, + isFull + ) + ) + // Enqueue, change tail, add one element backward enqPtr := Mux(doEnq, enqPtrInc, enqPtr) // Dequeue, change head, head moves backward by one @@ -46,7 +53,7 @@ class RingFifo[T <: Data](gen: T, n: Int) extends Module { isFull := isFullNext val ram = Mem(n, gen) - when (doEnq){ + when(doEnq) { ram(enqPtr) := io.enq.bits } io.enq.ready := !isFull diff --git a/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/rob.scala b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/rob.scala new file mode 100644 index 00000000..e488697e --- /dev/null +++ 
b/arch/src/main/scala/framework/memdomain/frontend/cmd_channel/rs/rob.scala @@ -0,0 +1,88 @@ +package framework.memdomain.frontend.cmd_channel.rs + +import chisel3._ +import chisel3.util._ +import chisel3.experimental._ +import framework.memdomain.frontend.cmd_channel.decoder.MemDecodeCmd +import framework.top.GlobalConfig + +// ROB entry data structure - preserves ROB ID to support out-of-order completion +class RobEntry(b: GlobalConfig) extends Bundle { + val cmd = new MemDecodeCmd(b) + val rob_id = UInt(log2Up(b.frontend.rob_entries).W) +} + +class ROB(val b: GlobalConfig) extends Module { + + val io = IO(new Bundle { + // Allocation interface + val alloc = Flipped(new DecoupledIO(new MemDecodeCmd(b))) + + // Issue interface - issue uncompleted head instruction + val issue = new DecoupledIO(new RobEntry(b)) + + // Completion interface - report instruction completion + val complete = Flipped(new DecoupledIO(UInt(log2Up(b.frontend.rob_entries).W))) + + // Commit interface - commit completed head instruction + // val commit = new DecoupledIO(new RobEntry) + + // Status signals + val empty = Output(Bool()) + val full = Output(Bool()) + }) + + // Only use FIFO + completion status table, only enqueue/dequeue, sequential execution and sequential completion + val robFifo = Module(new Queue(new RobEntry(b), b.frontend.rob_entries)) + val robIdCounter = RegInit(0.U(log2Up(b.frontend.rob_entries).W)) + // Initialize to false to avoid X states in FPGA + val robTable = RegInit(VecInit(Seq.fill(b.frontend.rob_entries)(false.B))) + + // Initialize completion status table + for (i <- 0 until b.frontend.rob_entries) { + when(reset.asBool) { + robTable(i) := true.B + } + } + +// ----------------------------------------------------------------------------- +// Inbound - instruction allocation +// ----------------------------------------------------------------------------- + robFifo.io.enq.valid := io.alloc.valid + robFifo.io.enq.bits.cmd := io.alloc.bits + 
robFifo.io.enq.bits.rob_id := robIdCounter + + io.alloc.ready := robFifo.io.enq.ready + + when(io.alloc.fire) { + robIdCounter := robIdCounter + 1.U + robTable(robIdCounter) := false.B + } + +// ----------------------------------------------------------------------------- +// Completion signal processing using robTable tracking +// ----------------------------------------------------------------------------- + io.complete.ready := true.B + when(io.complete.fire) { + robTable(io.complete.bits) := true.B + } + +// ----------------------------------------------------------------------------- +// Outbound - head instruction issue +// ----------------------------------------------------------------------------- + val headEntry = robFifo.io.deq.bits + val headCompleted = robTable(headEntry.rob_id) + io.issue.valid := robFifo.io.deq.valid && !headCompleted + io.issue.bits := headEntry + + robFifo.io.deq.ready := io.issue.ready && !headCompleted + +// ----------------------------------------------------------------------------- +// Status signals +// ----------------------------------------------------------------------------- + val isEmpty = robTable.reduce(_ && _) + val isFull = !robFifo.io.enq.ready + + io.empty := isEmpty + io.full := isFull +} diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemConfiger.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemConfiger.scala new file mode 100644 index 00000000..819ffec1 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemConfiger.scala @@ -0,0 +1,103 @@ +package framework.memdomain.frontend.outside_channel + +import chisel3._ +import chisel3.util._ +import framework.top.GlobalConfig +import framework.memdomain.frontend.cmd_channel.rs.{MemRsComplete, MemRsIssue} +import chisel3.experimental.hierarchy.{instantiable, public} + +class MemConfigerIO(val b: GlobalConfig) extends Bundle { + val vbank_id = Output(UInt(8.W)) + val is_shared = 
Output(Bool()) + val is_multi = Output(Bool()) + val alloc = Output(Bool()) + val group_id = Output(UInt(3.W)) + val hart_id = Output(UInt(b.core.xLen.W)) +} + +@instantiable +class MemConfiger(val b: GlobalConfig) extends Module { + + val rob_id_width = log2Up(b.frontend.rob_entries) + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new MemRsIssue(b))) + val cmdResp = Decoupled(new MemRsComplete(b)) + + val config = Decoupled(new MemConfigerIO(b)) + val hartid = Input(UInt(b.core.xLen.W)) + }) + + val idle :: config :: Nil = Enum(2) + val state = RegInit(idle) + val alloc_reg = RegInit(false.B) + val is_shared_reg = RegInit(false.B) + val row_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val col_reg = RegInit(0.U(log2Up(b.memDomain.bankEntries).W)) + val vbank_id_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val rob_id_reg = RegInit(0.U(rob_id_width.W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + val counter = RegInit(0.U(4.W)) + + io.config.bits.is_multi := false.B + io.config.bits.is_shared := false.B + io.config.bits.alloc := false.B + io.config.bits.vbank_id := 0.U(8.W) + io.config.bits.group_id := 0.U(3.W) + io.config.bits.hart_id := io.hartid + io.config.valid := false.B + io.cmdResp.valid := false.B + io.cmdResp.bits.rob_id := 0.U(rob_id_width.W) + io.cmdResp.bits.is_sub := false.B + io.cmdResp.bits.sub_rob_id := 0.U + + when(state === idle) { + when(io.cmdReq.valid) { + when(io.cmdReq.bits.cmd.special(9, 5) > 1.U) { //is multi bank (col > 1) + state := config + col_reg := io.cmdReq.bits.cmd.special(9, 5) + alloc_reg := io.cmdReq.bits.cmd.special(10) + is_shared_reg := io.cmdReq.bits.cmd.is_shared + vbank_id_reg := io.cmdReq.bits.cmd.bank_id + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + + }.otherwise { //not multi bank + io.config.bits.alloc := io.cmdReq.bits.cmd.special(10) + 
io.config.bits.is_shared := io.cmdReq.bits.cmd.is_shared + io.config.bits.vbank_id := io.cmdReq.bits.cmd.bank_id + io.config.valid := true.B + + io.cmdResp.valid := true.B + io.cmdResp.bits.rob_id := io.cmdReq.bits.rob_id + io.cmdResp.bits.is_sub := io.cmdReq.bits.is_sub + io.cmdResp.bits.sub_rob_id := io.cmdReq.bits.sub_rob_id + } + } + + }.otherwise { + io.config.bits.is_multi := true.B + io.config.bits.is_shared := is_shared_reg + io.config.bits.alloc := alloc_reg + io.config.bits.vbank_id := vbank_id_reg + io.config.bits.group_id := counter + io.config.valid := (counter < col_reg) + + when(io.config.fire && counter < col_reg) { + counter := counter + 1.U + } + + when(counter >= col_reg) { + state := idle + counter := 0.U + io.cmdResp.valid := true.B + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + } + } + io.cmdReq.ready := state === idle +} diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemLoader.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemLoader.scala new file mode 100644 index 00000000..372dec51 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemLoader.scala @@ -0,0 +1,167 @@ +package framework.memdomain.frontend.outside_channel + +import chisel3._ +import chisel3.util._ +import framework.memdomain.frontend.cmd_channel.rs.{MemRsComplete, MemRsIssue} +import framework.memdomain.backend.banks.SramWriteIO +import framework.memdomain.frontend.outside_channel.dma.{BBReadRequest, BBReadResponse} +import freechips.rocketchip.rocket.MStatus +import framework.balldomain.blink.BankWrite +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.top.GlobalConfig + +@instantiable +class MemLoader(val b: GlobalConfig) extends Module { + val rob_id_width = log2Up(b.frontend.rob_entries) + + @public + val io = IO(new Bundle { + val cmdReq = Flipped(Decoupled(new MemRsIssue(b))) 
+ val cmdResp = Decoupled(new MemRsComplete(b)) + + val dmaReq = Decoupled(new BBReadRequest()) + val dmaResp = Flipped(Decoupled(new BBReadResponse(b.memDomain.bankWidth))) + + val bankWrite = Flipped(new BankWrite(b)) + + // Query interface to get group count + val query_vbank_id = Output(UInt(8.W)) + val query_is_shared = Output(Bool()) + val query_group_count = Input(UInt(4.W)) + + // Propagate decoded shared/private access intent. + val is_shared = Output(Bool()) + }) + + val s_idle :: s_dma_req :: s_dma_wait :: s_wait_last_write :: s_done :: Nil = Enum(5) + val state = RegInit(s_idle) + + val rob_id_reg = RegInit(0.U(rob_id_width.W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + val mem_addr_reg = Reg(UInt(b.memDomain.memAddrLen.W)) + val iter_reg = Reg(UInt(b.frontend.iter_len.W)) + val resp_count = RegInit(0.U(log2Up(16).W)) + val wr_bank_reg = Reg(UInt(log2Up(b.memDomain.bankNum).W)) + val stride_reg = Reg(UInt(11.W)) + val is_shared_reg = RegInit(false.B) + + // Group counter for multi-bank writes + val group_counter = RegInit(0.U(4.W)) + val group_count_reg = RegInit(0.U(4.W)) + + // ----------------------------- + // pending latch for 1-beat DMA -> bankWrite + // ----------------------------- + val pending = RegInit(false.B) + val latRow = Reg(UInt(log2Up(b.memDomain.bankEntries).W)) + val latData = Reg(UInt(b.memDomain.bankWidth.W)) + val latLast = RegInit(false.B) + + // ----------------------------- + // defaults + // ----------------------------- + io.cmdReq.ready := (state === s_idle) + + io.dmaReq.valid := (state === s_dma_req) + io.dmaReq.bits.vaddr := mem_addr_reg + io.dmaReq.bits.len := iter_reg * (b.memDomain.bankWidth / 8).U + io.dmaReq.bits.status := 0.U.asTypeOf(new MStatus) + io.dmaReq.bits.stride := stride_reg + + // only accept DMA beat when waiting AND no pending beat buffered + io.dmaResp.ready := (state === s_dma_wait) && !pending + + // bank write request driven from 
pending + io.bankWrite.io.req.valid := pending + io.bankWrite.io.req.bits.addr := latRow / group_count_reg + io.bankWrite.io.req.bits.data := latData + io.bankWrite.io.req.bits.mask := VecInit(Seq.fill(b.memDomain.bankMaskLen)(true.B)) + io.bankWrite.io.req.bits.wmode := false.B + + // IMPORTANT: always ready for write response (avoid deadlock) + io.bankWrite.io.resp.ready := true.B + + io.bankWrite.rob_id := rob_id_reg + io.bankWrite.bank_id := wr_bank_reg + io.bankWrite.ball_id := 0.U + io.bankWrite.group_id := group_counter + io.is_shared := is_shared_reg + + // cmdResp (Decoupled): hold valid until accepted + io.cmdResp.valid := (state === s_done) + io.cmdResp.bits := 0.U.asTypeOf(new MemRsComplete(b)) + io.cmdResp.bits.rob_id := rob_id_reg + io.cmdResp.bits.is_sub := is_sub_reg + io.cmdResp.bits.sub_rob_id := sub_rob_id_reg + + // ----------------------------- + // Receive load instruction + // ----------------------------- + when(io.cmdReq.fire && io.cmdReq.bits.cmd.is_load) { + state := s_dma_req + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + mem_addr_reg := io.cmdReq.bits.cmd.mem_addr + wr_bank_reg := io.cmdReq.bits.cmd.bank_id + // stride from rs2[57:39] + stride_reg := io.cmdReq.bits.cmd.special(57, 39) + resp_count := 0.U + pending := false.B + latLast := false.B + group_counter := 0.U + group_count_reg := io.query_group_count + is_shared_reg := io.cmdReq.bits.cmd.is_shared + + // Query group count and multiply iter + iter_reg := io.cmdReq.bits.cmd.iter * io.query_group_count + } + + // Drive query interface + // When idle and cmdReq is valid, query the incoming bank_id + // Otherwise use the registered bank_id + io.query_vbank_id := Mux(state === s_idle && io.cmdReq.valid, io.cmdReq.bits.cmd.bank_id, wr_bank_reg) + io.query_is_shared := Mux(state === s_idle && io.cmdReq.valid, io.cmdReq.bits.cmd.is_shared, is_shared_reg) + + // DMA req accepted + when(io.dmaReq.fire) { + state 
:= s_dma_wait + resp_count := 0.U + } + + // Latch DMA beat into pending buffer + when(io.dmaResp.fire) { + pending := true.B + latRow := io.dmaResp.bits.addrcounter + latData := io.dmaResp.bits.data + latLast := io.dmaResp.bits.last + } + + // When bankWrite request handshakes, consume pending beat + when(io.bankWrite.io.req.fire) { + pending := false.B + resp_count := resp_count + 1.U + + // Update group_counter + when(group_counter + 1.U < group_count_reg) { + group_counter := group_counter + 1.U + }.otherwise { + group_counter := 0.U + } + + when(latLast) { + // Last beat request sent, now wait for write response + state := s_wait_last_write + } + } + + // Wait for the last write response before completing + when(state === s_wait_last_write && io.bankWrite.io.resp.fire) { + state := s_done + } + + when(state === s_done && io.cmdResp.fire) { + state := s_idle + } +} diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemStorer.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemStorer.scala new file mode 100644 index 00000000..2c2516e0 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/MemStorer.scala @@ -0,0 +1,312 @@ +package framework.memdomain.frontend.outside_channel + +import chisel3._ +import chisel3.util._ +import framework.top.GlobalConfig +import framework.memdomain.frontend.cmd_channel.rs.{MemRsComplete, MemRsIssue} +import freechips.rocketchip.rocket.MStatus +import framework.memdomain.frontend.outside_channel.dma.{BBWriteRequest, BBWriteResponse} +import framework.balldomain.blink.BankRead +import chisel3.experimental.hierarchy.{instantiable, public} + +@instantiable +class MemStorer(val b: GlobalConfig) extends Module { + val rob_id_width = log2Up(b.frontend.rob_entries) + + // One bank line bytes + private val line_bytes = b.memDomain.bankWidth / 8 + // We pack/send 16B aligned beats to DMA + private val align_bytes = 16 + + @public + val io = IO(new Bundle { + val 
cmdReq = Flipped(Decoupled(new MemRsIssue(b))) + val cmdResp = Decoupled(new MemRsComplete(b)) + + val dmaReq = Decoupled(new BBWriteRequest(b.memDomain.bankWidth)) + val dmaResp = Flipped(Decoupled(new BBWriteResponse)) + + val bankRead = Flipped(new BankRead(b)) + + // Query interface to get group count + val query_vbank_id = Output(UInt(8.W)) + val query_is_shared = Output(Bool()) + val query_group_count = Input(UInt(4.W)) + + // Propagate decoded shared/private access intent. + val is_shared = Output(Bool()) + }) + + // ----------------------------- + // State + // ----------------------------- + val s_idle :: s_issue_sram_req :: s_wait_sram_resp :: s_have_sram_beat :: s_push_dma :: s_done :: Nil = Enum(6) + val state = RegInit(s_idle) + + val rob_id_reg = RegInit(0.U(rob_id_width.W)) + val is_sub_reg = RegInit(false.B) + val sub_rob_id_reg = RegInit(0.U(log2Up(b.frontend.sub_rob_depth * 4).W)) + val mem_addr_reg = RegInit(0.U(b.memDomain.memAddrLen.W)) + val iter_reg = RegInit(0.U(b.frontend.iter_len.W)) + val stride_reg = RegInit(0.U(10.W)) + val rd_bank_reg = RegInit(0.U(log2Up(b.memDomain.bankNum).W)) + val group_count_reg = RegInit(1.U(4.W)) // Store group count for current operation + val is_shared_reg = RegInit(false.B) + + // Address and group counters + val addr_counter = RegInit(0.U(b.frontend.iter_len.W)) // Row address counter + val group_counter = RegInit(0.U(4.W)) // Group counter within a row + + // ----------------------------- + // Pending buffer for SRAM resp + // ----------------------------- + val pending = RegInit(false.B) + val pendData = Reg(UInt(b.memDomain.bankWidth.W)) + val pendIsLast = RegInit(false.B) + + // ----------------------------- + // Optional: simple 16B align/merge support (keep your original intent) + // We'll keep a small byte buffer for unaligned head/tail. 
+ // ----------------------------- + val data_buffer = RegInit(0.U((align_bytes * 8).W)) // 16B + val buffer_valid_bytes = RegInit(0.U(log2Ceil(align_bytes + 1).W)) + val buffer_start_addr = RegInit(0.U(b.memDomain.memAddrLen.W)) + + // Convenience + val target_bank = rd_bank_reg + + // ----------------------------- + // Cmd accept + // ----------------------------- + io.cmdReq.ready := (state === s_idle) + + when(io.cmdReq.fire && io.cmdReq.bits.cmd.is_store) { + rob_id_reg := io.cmdReq.bits.rob_id + is_sub_reg := io.cmdReq.bits.is_sub + sub_rob_id_reg := io.cmdReq.bits.sub_rob_id + mem_addr_reg := io.cmdReq.bits.cmd.mem_addr + rd_bank_reg := io.cmdReq.bits.cmd.bank_id + stride_reg := io.cmdReq.bits.cmd.special(57, 39) + + // Query and save group count + group_count_reg := io.query_group_count + iter_reg := io.cmdReq.bits.cmd.iter + is_shared_reg := io.cmdReq.bits.cmd.is_shared + + // Initialize counters + addr_counter := 0.U + group_counter := 0.U + + pending := false.B + data_buffer := 0.U + buffer_valid_bytes := 0.U + buffer_start_addr := 0.U + + state := s_issue_sram_req + } + + // Drive query interface + // When idle and cmdReq is valid, query the incoming bank_id + // Otherwise use the registered bank_id + io.query_vbank_id := Mux(state === s_idle && io.cmdReq.valid, io.cmdReq.bits.cmd.bank_id, rd_bank_reg) + io.query_is_shared := Mux(state === s_idle && io.cmdReq.valid, io.cmdReq.bits.cmd.is_shared, is_shared_reg) + + // ----------------------------- + // SRAM read request + // ----------------------------- + io.bankRead.rob_id := rob_id_reg + io.bankRead.bank_id := target_bank + io.bankRead.ball_id := 0.U + io.bankRead.group_id := group_counter + io.is_shared := is_shared_reg + + io.bankRead.io.req.valid := (state === s_issue_sram_req) + io.bankRead.io.req.bits.addr := addr_counter + + // SRAMBank read resp is a 1-cycle pulse, so we must ALWAYS be ready to take it, + // but only if we don't already hold a pending beat. 
+ io.bankRead.io.resp.ready := !pending + + when(state === s_issue_sram_req) { + // Once request handshakes, wait for resp + when(io.bankRead.io.req.fire) { + state := s_wait_sram_resp + } + } + + // ----------------------------- + // Latch SRAM resp into pending (never drop it) + // ----------------------------- + val bank_resp_fire = io.bankRead.io.resp.fire + when(bank_resp_fire) { + pending := true.B + pendData := io.bankRead.io.resp.bits.data + // Last beat: last row and last group + val is_last_row = addr_counter >= iter_reg - 1.U + val is_last_group = group_counter >= group_count_reg - 1.U + pendIsLast := is_last_row && is_last_group && (iter_reg =/= 0.U) + state := s_have_sram_beat + } + + // ----------------------------- + // Address calculation + // For multi-bank: each row has group_count groups, each group is line_bytes + // So: offset = addr * (group_count * line_bytes) + group * line_bytes + // ----------------------------- + val current_mem_addr = + mem_addr_reg + + addr_counter * group_count_reg * line_bytes.U + + group_counter * line_bytes.U + + val addr_offset = current_mem_addr(log2Ceil(align_bytes) - 1, 0) + + val aligned_addr = Cat( + current_mem_addr(b.memDomain.memAddrLen - 1, log2Ceil(align_bytes)), + 0.U(log2Ceil(align_bytes).W) + ) + + // ----------------------------- + // Merge logic (kept compatible with your original behavior) + // incoming_data is always 16 bytes (bankWidth==128 in your waveforms) + // ----------------------------- + val incoming_data = pendData + val incoming_bytes = align_bytes.U + + val merged_data = Wire(UInt((align_bytes * 8).W)) + val total_valid_bytes = Wire(UInt(log2Ceil(align_bytes * 2).W)) + + when(buffer_valid_bytes === 0.U) { + when(addr_offset === 0.U) { + merged_data := incoming_data + total_valid_bytes := incoming_bytes + }.otherwise { + // first unaligned: send high part, pad low with 0 + val new_data_low = incoming_data & ((1.U << (addr_offset * 8.U)) - 1.U) + merged_data := new_data_low << 
(addr_offset * 8.U)
+      total_valid_bytes := align_bytes.U
+    }
+  }.otherwise {
+    val new_data_low = incoming_data & ((1.U << (addr_offset * 8.U)) - 1.U)
+    merged_data := (new_data_low << (addr_offset * 8.U)) | data_buffer
+    total_valid_bytes := align_bytes.U
+  }
+
+  val can_send_full_line = total_valid_bytes >= align_bytes.U
+
+  // send address (aligned)
+  val send_addr = Mux(
+    buffer_valid_bytes === 0.U,
+    aligned_addr,
+    Cat(buffer_start_addr(b.memDomain.memAddrLen - 1, log2Ceil(align_bytes)), 0.U(log2Ceil(align_bytes).W))
+  )
+
+  // send mask
+  val send_mask = Wire(UInt(align_bytes.W))
+  when(buffer_valid_bytes === 0.U && addr_offset =/= 0.U) {
+    val valid_bytes = align_bytes.U - addr_offset
+    send_mask := ((1.U << valid_bytes) - 1.U) << addr_offset
+  }.elsewhen(buffer_valid_bytes > 0.U && can_send_full_line) {
+    send_mask := ~0.U(align_bytes.W)
+  }.otherwise {
+    send_mask := ~0.U(align_bytes.W)
+  }
+
+  // -----------------------------
+  // DMA request (Decoupled correct): hold valid until fire
+  // -----------------------------
+  val dma_v = RegInit(false.B)
+  val dma_addr = RegInit(0.U(b.memDomain.memAddrLen.W))
+  val dma_data = RegInit(0.U((align_bytes * 8).W))
+  val dma_mask = RegInit(0.U(align_bytes.W))
+
+  io.dmaReq.valid := dma_v
+  io.dmaReq.bits.vaddr := dma_addr
+  io.dmaReq.bits.data := dma_data
+  io.dmaReq.bits.len := align_bytes.U
+  io.dmaReq.bits.mask := dma_mask
+  io.dmaReq.bits.status := 0.U.asTypeOf(new MStatus)
+
+  // By default we don't care dmaResp in this simple model
+  io.dmaResp.ready := true.B
+
+  // When we have a pending SRAM beat, prepare one DMA beat (and keep it until fire)
+  when(state === s_have_sram_beat) {
+    // Only arm dma_v if not already armed
+    when(!dma_v) {
+      dma_v := true.B
+      dma_addr := send_addr
+      dma_data := merged_data
+      dma_mask := send_mask
+      state := s_push_dma
+    }
+  }
+
+  // When DMA accepts the beat, consume pending and move forward
+  when(state === s_push_dma) {
+    when(io.dmaReq.fire) {
+      dma_v := false.B
+
+      // Update buffer state like your original:
+      when(addr_offset =/= 0.U) {
+        val remaining_bytes = align_bytes.U - addr_offset
+        data_buffer := incoming_data >> (addr_offset * 8.U)
+        buffer_valid_bytes := remaining_bytes
+        when(buffer_valid_bytes === 0.U) {
+          buffer_start_addr := aligned_addr + align_bytes.U
+        }.otherwise {
+          buffer_start_addr := buffer_start_addr + align_bytes.U
+        }
+      }.otherwise {
+        // aligned: clear buffer if it was used
+        when(buffer_valid_bytes > 0.U && can_send_full_line) {
+          buffer_valid_bytes := 0.U
+          data_buffer := 0.U
+        }
+      }
+
+      // Mark current beat consumed
+      pending := false.B
+
+      // Check if this was the last beat before advancing counters
+      val is_last_row = addr_counter >= iter_reg - 1.U
+      val is_last_group = group_counter >= group_count_reg - 1.U
+      val all_done = is_last_row && is_last_group && (iter_reg =/= 0.U)
+
+      // Advance counters
+      when(iter_reg =/= 0.U) {
+        when(group_counter + 1.U < group_count_reg) {
+          // Move to next group in same row
+          group_counter := group_counter + 1.U
+        }.otherwise {
+          // Move to next row, reset group counter
+          group_counter := 0.U
+          addr_counter := addr_counter + 1.U
+        }
+      }
+
+      // Decide next state based on completion check done BEFORE counter update
+      when(pendIsLast || iter_reg === 0.U || all_done) {
+        state := s_done
+      }.otherwise {
+        state := s_issue_sram_req
+      }
+    }
+  }
+
+  // If we are waiting for SRAM resp (but resp will pulse), just stay here
+  when(state === s_wait_sram_resp) {
+    // nothing; latch happens in bank_resp_fire block above
+  }
+
+  // -----------------------------
+  // Completion
+  // -----------------------------
+  io.cmdResp.valid := (state === s_done)
+  io.cmdResp.bits.rob_id := rob_id_reg
+  io.cmdResp.bits.is_sub := is_sub_reg
+  io.cmdResp.bits.sub_rob_id := sub_rob_id_reg
+
+  when(io.cmdResp.fire) {
+    state := s_idle
+  }
+}
diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/README.md b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/README.md
new file mode 100644
index 00000000..dbdb263f
--- /dev/null
+++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/README.md
@@ -0,0 +1,138 @@
+# DMA Engine Implementation
+
+## Overview
+
+DMA engine implementation for Buckyball's memory domain, located at `arch/src/main/scala/framework/builtin/memdomain/dma`. Provides high-performance memory data transfer services between main memory and on-chip storage.
+
+Main components:
+- **StreamReader**: Streaming data reader for bulk reads from external memory
+- **StreamWriter**: Streaming data writer for bulk writes to external memory
+- **LocalAddr**: Local address management for Scratchpad and Accumulator mapping
+
+## File Structure
+
+```
+dma/
+├── DMA.scala - Streaming DMA read/write implementation
+└── LocalAddr.scala - Local address management
+```
+
+## DMA.scala
+
+### Request/Response Interfaces
+
+```scala
+class BBReadRequest()(implicit p: Parameters) extends CoreBundle {
+  val vaddr = UInt(coreMaxAddrBits.W) // Virtual address
+  val len = UInt(16.W) // Read length (bytes)
+  val status = new MStatus // Processor status
+}
+
+class BBWriteRequest(dataWidth: Int)(implicit p: Parameters) extends CoreBundle {
+  val vaddr = UInt(coreMaxAddrBits.W) // Virtual address
+  val data = UInt(dataWidth.W) // Write data
+  val len = UInt(16.W) // Write length (bytes)
+  val mask = UInt((dataWidth / 8).W) // Byte mask
+  val status = new MStatus // Processor status
+}
+```
+
+### StreamReader
+
+**State Machine**:
+```scala
+val s_idle :: s_req_new_block :: Nil = Enum(2)
+val state = RegInit(s_idle)
+```
+
+**Byte Counting**:
+```scala
+val bytesRequested = Reg(UInt(16.W)) // Bytes requested
+val bytesReceived = Reg(UInt(16.W)) // Bytes received
+val bytesLeft = req.len - bytesRequested
+```
+
+**TileLink Request**:
+```scala
+val get = edge.Get(
+  fromSource = xactId,
+  toAddress = 0.U,
+  lgSize = log2Ceil(beatBytes).U
+)._2
+```
+
+**TLB Integration**:
+```scala
+io.tlb.req.bits.tlb_req.vaddr := tlb_q.io.deq.bits.vaddr
+io.tlb.req.bits.tlb_req.cmd := M_XRD // Read operation
+io.tlb.req.bits.status := tlb_q.io.deq.bits.status
+```
+
+### StreamWriter
+
+**Put Operation Selection**:
+```scala
+val use_put_full = req.mask === ~0.U(beatBytes.W)
+val selected_put = Mux(use_put_full, putFull, putPartial)
+```
+
+**Response Handling**:
+```scala
+io.resp.valid := tl.d.valid && edge.last(tl.d)
+io.resp.bits.done := true.B
+```
+
+## LocalAddr.scala
+
+### Address Structure
+
+```scala
+class LocalAddr(sp_banks: Int, sp_bank_entries: Int, acc_banks: Int, acc_bank_entries: Int) extends Bundle {
+  val is_acc_addr = Bool() // Is accumulator address
+  val accumulate = Bool() // Perform accumulation
+  val read_full_acc_row = Bool() // Read full accumulator row
+  val data = UInt(memAddrBits.W) // Actual address data
+}
+```
+
+### Address Decomposition
+
+```scala
+// Scratchpad address decomposition
+def sp_bank(dummy: Int = 0) = if (spAddrBits == spBankRowBits) 0.U
+  else data(spAddrBits - 1, spBankRowBits)
+def sp_row(dummy: Int = 0) = data(spBankRowBits - 1, 0)
+
+// Accumulator address decomposition
+def acc_bank(dummy: Int = 0) = if (accAddrBits == accBankRowBits) 0.U
+  else data(accAddrBits - 1, accBankRowBits)
+def acc_row(dummy: Int = 0) = data(accBankRowBits - 1, 0)
+```
+
+### Address Operations
+
+```scala
+// Address addition
+def +(other: UInt) = {
+  val result = WireInit(this)
+  result.data := data + other
+  result
+}
+
+// Addition with overflow check
+def add_with_overflow(other: UInt): Tuple2[LocalAddr, Bool] = {
+  val sum = data +& other
+  val overflow = Mux(is_acc_addr, sum(accAddrBits), sum(spAddrBits))
+  (result, overflow)
+}
+```
+
+## Important Notes
+
+1. **Alignment Requirements**: DMA operations consider TileLink protocol alignment requirements
+2. **Transaction ID Management**: Both engines implement transaction ID allocation and recycling for concurrent requests
+3. **TLB Integration**: Full virtual address translation support for user and kernel mode
+4. **Pipeline Design**: Multiple pipeline stages including address translation, TileLink request, and response handling
+5. **Error Handling**: TLB miss handling implemented, relies on upper layer software for access failures
+6. **Performance**: StreamWriter supports full and partial write modes, automatically selects optimal TileLink operation based on mask
+7. **Configuration**: DMA engines support parametrized configuration of concurrent transactions, data width, max transfer bytes
diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/StreamReader.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/StreamReader.scala
new file mode 100644
index 00000000..d4df131e
--- /dev/null
+++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/StreamReader.scala
@@ -0,0 +1,129 @@
+package framework.memdomain.frontend.outside_channel.dma
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import freechips.rocketchip.tilelink._
+import freechips.rocketchip.rocket.{MStatus, M_XRD}
+
+import framework.memdomain.frontend.outside_channel.tlb.BBTLBIO
+import framework.top.GlobalConfig
+
+class BBReadRequest extends Bundle {
+  val vaddr = UInt(64.W)
+  val len = UInt(16.W)
+  val status = new MStatus
+  val stride = UInt(10.W) // currently unused
+}
+
+class BBReadResponse(dataWidth: Int) extends Bundle {
+  val data = UInt(dataWidth.W)
+  val last = Bool()
+  val addrcounter = UInt(10.W)
+}
+
+@instantiable
+class StreamReader(val b: GlobalConfig)(edge: TLEdgeOut) extends Module {
+
+  val beatBits = b.memDomain.dma_buswidth
+  val beatBytes = beatBits / 8
+
+  @public
+  val io = IO(new Bundle {
+    val req = Flipped(Decoupled(new BBReadRequest()))
+    val resp = Decoupled(new BBReadResponse(beatBits))
+    val tlb = Flipped(new BBTLBIO(b))
+    val busy = Output(Bool())
+    val flush = Input(Bool())
+    val tl = new TLBundle(edge.bundle)
+  })
+
+  //------------------------------------------------------------
+  // FSM
+  //------------------------------------------------------------
+
+  val s_idle :: s_run :: Nil = Enum(2)
+  val state = RegInit(s_idle)
+
+  val reqReg = Reg(new BBReadRequest())
+
+  val bytesRequested = RegInit(0.U(16.W))
+  val bytesReceived = RegInit(0.U(16.W))
+
+  val inflight = RegInit(false.B)
+  val read_vaddr = reqReg.vaddr + bytesRequested
+
+  val get = edge.Get(
+    fromSource = 0.U,
+    toAddress = 0.U,
+    lgSize = log2Ceil(beatBytes).U
+  )._2
+
+  io.tlb.req.valid :=
+    (state === s_run) &&
+    (bytesRequested < reqReg.len) &&
+    !inflight
+
+  io.tlb.req.bits := DontCare
+  io.tlb.req.bits.vaddr := read_vaddr
+  io.tlb.req.bits.passthrough := false.B
+  io.tlb.req.bits.size := 0.U
+  io.tlb.req.bits.cmd := M_XRD
+  io.tlb.req.bits.prv := 3.U
+  io.tlb.req.bits.v := false.B
+  io.tlb.req.bits.status := reqReg.status
+
+  io.tl.a.valid :=
+    io.tlb.resp.valid && !io.tlb.resp.bits.miss &&
+    !inflight && state =/= s_idle
+
+  io.tl.a.bits := get
+  io.tl.a.bits.address := io.tlb.resp.bits.paddr
+
+  io.tlb.resp.ready := io.tl.a.ready && !inflight
+
+  when(io.tl.a.fire) {
+    inflight := true.B
+    bytesRequested := bytesRequested + beatBytes.U
+  }
+
+  //------------------------------------------------------------
+  // TL D → Response
+  //------------------------------------------------------------
+
+  io.tl.d.ready := io.resp.ready
+
+  io.resp.valid := io.tl.d.valid
+  io.resp.bits.data := io.tl.d.bits.data
+
+  val beatCountResp = bytesReceived >> log2Ceil(beatBytes)
+  io.resp.bits.addrcounter := beatCountResp(9, 0)
+
+  io.resp.bits.last :=
+    (bytesReceived + beatBytes.U >= reqReg.len)
+
+  when(io.tl.d.fire) {
+    inflight := false.B
+    bytesReceived := bytesReceived + beatBytes.U
+  }
+
+  io.tl.b.ready := true.B
+  io.tl.c.valid := false.B
+  io.tl.e.valid := false.B
+
+  io.req.ready := (state === s_idle)
+
+  io.busy := (state =/= s_idle) || inflight
+
+  when(io.req.fire) {
+    reqReg := io.req.bits
+    bytesRequested := 0.U
+    bytesReceived := 0.U
+    inflight := false.B
+    state := s_run
+  }
+
+  when(state === s_run && bytesReceived >= reqReg.len) {
+    state := s_idle
+  }
+}
diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/StreamWriter.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/StreamWriter.scala
new file mode 100644
index 00000000..43580148
--- /dev/null
+++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/dma/StreamWriter.scala
@@ -0,0 +1,141 @@
+package framework.memdomain.frontend.outside_channel.dma
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import freechips.rocketchip.tilelink._
+import freechips.rocketchip.rocket.{MStatus, M_XWR}
+import framework.memdomain.frontend.outside_channel.tlb.BBTLBIO
+import framework.top.GlobalConfig
+
+class BBWriteRequest(dataWidth: Int) extends Bundle {
+  val vaddr = UInt(64.W)
+  val data = UInt(dataWidth.W)
+  val len = UInt(16.W)
+  val mask = UInt((dataWidth / 8).W)
+  val status = new MStatus
+}
+
+class BBWriteResponse extends Bundle {
+  val done = Bool()
+}
+
+@instantiable
+class StreamWriter(val b: GlobalConfig)(edge: TLEdgeOut) extends Module {
+
+  val vaddrBits = b.core.vaddrBits
+  val beatBits = b.memDomain.dma_buswidth
+  val dataWidth = b.memDomain.dma_buswidth
+  val beatBytes = beatBits / 8
+  val lgBeat = log2Ceil(beatBytes)
+
+  @public
+  val io = IO(new Bundle {
+    val req = Flipped(Decoupled(new BBWriteRequest(dataWidth)))
+    val resp = Decoupled(new BBWriteResponse)
+    val tlb = Flipped(new BBTLBIO(b))
+    val busy = Output(Bool())
+    val flush = Input(Bool())
+    val tl = new TLBundle(edge.bundle)
+  })
+
+  // ---------------------------------------------------------------------------
+  // Strict single-outstanding writer with PROPER handshakes
+  // ---------------------------------------------------------------------------
+  // NOTE: current TLB/Cluster returns resp combinationally in the same cycle as req.valid
+  // (see StreamReader usage). So we must NOT "fire req then wait for resp".
+  val s_idle :: s_tlb_req :: s_wait_d :: s_resp :: Nil = Enum(4)
+  val state = RegInit(s_idle)
+
+  val reqReg = Reg(new BBWriteRequest(dataWidth))
+
+  // single outstanding => fixed source id 0
+  val xactId = 0.U(io.tl.a.bits.source.getWidth.W)
+
+  // -----------------------
+  // Accept one request
+  // -----------------------
+  io.req.ready := (state === s_idle)
+
+  when(io.req.fire) {
+    reqReg := io.req.bits
+    state := s_tlb_req
+  }
+
+  // -----------------------
+  // Construct TileLink Put from LATCHED request
+  // -----------------------
+  val use_put_full = reqReg.mask === ~0.U(beatBytes.W)
+
+  val putFull = edge.Put(
+    fromSource = xactId,
+    toAddress = 0.U, // overwritten later
+    lgSize = lgBeat.U,
+    data = reqReg.data
+  )._2
+
+  val putPartial = edge.Put(
+    fromSource = xactId,
+    toAddress = 0.U, // overwritten later
+    lgSize = lgBeat.U,
+    data = reqReg.data,
+    mask = reqReg.mask
+  )._2
+
+  val putMsg = Wire(putFull.cloneType)
+  putMsg := Mux(use_put_full, putFull, putPartial)
+
+  // -----------------------
+  // TLB handshake (req.fire -> wait resp.valid)
+  // -----------------------
+  io.tlb.req.valid := (state === s_tlb_req)
+  io.tlb.req.bits := DontCare
+  io.tlb.req.bits.vaddr := reqReg.vaddr
+  io.tlb.req.bits.passthrough := false.B
+  io.tlb.req.bits.size := 0.U
+  io.tlb.req.bits.cmd := M_XWR
+  io.tlb.req.bits.prv := 3.U
+  io.tlb.req.bits.v := false.B
+  io.tlb.req.bits.status := reqReg.status
+
+  // We only "consume" the tlb response when we actually send A.
+  // This matches StreamReader: resp.valid is treated as a combinational translate result.
+  io.tlb.resp.ready := (state === s_tlb_req) && io.tl.a.ready
+
+  // -----------------------
+  // TileLink A channel (only when tlb resp is present and NOT a miss)
+  // -----------------------
+  io.tl.a.valid := (state === s_tlb_req) && io.tlb.resp.valid && !io.tlb.resp.bits.miss
+  io.tl.a.bits := putMsg
+  io.tl.a.bits.address := io.tlb.resp.bits.paddr
+
+  when(state === s_tlb_req && io.tl.a.fire) {
+    state := s_wait_d
+  }
+
+  // -----------------------
+  // TileLink D channel (ack)
+  // -----------------------
+  io.tl.d.ready := (state === s_wait_d)
+
+  // upper response
+  io.resp.valid := (state === s_resp)
+  io.resp.bits.done := true.B
+
+  when(state === s_wait_d && io.tl.d.fire) {
+    state := s_resp
+  }
+
+  when(state === s_resp && io.resp.fire) {
+    state := s_idle
+  }
+
+  // -----------------------
+  // Tie off unused TL channels
+  // -----------------------
+  io.tl.b.ready := true.B
+  io.tl.c.valid := false.B
+  io.tl.e.valid := false.B
+
+  io.busy := (state =/= s_idle)
+}
diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/BBTLBIO.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/BBTLBIO.scala
new file mode 100644
index 00000000..ef7c3b59
--- /dev/null
+++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/BBTLBIO.scala
@@ -0,0 +1,173 @@
+package framework.memdomain.frontend.outside_channel.tlb
+
+import chisel3._
+import chisel3.util.{Decoupled, Valid}
+import chisel3.util.log2Ceil
+import framework.top.GlobalConfig
+import freechips.rocketchip.rocket.{HStatus, MStatus}
+import freechips.rocketchip.rocket.constants.MemoryOpConstants
+
+// TLB Exception types
+class TLBExceptions extends Bundle {
+  val ld = Bool()
+  val st = Bool()
+  val inst = Bool()
+}
+
+// TLB Request
+class BBTLBReq(val lgMaxSize: Int, val vaddrBits: Int, val xLen: Int) extends Bundle {
+  val vaddr = UInt(vaddrBits.W)
+  val passthrough = Bool()
+  val size = UInt(log2Ceil(lgMaxSize + 1).W)
+  val cmd = Bits(5.W) // M_SZ = 5
+  val prv = UInt(2.W) // PRV.SZ = 2
+  val v = Bool()
+  val status = new MStatus
+}
+
+// TLB Response
+class BBTLBResp(val lgMaxSize: Int, val paddrBits: Int, val vaddrBits: Int) extends Bundle {
+  val miss = Bool()
+  val paddr = UInt(paddrBits.W)
+  val gpa = UInt(vaddrBits.W)
+  val gpa_is_pte = Bool()
+  val pf = new TLBExceptions
+  val gf = new TLBExceptions
+  val ae = new TLBExceptions
+  val ma = new TLBExceptions
+  val cacheable = Bool()
+  val must_alloc = Bool()
+  val prefetchable = Bool()
+  val size = UInt(log2Ceil(lgMaxSize + 1).W)
+  val cmd = UInt(5.W) // M_SZ = 5
+}
+
+// TLB Exception IO
+class BBTLBExceptionIO extends Bundle {
+  val interrupt = Output(Bool())
+  val flush_retry = Input(Bool())
+  val flush_skip = Input(Bool())
+
+  def flush(dummy: Int = 0): Bool = flush_retry || flush_skip
+}
+
+// Page Table Base Register
+class BBTLBPTBR(val paddrBits: Int, val pgIdxBits: Int, val xLen: Int) extends Bundle {
+  val modeBits = if (xLen == 32) 1 else 4
+  val maxASIdBits = if (xLen == 32) 9 else 16
+  val mode = UInt(modeBits.W)
+  val asid = UInt(maxASIdBits.W)
+  val ppn = UInt((paddrBits - pgIdxBits).W)
+}
+
+// PTW Request
+class BBTLBPTWReq(val vaddrBits: Int, val pgIdxBits: Int) extends Bundle {
+  val vpnBits = vaddrBits - pgIdxBits
+  val addr = UInt(vpnBits.W)
+  val need_gpa = Bool()
+  val vstage1 = Bool()
+  val stage2 = Bool()
+}
+
+// PTE (Page Table Entry) - Simplified
+class BBTLBPTE(val paddrBits: Int, val pgIdxBits: Int) extends Bundle {
+  val ppnBits = paddrBits - pgIdxBits
+  val ppn = UInt(ppnBits.W)
+  val reserved_for_future = UInt(10.W)
+  val reserved_for_software = Bits(2.W)
+  val d = Bool() // dirty
+  val a = Bool() // access
+  val g = Bool() // global
+  val u = Bool() // user
+  val x = Bool() // executable
+  val w = Bool() // writable
+  val r = Bool() // readable
+  val v = Bool() // valid
+
+  def sr(): Bool = v && r
+  def sw(): Bool = v && w && d
+  def sx(): Bool = v && x
+}
+
+// PTW Response
+class BBTLBPTWResp(
+  val vaddrBits: Int,
+  val paddrBits: Int,
+  val pgIdxBits: Int,
+  val pgLevels: Int)
+    extends Bundle {
+  val ae_ptw = Bool()
+  val ae_final = Bool()
+  val pf = Bool()
+  val gf = Bool()
+  val hr = Bool()
+  val hw = Bool()
+  val hx = Bool()
+  val pte = new BBTLBPTE(paddrBits, pgIdxBits)
+  val level = UInt(log2Ceil(pgLevels).W)
+  val fragmented_superpage = Bool()
+  val homogeneous = Bool()
+  val gpa = Valid(UInt(vaddrBits.W))
+  val gpa_is_pte = Bool()
+}
+
+// Simplified CustomCSR IO wrapper - without Parameters dependency
+class BBCustomCSRIO(val xLen: Int) extends Bundle {
+  val ren = Output(Bool())
+  val wen = Output(Bool())
+  val wdata = Output(UInt(xLen.W))
+  val value = Output(UInt(xLen.W))
+  val stall = Input(Bool())
+  val set = Input(Bool())
+  val sdata = Input(UInt(xLen.W))
+}
+
+// Simplified CustomCSRs Bundle - matches rocket-chip interface without Parameters
+class BBCustomCSRs(val xLen: Int) extends Bundle {
+  // Empty by default - no custom CSRs defined
+  val csrs = Vec(0, new BBCustomCSRIO(xLen))
+}
+
+// Simplified PMP - without Parameters dependency
+class BBPMP(val paddrBits: Int) extends Bundle {
+
+  val cfg = new Bundle {
+    val l = Bool()
+    val res = UInt(2.W) // Reserved field
+    val a = UInt(2.W)
+    val x = Bool()
+    val w = Bool()
+    val r = Bool()
+  }
+
+  val addr = UInt((paddrBits - 2).W)
+  val mask = UInt(paddrBits.W)
+}
+
+// PTW IO
+class BBTLBPTWIO(val b: GlobalConfig) extends Bundle {
+  val vaddrBits = b.core.vaddrBits
+  val paddrBits = b.core.paddrBits
+  val pgIdxBits = b.core.pgIdxBits
+  val xLen = b.core.xLen
+  val pgLevels = if (xLen == 32) 2 else 4 // Simplified: assume SV39 for 64-bit
+  val nPMPs = b.core.nPMPs
+
+  val req = Decoupled(Valid(new BBTLBPTWReq(vaddrBits, pgIdxBits)))
+  val resp = Flipped(Valid(new BBTLBPTWResp(vaddrBits, paddrBits, pgIdxBits, pgLevels)))
+  val ptbr = Input(new BBTLBPTBR(paddrBits, pgIdxBits, xLen))
+  val hgatp = Input(new BBTLBPTBR(paddrBits, pgIdxBits, xLen))
+  val vsatp = Input(new BBTLBPTBR(paddrBits, pgIdxBits, xLen))
+  val status = Input(new MStatus)
+  val hstatus = Input(new HStatus)
+  val gstatus = Input(new MStatus)
+  val pmp = Input(Vec(nPMPs, new BBPMP(paddrBits)))
+  val customCSRs = Flipped(new BBCustomCSRs(xLen))
+}
+
+// TLB Client IO (used in TLBCluster)
+class BBTLBIO(val b: GlobalConfig) extends Bundle {
+  val lgMaxSize = log2Ceil(b.core.coreDataBytes)
+  val req = Flipped(Decoupled(new BBTLBReq(lgMaxSize, b.core.vaddrBits, b.core.xLen)))
+  val resp = Decoupled(new BBTLBResp(lgMaxSize, b.core.paddrBits, b.core.vaddrBits))
+}
diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLB.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLB.scala
new file mode 100644
index 00000000..894d79cb
--- /dev/null
+++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLB.scala
@@ -0,0 +1,169 @@
+package framework.memdomain.frontend.outside_channel.tlb
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public}
+import framework.top.GlobalConfig
+
+/** TLB implementation with fully-associative structure and LRU replacement */
+@instantiable
+class TLB(val b: GlobalConfig, val lgMaxSize: Int) extends Module {
+  val entries = b.memDomain.tlb_size
+  val vaddrBits = b.core.vaddrBits
+  val paddrBits = b.core.paddrBits
+  val pgIdxBits = b.core.pgIdxBits
+  val vpnBits = vaddrBits - pgIdxBits
+  val ppnBits = paddrBits - pgIdxBits
+  val pgLevels = if (b.core.xLen == 32) 2 else 4 // Match Rocket PTW convention
+
+  @public
+  val io = IO(new Bundle {
+    val req = Flipped(Decoupled(new BBTLBReq(lgMaxSize, vaddrBits, b.core.xLen)))
+    val resp = Decoupled(new BBTLBResp(lgMaxSize, paddrBits, vaddrBits))
+    val ptw = new BBTLBPTWIO(b)
+    val sfence = Flipped(Valid(Bool())) // Simplified flush signal
+    val kill = Input(Bool())
+  })
+
+  // TLB entries storage
+  val tlbEntries = Reg(Vec(entries, new TLBEntry(vaddrBits, pgIdxBits, paddrBits, pgLevels = pgLevels)))
+  val lru = Reg(Vec(entries, UInt(log2Ceil(entries).W))) // Simple LRU counter
+
+  // State machine
+  val s_ready :: s_request :: s_wait :: Nil = Enum(3)
+  val state = RegInit(s_ready)
+  val refill_vpn = Reg(UInt(vpnBits.W))
+  val refill_idx = Reg(UInt(log2Ceil(entries).W))
+
+  // Initialize LRU
+  when(reset.asBool) {
+    lru.foreach(_ := 0.U)
+    tlbEntries.foreach(_.invalidate())
+  }
+
+  val vpn = io.req.bits.vaddr(vaddrBits - 1, pgIdxBits)
+  val pgIdx = io.req.bits.vaddr(pgIdxBits - 1, 0)
+
+  // TLB lookup
+  val hits = tlbEntries.map(_.hit(vpn))
+  val hitVec = VecInit(hits)
+  val hitIdx = PriorityEncoder(hits)
+  val tlbHit = hits.reduce(_ || _)
+
+  // Update LRU on hit
+  when(io.req.fire && tlbHit) {
+    lru(hitIdx) := (entries - 1).U
+    for (i <- 0 until entries) {
+      when(i.U =/= hitIdx && lru(i.U) > lru(hitIdx)) {
+        lru(i.U) := lru(i.U) - 1.U
+      }
+    }
+  }
+
+  // Find LRU entry for replacement
+  val lruIdx = PriorityEncoder(lru.map(_ === 0.U))
+
+  // VM enable check: mode != 0 means virtual memory is on (Sv39/Sv48/etc.)
+  val vm_enabled = io.ptw.ptbr.mode =/= 0.U
+
+  val tlbMiss = vm_enabled && !io.req.bits.passthrough && !tlbHit
+
+  // State machine
+  io.req.ready := state === s_ready
+
+  when(io.req.fire && tlbMiss && state === s_ready) {
+    state := s_request
+    refill_vpn := vpn
+    refill_idx := lruIdx
+  }
+
+  when(state === s_request) {
+    when(io.kill) {
+      state := s_ready
+    }.elsewhen(io.ptw.req.ready) {
+      state := s_wait
+    }
+  }
+
+  when(state === s_wait && io.ptw.resp.valid) {
+    state := s_ready
+    // Refill TLB entry
+    val pte = io.ptw.resp.bits.pte
+    val entryData = Wire(new TLBEntryData(paddrBits, pgIdxBits))
+    entryData.ppn := pte.ppn(ppnBits - 1, 0)
+    entryData.u := pte.u
+    entryData.g := pte.g
+    entryData.sr := pte.sr()
+    entryData.sw := pte.sw()
+    entryData.sx := pte.sx()
+    entryData.cacheable := true.B // Simplified
+    entryData.pf := io.ptw.resp.bits.pf
+    entryData.ae_final := io.ptw.resp.bits.ae_final
+
+    tlbEntries(refill_idx).insert(refill_vpn, io.ptw.resp.bits.level, entryData)
+    // Update LRU
+    lru(refill_idx) := (entries - 1).U
+    for (i <- 0 until entries) {
+      when(i.U =/= refill_idx && lru(i.U) > lru(refill_idx)) {
+        lru(i.U) := lru(i.U) - 1.U
+      }
+    }
+  }
+
+  // PTW request
+  io.ptw.req.valid := state === s_request
+  io.ptw.req.bits.valid := !io.kill
+  io.ptw.req.bits.bits.addr := refill_vpn
+  io.ptw.req.bits.bits.vstage1 := false.B
+  io.ptw.req.bits.bits.stage2 := false.B
+  io.ptw.req.bits.bits.need_gpa := false.B
+
+  // TLB flush on sfence
+  when(io.sfence.valid) {
+    tlbEntries.foreach(_.invalidate())
+    state := s_ready
+  }
+
+  // Response generation — use superpage-aware ppn from the hit entry
+  val hitEntryData = Mux(
+    tlbHit,
+    tlbEntries(hitIdx).data,
+    WireDefault(new TLBEntryData(paddrBits, pgIdxBits), 0.U.asTypeOf(new TLBEntryData(paddrBits, pgIdxBits)))
+  )
+
+  // For superpage entries, TLBEntry.ppn() replaces lower PPN bits with VPN bits
+  val hitPPN = Mux(
+    tlbHit,
+    tlbEntries(hitIdx).ppn(vpn),
+    0.U(ppnBits.W)
+  )
+
+  val paddr = Mux(
+    io.req.bits.passthrough || !vm_enabled,
+    io.req.bits.vaddr,
+    Cat(hitPPN, pgIdx)
+  )
+
+  io.resp.valid := io.req.valid && state === s_ready
+  io.resp.bits.miss := tlbMiss || (state === s_wait)
+  io.resp.bits.paddr := paddr(paddrBits - 1, 0)
+  io.resp.bits.gpa := 0.U
+  io.resp.bits.gpa_is_pte := false.B
+  io.resp.bits.pf.ld := hitEntryData.pf
+  io.resp.bits.pf.st := hitEntryData.pf
+  io.resp.bits.pf.inst := hitEntryData.pf
+  io.resp.bits.gf.ld := false.B
+  io.resp.bits.gf.st := false.B
+  io.resp.bits.gf.inst := false.B
+  io.resp.bits.ae.ld := hitEntryData.ae_final
+  io.resp.bits.ae.st := hitEntryData.ae_final
+  io.resp.bits.ae.inst := hitEntryData.ae_final
+  io.resp.bits.ma.ld := false.B
+  io.resp.bits.ma.st := false.B
+  io.resp.bits.ma.inst := false.B
+  io.resp.bits.cacheable := hitEntryData.cacheable
+  io.resp.bits.must_alloc := false.B
+  io.resp.bits.prefetchable := hitEntryData.cacheable
+  io.resp.bits.size := io.req.bits.size
+  io.resp.bits.cmd := io.req.bits.cmd
+}
diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLBCluster.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLBCluster.scala
new file mode 100644
index 00000000..3832702a
--- /dev/null
+++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLBCluster.scala
@@ -0,0 +1,115 @@
+package framework.memdomain.frontend.outside_channel.tlb
+
+import chisel3._
+import chisel3.util._
+import chisel3.experimental.hierarchy.{instantiable, public, Instance, Instantiate}
+import freechips.rocketchip.tilelink.TLEdgeOut
+import framework.top.GlobalConfig
+
+@instantiable
+class BBTLBCluster(val b: GlobalConfig)(implicit val edge: TLEdgeOut) extends Module {
+
+  val nClients = 2
+  val entries = b.memDomain.tlb_size
+  val maxSize = b.core.coreDataBytes
+  val lgMaxSize = log2Ceil(b.core.coreDataBytes)
+  val vaddrBits = b.core.vaddrBits
+  val paddrBits = b.core.paddrBits
+  val pgIdxBits = b.core.pgIdxBits
+
+  @public
+  val io = IO(new Bundle {
+    val clients = Vec(nClients, new BBTLBIO(b))
+    val ptw = Vec(1, new BBTLBPTWIO(b)) // Shared TLB has only 1 PTW port
+    val exp = Vec(1, new BBTLBExceptionIO) // Shared TLB has only 1 exception interface
+  })
+
+  val tlb = Instantiate(new TLB(b, lgMaxSize))
+
+  // Exception handling
+  val interrupt = RegInit(false.B)
+  io.exp(0).interrupt := interrupt
+
+  // Connect PTW
+  io.ptw(0) <> tlb.io.ptw
+
+  val tlbArb = Module(new RRArbiter(new BBTLBReq(lgMaxSize, vaddrBits, b.core.xLen), nClients))
+  val tlbArbOut = tlbArb.io.out
+  val tlb_io = tlb.io
+
+  tlb_io.req.valid := tlbArbOut.valid
+  tlb_io.req.bits := tlbArbOut.bits
+  tlbArbOut.ready := tlb_io.req.ready
+
+  // Connect status to PTW
+  tlb_io.ptw.status := tlbArbOut.bits.status
+  tlb_io.kill := false.B
+
+  // Handle sfence from exception IO
+  tlb_io.sfence.valid := io.exp(0).flush()
+  tlb_io.sfence.bits := false.B
+
+  // Exception detection
+  val isRead = tlbArbOut.bits.cmd(0) === 0.U
+
+  val exception = tlbArbOut.valid && !tlb_io.resp.bits.miss && Mux(
+    isRead,
+    tlb_io.resp.bits.pf.ld || tlb_io.resp.bits.ae.ld || tlb_io.resp.bits.gf.ld,
+    tlb_io.resp.bits.pf.st || tlb_io.resp.bits.ae.st || tlb_io.resp.bits.gf.st
+  )
+
+  when(exception) {
+    interrupt := true.B
+  }
+
+  when(interrupt && io.exp(0).flush_skip) {
+    interrupt := false.B
+  }
+
+  when(interrupt && io.exp(0).flush_retry) {
+    interrupt := false.B
+  }
+
+  assert(!io.exp(0).flush_retry || !io.exp(0).flush_skip, "TLB: flushing with both retry and skip at same time")
+
+  // Track which client won the arbiter
+  val arbGranted = tlbArb.io.chosen
+
+  // TLB resp.ready: only the winning client controls backpressure
+  tlb_io.resp.ready := MuxLookup(arbGranted, false.B)(
+    io.clients.zipWithIndex.map { case (client, i) => i.U -> client.resp.ready }
+  )
+
+  io.clients.zipWithIndex.foreach {
+    case (client, i) =>
+      val last_translated_valid = RegInit(false.B)
+      val last_translated_vpn = RegInit(0.U(vaddrBits.W))
+      val last_translated_ppn = RegInit(0.U(paddrBits.W))
+
+      val l0_tlb_hit =
+        last_translated_valid && ((client.req.bits.vaddr >> pgIdxBits).asUInt === (last_translated_vpn >> pgIdxBits).asUInt)
+      val l0_tlb_paddr = Cat(last_translated_ppn >> pgIdxBits, client.req.bits.vaddr(pgIdxBits - 1, 0))
+
+      tlbArb.io.in(i).valid := client.req.valid
+      tlbArb.io.in(i).bits := client.req.bits
+      client.req.ready := tlbArb.io.in(i).ready
+
+      val tlbReq = tlbArb.io.in(i).bits
+      val tlbReqFire = tlbArb.io.in(i).fire
+
+      when(tlbReqFire && !tlb_io.resp.bits.miss) {
+        last_translated_valid := true.B
+        last_translated_vpn := tlbReq.vaddr
+        last_translated_ppn := tlb_io.resp.bits.paddr
+      }
+
+      when(io.exp(0).flush()) {
+        last_translated_valid := false.B
+      }
+
+      // Response routing: only the winning client gets the TLB response
+      val isMyTurn = tlbArbOut.valid && arbGranted === i.U
+      client.resp.valid := isMyTurn && tlb_io.resp.valid
+      client.resp.bits := Mux(isMyTurn, tlb_io.resp.bits, 0.U.asTypeOf(tlb_io.resp.bits))
+  }
+}
diff --git a/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLBEntry.scala b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLBEntry.scala
new file mode 100644
index 00000000..dcdc974e
--- /dev/null
+++ b/arch/src/main/scala/framework/memdomain/frontend/outside_channel/tlb/TLBEntry.scala
@@ -0,0 +1,115 @@
+package framework.memdomain.frontend.outside_channel.tlb
+
+import chisel3._
+import chisel3.util._
+
+/** TLB entry data containing translation and permission information */
+class TLBEntryData(val paddrBits: Int, val pgIdxBits: Int) extends Bundle {
+  val ppnBits = paddrBits - pgIdxBits
+
+  val ppn = UInt(ppnBits.W)
+  val u = Bool() // user page
+  val g = Bool() // global page
+  val sr = Bool() // supervisor read
+  val sw = Bool() // supervisor write
+  val sx = Bool() // supervisor execute
+  val cacheable = Bool()
+
+  // Page fault and access exception flags
+  val pf = Bool()
+  val ae_final = Bool()
+}
+
+/**
+ * TLB entry containing VPN tag and entry data.
+ *
+ * Supports superpages. The `level` field stores the PTW response level,
+ * which indicates at which page table walk step the PTE was found.
+ * For Sv39 with pgLevels=4 (matching Rocket PTW):
+ * level=1 → 1GB gigapage (only VPN[2] in PPN, VPN[1:0] from VA)
+ * level=2 → 2MB megapage (VPN[2:1] in PPN, VPN[0] from VA)
+ * level=3 → 4KB page (full PPN from PTE)
+ *
+ * The superpage PPN generation follows Rocket TLB's logic:
+ * for j in 1 until pgLevels:
+ * if level < j: use VPN chunk (superpage covers this level)
+ * else: use PPN chunk from PTE
+ */
+class TLBEntry(
+  val vaddrBits: Int,
+  val pgIdxBits: Int,
+  val paddrBits: Int,
+  val pgLevelBits: Int = 9,
+  val pgLevels: Int = 4)
+    extends Bundle {
+  val vpnBits = vaddrBits - pgIdxBits
+  val ppnBits = paddrBits - pgIdxBits
+
+  val tag_vpn = UInt(vpnBits.W)
+  val valid = Bool()
+  val level = UInt(log2Ceil(pgLevels).W)
+  val data = new TLBEntryData(paddrBits, pgIdxBits)
+
+  /**
+   * Superpage-aware hit: only compare VPN chunks that are above the
+   * superpage boundary. Follows Rocket's convention where `level < j`
+   * means the chunk at position j is covered by the superpage.
+   */
+  def hit(vpn: UInt): Bool = {
+    // Walk from highest VPN chunk (j=1) to lowest (j=pgLevels-1),
+    // same ordering as Rocket TLB's ppn() method.
+    val matches = (1 until pgLevels).map { j =>
+      // Rocket convention: j=1 is the highest VPN/PPN chunk below the top,
+      // j=pgLevels-1 is the lowest.
+      // supervisorVPNBits = pgLevels * pgLevelBits (may exceed actual vpnBits)
+      val supervisorVPNBits = pgLevels * pgLevelBits
+      val hi = supervisorVPNBits - j * pgLevelBits - 1
+      val lo = supervisorVPNBits - (j + 1) * pgLevelBits
+      // Clamp to actual vpnBits
+      val clampedHi = math.min(hi, vpnBits - 1)
+      val clampedLo = math.max(lo, 0)
+      if (clampedHi < clampedLo) {
+        true.B // Chunk doesn't exist in VPN, always matches
+      } else {
+        val ignore = level < j.U
+        ignore || (vpn(clampedHi, clampedLo) === tag_vpn(clampedHi, clampedLo))
+      }
+    }
+    valid && matches.reduce(_ && _)
+  }
+
+  /**
+   * Generate the correct PPN for superpage translation.
+   * Follows Rocket TLB's ppn() method exactly.
+   */
+  def ppn(vpn: UInt): UInt = {
+    val supervisorVPNBits = pgLevels * pgLevelBits
+    // Start with the highest PPN chunk (above all VPN levels)
+    var res = data.ppn >> (pgLevelBits * (pgLevels - 1))
+    for (j <- 1 until pgLevels) {
+      val hi = supervisorVPNBits - j * pgLevelBits - 1
+      val lo = supervisorVPNBits - (j + 1) * pgLevelBits
+      // Clamp to actual ppnBits and vpnBits
+      val ppnHi = math.min(hi, ppnBits - 1)
+      val ppnLo = math.max(lo, 0)
+      if (ppnHi >= ppnLo) {
+        val ignore = level < j.U
+        val vpnHi = math.min(hi, vpnBits - 1)
+        val vpnLo = math.max(lo, 0)
+        val chunk = Mux(ignore, vpn(vpnHi, vpnLo), data.ppn(ppnHi, ppnLo))
+        res = Cat(res, chunk)
+      }
+    }
+    res
+  }
+
+  def insert(vpn: UInt, lvl: UInt, entryData: TLBEntryData): Unit = {
+    tag_vpn := vpn
+    valid := true.B
+    level := lvl
+    data := entryData
+  }
+
+  def invalidate(): Unit =
+    valid := false.B
+}
diff --git a/arch/src/main/scala/framework/memdomain/mem/AccBank.scala b/arch/src/main/scala/framework/memdomain/mem/AccBank.scala
deleted file mode 100644
index 9ea71afe..00000000
--- a/arch/src/main/scala/framework/memdomain/mem/AccBank.scala
+++ /dev/null
@@ -1,142 +0,0 @@
-package framework.memdomain.mem
-
-import chisel3._
-import chisel3.util._
-
-import framework.builtin.util.Util._
-
-class AccWriteIO(n: Int, w: Int, mask_len: Int) extends SramWriteIO(n, w, mask_len) {
-  val is_acc = Input(Bool())
-}
-
-class AccPipe(val n: Int, val w: Int, val mask_len: Int) extends Module {
-  val io = IO(new Bundle {
-    val write_in = new AccWriteIO(n, w, mask_len) // outer —> Acc
-    val read = Flipped(new SramReadIO(n, w)) // Acc <—> SramBank
-    val write_out = Flipped(new SramWriteIO(n, w, mask_len)) // Acc —> SramBank
-  })
-
-  // Pipeline registers
-  val valid_reg = RegInit(false.B)
-  val addr_reg = RegInit(0.U(log2Ceil(n).W))
-  val data_reg = RegInit(0.U(w.W))
-  val mask_reg = RegInit(VecInit(Seq.fill(mask_len)(false.B)))
-
-  when (io.write_in.is_acc || RegNext(io.write_in.is_acc)) {
-// -----------------------------------------------------------------------------
-// exec->AccPipe->SramBank
-// -----------------------------------------------------------------------------
-    // Stage 1: Read request
-    io.read.req.valid := io.write_in.req.valid
-    io.read.req.bits.addr := io.write_in.req.bits.addr
-    // AccPipe read is not from DMA
-    io.read.req.bits.fromDMA := false.B
-    valid_reg := io.write_in.req.valid
-    addr_reg := io.write_in.req.bits.addr
-    data_reg := io.write_in.req.bits.data
-    mask_reg := io.write_in.req.bits.mask
-
-    // Stage 2: Accumulate (when read data is ready)
-    val acc_data = WireDefault(0.U(w.W))
-    when (valid_reg && io.read.resp.valid) {
-      acc_data := data_reg + io.read.resp.bits.data
-    }.otherwise {
-      acc_data := data_reg
-    }
-
-    // Stage 3: Write back
-    io.write_out.req.valid := valid_reg && io.read.resp.valid
-    io.write_out.req.bits.addr := addr_reg
-    io.write_out.req.bits.data := acc_data
-    io.write_out.req.bits.mask := mask_reg
-
-    // Backpressure
-    io.write_in.req.ready := io.read.req.ready
-    io.read.resp.ready := io.write_out.req.ready
-  }.otherwise {
-// -----------------------------------------------------------------------------
-// main->SramBank
-// -----------------------------------------------------------------------------
-    io.read.req.valid := false.B
-    io.read.req.bits.addr := 0.U(log2Ceil(n).W)
-    io.read.req.bits.fromDMA := false.B
-
-    io.write_out.req.valid := io.write_in.req.valid
-    io.write_out.req.bits.addr := io.write_in.req.bits.addr
-    io.write_out.req.bits.data := io.write_in.req.bits.data
-    io.write_out.req.bits.mask := io.write_in.req.bits.mask
-
-    io.write_in.req.ready := io.write_out.req.ready
-    io.read.resp.ready := false.B
-  }
-}
-
-
-class AccReadRouter(val n: Int, val w: Int) extends Module {
-  val io = IO(new Bundle {
-    val read_in1 = new SramReadIO(n, w)
-    val read_in2 = new SramReadIO(n, w)
-    val read_out = Flipped(new SramReadIO(n, w))
-  })
-
-// -----------------------------------------------------------------------------
-// Arbiter - use two Arbiters to handle req and resp separately
-// -----------------------------------------------------------------------------
-  // Priority arbiter, read_in2 has index 0 for higher priority
-  val req_arbiter = Module(new Arbiter(new SramReadReq(n), 2))
-  req_arbiter.io.in(0) <> io.read_in2.req
-  req_arbiter.io.in(1) <> io.read_in1.req
-  io.read_out.req <> req_arbiter.io.out
-
-  // Response distributor: record which input initiated the request
-  val resp_to_in1 = RegNext(req_arbiter.io.chosen === 1.U && req_arbiter.io.out.fire, false.B)
-  val resp_to_in2 = RegNext(req_arbiter.io.chosen === 0.U && req_arbiter.io.out.fire, false.B)
-
-  // Response distribution
-  io.read_in1.resp.valid := io.read_out.resp.valid && resp_to_in1
-  io.read_in1.resp.bits := io.read_out.resp.bits
-  io.read_in2.resp.valid := io.read_out.resp.valid && resp_to_in2
-  io.read_in2.resp.bits := io.read_out.resp.bits
-
-  // Response ready signal
-  io.read_out.resp.ready :=
-    (resp_to_in1 && io.read_in1.resp.ready) ||
-    (resp_to_in2 && io.read_in2.resp.ready)
-
-  assert(!(io.read_in1.req.valid && io.read_in2.req.valid), "[AccBank Router]: Read requests is not allowed at the same time")
-}
-
-
-class AccBank(n: Int, w: Int, aligned_to: Int, single_ported: Boolean) extends Module {
-  val
mask_len = (w / (aligned_to * 8)) max 1 - val mask_elem = UInt((w min (aligned_to * 8)).W) - - val io = IO(new Bundle { - val read = new SramReadIO(n, w) - val write = new AccWriteIO(n, w, mask_len) - }) - - val sram = Module(new SramBank(n, w, aligned_to, single_ported)) - val pipe = Module(new AccPipe(n, w, mask_len)) - val read_router = Module(new AccReadRouter(n, w)) - -// ----------------------------------------------------------------------------- -// Write request enters pipeline -// ----------------------------------------------------------------------------- - pipe.io.write_in <> io.write - -// ----------------------------------------------------------------------------- -// Read request arbitration -// ----------------------------------------------------------------------------- - read_router.io.read_in1 <> pipe.io.read - read_router.io.read_in2 <> io.read - - // Connect AccRouter output to SramBank - sram.io.read <> read_router.io.read_out - -// ----------------------------------------------------------------------------- -// Pipeline output connected to underlying SRAM write port -// ----------------------------------------------------------------------------- - sram.io.write <> pipe.io.write_out - -} diff --git a/arch/src/main/scala/framework/memdomain/mem/README.md b/arch/src/main/scala/framework/memdomain/mem/README.md deleted file mode 100644 index cecedcf6..00000000 --- a/arch/src/main/scala/framework/memdomain/mem/README.md +++ /dev/null @@ -1,114 +0,0 @@ -# Memory Bank Implementation - -## Overview - -Core storage units for Buckyball's memory domain, located at `arch/src/main/scala/framework/builtin/memdomain/mem`. Provides high-performance on-chip memory components. 
- -Components: -- **SramBank**: Basic SRAM storage bank with synchronous read/write -- **AccBank**: Accumulator storage bank with read-modify-write support -- **Scratchpad**: Scratchpad module managing multiple memory banks with arbitration - -## File Structure - -``` -mem/ -├── SramBank.scala - Basic SRAM bank implementation -├── AccBank.scala - Accumulator bank implementation -└── Scratchpad.scala - Scratchpad management module -``` - -## SramBank.scala - -### Interface Definition - -```scala -class SramReadReq(val n: Int) extends Bundle { - val addr = UInt(log2Ceil(n).W) - val fromDMA = Bool() -} - -class SramWriteReq(val n: Int, val w: Int, val mask_len: Int) extends Bundle { - val addr = UInt(log2Ceil(n).W) - val mask = Vec(mask_len, Bool()) - val data = UInt(w.W) -} -``` - -### Core Logic - -```scala -val mem = SyncReadMem(n, Vec(mask_len, mask_elem)) - -// Read/write conflict arbitration -assert(!(io.read.req.valid && io.write.req.valid), - "SramBank: Read and write requests is not allowed at the same time") - -io.read.req.ready := !io.write.req.valid -io.write.req.ready := !io.read.req.valid -``` - -**Constraint**: No simultaneous read/write to same bank in same cycle - -## AccBank.scala - -### Accumulation Pipeline (AccPipe) - -```scala -when (io.write_in.is_acc || RegNext(io.write_in.is_acc)) { - // Stage 1: Read request - io.read.req.valid := io.write_in.req.valid - - // Stage 2: Accumulate - val acc_data = data_reg + io.read.resp.bits.data - - // Stage 3: Write back - io.write_out.req.bits.data := acc_data -} -``` - -### Read Request Router (AccReadRouter) - -```scala -val req_arbiter = Module(new Arbiter(new SramReadReq(n), 2)) -req_arbiter.io.in(0) <> io.read_in2.req // Higher priority -req_arbiter.io.in(1) <> io.read_in1.req // Lower priority - -// Response distribution -val resp_to_in1 = RegNext(req_arbiter.io.chosen === 1.U && req_arbiter.io.out.fire) -``` - -## Scratchpad.scala - -### Bank Instantiation - -```scala -val spad_mems = 
Seq.fill(sp_banks) { Module(new SramBank( - spad_bank_entries, spad_w, aligned_to, sp_singleported -)) } - -val acc_mems = Seq.fill(acc_banks) { Module(new AccBank( - acc_bank_entries, acc_w, aligned_to, sp_singleported -)) } -``` - -### Request Arbitration - -```scala -// Read arbitration: priority exec > dma -val exec_read_sel = exec_read_req.valid -val main_read_sel = main_read_req.valid && !exec_read_sel - -// Response distribution -val resp_to_main = RegNext(main_read_sel && bank.io.read.req.fire) -val resp_to_exec = RegNext(exec_read_sel && bank.io.read.req.fire) -``` - -## Important Notes - -1. **Single Port Limitation**: Configuration enforces single-port SRAM (`sp_singleported = true`), no same-cycle read/write -2. **Arbitration Priority**: Execution unit (exec) requests have higher priority than DMA requests in all modules -3. **Pipeline Design**: AccBank uses 3-stage pipeline (read-accumulate-write), requires careful data dependency handling -4. **Parameterized Configuration**: All modules support configuration through BaseConfig (bank count, capacity, data width) -5. **Assertions**: Code includes runtime assertions for detecting illegal concurrent access and configuration errors -6. 
**Mask Support**: Byte-granularity write mask operations, mask length calculated from data width and alignment diff --git a/arch/src/main/scala/framework/memdomain/mem/Scratchpad.scala b/arch/src/main/scala/framework/memdomain/mem/Scratchpad.scala deleted file mode 100644 index 08d20c2f..00000000 --- a/arch/src/main/scala/framework/memdomain/mem/Scratchpad.scala +++ /dev/null @@ -1,171 +0,0 @@ -package framework.memdomain.mem - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.tile._ - -import framework.builtin.util.Util._ -import examples.BuckyballConfigs.CustomBuckyballConfig - -import framework.memdomain.mem.AccWriteIO - - -class Scratchpad(config: CustomBuckyballConfig)(implicit val p: Parameters) extends Module with HasCoreParameters { - import config._ - - // Assertion: ensure configuration consistency - assert(sp_singleported, "Scratchpad expects single-ported SRAM banks") - - val io = IO(new Bundle { - // SRAM read/write interface - used by load/store - val dma = new Bundle { - val sramread = Vec(sp_banks, new SramReadIO(spad_bank_entries, spad_w)) - val sramwrite = Vec(sp_banks, new SramWriteIO(spad_bank_entries, spad_w, spad_mask_len)) - val accread = Vec(acc_banks, new SramReadIO(acc_bank_entries, acc_w)) - val accwrite = Vec(acc_banks, new SramWriteIO(acc_bank_entries, acc_w, acc_mask_len)) - } - // Execution unit read/write interface - one read and write per bank, OpA and OpB guaranteed to access different banks - val exec = new Bundle { - val sramread = Vec(sp_banks, new SramReadIO(spad_bank_entries, spad_w)) - val sramwrite = Vec(sp_banks, new SramWriteIO(spad_bank_entries, spad_w, spad_mask_len)) - val accread = Vec(acc_banks, new SramReadIO(acc_bank_entries, acc_w)) - val accwrite = Vec(acc_banks, new SramWriteIO(acc_bank_entries, acc_w, acc_mask_len)) - } - }) - -// ----------------------------------------------------------------------------- -// Scratchpad -// 
----------------------------------------------------------------------------- - - // SRAM banks - each bank has only one port, supports simultaneous read and write - val spad_mems = Seq.fill(sp_banks) { Module(new SramBank( - spad_bank_entries, spad_w, - // Use configuration parameters - aligned_to, sp_singleported - )) } - - // Request arbitration and connection for each bank - spad_mems.zipWithIndex.foreach { case (bank, i) => - - // All read request sources - val main_read_req = io.dma.sramread(i).req - val exec_read_req = io.exec.sramread(i).req - - // All write request sources - val main_write = io.dma.sramwrite(i) - val exec_write = io.exec.sramwrite(i) - - // Assertion: OpA and OpB should not access the same bank simultaneously - assert(!(exec_read_req.valid && exec_write.req.valid), - s"Bank ${i}: exec and write cannot access the same bank simultaneously") - - // Read request arbitration: priority exec > main - val exec_read_sel = exec_read_req.valid - val main_read_sel = main_read_req.valid && !exec_read_sel - - // Write request arbitration: exec has higher priority - val exec_write_sel = exec_write.req.valid - - // Connect read request to SramBank - bank.io.read.req.valid := exec_read_sel || main_read_sel - bank.io.read.req.bits := Mux(exec_read_sel, exec_read_req.bits, main_read_req.bits) - - // Read request ready reverse connection - main_read_req.ready := main_read_sel && bank.io.read.req.ready - exec_read_req.ready := exec_read_sel && bank.io.read.req.ready - - // Record which client initiated read request (for response distribution) - val resp_to_main = RegNext(main_read_sel && bank.io.read.req.fire, false.B) - val resp_to_exec = RegNext(exec_read_sel && bank.io.read.req.fire, false.B) - - // Read response distribution - io.dma.sramread(i).resp.valid := bank.io.read.resp.valid && resp_to_main - io.dma.sramread(i).resp.bits := bank.io.read.resp.bits - - io.exec.sramread(i).resp.valid := bank.io.read.resp.valid && resp_to_exec - 
io.exec.sramread(i).resp.bits := bank.io.read.resp.bits - - // Read response ready signal: either client ready is sufficient - bank.io.read.resp.ready := - (resp_to_main && io.dma.sramread(i).resp.ready) || - (resp_to_exec && io.exec.sramread(i).resp.ready) - - // Connect write request to SramBank - bank.io.write.req.valid := Mux(exec_write_sel, exec_write.req.valid, main_write.req.valid) - bank.io.write.req.bits.addr := Mux(exec_write_sel, exec_write.req.bits.addr, main_write.req.bits.addr) - bank.io.write.req.bits.data := Mux(exec_write_sel, exec_write.req.bits.data, main_write.req.bits.data) - bank.io.write.req.bits.mask := Mux(exec_write_sel, exec_write.req.bits.mask, main_write.req.bits.mask) - - // Write request ready reverse connection - main_write.req.ready := !exec_write_sel && bank.io.write.req.ready - exec_write.req.ready := exec_write_sel && bank.io.write.req.ready - } - -// ----------------------------------------------------------------------------- -// Accumulator -// ----------------------------------------------------------------------------- - - val acc_mems = Seq.fill(acc_banks) { Module(new AccBank( - acc_bank_entries, acc_w, - // Use configuration parameters - aligned_to, sp_singleported - )) } - - // Request arbitration and connection for each acc bank - acc_mems.zipWithIndex.foreach { case (bank, i) => - // All read request sources - val main_read_req = io.dma.accread(i).req - val exec_read_req = io.exec.accread(i).req - - // All write request sources - val main_write = io.dma.accwrite(i) - val exec_write = io.exec.accwrite(i) - - // Assertion: OpA and OpB should not access the same bank simultaneously - assert(!(exec_read_req.valid && exec_write.req.valid), - s"Bank ${i}: exec and write cannot access the same bank simultaneously") - - // Read request arbitration: priority exec > main - val exec_read_sel = exec_read_req.valid - val main_read_sel = main_read_req.valid && !exec_read_sel - - // Write request arbitration: exec has higher priority 
- val exec_write_sel = exec_write.req.valid - - // Connect read request to SramBank - bank.io.read.req.valid := exec_read_sel || main_read_sel - bank.io.read.req.bits := Mux(exec_read_sel, exec_read_req.bits, main_read_req.bits) - - // Read request ready reverse connection - main_read_req.ready := main_read_sel && bank.io.read.req.ready - exec_read_req.ready := exec_read_sel && bank.io.read.req.ready - - // Record which client initiated read request (for response distribution) - val resp_to_main = RegNext(main_read_sel && bank.io.read.req.fire, false.B) - val resp_to_exec = RegNext(exec_read_sel && bank.io.read.req.fire, false.B) - - // Read response distribution - io.dma.accread(i).resp.valid := bank.io.read.resp.valid && resp_to_main - io.dma.accread(i).resp.bits := bank.io.read.resp.bits - - io.exec.accread(i).resp.valid := bank.io.read.resp.valid && resp_to_exec - io.exec.accread(i).resp.bits := bank.io.read.resp.bits - - // Read response ready signal: either client ready is sufficient - bank.io.read.resp.ready := - (resp_to_main && io.dma.accread(i).resp.ready) || - (resp_to_exec && io.exec.accread(i).resp.ready) - - // Connect write request to SramBank - bank.io.write.req.valid := Mux(exec_write_sel, exec_write.req.valid, main_write.req.valid) - bank.io.write.req.bits.addr := Mux(exec_write_sel, exec_write.req.bits.addr, main_write.req.bits.addr) - bank.io.write.req.bits.data := Mux(exec_write_sel, exec_write.req.bits.data, main_write.req.bits.data) - bank.io.write.req.bits.mask := Mux(exec_write_sel, exec_write.req.bits.mask, main_write.req.bits.mask) - bank.io.write.is_acc := Mux(exec_write_sel, true.B, false.B) - - // Write request ready reverse connection - main_write.req.ready := !exec_write_sel && bank.io.write.req.ready - exec_write.req.ready := exec_write_sel && bank.io.write.req.ready - } -} diff --git a/arch/src/main/scala/framework/memdomain/mem/SramBank.scala b/arch/src/main/scala/framework/memdomain/mem/SramBank.scala deleted file mode 100644 
index 152a98a1..00000000 --- a/arch/src/main/scala/framework/memdomain/mem/SramBank.scala +++ /dev/null @@ -1,83 +0,0 @@ -package framework.memdomain.mem - -import chisel3._ -import chisel3.util._ - -import framework.builtin.util.Util._ - -class SramReadReq(val n: Int) extends Bundle { - val addr = UInt(log2Ceil(n).W) - val fromDMA = Bool() -} - -class SramReadResp(val w: Int) extends Bundle { - val data = UInt(w.W) - val fromDMA = Bool() -} - -class SramReadIO(val n: Int, val w: Int) extends Bundle { - val req = Flipped(Decoupled(new SramReadReq(n))) - val resp = Decoupled(new SramReadResp(w)) -} - -class SramWriteReq(val n: Int, val w: Int, val mask_len: Int) extends Bundle { - val addr = UInt(log2Ceil(n).W) - val mask = Vec(mask_len, Bool()) - val data = UInt(w.W) -} - -class SramWriteIO(val n: Int, val w: Int, val mask_len: Int) extends Bundle { - val req = Flipped(Decoupled(new SramWriteReq(n, w, mask_len))) -} - -class SramBank(n: Int, w: Int, aligned_to: Int, single_ported: Boolean) extends Module { - require(w % aligned_to == 0 || w < aligned_to) - - val mask_len = (w / (aligned_to * 8)) max 1 // How many mask bits are there? - val mask_elem = UInt((w min (aligned_to * 8)).W) // What datatype does each mask bit correspond to? 
- - val io = IO(new Bundle { - val read = new SramReadIO(n, w) - val write = new SramWriteIO(n, w, mask_len) - }) - - val mem = SyncReadMem(n, Vec(mask_len, mask_elem)) - - // Note: Memory is not initialized on reset to avoid FIRRTL compilation explosion - // Software should initialize memory before use if needed - - // Only one request per cycle is allowed - assert(!(io.read.req.valid && io.write.req.valid), "SramBank: Read and write requests is not allowed at the same time") - - // Read request can be ready as long as there is no write request - io.read.req.ready := !io.write.req.valid - -// ----------------------------------------------------------------------------- -// Write -// ----------------------------------------------------------------------------- - // Write request is always ready unless there is a read request in progress - io.write.req.ready := !io.read.req.valid - - when (io.write.req.valid) { - mem.write(io.write.req.bits.addr, io.write.req.bits.data.asTypeOf(Vec(mask_len, mask_elem)), VecInit((~(0.U(mask_len.W))).asBools)) - } - -// ----------------------------------------------------------------------------- -// Read -// ----------------------------------------------------------------------------- - val raddr = io.read.req.bits.addr - val ren = io.read.req.fire - val rdata = mem.read(raddr, ren).asUInt - val fromDMA = io.read.req.bits.fromDMA - - // Make a queue which buffers the result of an SRAM read if it can't immediately be consumed - // val q = Module(new Queue(new SramReadResp(w), 1, true, true)) - // q.io.enq.valid := RegNext(ren) - // q.io.enq.bits.data := rdata - // q.io.enq.bits.fromDMA := RegNext(fromDMA) - - // io.read.resp <> q.io.deq - io.read.resp.valid := RegNext(ren) - io.read.resp.bits.data := rdata - io.read.resp.bits.fromDMA := RegNext(fromDMA) -} diff --git a/arch/src/main/scala/framework/memdomain/midend/MemMidend.scala b/arch/src/main/scala/framework/memdomain/midend/MemMidend.scala new file mode 100644 index 
00000000..4ef02ff6 --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/midend/MemMidend.scala @@ -0,0 +1,170 @@ +package framework.memdomain.midend + +import chisel3._ +import chisel3.util._ +import framework.top.GlobalConfig +import framework.balldomain.blink.{BankRead, BankWrite} +import chisel3.experimental.hierarchy.{instantiable, public} +import framework.memdomain.backend.MemRequestIO + +// BankRead/BankWrite with is_shared flag, used for unified midend interface +class BankReadWithShared(val b: GlobalConfig) extends Bundle { + val bankRead = new BankRead(b) + val is_shared = Input(Bool()) +} + +class BankWriteWithShared(val b: GlobalConfig) extends Bundle { + val bankWrite = new BankWrite(b) + val is_shared = Input(Bool()) +} + +/** + * MemMidend: Midend module for memory scheduling + * Connects MemFrontend to MemManager + * + * Unified interface: bankRead/bankWrite Vecs include both balldomain and frontend requests. + * The last entry (index totalBallRead / totalBallWrite) is the frontend (DMA). + * All requests go through the same mapping table and channel allocation logic. + */ +@instantiable +class MemMidend(val b: GlobalConfig) extends Module { + val totalBallRead = b.ballDomain.ballIdMappings.map(_.inBW).sum + val totalBallWrite = b.ballDomain.ballIdMappings.map(_.outBW).sum + + // Total slots: balldomain entries + 1 frontend entry + val totalRead = totalBallRead + 1 + val totalWrite = totalBallWrite + 1 + + @public + val io = IO(new Bundle { + // Unified read/write interfaces: indices [0, totalBallRead) are balldomain, + // index totalBallRead is frontend (DMA). Same for write. 
+ val bankRead = Vec(totalRead, new BankReadWithShared(b)) + val bankWrite = Vec(totalWrite, new BankWriteWithShared(b)) + + val hartid = Input(UInt(b.core.xLen.W)) + + // Output to backend (MemManager) + val mem_req = Vec(b.memDomain.bankChannel, new MemRequestIO(b)) + }) + + // ----------------------------------------------------------------------------- + // Mapping table for tracking all requests (balldomain + frontend) + // ----------------------------------------------------------------------------- + class MappingTableEntry extends Bundle { + val valid = Bool() + val isRead = Bool() + val id = UInt(log2Ceil(math.max(totalRead, totalWrite)).W) + } + + val mappingTable = RegInit(VecInit(Seq.fill(b.memDomain.bankChannel)(0.U.asTypeOf(new MappingTableEntry)))) + + def addEntry(idx: UInt, isRead: Bool, id: UInt): Unit = { + mappingTable(idx).valid := true.B + mappingTable(idx).isRead := isRead + mappingTable(idx).id := id + } + + def allocateChannel(): (Bool, UInt) = { + val freeChannels = mappingTable.map(entry => !entry.valid) + val hasFreeChan = freeChannels.reduce(_ || _) + val chanId = PriorityEncoder(freeChannels) + (hasFreeChan, chanId) + } + + def isAllocated(isRead: Bool, id: UInt): Bool = + mappingTable.map(entry => entry.valid && entry.isRead === isRead && entry.id === id).reduce(_ || _) + + // Allocate channels for reads (all entries including frontend) + for (i <- 0 until totalRead) { + io.bankRead(i).bankRead.io.req.ready := false.B + io.bankRead(i).bankRead.io.resp.valid := false.B + io.bankRead(i).bankRead.io.resp.bits := DontCare + + when(io.bankRead(i).bankRead.io.req.valid && !isAllocated(true.B, i.U)) { + val (hasFree, chanId) = allocateChannel() + when(hasFree) { + addEntry(chanId, true.B, i.U) + } + } + } + + // Allocate channels for writes: one per cycle to avoid conflicts + val pendingWrites = + VecInit((0 until totalWrite).map(i => io.bankWrite(i).bankWrite.io.req.valid && !isAllocated(false.B, i.U))) + val hasPendingWrite = 
pendingWrites.asUInt.orR + val nextWriteToAllocate = PriorityEncoder(pendingWrites) + + for (i <- 0 until totalWrite) { + io.bankWrite(i).bankWrite.io.req.ready := false.B + io.bankWrite(i).bankWrite.io.resp.valid := false.B + io.bankWrite(i).bankWrite.io.resp.bits := DontCare + + when(hasPendingWrite && nextWriteToAllocate === i.U) { + val (hasFree, chanId) = allocateChannel() + when(hasFree) { + addEntry(chanId, false.B, i.U) + } + } + } + + // Connect mapped entries to backend channels + for (i <- 0 until b.memDomain.bankChannel) { + io.mem_req(i).read.req.valid := false.B + io.mem_req(i).read.req.bits := DontCare + io.mem_req(i).read.resp.ready := false.B + io.mem_req(i).write.req.valid := false.B + io.mem_req(i).write.req.bits := DontCare + io.mem_req(i).write.resp.ready := false.B + io.mem_req(i).bank_id := 0.U + io.mem_req(i).group_id := 0.U + io.mem_req(i).is_shared := false.B + io.mem_req(i).hart_id := io.hartid + + val isRead = mappingTable(i).isRead + val rid = mappingTable(i).id + val wid = mappingTable(i).id + val ballRead = io.bankRead(rid).bankRead.io + val ballWrite = io.bankWrite(wid).bankWrite.io + val rbank_id = io.bankRead(rid).bankRead.bank_id + val wbank_id = io.bankWrite(wid).bankWrite.bank_id + val rgroup_id = io.bankRead(rid).bankRead.group_id + val wgroup_id = io.bankWrite(wid).bankWrite.group_id + val r_shared = io.bankRead(rid).is_shared + val w_shared = io.bankWrite(wid).is_shared + + when(mappingTable(i).valid) { + when(isRead) { + io.mem_req(i).read <> ballRead + io.mem_req(i).bank_id := rbank_id + io.mem_req(i).group_id := rgroup_id + io.mem_req(i).is_shared := r_shared + }.otherwise { + io.mem_req(i).write <> ballWrite + io.mem_req(i).bank_id := wbank_id + io.mem_req(i).group_id := wgroup_id + io.mem_req(i).is_shared := w_shared + } + } + } + + // Mapping table release + for (i <- 0 until b.memDomain.bankChannel) { + val releaseCounter = RegInit(0.U(5.W)) + + when(mappingTable(i).valid && !(io.mem_req(i).read.resp.valid || + 
io.mem_req(i).write.resp.valid || io.mem_req(i).read.req.valid || + io.mem_req(i).write.req.valid)) { + releaseCounter := releaseCounter + 1.U + + when(releaseCounter === 16.U) { + releaseCounter := 0.U + mappingTable(i).valid := false.B + mappingTable(i).isRead := false.B + mappingTable(i).id := 0.U + } + }.otherwise { + releaseCounter := 0.U + } + } +} diff --git a/arch/src/main/scala/framework/memdomain/pmc/MemCyclePMC.scala b/arch/src/main/scala/framework/memdomain/pmc/MemCyclePMC.scala deleted file mode 100644 index 99e073a9..00000000 --- a/arch/src/main/scala/framework/memdomain/pmc/MemCyclePMC.scala +++ /dev/null @@ -1,50 +0,0 @@ -package framework.memdomain.pmc - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain.rs.{MemRsIssue, MemRsComplete} - -class MemCyclePMC(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - val ldReq_i = Input(Valid(new MemRsIssue)) - val stReq_i = Input(Valid(new MemRsIssue)) - val ldResp_o = Input(Valid(new MemRsComplete)) - val stResp_o = Input(Valid(new MemRsComplete)) - val ldTotalCycles = Output(UInt(64.W)) - val stTotalCycles = Output(UInt(64.W)) - }) - - val cycleCounter = RegInit(0.U(64.W)) - cycleCounter := cycleCounter + 1.U - - val startTime = Reg(Vec(b.rob_entries, UInt(64.W))) - val ldTotalCycles = RegInit(0.U(64.W)) - val stTotalCycles = RegInit(0.U(64.W)) - - when(io.ldReq_i.valid) { - startTime(io.ldReq_i.bits.rob_id) := cycleCounter - } - - when(io.stReq_i.valid) { - startTime(io.stReq_i.bits.rob_id) := cycleCounter - } - - when(io.ldResp_o.valid) { - val robId = io.ldResp_o.bits.rob_id - val elapsed = cycleCounter - startTime(robId) - ldTotalCycles := ldTotalCycles + elapsed - printf("[PMC] Load completed, elapsed: %d cycles\n", elapsed) - } - - when(io.stResp_o.valid) { - val robId = io.stResp_o.bits.rob_id - val elapsed = cycleCounter - 
startTime(robId) - stTotalCycles := stTotalCycles + elapsed - printf("[PMC] Store completed, elapsed: %d cycles\n", elapsed) - } - - io.ldTotalCycles := ldTotalCycles - io.stTotalCycles := stTotalCycles -} diff --git a/arch/src/main/scala/framework/memdomain/rs/reservationStation.scala b/arch/src/main/scala/framework/memdomain/rs/reservationStation.scala deleted file mode 100644 index d8d13f7f..00000000 --- a/arch/src/main/scala/framework/memdomain/rs/reservationStation.scala +++ /dev/null @@ -1,99 +0,0 @@ -package framework.memdomain.rs - -import chisel3._ -import chisel3.util._ -import chisel3.experimental._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain._ -// Mem domain issue interface - includes global rob_id -class MemRsIssue(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val cmd = new MemDecodeCmd - // Global ROB ID - val rob_id = UInt(log2Up(b.rob_entries).W) -} - -// Mem domain completion interface -class MemRsComplete(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val rob_id = UInt(log2Up(b.rob_entries).W) -} - -// Mem domain issue interface combination (Load + Store) -class MemIssueInterface(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val ld = Decoupled(new MemRsIssue) - val st = Decoupled(new MemRsIssue) -} - -// Mem domain completion interface combination (Load + Store) -class MemCommitInterface(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val ld = Flipped(Decoupled(new MemRsComplete)) - val st = Flipped(Decoupled(new MemRsComplete)) -} - -// Local Mem reservation station - simple FIFO scheduler -class MemReservationStation(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - // Decoded instruction input (with global rob_id) - val mem_decode_cmd_i = Flipped(new DecoupledIO(new Bundle { - val cmd = new MemDecodeCmd - // Global ROB ID - 
val rob_id = UInt(log2Up(b.rob_entries).W) - })) - - // Rs -> MemLoader/MemStorer - val issue_o = new MemIssueInterface - val commit_i = new MemCommitInterface - - // Output completion signal (with global rob_id, single channel) - val complete_o = Decoupled(new MemRsComplete) - }) - - // Simple FIFO queue, only for buffering - val fifo = Module(new Queue(new Bundle { - val cmd = new MemDecodeCmd - val rob_id = UInt(log2Up(b.rob_entries).W) - }, entries = 4)) // Small buffer is sufficient - -// ----------------------------------------------------------------------------- -// Inbound - FIFO enqueue -// ----------------------------------------------------------------------------- - fifo.io.enq <> io.mem_decode_cmd_i - -// ----------------------------------------------------------------------------- -// Outbound - instruction issue (dispatch based on is_load/is_store) -// ----------------------------------------------------------------------------- - val headEntry = fifo.io.deq.bits - - // Load issue - io.issue_o.ld.valid := fifo.io.deq.valid && headEntry.cmd.is_load - io.issue_o.ld.bits.cmd := headEntry.cmd - io.issue_o.ld.bits.rob_id := headEntry.rob_id - - // Store issue - io.issue_o.st.valid := fifo.io.deq.valid && headEntry.cmd.is_store - io.issue_o.st.bits.cmd := headEntry.cmd - io.issue_o.st.bits.rob_id := headEntry.rob_id - - // FIFO deq.ready - can only dequeue when target unit is ready - fifo.io.deq.ready := - (headEntry.cmd.is_load && io.issue_o.ld.ready) || - (headEntry.cmd.is_store && io.issue_o.st.ready) - -// ----------------------------------------------------------------------------- -// Completion signal processing - directly forward to global RS -// ----------------------------------------------------------------------------- - val completeArb = Module(new Arbiter(UInt(log2Up(b.rob_entries).W), 2)) - - completeArb.io.in(0).valid := io.commit_i.ld.valid - completeArb.io.in(0).bits := io.commit_i.ld.bits.rob_id - io.commit_i.ld.ready := 
completeArb.io.in(0).ready - - completeArb.io.in(1).valid := io.commit_i.st.valid - completeArb.io.in(1).bits := io.commit_i.st.bits.rob_id - io.commit_i.st.ready := completeArb.io.in(1).ready - - // Forward completion signal (with global rob_id) - io.complete_o.valid := completeArb.io.out.valid - io.complete_o.bits.rob_id := completeArb.io.out.bits - completeArb.io.out.ready := io.complete_o.ready -} diff --git a/arch/src/main/scala/framework/memdomain/rs/rob.scala b/arch/src/main/scala/framework/memdomain/rs/rob.scala deleted file mode 100644 index b9776908..00000000 --- a/arch/src/main/scala/framework/memdomain/rs/rob.scala +++ /dev/null @@ -1,89 +0,0 @@ -package framework.memdomain.rs - -import chisel3._ -import chisel3.util._ -import chisel3.experimental._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain.MemDecodeCmd - -// ROB entry data structure - preserves ROB ID to support out-of-order completion -class RobEntry(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val cmd = new MemDecodeCmd - val rob_id = UInt(log2Up(b.rob_entries).W) -} - -class ROB (implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - // Allocation interface - val alloc = Flipped(new DecoupledIO(new MemDecodeCmd)) - - // Issue interface - issue uncompleted head instruction - val issue = new DecoupledIO(new RobEntry) - - // Completion interface - report instruction completion - val complete = Flipped(new DecoupledIO(UInt(log2Up(b.rob_entries).W))) - - // Commit interface - commit completed head instruction - // val commit = new DecoupledIO(new RobEntry) - - // Status signals - val empty = Output(Bool()) - val full = Output(Bool()) - }) - - // Only use FIFO + completion status table, only enqueue/dequeue, sequential execution and sequential completion - val robFifo = Module(new Queue(new RobEntry, b.rob_entries)) - val robIdCounter = 
RegInit(0.U(log2Up(b.rob_entries).W)) - // Initialize to false to avoid X states in FPGA - val robTable = RegInit(VecInit(Seq.fill(b.rob_entries)(false.B))) - - // Initialize completion status table - for (i <- 0 until b.rob_entries) { - when(reset.asBool) { - robTable(i) := true.B - } - } - -// ----------------------------------------------------------------------------- -// Inbound - instruction allocation -// ----------------------------------------------------------------------------- - robFifo.io.enq.valid := io.alloc.valid - robFifo.io.enq.bits.cmd := io.alloc.bits - robFifo.io.enq.bits.rob_id := robIdCounter - - io.alloc.ready := robFifo.io.enq.ready - - when(io.alloc.fire) { - robIdCounter := robIdCounter + 1.U - robTable(robIdCounter) := false.B - } - -// ----------------------------------------------------------------------------- -// Completion signal processing using robTable tracking -// ----------------------------------------------------------------------------- - io.complete.ready := true.B - when(io.complete.fire) { - robTable(io.complete.bits) := true.B - } - -// ----------------------------------------------------------------------------- -// Outbound - head instruction issue -// ----------------------------------------------------------------------------- - val headEntry = robFifo.io.deq.bits - val headCompleted = robTable(headEntry.rob_id) - io.issue.valid := robFifo.io.deq.valid && !headCompleted - io.issue.bits := headEntry - - robFifo.io.deq.ready := io.issue.ready && !headCompleted - - -// ----------------------------------------------------------------------------- -// Status signals -// ----------------------------------------------------------------------------- - val isEmpty = robTable.reduce(_ && _) - val isFull = !robFifo.io.enq.ready - - io.empty := isEmpty - io.full := isFull -} diff --git a/arch/src/main/scala/framework/memdomain/tlb/BBTLB.scala b/arch/src/main/scala/framework/memdomain/tlb/BBTLB.scala deleted file mode 100644 
index c5bf3391..00000000 --- a/arch/src/main/scala/framework/memdomain/tlb/BBTLB.scala +++ /dev/null @@ -1,74 +0,0 @@ -package framework.memdomain.tlb - -import chisel3._ -import chisel3.util._ - -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.rocket._ -import freechips.rocketchip.tile.{CoreBundle, CoreModule} -import freechips.rocketchip.tilelink.TLEdgeOut - -import framework.builtin.util.Util._ - -class BBTLBReq(val lgMaxSize: Int)(implicit p: Parameters) extends CoreBundle { - val tlb_req = new TLBReq(lgMaxSize) - val status = new MStatus -} - -class BBTLBExceptionIO extends Bundle { - val interrupt = Output(Bool()) - val flush_retry = Input(Bool()) - val flush_skip = Input(Bool()) - - def flush(dummy: Int = 0): Bool = flush_retry || flush_skip -} - -class BBTLB(entries: Int, maxSize: Int)(implicit edge: TLEdgeOut, p: Parameters) - extends CoreModule { - - val lgMaxSize = log2Ceil(maxSize) - val io = IO(new Bundle { - val req = Flipped(Valid(new BBTLBReq(lgMaxSize))) - val resp = new TLBResp - val ptw = new TLBPTWIO - val exp = new BBTLBExceptionIO - }) - - val interrupt = RegInit(false.B) - io.exp.interrupt := interrupt - - val tlb = Module(new TLB(false, lgMaxSize, TLBConfig(nSets=1, nWays=entries))) - tlb.io.req.valid := io.req.valid - tlb.io.req.bits := io.req.bits.tlb_req - io.resp := tlb.io.resp - tlb.io.kill := false.B - - tlb.io.sfence.valid := io.exp.flush() - tlb.io.sfence.bits.rs1 := false.B - tlb.io.sfence.bits.rs2 := false.B - tlb.io.sfence.bits.addr := DontCare - tlb.io.sfence.bits.asid := DontCare - tlb.io.sfence.bits.hv := false.B - tlb.io.sfence.bits.hg := false.B - - io.ptw <> tlb.io.ptw - tlb.io.ptw.status := io.req.bits.status - - val exception = io.req.valid && Mux(io.req.bits.tlb_req.cmd === M_XRD, - tlb.io.resp.pf.ld || tlb.io.resp.ae.ld || tlb.io.resp.gf.ld, - tlb.io.resp.pf.st || tlb.io.resp.ae.st || tlb.io.resp.gf.st) - - when (exception) { - interrupt := true.B - } - - when (interrupt && io.exp.flush_skip) 
{ - interrupt := false.B - } - - when (interrupt && io.exp.flush_retry) { - interrupt := false.B - } - - assert(!io.exp.flush_retry || !io.exp.flush_skip, "TLB: flushing with both retry and skip at same time") -} diff --git a/arch/src/main/scala/framework/memdomain/tlb/README.md b/arch/src/main/scala/framework/memdomain/tlb/README.md deleted file mode 100644 index 3791d836..00000000 --- a/arch/src/main/scala/framework/memdomain/tlb/README.md +++ /dev/null @@ -1,252 +0,0 @@ -# TLB Module (Translation Lookaside Buffer) - -## Overview - -The TLB module implements address translation caching from virtual to physical addresses, located at `framework/builtin/memdomain/tlb`. Based on Rocket-chip's TLB implementation, it provides Buckyball-specific TLB encapsulation and cluster management. - -## File Structure - -``` -tlb/ -├── BBTLB.scala - Buckyball TLB implementation -├── TLBCluster.scala - TLB cluster manager -├── spec-BBTLB.md - BBTLB specification -└── spec-BBTLBCluster.md - TLB cluster specification -``` - -## Core Components - -### BBTLB - Buckyball TLB - -BBTLB wraps Rocket-chip TLB with Buckyball-specific interface and exception handling: - -```scala -class BBTLB(entries: Int, maxSize: Int)(implicit edge: TLEdgeOut, p: Parameters) - extends CoreModule { - - val lgMaxSize = log2Ceil(maxSize) - val io = IO(new Bundle { - val req = Flipped(Valid(new BBTLBReq(lgMaxSize))) - val resp = new TLBResp - val ptw = new TLBPTWIO - val exp = new BBTLBExceptionIO - }) -} -``` - -**Request Interface**: -```scala -class BBTLBReq(val lgMaxSize: Int)(implicit p: Parameters) extends CoreBundle { - val tlb_req = new TLBReq(lgMaxSize) // TLB request - val status = new MStatus // Processor status -} -``` - -**Exception Interface**: -```scala -class BBTLBExceptionIO extends Bundle { - val interrupt = Output(Bool()) // Interrupt output - val flush_retry = Input(Bool()) // Retry flush - val flush_skip = Input(Bool()) // Skip flush - - def flush(dummy: Int = 0): Bool = flush_retry || 
flush_skip -} -``` - -**Implementation**: Internally instantiates Rocket-chip TLB with single-set configuration: -```scala -val tlb = Module(new TLB(false, lgMaxSize, TLBConfig(nSets=1, nWays=entries))) -``` - -**Exception Detection**: -```scala -val exception = io.req.valid && Mux( - io.req.bits.tlb_req.cmd === M_XRD, - tlb.io.resp.pf.ld || tlb.io.resp.ae.ld, // Read exceptions - tlb.io.resp.pf.st || tlb.io.resp.ae.st // Write exceptions -) -``` - -### BBTLBCluster - TLB Cluster - -BBTLBCluster manages multiple TLB instances for concurrent client access: - -```scala -class BBTLBCluster(nClients: Int, entries: Int, maxSize: Int) - (implicit edge: TLEdgeOut, p: Parameters) extends CoreModule { - - val io = IO(new Bundle { - val clients = Flipped(Vec(nClients, new BBTLBIO)) - val ptw = Vec(nClients, new TLBPTWIO) - val exp = Vec(nClients, new BBTLBExceptionIO) - }) -} -``` - -**Client Interface**: -```scala -class BBTLBIO(implicit p: Parameters) extends CoreBundle { - val lgMaxSize = log2Ceil(coreDataBytes) - val req = Valid(new BBTLBReq(lgMaxSize)) // TLB request - val resp = Flipped(new TLBResp) // TLB response -} -``` - -**L0 TLB Cache**: Each client has an L0 TLB cache for recent translations: -```scala -val last_translated_valid = RegInit(false.B) -val last_translated_vpn = RegInit(0.U(vaddrBits.W)) -val last_translated_ppn = RegInit(0.U(paddrBits.W)) - -val l0_tlb_hit = last_translated_valid && - ((client.req.bits.tlb_req.vaddr >> pgIdxBits).asUInt === - (last_translated_vpn >> pgIdxBits).asUInt) -``` - -**Translation Flow**: -1. **L0 Cache Check**: First check L0 TLB cache for hit -2. **L1 TLB Query**: Query L1 TLB on L0 miss -3. **Page Table Walk**: PTW on TLB miss -4. 
**Cache Update**: Update L0 cache on successful translation - -```scala -when (tlbReqFire && !tlb.io.resp.miss) { - last_translated_valid := true.B - last_translated_vpn := tlbReq.tlb_req.vaddr - last_translated_ppn := tlb.io.resp.paddr -} -``` - -## Configuration - -**TLB Parameters**: -- `entries`: Number of TLB entries -- `maxSize`: Maximum transfer size -- `nClients`: Number of clients - -**Example**: -```scala -val tlbConfig = TLBConfig( - nSets = 1, // TLB sets - nWays = 32 // Ways per set -) -``` - -## Usage - -### Single TLB - -```scala -val bbtlb = Module(new BBTLB(entries = 32, maxSize = 64)) - -// Connect request -bbtlb.io.req.valid := tlbReqValid -bbtlb.io.req.bits.tlb_req := tlbRequest -bbtlb.io.req.bits.status := processorStatus - -// Get response -val tlbResp = bbtlb.io.resp -val physicalAddr = tlbResp.paddr -val tlbMiss = tlbResp.miss - -// Connect PTW -ptw <> bbtlb.io.ptw - -// Handle exceptions -when(bbtlb.io.exp.interrupt) { - // Handle TLB exception -} -``` - -### TLB Cluster - -```scala -val tlbCluster = Module(new BBTLBCluster( - nClients = 4, - entries = 32, - maxSize = 64 -)) - -// Connect clients -for (i <- 0 until nClients) { - tlbCluster.io.clients(i).req.valid := clientReqValid(i) - tlbCluster.io.clients(i).req.bits := clientReq(i) - clientResp(i) := tlbCluster.io.clients(i).resp -} - -// Connect PTW -for (i <- 0 until nClients) { - ptw(i) <> tlbCluster.io.ptw(i) -} -``` - -## Address Translation - -### Virtual Address Format - -``` -Virtual Address (64-bit): -[63:39] [38:30] [29:21] [20:12] [11:0] - VPN3 VPN2 VPN1 VPN0 Offset -``` - -### Physical Address Format - -``` -Physical Address: -[PPN][Offset] -``` - -### Translation Process - -1. **Address Parsing**: Parse VPN and offset from virtual address -2. **TLB Lookup**: Look up PPN for VPN in TLB -3. **Page Table Walk**: PTW on TLB miss -4. **Permission Check**: Check access permissions -5. 
**Address Composition**: Combine PPN and offset to form physical address - -## Exception Handling - -### Exception Types - -- **Page Fault**: Access to invalid page -- **Access Exception**: Insufficient permissions -- **TLB Miss**: Requires page table walk - -### Exception Handling Flow - -```scala -val exception = io.req.valid && Mux( - io.req.bits.tlb_req.cmd === M_XRD, - tlb.io.resp.pf.ld || tlb.io.resp.ae.ld, // Read exception - tlb.io.resp.pf.st || tlb.io.resp.ae.st // Write exception -) -``` - -### TLB Flush - -```scala -tlb.io.sfence.valid := io.exp.flush() -tlb.io.sfence.bits.rs1 := false.B -tlb.io.sfence.bits.rs2 := false.B -``` - -## Performance Optimization - -### L0 TLB Cache - -- Reduces L1 TLB access latency -- Improves address translation throughput -- Lowers power consumption - -### Parallel Processing - -- Multiple clients access in parallel -- Independent page table walkers -- Separate exception handling - -## Related Modules - -- [Memory Domain Overview](../README.md) -- [DMA Engines](../dma/README.md) -- [Memory Controller](../mem/README.md) diff --git a/arch/src/main/scala/framework/memdomain/tlb/TLBCluster.scala b/arch/src/main/scala/framework/memdomain/tlb/TLBCluster.scala deleted file mode 100644 index 0bb4cc9f..00000000 --- a/arch/src/main/scala/framework/memdomain/tlb/TLBCluster.scala +++ /dev/null @@ -1,81 +0,0 @@ -package framework.memdomain.tlb - -import chisel3._ -import chisel3.util._ - -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.rocket._ -import freechips.rocketchip.tile.{CoreBundle, CoreModule} -import freechips.rocketchip.tilelink.TLEdgeOut - - -class BBTLBIO(implicit p: Parameters) extends CoreBundle { - val lgMaxSize = log2Ceil(coreDataBytes) - val req = Valid(new BBTLBReq(lgMaxSize)) - val resp = Flipped(new TLBResp) -} - -class BBTLBCluster(nClients: Int, entries: Int, maxSize: Int, use_shared_tlb: Boolean = true) - (implicit edge: TLEdgeOut, p: Parameters) extends CoreModule { - - val num_tlbs 
= if (use_shared_tlb) 1 else nClients - val lgMaxSize = log2Ceil(coreDataBytes) - - val io = IO(new Bundle { - val clients = Flipped(Vec(nClients, new BBTLBIO)) - val ptw = Vec(num_tlbs, new TLBPTWIO) - val exp = Vec(num_tlbs, new BBTLBExceptionIO) - }) - - val tlbs = Seq.fill(num_tlbs)(Module(new BBTLB(entries, maxSize))) - - io.ptw <> VecInit(tlbs.map(_.io.ptw)) - io.exp <> VecInit(tlbs.map(_.io.exp)) - - val tlbArbOpt = if (use_shared_tlb) Some(Module(new RRArbiter(new BBTLBReq(lgMaxSize), nClients))) else None - - if (use_shared_tlb) { - val tlbArb = tlbArbOpt.get - val tlb = tlbs.head - tlb.io.req.valid := tlbArb.io.out.valid - tlb.io.req.bits := tlbArb.io.out.bits - tlbArb.io.out.ready := true.B - } - - io.clients.zipWithIndex.foreach { case (client, i) => - val last_translated_valid = RegInit(false.B) - val last_translated_vpn = RegInit(0.U(vaddrBits.W)) - val last_translated_ppn = RegInit(0.U(paddrBits.W)) - - val l0_tlb_hit = last_translated_valid && ((client.req.bits.tlb_req.vaddr >> pgIdxBits).asUInt === (last_translated_vpn >> pgIdxBits).asUInt) - val l0_tlb_paddr = Cat(last_translated_ppn >> pgIdxBits, client.req.bits.tlb_req.vaddr(pgIdxBits-1,0)) - - val tlb = if (use_shared_tlb) tlbs.head else tlbs(i) - val tlbReq = if (use_shared_tlb) tlbArbOpt.get.io.in(i).bits else tlb.io.req.bits - val tlbReqValid = if (use_shared_tlb) tlbArbOpt.get.io.in(i).valid else tlb.io.req.valid - val tlbReqFire = if (use_shared_tlb) tlbArbOpt.get.io.in(i).fire else tlb.io.req.fire - - val l0_tlb_paddr_reg = RegEnable(client.req.bits.tlb_req.vaddr, client.req.valid) - - tlbReqValid := RegNext(client.req.valid && !l0_tlb_hit) - tlbReq := RegNext(client.req.bits) - - when (tlbReqFire && !tlb.io.resp.miss) { - last_translated_valid := true.B - last_translated_vpn := tlbReq.tlb_req.vaddr - last_translated_ppn := tlb.io.resp.paddr - } - - when (tlb.io.exp.flush()) { - last_translated_valid := false.B - } - - when (tlbReqFire) { - client.resp := tlb.io.resp - }.otherwise { - 
client.resp := DontCare - client.resp.paddr := RegNext(l0_tlb_paddr) - client.resp.miss := !RegNext(l0_tlb_hit) - } - } -} diff --git a/arch/src/main/scala/framework/memdomain/tlb/spec-BBTLB.md b/arch/src/main/scala/framework/memdomain/tlb/spec-BBTLB.md deleted file mode 100644 index a0037906..00000000 --- a/arch/src/main/scala/framework/memdomain/tlb/spec-BBTLB.md +++ /dev/null @@ -1,29 +0,0 @@ -# BBTLB (Translation Lookaside Buffer) Specification - -## Overview - -BBTLB is a decoupled translation lookaside buffer implementation that accelerates virtual to physical address translation. Inheriting from CoreModule, it provides a TLB interface with exception handling mechanism, supporting page table walk (PTW) and various memory access commands. The module uses parameterized design to support configurable entry count and maximum page size, while integrating complete exception handling flow. - -## Interface Design - -The module's IO interface contains four main components: -- Request interface (req): Receives TLB request and status information -- Response interface (resp): TLB returns translation result, page fault flags, and access exception information -- Page table walker interface (ptw): PTW interface communicates with memory management unit, handling page table lookup on TLB misses -- Exception handling interface (exp): Exception handling interface manages interrupt signal generation and clearing, supporting both retry and skip flush operation modes - -## Internal Implementation - -Module internally instantiates a standard TLB module, configured as single-set associative structure (nSets=1, nWays=entries), with instruction TLB feature disabled. Internal TLB's request signal directly connects to input request's tlb_req field, while kill signal is hardwired to false, indicating no support for request cancellation. 
Page table walker interface communicates with internal TLB through direct connection, while passing request's status information to PTW module to ensure correct permission checking. - -## Exception Handling Flow - -Exception handling uses interrupt-based mechanism, tracking exception state through RegInit-initialized interrupt register. When valid request with page fault or access exception is detected, module performs corresponding exception checks based on memory command type: for read operations (M_XRD) checks load page fault and access exception, for write operations checks store page fault and access exception. Once exception condition is detected, interrupt signal is set high and maintained until flush operation received and flush signal successfully fires, then cleared. - -## SFENCE Operation Support - -Module implements complete SFENCE (Supervisor Fence) operation support for TLB flush and synchronization. SFENCE operation trigger condition is any form of flush signal (flush_retry or flush_skip). During SFENCE execution, all related address and ASID fields are set to DontCare, rs1 and rs2 flags are cleared, hv and hg flags are also disabled, indicating this implementation adopts simplified global flush strategy rather than selective flush. Module uses assertion to ensure not receiving retry and skip flush signals simultaneously, guaranteeing operation determinism. - -## Parameterized Configuration - -Module supports flexible configuration through constructor parameters: entries parameter controls TLB entry count, maxSize parameter defines maximum supported page size. lgMaxSize calculated through log2Ceil, used to determine address width and internal logic precision. This parameterized design enables the module to adapt to different system requirements and performance needs while maintaining interface consistency and implementation reusability. 
diff --git a/arch/src/main/scala/framework/memdomain/tlb/spec-BBTLBCluster.md b/arch/src/main/scala/framework/memdomain/tlb/spec-BBTLBCluster.md deleted file mode 100644 index 71e356d8..00000000 --- a/arch/src/main/scala/framework/memdomain/tlb/spec-BBTLBCluster.md +++ /dev/null @@ -1,88 +0,0 @@ -# BBTLBCluster (Translation Lookaside Buffer Cluster) Specification - -## Overview - -BBTLBCluster is a multi-client TLB cluster implementation that supports concurrent virtual address translation for multiple clients. Inheriting from CoreModule, it instantiates multiple BBTLB modules to provide independent TLB services for each client, while implementing an L0-level fast translation cache mechanism to improve performance. The module supports parameterized configuration of client count, TLB entries, and maximum page size. - -## Interface Design - -BBTLBCluster's IO interface contains three main components: -- Client interface (clients): Provides Vec(nClients, BBTLBIO), each client has independent request and response channels -- Page table walker interface (ptw): Vec(nClients, TLBPTWIO), provides independent PTW connection for each client -- Exception handling interface (exp): Vec(nClients, BBTLBExceptionIO), manages exceptions and flush operations for each client - -BBTLBIO interface definition: -- req: Valid(BBTLBReq), contains TLB request and related status information -- resp: Flipped(TLBResp), returns address translation result, miss flag, and exception information - -## Internal Architecture - -### TLB Instantiation -The module instantiates an independent BBTLB module for each client through `Seq.fill(nClients)(Module(new BBTLB(entries, maxSize)))` to create a TLB array. Each TLB instance's PTW and exception interfaces are directly connected to corresponding output ports. 
- -### L0 Fast Cache Mechanism -To improve translation performance for frequently accessed addresses, each client implements a single-entry L0-level translation cache: -- `last_translated_valid`: Indicates cache entry validity -- `last_translated_vpn`: Cached virtual page number -- `last_translated_ppn`: Cached physical page number - -L0 cache hit condition: cache is valid and current request's virtual page number matches cached virtual page number. - -## Address Translation Flow - -### L0 Cache Lookup -Each client's address translation first looks up in the L0 cache: -1. Check `l0_tlb_hit` condition: cache valid and page number match -2. If hit, directly calculate physical address: `Cat(last_translated_ppn >> pgIdxBits, vaddr(pgIdxBits-1,0))` -3. If miss, forward request to corresponding BBTLB module - -### TLB Lookup Flow -When L0 cache misses: -1. Use `RegNext` to delay one cycle and forward client request to TLB -2. TLB valid signal set to: `RegNext(client.req.valid && !l0_tlb_hit)` -3. TLB request data set to: `RegNext(client.req.bits)` - -### Cache Update Mechanism -When TLB request completes without miss, update L0 cache: -```scala -when (tlbReqFire && !tlb.io.resp.miss) { - last_translated_valid := true.B - last_translated_vpn := tlbReq.tlb_req.vaddr - last_translated_ppn := tlb.io.resp.paddr -} -``` - -## Response Path Design - -Module implements dual-path response mechanism: -1. **TLB Path**: When TLB request fires, directly return TLB response result -2. **L0 Cache Path**: When using L0 cache, return cached calculated physical address, miss flag set to `!RegNext(l0_tlb_hit)` - -Register `l0_tlb_paddr_reg` saves client request's virtual address for L0 cache path response. 
- -## Exception and Flush Handling - -Module maintains L0 cache coherency by monitoring each TLB's flush signal: -```scala -when (tlb.io.exp.flush()) { - last_translated_valid := false.B -} -``` - -When any flush operation occurs, L0 cache is immediately invalidated, ensuring address translation correctness. - -## Parameterized Configuration - -BBTLBCluster supports three main parameters: -- `nClients`: Number of supported clients, determines degree of concurrent TLB access -- `entries`: Number of entries per TLB instance -- `maxSize`: Maximum supported page size - -`lgMaxSize` calculated through `log2Ceil(coreDataBytes)`, used to determine address width and related logic precision. - -## Performance Optimization Features - -1. **L0 Fast Cache**: Provides single-entry fast access for each client, reducing hot address access latency -2. **Parallel Processing**: Multiple clients can perform address translation simultaneously, improving system throughput -3. **Independent PTW**: Each client has independent page table walker interface, avoiding PTW resource contention -4. 
**Pipeline Design**: Uses register delays to implement pipeline operations, improving clock frequency diff --git a/arch/src/main/scala/framework/memdomain/utils/pmc/MemCyclePMC.scala b/arch/src/main/scala/framework/memdomain/utils/pmc/MemCyclePMC.scala new file mode 100644 index 00000000..cff18f5f --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/utils/pmc/MemCyclePMC.scala @@ -0,0 +1,75 @@ +package framework.memdomain.utils.pmc + +import chisel3._ +import chisel3.util._ +import framework.top.GlobalConfig +import framework.memdomain.frontend.cmd_channel.rs.{MemRsComplete, MemRsIssue} +import chisel3.experimental.hierarchy.{instantiable, public} + +@instantiable +class MemCyclePMC(val b: GlobalConfig) extends Module { + + @public + val io = IO(new Bundle { + val ldReq_i = Input(Valid(new MemRsIssue(b))) + val stReq_i = Input(Valid(new MemRsIssue(b))) + val ldResp_o = Input(Valid(new MemRsComplete(b))) + val stResp_o = Input(Valid(new MemRsComplete(b))) + val ldTotalCycles = Output(UInt(64.W)) + val stTotalCycles = Output(UInt(64.W)) + }) + + val cycleCounter = RegInit(0.U(64.W)) + cycleCounter := cycleCounter + 1.U + + val startTime = Reg(Vec(b.frontend.rob_entries, UInt(64.W))) + val ldTotalCycles = RegInit(0.U(64.W)) + val stTotalCycles = RegInit(0.U(64.W)) + + // DPI-C trace modules for load and store + val ldPmcTrace = Module(new MemPMCTraceDPI) + val stPmcTrace = Module(new MemPMCTraceDPI) + + ldPmcTrace.io.is_store := 0.U + ldPmcTrace.io.rob_id := 0.U + ldPmcTrace.io.elapsed := 0.U + ldPmcTrace.io.enable := false.B + + stPmcTrace.io.is_store := 1.U + stPmcTrace.io.rob_id := 0.U + stPmcTrace.io.elapsed := 0.U + stPmcTrace.io.enable := false.B + + when(io.ldReq_i.valid) { + startTime(io.ldReq_i.bits.rob_id) := cycleCounter + } + + when(io.stReq_i.valid) { + startTime(io.stReq_i.bits.rob_id) := cycleCounter + } + + when(io.ldResp_o.valid) { + val robId = io.ldResp_o.bits.rob_id + val elapsed = cycleCounter - startTime(robId) + ldTotalCycles := 
ldTotalCycles + elapsed + + // DPI-C trace output + ldPmcTrace.io.rob_id := robId + ldPmcTrace.io.elapsed := elapsed + ldPmcTrace.io.enable := true.B + } + + when(io.stResp_o.valid) { + val robId = io.stResp_o.bits.rob_id + val elapsed = cycleCounter - startTime(robId) + stTotalCycles := stTotalCycles + elapsed + + // DPI-C trace output + stPmcTrace.io.rob_id := robId + stPmcTrace.io.elapsed := elapsed + stPmcTrace.io.enable := true.B + } + + io.ldTotalCycles := ldTotalCycles + io.stTotalCycles := stTotalCycles +} diff --git a/arch/src/main/scala/framework/memdomain/utils/pmc/MemPMCTraceDPI.scala b/arch/src/main/scala/framework/memdomain/utils/pmc/MemPMCTraceDPI.scala new file mode 100644 index 00000000..07a24bdb --- /dev/null +++ b/arch/src/main/scala/framework/memdomain/utils/pmc/MemPMCTraceDPI.scala @@ -0,0 +1,39 @@ +package framework.memdomain.utils.pmc + +import chisel3._ +import chisel3.util._ + +// DPI-C BlackBox for memory PMC trace +class MemPMCTraceDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val is_store = Input(UInt(8.W)) + val rob_id = Input(UInt(32.W)) + val elapsed = Input(UInt(64.W)) + val enable = Input(Bool()) + }) + + setInline( + "MemPMCTraceDPI.v", + """ + |import "DPI-C" function void dpi_mem_pmctrace( + | input byte unsigned is_store, + | input int unsigned rob_id, + | input longint unsigned elapsed + |); + | + |module MemPMCTraceDPI( + | input [7:0] is_store, + | input [31:0] rob_id, + | input [63:0] elapsed, + | input enable + |); + | always @(*) begin + | if (enable) begin + | dpi_mem_pmctrace(is_store, rob_id, elapsed); + | end + | end + |endmodule + """.stripMargin + ) +} diff --git a/arch/src/main/scala/framework/rocket/CSRBB.scala b/arch/src/main/scala/framework/rocket/CSRBB.scala deleted file mode 100644 index 819c3279..00000000 --- a/arch/src/main/scala/framework/rocket/CSRBB.scala +++ /dev/null @@ -1,1450 +0,0 @@ -// See LICENSE.SiFive for license details. -// See LICENSE.Berkeley for license details. 
- -package framework.rocket - -import chisel3._ -import chisel3.util.{BitPat, Cat, Fill, Mux1H, PopCount, PriorityMux, RegEnable, UIntToOH, Valid, log2Ceil, log2Up} -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.devices.debug.DebugModuleKey -import freechips.rocketchip.tile._ -import freechips.rocketchip.util._ -import freechips.rocketchip.util.property - -import scala.collection.mutable.LinkedHashMap -import freechips.rocketchip.rocket._ -import freechips.rocketchip.rocket.Instructions._ -import freechips.rocketchip.rocket.CustomInstructions._ - - -class CSRDecodeIOBB(implicit p: Parameters) extends CoreBundle { - val inst = Input(UInt(iLen.W)) - - def csr_addr = (inst >> 20)(CSR.ADDRSZ-1, 0) - - val fp_illegal = Output(Bool()) - val vector_illegal = Output(Bool()) - val fp_csr = Output(Bool()) - val vector_csr = Output(Bool()) - val rocc_illegal = Output(Bool()) - val read_illegal = Output(Bool()) - val write_illegal = Output(Bool()) - val write_flush = Output(Bool()) - val system_illegal = Output(Bool()) - val virtual_access_illegal = Output(Bool()) - val virtual_system_illegal = Output(Bool()) -} - -class CSRFileIOBB(hasBeu: Boolean)(implicit p: Parameters) extends CoreBundle - with HasCoreParameters { - val ungated_clock = Input(Clock()) - val interrupts = Input(new CoreInterrupts(hasBeu)) - val hartid = Input(UInt(hartIdLen.W)) - val rw = new Bundle { - val addr = Input(UInt(CSR.ADDRSZ.W)) - val cmd = Input(Bits(CSR.SZ.W)) - val rdata = Output(Bits(xLen.W)) - val wdata = Input(Bits(xLen.W)) - } - - val decode = Vec(decodeWidth, new CSRDecodeIOBB) - - val csr_stall = Output(Bool()) // stall retire for wfi - val rw_stall = Output(Bool()) // stall rw, rw will have no effect while rw_stall - val eret = Output(Bool()) - val singleStep = Output(Bool()) - - val status = Output(new MStatus()) - val hstatus = Output(new HStatus()) - val gstatus = Output(new MStatus()) - val ptbr = Output(new PTBR()) - val hgatp = Output(new PTBR()) - val 
vsatp = Output(new PTBR()) - val evec = Output(UInt(vaddrBitsExtended.W)) - val exception = Input(Bool()) - val retire = Input(UInt(log2Up(1+retireWidth).W)) - val cause = Input(UInt(xLen.W)) - val pc = Input(UInt(vaddrBitsExtended.W)) - val tval = Input(UInt(vaddrBitsExtended.W)) - val htval = Input(UInt(((maxSVAddrBits + 1) min xLen).W)) - val mhtinst_read_pseudo = Input(Bool()) - val gva = Input(Bool()) - val time = Output(UInt(xLen.W)) - val fcsr_rm = Output(Bits(FPConstants.RM_SZ.W)) - val fcsr_flags = Flipped(Valid(Bits(FPConstants.FLAGS_SZ.W))) - val set_fs_dirty = coreParams.haveFSDirty.option(Input(Bool())) - val rocc_interrupt = Input(Bool()) - val interrupt = Output(Bool()) - val interrupt_cause = Output(UInt(xLen.W)) - val bp = Output(Vec(nBreakpoints, new BP)) - val pmp = Output(Vec(nPMPs, new PMP)) - val counters = Vec(nPerfCounters, new PerfCounterIO) - val csrw_counter = Output(UInt(CSR.nCtr.W)) - val inhibit_cycle = Output(Bool()) - val inst = Input(Vec(retireWidth, UInt(iLen.W))) - val trace = Output(Vec(retireWidth, new TracedInstruction)) - val mcontext = Output(UInt(coreParams.mcontextWidth.W)) - val scontext = Output(UInt(coreParams.scontextWidth.W)) - val fiom = Output(Bool()) - - val vector = usingVector.option(new Bundle { - val vconfig = Output(new VConfig()) - val vstart = Output(UInt(maxVLMax.log2.W)) - val vxrm = Output(UInt(2.W)) - val set_vs_dirty = Input(Bool()) - val set_vconfig = Flipped(Valid(new VConfig)) - val set_vstart = Flipped(Valid(vstart)) - val set_vxsat = Input(Bool()) - }) -} - -class VConfig(implicit p: Parameters) extends CoreBundle { - val vl = UInt((maxVLMax.log2 + 1).W) - val vtype = new VType -} - -object VType { - def fromUInt(that: UInt, ignore_vill: Boolean = false)(implicit p: Parameters): VType = { - val res = 0.U.asTypeOf(new VType) - val in = that.asTypeOf(res) - val vill = (in.max_vsew.U < in.vsew) || !in.lmul_ok || in.reserved =/= 0.U || in.vill - when (!vill || ignore_vill.B) { - res := in - res.vsew := 
in.vsew(log2Ceil(1 + in.max_vsew) - 1, 0) - } - res.reserved := 0.U - res.vill := vill - res - } - - def computeVL(avl: UInt, vtype: UInt, currentVL: UInt, useCurrentVL: Bool, useMax: Bool, useZero: Bool)(implicit p: Parameters): UInt = - VType.fromUInt(vtype, true).vl(avl, currentVL, useCurrentVL, useMax, useZero) -} - -class VType(implicit p: Parameters) extends CoreBundle { - val vill = Bool() - val reserved = UInt((xLen - 9).W) - val vma = Bool() - val vta = Bool() - val vsew = UInt(3.W) - val vlmul_sign = Bool() - val vlmul_mag = UInt(2.W) - - def vlmul_signed: SInt = Cat(vlmul_sign, vlmul_mag).asSInt - - @deprecated("use vlmul_sign, vlmul_mag, or vlmul_signed", "RVV 0.9") - def vlmul: UInt = vlmul_mag - - def max_vsew = log2Ceil(eLen/8) - def max_vlmul = (1 << vlmul_mag.getWidth) - 1 - - def lmul_ok: Bool = Mux(this.vlmul_sign, this.vlmul_mag =/= 0.U && ~this.vlmul_mag < max_vsew.U - this.vsew, true.B) - - def minVLMax: Int = ((maxVLMax / eLen) >> ((1 << vlmul_mag.getWidth) - 1)) max 1 - - def vlMax: UInt = (maxVLMax.U >> (this.vsew +& Cat(this.vlmul_sign, ~this.vlmul_mag))).andNot((minVLMax-1).U) - - def vl(avl: UInt, currentVL: UInt, useCurrentVL: Bool, useMax: Bool, useZero: Bool): UInt = { - val atLeastMaxVLMax = useMax || Mux(useCurrentVL, currentVL >= maxVLMax.U, avl >= maxVLMax.U) - val avl_lsbs = Mux(useCurrentVL, currentVL, avl)(maxVLMax.log2 - 1, 0) - - val atLeastVLMax = atLeastMaxVLMax || (avl_lsbs & (-maxVLMax.S >> (this.vsew +& Cat(this.vlmul_sign, ~this.vlmul_mag))).asUInt.andNot((minVLMax-1).U)).orR - val isZero = vill || useZero - Mux(!isZero && atLeastVLMax, vlMax, 0.U) | Mux(!isZero && !atLeastVLMax, avl_lsbs, 0.U) - } -} - -class CSRFileBB( - perfEventSets: EventSets = new EventSets(Seq()), - customCSRs: Seq[CustomCSR] = Nil, - roccCSRs: Seq[CustomCSR] = Nil, - hasBeu: Boolean = false)(implicit p: Parameters) - extends CoreModule()(p) - with HasCoreParameters { - val io = IO(new CSRFileIOBB(hasBeu) { - val customCSRs = 
Vec(CSRFileBB.this.customCSRs.size, new CustomCSRIO) - val roccCSRs = Vec(CSRFileBB.this.roccCSRs.size, new CustomCSRIO) - }) - - io.rw_stall := false.B - - val reset_mstatus = WireDefault(0.U.asTypeOf(new MStatus())) - reset_mstatus.mpp := PRV.M.U - reset_mstatus.prv := PRV.M.U - reset_mstatus.xs := (if (usingRoCC) 3.U else 0.U) - val reg_mstatus = RegInit(reset_mstatus) - - val new_prv = WireDefault(reg_mstatus.prv) - reg_mstatus.prv := legalizePrivilege(new_prv) - - val reset_dcsr = WireDefault(0.U.asTypeOf(new DCSR())) - reset_dcsr.xdebugver := 1.U - reset_dcsr.prv := PRV.M.U - val reg_dcsr = RegInit(reset_dcsr) - - val (supported_interrupts, delegable_interrupts) = { - val sup = Wire(new MIP) - sup.usip := false.B - sup.ssip := usingSupervisor.B - sup.vssip := usingHypervisor.B - sup.msip := true.B - sup.utip := false.B - sup.stip := usingSupervisor.B - sup.vstip := usingHypervisor.B - sup.mtip := true.B - sup.ueip := false.B - sup.seip := usingSupervisor.B - sup.vseip := usingHypervisor.B - sup.meip := true.B - sup.sgeip := false.B - sup.rocc := usingRoCC.B - sup.debug := false.B - sup.zero1 := false.B - sup.lip foreach { _ := true.B } - val supported_high_interrupts = if (io.interrupts.buserror.nonEmpty && !usingNMI) (BigInt(1) << CSR.busErrorIntCause).U else 0.U - - val del = WireDefault(sup) - del.msip := false.B - del.mtip := false.B - del.meip := false.B - - (sup.asUInt | supported_high_interrupts, del.asUInt) - } - val delegable_base_exceptions = Seq( - Causes.misaligned_fetch, - Causes.fetch_page_fault, - Causes.breakpoint, - Causes.load_page_fault, - Causes.store_page_fault, - Causes.misaligned_load, - Causes.misaligned_store, - Causes.illegal_instruction, - Causes.user_ecall, - ) - val delegable_hypervisor_exceptions = Seq( - Causes.virtual_supervisor_ecall, - Causes.fetch_guest_page_fault, - Causes.load_guest_page_fault, - Causes.virtual_instruction, - Causes.store_guest_page_fault, - ) - val delegable_exceptions = ( - delegable_base_exceptions - ++ 
(if (usingHypervisor) delegable_hypervisor_exceptions else Seq()) - ).map(1 << _).sum.U - - val hs_delegable_exceptions = Seq( - Causes.misaligned_fetch, - Causes.fetch_access, - Causes.illegal_instruction, - Causes.breakpoint, - Causes.misaligned_load, - Causes.load_access, - Causes.misaligned_store, - Causes.store_access, - Causes.user_ecall, - Causes.fetch_page_fault, - Causes.load_page_fault, - Causes.store_page_fault).map(1 << _).sum.U - - val (hs_delegable_interrupts, mideleg_always_hs) = { - val always = WireDefault(0.U.asTypeOf(new MIP())) - always.vssip := usingHypervisor.B - always.vstip := usingHypervisor.B - always.vseip := usingHypervisor.B - - val deleg = WireDefault(always) - deleg.lip.foreach { _ := usingHypervisor.B } - - (deleg.asUInt, always.asUInt) - } - - val reg_debug = RegInit(false.B) - val reg_dpc = Reg(UInt(vaddrBitsExtended.W)) - val reg_dscratch0 = Reg(UInt(xLen.W)) - val reg_dscratch1 = (p(DebugModuleKey).map(_.nDscratch).getOrElse(1) > 1).option(Reg(UInt(xLen.W))) - val reg_singleStepped = Reg(Bool()) - - val reg_mcontext = (coreParams.mcontextWidth > 0).option(RegInit(0.U(coreParams.mcontextWidth.W))) - val reg_scontext = (coreParams.scontextWidth > 0).option(RegInit(0.U(coreParams.scontextWidth.W))) - - val reg_tselect = Reg(UInt(log2Up(nBreakpoints).W)) - val reg_bp = Reg(Vec(1 << log2Up(nBreakpoints), new BP)) - val reg_pmp = Reg(Vec(nPMPs, new PMPReg)) - - val reg_mie = Reg(UInt(xLen.W)) - val (reg_mideleg, read_mideleg) = { - val reg = Reg(UInt(xLen.W)) - (reg, Mux(usingSupervisor.B, reg & delegable_interrupts | mideleg_always_hs, 0.U)) - } - val (reg_medeleg, read_medeleg) = { - val reg = Reg(UInt(xLen.W)) - (reg, Mux(usingSupervisor.B, reg & delegable_exceptions, 0.U)) - } - val reg_mip = Reg(new MIP) - val reg_mepc = Reg(UInt(vaddrBitsExtended.W)) - val reg_mcause = RegInit(0.U(xLen.W)) - val reg_mtval = Reg(UInt(vaddrBitsExtended.W)) - val reg_mtval2 = Reg(UInt(((maxSVAddrBits + 1) min xLen).W)) - val reg_mscratch = 
Reg(Bits(xLen.W)) - val mtvecWidth = paddrBits min xLen - val reg_mtvec = mtvecInit match { - case Some(addr) => RegInit(addr.U(mtvecWidth.W)) - case None => Reg(UInt(mtvecWidth.W)) - } - - val reset_mnstatus = WireDefault(0.U.asTypeOf(new MNStatus())) - reset_mnstatus.mpp := PRV.M.U - val reg_mnscratch = Reg(Bits(xLen.W)) - val reg_mnepc = Reg(UInt(vaddrBitsExtended.W)) - val reg_mncause = RegInit(0.U(xLen.W)) - val reg_mnstatus = RegInit(reset_mnstatus) - val reg_rnmie = RegInit(true.B) - val nmie = reg_rnmie - - val reg_menvcfg = RegInit(0.U.asTypeOf(new Envcfg)) - val reg_senvcfg = RegInit(0.U.asTypeOf(new Envcfg)) - val reg_henvcfg = RegInit(0.U.asTypeOf(new Envcfg)) - - val delegable_counters = ((BigInt(1) << (nPerfCounters + CSR.firstHPM)) - 1).U - val (reg_mcounteren, read_mcounteren) = { - val reg = Reg(UInt(32.W)) - (reg, Mux(usingUser.B, reg & delegable_counters, 0.U)) - } - val (reg_scounteren, read_scounteren) = { - val reg = Reg(UInt(32.W)) - (reg, Mux(usingSupervisor.B, reg & delegable_counters, 0.U)) - } - - val (reg_hideleg, read_hideleg) = { - val reg = Reg(UInt(xLen.W)) - (reg, Mux(usingHypervisor.B, reg & hs_delegable_interrupts, 0.U)) - } - val (reg_hedeleg, read_hedeleg) = { - val reg = Reg(UInt(xLen.W)) - (reg, Mux(usingHypervisor.B, reg & hs_delegable_exceptions, 0.U)) - } - val hs_delegable_counters = delegable_counters - val (reg_hcounteren, read_hcounteren) = { - val reg = Reg(UInt(32.W)) - (reg, Mux(usingHypervisor.B, reg & hs_delegable_counters, 0.U)) - } - val reg_hstatus = RegInit(0.U.asTypeOf(new HStatus)) - val reg_hgatp = Reg(new PTBR) - val reg_htval = Reg(reg_mtval2.cloneType) - val read_hvip = reg_mip.asUInt & hs_delegable_interrupts - val read_hie = reg_mie & hs_delegable_interrupts - - val (reg_vstvec, read_vstvec) = { - val reg = Reg(UInt(vaddrBitsExtended.W)) - (reg, formTVec(reg).sextTo(xLen)) - } - val reg_vsstatus = Reg(new MStatus) - val reg_vsscratch = Reg(Bits(xLen.W)) - val reg_vsepc = Reg(UInt(vaddrBitsExtended.W)) - 
val reg_vscause = Reg(Bits(xLen.W)) - val reg_vstval = Reg(UInt(vaddrBitsExtended.W)) - val reg_vsatp = Reg(new PTBR) - - val reg_sepc = Reg(UInt(vaddrBitsExtended.W)) - val reg_scause = Reg(Bits(xLen.W)) - val reg_stval = Reg(UInt(vaddrBitsExtended.W)) - val reg_sscratch = Reg(Bits(xLen.W)) - val reg_stvec = Reg(UInt((if (usingHypervisor) vaddrBitsExtended else vaddrBits).W)) - val reg_satp = Reg(new PTBR) - val reg_wfi = withClock(io.ungated_clock) { RegInit(false.B) } - - val reg_fflags = Reg(UInt(5.W)) - val reg_frm = Reg(UInt(3.W)) - val reg_vconfig = usingVector.option(Reg(new VConfig)) - val reg_vstart = usingVector.option(Reg(UInt(maxVLMax.log2.W))) - val reg_vxsat = usingVector.option(Reg(Bool())) - val reg_vxrm = usingVector.option(Reg(UInt(io.vector.get.vxrm.getWidth.W))) - - val reg_mtinst_read_pseudo = Reg(Bool()) - val reg_htinst_read_pseudo = Reg(Bool()) - // XLEN=32: 0x00002000 - // XLEN=64: 0x00003000 - val Seq(read_mtinst, read_htinst) = Seq(reg_mtinst_read_pseudo, reg_htinst_read_pseudo).map(r => Cat(r, (xLen == 32).option(0.U).getOrElse(r), 0.U(12.W))) - - val reg_mcountinhibit = RegInit(0.U((CSR.firstHPM + nPerfCounters).W)) - io.inhibit_cycle := reg_mcountinhibit(0) - val reg_instret = WideCounter(64, io.retire, inhibit = reg_mcountinhibit(2)) - val reg_cycle = if (enableCommitLog) WideCounter(64, io.retire, inhibit = reg_mcountinhibit(0)) - else withClock(io.ungated_clock) { WideCounter(64, !io.csr_stall, inhibit = reg_mcountinhibit(0)) } - val reg_hpmevent = io.counters.map(c => RegInit(0.U(xLen.W))) - (io.counters zip reg_hpmevent) foreach { case (c, e) => c.eventSel := e } - val reg_hpmcounter = io.counters.zipWithIndex.map { case (c, i) => - WideCounter(CSR.hpmWidth, c.inc, reset = false, inhibit = reg_mcountinhibit(CSR.firstHPM+i)) } - - val mip = WireDefault(reg_mip) - mip.lip := (io.interrupts.lip: Seq[Bool]) - mip.mtip := io.interrupts.mtip - mip.msip := io.interrupts.msip - mip.meip := io.interrupts.meip - // seip is the OR of 
reg_mip.seip and the actual line from the PLIC - io.interrupts.seip.foreach { mip.seip := reg_mip.seip || _ } - // Simimlar sort of thing would apply if the PLIC had a VSEIP line: - //io.interrupts.vseip.foreach { mip.vseip := reg_mip.vseip || _ } - mip.rocc := io.rocc_interrupt - val read_mip = mip.asUInt & supported_interrupts - val read_hip = read_mip & hs_delegable_interrupts - val high_interrupts = (if (usingNMI) 0.U else io.interrupts.buserror.map(_ << CSR.busErrorIntCause).getOrElse(0.U)) - - val pending_interrupts = high_interrupts | (read_mip & reg_mie) - val d_interrupts = io.interrupts.debug << CSR.debugIntCause - val (nmi_interrupts, nmiFlag) = io.interrupts.nmi.map(nmi => - (((nmi.rnmi && reg_rnmie) << CSR.rnmiIntCause) | - io.interrupts.buserror.map(_ << CSR.rnmiBEUCause).getOrElse(0.U), - !io.interrupts.debug && nmi.rnmi && reg_rnmie)).getOrElse(0.U, false.B) - val m_interrupts = Mux(nmie && (reg_mstatus.prv <= PRV.S.U || reg_mstatus.mie), ~(~pending_interrupts | read_mideleg), 0.U) - val s_interrupts = Mux(nmie && (reg_mstatus.v || reg_mstatus.prv < PRV.S.U || (reg_mstatus.prv === PRV.S.U && reg_mstatus.sie)), pending_interrupts & read_mideleg & ~read_hideleg, 0.U) - val vs_interrupts = Mux(nmie && (reg_mstatus.v && (reg_mstatus.prv < PRV.S.U || reg_mstatus.prv === PRV.S.U && reg_vsstatus.sie)), pending_interrupts & read_hideleg, 0.U) - val (anyInterrupt, whichInterrupt) = chooseInterrupt(Seq(vs_interrupts, s_interrupts, m_interrupts, nmi_interrupts, d_interrupts)) - val interruptMSB = BigInt(1) << (xLen-1) - val interruptCause = interruptMSB.U + (nmiFlag << (xLen-2)) + whichInterrupt - io.interrupt := (anyInterrupt && !io.singleStep || reg_singleStepped) && !(reg_debug || io.status.cease) - io.interrupt_cause := interruptCause - io.bp := reg_bp take nBreakpoints - io.mcontext := reg_mcontext.getOrElse(0.U) - io.scontext := reg_scontext.getOrElse(0.U) - io.fiom := (reg_mstatus.prv < PRV.M.U && reg_menvcfg.fiom) || (reg_mstatus.prv < PRV.S.U && 
reg_senvcfg.fiom) || (reg_mstatus.v && reg_henvcfg.fiom) - io.pmp := reg_pmp.map(PMP(_)) - - val isaMaskString = - (if (usingMulDiv) "M" else "") + - (if (usingAtomics) "A" else "") + - (if (fLen >= 32) "F" else "") + - (if (fLen >= 64) "D" else "") + - (if (coreParams.hasV) "V" else "") + - (if (usingCompressed) "C" else "") - val isaString = (if (coreParams.useRVE) "E" else "I") + - isaMaskString + - (if (customIsaExt.isDefined || usingRoCC) "X" else "") + - (if (usingSupervisor) "S" else "") + - (if (usingHypervisor) "H" else "") + - (if (usingUser) "U" else "") - val isaMax = (BigInt(log2Ceil(xLen) - 4) << (xLen-2)) | isaStringToMask(isaString) - val reg_misa = RegInit(isaMax.U) - val read_mstatus = io.status.asUInt.extract(xLen-1,0) - val read_mtvec = formTVec(reg_mtvec).padTo(xLen) - val read_stvec = formTVec(reg_stvec).sextTo(xLen) - - val read_mapping = LinkedHashMap[Int,Bits]( - CSRs.tselect -> reg_tselect, - CSRs.tdata1 -> reg_bp(reg_tselect).control.asUInt, - CSRs.tdata2 -> reg_bp(reg_tselect).address.sextTo(xLen), - CSRs.tdata3 -> reg_bp(reg_tselect).textra.asUInt, - CSRs.misa -> reg_misa, - CSRs.mstatus -> read_mstatus, - CSRs.mtvec -> read_mtvec, - CSRs.mip -> read_mip, - CSRs.mie -> reg_mie, - CSRs.mscratch -> reg_mscratch, - CSRs.mepc -> readEPC(reg_mepc).sextTo(xLen), - CSRs.mtval -> reg_mtval.sextTo(xLen), - CSRs.mcause -> reg_mcause, - CSRs.mhartid -> io.hartid) - - val debug_csrs = if (!usingDebug) LinkedHashMap() else LinkedHashMap[Int,Bits]( - CSRs.dcsr -> reg_dcsr.asUInt, - CSRs.dpc -> readEPC(reg_dpc).sextTo(xLen), - CSRs.dscratch0 -> reg_dscratch0.asUInt) ++ - reg_dscratch1.map(r => CSRs.dscratch1 -> r) - - val read_mnstatus = WireInit(0.U.asTypeOf(new MNStatus())) - read_mnstatus.mpp := reg_mnstatus.mpp - read_mnstatus.mpv := reg_mnstatus.mpv - read_mnstatus.mie := reg_rnmie - val nmi_csrs = if (!usingNMI) LinkedHashMap() else LinkedHashMap[Int,Bits]( - CustomCSRs.mnscratch -> reg_mnscratch, - CustomCSRs.mnepc -> 
readEPC(reg_mnepc).sextTo(xLen), - CustomCSRs.mncause -> reg_mncause, - CustomCSRs.mnstatus -> read_mnstatus.asUInt) - - val context_csrs = LinkedHashMap[Int,Bits]() ++ - reg_mcontext.map(r => CSRs.mcontext -> r) ++ - reg_scontext.map(r => CSRs.scontext -> r) - - val read_fcsr = Cat(reg_frm, reg_fflags) - val fp_csrs = LinkedHashMap[Int,Bits]() ++ - usingFPU.option(CSRs.fflags -> reg_fflags) ++ - usingFPU.option(CSRs.frm -> reg_frm) ++ - (usingFPU || usingVector).option(CSRs.fcsr -> read_fcsr) - - val read_vcsr = Cat(reg_vxrm.getOrElse(0.U), reg_vxsat.getOrElse(0.U)) - val vector_csrs = if (!usingVector) LinkedHashMap() else LinkedHashMap[Int,Bits]( - CSRs.vxsat -> reg_vxsat.get, - CSRs.vxrm -> reg_vxrm.get, - CSRs.vcsr -> read_vcsr, - CSRs.vstart -> reg_vstart.get, - CSRs.vtype -> reg_vconfig.get.vtype.asUInt, - CSRs.vl -> reg_vconfig.get.vl, - CSRs.vlenb -> (vLen / 8).U) - - read_mapping ++= debug_csrs - read_mapping ++= nmi_csrs - read_mapping ++= context_csrs - read_mapping ++= fp_csrs - read_mapping ++= vector_csrs - - if (coreParams.haveBasicCounters) { - read_mapping += CSRs.mcountinhibit -> reg_mcountinhibit - read_mapping += CSRs.mcycle -> reg_cycle - read_mapping += CSRs.minstret -> reg_instret - - for (((e, c), i) <- (reg_hpmevent.padTo(CSR.nHPM, 0.U) - zip reg_hpmcounter.map(x => x: UInt).padTo(CSR.nHPM, 0.U)).zipWithIndex) { - read_mapping += (i + CSR.firstHPE) -> e // mhpmeventN - read_mapping += (i + CSR.firstMHPC) -> c // mhpmcounterN - read_mapping += (i + CSR.firstHPC) -> c // hpmcounterN - if (xLen == 32) { - read_mapping += (i + CSR.firstMHPCH) -> (c >> 32) // mhpmcounterNh - read_mapping += (i + CSR.firstHPCH) -> (c >> 32) // hpmcounterNh - } - } - - if (usingUser) { - read_mapping += CSRs.mcounteren -> read_mcounteren - } - read_mapping += CSRs.cycle -> reg_cycle - read_mapping += CSRs.instret -> reg_instret - - if (xLen == 32) { - read_mapping += CSRs.mcycleh -> (reg_cycle >> 32) - read_mapping += CSRs.minstreth -> (reg_instret >> 32) - 
read_mapping += CSRs.cycleh -> (reg_cycle >> 32) - read_mapping += CSRs.instreth -> (reg_instret >> 32) - } - } - - if (usingUser) { - read_mapping += CSRs.menvcfg -> reg_menvcfg.asUInt - if (xLen == 32) - read_mapping += CSRs.menvcfgh -> (reg_menvcfg.asUInt >> 32) - } - - val sie_mask = { - val sgeip_mask = WireInit(0.U.asTypeOf(new MIP)) - sgeip_mask.sgeip := true.B - read_mideleg & ~(hs_delegable_interrupts | sgeip_mask.asUInt) - } - if (usingSupervisor) { - val read_sie = reg_mie & sie_mask - val read_sip = read_mip & sie_mask - val read_sstatus = WireDefault(0.U.asTypeOf(new MStatus)) - read_sstatus.sd := io.status.sd - read_sstatus.uxl := io.status.uxl - read_sstatus.sd_rv32 := io.status.sd_rv32 - read_sstatus.mxr := io.status.mxr - read_sstatus.sum := io.status.sum - read_sstatus.xs := io.status.xs - read_sstatus.fs := io.status.fs - read_sstatus.vs := io.status.vs - read_sstatus.spp := io.status.spp - read_sstatus.spie := io.status.spie - read_sstatus.sie := io.status.sie - - read_mapping += CSRs.sstatus -> (read_sstatus.asUInt)(xLen-1,0) - read_mapping += CSRs.sip -> read_sip.asUInt - read_mapping += CSRs.sie -> read_sie.asUInt - read_mapping += CSRs.sscratch -> reg_sscratch - read_mapping += CSRs.scause -> reg_scause - read_mapping += CSRs.stval -> reg_stval.sextTo(xLen) - read_mapping += CSRs.satp -> reg_satp.asUInt - read_mapping += CSRs.sepc -> readEPC(reg_sepc).sextTo(xLen) - read_mapping += CSRs.stvec -> read_stvec - read_mapping += CSRs.scounteren -> read_scounteren - read_mapping += CSRs.mideleg -> read_mideleg - read_mapping += CSRs.medeleg -> read_medeleg - read_mapping += CSRs.senvcfg -> reg_senvcfg.asUInt - } - - val pmpCfgPerCSR = xLen / new PMPConfig().getWidth - def pmpCfgIndex(i: Int) = (xLen / 32) * (i / pmpCfgPerCSR) - if (reg_pmp.nonEmpty) { - require(reg_pmp.size <= CSR.maxPMPs) - val read_pmp = reg_pmp.padTo(CSR.maxPMPs, 0.U.asTypeOf(new PMP)) - for (i <- 0 until read_pmp.size by pmpCfgPerCSR) - read_mapping += (CSRs.pmpcfg0 + 
pmpCfgIndex(i)) -> read_pmp.map(_.cfg).slice(i, i + pmpCfgPerCSR).asUInt - for ((pmp, i) <- read_pmp.zipWithIndex) - read_mapping += (CSRs.pmpaddr0 + i) -> pmp.readAddr - } - - // implementation-defined CSRs - def generateCustomCSR(csr: CustomCSR, csr_io: CustomCSRIO) = { - require(csr.mask >= 0 && csr.mask.bitLength <= xLen) - require(!read_mapping.contains(csr.id)) - val reg = csr.init.map(init => RegInit(init.U(xLen.W))).getOrElse(Reg(UInt(xLen.W))) - val read = io.rw.cmd =/= CSR.N && io.rw.addr === csr.id.U - csr_io.ren := read - when (read && csr_io.stall) { io.rw_stall := true.B } - read_mapping += csr.id -> reg - reg - } - val reg_custom = customCSRs.zip(io.customCSRs).map(t => generateCustomCSR(t._1, t._2)) - val reg_rocc = roccCSRs.zip(io.roccCSRs).map(t => generateCustomCSR(t._1, t._2)) - - if (usingHypervisor) { - read_mapping += CSRs.mtinst -> read_mtinst - read_mapping += CSRs.mtval2 -> reg_mtval2 - - val read_hstatus = io.hstatus.asUInt.extract(xLen-1,0) - - read_mapping += CSRs.hstatus -> read_hstatus - read_mapping += CSRs.hedeleg -> read_hedeleg - read_mapping += CSRs.hideleg -> read_hideleg - read_mapping += CSRs.hcounteren-> read_hcounteren - read_mapping += CSRs.hgatp -> reg_hgatp.asUInt - read_mapping += CSRs.hip -> read_hip - read_mapping += CSRs.hie -> read_hie - read_mapping += CSRs.hvip -> read_hvip - read_mapping += CSRs.hgeie -> 0.U - read_mapping += CSRs.hgeip -> 0.U - read_mapping += CSRs.htval -> reg_htval - read_mapping += CSRs.htinst -> read_htinst - read_mapping += CSRs.henvcfg -> reg_henvcfg.asUInt - if (xLen == 32) - read_mapping += CSRs.henvcfgh -> (reg_henvcfg.asUInt >> 32) - - val read_vsie = (read_hie & read_hideleg) >> 1 - val read_vsip = (read_hip & read_hideleg) >> 1 - val read_vsepc = readEPC(reg_vsepc).sextTo(xLen) - val read_vstval = reg_vstval.sextTo(xLen) - val read_vsstatus = io.gstatus.asUInt.extract(xLen-1,0) - - read_mapping += CSRs.vsstatus -> read_vsstatus - read_mapping += CSRs.vsip -> read_vsip - read_mapping 
+= CSRs.vsie -> read_vsie - read_mapping += CSRs.vsscratch -> reg_vsscratch - read_mapping += CSRs.vscause -> reg_vscause - read_mapping += CSRs.vstval -> read_vstval - read_mapping += CSRs.vsatp -> reg_vsatp.asUInt - read_mapping += CSRs.vsepc -> read_vsepc - read_mapping += CSRs.vstvec -> read_vstvec - } - - // mimpid, marchid, mvendorid, and mconfigptr are 0 unless overridden by customCSRs - Seq(CSRs.mimpid, CSRs.marchid, CSRs.mvendorid, CSRs.mconfigptr).foreach(id => read_mapping.getOrElseUpdate(id, 0.U)) - - val decoded_addr = { - val addr = Cat(io.status.v, io.rw.addr) - val pats = for (((k, _), i) <- read_mapping.zipWithIndex) - yield (BitPat(k.U), (0 until read_mapping.size).map(j => BitPat((i == j).B))) - val decoded = DecodeLogic(addr, Seq.fill(read_mapping.size)(X), pats) - val unvirtualized_mapping = (for (((k, _), v) <- read_mapping zip decoded) yield k -> v.asBool).toMap - - for ((k, v) <- unvirtualized_mapping) yield k -> { - val alt: Option[Bool] = CSR.mode(k) match { - // hcontext was assigned an unfortunate address; it lives where a - // hypothetical vscontext will live. Exclude them from the S/VS remapping. - // (on separate lines so scala-lint doesnt do something stupid) - case _ if k == CSRs.scontext => None - case _ if k == CSRs.hcontext => None - // When V=1, if a corresponding VS CSR exists, access it instead... - case PRV.H => unvirtualized_mapping.lift(k - (1 << CSR.modeLSB)) - // ...and don't access the original S-mode version. 
- case PRV.S => unvirtualized_mapping.contains(k + (1 << CSR.modeLSB)).option(false.B) - case _ => None - } - alt.map(Mux(reg_mstatus.v, _, v)).getOrElse(v) - } - } - - val wdata = readModifyWriteCSR(io.rw.cmd, io.rw.rdata, io.rw.wdata) - - val system_insn = io.rw.cmd === CSR.I - val hlsv = Seq(HLV_B, HLV_BU, HLV_H, HLV_HU, HLV_W, HLV_WU, HLV_D, HSV_B, HSV_H, HSV_W, HSV_D, HLVX_HU, HLVX_WU) - val decode_table = Seq( ECALL-> List(Y,N,N,N,N,N,N,N,N), - EBREAK-> List(N,Y,N,N,N,N,N,N,N), - MRET-> List(N,N,Y,N,N,N,N,N,N), - CEASE-> List(N,N,N,Y,N,N,N,N,N), - WFI-> List(N,N,N,N,Y,N,N,N,N)) ++ - usingDebug.option( DRET-> List(N,N,Y,N,N,N,N,N,N)) ++ - usingNMI.option( MNRET-> List(N,N,Y,N,N,N,N,N,N)) ++ - coreParams.haveCFlush.option(CFLUSH_D_L1-> List(N,N,N,N,N,N,N,N,N)) ++ - usingSupervisor.option( SRET-> List(N,N,Y,N,N,N,N,N,N)) ++ - usingVM.option( SFENCE_VMA-> List(N,N,N,N,N,Y,N,N,N)) ++ - usingHypervisor.option( HFENCE_VVMA-> List(N,N,N,N,N,N,Y,N,N)) ++ - usingHypervisor.option( HFENCE_GVMA-> List(N,N,N,N,N,N,N,Y,N)) ++ - (if (usingHypervisor) hlsv.map(_-> List(N,N,N,N,N,N,N,N,Y)) else Seq()) - val insn_call :: insn_break :: insn_ret :: insn_cease :: insn_wfi :: _ :: _ :: _ :: _ :: Nil = { - val insn = ECALL.value.U | (io.rw.addr << 20) - DecodeLogic(insn, decode_table(0)._2.map(x=>X), decode_table).map(system_insn && _.asBool) - } - - for (io_dec <- io.decode) { - val addr = io_dec.inst(31, 20) - - def decodeAny(m: LinkedHashMap[Int,Bits]): Bool = m.map { case(k: Int, _: Bits) => addr === k.U }.reduce(_||_) - def decodeFast(s: Seq[Int]): Bool = DecodeLogic(addr, s.map(_.U), (read_mapping -- s).keys.toList.map(_.U)) - - val _ :: is_break :: is_ret :: _ :: is_wfi :: is_sfence :: is_hfence_vvma :: is_hfence_gvma :: is_hlsv :: Nil = - DecodeLogic(io_dec.inst, decode_table(0)._2.map(x=>X), decode_table).map(_.asBool) - val is_counter = (addr.inRange(CSR.firstCtr.U, (CSR.firstCtr + CSR.nCtr).U) || addr.inRange(CSR.firstCtrH.U, (CSR.firstCtrH + CSR.nCtr).U)) - - val 
allow_wfi = (!usingSupervisor).B || reg_mstatus.prv > PRV.S.U || !reg_mstatus.tw && (!reg_mstatus.v || !reg_hstatus.vtw) - val allow_sfence_vma = (!usingVM).B || reg_mstatus.prv > PRV.S.U || !Mux(reg_mstatus.v, reg_hstatus.vtvm, reg_mstatus.tvm) - val allow_hfence_vvma = (!usingHypervisor).B || !reg_mstatus.v && (reg_mstatus.prv >= PRV.S.U) - val allow_hlsv = (!usingHypervisor).B || !reg_mstatus.v && (reg_mstatus.prv >= PRV.S.U || reg_hstatus.hu) - val allow_sret = (!usingSupervisor).B || reg_mstatus.prv > PRV.S.U || !Mux(reg_mstatus.v, reg_hstatus.vtsr, reg_mstatus.tsr) - val counter_addr = addr(log2Ceil(read_mcounteren.getWidth)-1, 0) - val allow_counter = (reg_mstatus.prv > PRV.S.U || read_mcounteren(counter_addr)) && - (!usingSupervisor.B || reg_mstatus.prv >= PRV.S.U || read_scounteren(counter_addr)) && - (!usingHypervisor.B || !reg_mstatus.v || read_hcounteren(counter_addr)) - io_dec.fp_illegal := io.status.fs === 0.U || reg_mstatus.v && reg_vsstatus.fs === 0.U || !reg_misa('f'-'a') - io_dec.vector_illegal := io.status.vs === 0.U || reg_mstatus.v && reg_vsstatus.vs === 0.U || !reg_misa('v'-'a') - io_dec.fp_csr := decodeFast(fp_csrs.keys.toList) - io_dec.vector_csr := decodeFast(vector_csrs.keys.toList) - io_dec.rocc_illegal := io.status.xs === 0.U || reg_mstatus.v && reg_vsstatus.xs === 0.U || !reg_misa('x'-'a') - val csr_addr_legal = reg_mstatus.prv >= CSR.mode(addr) || - usingHypervisor.B && !reg_mstatus.v && reg_mstatus.prv === PRV.S.U && CSR.mode(addr) === PRV.H.U - val csr_exists = decodeAny(read_mapping) - io_dec.read_illegal := !csr_addr_legal || - !csr_exists || - ((addr === CSRs.satp.U || addr === CSRs.hgatp.U) && !allow_sfence_vma) || - is_counter && !allow_counter || - decodeFast(debug_csrs.keys.toList) && !reg_debug || - decodeFast(vector_csrs.keys.toList) && io_dec.vector_illegal || - io_dec.fp_csr && io_dec.fp_illegal - io_dec.write_illegal := addr(11,10).andR - io_dec.write_flush := { - val addr_m = addr | (PRV.M.U << CSR.modeLSB) - !(addr_m >= 
CSRs.mscratch.U && addr_m <= CSRs.mtval.U) - } - io_dec.system_illegal := !csr_addr_legal && !is_hlsv || - is_wfi && !allow_wfi || - is_ret && !allow_sret || - is_ret && addr(10) && addr(7) && !reg_debug || - (is_sfence || is_hfence_gvma) && !allow_sfence_vma || - is_hfence_vvma && !allow_hfence_vvma || - is_hlsv && !allow_hlsv - - io_dec.virtual_access_illegal := reg_mstatus.v && csr_exists && ( - CSR.mode(addr) === PRV.H.U || - is_counter && read_mcounteren(counter_addr) && (!read_hcounteren(counter_addr) || !reg_mstatus.prv(0) && !read_scounteren(counter_addr)) || - CSR.mode(addr) === PRV.S.U && !reg_mstatus.prv(0) || - addr === CSRs.satp.U && reg_mstatus.prv(0) && reg_hstatus.vtvm) - - io_dec.virtual_system_illegal := reg_mstatus.v && ( - is_hfence_vvma || - is_hfence_gvma || - is_hlsv || - is_wfi && (!reg_mstatus.prv(0) || !reg_mstatus.tw && reg_hstatus.vtw) || - is_ret && CSR.mode(addr) === PRV.S.U && (!reg_mstatus.prv(0) || reg_hstatus.vtsr) || - is_sfence && (!reg_mstatus.prv(0) || reg_hstatus.vtvm)) - } - - val cause = - Mux(insn_call, Causes.user_ecall.U + Mux(reg_mstatus.prv(0) && reg_mstatus.v, PRV.H.U, reg_mstatus.prv), - Mux[UInt](insn_break, Causes.breakpoint.U, io.cause)) - val cause_lsbs = cause(log2Ceil(1 + CSR.busErrorIntCause)-1, 0) - val cause_deleg_lsbs = cause(log2Ceil(xLen)-1,0) - val causeIsDebugInt = cause(xLen-1) && cause_lsbs === CSR.debugIntCause.U - val causeIsDebugTrigger = !cause(xLen-1) && cause_lsbs === CSR.debugTriggerCause.U - val causeIsDebugBreak = !cause(xLen-1) && insn_break && Cat(reg_dcsr.ebreakm, reg_dcsr.ebreakh, reg_dcsr.ebreaks, reg_dcsr.ebreaku)(reg_mstatus.prv) - val trapToDebug = usingDebug.B && (reg_singleStepped || causeIsDebugInt || causeIsDebugTrigger || causeIsDebugBreak || reg_debug) - val debugEntry = p(DebugModuleKey).map(_.debugEntry).getOrElse(BigInt(0x800)) - val debugException = p(DebugModuleKey).map(_.debugException).getOrElse(BigInt(0x808)) - val debugTVec = Mux(reg_debug, Mux(insn_break, debugEntry.U, 
debugException.U), debugEntry.U) - val delegate = usingSupervisor.B && reg_mstatus.prv <= PRV.S.U && Mux(cause(xLen-1), read_mideleg(cause_deleg_lsbs), read_medeleg(cause_deleg_lsbs)) - val delegateVS = reg_mstatus.v && delegate && Mux(cause(xLen-1), read_hideleg(cause_deleg_lsbs), read_hedeleg(cause_deleg_lsbs)) - def mtvecBaseAlign = 2 - def mtvecInterruptAlign = { - require(reg_mip.getWidth <= xLen) - log2Ceil(xLen) - } - val notDebugTVec = { - val base = Mux(delegate, Mux(delegateVS, read_vstvec, read_stvec), read_mtvec) - val interruptOffset = cause(mtvecInterruptAlign-1, 0) << mtvecBaseAlign - val interruptVec = Cat(base >> (mtvecInterruptAlign + mtvecBaseAlign), interruptOffset) - val doVector = base(0) && cause(cause.getWidth-1) && (cause_lsbs >> mtvecInterruptAlign) === 0.U - Mux(doVector, interruptVec, base >> mtvecBaseAlign << mtvecBaseAlign) - } - - val causeIsRnmiInt = cause(xLen-1) && cause(xLen-2) && (cause_lsbs === CSR.rnmiIntCause.U || cause_lsbs === CSR.rnmiBEUCause.U) - val causeIsRnmiBEU = cause(xLen-1) && cause(xLen-2) && cause_lsbs === CSR.rnmiBEUCause.U - val causeIsNmi = causeIsRnmiInt - val nmiTVecInt = io.interrupts.nmi.map(nmi => nmi.rnmi_interrupt_vector).getOrElse(0.U) - val nmiTVecXcpt = io.interrupts.nmi.map(nmi => nmi.rnmi_exception_vector).getOrElse(0.U) - val trapToNmiInt = usingNMI.B && causeIsNmi - val trapToNmiXcpt = usingNMI.B && !nmie - val trapToNmi = trapToNmiInt || trapToNmiXcpt - val nmiTVec = (Mux(causeIsNmi, nmiTVecInt, nmiTVecXcpt)>>1)<<1 - - val tvec = Mux(trapToDebug, debugTVec, Mux(trapToNmi, nmiTVec, notDebugTVec)) - io.evec := tvec - io.ptbr := reg_satp - io.hgatp := reg_hgatp - io.vsatp := reg_vsatp - io.eret := insn_call || insn_break || insn_ret - io.singleStep := reg_dcsr.step && !reg_debug - io.status := reg_mstatus - io.status.sd := io.status.fs.andR || io.status.xs.andR || io.status.vs.andR - io.status.debug := reg_debug - io.status.isa := reg_misa - io.status.uxl := (if (usingUser) log2Ceil(xLen) - 4 else 
0).U - io.status.sxl := (if (usingSupervisor) log2Ceil(xLen) - 4 else 0).U - io.status.dprv := Mux(reg_mstatus.mprv && !reg_debug, reg_mstatus.mpp, reg_mstatus.prv) - io.status.dv := reg_mstatus.v || Mux(reg_mstatus.mprv && !reg_debug, reg_mstatus.mpv, false.B) - io.status.sd_rv32 := (xLen == 32).B && io.status.sd - io.status.mpv := reg_mstatus.mpv - io.status.gva := reg_mstatus.gva - io.hstatus := reg_hstatus - io.hstatus.vsxl := (if (usingSupervisor) log2Ceil(xLen) - 4 else 0).U - io.gstatus := reg_vsstatus - io.gstatus.sd := io.gstatus.fs.andR || io.gstatus.xs.andR || io.gstatus.vs.andR - io.gstatus.uxl := (if (usingUser) log2Ceil(xLen) - 4 else 0).U - io.gstatus.sd_rv32 := (xLen == 32).B && io.gstatus.sd - - val exception = insn_call || insn_break || io.exception - assert(PopCount(insn_ret :: insn_call :: insn_break :: io.exception :: Nil) <= 1.U, "these conditions must be mutually exclusive") - - when (insn_wfi && !io.singleStep && !reg_debug) { reg_wfi := true.B } - when (pending_interrupts.orR || io.interrupts.debug || exception) { reg_wfi := false.B } - io.interrupts.nmi.map(nmi => when (nmi.rnmi) { reg_wfi := false.B } ) - - when (io.retire(0) || exception) { reg_singleStepped := true.B } - when (!io.singleStep) { reg_singleStepped := false.B } - assert(!io.singleStep || io.retire <= 1.U) - assert(!reg_singleStepped || io.retire === 0.U) - - val epc = formEPC(io.pc) - val tval = Mux(insn_break, epc, io.tval) - - when (exception) { - when (trapToDebug) { - when (!reg_debug) { - reg_mstatus.v := false.B - reg_debug := true.B - reg_dpc := epc - reg_dcsr.cause := Mux(reg_singleStepped, 4.U, Mux(causeIsDebugInt, 3.U, Mux[UInt](causeIsDebugTrigger, 2.U, 1.U))) - reg_dcsr.prv := trimPrivilege(reg_mstatus.prv) - reg_dcsr.v := reg_mstatus.v - new_prv := PRV.M.U - } - }.elsewhen (trapToNmiInt) { - when (reg_rnmie) { - reg_mstatus.v := false.B - reg_mnstatus.mpv := reg_mstatus.v - reg_rnmie := false.B - reg_mnepc := epc - reg_mncause := (BigInt(1) << (xLen-1)).U | 
Mux(causeIsRnmiBEU, 3.U, 2.U) - reg_mnstatus.mpp := trimPrivilege(reg_mstatus.prv) - new_prv := PRV.M.U - } - }.elsewhen (delegateVS && nmie) { - reg_mstatus.v := true.B - reg_vsstatus.spp := reg_mstatus.prv - reg_vsepc := epc - reg_vscause := Mux(cause(xLen-1), Cat(cause(xLen-1, 2), 1.U(2.W)), cause) - reg_vstval := tval - reg_vsstatus.spie := reg_vsstatus.sie - reg_vsstatus.sie := false.B - new_prv := PRV.S.U - }.elsewhen (delegate && nmie) { - reg_mstatus.v := false.B - reg_hstatus.spvp := Mux(reg_mstatus.v, reg_mstatus.prv(0),reg_hstatus.spvp) - reg_hstatus.gva := io.gva - reg_hstatus.spv := reg_mstatus.v - reg_sepc := epc - reg_scause := cause - reg_stval := tval - reg_htval := io.htval - reg_htinst_read_pseudo := io.mhtinst_read_pseudo - reg_mstatus.spie := reg_mstatus.sie - reg_mstatus.spp := reg_mstatus.prv - reg_mstatus.sie := false.B - new_prv := PRV.S.U - }.otherwise { - reg_mstatus.v := false.B - reg_mstatus.mpv := reg_mstatus.v - reg_mstatus.gva := io.gva - reg_mepc := epc - reg_mcause := cause - reg_mtval := tval - reg_mtval2 := io.htval - reg_mtinst_read_pseudo := io.mhtinst_read_pseudo - reg_mstatus.mpie := reg_mstatus.mie - reg_mstatus.mpp := trimPrivilege(reg_mstatus.prv) - reg_mstatus.mie := false.B - new_prv := PRV.M.U - } - } - - for (i <- 0 until supported_interrupts.getWidth) { - val en = exception && (supported_interrupts & (BigInt(1) << i).U) =/= 0.U && cause === (BigInt(1) << (xLen - 1)).U + i.U - val delegable = (delegable_interrupts & (BigInt(1) << i).U) =/= 0.U - property.cover(en && !delegate, s"INTERRUPT_M_$i") - property.cover(en && delegable && delegate, s"INTERRUPT_S_$i") - } - for (i <- 0 until xLen) { - val supported_exceptions: BigInt = 0x8fe | - (if (usingCompressed && !coreParams.misaWritable) 0 else 1) | - (if (usingUser) 0x100 else 0) | - (if (usingSupervisor) 0x200 else 0) | - (if (usingVM) 0xb000 else 0) - if (((supported_exceptions >> i) & 1) != 0) { - val en = exception && cause === i.U - val delegable = 
(delegable_exceptions & (BigInt(1) << i).U) =/= 0.U - property.cover(en && !delegate, s"EXCEPTION_M_$i") - property.cover(en && delegable && delegate, s"EXCEPTION_S_$i") - } - } - - when (insn_ret) { - val ret_prv = WireInit(UInt(), DontCare) - when (usingSupervisor.B && !io.rw.addr(9)) { - when (!reg_mstatus.v) { - reg_mstatus.sie := reg_mstatus.spie - reg_mstatus.spie := true.B - reg_mstatus.spp := PRV.U.U - ret_prv := reg_mstatus.spp - reg_mstatus.v := usingHypervisor.B && reg_hstatus.spv - io.evec := readEPC(reg_sepc) - reg_hstatus.spv := false.B - }.otherwise { - reg_vsstatus.sie := reg_vsstatus.spie - reg_vsstatus.spie := true.B - reg_vsstatus.spp := PRV.U.U - ret_prv := reg_vsstatus.spp - reg_mstatus.v := usingHypervisor.B - io.evec := readEPC(reg_vsepc) - } - }.elsewhen (usingDebug.B && io.rw.addr(10) && io.rw.addr(7)) { - ret_prv := reg_dcsr.prv - reg_mstatus.v := usingHypervisor.B && reg_dcsr.v && reg_dcsr.prv <= PRV.S.U - reg_debug := false.B - io.evec := readEPC(reg_dpc) - }.elsewhen (usingNMI.B && io.rw.addr(10) && !io.rw.addr(7)) { - ret_prv := reg_mnstatus.mpp - reg_mstatus.v := usingHypervisor.B && reg_mnstatus.mpv && reg_mnstatus.mpp <= PRV.S.U - reg_rnmie := true.B - io.evec := readEPC(reg_mnepc) - }.otherwise { - reg_mstatus.mie := reg_mstatus.mpie - reg_mstatus.mpie := true.B - reg_mstatus.mpp := legalizePrivilege(PRV.U.U) - reg_mstatus.mpv := false.B - ret_prv := reg_mstatus.mpp - reg_mstatus.v := usingHypervisor.B && reg_mstatus.mpv && reg_mstatus.mpp <= PRV.S.U - io.evec := readEPC(reg_mepc) - } - - new_prv := ret_prv - when (usingUser.B && ret_prv <= PRV.S.U) { - reg_mstatus.mprv := false.B - } - } - - io.time := reg_cycle - io.csr_stall := reg_wfi || io.status.cease - io.status.cease := RegEnable(true.B, false.B, insn_cease) - io.status.wfi := reg_wfi - - for ((io, reg) <- io.customCSRs zip reg_custom) { - io.wen := false.B - io.wdata := wdata - io.value := reg - } - - for ((io, reg) <- io.roccCSRs zip reg_rocc) { - io.wen := false.B - 
io.wdata := wdata - io.value := reg - } - - io.rw.rdata := Mux1H(for ((k, v) <- read_mapping) yield decoded_addr(k) -> v) - - // cover access to register - val coverable_counters = read_mapping.filterNot { case (k, _) => - k >= CSR.firstHPC + nPerfCounters && k < CSR.firstHPC + CSR.nHPM - } - coverable_counters.foreach( {case (k, v) => { - when (!k.U(11,10).andR) { // Cover points for RW CSR registers - property.cover(io.rw.cmd.isOneOf(CSR.W, CSR.S, CSR.C) && io.rw.addr===k.U, "CSR_access_"+k.toString, "Cover Accessing Core CSR field") - } .otherwise { // Cover points for RO CSR registers - property.cover(io.rw.cmd===CSR.R && io.rw.addr===k.U, "CSR_access_"+k.toString, "Cover Accessing Core CSR field") - } - }}) - - val set_vs_dirty = WireDefault(io.vector.map(_.set_vs_dirty).getOrElse(false.B)) - io.vector.foreach { vio => - when (set_vs_dirty) { - assert(reg_mstatus.vs > 0.U) - when (reg_mstatus.v) { reg_vsstatus.vs := 3.U } - reg_mstatus.vs := 3.U - } - } - - val set_fs_dirty = WireDefault(io.set_fs_dirty.getOrElse(false.B)) - if (coreParams.haveFSDirty) { - when (set_fs_dirty) { - assert(reg_mstatus.fs > 0.U) - when (reg_mstatus.v) { reg_vsstatus.fs := 3.U } - reg_mstatus.fs := 3.U - } - } - - io.fcsr_rm := reg_frm - when (io.fcsr_flags.valid) { - reg_fflags := reg_fflags | io.fcsr_flags.bits - set_fs_dirty := true.B - } - - io.vector.foreach { vio => - when (vio.set_vxsat) { - reg_vxsat.get := true.B - set_vs_dirty := true.B - } - } - - val csr_wen = io.rw.cmd.isOneOf(CSR.S, CSR.C, CSR.W) && !io.rw_stall - io.csrw_counter := Mux(coreParams.haveBasicCounters.B && csr_wen && (io.rw.addr.inRange(CSRs.mcycle.U, (CSRs.mcycle + CSR.nCtr).U) || io.rw.addr.inRange(CSRs.mcycleh.U, (CSRs.mcycleh + CSR.nCtr).U)), UIntToOH(io.rw.addr(log2Ceil(CSR.nCtr+nPerfCounters)-1, 0)), 0.U) - when (csr_wen) { - val scause_mask = ((BigInt(1) << (xLen-1)) + 31).U /* only implement 5 LSBs and MSB */ - val satp_valid_modes = 0 +: (minPgLevels to pgLevels).map(new 
PTBR().pgLevelsToMode(_)) - - when (decoded_addr(CSRs.mstatus)) { - val new_mstatus = wdata.asTypeOf(new MStatus()) - reg_mstatus.mie := new_mstatus.mie - reg_mstatus.mpie := new_mstatus.mpie - - if (usingUser) { - reg_mstatus.mprv := new_mstatus.mprv - reg_mstatus.mpp := legalizePrivilege(new_mstatus.mpp) - if (usingSupervisor) { - reg_mstatus.spp := new_mstatus.spp - reg_mstatus.spie := new_mstatus.spie - reg_mstatus.sie := new_mstatus.sie - reg_mstatus.tw := new_mstatus.tw - reg_mstatus.tsr := new_mstatus.tsr - } - if (usingVM) { - reg_mstatus.mxr := new_mstatus.mxr - reg_mstatus.sum := new_mstatus.sum - reg_mstatus.tvm := new_mstatus.tvm - } - if (usingHypervisor) { - reg_mstatus.mpv := new_mstatus.mpv - reg_mstatus.gva := new_mstatus.gva - } - } - - if (usingSupervisor || usingFPU) reg_mstatus.fs := formFS(new_mstatus.fs) - reg_mstatus.vs := formVS(new_mstatus.vs) - } - when (decoded_addr(CSRs.misa)) { - val mask = isaStringToMask(isaMaskString).U(xLen.W) - val f = wdata('f' - 'a') - // suppress write if it would cause the next fetch to be misaligned - when (!usingCompressed.B || !io.pc(1) || wdata('c' - 'a')) { - if (coreParams.misaWritable) - reg_misa := ~(~wdata | (!f << ('d' - 'a'))) & mask | reg_misa & ~mask - } - } - when (decoded_addr(CSRs.mip)) { - // MIP should be modified based on the value in reg_mip, not the value - // in read_mip, since read_mip.seip is the OR of reg_mip.seip and - // io.interrupts.seip. We don't want the value on the PLIC line to - // inadvertently be OR'd into read_mip.seip. 
- val new_mip = readModifyWriteCSR(io.rw.cmd, reg_mip.asUInt, io.rw.wdata).asTypeOf(new MIP) - if (usingSupervisor) { - reg_mip.ssip := new_mip.ssip - reg_mip.stip := new_mip.stip - reg_mip.seip := new_mip.seip - } - if (usingHypervisor) { - reg_mip.vssip := new_mip.vssip - } - } - when (decoded_addr(CSRs.mie)) { reg_mie := wdata & supported_interrupts } - when (decoded_addr(CSRs.mepc)) { reg_mepc := formEPC(wdata) } - when (decoded_addr(CSRs.mscratch)) { reg_mscratch := wdata } - if (mtvecWritable) - when (decoded_addr(CSRs.mtvec)) { reg_mtvec := wdata } - when (decoded_addr(CSRs.mcause)) { reg_mcause := wdata & ((BigInt(1) << (xLen-1)) + (BigInt(1) << whichInterrupt.getWidth) - 1).U } - when (decoded_addr(CSRs.mtval)) { reg_mtval := wdata } - - if (usingNMI) { - val new_mnstatus = wdata.asTypeOf(new MNStatus()) - when (decoded_addr(CustomCSRs.mnscratch)) { reg_mnscratch := wdata } - when (decoded_addr(CustomCSRs.mnepc)) { reg_mnepc := formEPC(wdata) } - when (decoded_addr(CustomCSRs.mncause)) { reg_mncause := wdata & ((BigInt(1) << (xLen-1)) + BigInt(3)).U } - when (decoded_addr(CustomCSRs.mnstatus)) { - reg_mnstatus.mpp := legalizePrivilege(new_mnstatus.mpp) - reg_mnstatus.mpv := usingHypervisor.B && new_mnstatus.mpv - reg_rnmie := reg_rnmie | new_mnstatus.mie // mnie bit settable but not clearable from software - } - } - - for (((e, c), i) <- (reg_hpmevent zip reg_hpmcounter).zipWithIndex) { - writeCounter(i + CSR.firstMHPC, c, wdata) - when (decoded_addr(i + CSR.firstHPE)) { e := perfEventSets.maskEventSelector(wdata) } - } - if (coreParams.haveBasicCounters) { - when (decoded_addr(CSRs.mcountinhibit)) { reg_mcountinhibit := wdata & ~2.U(xLen.W) } // mcountinhibit bit [1] is tied zero - writeCounter(CSRs.mcycle, reg_cycle, wdata) - writeCounter(CSRs.minstret, reg_instret, wdata) - } - - if (usingFPU) { - when (decoded_addr(CSRs.fflags)) { set_fs_dirty := true.B; reg_fflags := wdata } - when (decoded_addr(CSRs.frm)) { set_fs_dirty := true.B; reg_frm := wdata } 
- when (decoded_addr(CSRs.fcsr)) { - set_fs_dirty := true.B - reg_fflags := wdata - reg_frm := wdata >> reg_fflags.getWidth - } - } - if (usingDebug) { - when (decoded_addr(CSRs.dcsr)) { - val new_dcsr = wdata.asTypeOf(new DCSR()) - reg_dcsr.step := new_dcsr.step - reg_dcsr.ebreakm := new_dcsr.ebreakm - if (usingSupervisor) reg_dcsr.ebreaks := new_dcsr.ebreaks - if (usingUser) reg_dcsr.ebreaku := new_dcsr.ebreaku - if (usingUser) reg_dcsr.prv := legalizePrivilege(new_dcsr.prv) - if (usingHypervisor) reg_dcsr.v := new_dcsr.v - } - when (decoded_addr(CSRs.dpc)) { reg_dpc := formEPC(wdata) } - when (decoded_addr(CSRs.dscratch0)) { reg_dscratch0 := wdata } - reg_dscratch1.foreach { r => - when (decoded_addr(CSRs.dscratch1)) { r := wdata } - } - } - if (usingSupervisor) { - when (decoded_addr(CSRs.sstatus)) { - val new_sstatus = wdata.asTypeOf(new MStatus()) - reg_mstatus.sie := new_sstatus.sie - reg_mstatus.spie := new_sstatus.spie - reg_mstatus.spp := new_sstatus.spp - reg_mstatus.fs := formFS(new_sstatus.fs) - reg_mstatus.vs := formVS(new_sstatus.vs) - if (usingVM) { - reg_mstatus.mxr := new_sstatus.mxr - reg_mstatus.sum := new_sstatus.sum - } - } - when (decoded_addr(CSRs.sip)) { - val new_sip = ((read_mip & ~read_mideleg) | (wdata & read_mideleg)).asTypeOf(new MIP()) - reg_mip.ssip := new_sip.ssip - } - when (decoded_addr(CSRs.satp)) { - if (usingVM) { - val new_satp = wdata.asTypeOf(new PTBR()) - when (new_satp.mode.isOneOf(satp_valid_modes.map(_.U))) { - reg_satp.mode := new_satp.mode & satp_valid_modes.reduce(_|_).U - reg_satp.ppn := new_satp.ppn(ppnBits-1,0) - if (asIdBits > 0) reg_satp.asid := new_satp.asid(asIdBits-1,0) - } - } - } - when (decoded_addr(CSRs.sie)) { reg_mie := (reg_mie & ~sie_mask) | (wdata & sie_mask) } - when (decoded_addr(CSRs.sscratch)) { reg_sscratch := wdata } - when (decoded_addr(CSRs.sepc)) { reg_sepc := formEPC(wdata) } - when (decoded_addr(CSRs.stvec)) { reg_stvec := wdata } - when (decoded_addr(CSRs.scause)) { reg_scause := wdata & 
scause_mask } - when (decoded_addr(CSRs.stval)) { reg_stval := wdata } - when (decoded_addr(CSRs.mideleg)) { reg_mideleg := wdata } - when (decoded_addr(CSRs.medeleg)) { reg_medeleg := wdata } - when (decoded_addr(CSRs.scounteren)) { reg_scounteren := wdata } - when (decoded_addr(CSRs.senvcfg)) { reg_senvcfg.write(wdata) } - } - - if (usingHypervisor) { - when (decoded_addr(CSRs.hstatus)) { - val new_hstatus = wdata.asTypeOf(new HStatus()) - reg_hstatus.gva := new_hstatus.gva - reg_hstatus.spv := new_hstatus.spv - reg_hstatus.spvp := new_hstatus.spvp - reg_hstatus.hu := new_hstatus.hu - reg_hstatus.vtvm := new_hstatus.vtvm - reg_hstatus.vtw := new_hstatus.vtw - reg_hstatus.vtsr := new_hstatus.vtsr - reg_hstatus.vsxl := new_hstatus.vsxl - } - when (decoded_addr(CSRs.hideleg)) { reg_hideleg := wdata } - when (decoded_addr(CSRs.hedeleg)) { reg_hedeleg := wdata } - when (decoded_addr(CSRs.hgatp)) { - val new_hgatp = wdata.asTypeOf(new PTBR()) - val valid_modes = 0 +: (minPgLevels to pgLevels).map(new_hgatp.pgLevelsToMode(_)) - when (new_hgatp.mode.isOneOf(valid_modes.map(_.U))) { - reg_hgatp.mode := new_hgatp.mode & valid_modes.reduce(_|_).U - } - reg_hgatp.ppn := Cat(new_hgatp.ppn(ppnBits-1,2), 0.U(2.W)) - if (vmIdBits > 0) reg_hgatp.asid := new_hgatp.asid(vmIdBits-1,0) - } - when (decoded_addr(CSRs.hip)) { - val new_hip = ((read_mip & ~hs_delegable_interrupts) | (wdata & hs_delegable_interrupts)).asTypeOf(new MIP()) - reg_mip.vssip := new_hip.vssip - } - when (decoded_addr(CSRs.hie)) { reg_mie := (reg_mie & ~hs_delegable_interrupts) | (wdata & hs_delegable_interrupts) } - when (decoded_addr(CSRs.hvip)) { - val new_sip = ((read_mip & ~hs_delegable_interrupts) | (wdata & hs_delegable_interrupts)).asTypeOf(new MIP()) - reg_mip.vssip := new_sip.vssip - reg_mip.vstip := new_sip.vstip - reg_mip.vseip := new_sip.vseip - } - when (decoded_addr(CSRs.hcounteren)) { reg_hcounteren := wdata } - when (decoded_addr(CSRs.htval)) { reg_htval := wdata } - when 
(decoded_addr(CSRs.mtval2)) { reg_mtval2 := wdata } - - val write_mhtinst_read_pseudo = wdata(13) && (xLen == 32).option(true.B).getOrElse(wdata(12)) - when(decoded_addr(CSRs.mtinst)) { reg_mtinst_read_pseudo := write_mhtinst_read_pseudo } - when(decoded_addr(CSRs.htinst)) { reg_htinst_read_pseudo := write_mhtinst_read_pseudo } - - when (decoded_addr(CSRs.vsstatus)) { - val new_vsstatus = wdata.asTypeOf(new MStatus()) - reg_vsstatus.sie := new_vsstatus.sie - reg_vsstatus.spie := new_vsstatus.spie - reg_vsstatus.spp := new_vsstatus.spp - reg_vsstatus.mxr := new_vsstatus.mxr - reg_vsstatus.sum := new_vsstatus.sum - reg_vsstatus.fs := formFS(new_vsstatus.fs) - reg_vsstatus.vs := formVS(new_vsstatus.vs) - } - when (decoded_addr(CSRs.vsip)) { - val new_vsip = ((read_hip & ~read_hideleg) | ((wdata << 1) & read_hideleg)).asTypeOf(new MIP()) - reg_mip.vssip := new_vsip.vssip - } - when (decoded_addr(CSRs.vsatp)) { - val new_vsatp = wdata.asTypeOf(new PTBR()) - val mode_ok = new_vsatp.mode.isOneOf(satp_valid_modes.map(_.U)) - when (mode_ok) { - reg_vsatp.mode := new_vsatp.mode & satp_valid_modes.reduce(_|_).U - } - when (mode_ok || !reg_mstatus.v) { - reg_vsatp.ppn := new_vsatp.ppn(vpnBits.min(new_vsatp.ppn.getWidth)-1,0) - if (asIdBits > 0) reg_vsatp.asid := new_vsatp.asid(asIdBits-1,0) - } - } - when (decoded_addr(CSRs.vsie)) { reg_mie := (reg_mie & ~read_hideleg) | ((wdata << 1) & read_hideleg) } - when (decoded_addr(CSRs.vsscratch)) { reg_vsscratch := wdata } - when (decoded_addr(CSRs.vsepc)) { reg_vsepc := formEPC(wdata) } - when (decoded_addr(CSRs.vstvec)) { reg_vstvec := wdata } - when (decoded_addr(CSRs.vscause)) { reg_vscause := wdata & scause_mask } - when (decoded_addr(CSRs.vstval)) { reg_vstval := wdata } - when (decoded_addr(CSRs.henvcfg)) { reg_henvcfg.write(wdata) } - } - if (usingUser) { - when (decoded_addr(CSRs.mcounteren)) { reg_mcounteren := wdata } - when (decoded_addr(CSRs.menvcfg)) { reg_menvcfg.write(wdata) } - } - if (nBreakpoints > 0) { - when 
(decoded_addr(CSRs.tselect)) { reg_tselect := wdata } - - for ((bp, i) <- reg_bp.zipWithIndex) { - when (i.U === reg_tselect && (!bp.control.dmode || reg_debug)) { - when (decoded_addr(CSRs.tdata2)) { bp.address := wdata } - when (decoded_addr(CSRs.tdata3)) { - if (coreParams.mcontextWidth > 0) { - bp.textra.mselect := wdata(bp.textra.mselectPos) - bp.textra.mvalue := wdata >> bp.textra.mvaluePos - } - if (coreParams.scontextWidth > 0) { - bp.textra.sselect := wdata(bp.textra.sselectPos) - bp.textra.svalue := wdata >> bp.textra.svaluePos - } - } - when (decoded_addr(CSRs.tdata1)) { - bp.control := wdata.asTypeOf(bp.control) - - val prevChain = if (i == 0) false.B else reg_bp(i-1).control.chain - val prevDMode = if (i == 0) false.B else reg_bp(i-1).control.dmode - val nextChain = if (i >= nBreakpoints-1) true.B else reg_bp(i+1).control.chain - val nextDMode = if (i >= nBreakpoints-1) true.B else reg_bp(i+1).control.dmode - val newBPC = readModifyWriteCSR(io.rw.cmd, bp.control.asUInt, io.rw.wdata).asTypeOf(bp.control) - val dMode = newBPC.dmode && reg_debug && (prevDMode || !prevChain) - bp.control.dmode := dMode - when (dMode || (newBPC.action > 1.U)) { bp.control.action := newBPC.action }.otherwise { bp.control.action := 0.U } - bp.control.chain := newBPC.chain && !(prevChain || nextChain) && (dMode || !nextDMode) - } - } - } - } - reg_mcontext.foreach { r => when (decoded_addr(CSRs.mcontext)) { r := wdata }} - reg_scontext.foreach { r => when (decoded_addr(CSRs.scontext)) { r := wdata }} - if (reg_pmp.nonEmpty) for (((pmp, next), i) <- (reg_pmp zip (reg_pmp.tail :+ reg_pmp.last)).zipWithIndex) { - require(xLen % pmp.cfg.getWidth == 0) - when (decoded_addr(CSRs.pmpcfg0 + pmpCfgIndex(i)) && !pmp.cfgLocked) { - val newCfg = (wdata >> ((i * pmp.cfg.getWidth) % xLen)).asTypeOf(new PMPConfig()) - pmp.cfg := newCfg - // disallow unreadable but writable PMPs - pmp.cfg.w := newCfg.w && newCfg.r - // can't select a=NA4 with coarse-grained PMPs - if (pmpGranularity.log2 > 
PMP.lgAlign) - pmp.cfg.a := Cat(newCfg.a(1), newCfg.a.orR) - } - when (decoded_addr(CSRs.pmpaddr0 + i) && !pmp.addrLocked(next)) { - pmp.addr := wdata - } - } - def writeCustomCSR(io: CustomCSRIO, csr: CustomCSR, reg: UInt) = { - val mask = csr.mask.U(xLen.W) - when (decoded_addr(csr.id)) { - reg := (wdata & mask) | (reg & ~mask) - io.wen := true.B - } - } - for ((io, csr, reg) <- (io.customCSRs, customCSRs, reg_custom).zipped) { - writeCustomCSR(io, csr, reg) - } - for ((io, csr, reg) <- (io.roccCSRs, roccCSRs, reg_rocc).zipped) { - writeCustomCSR(io, csr, reg) - } - if (usingVector) { - when (decoded_addr(CSRs.vstart)) { set_vs_dirty := true.B; reg_vstart.get := wdata } - when (decoded_addr(CSRs.vxrm)) { set_vs_dirty := true.B; reg_vxrm.get := wdata } - when (decoded_addr(CSRs.vxsat)) { set_vs_dirty := true.B; reg_vxsat.get := wdata } - when (decoded_addr(CSRs.vcsr)) { - set_vs_dirty := true.B - reg_vxsat.get := wdata - reg_vxrm.get := wdata >> 1 - } - } - } - - def setCustomCSR(io: CustomCSRIO, csr: CustomCSR, reg: UInt) = { - val mask = csr.mask.U(xLen.W) - when (io.set) { - reg := (io.sdata & mask) | (reg & ~mask) - } - } - for ((io, csr, reg) <- (io.customCSRs, customCSRs, reg_custom).zipped) { - setCustomCSR(io, csr, reg) - } - for ((io, csr, reg) <- (io.roccCSRs, roccCSRs, reg_rocc).zipped) { - setCustomCSR(io, csr, reg) - } - - io.vector.map { vio => - when (vio.set_vconfig.valid) { - // user of CSRFileNpu is responsible for set_vs_dirty in this case - assert(vio.set_vconfig.bits.vl <= vio.set_vconfig.bits.vtype.vlMax) - reg_vconfig.get := vio.set_vconfig.bits - } - when (vio.set_vstart.valid) { - set_vs_dirty := true.B - reg_vstart.get := vio.set_vstart.bits - } - vio.vstart := reg_vstart.get - vio.vconfig := reg_vconfig.get - vio.vxrm := reg_vxrm.get - - when (reset.asBool) { - reg_vconfig.get.vl := 0.U - reg_vconfig.get.vtype := 0.U.asTypeOf(new VType) - reg_vconfig.get.vtype.vill := true.B - } - } - - when(reset.asBool) { - reg_satp.mode := 0.U - 
reg_vsatp.mode := 0.U - reg_hgatp.mode := 0.U - } - if (!usingVM) { - reg_satp.mode := 0.U - reg_satp.ppn := 0.U - reg_satp.asid := 0.U - } - if (!usingHypervisor) { - reg_vsatp.mode := 0.U - reg_vsatp.ppn := 0.U - reg_vsatp.asid := 0.U - reg_hgatp.mode := 0.U - reg_hgatp.ppn := 0.U - reg_hgatp.asid := 0.U - } - if (!(asIdBits > 0)) { - reg_satp.asid := 0.U - reg_vsatp.asid := 0.U - } - if (!(vmIdBits > 0)) { - reg_hgatp.asid := 0.U - } - reg_vsstatus.xs := (if (usingRoCC) 3.U else 0.U) - - if (nBreakpoints <= 1) reg_tselect := 0.U - for (bpc <- reg_bp map {_.control}) { - bpc.ttype := bpc.tType.U - bpc.maskmax := bpc.maskMax.U - bpc.reserved := 0.U - bpc.zero := 0.U - bpc.h := false.B - if (!usingSupervisor) bpc.s := false.B - if (!usingUser) bpc.u := false.B - if (!usingSupervisor && !usingUser) bpc.m := true.B - when (reset.asBool) { - bpc.action := 0.U - bpc.dmode := false.B - bpc.chain := false.B - bpc.r := false.B - bpc.w := false.B - bpc.x := false.B - } - } - for (bpx <- reg_bp map {_.textra}) { - if (coreParams.mcontextWidth == 0) bpx.mselect := false.B - if (coreParams.scontextWidth == 0) bpx.sselect := false.B - } - for (bp <- reg_bp drop nBreakpoints) - bp := 0.U.asTypeOf(new BP()) - for (pmp <- reg_pmp) { - pmp.cfg.res := 0.U - when (reset.asBool) { pmp.reset() } - } - - for (((t, insn), i) <- (io.trace zip io.inst).zipWithIndex) { - t.exception := io.retire >= i.U && exception - t.valid := io.retire > i.U || t.exception - t.insn := insn - t.iaddr := io.pc - t.priv := Cat(reg_debug, reg_mstatus.prv) - t.cause := cause - t.interrupt := cause(xLen-1) - t.tval := io.tval - t.wdata.foreach(_ := DontCare) - } - - def chooseInterrupt(masksIn: Seq[UInt]): (Bool, UInt) = { - val nonstandard = supported_interrupts.getWidth-1 to 12 by -1 - // MEI, MSI, MTI, SEI, SSI, STI, VSEI, VSSI, VSTI, UEI, USI, UTI - val standard = Seq(11, 3, 7, 9, 1, 5, 10, 2, 6, 8, 0, 4) - val priority = nonstandard ++ standard - val masks = masksIn.reverse - val any = masks.flatMap(m => 
priority.filter(_ < m.getWidth).map(i => m(i))).reduce(_||_) - val which = PriorityMux(masks.flatMap(m => priority.filter(_ < m.getWidth).map(i => (m(i), i.U)))) - (any, which) - } - - def readModifyWriteCSR(cmd: UInt, rdata: UInt, wdata: UInt) = { - (Mux(cmd(1), rdata, 0.U) | wdata) & ~Mux(cmd(1,0).andR, wdata, 0.U) - } - - def legalizePrivilege(priv: UInt): UInt = - if (usingSupervisor) Mux(priv === PRV.H.U, PRV.U.U, priv) - else if (usingUser) Fill(2, priv(0)) - else PRV.M.U - - def trimPrivilege(priv: UInt): UInt = - if (usingSupervisor) priv - else legalizePrivilege(priv) - - def writeCounter(lo: Int, ctr: WideCounter, wdata: UInt) = { - if (xLen == 32) { - val hi = lo + CSRs.mcycleh - CSRs.mcycle - when (decoded_addr(lo)) { ctr := Cat(ctr(ctr.getWidth-1, 32), wdata) } - when (decoded_addr(hi)) { ctr := Cat(wdata(ctr.getWidth-33, 0), ctr(31, 0)) } - } else { - when (decoded_addr(lo)) { ctr := wdata(ctr.getWidth-1, 0) } - } - } - def formEPC(x: UInt) = ~(~x | (if (usingCompressed) 1.U else 3.U)) - def readEPC(x: UInt) = ~(~x | Mux(reg_misa('c' - 'a'), 1.U, 3.U)) - def formTVec(x: UInt) = x andNot Mux(x(0), ((((BigInt(1) << mtvecInterruptAlign) - 1) << mtvecBaseAlign) | 2).U, 2.U) - def isaStringToMask(s: String) = s.map(x => 1 << (x - 'A')).foldLeft(0)(_|_) - def formFS(fs: UInt) = if (coreParams.haveFSDirty) fs else Fill(2, fs.orR) - def formVS(vs: UInt) = if (usingVector) vs else 0.U -} diff --git a/arch/src/main/scala/framework/rocket/Configs.scala b/arch/src/main/scala/framework/rocket/Configs.scala deleted file mode 100644 index 98f81003..00000000 --- a/arch/src/main/scala/framework/rocket/Configs.scala +++ /dev/null @@ -1,193 +0,0 @@ -package framework.rocket - -import chisel3.util._ - -import org.chipsalliance.cde.config._ -import org.chipsalliance.diplomacy.lazymodule._ - -import freechips.rocketchip.rocket._ -import freechips.rocketchip.prci.{SynchronousCrossing, AsynchronousCrossing, RationalCrossing, ClockCrossingType} -import 
freechips.rocketchip.subsystem.{TilesLocated, NumTiles, HierarchicalLocation, RocketCrossingParams, SystemBusKey, CacheBlockBytes, RocketTileAttachParams, InSubsystem, InCluster, HierarchicalElementMasterPortParams, HierarchicalElementSlavePortParams, CBUS, CCBUS, ClustersLocated, TileAttachConfig, CloneTileAttachParams} -import freechips.rocketchip.tile.{RocketTileParams, RocketTileBoundaryBufferParams, FPUParams} -import scala.reflect.ClassTag - -import framework.rocket.{RocketTileParamsBB, RocketCrossingParamsBB} - - -class WithNBuckyballCores( - n: Int, - location: HierarchicalLocation, - crossing: RocketCrossingParams, -) extends Config((site, here, up) => { - case TilesLocated(`location`) => { - val prev = up(TilesLocated(`location`), site) - val idOffset = up(NumTiles) - val big = RocketTileParamsBB( - core = RocketCoreParams( - mulDiv = Some(MulDivParams( - mulUnroll = 8, - mulEarlyOut = true, - divEarlyOut = true, - )), - useZba = true, - useZbb = true, - useZbs = true, - fpu = Some(FPUParams(minFLen = 16))), - dcache = Some(DCacheParams( - nSets = 64, - nWays = 8, - rowBits = site(SystemBusKey).beatBits, - nMSHRs = 0, - blockBytes = site(CacheBlockBytes))), - icache = Some(ICacheParams( - nSets = 64, - nWays = 8, - rowBits = site(SystemBusKey).beatBits, - blockBytes = site(CacheBlockBytes)))) - List.tabulate(n)(i => RocketTileAttachParamsBB( - big.copy(tileId = i + idOffset), - crossing - )) ++ prev - } - case NumTiles => up(NumTiles) + n -}) { - def this(n: Int, location: HierarchicalLocation = InSubsystem) = this(n, location, RocketCrossingParams( - master = HierarchicalElementMasterPortParams.locationDefault(location), - slave = HierarchicalElementSlavePortParams.locationDefault(location), - mmioBaseAddressPrefixWhere = location match { - case InSubsystem => CBUS - case InCluster(clusterId) => CCBUS(clusterId) - } - )) -} - -class RocketTileAttachConfig(f: RocketTileAttachParams => RocketTileAttachParams) extends 
TileAttachConfig[RocketTileAttachParams](f) - -class RocketTileConfig(f: RocketTileParams => RocketTileParams) extends RocketTileAttachConfig(tp => tp.copy( - tileParams = f(tp.tileParams) -)) - -class RocketCrossingConfig(f: RocketCrossingParams => RocketCrossingParams) extends RocketTileAttachConfig(tp => tp.copy( - crossingParams = f(tp.crossingParams) -)) - -class RocketCoreConfig(f: RocketCoreParams => RocketCoreParams) extends RocketTileConfig(tp => tp.copy( - core = f(tp.core) -)) - -class RocketICacheConfig(f: ICacheParams => ICacheParams) extends RocketTileConfig(tp => tp.copy( - icache = tp.icache.map(ic => f(ic)) -)) - -class RocketDCacheConfig(f: DCacheParams => DCacheParams) extends RocketTileConfig(tp => tp.copy( - dcache = tp.dcache.map(dc => f(dc)) -)) - -class WithL1ICacheSets(sets: Int) extends RocketICacheConfig(_.copy(nSets=sets)) -class WithL1ICacheWays(ways: Int) extends RocketICacheConfig(_.copy(nWays=ways)) -class WithL1ICacheECC(dataECC: String, tagECC: String) extends RocketICacheConfig(_.copy(dataECC = Some(dataECC), tagECC = Some(tagECC))) -class WithL1ICacheRowBits(n: Int) extends RocketICacheConfig(_.copy(rowBits = n)) -class WithL1ICacheTLBSets(sets: Int) extends RocketICacheConfig(_.copy(nTLBSets = sets)) -class WithL1ICacheTLBWays(ways: Int) extends RocketICacheConfig(_.copy(nTLBWays = ways)) -class WithL1ICacheTLBBasePageSectors(sectors: Int) extends RocketICacheConfig(_.copy(nTLBBasePageSectors = sectors)) -class WithL1ICacheTLBSuperpages(superpages: Int) extends RocketICacheConfig(_.copy(nTLBSuperpages = superpages)) -class WithL1ICacheBlockBytes(bytes: Int = 64) extends RocketICacheConfig(_.copy(blockBytes = bytes)) - -class WithL1DCacheSets(sets: Int) extends RocketDCacheConfig(_.copy(nSets=sets)) -class WithL1DCacheWays(ways: Int) extends RocketDCacheConfig(_.copy(nWays=ways)) -class WithL1DCacheECC(dataECC: String, tagECC: String) extends RocketDCacheConfig(_.copy(dataECC = Some(dataECC), tagECC = Some(tagECC))) -class 
WithL1DCacheRowBits(n: Int) extends RocketDCacheConfig(_.copy(rowBits = n)) -class WithL1DCacheTLBSets(sets: Int) extends RocketDCacheConfig(_.copy(nTLBSets = sets)) -class WithL1DCacheTLBWays(ways: Int) extends RocketDCacheConfig(_.copy(nTLBWays = ways)) -class WithL1DCacheTLBBasePageSectors(sectors: Int) extends RocketDCacheConfig(_.copy(nTLBBasePageSectors = sectors)) -class WithL1DCacheTLBSuperpages(superpages: Int) extends RocketDCacheConfig(_.copy(nTLBSuperpages = superpages)) -class WithL1DCacheBlockBytes(bytes: Int = 64) extends RocketDCacheConfig(_.copy(blockBytes = bytes)) -class WithL1DCacheNonblocking(nMSHRs: Int) extends RocketDCacheConfig(_.copy(nMSHRs = nMSHRs)) -class WithL1DCacheClockGating extends RocketDCacheConfig(_.copy(clockGate = true)) -class WithL1DCacheDTIMAddress(address: BigInt) extends RocketDCacheConfig(_.copy(scratch = Some(address))) - -class WithScratchpadsOnly extends RocketTileConfig(tp => tp.copy( - core = tp.core.copy(useVM = false), - dcache = tp.dcache.map(_.copy( - nSets = 256, // 16Kb scratchpad - nWays = 1, - scratch = Some(0x80000000L))) -)) - -class WithCacheRowBits(n: Int) extends RocketTileConfig(tp => tp.copy( - dcache = tp.dcache.map(_.copy(rowBits = n)), - icache = tp.icache.map(_.copy(rowBits = n)) -)) - -class WithBEU(addr: BigInt) extends RocketTileConfig(_.copy(beuAddr = Some(addr))) -class WithBoundaryBuffers(buffers: Option[RocketTileBoundaryBufferParams] = Some(RocketTileBoundaryBufferParams(true))) extends RocketTileConfig(_.copy(boundaryBuffers = buffers)) - -class WithRV32 extends RocketCoreConfig(c => c.copy( - xLen = 32, - pgLevels = 2, // sv32 - fpu = c.fpu.map(_.copy(fLen = 32)), - mulDiv = Some(MulDivParams(mulUnroll = 8)) -)) - -class WithoutVM extends RocketCoreConfig(_.copy(useVM = false)) -class WithCFlushEnabled extends RocketCoreConfig(_.copy(haveCFlush = true)) -class WithNBreakpoints(hwbp: Int) extends RocketCoreConfig(_.copy(nBreakpoints = hwbp)) -class WithHypervisor(hext: Boolean = true) 
extends RocketCoreConfig(_.copy(useHypervisor = hext)) -class WithCease(enable: Boolean = true) extends RocketCoreConfig(_.copy(haveCease = enable)) -class WithCoreClockGatingEnabled extends RocketCoreConfig(_.copy(clockGate = true)) -class WithPgLevels(n: Int) extends RocketCoreConfig(_.copy(pgLevels = n)) -class WithZba extends RocketCoreConfig(_.copy(useZba = true)) -class WithZbb extends RocketCoreConfig(_.copy(useZbb = true)) -class WithZbs extends RocketCoreConfig(_.copy(useZbs = true)) -class WithB extends RocketCoreConfig(_.copy(useZba = true, useZbb = true, useZbs = true)) -class WithSV48 extends WithPgLevels(4) -class WithSV39 extends WithPgLevels(3) - -// Simulation-only configs -class WithNoSimulationTimeout extends RocketCoreConfig(_.copy(haveSimTimeout = false)) -class WithDebugROB(enable: Boolean = true, size: Int = 0) extends RocketCoreConfig(_.copy(debugROB = Option.when(enable)(DebugROBParams(size)))) - -// FPU configs -class WithoutFPU extends RocketCoreConfig(_.copy(fpu = None)) -class WithFP16 extends RocketCoreConfig(c => c.copy(fpu = c.fpu.map(_.copy(minFLen = 16)))) -class WithFPUWithoutDivSqrt extends RocketCoreConfig(c => c.copy(fpu = c.fpu.map(_.copy(divSqrt = false)))) - -// mul-div configs -class WithFastMulDiv extends RocketCoreConfig(c => c.copy(mulDiv = Some( - MulDivParams(mulUnroll = 8, mulEarlyOut = c.xLen > 32, divEarlyOut = true) -))) -class WithCustomFastMulDiv(mUnroll: Int = 8, mEarlyOut: Boolean = true, dUnroll: Int = 1, dEarlyOut: Boolean = true, dEarlyOutGranularity: Int = 1) extends RocketCoreConfig(_.copy(mulDiv = Some( - MulDivParams(mulUnroll = mUnroll, mulEarlyOut = mEarlyOut, divUnroll = dUnroll, divEarlyOut = dEarlyOut, divEarlyOutGranularity = dEarlyOutGranularity) -))) -class WithoutMulDiv extends RocketCoreConfig(_.copy(mulDiv = None)) - -// Branch-prediction configs -class WithDefaultBtb extends RocketTileConfig(t => t.copy(btb = Some(BTBParams()))) -class WithNoBtb extends RocketTileConfig(_.copy(btb = None)) - 
-// Tile CDC configs -class WithCDC(crossingType: ClockCrossingType = SynchronousCrossing()) extends RocketCrossingConfig(_.copy(crossingType = crossingType)) -class WithSeperateClockReset extends RocketCrossingConfig(_.copy(forceSeparateClockReset = true)) -class WithSynchronousCDCs extends WithCDC(SynchronousCrossing()) -class WithAsynchronousCDCs(depth: Int, sync: Int) extends WithCDC(AsynchronousCrossing(depth, sync)) -class WithRationalCDCs extends WithCDC(RationalCrossing()) - -class WithCloneRocketTiles( - n: Int = 1, - cloneTileId: Int = 0, - location: HierarchicalLocation = InSubsystem, - cloneLocation: HierarchicalLocation = InSubsystem -) extends Config((site, here, up) => { - case TilesLocated(`location`) => { - val prev = up(TilesLocated(location), site) - val idOffset = up(NumTiles) - val tileAttachParams = up(TilesLocated(cloneLocation)).find(_.tileParams.tileId == cloneTileId) - .get.asInstanceOf[RocketTileAttachParams] - (0 until n).map { i => - CloneTileAttachParams(cloneTileId, tileAttachParams.copy( - tileParams = tileAttachParams.tileParams.copy(tileId = i + idOffset) - )) - } ++ prev - } - case NumTiles => up(NumTiles) + n -}) diff --git a/arch/src/main/scala/framework/rocket/LazyRoCCBB.scala b/arch/src/main/scala/framework/rocket/LazyRoCCBB.scala deleted file mode 100644 index 34643f95..00000000 --- a/arch/src/main/scala/framework/rocket/LazyRoCCBB.scala +++ /dev/null @@ -1,133 +0,0 @@ -// See LICENSE.Berkeley for license details. -// See LICENSE.SiFive for license details. 
- -package framework.rocket - -import chisel3._ -import chisel3.util._ -import chisel3.experimental.IntParam - -import freechips.rocketchip.rocket._ -import freechips.rocketchip.tile._ - - -import org.chipsalliance.cde.config._ -import org.chipsalliance.diplomacy.lazymodule._ - -import freechips.rocketchip.rocket.{ - MStatus, HellaCacheIO, TLBPTWIO, CanHavePTW, CanHavePTWModule, - SimpleHellaCacheIF, M_XRD, PTE, PRV, M_SZ -} -import freechips.rocketchip.tilelink.{ - TLNode, TLIdentityNode, TLClientNode, TLMasterParameters, TLMasterPortParameters -} -import freechips.rocketchip.util.InOrderArbiter - -case object BuildRoCCBB extends Field[Seq[Parameters => LazyRoCCBB]](Nil) - - -class RoCCCommandBB(implicit p: Parameters) extends CoreBundle()(p) { - val inst = new RoCCInstruction - val rs1 = Bits(xLen.W) - val rs2 = Bits(xLen.W) - val status = new MStatus -} - -class RoCCResponseBB(implicit p: Parameters) extends CoreBundle()(p) { - val rd = Bits(5.W) - val data = Bits(xLen.W) -} - -class RoCCCoreIOBB(val nRoCCCSRs: Int = 0)(implicit p: Parameters) extends CoreBundle()(p) { - val cmd = Flipped(Decoupled(new RoCCCommandBB)) - val resp = Decoupled(new RoCCResponseBB) - val mem = new HellaCacheIO - val busy = Output(Bool()) - val interrupt = Output(Bool()) - val exception = Input(Bool()) - val csrs = Flipped(Vec(nRoCCCSRs, new CustomCSRIO)) -} - -class RoCCIOBB(val nPTWPorts: Int, nRoCCCSRs: Int)(implicit p: Parameters) extends RoCCCoreIOBB(nRoCCCSRs)(p) { - val ptw = Vec(nPTWPorts, new TLBPTWIO) - val fpu_req = Decoupled(new FPInput) - val fpu_resp = Flipped(Decoupled(new FPResult)) -} - -/** Base classes for Diplomatic TL2 RoCC units **/ -abstract class LazyRoCCBB( - val opcodes: OpcodeSet, - val nPTWPorts: Int = 0, - val usesFPU: Boolean = false, - val roccCSRs: Seq[CustomCSR] = Nil -)(implicit p: Parameters) extends LazyModule { - val module: LazyRoCCModuleImpBB - require(roccCSRs.map(_.id).toSet.size == roccCSRs.size) - val atlNode: TLNode = TLIdentityNode() - val 
tlNode: TLNode = TLIdentityNode() - val stlNode: TLNode = TLIdentityNode() -} - -class LazyRoCCModuleImpBB(outer: LazyRoCCBB) extends LazyModuleImp(outer) { - val io = IO(new RoCCIOBB(outer.nPTWPorts, outer.roccCSRs.size)) - io := DontCare -} - -/** Mixins for including RoCC **/ - -trait HasLazyRoCCBB extends CanHavePTW { this: BaseTile => - val roccs = p(BuildRoCCBB).map(_(p)) - val roccCSRs = roccs.map(_.roccCSRs) // the set of custom CSRs requested by all roccs - require(roccCSRs.flatten.map(_.id).toSet.size == roccCSRs.flatten.size, - "LazyRoCC instantiations require overlapping CSRs") - roccs.map(_.atlNode).foreach { atl => tlMasterXbar.node :=* atl } - roccs.map(_.tlNode).foreach { tl => tlOtherMastersNode :=* tl } - roccs.map(_.stlNode).foreach { stl => stl :*= tlSlaveXbar.node } - - nPTWPorts += roccs.map(_.nPTWPorts).sum - nDCachePorts += roccs.size -} - -trait HasLazyRoCCModuleBB extends CanHavePTWModule - with HasCoreParameters { this: RocketTileModuleImpBB => - - val (respArb, cmdRouter) = if(outer.roccs.nonEmpty) { - val respArb = Module(new RRArbiter(new RoCCResponse()(outer.p), outer.roccs.size)) - val cmdRouter = Module(new RoccCommandRouter(outer.roccs.map(_.opcodes))(outer.p)) - outer.roccs.zipWithIndex.foreach { case (rocc, i) => - rocc.module.io.ptw ++=: ptwPorts - rocc.module.io.cmd <> cmdRouter.io.out(i) - val dcIF = Module(new SimpleHellaCacheIF()(outer.p)) - dcIF.io.requestor <> rocc.module.io.mem - dcachePorts += dcIF.io.cache - respArb.io.in(i) <> Queue(rocc.module.io.resp) - } - (Some(respArb), Some(cmdRouter)) - } else { - (None, None) - } - val roccCSRIOs = outer.roccs.map(_.module.io.csrs) -} - - -class RoccCommandRouterBB(opcodes: Seq[OpcodeSet])(implicit p: Parameters) - extends CoreModule()(p) { - val io = IO(new Bundle { - val in = Flipped(Decoupled(new RoCCCommandBB)) - val out = Vec(opcodes.size, Decoupled(new RoCCCommandBB)) - val busy = Output(Bool()) - }) - - val cmd = Queue(io.in) - val cmdReadys = io.out.zip(opcodes).map { 
case (out, opcode) => - val me = opcode.matches(cmd.bits.inst.opcode) - out.valid := cmd.valid && me - out.bits := cmd.bits - out.ready && me - } - cmd.ready := cmdReadys.reduce(_ || _) - io.busy := cmd.valid - - assert(PopCount(cmdReadys) <= 1.U, - "Custom opcode matched for more than one accelerator") -} diff --git a/arch/src/main/scala/framework/rocket/README.md b/arch/src/main/scala/framework/rocket/README.md deleted file mode 100644 index 40b53a9b..00000000 --- a/arch/src/main/scala/framework/rocket/README.md +++ /dev/null @@ -1,19 +0,0 @@ -# Buckyball Rocket Core Framework - -This directory contains the customized Rocket core implementation in the Buckyball framework. The architecture is built on top of the Chipyard/Rocket-chip framework, with extensive extensions and modifications to support Buckyball's custom RoCC coprocessors. Buckyball's design philosophy is to maintain compatibility with upstream Rocket-chip while implementing functional extensions through parallel class hierarchies, thus avoiding maintenance issues from directly modifying upstream code while fully leveraging Rocket-chip's mature architecture. - -In Chipyard's hierarchy, the top level is the SoC subsystem, responsible for integrating multiple processor cores, cache subsystems, interconnect buses, memory controllers, and various peripherals. Buckyball defines its subsystem implementation through `RocketSubsystem.scala`, where the `RocketSubsystem` class extends Chipyard's `BaseSubsystem` and mixes in multiple traits to obtain necessary functional support. These traits include `InstantiatesHierarchicalElements` for managing hierarchical component instantiation, `HasTileNotificationSinks` and `HasTileInputConstants` for handling inter-tile communication, `CanHavePeripheryCLINT` and `CanHavePeripheryPLIC` for interrupt controller support, and `HasPeripheryDebug` for debug support. 
Through this multiple inheritance design pattern, Buckyball can reuse most of Chipyard's infrastructure while performing customized extensions where needed. Importantly, `RocketSubsystem` also defines `RocketTileAttachParamsBB` to describe how to connect Buckyball's version of Rocket tiles to the subsystem, with this parameter class specifying tile configuration parameters and cross-clock-domain connection methods. - -One level down is the tile level, where `RocketTileBB.scala` defines the complete implementation of a single Rocket tile. In Chipyard's design, a tile is a relatively independent processing unit containing the processor core, L1 instruction and data caches, optional vector units, RoCC coprocessor interface, and system bus connection interfaces. The `RocketTileBB` class obtains basic tile functionality by extending `BaseTile` while mixing in several key traits. `SinksExternalInterrupts` and `SourcesExternalNotifications` handle reception and transmission of external interrupts and notifications, `HasLazyRoCCBB` is a Buckyball-specific trait for supporting Buckyball's RoCC coprocessor framework, `HasHellaCache` provides the interface to L1 data cache, and `HasICacheFrontend` provides the instruction fetch frontend implementation. This multi-trait composition design allows `RocketTileBB` to obtain all necessary functionality while maintaining code modularity and maintainability. Notably, `RocketTileBB` defines its own parameter type `RocketTileParamsBB`, which contains all tile configuration information including core parameters, cache parameters, BTB parameters, and provides an instantiation interface through the `InstantiableTileParams` trait. - -Inside the tile, the most core component is the processor core itself, implemented by `RocketCoreBB.scala`. This file contains a reimplementation of the original Rocket core, where the `RocketBB` class extends `CoreModule` and mixes in `HasRocketCoreParameters` and `HasRocketCoreIOBB` traits. 
`CoreModule` provides the basic framework for core modules, `HasRocketCoreParameters` provides access interfaces to various core parameters, while `HasRocketCoreIOBB` defines Buckyball-specific core IO interfaces. The key difference of this IO interface from the standard Rocket core IO is using `RoCCCoreIOBB` instead of standard `RoCCCoreIO`, enabling support for Buckyball-specific RoCC interface extensions. In the core implementation, the most critical modification is instruction decode table handling. The original Rocket core decides whether to include RoCC instruction decode logic based on the `usingRoCC` parameter, but since Buckyball uses `BuildRoCCBB` instead of standard `BuildRoCC`, this causes `usingRoCC` to return false, preventing RoCC instructions from being properly decoded. To solve this problem, Buckyball forces inclusion of `RoCCDecode` in the decode table, ensuring custom instructions can be correctly recognized and processed. - -RoCC coprocessor support is implemented through `LazyRoCCBB.scala`, which defines Buckyball's RoCC framework. The `HasLazyRoCCBB` trait is the core of this framework, responsible for managing RoCC coprocessor instantiation and connections. This trait creates corresponding RoCC instances based on `BuildRoCCBB` configuration and allocates independent CSR address spaces for each RoCC. The `HasLazyRoCCModuleBB` trait handles RoCC connections at the module level, instantiating `RoccCommandRouterBB` for instruction routing. This router decides which specific RoCC instance to send instructions to based on instruction opcode, and also arbitrates responses from different RoCCs before returning them to the core. The router design considers concurrent instruction execution and response ordering, ensuring system correctness and performance. - -`CSRBB.scala` contains a reimplementation of the control and status register subsystem, which is one of the most complex parts of the Buckyball framework. 
The CSR subsystem handles all control and status register accesses, including standard RISC-V CSRs and Buckyball-specific extended CSRs. This implementation is based on the original Rocket CSR implementation but extended for Buckyball's needs, supporting more flexible CSR address allocation and more complex read/write logic. Particularly for RoCC-related CSRs, Buckyball implements a dynamic address allocation mechanism that allows different RoCC instances to have independent CSR spaces, avoiding address conflicts. - -`RoCCFragments.scala` defines Buckyball's RoCC interface data structures, which maintain compatibility with standard RoCC interfaces while providing additional extension capabilities. This includes extended command formats, response formats, and additional control signals. These interface definitions form the foundation of the entire Buckyball RoCC ecosystem, ensuring correct communication between different components. - -`Configs.scala` contains rich configuration definitions that specify various hardware parameters through Chipyard's configuration system. The configuration system uses functional programming concepts, building complex system configurations through composition of configuration functions. Buckyball's configuration defines how to integrate Buckyball-specific components into the entire system, including RoCC configuration, CSR configuration, and various performance parameter settings. - -The most critical design challenge in the entire architecture lies in parameter passing and configuration consistency. The Chipyard/Rocket-chip framework extensively uses Scala's implicit parameter mechanism and the `Parameters` configuration system, which allows configuration information to be passed and overridden throughout the hardware hierarchy. The main problem Buckyball faces is how to ensure its configuration information (especially RoCC configurations defined in `BuildRoCCBB`) can be correctly passed to all components that need it. 
Since the original Rocket core only recognizes `BuildRoCC` and is unaware of `BuildRoCCBB`, Buckyball adopts an elegant solution: when `RocketTileBB` creates a `RocketBB` core instance, it dynamically modifies the `Parameters` object passed to the core, merging the contents of `BuildRoCCBB` into `BuildRoCC`. From the core's perspective, Buckyball's RoCC appears just like a standard RoCC, and all logic based on `BuildRoCC` works correctly, including `usingRoCC` parameter calculation, port count calculation, and instruction decode table construction. This design maintains compatibility with upstream code while achieving the functional extensions required by Buckyball, exemplifying an elegant application of the adapter pattern in software engineering. diff --git a/arch/src/main/scala/framework/rocket/RoCCFragments.scala b/arch/src/main/scala/framework/rocket/RoCCFragments.scala deleted file mode 100644 index e5ba7fbe..00000000 --- a/arch/src/main/scala/framework/rocket/RoCCFragments.scala +++ /dev/null @@ -1,39 +0,0 @@ -package framework.rocket - -import chisel3._ - -import org.chipsalliance.cde.config.{Field, Parameters, Config} -import freechips.rocketchip.tile._ -import freechips.rocketchip.diplomacy._ -import framework.rocket.{LazyRoCCBB, BuildRoCCBB} - -import chipyard.{TestSuitesKey, TestSuiteHelper} - -/** - * Map from a tileId to a particular RoCC accelerator - */ -case object MultiRoCCKey extends Field[Map[Int, Seq[Parameters => LazyRoCC]]](Map.empty[Int, Seq[Parameters => LazyRoCC]]) -case object MultiRoCCKeyBB extends Field[Map[Int, Seq[Parameters => LazyRoCCBB]]](Map.empty[Int, Seq[Parameters => LazyRoCCBB]]) - -/** - * Config fragment to enable different RoCCs based on the tileId - */ -class WithMultiRoCC extends Config((site, here, up) => { - case BuildRoCC => site(MultiRoCCKey).getOrElse(site(TileKey).tileId, Nil) -}) - -class WithMultiRoCCBB extends Config((site, here, up) => { - case BuildRoCCBB => site(MultiRoCCKeyBB).getOrElse(site(TileKey).tileId, 
Nil) -}) - - -/** - * Assigns what was previously in the BuildRoCC key to specific harts with MultiRoCCKey - * Must be paired with WithMultiRoCC - */ -class WithMultiRoCCFromBuildRoCC(harts: Int*) extends Config((site, here, up) => { - case BuildRoCC => Nil - case MultiRoCCKey => up(MultiRoCCKey, site) ++ harts.distinct.map { i => - (i -> up(BuildRoCC, site)) - } -}) diff --git a/arch/src/main/scala/framework/rocket/RocketCoreBB.scala b/arch/src/main/scala/framework/rocket/RocketCoreBB.scala deleted file mode 100644 index ab35490a..00000000 --- a/arch/src/main/scala/framework/rocket/RocketCoreBB.scala +++ /dev/null @@ -1,1245 +0,0 @@ -// See LICENSE.Berkeley for license details. -// See LICENSE.SiFive for license details. - -package framework.rocket - -import chisel3._ -import chisel3.util._ -import chisel3.withClock -import org.chipsalliance.cde.config.Parameters -import freechips.rocketchip.tile._ -import freechips.rocketchip.util._ -import freechips.rocketchip.util.property -import scala.collection.mutable.ArrayBuffer -import freechips.rocketchip.rocket._ -import framework.rocket.RoCCCoreIOBB - - -trait HasRocketCoreIOBB extends HasRocketCoreParameters { - implicit val p: Parameters - def nTotalRoCCCSRs: Int - val io = IO(new CoreBundle()(p) { - val hartid = Input(UInt(hartIdLen.W)) - val reset_vector = Input(UInt(resetVectorLen.W)) - val interrupts = Input(new CoreInterrupts(tileParams.asInstanceOf[RocketTileParamsBB].beuAddr.isDefined)) - val imem = new FrontendIO - val dmem = new HellaCacheIO - val ptw = Flipped(new DatapathPTWIO()) - val fpu = Flipped(new FPUCoreIO()) - val rocc = Flipped(new RoCCCoreIOBB(nTotalRoCCCSRs)) - val trace = Output(new TraceBundle) - val bpwatch = Output(Vec(coreParams.nBreakpoints, new BPWatch(coreParams.retireWidth))) - val cease = Output(Bool()) - val wfi = Output(Bool()) - val traceStall = Input(Bool()) - val vector = if (usingVector) Some(Flipped(new VectorCoreIO)) else None - }) -} - - -class RocketBB(tile: 
RocketTileBB)(implicit p: Parameters) extends CoreModule()(p) - with HasRocketCoreParameters - with HasRocketCoreIOBB { - def nTotalRoCCCSRs = tile.roccCSRs.flatten.size - import ALU._ - - val clock_en_reg = RegInit(true.B) - val long_latency_stall = Reg(Bool()) - val id_reg_pause = Reg(Bool()) - val imem_might_request_reg = Reg(Bool()) - val clock_en = WireDefault(true.B) - val gated_clock = - if (!rocketParams.clockGate) clock - else ClockGate(clock, clock_en, "rocket_clock_gate") - - class RocketImpl { // entering gated-clock domain - - // performance counters - def pipelineIDToWB[T <: Data](x: T): T = - RegEnable(RegEnable(RegEnable(x, !ctrl_killd), ex_pc_valid), mem_pc_valid) - val perfEvents = new EventSets(Seq( - new EventSet((mask, hits) => Mux(wb_xcpt, mask(0), wb_valid && pipelineIDToWB((mask & hits).orR)), Seq( - ("exception", () => false.B), - ("load", () => id_ctrl.mem && id_ctrl.mem_cmd === M_XRD && !id_ctrl.fp), - ("store", () => id_ctrl.mem && id_ctrl.mem_cmd === M_XWR && !id_ctrl.fp), - ("amo", () => usingAtomics.B && id_ctrl.mem && (isAMO(id_ctrl.mem_cmd) || id_ctrl.mem_cmd.isOneOf(M_XLR, M_XSC))), - ("system", () => id_ctrl.csr =/= CSR.N), - ("arith", () => id_ctrl.wxd && !(id_ctrl.jal || id_ctrl.jalr || id_ctrl.mem || id_ctrl.fp || id_ctrl.mul || id_ctrl.div || id_ctrl.csr =/= CSR.N)), - ("branch", () => id_ctrl.branch), - ("jal", () => id_ctrl.jal), - ("jalr", () => id_ctrl.jalr)) - ++ (if (!usingMulDiv) Seq() else Seq( - ("mul", () => if (pipelinedMul) id_ctrl.mul else id_ctrl.div && (id_ctrl.alu_fn & FN_DIV) =/= FN_DIV), - ("div", () => if (pipelinedMul) id_ctrl.div else id_ctrl.div && (id_ctrl.alu_fn & FN_DIV) === FN_DIV))) - ++ (if (!usingFPU) Seq() else Seq( - ("fp load", () => id_ctrl.fp && io.fpu.dec.ldst && io.fpu.dec.wen), - ("fp store", () => id_ctrl.fp && io.fpu.dec.ldst && !io.fpu.dec.wen), - ("fp add", () => id_ctrl.fp && io.fpu.dec.fma && io.fpu.dec.swap23), - ("fp mul", () => id_ctrl.fp && io.fpu.dec.fma && !io.fpu.dec.swap23 && 
!io.fpu.dec.ren3), - ("fp mul-add", () => id_ctrl.fp && io.fpu.dec.fma && io.fpu.dec.ren3), - ("fp div/sqrt", () => id_ctrl.fp && (io.fpu.dec.div || io.fpu.dec.sqrt)), - ("fp other", () => id_ctrl.fp && !(io.fpu.dec.ldst || io.fpu.dec.fma || io.fpu.dec.div || io.fpu.dec.sqrt))))), - new EventSet((mask, hits) => (mask & hits).orR, Seq( - ("load-use interlock", () => id_ex_hazard && ex_ctrl.mem || id_mem_hazard && mem_ctrl.mem || id_wb_hazard && wb_ctrl.mem), - ("long-latency interlock", () => id_sboard_hazard), - ("csr interlock", () => id_ex_hazard && ex_ctrl.csr =/= CSR.N || id_mem_hazard && mem_ctrl.csr =/= CSR.N || id_wb_hazard && wb_ctrl.csr =/= CSR.N), - ("I$ blocked", () => icache_blocked), - ("D$ blocked", () => id_ctrl.mem && dcache_blocked), - ("branch misprediction", () => take_pc_mem && mem_direction_misprediction), - ("control-flow target misprediction", () => take_pc_mem && mem_misprediction && mem_cfi && !mem_direction_misprediction && !icache_blocked), - ("flush", () => wb_reg_flush_pipe), - ("replay", () => replay_wb)) - ++ (if (!usingMulDiv) Seq() else Seq( - ("mul/div interlock", () => id_ex_hazard && (ex_ctrl.mul || ex_ctrl.div) || id_mem_hazard && (mem_ctrl.mul || mem_ctrl.div) || id_wb_hazard && wb_ctrl.div))) - ++ (if (!usingFPU) Seq() else Seq( - ("fp interlock", () => id_ex_hazard && ex_ctrl.fp || id_mem_hazard && mem_ctrl.fp || id_wb_hazard && wb_ctrl.fp || id_ctrl.fp && id_stall_fpu)))), - new EventSet((mask, hits) => (mask & hits).orR, Seq( - ("I$ miss", () => io.imem.perf.acquire), - ("D$ miss", () => io.dmem.perf.acquire), - ("D$ release", () => io.dmem.perf.release), - ("ITLB miss", () => io.imem.perf.tlbMiss), - ("DTLB miss", () => io.dmem.perf.tlbMiss), - ("L2 TLB miss", () => io.ptw.perf.l2miss))))) - - - val pipelinedMul = usingMulDiv && mulDivParams.mulUnroll == xLen - val decode_table = { - (if (usingMulDiv) new MDecode(pipelinedMul) +: (xLen > 32).option(new M64Decode(pipelinedMul)).toSeq else Nil) ++: - (if (usingAtomics) new 
ADecode +: (xLen > 32).option(new A64Decode).toSeq else Nil) ++: - (if (fLen >= 32) new FDecode +: (xLen > 32).option(new F64Decode).toSeq else Nil) ++: - (if (fLen >= 64) new DDecode +: (xLen > 32).option(new D64Decode).toSeq else Nil) ++: - (if (minFLen == 16) new HDecode +: (xLen > 32).option(new H64Decode).toSeq ++: (fLen >= 64).option(new HDDecode).toSeq else Nil) ++: - (usingRoCC.option(new RoCCDecode)) ++: - (if (xLen == 32) new I32Decode else new I64Decode) +: - (usingVM.option(new SVMDecode)) ++: - (usingSupervisor.option(new SDecode)) ++: - (usingHypervisor.option(new HypervisorDecode)) ++: - ((usingHypervisor && (xLen == 64)).option(new Hypervisor64Decode)) ++: - (usingDebug.option(new DebugDecode)) ++: - (usingNMI.option(new NMIDecode)) ++: - (usingConditionalZero.option(new ConditionalZeroDecode)) ++: - Seq(new FenceIDecode(tile.dcache.flushOnFenceI)) ++: - coreParams.haveCFlush.option(new CFlushDecode(tile.dcache.canSupportCFlushLine)) ++: - rocketParams.haveCease.option(new CeaseDecode) ++: - usingVector.option(new VCFGDecode) ++: - (if (coreParams.useZba) new ZbaDecode +: (xLen > 32).option(new Zba64Decode).toSeq else Nil) ++: - (if (coreParams.useZbb) Seq(new ZbbDecode, if (xLen == 32) new Zbb32Decode else new Zbb64Decode) else Nil) ++: - coreParams.useZbs.option(new ZbsDecode) ++: - Seq(new IDecode) - } flatMap(_.table) - - val ex_ctrl = Reg(new IntCtrlSigs) - val mem_ctrl = Reg(new IntCtrlSigs) - val wb_ctrl = Reg(new IntCtrlSigs) - - val ex_reg_xcpt_interrupt = Reg(Bool()) - val ex_reg_valid = Reg(Bool()) - val ex_reg_rvc = Reg(Bool()) - val ex_reg_btb_resp = Reg(new BTBResp) - val ex_reg_xcpt = Reg(Bool()) - val ex_reg_flush_pipe = Reg(Bool()) - val ex_reg_load_use = Reg(Bool()) - val ex_reg_cause = Reg(UInt()) - val ex_reg_replay = Reg(Bool()) - val ex_reg_pc = Reg(UInt()) - val ex_reg_mem_size = Reg(UInt()) - val ex_reg_hls = Reg(Bool()) - val ex_reg_inst = Reg(Bits()) - val ex_reg_raw_inst = Reg(UInt()) - val ex_reg_wphit = 
Reg(Vec(nBreakpoints, Bool())) - val ex_reg_set_vconfig = Reg(Bool()) - - val mem_reg_xcpt_interrupt = Reg(Bool()) - val mem_reg_valid = Reg(Bool()) - val mem_reg_rvc = Reg(Bool()) - val mem_reg_btb_resp = Reg(new BTBResp) - val mem_reg_xcpt = Reg(Bool()) - val mem_reg_replay = Reg(Bool()) - val mem_reg_flush_pipe = Reg(Bool()) - val mem_reg_cause = Reg(UInt()) - val mem_reg_slow_bypass = Reg(Bool()) - val mem_reg_load = Reg(Bool()) - val mem_reg_store = Reg(Bool()) - val mem_reg_set_vconfig = Reg(Bool()) - val mem_reg_sfence = Reg(Bool()) - val mem_reg_pc = Reg(UInt()) - val mem_reg_inst = Reg(Bits()) - val mem_reg_mem_size = Reg(UInt()) - val mem_reg_hls_or_dv = Reg(Bool()) - val mem_reg_raw_inst = Reg(UInt()) - val mem_reg_wdata = Reg(Bits()) - val mem_reg_rs2 = Reg(Bits()) - val mem_br_taken = Reg(Bool()) - val take_pc_mem = Wire(Bool()) - val mem_reg_wphit = Reg(Vec(nBreakpoints, Bool())) - - val wb_reg_valid = Reg(Bool()) - val wb_reg_xcpt = Reg(Bool()) - val wb_reg_replay = Reg(Bool()) - val wb_reg_flush_pipe = Reg(Bool()) - val wb_reg_cause = Reg(UInt()) - val wb_reg_set_vconfig = Reg(Bool()) - val wb_reg_sfence = Reg(Bool()) - val wb_reg_pc = Reg(UInt()) - val wb_reg_mem_size = Reg(UInt()) - val wb_reg_hls_or_dv = Reg(Bool()) - val wb_reg_hfence_v = Reg(Bool()) - val wb_reg_hfence_g = Reg(Bool()) - val wb_reg_inst = Reg(Bits()) - val wb_reg_raw_inst = Reg(UInt()) - val wb_reg_wdata = Reg(Bits()) - val wb_reg_rs2 = Reg(Bits()) - val take_pc_wb = Wire(Bool()) - val wb_reg_wphit = Reg(Vec(nBreakpoints, Bool())) - - val take_pc_mem_wb = take_pc_wb || take_pc_mem - val take_pc = take_pc_mem_wb - - // decode stage - val ibuf = Module(new IBuf) - val id_expanded_inst = ibuf.io.inst.map(_.bits.inst) - val id_raw_inst = ibuf.io.inst.map(_.bits.raw) - val id_inst = id_expanded_inst.map(_.bits) - ibuf.io.imem <> io.imem.resp - ibuf.io.kill := take_pc - - require(decodeWidth == 1 /* TODO */ && retireWidth == decodeWidth) - require(!(coreParams.useRVE && 
coreParams.fpu.nonEmpty), "Can't select both RVE and floating-point") - require(!(coreParams.useRVE && coreParams.useHypervisor), "Can't select both RVE and Hypervisor") - val id_ctrl = Wire(new IntCtrlSigs).decode(id_inst(0), decode_table) - - val lgNXRegs = if (coreParams.useRVE) 4 else 5 - val regAddrMask = (1 << lgNXRegs) - 1 - - def decodeReg(x: UInt) = (x.extract(x.getWidth-1, lgNXRegs).asBool, x(lgNXRegs-1, 0)) - val (id_raddr3_illegal, id_raddr3) = decodeReg(id_expanded_inst(0).rs3) - val (id_raddr2_illegal, id_raddr2) = decodeReg(id_expanded_inst(0).rs2) - val (id_raddr1_illegal, id_raddr1) = decodeReg(id_expanded_inst(0).rs1) - val (id_waddr_illegal, id_waddr) = decodeReg(id_expanded_inst(0).rd) - - val id_load_use = Wire(Bool()) - val id_reg_fence = RegInit(false.B) - val id_ren = IndexedSeq(id_ctrl.rxs1, id_ctrl.rxs2) - val id_raddr = IndexedSeq(id_raddr1, id_raddr2) - val rf = new RegFile(regAddrMask, xLen) - val id_rs = id_raddr.map(rf.read _) - val ctrl_killd = Wire(Bool()) - val id_npc = (ibuf.io.pc.asSInt + ImmGen(IMM_UJ, id_inst(0))).asUInt - - val csr = Module(new CSRFileBB(perfEvents, coreParams.customCSRs.decls, tile.roccCSRs.flatten, tile.rocketParams.beuAddr.isDefined)) - val id_csr_en = id_ctrl.csr.isOneOf(CSR.S, CSR.C, CSR.W) - val id_system_insn = id_ctrl.csr === CSR.I - val id_csr_ren = id_ctrl.csr.isOneOf(CSR.S, CSR.C) && id_expanded_inst(0).rs1 === 0.U - val id_csr = Mux(id_system_insn && id_ctrl.mem, CSR.N, Mux(id_csr_ren, CSR.R, id_ctrl.csr)) - val id_csr_flush = id_system_insn || (id_csr_en && !id_csr_ren && csr.io.decode(0).write_flush) - val id_set_vconfig = Seq(Instructions.VSETVLI, Instructions.VSETIVLI, Instructions.VSETVL).map(_ === id_inst(0)).orR && usingVector.B - - id_ctrl.vec := false.B - if (usingVector) { - val v_decode = rocketParams.vector.get.decoder(p) - v_decode.io.inst := id_inst(0) - v_decode.io.vconfig := csr.io.vector.get.vconfig - when (v_decode.io.legal) { - id_ctrl.legal := 
!csr.io.vector.get.vconfig.vtype.vill - id_ctrl.fp := v_decode.io.fp - id_ctrl.rocc := false.B - id_ctrl.branch := false.B - id_ctrl.jal := false.B - id_ctrl.jalr := false.B - id_ctrl.rxs2 := v_decode.io.read_rs2 - id_ctrl.rxs1 := v_decode.io.read_rs1 - id_ctrl.mem := false.B - id_ctrl.rfs1 := v_decode.io.read_frs1 - id_ctrl.rfs2 := false.B - id_ctrl.rfs3 := false.B - id_ctrl.wfd := v_decode.io.write_frd - id_ctrl.mul := false.B - id_ctrl.div := false.B - id_ctrl.wxd := v_decode.io.write_rd - id_ctrl.csr := CSR.N - id_ctrl.fence_i := false.B - id_ctrl.fence := false.B - id_ctrl.amo := false.B - id_ctrl.dp := false.B - id_ctrl.vec := true.B - } - } - - - val id_illegal_insn = !id_ctrl.legal || - (id_ctrl.mul || id_ctrl.div) && !csr.io.status.isa('m'-'a') || - id_ctrl.amo && !csr.io.status.isa('a'-'a') || - id_ctrl.fp && (csr.io.decode(0).fp_illegal || (io.fpu.illegal_rm && !id_ctrl.vec)) || - (id_ctrl.vec) && (csr.io.decode(0).vector_illegal || csr.io.vector.map(_.vconfig.vtype.vill).getOrElse(false.B)) || - id_ctrl.dp && !csr.io.status.isa('d'-'a') || - ibuf.io.inst(0).bits.rvc && !csr.io.status.isa('c'-'a') || - id_raddr2_illegal && id_ctrl.rxs2 || - id_raddr1_illegal && id_ctrl.rxs1 || - id_waddr_illegal && id_ctrl.wxd || - id_ctrl.rocc && csr.io.decode(0).rocc_illegal || - id_csr_en && (csr.io.decode(0).read_illegal || !id_csr_ren && csr.io.decode(0).write_illegal) || - !ibuf.io.inst(0).bits.rvc && (id_system_insn && csr.io.decode(0).system_illegal) - val id_virtual_insn = id_ctrl.legal && - ((id_csr_en && !(!id_csr_ren && csr.io.decode(0).write_illegal) && csr.io.decode(0).virtual_access_illegal) || - (!ibuf.io.inst(0).bits.rvc && id_system_insn && csr.io.decode(0).virtual_system_illegal)) - // stall decode for fences (now, for AMO.rl; later, for AMO.aq and FENCE) - val id_amo_aq = id_inst(0)(26) - val id_amo_rl = id_inst(0)(25) - val id_fence_pred = id_inst(0)(27,24) - val id_fence_succ = id_inst(0)(23,20) - val id_fence_next = id_ctrl.fence || id_ctrl.amo && 
id_amo_aq - val id_mem_busy = !io.dmem.ordered || io.dmem.req.valid - when (!id_mem_busy) { id_reg_fence := false.B } - val id_rocc_busy = usingRoCC.B && - (io.rocc.busy || ex_reg_valid && ex_ctrl.rocc || - mem_reg_valid && mem_ctrl.rocc || wb_reg_valid && wb_ctrl.rocc) - val id_csr_rocc_write = tile.roccCSRs.flatten.map(_.id.U === id_inst(0)(31,20)).orR && id_csr_en && !id_csr_ren - val id_vec_busy = io.vector.map(v => v.backend_busy || v.trap_check_busy).getOrElse(false.B) - val id_do_fence = WireDefault(id_rocc_busy && (id_ctrl.fence || id_csr_rocc_write) || - id_vec_busy && id_ctrl.fence || - id_mem_busy && (id_ctrl.amo && id_amo_rl || id_ctrl.fence_i || id_reg_fence && (id_ctrl.mem || id_ctrl.rocc))) - - val bpu = Module(new BreakpointUnit(nBreakpoints)) - bpu.io.status := csr.io.status - bpu.io.bp := csr.io.bp - bpu.io.pc := ibuf.io.pc - bpu.io.ea := mem_reg_wdata - bpu.io.mcontext := csr.io.mcontext - bpu.io.scontext := csr.io.scontext - - val id_xcpt0 = ibuf.io.inst(0).bits.xcpt0 - val id_xcpt1 = ibuf.io.inst(0).bits.xcpt1 - val (id_xcpt, id_cause) = checkExceptions(List( - (csr.io.interrupt, csr.io.interrupt_cause), - (bpu.io.debug_if, CSR.debugTriggerCause.U), - (bpu.io.xcpt_if, Causes.breakpoint.U), - (id_xcpt0.pf.inst, Causes.fetch_page_fault.U), - (id_xcpt0.gf.inst, Causes.fetch_guest_page_fault.U), - (id_xcpt0.ae.inst, Causes.fetch_access.U), - (id_xcpt1.pf.inst, Causes.fetch_page_fault.U), - (id_xcpt1.gf.inst, Causes.fetch_guest_page_fault.U), - (id_xcpt1.ae.inst, Causes.fetch_access.U), - (id_virtual_insn, Causes.virtual_instruction.U), - (id_illegal_insn, Causes.illegal_instruction.U))) - - val idCoverCauses = List( - (CSR.debugTriggerCause, "DEBUG_TRIGGER"), - (Causes.breakpoint, "BREAKPOINT"), - (Causes.fetch_access, "FETCH_ACCESS"), - (Causes.illegal_instruction, "ILLEGAL_INSTRUCTION") - ) ++ (if (usingVM) List( - (Causes.fetch_page_fault, "FETCH_PAGE_FAULT") - ) else Nil) - coverExceptions(id_xcpt, id_cause, "DECODE", idCoverCauses) - - val 
dcache_bypass_data = - if (fastLoadByte) io.dmem.resp.bits.data(xLen-1, 0) - else if (fastLoadWord) io.dmem.resp.bits.data_word_bypass(xLen-1, 0) - else wb_reg_wdata - - // detect bypass opportunities - val ex_waddr = ex_reg_inst(11,7) & regAddrMask.U - val mem_waddr = mem_reg_inst(11,7) & regAddrMask.U - val wb_waddr = wb_reg_inst(11,7) & regAddrMask.U - val bypass_sources = IndexedSeq( - (true.B, 0.U, 0.U), // treat reading x0 as a bypass - (ex_reg_valid && ex_ctrl.wxd, ex_waddr, mem_reg_wdata), - (mem_reg_valid && mem_ctrl.wxd && !mem_ctrl.mem, mem_waddr, wb_reg_wdata), - (mem_reg_valid && mem_ctrl.wxd, mem_waddr, dcache_bypass_data)) - val id_bypass_src = id_raddr.map(raddr => bypass_sources.map(s => s._1 && s._2 === raddr)) - - // execute stage - val bypass_mux = bypass_sources.map(_._3) - val ex_reg_rs_bypass = Reg(Vec(id_raddr.size, Bool())) - val ex_reg_rs_lsb = Reg(Vec(id_raddr.size, UInt(log2Ceil(bypass_sources.size).W))) - val ex_reg_rs_msb = Reg(Vec(id_raddr.size, UInt())) - val ex_rs = for (i <- 0 until id_raddr.size) - yield Mux(ex_reg_rs_bypass(i), bypass_mux(ex_reg_rs_lsb(i)), Cat(ex_reg_rs_msb(i), ex_reg_rs_lsb(i))) - val ex_imm = ImmGen(ex_ctrl.sel_imm, ex_reg_inst) - val ex_rs1shl = Mux(ex_reg_inst(3), ex_rs(0)(31,0), ex_rs(0)) << ex_reg_inst(14,13) - val ex_op1 = MuxLookup(ex_ctrl.sel_alu1, 0.S)(Seq( - A1_RS1 -> ex_rs(0).asSInt, - A1_PC -> ex_reg_pc.asSInt, - A1_RS1SHL -> (if (rocketParams.useZba) ex_rs1shl.asSInt else 0.S) - )) - val ex_op2_oh = UIntToOH(Mux(ex_ctrl.sel_alu2(0), (ex_reg_inst >> 20).asUInt, ex_rs(1))(log2Ceil(xLen)-1,0)).asSInt - val ex_op2 = MuxLookup(ex_ctrl.sel_alu2, 0.S)(Seq( - A2_RS2 -> ex_rs(1).asSInt, - A2_IMM -> ex_imm, - A2_SIZE -> Mux(ex_reg_rvc, 2.S, 4.S), - ) ++ (if (coreParams.useZbs) Seq( - A2_RS2OH -> ex_op2_oh, - A2_IMMOH -> ex_op2_oh, - ) else Nil)) - - val (ex_new_vl, ex_new_vconfig) = if (usingVector) { - val ex_new_vtype = VType.fromUInt(MuxCase(ex_rs(1), Seq( - ex_reg_inst(31,30).andR -> ex_reg_inst(29,20), 
- !ex_reg_inst(31) -> ex_reg_inst(30,20)))) - val ex_avl = Mux(ex_ctrl.rxs1, - Mux(ex_reg_inst(19,15) === 0.U, - Mux(ex_reg_inst(11,7) === 0.U, csr.io.vector.get.vconfig.vl, ex_new_vtype.vlMax), - ex_rs(0) - ), - ex_reg_inst(19,15)) - val ex_new_vl = ex_new_vtype.vl(ex_avl, csr.io.vector.get.vconfig.vl, false.B, false.B, false.B) - val ex_new_vconfig = Wire(new VConfig) - ex_new_vconfig.vtype := ex_new_vtype - ex_new_vconfig.vl := ex_new_vl - (Some(ex_new_vl), Some(ex_new_vconfig)) - } else { (None, None) } - - val alu = Module(new ALU) - alu.io.dw := ex_ctrl.alu_dw - alu.io.fn := ex_ctrl.alu_fn - alu.io.in2 := ex_op2.asUInt - alu.io.in1 := ex_op1.asUInt - - // multiplier and divider - val div = Module(new MulDiv(if (pipelinedMul) mulDivParams.copy(mulUnroll = 0) else mulDivParams, width = xLen)) - div.io.req.valid := ex_reg_valid && ex_ctrl.div - div.io.req.bits.dw := ex_ctrl.alu_dw - div.io.req.bits.fn := ex_ctrl.alu_fn - div.io.req.bits.in1 := ex_rs(0) - div.io.req.bits.in2 := ex_rs(1) - div.io.req.bits.tag := ex_waddr - val mul = pipelinedMul.option { - val m = Module(new PipelinedMultiplier(xLen, 2)) - m.io.req.valid := ex_reg_valid && ex_ctrl.mul - m.io.req.bits := div.io.req.bits - m - } - - ex_reg_valid := !ctrl_killd - ex_reg_replay := !take_pc && ibuf.io.inst(0).valid && ibuf.io.inst(0).bits.replay - ex_reg_xcpt := !ctrl_killd && id_xcpt - ex_reg_xcpt_interrupt := !take_pc && ibuf.io.inst(0).valid && csr.io.interrupt - - when (!ctrl_killd) { - ex_ctrl := id_ctrl - ex_reg_rvc := ibuf.io.inst(0).bits.rvc - ex_ctrl.csr := id_csr - when (id_ctrl.fence && id_fence_succ === 0.U) { id_reg_pause := true.B } - when (id_fence_next) { id_reg_fence := true.B } - when (id_xcpt) { // pass PC down ALU writeback pipeline for badaddr - ex_ctrl.alu_fn := FN_ADD - ex_ctrl.alu_dw := DW_XPR - ex_ctrl.sel_alu1 := A1_RS1 // badaddr := instruction - ex_ctrl.sel_alu2 := A2_ZERO - when (id_xcpt1.asUInt.orR) { // badaddr := PC+2 - ex_ctrl.sel_alu1 := A1_PC - ex_ctrl.sel_alu2 := 
A2_SIZE - ex_reg_rvc := true.B - } - when (bpu.io.xcpt_if || id_xcpt0.asUInt.orR) { // badaddr := PC - ex_ctrl.sel_alu1 := A1_PC - ex_ctrl.sel_alu2 := A2_ZERO - } - } - ex_reg_flush_pipe := id_ctrl.fence_i || id_csr_flush - ex_reg_load_use := id_load_use - ex_reg_hls := usingHypervisor.B && id_system_insn && id_ctrl.mem_cmd.isOneOf(M_XRD, M_XWR, M_HLVX) - ex_reg_mem_size := Mux(usingHypervisor.B && id_system_insn, id_inst(0)(27, 26), id_inst(0)(13, 12)) - when (id_ctrl.mem_cmd.isOneOf(M_SFENCE, M_HFENCEV, M_HFENCEG, M_FLUSH_ALL)) { - ex_reg_mem_size := Cat(id_raddr2 =/= 0.U, id_raddr1 =/= 0.U) - } - when (id_ctrl.mem_cmd === M_SFENCE && csr.io.status.v) { - ex_ctrl.mem_cmd := M_HFENCEV - } - if (tile.dcache.flushOnFenceI) { - when (id_ctrl.fence_i) { - ex_reg_mem_size := 0.U - } - } - - for (i <- 0 until id_raddr.size) { - val do_bypass = id_bypass_src(i).reduce(_||_) - val bypass_src = PriorityEncoder(id_bypass_src(i)) - ex_reg_rs_bypass(i) := do_bypass - ex_reg_rs_lsb(i) := bypass_src - when (id_ren(i) && !do_bypass) { - ex_reg_rs_lsb(i) := id_rs(i)(log2Ceil(bypass_sources.size)-1, 0) - ex_reg_rs_msb(i) := id_rs(i) >> log2Ceil(bypass_sources.size) - } - } - when (id_illegal_insn || id_virtual_insn) { - val inst = Mux(ibuf.io.inst(0).bits.rvc, id_raw_inst(0)(15, 0), id_raw_inst(0)) - ex_reg_rs_bypass(0) := false.B - ex_reg_rs_lsb(0) := inst(log2Ceil(bypass_sources.size)-1, 0) - ex_reg_rs_msb(0) := inst >> log2Ceil(bypass_sources.size) - } - } - when (!ctrl_killd || csr.io.interrupt || ibuf.io.inst(0).bits.replay) { - ex_reg_cause := id_cause - ex_reg_inst := id_inst(0) - ex_reg_raw_inst := id_raw_inst(0) - ex_reg_pc := ibuf.io.pc - ex_reg_btb_resp := ibuf.io.btb_resp - ex_reg_wphit := bpu.io.bpwatch.map { bpw => bpw.ivalid(0) } - ex_reg_set_vconfig := id_set_vconfig && !id_xcpt - } - - // replay inst in ex stage? 
- val ex_pc_valid = ex_reg_valid || ex_reg_replay || ex_reg_xcpt_interrupt - val wb_dcache_miss = wb_ctrl.mem && !io.dmem.resp.valid - val replay_ex_structural = ex_ctrl.mem && !io.dmem.req.ready || - ex_ctrl.div && !div.io.req.ready || - ex_ctrl.vec && !io.vector.map(_.ex.ready).getOrElse(true.B) - val replay_ex_load_use = wb_dcache_miss && ex_reg_load_use - val replay_ex = ex_reg_replay || (ex_reg_valid && (replay_ex_structural || replay_ex_load_use)) - val ctrl_killx = take_pc_mem_wb || replay_ex || !ex_reg_valid - // detect 2-cycle load-use delay for LB/LH/SC - val ex_slow_bypass = ex_ctrl.mem_cmd === M_XSC || ex_reg_mem_size < 2.U - val ex_sfence = usingVM.B && ex_ctrl.mem && (ex_ctrl.mem_cmd === M_SFENCE || ex_ctrl.mem_cmd === M_HFENCEV || ex_ctrl.mem_cmd === M_HFENCEG) - - val (ex_xcpt, ex_cause) = checkExceptions(List( - (ex_reg_xcpt_interrupt || ex_reg_xcpt, ex_reg_cause))) - - val exCoverCauses = idCoverCauses - coverExceptions(ex_xcpt, ex_cause, "EXECUTE", exCoverCauses) - - // memory stage - val mem_pc_valid = mem_reg_valid || mem_reg_replay || mem_reg_xcpt_interrupt - val mem_br_target = mem_reg_pc.asSInt + - Mux(mem_ctrl.branch && mem_br_taken, ImmGen(IMM_SB, mem_reg_inst), - Mux(mem_ctrl.jal, ImmGen(IMM_UJ, mem_reg_inst), - Mux(mem_reg_rvc, 2.S, 4.S))) - val mem_npc = (Mux(mem_ctrl.jalr || mem_reg_sfence, encodeVirtualAddress(mem_reg_wdata, mem_reg_wdata).asSInt, mem_br_target) & (-2).S).asUInt - val mem_wrong_npc = - Mux(ex_pc_valid, mem_npc =/= ex_reg_pc, - Mux(ibuf.io.inst(0).valid || ibuf.io.imem.valid, mem_npc =/= ibuf.io.pc, true.B)) - val mem_npc_misaligned = !csr.io.status.isa('c'-'a') && mem_npc(1) && !mem_reg_sfence - val mem_int_wdata = Mux(!mem_reg_xcpt && (mem_ctrl.jalr ^ mem_npc_misaligned), mem_br_target, mem_reg_wdata.asSInt).asUInt - val mem_cfi = mem_ctrl.branch || mem_ctrl.jalr || mem_ctrl.jal - val mem_cfi_taken = (mem_ctrl.branch && mem_br_taken) || mem_ctrl.jalr || mem_ctrl.jal - val mem_direction_misprediction = mem_ctrl.branch 
&& mem_br_taken =/= (usingBTB.B && mem_reg_btb_resp.taken) - val mem_misprediction = if (usingBTB) mem_wrong_npc else mem_cfi_taken - take_pc_mem := mem_reg_valid && !mem_reg_xcpt && (mem_misprediction || mem_reg_sfence) - - mem_reg_valid := !ctrl_killx - mem_reg_replay := !take_pc_mem_wb && replay_ex - mem_reg_xcpt := !ctrl_killx && ex_xcpt - mem_reg_xcpt_interrupt := !take_pc_mem_wb && ex_reg_xcpt_interrupt - - // on pipeline flushes, cause mem_npc to hold the sequential npc, which - // will drive the W-stage npc mux - when (mem_reg_valid && mem_reg_flush_pipe) { - mem_reg_sfence := false.B - }.elsewhen (ex_pc_valid) { - mem_ctrl := ex_ctrl - mem_reg_rvc := ex_reg_rvc - mem_reg_load := ex_ctrl.mem && isRead(ex_ctrl.mem_cmd) - mem_reg_store := ex_ctrl.mem && isWrite(ex_ctrl.mem_cmd) - mem_reg_sfence := ex_sfence - mem_reg_btb_resp := ex_reg_btb_resp - mem_reg_flush_pipe := ex_reg_flush_pipe - mem_reg_slow_bypass := ex_slow_bypass - mem_reg_wphit := ex_reg_wphit - mem_reg_set_vconfig := ex_reg_set_vconfig - - mem_reg_cause := ex_cause - mem_reg_inst := ex_reg_inst - mem_reg_raw_inst := ex_reg_raw_inst - mem_reg_mem_size := ex_reg_mem_size - mem_reg_hls_or_dv := io.dmem.req.bits.dv - mem_reg_pc := ex_reg_pc - // IDecode ensured they are 1H - mem_reg_wdata := Mux(ex_reg_set_vconfig, ex_new_vl.getOrElse(alu.io.out), alu.io.out) - mem_br_taken := alu.io.cmp_out - - - when (ex_ctrl.rxs2 && (ex_ctrl.mem || ex_ctrl.rocc || ex_sfence)) { - val size = Mux(ex_ctrl.rocc, log2Ceil(xLen/8).U, ex_reg_mem_size) - mem_reg_rs2 := new StoreGen(size, 0.U, ex_rs(1), coreDataBytes).data - } - if (usingVector) { when (ex_reg_set_vconfig) { - mem_reg_rs2 := ex_new_vconfig.get.asUInt - } } - when (ex_ctrl.jalr && csr.io.status.debug) { - // flush I$ on D-mode JALR to effect uncached fetch without D$ flush - mem_ctrl.fence_i := true.B - mem_reg_flush_pipe := true.B - } - } - - val mem_breakpoint = (mem_reg_load && bpu.io.xcpt_ld) || (mem_reg_store && bpu.io.xcpt_st) - val 
mem_debug_breakpoint = (mem_reg_load && bpu.io.debug_ld) || (mem_reg_store && bpu.io.debug_st) - val (mem_ldst_xcpt, mem_ldst_cause) = checkExceptions(List( - (mem_debug_breakpoint, CSR.debugTriggerCause.U), - (mem_breakpoint, Causes.breakpoint.U))) - - val (mem_xcpt, mem_cause) = checkExceptions(List( - (mem_reg_xcpt_interrupt || mem_reg_xcpt, mem_reg_cause), - (mem_reg_valid && mem_npc_misaligned, Causes.misaligned_fetch.U), - (mem_reg_valid && mem_ldst_xcpt, mem_ldst_cause))) - - val memCoverCauses = (exCoverCauses ++ List( - (CSR.debugTriggerCause, "DEBUG_TRIGGER"), - (Causes.breakpoint, "BREAKPOINT"), - (Causes.misaligned_fetch, "MISALIGNED_FETCH") - )).distinct - coverExceptions(mem_xcpt, mem_cause, "MEMORY", memCoverCauses) - - val dcache_kill_mem = mem_reg_valid && mem_ctrl.wxd && io.dmem.replay_next // structural hazard on writeback port - val fpu_kill_mem = mem_reg_valid && mem_ctrl.fp && io.fpu.nack_mem - val vec_kill_mem = mem_reg_valid && mem_ctrl.mem && io.vector.map(_.mem.block_mem).getOrElse(false.B) - val vec_kill_all = mem_reg_valid && io.vector.map(_.mem.block_all).getOrElse(false.B) - val replay_mem = dcache_kill_mem || mem_reg_replay || fpu_kill_mem || vec_kill_mem || vec_kill_all - val killm_common = dcache_kill_mem || take_pc_wb || mem_reg_xcpt || !mem_reg_valid - div.io.kill := killm_common && RegNext(div.io.req.fire) - val ctrl_killm = killm_common || mem_xcpt || fpu_kill_mem || vec_kill_mem - - // writeback stage - wb_reg_valid := !ctrl_killm - wb_reg_replay := replay_mem && !take_pc_wb - wb_reg_xcpt := mem_xcpt && !take_pc_wb && !io.vector.map(_.mem.block_all).getOrElse(false.B) - wb_reg_flush_pipe := !ctrl_killm && mem_reg_flush_pipe - when (mem_pc_valid) { - wb_ctrl := mem_ctrl - wb_reg_sfence := mem_reg_sfence - wb_reg_wdata := Mux(!mem_reg_xcpt && mem_ctrl.fp && mem_ctrl.wxd, io.fpu.toint_data, mem_int_wdata) - when (mem_ctrl.rocc || mem_reg_sfence || mem_reg_set_vconfig) { - wb_reg_rs2 := mem_reg_rs2 - } - wb_reg_cause := mem_cause - 
wb_reg_inst := mem_reg_inst - wb_reg_raw_inst := mem_reg_raw_inst - wb_reg_mem_size := mem_reg_mem_size - wb_reg_hls_or_dv := mem_reg_hls_or_dv - wb_reg_hfence_v := mem_ctrl.mem_cmd === M_HFENCEV - wb_reg_hfence_g := mem_ctrl.mem_cmd === M_HFENCEG - wb_reg_pc := mem_reg_pc - wb_reg_wphit := mem_reg_wphit | bpu.io.bpwatch.map { bpw => (bpw.rvalid(0) && mem_reg_load) || (bpw.wvalid(0) && mem_reg_store) } - wb_reg_set_vconfig := mem_reg_set_vconfig - } - - val (wb_xcpt, wb_cause) = checkExceptions(List( - (wb_reg_xcpt, wb_reg_cause), - (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.pf.st, Causes.store_page_fault.U), - (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.pf.ld, Causes.load_page_fault.U), - (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.gf.st, Causes.store_guest_page_fault.U), - (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.gf.ld, Causes.load_guest_page_fault.U), - (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.ae.st, Causes.store_access.U), - (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.ae.ld, Causes.load_access.U), - (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.ma.st, Causes.misaligned_store.U), - (wb_reg_valid && wb_ctrl.mem && io.dmem.s2_xcpt.ma.ld, Causes.misaligned_load.U) - )) - - val wbCoverCauses = List( - (Causes.misaligned_store, "MISALIGNED_STORE"), - (Causes.misaligned_load, "MISALIGNED_LOAD"), - (Causes.store_access, "STORE_ACCESS"), - (Causes.load_access, "LOAD_ACCESS") - ) ++ (if(usingVM) List( - (Causes.store_page_fault, "STORE_PAGE_FAULT"), - (Causes.load_page_fault, "LOAD_PAGE_FAULT") - ) else Nil) ++ (if (usingHypervisor) List( - (Causes.store_guest_page_fault, "STORE_GUEST_PAGE_FAULT"), - (Causes.load_guest_page_fault, "LOAD_GUEST_PAGE_FAULT"), - ) else Nil) - coverExceptions(wb_xcpt, wb_cause, "WRITEBACK", wbCoverCauses) - - val wb_pc_valid = wb_reg_valid || wb_reg_replay || wb_reg_xcpt - val wb_wxd = wb_reg_valid && wb_ctrl.wxd - val wb_set_sboard = wb_ctrl.div || wb_dcache_miss || wb_ctrl.rocc || wb_ctrl.vec - val 
replay_wb_common = io.dmem.s2_nack || wb_reg_replay - val replay_wb_rocc = wb_reg_valid && wb_ctrl.rocc && !io.rocc.cmd.ready - val replay_wb_csr: Bool = wb_reg_valid && csr.io.rw_stall - val replay_wb_vec = wb_reg_valid && io.vector.map(_.wb.replay).getOrElse(false.B) - val replay_wb = replay_wb_common || replay_wb_rocc || replay_wb_csr || replay_wb_vec - take_pc_wb := replay_wb || wb_xcpt || csr.io.eret || wb_reg_flush_pipe - - // writeback arbitration - val dmem_resp_xpu = !io.dmem.resp.bits.tag(0).asBool - val dmem_resp_fpu = io.dmem.resp.bits.tag(0).asBool - val dmem_resp_waddr = io.dmem.resp.bits.tag(5, 1) - val dmem_resp_valid = io.dmem.resp.valid && io.dmem.resp.bits.has_data - val dmem_resp_replay = dmem_resp_valid && io.dmem.resp.bits.replay - - class LLWB extends Bundle { - val data = UInt(xLen.W) - val tag = UInt(5.W) - } - - val ll_arb = Module(new Arbiter(new LLWB, 3)) // div, rocc, vec - ll_arb.io.in.foreach(_.valid := false.B) - ll_arb.io.in.foreach(_.bits := DontCare) - val ll_wdata = WireInit(ll_arb.io.out.bits.data) - val ll_waddr = WireInit(ll_arb.io.out.bits.tag) - val ll_wen = WireInit(ll_arb.io.out.fire) - ll_arb.io.out.ready := !wb_wxd - - div.io.resp.ready := ll_arb.io.in(0).ready - ll_arb.io.in(0).valid := div.io.resp.valid - ll_arb.io.in(0).bits.data := div.io.resp.bits.data - ll_arb.io.in(0).bits.tag := div.io.resp.bits.tag - - if (usingRoCC) { - io.rocc.resp.ready := ll_arb.io.in(1).ready - ll_arb.io.in(1).valid := io.rocc.resp.valid - ll_arb.io.in(1).bits.data := io.rocc.resp.bits.data - ll_arb.io.in(1).bits.tag := io.rocc.resp.bits.rd - } else { - // tie off RoCC - io.rocc.resp.ready := false.B - io.rocc.mem.req.ready := false.B - } - - io.vector.map { v => - v.resp.ready := Mux(v.resp.bits.fp, !(dmem_resp_valid && dmem_resp_fpu), ll_arb.io.in(2).ready) - ll_arb.io.in(2).valid := v.resp.valid && !v.resp.bits.fp - ll_arb.io.in(2).bits.data := v.resp.bits.data - ll_arb.io.in(2).bits.tag := v.resp.bits.rd - } - // Dont care mem since not 
all RoCC need accessing memory - io.rocc.mem := DontCare - - when (dmem_resp_replay && dmem_resp_xpu) { - ll_arb.io.out.ready := false.B - ll_waddr := dmem_resp_waddr - ll_wen := true.B - } - - val wb_valid = wb_reg_valid && !replay_wb && !wb_xcpt - val wb_wen = wb_valid && wb_ctrl.wxd - val rf_wen = wb_wen || ll_wen - val rf_waddr = Mux(ll_wen, ll_waddr, wb_waddr) - val rf_wdata = Mux(dmem_resp_valid && dmem_resp_xpu, io.dmem.resp.bits.data(xLen-1, 0), - Mux(ll_wen, ll_wdata, - Mux(wb_ctrl.csr =/= CSR.N, csr.io.rw.rdata, - Mux(wb_ctrl.mul, mul.map(_.io.resp.bits.data).getOrElse(wb_reg_wdata), - wb_reg_wdata)))) - when (rf_wen) { rf.write(rf_waddr, rf_wdata) } - - // hook up control/status regfile - csr.io.ungated_clock := clock - csr.io.decode(0).inst := id_inst(0) - csr.io.exception := wb_xcpt - csr.io.cause := wb_cause - csr.io.retire := wb_valid - csr.io.inst(0) := (if (usingCompressed) Cat(Mux(wb_reg_raw_inst(1, 0).andR, wb_reg_inst >> 16, 0.U), wb_reg_raw_inst(15, 0)) else wb_reg_inst) - csr.io.interrupts := io.interrupts - csr.io.hartid := io.hartid - io.fpu.fcsr_rm := csr.io.fcsr_rm - val vector_fcsr_flags = io.vector.map(_.set_fflags.bits).getOrElse(0.U(5.W)) - val vector_fcsr_flags_valid = io.vector.map(_.set_fflags.valid).getOrElse(false.B) - csr.io.fcsr_flags.valid := io.fpu.fcsr_flags.valid | vector_fcsr_flags_valid - csr.io.fcsr_flags.bits := (io.fpu.fcsr_flags.bits & Fill(5, io.fpu.fcsr_flags.valid)) | (vector_fcsr_flags & Fill(5, vector_fcsr_flags_valid)) - io.fpu.time := csr.io.time(31,0) - io.fpu.hartid := io.hartid - csr.io.rocc_interrupt := io.rocc.interrupt - csr.io.pc := wb_reg_pc - - val tval_dmem_addr = !wb_reg_xcpt - val tval_any_addr = tval_dmem_addr || - wb_reg_cause.isOneOf(Causes.breakpoint.U, Causes.fetch_access.U, Causes.fetch_page_fault.U, Causes.fetch_guest_page_fault.U) - val tval_inst = wb_reg_cause === Causes.illegal_instruction.U - val tval_valid = wb_xcpt && (tval_any_addr || tval_inst) - csr.io.gva := wb_xcpt && (tval_any_addr 
&& csr.io.status.v || tval_dmem_addr && wb_reg_hls_or_dv) - csr.io.tval := Mux(tval_valid, encodeVirtualAddress(wb_reg_wdata, wb_reg_wdata), 0.U) - val (htval, mhtinst_read_pseudo) = { - val htval_valid_imem = wb_reg_xcpt && wb_reg_cause === Causes.fetch_guest_page_fault.U - val htval_imem = Mux(htval_valid_imem, io.imem.gpa.bits, 0.U) - assert(!htval_valid_imem || io.imem.gpa.valid) - - val htval_valid_dmem = wb_xcpt && tval_dmem_addr && io.dmem.s2_xcpt.gf.asUInt.orR && !io.dmem.s2_xcpt.pf.asUInt.orR - val htval_dmem = Mux(htval_valid_dmem, io.dmem.s2_gpa, 0.U) - - val htval = (htval_dmem | htval_imem) >> hypervisorExtraAddrBits - // read pseudoinstruction if a guest-page fault is caused by an implicit memory access for VS-stage address translation - val mhtinst_read_pseudo = (io.imem.gpa_is_pte && htval_valid_imem) || (io.dmem.s2_gpa_is_pte && htval_valid_dmem) - (htval, mhtinst_read_pseudo) - } - - csr.io.vector.foreach { v => - v.set_vconfig.valid := wb_reg_set_vconfig && wb_reg_valid - v.set_vconfig.bits := wb_reg_rs2.asTypeOf(new VConfig) - v.set_vs_dirty := wb_valid && wb_ctrl.vec - v.set_vstart.valid := wb_valid && wb_reg_set_vconfig - v.set_vstart.bits := 0.U - } - - io.vector.foreach { v => - when (v.wb.retire || v.wb.xcpt || wb_ctrl.vec) { - csr.io.pc := v.wb.pc - csr.io.retire := v.wb.retire - csr.io.inst(0) := v.wb.inst - when (v.wb.xcpt && !wb_reg_xcpt) { - wb_xcpt := true.B - wb_cause := v.wb.cause - csr.io.tval := v.wb.tval - } - } - v.wb.store_pending := io.dmem.store_pending - v.wb.vxrm := csr.io.vector.get.vxrm - v.wb.frm := csr.io.fcsr_rm - csr.io.vector.get.set_vxsat := v.set_vxsat - when (v.set_vconfig.valid) { - csr.io.vector.get.set_vconfig.valid := true.B - csr.io.vector.get.set_vconfig.bits := v.set_vconfig.bits - } - when (v.set_vstart.valid) { - csr.io.vector.get.set_vstart.valid := true.B - csr.io.vector.get.set_vstart.bits := v.set_vstart.bits - } - } - - csr.io.htval := htval - csr.io.mhtinst_read_pseudo := mhtinst_read_pseudo - 
io.ptw.ptbr := csr.io.ptbr - io.ptw.hgatp := csr.io.hgatp - io.ptw.vsatp := csr.io.vsatp - (io.ptw.customCSRs.csrs zip csr.io.customCSRs).map { case (lhs, rhs) => lhs <> rhs } - io.ptw.status := csr.io.status - io.ptw.hstatus := csr.io.hstatus - io.ptw.gstatus := csr.io.gstatus - io.ptw.pmp := csr.io.pmp - csr.io.rw.addr := wb_reg_inst(31,20) - csr.io.rw.cmd := CSR.maskCmd(wb_reg_valid, wb_ctrl.csr) - csr.io.rw.wdata := wb_reg_wdata - - - io.rocc.csrs <> csr.io.roccCSRs - io.trace.time := csr.io.time - io.trace.insns := csr.io.trace - if (rocketParams.debugROB.isDefined) { - val sz = rocketParams.debugROB.get.size - if (sz < 1) { // use unsynthesizable ROB - val csr_trace_with_wdata = WireInit(csr.io.trace(0)) - csr_trace_with_wdata.wdata.get := rf_wdata - val should_wb = WireInit((wb_ctrl.wfd || (wb_ctrl.wxd && wb_waddr =/= 0.U)) && !csr.io.trace(0).exception) - val has_wb = WireInit(wb_ctrl.wxd && wb_wen && !wb_set_sboard) - val wb_addr = WireInit(wb_waddr + Mux(wb_ctrl.wfd, 32.U, 0.U)) - - io.vector.foreach { v => when (v.wb.retire) { - should_wb := v.wb.rob_should_wb - has_wb := false.B - wb_addr := Cat(v.wb.rob_should_wb_fp, csr_trace_with_wdata.insn(11,7)) - }} - - DebugROB.pushTrace(clock, reset, - io.hartid, csr_trace_with_wdata, - should_wb, has_wb, wb_addr) - - io.trace.insns(0) := DebugROB.popTrace(clock, reset, io.hartid) - - DebugROB.pushWb(clock, reset, io.hartid, ll_wen, rf_waddr, rf_wdata) - } else { // synthesizable ROB (no FPRs) - require(!usingVector, "Synthesizable ROB does not support vector implementations") - val csr_trace_with_wdata = WireInit(csr.io.trace(0)) - csr_trace_with_wdata.wdata.get := rf_wdata - - val debug_rob = Module(new HardDebugROB(sz, 32)) - debug_rob.io.i_insn := csr_trace_with_wdata - debug_rob.io.should_wb := (wb_ctrl.wfd || (wb_ctrl.wxd && wb_waddr =/= 0.U)) && - !csr.io.trace(0).exception - debug_rob.io.has_wb := wb_ctrl.wxd && wb_wen && !wb_set_sboard - debug_rob.io.tag := wb_waddr + Mux(wb_ctrl.wfd, 32.U, 0.U) - - 
debug_rob.io.wb_val := ll_wen - debug_rob.io.wb_tag := rf_waddr - debug_rob.io.wb_data := rf_wdata - - io.trace.insns(0) := debug_rob.io.o_insn - } - } else { - io.trace.insns := csr.io.trace - } - for (((iobpw, wphit), bp) <- io.bpwatch zip wb_reg_wphit zip csr.io.bp) { - iobpw.valid(0) := wphit - iobpw.action := bp.control.action - // tie off bpwatch valids - iobpw.rvalid.foreach(_ := false.B) - iobpw.wvalid.foreach(_ := false.B) - iobpw.ivalid.foreach(_ := false.B) - } - - val hazard_targets = Seq((id_ctrl.rxs1 && id_raddr1 =/= 0.U, id_raddr1), - (id_ctrl.rxs2 && id_raddr2 =/= 0.U, id_raddr2), - (id_ctrl.wxd && id_waddr =/= 0.U, id_waddr)) - val fp_hazard_targets = Seq((io.fpu.dec.ren1, id_raddr1), - (io.fpu.dec.ren2, id_raddr2), - (io.fpu.dec.ren3, id_raddr3), - (io.fpu.dec.wen, id_waddr)) - - val sboard = new Scoreboard(32, true) - sboard.clear(ll_wen, ll_waddr) - def id_sboard_clear_bypass(r: UInt) = { - // ll_waddr arrives late when D$ has ECC, so reshuffle the hazard check - if (!tileParams.dcache.get.dataECC.isDefined) ll_wen && ll_waddr === r - else div.io.resp.fire && div.io.resp.bits.tag === r || dmem_resp_replay && dmem_resp_xpu && dmem_resp_waddr === r - } - val id_sboard_hazard = checkHazards(hazard_targets, rd => sboard.read(rd) && !id_sboard_clear_bypass(rd)) - sboard.set(wb_set_sboard && wb_wen, wb_waddr) - - // stall for RAW/WAW hazards on CSRs, loads, AMOs, and mul/div in execute stage. - val ex_cannot_bypass = ex_ctrl.csr =/= CSR.N || ex_ctrl.jalr || ex_ctrl.mem || ex_ctrl.mul || ex_ctrl.div || ex_ctrl.fp || ex_ctrl.rocc || ex_ctrl.vec - val data_hazard_ex = ex_ctrl.wxd && checkHazards(hazard_targets, _ === ex_waddr) - val fp_data_hazard_ex = id_ctrl.fp && ex_ctrl.wfd && checkHazards(fp_hazard_targets, _ === ex_waddr) - val id_ex_hazard = ex_reg_valid && (data_hazard_ex && ex_cannot_bypass || fp_data_hazard_ex) - - // stall for RAW/WAW hazards on CSRs, LB/LH, and mul/div in memory stage. 
- val mem_mem_cmd_bh = - if (fastLoadWord) (!fastLoadByte).B && mem_reg_slow_bypass - else true.B - val mem_cannot_bypass = mem_ctrl.csr =/= CSR.N || mem_ctrl.mem && mem_mem_cmd_bh || mem_ctrl.mul || mem_ctrl.div || mem_ctrl.fp || mem_ctrl.rocc || mem_ctrl.vec - val data_hazard_mem = mem_ctrl.wxd && checkHazards(hazard_targets, _ === mem_waddr) - val fp_data_hazard_mem = id_ctrl.fp && mem_ctrl.wfd && checkHazards(fp_hazard_targets, _ === mem_waddr) - val id_mem_hazard = mem_reg_valid && (data_hazard_mem && mem_cannot_bypass || fp_data_hazard_mem) - id_load_use := mem_reg_valid && data_hazard_mem && mem_ctrl.mem - val id_vconfig_hazard = id_ctrl.vec && ( - (ex_reg_valid && ex_reg_set_vconfig) || - (mem_reg_valid && mem_reg_set_vconfig) || - (wb_reg_valid && wb_reg_set_vconfig)) - - // stall for RAW/WAW hazards on load/AMO misses and mul/div in writeback. - val data_hazard_wb = wb_ctrl.wxd && checkHazards(hazard_targets, _ === wb_waddr) - val fp_data_hazard_wb = id_ctrl.fp && wb_ctrl.wfd && checkHazards(fp_hazard_targets, _ === wb_waddr) - val id_wb_hazard = wb_reg_valid && (data_hazard_wb && wb_set_sboard || fp_data_hazard_wb) - - val id_stall_fpu = if (usingFPU) { - val fp_sboard = new Scoreboard(32) - fp_sboard.set(((wb_dcache_miss || wb_ctrl.vec) && wb_ctrl.wfd || io.fpu.sboard_set) && wb_valid, wb_waddr) - val v_ll = io.vector.map(v => v.resp.fire && v.resp.bits.fp).getOrElse(false.B) - fp_sboard.clear((dmem_resp_replay && dmem_resp_fpu) || v_ll, io.fpu.ll_resp_tag) - fp_sboard.clear(io.fpu.sboard_clr, io.fpu.sboard_clra) - - checkHazards(fp_hazard_targets, fp_sboard.read _) - } else false.B - - val dcache_blocked = { - // speculate that a blocked D$ will unblock the cycle after a Grant - val blocked = Reg(Bool()) - blocked := !io.dmem.req.ready && io.dmem.clock_enabled && !io.dmem.perf.grant && (blocked || io.dmem.req.valid || io.dmem.s2_nack) - blocked && !io.dmem.perf.grant - } - val rocc_blocked = Reg(Bool()) - rocc_blocked := !wb_xcpt && !io.rocc.cmd.ready 
&& (io.rocc.cmd.valid || rocc_blocked) - - val ctrl_stalld = - id_ex_hazard || id_mem_hazard || id_wb_hazard || id_sboard_hazard || - id_vconfig_hazard || - csr.io.singleStep && (ex_reg_valid || mem_reg_valid || wb_reg_valid) || - id_csr_en && csr.io.decode(0).fp_csr && !io.fpu.fcsr_rdy || - id_csr_en && csr.io.decode(0).vector_csr && id_vec_busy || - id_ctrl.fp && id_stall_fpu || - id_ctrl.mem && dcache_blocked || // reduce activity during D$ misses - id_ctrl.rocc && rocc_blocked || // reduce activity while RoCC is busy - id_ctrl.div && (!(div.io.req.ready || (div.io.resp.valid && !wb_wxd)) || div.io.req.valid) || // reduce odds of replay - !clock_en || - id_do_fence || io.rocc.busy || - csr.io.csr_stall || - id_reg_pause || - io.traceStall - ctrl_killd := !ibuf.io.inst(0).valid || ibuf.io.inst(0).bits.replay || take_pc_mem_wb || ctrl_stalld || csr.io.interrupt - - io.imem.req.valid := take_pc - io.imem.req.bits.speculative := !take_pc_wb - io.imem.req.bits.pc := - Mux(wb_xcpt || csr.io.eret, csr.io.evec, // exception or [m|s]ret - Mux(replay_wb, wb_reg_pc, // replay - mem_npc)) // flush or branch misprediction - io.imem.flush_icache := wb_reg_valid && wb_ctrl.fence_i && !io.dmem.s2_nack - io.imem.might_request := { - imem_might_request_reg := ex_pc_valid || mem_pc_valid || io.ptw.customCSRs.disableICacheClockGate || io.vector.map(_.trap_check_busy).getOrElse(false.B) - imem_might_request_reg - } - io.imem.progress := RegNext(wb_reg_valid && !replay_wb_common) - io.imem.sfence.valid := wb_reg_valid && wb_reg_sfence - io.imem.sfence.bits.rs1 := wb_reg_mem_size(0) - io.imem.sfence.bits.rs2 := wb_reg_mem_size(1) - io.imem.sfence.bits.addr := wb_reg_wdata - io.imem.sfence.bits.asid := wb_reg_rs2 - io.imem.sfence.bits.hv := wb_reg_hfence_v - io.imem.sfence.bits.hg := wb_reg_hfence_g - io.ptw.sfence := io.imem.sfence - - ibuf.io.inst(0).ready := !ctrl_stalld - - io.imem.btb_update.valid := mem_reg_valid && !take_pc_wb && mem_wrong_npc && (!mem_cfi || mem_cfi_taken) - 
io.imem.btb_update.bits.isValid := mem_cfi - io.imem.btb_update.bits.cfiType := - Mux((mem_ctrl.jal || mem_ctrl.jalr) && mem_waddr(0), CFIType.call, - Mux(mem_ctrl.jalr && (mem_reg_inst(19,15) & regAddrMask.U) === BitPat("b00?01"), CFIType.ret, - Mux(mem_ctrl.jal || mem_ctrl.jalr, CFIType.jump, - CFIType.branch))) - io.imem.btb_update.bits.target := io.imem.req.bits.pc - io.imem.btb_update.bits.br_pc := (if (usingCompressed) mem_reg_pc + Mux(mem_reg_rvc, 0.U, 2.U) else mem_reg_pc) - io.imem.btb_update.bits.pc := ~(~io.imem.btb_update.bits.br_pc | (coreInstBytes*fetchWidth-1).U) - io.imem.btb_update.bits.prediction := mem_reg_btb_resp - io.imem.btb_update.bits.taken := DontCare - - io.imem.bht_update.valid := mem_reg_valid && !take_pc_wb - io.imem.bht_update.bits.pc := io.imem.btb_update.bits.pc - io.imem.bht_update.bits.taken := mem_br_taken - io.imem.bht_update.bits.mispredict := mem_wrong_npc - io.imem.bht_update.bits.branch := mem_ctrl.branch - io.imem.bht_update.bits.prediction := mem_reg_btb_resp.bht - - // Connect RAS in Frontend - io.imem.ras_update := DontCare - - io.fpu.valid := !ctrl_killd && id_ctrl.fp - io.fpu.killx := ctrl_killx - io.fpu.killm := killm_common - io.fpu.inst := id_inst(0) - io.fpu.fromint_data := ex_rs(0) - io.fpu.ll_resp_val := dmem_resp_valid && dmem_resp_fpu - io.fpu.ll_resp_data := (if (minFLen == 32) io.dmem.resp.bits.data_word_bypass else io.dmem.resp.bits.data) - io.fpu.ll_resp_type := io.dmem.resp.bits.size - io.fpu.ll_resp_tag := dmem_resp_waddr - io.fpu.keep_clock_enabled := io.ptw.customCSRs.disableCoreClockGate - - io.fpu.v_sew := csr.io.vector.map(_.vconfig.vtype.vsew).getOrElse(0.U) - - io.vector.map { v => - when (!(dmem_resp_valid && dmem_resp_fpu)) { - io.fpu.ll_resp_val := v.resp.valid && v.resp.bits.fp - io.fpu.ll_resp_data := v.resp.bits.data - io.fpu.ll_resp_type := v.resp.bits.size - io.fpu.ll_resp_tag := v.resp.bits.rd - } - } - - io.vector.foreach { v => - v.ex.valid := ex_reg_valid && (ex_ctrl.vec || 
rocketParams.vector.get.issueVConfig.B && ex_reg_set_vconfig) && !ctrl_killx - v.ex.inst := ex_reg_inst - v.ex.vconfig := csr.io.vector.get.vconfig - v.ex.vstart := Mux(mem_reg_valid && mem_ctrl.vec || wb_reg_valid && wb_ctrl.vec, 0.U, csr.io.vector.get.vstart) - v.ex.rs1 := ex_rs(0) - v.ex.rs2 := ex_rs(1) - v.ex.pc := ex_reg_pc - v.mem.frs1 := io.fpu.store_data - v.killm := killm_common - v.status := csr.io.status - } - - - io.dmem.req.valid := ex_reg_valid && ex_ctrl.mem - val ex_dcache_tag = Cat(ex_waddr, ex_ctrl.fp) - require(coreParams.dcacheReqTagBits >= ex_dcache_tag.getWidth) - io.dmem.req.bits.tag := ex_dcache_tag - io.dmem.req.bits.cmd := ex_ctrl.mem_cmd - io.dmem.req.bits.size := ex_reg_mem_size - io.dmem.req.bits.signed := !Mux(ex_reg_hls, ex_reg_inst(20), ex_reg_inst(14)) - io.dmem.req.bits.phys := false.B - io.dmem.req.bits.addr := encodeVirtualAddress(ex_rs(0), alu.io.adder_out) - io.dmem.req.bits.idx.foreach(_ := io.dmem.req.bits.addr) - io.dmem.req.bits.dprv := Mux(ex_reg_hls, csr.io.hstatus.spvp, csr.io.status.dprv) - io.dmem.req.bits.dv := ex_reg_hls || csr.io.status.dv - io.dmem.req.bits.no_resp := !isRead(ex_ctrl.mem_cmd) || (!ex_ctrl.fp && ex_waddr === 0.U) - io.dmem.req.bits.no_alloc := DontCare - io.dmem.req.bits.no_xcpt := DontCare - io.dmem.req.bits.data := DontCare - io.dmem.req.bits.mask := DontCare - - io.dmem.s1_data.data := (if (fLen == 0) mem_reg_rs2 else Mux(mem_ctrl.fp, Fill(coreDataBits / fLen, io.fpu.store_data), mem_reg_rs2)) - io.dmem.s1_data.mask := DontCare - - io.dmem.s1_kill := killm_common || mem_ldst_xcpt || fpu_kill_mem || vec_kill_mem - io.dmem.s2_kill := false.B - // don't let D$ go to sleep if we're probably going to use it soon - io.dmem.keep_clock_enabled := ibuf.io.inst(0).valid && id_ctrl.mem && !csr.io.csr_stall - - io.rocc.cmd.valid := wb_reg_valid && wb_ctrl.rocc && !replay_wb_common - io.rocc.exception := wb_xcpt && csr.io.status.xs.orR - io.rocc.cmd.bits.status := csr.io.status - io.rocc.cmd.bits.inst := 
wb_reg_inst.asTypeOf(new RoCCInstruction()) - io.rocc.cmd.bits.rs1 := wb_reg_wdata - io.rocc.cmd.bits.rs2 := wb_reg_rs2 - - // gate the clock - val unpause = csr.io.time(rocketParams.lgPauseCycles-1, 0) === 0.U || csr.io.inhibit_cycle || io.dmem.perf.release || take_pc - when (unpause) { id_reg_pause := false.B } - io.cease := csr.io.status.cease && !clock_en_reg - io.wfi := csr.io.status.wfi - if (rocketParams.clockGate) { - long_latency_stall := csr.io.csr_stall || io.dmem.perf.blocked || id_reg_pause && !unpause - clock_en := clock_en_reg || ex_pc_valid || (!long_latency_stall && io.imem.resp.valid) - clock_en_reg := - ex_pc_valid || mem_pc_valid || wb_pc_valid || // instruction in flight - io.ptw.customCSRs.disableCoreClockGate || // chicken bit - !div.io.req.ready || // mul/div in flight - usingFPU.B && !io.fpu.fcsr_rdy || // long-latency FPU in flight - io.dmem.replay_next || // long-latency load replaying - (!long_latency_stall && (ibuf.io.inst(0).valid || io.imem.resp.valid)) // instruction pending - - assert(!(ex_pc_valid || mem_pc_valid || wb_pc_valid) || clock_en) - } - - // evaluate performance counters - val icache_blocked = !(io.imem.resp.valid || RegNext(io.imem.resp.valid)) - csr.io.counters foreach { c => c.inc := RegNext(perfEvents.evaluate(c.eventSel)) } - - val coreMonitorBundle = Wire(new CoreMonitorBundle(xLen, fLen)) - - coreMonitorBundle.clock := clock - coreMonitorBundle.reset := reset - coreMonitorBundle.hartid := io.hartid - coreMonitorBundle.timer := csr.io.time(31,0) - coreMonitorBundle.valid := csr.io.trace(0).valid && !csr.io.trace(0).exception - coreMonitorBundle.pc := csr.io.trace(0).iaddr(vaddrBitsExtended-1, 0).sextTo(xLen) - coreMonitorBundle.wrenx := wb_wen && !wb_set_sboard - coreMonitorBundle.wrenf := false.B - coreMonitorBundle.wrdst := wb_waddr - coreMonitorBundle.wrdata := rf_wdata - coreMonitorBundle.rd0src := wb_reg_inst(19,15) - coreMonitorBundle.rd0val := RegNext(RegNext(ex_rs(0))) - coreMonitorBundle.rd1src := 
wb_reg_inst(24,20) - coreMonitorBundle.rd1val := RegNext(RegNext(ex_rs(1))) - coreMonitorBundle.inst := csr.io.trace(0).insn - coreMonitorBundle.excpt := csr.io.trace(0).exception - coreMonitorBundle.priv_mode := csr.io.trace(0).priv - - if (enableCommitLog) { - val t = csr.io.trace(0) - val rd = wb_waddr - val wfd = wb_ctrl.wfd - val wxd = wb_ctrl.wxd - val has_data = wb_wen && !wb_set_sboard - - when (t.valid && !t.exception) { - when (wfd) { - printf ("%d 0x%x (0x%x) f%d p%d 0xXXXXXXXXXXXXXXXX\n", t.priv, t.iaddr, t.insn, rd, rd+32.U) - } - .elsewhen (wxd && rd =/= 0.U && has_data) { - printf ("%d 0x%x (0x%x) x%d 0x%x\n", t.priv, t.iaddr, t.insn, rd, rf_wdata) - } - .elsewhen (wxd && rd =/= 0.U && !has_data) { - printf ("%d 0x%x (0x%x) x%d p%d 0xXXXXXXXXXXXXXXXX\n", t.priv, t.iaddr, t.insn, rd, rd) - } - .otherwise { - printf ("%d 0x%x (0x%x)\n", t.priv, t.iaddr, t.insn) - } - } - - when (ll_wen && rf_waddr =/= 0.U) { - printf ("x%d p%d 0x%x\n", rf_waddr, rf_waddr, rf_wdata) - } - } - else { - when (csr.io.trace(0).valid) { - printf("C%d: %d [%d] pc=[%x] W[r%d=%x][%d] R[r%d=%x] R[r%d=%x] inst=[%x] DASM(%x) wb_xcpt:%d\n", - io.hartid, coreMonitorBundle.timer, coreMonitorBundle.valid, - coreMonitorBundle.pc, - Mux(wb_ctrl.wxd || wb_ctrl.wfd, coreMonitorBundle.wrdst, 0.U), - Mux(coreMonitorBundle.wrenx, coreMonitorBundle.wrdata, 0.U), - coreMonitorBundle.wrenx, - Mux(wb_ctrl.rxs1 || wb_ctrl.rfs1, coreMonitorBundle.rd0src, 0.U), - Mux(wb_ctrl.rxs1 || wb_ctrl.rfs1, coreMonitorBundle.rd0val, 0.U), - Mux(wb_ctrl.rxs2 || wb_ctrl.rfs2, coreMonitorBundle.rd1src, 0.U), - Mux(wb_ctrl.rxs2 || wb_ctrl.rfs2, coreMonitorBundle.rd1val, 0.U), - coreMonitorBundle.inst, coreMonitorBundle.inst, wb_xcpt) - } - } - - // CoreMonitorBundle for late latency writes - val xrfWriteBundle = Wire(new CoreMonitorBundle(xLen, fLen)) - - xrfWriteBundle.clock := clock - xrfWriteBundle.reset := reset - xrfWriteBundle.hartid := io.hartid - xrfWriteBundle.timer := csr.io.time(31,0) - 
xrfWriteBundle.valid := false.B - xrfWriteBundle.pc := 0.U - xrfWriteBundle.wrdst := rf_waddr - xrfWriteBundle.wrenx := rf_wen && !(csr.io.trace(0).valid && wb_wen && (wb_waddr === rf_waddr)) - xrfWriteBundle.wrenf := false.B - xrfWriteBundle.wrdata := rf_wdata - xrfWriteBundle.rd0src := 0.U - xrfWriteBundle.rd0val := 0.U - xrfWriteBundle.rd1src := 0.U - xrfWriteBundle.rd1val := 0.U - xrfWriteBundle.inst := 0.U - xrfWriteBundle.excpt := false.B - xrfWriteBundle.priv_mode := csr.io.trace(0).priv - - if (rocketParams.haveSimTimeout) PlusArg.timeout( - name = "max_core_cycles", - docstring = "Kill the emulation after INT rdtime cycles. Off if 0." - )(csr.io.time) - - } // leaving gated-clock domain - val rocketImpl = withClock (gated_clock) { new RocketImpl } - - def checkExceptions(x: Seq[(Bool, UInt)]) = - (WireInit(x.map(_._1).reduce(_||_)), WireInit(PriorityMux(x))) - - def coverExceptions(exceptionValid: Bool, cause: UInt, labelPrefix: String, coverCausesLabels: Seq[(Int, String)]): Unit = { - for ((coverCause, label) <- coverCausesLabels) { - property.cover(exceptionValid && (cause === coverCause.U), s"${labelPrefix}_${label}") - } - } - - def checkHazards(targets: Seq[(Bool, UInt)], cond: UInt => Bool) = - targets.map(h => h._1 && cond(h._2)).reduce(_||_) - - def encodeVirtualAddress(a0: UInt, ea: UInt) = if (vaddrBitsExtended == vaddrBits) ea else { - // efficient means to compress 64-bit VA into vaddrBits+1 bits - // (VA is bad if VA(vaddrBits) != VA(vaddrBits-1)) - val b = vaddrBitsExtended-1 - val a = (a0 >> b).asSInt - val msb = Mux(a === 0.S || a === -1.S, ea(b), !ea(b-1)) - Cat(msb, ea(b-1, 0)) - } - - class Scoreboard(n: Int, zero: Boolean = false) - { - def set(en: Bool, addr: UInt): Unit = update(en, _next | mask(en, addr)) - def clear(en: Bool, addr: UInt): Unit = update(en, _next & ~mask(en, addr)) - def read(addr: UInt): Bool = r(addr) - def readBypassed(addr: UInt): Bool = _next(addr) - - private val _r = RegInit(0.U(n.W)) - private val r = if 
(zero) (_r >> 1 << 1) else _r - private var _next = r - private var ens = false.B - private def mask(en: Bool, addr: UInt) = Mux(en, 1.U << addr, 0.U) - private def update(en: Bool, update: UInt) = { - _next = update - ens = ens || en - when (ens) { _r := _next } - } - } -} - -class RegFile(n: Int, w: Int, zero: Boolean = false) { - val rf = Mem(n, UInt(w.W)) - private def access(addr: UInt) = rf(~addr(log2Up(n)-1,0)) - private val reads = ArrayBuffer[(UInt,UInt)]() - private var canRead = true - def read(addr: UInt) = { - require(canRead) - reads += addr -> Wire(UInt()) - reads.last._2 := Mux(zero.B && addr === 0.U, 0.U, access(addr)) - reads.last._2 - } - def write(addr: UInt, data: UInt) = { - canRead = false - when (addr =/= 0.U) { - access(addr) := data - for ((raddr, rdata) <- reads) - when (addr === raddr) { rdata := data } - } - } -} - -object ImmGen { - def apply(sel: UInt, inst: UInt) = { - val sign = Mux(sel === IMM_Z, 0.S, inst(31).asSInt) - val b30_20 = Mux(sel === IMM_U, inst(30,20).asSInt, sign) - val b19_12 = Mux(sel =/= IMM_U && sel =/= IMM_UJ, sign, inst(19,12).asSInt) - val b11 = Mux(sel === IMM_U || sel === IMM_Z, 0.S, - Mux(sel === IMM_UJ, inst(20).asSInt, - Mux(sel === IMM_SB, inst(7).asSInt, sign))) - val b10_5 = Mux(sel === IMM_U || sel === IMM_Z, 0.U, inst(30,25)) - val b4_1 = Mux(sel === IMM_U, 0.U, - Mux(sel === IMM_S || sel === IMM_SB, inst(11,8), - Mux(sel === IMM_Z, inst(19,16), inst(24,21)))) - val b0 = Mux(sel === IMM_S, inst(7), - Mux(sel === IMM_I, inst(20), - Mux(sel === IMM_Z, inst(15), 0.U))) - - Cat(sign, b30_20, b19_12, b11, b10_5, b4_1, b0).asSInt - } -} diff --git a/arch/src/main/scala/framework/rocket/RocketSubsystem.scala b/arch/src/main/scala/framework/rocket/RocketSubsystem.scala deleted file mode 100644 index 6248477c..00000000 --- a/arch/src/main/scala/framework/rocket/RocketSubsystem.scala +++ /dev/null @@ -1,64 +0,0 @@ -// See LICENSE.SiFive for license details. 
- -package framework.rocket - -import org.chipsalliance.cde.config._ - -import freechips.rocketchip.subsystem._ -import freechips.rocketchip.devices.debug.HasPeripheryDebug -import freechips.rocketchip.devices.tilelink.{CanHavePeripheryCLINT, CanHavePeripheryPLIC} -import freechips.rocketchip.prci.{ResetCrossingType, NoResetCrossing, SynchronousCrossing, ClockCrossingType} -import freechips.rocketchip.tile.{RocketTile, RocketTileParams} -import freechips.rocketchip.util.HasCoreMonitorBundles - -import framework.rocket._ - -// currently, this RocketCrossingParamsBB is not used, we use the default RocketCrossingParams -case class RocketCrossingParamsBB( - crossingType: ClockCrossingType = SynchronousCrossing(), - master: HierarchicalElementPortParamsLike = HierarchicalElementMasterPortParams(), - slave: HierarchicalElementSlavePortParams = HierarchicalElementSlavePortParams(), - mmioBaseAddressPrefixWhere: TLBusWrapperLocation = CBUS, - resetCrossingType: ResetCrossingType = NoResetCrossing(), - forceSeparateClockReset: Boolean = false -) extends HierarchicalElementCrossingParamsLike - -case class RocketTileAttachParams( - tileParams: RocketTileParams, - crossingParams: RocketCrossingParams -) extends CanAttachTile { type TileType = RocketTile } - - -case class RocketTileAttachParamsBB( - tileParams: RocketTileParamsBB, - crossingParams: RocketCrossingParams -) extends CanAttachTile { type TileType = RocketTileBB } - -trait HasRocketTiles { - this: BaseSubsystem with InstantiatesHierarchicalElements => - val rocketTiles = totalTiles.values.collect { case r: RocketTile => r } - - def coreMonitorBundles = (rocketTiles map { t => - t.module.core.rocketImpl.coreMonitorBundle - }).toList -} - -class RocketSubsystem(implicit p: Parameters) extends BaseSubsystem - with InstantiatesHierarchicalElements - with HasTileNotificationSinks - with HasTileInputConstants - with CanHavePeripheryCLINT - with CanHavePeripheryPLIC - with HasPeripheryDebug - with 
HasHierarchicalElementsRootContext - with HasHierarchicalElements - with HasCoreMonitorBundles - with HasRocketTiles -{ - override lazy val module = new RocketSubsystemModuleImp(this) -} - -class RocketSubsystemModuleImp[+L <: RocketSubsystem](_outer: L) extends BaseSubsystemModuleImp(_outer) - with HasHierarchicalElementsRootContextModuleImp { - override lazy val outer = _outer -} diff --git a/arch/src/main/scala/framework/rocket/RocketTileBB.scala b/arch/src/main/scala/framework/rocket/RocketTileBB.scala deleted file mode 100644 index 346e6b19..00000000 --- a/arch/src/main/scala/framework/rocket/RocketTileBB.scala +++ /dev/null @@ -1,278 +0,0 @@ -// See LICENSE.SiFive for license details. -// See LICENSE.Berkeley for license details. - -package framework.rocket - -import chisel3._ - -import org.chipsalliance.cde.config._ -import org.chipsalliance.diplomacy.lazymodule._ -import freechips.rocketchip.rocket._ -import freechips.rocketchip.tile._ - -import freechips.rocketchip.devices.tilelink.{BasicBusBlockerParams, BasicBusBlocker} -import freechips.rocketchip.diplomacy.{ - AddressSet, DisableMonitors, BufferParams -} -import freechips.rocketchip.resources.{ - SimpleDevice, Description, - ResourceAnchors, ResourceBindings, ResourceBinding, Resource, ResourceAddress, -} -import freechips.rocketchip.interrupts.IntIdentityNode -import freechips.rocketchip.tilelink.{TLIdentityNode, TLBuffer} -import freechips.rocketchip.rocket.{ - RocketCoreParams, ICacheParams, DCacheParams, BTBParams, HasHellaCache, - HasICacheFrontend, ScratchpadSlavePort, HasICacheFrontendModule, Rocket -} -import freechips.rocketchip.subsystem.HierarchicalElementCrossingParamsLike -import freechips.rocketchip.prci.{ClockSinkParameters, RationalCrossing, ClockCrossingType} -import freechips.rocketchip.util.{Annotated, InOrderArbiter} - -import freechips.rocketchip.util.BooleanToAugmentedBoolean - -case class RocketTileBoundaryBufferParams(force: Boolean = false) - -case class RocketTileParamsBB( - 
core: RocketCoreParams = RocketCoreParams(), - icache: Option[ICacheParams] = Some(ICacheParams()), - dcache: Option[DCacheParams] = Some(DCacheParams()), - btb: Option[BTBParams] = Some(BTBParams()), - dataScratchpadBytes: Int = 0, - tileId: Int = 0, - beuAddr: Option[BigInt] = None, - blockerCtrlAddr: Option[BigInt] = None, - clockSinkParams: ClockSinkParameters = ClockSinkParameters(), - boundaryBuffers: Option[RocketTileBoundaryBufferParams] = None - ) extends InstantiableTileParams[RocketTileBB] { - require(icache.isDefined) - require(dcache.isDefined) - val baseName = "rockettile" - val uniqueName = s"${baseName}_$tileId" - def instantiate(crossing: HierarchicalElementCrossingParamsLike, lookup: LookupByHartIdImpl)(implicit p: Parameters): RocketTileBB = { - new RocketTileBB(this, crossing, lookup) - } -} - -class RocketTileBB private( - val rocketParams: RocketTileParamsBB, - crossing: ClockCrossingType, - lookup: LookupByHartIdImpl, - q: Parameters) - extends BaseTile(rocketParams, crossing, lookup, q) - with SinksExternalInterrupts - with SourcesExternalNotifications - with HasLazyRoCC // Use standard HasLazyRoCC instead of HasLazyRoCCBB - with HasHellaCache - with HasICacheFrontend -{ - // Private constructor ensures altered LazyModule.p is used implicitly - def this(params: RocketTileParamsBB, crossing: HierarchicalElementCrossingParamsLike, lookup: LookupByHartIdImpl)(implicit p: Parameters) = - this(params, crossing.crossingType, lookup, p) - - val intOutwardNode = rocketParams.beuAddr map { _ => IntIdentityNode() } - val slaveNode = TLIdentityNode() - val masterNode = visibilityNode - - val dtim_adapter = rocketParams.dcache.flatMap { d => d.scratch.map { s => - LazyModule(new ScratchpadSlavePort(AddressSet.misaligned(s, d.dataScratchpadBytes), lazyCoreParamsView.coreDataBytes, rocketParams.core.useAtomics && !rocketParams.core.useAtomicsOnlyForIO)) - }} - dtim_adapter.foreach(lm => connectTLSlave(lm.node, lm.node.portParams.head.beatBytes)) - - val 
bus_error_unit = rocketParams.beuAddr map { a => - val beu = LazyModule(new BusErrorUnit(new L1BusErrors, BusErrorUnitParams(a), xLen/8)) - intOutwardNode.get := beu.intNode - connectTLSlave(beu.node, xBytes) - beu - } - - val tile_master_blocker = - rocketParams.blockerCtrlAddr - .map(BasicBusBlockerParams(_, xBytes, masterPortBeatBytes, deadlock = true)) - .map(bp => LazyModule(new BasicBusBlocker(bp))) - - tile_master_blocker.foreach(lm => connectTLSlave(lm.controlNode, xBytes)) - - // TODO: this doesn't block other masters, e.g. RoCCs - tlOtherMastersNode := tile_master_blocker.map { _.node := tlMasterXbar.node } getOrElse { tlMasterXbar.node } - masterNode :=* tlOtherMastersNode - DisableMonitors { implicit p => tlSlaveXbar.node :*= slaveNode } - - nDCachePorts += 1 /*core */ + (dtim_adapter.isDefined).toInt + rocketParams.core.vector.map(_.useDCache.toInt).getOrElse(0) - - val dtimProperty = dtim_adapter.map(d => Map( - "sifive,dtim" -> d.device.asProperty)).getOrElse(Nil) - - val itimProperty = frontend.icache.itimProperty.toSeq.flatMap(p => Map("sifive,itim" -> p)) - - val beuProperty = bus_error_unit.map(d => Map( - "sifive,buserror" -> d.device.asProperty)).getOrElse(Nil) - - val cpuDevice: SimpleDevice = new SimpleDevice("cpu", Seq("sifive,rocket0", "riscv")) { - override def parent = Some(ResourceAnchors.cpus) - override def describe(resources: ResourceBindings): Description = { - val Description(name, mapping) = super.describe(resources) - Description(name, mapping ++ cpuProperties ++ nextLevelCacheProperty - ++ tileProperties ++ dtimProperty ++ itimProperty ++ beuProperty) - } - } - - val vector_unit = rocketParams.core.vector.map(v => LazyModule(v.build(p))) - vector_unit.foreach(vu => tlMasterXbar.node :=* vu.atlNode) - vector_unit.foreach(vu => tlOtherMastersNode :=* vu.tlNode) - - - ResourceBinding { - Resource(cpuDevice, "reg").bind(ResourceAddress(rocketParams.tileId)) - } - - override lazy val module = new RocketTileModuleImpBB(this) - - 
override def makeMasterBoundaryBuffers(crossing: ClockCrossingType)(implicit p: Parameters) = (rocketParams.boundaryBuffers, crossing) match { - case (Some(RocketTileBoundaryBufferParams(true )), _) => TLBuffer() - case (Some(RocketTileBoundaryBufferParams(false)), _: RationalCrossing) => TLBuffer(BufferParams.none, BufferParams.flow, BufferParams.none, BufferParams.flow, BufferParams(1)) - case _ => TLBuffer(BufferParams.none) - } - - override def makeSlaveBoundaryBuffers(crossing: ClockCrossingType)(implicit p: Parameters) = (rocketParams.boundaryBuffers, crossing) match { - case (Some(RocketTileBoundaryBufferParams(true )), _) => TLBuffer() - case (Some(RocketTileBoundaryBufferParams(false)), _: RationalCrossing) => TLBuffer(BufferParams.flow, BufferParams.none, BufferParams.none, BufferParams.none, BufferParams.none) - case _ => TLBuffer(BufferParams.none) - } -} - -class RocketTileModuleImpBB(outer: RocketTileBB) extends BaseTileModuleImp(outer) - with HasFpuOptBB - with HasLazyRoCCModuleBB // Use HasLazyRoCCModuleBB but with standard RoCC types - with HasICacheFrontendModule { - Annotated.params(this, outer.rocketParams) - - val core = Module(new RocketBB(outer)(outer.p)) - // Create RocketBB with modified parameters that include BuildRoCCBB as BuildRoCC - // We override the useRoCC and dcacheArbPorts to include BuildRoCCBB - // if we override after in RocketTileBB it will be too late - // that other modules like dcache will use the original parameters - // ================================================================================= - // implicit val modifiedP: Parameters = outer.p.alterMap(Map( - // BuildRoCC -> (outer.p(BuildRoCC) ++ outer.p(BuildRoCCBB)) - // )) - // val core = Module(new RocketBB(outer)(modifiedP)) - // ================================================================================= - outer.vector_unit.foreach { v => - core.io.vector.get <> v.module.io.core - v.module.io.tlb <> outer.dcache.module.io.tlb_port - } - - // reset vector 
is connected in the Frontend to s2_pc - core.io.reset_vector := DontCare - - // Report unrecoverable error conditions; for now the only cause is cache ECC errors - outer.reportHalt(List(outer.dcache.module.io.errors)) - - // Report when the tile has ceased to retire instructions; for now the only cause is clock gating - outer.reportCease(outer.rocketParams.core.clockGate.option( - !outer.dcache.module.io.cpu.clock_enabled && - !outer.frontend.module.io.cpu.clock_enabled && - !ptw.io.dpath.clock_enabled && - core.io.cease)) - - outer.reportWFI(Some(core.io.wfi)) - - outer.decodeCoreInterrupts(core.io.interrupts) // Decode the interrupt vector - - outer.bus_error_unit.foreach { beu => - core.io.interrupts.buserror.get := beu.module.io.interrupt - beu.module.io.errors.dcache := outer.dcache.module.io.errors - beu.module.io.errors.icache := outer.frontend.module.io.errors - } - - core.io.interrupts.nmi.foreach { nmi => nmi := outer.nmiSinkNode.get.bundle } - - // Pass through various external constants and reports that were bundle-bridged into the tile - outer.traceSourceNode.bundle <> core.io.trace - core.io.traceStall := outer.traceAuxSinkNode.bundle.stall - outer.bpwatchSourceNode.bundle <> core.io.bpwatch - core.io.hartid := outer.hartIdSinkNode.bundle - require(core.io.hartid.getWidth >= outer.hartIdSinkNode.bundle.getWidth, - s"core hartid wire (${core.io.hartid.getWidth}b) truncates external hartid wire (${outer.hartIdSinkNode.bundle.getWidth}b)") - - // Connect the core pipeline to other intra-tile modules - outer.frontend.module.io.cpu <> core.io.imem - dcachePorts += core.io.dmem // TODO outer.dcachePorts += () => module.core.io.dmem ?? 
- fpuOpt foreach { fpu => - core.io.fpu :<>= fpu.io.waiveAs[FPUCoreIO](_.cp_req, _.cp_resp) - } - if (fpuOpt.isEmpty) { - core.io.fpu := DontCare - } - outer.vector_unit foreach { v => if (outer.rocketParams.core.vector.get.useDCache) { - dcachePorts += v.module.io.dmem - } else { - v.module.io.dmem := DontCare - } } - core.io.ptw <> ptw.io.dpath - - // Connect the coprocessor interfaces - if (outer.roccs.size > 0) { - cmdRouter.get.io.in <> core.io.rocc.cmd - outer.roccs.foreach{ lm => - lm.module.io.exception := core.io.rocc.exception - lm.module.io.fpu_req.ready := DontCare - lm.module.io.fpu_resp.valid := DontCare - lm.module.io.fpu_resp.bits.data := DontCare - lm.module.io.fpu_resp.bits.exc := DontCare - } - core.io.rocc.resp <> respArb.get.io.out - core.io.rocc.busy <> (cmdRouter.get.io.busy || outer.roccs.map(_.module.io.busy).reduce(_ || _)) - core.io.rocc.interrupt := outer.roccs.map(_.module.io.interrupt).reduce(_ || _) - (core.io.rocc.csrs zip roccCSRIOs.flatten).foreach { t => t._2 <> t._1 } - } else { - // tie off - core.io.rocc.cmd.ready := false.B - core.io.rocc.resp.valid := false.B - core.io.rocc.resp.bits := DontCare - core.io.rocc.busy := DontCare - core.io.rocc.interrupt := DontCare - } - // Dont care mem since not all RoCC need accessing memory - core.io.rocc.mem := DontCare - - // Rocket has higher priority to DTIM than other TileLink clients - outer.dtim_adapter.foreach { lm => dcachePorts += lm.module.io.dmem } - - // TODO eliminate this redundancy - val h = dcachePorts.size - val c = core.dcacheArbPorts - val o = outer.nDCachePorts - require(h == c, s"port list size was $h, core expected $c") - require(h == o, s"port list size was $h, outer counted $o") - // TODO figure out how to move the below into their respective mix-ins - dcacheArb.io.requestor <> dcachePorts.toSeq - ptw.io.requestor <> ptwPorts.toSeq -} - -trait HasFpuOptBB { this: RocketTileModuleImpBB => - val fpuOpt = outer.rocketParams.core.fpu.map(params => Module(new 
FPU(params)(outer.p))) - fpuOpt.foreach { fpu => - val nRoCCFPUPorts = outer.roccs.count(_.usesFPU) - val nFPUPorts = nRoCCFPUPorts + outer.rocketParams.core.useVector.toInt - if (nFPUPorts > 0) { - val fpArb = Module(new InOrderArbiter(new FPInput()(outer.p), new FPResult()(outer.p), nFPUPorts)) - fpu.io.cp_req <> fpArb.io.out_req - fpArb.io.out_resp <> fpu.io.cp_resp - - val fp_rocc_ios = outer.roccs.filter(_.usesFPU).map(_.module.io) - for (i <- 0 until nRoCCFPUPorts) { - fpArb.io.in_req(i) <> fp_rocc_ios(i).fpu_req - fp_rocc_ios(i).fpu_resp <> fpArb.io.in_resp(i) - } - outer.vector_unit.foreach(vu => { - fpArb.io.in_req(nRoCCFPUPorts) <> vu.module.io.fp_req - vu.module.io.fp_resp <> fpArb.io.in_resp(nRoCCFPUPorts) - }) - } else { - fpu.io.cp_req.valid := false.B - fpu.io.cp_req.bits := DontCare - fpu.io.cp_resp.ready := false.B - } - } -} diff --git a/arch/src/main/scala/framework/switcher/README.md b/arch/src/main/scala/framework/switcher/README.md deleted file mode 100644 index d2092788..00000000 --- a/arch/src/main/scala/framework/switcher/README.md +++ /dev/null @@ -1,69 +0,0 @@ -# Switcher Module - -This directory contains two small but critical adapters that translate between Ball devices' physical memory ports and a unified "virtual line" representation. - -- `ToVirtualLine`: Merges SPAD/ACC physical ports into a uniform virtual interface carrying metadata (`is_acc`, `bank_id`, `rob_id`). -- `ToPhysicalLine`: Restores the virtual interface back into physical SPAD/ACC ports, preserving data and `rob_id`. - -## Goals -- **Unified view**: Provide a single, consistent interface for memory requests and responses regardless of origin (SPAD or ACC). -- **Explicit metadata**: Attach `is_acc`, `bank_id`, and `rob_id` so downstream routing and accounting are straightforward. -- **Clear behavior**: Support either bank-sharing with arbitration or segmented direct mapping, depending on system choice. 
-- **Type correctness**: Keep widths and masks consistent; no implicit format conversions. - -## Interfaces -- Physical ports: - - `SramReadWithRobId(n, w)`: wraps `SramReadIO(n, w)` plus `rob_id`. - - `SramWriteWithRobId(n, w, mask_len)`: wraps `SramWriteIO(n, w, mask_len)` plus `rob_id`. -- Virtual ports: - - `SramReadWithInfo(n, w)`: `SramReadIO` plus `rob_id`, `is_acc` (Bool), `bank_id`. - - `SramWriteWithInfo(n, w, mask_len)`: `SramWriteIO` plus the same metadata. -- Widths: - - `rob_id` depends on `b.rob_entries`. - - `bank_id` wide enough for `b.sp_banks + b.acc_banks`. - - `is_acc` is a `Bool`. - -## ToVirtualLine -- Inputs: - - `sramRead_i/sramWrite_i`: size `b.sp_banks`. - - `accRead_i/accWrite_i`: size `b.acc_banks`. -- Outputs: - - `sramRead_o/sramWrite_o`: virtual lines. -- Port count (choose one design and keep consistent): - - **Max** design: `numBanks = max(b.sp_banks, b.acc_banks)`. Shared banks arbitrate (typically SPAD priority); tail banks connect to whichever exists. - - **Sum** design: `numBanks = b.sp_banks + b.acc_banks`. Lower range maps SPAD; upper range maps ACC. No arbitration; direct mapping with metadata. -- Read path: - - Drive `req` from physical to virtual; broadcast `resp` from virtual to the selected physical endpoint, using `ready` to consume. - - Set `rob_id/is_acc/bank_id` based on origin and index. Use `false.B/true.B` for `is_acc`. -- Write path: - - Map write `addr/data/mask` and metadata to virtual lines. - - No `resp` channel on writes; only handshake and field mapping. - -## ToPhysicalLine -- Inputs: - - `sramRead_i/sramWrite_i`: virtual lines (same size as ToVirtualLine outputs). -- Outputs: - - `sramRead_o/sramWrite_o`: size `b.sp_banks`. - - `accRead_o/accWrite_o`: size `b.acc_banks`. -- Routing: - - **Max** design: Use `is_acc` and `bank_id` to select the ACC bank; SPAD uses the virtual index `i`. 
- - **Sum** design: Lower (SPAD) and upper (ACC) segments map 1:1 back to physical ports; `bank_id` marks the internal index. -- Timing notes: - - If read `resp` and meta signals form a combinational loop, consider `RegNext` on meta to break it. - -## Integration -- In `bbus`, each Ball connects physical ports → `ToVirtualLine` → virtual interface. -- Then `ToPhysicalLine` restores to physical ports that connect to `MemRouter`. -- If a strict direct-connect behavior is desired, use the **Sum** design and ensure both sides agree on vector sizes. - -## Common Pitfalls -- Assigning `0.U/1.U` to `is_acc` (must be `false.B/true.B`). -- Using `io.sramRead_i(j)` instead of `io.accRead_i(i)` for ACC reads. -- Typos like `b.acc_Banks` (should be `b.acc_banks`). -- Dangling logical operators (e.g., trailing `||`) causing compile errors. -- Missing write-path mapping, leaving `ready` low and blocking writes. - -## Recommendations -- Decide upfront: **Max + arbitration** vs **Sum + segmented direct mapping** and keep both modules and top-level wiring consistent. -- Align priority policy (SPAD vs ACC) with system requirements if using shared-bank arbitration. -- Keep `rob_id` and `bank_id` consistent to avoid accounting and replay issues. 
diff --git a/arch/src/main/scala/framework/switcher/ToPhysicalLine.scala b/arch/src/main/scala/framework/switcher/ToPhysicalLine.scala deleted file mode 100644 index 1307e7e8..00000000 --- a/arch/src/main/scala/framework/switcher/ToPhysicalLine.scala +++ /dev/null @@ -1,149 +0,0 @@ -package framework.switcher - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.blink.{SramReadWithRobId, SramWriteWithRobId, SramReadWithInfo, SramWriteWithInfo} - -class ToPhysicalLine(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - - private val numBanks = b.sp_banks + b.acc_banks - - val io = IO(new Bundle { - // Unified virtual input ports (from ToVirtualLine) - val sramRead_i = Vec(numBanks, new SramReadWithInfo(b.spad_bank_entries, b.spad_w)) - val sramWrite_i = Vec(numBanks, new SramWriteWithInfo(b.spad_bank_entries, b.spad_w, b.spad_mask_len)) - - // Physical memory endpoints - val sramRead_o = Vec(b.sp_banks, Flipped(new SramReadWithRobId(b.spad_bank_entries, b.spad_w))) - val sramWrite_o = Vec(b.sp_banks, Flipped(new SramWriteWithRobId(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - - val accRead_o = Vec(b.acc_banks, Flipped(new SramReadWithRobId(b.acc_bank_entries, b.acc_w))) - val accWrite_o = Vec(b.acc_banks, Flipped(new SramWriteWithRobId(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - }) - - // -------------------------------------------------------------------------- - // Default initialization for all physical ports - // -------------------------------------------------------------------------- - - // SPAD read/write ports - for (i <- 0 until b.sp_banks) { - val spR = io.sramRead_o(i) - spR.io.req.valid := false.B - spR.io.req.bits := DontCare - spR.io.resp.ready := false.B - spR.rob_id := 0.U - - val spW = io.sramWrite_o(i) - spW.io.req.valid := false.B - 
spW.io.req.bits := DontCare - spW.rob_id := 0.U - } - - // ACC read/write ports - for (i <- 0 until b.acc_banks) { - val accR = io.accRead_o(i) - accR.io.req.valid := false.B - accR.io.req.bits := DontCare - accR.io.resp.ready := false.B - accR.rob_id := 0.U - - val accW = io.accWrite_o(i) - accW.io.req.valid := false.B - accW.io.req.bits := DontCare - accW.rob_id := 0.U - } - - // Default values for all virtual ports - for (i <- 0 until numBanks) { - val vR = io.sramRead_i(i) - vR.io.req.ready := false.B - vR.io.resp.valid := false.B - vR.io.resp.bits := DontCare - - val vW = io.sramWrite_i(i) - vW.io.req.ready := false.B - } - - // -------------------------------------------------------------------------- - // Read routing: virtual → SPAD (indices 0 .. sp_banks-1) - // -------------------------------------------------------------------------- - - for (i <- 0 until b.sp_banks) { - val vR = io.sramRead_i(i) - val spR = io.sramRead_o(i) - - // Request path (virtual → SPAD) - spR.io.req.valid := vR.io.req.valid - spR.io.req.bits.addr := vR.io.req.bits.addr - spR.io.req.bits.fromDMA := vR.io.req.bits.fromDMA - spR.rob_id := vR.rob_id - - vR.io.req.ready := spR.io.req.ready - - // Response path (SPAD → virtual) - vR.io.resp.valid := spR.io.resp.valid - vR.io.resp.bits := spR.io.resp.bits - spR.io.resp.ready := vR.io.resp.ready - } - - // -------------------------------------------------------------------------- - // Read routing: virtual → ACC (indices sp_banks .. 
sp_banks+acc_banks-1) - // -------------------------------------------------------------------------- - - for (i <- 0 until b.acc_banks) { - val idx = i + b.sp_banks - val vR = io.sramRead_i(idx) - val accR = io.accRead_o(i) - - // Request path (virtual → ACC) - accR.io.req.valid := vR.io.req.valid - accR.io.req.bits.addr := vR.io.req.bits.addr - accR.io.req.bits.fromDMA := vR.io.req.bits.fromDMA - accR.rob_id := vR.rob_id - - vR.io.req.ready := accR.io.req.ready - - // Response path (ACC → virtual) - vR.io.resp.valid := accR.io.resp.valid - vR.io.resp.bits := accR.io.resp.bits - accR.io.resp.ready := vR.io.resp.ready - } - - // -------------------------------------------------------------------------- - // Write routing: virtual → SPAD - // -------------------------------------------------------------------------- - - for (i <- 0 until b.sp_banks) { - val vW = io.sramWrite_i(i) - val spW = io.sramWrite_o(i) - - spW.io.req.valid := vW.io.req.valid - spW.io.req.bits.addr := vW.io.req.bits.addr - spW.io.req.bits.data := vW.io.req.bits.data - spW.io.req.bits.mask := vW.io.req.bits.mask - spW.rob_id := vW.rob_id - - vW.io.req.ready := spW.io.req.ready - } - - // -------------------------------------------------------------------------- - // Write routing: virtual → ACC - // -------------------------------------------------------------------------- - - for (i <- 0 until b.acc_banks) { - val idx = i + b.sp_banks - val vW = io.sramWrite_i(idx) - val accW = io.accWrite_o(i) - - accW.io.req.valid := vW.io.req.valid - accW.io.req.bits.addr := vW.io.req.bits.addr - accW.io.req.bits.data := vW.io.req.bits.data - accW.io.req.bits.mask := vW.io.req.bits.mask - accW.rob_id := vW.rob_id - - vW.io.req.ready := accW.io.req.ready - } -} diff --git a/arch/src/main/scala/framework/switcher/ToVirtuaLine.scala b/arch/src/main/scala/framework/switcher/ToVirtuaLine.scala deleted file mode 100644 index 76e07780..00000000 --- a/arch/src/main/scala/framework/switcher/ToVirtuaLine.scala +++ 
/dev/null @@ -1,167 +0,0 @@ -package framework.switcher - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.memdomain.mem.{SramReadIO, SramWriteIO, SramReadReq, SramReadResp, SramWriteReq} -import framework.balldomain.blink.{SramReadWithRobId, SramWriteWithRobId, SramReadWithInfo, SramWriteWithInfo} - -class ToVirtualLine(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - // Total number of unified virtual banks = sp_banks + acc_banks - private val numBanks = b.sp_banks + b.acc_banks - - val io = IO(new Bundle { - // Physical SRAM/ACC ports - val sramRead_i = Vec(b.sp_banks, new SramReadWithRobId(b.spad_bank_entries, b.spad_w)) - val sramWrite_i = Vec(b.sp_banks, new SramWriteWithRobId(b.spad_bank_entries, b.spad_w, b.spad_mask_len)) - val accRead_i = Vec(b.acc_banks, new SramReadWithRobId(b.acc_bank_entries, b.acc_w)) - val accWrite_i = Vec(b.acc_banks, new SramWriteWithRobId(b.acc_bank_entries, b.acc_w, b.acc_mask_len)) - - // Unified virtual interface - val sramRead_o = Vec(numBanks, Flipped(new SramReadWithInfo(b.spad_bank_entries, b.spad_w))) - val sramWrite_o = Vec(numBanks, Flipped(new SramWriteWithInfo(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - }) - - // -------------------------------------------------------------------------- - // Default initialization for virtual output banks - // -------------------------------------------------------------------------- - - for (i <- 0 until numBanks) { - val vr = io.sramRead_o(i) - vr.io.req.valid := false.B - vr.io.req.bits := DontCare - vr.io.resp.ready := false.B - vr.is_acc := false.B - vr.bank_id := 0.U - vr.rob_id := 0.U - - val vw = io.sramWrite_o(i) - vw.io.req.valid := false.B - vw.io.req.bits := DontCare - vw.is_acc := false.B - vw.bank_id := 0.U - vw.rob_id := 0.U - } - - // Default init for physical inputs - for (i <- 0 until b.sp_banks) { - val spR = io.sramRead_i(i) 
- spR.io.req.ready := false.B - spR.io.resp.valid := false.B - spR.io.resp.bits := DontCare - - val spW = io.sramWrite_i(i) - spW.io.req.ready := false.B - } - - for (i <- 0 until b.acc_banks) { - val accR = io.accRead_i(i) - accR.io.req.ready := false.B - accR.io.resp.valid := false.B - accR.io.resp.bits := DontCare - - val accW = io.accWrite_i(i) - accW.io.req.ready := false.B - } - - // -------------------------------------------------------------------------- - // Read Path Routing: SPAD → virtual line (low bank index range) - // -------------------------------------------------------------------------- - - for (i <- 0 until b.sp_banks) { - val vR = io.sramRead_o(i) - val sp = io.sramRead_i(i) - val spRq = sp.io.req - - val selSp = spRq.valid - - vR.io.req.valid := selSp - spRq.ready := selSp && vR.io.req.ready - - vR.io.req.bits.addr := spRq.bits.addr - vR.io.req.bits.fromDMA := spRq.bits.fromDMA - - vR.is_acc := false.B - vR.bank_id := i.U(vR.bank_id.getWidth.W) - vR.rob_id := sp.rob_id - - sp.io.resp.valid := vR.io.resp.valid - sp.io.resp.bits := vR.io.resp.bits - vR.io.resp.ready := sp.io.resp.ready && selSp - } - - // -------------------------------------------------------------------------- - // Read Path Routing: ACC → virtual line (higher bank index range) - // -------------------------------------------------------------------------- - - for (i <- 0 until b.acc_banks) { - val j = i + b.sp_banks - val vR = io.sramRead_o(j) - val acc = io.accRead_i(i) - val accRq = acc.io.req - - val selAcc = accRq.valid - - vR.io.req.valid := selAcc - accRq.ready := selAcc && vR.io.req.ready - - vR.io.req.bits.addr := accRq.bits.addr - vR.io.req.bits.fromDMA := accRq.bits.fromDMA - - vR.is_acc := true.B - vR.bank_id := i.U(vR.bank_id.getWidth.W) - vR.rob_id := acc.rob_id - - acc.io.resp.valid := vR.io.resp.valid - acc.io.resp.bits := vR.io.resp.bits - vR.io.resp.ready := acc.io.resp.ready && selAcc - } - - // 
-------------------------------------------------------------------------- - // Write Path Routing: SPAD → virtual line - // -------------------------------------------------------------------------- - - for (i <- 0 until b.sp_banks) { - val vW = io.sramWrite_o(i) - val sp = io.sramWrite_i(i) - val spRq = sp.io.req - - val selSp = spRq.valid - - vW.io.req.valid := selSp - spRq.ready := selSp && vW.io.req.ready - - vW.io.req.bits.addr := spRq.bits.addr - vW.io.req.bits.data := spRq.bits.data - vW.io.req.bits.mask := spRq.bits.mask - - vW.is_acc := false.B - vW.bank_id := i.U(vW.bank_id.getWidth.W) - vW.rob_id := sp.rob_id - } - - // -------------------------------------------------------------------------- - // Write Path Routing: ACC → virtual line - // -------------------------------------------------------------------------- - - for (i <- 0 until b.acc_banks) { - val j = i + b.sp_banks - val vW = io.sramWrite_o(j) - val acc = io.accWrite_i(i) - val accRq = acc.io.req - - val selAcc = accRq.valid - - vW.io.req.valid := selAcc - accRq.ready := selAcc && vW.io.req.ready - - vW.io.req.bits.addr := accRq.bits.addr - vW.io.req.bits.data := accRq.bits.data - vW.io.req.bits.mask := accRq.bits.mask - - vW.is_acc := true.B - vW.bank_id := i.U(vW.bank_id.getWidth.W) - vW.rob_id := acc.rob_id - } -} diff --git a/arch/src/main/scala/framework/top/GlobalConfig.scala b/arch/src/main/scala/framework/top/GlobalConfig.scala new file mode 100644 index 00000000..ca167cf1 --- /dev/null +++ b/arch/src/main/scala/framework/top/GlobalConfig.scala @@ -0,0 +1,41 @@ +package framework.top + +import upickle.default.{macroRW, ReadWriter} +import chisel3.experimental.SerializableModuleParameter +import framework.memdomain.configs.MemDomainParam +import framework.frontend.configs.FrontendParam +import framework.gpdomain.configs.GpDomainParam +import framework.balldomain.configs.BallDomainParam +import framework.balldomain.prototype.vector.configs.VectorBallParam +import 
framework.balldomain.prototype.relu.configs.ReluBallParam +import framework.core.configs.CoreParam +import framework.top.configs.TopConfig + +case class GlobalConfig( + memDomain: MemDomainParam, + frontend: FrontendParam, + gpDomain: GpDomainParam, + ballDomain: BallDomainParam, + vectorBall: VectorBallParam, + reluBall: ReluBallParam, + core: CoreParam, + top: TopConfig) + extends SerializableModuleParameter + +object GlobalConfig { + implicit val rw: ReadWriter[GlobalConfig] = macroRW[GlobalConfig] + + def apply(): GlobalConfig = { + GlobalConfig( + memDomain = MemDomainParam(), + frontend = FrontendParam(), + gpDomain = GpDomainParam(), + ballDomain = BallDomainParam(), + vectorBall = VectorBallParam(), + reluBall = ReluBallParam(), + core = CoreParam(), + top = TopConfig() + ) + } + +} diff --git a/arch/src/main/scala/framework/top/configs/TopConfig.scala b/arch/src/main/scala/framework/top/configs/TopConfig.scala new file mode 100644 index 00000000..3f448654 --- /dev/null +++ b/arch/src/main/scala/framework/top/configs/TopConfig.scala @@ -0,0 +1,18 @@ +package framework.top.configs + +import upickle.default._ + +case class TopConfig( + ballMemChannelNum: Int, + memBallChannelNum: Int, + nCores: Int) + +object TopConfig { + implicit val rw: ReadWriter[TopConfig] = macroRW + + def apply(): TopConfig = { + val jsonStr = scala.io.Source.fromFile("src/main/scala/framework/top/configs/default.json").mkString + read[TopConfig](jsonStr) + } + +} diff --git a/arch/src/main/scala/framework/top/configs/default.json b/arch/src/main/scala/framework/top/configs/default.json new file mode 100644 index 00000000..cbb6925a --- /dev/null +++ b/arch/src/main/scala/framework/top/configs/default.json @@ -0,0 +1,5 @@ +{ + "ballMemChannelNum": 6, + "memBallChannelNum": 6, + "nCores": 1 +} diff --git a/arch/src/main/scala/prototype/README.md b/arch/src/main/scala/prototype/README.md deleted file mode 100644 index 49acdb87..00000000 --- a/arch/src/main/scala/prototype/README.md +++ 
/dev/null @@ -1,327 +0,0 @@ -# Buckyball Prototype Accelerators - -This directory contains prototype implementations of various domain-specific computation accelerators in the Buckyball framework, covering hardware accelerator designs for machine learning, numerical computation, and data processing domains. - -## Directory Structure - -``` -prototype/ -├── format/ - Data format conversion accelerators -├── im2col/ - Image-to-column transformation accelerator -├── matrix/ - Matrix computation accelerators -├── relu/ - ReLU activation accelerator -├── transpose/ - Matrix transpose accelerator -└── vector/ - Vector processing unit -``` - -## Accelerator Components - -### format/ - Data Format Processing -Implements hardware acceleration for various data format conversions and arithmetic operations: -- **Arithmetic.scala**: Custom arithmetic operation units -- **Dataformat.scala**: Data format conversion and encoding - -**Key Features**: -- Support for multiple data formats (INT8, FP16, FP32, BBFP) -- Abstract arithmetic interface for extensibility -- Concrete implementations for different data types - -**Use Cases**: -- Floating-point format conversion -- Fixed-point arithmetic optimization -- Data compression and decompression -- Mixed-precision computation - -### im2col/ - Image Processing Acceleration -Specialized accelerator for im2col operations in convolutional neural networks: -- **im2col.scala**: Hardware implementation of image-to-column matrix transformation - -**Key Features**: -- Configurable kernel size and stride -- Efficient data reorganization for convolution -- Pipeline-based processing for high throughput -- Support for different input dimensions - -**Use Cases**: -- CNN convolution layer acceleration -- Image preprocessing pipeline -- Feature extraction optimization -- Memory-efficient convolution implementation - -### matrix/ - Matrix Computation Engine -Matrix computation accelerator implementation with multiple modules: - -**Core Components**: -- 
**bbfpIns_decode.scala**: Instruction decoder for matrix operations -- **bbfp_load.scala**: Data loading unit for matrix operands -- **bbfp_ex.scala**: Execution unit for matrix multiplication -- **bbfp_pe.scala**: Processing Element (PE) array implementation -- **bbfp_control.scala**: Control logic for matrix operations - -**PE Array Architecture**: -- **BBFP_PE**: Individual processing element with weight stationary mode -- **BBFP_PE_Array2x2**: 2×2 PE array building block -- **BBFP_PE_Array16x16**: 16×16 PE array for high-performance computing -- Systolic array dataflow for efficient matrix multiplication - -**Supported Formats**: -- INT8 integer arithmetic -- FP16 half-precision floating-point -- FP32 single-precision floating-point -- BBFP (Brain Floating Point) custom format - -**Use Cases**: -- Deep learning training and inference -- Scientific computing acceleration -- Linear algebra operations -- High-performance GEMM operations - -### relu/ - ReLU Activation -Efficient hardware implementation of ReLU (Rectified Linear Unit) activation: -- **Relu.scala**: Pipelined ReLU accelerator - -**Key Features**: -- Element-wise ReLU computation -- Configurable tile size -- Pipeline-based processing -- Integrated with scratchpad memory - -**Use Cases**: -- Neural network activation layers -- Non-linear transformation -- Post-convolution activation - -### transpose/ - Matrix Transpose -Efficient hardware implementation for matrix transpose operations: -- **Transpose.scala**: Matrix transpose accelerator - -**Key Features**: -- Tile-based transpose for large matrices -- Optimized memory access patterns -- Configurable tile size -- Pipeline-based implementation - -**Use Cases**: -- Matrix operation preprocessing -- Data reorganization and transformation -- Memory access pattern optimization -- Transpose in GEMM operations - -### vector/ - Vector Processing Unit -Vector processing architecture supporting SIMD and multi-threading: - -**Core Components**: -- 
**VecUnit.scala**: Vector processor top-level module -- **VecCtrlUnit.scala**: Vector control unit for instruction dispatch -- **VecLoadUnit.scala**: Vector load unit for data fetching -- **VecEXUnit.scala**: Vector execution unit with multiple functional units -- **VecStoreUnit.scala**: Vector store unit for result write-back - -**Submodules**: -- **bond/**: Binding and synchronization mechanisms - - Various bond types (VSSBond, VVVBond, VSVBond, VVSBond, VVBond) - - Operand routing and data distribution - -- **op/**: Vector operation implementations - - AddOp, MulOp, CascadeOp, SelectOp, etc. - - Arithmetic and logical operations - -- **thread/**: Multi-threading support - - Thread-level parallelism - - Warp-based execution model - -- **warp/**: Thread bundle management (MeshWarp) - - 16×16 PE mesh for vector operations - - Parallel execution of vector instructions - -**Architecture Highlights**: -- Configurable number of PEs and threads -- Support for various vector operations (add, mul, cascade, select) -- Flexible data routing through bond mechanisms -- High parallelism with warp-level execution - -**Use Cases**: -- Parallel numerical computation -- Signal processing acceleration -- High-performance computing applications -- SIMD-style data processing - -## Design Features - -### Modular Design -Each accelerator adopts modular design for: -- Independent development and testing -- Flexible composition and configuration -- Performance tuning and extension -- Easy integration with Buckyball framework - -### Pipeline Architecture -Most accelerators use deep pipeline design: -- Improved throughput and frequency -- Support for continuous data stream processing -- Optimized resource utilization -- Latency hiding through pipelining - -### Configurable Parameters -Support rich configuration parameters: -- Data width and precision -- Parallelism and pipeline depth -- Cache size and organization -- Interface protocol and timing - -## Integration Method - -### Blink 
Protocol Interface -All Ball accelerators implement the Blink protocol interface: -```scala -class CustomBall(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module with BallRegist { - val io = IO(new BlinkIO) - def ballId = .U - def Blink = // Implement Blink protocol -} -``` - -**Blink Interface Components**: -- **cmdReq**: Command request interface with rob_id tracking -- **cmdResp**: Command response interface for completion signaling -- **status**: Status signals (ready, valid, idle, complete) -- **sramRead/Write**: SRAM interfaces for scratchpad and accumulator access - -### Memory Interface -Support multiple memory access patterns: -- DMA bulk transfer through MemDomain -- Scratchpad direct access for low-latency operations -- Accumulator access for result accumulation -- Bank-aware memory access (op1 and op2 must access different banks) - -### Configuration Integration -Parameterized through Buckyball configuration system: -```scala -case class BaseConfig( - veclane: Int = 16, // Vector lane width - numVecPE: Int = 16, // Number of vector PEs - numVecThread: Int = 16, // Number of vector threads - // ... 
more parameters -) -``` - -## Performance Optimization - -### Data Locality -- Optimize data access patterns for spatial and temporal locality -- Reduce memory bandwidth requirements through data reuse -- Improve cache hit rate with tile-based processing -- Scratchpad memory for frequently accessed data - -### Parallel Processing -- Multi-level parallelism design - - Instruction-level parallelism (ILP) through pipelining - - Data-level parallelism (DLP) through vector operations - - Thread-level parallelism (TLP) through multiple warps -- Pipeline parallelism for continuous data flow -- Data parallelism through PE arrays - -### Resource Sharing -- Arithmetic unit reuse across different operations -- Storage resource sharing between modules -- Control logic optimization for area efficiency -- Flexible routing for resource utilization - -## Verification and Testing - -Each accelerator comes with corresponding test cases: -- Functional correctness verification -- Performance benchmark testing -- Boundary condition checking -- Random test generation -- Integration testing with complete system - -## Development Guidelines - -### Adding New Accelerators - -**Steps**: -1. Implement Ball device with BallRegist trait -2. Define Blink protocol interfaces -3. Implement computation logic -4. Add SRAM access logic (respect bank constraints) -5. 
Register in BBus and Ball RS - -**Example Template**: -```scala -class NewBall(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module with BallRegist { - val io = IO(new BlinkIO) - - def ballId = .U - def Blink = io - - // State machine - val sIdle :: sCompute :: sComplete :: Nil = Enum(3) - val state = RegInit(sIdle) - - // Computation logic - switch(state) { - is(sIdle) { - when(io.cmdReq.fire) { - state := sCompute - } - } - is(sCompute) { - // Perform computation - when(done) { - state := sComplete - } - } - is(sComplete) { - io.cmdResp.valid := true.B - state := sIdle - } - } -} -``` - -### Performance Optimization Tips - -1. **Memory Access**: - - Group memory accesses to same bank - - Use streaming access patterns - - Minimize random access - -2. **Pipeline Design**: - - Balance pipeline stages - - Add registers for timing closure - - Use buffering for throughput - -3. **Resource Utilization**: - - Share expensive resources (multipliers, dividers) - - Use LUTs for simple operations - - Optimize control logic - -### Common Pitfalls - -1. **Bank Conflict**: op1 and op2 accessing same bank - violates design constraint -2. **ROB ID Tracking**: Must forward rob_id from request to response -3. **Ready/Valid Protocol**: Carefully implement handshake to avoid deadlock -4. 
**Iteration Count**: Properly handle iteration for multi-row operations - -## Related Documentation - -- [Format Conversion](format/README.md) - Data format details -- [Im2col Implementation](im2col/README.md) - Im2col accelerator -- [Matrix Operations](matrix/README.md) - Matrix computation -- [ReLU Activation](relu/README.md) - ReLU implementation -- [Transpose Operations](transpose/README.md) - Matrix transpose -- [Vector Processing](vector/README.md) - Vector unit architecture -- [Blink Protocol](../framework/blink/README.md) - Ball protocol specification - -## Future Enhancements - -Potential areas for extension: -- Support for additional data formats (INT4, BF16) -- Advanced matrix operations (SVD, QR decomposition) -- Fused operations (Conv+ReLU, GEMM+BiasAdd) -- Dynamic reconfiguration for different workloads -- Power management and clock gating -- Advanced synchronization mechanisms diff --git a/arch/src/main/scala/prototype/abft/ABFTSystolicArray.scala b/arch/src/main/scala/prototype/abft/ABFTSystolicArray.scala deleted file mode 100644 index e64742cb..00000000 --- a/arch/src/main/scala/prototype/abft/ABFTSystolicArray.scala +++ /dev/null @@ -1,323 +0,0 @@ -package prototype.abft - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status - -/** - * ABFTSystolicArray - Simple systolic array with ABFT (Algorithm-Based Fault Tolerance) - * - * ABFT mechanism: - * - Matrix A has an extra checksum row (sum of each column) - * - Matrix B has an extra checksum column (sum of each row) - * - Result matrix C will have checksum row and column - * - Verify checksums match to detect errors - * - * Simple implementation: process veclane x veclane tiles - */ -class ABFTSystolicArray(implicit b: CustomBuckyballConfig, p: 
Parameters) extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // Command interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - - // Accumulator write interface (for partial sums) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - - // Status output - val status = new Status - }) - - // State machine - val idle :: sLoadA :: sLoadB :: sCompute :: sWrite :: sCheck :: complete :: Nil = Enum(7) - val state = RegInit(idle) - - // Registers for matrix A (veclane x veclane) - val matrixA = RegInit( - VecInit(Seq.fill(b.veclane)( - VecInit(Seq.fill(b.veclane)(0.S(b.inputType.getWidth.W))) - )) - ) - - // Registers for matrix B (veclane x veclane) - val matrixB = RegInit( - VecInit(Seq.fill(b.veclane)( - VecInit(Seq.fill(b.veclane)(0.S(b.inputType.getWidth.W))) - )) - ) - - // Result matrix C (veclane x veclane) - val matrixC = RegInit( - VecInit(Seq.fill(b.veclane)( - VecInit(Seq.fill(b.veclane)(0.S(32.W))) - )) - ) - - // Checksum registers - val checksumA_row = RegInit(VecInit(Seq.fill(b.veclane)(0.S(32.W)))) - val checksumB_col = RegInit(VecInit(Seq.fill(b.veclane)(0.S(32.W)))) - val checksumC_row = RegInit(VecInit(Seq.fill(b.veclane)(0.S(32.W)))) - val checksumC_col = RegInit(VecInit(Seq.fill(b.veclane)(0.S(32.W)))) - - // Counters - val rowCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val colCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val readCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val writeCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - - // Instruction registers - val robid_reg = RegInit(0.U(10.W)) - val op1_addr_reg = RegInit(0.U(10.W)) - val 
op1_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val op2_addr_reg = RegInit(0.U(10.W)) - val op2_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val wr_addr_reg = RegInit(0.U(10.W)) - val wr_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val iter_reg = RegInit(0.U(10.W)) - val cycle_reg = RegInit(0.U(6.W)) - val iterCnt = RegInit(0.U(32.W)) - - // Error detection flag - val errorDetected = RegInit(false.B) - - // Write data register - val writeDataReg = Reg(UInt(spad_w.W)) - val writeMaskReg = Reg(Vec(b.spad_mask_len, UInt(1.W))) - - // Default SRAM assignments - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := false.B - - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - - // Default accumulator assignments - for (i <- 0 until b.acc_banks) { - io.accWrite(i).req.valid := false.B - io.accWrite(i).req.bits.addr := 0.U - io.accWrite(i).req.bits.data := 0.U - io.accWrite(i).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(0.U(1.W))) - } - - // Command interface defaults - io.cmdReq.ready := state === idle - io.cmdResp.valid := false.B - io.cmdResp.bits.rob_id := robid_reg - - // State machine - switch(state) { - is(idle) { - when(io.cmdReq.fire) { - state := sLoadA - readCounter := 0.U - rowCounter := 0.U - colCounter := 0.U - writeCounter := 0.U - errorDetected := false.B - - robid_reg := io.cmdReq.bits.rob_id - op1_addr_reg := io.cmdReq.bits.cmd.op1_bank_addr - op1_bank_reg := io.cmdReq.bits.cmd.op1_bank - op2_addr_reg := io.cmdReq.bits.cmd.op2_bank_addr - op2_bank_reg := io.cmdReq.bits.cmd.op2_bank - wr_addr_reg := io.cmdReq.bits.cmd.wr_bank_addr - wr_bank_reg := io.cmdReq.bits.cmd.wr_bank - iter_reg := io.cmdReq.bits.cmd.iter - cycle_reg := (io.cmdReq.bits.cmd.iter +& 
(b.veclane.U - 1.U)) / b.veclane.U - 1.U - } - } - - is(sLoadA) { - // Load matrix A row by row - when(readCounter < b.veclane.U) { - io.sramRead(op1_bank_reg).req.valid := true.B - io.sramRead(op1_bank_reg).req.bits.addr := op1_addr_reg + readCounter - readCounter := readCounter + 1.U - } - - io.sramRead(op1_bank_reg).resp.ready := true.B - when(io.sramRead(op1_bank_reg).resp.fire) { - for (col <- 0 until b.veclane) { - val hi = (col + 1) * b.inputType.getWidth - 1 - val lo = col * b.inputType.getWidth - val raw = io.sramRead(op1_bank_reg).resp.bits.data(hi, lo) - matrixA(rowCounter)(col) := raw.asSInt - } - rowCounter := rowCounter + 1.U - } - - when(rowCounter === b.veclane.U) { - state := sLoadB - readCounter := 0.U - rowCounter := 0.U - // Compute checksum for matrix A (sum of each column) - for (col <- 0 until b.veclane) { - checksumA_row(col) := (0 until b.veclane).map(i => matrixA(i)(col)).reduce(_ + _) - } - } - } - - is(sLoadB) { - // Load matrix B row by row (same as matrix A) - when(readCounter < b.veclane.U) { - io.sramRead(op2_bank_reg).req.valid := true.B - io.sramRead(op2_bank_reg).req.bits.addr := op2_addr_reg + readCounter - readCounter := readCounter + 1.U - } - - io.sramRead(op2_bank_reg).resp.ready := true.B - when(io.sramRead(op2_bank_reg).resp.fire) { - for (col <- 0 until b.veclane) { - val hi = (col + 1) * b.inputType.getWidth - 1 - val lo = col * b.inputType.getWidth - val raw = io.sramRead(op2_bank_reg).resp.bits.data(hi, lo) - matrixB(rowCounter)(col) := raw.asSInt - } - rowCounter := rowCounter + 1.U - } - - when(rowCounter === b.veclane.U) { - state := sCompute - rowCounter := 0.U - colCounter := 0.U - // Compute checksum for matrix B (sum of each row) - for (row <- 0 until b.veclane) { - checksumB_col(row) := (0 until b.veclane).map(j => matrixB(row)(j)).reduce(_ + _) - } - } - } - - is(sCompute) { - // Simple systolic array computation: C[i][j] = sum(A[i][k] * B[k][j]) - // Compute all elements in one cycle (simple implementation) - 
for (i <- 0 until b.veclane) { - for (j <- 0 until b.veclane) { - val sum = (0 until b.veclane).map(k => matrixA(i)(k) * matrixB(k)(j)).reduce(_ + _) - matrixC(i)(j) := sum - } - } - - // Compute checksums for result matrix C - for (col <- 0 until b.veclane) { - checksumC_row(col) := (0 until b.veclane).map(i => matrixC(i)(col)).reduce(_ + _) - } - for (row <- 0 until b.veclane) { - checksumC_col(row) := (0 until b.veclane).map(j => matrixC(row)(j)).reduce(_ + _) - } - - state := sCheck - } - - is(sCheck) { - // ABFT verification: check if checksums match - // For C = A * B, checksum of row i in C should equal sum(A[i][k] * checksumB_col[k]) - // For C = A * B, checksum of col j in C should equal sum(checksumA_row[k] * B[k][j]) - // Simple check: verify first row and first column checksums - val expectedRowChecksum = (0 until b.veclane).map(k => - matrixA(0)(k) * checksumB_col(k) - ).reduce(_ + _) - - val expectedColChecksum = (0 until b.veclane).map(k => - checksumA_row(k) * matrixB(k)(0) - ).reduce(_ + _) - - val rowMatch = checksumC_row(0) === expectedRowChecksum - val colMatch = checksumC_col(0) === expectedColChecksum - - errorDetected := !rowMatch || !colMatch - state := sWrite - writeCounter := 0.U - // Prepare first write data (clamp 32-bit result to 8-bit) - writeDataReg := Cat((0 until b.veclane).reverse.map(j => { - val clamped = Mux(matrixC(0)(j) > 127.S, 127.U, - Mux(matrixC(0)(j) < (-128).S, (-128).S.asUInt, - matrixC(0)(j)(b.inputType.getWidth - 1, 0))) - clamped - })) - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 1.U(1.W) - } - } - - is(sWrite) { - // Write results back to scratchpad - io.sramWrite(wr_bank_reg).req.valid := writeCounter < b.veclane.U - io.sramWrite(wr_bank_reg).req.bits.addr := wr_addr_reg + writeCounter - io.sramWrite(wr_bank_reg).req.bits.data := writeDataReg - io.sramWrite(wr_bank_reg).req.bits.mask := writeMaskReg - - when(io.sramWrite(wr_bank_reg).req.fire) { - when(writeCounter === (b.veclane - 1).U) { - state := 
complete - }.otherwise { - writeCounter := writeCounter + 1.U - // Prepare next row's write data (clamp 32-bit result to 8-bit) - val nextRow = writeCounter + 1.U - writeDataReg := Cat((0 until b.veclane).reverse.map(j => { - val idx = nextRow - val clamped = Mux(matrixC(idx)(j) > 127.S, 127.U, - Mux(matrixC(idx)(j) < (-128).S, (-128).S.asUInt, - matrixC(idx)(j)(b.inputType.getWidth - 1, 0))) - clamped - })) - } - } - } - - is(complete) { - when(cycle_reg === 0.U) { - io.cmdResp.valid := true.B - io.cmdResp.bits.rob_id := robid_reg - when(io.cmdResp.fire) { - iterCnt := iterCnt + 1.U - } - } - state := idle - } - } - - // Status signals - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := (state === idle) - io.status.init := (state === sLoadA) || (state === sLoadB) - io.status.running := (state === sCompute) || (state === sCheck) || (state === sWrite) - io.status.complete := (state === complete) && io.cmdResp.fire - io.status.iter := iterCnt - - // Reset handling - when(reset.asBool) { - for (i <- 0 until b.veclane) { - for (j <- 0 until b.veclane) { - matrixA(i)(j) := 0.S - matrixB(i)(j) := 0.S - matrixC(i)(j) := 0.S - } - checksumA_row(i) := 0.S - checksumB_col(i) := 0.S - checksumC_row(i) := 0.S - checksumC_col(i) := 0.S - } - writeDataReg := 0.U - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 0.U - } - errorDetected := false.B - } -} diff --git a/arch/src/main/scala/prototype/abft/ABFTSystolicArrayBall.scala b/arch/src/main/scala/prototype/abft/ABFTSystolicArrayBall.scala deleted file mode 100644 index 5ae4b703..00000000 --- a/arch/src/main/scala/prototype/abft/ABFTSystolicArrayBall.scala +++ /dev/null @@ -1,57 +0,0 @@ -package prototype.abft - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.abft.ABFTSystolicArray - -/** - * 
ABFTSystolicArrayBall - A systolic array Ball with ABFT support - * Behavior: Read matrices A and B from Scratchpad, compute C = A * B with ABFT checks, - * then write back to Scratchpad. - */ -class ABFTSystolicArrayBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module - with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - // Satisfy BallRegist requirements - def Blink: Blink = io - - // Instantiate ABFT systolic array computation unit - private val abftUnit = Module(new ABFTSystolicArray) - - // Connect command interface - abftUnit.io.cmdReq <> io.cmdReq - abftUnit.io.cmdResp <> io.cmdResp - - // Connect Scratchpad SRAM read/write interface - for (i <- 0 until b.sp_banks) { - abftUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - abftUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Connect Accumulator write interface - for (i <- 0 until b.acc_banks) { - abftUnit.io.accWrite(i) <> io.accWrite(i).io - io.accWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Accumulator read interface (not used, tie-off) - for (i <- 0 until b.acc_banks) { - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Pass through status signals - io.status <> abftUnit.io.status - - override lazy val desiredName: String = "ABFTSystolicArrayBall" -} diff --git a/arch/src/main/scala/prototype/conv/Conv.scala b/arch/src/main/scala/prototype/conv/Conv.scala deleted file mode 100644 index 7e275098..00000000 --- a/arch/src/main/scala/prototype/conv/Conv.scala +++ /dev/null @@ -1,307 +0,0 @@ -package prototype.conv - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import 
examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status - -/** - * NVDLAConvBlackBox - BlackBox wrapper for NVDLA CONV module - * Uses inline verilog to embed NVDLA CSC module - */ -class NVDLAConvBlackBox extends BlackBox with HasBlackBoxInline { - val io = IO(new Bundle { - val clock = Input(Clock()) - val reset = Input(Bool()) - - // Simplified CONV interface - val start = Input(Bool()) - val done = Output(Bool()) - - // Input feature map address - val ifmap_addr = Input(UInt(32.W)) - // Weight address - val weight_addr = Input(UInt(32.W)) - // Output feature map address - val ofmap_addr = Input(UInt(32.W)) - - // Convolution parameters - val in_height = Input(UInt(16.W)) - val in_width = Input(UInt(16.W)) - val in_channels = Input(UInt(16.W)) - val out_channels = Input(UInt(16.W)) - val kernel_h = Input(UInt(8.W)) - val kernel_w = Input(UInt(8.W)) - val stride_h = Input(UInt(8.W)) - val stride_w = Input(UInt(8.W)) - val pad_h = Input(UInt(8.W)) - val pad_w = Input(UInt(8.W)) - - // Data width - val data_width = Input(UInt(8.W)) - }) - - setInline("NVDLAConvBlackBox.v", - s""" - |module NVDLAConvBlackBox( - | input clock, - | input reset, - | input start, - | output reg done, - | input [31:0] ifmap_addr, - | input [31:0] weight_addr, - | input [31:0] ofmap_addr, - | input [15:0] in_height, - | input [15:0] in_width, - | input [15:0] in_channels, - | input [15:0] out_channels, - | input [7:0] kernel_h, - | input [7:0] kernel_w, - | input [7:0] stride_h, - | input [7:0] stride_w, - | input [7:0] pad_h, - | input [7:0] pad_w, - | input [7:0] data_width - |); - | - | reg [31:0] cycle_count; - | reg running; - | - | always @(posedge clock) begin - | if (reset) begin - | done <= 1'b0; - | cycle_count <= 32'b0; - | running <= 1'b0; - | end else begin - | if (start && !running) begin - | running <= 1'b1; - | cycle_count <= 32'b0; - | done <= 1'b0; - | end else if (running) begin - | // Simplified: compute cycles based on convolution 
size - | // This is a placeholder - actual NVDLA CSC would be instantiated here - | if (cycle_count >= (in_height * in_width * kernel_h * kernel_w * in_channels * out_channels / 64)) begin - | done <= 1'b1; - | running <= 1'b0; - | end else begin - | cycle_count <= cycle_count + 1; - | end - | end - | end - | end - | - |endmodule - """.stripMargin) -} - -/** - * Conv - Convolution computation unit - * Simplified wrapper around NVDLA CONV module - * Reads input feature map and weights from scratchpad, performs convolution, writes output - */ -class Conv(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // Command interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - - // Accumulator write interface (for partial sums in convolution) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - - // Status output - val status = new Status - }) - - // State machine - val idle :: sLoadIfmap :: sLoadWeight :: sCompute :: sWrite :: complete :: Nil = Enum(6) - val state = RegInit(idle) - - // Instruction registers - val robid_reg = RegInit(0.U(10.W)) - val ifmap_addr_reg = RegInit(0.U(10.W)) - val ifmap_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val weight_addr_reg = RegInit(0.U(10.W)) - val weight_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val ofmap_addr_reg = RegInit(0.U(10.W)) - val ofmap_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val iter_reg = RegInit(0.U(10.W)) - - // Convolution parameters from special field (40 bits total) - // special[15:0] = in_height (16 bits) - // special[31:16] = in_width (16 bits) - // special[39:32] = 
kernel_h (8 bits) - // Note: kernel_w is encoded in lower 8 bits of kernel_h, or use a different encoding - // For simplicity, we'll use kernel_h for both dimensions or extract from iter - val in_height_reg = RegInit(0.U(16.W)) - val in_width_reg = RegInit(0.U(16.W)) - val kernel_h_reg = RegInit(0.U(8.W)) - val kernel_w_reg = RegInit(0.U(8.W)) - - // Counters - val readCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val writeCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val computeCounter = RegInit(0.U(32.W)) - - // NVDLA CONV BlackBox instance - val nvdlaConv = Module(new NVDLAConvBlackBox) - nvdlaConv.io.clock := clock - nvdlaConv.io.reset := reset.asBool - - // Default SRAM assignments - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := false.B - - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - - // Default accumulator assignments - for (i <- 0 until b.acc_banks) { - io.accWrite(i).req.valid := false.B - io.accWrite(i).req.bits.addr := 0.U - io.accWrite(i).req.bits.data := 0.U - io.accWrite(i).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(0.U(1.W))) - } - - // Command interface defaults - io.cmdReq.ready := state === idle - io.cmdResp.valid := false.B - io.cmdResp.bits.rob_id := robid_reg - - // NVDLA CONV interface defaults - nvdlaConv.io.start := false.B - nvdlaConv.io.ifmap_addr := ifmap_addr_reg - nvdlaConv.io.weight_addr := weight_addr_reg - nvdlaConv.io.ofmap_addr := ofmap_addr_reg - nvdlaConv.io.in_height := in_height_reg - nvdlaConv.io.in_width := in_width_reg - nvdlaConv.io.in_channels := 16.U // Default - nvdlaConv.io.out_channels := 16.U // Default - nvdlaConv.io.kernel_h := kernel_h_reg - nvdlaConv.io.kernel_w := kernel_w_reg - nvdlaConv.io.stride_h := 
1.U - nvdlaConv.io.stride_w := 1.U - nvdlaConv.io.pad_h := 0.U - nvdlaConv.io.pad_w := 0.U - nvdlaConv.io.data_width := b.inputType.getWidth.U - - // Status output - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := (state === idle) - io.status.init := (state === sLoadIfmap) || (state === sLoadWeight) - io.status.running := (state === sCompute) || (state === sWrite) - io.status.complete := (state === complete) && io.cmdResp.fire - io.status.iter := computeCounter - - // State machine - switch(state) { - is(idle) { - when(io.cmdReq.fire) { - state := sLoadIfmap - readCounter := 0.U - writeCounter := 0.U - computeCounter := 0.U - - robid_reg := io.cmdReq.bits.rob_id - ifmap_addr_reg := io.cmdReq.bits.cmd.op1_bank_addr - ifmap_bank_reg := io.cmdReq.bits.cmd.op1_bank - weight_addr_reg := io.cmdReq.bits.cmd.op2_bank_addr - weight_bank_reg := io.cmdReq.bits.cmd.op2_bank - ofmap_addr_reg := io.cmdReq.bits.cmd.wr_bank_addr - ofmap_bank_reg := io.cmdReq.bits.cmd.wr_bank - iter_reg := io.cmdReq.bits.cmd.iter - - // Extract convolution parameters from special field (40 bits) - in_height_reg := io.cmdReq.bits.cmd.special(15, 0) - in_width_reg := io.cmdReq.bits.cmd.special(31, 16) - kernel_h_reg := io.cmdReq.bits.cmd.special(39, 32) - // kernel_w uses same value as kernel_h for simplicity, or could be encoded differently - kernel_w_reg := io.cmdReq.bits.cmd.special(39, 32) - } - } - - is(sLoadIfmap) { - // Load input feature map (simplified: load one tile) - when(readCounter < iter_reg) { - io.sramRead(ifmap_bank_reg).req.valid := true.B - io.sramRead(ifmap_bank_reg).req.bits.addr := ifmap_addr_reg + readCounter - io.sramRead(ifmap_bank_reg).req.bits.fromDMA := false.B - - when(io.sramRead(ifmap_bank_reg).resp.valid) { - io.sramRead(ifmap_bank_reg).resp.ready := true.B - readCounter := readCounter + 1.U - } - }.otherwise { - state := sLoadWeight - readCounter := 0.U - } - } - - is(sLoadWeight) { - // Load weights (simplified: load one 
tile) - when(readCounter < iter_reg) { - io.sramRead(weight_bank_reg).req.valid := true.B - io.sramRead(weight_bank_reg).req.bits.addr := weight_addr_reg + readCounter - io.sramRead(weight_bank_reg).req.bits.fromDMA := false.B - - when(io.sramRead(weight_bank_reg).resp.valid) { - io.sramRead(weight_bank_reg).resp.ready := true.B - readCounter := readCounter + 1.U - } - }.otherwise { - state := sCompute - readCounter := 0.U - nvdlaConv.io.start := true.B - } - } - - is(sCompute) { - // Wait for NVDLA CONV to complete - when(nvdlaConv.io.done) { - state := sWrite - writeCounter := 0.U - }.otherwise { - computeCounter := computeCounter + 1.U - } - } - - is(sWrite) { - // Write output feature map (simplified: write one tile) - when(writeCounter < iter_reg) { - io.sramWrite(ofmap_bank_reg).req.valid := true.B - io.sramWrite(ofmap_bank_reg).req.bits.addr := ofmap_addr_reg + writeCounter - // Simplified: write zeros as placeholder (actual output would come from NVDLA CONV) - io.sramWrite(ofmap_bank_reg).req.bits.data := 0.U - io.sramWrite(ofmap_bank_reg).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(1.U(1.W))) - - when(io.sramWrite(ofmap_bank_reg).req.ready) { - writeCounter := writeCounter + 1.U - } - }.otherwise { - state := complete - } - } - - is(complete) { - io.cmdResp.valid := true.B - when(io.cmdResp.ready) { - state := idle - } - } - } -} diff --git a/arch/src/main/scala/prototype/conv/ConvBall.scala b/arch/src/main/scala/prototype/conv/ConvBall.scala deleted file mode 100644 index f96ec594..00000000 --- a/arch/src/main/scala/prototype/conv/ConvBall.scala +++ /dev/null @@ -1,57 +0,0 @@ -package prototype.conv - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.conv.Conv - -/** - * ConvBall - A Convolution computation Ball that complies with the Blink protocol - * Behavior: Read input feature 
map and weights from Scratchpad, perform convolution using NVDLA CONV, - * then write output feature map back to Scratchpad. - */ -class ConvBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module - with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - // Satisfy BallRegist requirements - def Blink: Blink = io - - // Instantiate Conv computation unit - private val convUnit = Module(new Conv) - - // Connect command interface - convUnit.io.cmdReq <> io.cmdReq - convUnit.io.cmdResp <> io.cmdResp - - // Connect Scratchpad SRAM read/write interface - for (i <- 0 until b.sp_banks) { - convUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - convUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Connect Accumulator write interface (for partial sums in convolution) - for (i <- 0 until b.acc_banks) { - convUnit.io.accWrite(i) <> io.accWrite(i).io - io.accWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Accumulator read interface (not used, tie-off) - for (i <- 0 until b.acc_banks) { - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Pass through status signals - io.status <> convUnit.io.status - - override lazy val desiredName: String = "ConvBall" -} diff --git a/arch/src/main/scala/prototype/format/Arithmetic.scala b/arch/src/main/scala/prototype/format/Arithmetic.scala deleted file mode 100644 index e2f1f0b4..00000000 --- a/arch/src/main/scala/prototype/format/Arithmetic.scala +++ /dev/null @@ -1,32 +0,0 @@ -package prototype.format - -import chisel3._ -import chisel3.util._ - -// Arithmetic type class -abstract class Arithmetic[T <: Data] { - def add(x: T, y: T): T - def sub(x: T, y: T): T - def mul(x: T, y: T): T - def div(x: T, y: T): T - def gt(x: T, y: T): Bool -} - -// UInt arithmetic implementation -class UIntArithmetic 
extends Arithmetic[UInt] { - override def add(x: UInt, y: UInt): UInt = x + y - override def sub(x: UInt, y: UInt): UInt = x - y - override def mul(x: UInt, y: UInt): UInt = x * y - override def div(x: UInt, y: UInt): UInt = Mux(y =/= 0.U, x / y, 0.U) - override def gt(x: UInt, y: UInt): Bool = x > y -} - -// Factory -object ArithmeticFactory { - def createArithmetic[T <: Data](dataType: T): Arithmetic[T] = { - dataType match { - case _: UInt => new UIntArithmetic().asInstanceOf[Arithmetic[T]] - case _ => throw new IllegalArgumentException(s"Unsupported data type: ${dataType.getClass}") - } - } -} diff --git a/arch/src/main/scala/prototype/format/Dataformat.scala b/arch/src/main/scala/prototype/format/Dataformat.scala deleted file mode 100644 index 5bdac4df..00000000 --- a/arch/src/main/scala/prototype/format/Dataformat.scala +++ /dev/null @@ -1,54 +0,0 @@ -package prototype.format - -import chisel3._ -import chisel3.util._ - -// Data format definition -abstract class DataFormat { - def width: Int - def dataType: Data - def name: String -} - -// INT8 format -class INT8Format extends DataFormat { - override def width: Int = 8 - override def dataType: Data = UInt(8.W) - override def name: String = "INT8" -} - -// FP16 format -class FP16Format extends DataFormat { - override def width: Int = 16 - // Temporarily use UInt representation, can be extended to Float type later - override def dataType: Data = UInt(16.W) - override def name: String = "FP16" -} - -// FP32 format -class FP32Format extends DataFormat { - override def width: Int = 32 - // Temporarily use UInt representation, can be extended to Float type later - override def dataType: Data = UInt(32.W) - override def name: String = "FP32" -} - - -// Data format factory -object DataFormatFactory { - def create(formatType: String): DataFormat = formatType.toUpperCase match { - case "INT8" => new INT8Format - case "FP16" => new FP16Format - case "FP32" => new FP32Format - case _ => throw new 
IllegalArgumentException(s"Unsupported data format: $formatType") - } -} - -// Generic data format parameters -case class DataFormatParams( - formatType: String = "INT8" -) { - def format: DataFormat = DataFormatFactory.create(formatType) - def width: Int = format.width - def dataType: Data = format.dataType -} diff --git a/arch/src/main/scala/prototype/format/README.md b/arch/src/main/scala/prototype/format/README.md deleted file mode 100644 index 53021202..00000000 --- a/arch/src/main/scala/prototype/format/README.md +++ /dev/null @@ -1,134 +0,0 @@ -# Data Format Processing Module - -## Overview - -This directory implements data format definitions and arithmetic operation abstractions in Buckyball, providing a unified data type processing interface. Located at `arch/src/main/scala/prototype/format`, it serves as the data format layer, providing type-safe data format support for other prototype accelerators. - -Core components: -- **Dataformat.scala**: Data format definitions and factory classes -- **Arithmetic.scala**: Arithmetic operation type class implementations - -## Code Structure - -``` -format/ -├── Dataformat.scala - Data format definitions -└── Arithmetic.scala - Arithmetic operation abstractions -``` - -### File Dependencies - -**Dataformat.scala** (Format definition layer) -- Defines DataFormat abstract class and concrete format implementations -- Provides DataFormatFactory factory class -- Implements DataFormatParams parameter class - -**Arithmetic.scala** (Operation abstraction layer) -- Defines Arithmetic type class interface -- Implements UIntArithmetic concrete operations -- Provides ArithmeticFactory factory class - -## Module Description - -### Dataformat.scala - -**Main functionality**: Defines supported data format types - -**Format definition**: -```scala -abstract class DataFormat { - def width: Int - def dataType: Data - def name: String -} -``` - -**Supported formats**: -```scala -class INT8Format extends DataFormat { - override def 
width: Int = 8 - override def dataType: Data = UInt(8.W) - override def name: String = "INT8" -} - -class FP16Format extends DataFormat { - override def width: Int = 16 - override def dataType: Data = UInt(16.W) - override def name: String = "FP16" -} - -class FP32Format extends DataFormat { - override def width: Int = 32 - override def dataType: Data = UInt(32.W) - override def name: String = "FP32" -} -``` - -**Factory class**: -```scala -object DataFormatFactory { - def create(formatType: String): DataFormat = formatType.toUpperCase match { - case "INT8" => new INT8Format - case "FP16" => new FP16Format - case "FP32" => new FP32Format - case _ => throw new IllegalArgumentException(...) - } -} -``` - -**Parameter class**: -```scala -case class DataFormatParams(formatType: String = "INT8") { - def format: DataFormat = DataFormatFactory.create(formatType) - def width: Int = format.width - def dataType: Data = format.dataType -} -``` - -### Arithmetic.scala - -**Main functionality**: Provides type-safe arithmetic operation abstractions - -**Type class definition**: -```scala -abstract class Arithmetic[T <: Data] { - def add(x: T, y: T): T - def sub(x: T, y: T): T - def mul(x: T, y: T): T - def div(x: T, y: T): T - def gt(x: T, y: T): Bool -} -``` - -**UInt implementation**: -```scala -class UIntArithmetic extends Arithmetic[UInt] { - override def add(x: UInt, y: UInt): UInt = x + y - override def sub(x: UInt, y: UInt): UInt = x - y - override def mul(x: UInt, y: UInt): UInt = x * y - override def div(x: UInt, y: UInt): UInt = Mux(y =/= 0.U, x / y, 0.U) - override def gt(x: UInt, y: UInt): Bool = x > y -} -``` - -**Factory class**: -```scala -object ArithmeticFactory { - def createArithmetic[T <: Data](dataType: T): Arithmetic[T] = { - dataType match { - case _: UInt => new UIntArithmetic().asInstanceOf[Arithmetic[T]] - case _ => throw new IllegalArgumentException(...) - } - } -} -``` - -## Usage - -### Notes - -1. 
**Floating-point support**: FP16 and FP32 currently use UInt representation, can be extended to true floating-point types later -2. **Division by zero protection**: UInt division operation includes division-by-zero check, returns 0 as default value -3. **Type safety**: Uses Scala type system to ensure operation type safety -4. **Extensibility**: Factory pattern supports adding new data formats and arithmetic implementations -5. **Parameterization**: DataFormatParams provides convenient parameterized configuration interface diff --git a/arch/src/main/scala/prototype/ibuki/matmul/LIF.scala b/arch/src/main/scala/prototype/ibuki/matmul/LIF.scala deleted file mode 100644 index 534cd84c..00000000 --- a/arch/src/main/scala/prototype/ibuki/matmul/LIF.scala +++ /dev/null @@ -1,215 +0,0 @@ -package prototype.ibuki.matmul - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status - - -class LIF(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // cmd interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Connect to Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - - // Status output - val status = new Status - }) - - // State definitions - val idle :: sRead :: sWrite :: complete :: Nil = Enum(4) - val state = RegInit(idle) - - // Store a veclane x veclane tile - val regArray = RegInit( - VecInit(Seq.fill(b.veclane)( - VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - )) - ) - - // Counters 
- val readCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val respCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val writeCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - - // Instruction registers - val robid_reg = RegInit(0.U(10.W)) - val waddr_reg = RegInit(0.U(10.W)) - val wbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val raddr_reg = RegInit(0.U(10.W)) - val rbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val iter_reg = RegInit(0.U(10.W)) - val cycle_reg = RegInit(0.U(6.W)) - val iterCnt = RegInit(0.U(32.W)) - - // LIF parameters from special field - // special[7:0] = threshold (8 bits) - // special[15:8] = leak_factor (8 bits, represents leak rate) - val threshold_reg = RegInit(127.U(8.W)) // Default threshold - val leak_factor_reg = RegInit(240.U(8.W)) // Default leak factor (240/256 ≈ 0.9375) - - // Precompute write data - val writeDataReg = Reg(UInt(spad_w.W)) - val writeMaskReg = Reg(Vec(b.spad_mask_len, UInt(1.W))) - - // SRAM default assignment - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := false.B - - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - - // cmd interface default assignment - io.cmdReq.ready := state === idle - io.cmdResp.valid := false.B - io.cmdResp.bits.rob_id := robid_reg - - // State machine - switch(state) { - is(idle) { - when(io.cmdReq.fire) { - state := sRead - readCounter := 0.U - respCounter := 0.U - writeCounter := 0.U - - robid_reg := io.cmdReq.bits.rob_id - waddr_reg := io.cmdReq.bits.cmd.wr_bank_addr - wbank_reg := io.cmdReq.bits.cmd.wr_bank - raddr_reg := io.cmdReq.bits.cmd.op1_bank_addr - rbank_reg := io.cmdReq.bits.cmd.op1_bank - iter_reg := io.cmdReq.bits.cmd.iter - cycle_reg := (io.cmdReq.bits.cmd.iter +& (b.veclane.U - 
1.U)) / b.veclane.U - 1.U - - // Extract LIF parameters from special field - threshold_reg := io.cmdReq.bits.cmd.special(7, 0) - leak_factor_reg := io.cmdReq.bits.cmd.special(15, 8) - } - - when(cycle_reg =/= 0.U) { - state := sRead - readCounter := 0.U - writeCounter := 0.U - respCounter := 0.U - waddr_reg := waddr_reg + b.veclane.U - raddr_reg := raddr_reg + b.veclane.U - cycle_reg := cycle_reg - 1.U - } - } - - is(sRead) { - when(readCounter < b.veclane.U) { - // Issue read request - readCounter := readCounter + 1.U - io.sramRead(rbank_reg).req.valid := true.B - io.sramRead(rbank_reg).req.bits.addr := raddr_reg + readCounter - } - - // Receive response and perform LIF neuron computation - io.sramRead(rbank_reg).resp.ready := true.B - when(io.sramRead(rbank_reg).resp.fire) { - for (col <- 0 until b.veclane) { - val hi = (col + 1) * b.inputType.getWidth - 1 - val lo = col * b.inputType.getWidth - val raw = io.sramRead(rbank_reg).resp.bits.data(hi, lo) - val signed = raw.asSInt - - // LIF neuron model: - // 1. Leak: membrane_potential = membrane_potential * leak_factor / 256 - // 2. Integrate: (input is already in membrane_potential, so just apply leak) - // 3. 
Fire: if membrane_potential >= threshold, output spike (threshold value), else output leaked potential - - // Apply leak (multiply by leak_factor, then divide by 256) - // leak_factor is unsigned (0-255), representing leak rate - // Convert to signed for multiplication, then shift right by 8 - val leak_factor_signed = leak_factor_reg.zext.asSInt - val leaked = (signed * leak_factor_signed) >> 8 - - // Fire condition: if leaked >= threshold, output spike (threshold), else output leaked - // For simplicity, we output the threshold value as spike, or the leaked value - val result = Mux(leaked >= threshold_reg.asSInt, - threshold_reg.asSInt, - Mux(leaked < (-threshold_reg).asSInt, - (-threshold_reg).asSInt, - leaked)) - - regArray(respCounter)(col) := result.asUInt - } - respCounter := respCounter + 1.U - } - - when(respCounter === b.veclane.U) { - state := sWrite - // Precompute first write data (row 0, concatenated by column) - writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(0)(j))) - // Set write mask (write all) - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 1.U(1.W) - } - } - } - - is(sWrite) { - // Write back results - io.sramWrite(wbank_reg).req.valid := writeCounter < b.veclane.U - io.sramWrite(wbank_reg).req.bits.addr := waddr_reg + writeCounter - io.sramWrite(wbank_reg).req.bits.data := writeDataReg - io.sramWrite(wbank_reg).req.bits.mask := writeMaskReg - - when(writeCounter === (b.veclane - 1).U) { - state := complete - }.otherwise { - writeCounter := writeCounter + 1.U - // Prepare next row's write data - writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(writeCounter + 1.U)(j))) - } - } - - is(complete) { - when(cycle_reg === 0.U) { - io.cmdResp.valid := true.B - io.cmdResp.bits.rob_id := robid_reg - when(io.cmdResp.fire) { - iterCnt := iterCnt + 1.U - } - } - state := idle - } - } - - // Status signals - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := (state === 
idle) - io.status.init := (state === sRead) && (respCounter < b.veclane.U) - io.status.running := (state === sWrite) || ((state === sRead) && (respCounter === b.veclane.U)) - io.status.complete := (state === complete) && io.cmdResp.fire - io.status.iter := iterCnt - - when(reset.asBool) { - for (i <- 0 until b.veclane) { - for (j <- 0 until b.veclane) { - regArray(i)(j) := 0.U - } - } - writeDataReg := 0.U - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 0.U - } - } -} diff --git a/arch/src/main/scala/prototype/ibuki/matmul/LIFMatmulBall.scala b/arch/src/main/scala/prototype/ibuki/matmul/LIFMatmulBall.scala deleted file mode 100644 index f58c67b8..00000000 --- a/arch/src/main/scala/prototype/ibuki/matmul/LIFMatmulBall.scala +++ /dev/null @@ -1,54 +0,0 @@ -package prototype.ibuki.matmul - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.ibuki.matmul.LIF - - -class LIFMatmulBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module - with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - // Satisfy BallRegist requirements - def Blink: Blink = io - - // Instantiate LIF computation unit - private val lifUnit = Module(new LIF) - - // Connect command interface - lifUnit.io.cmdReq <> io.cmdReq - lifUnit.io.cmdResp <> io.cmdResp - - // Connect Scratchpad SRAM read/write interface - for (i <- 0 until b.sp_banks) { - lifUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - lifUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Accumulator read interface (LIF does not access accumulator, tie-off) - for (i <- 0 until b.acc_banks) { - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U 
- } - - // Accumulator write interface (LIF does not write accumulator, tie-off) - for (i <- 0 until b.acc_banks) { - io.accWrite(i).io.req.valid := false.B - io.accWrite(i).io.req.bits := DontCare - io.accWrite(i).rob_id := 0.U - } - - // Pass through status signals - io.status <> lifUnit.io.status - - override lazy val desiredName: String = "LIFMatmulBall" -} diff --git a/arch/src/main/scala/prototype/im2col/Im2colBall.scala b/arch/src/main/scala/prototype/im2col/Im2colBall.scala deleted file mode 100644 index 7fe70b5f..00000000 --- a/arch/src/main/scala/prototype/im2col/Im2colBall.scala +++ /dev/null @@ -1,59 +0,0 @@ -package prototype.im2col - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.im2col.Im2col - -/** - * Im2colBall - An Im2col computation Ball that complies with the Blink protocol - */ -class Im2colBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - def Blink: Blink = io - - // Instantiate Im2col - val im2colUnit = Module(new Im2col) - - // Connect command interface - im2colUnit.io.cmdReq <> io.cmdReq - im2colUnit.io.cmdResp <> io.cmdResp - - // Connect SRAM read interface - Im2col needs to read data from scratchpad - for (i <- 0 until b.sp_banks) { - im2colUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - } - - // Connect SRAM write interface - Im2col needs to write to scratchpad - for (i <- 0 until b.sp_banks) { - im2colUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Handle Accumulator read interface - Im2col does not read accumulator, so tie off - for (i <- 0 until b.acc_banks) { - // For Flipped(SramReadIO), we need to drive req.valid, req.bits (outputs) and resp.ready (output) - 
io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Handle Accumulator write interface - Im2col does not write accumulator, so tie off - for (i <- 0 until b.acc_banks) { - // For Flipped(SramWriteIO), we need to drive req.valid and req.bits (outputs) - io.accWrite(i).io.req.valid := false.B - io.accWrite(i).io.req.bits := DontCare - io.accWrite(i).rob_id := 0.U - } - - // Connect Status signals - directly obtained from internal unit - io.status <> im2colUnit.io.status - - override lazy val desiredName = "Im2colBall" -} diff --git a/arch/src/main/scala/prototype/im2col/im2col.scala b/arch/src/main/scala/prototype/im2col/im2col.scala deleted file mode 100644 index 9e578b0d..00000000 --- a/arch/src/main/scala/prototype/im2col/im2col.scala +++ /dev/null @@ -1,203 +0,0 @@ -package prototype.im2col - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status -import firrtl2.passes.CheckTypes.st - - -class Im2col(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // cmd interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Connect to Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - - // Status output - val status = new Status - }) - - // State definitions - val idle :: read :: read_and_convert :: complete :: Nil = Enum(4) - // Current state 
register - val state = RegInit(idle) - // Conversion buffer - val ConvertBuffer = RegInit(VecInit(Seq.fill(4)(VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W)))))) - // Row pointer marking top-left corner of convolution window - val rowptr = RegInit(0.U(10.W)) - // Column pointer marking top-left corner of convolution window - val colptr = RegInit(0.U(5.W)) - // Request counter in read state - val reqcounter = RegInit(0.U(5.W)) - // Response counter in read state - val respcounter = RegInit(0.U(5.W)) - // Store current instruction's RoB ID - val robid_reg = RegInit(0.U(10.W)) - // Store kernel row count - val krow_reg = RegInit(0.U(log2Up(b.veclane).W)) - // Store kernel column count - val kcol_reg = RegInit(0.U(log2Up(b.veclane).W)) - // Store input matrix row count - val inrow_reg = RegInit(0.U(10.W)) - // Store input matrix column count - val incol_reg = RegInit(0.U((log2Up(b.veclane) + 1).W)) - // Store starting column number - val startcol_reg = RegInit(0.U((log2Up(b.veclane) + 1).W)) - // Store starting row number - val startrow_reg = RegInit(0.U(10.W)) - // Store write starting address - val waddr_reg = RegInit(0.U(10.W)) - // Store write bank - val wbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - // Store read starting address - val raddr_reg = RegInit(0.U(10.W)) - // Store read bank - val rbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - // Batch iteration counter - val iterCnt = RegInit(0.U(32.W)) - - - // SRAM default assignment - for(i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := (state === read) || (state === read_and_convert) - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - // cmd interface default assignment - io.cmdReq.ready := true.B - io.cmdResp.valid := 
false.B - io.cmdResp.bits.rob_id := 0.U - - val rowcnt = rowptr - startrow_reg - val colcnt = colptr - startcol_reg - val rowmax = inrow_reg - krow_reg - val colmax = incol_reg - kcol_reg - - switch(state) { - // Idle state, waiting for instruction - is(idle) { - // Instruction arrives, initialize registers - when(io.cmdReq.fire) { - state := read - rowptr := io.cmdReq.bits.cmd.special(37,28) - colptr := io.cmdReq.bits.cmd.special(27,23) - reqcounter := 0.U - respcounter:= 0.U - // Kernel column count - kcol_reg := io.cmdReq.bits.cmd.special(3,0) - // Kernel row count - krow_reg := io.cmdReq.bits.cmd.special(7,4) - // Input matrix column count - incol_reg := io.cmdReq.bits.cmd.special(12,8) - // Input matrix row count - inrow_reg := io.cmdReq.bits.cmd.special(22,13) - // Starting column number - startcol_reg := io.cmdReq.bits.cmd.special(27,23) - // Starting row number - startrow_reg := io.cmdReq.bits.cmd.special(37,28) - robid_reg := io.cmdReq.bits.rob_id - waddr_reg := io.cmdReq.bits.cmd.op2_bank_addr - wbank_reg := io.cmdReq.bits.cmd.op2_bank - raddr_reg := io.cmdReq.bits.cmd.op1_bank_addr - rbank_reg := io.cmdReq.bits.cmd.op1_bank - } - } - // Read part of data, fill ConvertBuffer - is(read) { - // Send read request - when(reqcounter < krow_reg) { - reqcounter := reqcounter + 1.U - io.sramRead(rbank_reg).req.valid := true.B - io.sramRead(rbank_reg).req.bits.addr := raddr_reg + reqcounter + startrow_reg - } - // Process read response and store in ConvertBuffer - when(io.sramRead(rbank_reg).resp.fire) { - ConvertBuffer(respcounter) := io.sramRead(rbank_reg).resp.bits.data.asTypeOf(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - respcounter := respcounter + 1.U - } - // Determine whether to transition state - state := Mux(respcounter === krow_reg, read_and_convert, read) - - } - // Convert data and read remaining data, write back to spad - is(read_and_convert) { - // Move pointer - when(colptr <= colmax && rowptr <= rowmax) { - colptr := Mux(colptr === colmax, 
startcol_reg, colptr + 1.U) - io.sramWrite(wbank_reg).req.valid := true.B - io.sramWrite(wbank_reg).req.bits.addr := waddr_reg + rowcnt * (colmax + 1.U - startcol_reg) + colcnt - io.sramWrite(wbank_reg).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(~0.U(1.W))) - io.sramWrite(wbank_reg).req.bits.data := { - - val window = Wire(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - // Initialize all to 0 first - for (i <- 0 until b.veclane) { - window(i) := 0.U - } - - // Fill window data - for (i <- 0 until 4; j <- 0 until 4) { - when(i.U < krow_reg && j.U < kcol_reg) { - val bufferRow = (rowcnt + i.U) % krow_reg - val bufferCol = (colptr + j.U) % incol_reg - window((i.U * kcol_reg) + j.U) := ConvertBuffer(bufferRow)(bufferCol) - }.otherwise { - window((i.U * kcol_reg) + j.U) := 0.U - } - } - - // Rearrange data - // For example, for klen_reg=3, combine (00)(01)(02)(10)(11)(12)(20)(21)(22) - Cat((0 until b.veclane).map(i => window(i)).reverse) - } - } - // Send read request early - when(colptr === colmax - 1.U){ - io.sramRead(rbank_reg).req.valid := true.B - io.sramRead(rbank_reg).req.bits.addr := raddr_reg + krow_reg + rowptr - } - // Process read response and store in ConvertBuffer - when(io.sramRead(rbank_reg).resp.fire){ - ConvertBuffer(rowcnt % krow_reg) := io.sramRead(rbank_reg).resp.bits.data.asTypeOf(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - rowptr := rowptr + 1.U - } - // Determine whether to transition state - state := Mux(rowptr === rowmax && colptr === colmax, complete, read_and_convert) - } - // Complete state, send completion signal - is(complete) { - io.cmdResp.valid := true.B - io.cmdResp.bits.rob_id := robid_reg - state := idle - when(io.cmdResp.fire) { - iterCnt := iterCnt + 1.U - } - } - } - - // Status signals - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := (state === idle) - io.status.init := (state === read) - io.status.running := (state === read_and_convert) - io.status.complete := (state === 
complete) && io.cmdResp.fire - io.status.iter := iterCnt -} diff --git a/arch/src/main/scala/prototype/matrix/MatrixBall.scala b/arch/src/main/scala/prototype/matrix/MatrixBall.scala deleted file mode 100644 index d920fbf6..00000000 --- a/arch/src/main/scala/prototype/matrix/MatrixBall.scala +++ /dev/null @@ -1,60 +0,0 @@ -package prototype.matrix - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.matrix.BBFP_Control - -/** - * MatrixBall - A matrix computation Ball that complies with the Blink protocol - */ -class MatrixBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - def Blink: Blink = io - - // Instantiate BBFP_Control - val matrixUnit = Module(new BBFP_Control) - - // Connect command interface - matrixUnit.io.cmdReq <> io.cmdReq - matrixUnit.io.cmdResp <> io.cmdResp - - // Set is_matmul_ws signal - matrixUnit.io.is_matmul_ws := false.B // TODO: - - // Connect SRAM read interface - MatrixBall needs to read data from scratchpad - for (i <- 0 until b.sp_banks) { - matrixUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - } - - // Connect SRAM write interface - MatrixBall needs to write to scratchpad - for (i <- 0 until b.sp_banks) { - matrixUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Handle Accumulator read interface - MatrixBall does not read accumulator, so tie off - for (i <- 0 until b.acc_banks) { - // For Flipped(SramReadIO), we need to drive req.valid, req.bits (outputs) and resp.ready (output) - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Connect Accumulator write interface - 
MatrixBall writes results to accumulator - for (i <- 0 until b.acc_banks) { - matrixUnit.io.accWrite(i) <> io.accWrite(i).io - io.accWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Connect Status signals - directly obtained from internal unit - io.status <> matrixUnit.io.status - - override lazy val desiredName = "MatrixBall" -} diff --git a/arch/src/main/scala/prototype/matrix/README.md b/arch/src/main/scala/prototype/matrix/README.md deleted file mode 100644 index 7f506d2f..00000000 --- a/arch/src/main/scala/prototype/matrix/README.md +++ /dev/null @@ -1,153 +0,0 @@ -# Matrix Computation Accelerator - -## Overview - -This directory implements Buckyball's matrix computation accelerator for matrix multiplication and related operations. Located at `arch/src/main/scala/prototype/matrix`, it serves as a matrix computation accelerator supporting multiple data formats and operation modes. - -Core components: -- **bbfp_control.scala**: Matrix computation controller -- **bbfp_pe.scala**: Processing Element (PE) and MAC unit -- **bbfp_buffer.scala**: Data buffer management -- **bbfp_load.scala**: Data load unit -- **bbfp_ex.scala**: Execution unit -- **bbfpIns_decode.scala**: Instruction decoder - -## Code Structure - -``` -matrix/ -├── bbfp_control.scala - Controller main module -├── bbfp_pe.scala - Processing element implementation -├── bbfp_buffer.scala - Buffer management -├── bbfp_load.scala - Load unit -├── bbfp_ex.scala - Execution unit -└── bbfpIns_decode.scala - Instruction decode -``` - -### File Dependencies - -**bbfp_control.scala** (Controller layer) -- Integrates submodules (ID, LU, EX, etc.) 
-- Manages SRAM and Accumulator interfaces -- Handles Ball domain commands - -**bbfp_pe.scala** (Computation core layer) -- Implements MacUnit multiply-accumulate unit -- Defines PEControl control signals -- Handles signed/unsigned operations - -**Other modules** (Functional support layer) -- Provides data buffering, loading, execution and other support functions - -## Module Description - -### bbfp_control.scala - -**Main functionality**: Top-level control module for matrix computation accelerator - -**Module integration**: -```scala -class BBFP_Control extends Module { - val BBFP_ID = Module(new BBFP_ID) - val ID_LU = Module(new ID_LU) - val BBFP_LoadUnit = Module(new BBFP_LoadUnit) - val LU_EX = Module(new LU_EX) -} -``` - -**Interface definition**: -```scala -val io = IO(new Bundle { - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - val is_matmul_ws = Input(Bool()) - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(...))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(...))) - val accRead = Vec(b.acc_banks, Flipped(new SramReadIO(...))) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(...))) -}) -``` - -**Data flow**: -``` -cmdReq → BBFP_ID → ID_LU → BBFP_LoadUnit → LU_EX - ↓ - SRAM/ACC interface -``` - -### bbfp_pe.scala - -**Main functionality**: Implements basic processing element for matrix computation - -**MAC unit definition**: -```scala -class MacUnit extends Module { - val io = IO(new Bundle { - val in_a = Input(UInt(7.W)) // [6]=sign, [5]=flag, [4:0]=value - val in_b = Input(UInt(7.W)) // [6]=sign, [5]=flag, [4:0]=value - val in_c = Input(UInt(32.W)) // [31]=sign, [30:0]=value - val out_d = Output(UInt(32.W)) // Output result - }) -} -``` - -**Data format processing**: -```scala -// Extract sign bit and value -val sign_a = io.in_a(6) -val sign_b = io.in_b(6) -val flag_a = io.in_a(5) -val flag_b = io.in_b(5) -val value_a = io.in_a(4, 0) -val value_b = io.in_b(4, 0) - -// Determine 
left shift based on flag bit -val shifted_a = Mux(flag_a === 1.U, value_a << 2, value_a) -val shifted_b = Mux(flag_b === 1.U, value_b << 2, value_b) -``` - -**Signed arithmetic**: -```scala -val a_signed = Mux(sign_a === 1.U, -(shifted_a.zext), shifted_a.zext).asSInt -val b_signed = Mux(sign_b === 1.U, -(shifted_b.zext), shifted_b.zext).asSInt -``` - -**Control signals**: -```scala -class PEControl extends Bundle { - val propagate = UInt(1.W) // Propagation control -} -``` - -## Usage - -### Data Format - -**Input format**: 7-bit compressed format -- bit[6]: Sign bit (0=positive, 1=negative) -- bit[5]: Flag bit (1=left shift by 2) -- bit[4:0]: 5-bit value - -**Output format**: 32-bit signed number -- bit[31]: Sign bit -- bit[30:0]: 31-bit value - -### Operation Characteristics - -**MAC operation**: Multiply-Accumulate operation -- Supports signed and unsigned operations -- Configurable shift operations -- 32-bit accumulator output - -**Pipeline structure**: -- ID: Instruction decode stage -- LU: Load unit stage -- EX: Execution unit stage - -### Notes - -1. **Data format**: Uses custom 7-bit compressed format to reduce storage overhead -2. **Sign handling**: Supports correct signed number operations and sign extension -3. **Shift optimization**: Controls data preprocessing shift through flag bit -4. **Interface compatibility**: Fully compatible with SRAM and Accumulator interfaces -5. 
**Pipeline design**: Multi-stage pipeline improves throughput diff --git a/arch/src/main/scala/prototype/matrix/bbfpIns_decode.scala b/arch/src/main/scala/prototype/matrix/bbfpIns_decode.scala deleted file mode 100644 index 014d6542..00000000 --- a/arch/src/main/scala/prototype/matrix/bbfpIns_decode.scala +++ /dev/null @@ -1,104 +0,0 @@ -package prototype.matrix - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.matrix._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig - -class BBFP_ID(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val rob_id_width = log2Up(b.rob_entries) - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle{ - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val is_matmul_ws = Output(Bool()) - val id_lu_o = Decoupled(new id_lu_req) - }) - - val idle :: busy :: Nil = Enum(2) - // Register definitions - val state = RegInit(idle) - val rob_id_reg = RegInit(0.U(rob_id_width.W)) - val iteration_counter = RegInit(0.U(10.W)) - val iteration = RegInit(0.U(10.W)) - val op1_bank = RegInit(0.U(2.W)) - val op1_bank_addr = RegInit(0.U(12.W)) - val op2_bank_addr = RegInit(0.U(12.W)) - val op2_bank = RegInit(0.U(2.W)) - val wr_bank = RegInit(0.U(2.W)) - val wr_bank_addr = RegInit(0.U(12.W)) - val is_matmul_ws = RegInit(false.B) - io.is_matmul_ws := false.B - - switch(state) { - is(idle) { - when(io.cmdReq.valid && io.cmdReq.bits.cmd.bid === 1.U) { - iteration := io.cmdReq.bits.cmd.iter - iteration_counter := 0.U - is_matmul_ws := false.B - rob_id_reg := io.cmdReq.bits.rob_id - op1_bank := io.cmdReq.bits.cmd.op1_bank - op1_bank_addr := io.cmdReq.bits.cmd.op1_bank_addr - op2_bank := io.cmdReq.bits.cmd.op2_bank - op2_bank_addr := io.cmdReq.bits.cmd.op2_bank_addr - wr_bank := io.cmdReq.bits.cmd.wr_bank - 
wr_bank_addr := io.cmdReq.bits.cmd.wr_bank_addr - state := busy - io.is_matmul_ws := false.B - } - when(io.cmdReq.valid && io.cmdReq.bits.cmd.special(0)){ - iteration := io.cmdReq.bits.cmd.iter - iteration_counter := 0.U - is_matmul_ws := true.B - rob_id_reg := io.cmdReq.bits.rob_id - op1_bank := io.cmdReq.bits.cmd.op1_bank - op1_bank_addr := io.cmdReq.bits.cmd.op1_bank_addr - op2_bank := io.cmdReq.bits.cmd.op2_bank - op2_bank_addr := io.cmdReq.bits.cmd.op2_bank_addr - wr_bank := io.cmdReq.bits.cmd.wr_bank - wr_bank_addr := io.cmdReq.bits.cmd.wr_bank_addr - state := busy - io.is_matmul_ws := true.B - } - } - is(busy) { - iteration_counter := iteration_counter + 1.U - when(iteration_counter === iteration - 1.U) { - iteration_counter := 0.U - state := idle - } - } - } - // Generate ID_LU request - io.id_lu_o.valid := state === busy - io.id_lu_o.bits.op1_bank := op1_bank - io.id_lu_o.bits.op1_bank_addr := op1_bank_addr + b.veclane.U - iteration_counter - 1.U - io.id_lu_o.bits.op2_bank := op2_bank - io.id_lu_o.bits.op2_bank_addr := op2_bank_addr + iteration_counter - io.id_lu_o.bits.wr_bank := wr_bank - io.id_lu_o.bits.wr_bank_addr := wr_bank_addr + iteration_counter - io.id_lu_o.bits.opcode := 1.U - io.id_lu_o.bits.iter := iteration - io.id_lu_o.bits.thread_id := iteration_counter - io.id_lu_o.bits.rob_id := rob_id_reg - - io.cmdReq.ready := io.id_lu_o.ready - - // Instruction completion signal - - - // Delay complete signal by 10 cycles - // val complete_delay = RegInit(VecInit(Seq.fill(10)(false.B))) - // complete_delay(0) := complete - // for (i <- 1 until 10) { - // complete_delay(i) := complete_delay(i-1) - // } - // val complete_10clk = complete_delay(9) - - -} diff --git a/arch/src/main/scala/prototype/matrix/bbfp_buffer.scala b/arch/src/main/scala/prototype/matrix/bbfp_buffer.scala deleted file mode 100644 index bc68a86d..00000000 --- a/arch/src/main/scala/prototype/matrix/bbfp_buffer.scala +++ /dev/null @@ -1,68 +0,0 @@ -package prototype.matrix - -import 
chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.matrix._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import examples.BuckyballConfigs.CustomBuckyballConfig - -class id_lu_req(implicit b: CustomBuckyballConfig) extends Bundle { - val op1_bank = UInt(log2Up(b.sp_banks).W) - val op1_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - val op2_bank = UInt(log2Up(b.sp_banks).W) - val op2_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - val wr_bank = UInt(log2Up(b.sp_banks).W) - val wr_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - val opcode = UInt(3.W) - val iter = UInt(10.W) - val thread_id = UInt(10.W) - val rob_id = UInt(log2Up(b.rob_entries).W) -} - -class lu_ex_req(implicit b: CustomBuckyballConfig) extends Bundle { - val op1_bank = UInt(log2Up(b.sp_banks).W) - val op2_bank = UInt(log2Up(b.sp_banks).W) - val wr_bank = UInt(log2Up(b.sp_banks).W) - val wr_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - val opcode = UInt(3.W) - val iter = UInt(10.W) - val thread_id = UInt(10.W) - val rob_id = UInt(log2Up(b.rob_entries).W) -} - -class ID_LU(implicit b: CustomBuckyballConfig) extends Module{ - val io = IO(new Bundle { - val id_lu_i = Flipped(Decoupled(new id_lu_req)) - val ld_lu_o = Decoupled(new id_lu_req) - }) - // 1-cycle delay register - val delayed_req = RegEnable(io.id_lu_i.bits, io.id_lu_i.fire) - val delayed_valid = RegNext(io.id_lu_i.valid, false.B) - - // Output connection - io.ld_lu_o.bits := delayed_req - io.ld_lu_o.valid := delayed_valid - - // Backpressure: input is ready if output is ready (since we have a 1-slot buffer) - io.id_lu_i.ready := io.ld_lu_o.ready -} - -class LU_EX(implicit b: CustomBuckyballConfig) extends Module{ - val io = IO(new Bundle { - val lu_ex_i = Flipped(Decoupled(new lu_ex_req)) - val lu_ex_o = Decoupled(new lu_ex_req) - }) - // 1-cycle delay register - val delayed_req = RegEnable(io.lu_ex_i.bits, io.lu_ex_i.fire) - val delayed_valid = 
RegNext(io.lu_ex_i.valid, false.B) - - // Output connection - io.lu_ex_o.bits := delayed_req - io.lu_ex_o.valid := delayed_valid - - // Backpressure: input is ready if output is ready (since we have a 1-slot buffer) - io.lu_ex_i.ready := io.lu_ex_o.ready -} diff --git a/arch/src/main/scala/prototype/matrix/bbfp_control.scala b/arch/src/main/scala/prototype/matrix/bbfp_control.scala deleted file mode 100644 index 877c6232..00000000 --- a/arch/src/main/scala/prototype/matrix/bbfp_control.scala +++ /dev/null @@ -1,97 +0,0 @@ -package prototype.matrix - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.matrix._ -import org.yaml.snakeyaml.events.Event.ID -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import framework.balldomain.blink.Status - -class BBFP_Control(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - val is_matmul_ws = Input(Bool()) - // Connect to Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, b.spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, b.spad_w, b.spad_mask_len))) - - // Connect to Accumulator read/write interface - // val accRead = Vec(b.acc_banks, Flipped(new SramReadIO(b.acc_bank_entries, b.acc_w))) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - - // Status output - val status = new Status - }) -// ----------------------------------------------------------------------------- -// BBFP_ID -// ----------------------------------------------------------------------------- - val BBFP_ID = Module(new BBFP_ID) - BBFP_ID.io.cmdReq <> io.cmdReq 
-// ----------------------------------------------------------------------------- -// ID_LU -// ----------------------------------------------------------------------------- - val ID_LU = Module(new ID_LU) - ID_LU.io.id_lu_i <> BBFP_ID.io.id_lu_o - -// ----------------------------------------------------------------------------- -// BBFP_LoadUnit -// ----------------------------------------------------------------------------- - val BBFP_LoadUnit = Module(new BBFP_LoadUnit) - BBFP_LoadUnit.io.id_lu_i <> ID_LU.io.ld_lu_o - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req <> BBFP_LoadUnit.io.sramReadReq(i) - } -// ----------------------------------------------------------------------------- -// LU_EX -// ----------------------------------------------------------------------------- - val LU_EX = Module(new LU_EX) - LU_EX.io.lu_ex_i <> BBFP_LoadUnit.io.lu_ex_o - -// ----------------------------------------------------------------------------- -// BBFP_EX -// ----------------------------------------------------------------------------- - val BBFP_EX = Module(new BBFP_EX) - BBFP_EX.io.lu_ex_i <> LU_EX.io.lu_ex_o - for (i <- 0 until b.sp_banks) { - BBFP_EX.io.sramReadResp(i) <> io.sramRead(i).resp - io.sramWrite(i) <> BBFP_EX.io.sramWrite(i) - } - BBFP_EX.io.is_matmul_ws := io.is_matmul_ws - for (i <- 0 until b.acc_banks) { - io.accWrite(i) <> BBFP_EX.io.accWrite(i) - // io.accRead(i) := DontCare - } - io.cmdResp <> BBFP_EX.io.cmdResp - - // Status tracking - val iterCnt = RegInit(0.U(32.W)) - val hasInput = RegInit(false.B) - val hasOutput = RegInit(false.B) - - when(io.cmdReq.fire) { - hasInput := true.B - } - when(io.cmdResp.fire) { - hasOutput := false.B - hasInput := false.B - iterCnt := iterCnt + 1.U - } - when(io.cmdResp.valid && !hasOutput) { - hasOutput := true.B - } - - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := !hasInput && !hasOutput - io.status.init := hasInput && !hasOutput - io.status.running := 
hasOutput - io.status.complete := io.cmdResp.fire - io.status.iter := iterCnt - -} diff --git a/arch/src/main/scala/prototype/matrix/bbfp_ex.scala b/arch/src/main/scala/prototype/matrix/bbfp_ex.scala deleted file mode 100644 index d7403431..00000000 --- a/arch/src/main/scala/prototype/matrix/bbfp_ex.scala +++ /dev/null @@ -1,296 +0,0 @@ -package prototype.matrix - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.matrix._ -import framework.memdomain.mem.{SramWriteIO, SramReadIO, SramReadResp} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} - -class BBFP_EX(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val rob_id_width = log2Up(b.rob_entries) - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - val lu_ex_i = Flipped(Decoupled(new lu_ex_req)) - val sramReadResp = Vec(b.sp_banks, Flipped(Decoupled(new SramReadResp(spad_w)))) - val is_matmul_ws = Input(Bool()) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - val cmdResp = Decoupled(new BallRsComplete) - }) - - for(i <- 0 until b.sp_banks) { - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(spad_w / 8)(false.B)) - } - - for(i <- 0 until b.acc_banks) { - io.accWrite(i).req.valid := false.B - io.accWrite(i).req.bits.addr := DontCare - io.accWrite(i).req.bits.data := DontCare - io.accWrite(i).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(true.B)) - } - - io.cmdResp.valid := false.B - io.cmdResp.bits.rob_id := 0.U - - val idle::weight_load::data_compute::Nil = Enum(3) - val weight_cycles = RegInit(0.U(10.W)) - val act_cycles = 
RegInit(0.U(10.W)) - val weight_expreg=RegInit(0.U(32.W)) - val act_expreg=RegInit(0.U(32.W)) - // Extract signals from pipeline frontend - val op1_bank_reg = Reg(UInt(io.lu_ex_i.bits.op1_bank.getWidth.W)) - val op2_bank_reg = Reg(UInt(io.lu_ex_i.bits.op2_bank.getWidth.W)) - val wr_bank = Reg(UInt(io.lu_ex_i.bits.wr_bank.getWidth.W)) - val opcode = Reg(UInt(io.lu_ex_i.bits.opcode.getWidth.W)) - - - val act_shift_reg = Reg(Vec(16, Vec(16, UInt(7.W)))) - val col_enable = RegInit(VecInit(Seq.fill(16)(false.B))) - val input_cycle = RegInit(0.U(5.W)) - val act_data_ready = RegInit(false.B) - - - when(io.lu_ex_i.valid) { - op1_bank_reg := io.lu_ex_i.bits.op1_bank - op2_bank_reg := io.lu_ex_i.bits.op2_bank - wr_bank := io.lu_ex_i.bits.wr_bank - opcode := io.lu_ex_i.bits.opcode - } - - val op1_bank = Mux(io.lu_ex_i.valid, io.lu_ex_i.bits.op1_bank, op1_bank_reg) - val op2_bank = Mux(io.lu_ex_i.valid, io.lu_ex_i.bits.op2_bank, op2_bank_reg) - - val state = RegInit(idle) - val pe_array = Module(new BBFP_PE_Array16x16) - - // Use shift registers instead of original regular registers - - - // Activation data input logic - val act_reg_ptr = RegInit(0.U(5.W)) - - // In idle and weight_load phases, store data like regular registers - when(io.sramReadResp(op2_bank).valid && state =/= data_compute && act_data_ready === false.B) { - val data = io.sramReadResp(op2_bank).bits.data - for(i <- 0 until 16) { - act_shift_reg(act_reg_ptr)(i) := data((i+1)*8-1, i*8)(6,0) - } - act_reg_ptr := act_reg_ptr + 1.U - } - when(act_reg_ptr === 16.U && state =/= data_compute) { - act_data_ready := true.B - } - val weight_reg = Reg(Vec(16, UInt(7.W))) - - when(io.sramReadResp(op1_bank).valid && !io.is_matmul_ws) { - val data = io.sramReadResp(op1_bank).bits.data - for(i <- 0 until 16) { - weight_reg(i) := data((i+1)*8-1, i*8)(6,0) - } - } - - // New registers for weight and activation counting - - // Save 64 4x32 registers for output - val output_buffer = Reg(Vec(64, Vec(4, UInt(32.W)))) - // Save 
parallelogram output - val output_buffer_parallelogram = Reg(Vec(32, Vec(16, UInt(32.W)))) - val output_ptr = RegInit(0.U(5.W)) - val output_ready = RegInit(false.B) - - // New output data processing logic - // Extended to 7 bits to support 64 cycles - val write_cycles = RegInit(0.U(7.W)) - val writing_output = RegInit(false.B) - - val wr_bank_addr_base = Reg(UInt(io.lu_ex_i.bits.wr_bank_addr.getWidth.W)) - val addr_base_captured = RegInit(false.B) - - when(io.lu_ex_i.valid && !addr_base_captured) { - wr_bank_addr_base := io.lu_ex_i.bits.wr_bank_addr - addr_base_captured := true.B - weight_expreg := io.sramReadResp(op1_bank).bits.data(127,96) - act_expreg := io.sramReadResp(op2_bank).bits.data(127,96) - } - val result_exp = weight_expreg+act_expreg - - - - // PE array default value assignment - pe_array.io.in_last := VecInit(Seq.fill(16)(false.B)) - pe_array.io.in_id := DontCare - pe_array.io.in_a := DontCare - pe_array.io.in_d := DontCare - pe_array.io.in_b := VecInit(Seq.fill(16)(0.U)) - pe_array.io.in_control.foreach(_.propagate := 0.U) - pe_array.io.in_valid := VecInit(Seq.fill(16)(false.B)) - - // Default SRAM write port assignment - for(i <- 0 until b.sp_banks){ - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(spad_w / 8)(false.B)) - } - - // Start writing to SRAM when output is ready - when(output_ready && !writing_output) { - writing_output := true.B - write_cycles := 0.U - } - - when(writing_output) { - when(write_cycles < 16.U) { - // Concatenate 4x32-bit data into 128-bit wide data and write to SRAM - for(i <- 0 until b.acc_banks/2) { - when((wr_bank_addr_base + write_cycles)(0) === 0.U){ - io.accWrite(i).req.valid := true.B - io.accWrite(i).req.bits.addr := wr_bank_addr_base + (write_cycles >> 1.U) - val idx = (write_cycles * 4.U + i.U)(5,0) // 6 bits for 64 elements - io.accWrite(i).req.bits.data := Cat( - output_buffer(idx)(3), - 
output_buffer(idx)(2), - output_buffer(idx)(1), - output_buffer(idx)(0) - ) - io.accWrite(i).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(true.B)) - }.otherwise{ - io.accWrite(i + b.acc_banks/2).req.valid := true.B - io.accWrite(i + b.acc_banks/2).req.bits.addr := wr_bank_addr_base + (write_cycles >> 1.U) - val idx = (write_cycles * 4.U + i.U)(5,0) // 6 bits for 64 elements - io.accWrite(i + b.acc_banks/2).req.bits.data := Cat( - output_buffer(idx)(3), - output_buffer(idx)(2), - output_buffer(idx)(1), - output_buffer(idx)(0) - ) - io.accWrite(i + b.acc_banks/2).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(true.B)) - } - } - write_cycles := write_cycles + 1.U - }.otherwise { - // Write completed - writing_output := false.B - output_ready := false.B - addr_base_captured:=false.B - io.cmdResp.bits.rob_id := io.lu_ex_i.bits.rob_id - io.cmdResp.valid := true.B - } - } - - io.lu_ex_i.ready := true.B - - switch(state) { - is(idle) { - when(io.lu_ex_i.valid && !io.is_matmul_ws) { - // Start weight loading - weight_cycles := 0.U - state := weight_load - }.elsewhen(io.is_matmul_ws && act_data_ready){ - state := data_compute - act_cycles := 0.U - } - } - is(weight_load) { - // Load 16 cycles of weights - when(weight_cycles < 16.U) { - pe_array.io.in_d := weight_reg - pe_array.io.in_control.foreach(_.propagate := 1.U) - pe_array.io.in_valid.foreach(_ := true.B) - weight_cycles := weight_cycles + 1.U - }.otherwise { - // Weight loading completed, wait for activation data to be ready then enter computation - when(act_data_ready) { - act_cycles := 0.U - state := data_compute - } - } - } - is(data_compute) { - act_cycles := act_cycles + 1.U - // In computation phase, start parallelogram input mode for shift registers - when(act_cycles < 31.U) { - // Each cycle enables more rows for shifting row by row - for(col <- 0 until 16) { - when(col.U <= act_cycles && col.U < 16.U) { - col_enable(col) := true.B - }.otherwise { - col_enable(col) := false.B - } - } - - // Shift 
register operation: enabled rows shift right - for(col <- 0 until 16) { - when(col_enable(col)) { - for(row <- 0 until 15) { - act_shift_reg(row)(col) := act_shift_reg(row + 1)(col) - } - // Pad left with 0 (because in computation phase, no new data is input) - act_shift_reg(15)(col) := 0.U(7.W) - } - } - - // Input rightmost element of each row (column 15) to PE array - val current_input = WireDefault(VecInit(Seq.fill(16)(0.U(7.W)))) - for(col <- 0 until 16) { - when(col_enable(col)) { - current_input(col) := act_shift_reg(0)(col) - }.otherwise { - current_input(col) := 0.U(7.W) - } - } - pe_array.io.in_a := current_input - pe_array.io.in_b.foreach(_ := 0.U) - } - // Start receiving output from cycle 16, end at cycle 47 - - when(act_cycles > 16.U && act_cycles <= 48.U) { - output_buffer_parallelogram(output_ptr) := pe_array.io.out_b - output_ptr := output_ptr + 1.U - } - - // After cycle 47, convert parallelogram output to 64 4x32 registers - when(act_cycles === 49.U) { - output_ready := true.B - // Convert parallelogram output to 64 4x32 registers, organized in row-major order - for(row <- 0 until 16) { - for(col <- 0 until 16) { - val src_row = row + col - // Every 4 columns form one register group - val buffer_index = row * 4 + (col >> 2) - // Position within 4-element group - val element_index = col & 0x3 - if(src_row < 32) { - output_buffer(buffer_index)(element_index) := output_buffer_parallelogram(src_row)(col) - } else { - output_buffer(buffer_index)(element_index) := 0.U(32.W) - } - } - } - } - - // Reset data ready flag - when(act_cycles === 50.U) { - act_data_ready := false.B - act_reg_ptr := 0.U - output_ptr := 0.U - // Reset all row enable signals - col_enable := VecInit(Seq.fill(16)(false.B)) - state := idle - } - } - } - - - io.sramReadResp.foreach { resp => - resp.ready := state =/= idle - } -} diff --git a/arch/src/main/scala/prototype/matrix/bbfp_load.scala b/arch/src/main/scala/prototype/matrix/bbfp_load.scala deleted file mode 100644 index 
e1d2ffcf..00000000 --- a/arch/src/main/scala/prototype/matrix/bbfp_load.scala +++ /dev/null @@ -1,60 +0,0 @@ -package prototype.matrix - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.matrix._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO, SramReadReq} -import examples.BuckyballConfigs.CustomBuckyballConfig - -class BBFP_LoadUnit(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val rob_id_width = log2Up(b.rob_entries) - val spad_w = b.veclane * b.inputType.getWidth - val io = IO(new Bundle { - val sramReadReq = Vec(b.sp_banks,Decoupled(new SramReadReq(b.spad_bank_entries))) - val id_lu_i = Flipped(Decoupled(new id_lu_req)) - val lu_ex_o = Decoupled(new lu_ex_req) - }) - - val op1_bank = io.id_lu_i.bits.op1_bank - val op1_bank_addr = io.id_lu_i.bits.op1_bank_addr - val op2_bank = io.id_lu_i.bits.op2_bank - val op2_bank_addr = io.id_lu_i.bits.op2_bank_addr - val wr_bank = io.id_lu_i.bits.wr_bank - val wr_bank_addr = io.id_lu_i.bits.wr_bank_addr - - // Default assignment for each bank read request - for(i <- 0 until b.sp_banks){ - io.sramReadReq(i).valid := false.B - io.sramReadReq(i).bits.fromDMA := false.B - io.sramReadReq(i).bits.addr := 0.U - } - - // Generate SRAM read request based on ID_LU input - when(io.id_lu_i.valid){ - io.sramReadReq(op1_bank).valid := true.B - io.sramReadReq(op1_bank).bits.fromDMA := false.B - io.sramReadReq(op1_bank).bits.addr := op1_bank_addr - - io.sramReadReq(op2_bank).valid := true.B - io.sramReadReq(op2_bank).bits.fromDMA := false.B - io.sramReadReq(op2_bank).bits.addr := op2_bank_addr - } - - // Generate LU_EX request - io.lu_ex_o.valid := io.id_lu_i.valid - io.lu_ex_o.bits.op1_bank := op1_bank - io.lu_ex_o.bits.op2_bank := op2_bank - io.lu_ex_o.bits.wr_bank := wr_bank - io.lu_ex_o.bits.wr_bank_addr := wr_bank_addr - io.lu_ex_o.bits.opcode := io.id_lu_i.bits.opcode - io.lu_ex_o.bits.iter := io.id_lu_i.bits.iter - 
io.lu_ex_o.bits.thread_id := io.id_lu_i.bits.thread_id - io.lu_ex_o.bits.rob_id := io.id_lu_i.bits.rob_id - - io.id_lu_i.ready := io.lu_ex_o.ready - - -} diff --git a/arch/src/main/scala/prototype/matrix/bbfp_pe.scala b/arch/src/main/scala/prototype/matrix/bbfp_pe.scala deleted file mode 100644 index a8236338..00000000 --- a/arch/src/main/scala/prototype/matrix/bbfp_pe.scala +++ /dev/null @@ -1,412 +0,0 @@ -package prototype.matrix -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import prototype.matrix._ -// PE control signal Bundle -class PEControl extends Bundle { - // Propagation control - val propagate = UInt(1.W) -} - -class MacUnit extends Module { - val io = IO(new Bundle { - // Unsigned input: [6]=sign, [5]=flag, [4:0]=value - val in_a = Input(UInt(7.W)) - // Unsigned input: [6]=sign, [5]=flag, [4:0]=value - val in_b = Input(UInt(7.W)) - // Unsigned input: [31]=sign, [30:0]=value - val in_c = Input(UInt(32.W)) - // Signed output - val out_d = Output(UInt(32.W)) - }) - - // Extract sign bits - val sign_a = io.in_a(6) - val sign_b = io.in_b(6) - val sign_c = io.in_c(31) - - // Extract flag bits - val flag_a = io.in_a(5) - val flag_b = io.in_b(5) - - // Extract value parts - // 5-bit value - val value_a = io.in_a(4, 0) - // 5-bit value - val value_b = io.in_b(4, 0) - // 31-bit value - val value_c = io.in_c(30, 0) - - - // val extended_a = value_a.asUInt().pad(7) // Extend to 7 bits - // val extended_b = value_b.asUInt().pad(7) // Extend to 7 bits - - - // Determine left shift based on flag bit - val shifted_a = Mux(flag_a === 1.U, value_a << 2, value_a) - val shifted_b = Mux(flag_b === 1.U, value_b << 2, value_b) - - - // Convert value to signed number, considering sign bit - // First extend bit width to avoid overflow, then determine sign based on sign bit - val a_signed = Mux(sign_a === 1.U, - -(shifted_a.zext), - shifted_a.zext - ).asSInt - - val b_signed = Mux(sign_b === 1.U, - -(shifted_b.zext), - shifted_b.zext - ).asSInt - - val 
c_signed = Mux(sign_c === 1.U, - -(value_c.zext), - value_c.zext - ).asSInt - - // Perform MAC operation: a * b + c - val result = a_signed * b_signed + c_signed - - // Output result - io.out_d := result.asUInt -} - - -// BBFP PE unit (only supports Weight Stationary) -class BBFP_PE_WS(max_simultaneous_matmuls: Int = 16) extends Module { - val io = IO(new Bundle { - // Data input/output - // Input activation value - val in_a = Input(UInt(7.W)) - // Input partial sum - val in_b = Input(UInt(32.W)) - // Input weight - val in_d = Input(UInt(7.W)) - // Output activation value - val out_a = Output(UInt(7.W)) - // Output partial sum - val out_b = Output(UInt(32.W)) - // Output weight - val out_d = Output(UInt(7.W)) - // Control signals - val in_control = Input(new PEControl()) - val out_control = Output(new PEControl()) - - // ID and valid signals - val in_id = Input(UInt(log2Up(max_simultaneous_matmuls).W)) - val out_id = Output(UInt(log2Up(max_simultaneous_matmuls).W)) - - val in_last = Input(Bool()) - val out_last = Output(Bool()) - - val in_valid = Input(Bool()) - val out_valid = Output(Bool()) - }) - - // Instantiate MAC unit - val mac_unit = Module(new MacUnit) - - // Input signals - val a = io.in_a - val b = io.in_b - val d = io.in_d - val prop = io.in_control.propagate - // val shift = io.in_control.shift // Removed - val id = io.in_id - val last = io.in_last - val valid = io.in_valid - - // Accumulation register - - val weight = Reg(UInt(7.W)) - // Pass-through signals - io.out_a := a - io.out_control.propagate := prop - io.out_id := id - io.out_last := last - io.out_valid := valid - - // MAC unit connections - mac_unit.io.in_a := a - mac_unit.io.in_b := d - - // Propagation control constant - val PROPAGATE = 1.U(1.W) - - // Default assignment - io.out_b := b - mac_unit.io.in_c := b - // Weight Stationary mode - when(prop === PROPAGATE) { - when(valid) { - weight := d - } - io.out_d := d - mac_unit.io.in_b := d - mac_unit.io.in_c := b - mac_unit.io.in_a := a - 
io.out_b := mac_unit.io.out_d - io.out_a := a - }.otherwise { - // Computation mode: output c2, use c1 as weight for computation - io.out_a := a - io.out_d := DontCare - mac_unit.io.in_b := weight - mac_unit.io.in_c := b - mac_unit.io.in_a := a - io.out_b := mac_unit.io.out_d - } - - // Do not update register when invalid - when(!valid) { - weight := weight - } -} - - - -class BBFP_PE_Array2x2 extends Module { - val io = IO(new Bundle { - // Row input activation (horizontal propagation) - val in_a = Input(Vec(2, UInt(7.W))) - // Column input weight (vertical propagation) - val in_d = Input(Vec(2, UInt(7.W))) - // Column input partial sum (vertical propagation) - val in_b = Input(Vec(2, UInt(32.W))) - // Column input control signal (follows weight vertical propagation) - val in_control = Input(Vec(2, new PEControl())) - val in_id = Input(Vec(2, UInt(1.W))) - val in_last = Input(Vec(2, Bool())) - val in_valid = Input(Vec(2, Bool())) - - // Output - // Bottom row output partial sum - val out_b = Output(Vec(2, UInt(32.W))) - // Right column output activation - val out_a = Output(Vec(2, UInt(7.W))) - // Bottom row output weight - val out_d = Output(Vec(2, UInt(7.W))) - }) - - // 2x2 PE array - val pes = Seq.fill(2, 2)(Module(new BBFP_PE_WS())) - - // Registers connecting between PEs - // Activation horizontal propagation register (row direction, left->right) - val reg_a_h = Seq.fill(2)(Reg(UInt(7.W))) // pes(i)(0) -> pes(i)(1) - - // Weight vertical propagation register (column direction, top->bottom) - val reg_d_v = Seq.fill(2)(Reg(UInt(7.W))) // pes(0)(j) -> pes(1)(j) - - // Partial sum vertical propagation register (column direction, top->bottom) - val reg_b_v = Seq.fill(2)(Reg(UInt(32.W))) // pes(0)(j) -> pes(1)(j) - - // Control signal vertical propagation register (column direction, top->bottom) - val reg_ctrl_v = Seq.fill(2)(Wire(new PEControl)) - val reg_id_v = Seq.fill(2)(Wire(UInt(1.W))) - val reg_last_v = Seq.fill(2)(Wire(Bool())) - val reg_valid_v = 
Seq.fill(2)(Wire(Bool())) - - // ================ PE(0,0) ================ - // Row 0 activation input - pes(0)(0).io.in_a := io.in_a(0) - // Column 0 weight input - pes(0)(0).io.in_d := io.in_d(0) - // Column 0 partial sum input - pes(0)(0).io.in_b := io.in_b(0) - // Column 0 control signal input - pes(0)(0).io.in_control := io.in_control(0) - pes(0)(0).io.in_id := io.in_id(0) - pes(0)(0).io.in_last := io.in_last(0) - pes(0)(0).io.in_valid := io.in_valid(0) - - // ================ PE(0,1) ================ - // Activation propagates horizontally from PE(0,0) (through register) - reg_a_h(0) := pes(0)(0).io.out_a - pes(0)(1).io.in_a := reg_a_h(0) - // Weight, partial sum, control signal input from external column 1 - pes(0)(1).io.in_d := io.in_d(1) - pes(0)(1).io.in_b := io.in_b(1) - pes(0)(1).io.in_control := io.in_control(1) - pes(0)(1).io.in_id := io.in_id(1) - pes(0)(1).io.in_last := io.in_last(1) - pes(0)(1).io.in_valid := io.in_valid(1) - - // ================ PE(1,0) ================ - // Activation input from external row 1 - pes(1)(0).io.in_a := io.in_a(1) - // Weight, partial sum, control signal propagate vertically from PE(0,0) (through register) - reg_d_v(0) := pes(0)(0).io.out_d - reg_b_v(0) := pes(0)(0).io.out_b - reg_ctrl_v(0) := pes(0)(0).io.out_control - reg_id_v(0) := pes(0)(0).io.out_id - reg_last_v(0) := pes(0)(0).io.out_last - reg_valid_v(0):= pes(0)(0).io.out_valid - - pes(1)(0).io.in_d := reg_d_v(0) - pes(1)(0).io.in_b := reg_b_v(0) - pes(1)(0).io.in_control := reg_ctrl_v(0) - pes(1)(0).io.in_id := reg_id_v(0) - pes(1)(0).io.in_last := reg_last_v(0) - pes(1)(0).io.in_valid := reg_valid_v(0) - - // ================ PE(1,1) ================ - // Activation propagates horizontally from PE(1,0) (through register) - reg_a_h(1) := pes(1)(0).io.out_a - pes(1)(1).io.in_a := reg_a_h(1) - // Weight, partial sum, control signal propagate vertically from PE(0,1) (through register) - reg_d_v(1) := pes(0)(1).io.out_d - reg_b_v(1) := pes(0)(1).io.out_b - 
reg_ctrl_v(1) := pes(0)(1).io.out_control - reg_id_v(1) := pes(0)(1).io.out_id - reg_last_v(1) := pes(0)(1).io.out_last - reg_valid_v(1):= pes(0)(1).io.out_valid - - pes(1)(1).io.in_d := reg_d_v(1) - pes(1)(1).io.in_b := reg_b_v(1) - pes(1)(1).io.in_control := reg_ctrl_v(1) - pes(1)(1).io.in_id := reg_id_v(1) - pes(1)(1).io.in_last := reg_last_v(1) - pes(1)(1).io.in_valid := reg_valid_v(1) - - // ================ Output connections ================ - // Add registers for outputs - val out_b_reg = Reg(Vec(2, UInt(32.W))) - val out_a_reg = Reg(Vec(2, UInt(7.W))) - val out_d_reg = Reg(Vec(2, UInt(7.W))) - - out_b_reg(0) := pes(1)(0).io.out_b - out_b_reg(1) := pes(1)(1).io.out_b - out_a_reg(0) := pes(0)(1).io.out_a - out_a_reg(1) := pes(1)(1).io.out_a - out_d_reg(0) := pes(1)(0).io.out_d - out_d_reg(1) := pes(1)(1).io.out_d - - io.out_b := out_b_reg - io.out_a := out_a_reg - io.out_d := out_d_reg -} - -class BBFP_PE_Array16x16 extends Module { - val io = IO(new Bundle { - // Row input activation (horizontal propagation) - 16 rows - val in_a = Input(Vec(16, UInt(7.W))) - // Column input weight (vertical propagation) - 16 columns - val in_d = Input(Vec(16, UInt(7.W))) - // Column input partial sum (vertical propagation) - 16 columns - val in_b = Input(Vec(16, UInt(32.W))) - // Column input control signal (follows weight vertical propagation) - 16 columns - val in_control = Input(Vec(16, new PEControl())) - val in_id = Input(Vec(16, UInt(1.W))) - val in_last = Input(Vec(16, Bool())) - val in_valid = Input(Vec(16, Bool())) - - // Output - // Bottom row output partial sum - val out_b = Output(Vec(16, UInt(32.W))) - // Right column output activation - val out_a = Output(Vec(16, UInt(7.W))) - // Bottom row output weight - val out_d = Output(Vec(16, UInt(7.W))) - }) - - // 16x16 PE array - val pes = Seq.fill(16, 16)(Module(new BBFP_PE_WS())) - - // Registers connecting between PEs - // Activation horizontal propagation register (row direction, left->right) - val reg_a_h = 
Seq.fill(16, 15)(Reg(UInt(7.W))) // pes(i)(j) -> pes(i)(j+1) - - // Weight vertical propagation register (column direction, top->bottom) - val reg_d_v = Seq.fill(15, 16)(Reg(UInt(7.W))) // pes(i)(j) -> pes(i+1)(j) - - // Partial sum vertical propagation register (column direction, top->bottom) - val reg_b_v = Seq.fill(15, 16)(Reg(UInt(32.W))) // pes(i)(j) -> pes(i+1)(j) - - // Control signal vertical propagation register (column direction, top->bottom) - val reg_ctrl_v = Seq.fill(15, 16)(Wire(new PEControl)) - val reg_id_v = Seq.fill(15, 16)(Wire(UInt(1.W))) - val reg_last_v = Seq.fill(15, 16)(Wire(Bool())) - val reg_valid_v = Seq.fill(15, 16)(Wire(Bool())) - - // ================ PE array connections ================ - for (i <- 0 until 16) { - for (j <- 0 until 16) { - // Activation input connection (horizontal propagation) - if (j == 0) { - // First column: input from external - pes(i)(j).io.in_a := io.in_a(i) - } else { - // Other columns: propagate from left PE through register - reg_a_h(i)(j-1) := pes(i)(j-1).io.out_a - pes(i)(j).io.in_a := reg_a_h(i)(j-1) - } - - // Weight input connection (vertical propagation) - if (i == 0) { - // First row: input from external - pes(i)(j).io.in_d := io.in_d(j) - } else { - // Other rows: propagate from top PE through register - reg_d_v(i-1)(j) := pes(i-1)(j).io.out_d - pes(i)(j).io.in_d := reg_d_v(i-1)(j) - } - - // Partial sum input connection (vertical propagation) - if (i == 0) { - // First row: input from external - pes(i)(j).io.in_b := io.in_b(j) - } else { - // Other rows: propagate from top PE through register - reg_b_v(i-1)(j) := pes(i-1)(j).io.out_b - pes(i)(j).io.in_b := reg_b_v(i-1)(j) - } - - // Control signal input connection (vertical propagation) - if (i == 0) { - // First row: input from external - pes(i)(j).io.in_control := io.in_control(j) - pes(i)(j).io.in_id := io.in_id(j) - pes(i)(j).io.in_last := io.in_last(j) - pes(i)(j).io.in_valid := io.in_valid(j) - } else { - // Other rows: propagate from top PE 
through register - reg_ctrl_v(i-1)(j) := pes(i-1)(j).io.out_control - reg_id_v(i-1)(j) := pes(i-1)(j).io.out_id - reg_last_v(i-1)(j) := pes(i-1)(j).io.out_last - reg_valid_v(i-1)(j):= pes(i-1)(j).io.out_valid - - pes(i)(j).io.in_control := reg_ctrl_v(i-1)(j) - pes(i)(j).io.in_id := reg_id_v(i-1)(j) - pes(i)(j).io.in_last := reg_last_v(i-1)(j) - pes(i)(j).io.in_valid := reg_valid_v(i-1)(j) - } - } - } - - // ================ Output connections ================ - // Add registers for outputs - val out_b_reg = Reg(Vec(16, UInt(32.W))) - val out_a_reg = Reg(Vec(16, UInt(7.W))) - val out_d_reg = Reg(Vec(16, UInt(7.W))) - - // Bottom row output partial sum (all columns of row 15) - for (j <- 0 until 16) { - out_b_reg(j) := pes(15)(j).io.out_b - } - - // Right column output activation (row 15 of all rows) - for (i <- 0 until 16) { - out_a_reg(i) := pes(i)(15).io.out_a - } - - // Bottom row output weight (all columns of row 15) - for (j <- 0 until 16) { - out_d_reg(j) := pes(15)(j).io.out_d - } - - io.out_b := out_b_reg - io.out_a := out_a_reg - io.out_d := out_d_reg -} diff --git a/arch/src/main/scala/prototype/nagisa/matmul/Macro.scala b/arch/src/main/scala/prototype/nagisa/matmul/Macro.scala deleted file mode 100644 index 028444ef..00000000 --- a/arch/src/main/scala/prototype/nagisa/matmul/Macro.scala +++ /dev/null @@ -1,324 +0,0 @@ -package prototype.nagisa.matmul - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status - - -class PiDRAMmarcoBlackBox extends BlackBox with HasBlackBoxInline { - val io = IO(new Bundle { - val clock = Input(Clock()) - val reset = Input(Bool()) - - // Control interface - val start = Input(Bool()) - val done = Output(Bool()) - - // Data addresses - val op1_addr = Input(UInt(32.W)) - val 
op2_addr = Input(UInt(32.W)) - val result_addr = Input(UInt(32.W)) - - // Computation parameters - val rows = Input(UInt(16.W)) - val cols = Input(UInt(16.W)) - val op_type = Input(UInt(4.W)) // 0: matmul, 1: add, 2: mul, etc. - - // Data width - val data_width = Input(UInt(8.W)) - }) - - setInline("PiDRAMmarcoBlackBox.v", - s""" - |module PiDRAMmarcoBlackBox( - | input clock, - | input reset, - | input start, - | output reg done, - | input [31:0] op1_addr, - | input [31:0] op2_addr, - | input [31:0] result_addr, - | input [15:0] rows, - | input [15:0] cols, - | input [3:0] op_type, - | input [7:0] data_width - |); - | - | // State machine for marco computation - | reg [2:0] state; - | localparam IDLE = 3'b000; - | localparam LOAD = 3'b001; - | localparam COMPUTE = 3'b010; - | localparam STORE = 3'b011; - | localparam DONE = 3'b100; - | - | // Cycle counter for computation - | reg [31:0] cycle_count; - | reg [31:0] total_cycles; - | reg running; - | - | // Compute total cycles based on operation type and size - | always @(*) begin - | case(op_type) - | 4'b0000: // Matrix multiplication: rows * cols * cols - | total_cycles = rows * cols * cols; - | 4'b0001: // Element-wise addition: rows * cols - | total_cycles = rows * cols; - | 4'b0010: // Element-wise multiplication: rows * cols - | total_cycles = rows * cols; - | default: - | total_cycles = rows * cols; - | endcase - | end - | - | always @(posedge clock) begin - | if (reset) begin - | done <= 1'b0; - | state <= IDLE; - | cycle_count <= 32'b0; - | running <= 1'b0; - | end else begin - | case(state) - | IDLE: begin - | done <= 1'b0; - | cycle_count <= 32'b0; - | running <= 1'b0; - | if (start) begin - | state <= LOAD; - | running <= 1'b1; - | end - | end - | LOAD: begin - | // Load phase: simulate data loading from memory - | // In real marco, this would be handled by the memory controller - | state <= COMPUTE; - | end - | COMPUTE: begin - | // Compute phase: perform marco computation - | // This simulates the 
computation cycles in marco - | if (cycle_count >= total_cycles) begin - | state <= STORE; - | cycle_count <= 32'b0; - | end else begin - | cycle_count <= cycle_count + 1; - | end - | end - | STORE: begin - | // Store phase: write results back - | state <= DONE; - | end - | DONE: begin - | done <= 1'b1; - | running <= 1'b0; - | if (!start) begin - | state <= IDLE; - | end - | end - | default: begin - | state <= IDLE; - | end - | endcase - | end - | end - | - |endmodule - """.stripMargin) -} - - -class marco(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // Command interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - - // Accumulator write interface (for partial sums in marco operations) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - - // Status output - val status = new Status - }) - - // State machine - val idle :: sLoadOp1 :: sLoadOp2 :: sCompute :: sWrite :: complete :: Nil = Enum(6) - val state = RegInit(idle) - - // Instruction registers - val robid_reg = RegInit(0.U(10.W)) - val op1_addr_reg = RegInit(0.U(10.W)) - val op1_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val op2_addr_reg = RegInit(0.U(10.W)) - val op2_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val result_addr_reg = RegInit(0.U(10.W)) - val result_bank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val iter_reg = RegInit(0.U(10.W)) - - // marco parameters from special field (40 bits total) - // special[15:0] = rows (16 bits) - // special[31:16] = cols (16 bits) - // special[35:32] = op_type (4 bits): 0=matmul, 1=add, 2=mul - val rows_reg = 
RegInit(0.U(16.W)) - val cols_reg = RegInit(0.U(16.W)) - val op_type_reg = RegInit(0.U(4.W)) - - // Counters - val readCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val writeCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val computeCounter = RegInit(0.U(32.W)) - - // PiDRAM marco BlackBox instance - val pidrammarco = Module(new PiDRAMmarcoBlackBox) - pidrammarco.io.clock := clock - pidrammarco.io.reset := reset.asBool - - // Default SRAM assignments - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := false.B - - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - - // Default accumulator assignments - for (i <- 0 until b.acc_banks) { - io.accWrite(i).req.valid := false.B - io.accWrite(i).req.bits.addr := 0.U - io.accWrite(i).req.bits.data := 0.U - io.accWrite(i).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(0.U(1.W))) - } - - // Command interface defaults - io.cmdReq.ready := state === idle - io.cmdResp.valid := false.B - io.cmdResp.bits.rob_id := robid_reg - - // PiDRAM marco interface defaults - pidrammarco.io.start := false.B - pidrammarco.io.op1_addr := op1_addr_reg - pidrammarco.io.op2_addr := op2_addr_reg - pidrammarco.io.result_addr := result_addr_reg - pidrammarco.io.rows := rows_reg - pidrammarco.io.cols := cols_reg - pidrammarco.io.op_type := op_type_reg - pidrammarco.io.data_width := b.inputType.getWidth.U - - // Status output - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := (state === idle) - io.status.init := (state === sLoadOp1) || (state === sLoadOp2) - io.status.running := (state === sCompute) || (state === sWrite) - io.status.complete := (state === complete) && io.cmdResp.fire - io.status.iter := computeCounter 
- - // State machine - switch(state) { - is(idle) { - when(io.cmdReq.fire) { - state := sLoadOp1 - readCounter := 0.U - writeCounter := 0.U - computeCounter := 0.U - - robid_reg := io.cmdReq.bits.rob_id - op1_addr_reg := io.cmdReq.bits.cmd.op1_bank_addr - op1_bank_reg := io.cmdReq.bits.cmd.op1_bank - op2_addr_reg := io.cmdReq.bits.cmd.op2_bank_addr - op2_bank_reg := io.cmdReq.bits.cmd.op2_bank - result_addr_reg := io.cmdReq.bits.cmd.wr_bank_addr - result_bank_reg := io.cmdReq.bits.cmd.wr_bank - iter_reg := io.cmdReq.bits.cmd.iter - - // Extract marco parameters from special field (40 bits) - // special[15:0] = rows, special[31:16] = cols, special[35:32] = op_type - rows_reg := io.cmdReq.bits.cmd.special(15, 0) - cols_reg := io.cmdReq.bits.cmd.special(31, 16) - op_type_reg := io.cmdReq.bits.cmd.special(35, 32) - } - } - - is(sLoadOp1) { - // Load operand 1 (simplified: load one tile) - when(readCounter < iter_reg) { - io.sramRead(op1_bank_reg).req.valid := true.B - io.sramRead(op1_bank_reg).req.bits.addr := op1_addr_reg + readCounter - io.sramRead(op1_bank_reg).req.bits.fromDMA := false.B - - when(io.sramRead(op1_bank_reg).resp.valid) { - io.sramRead(op1_bank_reg).resp.ready := true.B - readCounter := readCounter + 1.U - } - }.otherwise { - state := sLoadOp2 - readCounter := 0.U - } - } - - is(sLoadOp2) { - // Load operand 2 (simplified: load one tile) - when(readCounter < iter_reg) { - io.sramRead(op2_bank_reg).req.valid := true.B - io.sramRead(op2_bank_reg).req.bits.addr := op2_addr_reg + readCounter - io.sramRead(op2_bank_reg).req.bits.fromDMA := false.B - - when(io.sramRead(op2_bank_reg).resp.valid) { - io.sramRead(op2_bank_reg).resp.ready := true.B - readCounter := readCounter + 1.U - } - }.otherwise { - state := sCompute - readCounter := 0.U - pidrammarco.io.start := true.B - } - } - - is(sCompute) { - // Wait for PiDRAM marco to complete - when(pidrammarco.io.done) { - state := sWrite - writeCounter := 0.U - }.otherwise { - computeCounter := computeCounter + 
1.U - } - } - - is(sWrite) { - // Write result (simplified: write one tile) - when(writeCounter < iter_reg) { - io.sramWrite(result_bank_reg).req.valid := true.B - io.sramWrite(result_bank_reg).req.bits.addr := result_addr_reg + writeCounter - // Simplified: write zeros as placeholder (actual output would come from PiDRAM marco) - io.sramWrite(result_bank_reg).req.bits.data := 0.U - io.sramWrite(result_bank_reg).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(1.U(1.W))) - - when(io.sramWrite(result_bank_reg).req.ready) { - writeCounter := writeCounter + 1.U - } - }.otherwise { - state := complete - } - } - - is(complete) { - io.cmdResp.valid := true.B - when(io.cmdResp.ready) { - state := idle - } - } - } -} diff --git a/arch/src/main/scala/prototype/nagisa/matmul/MacroMatmulBall.scala b/arch/src/main/scala/prototype/nagisa/matmul/MacroMatmulBall.scala deleted file mode 100644 index 597d8daa..00000000 --- a/arch/src/main/scala/prototype/nagisa/matmul/MacroMatmulBall.scala +++ /dev/null @@ -1,57 +0,0 @@ -package prototype.nagisa.matmul - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.nagisa.matmul.marco - -/** - * marcoMatmulBall - A Compute-in-Memory Ball that complies with the Blink protocol - * Behavior: Read operands from Scratchpad, perform marco computation (inspired by PiDRAM), - * then write results back to Scratchpad. 
- */ -class marcoMatmulBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module - with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - // Satisfy BallRegist requirements - def Blink: Blink = io - - // Instantiate marco computation unit - private val marcoUnit = Module(new marco) - - // Connect command interface - marcoUnit.io.cmdReq <> io.cmdReq - marcoUnit.io.cmdResp <> io.cmdResp - - // Connect Scratchpad SRAM read/write interface - for (i <- 0 until b.sp_banks) { - marcoUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - marcoUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Connect Accumulator write interface (for partial sums in marco operations) - for (i <- 0 until b.acc_banks) { - marcoUnit.io.accWrite(i) <> io.accWrite(i).io - io.accWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Accumulator read interface (not used, tie-off) - for (i <- 0 until b.acc_banks) { - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Pass through status signals - io.status <> marcoUnit.io.status - - override lazy val desiredName: String = "marcoMatmulBall" -} diff --git a/arch/src/main/scala/prototype/nnlut/NNLut.scala b/arch/src/main/scala/prototype/nnlut/NNLut.scala deleted file mode 100644 index 85dc872d..00000000 --- a/arch/src/main/scala/prototype/nnlut/NNLut.scala +++ /dev/null @@ -1,202 +0,0 @@ -package prototype.nnlut - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status - -/** - * NNLut - Neural Network Look-Up Table unit - * Simple implementation: read data from scratchpad, 
lookup in LUT, write back - */ -class NNLut(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // cmd interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Connect to Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - - // Status output - val status = new Status - }) - - // State definitions - val idle :: sRead :: sWrite :: complete :: Nil = Enum(4) - val state = RegInit(idle) - - // Store a veclane x veclane tile - val regArray = RegInit( - VecInit(Seq.fill(b.veclane)( - VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - )) - ) - - // Counters - val readCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val respCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val writeCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - - // Instruction registers - val robid_reg = RegInit(0.U(10.W)) - val waddr_reg = RegInit(0.U(10.W)) - val wbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val raddr_reg = RegInit(0.U(10.W)) - val rbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val iter_reg = RegInit(0.U(10.W)) - val cycle_reg = RegInit(0.U(6.W)) - val iterCnt = RegInit(0.U(32.W)) - - // Precompute write data - val writeDataReg = Reg(UInt(spad_w.W)) - val writeMaskReg = Reg(Vec(b.spad_mask_len, UInt(1.W))) - - // Simple LUT: 256 entries for 8-bit input - // This is a simple example LUT, can be replaced with actual NN-LUT values - val LUT_SIZE = 256 - val lut = VecInit(Seq.tabulate(LUT_SIZE) { i => - // Simple example: identity function with saturation - // In real NN-LUT, this would contain pre-computed activation function values - val input = i.asSInt - val output = Mux(input < -128.S, -128.S, Mux(input > 127.S, 127.S, input)) - 
output.asUInt - }) - - // SRAM default assignment - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := false.B - - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - - // cmd interface default assignment - io.cmdReq.ready := state === idle - io.cmdResp.valid := false.B - io.cmdResp.bits.rob_id := robid_reg - - // State machine - switch(state) { - is(idle) { - when(io.cmdReq.fire) { - state := sRead - readCounter := 0.U - respCounter := 0.U - writeCounter := 0.U - - robid_reg := io.cmdReq.bits.rob_id - waddr_reg := io.cmdReq.bits.cmd.wr_bank_addr - wbank_reg := io.cmdReq.bits.cmd.wr_bank - raddr_reg := io.cmdReq.bits.cmd.op1_bank_addr - rbank_reg := io.cmdReq.bits.cmd.op1_bank - iter_reg := io.cmdReq.bits.cmd.iter - cycle_reg := (io.cmdReq.bits.cmd.iter +& (b.veclane.U - 1.U)) / b.veclane.U - 1.U - } - - when(cycle_reg =/= 0.U) { - state := sRead - readCounter := 0.U - writeCounter := 0.U - respCounter := 0.U - waddr_reg := waddr_reg + b.veclane.U - raddr_reg := raddr_reg + b.veclane.U - cycle_reg := cycle_reg - 1.U - } - } - - is(sRead) { - when(readCounter < b.veclane.U) { - // Issue read request - readCounter := readCounter + 1.U - io.sramRead(rbank_reg).req.valid := true.B - io.sramRead(rbank_reg).req.bits.addr := raddr_reg + readCounter - } - - // Receive response and perform LUT lookup - io.sramRead(rbank_reg).resp.ready := true.B - when(io.sramRead(rbank_reg).resp.fire) { - for (col <- 0 until b.veclane) { - val hi = (col + 1) * b.inputType.getWidth - 1 - val lo = col * b.inputType.getWidth - val raw = io.sramRead(rbank_reg).resp.bits.data(hi, lo) - // Perform LUT lookup - // Convert signed value to unsigned index (0-255) - // Take lower 8 bits as unsigned index for LUT - val idx = 
raw(7, 0) // Use lower 8 bits as unsigned index - regArray(respCounter)(col) := lut(idx) - } - respCounter := respCounter + 1.U - } - - when(respCounter === b.veclane.U) { - state := sWrite - // Precompute first write data (row 0, concatenated by column) - writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(0)(j))) - // Set write mask (write all) - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 1.U(1.W) - } - } - } - - is(sWrite) { - // Write back results - io.sramWrite(wbank_reg).req.valid := writeCounter < b.veclane.U - io.sramWrite(wbank_reg).req.bits.addr := waddr_reg + writeCounter - io.sramWrite(wbank_reg).req.bits.data := writeDataReg - io.sramWrite(wbank_reg).req.bits.mask := writeMaskReg - - when(writeCounter === (b.veclane - 1).U) { - state := complete - }.otherwise { - writeCounter := writeCounter + 1.U - // Prepare next row's write data - writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(writeCounter + 1.U)(j))) - } - } - - is(complete) { - when(cycle_reg === 0.U) { - io.cmdResp.valid := true.B - io.cmdResp.bits.rob_id := robid_reg - when(io.cmdResp.fire) { - iterCnt := iterCnt + 1.U - } - } - state := idle - } - } - - // Status signals - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := (state === idle) - io.status.init := (state === sRead) && (respCounter < b.veclane.U) - io.status.running := (state === sWrite) || ((state === sRead) && (respCounter === b.veclane.U)) - io.status.complete := (state === complete) && io.cmdResp.fire - io.status.iter := iterCnt - - when(reset.asBool) { - for (i <- 0 until b.veclane) { - for (j <- 0 until b.veclane) { - regArray(i)(j) := 0.U - } - } - writeDataReg := 0.U - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 0.U - } - } -} diff --git a/arch/src/main/scala/prototype/nnlut/NNLutBall.scala b/arch/src/main/scala/prototype/nnlut/NNLutBall.scala deleted file mode 100644 index 21fce564..00000000 --- 
a/arch/src/main/scala/prototype/nnlut/NNLutBall.scala +++ /dev/null @@ -1,57 +0,0 @@ -package prototype.nnlut - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.nnlut.NNLut - -/** - * NNLutBall - A Neural Network Look-Up Table computation Ball that complies with the Blink protocol. - * Behavior: Read data from Scratchpad, perform LUT lookup, then write back to Scratchpad. - */ -class NNLutBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module - with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - // Satisfy BallRegist requirements - def Blink: Blink = io - - // Instantiate NNLut computation unit - private val nnlutUnit = Module(new NNLut) - - // Connect command interface - nnlutUnit.io.cmdReq <> io.cmdReq - nnlutUnit.io.cmdResp <> io.cmdResp - - // Connect Scratchpad SRAM read/write interface - for (i <- 0 until b.sp_banks) { - nnlutUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - nnlutUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Accumulator read interface (NN-LUT does not access accumulator, tie-off) - for (i <- 0 until b.acc_banks) { - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Accumulator write interface (NN-LUT does not write accumulator, tie-off) - for (i <- 0 until b.acc_banks) { - io.accWrite(i).io.req.valid := false.B - io.accWrite(i).io.req.bits := DontCare - io.accWrite(i).rob_id := 0.U - } - - // Pass through status signals - io.status <> nnlutUnit.io.status - - override lazy val desiredName: String = "NNLutBall" -} diff --git a/arch/src/main/scala/prototype/relu/Relu.scala b/arch/src/main/scala/prototype/relu/Relu.scala deleted file 
mode 100644 index 4cbbbce1..00000000 --- a/arch/src/main/scala/prototype/relu/Relu.scala +++ /dev/null @@ -1,199 +0,0 @@ -package prototype.relu - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status - -class PipelinedRelu[T <: Data](implicit b: CustomBuckyballConfig, p: Parameters) - extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // cmd interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Connect to Scratchpad SRAM read/write interface - val sramRead = - Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec( - b.sp_banks, - Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len)) - ) - - // Status output - val status = new Status - }) - - val idle :: sRead :: sWrite :: complete :: Nil = Enum(4) - val state = RegInit(idle) - - // Store a row of split elements (veclane elements) - // Store a veclane x veclane tile (perform element-wise ReLU then write back) - val regArray = RegInit( - VecInit(Seq.fill(b.veclane)( - VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - )) - ) - - // Counters - val readCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val respCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val writeCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - - // Instruction registers - val robid_reg = RegInit(0.U(10.W)) - val waddr_reg = RegInit(0.U(10.W)) - val wbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val raddr_reg = RegInit(0.U(10.W)) - val rbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val iter_reg = RegInit(0.U(10.W)) - val cycle_reg = RegInit(0.U(6.W)) - // Batch iteration 
counter - val iterCnt = RegInit(0.U(32.W)) - - // Precompute write data - val writeDataReg = Reg(UInt(spad_w.W)) - val writeMaskReg = Reg(Vec(b.spad_mask_len, UInt(1.W))) - - // SRAM default assignment - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := false.B - - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - - // cmd interface default assignment - io.cmdReq.ready := state === idle - io.cmdResp.valid := false.B - io.cmdResp.bits.rob_id := robid_reg - - // State machine - switch(state) { - is(idle) { - when(io.cmdReq.fire) { - state := sRead - readCounter := 0.U - respCounter := 0.U - writeCounter := 0.U - - robid_reg := io.cmdReq.bits.rob_id - // For ReLU, output write-back should use decoded wr_bank/addr, not op2_* fields - waddr_reg := io.cmdReq.bits.cmd.wr_bank_addr - wbank_reg := io.cmdReq.bits.cmd.wr_bank - raddr_reg := io.cmdReq.bits.cmd.op1_bank_addr - rbank_reg := io.cmdReq.bits.cmd.op1_bank - iter_reg := io.cmdReq.bits.cmd.iter - cycle_reg := (io.cmdReq.bits.cmd.iter +& (b.veclane.U - 1.U)) / b.veclane.U - 1.U - } - - when(cycle_reg =/= 0.U) { - state := sRead - readCounter := 0.U - writeCounter := 0.U - respCounter := 0.U - waddr_reg := waddr_reg + b.veclane.U - raddr_reg := raddr_reg + b.veclane.U - cycle_reg := cycle_reg - 1.U - } - } - - is(sRead) { - when(readCounter < b.veclane.U) { - // Issue read request - readCounter := readCounter + 1.U - io.sramRead(rbank_reg).req.valid := true.B - io.sramRead(rbank_reg).req.bits.addr := raddr_reg + readCounter - - } - - // Receive response, only raise ready when there are outstanding reads - val dataWord = io.sramRead(rbank_reg).resp.bits.data - // val hasOutstandingRead = readCounter =/= respCounter - 
io.sramRead(rbank_reg).resp.ready := true.B - when(io.sramRead(rbank_reg).resp.fire) { - for (col <- 0 until b.veclane) { - val hi = (col + 1) * b.inputType.getWidth - 1 - val lo = col * b.inputType.getWidth - val raw = dataWord(hi, lo) - val signed = raw.asSInt - val relu = Mux(signed < 0.S, 0.S(b.inputType.getWidth.W), signed) - regArray(respCounter)(col) := relu.asUInt - } - respCounter := respCounter + 1.U - } - - when(respCounter === b.veclane.U) { - state := sWrite - // Precompute first write data (row 0, concatenated by column) - writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(0)(j))) - // Set write mask (write all) - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 1.U(1.W) - } - } - } - - is(sWrite) { - // Correctly use ready/valid handshake to advance writes, avoid dropped writes - io.sramWrite(wbank_reg).req.valid := writeCounter < b.veclane.U - io.sramWrite(wbank_reg).req.bits.addr := waddr_reg + writeCounter - io.sramWrite(wbank_reg).req.bits.data := writeDataReg - io.sramWrite(wbank_reg).req.bits.mask := writeMaskReg - - when(writeCounter === (b.veclane - 1).U) { - state := complete - }.otherwise { - writeCounter := writeCounter + 1.U - // Prepare next row's write data - writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(writeCounter + 1.U)(j))) - } - - } - - is(complete) { - when(cycle_reg === 0.U) { - io.cmdResp.valid := true.B - io.cmdResp.bits.rob_id := robid_reg - when(io.cmdResp.fire) { - iterCnt := iterCnt + 1.U - } - } - state := idle - } - } - - // Status signals - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := (state === idle) - io.status.init := (state === sRead) && (respCounter < b.veclane.U) - io.status.running := (state === sWrite) || ((state === sRead) && (respCounter === b.veclane.U)) - io.status.complete := (state === complete) && io.cmdResp.fire - io.status.iter := iterCnt - - when(reset.asBool) { - for (i <- 0 until b.veclane) { - for (j <- 0 
until b.veclane) { - regArray(i)(j) := 0.U - } - } - writeDataReg := 0.U - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 0.U - } - } -} diff --git a/arch/src/main/scala/prototype/relu/ReluBall.scala b/arch/src/main/scala/prototype/relu/ReluBall.scala deleted file mode 100644 index 1084b297..00000000 --- a/arch/src/main/scala/prototype/relu/ReluBall.scala +++ /dev/null @@ -1,57 +0,0 @@ -package prototype.relu - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.relu.PipelinedRelu - -/** ReluBall - A ReLU computation Ball that complies with the Blink protocol. - * Behavior: Read data from Scratchpad, perform element-wise ReLU (set negative values to 0), - * then write back to Scratchpad. - */ -class ReluBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module - with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - // Satisfy BallRegist requirements - def Blink: Blink = io - - // Instantiate PipelinedRelu computation unit - private val reluUnit = Module(new PipelinedRelu[UInt]) - - // Connect command interface - reluUnit.io.cmdReq <> io.cmdReq - reluUnit.io.cmdResp <> io.cmdResp - - // Connect Scratchpad SRAM read/write interface - for (i <- 0 until b.sp_banks) { - reluUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - reluUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Accumulator read interface (ReLU does not access accumulator, tie-off) - for (i <- 0 until b.acc_banks) { - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Accumulator write interface (ReLU does not write accumulator, tie-off) - for (i <- 0 until b.acc_banks) { - 
io.accWrite(i).io.req.valid := false.B - io.accWrite(i).io.req.bits := DontCare - io.accWrite(i).rob_id := 0.U - } - - // Pass through status signals - io.status <> reluUnit.io.status - - override lazy val desiredName: String = "ReluBall" -} diff --git a/arch/src/main/scala/prototype/transfer/README.md b/arch/src/main/scala/prototype/transfer/README.md deleted file mode 100644 index be885574..00000000 --- a/arch/src/main/scala/prototype/transfer/README.md +++ /dev/null @@ -1,206 +0,0 @@ -# Transfer Tile Data Mover Accelerator - -## Overview - -This directory implements Buckyball's tile-based Transfer accelerator, located at `arch/src/main/scala/prototype/transfer`. The module copies Scratchpad data by tiles (`veclane × veclane`) from a source bank/address to a destination bank/address, in a vectorized manner, and writes results back without any arithmetic modification. - -Core components: -- **Transfer.scala**: Transfer accelerator main implementation - -## Code Structure - -``` -transfer/ -└── Transfer.scala - Transfer accelerator implementation -``` - -### Module Responsibilities - -**Transfer.scala** (Accelerator implementation layer) -- Reads a `veclane × veclane` tile of data from a source Scratchpad bank/address -- Performs no element-wise computation (pure copy) -- Packs rows and writes back the same-sized tile with full mask to the destination bank/address -- Provides Ball domain command interface and returns completion response/status - -## Module Description - -### Transfer.scala - -**Main functionality**: - -Read input tile by tile (`veclane × veclane`) → Pack row data → Write output back row by row; supports `iter`-driven batch processing and pipelined workflow. 
- -**State machine definition**: - -```scala -val idle :: sRead :: sWrite :: complete :: Nil = Enum(4) -val state = RegInit(idle) -``` - -**Key registers**: - -```scala -// Data cache: veclane × veclane, each element width is inputType.getWidth -val regArray = RegInit( - VecInit(Seq.fill(b.veclane)( - VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - )) -) - -// Counters -val readCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) // Read request row count -val respCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) // Read response row count -val writeCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) // Write-back row count - -// Instruction field registers -val robid_reg = RegInit(0.U(10.W)) // Command ROB ID -val waddr_reg = RegInit(0.U(10.W)) // Write-back start row address -val wbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) // Write-back target Scratchpad bank selection -val raddr_reg = RegInit(0.U(10.W)) // Read start row address -val rbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) // Read source Scratchpad bank selection -val iter_reg = RegInit(0.U(10.W)) // Processing row count/length specified in command -val cycle_reg = RegInit(0.U(6.W)) // Tile round count (derived from iter) -val iterCnt = RegInit(0.U(32.W)) // Completed batch count - -// Write-back data and mask -val spad_w = b.veclane * b.inputType.getWidth // Packed data width per row -val writeDataReg = Reg(UInt(spad_w.W)) // Packed data to write back per row -val writeMaskReg = Reg(Vec(b.spad_mask_len, UInt(1.W))) // Write-back mask vector -``` - -**Command parsing**: - -```scala -when(io.cmdReq.fire) { - // Enter read phase and initialize round counters - state := sRead - readCounter := 0.U // Clear read request row count - respCounter := 0.U // Clear read response row count - writeCounter := 0.U // Clear write-back row count - - // Record command identifier - robid_reg := io.cmdReq.bits.rob_id // ROB ID (for completion response matching) - - // Output (write-back) target address: use wr_* 
fields - waddr_reg := io.cmdReq.bits.cmd.wr_bank_addr // Write-back start row address - wbank_reg := io.cmdReq.bits.cmd.wr_bank // Write-back target bank - - // Input (read) source address: use op1_* fields - raddr_reg := io.cmdReq.bits.cmd.op1_bank_addr // Read start row address - rbank_reg := io.cmdReq.bits.cmd.op1_bank // Read source bank - - // Iteration and rounds - iter_reg := io.cmdReq.bits.cmd.iter // Total rows to process (iteration count) - // Calculate required tile rounds for this batch: each round processes veclane rows - // cycle_reg = ceil(iter / veclane) - 1, decrements after each read/write round completes - cycle_reg := (io.cmdReq.bits.cmd.iter +& (b.veclane.U - 1.U)) / b.veclane.U - 1.U -} -``` - -**Data path (split & pack)**: - -- Read returns a packed data row of width `spad_w = veclane × inputWidth`; -- Split into `veclane` elements (no arithmetic modification) and buffer into `regArray` by row; -- When writing back, repack a full row of `veclane` elements: - -```scala -// Split by column -val dataWord = io.sramRead(rbank_reg).resp.bits.data -for (col <- 0 until b.veclane) { - val hi = (col + 1) * b.inputType.getWidth - 1 - val lo = col * b.inputType.getWidth - val raw = dataWord(hi, lo) - regArray(respCounter)(col) := raw.asUInt -} - -// Pack regArray(rowIdx) into one row for write-back -writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(rowIdx)(j))) -// Full mask write -for (i <- 0 until b.spad_mask_len) { writeMaskReg(i) := 1.U } -``` - -**SRAM interface**: - -```scala -val io = IO(new Bundle { - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - val status = new Status -}) -``` - -**Processing flow**: - -1. 
**idle**: Wait for command; parse input/output bank/addr, iteration count `iter`, and calculate `cycle_reg` accordingly. Optionally, advance to next round by adjusting addresses when `cycle_reg =/= 0`. -2. **sRead**: Issue consecutive read requests row by row for the source bank/addr; after receiving data, split and fill into `regArray`; enter write phase after accumulating `veclane` rows. -3. **sWrite**: Pack and write back row by row to consecutive addresses in `wbank` with full mask write; enter complete phase after writing `veclane` rows. -4. **complete**: When all rounds complete (`cycle_reg == 0`), issue `cmdResp` completion response; then return to `idle`. - -**Inputs/Outputs**: - -- Input: Ball domain commands (`wr_bank/wr_bank_addr`, `op1_bank/op1_bank_addr`, `iter`, etc.) -- Output: Copied tile written to destination bank, `cmdResp` completion notification -- Boundaries and constraints: - - Each round processes `veclane` rows, iteration round count derived from `iter`; - - Write-back uses full mask (can be extended for partial write as needed). - -## ISA Structure - -The Ball instruction corresponding to this module performs tile-based copy on Scratchpad data and writes back to the destination. - -**Function**: Copy input rows from source Scratchpad address to destination Scratchpad address, write back row by row, loop `iter` times. - -**Format**: `bb_transfer rs1, rs2` - -**Operands**: - -- `rs1[spAddrLen-1:0]`: Source Scratchpad address (op1_spaddr) -- `rs2[spAddrLen-1:0]`: Destination Scratchpad address (wr_spaddr) -- `rs2[spAddrLen+9:spAddrLen]`: Iteration count (iter, row count) -- `rs2[63:spAddrLen+10]`: special/reserved field (not used by current Transfer) - -Address note: Local address of `spAddrLen` width is further split into bank and row in hardware (see LocalAddr), no need to explicitly distinguish at ISA level. 
- -**Operation**: Read data from Scratchpad address specified by `rs1`, then write back row by row to address specified by `rs2`, loop `iter` times. - -rs1 (source address): - -``` -┌──────────────────────────────────────────────────────┐ -│ op1_spaddr │ -│ (spAddrLen bits) │ -├──────────────────────────────────────────────────────┤ -│ [spAddrLen-1:0] │ -└──────────────────────────────────────────────────────┘ -``` - -rs2 (destination address and iteration count): - -``` -┌──────────────────────────────────┬────────────────────┐ -│ iter (rows) │ wr_spaddr │ -│ (10 bits) │ (spAddrLen bits) │ -├──────────────────────────────────┼────────────────────┤ -│ [spAddrLen+9: spAddrLen] │ [spAddrLen-1:0] │ -└──────────────────────────────────┴────────────────────┘ -``` - -Note: During decode, `op1_spaddr` comes from `rs1`, `wr_spaddr` and `iter` come from `rs2`, remaining `special` high bits can be reserved for extension. - -## Usage - -- Place source data at Scratchpad starting position specified by `op1_bank/op1_bank_addr`, ensure each row width is `veclane × inputWidth`; -- Configure destination `wr_bank/wr_bank_addr`, and element row count `iter` to process; -- After sending Ball command, wait for `cmdResp` completion; -- Can poll `status`: `ready/valid/idle/init/running/complete/iter` to get runtime information. - -### Notes - -1. **Handshake robustness**: Prefer gating `resp.ready` by local buffer availability (e.g., `respCounter < readCounter && respCounter < b.veclane.U`) to avoid overruns; simple `true.B` is acceptable when the Scratchpad controller guarantees pacing. -2. **Bandwidth and alignment**: Each read/write is one packed row (`spad_w` bits), addresses need to be row-aligned and increment consecutively. -3. **Mask strategy**: Current implementation uses full mask write; if sparse/partial write needed, extend `writeMaskReg` generation logic or use constant full-ones without a register. -4. 
**Iteration and chunking**: When `iter` is not a multiple of `veclane`, `cycle_reg` handles remaining rows with ceiling rounding; addresses advance by `veclane` per round. -5. **Read/Write phasing**: If the Scratchpad bank ports disallow same-bank concurrent read/write, keep two-phase read-then-write; otherwise consider streaming per-row read→write to reduce buffer size. -6. **Reset behavior**: Reset clears `regArray`, `writeDataReg`, `writeMaskReg`, facilitating reproducible simulation. diff --git a/arch/src/main/scala/prototype/transfer/Transfer.scala b/arch/src/main/scala/prototype/transfer/Transfer.scala deleted file mode 100644 index 9cc5871a..00000000 --- a/arch/src/main/scala/prototype/transfer/Transfer.scala +++ /dev/null @@ -1,195 +0,0 @@ -package prototype.transfer - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status - -class Transfer[T <: Data](implicit b: CustomBuckyballConfig, p: Parameters) - extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // cmd interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Connect to Scratchpad SRAM read/write interface - val sramRead = - Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec( - b.sp_banks, - Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len)) - ) - - // Status output - val status = new Status - }) - - val idle :: sRead :: sWrite :: complete :: Nil = Enum(4) - val state = RegInit(idle) - - // Store a row of split elements (veclane elements) - val regArray = RegInit( - VecInit(Seq.fill(b.veclane)( - 
VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - )) - ) - - // Counters - val readCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val respCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - val writeCounter = RegInit(0.U(log2Ceil(b.veclane + 1).W)) - - // Instruction registers - val robid_reg = RegInit(0.U(10.W)) - val waddr_reg = RegInit(0.U(10.W)) - val wbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val raddr_reg = RegInit(0.U(10.W)) - val rbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val iter_reg = RegInit(0.U(10.W)) - val cycle_reg = RegInit(0.U(6.W)) - // Batch iteration counter - val iterCnt = RegInit(0.U(32.W)) - - // Precompute write data - val writeDataReg = Reg(UInt(spad_w.W)) - val writeMaskReg = Reg(Vec(b.spad_mask_len, UInt(1.W))) - - // SRAM default assignment - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := false.B - - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr := 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - - // cmd interface default assignment - io.cmdReq.ready := state === idle - io.cmdResp.valid := false.B - io.cmdResp.bits.rob_id := robid_reg - - // State machine - switch(state) { - is(idle) { - when(io.cmdReq.fire) { - state := sRead - readCounter := 0.U - respCounter := 0.U - writeCounter := 0.U - - robid_reg := io.cmdReq.bits.rob_id - waddr_reg := io.cmdReq.bits.cmd.wr_bank_addr - wbank_reg := io.cmdReq.bits.cmd.wr_bank - raddr_reg := io.cmdReq.bits.cmd.op1_bank_addr - rbank_reg := io.cmdReq.bits.cmd.op1_bank - iter_reg := io.cmdReq.bits.cmd.iter - cycle_reg := (io.cmdReq.bits.cmd.iter +& (b.veclane.U - 1.U)) / b.veclane.U - 1.U - } - - when(cycle_reg =/= 0.U) { - state := sRead - readCounter := 0.U - writeCounter := 0.U - respCounter := 0.U - waddr_reg := waddr_reg + b.veclane.U 
- raddr_reg := raddr_reg + b.veclane.U - cycle_reg := cycle_reg - 1.U - } - } - - is(sRead) { - when(readCounter < b.veclane.U) { - // Issue read request - readCounter := readCounter + 1.U - io.sramRead(rbank_reg).req.valid := true.B - io.sramRead(rbank_reg).req.bits.addr := raddr_reg + readCounter - - } - - // Receive response, only raise ready when there are outstanding reads - val dataWord = io.sramRead(rbank_reg).resp.bits.data - // val hasOutstandingRead = readCounter =/= respCounter - io.sramRead(rbank_reg).resp.ready := true.B - when(io.sramRead(rbank_reg).resp.fire) { - for (col <- 0 until b.veclane) { - val hi = (col + 1) * b.inputType.getWidth - 1 - val lo = col * b.inputType.getWidth - val raw = dataWord(hi, lo) - regArray(respCounter)(col) := raw.asUInt - } - respCounter := respCounter + 1.U - } - - when(respCounter === b.veclane.U) { - state := sWrite - // Precompute first write data (row 0, concatenated by column) - writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(0)(j))) - // Set write mask (write all) - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 1.U(1.W) - } - } - } - - is(sWrite) { - // Correctly use ready/valid handshake to advance writes, avoid dropped writes - io.sramWrite(wbank_reg).req.valid := writeCounter < b.veclane.U - io.sramWrite(wbank_reg).req.bits.addr := waddr_reg + writeCounter - io.sramWrite(wbank_reg).req.bits.data := writeDataReg - io.sramWrite(wbank_reg).req.bits.mask := writeMaskReg - - when(writeCounter === (b.veclane - 1).U) { - state := complete - }.otherwise { - writeCounter := writeCounter + 1.U - // Prepare next row's write data - writeDataReg := Cat((0 until b.veclane).reverse.map(j => regArray(writeCounter + 1.U)(j))) - } - - } - - is(complete) { - when(cycle_reg === 0.U) { - io.cmdResp.valid := true.B - io.cmdResp.bits.rob_id := robid_reg - when(io.cmdResp.fire) { - iterCnt := iterCnt + 1.U - } - } - state := idle - } - } - - // Status signals - io.status.ready := io.cmdReq.ready - 
io.status.valid := io.cmdResp.valid - io.status.idle := (state === idle) - io.status.init := (state === sRead) && (respCounter < b.veclane.U) - io.status.running := (state === sWrite) || ((state === sRead) && (respCounter === b.veclane.U)) - io.status.complete := (state === complete) && io.cmdResp.fire - io.status.iter := iterCnt - - when(reset.asBool) { - for (i <- 0 until b.veclane) { - for (j <- 0 until b.veclane) { - regArray(i)(j) := 0.U - } - } - writeDataReg := 0.U - for (i <- 0 until b.spad_mask_len) { - writeMaskReg(i) := 0.U - } - } -} diff --git a/arch/src/main/scala/prototype/transfer/TransferBall.scala b/arch/src/main/scala/prototype/transfer/TransferBall.scala deleted file mode 100644 index 75ea7cd2..00000000 --- a/arch/src/main/scala/prototype/transfer/TransferBall.scala +++ /dev/null @@ -1,56 +0,0 @@ -package prototype.transfer - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.transfer.Transfer - -// TransferBall - A data transfer Ball that complies with the Blink protocol. -// Behavior: Read data from Scratchpad and write it back without modification. 
- -class TransferBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) - extends Module - with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - // Satisfy BallRegist requirements - def Blink: Blink = io - - // Instantiate Transfer computation unit - private val transferUnit = Module(new Transfer[UInt]) - - // Connect command interface - transferUnit.io.cmdReq <> io.cmdReq - transferUnit.io.cmdResp <> io.cmdResp - - // Connect Scratchpad SRAM read/write interface - for (i <- 0 until b.sp_banks) { - transferUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - transferUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Accumulator read interface (Transfer does not access accumulator, tie-off) - for (i <- 0 until b.acc_banks) { - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Accumulator write interface (Transfer does not write accumulator, tie-off) - for (i <- 0 until b.acc_banks) { - io.accWrite(i).io.req.valid := false.B - io.accWrite(i).io.req.bits := DontCare - io.accWrite(i).rob_id := 0.U - } - - // Pass through status signals - io.status <> transferUnit.io.status - - override lazy val desiredName: String = "TransferBall" -} diff --git a/arch/src/main/scala/prototype/transpose/Transpose.scala b/arch/src/main/scala/prototype/transpose/Transpose.scala deleted file mode 100644 index c84e51e1..00000000 --- a/arch/src/main/scala/prototype/transpose/Transpose.scala +++ /dev/null @@ -1,150 +0,0 @@ -package prototype.transpose - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import 
framework.balldomain.blink.Status -import freechips.rocketchip.tilelink.MemoryOpCategories.wr -import os.read - -class PipelinedTransposer[T <: Data](implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - // cmd interface - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Connect to Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - - // Status output - val status = new Status - }) - - val idle :: compute :: Nil = Enum(2) - val state = RegInit(idle) - - // Matrix storage register (veclane x veclane) - val regArray = Reg(Vec(b.veclane * 2, Vec(b.veclane, UInt(b.inputType.getWidth.W)))) - - // Counters - val readCounter = RegInit(0.U(10.W)) - val respCounter = RegInit(0.U(10.W)) - val writeCounter = RegInit(0.U(10.W)) - val respWaitcounter = RegInit(0.U(10.W)) - val writeHeadptr = RegInit(0.U(10.W)) - val writeTailptr = RegInit(0.U(10.W)) - - // Instruction registers - val robid_reg = RegInit(0.U(10.W)) - val waddr_reg = RegInit(0.U(10.W)) - val wbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val raddr_reg = RegInit(0.U(10.W)) - val rbank_reg = RegInit(0.U(log2Up(b.sp_banks).W)) - val iter_reg = RegInit(0.U(10.W)) - val write_iter_reg = RegInit(0.U(10.W)) - val mode_reg = RegInit(0.U(1.W)) - - - // Precompute write data - val writeDataReg = Reg(UInt(spad_w.W)) - val writeMaskReg = Reg(Vec(b.spad_mask_len, UInt(1.W))) - - val start_write = RegInit(false.B) - // SRAM default assignment - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req.valid := false.B - io.sramRead(i).req.bits.addr := 0.U - io.sramRead(i).req.bits.fromDMA := false.B - io.sramRead(i).resp.ready := false.B - - io.sramWrite(i).req.valid := false.B - io.sramWrite(i).req.bits.addr 
:= 0.U - io.sramWrite(i).req.bits.data := 0.U - io.sramWrite(i).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(0.U(1.W))) - } - - // cmd interface default assignment - io.cmdReq.ready := state === idle - - when(state === idle && io.cmdReq.fire){ - state := compute - readCounter := 0.U - respCounter := 0.U - respWaitcounter := io.cmdReq.bits.cmd.iter + respWaitcounter - - robid_reg := io.cmdReq.bits.rob_id - waddr_reg := io.cmdReq.bits.cmd.op2_bank_addr - wbank_reg := io.cmdReq.bits.cmd.op2_bank - raddr_reg := io.cmdReq.bits.cmd.op1_bank_addr - rbank_reg := io.cmdReq.bits.cmd.op1_bank - iter_reg := io.cmdReq.bits.cmd.iter - mode_reg := io.cmdReq.bits.cmd.special(0) - } - // read req - when(((mode_reg === 1.U) &&(state === compute) && RegNext(io.sramWrite(0).req.ready))|| - ((mode_reg === 0.U) && (state === compute))){ - readCounter := readCounter + 1.U - io.sramRead(rbank_reg).req.valid := readCounter < iter_reg - io.sramRead(rbank_reg).req.bits.addr := raddr_reg + readCounter - state := Mux((readCounter >= iter_reg - 1.U) && io.cmdResp.ready, idle, state) - } - io.cmdResp.valid := (readCounter >= (iter_reg - 1.U)) && (state === compute) - io.cmdResp.bits.rob_id := robid_reg - - // read resp - io.sramRead(rbank_reg).resp.ready := true.B - val dataWord = io.sramRead(rbank_reg).resp.bits.data - val row = respCounter(4,0) - when(io.sramRead(rbank_reg).resp.fire && respWaitcounter > 0.U){ - for (col <- 0 until b.veclane) { - val hi = (col + 1) * b.inputType.getWidth - 1 - val lo = col * b.inputType.getWidth - regArray(row)(col) := dataWord(hi, lo) - } - respCounter := Mux(respCounter === iter_reg - 1.U, 0.U, respCounter + 1.U) - writeHeadptr := Mux(writeHeadptr === 2.U* b.veclane.U - 1.U, 0.U, writeHeadptr + 1.U) - respWaitcounter := Mux(state === idle && io.cmdReq.fire, io.cmdReq.bits.cmd.iter + respWaitcounter - 1.U, respWaitcounter - 1.U) - } - - // write req - val wreg = RegInit(0.U(10.W)) - val array_full = ((writeTailptr < b.veclane.U) && (writeHeadptr >= 
b.veclane.U)) || - ((writeTailptr >= b.veclane.U) && (writeHeadptr < b.veclane.U)) - when(writeCounter === iter_reg - 1.U){ - start_write := false.B - }.elsewhen( array_full && !start_write){ - start_write := true.B - wreg := waddr_reg - write_iter_reg := iter_reg - }.otherwise{ - start_write := start_write - } - - when(start_write){ - io.sramWrite(wbank_reg).req.valid := true.B - io.sramWrite(wbank_reg).req.bits.addr := wreg + writeCounter - io.sramWrite(wbank_reg).req.bits.data := Mux( writeCounter(4) === 0.U, Cat((0 until b.veclane).reverse.map(i => regArray(i)(writeCounter(3,0)))) , - Cat((0 until b.veclane).reverse.map(i => regArray(i + b.veclane)(writeCounter(3,0))))) - io.sramWrite(wbank_reg).req.bits.mask := VecInit(Seq.fill(b.spad_mask_len)(~0.U(1.W))) - writeCounter := Mux(writeCounter === write_iter_reg - 1.U, 0.U,writeCounter + 1.U) - writeTailptr := Mux(writeTailptr === 2.U* b.veclane.U - 1.U, 0.U, writeTailptr + 1.U) - } - // Status signals - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := state === idle - io.status.init := readCounter === 0.U && state === compute - io.status.running := state === compute - io.status.iter := readCounter - io.status.complete := io.cmdResp.valid - -} diff --git a/arch/src/main/scala/prototype/transpose/TransposeBall.scala b/arch/src/main/scala/prototype/transpose/TransposeBall.scala deleted file mode 100644 index a970ee8b..00000000 --- a/arch/src/main/scala/prototype/transpose/TransposeBall.scala +++ /dev/null @@ -1,59 +0,0 @@ -package prototype.transpose - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.transpose.PipelinedTransposer - -/** - * TransposeBall - A transpose computation Ball that complies with the Blink protocol - */ -class TransposeBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) 
extends Module with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - def Blink: Blink = io - - // Instantiate PipelinedTransposer - val transposeUnit = Module(new PipelinedTransposer) - - // Connect command interface - transposeUnit.io.cmdReq <> io.cmdReq - transposeUnit.io.cmdResp <> io.cmdResp - - // Connect SRAM read interface - Transpose needs to read data from scratchpad - for (i <- 0 until b.sp_banks) { - transposeUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - } - - // Connect SRAM write interface - Transpose needs to write to scratchpad - for (i <- 0 until b.sp_banks) { - transposeUnit.io.sramWrite(i) <> io.sramWrite(i).io - io.sramWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Handle Accumulator read interface - Transpose does not read accumulator, so tie off - for (i <- 0 until b.acc_banks) { - // For Flipped(SramReadIO), we need to drive req.valid, req.bits (outputs) and resp.ready (output) - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Handle Accumulator write interface - Transpose does not write accumulator, so tie off - for (i <- 0 until b.acc_banks) { - // For Flipped(SramWriteIO), we need to drive req.valid and req.bits (outputs) - io.accWrite(i).io.req.valid := false.B - io.accWrite(i).io.req.bits := DontCare - io.accWrite(i).rob_id := 0.U - } - - // Connect Status signals - directly obtained from internal unit - io.status <> transposeUnit.io.status - - override lazy val desiredName = "TransposeBall" -} diff --git a/arch/src/main/scala/prototype/vector/VecBall.scala b/arch/src/main/scala/prototype/vector/VecBall.scala deleted file mode 100644 index f44cd915..00000000 --- a/arch/src/main/scala/prototype/vector/VecBall.scala +++ /dev/null @@ -1,59 +0,0 @@ -package prototype.vector - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config.Parameters -import 
examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.{Blink, BallRegist} -import prototype.vector.VecUnit - -/** - * VecBall - A vector computation Ball that complies with the Blink protocol - */ -class VecBall(id: Int)(implicit b: CustomBuckyballConfig, p: Parameters) extends Module with BallRegist { - val io = IO(new Blink) - val ballId = id.U - - def Blink: Blink = io - - // Instantiate VecUnit - val vecUnit = Module(new VecUnit) - - // Connect command interface - vecUnit.io.cmdReq <> io.cmdReq - vecUnit.io.cmdResp <> io.cmdResp - - // Connect SRAM read interface - VecUnit needs to read data from scratchpad - for (i <- 0 until b.sp_banks) { - vecUnit.io.sramRead(i) <> io.sramRead(i).io - io.sramRead(i).rob_id := io.cmdReq.bits.rob_id - } - - // Handle SRAM write interface - VecUnit does not write to scratchpad, so tie off - for (i <- 0 until b.sp_banks) { - // For Flipped(SramWriteIO), we need to drive req.valid and req.bits (outputs) - io.sramWrite(i).io.req.valid := false.B - io.sramWrite(i).io.req.bits := DontCare - io.sramWrite(i).rob_id := 0.U - } - - // Handle Accumulator read interface - VecUnit does not read accumulator, so tie off - for (i <- 0 until b.acc_banks) { - // For Flipped(SramReadIO), we need to drive req.valid, req.bits (outputs) and resp.ready (output) - io.accRead(i).io.req.valid := false.B - io.accRead(i).io.req.bits := DontCare - io.accRead(i).io.resp.ready := true.B - io.accRead(i).rob_id := 0.U - } - - // Connect Accumulator write interface - VecUnit writes results to accumulator - for (i <- 0 until b.acc_banks) { - vecUnit.io.accWrite(i) <> io.accWrite(i).io - io.accWrite(i).rob_id := io.cmdReq.bits.rob_id - } - - // Connect Status signals - directly obtained from internal unit - io.status <> vecUnit.io.status - - override lazy val desiredName = "VecBall" -} diff --git a/arch/src/main/scala/prototype/vector/VecCtrlUnit.scala b/arch/src/main/scala/prototype/vector/VecCtrlUnit.scala deleted file mode 100644 
index a8d59fb4..00000000 --- a/arch/src/main/scala/prototype/vector/VecCtrlUnit.scala +++ /dev/null @@ -1,115 +0,0 @@ -package prototype.vector - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig - -class VecCtrlUnit(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle{ - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp_o = Decoupled(new BallRsComplete) - - val ctrl_ld_o = Decoupled(new ctrl_ld_req) - val ctrl_st_o = Decoupled(new ctrl_st_req) - val ctrl_ex_o = Decoupled(new ctrl_ex_req) - - val cmdResp_i = Flipped(Valid(new Bundle {val commit = Bool()})) // from store unit - }) - - val rob_id_reg = RegInit(0.U(log2Up(b.rob_entries).W)) - val iter = RegInit(0.U(10.W)) - val op1_bank = RegInit(0.U(2.W)) - val op1_bank_addr = RegInit(0.U(12.W)) - val op2_bank_addr = RegInit(0.U(12.W)) - val op2_bank = RegInit(0.U(2.W)) - val wr_bank = RegInit(0.U(2.W)) - val wr_bank_addr = RegInit(0.U(12.W)) - val is_acc = RegInit(false.B) - val has_send = RegInit(false.B) - val mode = RegInit(0.U(1.W)) - - val idle :: busy :: Nil = Enum(2) - val state = RegInit(idle) - -// ----------------------------------------------------------------------------- -// Set registers when EX instruction arrives -// ----------------------------------------------------------------------------- - - when(io.cmdReq.fire) { - iter := io.cmdReq.bits.cmd.iter - rob_id_reg := io.cmdReq.bits.rob_id - op1_bank := io.cmdReq.bits.cmd.op1_bank - op1_bank_addr := io.cmdReq.bits.cmd.op1_bank_addr - op2_bank := io.cmdReq.bits.cmd.op2_bank - op2_bank_addr := io.cmdReq.bits.cmd.op2_bank_addr - wr_bank := io.cmdReq.bits.cmd.wr_bank - wr_bank_addr := io.cmdReq.bits.cmd.wr_bank_addr - is_acc := 
io.cmdReq.bits.cmd.is_acc - mode := io.cmdReq.bits.cmd.special(0) - - state := busy - } - -// ----------------------------------------------------------------------------- -// Send control signals to VecUnit's load/store/ex units -// ----------------------------------------------------------------------------- - - when(state === busy && !has_send) { - io.ctrl_ld_o.valid := true.B - io.ctrl_ld_o.bits.op1_bank := op1_bank - io.ctrl_ld_o.bits.op1_bank_addr := op1_bank_addr - io.ctrl_ld_o.bits.op2_bank := op2_bank - io.ctrl_ld_o.bits.op2_bank_addr := op2_bank_addr - io.ctrl_ld_o.bits.iter := iter - io.ctrl_ld_o.bits.mode := mode - - io.ctrl_ex_o.valid := true.B - io.ctrl_ex_o.bits.iter := iter - - io.ctrl_st_o.valid := true.B - io.ctrl_st_o.bits.wr_bank := wr_bank - io.ctrl_st_o.bits.wr_bank_addr := wr_bank_addr - io.ctrl_st_o.bits.iter := iter - - has_send := true.B - }.otherwise { - io.ctrl_ld_o.valid := false.B - io.ctrl_ld_o.bits.op1_bank := 0.U - io.ctrl_ld_o.bits.op1_bank_addr := 0.U - io.ctrl_ld_o.bits.op2_bank := 0.U - io.ctrl_ld_o.bits.op2_bank_addr := 0.U - io.ctrl_ld_o.bits.iter := 0.U - io.ctrl_ld_o.bits.mode := 0.U - - io.ctrl_ex_o.valid := false.B - io.ctrl_ex_o.bits.iter := 0.U - - io.ctrl_st_o.valid := false.B - io.ctrl_st_o.bits.wr_bank := 0.U - io.ctrl_st_o.bits.wr_bank_addr := 0.U - io.ctrl_st_o.bits.iter := 0.U - } - -// ----------------------------------------------------------------------------- -// Wait for VecUnit's final write-back to complete -// ----------------------------------------------------------------------------- - - when(io.cmdResp_i.valid) { - io.cmdResp_o.valid := true.B - io.cmdResp_o.bits.rob_id := rob_id_reg - state := idle - has_send := false.B - }.otherwise { - io.cmdResp_o.valid := false.B - io.cmdResp_o.bits.rob_id := 0.U - } - - io.cmdReq.ready := state === idle - -} diff --git a/arch/src/main/scala/prototype/vector/VecEXUnit.scala b/arch/src/main/scala/prototype/vector/VecEXUnit.scala deleted file mode 100644 index 
9605304a..00000000 --- a/arch/src/main/scala/prototype/vector/VecEXUnit.scala +++ /dev/null @@ -1,86 +0,0 @@ -package prototype.vector - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO, SramReadResp} -import examples.BuckyballConfigs.CustomBuckyballConfig -import warp.VecBall - - -class ctrl_ex_req(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val iter = UInt(10.W) -} - -class ld_ex_req(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val op1 = Vec(b.veclane, UInt(b.inputType.getWidth.W)) - val op2 = Vec(b.veclane, UInt(b.inputType.getWidth.W)) - val iter = UInt(10.W) -} - -class VecEXUnit(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - val ctrl_ex_i = Flipped(Decoupled(new ctrl_ex_req)) - val ld_ex_i = Flipped(Decoupled(new ld_ex_req)) - - val ex_st_o = Decoupled(new ex_st_req) - }) - - val idle :: busy :: Nil = Enum(2) - val state = RegInit(idle) - - val VecBall = Module(new VecBall) - - // Initialize default values for all signals - io.ctrl_ex_i.ready := false.B - io.ex_st_o.valid := false.B - io.ex_st_o.bits.rst := VecInit(Seq.fill(b.veclane)(0.U(b.accType.getWidth.W))) - io.ex_st_o.bits.iter := 0.U - - // Initialize VecBall input signals with default values - VecBall.io.iterIn.valid := false.B - VecBall.io.iterIn.bits := 0.U - VecBall.io.op1In.valid := false.B - VecBall.io.op1In.bits := VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - VecBall.io.op2In.valid := false.B - VecBall.io.op2In.bits := VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - VecBall.io.rstOut.ready := false.B - -// ----------------------------------------------------------------------------- -// Set registers when Ctrl instruction arrives -// ----------------------------------------------------------------------------- - 
io.ctrl_ex_i.ready := state === idle - when(io.ctrl_ex_i.fire) { - VecBall.io.iterIn.valid := true.B - VecBall.io.iterIn.bits := io.ctrl_ex_i.bits.iter - state := busy - } - -// ----------------------------------------------------------------------------- -// Accept read results from load unit and perform computation -// ----------------------------------------------------------------------------- - io.ld_ex_i.ready := state === busy && VecBall.io.iterIn.ready - when(io.ld_ex_i.valid) { - VecBall.io.op1In.valid := true.B - VecBall.io.op1In.bits := io.ld_ex_i.bits.op1 - VecBall.io.op2In.valid := true.B - VecBall.io.op2In.bits := io.ld_ex_i.bits.op2 - //assert((io.ld_ex_i.bits.iter - VecBall.get_iterCounter() === 16.U) && VecBall.get_arrive(), - //"[VecLoad -> VecEX] iteration mismatch") - } - -// ----------------------------------------------------------------------------- -// Send computation results to store unit for write-back -// ----------------------------------------------------------------------------- - io.ex_st_o.valid := VecBall.io.rstOut.valid - VecBall.io.rstOut.ready := io.ex_st_o.ready - - when(io.ex_st_o.fire) { - io.ex_st_o.bits.rst := VecBall.io.rstOut.bits - io.ex_st_o.bits.iter := VecBall.io.iterOut.bits - } - -} diff --git a/arch/src/main/scala/prototype/vector/VecLoadUnit.scala b/arch/src/main/scala/prototype/vector/VecLoadUnit.scala deleted file mode 100644 index d90b9809..00000000 --- a/arch/src/main/scala/prototype/vector/VecLoadUnit.scala +++ /dev/null @@ -1,149 +0,0 @@ -package prototype.vector - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO, SramReadReq, SramReadResp} -import examples.BuckyballConfigs.CustomBuckyballConfig - - -class ctrl_ld_req(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val op1_bank = UInt(log2Up(b.sp_banks).W) - val op1_bank_addr = 
UInt(log2Up(b.spad_bank_entries).W) - val op2_bank = UInt(log2Up(b.sp_banks).W) - val op2_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - val iter = UInt(10.W) - val mode = UInt(1.W) -} - -class VecLoadUnit(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val rob_id_width = log2Up(b.rob_entries) - val io = IO(new Bundle { - val sramReadReq = Vec(b.sp_banks, Decoupled(new SramReadReq(b.spad_bank_entries))) - val sramReadResp = Vec(b.sp_banks, Flipped(Decoupled(new SramReadResp(b.spad_w)))) - val ctrl_ld_i = Flipped(Decoupled(new ctrl_ld_req)) - val ld_ex_o = Decoupled(new ld_ex_req) - }) - - val op1_bank = RegInit(0.U(log2Up(b.sp_banks).W)) - val op2_bank = RegInit(0.U(log2Up(b.sp_banks).W)) - val op1_addr = RegInit(0.U(log2Up(b.spad_bank_entries).W)) - val op2_addr = RegInit(0.U(log2Up(b.spad_bank_entries).W)) - val iter = RegInit(0.U(10.W)) - val iter_counter = RegInit(0.U(10.W)) - val mode = RegInit(0.U(1.W)) - - val idle :: busy :: Nil = Enum(2) - val state = RegInit(idle) - - // Output register to break combinational logic loop - val ld_ex_valid_reg = RegInit(false.B) - val ld_ex_op1_reg = Reg(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - val ld_ex_op2_reg = Reg(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - val ld_ex_iter_reg = RegInit(0.U(10.W)) - - // Default assignment for each bank read request - for (i <- 0 until b.sp_banks){ - io.sramReadReq(i).valid := false.B - io.sramReadReq(i).bits.fromDMA := false.B - io.sramReadReq(i).bits.addr := 0.U - } - - io.ctrl_ld_i.ready := state === idle - -// ----------------------------------------------------------------------------- -// Set registers when Ctrl instruction arrives -// ----------------------------------------------------------------------------- - - when (io.ctrl_ld_i.fire) { - op1_bank := io.ctrl_ld_i.bits.op1_bank - op2_bank := io.ctrl_ld_i.bits.op2_bank - op1_addr := io.ctrl_ld_i.bits.op1_bank_addr - op2_addr := io.ctrl_ld_i.bits.op2_bank_addr - iter := io.ctrl_ld_i.bits.iter - 
iter_counter := 0.U - state := busy - mode := io.ctrl_ld_i.bits.mode - assert(io.ctrl_ld_i.bits.iter > 0.U, "iter should be greater than 0") - } - io.sramReadResp.foreach { resp => - resp.ready := state === busy - } - when(mode === 0.U){ -// ----------------------------------------------------------------------------- -// Send SRAM read request (only when output register is idle) -// ----------------------------------------------------------------------------- - when (state === busy && (!ld_ex_valid_reg || io.ld_ex_o.ready)) { - io.sramReadReq(op1_bank).valid := iter_counter < iter - io.sramReadReq(op1_bank).bits.fromDMA := false.B - io.sramReadReq(op1_bank).bits.addr := op1_addr + iter_counter - - io.sramReadReq(op2_bank).valid := iter_counter < iter - io.sramReadReq(op2_bank).bits.fromDMA := false.B - io.sramReadReq(op2_bank).bits.addr := op2_addr + iter_counter - iter_counter := iter_counter + 1.U - } - -// ----------------------------------------------------------------------------- -// SRAM returns data and passes to EX unit (use register to break combinational logic loop) -// ----------------------------------------------------------------------------- - // ready signal for sramReadResp: can receive when there's no pending data or downstream has received - /* - io.sramReadResp.foreach { resp => - resp.ready := !ld_ex_valid_reg || io.ld_ex_o.ready - } -*/ - // Receive SRAM data and cache to register - when (io.sramReadResp(op1_bank).valid && io.sramReadResp(op2_bank).valid && - (!ld_ex_valid_reg || io.ld_ex_o.ready) && (state === busy)) { - ld_ex_valid_reg := true.B - ld_ex_op1_reg := io.sramReadResp(op1_bank).bits.data.asTypeOf(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - ld_ex_op2_reg := io.sramReadResp(op2_bank).bits.data.asTypeOf(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - ld_ex_iter_reg := iter_counter - }.elsewhen(io.ld_ex_o.ready) { - ld_ex_valid_reg := false.B - } - - // Output comes from register - io.ld_ex_o.valid := ld_ex_valid_reg - 
io.ld_ex_o.bits.op1 := ld_ex_op1_reg - io.ld_ex_o.bits.op2 := ld_ex_op2_reg - io.ld_ex_o.bits.iter := ld_ex_iter_reg - -// ----------------------------------------------------------------------------- -// Reset iter_counter and return to idle state -// ----------------------------------------------------------------------------- - - when(state === busy && iter_counter === iter && (!ld_ex_valid_reg || io.ld_ex_o.ready)) { - state := idle - iter_counter := 0.U - } - }.otherwise{ - // Default assignment - io.ld_ex_o.valid := false.B - io.ld_ex_o.bits.op1 := VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - io.ld_ex_o.bits.op2 := VecInit(Seq.fill(b.veclane)(0.U(b.inputType.getWidth.W))) - io.ld_ex_o.bits.iter := 0.U - when (state === busy && io.sramReadResp(0).valid){ - iter_counter := iter_counter + 1.U - ld_ex_op1_reg := io.sramReadResp(0).bits.data.asTypeOf(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - io.sramReadReq(1).valid := true.B - io.sramReadReq(1).bits.addr := op2_addr + iter_counter - io.sramReadReq(1).bits.fromDMA := false.B - } - when(state === busy && io.sramReadResp(1).valid && RegNext(io.sramReadResp(0).valid)){ - io.ld_ex_o.valid := true.B - io.ld_ex_o.bits.op1 := ld_ex_op1_reg - io.ld_ex_o.bits.op2 := io.sramReadResp(1).bits.data.asTypeOf(Vec(b.veclane, UInt(b.inputType.getWidth.W))) - io.ld_ex_o.bits.iter := iter_counter - 1.U - } - when(state === busy && iter_counter === iter ){ - state := idle - iter_counter := 0.U - } - } - -} diff --git a/arch/src/main/scala/prototype/vector/VecStoreUnit.scala b/arch/src/main/scala/prototype/vector/VecStoreUnit.scala deleted file mode 100644 index 74f203e8..00000000 --- a/arch/src/main/scala/prototype/vector/VecStoreUnit.scala +++ /dev/null @@ -1,126 +0,0 @@ -package prototype.vector - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import 
examples.BuckyballConfigs.CustomBuckyballConfig - - -class ctrl_st_req(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - val wr_bank = UInt(log2Up(b.sp_banks).W) - val wr_bank_addr = UInt(log2Up(b.spad_bank_entries).W) - val iter = UInt(10.W) -} - -class ex_st_req(implicit b: CustomBuckyballConfig, p: Parameters) extends Bundle { - // Use accumulator type, 32 bits - val rst = Vec(b.veclane, UInt(b.accType.getWidth.W)) - val iter = UInt(10.W) -} - -class VecStoreUnit(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Bundle { - val ctrl_st_i = Flipped(Decoupled(new ctrl_st_req)) - val ex_st_i = Flipped(Decoupled(new ex_st_req)) - - // val sramWrite = Vec(b.sp_banks, new SramWriteIO(b.sp_bank_entries, spad_w, spad_w/8)) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - - val cmdResp_o = Valid(new Bundle {val commit = Bool()}) - }) - - // val wr_bank = RegInit(0.U(log2Up(b.sp_banks).W)) - val wr_bank_addr = RegInit(0.U(log2Up(b.spad_bank_entries).W)) - val iter = RegInit(0.U(10.W)) - val iter_counter = RegInit(0.U(10.W)) - - - val idle :: busy :: Nil = Enum(2) - val state = RegInit(idle) - -// ----------------------------------------------------------------------------- -// Set registers when Ctrl instruction arrives -// ----------------------------------------------------------------------------- - io.ctrl_st_i.ready := state === idle - - when(io.ctrl_st_i.fire) { - // wr_bank := io.ctrl_st_i.bits.wr_bank - wr_bank_addr := io.ctrl_st_i.bits.wr_bank_addr - iter := (io.ctrl_st_i.bits.iter + 15.U(10.W)) & (~15.U(10.W)) - iter_counter := 0.U - state := busy - } - -// ----------------------------------------------------------------------------- -// Accept computation results from EX unit and perform write-back -// ----------------------------------------------------------------------------- - io.ex_st_i.ready := state === busy - // for(i <- 0 until b.sp_banks) { - 
// io.sramWrite(i).en := false.B - // io.sramWrite(i).addr := 0.U - // io.sramWrite(i).data := 0.U - // io.sramWrite(i).mask := VecInit(Seq.fill(spad_w / 8)(false.B)) - // } -io.accWrite.foreach { acc => - acc.req.valid := false.B - acc.req.bits.addr := 0.U - acc.req.bits.data := Cat(Seq.fill(b.acc_w / 8)(0.U(8.W))) - acc.req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(false.B)) - } -val waddr = wr_bank_addr + iter_counter(log2Ceil(b.veclane) - 1, 0) - when(io.ex_st_i.fire) { - for(i <- 0 until b.acc_banks/2) { - when(waddr(0) === 0.U){ - io.accWrite(i).req.valid := true.B - io.accWrite(i).req.bits.addr := wr_bank_addr + (iter_counter(log2Ceil(b.veclane) - 1, 0) >> 1.U) - - // Each accumulator bank stores veclane/acc_banks elements - // 16/4 = 4 elements - val elementsPerBank = b.veclane / b.acc_banks * 2 - val startIdx = i * elementsPerBank - val endIdx = startIdx + elementsPerBank - 1 - - // Pack corresponding elements into a UInt - val bankData = Cat(io.ex_st_i.bits.rst.slice(startIdx, endIdx + 1).reverse) - io.accWrite(i).req.bits.data := bankData - - io.accWrite(i).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(true.B)) - }.otherwise{ - io.accWrite(i + b.acc_banks/2).req.valid := true.B - io.accWrite(i + b.acc_banks/2).req.bits.addr := wr_bank_addr + (iter_counter(log2Ceil(b.veclane) - 1, 0) >> 1.U) - - // Each accumulator bank stores veclane/acc_banks elements - // 16/4 = 4 elements - val elementsPerBank = b.veclane / b.acc_banks * 2 - val startIdx = i * elementsPerBank - val endIdx = startIdx + elementsPerBank - 1 - - // Pack corresponding elements into a UInt - val bankData = Cat(io.ex_st_i.bits.rst.slice(startIdx, endIdx + 1).reverse) - io.accWrite(i + b.acc_banks/2).req.bits.data := bankData - - io.accWrite(i + b.acc_banks/2).req.bits.mask := VecInit(Seq.fill(b.acc_mask_len)(true.B)) - } - } - iter_counter := iter_counter + 1.U - } - -// ----------------------------------------------------------------------------- -// Reset iter counter, commit 
cmdResp, return to idle state -// ----------------------------------------------------------------------------- - when(state === busy && iter_counter >= iter) { - state := idle - io.cmdResp_o.valid := true.B - io.cmdResp_o.bits.commit := true.B - }.otherwise { - io.cmdResp_o.valid := false.B - io.cmdResp_o.bits.commit := false.B - } - - - -} diff --git a/arch/src/main/scala/prototype/vector/VecUnit.scala b/arch/src/main/scala/prototype/vector/VecUnit.scala deleted file mode 100644 index 460ac337..00000000 --- a/arch/src/main/scala/prototype/vector/VecUnit.scala +++ /dev/null @@ -1,105 +0,0 @@ -package prototype.vector -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector._ -import framework.memdomain.mem.{SramReadIO, SramWriteIO} -import framework.balldomain.rs.{BallRsIssue, BallRsComplete} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Status - - -class VecUnit(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val spad_w = b.veclane * b.inputType.getWidth - - val io = IO(new Bundle { - val cmdReq = Flipped(Decoupled(new BallRsIssue)) - val cmdResp = Decoupled(new BallRsComplete) - - // Connect to Scratchpad SRAM read/write interface - val sramRead = Vec(b.sp_banks, Flipped(new SramReadIO(b.spad_bank_entries, spad_w))) - // val sramWrite = Vec(b.sp_banks, Flipped(new SramWriteIO(b.spad_bank_entries, spad_w, b.spad_mask_len))) - // Connect to Accumulator read/write interface - // val accRead = Vec(b.acc_banks, Flipped(new SramReadIO(b.acc_bank_entries, b.acc_w))) - val accWrite = Vec(b.acc_banks, Flipped(new SramWriteIO(b.acc_bank_entries, b.acc_w, b.acc_mask_len))) - - // Status output - val status = new Status - }) - -// ----------------------------------------------------------------------------- -// VECCTRLUNIT -// ----------------------------------------------------------------------------- - val VecCtrlUnit = 
Module(new VecCtrlUnit) - VecCtrlUnit.io.cmdReq <> io.cmdReq - io.cmdResp <> VecCtrlUnit.io.cmdResp_o - -// ----------------------------------------------------------------------------- -// VECLOADUNIT -// ----------------------------------------------------------------------------- - val VecLoadUnit = Module(new VecLoadUnit) - VecLoadUnit.io.ctrl_ld_i <> VecCtrlUnit.io.ctrl_ld_o - for (i <- 0 until b.sp_banks) { - io.sramRead(i).req <> VecLoadUnit.io.sramReadReq(i) - VecLoadUnit.io.sramReadResp(i) <> io.sramRead(i).resp - } - -// ----------------------------------------------------------------------------- -// VECEX -// ----------------------------------------------------------------------------- - val VecEX = Module(new VecEXUnit) - VecEX.io.ctrl_ex_i <> VecCtrlUnit.io.ctrl_ex_o - VecEX.io.ld_ex_i <> VecLoadUnit.io.ld_ex_o - - -// ----------------------------------------------------------------------------- -// VECSTOREUNIT -// ----------------------------------------------------------------------------- - val VecStoreUnit = Module(new VecStoreUnit) - VecStoreUnit.io.ctrl_st_i <> VecCtrlUnit.io.ctrl_st_o - VecStoreUnit.io.ex_st_i <> VecEX.io.ex_st_o - for (i <- 0 until b.acc_banks) { - io.accWrite(i) <> VecStoreUnit.io.accWrite(i) - } - VecCtrlUnit.io.cmdResp_i <> VecStoreUnit.io.cmdResp_o - - -// ----------------------------------------------------------------------------- -// Set DontCare -// ----------------------------------------------------------------------------- - // for (i <- 0 until b.sp_banks) { - // io.sramWrite(i) := DontCare - // } - // for (i <- 0 until b.acc_banks) { - // io.accRead(i) := DontCare - // } - -// ----------------------------------------------------------------------------- -// Status tracking -// ----------------------------------------------------------------------------- - val iterCnt = RegInit(0.U(32.W)) - val hasInput = RegInit(false.B) - val hasOutput = RegInit(false.B) - - when(io.cmdReq.fire) { - hasInput := true.B - } - 
when(io.cmdResp.fire) { - hasOutput := false.B - hasInput := false.B - iterCnt := iterCnt + 1.U - } - when(io.cmdResp.valid && !hasOutput) { - hasOutput := true.B - } - - io.status.ready := io.cmdReq.ready - io.status.valid := io.cmdResp.valid - io.status.idle := !hasInput && !hasOutput - io.status.init := hasInput && !hasOutput - io.status.running := hasOutput - io.status.complete := io.cmdResp.fire - io.status.iter := iterCnt -} diff --git a/arch/src/main/scala/prototype/vector/bond/BondWrapper.scala b/arch/src/main/scala/prototype/vector/bond/BondWrapper.scala deleted file mode 100644 index 902a9941..00000000 --- a/arch/src/main/scala/prototype/vector/bond/BondWrapper.scala +++ /dev/null @@ -1,22 +0,0 @@ -package prototype.vector.bond - -import org.chipsalliance.cde.config._ -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import org.chipsalliance.diplomacy._ -import org.chipsalliance.diplomacy.bundlebridge._ -import org.chipsalliance.diplomacy.lazymodule._ -import org.chipsalliance.diplomacy.nodes._ - -abstract class BondWrapper(implicit p: Parameters) extends LazyModule { - val bondName = "vvv" - - def to[T](name: String)(body: => T): T = { - LazyScope(s"bond_to_${name}", s"Bond_${bondName}_to_${name}") { body } - } - - def from[T](name: String)(body: => T): T = { - LazyScope(s"bond_from_${name}", s"Bond_${bondName}_from_${name}") { body } - } -} diff --git a/arch/src/main/scala/prototype/vector/bond/vvv.scala b/arch/src/main/scala/prototype/vector/bond/vvv.scala deleted file mode 100644 index 614a7bdf..00000000 --- a/arch/src/main/scala/prototype/vector/bond/vvv.scala +++ /dev/null @@ -1,40 +0,0 @@ -//===----------------------------------------------------------------------===// -// VVV Bond: -// Input: Vec, Vec -// Output: Vec -//===----------------------------------------------------------------------===// - -package prototype.vector.bond - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ 
-import prototype.vector.thread.{ThreadKey, ThreadBondKey, BaseThread} - -class VVV(implicit p: Parameters) extends Bundle { - val lane = p(ThreadKey).get.lane - val bondParam = p(ThreadBondKey).get - val inputWidth = bondParam.inputWidth - val outputWidth = bondParam.outputWidth - - // Input interface (Flipped Decoupled) - val in = Flipped(Decoupled(new Bundle { - val in1 = Vec(lane, UInt(inputWidth.W)) - val in2 = Vec(lane, UInt(inputWidth.W)) - })) - - // Decoupled output interface - val out = Decoupled(new Bundle { - val out = Vec(lane, UInt(outputWidth.W)) - }) -} - -trait CanHaveVVVBond { this: BaseThread => - val vvvBond = params(ThreadBondKey).filter(_.bondType == "vvv").map { bondParam => - // println(s"[VVVBond] Creating BondType: ${bondParam.bondType}") - - IO(new VVV()(params)) - } - - def getVVVBond = vvvBond -} diff --git a/arch/src/main/scala/prototype/vector/op/cascade.scala b/arch/src/main/scala/prototype/vector/op/cascade.scala deleted file mode 100644 index ecabb34a..00000000 --- a/arch/src/main/scala/prototype/vector/op/cascade.scala +++ /dev/null @@ -1,52 +0,0 @@ -package prototype.vector.op - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import prototype.vector.thread.{ThreadKey, ThreadOpKey, ThreadBondKey, BaseThread} -import prototype.vector.bond.VVV - -class CascadeOp(implicit p: Parameters) extends Module { - val lane = p(ThreadKey).get.lane - val bondParam = p(ThreadBondKey).get - val outputWidth = bondParam.outputWidth - - val io = IO(new VVV()(p)) - - val reg1 = RegInit(VecInit(Seq.fill(lane)(0.U(outputWidth.W)))) - val reg2 = RegInit(VecInit(Seq.fill(lane)(0.U(outputWidth.W)))) - val valid1 = RegInit(false.B) - val valid2 = RegInit(false.B) - - io.in.ready := io.out.ready - - when (io.in.valid) { - valid1 := true.B - reg1 := io.in.bits.in1.zip(io.in.bits.in2).map { case (a, b) => a + b } - }.elsewhen(!io.in.ready){ - valid1 := valid1 - }.otherwise { - valid1 := false.B - } - - - val valid = valid1 - - 
when (io.out.ready && valid) { - io.out.valid := true.B - io.out.bits.out := reg1 - }.otherwise { - io.out.valid := false.B - io.out.bits.out := VecInit(Seq.fill(lane)(0.U(outputWidth.W))) - } -} - -trait CanHaveCascadeOp { this: BaseThread => - val cascadeOp = params(ThreadOpKey).filter(_.OpType == "cascade").map { opParam => - // println(s"[CascadeOp] Creating OpType: ${opParam.OpType}") - - Module(new CascadeOp()(params)) - } - - def getCascadeOp = cascadeOp -} diff --git a/arch/src/main/scala/prototype/vector/op/mul.scala b/arch/src/main/scala/prototype/vector/op/mul.scala deleted file mode 100644 index 91ed3f52..00000000 --- a/arch/src/main/scala/prototype/vector/op/mul.scala +++ /dev/null @@ -1,51 +0,0 @@ -package prototype.vector.op - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import prototype.vector.thread.{ThreadKey, ThreadOpKey, ThreadBondKey, BaseThread} -import prototype.vector.bond.VVV - -class MulOp(implicit p: Parameters) extends Module { - val lane = p(ThreadKey).get.lane - val bondParam = p(ThreadBondKey).get - val inputWidth = bondParam.inputWidth - - val io = IO(new VVV()(p)) - - val reg1 = RegInit(VecInit(Seq.fill(lane)(0.U(inputWidth.W)))) - val reg2 = RegInit(VecInit(Seq.fill(lane)(0.U(inputWidth.W)))) - - val cnt = RegInit(0.U(log2Ceil(lane).W)) - val active = RegInit(false.B) - - io.out.valid := active && io.out.ready - io.in.ready := io.out.ready - - when (io.in.valid) { - reg1 := io.in.bits.in1 - reg2 := io.in.bits.in2 - cnt := 0.U - active := true.B - } .elsewhen (active && io.out.ready) { - cnt := cnt + 1.U - when (cnt === (lane-1).U) { - active := false.B - } - } - - for (i <- 0 until lane) { - io.out.bits.out(i) := Mux(io.out.valid, reg1(cnt) * reg2(i), 0.U) - } - -} - -trait CanHaveMulOp { this: BaseThread => - val mulOp = params(ThreadOpKey).filter(_.OpType == "mul").map { opParam => - // println(s"[MulOp] Creating OpType: ${opParam.OpType}") - - Module(new MulOp()(params)) - } - - def getMulOp = 
mulOp -} diff --git a/arch/src/main/scala/prototype/vector/thread/BaseThread.scala b/arch/src/main/scala/prototype/vector/thread/BaseThread.scala deleted file mode 100644 index 0f9740cd..00000000 --- a/arch/src/main/scala/prototype/vector/thread/BaseThread.scala +++ /dev/null @@ -1,31 +0,0 @@ -//===- BaseThread.scala - Level 1: Thread ---===// -package prototype.vector.thread - -import chisel3._ -import org.chipsalliance.cde.config._ - -// Parameter definitions -case class ThreadParam(lane: Int, attr: String, threadName: String, Op: OpParam) -case class OpParam(OpType: String, bondType: BondParam) -case class BondParam(bondType: String, inputWidth: Int = 8, outputWidth: Int = 32) - -case object ThreadKey extends Field[Option[ThreadParam]](None) -case object ThreadOpKey extends Field[Option[OpParam]](None) -case object ThreadBondKey extends Field[Option[BondParam]](None) -case object ThreadMapKey extends Field[Map[String, ThreadParam]](Map.empty) - -//===----------------------------------------------------------------------===// -// BaseThread base class -//===----------------------------------------------------------------------===// -class BaseThread(implicit p: Parameters) extends Module { - val io = IO(new Bundle {}) - val params = p - val threadMap = p(ThreadMapKey) - val threadParam = threadMap.getOrElse( - p(ThreadKey).get.threadName, - throw new Exception(s"ThreadParam not found for threadName: ${p(ThreadKey).get.threadName}") - ) - val opParam = p(ThreadOpKey).get - val bondParam = p(ThreadBondKey).get - println(s"[Thread_${threadParam.threadName}] Op: ${opParam.OpType}, bond: ${bondParam.bondType}, Lanes: ${threadParam.lane}") -} diff --git a/arch/src/main/scala/prototype/vector/thread/CasThread.scala b/arch/src/main/scala/prototype/vector/thread/CasThread.scala deleted file mode 100644 index e90d4beb..00000000 --- a/arch/src/main/scala/prototype/vector/thread/CasThread.scala +++ /dev/null @@ -1,21 +0,0 @@ -package prototype.vector.thread - -import 
chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import prototype.vector.bond.CanHaveVVVBond -import prototype.vector.op.{CascadeOp, CanHaveCascadeOp} - -class CasThread(implicit p: Parameters) extends BaseThread - with CanHaveCascadeOp - with CanHaveVVVBond { - - // Connect CascadeOp and VVVBond - for { - op <- cascadeOp - bond <- vvvBond - } { - op.io.in <> bond.in - op.io.out <> bond.out - } -} diff --git a/arch/src/main/scala/prototype/vector/thread/MulThread.scala b/arch/src/main/scala/prototype/vector/thread/MulThread.scala deleted file mode 100644 index 5be6d498..00000000 --- a/arch/src/main/scala/prototype/vector/thread/MulThread.scala +++ /dev/null @@ -1,21 +0,0 @@ -package prototype.vector.thread - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ -import prototype.vector.bond.CanHaveVVVBond -import prototype.vector.op.{MulOp, CanHaveMulOp} - -class MulThread(implicit p: Parameters) extends BaseThread - with CanHaveMulOp - with CanHaveVVVBond { - - // Connect MulOp and VVVBond - for { - op <- mulOp - bond <- vvvBond - } { - op.io.in <> bond.in - op.io.out <> bond.out - } -} diff --git a/arch/src/main/scala/prototype/vector/warp/MeshWarp.scala b/arch/src/main/scala/prototype/vector/warp/MeshWarp.scala deleted file mode 100644 index b974e035..00000000 --- a/arch/src/main/scala/prototype/vector/warp/MeshWarp.scala +++ /dev/null @@ -1,138 +0,0 @@ -package prototype.vector.warp - -import chisel3._ -import chisel3.util._ -import chisel3.stage._ -import org.chipsalliance.cde.config.Parameters - -import prototype.vector.thread._ -import prototype.vector.bond.BondWrapper - -class MeshWarpInput extends Bundle { - val op1 = Vec(16, UInt(8.W)) - val op2 = Vec(16, UInt(8.W)) - val thread_id = UInt(10.W) -} - -class MeshWarpOutput extends Bundle { - val res = Vec(16, UInt(32.W)) -} - -class MeshWarp(implicit p: Parameters) extends Module { - val io = IO(new Bundle { - val in = Flipped(Decoupled(new MeshWarpInput)) - 
val out = Decoupled(new MeshWarpOutput) - }) - - val threadMap = (0 until 32).map { i => - val threadName = i.toString - // 0-15 mul, 16-31 cascade - val opType = if (i < 16) "mul" else "cascade" - // mul operation: 8-bit input, 32-bit output; cascade operation: 32-bit input, 32-bit output - val bond = if (opType == "mul") { - BondParam("vvv", inputWidth = 8, outputWidth = 32) - } else { - BondParam("vvv", inputWidth = 32, outputWidth = 32) - } - val op = OpParam(opType, bond) - // All threads use the same lane count - val thread = ThreadParam(16, s"attr$threadName", threadName, op) - threadName -> thread - }.toMap - - val mulThreads = (0 until 16).map { i => - val threadName = i.toString - val threadParam = threadMap(threadName) - val opParam = threadParam.Op - val bondParam = opParam.bondType - val threadParams = p.alterMap(Map( - ThreadMapKey -> threadMap, - ThreadKey -> Some(threadParam), - ThreadOpKey -> Some(opParam), - ThreadBondKey -> Some(bondParam) - )) - Module(new MulThread()(threadParams)) - } - - val casThreads = (16 until 32).map { i => - val threadName = i.toString - val threadParam = threadMap(threadName) - val opParam = threadParam.Op - val bondParam = opParam.bondType - val threadParams = p.alterMap(Map( - ThreadMapKey -> threadMap, - ThreadKey -> Some(threadParam), - ThreadOpKey -> Some(opParam), - ThreadBondKey -> Some(bondParam) - )) - Module(new CasThread()(threadParams)) - } - - io.in.ready := mulThreads(0).vvvBond.get.in.ready - - for (i <- 0 until 16) { - val mulThread = mulThreads(i) - val casThread = casThreads(i) - for { - mulBond <- mulThread.vvvBond - casBond <- casThread.vvvBond - } { - // Connect mul thread output to cascade thread input - casBond.in.bits.in1 := mulBond.out.bits.out - mulBond.out.ready := casBond.in.ready - - // Connect cascade thread's second input and output ready signal - if (i == 0) { - casBond.in.bits.in2 := VecInit(Seq.fill(16)(0.U(32.W))) - // First cascade thread's valid is determined by mulBond's output 
valid - casBond.in.valid := mulBond.out.valid - // First cascade thread's output ready is connected to next cascade thread's input ready - if (i < 15) { - for { - nextCasBond <- casThreads(i + 1).vvvBond - } { - casBond.out.ready := nextCasBond.in.ready - } - } - } else { - for { - prevCasBond <- casThreads(i - 1).vvvBond - } { - // Directly connect 32-bit output to 32-bit input - casBond.in.bits.in2 := prevCasBond.out.bits.out - // casBond's valid is jointly determined by previous casBond's output valid and current mulBond's output valid - casBond.in.valid := prevCasBond.out.valid || mulBond.out.valid - } - // Middle cascade thread's output ready is connected to next cascade thread's input ready - if (i < 15) { - for { - nextCasBond <- casThreads(i + 1).vvvBond - } { - casBond.out.ready := nextCasBond.in.ready - } - } - } - - // Only allow mulOp corresponding to thread_id to drive input - when (i.U === io.in.bits.thread_id && io.in.valid) { - mulBond.in.valid := true.B - mulBond.in.bits.in1 := io.in.bits.op1 - mulBond.in.bits.in2 := io.in.bits.op2 - io.in.ready := mulBond.in.ready - }.otherwise { - mulBond.in.valid := false.B - mulBond.in.bits.in1 := VecInit(Seq.fill(16)(0.U(8.W))) - mulBond.in.bits.in2 := VecInit(Seq.fill(16)(0.U(8.W))) - } - } - } - - // Connect output - for { - finalCasBond <- casThreads(15).vvvBond - } { - io.out.valid := finalCasBond.out.valid - io.out.bits.res := finalCasBond.out.bits.out - finalCasBond.out.ready := io.out.ready - } -} diff --git a/arch/src/main/scala/prototype/vector/warp/VecBall.scala b/arch/src/main/scala/prototype/vector/warp/VecBall.scala deleted file mode 100644 index f4b632d1..00000000 --- a/arch/src/main/scala/prototype/vector/warp/VecBall.scala +++ /dev/null @@ -1,89 +0,0 @@ -package prototype.vector.warp - -import chisel3._ -import chisel3.util._ -import org.chipsalliance.cde.config._ - -class BallIO extends Bundle { - // val start = Output(Bool()) - // val arrive = Output(Bool()) - // val done = Output(Bool()) - 
val iterIn = Flipped(Decoupled(UInt(10.W))) - val iterOut = Valid(UInt(10.W)) -} - -class VecBallIO extends BallIO { - val op1In = Flipped(Valid(Vec(16, UInt(8.W)))) - val op2In = Flipped(Valid(Vec(16, UInt(8.W)))) - val rstOut = Decoupled(Vec(16, UInt(32.W))) -} - -class VecBall(implicit p: Parameters) extends Module { - val io = IO(new VecBallIO()) - - // Internal state registers & iteration counter - val start = RegInit(false.B) - val arrive = RegInit(false.B) - val done = RegInit(false.B) - val iter = RegInit(0.U(10.W)) - val iterCounter = RegInit(0.U(10.W)) - - // Independent control logic - val threadId = RegInit(0.U(4.W)) - when (io.op1In.valid && io.op2In.valid && threadId < 15.U) { - threadId := threadId + 1.U - } .elsewhen (io.op1In.valid && io.op2In.valid && threadId === 15.U) { - threadId := 0.U - } - - // Instantiate MeshWarp - val meshWarp = Module(new MeshWarp()(p)) - - // Connect external IO to MeshWarp - meshWarp.io.in.valid := io.op1In.valid && io.op2In.valid - meshWarp.io.in.bits.op1 := io.op1In.bits - meshWarp.io.in.bits.op2 := io.op2In.bits - meshWarp.io.in.bits.thread_id := threadId - - io.rstOut.valid := meshWarp.io.out.valid - io.rstOut.bits := meshWarp.io.out.bits.res - meshWarp.io.out.ready := io.rstOut.ready - - // Handle iteration input - when (io.iterIn.fire) {iterCounter := 0.U; iter := io.iterIn.bits} - // Pull start high when external input arrives - when (io.op1In.valid && io.op2In.valid) {start := true.B} - // Pull arrive high when first output starts to be valid - when (io.rstOut.valid && !arrive) {arrive := true.B} - // Increment by one for each output - when (io.rstOut.valid && iterCounter =/= iter) {iterCounter := iterCounter + 1.U} - // Pull done high when iter returns to 0 - when (iterCounter === iter) {done := true.B} - - // Reset logic - when (io.iterIn.fire) { - start := false.B - arrive := false.B - done := false.B - iterCounter := 0.U - } - - // Output state - // io.start := start - // io.arrive := arrive - // io.done := 
done - - // Output current iteration count - io.iterOut.valid := io.rstOut.valid - io.iterOut.bits := iterCounter - io.iterIn.ready := meshWarp.io.in.ready - - // def get_iterCounter(): UInt = { - // iterCounter - // } - - // def get_arrive(): Bool = { - // arrive - // } - -} diff --git a/arch/src/main/scala/sims/bebop/BebopCosimBlocks.scala b/arch/src/main/scala/sims/bebop/BebopCosimBlocks.scala new file mode 100644 index 00000000..531ba63b --- /dev/null +++ b/arch/src/main/scala/sims/bebop/BebopCosimBlocks.scala @@ -0,0 +1,144 @@ +package sims.bebop + +import chisel3._ +import chisel3.util._ + +/** Per-`funct` hooks for Spike↔Verilator cosim. Keep literals in sync with `bebop/src/emu/inst/decode.rs`. */ +object BebopCosimBlocks { + + // FUNCT_* (7-bit RoCC custom field) + val F_FENCE: UInt = 0.U(7.W) + val F_BARRIER: UInt = 1.U(7.W) + val F_GEMMINI_CONFIG: UInt = 2.U(7.W) + val F_GEMMINI_FLUSH: UInt = 3.U(7.W) + val F_BDB_COUNTER: UInt = 4.U(7.W) + val F_MVOUT: UInt = 16.U(7.W) + val F_MSET: UInt = 32.U(7.W) + val F_MVIN: UInt = 33.U(7.W) + val F_IM2COL: UInt = 48.U(7.W) + val F_TRANSPOSE: UInt = 49.U(7.W) + val F_RELU: UInt = 50.U(7.W) + val F_QUANT: UInt = 51.U(7.W) + val F_DEQUANT: UInt = 52.U(7.W) + val F_GEMMINI_PRELOAD: UInt = 53.U(7.W) + val F_BDB_BACKDOOR: UInt = 54.U(7.W) + val F_MUL_WARP16: UInt = 64.U(7.W) + val F_BFP: UInt = 65.U(7.W) + val F_GEMMINI_COMPUTE_PRELOADED: UInt = 66.U(7.W) + val F_GEMMINI_COMPUTE_ACCUMULATED: UInt = 67.U(7.W) + val F_GEMMINI_LOOP_WS_CONFIG_BOUNDS: UInt = 80.U(7.W) + val F_GEMMINI_LOOP_WS_CONFIG_ADDR_A: UInt = 81.U(7.W) + val F_GEMMINI_LOOP_WS_CONFIG_ADDR_B: UInt = 82.U(7.W) + val F_GEMMINI_LOOP_WS_CONFIG_ADDR_D: UInt = 83.U(7.W) + val F_GEMMINI_LOOP_WS_CONFIG_ADDR_C: UInt = 84.U(7.W) + val F_GEMMINI_LOOP_WS_CONFIG_STRIDES_AB: UInt = 85.U(7.W) + val F_GEMMINI_LOOP_WS_CONFIG_STRIDES_DC: UInt = 86.U(7.W) + val F_GEMMINI_LOOP_WS: UInt = 87.U(7.W) + val F_GEMMINI_LOOP_CONV_WS_CONFIG_1: UInt = 96.U(7.W) + val 
F_GEMMINI_LOOP_CONV_WS_CONFIG_2: UInt = 97.U(7.W) + val F_GEMMINI_LOOP_CONV_WS_CONFIG_3: UInt = 98.U(7.W) + val F_GEMMINI_LOOP_CONV_WS_CONFIG_4: UInt = 99.U(7.W) + val F_GEMMINI_LOOP_CONV_WS_CONFIG_5: UInt = 100.U(7.W) + val F_GEMMINI_LOOP_CONV_WS_CONFIG_6: UInt = 101.U(7.W) + val F_GEMMINI_LOOP_CONV_WS_CONFIG_7: UInt = 102.U(7.W) + val F_GEMMINI_LOOP_CONV_WS_CONFIG_8: UInt = 103.U(7.W) + val F_GEMMINI_LOOP_CONV_WS_CONFIG_9: UInt = 104.U(7.W) + val F_GEMMINI_LOOP_CONV_WS: UInt = 105.U(7.W) + + val knownFuncts: Seq[UInt] = Seq( + F_FENCE, + F_BARRIER, + F_GEMMINI_CONFIG, + F_GEMMINI_FLUSH, + F_BDB_COUNTER, + F_MVOUT, + F_MSET, + F_MVIN, + F_IM2COL, + F_TRANSPOSE, + F_RELU, + F_QUANT, + F_DEQUANT, + F_GEMMINI_PRELOAD, + F_BDB_BACKDOOR, + F_MUL_WARP16, + F_BFP, + F_GEMMINI_COMPUTE_PRELOADED, + F_GEMMINI_COMPUTE_ACCUMULATED, + F_GEMMINI_LOOP_WS_CONFIG_BOUNDS, + F_GEMMINI_LOOP_WS_CONFIG_ADDR_A, + F_GEMMINI_LOOP_WS_CONFIG_ADDR_B, + F_GEMMINI_LOOP_WS_CONFIG_ADDR_D, + F_GEMMINI_LOOP_WS_CONFIG_ADDR_C, + F_GEMMINI_LOOP_WS_CONFIG_STRIDES_AB, + F_GEMMINI_LOOP_WS_CONFIG_STRIDES_DC, + F_GEMMINI_LOOP_WS, + F_GEMMINI_LOOP_CONV_WS_CONFIG_1, + F_GEMMINI_LOOP_CONV_WS_CONFIG_2, + F_GEMMINI_LOOP_CONV_WS_CONFIG_3, + F_GEMMINI_LOOP_CONV_WS_CONFIG_4, + F_GEMMINI_LOOP_CONV_WS_CONFIG_5, + F_GEMMINI_LOOP_CONV_WS_CONFIG_6, + F_GEMMINI_LOOP_CONV_WS_CONFIG_7, + F_GEMMINI_LOOP_CONV_WS_CONFIG_8, + F_GEMMINI_LOOP_CONV_WS_CONFIG_9, + F_GEMMINI_LOOP_CONV_WS, + ) + + def isKnownFunct(funct: UInt): Bool = knownFuncts.map(_ === funct).reduce(_ || _) + + /** Mirrors `decode::execute_known` inner `u64` return (`ret` before iss maps rd). 
*/ + def execRet(funct: UInt, xs1: UInt, xs2: UInt): UInt = { + val _ = (xs1, xs2) + MuxLookup( + funct, + 0.U(64.W), + )( + Seq( + F_FENCE -> 0.U(64.W), + F_BARRIER -> 0.U(64.W), + F_GEMMINI_CONFIG -> 0.U(64.W), + F_GEMMINI_FLUSH -> 0.U(64.W), + F_BDB_COUNTER -> 0.U(64.W), + F_MVOUT -> 0.U(64.W), + F_MSET -> 0.U(64.W), + F_MVIN -> 0.U(64.W), + F_IM2COL -> 0.U(64.W), + F_TRANSPOSE -> 0.U(64.W), + F_RELU -> 0.U(64.W), + F_QUANT -> 0.U(64.W), + F_DEQUANT -> 0.U(64.W), + F_GEMMINI_PRELOAD -> 0.U(64.W), + F_BDB_BACKDOOR -> 0.U(64.W), + F_MUL_WARP16 -> 0.U(64.W), + F_BFP -> 0.U(64.W), + F_GEMMINI_COMPUTE_PRELOADED -> 0.U(64.W), + F_GEMMINI_COMPUTE_ACCUMULATED -> 0.U(64.W), + F_GEMMINI_LOOP_WS_CONFIG_BOUNDS -> 0.U(64.W), + F_GEMMINI_LOOP_WS_CONFIG_ADDR_A -> 0.U(64.W), + F_GEMMINI_LOOP_WS_CONFIG_ADDR_B -> 0.U(64.W), + F_GEMMINI_LOOP_WS_CONFIG_ADDR_D -> 0.U(64.W), + F_GEMMINI_LOOP_WS_CONFIG_ADDR_C -> 0.U(64.W), + F_GEMMINI_LOOP_WS_CONFIG_STRIDES_AB -> 0.U(64.W), + F_GEMMINI_LOOP_WS_CONFIG_STRIDES_DC -> 0.U(64.W), + F_GEMMINI_LOOP_WS -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_1 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_2 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_3 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_4 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_5 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_6 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_7 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_8 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS_CONFIG_9 -> 0.U(64.W), + F_GEMMINI_LOOP_CONV_WS -> 0.U(64.W), + ), + ) + } + + /** Optional observable for bank difftest; 0 = not implemented. Wire Ball SRAM hash later. 
*/ + def bankDigestPeek(funct: UInt, xs1: UInt, xs2: UInt): UInt = { + val _ = (funct, xs1, xs2) + 0.U(64.W) + } +} diff --git a/arch/src/main/scala/sims/bebop/BebopSpikeCosimTop.scala b/arch/src/main/scala/sims/bebop/BebopSpikeCosimTop.scala new file mode 100644 index 00000000..dd631759 --- /dev/null +++ b/arch/src/main/scala/sims/bebop/BebopSpikeCosimTop.scala @@ -0,0 +1,25 @@ +package sims.bebop + +import chisel3._ +import chisel3.util.Cat + +/** Verilator cosim top: RoCC insn → rd + optional `bankDigestPeek` for future BEMU bank hash compare. + * + * `execRet` comes from [[BebopCosimBlocks.execRet]] (per-funct). rd rule matches + * `bebop/src/emu/iss/iss.rs`: `rd = if v == 0 { funct } else { 0 }`. + */ +class BebopSpikeCosimTop extends RawModule { + val funct = IO(Input(UInt(7.W))) + val xs1 = IO(Input(UInt(64.W))) + val xs2 = IO(Input(UInt(64.W))) + + val result = IO(Output(UInt(64.W))) + + /** 0 until Ball exposes a 64-bit digest; then compare in bebop `vl_worker` when enabled. */ + val bankDigestPeek = IO(Output(UInt(64.W))) + + val execRet = BebopCosimBlocks.execRet(funct, xs1, xs2) + val known = BebopCosimBlocks.isKnownFunct(funct) + result := Mux(known && execRet === 0.U, Cat(0.U(57.W), funct), 0.U(64.W)) + bankDigestPeek := BebopCosimBlocks.bankDigestPeek(funct, xs1, xs2) +} diff --git a/arch/src/main/scala/sims/bebop/EmitBebopSpikeCosimVerilog.scala b/arch/src/main/scala/sims/bebop/EmitBebopSpikeCosimVerilog.scala new file mode 100644 index 00000000..1edb83e4 --- /dev/null +++ b/arch/src/main/scala/sims/bebop/EmitBebopSpikeCosimVerilog.scala @@ -0,0 +1,21 @@ +package sims.bebop + +import _root_.circt.stage.ChiselStage + +/** `mill buckyball.runMain sims.bebop.EmitBebopSpikeCosimVerilog ` */ +object EmitBebopSpikeCosimVerilog { + def main(args: Array[String]): Unit = { + val dir = if (args.nonEmpty) args(0) else "gen-bebop-cosim" + ChiselStage.emitSystemVerilogFile( + new BebopSpikeCosimTop, + firtoolOpts = Array.empty, + args = Array("-td", dir), + ) + 
ChiselStage.emitSystemVerilogFile( + new VecComputeTop, + firtoolOpts = Array.empty, + args = Array("-td", dir), + ) + println(s"EmitBebopSpikeCosimVerilog: wrote under $dir") + } +} diff --git a/arch/src/main/scala/sims/bebop/VecComputeTop.scala b/arch/src/main/scala/sims/bebop/VecComputeTop.scala new file mode 100644 index 00000000..20f8f83c --- /dev/null +++ b/arch/src/main/scala/sims/bebop/VecComputeTop.scala @@ -0,0 +1,64 @@ +package sims.bebop + +import chisel3._ +import chisel3.util._ +import framework.balldomain.prototype.vector.configs.VectorBallParam +import framework.balldomain.prototype.vector.op.MulOp + +class VecComputeTop extends Module { + private val cfg = VectorBallParam() + require(cfg.lane == 16, s"VecComputeTop requires lane=16, got ${cfg.lane}") + require(cfg.inputWidth == 8, s"VecComputeTop requires inputWidth=8, got ${cfg.inputWidth}") + require(cfg.outputWidth == 32, s"VecComputeTop requires outputWidth=32, got ${cfg.outputWidth}") + + val io = IO(new Bundle { + val start = Input(Bool()) + val iter = Input(UInt(16.W)) + val op1 = Input(Vec(16, UInt(8.W))) + val op2 = Input(Vec(16, UInt(8.W))) + val res = Output(Vec(16, UInt(32.W))) + val valid = Output(Bool()) + val done = Output(Bool()) + }) + + val mul = Module(new MulOp(cfg)) + + val op1Reg = Reg(Vec(16, UInt(8.W))) + val op2Reg = Reg(Vec(16, UInt(8.W))) + val inFire = RegInit(false.B) + + val rowCnt = RegInit(0.U(4.W)) + val active = RegInit(false.B) + val doneR = RegInit(false.B) + + when(io.start) { + assert(io.iter =/= 0.U, "VecComputeTop: iter must be non-zero") + op1Reg := io.op1 + op2Reg := io.op2 + inFire := true.B + rowCnt := 0.U + active := true.B + doneR := false.B + }.otherwise { + inFire := false.B + when(active && mul.io.out.valid) { + when(rowCnt === 15.U) { + active := false.B + doneR := true.B + }.otherwise { + rowCnt := rowCnt + 1.U + } + }.elsewhen(doneR) { + doneR := false.B + } + } + + mul.io.in.valid := inFire + mul.io.in.bits.in1 := op1Reg + mul.io.in.bits.in2 := 
op2Reg + mul.io.out.ready := true.B + + io.res := mul.io.out.bits.out + io.valid := active && mul.io.out.valid + io.done := doneR +} diff --git a/arch/src/main/scala/sims/firesim/TargetConfigs.scala b/arch/src/main/scala/sims/firesim/TargetConfigs.scala index 9d4b3531..6ceeb7c3 100644 --- a/arch/src/main/scala/sims/firesim/TargetConfigs.scala +++ b/arch/src/main/scala/sims/firesim/TargetConfigs.scala @@ -3,34 +3,42 @@ package sims.firesim import chisel3._ import java.io.File -import org.chipsalliance.cde.config.{Config} +import org.chipsalliance.cde.config.Config import freechips.rocketchip.tile._ import freechips.rocketchip.tilelink._ import freechips.rocketchip.subsystem._ -import freechips.rocketchip.devices.tilelink.{BootROMParams, BootROMLocated} +import freechips.rocketchip.devices.tilelink.{BootROMLocated, BootROMParams} -class WithBootROM extends Config((site, here, up) => { - case BootROMLocated(x) => { - val chipyardBootROM = new File(s"./thirdparty/chipyard/generators/testchipip/bootrom/bootrom.rv${site(MaxXLen)}.img") - val firesimBootROM = new File(s"./thirdparty/chipyard/target-rtl/chipyard/generators/testchipip/bootrom/bootrom.rv${site(MaxXLen)}.img") +class WithBootROM + extends Config((site, here, up) => { + case BootROMLocated(x) => + val chipyardBootROM = + new File(s"./thirdparty/chipyard/generators/testchipip/bootrom/bootrom.rv${site(MaxXLen)}.img") + val firesimBootROM = new File( + s"./thirdparty/chipyard/target-rtl/chipyard/generators/testchipip/bootrom/bootrom.rv${site(MaxXLen)}.img" + ) - val bootROMPath = if (chipyardBootROM.exists()) { - chipyardBootROM.getAbsolutePath() - } else { - firesimBootROM.getAbsolutePath() - } - up(BootROMLocated(x)).map(_.copy(contentFileName = bootROMPath)) - } -}) + val bootROMPath = + if (chipyardBootROM.exists()) { + chipyardBootROM.getAbsolutePath() + } else { + firesimBootROM.getAbsolutePath() + } + up(BootROMLocated(x)).map(_.copy(contentFileName = bootROMPath)) + }) -class FireSimGemminiBuckyballConfig 
extends Config( - new WithBootROM ++ - new firechip.chip.WithDefaultFireSimBridges ++ - new firechip.chip.WithFireSimConfigTweaks ++ - new chipyard.GemminiRocketConfig) +class FireSimGemminiBuckyballConfig + extends Config( + new WithBootROM ++ + new firechip.chip.WithDefaultFireSimBridges ++ + new firechip.chip.WithFireSimConfigTweaks ++ + new chipyard.GemminiRocketConfig + ) -class FireSimBuckyballToyConfig extends Config( - new WithBootROM ++ - new firechip.chip.WithDefaultFireSimBridges ++ - new firechip.chip.WithFireSimConfigTweaks ++ - new examples.toy.BuckyballToyConfig) +class FireSimBuckyballToyConfig + extends Config( + new WithBootROM ++ + new firechip.chip.WithDefaultFireSimBridges ++ + new firechip.chip.WithFireSimConfigTweaks ++ + new examples.toy.BuckyballToyConfig + ) diff --git a/arch/src/main/scala/sims/palladium/TargetConfigs.scala b/arch/src/main/scala/sims/palladium/TargetConfigs.scala index 21ec9a89..52da638a 100644 --- a/arch/src/main/scala/sims/palladium/TargetConfigs.scala +++ b/arch/src/main/scala/sims/palladium/TargetConfigs.scala @@ -3,8 +3,17 @@ package sims.palladium import org.chipsalliance.cde.config.Config import chipyard._ -class BuckyballToyP2EConfig extends Config( - new palladium.fpga.WithFPGAFrequency(50) ++ - new palladium.fpga.WithVCU19PTweaks ++ - new examples.toy.BuckyballToy256Config // Test with 256 cores + 8 L2 banks -) +// class BuckyballToyP2EConfig +// extends Config( +// new palladium.fpga.WithFPGAFrequency(50) ++ +// new palladium.fpga.WithVCU19PTweaks ++ +// new examples.toy.BuckyballToy256Config // Test with 256 cores + 8 L2 banks +// ) + +// // Cross-bar config +// class BuckyballToyP2ECBConfig +// extends Config( +// new palladium.fpga.WithFPGAFrequency(50) ++ +// new palladium.fpga.WithVCU19PTweaks ++ +// new examples.toy.BuckyballToy256CBConfig +// ) diff --git a/arch/src/main/scala/sims/pegasus/PegasusHarness.scala b/arch/src/main/scala/sims/pegasus/PegasusHarness.scala new file mode 100644 index 
00000000..cd8a7310 --- /dev/null +++ b/arch/src/main/scala/sims/pegasus/PegasusHarness.scala @@ -0,0 +1,124 @@ +// package sims.pegasus + +// import chisel3._ +// import chisel3.util._ + +// import org.chipsalliance.cde.config.{Parameters} +// import freechips.rocketchip.util.{ResetCatchAndSync} + +// import chipyard.harness.{HasHarnessInstantiators} +// import chipyard.iobinders.{AXI4MemPort} + +// import pegasus._ + +// // PegasusHarness: top-level Chisel Module for AU280 FPGA. +// // Integrates PegasusShell (FPGA IPs) with ChipTop (SoC DUT) via Chipyard +// // harness infrastructure. +// // +// // PCIe and HBM2 reference clock pins are the top-level IOs. +// // ChipTop is instantiated inside via instantiateChipTops(). +// // HarnessBinders connect ChipTop's exposed ports (AXI4MemPort, UARTPort, etc.) +// // to pegasusShell's interface signals. +// // +// class PegasusHarness(implicit val p: Parameters) extends Module with HasHarnessInstantiators { + +// val io = IO(new Bundle { +// // PCIe physical interface +// val pcie_sys_clk = Input(Clock()) +// val pcie_sys_clk_gt = Input(Clock()) +// val pcie_sys_rst_n = Input(Bool()) +// val pcie_exp_txp = Output(UInt(16.W)) +// val pcie_exp_txn = Output(UInt(16.W)) +// val pcie_exp_rxp = Input(UInt(16.W)) +// val pcie_exp_rxn = Input(UInt(16.W)) + +// // HBM2 reference clock (100 MHz) +// val hbm_ref_clk = Input(Clock()) +// }) + +// // --- Instantiate PegasusShell --- +// val pegasusShell = Module(new PegasusShell) +// pegasusShell.io.pcie_sys_clk := io.pcie_sys_clk +// pegasusShell.io.pcie_sys_clk_gt := io.pcie_sys_clk_gt +// pegasusShell.io.pcie_sys_rst_n := io.pcie_sys_rst_n +// io.pcie_exp_txp := pegasusShell.io.pcie_exp_txp +// io.pcie_exp_txn := pegasusShell.io.pcie_exp_txn +// pegasusShell.io.pcie_exp_rxp := io.pcie_exp_rxp +// pegasusShell.io.pcie_exp_rxn := io.pcie_exp_rxn +// pegasusShell.io.hbm_ref_clk := io.hbm_ref_clk + +// // UART TX: default idle high until ChipTop provides real signal +// 
pegasusShell.io.uart_tx := true.B + +// // Chip mem: tie off until ChipTop provides real signals +// pegasusShell.io.chip_mem_awid := 0.U +// pegasusShell.io.chip_mem_awaddr := 0.U +// pegasusShell.io.chip_mem_awlen := 0.U +// pegasusShell.io.chip_mem_awsize := 0.U +// pegasusShell.io.chip_mem_awburst := 0.U +// pegasusShell.io.chip_mem_awvalid := false.B +// pegasusShell.io.chip_mem_wdata := 0.U +// pegasusShell.io.chip_mem_wstrb := 0.U +// pegasusShell.io.chip_mem_wlast := false.B +// pegasusShell.io.chip_mem_wvalid := false.B +// pegasusShell.io.chip_mem_bready := false.B +// pegasusShell.io.chip_mem_arid := 0.U +// pegasusShell.io.chip_mem_araddr := 0.U +// pegasusShell.io.chip_mem_arlen := 0.U +// pegasusShell.io.chip_mem_arsize := 0.U +// pegasusShell.io.chip_mem_arburst := 0.U +// pegasusShell.io.chip_mem_arvalid := false.B +// pegasusShell.io.chip_mem_rready := false.B + +// // --- HasHarnessInstantiators required interface --- +// def referenceClockFreqMHz: Double = 200.0 +// def referenceClock: Clock = pegasusShell.io.dut_clk +// def referenceReset: Reset = pegasusShell.io.dut_reset.asAsyncReset + +// val success = WireInit(false.B) + +// // --- Instantiate ChipTop and apply HarnessBinders --- +// // HarnessBinders for UART and AXI4Mem will override the defaults above +// // by calling connectChipMem() and driving pegasusShell.io.uart_tx +// val lazyDuts = instantiateChipTops() + +// // Called by WithPegasusAXIMem HarnessBinder to connect ChipTop ExtMem AXI4 +// def connectChipMem(port: AXI4MemPort): Unit = { +// val axi = port.io.bits +// // Write address channel +// pegasusShell.io.chip_mem_awid := axi.aw.bits.id.asTypeOf(UInt(6.W)) +// pegasusShell.io.chip_mem_awaddr := axi.aw.bits.addr.asTypeOf(UInt(33.W)) +// pegasusShell.io.chip_mem_awlen := axi.aw.bits.len +// pegasusShell.io.chip_mem_awsize := axi.aw.bits.size +// pegasusShell.io.chip_mem_awburst := axi.aw.bits.burst +// pegasusShell.io.chip_mem_awvalid := axi.aw.valid +// axi.aw.ready := 
pegasusShell.io.chip_mem_awready +// // Write data channel +// pegasusShell.io.chip_mem_wdata := axi.w.bits.data.asTypeOf(UInt(256.W)) +// pegasusShell.io.chip_mem_wstrb := axi.w.bits.strb.asTypeOf(UInt(32.W)) +// pegasusShell.io.chip_mem_wlast := axi.w.bits.last +// pegasusShell.io.chip_mem_wvalid := axi.w.valid +// axi.w.ready := pegasusShell.io.chip_mem_wready +// // Write response channel +// axi.b.bits.id := pegasusShell.io.chip_mem_bid.asTypeOf(axi.b.bits.id) +// axi.b.bits.resp := pegasusShell.io.chip_mem_bresp +// axi.b.valid := pegasusShell.io.chip_mem_bvalid +// pegasusShell.io.chip_mem_bready := axi.b.ready +// // Read address channel +// pegasusShell.io.chip_mem_arid := axi.ar.bits.id.asTypeOf(UInt(6.W)) +// pegasusShell.io.chip_mem_araddr := axi.ar.bits.addr.asTypeOf(UInt(33.W)) +// pegasusShell.io.chip_mem_arlen := axi.ar.bits.len +// pegasusShell.io.chip_mem_arsize := axi.ar.bits.size +// pegasusShell.io.chip_mem_arburst := axi.ar.bits.burst +// pegasusShell.io.chip_mem_arvalid := axi.ar.valid +// axi.ar.ready := pegasusShell.io.chip_mem_arready +// // Read data channel +// axi.r.bits.id := pegasusShell.io.chip_mem_rid.asTypeOf(axi.r.bits.id) +// axi.r.bits.data := pegasusShell.io.chip_mem_rdata.asTypeOf(axi.r.bits.data) +// axi.r.bits.resp := pegasusShell.io.chip_mem_rresp +// axi.r.bits.last := pegasusShell.io.chip_mem_rlast +// axi.r.valid := pegasusShell.io.chip_mem_rvalid +// pegasusShell.io.chip_mem_rready := axi.r.ready +// } + +// } diff --git a/arch/src/main/scala/sims/pegasus/PegasusHarnessBinders.scala b/arch/src/main/scala/sims/pegasus/PegasusHarnessBinders.scala new file mode 100644 index 00000000..23087123 --- /dev/null +++ b/arch/src/main/scala/sims/pegasus/PegasusHarnessBinders.scala @@ -0,0 +1,73 @@ +// package sims.pegasus + +// import chisel3._ +// import chisel3.util._ + +// import org.chipsalliance.cde.config.{Config} + +// import chipyard.harness._ +// import chipyard.iobinders._ + +// import pegasus._ + +// // HarnessBinder: 
connect ChipTop ExtMem AXI4 port to PegasusShell chip_mem interface +// // Replaces WithBlackBoxSimMem (Verilator simulation DRAM model) +// class WithPegasusAXIMem +// extends HarnessBinder({ +// case (th: PegasusHarness, port: AXI4MemPort, chipId: Int) => { +// th.connectChipMem(port) +// } +// }) + +// // HarnessBinder: connect ChipTop UART TX to PegasusShell (goes to UARTCapture) +// // Replaces WithUARTAdapter (Verilator stdout adapter) +// class WithPegasusUART +// extends HarnessBinder({ +// case (th: PegasusHarness, port: UARTPort, chipId: Int) => { +// th.pegasusShell.io.uart_tx := port.io.txd +// port.io.rxd := true.B // UART RX idle high (no input from host) +// } +// }) + +// // Tie off JTAG (not used on FPGA; debug via GDB/OpenOCD over UART is possible but not in scope) +// class WithPegasusTiedOffJTAG +// extends HarnessBinder({ +// case (th: HasHarnessInstantiators, port: JTAGPort, chipId: Int) => { +// port.io.TCK := true.B.asClock +// port.io.TMS := true.B +// port.io.TDI := true.B +// } +// }) + +// // Tie off DMI (no simulation-side debug model needed on FPGA) +// class WithPegasusTiedOffDMI +// extends HarnessBinder({ +// case (th: HasHarnessInstantiators, port: DMIPort, chipId: Int) => { +// port.io.dmi.req.valid := false.B +// port.io.dmi.req.bits := DontCare +// port.io.dmi.resp.ready := true.B +// port.io.dmiClock := false.B.asClock +// port.io.dmiReset := true.B +// } +// }) + +// // Aggregate config fragment: select PegasusHarness binders +// // and override Verilator-only simulation models +// class WithPegasusHarness +// extends Config( +// new WithPegasusAXIMem ++ +// new WithPegasusUART ++ +// new WithPegasusTiedOffJTAG ++ +// new WithPegasusTiedOffDMI ++ +// // Standard binders that are safe to keep on FPGA +// new chipyard.harness.WithTieOffInterrupts ++ +// new chipyard.harness.WithTieOffL2FBusAXI ++ +// new chipyard.harness.WithGPIOTiedOff ++ +// new chipyard.harness.WithGPIOPinsTiedOff ++ +// new 
chipyard.harness.WithDriveChipIdPin ++ +// new chipyard.harness.WithCustomBootPinPlusArg ++ +// new chipyard.harness.WithSerialTLTiedOff ++ +// new chipyard.harness.WithClockFromHarness ++ +// new chipyard.harness.WithResetFromHarness ++ +// new chipyard.harness.WithAbsoluteFreqHarnessClockInstantiator +// ) diff --git a/arch/src/main/scala/sims/pegasus/TargetConfigs.scala b/arch/src/main/scala/sims/pegasus/TargetConfigs.scala new file mode 100644 index 00000000..fa2a121d --- /dev/null +++ b/arch/src/main/scala/sims/pegasus/TargetConfigs.scala @@ -0,0 +1,46 @@ +// package sims.pegasus + +// import chisel3._ +// import _root_.circt.stage.ChiselStage +// import org.chipsalliance.cde.config.Config + +// import freechips.rocketchip.devices.tilelink.{BootROMLocated, BootROMParams} +// import freechips.rocketchip.subsystem.InSubsystem + +// // import pegasus.{PegasusHarness, WithPegasusHarness} + +// // BootROM for FPGA (points to the same bootrom image as the Verilator target) +// class WithPegasusBootROM +// extends Config((site, here, up) => { +// case BootROMLocated(InSubsystem) => Some(BootROMParams( +// contentFileName = "src/main/resources/bootrom/bootrom.rv64.img" +// )) +// }) + +// // PegasusBuckyballToyConfig: Buckyball Toy SoC on AU280 FPGA +// // +// // Target clock: 200 MHz. If timing closure yields a different Fmax, +// // update all WithXxxBusFrequency values and re-elaborate so the DTB +// // gets the correct UART baud divisor (otherwise UART output will be garbled). 
+// class PegasusBuckyballToyConfig +// extends Config( +// new WithPegasusHarness ++ +// new WithPegasusBootROM ++ +// new chipyard.config.WithSystemBusFrequency(200.0) ++ +// new chipyard.config.WithMemoryBusFrequency(200.0) ++ +// new chipyard.config.WithPeripheryBusFrequency(200.0) ++ +// new chipyard.config.WithControlBusFrequency(200.0) ++ +// new chipyard.config.WithFrontBusFrequency(200.0) ++ +// new chipyard.config.WithOffchipBusFrequency(200.0) ++ +// new examples.toy.BuckyballToyConfig +// ) + +// // Elaborate entry point: generate SystemVerilog for PegasusHarness +// // Usage: mill pegasus.runMain sims.pegasus.ElaboratePegasus [firtool options] +// object ElaboratePegasus extends App { +// ChiselStage.emitSystemVerilogFile( +// new PegasusHarness()(new PegasusBuckyballToyConfig().toInstance), +// firtoolOpts = args, +// args = Array.empty +// ) +// } diff --git a/arch/src/main/scala/sims/verify/TargetConfig.scala b/arch/src/main/scala/sims/verify/TargetConfig.scala index b61566a6..5bbcfa1d 100644 --- a/arch/src/main/scala/sims/verify/TargetConfig.scala +++ b/arch/src/main/scala/sims/verify/TargetConfig.scala @@ -2,104 +2,80 @@ package sims.verify import chisel3._ import _root_.circt.stage.ChiselStage -import org.chipsalliance.cde.config.{Config, Parameters, Field} -import examples.BuckyballConfigs.CustomBuckyballConfig -import framework.balldomain.blink.Blink -import prototype.vector.VecBall -import prototype.matrix.MatrixBall -import prototype.transpose.TransposeBall -import prototype.im2col.Im2colBall -import prototype.relu.ReluBall -import prototype.nnlut.NNLutBall +import framework.top.GlobalConfig +import framework.balldomain.blink.{BlinkIO, HasBlink} +import framework.balldomain.prototype.vector.VecBall +import framework.balldomain.prototype.relu.ReluBall +import framework.balldomain.prototype.im2col.Im2colBall +import framework.balldomain.prototype.transpose.TransposeBall -// Ball type definitions sealed trait BallType -case object VecBallType 
extends BallType -case object MatrixBallType extends BallType +case object VecBallType extends BallType +case object MatrixBallType extends BallType case object TransposeBallType extends BallType -case object Im2colBallType extends BallType -case object ReluBallType extends BallType -case object NNLutBallType extends BallType +case object Im2colBallType extends BallType +case object ReluBallType extends BallType +case object NNLutBallType extends BallType -// Config Key -case object TargetBallKey extends Field[BallType](VecBallType) +class TargetBall(ballType: BallType, b: GlobalConfig) extends Module with HasBlink { -// TargetBall - directly instantiate pre-packaged Ball -class TargetBall(implicit b: CustomBuckyballConfig, p: Parameters) extends Module { - val io = IO(new Blink) - - p(TargetBallKey) match { - case VecBallType => - val ball = Module(new VecBall(0)) - io <> ball.io - case MatrixBallType => - val ball = Module(new MatrixBall(0)) - io <> ball.io - case Im2colBallType => - val ball = Module(new Im2colBall(0)) - io <> ball.io - case TransposeBallType => - val ball = Module(new TransposeBall(0)) - io <> ball.io - case ReluBallType => - val ball = Module(new ReluBall(0)) - io <> ball.io - case NNLutBallType => - val ball = Module(new NNLutBall(0)) - io <> ball.io - case _ => throw new scala.MatchError("TargetBall does not handle this ball type") + val ballName = ballType match { + case VecBallType => "VecBall" + case ReluBallType => "ReluBall" + case Im2colBallType => "Im2colBall" + case TransposeBallType => "TransposeBall" + case MatrixBallType => throw new IllegalArgumentException("MatrixBall not implemented") + case NNLutBallType => throw new IllegalArgumentException("NNLutBall not implemented") + case _ => throw new scala.MatchError("TargetBall does not handle this ball type") } - override lazy val desiredName = "TargetBall" -} -// WithBlink configuration - empty configuration, used to combine with other configurations -// class WithBlink extends 
Config((site, here, up) => { -// case _ => up(site) -// }) + val mapping = b.ballDomain.ballIdMappings.find(_.ballName == ballName) + .getOrElse(throw new IllegalArgumentException(s"$ballName not found in config")) + val inBW = mapping.inBW + val outBW = mapping.outBW -// ============================================================================ -// Config combination usage examples: -// new Config(new WithVecBall ++ new WithBlink) -// new Config(new WithMatrixBall ++ new WithBlink) -// new Config(new WithTransposeBall ++ new WithBlink) -// ============================================================================ + val io = IO(new BlinkIO(b, inBW, outBW)) -class WithTargetBall(ballType: BallType) extends Config((site, here, up) => { - case TargetBallKey => ballType -}) + def blink: BlinkIO = io -class CustomBallTopConfig(ballType: BallType) extends Config( - // new WithBlink ++ - new WithTargetBall(ballType) -) + val ball = ballType match { + case VecBallType => Module(new VecBall(b)) + case ReluBallType => Module(new ReluBall(b)) + case Im2colBallType => Module(new Im2colBall(b)) + case TransposeBallType => Module(new TransposeBall(b)) + case _ => throw new scala.MatchError("TargetBall does not handle this ball type") + } + + io <> ball.blink +} object BallTopMain extends App { + // Select Ball type from command line arguments - val ballType = if (args.isEmpty) { - println("Usage: BallTopMain [firtool-opts...]") - println("Available ball types: vecball, matrixball, transposeball, im2colball, reluball, nnlutball") - println("Using default: vecball") - VecBallType - } else { - args(0).toLowerCase match { - case "vecball" => VecBallType - case "matrixball" => MatrixBallType - case "transposeball" => TransposeBallType - case "im2colball" => Im2colBallType - case "reluball" => ReluBallType - case "nnlutball" => NNLutBallType - case other => - println(s"Unknown ball type: $other, using vecball") - VecBallType + val ballType = + if (args.isEmpty) { + println("Usage: 
BallTopMain [firtool-opts...]") + println("Available ball types: vecball, matrixball, transposeball, im2colball, reluball, nnlutball") + println("Using default: vecball") + VecBallType + } else { + args(0).toLowerCase match { + case "vecball" => VecBallType + case "matrixball" => MatrixBallType + case "transposeball" => TransposeBallType + case "im2colball" => Im2colBallType + case "reluball" => ReluBallType + case "nnlutball" => NNLutBallType + case other => + println(s"Unknown ball type: $other, using vecball") + VecBallType + } } - } - implicit val config: CustomBuckyballConfig = examples.CustomBuckyballConfig() - implicit val params: Parameters = new Config(new CustomBallTopConfig(ballType)) + val b: GlobalConfig = GlobalConfig() ChiselStage.emitSystemVerilogFile( - new TargetBall(), - // Remaining parameters passed to firtool + new TargetBall(ballType, b), firtoolOpts = args.drop(1), args = Array.empty ) diff --git a/arch/src/main/scala/sims/verilator/BBSimHarness.scala b/arch/src/main/scala/sims/verilator/BBSimHarness.scala new file mode 100644 index 00000000..dd7a5bde --- /dev/null +++ b/arch/src/main/scala/sims/verilator/BBSimHarness.scala @@ -0,0 +1,141 @@ +package sims.verilator + +import chisel3._ +import chisel3.util._ + +import org.chipsalliance.cde.config.{Config, Parameters} + +import chipyard.harness.{HarnessBinder, HasHarnessInstantiators} +import chipyard.iobinders.{AXI4MMIOPort, UARTPort} + +// ============================================================================= +// WithBBSimMMIO: wire AXI4 MMIO port to C++ mmio_tick(). +// +// All registers run in port.io.clock domain (= 1 GHz, from WithUniformBusFrequencies). 
+// C++ samples three stable register outputs each posedge: +// - firePulse = RegNext(wFire) — 1-cycle pulse, no debounce needed +// - latchedAddr = Reg latched on AW — stable address +// - latchedData = RegEnable on wFire — stable write data +// bPending is set on wFire (not delayed) so B response reaches the CPU +// one cycle before C++ processes the event. +// ============================================================================= +class WithBBSimMMIO + extends HarnessBinder({ + case (th: BBSimHarness, port: AXI4MMIOPort, chipId: Int) => { + withClockAndReset(port.io.clock, th.reset) { + val addrBits = port.io.bits.aw.bits.addr.getWidth + val idBits = port.io.bits.aw.bits.id.getWidth + + val sIdle :: sGotAW :: Nil = Enum(2) + val state = RegInit(sIdle) + val latchedAddr = Reg(UInt(addrBits.W)) + val latchedId = Reg(UInt(idBits.W)) + val bPending = RegInit(false.B) + val bId = Reg(UInt(idBits.W)) + + // --- AW channel --- + port.io.bits.aw.ready := (state === sIdle) + when(state === sIdle && port.io.bits.aw.valid) { + latchedAddr := port.io.bits.aw.bits.addr + latchedId := port.io.bits.aw.bits.id + state := sGotAW + } + + // --- W channel --- + port.io.bits.w.ready := (state === sGotAW) + val wFire = (state === sGotAW) && port.io.bits.w.valid + val latchedData = RegEnable(port.io.bits.w.bits.data, wFire) + when(wFire) { + state := sIdle + bPending := true.B + bId := latchedId + } + + // --- B channel --- + when(port.io.bits.b.valid && port.io.bits.b.ready) { + bPending := false.B + } + port.io.bits.b.valid := bPending + port.io.bits.b.bits.id := bId + port.io.bits.b.bits.resp := 0.U + + // --- Fire pulse for C++ mmio_tick() --- + val firePulse = RegNext(wFire, false.B) + th.io.mmio_fire := firePulse + th.io.mmio_fire_addr := latchedAddr + th.io.mmio_fire_data := latchedData + + // --- AR channel: accept immediately, return 0 next cycle --- + port.io.bits.ar.ready := true.B + val rValid = RegNext(port.io.bits.ar.valid, false.B) + val rId = 
RegNext(port.io.bits.ar.bits.id) + port.io.bits.r.valid := rValid + port.io.bits.r.bits.data := 0.U + port.io.bits.r.bits.resp := 0.U + port.io.bits.r.bits.last := true.B + port.io.bits.r.bits.id := rId + } + } + }) + +// ============================================================================= +// WithNoUARTAdapter: suppress UARTAdapter; tie RX high (idle line) +// ============================================================================= +class WithNoUARTAdapter + extends HarnessBinder({ + case (th: HasHarnessInstantiators, port: UARTPort, chipId: Int) => { + port.io.rxd := true.B + } + }) + +// ============================================================================= +// BBSimConfig: harness-level config for BBSimHarness. +// Concrete VerilatorConfigs (e.g. BuckyballToyVerilatorConfig) extend this +// with WithDefaultMMIOPort + SoC config. +// ============================================================================= +class BBSimConfig + extends Config( + new WithNoUARTAdapter ++ + new WithBBSimMMIO ++ + new chipyard.config.WithUniformBusFrequencies(1000.0) ++ + new chipyard.harness.WithBlackBoxSimMem ++ + new chipyard.harness.WithSerialTLTiedOff ++ + new chipyard.harness.WithTieOffInterrupts ++ + new chipyard.harness.WithGPIOTiedOff ++ + new chipyard.harness.WithTieOffL2FBusAXI ++ + new chipyard.harness.WithClockFromHarness ++ + new chipyard.harness.WithResetFromHarness ++ + new chipyard.harness.WithAbsoluteFreqHarnessClockInstantiator ++ + new chipyard.iobinders.WithAXI4MemPunchthrough ++ + new chipyard.iobinders.WithAXI4MMIOPunchthrough ++ + new chipyard.iobinders.WithNMITiedOff + ) + +// ============================================================================= +// BBSimHarness +// ============================================================================= +class BBSimHarness(implicit val p: Parameters) extends Module with HasHarnessInstantiators { + + val io = IO(new Bundle { + val mmio_fire = Output(Bool()) + val mmio_fire_addr = 
Output(UInt(32.W)) + val mmio_fire_data = Output(UInt(64.W)) + }) + + // Defaults; WithBBSimMMIO binder overrides these. + io.mmio_fire := false.B + io.mmio_fire_addr := 0.U + io.mmio_fire_data := 0.U + + val bdbClkDpi = Module(new BdbClkDPI) + bdbClkDpi.io.clock := clock + bdbClkDpi.io.reset := reset.asBool + + def referenceClockFreqMHz: Double = 1000.0 + def referenceClock: Clock = clock + def referenceReset: Reset = reset + + val success = WireInit(false.B) + + val lazyDuts = instantiateChipTops() +} diff --git a/arch/src/main/scala/sims/verilator/BdbClkDPI.scala b/arch/src/main/scala/sims/verilator/BdbClkDPI.scala new file mode 100644 index 00000000..b1fc4aa8 --- /dev/null +++ b/arch/src/main/scala/sims/verilator/BdbClkDPI.scala @@ -0,0 +1,38 @@ +package sims.verilator + +import chisel3._ +import chisel3.util.HasBlackBoxInline + +/** + * Pushes harness reference clock cycle index into C++ via DPI each posedge. + * Matches BBSimHarness.clock (see ball_exec_once: clock=1 half-cycle). + */ +class BdbClkDPI extends BlackBox with HasBlackBoxInline { + + val io = IO(new Bundle { + val clock = Input(Clock()) + val reset = Input(Bool()) + }) + + setInline( + "BdbClkDPI.v", + """ + |import "DPI-C" function void dpi_bdb_set_clk(input longint unsigned c); + |module BdbClkDPI( + | input wire clock, + | input wire reset + |); + | reg [63:0] cnt; + | always @(posedge clock) begin + | if (reset) begin + | cnt <= 64'd0; + | dpi_bdb_set_clk(64'd0); + | end else begin + | dpi_bdb_set_clk(cnt); + | cnt <= cnt + 64'd1; + | end + | end + |endmodule + """.stripMargin + ) +} diff --git a/arch/src/main/scala/sims/verilator/README.md b/arch/src/main/scala/sims/verilator/README.md deleted file mode 100644 index 5971340d..00000000 --- a/arch/src/main/scala/sims/verilator/README.md +++ /dev/null @@ -1,31 +0,0 @@ -# Verilator Simulation Configuration - -## Overview - -This directory contains Buckyball system simulation configuration for the Verilator platform. 
Verilator is an open-source Verilog/SystemVerilog simulator that compiles RTL code into high-performance C++ simulation models, providing a fast functional simulation and verification environment. - -## File Structure - -``` -verilator/ -└── Elaborate.scala - Verilator elaboration configuration -``` - -## Core Implementation - -### Elaborate.scala - -This file implements the Verilog generation and elaboration process for the Buckyball system: - -```scala -object Elaborate extends App { - val config = new examples.toy.BuckyballToyConfig - val params = config.toInstance - - ChiselStage.emitSystemVerilogFile( - new chipyard.harness.TestHarness()(config.toInstance), - firtoolOpts = args, - args = Array.empty // Pass command line arguments directly - ) -} -``` diff --git a/arch/src/main/scala/sims/verilator/TargetConfig.scala b/arch/src/main/scala/sims/verilator/TargetConfig.scala index 39269e85..f67860e5 100644 --- a/arch/src/main/scala/sims/verilator/TargetConfig.scala +++ b/arch/src/main/scala/sims/verilator/TargetConfig.scala @@ -1,37 +1,36 @@ package sims.verilator import chisel3._ -// _root_ disambiguates from package chisel3.util.circt if user imports chisel3.util._ import _root_.circt.stage.ChiselStage -import org.chipsalliance.cde.config.{Config, Parameters} - -import freechips.rocketchip.devices.tilelink.{BootROMParams, BootROMLocated} -import freechips.rocketchip.subsystem.InSubsystem - - -// Custom BootROM configuration, pointing to correct resource path -class WithCustomBootROM extends Config((site, here, up) => { - case BootROMLocated(InSubsystem) => Some(BootROMParams( - contentFileName = "src/main/resources/bootrom/bootrom.rv64.img" - )) -}) - -class BuckyballToyVerilatorConfig extends Config( - new WithCustomBootROM ++ - new examples.toy.BuckyballToyConfig) - -class BuckyballToyVectorVerilatorConfig extends Config( - new WithCustomBootROM ++ - new examples.toy.BuckyballToyVectorConfig) - -class BuckyballGemminiVerilatorConfig extends Config( - new 
WithCustomBootROM ++ - new gemmini.DefaultGemminiConfig) - - +import org.chipsalliance.cde.config.Config + +import freechips.rocketchip.devices.tilelink.{BootROMLocated, BootROMParams} +import freechips.rocketchip.subsystem.{InSubsystem, WithDefaultMMIOPort} + +class WithCustomBootROM + extends Config((site, here, up) => { + case BootROMLocated(InSubsystem) => Some(BootROMParams( + contentFileName = "src/main/resources/bootrom/bootrom.rv64.img" + )) + }) + +class BuckyballToyVerilatorConfig + extends Config( + new BBSimConfig ++ + new WithDefaultMMIOPort ++ + new WithCustomBootROM ++ + new examples.toy.BuckyballToyConfig + ) + +class BuckyballGobanVerilatorConfig + extends Config( + new BBSimConfig ++ + new WithDefaultMMIOPort ++ + new WithCustomBootROM ++ + new examples.goban.BuckyballGobanConfig + ) object Elaborate extends App { - // Accept full config class name like "sims.verilator.BuckyballToyVerilatorConfig" if (args.isEmpty) { println("Usage: Elaborate [firtool-opts...]") println("Example: Elaborate sims.verilator.BuckyballToyVerilatorConfig") @@ -39,25 +38,24 @@ object Elaborate extends App { } val configClassName = args(0) - println(s"Elaborating with config class: $configClassName") - - // Dynamically load the config class - val config: Config = try { - val configClass = Class.forName(configClassName) - configClass.getDeclaredConstructor().newInstance().asInstanceOf[Config] - } catch { - case e: ClassNotFoundException => - println(s"Error: Config class not found: $configClassName") - sys.exit(1) - case e: Exception => - println(s"Error loading config class: ${e.getMessage}") - e.printStackTrace() - sys.exit(1) - } + println(s"Elaborating BBSimHarness with config: $configClassName") + + val config: Config = + try { + val configClass = Class.forName(configClassName) + configClass.getDeclaredConstructor().newInstance().asInstanceOf[Config] + } catch { + case e: ClassNotFoundException => + println(s"Error: Config class not found: $configClassName") + 
sys.exit(1) + case e: Exception => + println(s"Error loading config class: ${e.getMessage}") + e.printStackTrace() + sys.exit(1) + } ChiselStage.emitSystemVerilogFile( - new chipyard.harness.TestHarness()(config.toInstance), - // Remaining parameters passed to firtool + new BBSimHarness()(config.toInstance), firtoolOpts = args.drop(1), args = Array.empty ) diff --git a/arch/thirdparty/t1 b/arch/thirdparty/t1 new file mode 160000 index 00000000..07c71ebd --- /dev/null +++ b/arch/thirdparty/t1 @@ -0,0 +1 @@ +Subproject commit 07c71ebd150aa4cbcd5968a5949413829a7a3f71 diff --git a/bb-tests/sardine/package-lock.json b/bb-tests/sardine/package-lock.json index c2f45d29..3d59121d 100644 --- a/bb-tests/sardine/package-lock.json +++ b/bb-tests/sardine/package-lock.json @@ -5,14 +5,14 @@ "packages": { "": { "dependencies": { - "allure-commandline": "^2.34.1", + "allure-commandline": "^2.36.0", "httpx": "^3.0.1" } }, "node_modules/allure-commandline": { - "version": "2.34.1", - "resolved": "https://registry.npmjs.org/allure-commandline/-/allure-commandline-2.34.1.tgz", - "integrity": "sha512-l42csZ2bz7FdtJI1t5zA3IXtOZ0YJaP/+JMRC9gt6aBHRVUIu+6r+3F7KRyshQ79osLz9/MHlGqAjBPRqH0QFw==", + "version": "2.36.0", + "resolved": "https://registry.npmjs.org/allure-commandline/-/allure-commandline-2.36.0.tgz", + "integrity": "sha512-ls/4fk2Psv2Tu2PbWFrQPmUnm3gmmO9MBan4MuPWwqdkJPEmln2KRwtvtWYr9Av+e5AnFK1fGXWVyxqJIPiPwA==", "license": "Apache-2.0", "bin": { "allure": "bin/allure" diff --git a/bb-tests/sardine/package.json b/bb-tests/sardine/package.json index 3c6abc82..5d71a221 100644 --- a/bb-tests/sardine/package.json +++ b/bb-tests/sardine/package.json @@ -1,6 +1,6 @@ { "dependencies": { - "allure-commandline": "^2.34.1", + "allure-commandline": "^2.36.0", "httpx": "^3.0.1" } } diff --git a/bb-tests/sardine/pytest.ini b/bb-tests/sardine/pytest.ini index 389d4deb..ec881495 100644 --- a/bb-tests/sardine/pytest.ini +++ b/bb-tests/sardine/pytest.ini @@ -20,10 +20,10 @@ addopts = 
--log-file-level=DEBUG --log-cli-level=INFO --capture=no - # Allure configuration (when --allure is used) -# --alluredir=reports/allure-results -# --clean-alluredir + --alluredir=reports/allure-results + --clean-alluredir + # Test marker definitions markers = @@ -35,6 +35,7 @@ markers = fast: Fast tests ctest: CTest tests + mlir: MLIR OpTest tests # Minimum version requirement minversion = 6.0 diff --git a/bb-tests/sardine/run_tests.py b/bb-tests/sardine/run_tests.py index ef8bf450..eb98cc10 100644 --- a/bb-tests/sardine/run_tests.py +++ b/bb-tests/sardine/run_tests.py @@ -29,39 +29,7 @@ def get_git_commit(): return "unknown" -def check_allure_installed(): - """Check if Allure command line tool is installed.""" - try: - result = subprocess.run(["allure", "--version"], capture_output=True, text=True) - return result.returncode == 0 - except FileNotFoundError: - return False - - -def install_allure(): - """Install Allure command line tool.""" - print("Installing Allure command line tool...") - try: - # Try installing using npm - result = subprocess.run( - ["npm", "install", "-g", "allure-commandline"], - capture_output=True, - text=True, - ) - if result.returncode == 0: - print("Allure installed successfully via npm") - return True - except FileNotFoundError: - pass - - print("Please install Allure manually:") - print(" npm install -g allure-commandline") - print(" or") - print(" https://docs.qameta.io/allure/#_installing_a_commandline") - return False - - -def run_pytest(args=None, use_allure=False): +def run_pytest(args=None): """Run pytest with given arguments.""" args = args or [] @@ -76,21 +44,13 @@ def run_pytest(args=None, use_allure=False): reports_dir.mkdir(exist_ok=True) # Build pytest command - cmd = ["python", "-m", "pytest", "-s", "-v", "-n", "auto"] + cmd = ["python3", "-m", "pytest", "-s", "-v", "-n", "auto"] - # Check whether Allure is installed - if use_allure: - if not check_allure_installed(): - if not install_allure(): - print("Falling back to default HTML report") - use_allure = False - - # Allure configuration - if use_allure: -
allure_results_dir = reports_dir / "allure-results" - allure_results_dir.mkdir(exist_ok=True) - cmd.extend(["--alluredir", str(allure_results_dir), "--clean-alluredir"]) - print(f"Allure results will be saved to: {allure_results_dir}") + # Allure configuration + allure_results_dir = reports_dir / "allure-results" + allure_results_dir.mkdir(exist_ok=True) + cmd.extend(["--alluredir", str(allure_results_dir), "--clean-alluredir"]) + print(f"Allure results will be saved to: {allure_results_dir}") cmd.extend(args) @@ -98,84 +58,55 @@ def run_pytest(args=None, use_allure=False): print(f"Working directory: {script_dir}") print(f"Git commit: {git_commit}") - # Run pytest - try: - result = subprocess.run(cmd, cwd=script_dir) - - # Process reports whether tests succeed or fail - if use_allure: - # Generate Allure report - allure_results_dir = reports_dir / "allure-results" - allure_report_dir = reports_dir / f"{git_commit}" - current_report_dir = reports_dir / "allure" - - print("Generating Allure report...") + result = subprocess.run(cmd, cwd=script_dir) + # Process reports whether tests succeed or fail + # Generate Allure report + allure_results_dir = reports_dir / "allure-results" + allure_report_dir = reports_dir / f"{git_commit}" + current_report_dir = reports_dir / "allure" + print("Generating Allure report...") + # Generate versioned report + allure_cmd = [ + "allure", + "generate", + str(allure_results_dir), + "-o", + str(allure_report_dir), + "--clean", + ] + allure_result = subprocess.run(allure_cmd, cwd=script_dir) + if allure_result.returncode == 0: + # Generate current run report (saved in allure directory) + current_cmd = [ + "allure", + "generate", + str(allure_results_dir), + "-o", + str(current_report_dir), + "--clean", + ] + current_result = subprocess.run(current_cmd, cwd=script_dir) + print("Generated Allure reports:") + print(f" - {allure_results_dir} (raw results)") + print(f" - {allure_report_dir} (versioned HTML report)") + if current_result.returncode == 0: +
print(f" - {current_report_dir} (current HTML report)") + return result.returncode - # Generate versioned report - allure_cmd = [ - "allure", - "generate", - str(allure_results_dir), - "-o", - str(allure_report_dir), - "--clean", - ] - allure_result = subprocess.run(allure_cmd, cwd=script_dir) - if allure_result.returncode == 0: - # Generate current run report (saved in allure directory) - current_cmd = [ - "allure", - "generate", - str(allure_results_dir), - "-o", - str(current_report_dir), - "--clean", - ] - - current_result = subprocess.run(current_cmd, cwd=script_dir) - - print("Generated Allure reports:") - print(f" - {allure_results_dir} (raw results)") - print(f" - {allure_report_dir} (versioned HTML report)") - if current_result.returncode == 0: - print(f" - {current_report_dir} (current HTML report)") - else: - print("Failed to generate Allure report") - - return result.returncode - except KeyboardInterrupt: - print("\nTest execution interrupted by user") - return 1 - except Exception as e: - print(f"Error running tests: {e}") - return 1 - - -def main(): - """Main entry point.""" +if __name__ == "__main__": if len(sys.argv) > 1: if sys.argv[1] in ["-h", "--help"]: print("Sardine Test Runner with Allure Support") print() print("Usage:") print(" python run_tests.py [pytest arguments]") - print(" python run_tests.py --allure [pytest arguments]") print(" python run_tests.py --open-report") print() print("Examples:") print( " python run_tests.py # Run all tests with default HTML report" ) - print( - " python run_tests.py --allure # Run all tests with Allure report" - ) - print( - " python run_tests.py --allure -m smoke # Run smoke tests with Allure report" - ) - print( - " python run_tests.py --allure -m verilator # Run verilator tests with Allure report" - ) print( " python run_tests.py --open-report # Open latest Allure report in browser" ) @@ -202,15 +133,16 @@ def main(): print(" - Historical trends") print(" - Detailed failure analysis") print(" - Test 
categorization") - return 0 - - elif sys.argv[1] == "--allure": - # Pass all arguments after --allure to pytest - return run_pytest(sys.argv[2:], use_allure=True) - - # Pass all arguments to pytest (default uses HTML report) - return run_pytest(sys.argv[1:]) - - -if __name__ == "__main__": - sys.exit(main()) + sys.exit(0) + print("Test started") + # Filter out --coverage flag (handled by bbdev, not pytest) + # Set env var so test cases know to pass --coverage to verilator sim + pytest_args = [] + for a in sys.argv[1:]: + if a == "--coverage": + os.environ["SARDINE_COVERAGE"] = "1" + else: + pytest_args.append(a) + result = run_pytest(pytest_args) + print(f"Test result: {result}") + sys.exit(result) diff --git a/bb-tests/sardine/tests/test_ctest.py b/bb-tests/sardine/tests/test_ctest.py index 87571193..1814b48b 100644 --- a/bb-tests/sardine/tests/test_ctest.py +++ b/bb-tests/sardine/tests/test_ctest.py @@ -1,6 +1,7 @@ import pytest import logging import time +import os from pathlib import Path import subprocess @@ -14,6 +15,22 @@ # Define all ctest workloads with absolute paths and corresponding IDs ctest_workloads = [ + ( + "ctest_tlb_test_singlecore-baremetal", + "ctest_tlb_test_singlecore-baremetal", + ), + ( + "ctest_gemmini_os_risc_basic_test_singlecore-baremetal", + "ctest_gemmini_os_risc_basic_test_singlecore-baremetal", + ), + ( + "ctest_gemmini_ws_risc_basic_test_singlecore-baremetal", + "ctest_gemmini_ws_risc_basic_test_singlecore-baremetal", + ), + ( + "ctest_gemmini_os_cisc_basic_test_singlecore-baremetal", + "ctest_gemmini_os_cisc_basic_test_singlecore-baremetal", + ), ( "ctest_mvin_mvout_test_singlecore-baremetal", "ctest_mvin_mvout_test_singlecore-baremetal", @@ -58,25 +75,36 @@ "ctest_vecunit_matmul_random2_singlecore-baremetal", "ctest_vecunit_matmul_random2_singlecore-baremetal", ), - ( - "ctest_vecunit_matmul_random3_singlecore-baremetal", - "ctest_vecunit_matmul_random3_singlecore-baremetal", - ), + # ( 
"ctest_vecunit_tiled_matmul_singlecore-baremetal", + # "ctest_vecunit_tiled_matmul_singlecore-baremetal", + # ), ( "ctest_vecunit_matmul_zero_random_singlecore-baremetal", "ctest_vecunit_matmul_zero_random_singlecore-baremetal", ), - ( - "ctest_vecunit_simple_nn_forward_pass_test_singlecore-baremetal", - "ctest_vecunit_simple_nn_forward_pass_test_singlecore-baremetal", - ), ( "ctest_relu_test_singlecore-baremetal", "ctest_relu_test_singlecore-baremetal", ), ( - "ctest_transfer_test_singlecore-baremetal", - "ctest_transfer_test_singlecore-baremetal", + "ctest_transpose_test_singlecore-baremetal", + "ctest_transpose_test_singlecore-baremetal", + ), + ( + "ctest_transpose_16xn_test_singlecore-baremetal", + "ctest_transpose_16xn_test_singlecore-baremetal", + ), + ( + "ctest_im2col_test_singlecore-baremetal", + "ctest_im2col_test_singlecore-baremetal", + ), + ( + "ctest_quant_test_singlecore-baremetal", + "ctest_quant_test_singlecore-baremetal", + ), + ( + "ctest_dequant_test_singlecore-baremetal", + "ctest_dequant_test_singlecore-baremetal", ), ] @@ -93,9 +121,10 @@ def test_ctest_workload_debug( ): caplog.set_level(logging.INFO) - time.sleep(test_index * 20) + time.sleep(test_index * 15) start_time = time.time() - command = f'source {sardine_dir}/../../env.sh && bbdev verilator --sim "--binary {workload_path} --batch"' + coverage_flag = " --coverage" if os.environ.get("SARDINE_COVERAGE") else "" + command = f'bbdev verilator --sim "--binary {workload_path} --batch{coverage_flag}"' logging.info(f"Running command: {command}") # Run the command via command_run, with early-exit detection diff --git a/bb-tests/sardine/tests/test_mlir.py b/bb-tests/sardine/tests/test_mlir.py new file mode 100644 index 00000000..b9d273aa --- /dev/null +++ b/bb-tests/sardine/tests/test_mlir.py @@ -0,0 +1,121 @@ +import pytest +import logging +import time +import os +from pathlib import Path + + +sardine_dir = Path(__file__).parent.parent +mlir_toy_workload_dir = ( + sardine_dir / ".."
/ "output" / "workloads" / "src" / "OpTest" / "toy" +) +mlir_tile_workload_dir = ( + sardine_dir / ".." / "output" / "workloads" / "src" / "OpTest" / "tile" +) + + +# Define all MLIR OpTest workloads (binary name, test id) +mlir_workloads = [ + ("bb_mvin_mvout_singlecore-baremetal", "bb_mvin_mvout"), + ("bb_dma1_singlecore-baremetal", "bb_dma1"), + ("bb_dma2_singlecore-baremetal", "bb_dma2"), + ("bb_dma3_singlecore-baremetal", "bb_dma3"), + ("bb_mul_warp16_singlecore-baremetal", "bb_mul_warp16"), + ("bb_im2col_singlecore-baremetal", "bb_im2col"), + ("bb_quant_dequant_singlecore-baremetal", "bb_quant_dequant"), +] + +# Tile-level tests +mlir_tile_workloads = [ + ("tile_matmul_singlecore-baremetal", "tile_matmul"), + ("tile_transpose_singlecore-baremetal", "tile_transpose"), + ("tile_conv2d_singlecore-baremetal", "tile_conv2d"), +] + + +@pytest.mark.verilator +@pytest.mark.mlir +@pytest.mark.parametrize( + "workload_path, workload_id, test_index", + [(path, id, idx) for idx, (path, id) in enumerate(mlir_workloads)], + ids=[w[1] for w in mlir_workloads], +) +def test_mlir_optest(command_run, caplog, workload_path, workload_id, test_index): + caplog.set_level(logging.INFO) + + time.sleep(test_index * 20) + start_time = time.time() + coverage_flag = " --coverage" if os.environ.get("SARDINE_COVERAGE") else "" + command = f'bbdev verilator --sim "--binary {workload_path} --batch{coverage_flag}"' + logging.info(f"Running command: {command}") + + early_exit_pattern = ( + r"Task completed\. 
Command running on http://localhost:\d+ is finished" + ) + result = command_run(command, early_exit_pattern=early_exit_pattern, timeout=1200) + execution_time = time.time() - start_time + + logging.info(f"Workload: {workload_id}") + logging.info(f"Workload path: {workload_path}") + logging.info(f"Test index: {test_index}") + logging.info(f"Execution time: {execution_time:.2f} seconds") + logging.info(f"Return code: {result['returncode']}") + logging.info("Script output completed") + + min_execution_time = 5.0 + assert ( + execution_time >= min_execution_time + ), f"Script executed too quickly: {execution_time:.2f}s < {min_execution_time}s" + assert result["returncode"] in [ + 0, + 1, + ], f"Script failed with unexpected return code: {result['returncode']}" + + if "PASSED" not in result["stdout"]: + assert False, f"Script failed: {result['stdout']}" + + logging.info("test completed") + + +@pytest.mark.verilator +@pytest.mark.mlir +@pytest.mark.parametrize( + "workload_path, workload_id, test_index", + [(path, id, idx) for idx, (path, id) in enumerate(mlir_tile_workloads)], + ids=[w[1] for w in mlir_tile_workloads], +) +def test_mlir_tile_optest(command_run, caplog, workload_path, workload_id, test_index): + caplog.set_level(logging.INFO) + + time.sleep(test_index * 20) + start_time = time.time() + coverage_flag = " --coverage" if os.environ.get("SARDINE_COVERAGE") else "" + command = f'bbdev verilator --sim "--binary {workload_path} --batch{coverage_flag}"' + logging.info(f"Running command: {command}") + + early_exit_pattern = ( + r"Task completed\. 
Command running on http://localhost:\d+ is finished" + ) + result = command_run(command, early_exit_pattern=early_exit_pattern, timeout=1200) + execution_time = time.time() - start_time + + logging.info(f"Workload: {workload_id}") + logging.info(f"Workload path: {workload_path}") + logging.info(f"Test index: {test_index}") + logging.info(f"Execution time: {execution_time:.2f} seconds") + logging.info(f"Return code: {result['returncode']}") + logging.info("Script output completed") + + min_execution_time = 5.0 + assert ( + execution_time >= min_execution_time + ), f"Script executed too quickly: {execution_time:.2f}s < {min_execution_time}s" + assert result["returncode"] in [ + 0, + 1, + ], f"Script failed with unexpected return code: {result['returncode']}" + + if "PASSED" not in result["stdout"]: + assert False, f"Script failed: {result['stdout']}" + + logging.info("test completed") diff --git a/bb-tests/workloads/CMakeLists.txt b/bb-tests/workloads/CMakeLists.txt index 2939e206..175e78b5 100644 --- a/bb-tests/workloads/CMakeLists.txt +++ b/bb-tests/workloads/CMakeLists.txt @@ -2,14 +2,6 @@ cmake_minimum_required(VERSION 3.12) project(BuckyballTest C CXX) set(CMAKE_EXPORT_COMPILE_COMMANDS ON) -# get the root directory of the project -# execute_process( -# COMMAND git rev-parse --show-toplevel -# WORKING_DIRECTORY ${CMAKE_SOURCE_DIR} -# OUTPUT_VARIABLE BBDIR -# OUTPUT_STRIP_TRAILING_WHITESPACE -# ) - set(WORKLOAD_DIR ${CMAKE_CURRENT_SOURCE_DIR}) set(WORKLOAD_SRC_DIR ${WORKLOAD_DIR}/src) set(WORKLOAD_LIB_DIR ${WORKLOAD_DIR}/lib) @@ -38,7 +30,6 @@ add_subdirectory(src) #------------------------------------------------------------------------------- # setup directory #------------------------------------------------------------------------------- -# Will be created during cmake file(MAKE_DIRECTORY ${OUTPUT_BIN_DIR}) #------------------------------------------------------------------------------- @@ -49,12 +40,6 @@ add_custom_target(create-dirs COMMENT "Creating 
necessary directories" ) - -# add_custom_target(build-compiler -# COMMAND ${BBDIR}/bb-tests/scripts/build-compiler.sh -# COMMENT "Building Buddy Compiler" -# ) - add_custom_target(sync-bin COMMAND rsync -av --exclude="CMakeFiles" --include="*/" --include="*-baremetal" --include="*-linux" @@ -63,14 +48,12 @@ add_custom_target(sync-bin COMMENT "Syncing binary files to output directory" ) - add_dependencies(sync-bin tutorial-build - buckyball-CTest-build - buckyball-gemmini-build - buckyball-rvv-build - OpTest-all - ) + buckyball-CTest-build + OpTest-all + buckyball-goban-CTest-build +) add_custom_target(build-all ALL DEPENDS sync-bin diff --git a/bb-tests/workloads/lib/bbhw/CMakeLists.txt b/bb-tests/workloads/lib/bbhw/CMakeLists.txt index 32b5a090..74c30af2 100644 --- a/bb-tests/workloads/lib/bbhw/CMakeLists.txt +++ b/bb-tests/workloads/lib/bbhw/CMakeLists.txt @@ -1,74 +1,2 @@ -set(ELF_CC "riscv64-unknown-elf-gcc") -set(LINUX_CC "riscv64-unknown-linux-gnu-gcc") - -#------------------------------------------------------------------------------- -# Add subdirectories, each compiles three versions -#------------------------------------------------------------------------------- -add_subdirectory(mem) -add_subdirectory(isa) - -#------------------------------------------------------------------------------- -# Combine final library - only responsible for linking submodules -#------------------------------------------------------------------------------- -set(CMAKE_C_COMPILER "riscv64-unknown-linux-gnu-gcc") -set(CMAKE_CXX_COMPILER "riscv64-unknown-linux-gnu-gcc") - -# 1. 
Linux version - merge Linux versions of submodules -add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbhw-linux.a - COMMAND ${CMAKE_COMMAND} -E make_directory temp_extract_linux - COMMAND ${CMAKE_COMMAND} -E chdir temp_extract_linux ar x ../mem/libbbmem-linux.a - COMMAND ${CMAKE_COMMAND} -E chdir temp_extract_linux ar x ../isa/libbbisa-linux.a - COMMAND ar rcs libbbhw-linux.a temp_extract_linux/*.o - COMMAND ranlib libbbhw-linux.a - COMMAND ${CMAKE_COMMAND} -E remove_directory temp_extract_linux - DEPENDS bbisa-linux-build bbmem-linux-build - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Combining RISC-V Linux version of bbhw library" -) - -add_custom_target(bbhw-linux ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbhw-linux.a) - -#------------------------------------------------------------------------------- -# build linux version library -#------------------------------------------------------------------------------- -set(LINK_FLAGS "-static -Wl,--no-dynamic-linker") - -#------------------------------------------------------------------------------- -# Generate x86_64 version library (for toy project) -#------------------------------------------------------------------------------- -# 3. 
x86 version - combine x86 versions of submodules -add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbhw-x86.a - COMMAND ${CMAKE_COMMAND} -E make_directory temp_extract - COMMAND ${CMAKE_COMMAND} -E chdir temp_extract ar x ../mem/libbbmem-x86.a - COMMAND ${CMAKE_COMMAND} -E chdir temp_extract ar x ../isa/libbbisa-x86.a - COMMAND ar rcs libbbhw-x86.a temp_extract/*.o - COMMAND ranlib libbbhw-x86.a - COMMAND ${CMAKE_COMMAND} -E remove_directory temp_extract - DEPENDS bbisa-x86-build bbmem-x86-build - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Combining x86_64 version of bbhw library" -) - -add_custom_target(bbhw-x86 ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbhw-x86.a) - -#------------------------------------------------------------------------------- -# Set baremetal compilation flags -#------------------------------------------------------------------------------- - -# 2. Baremetal version - merge Baremetal versions of submodules -add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbhw-baremetal.a - COMMAND ${CMAKE_COMMAND} -E make_directory temp_extract_baremetal - COMMAND ${CMAKE_COMMAND} -E chdir temp_extract_baremetal ar x ../mem/libbbmem-baremetal.a - COMMAND ${CMAKE_COMMAND} -E chdir temp_extract_baremetal ar x ../isa/libbbisa-baremetal.a - COMMAND ar rcs libbbhw-baremetal.a temp_extract_baremetal/*.o - COMMAND ranlib libbbhw-baremetal.a - COMMAND ${CMAKE_COMMAND} -E remove_directory temp_extract_baremetal - DEPENDS bbisa-baremetal-build bbmem-baremetal-build - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Combining RISC-V Baremetal version of bbhw library" -) - -add_custom_target(bbhw-baremetal ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbhw-baremetal.a) +# bbhw is header-only: isa.h and mem.h provide macros (inline asm), no library to build. +# Include path is WORKLOAD_LIB_DIR (workloads/lib), so #include works. 
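Review note: with bbhw reduced to header-only macros, each ISA header in this change packs its operands into a 64-bit rs2 value at fixed bit offsets. As a rough illustration, the sketch below models the layout that `04_bdb_counter.c` documents for `BDB_CTR_RS2` (`[3:0]=subcmd`, `[7:4]=ctr_id`, `[63:8]=payload`) on the host side. The Python names `bdb_ctr_rs2` and `bdb_ctr_unpack` are hypothetical helpers, not part of this PR, and the model assumes non-negative integer inputs.

```python
# Hypothetical host-side model of the rs2 layout documented in 04_bdb_counter.c:
#   [3:0] = subcmd, [7:4] = ctr_id, [63:8] = payload
# These helpers are illustrative only; they do not exist in the repository.

BDB_CTR_START = 0
BDB_CTR_STOP = 1
BDB_CTR_READ = 2

MASK64 = (1 << 64) - 1  # the C macro computes in unsigned long long (64 bits)


def bdb_ctr_rs2(subcmd, ctr_id, payload=0):
    """Pack fields the way the BDB_CTR_RS2 macro does, truncated to 64 bits."""
    return ((payload << 8) | ((ctr_id & 0xF) << 4) | (subcmd & 0xF)) & MASK64


def bdb_ctr_unpack(rs2):
    """Split a 64-bit rs2 value back into (subcmd, ctr_id, payload)."""
    subcmd = rs2 & 0xF
    ctr_id = (rs2 >> 4) & 0xF
    payload = (rs2 >> 8) & ((1 << 56) - 1)
    return subcmd, ctr_id, payload


if __name__ == "__main__":
    rs2 = bdb_ctr_rs2(BDB_CTR_START, ctr_id=3, payload=0xAB)
    print(hex(rs2), bdb_ctr_unpack(rs2))
```

A model like this could help when cross-checking encoded operands against trace output, but the authoritative definition remains the C macro.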
diff --git a/bb-tests/workloads/lib/bbhw/isa/00_fence.c b/bb-tests/workloads/lib/bbhw/isa/00_fence.c new file mode 100644 index 00000000..4ec9514e --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/00_fence.c @@ -0,0 +1,10 @@ +#ifndef _BB_FENCE_H_ +#define _BB_FENCE_H_ + +#include "isa.h" + +#define BB_FENCE_FUNC7 0 + +#define bb_fence() BUCKYBALL_INSTRUCTION_R_R(0, 0, BB_FENCE_FUNC7) + +#endif // _BB_FENCE_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/01_barrier.c b/bb-tests/workloads/lib/bbhw/isa/01_barrier.c new file mode 100644 index 00000000..5bfbff3c --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/01_barrier.c @@ -0,0 +1,22 @@ +#ifndef _BB_BARRIER_H_ +#define _BB_BARRIER_H_ + +#include "isa.h" + +#define BB_BARRIER_FUNC7 1 + +/** + * bb_barrier() - hardware multi-core barrier synchronization. + * + * Semantics: + * 1. Waits for this core's own instruction ROB to drain (implicit fence). + * 2. Signals arrive to the tile-level BarrierUnit. + * 3. Stalls until all nCores cores have arrived (hardware all-reduce). + * + * All cores in the same BBTile must call bb_barrier() at the same point. + * Mixing bb_barrier() with bb_fence() within the same barrier epoch is + * undefined behaviour. 
+ */ +#define bb_barrier() BUCKYBALL_INSTRUCTION_R_R(0, 0, BB_BARRIER_FUNC7) + +#endif // _BB_BARRIER_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/02_gemmini_config.c b/bb-tests/workloads/lib/bbhw/isa/02_gemmini_config.c new file mode 100644 index 00000000..d8e66904 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/02_gemmini_config.c @@ -0,0 +1,24 @@ +#ifndef _BB_GEMMINI_CONFIG_H_ +#define _BB_GEMMINI_CONFIG_H_ + +#include "isa.h" + +#define BB_GEMMINI_CONFIG_FUNC7 2 + +// Configure Gemmini systolic array +// All config parameters go in rs2 (special), starting from bit 4 +// (bits [3:0] reserved for sub-command tag in decoder) +// dataflow: 0=OS, 1=WS +// activation: 0=none, 1=relu +// a_transpose, b_transpose: transpose flags +// in_shift: right-shift amount for output +#define bb_gemmini_config(dataflow, activation, a_transpose, b_transpose, \ + in_shift) \ + BUCKYBALL_INSTRUCTION_R_R(0, \ + (FIELD(dataflow, 4, 4) | FIELD(activation, 5, 6) | \ + FIELD(a_transpose, 7, 7) | \ + FIELD(b_transpose, 8, 8) | \ + FIELD(in_shift, 9, 40)), \ + BB_GEMMINI_CONFIG_FUNC7) + +#endif // _BB_GEMMINI_CONFIG_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/03_gemmini_flush.c b/bb-tests/workloads/lib/bbhw/isa/03_gemmini_flush.c new file mode 100644 index 00000000..0d19750a --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/03_gemmini_flush.c @@ -0,0 +1,12 @@ +#ifndef _BB_GEMMINI_FLUSH_H_ +#define _BB_GEMMINI_FLUSH_H_ + +#include "isa.h" + +#define BB_GEMMINI_FLUSH_FUNC7 3 + +// Flush the systolic array state +#define bb_gemmini_flush() \ + BUCKYBALL_INSTRUCTION_R_R(0, 0, BB_GEMMINI_FLUSH_FUNC7) + +#endif // _BB_GEMMINI_FLUSH_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/04_bdb_counter.c b/bb-tests/workloads/lib/bbhw/isa/04_bdb_counter.c new file mode 100644 index 00000000..449a6a8e --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/04_bdb_counter.c @@ -0,0 +1,33 @@ +#ifndef _BDB_COUNTER_H_ +#define _BDB_COUNTER_H_ + +#include "isa.h" + +#define BDB_COUNTER_FUNC7 4 + +// 
subcmd values +#define BDB_CTR_START 0 +#define BDB_CTR_STOP 1 +#define BDB_CTR_READ 2 + +// rs2 layout: [3:0]=subcmd, [7:4]=ctr_id, [63:8]=payload +#define BDB_CTR_RS2(subcmd, ctr_id, payload) \ + (((unsigned long long)(payload) << 8) | (((ctr_id) & 0xF) << 4) | \ + ((subcmd) & 0xF)) + +// Start counter ctr_id with user tag +#define bdb_counter_start(ctr_id, tag) \ + BUCKYBALL_INSTRUCTION_R_R(0, BDB_CTR_RS2(BDB_CTR_START, ctr_id, tag), \ + BDB_COUNTER_FUNC7) + +// Stop counter ctr_id, output elapsed to trace +#define bdb_counter_stop(ctr_id) \ + BUCKYBALL_INSTRUCTION_R_R(0, BDB_CTR_RS2(BDB_CTR_STOP, ctr_id, 0), \ + BDB_COUNTER_FUNC7) + +// Read counter ctr_id current value (non-destructive), output to trace +#define bdb_counter_read(ctr_id) \ + BUCKYBALL_INSTRUCTION_R_R(0, BDB_CTR_RS2(BDB_CTR_READ, ctr_id, 0), \ + BDB_COUNTER_FUNC7) + +#endif // _BDB_COUNTER_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/105_gemmini_loop_conv_ws.c b/bb-tests/workloads/lib/bbhw/isa/105_gemmini_loop_conv_ws.c new file mode 100644 index 00000000..7e551a2a --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/105_gemmini_loop_conv_ws.c @@ -0,0 +1,72 @@ +#ifndef _BB_GEMMINI_LOOP_CONV_WS_H +#define _BB_GEMMINI_LOOP_CONV_WS_H + +#include "isa.h" + +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_1_FUNC7 96 +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_2_FUNC7 97 +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_3_FUNC7 98 +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_4_FUNC7 99 +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_5_FUNC7 100 +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_6_FUNC7 101 +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_7_FUNC7 102 +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_8_FUNC7 103 +#define BB_GEMMINI_LOOP_CONV_WS_CONFIG_9_FUNC7 104 +#define BB_GEMMINI_LOOP_CONV_WS_FUNC7 105 + +#define bb_gemmini_loop_conv_ws_config_1(batch_size, in_dim, in_channels) \ + BUCKYBALL_INSTRUCTION_R_R(0, \ + (FIELD(batch_size, 0, 15) | \ + FIELD(in_dim, 16, 31) | \ + FIELD(in_channels, 32, 47)), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_1_FUNC7) + 
+#define bb_gemmini_loop_conv_ws_config_2(out_channels, out_dim, stride, \ + padding) \ + BUCKYBALL_INSTRUCTION_R_R(0, \ + (FIELD(out_channels, 0, 15) | \ + FIELD(out_dim, 16, 31) | FIELD(stride, 32, 39) | \ + FIELD(padding, 40, 47)), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_2_FUNC7) + +#define bb_gemmini_loop_conv_ws_config_3(kernel_dim, pool_size, pool_stride, \ + pool_padding) \ + BUCKYBALL_INSTRUCTION_R_R( \ + 0, \ + (FIELD(kernel_dim, 0, 7) | FIELD(pool_size, 8, 15) | \ + FIELD(pool_stride, 16, 23) | FIELD(pool_padding, 24, 31)), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_3_FUNC7) + +#define bb_gemmini_loop_conv_ws_config_4(addr_bias) \ + BUCKYBALL_INSTRUCTION_R_R(0, FIELD(addr_bias, 0, 38), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_4_FUNC7) + +#define bb_gemmini_loop_conv_ws_config_5(addr_input) \ + BUCKYBALL_INSTRUCTION_R_R(0, FIELD(addr_input, 0, 38), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_5_FUNC7) + +#define bb_gemmini_loop_conv_ws_config_6(addr_weight) \ + BUCKYBALL_INSTRUCTION_R_R(0, FIELD(addr_weight, 0, 38), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_6_FUNC7) + +#define bb_gemmini_loop_conv_ws_config_7(addr_output) \ + BUCKYBALL_INSTRUCTION_R_R(0, FIELD(addr_output, 0, 38), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_7_FUNC7) + +#define bb_gemmini_loop_conv_ws_config_8(input_stride, weight_stride) \ + BUCKYBALL_INSTRUCTION_R_R( \ + 0, (FIELD(input_stride, 0, 31) | FIELD(weight_stride, 32, 63)), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_8_FUNC7) + +#define bb_gemmini_loop_conv_ws_config_9(output_stride) \ + BUCKYBALL_INSTRUCTION_R_R(0, FIELD(output_stride, 0, 31), \ + BB_GEMMINI_LOOP_CONV_WS_CONFIG_9_FUNC7) + +#define bb_gemmini_loop_conv_ws(bank_input, bank_weight, bank_output, no_bias) \ + BUCKYBALL_INSTRUCTION_R_R( \ + 0, \ + (FIELD(bank_input, 0, 9) | FIELD(bank_weight, 10, 19) | \ + FIELD(bank_output, 20, 29) | FIELD(no_bias, 30, 30)), \ + BB_GEMMINI_LOOP_CONV_WS_FUNC7) + +#endif diff --git a/bb-tests/workloads/lib/bbhw/isa/16_mvout.c b/bb-tests/workloads/lib/bbhw/isa/16_mvout.c new file mode 
100644 index 00000000..faffc689 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/16_mvout.c @@ -0,0 +1,13 @@ +#ifndef _BB_MVOUT_H_ +#define _BB_MVOUT_H_ + +#include "isa.h" + +#define BB_MVOUT_FUNC7 16 + +#define bb_mvout(mem_addr, bank_id, depth, stride) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(bank_id) | BB_ITER(depth)), \ + (FIELD(mem_addr, 0, 38) | FIELD(stride, 39, 57)), \ + BB_MVOUT_FUNC7) + +#endif // _BB_MVOUT_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/24_mvin.c b/bb-tests/workloads/lib/bbhw/isa/24_mvin.c deleted file mode 100644 index 1e2dae37..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/24_mvin.c +++ /dev/null @@ -1,36 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig mvin_config = { - .rs1_fields = (BitFieldConfig[]){{"base_dram_addr", 0, 31}, {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){{"base_sp_addr", 0, 14}, - {"iter", 15, 24}, - {"stride", 24, 33}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define MVIN_ENCODE_RS1(dram_addr) ENCODE_FIELD(dram_addr, 0, 32) - -#define MVIN_ENCODE_RS2(sp_addr, iter, stride) \ - (ENCODE_FIELD(sp_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10) | \ - ENCODE_FIELD(stride, 25, 10)) - -// MVIN instruction low-level implementation -#ifndef __x86_64__ -#define MVIN_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 24, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define MVIN_RAW(rs1, rs2) -#endif - -// MVIN instruction high-level API implementation -void bb_mvin(uint64_t mem_addr, uint32_t sp_addr, uint32_t iter, - uint32_t stride) { - uint64_t rs1_val = MVIN_ENCODE_RS1(mem_addr); - uint64_t rs2_val = MVIN_ENCODE_RS2(sp_addr, iter, stride); - MVIN_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/25_mvout.c b/bb-tests/workloads/lib/bbhw/isa/25_mvout.c deleted file mode 100644 index 
fe44d0f0..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/25_mvout.c +++ /dev/null @@ -1,36 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig mvout_config = { - .rs1_fields = (BitFieldConfig[]){{"base_dram_addr", 0, 31}, {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){{"base_sp_addr", 0, 14}, - {"iter", 15, 24}, - {"stride", 24, 33}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define MVOUT_ENCODE_RS1(dram_addr) ENCODE_FIELD(dram_addr, 0, 32) - -#define MVOUT_ENCODE_RS2(sp_addr, iter, stride) \ - (ENCODE_FIELD(sp_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10) | \ - ENCODE_FIELD(stride, 25, 10)) - -// MVOUT instruction low-level implementation -#ifndef __x86_64__ -#define MVOUT_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 25, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define MVOUT_RAW(rs1, rs2) -#endif - -// MVOUT instruction high-level API implementation -void bb_mvout(uint64_t mem_addr, uint32_t sp_addr, uint32_t iter, - uint32_t stride) { - uint64_t rs1_val = MVOUT_ENCODE_RS1(mem_addr); - uint64_t rs2_val = MVOUT_ENCODE_RS2(sp_addr, iter, stride); - MVOUT_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/26_bbfp_mul.c b/bb-tests/workloads/lib/bbhw/isa/26_bbfp_mul.c deleted file mode 100644 index 0d61ddb1..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/26_bbfp_mul.c +++ /dev/null @@ -1,36 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig bbfp_mul_config = { - .rs1_fields = (BitFieldConfig[]){{"op1_spaddr", 0, 14}, - {"op2_spaddr", 15, 29}, - {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){ - {"wr_spaddr", 0, 14}, {"iter", 15, 24}, {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define 
BBFP_MUL_ENCODE_RS1(op1_addr, op2_addr) \ - (ENCODE_FIELD(op1_addr, 0, 15) | ENCODE_FIELD(op2_addr, 15, 15)) - -#define BBFP_MUL_ENCODE_RS2(wr_addr, iter) \ - (ENCODE_FIELD(wr_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10)) - -// BBFP_MUL instruction low-level implementation -#ifndef __x86_64__ -#define BBFP_MUL_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 26, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define BBFP_MUL_RAW(rs1, rs2) -#endif - -// BBFP_MUL instruction high-level API implementation -void bb_bbfp_mul(uint32_t op1_addr, uint32_t op2_addr, uint32_t wr_addr, - uint32_t iter) { - uint64_t rs1_val = BBFP_MUL_ENCODE_RS1(op1_addr, op2_addr); - uint64_t rs2_val = BBFP_MUL_ENCODE_RS2(wr_addr, iter); - BBFP_MUL_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/27_matmul_ws.c b/bb-tests/workloads/lib/bbhw/isa/27_matmul_ws.c deleted file mode 100644 index 13f7ecf6..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/27_matmul_ws.c +++ /dev/null @@ -1,39 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig matmul_ws_config = { - .rs1_fields = (BitFieldConfig[]){{"op1_spaddr", 0, 14}, - {"op2_spaddr", 15, 29}, - {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){{"wr_spaddr", 0, 14}, - {"iter", 15, 24}, - {"ws_flag", 25, 25}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define MATMUL_WS_ENCODE_RS1(op1_addr, op2_addr) \ - (ENCODE_FIELD(op1_addr, 0, 15) | ENCODE_FIELD(op2_addr, 15, 15)) - -#define MATMUL_WS_ENCODE_RS2(wr_addr, iter, ws_flag) \ - (ENCODE_FIELD(wr_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10) | \ - ENCODE_FIELD(ws_flag, 25, 1)) - -// MATMUL_WS instruction low-level implementation -#ifndef __x86_64__ -#define MATMUL_WS_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 27, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - 
: "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define MATMUL_WS_RAW(rs1, rs2) -#endif - -// MATMUL_WS instruction high-level API implementation -void bb_matmul_ws(uint32_t op1_addr, uint32_t op2_addr, uint32_t wr_addr, - uint32_t iter) { - uint64_t rs1_val = MATMUL_WS_ENCODE_RS1(op1_addr, op2_addr); - uint64_t rs2_val = MATMUL_WS_ENCODE_RS2(wr_addr, iter, 1); - MATMUL_WS_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/31_fence.c b/bb-tests/workloads/lib/bbhw/isa/31_fence.c deleted file mode 100644 index dc6ada62..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/31_fence.c +++ /dev/null @@ -1,20 +0,0 @@ -#include "isa.h" - -// =========================== for CTest =========================== -// FENCE instruction has no parameters, define assembly macro directly - -// FENCE instruction low-level implementation -#ifndef __x86_64__ -#define FENCE_RAW() \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 31, x0, x0, x0" \ - : \ - : \ - : "memor" \ - "y") -#else -// Do not execute RISC-V instructions on x86 platform -#define FENCE_RAW() -#endif - -// FENCE instruction high-level API implementation -void bb_fence(void) { FENCE_RAW(); } diff --git a/bb-tests/workloads/lib/bbhw/isa/32_mset.c b/bb-tests/workloads/lib/bbhw/isa/32_mset.c new file mode 100644 index 00000000..d9851bcf --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/32_mset.c @@ -0,0 +1,18 @@ +#ifndef _BB_MSET_H_ +#define _BB_MSET_H_ + +#include "isa.h" + +#define BB_MSET_FUNC7 32 + +#define bb_mset(bank_id, alloc, row, col) \ + BUCKYBALL_INSTRUCTION_R_R( \ + BB_BANK0(bank_id), \ + (FIELD(row, 0, 4) | FIELD(col, 5, 9) | FIELD(alloc, 10, 10)), \ + BB_MSET_FUNC7) + +#define bb_mem_release(bank_id) bb_mset(bank_id, 0, 0, 0) + +#define bb_mem_alloc(bank_id, row, col) bb_mset(bank_id, 1, row, col) + +#endif // _BB_MSET_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/32_mul_warp16.c b/bb-tests/workloads/lib/bbhw/isa/32_mul_warp16.c deleted file mode 100644 index
406ec367..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/32_mul_warp16.c +++ /dev/null @@ -1,39 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig mul_warp16_config = { - .rs1_fields = (BitFieldConfig[]){{"op1_spaddr", 0, 14}, - {"op2_spaddr", 15, 29}, - {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){{"wr_spaddr", 0, 14}, - {"iter", 15, 24}, - {"mode", 25, 25}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define MUL_WARP16_ENCODE_RS1(op1_addr, op2_addr) \ - (ENCODE_FIELD(op1_addr, 0, 15) | ENCODE_FIELD(op2_addr, 15, 15)) - -#define MUL_WARP16_ENCODE_RS2(wr_addr, iter, mode) \ - (ENCODE_FIELD(wr_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10) | \ - ENCODE_FIELD(mode, 25, 1)) - -// MUL_WARP16 instruction low-level implementation -#ifndef __x86_64__ -#define MUL_WARP16_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 32, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define MUL_WARP16_RAW(rs1, rs2) -#endif - -// MUL_WARP16 instruction high-level API implementation -void bb_mul_warp16(uint32_t op1_addr, uint32_t op2_addr, uint32_t wr_addr, - uint32_t iter, uint32_t mode) { - uint64_t rs1_val = MUL_WARP16_ENCODE_RS1(op1_addr, op2_addr); - uint64_t rs2_val = MUL_WARP16_ENCODE_RS2(wr_addr, iter, mode); - MUL_WARP16_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/33_im2col.c b/bb-tests/workloads/lib/bbhw/isa/33_im2col.c deleted file mode 100644 index 34508494..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/33_im2col.c +++ /dev/null @@ -1,45 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig im2col_config = { - .rs1_fields = (BitFieldConfig[]){{"op_spaddr", 0, 14}, - {"wr_spaddr", 15, 29}, - {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){{"kcol", 26, 29}, - {"krow", 
30, 33}, - {"incol", 34, 38}, - {"inrow", 39, 43}, - {"startcol", 49, 53}, - {"startrow", 54, 58}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define IM2COL_ENCODE_RS1(op_addr, wr_addr) \ - (ENCODE_FIELD(op_addr, 0, 15) | ENCODE_FIELD(wr_addr, 15, 15)) - -#define IM2COL_ENCODE_RS2(krow, kcol, inrow, incol, startrow, startcol) \ - (ENCODE_FIELD(kcol, 26, 4) | ENCODE_FIELD(krow, 30, 4) | \ - ENCODE_FIELD(incol, 34, 5) | ENCODE_FIELD(inrow, 39, 5) | \ - ENCODE_FIELD(startcol, 49, 5) | ENCODE_FIELD(startrow, 54, 5)) - -// IM2COL instruction low-level implementation -#ifndef __x86_64__ -#define IM2COL_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 33, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define IM2COL_RAW(rs1, rs2) -#endif - -// IM2COL instruction high-level API implementation -void bb_im2col(uint32_t op1_addr, uint32_t wr_addr, uint32_t krow, - uint32_t kcol, uint32_t inrow, uint32_t incol, uint32_t startrow, - uint32_t startcol) { - uint64_t rs1_val = IM2COL_ENCODE_RS1(op1_addr, wr_addr); - uint64_t rs2_val = - IM2COL_ENCODE_RS2(krow, kcol, inrow, incol, startrow, startcol); - IM2COL_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/33_mvin.c b/bb-tests/workloads/lib/bbhw/isa/33_mvin.c new file mode 100644 index 00000000..e45fec13 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/33_mvin.c @@ -0,0 +1,13 @@ +#ifndef _BB_MVIN_H_ +#define _BB_MVIN_H_ + +#include "isa.h" + +#define BB_MVIN_FUNC7 33 + +#define bb_mvin(mem_addr, bank_id, depth, stride) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(bank_id) | BB_ITER(depth)), \ + (FIELD(mem_addr, 0, 38) | FIELD(stride, 39, 57)), \ + BB_MVIN_FUNC7) + +#endif // _BB_MVIN_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/34_transpose.c b/bb-tests/workloads/lib/bbhw/isa/34_transpose.c deleted file mode 100644 index f2c8d492..00000000 --- 
a/bb-tests/workloads/lib/bbhw/isa/34_transpose.c +++ /dev/null @@ -1,36 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig transpose_config = { - .rs1_fields = (BitFieldConfig[]){{"op_spaddr", 0, 14}, - {"wr_spaddr", 15, 29}, - {NULL, 0, 0}}, - .rs2_fields = - (BitFieldConfig[]){{"mode", 25, 25}, {"iter", 15, 24}, {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define TRANSPOSE_ENCODE_RS1(op_addr, wr_addr) \ - (ENCODE_FIELD(op_addr, 0, 15) | ENCODE_FIELD(wr_addr, 15, 15)) - -#define TRANSPOSE_ENCODE_RS2(iter, mode) \ - (ENCODE_FIELD(iter, 15, 10) | ENCODE_FIELD(mode, 25, 1)) - -// TRANSPOSE instruction low-level implementation -#ifndef __x86_64__ -#define TRANSPOSE_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 34, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define TRANSPOSE_RAW(rs1, rs2) -#endif - -// TRANSPOSE instruction high-level API implementation -void bb_transpose(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter, - uint32_t mode) { - uint64_t rs1_val = TRANSPOSE_ENCODE_RS1(op1_addr, wr_addr); - uint64_t rs2_val = TRANSPOSE_ENCODE_RS2(iter, mode); - TRANSPOSE_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/38_relu.c b/bb-tests/workloads/lib/bbhw/isa/38_relu.c deleted file mode 100644 index e829d4ad..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/38_relu.c +++ /dev/null @@ -1,31 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig relu_config = { - .rs1_fields = (BitFieldConfig[]){{"op_spaddr", 0, 14}, {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){ - {"wr_spaddr", 0, 14}, {"iter", 15, 24}, {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define RELU_ENCODE_RS1(op_addr) (ENCODE_FIELD(op_addr, 0, 15)) -#define 
RELU_ENCODE_RS2(wr_addr, iter) \ - (ENCODE_FIELD(wr_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10)) - -// RELU instruction low-level implementation -#ifndef __x86_64__ -#define RELU_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 38, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define RELU_RAW(rs1, rs2) -#endif - -// RELU instruction high-level API implementation -void bb_relu(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter) { - uint64_t rs1_val = RELU_ENCODE_RS1(op1_addr); - uint64_t rs2_val = RELU_ENCODE_RS2(wr_addr, iter); - RELU_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/39_bbus_config.c b/bb-tests/workloads/lib/bbhw/isa/39_bbus_config.c deleted file mode 100644 index 7cab7796..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/39_bbus_config.c +++ /dev/null @@ -1,35 +0,0 @@ -#include "isa.h" -#include -#include -// =========================== for simulator =========================== -const InstructionConfig bbus_config_config = { - .rs1_fields = (BitFieldConfig[]){{NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){{"src_bid", 25, 30}, - {"dst_bid", 31, 36}, - {"enable", 37, 37}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define bbus_config_ENCODE_RS1(op_addr) (0) -#define bbus_config_ENCODE_RS2(src_bid, dst_bid, enable) \ - (ENCODE_FIELD(src_bid, 25, 6) | ENCODE_FIELD(dst_bid, 31, 6) | \ - ENCODE_FIELD(enable, 37, 1)) - -// bbus_config instruction low-level implementation -#ifndef __x86_64__ -#define bbus_config_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 39, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define bbus_config_RAW(rs1, rs2) -#endif - -// bbus_config instruction high-level API implementation -void bb_bbus_config(uint32_t src_bid, uint32_t dst_bid, uint64_t enable) { - uint64_t rs1_val = 
bbus_config_ENCODE_RS1(0); - uint64_t rs2_val = bbus_config_ENCODE_RS2(src_bid, dst_bid, enable); - bbus_config_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/40_nnlut.c b/bb-tests/workloads/lib/bbhw/isa/40_nnlut.c deleted file mode 100644 index b20bc8cb..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/40_nnlut.c +++ /dev/null @@ -1,31 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig nnlut_config = { - .rs1_fields = (BitFieldConfig[]){{"op_spaddr", 0, 14}, {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){ - {"wr_spaddr", 0, 14}, {"iter", 15, 24}, {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define NNLUT_ENCODE_RS1(op_addr) (ENCODE_FIELD(op_addr, 0, 15)) -#define NNLUT_ENCODE_RS2(wr_addr, iter) \ - (ENCODE_FIELD(wr_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10)) - -// NNLUT instruction low-level implementation -#ifndef __x86_64__ -#define NNLUT_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 40, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define NNLUT_RAW(rs1, rs2) -#endif - -// NNLUT instruction high-level API implementation -void bb_nnlut(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter) { - uint64_t rs1_val = NNLUT_ENCODE_RS1(op1_addr); - uint64_t rs2_val = NNLUT_ENCODE_RS2(wr_addr, iter); - NNLUT_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/41_snn.c b/bb-tests/workloads/lib/bbhw/isa/41_snn.c deleted file mode 100644 index 2e4980d3..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/41_snn.c +++ /dev/null @@ -1,36 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig snn_config = { - .rs1_fields = (BitFieldConfig[]){{"op_spaddr", 0, 14}, {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){{"wr_spaddr", 0, 14}, - {"iter", 15, 24}, - 
{"threshold", 25, 32}, - {"leak_factor", 33, 40}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define SNN_ENCODE_RS1(op_addr) (ENCODE_FIELD(op_addr, 0, 15)) -#define SNN_ENCODE_RS2(wr_addr, iter, threshold, leak_factor) \ - (ENCODE_FIELD(wr_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10) | \ - ENCODE_FIELD(threshold, 25, 8) | ENCODE_FIELD(leak_factor, 33, 8)) - -// SNN instruction low-level implementation -#ifndef __x86_64__ -#define SNN_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 41, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define SNN_RAW(rs1, rs2) -#endif - -// SNN instruction high-level API implementation -void bb_snn(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter, - uint32_t threshold, uint32_t leak_factor) { - uint64_t rs1_val = SNN_ENCODE_RS1(op1_addr); - uint64_t rs2_val = SNN_ENCODE_RS2(wr_addr, iter, threshold, leak_factor); - SNN_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/42_abft_systolic.c b/bb-tests/workloads/lib/bbhw/isa/42_abft_systolic.c deleted file mode 100644 index d6e806d5..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/42_abft_systolic.c +++ /dev/null @@ -1,36 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig abft_systolic_config = { - .rs1_fields = (BitFieldConfig[]){{"op1_spaddr", 0, 14}, - {"op2_spaddr", 15, 29}, - {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){ - {"wr_spaddr", 0, 14}, {"iter", 15, 24}, {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define ABFT_SYSTOLIC_ENCODE_RS1(op1_addr, op2_addr) \ - (ENCODE_FIELD(op1_addr, 0, 15) | ENCODE_FIELD(op2_addr, 15, 15)) - -#define ABFT_SYSTOLIC_ENCODE_RS2(wr_addr, iter) \ - (ENCODE_FIELD(wr_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10)) - -// ABFT_SYSTOLIC instruction low-level implementation -#ifndef 
__x86_64__ -#define ABFT_SYSTOLIC_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 42, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define ABFT_SYSTOLIC_RAW(rs1, rs2) -#endif - -// ABFT_SYSTOLIC instruction high-level API implementation -void bb_abft_systolic(uint32_t op1_addr, uint32_t op2_addr, uint32_t wr_addr, - uint32_t iter) { - uint64_t rs1_val = ABFT_SYSTOLIC_ENCODE_RS1(op1_addr, op2_addr); - uint64_t rs2_val = ABFT_SYSTOLIC_ENCODE_RS2(wr_addr, iter); - ABFT_SYSTOLIC_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/43_conv.c b/bb-tests/workloads/lib/bbhw/isa/43_conv.c deleted file mode 100644 index 47261d3d..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/43_conv.c +++ /dev/null @@ -1,57 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig conv_config = { - .rs1_fields = (BitFieldConfig[]){{"ifmap_spaddr", 0, 14}, - {"weight_spaddr", 15, 29}, - {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){ - {"ofmap_spaddr", 0, 14}, - {"iter", 15, 24}, - // special field is 40 bits: rs2(63, spAddrLen + 10) = rs2(63, 24) = 40 - // bits Encode: in_height[15:0], in_width[31:16], kernel_h[39:32], - // kernel_w[47:40] but only 40 bits available Adjust: use - // in_height[15:0], in_width[31:16], kernel_h[39:32], kernel_w same as - // kernel_h - {"in_height", 25, 40}, - {"in_width", 41, 56}, - {"kernel_h", 57, 64}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define CONV_ENCODE_RS1(ifmap_addr, weight_addr) \ - (ENCODE_FIELD(ifmap_addr, 0, 15) | ENCODE_FIELD(weight_addr, 15, 15)) - -// Note: special field is only 40 bits (rs2[63:24]) -// DomainDecoder extracts rs2(63, spAddrLen + 10) = rs2(63, 24) for special -// So special[39:0] = rs2[63:24] -// Encode in special: in_height[15:0] = special[15:0], in_width[15:0] = -// special[31:16], 
kernel_h[7:0] = special[39:32] kernel_w is assumed to equal -// kernel_h for simplicity -#define CONV_ENCODE_RS2(ofmap_addr, iter, in_height, in_width, kernel_h, \ - kernel_w) \ - (ENCODE_FIELD(ofmap_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10) | \ - ENCODE_FIELD(in_height, 24, 16) | ENCODE_FIELD(in_width, 40, 16) | \ - ENCODE_FIELD(kernel_h, 56, 8)) - -// CONV instruction low-level implementation -#ifndef __x86_64__ -#define CONV_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 43, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define CONV_RAW(rs1, rs2) -#endif - -// CONV instruction high-level API implementation -void bb_conv(uint32_t ifmap_addr, uint32_t weight_addr, uint32_t ofmap_addr, - uint32_t iter, uint32_t in_height, uint32_t in_width, - uint32_t kernel_h, uint32_t kernel_w) { - uint64_t rs1_val = CONV_ENCODE_RS1(ifmap_addr, weight_addr); - uint64_t rs2_val = CONV_ENCODE_RS2(ofmap_addr, iter, in_height, in_width, - kernel_h, kernel_w); - CONV_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/44_cim.c b/bb-tests/workloads/lib/bbhw/isa/44_cim.c deleted file mode 100644 index fb12e53d..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/44_cim.c +++ /dev/null @@ -1,51 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig cim_config = { - .rs1_fields = (BitFieldConfig[]){{"op1_spaddr", 0, 14}, - {"op2_spaddr", 15, 29}, - {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){ - {"result_spaddr", 0, 14}, - {"iter", 15, 24}, - // special field is 40 bits: rs2(63, spAddrLen + 10) = rs2(63, 24) = 40 - // bits Encode: rows[15:0], cols[31:16], op_type[35:32] - {"rows", 25, 40}, - {"cols", 41, 56}, - {"op_type", 57, 60}, - {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define CIM_ENCODE_RS1(op1_addr, op2_addr) \ - (ENCODE_FIELD(op1_addr, 0, 15) | 
ENCODE_FIELD(op2_addr, 15, 15)) - -// Note: special field is only 40 bits (rs2[63:24]) -// DomainDecoder extracts rs2(63, spAddrLen + 10) = rs2(63, 24) for special -// So special[39:0] = rs2[63:24] -// Encode in special: rows[15:0] = special[15:0], cols[15:0] = special[31:16], -// op_type[3:0] = special[35:32] In rs2: rows in rs2[39:24], cols in rs2[55:40], -// op_type in rs2[59:56] -#define CIM_ENCODE_RS2(result_addr, iter, rows, cols, op_type) \ - (ENCODE_FIELD(result_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10) | \ - ENCODE_FIELD(rows, 24, 16) | ENCODE_FIELD(cols, 40, 16) | \ - ENCODE_FIELD(op_type, 56, 4)) - -// CIM instruction low-level implementation -#ifndef __x86_64__ -#define CIM_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 44, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define CIM_RAW(rs1, rs2) -#endif - -// CIM instruction high-level API implementation -void bb_cim(uint32_t op1_addr, uint32_t op2_addr, uint32_t result_addr, - uint32_t iter, uint32_t rows, uint32_t cols, uint32_t op_type) { - uint64_t rs1_val = CIM_ENCODE_RS1(op1_addr, op2_addr); - uint64_t rs2_val = CIM_ENCODE_RS2(result_addr, iter, rows, cols, op_type); - CIM_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/45_transfer.c b/bb-tests/workloads/lib/bbhw/isa/45_transfer.c deleted file mode 100644 index 241f7605..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/45_transfer.c +++ /dev/null @@ -1,32 +0,0 @@ -#include "isa.h" - -// =========================== for simulator =========================== -const InstructionConfig transfer_config = { - .rs1_fields = (BitFieldConfig[]){{"op1_spaddr", 0, 14}, {NULL, 0, 0}}, - .rs2_fields = (BitFieldConfig[]){ - {"wr_spaddr", 0, 14}, {"iter", 15, 24}, {NULL, 0, 0}}}; - -// =========================== for CTest =========================== -#define TRANSFER_ENCODE_RS1(op1_addr) (ENCODE_FIELD(op1_addr, 0, 15)) -#define 
TRANSFER_ENCODE_RS2(wr_addr, iter) \ - (ENCODE_FIELD(wr_addr, 0, 15) | ENCODE_FIELD(iter, 15, 10)) - -// TRANSFER instruction low-level implementation -#ifndef __x86_64__ -#define TRANSFER_RAW(rs1, rs2) \ - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 45, x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2) \ - : "memory") -#else -// Do not execute RISC-V instructions on x86 platform -#define TRANSFER_RAW(rs1, rs2) -#endif - -// TRANSFER instruction high-level API implementation -void bb_transfer(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter) { - if (iter > 1023) iter = 1023; - uint64_t rs1_val = TRANSFER_ENCODE_RS1(op1_addr); - uint64_t rs2_val = TRANSFER_ENCODE_RS2(wr_addr, iter); - TRANSFER_RAW(rs1_val, rs2_val); -} diff --git a/bb-tests/workloads/lib/bbhw/isa/48_im2col.c b/bb-tests/workloads/lib/bbhw/isa/48_im2col.c new file mode 100644 index 00000000..eaaa8488 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/48_im2col.c @@ -0,0 +1,17 @@ +#ifndef _BB_IM2COL_H_ +#define _BB_IM2COL_H_ + +#include "isa.h" + +#define BB_IM2COL_FUNC7 48 + +#define bb_im2col(op1_bank_id, wr_bank_id, krow, kcol, inrow, incol, startrow, \ + startcol) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(op1_bank_id) | BB_BANK2(wr_bank_id)), \ + (FIELD(kcol, 0, 3) | FIELD(krow, 4, 7) | \ + FIELD(incol, 8, 12) | FIELD(inrow, 13, 22) | \ + FIELD(startcol, 23, 27) | \ + FIELD(startrow, 28, 37)), \ + BB_IM2COL_FUNC7) + +#endif // _BB_IM2COL_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/49_transpose.c b/bb-tests/workloads/lib/bbhw/isa/49_transpose.c new file mode 100644 index 00000000..7421b5b2 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/49_transpose.c @@ -0,0 +1,13 @@ +#ifndef _BB_TRANSPOSE_H_ +#define _BB_TRANSPOSE_H_ + +#include "isa.h" + +#define BB_TRANSPOSE_FUNC7 49 + +#define bb_transpose(op1_bank_id, wr_bank_id, iter, mode) \ + BUCKYBALL_INSTRUCTION_R_R( \ + (BB_BANK0(op1_bank_id) | BB_BANK2(wr_bank_id) | BB_ITER(iter)), \ + (FIELD(mode, 0, 63)), BB_TRANSPOSE_FUNC7) + +#endif // 
_BB_TRANSPOSE_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/50_relu.c b/bb-tests/workloads/lib/bbhw/isa/50_relu.c new file mode 100644 index 00000000..4bb12ec6 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/50_relu.c @@ -0,0 +1,13 @@ +#ifndef _BB_RELU_H_ +#define _BB_RELU_H_ + +#include "isa.h" + +#define BB_RELU_FUNC7 50 + +#define bb_relu(bank_id, wr_bank_id, iter) \ + BUCKYBALL_INSTRUCTION_R_R( \ + (BB_BANK0(bank_id) | BB_BANK2(wr_bank_id) | BB_ITER(iter)), 0, \ + BB_RELU_FUNC7) + +#endif // _BB_RELU_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/51_quant.c b/bb-tests/workloads/lib/bbhw/isa/51_quant.c new file mode 100644 index 00000000..7cde739d --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/51_quant.c @@ -0,0 +1,17 @@ +#ifndef _BB_QUANT_H_ +#define _BB_QUANT_H_ + +#include "isa.h" + +#define BB_QUANT_FUNC7 51 + +// bb_quant(bank_id, wr_bank_id, iter, scale_fp32) +// scale_fp32 is a 32-bit FP32 value passed as uint32_t bit pattern +// Encoding: rs1 = banks | iter +// rs2 = scale_fp32 +#define bb_quant(bank_id, wr_bank_id, iter, scale_fp32) \ + BUCKYBALL_INSTRUCTION_R_R( \ + (BB_BANK0(bank_id) | BB_BANK2(wr_bank_id) | BB_ITER(iter)), \ + (FIELD((uint64_t)(scale_fp32), 0, 31)), BB_QUANT_FUNC7) + +#endif // _BB_QUANT_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/52_dequant.c b/bb-tests/workloads/lib/bbhw/isa/52_dequant.c new file mode 100644 index 00000000..6bbba87e --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/52_dequant.c @@ -0,0 +1,17 @@ +#ifndef _BB_DEQUANT_H_ +#define _BB_DEQUANT_H_ + +#include "isa.h" + +#define BB_DEQUANT_FUNC7 52 + +// bb_dequant(bank_id, wr_bank_id, iter, scale_fp32) +// scale_fp32 is a 32-bit FP32 value passed as uint32_t bit pattern +// Encoding: rs1 = banks | iter +// rs2 = FIELD(scale_fp32, 0, 31) +#define bb_dequant(bank_id, wr_bank_id, iter, scale_fp32) \ + BUCKYBALL_INSTRUCTION_R_R( \ + (BB_BANK0(bank_id) | BB_BANK2(wr_bank_id) | BB_ITER(iter)), \ + (FIELD((uint64_t)(scale_fp32), 0, 31)), BB_DEQUANT_FUNC7) + 
+#endif // _BB_DEQUANT_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/53_gemmini_preload.c b/bb-tests/workloads/lib/bbhw/isa/53_gemmini_preload.c new file mode 100644 index 00000000..fc700fa1 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/53_gemmini_preload.c @@ -0,0 +1,17 @@ +#ifndef _BB_GEMMINI_PRELOAD_H_ +#define _BB_GEMMINI_PRELOAD_H_ + +#include "isa.h" + +#define BB_GEMMINI_PRELOAD_FUNC7 53 + +// Preload D/B matrix into systolic array +// op1_bank_id: source bank for D (OS) or B (WS) +// wr_bank_id: destination bank for C output +// iter: number of rows to preload +#define bb_gemmini_preload(op1_bank_id, wr_bank_id, iter) \ + BUCKYBALL_INSTRUCTION_R_R( \ + (BB_BANK0(op1_bank_id) | BB_BANK2(wr_bank_id) | BB_ITER(iter)), 0, \ + BB_GEMMINI_PRELOAD_FUNC7) + +#endif // _BB_GEMMINI_PRELOAD_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/54_bdb_backdoor.c b/bb-tests/workloads/lib/bbhw/isa/54_bdb_backdoor.c new file mode 100644 index 00000000..27d981be --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/54_bdb_backdoor.c @@ -0,0 +1,25 @@ +#ifndef _BDB_BACKDOOR_H_ +#define _BDB_BACKDOOR_H_ + +#include "isa.h" + +#define BDB_BACKDOOR_FUNC7 54 + +// Backdoor write: DPI-C provides data, write to external bank +// bank_id = target bank, iter = number of rows +#define bdb_backdoor_write(bank_id, iter) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK2(bank_id) | BB_ITER(iter)), 0, \ + BDB_BACKDOOR_FUNC7) + +// Backdoor read: read from external bank, output via DPI-C +// bank_id = source bank, iter = number of rows +#define bdb_backdoor_read(bank_id, iter) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(bank_id) | BB_ITER(iter)), 0, \ + BDB_BACKDOOR_FUNC7) + +// Backdoor peek: read single row from external bank +#define bdb_backdoor_peek(bank_id, row_count) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(bank_id) | BB_ITER(row_count)), 0, \ + BDB_BACKDOOR_FUNC7) + +#endif // _BDB_BACKDOOR_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/55_mxfp.c b/bb-tests/workloads/lib/bbhw/isa/55_mxfp.c new 
file mode 100644 index 00000000..9bb1170e --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/55_mxfp.c @@ -0,0 +1,24 @@ +#ifndef _BB_MXFP_H_ +#define _BB_MXFP_H_ + +#include "isa.h" + +#define BB_MXFP_FUNC7 55 + +// Basic version: +// rs1 = bank0(read) | bank2(write) | iter +// rs2 = 0 +#define bb_mxfp(bank_id, wr_bank_id, iter) \ + BUCKYBALL_INSTRUCTION_R_R( \ + (BB_BANK0(bank_id) | BB_BANK2(wr_bank_id) | BB_ITER(iter)), 0, \ + BB_MXFP_FUNC7) + +// Extended version: +// rs2 carries user-defined special field. +// Useful later for format select / rounding mode / debug flags. +#define bb_mxfp_ex(bank_id, wr_bank_id, iter, special) \ + BUCKYBALL_INSTRUCTION_R_R( \ + (BB_BANK0(bank_id) | BB_BANK2(wr_bank_id) | BB_ITER(iter)), (special), \ + BB_MXFP_FUNC7) + +#endif // _BB_MXFP_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/64_mul_warp16.c b/bb-tests/workloads/lib/bbhw/isa/64_mul_warp16.c new file mode 100644 index 00000000..2f732742 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/64_mul_warp16.c @@ -0,0 +1,13 @@ +#ifndef _BB_MUL_WARP16_H_ +#define _BB_MUL_WARP16_H_ + +#include "isa.h" + +#define BB_MUL_WARP16_FUNC7 64 + +#define bb_mul_warp16(op1_bank_id, op2_bank_id, wr_bank_id, iter, mode) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(op1_bank_id) | BB_BANK1(op2_bank_id) | \ + BB_BANK2(wr_bank_id) | BB_ITER(iter)), \ + (FIELD(mode, 0, 63)), BB_MUL_WARP16_FUNC7) + +#endif // _BB_MUL_WARP16_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/65_bfp.c b/bb-tests/workloads/lib/bbhw/isa/65_bfp.c new file mode 100644 index 00000000..fdde6a91 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/65_bfp.c @@ -0,0 +1,13 @@ +#ifndef _BB_BFP_H_ +#define _BB_BFP_H_ + +#include "isa.h" + +#define BB_BFP_FUNC7 65 + +#define bb_BFP(op1_bank_id, op2_bank_id, wr_bank_id, iter, mode) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(op1_bank_id) | BB_BANK1(op2_bank_id) | \ + BB_BANK2(wr_bank_id) | BB_ITER(iter)), \ + (FIELD(mode, 0, 63)), BB_BFP_FUNC7) + +#endif // _BB_BFP_H_ diff --git 
a/bb-tests/workloads/lib/bbhw/isa/66_gemmini_compute_preloaded.c b/bb-tests/workloads/lib/bbhw/isa/66_gemmini_compute_preloaded.c new file mode 100644 index 00000000..6bd6fa6a --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/66_gemmini_compute_preloaded.c @@ -0,0 +1,19 @@ +#ifndef _BB_GEMMINI_COMPUTE_PRELOADED_H_ +#define _BB_GEMMINI_COMPUTE_PRELOADED_H_ + +#include "isa.h" + +#define BB_GEMMINI_COMPUTE_PRELOADED_FUNC7 66 + +// Compute matmul using preloaded data: C = A * B + D(preloaded) +// op1_bank_id: bank for A matrix +// op2_bank_id: bank for B matrix (OS) or D matrix (WS) +// wr_bank_id: bank for C output +// iter: number of rows +#define bb_gemmini_compute_preloaded(op1_bank_id, op2_bank_id, wr_bank_id, \ + iter) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(op1_bank_id) | BB_BANK1(op2_bank_id) | \ + BB_BANK2(wr_bank_id) | BB_ITER(iter)), \ + 0, BB_GEMMINI_COMPUTE_PRELOADED_FUNC7) + +#endif // _BB_GEMMINI_COMPUTE_PRELOADED_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/67_gemmini_compute_accumulated.c b/bb-tests/workloads/lib/bbhw/isa/67_gemmini_compute_accumulated.c new file mode 100644 index 00000000..93239d52 --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/isa/67_gemmini_compute_accumulated.c @@ -0,0 +1,19 @@ +#ifndef _BB_GEMMINI_COMPUTE_ACCUMULATED_H_ +#define _BB_GEMMINI_COMPUTE_ACCUMULATED_H_ + +#include "isa.h" + +#define BB_GEMMINI_COMPUTE_ACCUMULATED_FUNC7 67 + +// Compute matmul reusing previously accumulated results +// op1_bank_id: bank for A matrix +// op2_bank_id: bank for B/D matrix +// wr_bank_id: bank for C output +// iter: number of rows +#define bb_gemmini_compute_accumulated(op1_bank_id, op2_bank_id, wr_bank_id, \ + iter) \ + BUCKYBALL_INSTRUCTION_R_R((BB_BANK0(op1_bank_id) | BB_BANK1(op2_bank_id) | \ + BB_BANK2(wr_bank_id) | BB_ITER(iter)), \ + 0, BB_GEMMINI_COMPUTE_ACCUMULATED_FUNC7) + +#endif // _BB_GEMMINI_COMPUTE_ACCUMULATED_H_ diff --git a/bb-tests/workloads/lib/bbhw/isa/7_flush.c b/bb-tests/workloads/lib/bbhw/isa/7_flush.c deleted 
file mode 100644
index 4d37a88d..00000000
--- a/bb-tests/workloads/lib/bbhw/isa/7_flush.c
+++ /dev/null
@@ -1,14 +0,0 @@
-#include "isa.h"
-
-// =========================== for CTest ===========================
-// FLUSH instruction low-level implementation
-#ifndef __x86_64__
-#define FLUSH_RAW() \
-  asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 7, x0, x0, x0" ::: "memory")
-#else
-// Do not execute RISC-V instructions on x86 platform
-#define FLUSH_RAW()
-#endif
-
-// FLUSH instruction high-level API implementation
-void bb_flush(void) { FLUSH_RAW(); }
diff --git a/bb-tests/workloads/lib/bbhw/isa/87_gemmini_loop_ws.c b/bb-tests/workloads/lib/bbhw/isa/87_gemmini_loop_ws.c
new file mode 100644
index 00000000..3b99ec94
--- /dev/null
+++ b/bb-tests/workloads/lib/bbhw/isa/87_gemmini_loop_ws.c
@@ -0,0 +1,52 @@
+#ifndef _BB_GEMMINI_LOOP_WS_H
+#define _BB_GEMMINI_LOOP_WS_H
+
+#include "isa.h"
+
+#define BB_GEMMINI_LOOP_WS_CONFIG_BOUNDS_FUNC7 80
+#define BB_GEMMINI_LOOP_WS_CONFIG_ADDR_A_FUNC7 81
+#define BB_GEMMINI_LOOP_WS_CONFIG_ADDR_B_FUNC7 82
+#define BB_GEMMINI_LOOP_WS_CONFIG_ADDR_D_FUNC7 83
+#define BB_GEMMINI_LOOP_WS_CONFIG_ADDR_C_FUNC7 84
+#define BB_GEMMINI_LOOP_WS_CONFIG_STRIDES_AB_FUNC7 85
+#define BB_GEMMINI_LOOP_WS_CONFIG_STRIDES_DC_FUNC7 86
+#define BB_GEMMINI_LOOP_WS_FUNC7 87
+
+#define bb_gemmini_loop_ws_config_bounds(max_i, max_j, max_k) \
+  BUCKYBALL_INSTRUCTION_R_R( \
+      0, (FIELD(max_k, 0, 15) | FIELD(max_j, 16, 31) | FIELD(max_i, 32, 47)), \
+      BB_GEMMINI_LOOP_WS_CONFIG_BOUNDS_FUNC7)
+
+#define bb_gemmini_loop_ws_config_addr_a(addr) \
+  BUCKYBALL_INSTRUCTION_R_R(0, FIELD(addr, 0, 38), \
+                            BB_GEMMINI_LOOP_WS_CONFIG_ADDR_A_FUNC7)
+
+#define bb_gemmini_loop_ws_config_addr_b(addr) \
+  BUCKYBALL_INSTRUCTION_R_R(0, FIELD(addr, 0, 38), \
+                            BB_GEMMINI_LOOP_WS_CONFIG_ADDR_B_FUNC7)
+
+#define bb_gemmini_loop_ws_config_addr_d(addr) \
+  BUCKYBALL_INSTRUCTION_R_R(0, FIELD(addr, 0, 38), \
+                            BB_GEMMINI_LOOP_WS_CONFIG_ADDR_D_FUNC7)
+
+#define
bb_gemmini_loop_ws_config_addr_c(addr) \
+  BUCKYBALL_INSTRUCTION_R_R(0, FIELD(addr, 0, 38), \
+                            BB_GEMMINI_LOOP_WS_CONFIG_ADDR_C_FUNC7)
+
+#define bb_gemmini_loop_ws_config_strides_ab(stride_a, stride_b) \
+  BUCKYBALL_INSTRUCTION_R_R( \
+      0, (FIELD(stride_a, 0, 31) | FIELD(stride_b, 32, 63)), \
+      BB_GEMMINI_LOOP_WS_CONFIG_STRIDES_AB_FUNC7)
+
+#define bb_gemmini_loop_ws_config_strides_dc(stride_d, stride_c) \
+  BUCKYBALL_INSTRUCTION_R_R( \
+      0, (FIELD(stride_d, 0, 31) | FIELD(stride_c, 32, 63)), \
+      BB_GEMMINI_LOOP_WS_CONFIG_STRIDES_DC_FUNC7)
+
+#define bb_gemmini_loop_ws(bank_a, bank_b, bank_c, low_d) \
+  BUCKYBALL_INSTRUCTION_R_R(0, \
+                            (FIELD(bank_a, 0, 9) | FIELD(bank_b, 10, 19) | \
+                             FIELD(bank_c, 20, 29) | FIELD(low_d, 30, 30)), \
+                            BB_GEMMINI_LOOP_WS_FUNC7)
+
+#endif
diff --git a/bb-tests/workloads/lib/bbhw/isa/CMakeLists.txt b/bb-tests/workloads/lib/bbhw/isa/CMakeLists.txt
deleted file mode 100644
index 33122850..00000000
--- a/bb-tests/workloads/lib/bbhw/isa/CMakeLists.txt
+++ /dev/null
@@ -1,167 +0,0 @@
-# ISA submodule library
-
-set(ISA_SOURCES
-  isa.c
-  24_mvin.c
-  25_mvout.c
-  31_fence.c
-  32_mul_warp16.c
-  26_bbfp_mul.c
-  27_matmul_ws.c
-  33_im2col.c
-  34_transpose.c
-  38_relu.c
-  39_bbus_config.c
-  40_nnlut.c
-  41_snn.c
-  42_abft_systolic.c
-  43_conv.c
-  44_cim.c
-  45_transfer.c
-  7_flush.c
-)
-
-# 1. Linux version
-add_library(bbisa-linux STATIC IMPORTED)
-set_target_properties(bbisa-linux PROPERTIES IMPORTED_LOCATION ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-linux.a)
-
-add_custom_command(
-  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-linux.a
-  COMMAND riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/isa.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-isa.o
-    && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/24_mvin.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/..
-o linux-24_mvin.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/25_mvout.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-25_mvout.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/31_fence.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-31_fence.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/32_mul_warp16.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-32_mul_warp16.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/26_bbfp_mul.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-26_bbfp_mul.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/27_matmul_ws.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-27_matmul_ws.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/33_im2col.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-33_im2col.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/34_transpose.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-34_transpose.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/38_relu.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-38_relu.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/39_bbus_config.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-39_bbus_config.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/40_nnlut.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-40_nnlut.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/41_snn.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. 
-o linux-41_snn.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/42_abft_systolic.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-42_abft_systolic.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/43_conv.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-43_conv.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/44_cim.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-44_cim.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/45_transfer.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-45_transfer.o - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/7_flush.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-7_flush.o - && ar rcs ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-linux.a - linux-isa.o - linux-24_mvin.o - linux-25_mvout.o - linux-31_fence.o - linux-32_mul_warp16.o - linux-26_bbfp_mul.o - linux-27_matmul_ws.o - linux-33_im2col.o - linux-34_transpose.o - linux-38_relu.o - linux-39_bbus_config.o - linux-40_nnlut.o - linux-41_snn.o - linux-42_abft_systolic.o - linux-43_conv.o - linux-44_cim.o - linux-45_transfer.o - linux-7_flush.o - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Building RISC-V Linux version of ISA library" -) -add_custom_target(bbisa-linux-build ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-linux.a) - -# 2. 
Baremetal version -add_library(bbisa-baremetal STATIC IMPORTED) -set_target_properties(bbisa-baremetal PROPERTIES IMPORTED_LOCATION ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-baremetal.a) - -add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-baremetal.a - COMMAND riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/isa.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-isa.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/24_mvin.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-24_mvin.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/25_mvout.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-25_mvout.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/31_fence.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-31_fence.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/32_mul_warp16.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-32_mul_warp16.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/26_bbfp_mul.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. 
-o baremetal-26_bbfp_mul.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/27_matmul_ws.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-27_matmul_ws.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/33_im2col.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-33_im2col.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/34_transpose.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-34_transpose.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/38_relu.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-38_relu.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/39_bbus_config.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-39_bbus_config.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/40_nnlut.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-40_nnlut.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/41_snn.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. 
-o baremetal-41_snn.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/42_abft_systolic.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-42_abft_systolic.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/43_conv.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-43_conv.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/44_cim.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-44_cim.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/45_transfer.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-45_transfer.o - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/7_flush.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. 
-o baremetal-7_flush.o - && ar rcs ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-baremetal.a - baremetal-isa.o - baremetal-24_mvin.o - baremetal-25_mvout.o - baremetal-31_fence.o - baremetal-32_mul_warp16.o - baremetal-26_bbfp_mul.o - baremetal-27_matmul_ws.o - baremetal-33_im2col.o - baremetal-34_transpose.o - baremetal-38_relu.o - baremetal-39_bbus_config.o - baremetal-40_nnlut.o - baremetal-41_snn.o - baremetal-42_abft_systolic.o - baremetal-43_conv.o - baremetal-44_cim.o - baremetal-45_transfer.o - baremetal-7_flush.o - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Building RISC-V Baremetal version of ISA library" -) -add_custom_target(bbisa-baremetal-build ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-baremetal.a) - -# 3. x86 version -add_library(bbisa-x86 STATIC IMPORTED) -set_target_properties(bbisa-x86 PROPERTIES IMPORTED_LOCATION ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-x86.a) - -add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-x86.a - COMMAND gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/isa.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-isa.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/24_mvin.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-24_mvin.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/25_mvout.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-25_mvout.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/31_fence.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-31_fence.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/32_mul_warp16.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-32_mul_warp16.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/26_bbfp_mul.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. 
-o x86-26_bbfp_mul.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/27_matmul_ws.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-27_matmul_ws.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/33_im2col.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-33_im2col.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/34_transpose.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-34_transpose.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/38_relu.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-38_relu.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/39_bbus_config.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-39_bbus_config.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/40_nnlut.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-40_nnlut.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/41_snn.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-41_snn.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/42_abft_systolic.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-42_abft_systolic.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/43_conv.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-43_conv.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/44_cim.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-44_cim.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/45_transfer.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-45_transfer.o - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/7_flush.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. 
-o x86-7_flush.o - && ar rcs ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-x86.a - x86-isa.o - x86-24_mvin.o - x86-25_mvout.o - x86-31_fence.o - x86-32_mul_warp16.o - x86-26_bbfp_mul.o - x86-27_matmul_ws.o - x86-33_im2col.o - x86-34_transpose.o - x86-38_relu.o - x86-39_bbus_config.o - x86-40_nnlut.o - x86-41_snn.o - x86-42_abft_systolic.o - x86-43_conv.o - x86-44_cim.o - x86-45_transfer.o - x86-7_flush.o - && ranlib ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-x86.a - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Building x86_64 version of ISA library" -) -add_custom_target(bbisa-x86-build ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbisa-x86.a) diff --git a/bb-tests/workloads/lib/bbhw/isa/README.md b/bb-tests/workloads/lib/bbhw/isa/README.md deleted file mode 100644 index f7e15684..00000000 --- a/bb-tests/workloads/lib/bbhw/isa/README.md +++ /dev/null @@ -1,79 +0,0 @@ -# Buckyball ISA Instruction Registration Mechanism - -## Overview - -This ISA library now supports a dynamic instruction registration mechanism, allowing you to easily add new custom instructions without modifying core code. - -## How to Add New Instructions - -### 1. Define New Instruction Type in Header File - -Add new instructions to the `InstructionType` enum in `isa.h`: - -```c -typedef enum { - // Existing instructions... - CUSTOM_ADD_FUNC7 = 50, // New custom addition instruction - CUSTOM_SUB_FUNC7 = 51, // New custom subtraction instruction -} InstructionType; -``` - -### 2. Implement Instruction Execution Function - -Create a new .c file or add to existing file: - -```c -#ifndef __x86_64__ -static void execute_custom_add(uint32_t rs1_val, uint32_t rs2_val) { - asm volatile(".insn r " STR(CUSTOM_3) ", 0x3, 50, x0, %0, %1" - : : "r"(rs1_val), "r"(rs2_val) : "memory"); -} -#else -static void execute_custom_add_nop(uint32_t rs1_val, uint32_t rs2_val) { - // No-op implementation on x86 platform -} -#endif -``` - -### 3. 
Register Instruction - -Register new instruction in initialization code: - -```c -void init_custom_instructions(void) { -#ifndef __x86_64__ - register_instruction(CUSTOM_ADD_FUNC7, execute_custom_add); -#else - register_instruction(CUSTOM_ADD_FUNC7, execute_custom_add_nop); -#endif -} -``` - -### 4. Use Instruction - -```c -BuckyballInstruction inst = build_instruction(CUSTOM_ADD_FUNC7); -InstructionBuilder builder = create_builder(&inst, CUSTOM_ADD_FUNC7); - -// Set parameters -builder.set.rs1(builder.set.builder_ptr, "field_name", value1); -builder.set.rs2(builder.set.builder_ptr, "field_name", value2); - -// Execute instruction -execute_builder(builder); -``` - -## Advantages - -1. **Modularity**: New instructions can be implemented in separate files -2. **Extensibility**: Add new instructions without modifying core code -3. **Platform Compatibility**: Automatically handles differences between x86 and RISC-V platforms -4. **Type Safety**: Compile-time checking of instruction types -5. 
**Performance**: Minimal runtime lookup overhead - -## Notes - -- func7 values must be in the range 0-127 -- Each instruction type can only be registered once -- Supports up to 16 instructions (can be adjusted by modifying MAX_INSTRUCTIONS) -- On RISC-V platforms, func7 values in inline assembly must be compile-time constants diff --git a/bb-tests/workloads/lib/bbhw/isa/isa.c b/bb-tests/workloads/lib/bbhw/isa/isa.c index 065ec429..28315640 100644 --- a/bb-tests/workloads/lib/bbhw/isa/isa.c +++ b/bb-tests/workloads/lib/bbhw/isa/isa.c @@ -1,92 +1 @@ #include "isa.h" -#include - -// =========================== for simulator =========================== -uint32_t get_bbinst_field(uint64_t value, const char *field_name, - const BitFieldConfig *config) { - for (int i = 0; config[i].name != NULL; i++) { - if (strcmp(config[i].name, field_name) == 0) { - uint32_t bit_width = config[i].end_bit - config[i].start_bit + 1; - uint64_t mask = ((1ULL << bit_width) - 1); - return (value >> config[i].start_bit) & mask; - } - } - // Field not found - return 0; -} - -void set_bbinst_field(uint64_t *value, const char *field_name, - uint32_t field_value, const BitFieldConfig *config) { - for (int i = 0; config[i].name != NULL; i++) { - if (strcmp(config[i].name, field_name) == 0) { - uint32_t bit_width = config[i].end_bit - config[i].start_bit + 1; - uint64_t mask = ((1ULL << bit_width) - 1); - // Clear original value - *value &= ~(mask << config[i].start_bit); - // Set new value - *value |= ((uint64_t)(field_value & mask) << config[i].start_bit); - return; - } - } -} - -// External configuration declarations - defined in individual instruction files -extern const InstructionConfig mvin_config; -extern const InstructionConfig mvout_config; -extern const InstructionConfig mul_warp16_config; -extern const InstructionConfig bbfp_mul_config; -extern const InstructionConfig matmul_ws_config; -extern const InstructionConfig im2col_config; -extern const InstructionConfig transpose_config; 
-extern const InstructionConfig relu_config; -extern const InstructionConfig bbus_config_config; -extern const InstructionConfig nnlut_config; -extern const InstructionConfig snn_config; -extern const InstructionConfig abft_systolic_config; -extern const InstructionConfig conv_config; -extern const InstructionConfig cim_config; -extern const InstructionConfig transfer_config; - -// Get instruction configuration by func7 -const InstructionConfig *config(InstructionType func7) { - switch (func7) { - case MVIN_FUNC7: - return &mvin_config; - case MVOUT_FUNC7: - return &mvout_config; - case MUL_WARP16_FUNC7: - return &mul_warp16_config; - case BBFP_MUL_FUNC7: - return &bbfp_mul_config; - case MATMUL_WS_FUNC7: - return &matmul_ws_config; - case IM2COL_FUNC7: - return &im2col_config; - case TRANSPOSE_FUNC7: - return &transpose_config; - case RELU_FUNC7: - return &relu_config; - case BBUS_CONFIG_FUNC7: - return &bbus_config_config; - case NNLUT_FUNC7: - return &nnlut_config; - case SNN_FUNC7: - return &snn_config; - case ABFT_SYSTOLIC_FUNC7: - return &abft_systolic_config; - case CONV_FUNC7: - return &conv_config; - case CIM_FUNC7: - return &cim_config; - case TRANSFER_FUNC7: - return &transfer_config; - case FENCE_FUNC7: - // FENCE instruction has no parameters, no configuration needed - return NULL; - case FLUSH_FUNC7: - // FLUSH instruction has no parameters, no configuration needed - return NULL; - default: - return NULL; - } -} diff --git a/bb-tests/workloads/lib/bbhw/isa/isa.h b/bb-tests/workloads/lib/bbhw/isa/isa.h index a3a17af1..d628c2dd 100644 --- a/bb-tests/workloads/lib/bbhw/isa/isa.h +++ b/bb-tests/workloads/lib/bbhw/isa/isa.h @@ -4,103 +4,64 @@ #include #include -/* Pure C implementation - no C++ linkage needed */ - // Data type for matrix elements typedef int8_t elem_t; typedef int32_t result_t; // Custom instruction opcodes #define CUSTOM_3 0x7b -// String macros (from xcustom.h) + +// String macros #define STR1(x) #x #ifndef STR #define STR(x) STR1(x) 
#endif -// Generic field encoding macro -#define ENCODE_FIELD(value, start_bit, width) \ - (((value) & ((1ULL << (width)) - 1)) << (start_bit)) - -// Bit field configuration structure -typedef struct { - const char *name; // Field name (NULL indicates end of array) - uint32_t start_bit; // Start bit - uint32_t end_bit; // End bit (inclusive) -} BitFieldConfig; - -// Instruction type enum - directly uses func7 values -typedef enum { - MVIN_FUNC7 = 24, // 0x18 - Move in function code - MVOUT_FUNC7 = 25, // 0x19 - Move out function code - FENCE_FUNC7 = 31, // 0x1F - Fence function code - MUL_WARP16_FUNC7 = 32, // 0x20 - Matrix multiply function code - IM2COL_FUNC7 = 33, // 0x21 - Matrix im2col function code - TRANSPOSE_FUNC7 = 34, // 0x22 - Matrix transpose function code - FLUSH_FUNC7 = 7, // 0x07 - Flush function code - BBFP_MUL_FUNC7 = 26, // 0x1A - BBFP matrix multiply function code - MATMUL_WS_FUNC7 = 27, // 0x1B - Matrix multiply with warp16 function code - RELU_FUNC7 = 38, // 0x26 - ReLU activation function code - BBUS_CONFIG_FUNC7 = 39, // 0x27 - BBUS configuration function code - NNLUT_FUNC7 = 40, // 0x28 - NN-LUT lookup function code - SNN_FUNC7 = 41, // 0x29 - SNN spiking neural network function code - ABFT_SYSTOLIC_FUNC7 = 42, // 0x2A - ABFT systolic array function code - CONV_FUNC7 = 43, // 0x2B - CONV convolution function code - CIM_FUNC7 = 44, // 0x2C - CIM compute-in-memory function code - TRANSFER_FUNC7 = 45 // 0x2D - Transfer function code -} InstructionType; - -// Instruction configuration structure (for simulator) -typedef struct { - // Field configuration for rs1 register (terminated by NULL name) - const BitFieldConfig *rs1_fields; - // Field configuration for rs2 register (terminated by NULL name) - const BitFieldConfig *rs2_fields; -} InstructionConfig; - -// Generic field access functions (for simulator) -uint32_t get_bbinst_field(uint64_t value, const char *field_name, - const BitFieldConfig *config); -void set_bbinst_field(uint64_t *value, 
const char *field_name, - uint32_t field_value, const BitFieldConfig *config); +// Field encoding macro with start and end bit +#define FIELD(val, start_bit, end_bit) \ + (((val) & ((2UL << ((end_bit) - (start_bit))) - 1)) << (start_bit)) -// High-level API (for CTest) +// rs1 bank field helpers (10-bit each) +#define BB_BANK0(id) FIELD(id, 0, 9) +#define BB_BANK1(id) FIELD(id, 10, 19) +#define BB_BANK2(id) FIELD(id, 20, 29) -void bb_mvin(uint64_t mem_addr, uint32_t sp_addr, uint32_t iter, - uint32_t col_stride); -void bb_mvout(uint64_t mem_addr, uint32_t sp_addr, uint32_t iter, - uint32_t stride); -void bb_fence(void); -void bb_mul_warp16(uint32_t op1_addr, uint32_t op2_addr, uint32_t wr_addr, - uint32_t iter, uint32_t mode); -void bb_bbfp_mul(uint32_t op1_addr, uint32_t op2_addr, uint32_t wr_addr, - uint32_t iter); -void bb_matmul_ws(uint32_t op1_addr, uint32_t op2_addr, uint32_t wr_addr, - uint32_t iter); -void bb_im2col(uint32_t op1_addr, uint32_t wr_addr, uint32_t krow, - uint32_t kcol, uint32_t inrow, uint32_t incol, uint32_t startrow, - uint32_t startcol); -void bb_transpose(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter, - uint32_t mode); -void bb_relu(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter); -void bb_bbus_config(uint32_t src_bid, uint32_t dst_bid, uint64_t enable); -void bb_nnlut(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter); -void bb_snn(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter, - uint32_t threshold, uint32_t leak_factor); -void bb_abft_systolic(uint32_t op1_addr, uint32_t op2_addr, uint32_t wr_addr, - uint32_t iter); -void bb_conv(uint32_t ifmap_addr, uint32_t weight_addr, uint32_t ofmap_addr, - uint32_t iter, uint32_t in_height, uint32_t in_width, - uint32_t kernel_h, uint32_t kernel_w); -void bb_cim(uint32_t op1_addr, uint32_t op2_addr, uint32_t result_addr, - uint32_t iter, uint32_t rows, uint32_t cols, uint32_t op_type); -void bb_transfer(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter); +// rs1 iter field (34-bit, 
bits 30-63) +#define BB_ITER(n) FIELD(n, 30, 63) -void bb_flush(void); +// funct7 encoding: [6:4]=enable, [3:0]=opcode +// enable: 000=none, 001=1rd, 010=1wr, 011=1rd+1wr, 100=2rd+1wr +// 101/110/111 = none (extended opcode space) -// Get instruction configuration by func7 -const InstructionConfig *config(InstructionType func7); +// Generic RISC-V custom instruction macro (funct3 always 0x3 = CUSTOM3_RS1_RS2) +#define BUCKYBALL_INSTRUCTION_R_R(rs1_val, rs2_val, func7) \ + asm volatile(".insn r " STR(CUSTOM_3) ", 3, %c2, x0, %0, %1" \ + : \ + : "r"(rs1_val), "r"(rs2_val), "i"(func7) \ + : "memory") -/* End of pure C header */ +// Include all instruction definitions +#include "00_fence.c" +#include "01_barrier.c" +#include "02_gemmini_config.c" +#include "03_gemmini_flush.c" +#include "04_bdb_counter.c" +#include "105_gemmini_loop_conv_ws.c" +#include "16_mvout.c" +#include "32_mset.c" +#include "33_mvin.c" +#include "48_im2col.c" +#include "49_transpose.c" +#include "50_relu.c" +#include "51_quant.c" +#include "52_dequant.c" +#include "53_gemmini_preload.c" +#include "54_bdb_backdoor.c" +#include "55_mxfp.c" +#include "64_mul_warp16.c" +#include "65_bfp.c" +#include "66_gemmini_compute_preloaded.c" +#include "67_gemmini_compute_accumulated.c" +#include "87_gemmini_loop_ws.c" #endif // BUCKYBALL_ISA_H diff --git a/bb-tests/workloads/lib/bbhw/mem/CMakeLists.txt b/bb-tests/workloads/lib/bbhw/mem/CMakeLists.txt deleted file mode 100644 index 983452b2..00000000 --- a/bb-tests/workloads/lib/bbhw/mem/CMakeLists.txt +++ /dev/null @@ -1,42 +0,0 @@ -# MEM submodule library - -set(MEM_SOURCES - bank.c - spad.c -) - -# 1. 
Linux version -add_library(bbmem-linux STATIC IMPORTED) -set_target_properties(bbmem-linux PROPERTIES IMPORTED_LOCATION ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-linux.a) - -add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-linux.a - COMMAND riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/bank.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-bank.o && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/spad.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-spad.o && ar rcs ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-linux.a linux-bank.o linux-spad.o - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Building RISC-V Linux version of MEM library" -) -add_custom_target(bbmem-linux-build ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-linux.a) - -# 2. Baremetal version -add_library(bbmem-baremetal STATIC IMPORTED) -set_target_properties(bbmem-baremetal PROPERTIES IMPORTED_LOCATION ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-baremetal.a) - -add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-baremetal.a - COMMAND riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/bank.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-bank.o && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/spad.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-spad.o && ar rcs ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-baremetal.a baremetal-bank.o baremetal-spad.o - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Building RISC-V Baremetal version of MEM library" -) -add_custom_target(bbmem-baremetal-build ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-baremetal.a) - -# 3. 
x86 version -add_library(bbmem-x86 STATIC IMPORTED) -set_target_properties(bbmem-x86 PROPERTIES IMPORTED_LOCATION ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-x86.a) - -add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-x86.a - COMMAND gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/bank.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-bank.o && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/spad.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-spad.o && ar rcs ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-x86.a x86-bank.o x86-spad.o && ranlib ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-x86.a - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - COMMENT "Building x86_64 version of MEM library" -) -add_custom_target(bbmem-x86-build ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/libbbmem-x86.a) diff --git a/bb-tests/workloads/lib/bbhw/mem/bank.c b/bb-tests/workloads/lib/bbhw/mem/bank.c deleted file mode 100644 index 109bcb8c..00000000 --- a/bb-tests/workloads/lib/bbhw/mem/bank.c +++ /dev/null @@ -1,19 +0,0 @@ -#include "bank.h" - -const BankConfig bank_configs[BANK_NUM] = { - // SPAD area: bank 0-3, address 0-16383 - // bank 0: SPAD0, rows 0-4095 - {"spad0", 0, 4096, 16, 8}, - // bank 1: SPAD1, rows 4096-8191 - {"spad1", 4096, 4096, 16, 8}, - // bank 2: SPAD2, rows 8192-12287 - {"spad2", 8192, 4096, 16, 8}, - // bank 3: SPAD3, rows 12288-16383 - {"spad3", 12288, 4096, 16, 8}, - - // ACC area: bank 4+, starting at address 16384 - // (each ACC bank actually has 1024 rows, but address is aligned to 4096) - // bank 4: ACC0, rows 16384-17407 (uses first 1024 rows), accumulator uses - // 32 bits - {"acc0", 16384, 1024, 16, 32}, -}; diff --git a/bb-tests/workloads/lib/bbhw/mem/bank.h b/bb-tests/workloads/lib/bbhw/mem/bank.h deleted file mode 100644 index 8b313a1e..00000000 --- a/bb-tests/workloads/lib/bbhw/mem/bank.h +++ /dev/null @@ -1,55 +0,0 @@ -#ifndef _BANK_H_ -#define _BANK_H_ - -#include - -/* Pure C implementation - no C++ 
linkage needed */ - -typedef struct { - const char *name_; - uint32_t base_addr_; - uint32_t row_num_; - uint32_t elem_num_; - uint32_t elem_width_; -} BankConfig; - -static inline const char *bank_name(const BankConfig *config) { - return config->name_; -} -static inline uint32_t bank_base_addr(const BankConfig *config) { - return config->base_addr_; -} -static inline uint32_t bank_row_num(const BankConfig *config) { - return config->row_num_; -} -static inline uint32_t bank_elem_num(const BankConfig *config) { - return config->elem_num_; -} -static inline uint32_t bank_elem_width(const BankConfig *config) { - return config->elem_width_; -} - -static inline uint32_t bank_row_width(const BankConfig *config) { - return config->elem_num_ * config->elem_width_; -} - -static inline uint32_t bank_row_bytes(const BankConfig *config) { - return bank_row_width(config) / 8; -} - -static inline uint32_t bank_total_size(const BankConfig *config) { - return config->row_num_ * bank_row_bytes(config); -} - -static inline uint32_t bank_addr(const BankConfig *config, uint32_t row) { - return config->base_addr_ + row; -} - -#define BANK_NUM 5 -#define DIM 16 - -extern const BankConfig bank_configs[BANK_NUM]; - -/* End of pure C header */ - -#endif // _BANK_H_ diff --git a/bb-tests/workloads/lib/bbhw/mem/dma.h b/bb-tests/workloads/lib/bbhw/mem/dma.h deleted file mode 100644 index 8203b6b7..00000000 --- a/bb-tests/workloads/lib/bbhw/mem/dma.h +++ /dev/null @@ -1,19 +0,0 @@ -#ifndef _DMA_H_ -#define _DMA_H_ - -#include - -#ifdef __cplusplus -extern "C" { -#endif - -#define DMA_BANDWIDTH 128 - -// Get DMA row width (bytes) -static inline uint32_t dma_row_bytes() { return DMA_BANDWIDTH / 8; } - -#ifdef __cplusplus -} -#endif - -#endif // _DMA_H_ diff --git a/bb-tests/workloads/lib/bbhw/mem/mem.h b/bb-tests/workloads/lib/bbhw/mem/mem.h new file mode 100644 index 00000000..5c40881a --- /dev/null +++ b/bb-tests/workloads/lib/bbhw/mem/mem.h @@ -0,0 +1,6 @@ +#ifndef _MEM_H_ +#define _MEM_H_ 
+
+#include "params.h"
+
+#endif // _MEM_H_
diff --git a/bb-tests/workloads/lib/bbhw/mem/params.h b/bb-tests/workloads/lib/bbhw/mem/params.h
new file mode 100644
index 00000000..4d6e7986
--- /dev/null
+++ b/bb-tests/workloads/lib/bbhw/mem/params.h
@@ -0,0 +1,11 @@
+#ifndef _PARAMS_H_
+#define _PARAMS_H_
+
+// number of virtual banks
+#define BANK_NUM 16
+// bank width in bits
+#define BANK_WIDTH 128
+// number of lines in a bank (16KB)
+#define BANK_LINES 1024
+
+#endif // _PARAMS_H_
diff --git a/bb-tests/workloads/lib/bbhw/mem/spad.c b/bb-tests/workloads/lib/bbhw/mem/spad.c
deleted file mode 100644
index 437fbe1d..00000000
--- a/bb-tests/workloads/lib/bbhw/mem/spad.c
+++ /dev/null
@@ -1,60 +0,0 @@
-#include "spad.h"
-
-/**
- * SPAD (Scratchpad) memory address management library
- *
- * Provides conversion from bank and row to physical address
- */
-
-// Runtime interface - pure C implementation
-uint32_t spad_addr(uint32_t bank, uint32_t row) {
-  return (bank < BANK_NUM) ? bank_addr(&bank_configs[bank], row) : -1;
-}
-
-// Get bank number from spad address
-uint32_t spad_get_bank(uint32_t addr) {
-  for (uint32_t bank = 0; bank < BANK_NUM; bank++) {
-    uint32_t base = bank_configs[bank].base_addr_;
-    uint32_t size = bank_configs[bank].row_num_;
-    if (addr >= base && addr < base + size) {
-      return bank;
-    }
-  }
-  // Bank not found
-  return -1;
-}
-
-// Get offset (row) within bank from spad address
-uint32_t spad_get_offset(uint32_t addr) {
-  for (uint32_t bank = 0; bank < BANK_NUM; bank++) {
-    uint32_t base = bank_configs[bank].base_addr_;
-    uint32_t size = bank_configs[bank].row_num_;
-    if (addr >= base && addr < base + size) {
-      return addr - base;
-    }
-  }
-  // Bank not found
-  return -1;
-}
-
-// Get both bank and offset from spad address
-void spad_get_bank_offset(uint32_t addr, uint32_t *bank, uint32_t *offset) {
-  for (uint32_t i = 0; i < BANK_NUM; i++) {
-    uint32_t base = bank_configs[i].base_addr_;
-    uint32_t size = bank_configs[i].row_num_;
-    if (addr >= 
base && addr < base + size) { - *bank = i; - *offset = addr - base; - return; - } - } - *bank = -1; - *offset = -1; -} - -// Get row width (bytes) of specified bank -uint32_t spad_get_bank_row_bytes(uint32_t bank) { - if (bank >= BANK_NUM) - return 0; - return bank_configs[bank].elem_num_ * bank_configs[bank].elem_width_ / 8; -} diff --git a/bb-tests/workloads/lib/bbhw/mem/spad.h b/bb-tests/workloads/lib/bbhw/mem/spad.h deleted file mode 100644 index 82015d73..00000000 --- a/bb-tests/workloads/lib/bbhw/mem/spad.h +++ /dev/null @@ -1,17 +0,0 @@ -#ifndef SPAD_H -#define SPAD_H - -#include "bank.h" -#include - -/* Pure C implementation - no C++ linkage needed */ - -uint32_t spad_addr(uint32_t bank, uint32_t row); -uint32_t spad_get_bank(uint32_t addr); -uint32_t spad_get_offset(uint32_t addr); -void spad_get_bank_offset(uint32_t addr, uint32_t *bank, uint32_t *offset); -uint32_t spad_get_bank_row_bytes(uint32_t bank); - -/* End of pure C header */ - -#endif // SPAD_H diff --git a/bb-tests/workloads/src/CMakeLists.txt b/bb-tests/workloads/src/CMakeLists.txt index 950d9a2f..5ca8cd9b 100644 --- a/bb-tests/workloads/src/CMakeLists.txt +++ b/bb-tests/workloads/src/CMakeLists.txt @@ -1,10 +1,9 @@ set(CTEST_WORKLOAD_DIR ${WORKLOAD_SRC_DIR}/CTest) -set(SPARSE_WORKLOAD_DIR ${WORKLOAD_SRC_DIR}/sparse) +set(CTEST_TOY_WORKLOAD_DIR ${CTEST_WORKLOAD_DIR}/toy) set(MODELTEST_WORKLOAD_DIR ${WORKLOAD_SRC_DIR}/ModelTest) set(OPTEST_WORKLOAD_DIR ${WORKLOAD_SRC_DIR}/OpTest) set(TUTORIAL_WORKLOAD_DIR ${WORKLOAD_SRC_DIR}/tutorial) -# add_subdirectory(sparse) add_subdirectory(CTest) add_subdirectory(ModelTest) add_subdirectory(OpTest) diff --git a/bb-tests/workloads/src/CTest/CMakeLists.txt b/bb-tests/workloads/src/CTest/CMakeLists.txt index 0a37b932..65ce0354 100644 --- a/bb-tests/workloads/src/CTest/CMakeLists.txt +++ b/bb-tests/workloads/src/CTest/CMakeLists.txt @@ -1,7 +1,4 @@ -set(CTEST_GEMMINI_WORKLOAD_DIR ${CTEST_WORKLOAD_DIR}/gemmini) set(CTEST_TOY_WORKLOAD_DIR 
${CTEST_WORKLOAD_DIR}/toy) -set(CTEST_RVV_WORKLOAD_DIR ${CTEST_WORKLOAD_DIR}/rvv) -add_subdirectory(gemmini) add_subdirectory(toy) -add_subdirectory(rvv) +add_subdirectory(goban) diff --git a/bb-tests/workloads/src/CTest/gemmini/CMakeLists.txt b/bb-tests/workloads/src/CTest/gemmini/CMakeLists.txt deleted file mode 100644 index 340a2bff..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/CMakeLists.txt +++ /dev/null @@ -1,218 +0,0 @@ -set(ELF_CC "riscv64-unknown-elf-gcc") -set(LINUX_CC "riscv64-unknown-linux-gnu-g++") - -#------------------------------------------------------------------------------- -# Set baremetal compilation flags -#------------------------------------------------------------------------------- -set(C_FLAGS -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany - -std=gnu99 -ffast-math -fno-builtin-printf -fno-tree-loop-distribute-patterns - -DBAREMETAL=1 -DPREALLOCATE=1 -DMULTITHREAD=1 -DPRINT_TILE=0 - -specs=htif_nano.specs -I${CTEST_GEMMINI_WORKLOAD_DIR}) - -# Link libraries -set(LINK_LIBS -lm -lgcc) - -#------------------------------------------------------------------------------- -# Define common compilation step functions -#------------------------------------------------------------------------------- - -#------------------------------------------------------------------------------- -# Generate executables for different platforms -#------------------------------------------------------------------------------- -set(CMAKE_C_COMPILER "riscv64-unknown-linux-gnu-gcc") -set(CMAKE_CXX_COMPILER "riscv64-unknown-linux-gnu-g++") -set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=rv64gc -mcmodel=medany -std=gnu99 -O2 -ffast-math -fno-common -fno-builtin-printf -fno-tree-loop-distribute-patterns -DPREALLOCATE=1 -DMULTITHREAD=1 -DPRINT_TILE=0") -set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=rv64gc -mcmodel=medany -std=gnu++11 -O2 -ffast-math -fno-common -fno-builtin-printf -fno-tree-loop-distribute-patterns -DPREALLOCATE=1 -DMULTITHREAD=1 
-DPRINT_TILE=0") - -# Generate Linux version executables -function(add_linux_test_target TEST_NAME SOURCE_FILE) - set(EXECUTABLE "${TEST_NAME}-linux") - - add_executable(${EXECUTABLE} ${CTEST_GEMMINI_WORKLOAD_DIR}/${SOURCE_FILE}) - target_include_directories(${EXECUTABLE} PRIVATE - ${WORKLOAD_LIB_DIR} - ${CTEST_GEMMINI_WORKLOAD_DIR} - ) - # Ensure dependent libraries are built first and link merged library files - add_dependencies(${EXECUTABLE} bbhw-linux) - target_link_libraries(${EXECUTABLE} - m - ) -endfunction() - -# Generate multicore baremetal version executables -function(add_multicore_baremetal_test_target TEST_NAME SOURCE_FILE) - set(EXECUTABLE "${TEST_NAME}_multicore-baremetal") - - add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} - COMMAND ${ELF_CC} ${C_FLAGS} -o ${EXECUTABLE} ${CTEST_TOY_WORKLOAD_DIR}/start.S - -DMULTICORE=3 - ${CTEST_GEMMINI_WORKLOAD_DIR}/${SOURCE_FILE} - -I${WORKLOAD_LIB_DIR} - ${LINK_LIBS} - DEPENDS ${CTEST_GEMMINI_WORKLOAD_DIR}/${SOURCE_FILE} - ${CTEST_TOY_WORKLOAD_DIR}/start.S - COMMENT "Building multicore baremetal executable: ${EXECUTABLE}" - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - ) - - add_custom_target(${TEST_NAME}_multicore_baremetal - DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} bbhw-baremetal) -endfunction() - -# Generate singlecore baremetal version executables -function(add_singlecore_baremetal_test_target TEST_NAME SOURCE_FILE) - set(EXECUTABLE "${TEST_NAME}_singlecore-baremetal") - - add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} - COMMAND ${ELF_CC} ${C_FLAGS} -o ${EXECUTABLE} - ${CTEST_GEMMINI_WORKLOAD_DIR}/${SOURCE_FILE} - -I${WORKLOAD_LIB_DIR} - ${LINK_LIBS} - DEPENDS ${CTEST_GEMMINI_WORKLOAD_DIR}/${SOURCE_FILE} - COMMENT "Building singlecore baremetal executable: ${EXECUTABLE}" - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - ) - - add_custom_target(${TEST_NAME}_singlecore_baremetal - DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} bbhw-baremetal - ) 
-endfunction() - -# Create cross-platform test targets -function(add_cross_platform_test_target TEST_NAME SOURCE_FILE) - add_linux_test_target(${TEST_NAME} ${SOURCE_FILE}) - add_multicore_baremetal_test_target(${TEST_NAME} ${SOURCE_FILE}) - add_singlecore_baremetal_test_target(${TEST_NAME} ${SOURCE_FILE}) - - # Create a master target that builds all platforms simultaneously - add_custom_target(${TEST_NAME} - DEPENDS ${TEST_NAME}-linux ${TEST_NAME}_multicore_baremetal ${TEST_NAME}_singlecore_baremetal - COMMENT "Building ${TEST_NAME} for all platforms" - ) -endfunction() - -#------------------------------------------------------------------------------- -# build linux version workload -#------------------------------------------------------------------------------- -set(LINK_FLAGS "-static -Wl,--no-dynamic-linker") -set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${LINK_FLAGS}") - -#------------------------------------------------------------------------------- -# Build list - use cross-platform function to generate all test targets -#------------------------------------------------------------------------------- -add_cross_platform_test_target(gemmini_mvin_mvout mvin_mvout.c) -add_cross_platform_test_target(gemmini_mvin_mvout_zeros mvin_mvout_zeros.c) -add_cross_platform_test_target(gemmini_mvin_mvout_stride mvin_mvout_stride.c) -add_cross_platform_test_target(gemmini_mvin_mvout_block_stride mvin_mvout_block_stride.c) -add_cross_platform_test_target(gemmini_mvin_mvout_acc mvin_mvout_acc.c) -add_cross_platform_test_target(gemmini_mvin_mvout_acc_zero_stride mvin_mvout_acc_zero_stride.c) -add_cross_platform_test_target(gemmini_mvin_mvout_acc_stride mvin_mvout_acc_stride.c) -add_cross_platform_test_target(gemmini_mvin_mvout_acc_full mvin_mvout_acc_full.c) -add_cross_platform_test_target(gemmini_mvin_mvout_acc_full_stride mvin_mvout_acc_full_stride.c) -add_cross_platform_test_target(gemmini_matmul_os matmul_os.c) -add_cross_platform_test_target(gemmini_matmul_ws 
matmul_ws.c) -add_cross_platform_test_target(gemmini_matmul matmul.c) -add_cross_platform_test_target(gemmini_raw_hazard raw_hazard.c) -add_cross_platform_test_target(gemmini_aligned aligned.c) -add_cross_platform_test_target(gemmini_padded padded.c) -add_cross_platform_test_target(gemmini_mvin_scale mvin_scale.c) -add_cross_platform_test_target(gemmini_conv conv.c) -add_cross_platform_test_target(gemmini_conv_stride conv_stride.c) -add_cross_platform_test_target(gemmini_conv_rect conv_rect.c) -add_cross_platform_test_target(gemmini_conv_rect_pool conv_rect_pool.c) -add_cross_platform_test_target(gemmini_conv_with_pool conv_with_pool.c) -add_cross_platform_test_target(gemmini_conv_with_rot180 conv_with_rot180.c) -add_cross_platform_test_target(gemmini_conv_with_kernel_dilation conv_with_kernel_dilation.c) -add_cross_platform_test_target(gemmini_conv_with_input_dilation conv_with_input_dilation.c) -add_cross_platform_test_target(gemmini_conv_with_input_dilation_and_rot180 conv_with_input_dilation_and_rot180.c) -add_cross_platform_test_target(gemmini_conv_with_input_dilation_and_neg_padding conv_with_input_dilation_and_neg_padding.c) -add_cross_platform_test_target(gemmini_conv_trans_output_1203 conv_trans_output_1203.c) -add_cross_platform_test_target(gemmini_conv_trans_weight_1203 conv_trans_weight_1203.c) -add_cross_platform_test_target(gemmini_conv_trans_weight_0132 conv_trans_weight_0132.c) -add_cross_platform_test_target(gemmini_conv_trans_input_3120 conv_trans_input_3120.c) -add_cross_platform_test_target(gemmini_conv_trans_input_3120_with_kernel_dilation conv_trans_input_3120_with_kernel_dilation.c) -add_cross_platform_test_target(gemmini_conv_first_layer conv_first_layer.c) -add_cross_platform_test_target(gemmini_conv_dw conv_dw.c) -add_cross_platform_test_target(gemmini_conv_perf conv_perf.c) -add_cross_platform_test_target(gemmini_conv_dw_perf conv_dw_perf.c) -add_cross_platform_test_target(gemmini_tiled_matmul_os tiled_matmul_os.c) 
-add_cross_platform_test_target(gemmini_tiled_matmul_ws tiled_matmul_ws.c) -add_cross_platform_test_target(gemmini_tiled_matmul_ws_At tiled_matmul_ws_At.c) -add_cross_platform_test_target(gemmini_tiled_matmul_ws_Bt tiled_matmul_ws_Bt.c) -add_cross_platform_test_target(gemmini_tiled_matmul_ws_full_C tiled_matmul_ws_full_C.c) -add_cross_platform_test_target(gemmini_tiled_matmul_ws_low_D tiled_matmul_ws_low_D.c) -add_cross_platform_test_target(gemmini_tiled_matmul_ws_igelu tiled_matmul_ws_igelu.c) -add_cross_platform_test_target(gemmini_tiled_matmul_ws_layernorm tiled_matmul_ws_layernorm.c) -add_cross_platform_test_target(gemmini_tiled_matmul_ws_softmax tiled_matmul_ws_softmax.c) -add_cross_platform_test_target(gemmini_tiled_matmul_ws_perf tiled_matmul_ws_perf.c) -add_cross_platform_test_target(gemmini_tiled_matmul_cpu tiled_matmul_cpu.c) -add_cross_platform_test_target(gemmini_tiled_matmul_option tiled_matmul_option.c) -add_cross_platform_test_target(gemmini_transpose transpose.c) -add_cross_platform_test_target(gemmini_matrix_add matrix_add.c) -add_cross_platform_test_target(gemmini_resadd resadd.c) -add_cross_platform_test_target(gemmini_resadd_stride resadd_stride.c) -add_cross_platform_test_target(gemmini_global_average global_average.c) -add_cross_platform_test_target(gemmini_gemmini_counter gemmini_counter.c) -add_cross_platform_test_target(gemmini_template template.c) - -# Create master build target -add_custom_target(buckyball-gemmini-build ALL DEPENDS - gemmini_mvin_mvout - gemmini_mvin_mvout_zeros - gemmini_mvin_mvout_stride - gemmini_mvin_mvout_block_stride - gemmini_mvin_mvout_acc - gemmini_mvin_mvout_acc_zero_stride - gemmini_mvin_mvout_acc_stride - gemmini_mvin_mvout_acc_full - gemmini_mvin_mvout_acc_full_stride - gemmini_matmul_os - gemmini_matmul_ws - gemmini_matmul - gemmini_raw_hazard - gemmini_aligned - gemmini_padded - gemmini_mvin_scale - gemmini_conv - gemmini_conv_stride - gemmini_conv_rect - gemmini_conv_rect_pool - gemmini_conv_with_pool - 
gemmini_conv_with_rot180 - gemmini_conv_with_kernel_dilation - gemmini_conv_with_input_dilation - gemmini_conv_with_input_dilation_and_rot180 - gemmini_conv_with_input_dilation_and_neg_padding - gemmini_conv_trans_output_1203 - gemmini_conv_trans_weight_1203 - gemmini_conv_trans_weight_0132 - gemmini_conv_trans_input_3120 - gemmini_conv_trans_input_3120_with_kernel_dilation - gemmini_conv_first_layer - gemmini_conv_dw - gemmini_conv_perf - gemmini_conv_dw_perf - gemmini_tiled_matmul_os - gemmini_tiled_matmul_ws - gemmini_tiled_matmul_ws_At - gemmini_tiled_matmul_ws_Bt - gemmini_tiled_matmul_ws_full_C - gemmini_tiled_matmul_ws_low_D - gemmini_tiled_matmul_ws_igelu - gemmini_tiled_matmul_ws_layernorm - gemmini_tiled_matmul_ws_softmax - gemmini_tiled_matmul_ws_perf - gemmini_tiled_matmul_cpu - gemmini_tiled_matmul_option - gemmini_transpose - gemmini_matrix_add - gemmini_resadd - gemmini_resadd_stride - gemmini_global_average - gemmini_gemmini_counter - gemmini_template - COMMENT "Building all Gemmini workloads for Buckyball" - VERBATIM) diff --git a/bb-tests/workloads/src/CTest/gemmini/Makefile b/bb-tests/workloads/src/CTest/gemmini/Makefile deleted file mode 100644 index fd4d4aaf..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/Makefile +++ /dev/null @@ -1,132 +0,0 @@ -include $(abs_top_srcdir)/Makefrag - -tests = \ - mvin_mvout \ - mvin_mvout_zeros \ - mvin_mvout_stride \ - mvin_mvout_block_stride \ - mvin_mvout_acc \ - mvin_mvout_acc_zero_stride \ - mvin_mvout_acc_stride \ - mvin_mvout_acc_full \ - mvin_mvout_acc_full_stride \ - matmul_os \ - matmul_ws \ - matmul \ - raw_hazard \ - aligned \ - padded \ - mvin_scale \ - conv \ - conv_stride \ - conv_rect \ - conv_rect_pool \ - conv_with_pool \ - conv_with_rot180 \ - conv_with_kernel_dilation \ - conv_with_input_dilation \ - conv_with_input_dilation_and_rot180 \ - conv_with_input_dilation_and_neg_padding \ - conv_trans_output_1203 \ - conv_trans_weight_1203 \ - conv_trans_weight_0132 \ - conv_trans_input_3120 \ 
- conv_trans_input_3120_with_kernel_dilation \ - conv_first_layer \ - conv_dw \ - conv_perf \ - conv_dw_perf \ - tiled_matmul_os \ - tiled_matmul_ws \ - tiled_matmul_ws_At \ - tiled_matmul_ws_Bt \ - tiled_matmul_ws_full_C \ - tiled_matmul_ws_low_D \ - tiled_matmul_ws_igelu \ - tiled_matmul_ws_layernorm \ - tiled_matmul_ws_softmax \ - tiled_matmul_ws_perf \ - tiled_matmul_cpu \ - tiled_matmul_option \ - transpose \ - matrix_add \ - resadd \ - resadd_stride \ - global_average \ - gemmini_counter \ - template \ - -tests_baremetal = $(tests:=-baremetal) - -ifeq ($(findstring spike,$(RUNNER)),spike) -# Currently don't support conv or conv-with-pool on spike -runs_baremetal = $(addsuffix .run,$(filter-out conv-baremetal conv_with_pool-baremetal,$(tests_baremetal))) -else -# Don't run very long benchmarks for RTL sim -runs_baremetal = $(addsuffix .run,$(filter-out tiled_matmul_cpu-baremetal tiled_matmul_option-baremetal,$(tests_baremetal))) -endif - -ifdef BAREMETAL_ONLY - tests_linux = - tests_pk = -else - tests_linux = $(tests:=-linux) - tests_pk = $(tests:=-pk) -endif - -BENCH_COMMON = $(abs_top_srcdir)/riscv-tests/benchmarks/common -GEMMINI_HEADERS = $(abs_top_srcdir)/include/gemmini.h $(abs_top_srcdir)/include/gemmini_params.h $(abs_top_srcdir)/include/gemmini_testutils.h - -CFLAGS := $(CFLAGS) \ - -DPREALLOCATE=1 \ - -DMULTITHREAD=1 \ - -mcmodel=medany \ - -std=gnu99 \ - -O2 \ - -ffast-math \ - -fno-common \ - -fno-builtin-printf \ - -fno-tree-loop-distribute-patterns \ - -march=rv64gc -Wa,-march=rv64gc \ - -lm \ - -lgcc \ - -I$(abs_top_srcdir)/riscv-tests \ - -I$(abs_top_srcdir)/riscv-tests/env \ - -I$(abs_top_srcdir) \ - -I$(BENCH_COMMON) \ - -DID_STRING=$(ID_STRING) \ - -DPRINT_TILE=0 \ - -CFLAGS_PK := \ - $(CFLAGS) \ - -static \ - -DBAREMETAL=1 \ - -CFLAGS_BAREMETAL := \ - $(CFLAGS) \ - -nostdlib \ - -nostartfiles \ - -static \ - -T $(BENCH_COMMON)/test.ld \ - -DBAREMETAL=1 \ - -all: $(tests_baremetal) $(tests_linux) $(tests_pk) - -vpath %.c $(src_dir) - 
-%-baremetal: %.c $(GEMMINI_HEADERS) - $(CC_BAREMETAL) $(CFLAGS_BAREMETAL) $< $(LFLAGS) -o $@ \ - $(wildcard $(BENCH_COMMON)/*.c) $(wildcard $(BENCH_COMMON)/*.S) $(LIBS) - -%-linux: %.c $(GEMMINI_HEADERS) - $(CC_LINUX) $(CFLAGS) $< $(LFLAGS) -o $@ - -%-pk: %.c $(GEMMINI_HEADERS) - $(CC_LINUX) $(CFLAGS_PK) $< $(LFLAGS) -o $@ - -run-baremetal: $(runs_baremetal) - -%-baremetal.run: %-baremetal - $(RUNNER)$(abs_top_srcdir)/build/bareMetalC/$^ - -junk += $(tests_baremetal) $(tests_linux) $(tests_pk) diff --git a/bb-tests/workloads/src/CTest/gemmini/aligned.c b/bb-tests/workloads/src/CTest/gemmini/aligned.c deleted file mode 100644 index c48c6ad8..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/aligned.c +++ /dev/null @@ -1,73 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define PG_SIZE (4 * 1024) -#define OFFSET 1 - -/*struct aligned_buffer { - char garbage[0]; - elem_t data[DIM][DIM]; -} __attribute__((__packed__));*/ - -struct offset_buffer { - elem_t garbage[OFFSET]; - elem_t data[DIM][DIM]; -} __attribute__((__packed__)); - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // static struct aligned_buffer In __attribute__((aligned(PG_SIZE))); - static struct offset_buffer In __attribute__((aligned(PG_SIZE))); - static struct offset_buffer Out __attribute__((aligned(PG_SIZE))); - - for (size_t i = 0; i < OFFSET; ++i) { - In.garbage[i] = ~0; - Out.garbage[i] = 1; - } - - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) { - In.data[i][j] = i * DIM + j; - Out.data[i][j] = 1; - } - - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - - // printf("Mvin\n"); - gemmini_mvin(In.data, 0); - // printf("Mvout\n"); - gemmini_mvout(Out.data, 0); - - // printf("Fence\n"); - 
gemmini_fence(); - - if (!is_equal(In.data, Out.data)) { - printf("Matrix:\n"); - printMatrix(In.data); - printf("Matrix output:\n"); - printMatrix(Out.data); - printf("\n"); - - exit(1); - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv.c b/bb-tests/workloads/src/CTest/gemmini/conv.c deleted file mode 100644 index 073e8cae..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv.c +++ /dev/null @@ -1,337 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#endif - -#define NO_BIAS false - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != (in_row_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - exit(1); - } - - if (out_col_dim != 
(in_col_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : input[b][irow][icol][kch]; - - result += weights[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - 
elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions (rows by columns): %u by %u\n", IN_ROW_DIM, - IN_COL_DIM); - printf("Output dimensions (rows by columns): %u by %u\n\n", OUT_ROW_DIM, - OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, PADDING, input, weights, - bias, output); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - 
tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, false, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; 
ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_dw.c b/bb-tests/workloads/src/CTest/gemmini/conv_dw.c deleted file mode 100644 index 89992329..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_dw.c +++ /dev/null @@ -1,247 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 3 -#define IN_ROW_DIM 112 -#define IN_COL_DIM 112 -#define CHANNELS 17 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define CHANNELS 5 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define CHANNELS 15 - -#endif - -#define BATCH_SIZE 3 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#endif - -#define NO_BIAS false - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - 
-void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][CHANNELS]; - static elem_t weights[CHANNELS][KERNEL_DIM][KERNEL_DIM]; - static acc_t bias[CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - tiled_conv_dw_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, CHANNELS, OUT_ROW_DIM, - OUT_COL_DIM, STRIDE, PADDING, KERNEL_DIM, - - (elem_t *)input, (elem_t *)weights, (acc_t *)bias, - (elem_t *)output, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 1, 0, 0, - - CPU); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t 
output_mat[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][CHANNELS]; - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_dw_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, CHANNELS, OUT_ROW_DIM, - OUT_COL_DIM, STRIDE, PADDING, KERNEL_DIM, - - (elem_t *)input, (elem_t *)weights, (acc_t *)bias, - (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 1, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int i = 0; i < sizeof(output_mat) / sizeof(output_mat[0]); i++) { - elem_t v = *((elem_t *)output_mat + i); - if (v != 5 && v != 7 && v != 10) { - success = false; - break; - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0][0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("%d,", weights[och][wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - 
printf("["); - for (int och = 0; och < CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < CHANNELS; och++) { - printf("%d,", output_mat[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_dw_perf.c b/bb-tests/workloads/src/CTest/gemmini/conv_dw_perf.c deleted file mode 100644 index 2f71be6a..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_dw_perf.c +++ /dev/null @@ -1,103 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define HEAP_SIZE (8 * 1024 * 1024) - -int str2int(char *str) { - int res = 0; - for (int i = 0; str[i] != '\0'; ++i) - res = res * 10 + str[i] - '0'; - return res; -} - -int main(int argc, char *argv[]) { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - int BATCH_SIZE = 3; - int IN_DIM = 112; - int CHANNELS = 17; - int KERNEL_DIM = 3; - int PADDING = 1; - int STRIDE = 2; - bool NO_BIAS = false; - - if (argc == 8) { - BATCH_SIZE = str2int(argv[1]); - IN_DIM = str2int(argv[2]); - CHANNELS = str2int(argv[3]); - KERNEL_DIM = str2int(argv[4]); - PADDING = str2int(argv[5]); - STRIDE = str2int(argv[6]); - NO_BIAS = str2int(argv[7]); - } else if (argc > 1) { - printf("BATCH_SIZE IN_DIM CHANNELS KERNEL_DIM PADDING STRIDE NO_BIAS\n"); - exit(1); - } - - int OUT_DIM = ((IN_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1); - - printf("BATCH_SIZE = %d\n", BATCH_SIZE); - printf("IN_DIM 
= %d\n", IN_DIM); - printf("CHANNELS = %d\n", CHANNELS); - printf("KERNEL_DIM = %d\n", KERNEL_DIM); - printf("PADDING = %d\n", PADDING); - printf("STRIDE = %d\n", STRIDE); - printf("NO_BIAS = %d\n", NO_BIAS); - - gemmini_flush(0); - - printf("Output dimension: %u\n\n", OUT_DIM); - - static uint8_t heap[HEAP_SIZE]; - - // static elem_t input[BATCH_SIZE][IN_DIM][IN_DIM][CHANNELS]; - // static elem_t weights[CHANNELS][KERNEL_DIM][KERNEL_DIM]; - // static acc_t bias[CHANNELS]; - // static elem_t output[BATCH_SIZE][OUT_DIM][OUT_DIM][CHANNELS]; - - elem_t *input = (elem_t *)(&heap[0]); - elem_t *weights = - (elem_t *)((elem_t *)input + BATCH_SIZE * IN_DIM * IN_DIM * CHANNELS); - acc_t *bias = - (acc_t *)((elem_t *)weights + CHANNELS * KERNEL_DIM * KERNEL_DIM); - elem_t *output = (elem_t *)((acc_t *)bias + CHANNELS); - - { - uint8_t *end = (uint8_t *)((elem_t *)output + - BATCH_SIZE * OUT_DIM * OUT_DIM * CHANNELS); - if (end >= &heap[HEAP_SIZE]) { - printf("problem size is too large to fit in memory"); - exit(1); - } - } - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - - tiled_conv_dw_auto(BATCH_SIZE, IN_DIM, IN_DIM, CHANNELS, OUT_DIM, OUT_DIM, - STRIDE, PADDING, KERNEL_DIM, - - (elem_t *)input, (elem_t *)weights, (acc_t *)bias, - (elem_t *)output, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 1, 0, 0, - - WS); - - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_first_layer.c b/bb-tests/workloads/src/CTest/gemmini/conv_first_layer.c deleted file mode 100644 index 222ff264..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_first_layer.c +++ /dev/null @@ -1,337 +0,0 @@ -// Here, we test Gemmini's optimizations for conv layers with few input -// channels, like the first layer of most CNNs. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#endif - -#define NO_BIAS false - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != (in_row_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - exit(1); - } - if (out_col_dim != (in_col_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { 
- for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : input[b][irow][icol][kch]; - - result += weights[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - 
// assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, PADDING, input, weights, - bias, output); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, false, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 13 && v != 19 && v != 28) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_perf.c b/bb-tests/workloads/src/CTest/gemmini/conv_perf.c deleted file mode 100644 index 4fe2cf4e..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_perf.c +++ /dev/null @@ -1,128 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define HEAP_SIZE (4 * 1024 * 1024) - -int str2int(char *str) { - int res = 0; - for (int i = 0; str[i] != '\0'; ++i) - res = res * 10 + str[i] - '0'; - return res; -} - -int main(int argc, char *argv[]) { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - int BATCH_SIZE = 4; - int IN_DIM = 224; - int IN_CHANNELS = 3; - int OUT_CHANNELS = 32; - int KERNEL_DIM = 3; - int PADDING = 1; - int STRIDE = 2; - bool NO_BIAS = false; - - if (argc == 9) { - BATCH_SIZE = str2int(argv[1]); - IN_DIM = str2int(argv[2]); - IN_CHANNELS = str2int(argv[3]); - OUT_CHANNELS = str2int(argv[4]); - KERNEL_DIM = str2int(argv[5]); - PADDING = str2int(argv[6]); - STRIDE = str2int(argv[7]); - NO_BIAS = str2int(argv[8]); - } else if (argc > 1) { - printf("BATCH_SIZE IN_DIM IN_CHANNELS OUT_CHANNELS KERNEL_DIM PADDING " - "STRIDE NO_BIAS\n"); - exit(1); - } - - int OUT_DIM = ((IN_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1); - - 
printf("BATCH_SIZE = %d\n", BATCH_SIZE); - printf("IN_DIM = %d\n", IN_DIM); - printf("IN_CHANNELS = %d\n", IN_CHANNELS); - printf("OUT_CHANNELS = %d\n", OUT_CHANNELS); - printf("KERNEL_DIM = %d\n", KERNEL_DIM); - printf("PADDING = %d\n", PADDING); - printf("STRIDE = %d\n", STRIDE); - printf("NO_BIAS = %d\n", NO_BIAS); - printf("Output dimension: %u\n\n", OUT_DIM); - - bool map_to_matmul = KERNEL_DIM == 1 && PADDING == 0 && STRIDE == 1; - int I = BATCH_SIZE * OUT_DIM * OUT_DIM; - int J = OUT_CHANNELS; - int K = KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; - - if (map_to_matmul) { - printf("I = %d\n", I); - printf("J = %d\n", J); - printf("K = %d\n", K); - } - - gemmini_flush(0); - - static uint8_t heap[HEAP_SIZE]; - - // static elem_t input[BATCH_SIZE][IN_DIM][IN_DIM][IN_CHANNELS]; - // static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - // static acc_t bias[OUT_CHANNELS]; - // static elem_t output[BATCH_SIZE][OUT_DIM][OUT_DIM][OUT_CHANNELS]; - - elem_t *input = (elem_t *)(&heap[0]); - elem_t *weights = - (elem_t *)((elem_t *)input + BATCH_SIZE * IN_DIM * IN_DIM * IN_CHANNELS); - acc_t *bias = (acc_t *)((elem_t *)weights + - OUT_CHANNELS * KERNEL_DIM * KERNEL_DIM * IN_CHANNELS); - elem_t *output = (elem_t *)((acc_t *)bias + OUT_CHANNELS); - - { - uint8_t *end = (uint8_t *)((elem_t *)output + - BATCH_SIZE * OUT_DIM * OUT_DIM * OUT_CHANNELS); - if (end >= &heap[HEAP_SIZE]) { - printf("problem size is too large to fit in memory"); - exit(1); - } - } - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - - if (map_to_matmul) { - - tiled_matmul_auto(I, J, K, input, weights, NO_BIAS ? 
NULL : bias, output, K, - J, J, J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, - true, false, false, false, false, 0, WS); - - } else { - - tiled_conv_auto(BATCH_SIZE, IN_DIM, IN_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_DIM, OUT_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, false, - false, false, false, false, - - (elem_t *)input, (elem_t *)weights, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - } - - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_rect.c b/bb-tests/workloads/src/CTest/gemmini/conv_rect.c deleted file mode 100644 index d5e9f74b..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_rect.c +++ /dev/null @@ -1,337 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 7 -#define IN_COL_DIM 11 -#define IN_CHANNELS 17 -#define OUT_CHANNELS 18 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#endif - -#define NO_BIAS false - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - 
int stride, int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != (in_row_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - exit(1); - } - - if (out_col_dim != (in_col_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : input[b][irow][icol][kch]; - - result += weights[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? 
elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions (rows by columns): %u by %u\n", IN_ROW_DIM, - IN_COL_DIM); - printf("Output dimensions (rows by columns): %u by %u\n\n", OUT_ROW_DIM, - OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t 
bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, PADDING, input, weights, - bias, output); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, false, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_rect_pool.c b/bb-tests/workloads/src/CTest/gemmini/conv_rect_pool.c deleted file mode 100644 index 8e2ddab2..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_rect_pool.c +++ /dev/null @@ -1,431 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#define POOL_SIZE 3 -#define POOL_STRIDE 2 -#define POOL_PADDING 1 - -#else - -#ifdef FAST -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 -#else -#define IN_ROW_DIM 9 -#define IN_COL_DIM 11 -#define IN_CHANNELS 17 -#define OUT_CHANNELS 12 -#endif - -#define BATCH_SIZE 1 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 - -#define POOL_SIZE 3 -#define POOL_STRIDE 2 -#define POOL_PADDING 1 - -#endif - -#define NO_BIAS false - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -#define 
POOL_OUT_ROW_DIM \ - ((OUT_ROW_DIM + 2 * POOL_PADDING - POOL_SIZE) / POOL_STRIDE + 1) -#define POOL_OUT_COL_DIM \ - ((OUT_COL_DIM + 2 * POOL_PADDING - POOL_SIZE) / POOL_STRIDE + 1) - -#define NO_POOL false - -#if NO_POOL == true && \ - !(POOL_SIZE == 1 && POOL_STRIDE == 1 && POOL_PADDING == 0) -#error NO_POOL is not set correctly -#endif - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != (in_row_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - exit(1); - } - if (out_col_dim != (in_col_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : input[b][irow][icol][kch]; - - result += weights[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? 
elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void pool(int batch_size, int channels, int in_row_dim, int in_col_dim, - int out_row_dim, int out_col_dim, int window_dim, int stride, - int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][channels], - elem_t output[batch_size][out_row_dim][out_col_dim][channels]) { - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int ch = 0; ch < channels; ch++) { - output[b][orow][ocol][ch] = elem_t_min; - - for (int wrow = 0; wrow < window_dim; wrow++) { - for (int wcol = 0; wcol < window_dim; wcol++) { - int irow = orow * stride + wrow - padding; - int icol = ocol * stride + wcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : input[b][irow][icol][ch]; - - if (pixel > output[b][orow][ocol][ch]) { - output[b][orow][ocol][ch] = pixel; - } - } - } - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 
5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((IN_DIM + 2*PADDING - KERNEL_DIM) % STRIDE == 0); - // assert((OUT_DIM + 2*PADDING - POOL_SIZE) % POOL_STRIDE == 0); - - printf("Output dimensions (rows by columns): %u by %u\n", OUT_ROW_DIM, - OUT_COL_DIM); - printf("Pooling output dimensions (rows by columns): %u by %u\n\n", - POOL_OUT_ROW_DIM, POOL_OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - static elem_t pool_output[BATCH_SIZE][POOL_OUT_ROW_DIM][POOL_OUT_COL_DIM] - [OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - -#ifndef FAST - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, PADDING, input, weights, - bias, output); - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - printf("CPU pool...\n"); - uint64_t start_cpu_pool = 
read_cycles(); - pool(BATCH_SIZE, OUT_CHANNELS, OUT_ROW_DIM, OUT_COL_DIM, POOL_OUT_ROW_DIM, - POOL_OUT_COL_DIM, POOL_SIZE, POOL_STRIDE, POOL_PADDING, output, - pool_output); - uint64_t end_cpu_pool = read_cycles(); - printf("CPU pool took %llu cycles\n", end_cpu_pool - start_cpu_pool); - - printf("CPU conv+pool took %llu cycles\n", - end_cpu_pool - start_cpu_pool + end_cpu - start_cpu); -#endif - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - static elem_t pool_output_mat[BATCH_SIZE * POOL_OUT_ROW_DIM * - POOL_OUT_COL_DIM][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, false, false, false, false, - - // 1, - // 1, 1, 1, - // 1, 1, 1, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)pool_output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, POOL_SIZE, - NO_POOL ? 
0 : POOL_STRIDE, POOL_PADDING, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(pool_output_mat) == sizeof(pool_output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * POOL_OUT_ROW_DIM * POOL_OUT_COL_DIM; - orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - if (pool_output_mat[orow][ocol] != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&pool_output[0][0][0][0], &pool_output_mat[0][0], - sizeof(pool_output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - 
printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("pool_output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < POOL_OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < POOL_OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", pool_output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("pool_output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * POOL_OUT_ROW_DIM * POOL_OUT_COL_DIM; - orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", pool_output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("Output dimensions (rows by columns): %u by %u\n", OUT_ROW_DIM, - OUT_COL_DIM); - printf("Pooling output dimensions: %u by %u\n\n", POOL_OUT_ROW_DIM, - POOL_OUT_COL_DIM); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_stride.c b/bb-tests/workloads/src/CTest/gemmini/conv_stride.c deleted file mode 100644 index 074ad8c8..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_stride.c +++ /dev/null @@ -1,309 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#define IN_ROW_DIM 7 -#define IN_COL_DIM 7 -#define IN_CHANNELS 13 -#define OUT_CHANNELS 17 -#define BATCH_SIZE 1 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 - -#endif - -#define 
NO_BIAS false - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -#define IN_STRIDE (IN_CHANNELS + 3) -#define WEIGHT_STRIDE (OUT_CHANNELS + 4) -#define OUT_STRIDE (OUT_CHANNELS + 2) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int padding, int in_stride, int out_stride, - elem_t input[batch_size][in_row_dim][in_col_dim][in_stride], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_stride]) { - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != (in_row_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - exit(1); - } - - if (out_col_dim != (in_col_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : input[b][irow][icol][kch]; - - result += weights[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? 
elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - int out_stride, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_stride]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int row, int col, int stride) { - elem_t i = 0; - for (int r = 0; r < row; r++) { - for (int c = 0; c < col; c++) { - elem_t *ptr = buf + r * stride + c; - *ptr = (rand() % 5) - 2; - } - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; - *ptr = (rand() % 5) - 2; - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - gemmini_flush(0); - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions (rows by columns): %u by %u\n", IN_ROW_DIM, - IN_COL_DIM); - printf("Output dimensions (rows by columns): %u by %u\n\n", OUT_ROW_DIM, - OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_STRIDE]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t 
output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_STRIDE]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], BATCH_SIZE * IN_ROW_DIM * IN_COL_DIM, - IN_CHANNELS, IN_STRIDE); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], OUT_CHANNELS * KERNEL_DIM * KERNEL_DIM, - IN_CHANNELS, IN_CHANNELS); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, PADDING, IN_STRIDE, - OUT_STRIDE, input, weights, bias, output); - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][WEIGHT_STRIDE] = {0}; - static elem_t output_mat[N_PATCHES][OUT_STRIDE] = {0}; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, - WEIGHT_STRIDE, weights, weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_stride_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, - OUT_CHANNELS, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, - PADDING, KERNEL_DIM, IN_STRIDE, WEIGHT_STRIDE, - OUT_STRIDE, false, false, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < WEIGHT_STRIDE; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_STRIDE; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_STRIDE; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],\n"); - } - printf("\b],\n"); - } - 
printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_STRIDE; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_trans_input_3120.c b/bb-tests/workloads/src/CTest/gemmini/conv_trans_input_3120.c deleted file mode 100644 index 8c82a97c..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_trans_input_3120.c +++ /dev/null @@ -1,295 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 17 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#endif - -#define NO_BIAS false - -#define TRANS_OUTPUT_1203 false -#define TRANS_WEIGHT_1203 false -#define TRANS_WEIGHT_0132 false -#define TRANS_INPUT_3120 true - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int 
outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS][BATCH_SIZE]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - 
init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, TRANS_OUTPUT_1203, TRANS_INPUT_3120, TRANS_WEIGHT_1203, - TRANS_WEIGHT_0132, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - CPU); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, TRANS_OUTPUT_1203, TRANS_INPUT_3120, TRANS_WEIGHT_1203, - TRANS_WEIGHT_0132, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[irow][icol][ich][batch]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_trans_input_3120_with_kernel_dilation.c b/bb-tests/workloads/src/CTest/gemmini/conv_trans_input_3120_with_kernel_dilation.c deleted file mode 100644 index 560a1560..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_trans_input_3120_with_kernel_dilation.c +++ /dev/null @@ -1,297 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 17 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define KERNEL_DILATION 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define KERNEL_DILATION 2 - -#endif - -#define NO_BIAS false - -#define TRANS_OUTPUT_1203 false -#define TRANS_WEIGHT_1203 false -#define TRANS_WEIGHT_0132 false -#define TRANS_INPUT_3120 true - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 
1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t 
bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, KERNEL_DILATION, PADDING, - KERNEL_DIM, false, TRANS_OUTPUT_1203, TRANS_INPUT_3120, - TRANS_WEIGHT_1203, TRANS_WEIGHT_0132, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - CPU); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, KERNEL_DILATION, PADDING, - KERNEL_DIM, false, TRANS_OUTPUT_1203, TRANS_INPUT_3120, - TRANS_WEIGHT_1203, TRANS_WEIGHT_0132, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 6 && v != 11 && v != 16 && v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < 
BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_trans_output_1203.c b/bb-tests/workloads/src/CTest/gemmini/conv_trans_output_1203.c deleted file mode 100644 index 904aa8ef..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_trans_output_1203.c +++ /dev/null @@ -1,290 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 17 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#endif - -#define NO_BIAS false - -#define TRANS_OUTPUT_1203 true - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void flatten_weights( - int out_channels, int kernel_dim, int 
in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - 
- printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, TRANS_OUTPUT_1203, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - CPU); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, TRANS_OUTPUT_1203, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_trans_weight_0132.c b/bb-tests/workloads/src/CTest/gemmini/conv_trans_weight_0132.c deleted file mode 100644 index 4a42fc23..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_trans_weight_0132.c +++ /dev/null @@ -1,294 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 17 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#endif - -#define NO_BIAS false - -#define TRANS_OUTPUT_1203 false -#define TRANS_WEIGHT_1203 false -#define TRANS_WEIGHT_0132 true - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void flatten_weights( 
- int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - 
init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, TRANS_OUTPUT_1203, false, TRANS_WEIGHT_1203, - TRANS_WEIGHT_0132, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - CPU); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, TRANS_OUTPUT_1203, false, TRANS_WEIGHT_1203, - TRANS_WEIGHT_0132, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_trans_weight_1203.c b/bb-tests/workloads/src/CTest/gemmini/conv_trans_weight_1203.c deleted file mode 100644 index 297763e9..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_trans_weight_1203.c +++ /dev/null @@ -1,291 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 17 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#endif - -#define NO_BIAS false - -#define TRANS_OUTPUT_1203 false -#define TRANS_WEIGHT_1203 true - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void flatten_weights( - int out_channels, int 
kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / 
sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, TRANS_OUTPUT_1203, false, TRANS_WEIGHT_1203, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - CPU); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, TRANS_OUTPUT_1203, false, TRANS_WEIGHT_1203, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation.c b/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation.c deleted file mode 100644 index 4b0fffcd..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation.c +++ /dev/null @@ -1,380 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 17 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define INPUT_DILATION 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 23 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 - -#endif - -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define INPUT_DILATION 2 - -#endif - -#define NO_BIAS false - -#define IN_ROW_DIM_DILATED \ - (IN_ROW_DIM + (INPUT_DILATION - 1) * (IN_ROW_DIM - 1)) -#define IN_COL_DIM_DILATED \ - (IN_COL_DIM + (INPUT_DILATION - 1) * (IN_COL_DIM - 1)) -#define OUT_ROW_DIM \ - ((IN_ROW_DIM_DILATED + 2 * PADDING - KERNEL_DIM) / 
STRIDE + 1) -#define OUT_COL_DIM \ - ((IN_COL_DIM_DILATED + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int input_dilation, int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - - const size_t in_row_dim_dilated = - in_row_dim + (input_dilation - 1) * (in_row_dim - 1); - const size_t in_col_dim_dilated = - in_col_dim + (input_dilation - 1) * (in_col_dim - 1); - assert(in_row_dim_dilated == IN_ROW_DIM_DILATED); - assert(in_col_dim_dilated == IN_COL_DIM_DILATED); - static elem_t dilated[BATCH_SIZE][IN_ROW_DIM_DILATED][IN_COL_DIM_DILATED] - [IN_CHANNELS]; - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != - (in_row_dim_dilated + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - printf("out_row_dim\n"); - exit(1); - } - if (out_col_dim != - (in_col_dim_dilated + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - printf("out_col_dim\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow++) - for (int icol = 0; icol < in_col_dim_dilated; icol++) - for (int ich = 0; ich < in_channels; ich++) - dilated[b][irow][icol][ich] = 0; - - size_t idx = 0; - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow += input_dilation) - for (int icol = 0; icol < in_col_dim_dilated; icol += input_dilation) - for (int ich = 0; ich < in_channels; ich++) { - dilated[b][irow][icol][ich] = *((elem_t *)input + idx); - idx++; - } - - for (int b = 0; b < 
batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim_dilated || - icol < 0 || icol >= in_col_dim_dilated - ? 0 - : dilated[b][irow][icol][kch]; - - result += weights[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - 
*ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, PADDING, - input, weights, bias, output); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, 1, PADDING, - 
KERNEL_DIM, false, false, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 6 && v != 11 && v != 21) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - 
printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation_and_neg_padding.c b/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation_and_neg_padding.c deleted file mode 100644 index 85cd1eaf..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation_and_neg_padding.c +++ /dev/null @@ -1,373 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 17 -#define KERNEL_DIM 3 -#define PADDING -1 -#define STRIDE 1 -#define INPUT_DILATION 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING -1 -#define STRIDE 1 -#define INPUT_DILATION 2 - -#endif - -#define NO_BIAS false - -#define IN_ROW_DIM_DILATED \ - (IN_ROW_DIM + (INPUT_DILATION - 1) * (IN_ROW_DIM - 1)) -#define IN_COL_DIM_DILATED \ - (IN_COL_DIM + (INPUT_DILATION - 1) * (IN_COL_DIM - 1)) -#define 
OUT_ROW_DIM \ - ((IN_ROW_DIM_DILATED + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM \ - ((IN_COL_DIM_DILATED + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int input_dilation, int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - - const size_t in_row_dim_dilated = - in_row_dim + (input_dilation - 1) * (in_row_dim - 1); - const size_t in_col_dim_dilated = - in_col_dim + (input_dilation - 1) * (in_col_dim - 1); - assert(in_row_dim_dilated == IN_ROW_DIM_DILATED); - assert(in_col_dim_dilated == IN_COL_DIM_DILATED); - static elem_t dilated[BATCH_SIZE][IN_ROW_DIM_DILATED][IN_COL_DIM_DILATED] - [IN_CHANNELS]; - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != - (in_row_dim_dilated + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - printf("out_row_dim\n"); - exit(1); - } - if (out_col_dim != - (in_col_dim_dilated + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - printf("out_col_dim\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow++) - for (int icol = 0; icol < in_col_dim_dilated; icol++) - for (int ich = 0; ich < in_channels; ich++) - dilated[b][irow][icol][ich] = 0; - - size_t idx = 0; - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow += input_dilation) - for (int icol = 0; icol < in_col_dim_dilated; icol += input_dilation) - for (int ich = 0; ich < in_channels; ich++) { - 
dilated[b][irow][icol][ich] = *((elem_t *)input + idx); - idx++; - } - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim_dilated || - icol < 0 || icol >= in_col_dim_dilated - ? 0 - : dilated[b][irow][icol][kch]; - - result += weights[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; 
ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Dilated input dimensions: %u by %u\n", IN_ROW_DIM_DILATED, - IN_COL_DIM_DILATED); - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, PADDING, - input, weights, bias, output); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t 
start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, 1, PADDING, - KERNEL_DIM, false, false, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - // CPU); - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 6 && v != 11 && v != 21) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { 
- printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation_and_rot180.c b/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation_and_rot180.c deleted file mode 100644 index d226d8c2..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_with_input_dilation_and_rot180.c +++ /dev/null @@ -1,387 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 17 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define INPUT_DILATION 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define INPUT_DILATION 2 - -#endif - -#define NO_BIAS false - -#define WROT180 true - 
-#define IN_ROW_DIM_DILATED \ - (IN_ROW_DIM + (INPUT_DILATION - 1) * (IN_ROW_DIM - 1)) -#define IN_COL_DIM_DILATED \ - (IN_COL_DIM + (INPUT_DILATION - 1) * (IN_COL_DIM - 1)) -#define OUT_ROW_DIM \ - ((IN_ROW_DIM_DILATED + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM \ - ((IN_COL_DIM_DILATED + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int input_dilation, int padding, bool wrot180, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - - const size_t in_row_dim_dilated = - in_row_dim + (input_dilation - 1) * (in_row_dim - 1); - const size_t in_col_dim_dilated = - in_col_dim + (input_dilation - 1) * (in_col_dim - 1); - assert(in_row_dim_dilated == IN_ROW_DIM_DILATED); - assert(in_col_dim_dilated == IN_COL_DIM_DILATED); - static elem_t dilated[BATCH_SIZE][IN_ROW_DIM_DILATED][IN_COL_DIM_DILATED] - [IN_CHANNELS]; - - static elem_t weights_rot180[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM] - [IN_CHANNELS]; - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != - (in_row_dim_dilated + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - printf("out_row_dim\n"); - exit(1); - } - if (out_col_dim != - (in_col_dim_dilated + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - printf("out_col_dim\n"); - exit(1); - } -#endif - - // Populate dilated - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow++) - for (int icol = 0; icol < in_col_dim_dilated; icol++) - for (int ich = 0; ich < in_channels; ich++) - 
dilated[b][irow][icol][ich] = 0; - - size_t idx = 0; - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow += input_dilation) - for (int icol = 0; icol < in_col_dim_dilated; icol += input_dilation) - for (int ich = 0; ich < in_channels; ich++) { - dilated[b][irow][icol][ich] = *((elem_t *)input + idx); - idx++; - } - - // Populate weights_rot180 - for (int och = 0; och < out_channels; och++) - for (int krow = 0; krow < kernel_dim; krow++) - for (int kcol = 0; kcol < kernel_dim; kcol++) - for (int kch = 0; kch < in_channels; kch++) - weights_rot180[och][krow][kcol][kch] = - weights[och][kernel_dim - krow - 1][kernel_dim - kcol - 1][kch]; - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim_dilated || - icol < 0 || icol >= in_col_dim_dilated - ? 0 - : dilated[b][irow][icol][kch]; - - elem_t w = wrot180 ? weights_rot180[och][krow][kcol][kch] - : weights[och][krow][kcol][kch]; - - result += w * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? 
elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t 
output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, PADDING, - WROT180, input, weights, bias, output); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, 1, PADDING, - KERNEL_DIM, WROT180, false, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 6 && v != 11 && v != 21) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_with_kernel_dilation.c b/bb-tests/workloads/src/CTest/gemmini/conv_with_kernel_dilation.c deleted file mode 100644 index 326633fc..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_with_kernel_dilation.c +++ /dev/null @@ -1,396 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 17 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 -#define INPUT_DILATION 1 -#define KERNEL_DILATION 2 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define INPUT_DILATION 1 -#define KERNEL_DILATION 2 - -#endif - -#define NO_BIAS false - -#define IN_ROW_DIM_DILATED \ - (IN_ROW_DIM + (INPUT_DILATION - 1) * (IN_ROW_DIM - 1)) -#define IN_COL_DIM_DILATED \ - (IN_COL_DIM + (INPUT_DILATION - 1) * (IN_COL_DIM - 1)) -#define KERNEL_DIM_DILATED \ - (KERNEL_DIM + (KERNEL_DILATION - 1) * (KERNEL_DIM - 1)) -#define OUT_ROW_DIM 
\ - ((IN_ROW_DIM_DILATED + 2 * PADDING - KERNEL_DIM_DILATED) / STRIDE + 1) -#define OUT_COL_DIM \ - ((IN_COL_DIM_DILATED + 2 * PADDING - KERNEL_DIM_DILATED) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int input_dilation, int kernel_dilation, int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - - const size_t in_row_dim_dilated = - in_row_dim + (input_dilation - 1) * (in_row_dim - 1); - const size_t in_col_dim_dilated = - in_col_dim + (input_dilation - 1) * (in_col_dim - 1); - assert(in_row_dim_dilated == IN_ROW_DIM_DILATED); - assert(in_col_dim_dilated == IN_COL_DIM_DILATED); - static elem_t dilated[BATCH_SIZE][IN_ROW_DIM_DILATED][IN_COL_DIM_DILATED] - [IN_CHANNELS]; - - const size_t kernel_dim_dilated = - kernel_dim + (kernel_dilation - 1) * (kernel_dim - 1); - assert(kernel_dim_dilated == KERNEL_DIM_DILATED); - static elem_t weights_dilated[OUT_CHANNELS][KERNEL_DIM_DILATED] - [KERNEL_DIM_DILATED][IN_CHANNELS]; - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != - (in_row_dim_dilated + 2 * padding - kernel_dim_dilated) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - printf("out_row_dim\n"); - exit(1); - } - if (out_col_dim != - (in_col_dim_dilated + 2 * padding - kernel_dim_dilated) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - printf("out_col_dim\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow++) - for (int icol = 0; icol < in_col_dim_dilated; icol++) - for (int ich = 0; ich < in_channels; ich++) - 
dilated[b][irow][icol][ich] = 0; - - size_t idx = 0; - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow += input_dilation) - for (int icol = 0; icol < in_col_dim_dilated; icol += input_dilation) - for (int ich = 0; ich < in_channels; ich++) { - dilated[b][irow][icol][ich] = *((elem_t *)input + idx); - idx++; - } - - for (int och = 0; och < out_channels; och++) - for (int krow = 0; krow < kernel_dim_dilated; krow++) - for (int kcol = 0; kcol < kernel_dim_dilated; kcol++) - for (int kch = 0; kch < in_channels; kch++) - weights_dilated[och][krow][kcol][kch] = 0; - - idx = 0; - for (int och = 0; och < out_channels; och++) - for (int krow = 0; krow < kernel_dim_dilated; krow += kernel_dilation) - for (int kcol = 0; kcol < kernel_dim_dilated; kcol += kernel_dilation) - for (int kch = 0; kch < in_channels; kch++) { - weights_dilated[och][krow][kcol][kch] = *((elem_t *)weights + idx); - idx++; - } - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim_dilated; krow++) { - for (int kcol = 0; kcol < kernel_dim_dilated; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim_dilated || - icol < 0 || icol >= in_col_dim_dilated - ? 0 - : dilated[b][irow][icol][kch]; - - result += weights_dilated[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? 
elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t 
output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, - KERNEL_DILATION, PADDING, input, weights, bias, output); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, - KERNEL_DILATION, PADDING, KERNEL_DIM, false, false, false, - false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_with_pool.c b/bb-tests/workloads/src/CTest/gemmini/conv_with_pool.c deleted file mode 100644 index a5eb558e..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_with_pool.c +++ /dev/null @@ -1,431 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 32 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#define POOL_SIZE 3 -#define POOL_STRIDE 2 -#define POOL_PADDING 1 - -#else - -#ifdef FAST -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 -#else -#define IN_ROW_DIM 17 -#define IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 2 - -#define POOL_SIZE 3 -#define POOL_STRIDE 2 -#define POOL_PADDING 1 - -#endif - -#define NO_BIAS false - -#define OUT_ROW_DIM ((IN_ROW_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM ((IN_COL_DIM + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -#define 
POOL_OUT_ROW_DIM \ - ((OUT_ROW_DIM + 2 * POOL_PADDING - POOL_SIZE) / POOL_STRIDE + 1) -#define POOL_OUT_COL_DIM \ - ((OUT_COL_DIM + 2 * POOL_PADDING - POOL_SIZE) / POOL_STRIDE + 1) - -#define NO_POOL false - -#if NO_POOL == true && \ - !(POOL_SIZE == 1 && POOL_STRIDE == 1 && POOL_PADDING == 0) -#error NO_POOL is not set correctly -#endif - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != (in_row_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - exit(1); - } - if (out_col_dim != (in_col_dim + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - exit(1); - } -#endif - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : input[b][irow][icol][kch]; - - result += weights[och][krow][kcol][kch] * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? 
elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void pool(int batch_size, int channels, int in_row_dim, int in_col_dim, - int out_row_dim, int out_col_dim, int window_dim, int stride, - int padding, - elem_t input[batch_size][in_row_dim][in_col_dim][channels], - elem_t output[batch_size][out_row_dim][out_col_dim][channels]) { - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int ch = 0; ch < channels; ch++) { - output[b][orow][ocol][ch] = elem_t_min; - - for (int wrow = 0; wrow < window_dim; wrow++) { - for (int wcol = 0; wcol < window_dim; wcol++) { - int irow = orow * stride + wrow - padding; - int icol = ocol * stride + wcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : input[b][irow][icol][ch]; - - if (pixel > output[b][orow][ocol][ch]) { - output[b][orow][ocol][ch] = pixel; - } - } - } - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 
5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((IN_DIM + 2*PADDING - KERNEL_DIM) % STRIDE == 0); - // assert((OUT_DIM + 2*PADDING - POOL_SIZE) % POOL_STRIDE == 0); - - printf("Output dimensions (rows by columns): %u by %u\n", OUT_ROW_DIM, - OUT_COL_DIM); - printf("Pooling output dimensions (rows by columns): %u by %u\n\n", - POOL_OUT_ROW_DIM, POOL_OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - static elem_t pool_output[BATCH_SIZE][POOL_OUT_ROW_DIM][POOL_OUT_COL_DIM] - [OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - -#ifndef FAST - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, PADDING, input, weights, - bias, output); - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - printf("CPU pool...\n"); - uint64_t start_cpu_pool = 
read_cycles(); - pool(BATCH_SIZE, OUT_CHANNELS, OUT_ROW_DIM, OUT_COL_DIM, POOL_OUT_ROW_DIM, - POOL_OUT_COL_DIM, POOL_SIZE, POOL_STRIDE, POOL_PADDING, output, - pool_output); - uint64_t end_cpu_pool = read_cycles(); - printf("CPU pool took %llu cycles\n", end_cpu_pool - start_cpu_pool); - - printf("CPU conv+pool took %llu cycles\n", - end_cpu_pool - start_cpu_pool + end_cpu - start_cpu); -#endif - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - static elem_t pool_output_mat[BATCH_SIZE * POOL_OUT_ROW_DIM * - POOL_OUT_COL_DIM][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, 1, 1, PADDING, KERNEL_DIM, - false, false, false, false, false, - - // 1, - // 1, 1, 1, - // 1, 1, 1, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? NULL : (acc_t *)bias, (elem_t *)pool_output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, POOL_SIZE, - NO_POOL ? 
0 : POOL_STRIDE, POOL_PADDING, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(pool_output_mat) == sizeof(pool_output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * POOL_OUT_ROW_DIM * POOL_OUT_COL_DIM; - orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - if (pool_output_mat[orow][ocol] != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&pool_output[0][0][0][0], &pool_output_mat[0][0], - sizeof(pool_output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - 
printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("pool_output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int orow = 0; orow < POOL_OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < POOL_OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", pool_output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("pool_output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * POOL_OUT_ROW_DIM * POOL_OUT_COL_DIM; - orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", pool_output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("Output dimensions (rows by columns): %u by %u\n", OUT_ROW_DIM, - OUT_COL_DIM); - printf("Pooling output dimensions: %u by %u\n\n", POOL_OUT_ROW_DIM, - POOL_OUT_COL_DIM); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/conv_with_rot180.c b/bb-tests/workloads/src/CTest/gemmini/conv_with_rot180.c deleted file mode 100644 index e511641b..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/conv_with_rot180.c +++ /dev/null @@ -1,387 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCH_SIZE 4 -#define IN_ROW_DIM 224 -#define IN_COL_DIM 224 -#define IN_CHANNELS 3 -#define OUT_CHANNELS 17 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define INPUT_DILATION 1 - -#else - -#ifdef FAST - -#define IN_ROW_DIM 9 -#define IN_COL_DIM 9 -#define IN_CHANNELS 5 -#define OUT_CHANNELS 7 - -#else - -#define IN_ROW_DIM 17 -#define 
IN_COL_DIM 17 -#define IN_CHANNELS 18 -#define OUT_CHANNELS 19 - -#endif - -#define BATCH_SIZE 2 -#define KERNEL_DIM 3 -#define PADDING 1 -#define STRIDE 1 -#define INPUT_DILATION 1 - -#endif - -#define NO_BIAS false - -#define WROT180 true - -#define IN_ROW_DIM_DILATED \ - (IN_ROW_DIM + (INPUT_DILATION - 1) * (IN_ROW_DIM - 1)) -#define IN_COL_DIM_DILATED \ - (IN_COL_DIM + (INPUT_DILATION - 1) * (IN_COL_DIM - 1)) -#define OUT_ROW_DIM \ - ((IN_ROW_DIM_DILATED + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define OUT_COL_DIM \ - ((IN_COL_DIM_DILATED + 2 * PADDING - KERNEL_DIM) / STRIDE + 1) -#define PATCH_SIZE (KERNEL_DIM * KERNEL_DIM * IN_CHANNELS) -#define N_PATCHES (BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM) - -void conv(int batch_size, int in_channels, int in_row_dim, int in_col_dim, - int out_channels, int kernel_dim, int out_row_dim, int out_col_dim, - int stride, int input_dilation, int padding, bool wrot180, - elem_t input[batch_size][in_row_dim][in_col_dim][in_channels], - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - acc_t bias[out_channels], - elem_t output[batch_size][out_row_dim][out_col_dim][out_channels]) { - - const size_t in_row_dim_dilated = - in_row_dim + (input_dilation - 1) * (in_row_dim - 1); - const size_t in_col_dim_dilated = - in_col_dim + (input_dilation - 1) * (in_col_dim - 1); - assert(in_row_dim_dilated == IN_ROW_DIM_DILATED); - assert(in_col_dim_dilated == IN_COL_DIM_DILATED); - static elem_t dilated[BATCH_SIZE][IN_ROW_DIM_DILATED][IN_COL_DIM_DILATED] - [IN_CHANNELS]; - - static elem_t weights_rot180[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM] - [IN_CHANNELS]; - -#ifdef GEMMINI_ASSERTIONS - if (out_row_dim != - (in_row_dim_dilated + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_row_dim is not correct\n"); - printf("out_row_dim\n"); - exit(1); - } - if (out_col_dim != - (in_col_dim_dilated + 2 * padding - kernel_dim) / stride + 1) { - printf("conv out_col_dim is not correct\n"); - printf("out_col_dim\n"); - 
exit(1); - } -#endif - - // Populate dilated - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow++) - for (int icol = 0; icol < in_col_dim_dilated; icol++) - for (int ich = 0; ich < in_channels; ich++) - dilated[b][irow][icol][ich] = 0; - - size_t idx = 0; - for (int b = 0; b < batch_size; b++) - for (int irow = 0; irow < in_row_dim_dilated; irow += input_dilation) - for (int icol = 0; icol < in_col_dim_dilated; icol += input_dilation) - for (int ich = 0; ich < in_channels; ich++) { - dilated[b][irow][icol][ich] = *((elem_t *)input + idx); - idx++; - } - - // Populate weights_rot180 - for (int och = 0; och < out_channels; och++) - for (int krow = 0; krow < kernel_dim; krow++) - for (int kcol = 0; kcol < kernel_dim; kcol++) - for (int kch = 0; kch < in_channels; kch++) - weights_rot180[och][krow][kcol][kch] = - weights[och][kernel_dim - krow - 1][kernel_dim - kcol - 1][kch]; - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - acc_t result = bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int kch = 0; kch < in_channels; kch++) { - int irow = orow * stride + krow - padding; - int icol = ocol * stride + kcol - padding; - - elem_t pixel = irow < 0 || irow >= in_row_dim_dilated || - icol < 0 || icol >= in_col_dim_dilated - ? 0 - : dilated[b][irow][icol][kch]; - - elem_t w = wrot180 ? weights_rot180[och][krow][kcol][kch] - : weights[och][krow][kcol][kch]; - - result += w * pixel; - } - } - } - - // Clip result - result = result > elem_t_max - ? elem_t_max - : (result < elem_t_min ? 
elem_t_min : result); - - output[b][orow][ocol][och] = result; - } - } - } - } -} - -void flatten_weights( - int out_channels, int kernel_dim, int in_channels, int patch_size, - elem_t weights[out_channels][kernel_dim][kernel_dim][in_channels], - elem_t weights_mat[patch_size][out_channels]) { - - assert(patch_size == kernel_dim * kernel_dim * in_channels); - - for (int outc = 0; outc < out_channels; outc++) { - for (int krow = 0; krow < kernel_dim; krow++) { - for (int kcol = 0; kcol < kernel_dim; kcol++) { - for (int inc = 0; inc < in_channels; inc++) { - int wmatrow = - krow * kernel_dim * in_channels + kcol * in_channels + inc; - - weights_mat[wmatrow][outc] = weights[outc][krow][kcol][inc]; - } - } - } - } -} - -bool vec_is_equal(elem_t *a, elem_t *b, int len) { - for (int i = 0; i < len; i++) - if (a[i] != b[i]) - return false; - return true; -} - -void init_random(elem_t *buf, int len) { - elem_t i = 0; - for (elem_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_random_acc(acc_t *buf, int len) { - elem_t i = 0; - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - // *ptr = (rand() % 32) - 16; -#ifdef FAST - *ptr = 1; -#else - *ptr = (rand() % 5) - 2; -#endif - } -} - -void init_zeros_acc(acc_t *buf, int len) { - for (acc_t *ptr = buf; ptr < buf + len; ptr++) { - *ptr = 0; - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // assert((in_dim + 2*padding - kernel_dim) % stride == 0); - - printf("Input dimensions: %u by %u\n", IN_ROW_DIM, IN_COL_DIM); - printf("Output dimensions: %u by %u\n\n", OUT_ROW_DIM, OUT_COL_DIM); - - static elem_t input[BATCH_SIZE][IN_ROW_DIM][IN_COL_DIM][IN_CHANNELS]; - static elem_t weights[OUT_CHANNELS][KERNEL_DIM][KERNEL_DIM][IN_CHANNELS]; - static acc_t bias[OUT_CHANNELS]; - static elem_t 
output[BATCH_SIZE][OUT_ROW_DIM][OUT_COL_DIM][OUT_CHANNELS]; - - printf("Randomize inputs...\n"); - init_random(&input[0][0][0][0], sizeof(input) / sizeof(elem_t)); - - printf("Randomize weights...\n"); - init_random(&weights[0][0][0][0], sizeof(weights) / sizeof(elem_t)); - - printf("Randomize bias...\n"); - if (NO_BIAS) - init_zeros_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - else - init_random_acc(&bias[0], sizeof(bias) / sizeof(acc_t)); - - printf("CPU conv...\n"); - uint64_t start_cpu = read_cycles(); -#ifndef FAST - conv(BATCH_SIZE, IN_CHANNELS, IN_ROW_DIM, IN_COL_DIM, OUT_CHANNELS, - KERNEL_DIM, OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, PADDING, - WROT180, input, weights, bias, output); -#endif - uint64_t end_cpu = read_cycles(); - printf("CPU conv took %llu cycles\n", end_cpu - start_cpu); - - static elem_t weights_mat[PATCH_SIZE][OUT_CHANNELS]; - static elem_t output_mat[N_PATCHES][OUT_CHANNELS]; - - printf("Flatten weights...\n"); - flatten_weights(OUT_CHANNELS, KERNEL_DIM, IN_CHANNELS, PATCH_SIZE, weights, - weights_mat); - - printf("Gemmini conv...\n"); - uint64_t start_gemmini = read_cycles(); - tiled_conv_auto(BATCH_SIZE, IN_ROW_DIM, IN_COL_DIM, IN_CHANNELS, OUT_CHANNELS, - OUT_ROW_DIM, OUT_COL_DIM, STRIDE, INPUT_DILATION, 1, PADDING, - KERNEL_DIM, WROT180, false, false, false, false, - - (elem_t *)input, (elem_t *)weights_mat, - NO_BIAS ? 
NULL : (acc_t *)bias, (elem_t *)output_mat, - - NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, 0, 0, - - WS); - uint64_t end_gemmini = read_cycles(); - printf("Gemmini conv took %llu cycles\n", end_gemmini - start_gemmini); - - assert(sizeof(output_mat) == sizeof(output)); - -#ifdef FAST - bool success = true; - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - elem_t v = output_mat[orow][ocol]; - if (v != 21 && v != 31 && v != 46) { - success = false; - break; - } - } - } -#else - bool success = vec_is_equal(&output[0][0][0][0], &output_mat[0][0], - sizeof(output) / sizeof(elem_t)); -#endif - - if (!success) { - // return 1; - - printf("bias:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", bias[och]); - } - printf("\b\n\n"); - - printf("weights:\n"); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("["); - for (int wrow = 0; wrow < KERNEL_DIM; wrow++) { - printf("["); - for (int wcol = 0; wcol < KERNEL_DIM; wcol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", weights[och][wrow][wcol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("weights_mat:\n"); - for (int wrow = 0; wrow < KERNEL_DIM * KERNEL_DIM * IN_CHANNELS; wrow++) { - printf("["); - for (int wcol = 0; wcol < OUT_CHANNELS; wcol++) { - printf("%d,", weights_mat[wrow][wcol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - printf("input:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - printf("["); - for (int irow = 0; irow < IN_ROW_DIM; irow++) { - printf("["); - for (int icol = 0; icol < IN_COL_DIM; icol++) { - printf("["); - for (int ich = 0; ich < IN_CHANNELS; ich++) { - printf("%d,", input[batch][irow][icol][ich]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output:\n"); - for (int batch = 0; batch < BATCH_SIZE; batch++) { - 
printf("["); - for (int orow = 0; orow < OUT_ROW_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_COL_DIM; ocol++) { - printf("["); - for (int och = 0; och < OUT_CHANNELS; och++) { - printf("%d,", output[batch][orow][ocol][och]); - } - printf("\b],"); - } - printf("\b],\n"); - } - printf("\b],"); - } - printf("\b\n\n"); - - printf("output_mat:\n"); - for (int orow = 0; orow < BATCH_SIZE * OUT_ROW_DIM * OUT_COL_DIM; orow++) { - printf("["); - for (int ocol = 0; ocol < OUT_CHANNELS; ocol++) { - printf("%d,", output_mat[orow][ocol]); - } - printf("\b],\n"); - } - printf("\b\n\n"); - - return 1; - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/gemmini/gemmini_counter.c b/bb-tests/workloads/src/CTest/gemmini/gemmini_counter.c deleted file mode 100644 index 13388e4b..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/gemmini_counter.c +++ /dev/null @@ -1,129 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define N 8 - -#if (N * DIM) > (BANK_NUM * BANK_ROWS) -#error not enough scratchpad space -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - // Set counter and reset - counter_configure(0, LOAD_ACTIVE_CYCLE); - if (counter_read(0) != 0) { - printf("Counter Reset Failed (not equal to 0)\n"); - exit(1); - } - - // Initial matrix - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - - static elem_t In[N][DIM][DIM] row_align(1); - static elem_t Out[N][DIM][DIM] row_align(1); - - for (size_t n = 0; n < N; ++n) - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) - In[n][i][j] = i * DIM + j + n; - - // Move in - for (size_t i = 0; i < N; i++) { - gemmini_mvin(In[i], i * DIM); - gemmini_mvout(Out[i], i * DIM); - } - - // Check value (should be increasing 
right now as Gemmini executes in the - // background) - int counter_val = counter_read(0); - - // Take a snapshot - counter_snapshot_take(); - int snapshot_val = counter_read(0); - - // Print first counter value (Syscall takes a lot of time, and we might not - // capture a snapshot when the instructions are still being executed) - printf("Read DMA cycles: %d\n", counter_val); - if (counter_val == 0) { - printf("Counter Value failed to increase\n"); - exit(1); - } - - // Wait till the operation finish - gemmini_fence(); - - // Check again - counter_val = counter_read(0); - printf("Cycle when taking snapshot: %d, Cycle read after operation finished: " - "%d\n", - snapshot_val, counter_val); - if (counter_val != snapshot_val) { - printf("Snapshot changed after taken; test failed\n"); - exit(1); - } - - // Reset snapshot, and check if cycles changed - counter_snapshot_reset(); - counter_val = counter_read(0); - printf("Cycles after snapshot is reset: %d\n", counter_val); - if (counter_val < snapshot_val + 10) { - printf("Counter values changed too little after snapshot reset; check if " - "counter continues properly\n"); - exit(1); - } - - // Global reset - counter_reset(); - counter_val = counter_read(0); - printf("Cycles after counter reset: %d\n", counter_val); - if (counter_val != 0) { - printf("Cycles did not reset after global reset inst\n"); - exit(1); - } - - // Check external counter - counter_configure(7, RESERVATION_STATION_LD_COUNT); - for (size_t i = 0; i < N; i++) { - gemmini_mvin(In[i], i * DIM); - gemmini_mvout(Out[i], i * DIM); - } - - // Fused read and take snapshot command - uint32_t custom_command = (7 & 0x7) << 4 | 0x4; - gemmini_counter_access(counter_val, custom_command); - - printf("RESERVATION_STATION # of load insts after executing %d mvin and " - "mvout insts: %d\n", - N - 1, counter_val); - if (counter_val < 2) { - printf("The load RESERVATION_STATION counter value is too small\n"); - exit(1); - } - - snapshot_val = counter_read(7); - if 
(counter_val != snapshot_val) { - printf("Snapshot value doesn't match the raw value read before snapshot " - "taken\n"); - exit(1); - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/global_average.c b/bb-tests/workloads/src/CTest/gemmini/global_average.c deleted file mode 100644 index 3994d281..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/global_average.c +++ /dev/null @@ -1,92 +0,0 @@ -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL - -#define BATCHES 4 -#define INPUT_DIM 7 -#define CHANNELS 2048 - -#else - -#define BATCHES 2 -#define INPUT_DIM 3 -#define CHANNELS 47 - -#endif - -void init_random(elem_t *buf, int len) { - for (int i = 0; i < len; i++) { - buf[i] = rand() % 10; - } -} - -bool is_same(elem_t *x, elem_t *y, int len) { - for (int i = 0; i < len; i++) - if (x[i] != y[i]) - return false; - return true; -} - -int main() { - static elem_t input[BATCHES][INPUT_DIM][INPUT_DIM][CHANNELS]; - static elem_t output[BATCHES][CHANNELS]; - static elem_t gold[BATCHES][CHANNELS]; - - init_random((elem_t *)input, BATCHES * INPUT_DIM * INPUT_DIM * CHANNELS); - - printf("CPU average pooling...\n"); - tiled_global_average_auto((elem_t *)input, (elem_t *)gold, BATCHES, CHANNELS, - INPUT_DIM, CPU); - - printf("Gemmini average pooling...\n"); - tiled_global_average_auto((elem_t *)input, (elem_t *)output, BATCHES, - CHANNELS, INPUT_DIM, WS); - - if (!is_same((elem_t *)gold, (elem_t *)output, BATCHES * CHANNELS)) { - printf("Fail\n"); - - printf("Input:\n"); - for (int b = 0; b < BATCHES; b++) { - for (int row = 0; row < INPUT_DIM; row++) { - printf("{"); - for (int col = 0; col < INPUT_DIM; col++) { - printf("{"); - for (int ch = 0; ch < CHANNELS; ch++) { - printf("%d ", input[b][row][col][ch]); - } - printf("}"); - } - printf("}"); - } - printf("\n"); - } - - printf("Output:\n"); - for (int b = 0; b < BATCHES; b++) { - for (int ch = 0; ch < 
CHANNELS; ch++) { - printf("%d ", output[b][ch]); - } - printf("\n"); - } - - printf("Gold:\n"); - for (int b = 0; b < BATCHES; b++) { - for (int ch = 0; ch < CHANNELS; ch++) { - printf("%d ", gold[b][ch]); - } - printf("\n"); - } - - exit(1); - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/include/accumulator.h b/bb-tests/workloads/src/CTest/gemmini/include/accumulator.h deleted file mode 100644 index c4d53ac8..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/include/accumulator.h +++ /dev/null @@ -1,24 +0,0 @@ -// See LICENSE for license details. - -#ifndef SRC_MAIN_C_ACCUMULATOR_H -#define SRC_MAIN_C_ACCUMULATOR_H - -#include "rocc-software/src/xcustom.h" - -#define k_DO_WRITE 0 -#define k_DO_READ 1 -#define k_DO_LOAD 2 -#define k_DO_ACCUM 3 - -#define XCUSTOM_ACC 0 - -#define doWrite(y, rocc_rd, data) \ - ROCC_INSTRUCTION(XCUSTOM_ACC, y, data, rocc_rd, k_DO_WRITE); -#define doRead(y, rocc_rd) \ - ROCC_INSTRUCTION(XCUSTOM_ACC, y, 0, rocc_rd, k_DO_READ); -#define doLoad(y, rocc_rd, mem_addr) \ - ROCC_INSTRUCTION(XCUSTOM_ACC, y, mem_addr, rocc_rd, k_DO_LOAD); -#define doAccum(y, rocc_rd, data) \ - ROCC_INSTRUCTION(XCUSTOM_ACC, y, data, rocc_rd, k_DO_ACCUM); - -#endif // SRC_MAIN_C_ACCUMULATOR_H diff --git a/bb-tests/workloads/src/CTest/gemmini/include/character.h b/bb-tests/workloads/src/CTest/gemmini/include/character.h deleted file mode 100644 index baeeb178..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/include/character.h +++ /dev/null @@ -1,10 +0,0 @@ -// See LICENSE for license details. 
- -#ifndef SRC_MAIN_C_CHARACTER_H -#define SRC_MAIN_C_CHARACTER_H - -#include "rocc-software/src/xcustom.h" - -#define XCUSTOM_CHAR 2 - -#endif // SRC_MAIN_C_CHARACTER_H diff --git a/bb-tests/workloads/src/CTest/gemmini/include/gemmini.h b/bb-tests/workloads/src/CTest/gemmini/include/gemmini.h deleted file mode 100644 index 261b0846..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/include/gemmini.h +++ /dev/null @@ -1,3929 +0,0 @@ -// See LICENSE for license details. - -#ifndef SRC_MAIN_C_GEMMINI_H -#define SRC_MAIN_C_GEMMINI_H - -#undef abs - -#include -#include -#include -#include -#include -#include - -#include "include/gemmini_params.h" - -#define GEMMINI_ASSERTIONS - -// Accelerator interface -#include "rocc-software/src/xcustom.h" - -// Counter Definition -#include "include/gemmini_counter.h" - -#define k_CONFIG 0 -#define k_MVIN2 1 -#define k_MVIN 2 -#define k_MVOUT 3 -#define k_COMPUTE_PRELOADED 4 -#define k_COMPUTE_ACCUMULATE 5 -#define k_PRELOAD 6 -#define k_FLUSH 7 - -#define k_LOOP_WS 8 -#define k_LOOP_WS_CONFIG_BOUNDS 9 -#define k_LOOP_WS_CONFIG_ADDRS_AB 10 -#define k_LOOP_WS_CONFIG_ADDRS_DC 11 -#define k_LOOP_WS_CONFIG_STRIDES_AB 12 -#define k_LOOP_WS_CONFIG_STRIDES_DC 13 - -#define k_MVIN3 14 - -#define k_COUNTER 126 - -#define k_LOOP_CONV_WS 15 -#define k_LOOP_CONV_WS_CONFIG_1 16 -#define k_LOOP_CONV_WS_CONFIG_2 17 -#define k_LOOP_CONV_WS_CONFIG_3 18 -#define k_LOOP_CONV_WS_CONFIG_4 19 -#define k_LOOP_CONV_WS_CONFIG_5 20 -#define k_LOOP_CONV_WS_CONFIG_6 21 - -#define CONFIG_EX 0 -#define CONFIG_LD 1 -#define CONFIG_ST 2 -#define CONFIG_BERT 3 - -#define GARBAGE_ADDR ((uint32_t)(-1)) -#define OUTPUT_STATIONARY 0 -#define WEIGHT_STATIONARY 1 - -#define NO_ACTIVATION 0 -#define RELU 1 -#define LAYERNORM 2 -#define IGELU 3 -#define SOFTMAX 4 - -#ifdef ELEM_T_IS_FLOAT -elem_t elem_t_bits_to_elem_t(elem_t_bits x) { - union { - elem_t_bits b; - elem_t f; - } un; - - un.b = x; - return un.f; -} - -elem_t_bits elem_t_to_elem_t_bits(elem_t x) { - union { 
- elem_t_bits b; - elem_t f; - } un; - - un.f = x; - return un.b; -} - -acc_t acc_t_bits_to_acc_t(acc_t_bits x) { - union { - acc_t_bits b; - acc_t f; - } un; - - un.b = x; - return un.f; -} - -acc_t_bits acc_t_to_acc_t_bits(acc_t x) { - union { - acc_t_bits b; - acc_t f; - } un; - - un.f = x; - return un.b; -} - -bool elem_t_isnan(elem_t x) { - elem_t_bits bits = elem_t_to_elem_t_bits(x); - uint64_t exp = - (bits >> (ELEM_T_SIG_BITS - 1)) & (((uint64_t)1 << ELEM_T_EXP_BITS) - 1); - uint64_t sig = bits & (((uint64_t)1 << ELEM_T_SIG_BITS) - 1); - bool is_nan_or_inf = exp == (((uint64_t)1 << ELEM_T_EXP_BITS) - 1); - bool is_not_inf = sig != 0; - return is_nan_or_inf && is_not_inf; -} - -bool acc_t_isnan(acc_t x) { - acc_t_bits bits = acc_t_to_acc_t_bits(x); - uint64_t exp = - (bits >> (ACC_T_SIG_BITS - 1)) & (((uint64_t)1 << ACC_T_EXP_BITS) - 1); - uint64_t sig = bits & (((uint64_t)1 << ACC_T_SIG_BITS) - 1); - bool is_nan_or_inf = exp == (((uint64_t)1 << ACC_T_EXP_BITS) - 1); - bool is_not_inf = sig != 0; - return is_nan_or_inf && is_not_inf; -} -#endif - -#ifdef HAS_MVIN_SCALE -static scale_t scale_t_bits_to_scale_t(scale_t_bits x) { - union { - scale_t_bits b; - scale_t f; - } un; - - un.b = x; - return un.f; -} - -static scale_t_bits scale_t_to_scale_t_bits(scale_t x) { - union { - scale_t_bits b; - scale_t f; - } un; - - un.f = x; - return un.b; -} -#else -#define scale_t_to_scale_t_bits(x) 0 -#endif - -#ifdef HAS_MVIN_ACC_SCALE -static scale_acc_t scale_acc_t_bits_to_scale_acc_t(scale_acc_t_bits x) { - union { - scale_acc_t_bits b; - scale_acc_t f; - } un; - - un.b = x; - return un.f; -} - -static scale_acc_t_bits scale_acc_t_to_scale_acc_t_bits(scale_acc_t x) { - union { - scale_acc_t_bits b; - scale_acc_t f; - } un; - - un.f = x; - return un.b; -} -#endif - -static acc_scale_t acc_scale_t_bits_to_acc_scale_t(acc_scale_t_bits x) { - union { - acc_scale_t_bits b; - acc_scale_t f; - } un; - - un.b = x; - return un.f; -} - -static acc_scale_t_bits 
acc_scale_t_to_acc_scale_t_bits(acc_scale_t x) { - union { - acc_scale_t_bits b; - acc_scale_t f; - } un; - - un.f = x; - return un.b; -} - -#define ROCC_INSTRUCTION_RS1_RS2(x, rs1, rs2, funct) \ - ROCC_INSTRUCTION_0_R_R(x, rs1, rs2, funct) - -// mvin and mvout -#define gemmini_extended_mvin(dram_addr, spad_addr, cols, rows) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, dram_addr, \ - ((uint64_t)(rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(cols) << ADDR_LEN) | (spad_addr), \ - k_MVIN) - -#define gemmini_extended_mvin2(dram_addr, spad_addr, cols, rows) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, dram_addr, \ - ((uint64_t)(rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(cols) << ADDR_LEN) | (spad_addr), \ - k_MVIN2) - -#define gemmini_extended_mvin3(dram_addr, spad_addr, cols, rows) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, dram_addr, \ - ((uint64_t)(rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(cols) << ADDR_LEN) | (spad_addr), \ - k_MVIN3) - -#define gemmini_block_mvin(dram_addr, spad_addr, len) \ - gemmini_extended_mvin(dram_addr, spad_addr, (len) * DIM, DIM) - -#define gemmini_mvin(dram_addr, spad_addr) \ - gemmini_extended_mvin(dram_addr, spad_addr, DIM, DIM) - -#define gemmini_extended_mvout(dram_addr, spad_addr, cols, rows) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, dram_addr, \ - ((uint64_t)(rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(cols) << ADDR_LEN) | \ - (uint64_t)(spad_addr), \ - k_MVOUT) - -#define gemmini_mvout(dram_addr, spad_addr) \ - gemmini_extended_mvout(dram_addr, spad_addr, DIM, DIM) - -// compute -#define gemmini_extended_compute_preloaded(A, BD, A_cols, A_rows, BD_cols, \ - BD_rows) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(A_rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(A_cols) << ADDR_LEN) | (uint64_t)(A), \ - ((uint64_t)(BD_rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(BD_cols) << ADDR_LEN) | (uint64_t)(BD), \ - k_COMPUTE_PRELOADED) - -#define gemmini_extended_compute_accumulated(A, BD, A_cols, A_rows, BD_cols, \ - BD_rows) \ - 
ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(A_rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(A_cols) << ADDR_LEN) | (uint64_t)(A), \ - ((uint64_t)(BD_rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(BD_cols) << ADDR_LEN) | (uint64_t)(BD), \ - k_COMPUTE_ACCUMULATE) - -#define gemmini_compute_preloaded(A, BD) \ - gemmini_extended_compute_preloaded(A, BD, DIM, DIM, DIM, DIM) - -#define gemmini_compute_accumulated(A, BD) \ - gemmini_extended_compute_accumulated(A, BD, DIM, DIM, DIM, DIM) - -// preload -#define gemmini_extended_preload(BD, C, BD_cols, BD_rows, C_cols, C_rows) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(BD_rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(BD_cols) << ADDR_LEN) | (uint64_t)(BD), \ - ((uint64_t)(C_rows) << (ADDR_LEN + 16)) | \ - ((uint64_t)(C_cols) << ADDR_LEN) | (uint64_t)(C), \ - k_PRELOAD) - -#define gemmini_preload(BD, C) \ - gemmini_extended_preload(BD, C, DIM, DIM, DIM, DIM) - -#define gemmini_preload_zeros(C) gemmini_preload(GARBAGE_ADDR, C) - -// config -#define gemmini_extended3_config_ex( \ - dataflow, sys_act, sys_shift, sys_acc_scale, C_stride, A_stride, \ - A_transpose, B_transpose, set_only_strides) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)acc_scale_t_to_acc_scale_t_bits((acc_scale_t)sys_acc_scale) \ - << 32) | \ - ((uint64_t)(A_stride) << 16) | (B_transpose << 9) | \ - (A_transpose << 8) | ((set_only_strides) << 7) | ((sys_act) << 3) | \ - ((dataflow) << 2) | CONFIG_EX, \ - ((uint64_t)(C_stride) << 48) | (sys_shift), k_CONFIG); - -#define gemmini_extended2_config_ex(dataflow, sys_act, sys_shift, A_stride, \ - A_transpose, B_transpose) \ - gemmini_extended3_config_ex(dataflow, sys_act, sys_shift, \ - ACC_SCALE_IDENTITY, 1, A_stride, A_transpose, \ - B_transpose, false) - -#define gemmini_extended_config_ex(dataflow, sys_act, sys_shift, A_stride, \ - A_transpose, B_transpose) \ - gemmini_extended2_config_ex(dataflow, sys_act, sys_shift, A_stride, \ - A_transpose, B_transpose) - -#define 
gemmini_config_ex(dataflow, sys_act, sys_shift) \ - gemmini_extended_config_ex(dataflow, sys_act, sys_shift, 1, 0, 0) - -// Note: The "pixel_repeats" parameter below is still experimental, andthere is -// a high chance that it will be removed in future releases. -#define gemmini_extended5_config_ld(stride, scale, shrunk, block_mvin_stride, \ - pixel_repeats, id) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(scale_t_to_scale_t_bits(scale)) << 32) | \ - ((uint64_t)(block_mvin_stride) << 16) | \ - ((uint64_t)(pixel_repeats) << 8) | ((id) << 3) | ((shrunk) << 2) | \ - CONFIG_LD, \ - stride, k_CONFIG) - -#define gemmini_extended4_config_ld(stride, scale, shrunk, block_mvin_stride, \ - id) \ - gemmini_extended5_config_ld(stride, scale, shrunk, block_mvin_stride, 1, id) - -#define gemmini_extended3_config_ld(stride, scale, shrunk, id) \ - gemmini_extended4_config_ld(stride, scale, shrunk, DIM, id) - -#define gemmini_extended2_config_ld(stride, scale, shrunk) \ - gemmini_extended3_config_ld(stride, scale, shrunk, 0) - -#define gemmini_extended_config_ld(stride, scale) \ - gemmini_extended2_config_ld(stride, scale, false) - -#define gemmini_config_ld(stride) \ - gemmini_extended_config_ld(stride, MVIN_SCALE_IDENTITY) - -#define gemmini_extended2_config_st(stride, acc_act, acc_scale, pool_stride, \ - pool_size, pool_out_dim, porows, pocols, \ - orows, ocols, upad, lpad) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(ocols) << 56) | ((uint64_t)(orows) << 48) | \ - ((uint64_t)(pocols) << 40) | ((uint64_t)(porows) << 32) | \ - ((uint64_t)(pool_out_dim) << 24) | ((uint64_t)(lpad) << 10) | \ - ((uint64_t)(upad) << 8) | ((uint64_t)(pool_size) << 6) | \ - ((uint64_t)(pool_stride) << 4) | ((uint64_t)(acc_act) << 2) | \ - CONFIG_ST, \ - ((uint64_t)acc_scale_t_to_acc_scale_t_bits((acc_scale_t)acc_scale) \ - << 32) | \ - ((uint32_t)stride), \ - k_CONFIG) - -#define gemmini_extended_config_st(stride, acc_act, acc_scale) \ - 
gemmini_extended2_config_st(stride, acc_act, acc_scale, 0, 0, 0, 0, 0, 0, 0, \ - 0, 0) - -#define gemmini_config_st(stride) \ - gemmini_extended_config_st(stride, NO_ACTIVATION, ACC_SCALE_IDENTITY) - -#define gemmini_config_norm(q_const, q_const_type, set_stats_id_only, act_msb, \ - stat_id, igelu_qb, igelu_qc) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - (((uint64_t)((uint32_t)q_const)) << 32) | ((q_const_type & 1) << 18) | \ - ((set_stats_id_only & 1) << 17) | ((act_msb & 1) << 16) | \ - ((uint64_t)stat_id << 8) | CONFIG_BERT, \ - ((uint64_t)((uint32_t)(igelu_qc)) << 32) | \ - ((uint64_t)((uint32_t)(igelu_qb))), \ - k_CONFIG) - -// flush -#define gemmini_flush(skip) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, skip, 0, k_FLUSH) - -// fence -#define gemmini_fence() asm volatile("fence") - -// Counter access -#define gemmini_counter_access(rd, config_reg) \ - { \ - uint32_t _placeholder; \ - ROCC_INSTRUCTION(XCUSTOM_ACC, rd, config_reg, _placeholder, k_COUNTER) \ - } - -// Read counter -static uint32_t counter_read(size_t index) { - uint32_t config_reg = (index & 0x7) << 4; - uint32_t res; - gemmini_counter_access(res, config_reg); - return res; -} - -// Configure counter to take a new signal -static void counter_configure(size_t index, size_t counter_code) { - int non_incremental = counter_code > INCREMENTAL_COUNTERS; - if (non_incremental) { - counter_code -= INCREMENTAL_COUNTERS; - } - - uint32_t config_reg = (index & 0x7) << 4 | 0x8 | (counter_code & 0x3f) << 12 | - non_incremental << 31; - uint32_t placeholder; - gemmini_counter_access(placeholder, config_reg); -} - -// Take a snapshot -static void counter_snapshot_take() { - uint32_t config_reg = 0x4; - uint32_t placeholder; - gemmini_counter_access(placeholder, config_reg); -} - -// Counter snapshot reset -static void counter_snapshot_reset() { - uint32_t config_reg = 0x2; - uint32_t placeholder; - gemmini_counter_access(placeholder, config_reg); -} - -// Counter module reset -static void counter_reset() 
{ - uint32_t config_reg = 0x1; - uint32_t placeholder; - gemmini_counter_access(placeholder, config_reg); -} - -int ceil_divide_int(int a, int b) { - int c = (a % b == 0) ? ((int)(a / b)) : (((int)(a / b)) + 1); - if (a < b) - c = 1; - return c; -} - -// weight-stationary matmul loop -#define gemmini_loop_ws(I, J, K, pad_I, pad_J, pad_K, A, B, D, C, A_stride, \ - B_stride, D_stride, C_stride, A_transpose, \ - B_transpose, full_C, low_D, ex_accumulate, act, \ - a_spad_id, b_spad_id, is_resadd) \ - {ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, \ - ((uint64_t)(pad_K) << 32) | \ - ((uint64_t)(pad_J) << 16) | (uint64_t)(pad_I), \ - ((uint64_t)(K) << 32) | ((uint64_t)(J) << 16) | \ - (uint64_t)(I), \ - k_LOOP_WS_CONFIG_BOUNDS) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, A, B, k_LOOP_WS_CONFIG_ADDRS_AB) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, D, C, \ - k_LOOP_WS_CONFIG_ADDRS_DC) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, A_stride, B_stride, \ - k_LOOP_WS_CONFIG_STRIDES_AB) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, D_stride, C_stride, \ - k_LOOP_WS_CONFIG_STRIDES_DC) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(a_spad_id) << 18) | \ - ((uint64_t)(b_spad_id) << 16) | \ - ((uint64_t)(act) << 8) | ((low_D) << 2) | \ - ((full_C) << 1) | (ex_accumulate), \ - ((is_resadd) << 2) | ((B_transpose) << 1) | \ - (A_transpose), \ - k_LOOP_WS)} - -// weight-stationary conv loop -#define gemmini_loop_conv_ws( \ - batch_size, in_row_dim, in_col_dim, in_channels, out_channels, \ - out_row_dim, out_col_dim, pool_out_row_dim, pool_out_col_dim, stride, \ - padding, kernel_dim, kernel_dilation, pool_size, pool_stride, \ - pool_padding, batches, porows, pocols, pochs, krows, kcols, kchs, lpad, \ - rpad, upad, dpad, plpad, prpad, pupad, pdpad, orows, ocols, weights, \ - output, bias, input, no_bias, no_pool, downsample, wrot180, input_dilated, \ - activation, trans_output_1203, trans_weight_1203, trans_weight_0132, \ - trans_input_3120, max_pixels_per_row, in_stride, weight_stride, \ - 
out_stride, dw, a_spad_id, b_spad_id) \ - {ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(out_channels) << 48) | ((uint64_t)(in_channels) << 32) | \ - ((uint64_t)(in_row_dim) << 16) | (uint64_t)(batch_size), \ - ((uint64_t)(padding) << 56) | ((uint64_t)(stride) << 48) | \ - ((uint64_t)(out_col_dim) << 32) | \ - ((uint64_t)(pool_out_row_dim) << 16) | (uint64_t)(out_row_dim), \ - k_LOOP_CONV_WS_CONFIG_1) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(kernel_dim) << 48) | \ - ((uint64_t)(pool_out_col_dim) << 32) | \ - ((uint64_t)(pool_size) << 16) | \ - ((uint64_t)(pool_stride) << 8) | (uint64_t)(pool_padding), \ - ((uint64_t)(batches) << 48) | ((uint64_t)(porows) << 32) | \ - ((uint64_t)(pocols) << 16) | (uint64_t)(pochs), \ - k_LOOP_CONV_WS_CONFIG_2) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(krows) << 48) | ((uint64_t)(kcols) << 32) | \ - ((uint64_t)(kchs) << 16) | (uint64_t)(lpad), \ - ((uint64_t)(rpad) << 48) | ((uint64_t)(upad) << 32) | \ - ((uint64_t)(dpad) << 24) | ((uint64_t)(plpad) << 16) | \ - ((uint64_t)(in_col_dim)), \ - k_LOOP_CONV_WS_CONFIG_3) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(orows) << 48) | ((uint64_t)(prpad) << 32) | \ - ((uint64_t)(pupad) << 21) | ((uint64_t)(pdpad) << 10) | \ - (uint64_t)(kernel_dilation), \ - ((uint64_t)(in_stride) << 48) | \ - ((uint64_t)(weight_stride) << 32) | \ - ((uint64_t)(out_stride) << 16) | (uint64_t)(ocols), \ - k_LOOP_CONV_WS_CONFIG_4) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, weights, output, \ - k_LOOP_CONV_WS_CONFIG_5) \ - ROCC_INSTRUCTION_RS1_RS2(XCUSTOM_ACC, bias, input, \ - k_LOOP_CONV_WS_CONFIG_6) \ - ROCC_INSTRUCTION_RS1_RS2( \ - XCUSTOM_ACC, \ - ((uint64_t)(a_spad_id) << 18) | \ - ((uint64_t)(b_spad_id) << 16) | \ - ((uint64_t)(max_pixels_per_row) << 8) | \ - ((dw) << 6) | ((trans_input_3120) << 5) | \ - ((trans_weight_0132) << 4) | \ - ((trans_weight_1203) << 3) | \ - ((trans_output_1203) << 2) | \ - ((wrot180) << 1) | (no_bias), \ - 
((activation) << 3) | ((input_dilated) << 2) | \ - ((downsample) << 1) | (no_pool), \ - k_LOOP_CONV_WS)} - -// Tiling functions -static void -sp_tiled_matmul_os(const elem_t *A, const elem_t *B, const void *D, void *C, - scale_t A_scale_factor, scale_t B_scale_factor, - scale_acc_t D_scale_factor, size_t I, size_t J, size_t K, - size_t pad_I, size_t pad_J, size_t pad_K, - size_t A_row_stride, size_t B_row_stride, - size_t D_row_stride, size_t C_row_stride, bool a_transpose, - bool b_transpose, bool full_C, bool low_D, bool no_bias, - bool repeating_bias, int act, int a_spad_id, int b_spad_id) { - - const uint32_t A_sp_addr_start = 0; - const uint32_t B_sp_addr_start = BANK_NUM * BANK_ROWS - K * J * DIM; - const uint32_t D_sp_addr_start = 1 << (ADDR_LEN - 1); - const uint32_t C_sp_addr_start = - (3 << (ADDR_LEN - 2)) | (full_C << (ADDR_LEN - 3)); - - const int A_blocks = K <= MAX_BLOCK_LEN ? K : MAX_BLOCK_LEN; - const int B_blocks = J <= MAX_BLOCK_LEN ? J : MAX_BLOCK_LEN; - const int D_blocks = J <= MAX_BLOCK_LEN_ACC ? J : MAX_BLOCK_LEN_ACC; - - // Move-in D - if (D != NULL && !no_bias) { - const size_t D_stride = repeating_bias ? 0 : D_row_stride * sizeof(acc_t); - gemmini_extended_config_ld(D_stride, D_scale_factor); - - for (size_t i = 0; i < I; i++) { - for (size_t j = 0; j < J; j += D_blocks) { - const size_t bias_row = repeating_bias ? 0 : i; - const acc_t *const D_dram_addr = - (acc_t *)D + (bias_row * D_row_stride + j) * DIM; - - const uint32_t D_sp_addr_acc = D_sp_addr_start + (i * J + j) * DIM; - - const size_t blocks = j + D_blocks <= J ? D_blocks : J - j; - - const size_t cols = blocks * DIM - (j + blocks >= J ? pad_J : 0); - const size_t rows = DIM - (i == I - 1 ? 
pad_I : 0); - - gemmini_extended_mvin(D_dram_addr, D_sp_addr_acc, cols, rows); - } - } - } - - // Move-in B - gemmini_extended_config_ld(B_row_stride * sizeof(elem_t), B_scale_factor); - for (size_t j = 0; j < J; j += B_blocks) { - for (size_t k = 0; k < K; k++) { - const elem_t *const B_dram_addr = B + (k * B_row_stride + j) * DIM; - const uint32_t B_sp_addr = B_sp_addr_start + (k * J + j) * DIM; - const size_t blocks = j + B_blocks <= J ? B_blocks : J - j; - const size_t cols = blocks * DIM - (j + blocks >= J ? pad_J : 0); - const size_t rows = DIM - (k == K - 1 ? pad_K : 0); - gemmini_extended_mvin(B_dram_addr, B_sp_addr, cols, rows); - } - } - - // Move-in A - gemmini_extended_config_ld(A_row_stride * sizeof(elem_t), A_scale_factor); - for (size_t i = 0; i < I; i++) { - for (size_t k = 0; k < K; k += A_blocks) { - const elem_t *const A_dram_addr = A + (i * A_row_stride + k) * DIM; - const uint32_t A_sp_addr = A_sp_addr_start + (i * K + k) * DIM; - const size_t blocks = k + A_blocks <= K ? A_blocks : K - k; - const size_t cols = blocks * DIM - (k + blocks >= K ? pad_K : 0); - const size_t rows = DIM - (i == I - 1 ? pad_I : 0); - gemmini_extended_mvin(A_dram_addr, A_sp_addr, cols, rows); - } - } - - for (size_t i = 0; i < I; i++) { - for (size_t j = 0; j < J; j++) { - const uint32_t C_sp_addr = C_sp_addr_start + (i * J + j) * DIM; - - for (size_t k = 0; k < K; k++) { - - const uint32_t A_sp_addr = A_sp_addr_start + (i * K + k) * DIM; - const uint32_t B_sp_addr = B_sp_addr_start + (k * J + j) * DIM; - - uint32_t out_sp_addr = k == K - 1 ? C_sp_addr : GARBAGE_ADDR; - - // If we're not using a bias, then we want to overwrite what's in the - // accumulator, rather than writing over it - int no_bias_new_matrix = no_bias && D != NULL && k == K - 1; - if (no_bias_new_matrix) { - out_sp_addr &= ~(1 << (ADDR_LEN - 2)); - } - - const size_t A_cols = DIM - (k == K - 1 ? pad_K : 0); - const size_t A_rows = DIM - (i == I - 1 ? 
pad_I : 0); - const size_t B_cols = DIM - (j == J - 1 ? pad_J : 0); - const size_t B_rows = DIM - (k == K - 1 ? pad_K : 0); - const size_t C_cols = DIM - (j == J - 1 ? pad_J : 0); - const size_t C_rows = DIM - (i == I - 1 ? pad_I : 0); - - gemmini_extended_preload(GARBAGE_ADDR, out_sp_addr, DIM, DIM, C_cols, - C_rows); - - if (k == 0) { // First iteration - gemmini_extended_compute_preloaded(A_sp_addr, B_sp_addr, A_cols, - A_rows, B_cols, B_rows); - } else { // All other iterations - gemmini_extended_compute_accumulated(A_sp_addr, B_sp_addr, A_cols, - A_rows, B_cols, B_rows); - } - } - } - } - - // Move-out C - if (C != NULL) { - const size_t sizeof_C = full_C ? sizeof(acc_t) : sizeof(elem_t); - - for (size_t i = 0; i < I; i++) { - for (size_t j = 0; j < J; j++) { - void *const C_dram_addr = - (int8_t *)C + (i * C_row_stride + j) * DIM * sizeof_C; - const uint32_t C_sp_addr = C_sp_addr_start + (i * J + j) * DIM; - - const size_t C_cols = DIM - (j == J - 1 ? pad_J : 0); - const size_t C_rows = DIM - (i == I - 1 ? pad_I : 0); - - gemmini_extended_mvout(C_dram_addr, C_sp_addr, C_cols, C_rows); - } - } - } -} - -static void -sp_tiled_matmul_ws(const elem_t *A, const elem_t *B, const void *D, void *C, - scale_t A_scale_factor, scale_t B_scale_factor, - scale_acc_t D_scale_factor, size_t I, size_t J, size_t K, - size_t pad_I, size_t pad_J, size_t pad_K, - size_t A_row_stride, size_t B_row_stride, - size_t D_row_stride, size_t C_row_stride, bool a_transpose, - bool b_transpose, bool full_C, bool low_D, bool no_bias, - bool repeating_bias, int act, int a_spad_id, int b_spad_id) { - /* - const uint32_t A_sp_addr_start = 0; - const uint32_t B_sp_addr_start = BANK_NUM * BANK_ROWS - K * J * DIM; - const uint32_t D_sp_addr_start = 1 << (ADDR_LEN-1); - const uint32_t C_sp_addr_start = 3 << (ADDR_LEN-2) | (full_C << - (ADDR_LEN-3)); const int A_blocks = a_transpose ? (I <= MAX_BLOCK_LEN ? I : - MAX_BLOCK_LEN) : (K <= MAX_BLOCK_LEN ? 
K : MAX_BLOCK_LEN); const int - B_blocks = b_transpose ? (K <= MAX_BLOCK_LEN ? K : MAX_BLOCK_LEN) : (J <= - MAX_BLOCK_LEN ? J : MAX_BLOCK_LEN); const int D_blocks = low_D ? (J <= - MAX_BLOCK_LEN ? J : MAX_BLOCK_LEN) : (J <= MAX_BLOCK_LEN_ACC ? J : - MAX_BLOCK_LEN_ACC); const int C_blocks = full_C ? 1 : (J <= MAX_BLOCK_LEN ? - J : MAX_BLOCK_LEN); const size_t sizeof_D = low_D ? sizeof(elem_t) : - sizeof(acc_t); const size_t sizeof_C = full_C ? sizeof(acc_t) : - sizeof(elem_t); - // Move-in D - if (D != NULL && !no_bias) { - for (size_t i = 0; i < I; i++) { - const size_t rows = DIM - (i == I-1 ? pad_I : 0); - for (size_t j = 0; j < J; j += D_blocks) { - const size_t bias_row = repeating_bias ? 0 : i; - const void * const D_dram_addr = (int8_t *)D + (bias_row * - D_row_stride + j)*DIM*sizeof_D; const uint32_t D_sp_addr_acc = - D_sp_addr_start + (i*J + j)*DIM; size_t blocks = j + D_blocks <= J ? - D_blocks : J-j; const size_t cols = blocks * DIM - (j + blocks >= J ? pad_J - : 0); gemmini_extended_mvin3(D_dram_addr, D_sp_addr_acc, cols, rows); - } - } - } - for (size_t k = 0; k < K; k++) { - for (size_t j = 0; j < J; j++) { - for (size_t i = 0; i < I; i++) { - const uint32_t A_sp_addr = a_transpose ? (A_sp_addr_start + (k*I + - i)*DIM) : (A_sp_addr_start + (i*K + k)*DIM); const uint32_t B_sp_addr = - b_transpose ? (B_sp_addr_start + (j*K + k)*DIM) : (B_sp_addr_start + (k*J + - j)*DIM); const uint32_t C_sp_addr = C_sp_addr_start + (i*J + j)*DIM; - // Mvin A - if (a_transpose) { - if (j == 0 && i % A_blocks == 0) { - const elem_t * const A_dram_addr = A + (k*A_row_stride + i)*DIM; - const size_t blocks = i + A_blocks <= I ? A_blocks : I-i; - const size_t cols = blocks * DIM - (i + blocks >= I ? pad_I : 0); - const size_t rows = DIM - (k == K-1 ? 
pad_K : 0); - gemmini_extended_mvin(A_dram_addr, A_sp_addr, cols, rows); - } - } else { - if (j == 0 && k % A_blocks == 0) { - const elem_t * const A_dram_addr = A + (i*A_row_stride + k)*DIM; - const size_t blocks = k + A_blocks <= K ? A_blocks : K-k; - const size_t cols = blocks * DIM - (k + blocks >= K ? pad_K : 0); - const size_t rows = DIM - (i == I-1 ? pad_I : 0); - gemmini_extended_mvin(A_dram_addr, A_sp_addr, cols, rows); - } - } - // Mvin B - if (b_transpose) { - if (i == 0 && k % B_blocks == 0) { - const elem_t * const B_dram_addr = B + (j*B_row_stride + k)*DIM; - const size_t blocks = k + B_blocks <= K ? B_blocks : K-k; - const size_t cols = blocks * DIM - (k + blocks >= K ? pad_K : 0); - const size_t rows = DIM - (j == J-1 ? pad_J : 0); - gemmini_extended_mvin2(B_dram_addr, B_sp_addr, cols, rows); - } - } else { - if (i == 0 && j % B_blocks == 0) { - const elem_t * const B_dram_addr = B + (k*B_row_stride + j)*DIM; - const size_t blocks = j + B_blocks <= J ? B_blocks : J-j; - const size_t cols = blocks * DIM - (j + blocks >= J ? pad_J : 0); - const size_t rows = DIM - (k == K-1 ? pad_K : 0); - gemmini_extended_mvin2(B_dram_addr, B_sp_addr, cols, rows); - } - } - // Compute - { - uint32_t pre_sp_addr = i == 0 ? B_sp_addr : GARBAGE_ADDR; - uint32_t out_sp_addr = C_sp_addr; - // If we're not using a bias, then we want to overwrite what's in - the - // accumulator, rather than writing over it - int no_bias_new_matrix = no_bias && D != NULL && k == 0; - if (no_bias_new_matrix) { - out_sp_addr &= ~(1 << (ADDR_LEN-2)); - } - const size_t A_cols = DIM - (k == K - 1 ? pad_K : 0); - const size_t A_rows = DIM - (i == I - 1 ? pad_I : 0); - const size_t B_cols = DIM - (j == J - 1 ? pad_J : 0); - const size_t B_rows = DIM - (k == K - 1 ? pad_K : 0); - const size_t C_cols = DIM - (j == J - 1 ? pad_J : 0); - const size_t C_rows = DIM - (i == I - 1 ? 
pad_I : 0); - gemmini_extended_preload(pre_sp_addr, out_sp_addr, B_cols, B_rows, - C_cols, C_rows); if (i == 0) { // First iteration - gemmini_extended_compute_preloaded(A_sp_addr, GARBAGE_ADDR, - A_cols, A_rows, DIM, DIM); } else { // All other iterations - gemmini_extended_compute_accumulated(A_sp_addr, GARBAGE_ADDR, - A_cols, A_rows, DIM, DIM); - } - } - if (C != NULL && k == K-1) { - // Move-out C (if not normalizing) - if (((act != LAYERNORM) && (act != SOFTMAX)) && (j == J-1 || j % - C_blocks == C_blocks-1)) { const size_t rounded_j = (j / C_blocks) * - C_blocks; const uint32_t rounded_C_sp_addr = C_sp_addr_start + (i*J + - rounded_j)*DIM; void * const C_dram_addr = (int8_t*)C + (i*C_row_stride + - rounded_j)*DIM*sizeof_C; const size_t blocks = rounded_j + C_blocks <= J ? - C_blocks : J-rounded_j; const size_t cols = blocks * DIM - (rounded_j + - blocks >= J ? pad_J : 0); const size_t rows = DIM - (i == I - 1 ? pad_I : - 0); gemmini_extended_mvout(C_dram_addr, rounded_C_sp_addr, cols, rows); - } - // Move-out C (if normalizing) - if (act == LAYERNORM && j == J - 1) { - uint32_t norm_cmds[][2] = {{1,2},{3,4},{0,0}}; - const int norm_cmds_size = sizeof(norm_cmds) / - sizeof(norm_cmds[0]); const size_t rows = DIM - (i == I-1 ? pad_I : 0); for - (size_t row = 0; row < rows; row += NORM_STAT_IDS) { const size_t stat_ids = - rows - row > NORM_STAT_IDS ? 
NORM_STAT_IDS : rows - row; for (int cmd = 0; - cmd < norm_cmds_size; cmd++) { for (size_t stat_id = 0; stat_id < stat_ids; - stat_id++) { gemmini_config_norm(0, 0, 0, 0, stat_id, 0, 0); const size_t r - = row + stat_id; for (size_t jj = 0; jj < J; jj += C_blocks) { uint32_t - norm_C_sp_addr = C_sp_addr_start + (i*J + jj)*DIM + r; if (jj + C_blocks >= - J) { norm_C_sp_addr |= (norm_cmds[cmd][1] << 26); // Final mean/inv-std-dev - calculation } else { norm_C_sp_addr |= (norm_cmds[cmd][0] << 26); // - Accumulate sum/variance - } - void * const C_dram_addr = (int8_t*)C + - (i*C_row_stride + jj) * DIM * sizeof_C + - r * C_row_stride * sizeof_C; - const size_t blocks = jj + C_blocks <= J ? C_blocks : - J-jj; const size_t cols = blocks * DIM - (jj + blocks >= J ? pad_J : 0); - gemmini_extended_mvout(C_dram_addr, norm_C_sp_addr, cols, - 1); - } - } - } - } - } else if (act == SOFTMAX && j == J - 1) { - uint32_t norm_cmds[][2] = {{5,5},{6,7},{0,0}}; - const int norm_cmds_size = sizeof(norm_cmds) / - sizeof(norm_cmds[0]); const size_t rows = DIM - (i == I-1 ? pad_I : 0); for - (size_t row = 0; row < rows; row += NORM_STAT_IDS) { const size_t stat_ids = - rows - row > NORM_STAT_IDS ? NORM_STAT_IDS : rows - row; for (int cmd = 0; - cmd < norm_cmds_size; cmd++) { for (size_t stat_id = 0; stat_id < stat_ids; - stat_id++) { - // set stat id only - gemmini_config_norm(0, 0, 1, 0, stat_id, 0, 0); - const size_t r = row + stat_id; - for (size_t jj = 0; jj < J; jj += C_blocks) { - uint32_t norm_C_sp_addr = C_sp_addr_start + (i*J + jj)*DIM - + r; if (jj + C_blocks >= J) { norm_C_sp_addr |= (norm_cmds[cmd][1] << 26); - // Final mean/inv-std-dev calculation } else { norm_C_sp_addr |= - (norm_cmds[cmd][0] << 26); // Accumulate sum/variance - } - void * const C_dram_addr = (int8_t*)C + - (i*C_row_stride + jj) * DIM * sizeof_C + - r * C_row_stride * sizeof_C; - const size_t blocks = jj + C_blocks <= J ? C_blocks : - J-jj; const size_t cols = blocks * DIM - (jj + blocks >= J ? 
pad_J : 0); - gemmini_extended_mvout(C_dram_addr, norm_C_sp_addr, cols, - 1); - } - } - } - } - } - } - } - } - } - */ - - // Combined loop - gemmini_loop_ws(I, J, K, pad_I, pad_J, pad_K, A, B, no_bias ? NULL : D, C, - A_row_stride, B_row_stride, repeating_bias ? 0 : D_row_stride, - C_row_stride, a_transpose, b_transpose, full_C, low_D, - !no_bias || D == NULL, act, a_spad_id, b_spad_id, false); -} - -static void tiled_matmul_outer( - size_t dim_I, size_t dim_J, size_t dim_K, const elem_t *A, const elem_t *B, - const void *D, void *C, size_t stride_A, size_t stride_B, size_t stride_D, - size_t stride_C, scale_t A_scale_factor, scale_t B_scale_factor, - scale_acc_t D_scale_factor, size_t tile_I, size_t tile_J, size_t tile_K, - int act, acc_scale_t scale, acc_scale_t bert_scale, bool repeating_bias, - bool a_transpose, bool b_transpose, bool full_C, bool low_D, - uint8_t weightA, int dataflow) { - - const size_t dim_I_padded = (dim_I / DIM + (dim_I % DIM != 0)) * DIM; - const size_t dim_J_padded = (dim_J / DIM + (dim_J % DIM != 0)) * DIM; - const size_t dim_K_padded = (dim_K / DIM + (dim_K % DIM != 0)) * DIM; - - const size_t I0 = - dim_I_padded / (tile_I * DIM) + (dim_I_padded % (tile_I * DIM) != 0); - const size_t J0 = - dim_J_padded / (tile_J * DIM) + (dim_J_padded % (tile_J * DIM) != 0); - const size_t K0 = - dim_K_padded / (tile_K * DIM) + (dim_K_padded % (tile_K * DIM) != 0); - - // These lines here are supposed to help us deal with when the dimensions of - // the systolic array aren't divisible by the tiling factors - const size_t last_I = dim_I_padded % (tile_I * DIM) == 0 - ? tile_I - : (dim_I_padded / DIM) % tile_I; - const size_t last_J = dim_J_padded % (tile_J * DIM) == 0 - ? tile_J - : (dim_J_padded / DIM) % tile_J; - const size_t last_K = dim_K_padded % (tile_K * DIM) == 0 - ? 
tile_K - : (dim_K_padded / DIM) % tile_K; - - // These lines are supposed to figure out how much padding the hardware is - // supposed to add for the final tile - const size_t padding_I = dim_I_padded - dim_I; - const size_t padding_J = dim_J_padded - dim_J; - const size_t padding_K = dim_K_padded - dim_K; - - const bool no_bias = D == NULL; - - if (no_bias) { - D = (void *)1; // Dummy address which isn't NULL - } - - const size_t sizeof_D = low_D ? sizeof(elem_t) : sizeof(acc_t); - const size_t sizeof_C = full_C ? sizeof(acc_t) : sizeof(elem_t); - - gemmini_extended_config_ex(dataflow, act & 3, 0, 1, a_transpose, b_transpose); - gemmini_extended_config_st(stride_C * sizeof_C, act & 3, scale); - gemmini_extended3_config_ld(stride_A * sizeof(elem_t), A_scale_factor, false, - 0); - gemmini_extended3_config_ld(stride_B * sizeof(elem_t), B_scale_factor, false, - 1) - gemmini_extended3_config_ld(repeating_bias ? 0 : (stride_D * sizeof_D), - D_scale_factor, low_D, 2); - - if (act == IGELU) { - const acc_scale_t sqrt_2 = 1.41421356237; - const acc_scale_t S = bert_scale; - const acc_scale_t S_erf = (-0.2888 * ((S * S) / 2)); - - const acc_t qb = -1.769 / (S / sqrt_2); - const acc_t qc = 1.0 / S_erf; - - gemmini_config_norm(0, 0, 0, 0, 0, qb, qc); - } - - if (act == SOFTMAX) { - const scale_t a = 0.3585; - const scale_t b = 1.353; - const scale_t c = 0.344; - - const acc_t qln2 = (int)(0.693147 / bert_scale); - const acc_t qln2_inv = 65536 / qln2; - const acc_t qb = b / bert_scale; - const acc_t qc = c / (a * bert_scale * bert_scale); - - gemmini_config_norm(qln2, 0, 0, 1, 0, qb, qc); - gemmini_config_norm(qln2_inv, 1, 0, 1, 0, qb, qc); - } - - void (*inner)(const elem_t *, const elem_t *, const void *, void *, scale_t, - scale_t, scale_acc_t, size_t, size_t, size_t, size_t, size_t, - size_t, size_t, size_t, size_t, size_t, bool, bool, bool, bool, - bool, bool, int, int, int); - - if (dataflow == OUTPUT_STATIONARY) { - inner = &sp_tiled_matmul_os; - } else /* if (dataflow 
== WEIGHT_STATIONARY) */ { - inner = &sp_tiled_matmul_ws; - } - - // reuse operand if it fits scratchpad - int a_spad_id = 0; - int b_spad_id = 0; - bool b_reuse = (J0 * K0 <= 2) && (dataflow == WEIGHT_STATIONARY); - bool a_reuse = (I0 * K0 <= 2) && (dataflow == WEIGHT_STATIONARY); - - for (size_t i0 = 0; i0 < I0; i0++) - for (size_t j0 = 0; j0 < J0; j0++) - for (size_t k0 = 0; k0 < K0; k0++) { - if (a_reuse) - a_spad_id = ((i0 + k0) == 0) ? 1 : 2; - if (b_reuse) - b_spad_id = ((j0 + k0) == 0) ? 1 : 2; - - const void *pre; - if (k0 != 0) { - pre = NULL; - } else { - size_t bias_row = repeating_bias ? 0 : i0 * tile_I * DIM; - // pre = &(((acc_t*)D)[bias_row * stride_D + j0 * tile_J * DIM]); - pre = (int8_t *)D + - (bias_row * stride_D + j0 * tile_J * DIM) * sizeof_D; - } - - void *out = k0 == K0 - 1 ? (int8_t *)C + (i0 * tile_I * DIM * stride_C + - j0 * tile_J * DIM) * - sizeof_C - : NULL; - - const size_t I = i0 < I0 - 1 ? tile_I : last_I; - const size_t J = j0 < J0 - 1 ? tile_J : last_J; - const size_t K = k0 < K0 - 1 ? tile_K : last_K; - - const size_t pad_I = i0 == I0 - 1 ? padding_I : 0; - const size_t pad_J = j0 == J0 - 1 ? padding_J : 0; - const size_t pad_K = k0 == K0 - 1 ? padding_K : 0; - - const elem_t *a = - a_transpose - ? (A + k0 * tile_K * DIM * stride_A + i0 * tile_I * DIM) - : (A + i0 * tile_I * DIM * stride_A + k0 * tile_K * DIM); - - const elem_t *b = - b_transpose - ? 
(B + j0 * tile_J * DIM * stride_B + k0 * tile_K * DIM) - : (B + k0 * tile_K * DIM * stride_B + j0 * tile_J * DIM); - - if (a_reuse && j0 >= 1) - a = NULL; - if (b_reuse && i0 >= 1) - b = NULL; - // printf("a_reuse: %d, b_reuse: %d, a_spad_id: %d, b_spad_id: %d, a: - // %llu, b: %llu \n", a_reuse, b_reuse, a_spad_id, b_spad_id, a, b); - (*inner)(a, b, pre, out, A_scale_factor, B_scale_factor, D_scale_factor, - I, J, K, pad_I, pad_J, pad_K, stride_A, stride_B, stride_D, - stride_C, a_transpose, b_transpose, full_C, low_D, no_bias, - repeating_bias, act, a_spad_id, b_spad_id); - } - - gemmini_fence(); -} - -static acc_t int_sqrt(acc_t n) { - if (n == 0) - return 0; - - int bits = 0; - for (acc_t x = n; x > 0; x /= 2) - bits++; - - acc_t x_prev = 1 << ((bits + 1) / 2); - - while (1) { - acc_t x_next = (x_prev + n / x_prev) / 2; - if (x_next >= x_prev) - return x_prev; - x_prev = x_next; - }; -} - -static elem_t scale_and_sat(acc_t x, int act, acc_scale_t scale, - acc_scale_t bert_scale) { - // Apply I-GELU if needed - if (act == IGELU) { - const acc_scale_t sqrt_2 = 1.41421356237; - - const acc_scale_t S = bert_scale; - - const acc_scale_t S_erf = (-0.2888 * (S / sqrt_2) * (S / sqrt_2)); - const acc_t q1 = 1 / S_erf; - const acc_t qb = -1.769 / (S / sqrt_2); - const acc_t qc = 1.0 / (-0.2888 * (S / sqrt_2) * (S / sqrt_2)); - - const acc_t q = x; - - const acc_t q_sign = q < 0 ? -1 : 1; - const acc_t q_clipped = abs(q) > (-qb) ? (-qb) : abs(q); - const acc_t q_poly = (q_clipped + qb) * (q_clipped + qb) + qc; - const acc_t q_erf = q_sign * q_poly; - - x = q * (q_erf + q1); - } - - // Scale value down and round it - x = ACC_SCALE(x, scale); - // Clip result - x = x > elem_t_max ? elem_t_max : (x < elem_t_min ? elem_t_min : x); - // Apply activation function - if (act == RELU) { - x = x < 0 ? 
0 : x; - } - return x; -} - -#ifdef HAS_MVIN_SCALE -#define GEMMINI_SCALE(x, scale) MVIN_SCALE((x), (scale)) -#else -#define GEMMINI_SCALE(x, scale) (x) -#endif - -#ifdef HAS_MVIN_ACC_SCALE -#define GEMMINI_ACC_SCALE(x, scale) MVIN_SCALE_ACC((x), (scale)) -#else -#define GEMMINI_ACC_SCALE(x, scale) (x) -#endif - -static void matmul_cpu(bool transA, bool transB, size_t DIM_I, size_t DIM_J, - size_t DIM_K, const elem_t *A, const elem_t *B, - const acc_t *D, elem_t *C, size_t stride_A, - size_t stride_B, size_t stride_D, size_t stride_C, - scale_t A_scale_factor, scale_t B_scale_factor, - scale_acc_t D_scale_factor, int act, acc_scale_t scale, - acc_scale_t bert_scale, bool repeating_bias) { - - const int no_bias = D == NULL; - if (act != LAYERNORM && act != SOFTMAX && !transA && !transB && - DIM_I % 4 == 0 && DIM_J % 4 == 0) { - for (size_t i = 0; i < DIM_I; i += 4) { - for (size_t j = 0; j < DIM_J; j += 4) { - - acc_t result[4][4]; // = {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, - // 0, 0, 0}}; - - for (size_t ii = 0; ii < 4; ii++) - for (size_t jj = 0; jj < 4; jj++) { - const size_t bias_row = repeating_bias ? 0 : i + ii; - result[ii][jj] = - no_bias ? 
0 - : GEMMINI_ACC_SCALE(*(D + bias_row * stride_D + j + jj), - D_scale_factor); - } - - for (size_t k = 0; k < DIM_K; k++) { - result[0][0] += - GEMMINI_SCALE(*(A + i * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j), B_scale_factor); - result[0][1] += - GEMMINI_SCALE(*(A + i * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 1), B_scale_factor); - result[0][2] += - GEMMINI_SCALE(*(A + i * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 2), B_scale_factor); - result[0][3] += - GEMMINI_SCALE(*(A + i * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 3), B_scale_factor); - result[1][0] += - GEMMINI_SCALE(*(A + (i + 1) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j), B_scale_factor); - result[1][1] += - GEMMINI_SCALE(*(A + (i + 1) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 1), B_scale_factor); - result[1][2] += - GEMMINI_SCALE(*(A + (i + 1) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 2), B_scale_factor); - result[1][3] += - GEMMINI_SCALE(*(A + (i + 1) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 3), B_scale_factor); - result[2][0] += - GEMMINI_SCALE(*(A + (i + 2) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j), B_scale_factor); - result[2][1] += - GEMMINI_SCALE(*(A + (i + 2) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 1), B_scale_factor); - result[2][2] += - GEMMINI_SCALE(*(A + (i + 2) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 2), B_scale_factor); - result[2][3] += - GEMMINI_SCALE(*(A + (i + 2) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 3), B_scale_factor); - result[3][0] += - GEMMINI_SCALE(*(A + (i + 3) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j), B_scale_factor); - result[3][1] += - 
GEMMINI_SCALE(*(A + (i + 3) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 1), B_scale_factor); - result[3][2] += - GEMMINI_SCALE(*(A + (i + 3) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 2), B_scale_factor); - result[3][3] += - GEMMINI_SCALE(*(A + (i + 3) * stride_A + k), A_scale_factor) * - GEMMINI_SCALE(*(B + k * stride_B + j + 3), B_scale_factor); - } - - *(C + i * stride_C + j) = - scale_and_sat(result[0][0], act, scale, bert_scale); - *(C + i * stride_C + j + 1) = - scale_and_sat(result[0][1], act, scale, bert_scale); - *(C + i * stride_C + j + 2) = - scale_and_sat(result[0][2], act, scale, bert_scale); - *(C + i * stride_C + j + 3) = - scale_and_sat(result[0][3], act, scale, bert_scale); - *(C + (i + 1) * stride_C + j) = - scale_and_sat(result[1][0], act, scale, bert_scale); - *(C + (i + 1) * stride_C + j + 1) = - scale_and_sat(result[1][1], act, scale, bert_scale); - *(C + (i + 1) * stride_C + j + 2) = - scale_and_sat(result[1][2], act, scale, bert_scale); - *(C + (i + 1) * stride_C + j + 3) = - scale_and_sat(result[1][3], act, scale, bert_scale); - *(C + (i + 2) * stride_C + j) = - scale_and_sat(result[2][0], act, scale, bert_scale); - *(C + (i + 2) * stride_C + j + 1) = - scale_and_sat(result[2][1], act, scale, bert_scale); - *(C + (i + 2) * stride_C + j + 2) = - scale_and_sat(result[2][2], act, scale, bert_scale); - *(C + (i + 2) * stride_C + j + 3) = - scale_and_sat(result[2][3], act, scale, bert_scale); - *(C + (i + 3) * stride_C + j) = - scale_and_sat(result[3][0], act, scale, bert_scale); - *(C + (i + 3) * stride_C + j + 1) = - scale_and_sat(result[3][1], act, scale, bert_scale); - *(C + (i + 3) * stride_C + j + 2) = - scale_and_sat(result[3][2], act, scale, bert_scale); - *(C + (i + 3) * stride_C + j + 3) = - scale_and_sat(result[3][3], act, scale, bert_scale); - } - } - } else { - size_t A_dim_strides[2] = {!transA ? stride_A : 1, - !transA ? 
1 : stride_A}; // i, j stride - size_t B_dim_strides[2] = {!transB ? 1 : stride_B, - !transB ? stride_B : 1}; // j, k stride - - // We also create a buffer that we can use for layernorms and softmaxes - static acc_t c_buffer[1024]; - const size_t c_buffer_sz = sizeof(c_buffer) / sizeof(c_buffer[0]); - if ((act == LAYERNORM || act == SOFTMAX) && DIM_J > c_buffer_sz) { - printf("Matmul is too large to normalize\n"); - exit(1); - } - - for (size_t i = 0; i < DIM_I; i++) { - for (size_t j = 0; j < DIM_J; j++) { - elem_t *c = C + (i * stride_C) + j; - - const size_t bias_row = repeating_bias ? 0 : i; - acc_t sum = no_bias ? 0 - : GEMMINI_ACC_SCALE(*(D + bias_row * stride_D + j), - D_scale_factor); - - for (size_t k = 0; k < DIM_K; k++) { - const elem_t *a = A + i * A_dim_strides[0] + k * A_dim_strides[1]; - const elem_t *b = B + j * B_dim_strides[0] + k * B_dim_strides[1]; - sum += (GEMMINI_SCALE(*a, A_scale_factor) * - GEMMINI_SCALE(*b, B_scale_factor)); - } - - if (act == LAYERNORM || act == SOFTMAX) - c_buffer[j] = sum; - else - *c = scale_and_sat(sum, act, scale, bert_scale); - } - - if (act == LAYERNORM) { - acc_t sum = 0; - for (size_t j = 0; j < DIM_J; j++) - sum += c_buffer[j]; - acc_t mean = sum / (acc_t)DIM_J; - - acc_t total_err_sq = 0; - for (size_t j = 0; j < DIM_J; j++) - total_err_sq += (c_buffer[j] - mean) * (c_buffer[j] - mean); - acc_t variance = total_err_sq / (acc_t)DIM_J; - - acc_t stddev = int_sqrt(variance); - if (variance == 0) - stddev = 1; - - for (size_t j = 0; j < DIM_J; j++) { - c_buffer[j] -= mean; - // c_buffer[j] /= stddev; - c_buffer[j] = ROUND_NEAR_EVEN( - (double)c_buffer[j] / - stddev); // TODO I don't think I-BERT uses round-near-even, so we - // shouldn't either. We just use this rounding mode here - // in order to match the hardware. 
- - elem_t *c = C + (i * stride_C) + j; - *c = scale_and_sat(c_buffer[j], act, scale, bert_scale); - } - } else if (act == SOFTMAX) { - const scale_t a = 0.3585; - const scale_t b = 1.353; - const scale_t c = 0.344; - - // is SCALE supposed to be input scale? - const acc_t qln2 = (acc_t)(0.693147 / bert_scale); - const acc_t qln2_inv = 65536 / qln2; - const acc_t qb = b / bert_scale; - const acc_t qc = c / (a * bert_scale * bert_scale); - - // pass 1: get max_q - acc_t max_q = -2147483648; - for (size_t j = 0; j < DIM_J; j++) { - if (c_buffer[j] > max_q) - max_q = c_buffer[j]; - } - - // pass 2: calculate iexp(q_tilde) and sum(q_tilde) - acc_t sum_exp = 0; - for (size_t j = 0; j < DIM_J; j++) { - acc_t q = c_buffer[j] - max_q; - acc_t z = (acc_t)(-q * qln2_inv) >> 16; - acc_t qp = q + z * qln2; - acc_t q_exp = (qp + qb) * (qp + qb) + qc; - c_buffer[j] = q_exp >> z; - sum_exp += c_buffer[j]; - } - - // pass 3: divide by sum - scale_t factor = - (127.f) / (float)sum_exp; // what corresponds to 1 in output? 
- for (size_t j = 0; j < DIM_J; j++) { - elem_t *c = C + (i * stride_C) + j; - *c = scale_and_sat(c_buffer[j], act, factor, bert_scale); - } - } - } - } -} - -#undef GEMMINI_SCALE - -// General matmul which can be run with different dataflows, or on the CPU -enum tiled_matmul_type_t { - OS, - WS, - CPU -}; // TODO rename this so its name also applies to convs - -// This function runs a tiled matrix multiplication, with hardcoded tiling -// factors -static void tiled_matmul( - size_t dim_I, size_t dim_J, size_t dim_K, const elem_t *A, const elem_t *B, - const void *D, void *C, size_t stride_A, size_t stride_B, size_t stride_D, - size_t stride_C, scale_t A_scale_factor, scale_t B_scale_factor, - scale_acc_t D_scale_factor, int act, acc_scale_t scale, - acc_scale_t bert_scale, bool repeating_bias, size_t tile_I, size_t tile_J, - size_t tile_K, bool transpose_A, bool transpose_B, bool full_C, bool low_D, - uint8_t weightA, enum tiled_matmul_type_t tiled_matmul_type) { - -#ifdef GEMMINI_ASSERTIONS - // Make sure that the tiling factors make sense - if (tile_I <= 0) { - printf("tile_I is non-positive\n"); - exit(1); - } else if (tile_J <= 0) { - printf("tile_J is non-positive\n"); - exit(1); - } else if (tile_K <= 0) { - printf("tile_K is non-positive\n"); - exit(1); - } - - const size_t dim_I_padded = (dim_I / DIM + (dim_I % DIM != 0)) * DIM; - const size_t dim_J_padded = (dim_J / DIM + (dim_J % DIM != 0)) * DIM; - const size_t dim_K_padded = (dim_K / DIM + (dim_K % DIM != 0)) * DIM; - - if (tile_I * DIM > dim_I_padded) { - printf("tile_I is too large (tile_I * DIM > dim_I_padded)\n"); - exit(1); - } else if (tile_J * DIM > dim_J_padded) { - printf("tile_J is too large (tile_J * DIM > dim_J_padded)\n"); - exit(1); - } else if (tile_K * DIM > dim_K_padded) { - printf("tile_K is too large (tile_K * DIM > dim_K_padded)\n"); - exit(1); - } - - const bool double_buffered = tiled_matmul_type == WS; - - const size_t total_spad_size = - double_buffered ?
BANK_NUM * BANK_ROWS / 2 : BANK_NUM * BANK_ROWS; - const size_t total_acc_size = double_buffered ? ACC_ROWS / 2 : ACC_ROWS; - - const size_t total_spad_rows = (tile_I * tile_K * DIM) + // Rows to store A - (tile_K * tile_J * DIM); // Rows to store B - - if (total_spad_rows > total_spad_size) { - printf("Not enough space in scratchpad to store A and B matrices\n"); - exit(1); - } - - const size_t total_acc_rows = tile_I * tile_J * DIM; // Rows to store C - - if (total_acc_rows > total_acc_size) { - printf("Not enough space in accumulator to store C\n"); - exit(1); - } - - if (tile_I > 65535 || tile_J > 65535 || tile_K > 65535) { - printf("I, J, and K tiling factors must be less than 65535, to fit within " - "the bounds of the LOOP_WS function"); - exit(1); - } - - char matmul_type_str[][4] = {"OS", "WS", "CPU"}; - - // Check if transpose options are correct - if (((tiled_matmul_type == OS) && (transpose_A || transpose_B)) || - (tiled_matmul_type == WS && transpose_A && transpose_B)) { - printf("Not implemented: %s matmul, a_transpose=%d, b_transpose=%d\n", - matmul_type_str[tiled_matmul_type], transpose_A, transpose_B); - exit(1); - } - - // Check if full_C options are correct - if ((tiled_matmul_type == CPU && (full_C || low_D)) || - (tiled_matmul_type == OS && low_D)) { - printf("Not implemented: %s matmul, full_C=%d, low_D=%d\n", - matmul_type_str[tiled_matmul_type], full_C, low_D); - } - - if (act == LAYERNORM || act == SOFTMAX) { - if (tiled_matmul_type == OS) { - printf("Not implemented: %s matmul, act=%d\n", - matmul_type_str[tiled_matmul_type], act); - } - if (tile_J * DIM < dim_J) { - printf("When doing layernorm or softmax, the full J dimension of the " - "matrix must fit in the accumulator\n"); - } - } -#endif - - // Run a tiled matrix multiplication on either Gemmini or the CPU - if (tiled_matmul_type == OS || tiled_matmul_type == WS) { - tiled_matmul_outer(dim_I, dim_J, dim_K, A, B, D, C, stride_A, stride_B, - stride_D, stride_C, A_scale_factor, 
B_scale_factor, - D_scale_factor, tile_I, tile_J, tile_K, act, scale, - bert_scale, repeating_bias, transpose_A, transpose_B, - full_C, low_D, weightA, (int)tiled_matmul_type); - } else /*if (tiled_matmul_type == CPU)*/ { - matmul_cpu(transpose_A, transpose_B, dim_I, dim_J, dim_K, A, B, - (const acc_t *)D, (elem_t *)C, stride_A, stride_B, stride_D, - stride_C, A_scale_factor, B_scale_factor, D_scale_factor, act, - scale, bert_scale, repeating_bias); - } -} - -static size_t tiled_matmul_total_spad_rows(size_t I, size_t J, size_t K) { - return (I * K + K * J) * DIM; -} - -static size_t tiled_matmul_total_acc_rows(size_t I, size_t J) { - return (I * J) * DIM; -} - -// This function runs a tiled matrix multiplication, with automatically -// calculated tiling factors -static void -tiled_matmul_auto(size_t dim_I, size_t dim_J, size_t dim_K, const elem_t *A, - const elem_t *B, const void *D, void *C, size_t stride_A, - size_t stride_B, size_t stride_D, size_t stride_C, - scale_t A_scale_factor, scale_t B_scale_factor, - scale_acc_t D_scale_factor, int act, acc_scale_t scale, - acc_scale_t bert_scale, bool repeating_bias, bool transpose_A, - bool transpose_B, bool full_C, bool low_D, uint8_t weightA, - enum tiled_matmul_type_t tiled_matmul_type) { - -#define partition_rows (BANK_NUM * BANK_ROWS / 2) -#define mats_in_partition (partition_rows / DIM) -#define mats_in_acc (ACC_ROWS / DIM) -#define max_tile_i_j ((size_t)sqrt(mats_in_acc)) -#define max_tile_k (mats_in_partition / max_tile_i_j) - - // "db_" means "double-buffered" -#define db_partition_rows ((BANK_NUM * BANK_ROWS / 2) / 2) -#define db_mats_in_partition (db_partition_rows / DIM) -#define db_mats_in_acc ((ACC_ROWS / 2) / DIM) -#define db_max_tile_i_j ((size_t)sqrt(db_mats_in_acc)) -#define db_max_tile_k (db_mats_in_partition / db_max_tile_i_j) - - const size_t dim_I_padded = (dim_I / DIM + (dim_I % DIM != 0)) * DIM; - const size_t dim_J_padded = (dim_J / DIM + (dim_J % DIM != 0)) * DIM; - const size_t dim_K_padded 
= (dim_K / DIM + (dim_K % DIM != 0)) * DIM; - - const bool double_buffered = tiled_matmul_type == WS; - - const size_t max_spad_rows = - double_buffered ? BANK_NUM * BANK_ROWS / 2 : BANK_NUM * BANK_ROWS; - const size_t max_acc_rows = double_buffered ? ACC_ROWS / 2 : ACC_ROWS; - - size_t tile_I, tile_J, tile_K; - - if (act == LAYERNORM || act == SOFTMAX) { - tile_I = 1; - tile_J = dim_J_padded / DIM; - tile_K = 1; - } else if (double_buffered) { - tile_I = dim_I_padded / DIM < db_max_tile_i_j ? dim_I_padded / DIM - : db_max_tile_i_j; - tile_J = dim_J_padded / DIM < db_max_tile_i_j ? dim_J_padded / DIM - : db_max_tile_i_j; - tile_K = - dim_K_padded / DIM < db_max_tile_k ? dim_K_padded / DIM : db_max_tile_k; - } else { - tile_I = - dim_I_padded / DIM < max_tile_i_j ? dim_I_padded / DIM : max_tile_i_j; - tile_J = - dim_J_padded / DIM < max_tile_i_j ? dim_J_padded / DIM : max_tile_i_j; - tile_K = dim_K_padded / DIM < max_tile_k ? dim_K_padded / DIM : max_tile_k; - } - - // Fill scratchpad as much as possible - while (true) { - bool increased = false; - - if (tiled_matmul_total_spad_rows(tile_I, tile_J + 1, tile_K) <= - max_spad_rows && - tiled_matmul_total_acc_rows(tile_I, tile_J + 1) <= max_acc_rows && - (tile_J + 1) * DIM <= dim_J_padded) { - tile_J++; - increased = true; - } - - if (tiled_matmul_total_spad_rows(tile_I + 1, tile_J, tile_K) <= - max_spad_rows && - tiled_matmul_total_acc_rows(tile_I + 1, tile_J) <= max_acc_rows && - (tile_I + 1) * DIM <= dim_I_padded) { - tile_I++; - increased = true; - } - - if (tiled_matmul_total_spad_rows(tile_I, tile_J, tile_K + 1) <= - max_spad_rows && - (tile_K + 1) * DIM <= dim_K_padded) { - tile_K++; - increased = true; - } - - if (!increased) - break; - } - -#ifdef PRINT_TILE -#if PRINT_TILE - const int spad_rows = tiled_matmul_total_spad_rows(tile_I, tile_J, tile_K); - const int acc_rows = tiled_matmul_total_acc_rows(tile_I, tile_J); - - printf("tile_I: %d\n", tile_I); - printf("tile_J: %d\n", tile_J); - printf("tile_K: 
%d\n\n", tile_K); - - printf("spad_rows: %d\n", spad_rows); - printf("acc_rows: %d\n\n", acc_rows); - - printf("spad_row utilization: %d%%\n", (spad_rows * 100) / max_spad_rows); - printf("acc_row utilization: %d%%\n\n", (acc_rows * 100) / max_acc_rows); - - exit(EXIT_SUCCESS); -#endif -#endif - - tiled_matmul(dim_I, dim_J, dim_K, A, B, D, C, stride_A, stride_B, stride_D, - stride_C, A_scale_factor, B_scale_factor, D_scale_factor, act, - scale, bert_scale, repeating_bias, tile_I, tile_J, tile_K, - transpose_A, transpose_B, full_C, low_D, weightA, - tiled_matmul_type); - -#undef partition_rows -#undef mats_in_partition -#undef mats_in_acc -#undef max_tile_i_j -#undef max_tile_k -} - -static void -sp_tiled_conv(int batch_size, int in_row_dim, int in_col_dim, int in_channels, - int out_channels, int out_row_dim, int out_col_dim, - int pool_out_row_dim, int pool_out_col_dim, - - int stride, int padding, int kernel_dim, int kernel_dilation, - int in_stride, int weight_stride, int out_stride, - - int pool_size, int pool_stride, int pool_padding, - - int batches, int porows, int pocols, int pochs, int krows, - int kcols, int kchs, - - int lpad, int rpad, int upad, int dpad, int plpad, int prpad, - int pupad, int pdpad, - - const elem_t *input, const elem_t *weights, elem_t *output, - const acc_t *bias, - - int act, acc_scale_t scale, - - bool wrot180, bool trans_output_1203, bool trans_input_3120, - bool trans_weight_1203, bool trans_weight_0132, - - bool no_bias, bool no_pool, bool downsample, bool input_dilated, - bool dw, int a_spad_id, int b_spad_id) { - - // When dw convs are true, we assume that kchs and ochs are 1 - if (dw) { - kchs = 1; - pochs = 1; - } - - const int orows = porows * pool_stride + pool_size - 1 - pupad - pdpad; - const int ocols = pocols * pool_stride + pool_size - 1 - plpad - prpad; - const int ochs = pochs; - - // Calculate image dimensions - // Note: "irows" and "icols" includes padding - const int dilated_krows = krows + (kernel_dilation - 1) 
* (krows - 1); - const int dilated_kcols = kcols + (kernel_dilation - 1) * (kcols - 1); - int irows = orows * stride + dilated_krows - 1; - int icols = ocols * stride + dilated_kcols - 1; - int irows_unpadded = irows - upad - dpad; - int icols_unpadded = icols - lpad - rpad; - const int ichs = kchs; - -#define UNDILATED(x) ((input_dilated) ? (((x) + 1) / 2) : (x)) - - if (input_dilated) { - irows_unpadded = (irows_unpadded + 1) / 2; - icols_unpadded = (icols_unpadded + 1) / 2; - - irows = irows_unpadded + UNDILATED(upad) + UNDILATED(dpad); - icols = icols_unpadded + UNDILATED(lpad) + UNDILATED(rpad); - } - -#ifdef HAS_FIRST_LAYER_OPTIMIZATIONS - const bool transposed = trans_output_1203 || trans_input_3120 || - trans_weight_1203 || trans_weight_0132; - int max_pixels_per_row = transposed || wrot180 || downsample || - input_dilated || kernel_dilation > 1 || - ichs > DIM - ? 1 - : DIM / ichs; - if (max_pixels_per_row > kcols) - max_pixels_per_row = kcols; -#else - const int max_pixels_per_row = 1; -#endif - - // Calculate spad address offsets - const int out_channels_per_bank = ochs / DIM + (ochs % DIM != 0); - const int in_channels_per_bank = kchs / DIM + (kchs % DIM != 0); - const int B_rows = trans_weight_0132 - ? 
in_channels_per_bank * kcols * krows * ochs - : out_channels_per_bank * kcols * krows * kchs; - - static uint32_t D_sp_addr_row = 0; - static uint32_t C_sp_addr_row = 0; - - const uint32_t A_sp_addr_start = 0; - const uint32_t B_sp_addr_start = BANK_NUM * BANK_ROWS - B_rows; - const uint32_t D_sp_addr_start = (1 << (ADDR_LEN - 1)) + D_sp_addr_row; - const uint32_t C_sp_addr_start = (3 << (ADDR_LEN - 2)) + C_sp_addr_row; - - if (bias != 0) { - D_sp_addr_row = (D_sp_addr_row + ACC_ROWS / 2) % ACC_ROWS; - } - - if (output != 0) { - C_sp_addr_row = (C_sp_addr_row + ACC_ROWS / 2) % ACC_ROWS; - } - - gemmini_loop_conv_ws( - batch_size, in_row_dim, in_col_dim, in_channels, out_channels, - out_row_dim, out_col_dim, pool_out_row_dim, pool_out_col_dim, stride, - padding, kernel_dim, kernel_dilation, pool_size, pool_stride, - pool_padding, batches, porows, pocols, pochs, krows, kcols, kchs, lpad, - rpad, upad, dpad, plpad, prpad, pupad, pdpad, orows, ocols, weights, - output, bias, input, no_bias, no_pool, downsample, wrot180, input_dilated, - act, trans_output_1203, trans_weight_1203, trans_weight_0132, - trans_input_3120, max_pixels_per_row, in_stride, weight_stride, - out_stride, dw, a_spad_id, b_spad_id); - - /* - if (!no_pool) { - printf("Pooling with rectangular convolutions is currently not - supported.\n"); exit(1); - } - - // Only rectangular convolutions will use the following C code - - // mvin bias - if (bias != NULL) { - // TODO we probably don't need quite this many nested loops for this part - - const int max_ochs_per_mvin = ochs < MAX_BLOCK_LEN_ACC * DIM ? ochs : - MAX_BLOCK_LEN_ACC * DIM; - - gemmini_extended4_config_ld(0, MVIN_SCALE_IDENTITY, false, batches * orows - * ocols, 2); - - for (int b = 0; b < batches; b++) - for (int orow = 0; orow < orows; orow++) - for (int ocol = 0; ocol < ocols; ocol += DIM) { - const int I = ocols - ocol > DIM ? 
DIM : ocols - ocol; - - for (int och = 0; och < ochs; och += max_ochs_per_mvin) { - const int J = ochs - och > max_ochs_per_mvin ? max_ochs_per_mvin : - ochs - och; - - const uint32_t D_sp_addr = D_sp_addr_start + (och / DIM) * batches - * orows * ocols + b * orows * ocols + orow * ocols + ocol; - - const acc_t * bias_dram_addr = no_bias ? NULL : bias + och; - - gemmini_extended_mvin3(bias_dram_addr, - D_sp_addr, - J, I); - } - } - } - - // mvin input - if (input != NULL){ - int max_chs_per_mvin = ichs < MAX_BLOCK_LEN * DIM ? ichs : - MAX_BLOCK_LEN * DIM; - if (trans_input_3120) { - max_chs_per_mvin = batches < MAX_BLOCK_LEN * DIM ? batches : - MAX_BLOCK_LEN * DIM; - } - - const int dram_stride = trans_input_3120 ? - batch_size * sizeof(elem_t) : - in_channels * sizeof(elem_t); - - const int spad_stride = trans_input_3120 ? - ichs * (irows >> downsample) * (icols >> downsample) : - batches * (irows >> downsample) * (icols >> downsample); - - gemmini_extended5_config_ld(dram_stride << downsample, - MVIN_SCALE_IDENTITY, false, spad_stride, max_pixels_per_row, 0); - - const int b_it = trans_input_3120 ? max_chs_per_mvin : 1; - const int ich_it = trans_input_3120 ? 1 : max_chs_per_mvin; - - for (int b = 0; b < batches; b += b_it) - for (int irow = -UNDILATED(upad); irow < irows_unpadded + - UNDILATED(dpad); irow += 1 + downsample) { const int irow_padded = irow + - UNDILATED(upad); - - for (int icol = -UNDILATED(lpad); icol < icols_unpadded + - UNDILATED(rpad);) { - // TODO There might be some unnecessary mvins here at the edge of - the image - - int I = icols_unpadded - icol > (DIM << downsample) ? - (DIM << downsample) : icols_unpadded - icol; - - if (icol < 0) { - I = -icol > DIM ? DIM : -icol; - } else if (icol >= icols_unpadded) { - I = icols_unpadded + UNDILATED(rpad) - icol > DIM ? 
DIM : - icols_unpadded + UNDILATED(rpad) - icol; - } - - const int icol_padded = icol + UNDILATED(lpad); - - for (int ich = 0; ich < ichs; ich += ich_it) { - int K = ichs - ich > max_chs_per_mvin ? - max_chs_per_mvin : ichs - ich; - if (trans_input_3120) { - K = batches - b > max_chs_per_mvin ? - max_chs_per_mvin : batches - b; - } - - #define DS(x) ((x) >> (downsample)) - - uint32_t A_sp_addr = A_sp_addr_start + (ich / DIM) * batches * - DS(irows) * DS(icols) + b * DS(irows) * DS(icols) + DS(irow_padded) * - DS(icols) + DS(icol_padded); if (trans_input_3120) { A_sp_addr = - A_sp_addr_start + (b / DIM) * ichs * DS(irows) * DS(icols) + ich * DS(irows) * - DS(icols) + DS(irow_padded) * DS(icols) + DS(icol_padded); - } - - const bool is_zeros = irow < 0 || irow >= irows_unpadded || icol < - 0 || icol >= icols_unpadded; - - const elem_t * in = input + (b*in_row_dim*in_col_dim + - irow*in_col_dim + icol) * in_stride + ich; if (is_zeros) { in = NULL; } else - if (trans_input_3120) { in = input + (ich*in_row_dim*in_col_dim + - irow*in_col_dim + icol) * batch_size + b; - } - - gemmini_extended_mvin(in, - A_sp_addr, - K, I >> downsample); - } - - icol += I; - } - } - } - - // mvin weights - if (weights != NULL) { - int max_chs_per_mvin = ochs < MAX_BLOCK_LEN * DIM ? ochs : - MAX_BLOCK_LEN * DIM; - if (trans_weight_0132) { - max_chs_per_mvin = kchs < MAX_BLOCK_LEN * DIM ? kchs : - MAX_BLOCK_LEN * DIM; - } - - size_t dram_stride = weight_stride * sizeof(elem_t); - if (dw) { - dram_stride = sizeof(elem_t); - } else if (trans_weight_1203) { - dram_stride = kernel_dim * kernel_dim * out_channels * sizeof(elem_t); - } else if (trans_weight_0132) { - dram_stride = in_channels * sizeof(elem_t); - } - - const size_t spad_block_stride = trans_weight_0132 ? - krows * kcols * ochs : krows * kcols * kchs; - - gemmini_extended4_config_ld(dram_stride, MVIN_SCALE_IDENTITY, false, - spad_block_stride, 1); - - const size_t och_it = trans_weight_0132 ? 
DIM : max_chs_per_mvin; - const size_t kch_it = trans_weight_0132 ? max_chs_per_mvin : DIM; - - for (int och = 0; och < ochs; och += och_it) { - for (int krow = 0; krow < krows; krow++) - for (int kcol = 0; kcol < kcols; kcol++) - for (int kch = 0; kch < kchs; kch += kch_it) { - int K = kchs - kch > DIM ? DIM : kchs - kch; - int J = ochs - och > max_chs_per_mvin ? max_chs_per_mvin : ochs - - och; if (trans_weight_0132) { K = ochs - och > DIM ? DIM : ochs - och; J = - kchs - kch > max_chs_per_mvin ? max_chs_per_mvin : kchs - kch; - } - - uint32_t B_sp_addr = B_sp_addr_start + (och / DIM) * krows * kcols - * kchs + krow * kcols * kchs + kcol * kchs + kch; if (trans_weight_0132) { - B_sp_addr = B_sp_addr_start + (kch / DIM) * krows * kcols * ochs - + krow * kcols * ochs + kcol * ochs + och; - } - - const elem_t * w = weights + (krow*kernel_dim*in_channels + - kcol*in_channels + kch) * weight_stride + och; if (dw) { w = weights + krow * - kernel_dim + kcol; } else if (trans_weight_1203) { w = weights + (kch * - kernel_dim * kernel_dim + krow * kernel_dim + kcol) * out_channels + och; } - else if (trans_weight_0132) { w = weights + (krow * kernel_dim * out_channels - + kcol * out_channels + och) * in_channels + kch; - } - - gemmini_extended_mvin2(w, B_sp_addr, J, K); - } - } - } - - // Compute - { - const int b_it = trans_input_3120 ? DIM : 1; - const int ocol_it = trans_input_3120 ? 
1 : (DIM << input_dilated); - - if (trans_input_3120) { - gemmini_extended3_config_ex(0, 0, 0, 0, orows * ocols, irows * icols, 0, - 0, true); - } - - for (int och = 0; och < ochs; och += DIM) { - for (int krow = 0; krow < krows; krow++) { - for (int kcol = 0; kcol < kcols; kcol += max_pixels_per_row) { - for (int kch = 0; kch < kchs; kch += DIM) { - bool new_weights = true; - - for (int b = 0; b < batches; b += b_it) { - for (int orow = 0; orow < orows; orow++) { - // Skip some kernel rows due to input-dilation - if (input_dilated && ((krow * kernel_dilation + orow * stride - - upad) % 2 != 0)) { continue; - } - - for (int ocol = 0; ocol < ocols;) { - // Skip some cols dimensions due to input-dilation - if (input_dilated && ((kcol + ocol * stride - lpad) % 2 != - 0)) { ocol++; continue; - } - - int irow = orow * stride + krow * kernel_dilation; - int icol = ocol * stride + kcol * kernel_dilation; - - if (input_dilated) { - irow = (irow + 1) / 2; - icol = (icol + 1) / 2; - } - - const int pixels = kcols - kcol > max_pixels_per_row ? - max_pixels_per_row : kcols - kcol; - - const uint32_t C_sp_addr = C_sp_addr_start + (och / DIM) * - batches * orows * ocols + b * orows * ocols + orow * ocols + ocol; - - // Over here, construct a new matrix - // - // Let us assume that we only ever operate on - // one pixel in one row. - // Thus, krows == kcols == 1 - // - // Then, for every set of I, J, and K values - // - I = ocols - // - J = ochs - // - K = kchs - - int I = UNDILATED(ocols - ocol > (DIM << input_dilated) ? - (DIM << input_dilated) : ocols - ocol); const int J = ochs - och > DIM ? DIM : - ochs - och; const int K = pixels * (kchs - kch > DIM ? DIM : kchs - kch); - - if (trans_input_3120) { - I = batches - b > DIM ? 
DIM : batches - b; - } - - uint32_t A_sp_addr = A_sp_addr_start + (kch / DIM) * batches - * DS(irows) * DS(icols) + b * DS(irows) * DS(icols) + DS(irow) * DS(icols) + - DS(icol); if (trans_input_3120) { A_sp_addr = A_sp_addr_start + (b / DIM) * - kchs * DS(irows) * DS(icols) + kch * DS(irows) * DS(icols) + DS(irow) * - DS(icols) + DS(icol); - } - - const int krow_ = wrot180 ? krows - krow - 1 : krow; - const int kcol_ = wrot180 ? kcols - kcol - 1 : kcol; - - uint32_t B_sp_addr = B_sp_addr_start + (och / DIM) * krows * - kcols * kchs + krow_ * kcols * kchs + kcol_ * kchs + kch; if - (trans_weight_0132) { B_sp_addr = B_sp_addr_start + (kch / DIM) * krows * - kcols * ochs + krow_ * kcols * ochs + kcol_ * ochs + och; - } - - const uint32_t pre_sp_addr = new_weights ? - B_sp_addr : GARBAGE_ADDR; - - // perform matmul - gemmini_extended_preload(pre_sp_addr, C_sp_addr, J, K, J, - I); - - if (new_weights) { - gemmini_extended_compute_preloaded(A_sp_addr, - GARBAGE_ADDR, K, I, J, I); } else { - gemmini_extended_compute_accumulated(A_sp_addr, - GARBAGE_ADDR, K, I, J, I); - } - - ocol += ocol_it; - new_weights = false; - } - } - } - } - } - } - } - } - - #undef DS - #undef UNDILATED - - // mvout output - if (output != NULL) { - if (no_pool) { - for (int b = 0; b < batches; b++) - for (int orow = 0; orow < orows; orow++) - for (int ocol = 0; ocol < ocols; ocol += DIM) { - const int I = ocols - ocol > DIM ? DIM : ocols - ocol; - - for (int och = 0; och < ochs; och += DIM) { - const int J = ochs - och > DIM ? 
DIM : ochs - och; - - const uint32_t C_sp_addr = C_sp_addr_start + (och / DIM) * - batches * orows * ocols + b * orows * ocols + orow * ocols + ocol; - - elem_t * out = output + (b*out_row_dim*out_col_dim + - orow*out_col_dim + ocol) * out_stride + och; if (trans_output_1203) { out = - output + (orow*out_col_dim*batch_size + ocol*batch_size + b) * out_channels + - och; - } - - gemmini_extended_mvout(out, - C_sp_addr, - J, I); - } - } - } else { - printf("Pooling with rectangular convolutions is currently not - supported.\n"); exit(1); - */ - /* - gemmini_extended2_config_st(out_channels * sizeof(elem_t), act, scale, -pool_stride, pool_size, pool_out_row_dim, porows, pocols, orows, ocols, pupad, -plpad); - - for (int b = 0; b < batches; b++) { - for (int poch = 0; poch < pochs; poch += DIM) { - const int channels = poch + DIM >= pochs ? pochs - poch : DIM; - - elem_t * pout = output + (b * pool_out_row_dim * -pool_out_col_dim)*out_channels + poch; - - const uint32_t C_sp_addr = C_sp_addr_start + (poch / DIM) * batches * -orows * ocols + b * orows * ocols; - - gemmini_extended_mvout(pout, - C_sp_addr, - channels, 0); - } - } - - gemmini_extended_config_st(out_channels * sizeof(elem_t), act, scale); -<<<<<<< HEAD - */ - // } - // } - // } - //} -} - -static int tiled_conv_total_spad_rows_dw(bool acc, bool weight, int stride, - int batches, int porows, int pocols, - int ochs, int krows, int kcols, - int kchs, int pool_size, - int pool_stride) { - - const int orows = porows * pool_stride + pool_size - 1; - const int ocols = pocols * pool_stride + pool_size - 1; - - const int irows = orows * stride + krows - 1; // - 2 * padding; - const int icols = ocols * stride + kcols - 1; // - 2 * padding; - const int ichs = kchs; - - const int in_channels_per_bank = ichs / DIM + (ichs % DIM != 0); - const int out_channels_per_bank = ochs / DIM + (ochs % DIM != 0); - - const int A_rows = in_channels_per_bank * batches * irows * icols; - const int B_rows = out_channels_per_bank * 
kcols * krows * kchs; - const int C_rows = out_channels_per_bank * batches * orows * ocols; - - if (acc) - return C_rows; - else if (weight) - return B_rows; - else - return A_rows; -} - -static int tiled_conv_total_spad_rows(bool acc, int stride, int input_dilation, - int kernel_dilation, bool downsample, - bool trans_weight_0132, - bool trans_input_3120, int batches, - int porows, int pocols, int ochs, - int krows, int kcols, int kchs, - int pool_size, int pool_stride) { - - const int orows = porows * pool_stride + pool_size - 1; - const int ocols = pocols * pool_stride + pool_size - 1; - - const int krows_dilated = krows + (kernel_dilation - 1) * (krows - 1); - const int kcols_dilated = kcols + (kernel_dilation - 1) * (kcols - 1); - - int irows = orows * stride + krows_dilated - 1; // - 2 * padding; - int icols = ocols * stride + kcols_dilated - 1; // - 2 * padding; - const int ichs = kchs; - - irows = irows / input_dilation + (irows % input_dilation != 0); - icols = icols / input_dilation + (icols % input_dilation != 0); - - const int in_channels_per_bank = ichs / DIM + (ichs % DIM != 0); - const int out_channels_per_bank = ochs / DIM + (ochs % DIM != 0); - const int batches_per_bank = batches / DIM + (batches % DIM != 0); - - const int A_rows = trans_input_3120 - ? (batches_per_bank * ichs * (irows >> downsample) * - (icols >> downsample)) - : (in_channels_per_bank * batches * - (irows >> downsample) * (icols >> downsample)); - - const int B_rows = trans_weight_0132 - ? in_channels_per_bank * kcols * krows * ochs - : out_channels_per_bank * kcols * krows * kchs; - - const int C_rows = out_channels_per_bank * batches * orows * ocols; - - return acc ? 
C_rows : A_rows + B_rows; -} - -static void conv_cpu_without_pool( - int batch_size, int in_row_dim, int in_col_dim, int in_channels, - int out_channels, int out_row_dim, int out_col_dim, int stride, - int input_dilation, int kernel_dilation, int padding, int kernel_dim, - int in_stride, int weight_stride, int out_stride, bool wrot180, - bool trans_output_1203, bool trans_input_3120, bool trans_weight_1203, - bool trans_weight_0132, - - const elem_t *input, const elem_t *weights, const acc_t *bias, - elem_t *output, - - int act, acc_scale_t scale) { - - bool no_bias = bias == NULL; - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int och = 0; och < out_channels; och++) { - - acc_t opixel = no_bias ? 0 : bias[och]; - - for (int krow = 0; krow < kernel_dim; krow++) { - if ((orow * stride + krow * kernel_dilation - padding) % - input_dilation != - 0) - continue; - - const int irow = - (orow * stride + krow * kernel_dilation - padding) / - input_dilation; - - for (int kcol = 0; kcol < kernel_dim; kcol++) { - if ((ocol * stride + kcol * kernel_dilation - padding) % - input_dilation != - 0) - continue; - - const int icol = - (ocol * stride + kcol * kernel_dilation - padding) / - input_dilation; - - for (int kch = 0; kch < in_channels; kch++) { - const elem_t *in = - input + - (b * in_row_dim * in_col_dim + irow * in_col_dim + icol) * - in_stride + - kch; - if (trans_input_3120) { - // NHWC to CHWN - in = input + - (kch * in_row_dim * in_col_dim + irow * in_col_dim + - icol) * - batch_size + - b; - } - - elem_t ipixel = irow < 0 || irow >= in_row_dim || icol < 0 || - icol >= in_col_dim - ? 0 - : *in; - - const int krow_ = wrot180 ? kernel_dim - krow - 1 : krow; - const int kcol_ = wrot180 ? 
kernel_dim - kcol - 1 : kcol; - - elem_t weight = *(weights + - (krow_ * kernel_dim * in_channels + - kcol_ * in_channels + kch) * - weight_stride + - och); - if (trans_weight_1203) { - // HWIO to WIHO - weight = *(weights + - (kch * kernel_dim * kernel_dim + - krow_ * kernel_dim + kcol_) * - out_channels + - och); - } else if (trans_weight_0132) { - // HWIO to HWOI - weight = *(weights + - (krow_ * kernel_dim * out_channels + - kcol_ * out_channels + och) * - in_channels + - kch); - } - - opixel += weight * ipixel; - } - } - } - - elem_t *out = - output + - (b * out_row_dim * out_col_dim + orow * out_col_dim + ocol) * - out_stride + - och; - if (trans_output_1203) { - // NHWC to HWNC - out = output + - (orow * out_col_dim * batch_size + ocol * batch_size + b) * - out_channels + - och; - } - - *out = scale_and_sat(opixel, act, scale, 0); - } - } - } - } -} - -static void conv_dw_cpu_without_pool(int batch_size, int in_row_dim, - int in_col_dim, int channels, - int out_row_dim, int out_col_dim, - int stride, int padding, int kernel_dim, - - const elem_t *input, const elem_t *weights, - const acc_t *bias, elem_t *output, - - int act, acc_scale_t scale) { - - bool no_bias = bias == NULL; - - for (int b = 0; b < batch_size; b++) { - for (int orow = 0; orow < out_row_dim; orow++) { - for (int ocol = 0; ocol < out_col_dim; ocol++) { - for (int ch = 0; ch < channels; ch++) { - acc_t opixel = no_bias ? 0 : bias[ch]; - - for (int krow = 0; krow < kernel_dim; krow++) { - const int irow = orow * stride + krow - padding; - - for (int kcol = 0; kcol < kernel_dim; kcol++) { - const int icol = ocol * stride + kcol - padding; - - const elem_t *in = - input + - (b * in_row_dim * in_col_dim + irow * in_col_dim + icol) * - channels + - ch; - - const elem_t ipixel = irow < 0 || irow >= in_row_dim || - icol < 0 || icol >= in_col_dim - ? 
0 - : *in; - - const elem_t weight = - *(weights + (ch * kernel_dim + krow) * kernel_dim + kcol); - - opixel += weight * ipixel; - } - } - - elem_t *out = - output + - (b * out_row_dim * out_col_dim + orow * out_col_dim + ocol) * - channels + - ch; - - *out = scale_and_sat(opixel, act, scale, 0); - } - } - } - } -} - -static void conv_cpu(int batch_size, int in_row_dim, int in_col_dim, - int in_channels, int out_channels, int out_row_dim, - int out_col_dim, int stride, int input_dilation, - int kernel_dilation, int padding, int kernel_dim, - int in_stride, int weight_stride, int out_stride, - bool wrot180, bool trans_output_1203, - bool trans_input_3120, bool trans_weight_1203, - bool trans_weight_0132, - - const elem_t *input, const elem_t *weights, - const acc_t *bias, elem_t *output, - - int act, acc_scale_t scale, int pool_size, int pool_stride, - int pool_padding) { - - const bool no_pool = pool_stride == 0; - if (no_pool) { - conv_cpu_without_pool( - batch_size, in_row_dim, in_col_dim, in_channels, out_channels, - out_row_dim, out_col_dim, stride, input_dilation, kernel_dilation, - padding, kernel_dim, in_stride, weight_stride, out_stride, wrot180, - trans_output_1203, trans_input_3120, trans_weight_1203, - trans_weight_0132, input, weights, bias, output, act, scale); - return; - } - - const bool no_bias = bias == NULL; - const int pool_out_row_dim = - (out_row_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - const int pool_out_col_dim = - (out_col_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - - for (int b = 0; b < batch_size; b++) { - for (int porow = 0; porow < pool_out_row_dim; porow++) { - for (int pocol = 0; pocol < pool_out_col_dim; pocol++) { - for (int poch = 0; poch < out_channels; poch++) { - - elem_t running_max = 0; - bool running_max_initialized = false; - - for (int pwrow = 0; pwrow < pool_size; pwrow++) { - const int orow = porow * pool_stride + pwrow - pool_padding; - - for (int pwcol = 0; pwcol < pool_size; pwcol++) { - 
const int ocol = pocol * pool_stride + pwcol - pool_padding; - - if (orow < 0 || orow >= out_row_dim || ocol < 0 || - ocol >= out_col_dim) { - if (!running_max_initialized || running_max < 0) { - running_max = 0; - running_max_initialized = true; - } - } else { - - acc_t opixel = no_bias ? 0 : bias[poch]; - - for (int krow = 0; krow < kernel_dim; krow++) { - if ((orow * stride + krow * kernel_dilation - padding) % - input_dilation != - 0) - continue; - - const int irow = - (orow * stride + krow * kernel_dilation - padding) / - input_dilation; - - for (int kcol = 0; kcol < kernel_dim; kcol++) { - if ((ocol * stride + kcol * kernel_dilation - padding) % - input_dilation != - 0) - continue; - - const int icol = - (ocol * stride + kcol * kernel_dilation - padding) / - input_dilation; - - for (int kch = 0; kch < in_channels; kch++) { - const elem_t *in = input + - (b * in_row_dim * in_col_dim + - irow * in_col_dim + icol) * - in_stride + - kch; - if (trans_input_3120) { - // NHWC to CHWN - in = input + - (kch * in_row_dim * in_col_dim + - irow * in_col_dim + icol) * - batch_size + - b; - } - - elem_t ipixel = irow < 0 || irow >= in_row_dim || - icol < 0 || icol >= in_col_dim - ? 0 - : *in; - - const int krow_ = wrot180 ? kernel_dim - krow - 1 : krow; - const int kcol_ = wrot180 ? 
kernel_dim - kcol - 1 : kcol; - - elem_t weight = *(weights + - (krow_ * kernel_dim * in_channels + - kcol_ * in_channels + kch) * - weight_stride + - poch); - if (trans_weight_1203) { - // HWIO to WIHO - weight = *(weights + - (kch * kernel_dim * kernel_dim + - krow_ * kernel_dim + kcol_) * - out_channels + - poch); - } else if (trans_weight_0132) { - // HWIO to HWOI - weight = *(weights + - (krow_ * kernel_dim * out_channels + - kcol_ * out_channels + poch) * - in_channels + - kch); - } - - opixel += weight * ipixel; - } - } - } - - opixel = scale_and_sat(opixel, act, scale, 0); - if (!running_max_initialized || opixel > running_max) { - running_max = opixel; - running_max_initialized = true; - } - } - - if (pwrow == pool_size - 1 && pwcol == pool_size - 1) { - elem_t *out = output + - (b * pool_out_row_dim * pool_out_col_dim + - porow * pool_out_col_dim + pocol) * - out_stride + - poch; - if (trans_output_1203) { - // NHWC to HWNC - out = output + - (porow * pool_out_col_dim * batch_size + - pocol * batch_size + b) * - out_channels + - poch; - } - - *out = running_max; - } - } - } - } - } - } - } -} - -static void conv_dw_cpu(int batch_size, int in_row_dim, int in_col_dim, - int channels, int out_row_dim, int out_col_dim, - int stride, int padding, int kernel_dim, - - const elem_t *input, const elem_t *weights, - const acc_t *bias, elem_t *output, - - int act, acc_scale_t scale, int pool_size, - int pool_stride, int pool_padding) { - - const bool no_pool = pool_stride == 0; - if (no_pool) { - conv_dw_cpu_without_pool( - batch_size, in_row_dim, in_col_dim, channels, out_row_dim, out_col_dim, - stride, padding, kernel_dim, input, weights, bias, output, act, scale); - return; - } - - const bool no_bias = bias == NULL; - const int pool_out_row_dim = - (out_row_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - const int pool_out_col_dim = - (out_col_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - - for (int b = 0; b < batch_size; b++) { - for (int 
porow = 0; porow < pool_out_row_dim; porow++) { - for (int pocol = 0; pocol < pool_out_col_dim; pocol++) { - for (int ch = 0; ch < channels; ch++) { - - elem_t running_max = 0; - bool running_max_initialized = false; - - for (int pwrow = 0; pwrow < pool_size; pwrow++) { - const int orow = porow * pool_stride + pwrow - pool_padding; - - for (int pwcol = 0; pwcol < pool_size; pwcol++) { - const int ocol = pocol * pool_stride + pwcol - pool_padding; - - if (orow < 0 || orow >= out_row_dim || ocol < 0 || - ocol >= out_col_dim) { - if (!running_max_initialized || running_max < 0) { - running_max = 0; - running_max_initialized = true; - } - } else { - - acc_t opixel = no_bias ? 0 : bias[ch]; - - for (int krow = 0; krow < kernel_dim; krow++) { - const int irow = orow * stride + krow - padding; - - for (int kcol = 0; kcol < kernel_dim; kcol++) { - const int icol = ocol * stride + kcol - padding; - - const elem_t *in = input + - (b * in_row_dim * in_col_dim + - irow * in_col_dim + icol) * - channels + - ch; - - elem_t ipixel = irow < 0 || irow >= in_row_dim || - icol < 0 || icol >= in_col_dim - ? 
0 - : *in; - - const elem_t weight = *( - weights + (ch * kernel_dim + krow) * kernel_dim + kcol); - - opixel += weight * ipixel; - } - } - - opixel = scale_and_sat(opixel, act, scale, 0); - if (!running_max_initialized || opixel > running_max) { - running_max = opixel; - running_max_initialized = true; - } - } - - if (pwrow == pool_size - 1 && pwcol == pool_size - 1) { - elem_t *out = output + - (b * pool_out_row_dim * pool_out_col_dim + - porow * pool_out_col_dim + pocol) * - channels + - ch; - - *out = running_max; - } - } - } - } - } - } - } -} - -static void tiled_conv(int batch_size, int in_row_dim, int in_col_dim, - int in_channels, int out_channels, int out_row_dim, - int out_col_dim, int stride, int input_dilation, - int kernel_dilation, int padding, int kernel_dim, - int in_stride, int weight_stride, int out_stride, - bool wrot180, bool trans_output_1203, - bool trans_input_3120, bool trans_weight_1203, - bool trans_weight_0132, - - int batches, int porows, int pocols, int pochs, - int krows, int kcols, int kchs, - - const elem_t *input, const elem_t *weights, - const acc_t *bias, elem_t *output, - - int act, acc_scale_t scale, int pool_size, - int pool_stride, int pool_padding, - - enum tiled_matmul_type_t tiled_conv_type) { - -#ifdef GEMMINI_ASSERTIONS - if (trans_weight_1203 && trans_weight_0132) { - printf("Only one weight transformation can be applied at a time\n"); - exit(1); - } -#endif - - if (tiled_conv_type == CPU) { - if (pool_size == 1 && pool_stride == 1 && pool_padding == 0) { - pool_stride = 0; - } - - // assume in_dim_rows = in_dim_cols - // and out_dim_rows = out_dim_cols for now - conv_cpu(batch_size, in_row_dim, in_col_dim, in_channels, out_channels, - out_row_dim, out_col_dim, stride, input_dilation, kernel_dilation, - padding, kernel_dim, in_stride, weight_stride, out_stride, wrot180, - trans_output_1203, trans_input_3120, trans_weight_1203, - trans_weight_0132, input, weights, bias, output, act, scale, - pool_size, pool_stride, 
pool_padding); - return; - } else if (tiled_conv_type == OS) { - printf("Gemmini convs do not currently support OS\n"); - exit(1); - } - - // TODO move everything below this into a tiled_conv_outer function to match - // the tiled_matmul function - - bool no_bias = false; - if (bias == NULL) { - bias = (acc_t *)1; - no_bias = true; - } - - bool no_pool = pool_stride == 0; - if (no_pool) { - pool_size = 1; - pool_stride = 1; - pool_padding = 0; - } - - const bool downsample = stride == 2 && kernel_dim == 1 && - in_row_dim % 2 == 0 && in_col_dim % 2 == 0 && - padding == 0 && no_pool && input_dilation == 1 && - !trans_input_3120; - - const int input_dilated = input_dilation == 2; - -#ifdef GEMMINI_ASSERTIONS - { - // const int orows = porows * pool_stride + pool_size - 1; - // const int ocols = pocols * pool_stride + pool_size - 1; - - // Check that data will fit in scratchpad - const int spad_rows = tiled_conv_total_spad_rows( - false, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, batches, porows, pocols, pochs, - krows, kcols, kchs, pool_size, pool_stride); - const int acc_rows = tiled_conv_total_spad_rows( - true, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, batches, porows, pocols, pochs, - krows, kcols, kchs, pool_size, pool_stride); - - if (spad_rows > BANK_NUM * BANK_ROWS / 2) { - printf("not enough scratchpad space to store inputs and weights, %d\n", - spad_rows); - exit(1); - } - if (acc_rows > ACC_ROWS / 2) { - printf("not enough accumulator space to store outputs\n"); - exit(1); - } - if (kernel_dim <= padding) { - printf("kernel_dim must be larger than padding\n"); - exit(1); - } - if (input_dilation > 2) { - printf("input_dilation > 2 is only supported on CPU\n"); - exit(1); - } - if (input_dilation > 1 && stride > 1) { - printf("input input_dilation is only supported when stride == 1\n"); - exit(1); - } - if (trans_output_1203 && !no_pool) { - printf("Output can 
only be transposed when pooling is disabled\n"); - exit(1); - } - if (trans_input_3120 && trans_weight_0132) { - printf("Cannot transpose innermost dimensions of both inputs and weights " - "on WS.\n"); - exit(1); - } - } -#endif - - const size_t st_dram_stride = trans_output_1203 - ? batch_size * out_channels * sizeof(elem_t) - : out_stride * sizeof(elem_t); - gemmini_extended_config_st(st_dram_stride, act, scale); - - gemmini_extended3_config_ex(WEIGHT_STATIONARY, 0, 0, 0, input_dilation, - stride >> downsample, trans_input_3120, - trans_weight_0132, false); - - const int pool_out_row_dim = - (out_row_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - const int pool_out_col_dim = - (out_col_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - const int dilated_in_row_dim = - in_row_dim + (input_dilation - 1) * (in_row_dim - 1); - const int dilated_in_col_dim = - in_col_dim + (input_dilation - 1) * (in_col_dim - 1); - - size_t a_spad_id = 0; - size_t b_spad_id = 0; - - int porow_end = pool_out_row_dim; - int porow_start = 0; - bool a_reuse = false; - bool b_reuse = false; - size_t num_kch = ceil_divide_int(in_channels, kchs); - size_t num_poch = ceil_divide_int(out_channels, pochs); - size_t num_b = ceil_divide_int(batch_size, batches); - size_t num_porow = ceil_divide_int((porow_end - porow_start), porows); - size_t num_pocol = ceil_divide_int(pool_out_col_dim, pocols); - size_t num_krow = ceil_divide_int(kernel_dim, krows); - size_t num_kcol = ceil_divide_int(kernel_dim, kcols); - - // printf("num_kch: %d, num_poch: %d, num_b: %d, num_porow: %d, num_pocol: - // %d, num_krow: %d, num_kcol: %d\n", num_kch, num_poch, num_b, num_porow, - // num_pocol, num_krow, num_kcol); - - if (num_kch * num_poch * num_krow * num_kcol <= 2) - b_reuse = true; - if (num_kch * num_krow * num_kcol * num_b * num_porow * num_pocol <= 2) - a_reuse = true; - - for (int b = 0; b < batch_size; b += batches) { - for (int porow = porow_start; porow < porow_end; porow += porows) { - 
const int orow = porow * pool_stride - pool_padding; - - for (int pocol = 0; pocol < pool_out_col_dim; pocol += pocols) { - const int ocol = pocol * pool_stride - pool_padding; - - for (int poch = 0; poch < out_channels; poch += pochs) { - for (int krow = 0; krow < kernel_dim; krow += krows) { - const int orow_floored = orow < 0 ? 0 : orow; - int irow = orow_floored * stride + krow * kernel_dilation - padding; - - for (int kcol = 0; kcol < kernel_dim; kcol += kcols) { - const int ocol_floored = ocol < 0 ? 0 : ocol; - int icol = - ocol_floored * stride + kcol * kernel_dilation - padding; - - for (int kch = 0; kch < in_channels; kch += kchs) { - if (a_reuse) - a_spad_id = (kch + krow + kcol + b + (porow - porow_start) + - pocol) == 0 - ? 1 - : 2; - if (b_reuse) - b_spad_id = (kch + poch + krow + kcol) == 0 ? 1 : 2; - elem_t *out = output + - (b * pool_out_row_dim * pool_out_col_dim + - porow * pool_out_col_dim + pocol) * - out_stride + - poch; - if (trans_output_1203) { - out = output + - (porow * pool_out_col_dim * batch_size + - pocol * batch_size + b) * - out_channels + - poch; - } - - if (krow + krows < kernel_dim || kcol + kcols < kernel_dim || - kch + kchs < in_channels) { - out = NULL; - } - - const acc_t *bias_ = bias + poch; - if (krow > 0 || kcol > 0 || kch > 0) { - bias_ = NULL; - } - - const int batches_ = - batch_size - b > batches ? batches : batch_size - b; - const int porows_ = pool_out_row_dim - porow > porows - ? porows - : pool_out_row_dim - porow; - const int pocols_ = pool_out_col_dim - pocol > pocols - ? pocols - : pool_out_col_dim - pocol; - const int pochs_ = - out_channels - poch > pochs ? pochs : out_channels - poch; - const int krows_ = - kernel_dim - krow > krows ? krows : kernel_dim - krow; - const int kcols_ = - kernel_dim - kcol > kcols ? kcols : kernel_dim - kcol; - const int kchs_ = - in_channels - kch > kchs ? 
kchs : in_channels - kch; - - const int ocols_ = pocols_ * pool_stride + pool_size - 1; - const int orows_ = porows_ * pool_stride + pool_size - 1; - - const int plpad = ocol < 0 ? -ocol : 0; - const int prpad = ocol + ocols_ > out_col_dim - ? ocol + ocols_ - out_col_dim - : 0; - const int pupad = orow < 0 ? -orow : 0; - const int pdpad = orow + orows_ > out_row_dim - ? orow + orows_ - out_row_dim - : 0; - - const int dilated_krows_ = - krows_ + (kernel_dilation - 1) * (krows_ - 1); - const int dilated_kcols_ = - kcols_ + (kernel_dilation - 1) * (kcols_ - 1); - - const int icols_ = - (ocols_ - plpad - prpad) * stride + dilated_kcols_ - 1; - const int irows_ = - (orows_ - pupad - pdpad) * stride + dilated_krows_ - 1; - - int lpad = icol < 0 ? -icol : 0; - int rpad = icol + icols_ > dilated_in_col_dim - ? icol + icols_ - dilated_in_col_dim - : 0; - int upad = irow < 0 ? -irow : 0; - int dpad = irow + irows_ > dilated_in_row_dim - ? irow + irows_ - dilated_in_row_dim - : 0; - - if (input_dilated) { - lpad += lpad == 0 && icol % 2 != 0; - rpad += rpad == 0 && (icol + icols_) % 2 != 1; - upad += upad == 0 && irow % 2 != 0; - dpad += dpad == 0 && (irow + irows_) % 2 != 1; - } - - int krow_ = krow; - int kcol_ = kcol; - if (wrot180) { - krow_ = kernel_dim - krow - krows_; - kcol_ = kernel_dim - kcol - kcols_; - } - - const elem_t *weights_slice = - weights + - (krow_ * kernel_dim * in_channels + kcol_ * in_channels + - kch) * - weight_stride + - poch; - if (trans_weight_1203) { - weights_slice = weights + - (kch * kernel_dim * kernel_dim + - krow_ * kernel_dim + kcol_) * - out_channels + - poch; - } else if (trans_weight_0132) { - weights_slice = weights + - (krow_ * kernel_dim * out_channels + - kcol_ * out_channels + poch) * - in_channels + - kch; - } - - const elem_t *in = - input + - (b * in_row_dim * in_col_dim + - ((irow + upad) >> input_dilated) * in_col_dim + - ((icol + lpad) >> input_dilated)) * - in_stride + - kch; - if (trans_input_3120) { - in = input + - (kch 
* in_row_dim * in_col_dim + - ((irow + upad) >> input_dilated) * in_col_dim + - ((icol + lpad) >> input_dilated)) * - batch_size + - b; - } - if (b_reuse && (pocol + (porow - porow_start) + b > 0)) - weights_slice = NULL; - if (a_reuse && (poch > 0)) - in = NULL; - // printf("a_reuse: %d, b_reuse: %d, a_spad_id: %d, b_spad_id: - // %d, in: %llu, weight: %llu \n", a_reuse, b_reuse, a_spad_id, - // b_spad_id, in, weights_slice); - - sp_tiled_conv( - batch_size, in_row_dim, in_col_dim, in_channels, - out_channels, out_row_dim, out_col_dim, pool_out_row_dim, - pool_out_col_dim, - - stride, padding, kernel_dim, kernel_dilation, in_stride, - weight_stride, out_stride, - - pool_size, pool_stride, pool_padding, - - batches_, porows_, pocols_, pochs_, krows_, kcols_, kchs_, - - lpad, rpad, upad, dpad, plpad, prpad, pupad, pdpad, - - in, weights_slice, out, bias_, - - act, scale, - - wrot180, trans_output_1203, trans_input_3120, - trans_weight_1203, trans_weight_0132, - - no_bias, no_pool, downsample, input_dilated, false, - a_spad_id, b_spad_id); - } - } - } - } - } - } - } -} - -static void tiled_conv_dw(int batch_size, int in_row_dim, int in_col_dim, - int channels, int out_row_dim, int out_col_dim, - int stride, int padding, int kernel_dim, - - int batches, int porows, int pocols, int krows, - int kcols, - - const elem_t *input, const elem_t *weights, - const acc_t *bias, elem_t *output, - - int act, acc_scale_t scale, int pool_size, - int pool_stride, int pool_padding, - - enum tiled_matmul_type_t tiled_conv_type) { - - if (tiled_conv_type == CPU) { - if (pool_size == 1 && pool_stride == 1 && pool_padding == 0) { - pool_stride = 0; - } - - conv_dw_cpu(batch_size, in_row_dim, in_col_dim, channels, out_row_dim, - out_col_dim, stride, padding, kernel_dim, input, weights, bias, - output, act, scale, pool_size, pool_stride, pool_padding); - return; - } else if (tiled_conv_type == OS) { - printf("Gemmini convs do not currently support OS\n"); - exit(1); - } - - // TODO move 
everything below this into a tiled_conv_outer function to match - // the tiled_matmul function - - bool no_bias = false; - if (bias == NULL) { - bias = (acc_t *)1; - no_bias = true; - } - - bool no_pool = pool_stride == 0; - if (no_pool) { - pool_size = 1; - pool_stride = 1; - pool_padding = 0; - } - -#ifdef GEMMINI_ASSERTIONS - { - // const int orows = porows * pool_stride + pool_size - 1; - // const int ocols = pocols * pool_stride + pool_size - 1; - - // Check that data will fit in scratchpad - const int spad_rows = tiled_conv_total_spad_rows( - false, stride, 1, 1, false, false, false, batches, porows, pocols, 1, - krows, kcols, 1, pool_size, pool_stride); - const int acc_rows = tiled_conv_total_spad_rows( - true, stride, 1, 1, false, false, false, batches, porows, pocols, 1, - krows, kcols, 1, pool_size, pool_stride); - - if (spad_rows > BANK_NUM * BANK_ROWS / 2) { - printf("not enough scratchpad space to store inputs and weights, %d\n", - spad_rows); - exit(1); - } - if (acc_rows > ACC_ROWS / 2) { - printf("not enough accumulator space to store outputs\n"); - exit(1); - } - if (kernel_dim <= padding) { - printf("kernel_dim must be larger than padding\n"); - exit(1); - } - } -#endif - - const size_t st_dram_stride = channels * sizeof(elem_t); - gemmini_extended_config_st(st_dram_stride, act, scale); - - gemmini_extended3_config_ex(WEIGHT_STATIONARY, 0, 0, 0, 1, stride, false, - false, false); - - const int pool_out_row_dim = - (out_row_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - const int pool_out_col_dim = - (out_col_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - - for (int b = 0; b < batch_size; b += batches) { - for (int porow = 0; porow < pool_out_row_dim; porow += porows) { - const int orow = porow * pool_stride - pool_padding; - - for (int pocol = 0; pocol < pool_out_col_dim; pocol += pocols) { - const int ocol = pocol * pool_stride - pool_padding; - - for (int ch = 0; ch < channels; ch++) { - for (int krow = 0; krow < kernel_dim; 
krow += krows) { - const int orow_floored = orow < 0 ? 0 : orow; - int irow = orow_floored * stride + krow - padding; - - for (int kcol = 0; kcol < kernel_dim; kcol += kcols) { - const int ocol_floored = ocol < 0 ? 0 : ocol; - int icol = ocol_floored * stride + kcol - padding; - - elem_t *out = output + - (b * pool_out_row_dim * pool_out_col_dim + - porow * pool_out_col_dim + pocol) * - channels + - ch; - - if (krow + krows < kernel_dim || kcol + kcols < kernel_dim) { - out = NULL; - } - - const acc_t *bias_ = bias + ch; - if (krow > 0 || kcol > 0) { - bias_ = NULL; - } - - const int batches_ = - batch_size - b > batches ? batches : batch_size - b; - const int porows_ = pool_out_row_dim - porow > porows - ? porows - : pool_out_row_dim - porow; - const int pocols_ = pool_out_col_dim - pocol > pocols - ? pocols - : pool_out_col_dim - pocol; - const int krows_ = - kernel_dim - krow > krows ? krows : kernel_dim - krow; - const int kcols_ = - kernel_dim - kcol > kcols ? kcols : kernel_dim - kcol; - - const int ocols_ = pocols_ * pool_stride + pool_size - 1; - const int orows_ = porows_ * pool_stride + pool_size - 1; - - const int plpad = ocol < 0 ? -ocol : 0; - const int prpad = - ocol + ocols_ > out_col_dim ? ocol + ocols_ - out_col_dim : 0; - const int pupad = orow < 0 ? -orow : 0; - const int pdpad = - orow + orows_ > out_row_dim ? orow + orows_ - out_row_dim : 0; - - const int icols_ = (ocols_ - plpad - prpad) * stride + kcols_ - 1; - const int irows_ = (orows_ - pupad - pdpad) * stride + krows_ - 1; - - int lpad = icol < 0 ? -icol : 0; - int rpad = - icol + icols_ > in_col_dim ? icol + icols_ - in_col_dim : 0; - int upad = irow < 0 ? -irow : 0; - int dpad = - irow + irows_ > in_row_dim ? 
irow + irows_ - in_row_dim : 0; - - const elem_t *weights_slice = - weights + (ch * kernel_dim + krow) * kernel_dim + kcol; - - const elem_t *in = input + - (b * in_row_dim * in_col_dim + - (irow + upad) * in_col_dim + (icol + lpad)) * - channels + - ch; - - sp_tiled_conv( - batch_size, in_row_dim, in_col_dim, channels, channels, - out_row_dim, out_col_dim, pool_out_row_dim, pool_out_col_dim, - - stride, padding, kernel_dim, 1, channels, 1, channels, - - pool_size, pool_stride, pool_padding, - - batches_, porows_, pocols_, 1, krows_, kcols_, 1, - - lpad, rpad, upad, dpad, plpad, prpad, pupad, pdpad, - - in, weights_slice, out, bias_, - - act, scale, - - false, false, false, false, false, - - no_bias, no_pool, false, false, true, 0, 0); - } - } - } - } - } - } -} - -// need to specify each operand/output's stride -// stride only for trans == false, wrot == false -static void tiled_conv_stride_auto( - int batch_size, int in_row_dim, int in_col_dim, int in_channels, - int out_channels, int out_row_dim, int out_col_dim, int stride, - int input_dilation, int kernel_dilation, int padding, int kernel_dim, - int in_stride, int weight_stride, - int out_stride, // specify in/output's stride - bool wrot180, bool trans_output_1203, bool trans_input_3120, - bool trans_weight_1203, bool trans_weight_0132, - - const elem_t *input, const elem_t *weights, const acc_t *bias, - elem_t *output, - - int act, acc_scale_t scale, int pool_size, int pool_stride, - int pool_padding, - - enum tiled_matmul_type_t tiled_conv_type) { - - const bool no_pool = pool_stride == 0; - if (no_pool) { - pool_size = 1; - pool_stride = 1; - pool_padding = 0; - } - - const int pool_out_row_dim = - (out_row_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - const int pool_out_col_dim = - (out_col_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - - const bool downsample = stride == 2 && kernel_dim == 1 && padding == 0 && - no_pool && in_row_dim % 2 == 0 && in_col_dim % 2 == 0; - - // Tile 
convolution params - - // int args[] = {batch_size, porows, pocols, pochs, krows, kcols, kchs}; - int args[] = {batch_size, pool_out_row_dim, pool_out_col_dim, out_channels, - kernel_dim, kernel_dim, in_channels}; - const int max_args[] = {batch_size, pool_out_row_dim, pool_out_col_dim, - out_channels, kernel_dim, kernel_dim, - in_channels}; - - const int orows_idx = 1; - const int ocols_idx = 2; - const int out_channels_idx = 3; - const int in_channels_idx = 6; - - // We divide by 2 for the sake of double-buffering - const int max_spad_rows = (BANK_NUM * BANK_ROWS / 2); - const int max_acc_rows = (ACC_ROWS / 2); - - int spad_rows = tiled_conv_total_spad_rows( - false, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, args[0], args[1], args[2], args[3], - args[4], args[5], args[6], pool_size, pool_stride); - int acc_rows = tiled_conv_total_spad_rows( - true, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, args[0], args[1], args[2], args[3], - args[4], args[5], args[6], pool_size, pool_stride); - - while (spad_rows > max_spad_rows || acc_rows > max_acc_rows) { - int max_val = -1; - int max_idx = -1; - - for (size_t i = 0; i < sizeof(args) / sizeof(args[0]); i++) { - // We avoid reducing ocols when possible to keep the spatial array fully - // utilized - if (!(i == ocols_idx && args[i] <= DIM && args[orows_idx] > 1) && - args[i] > max_val) { - max_val = args[i]; - max_idx = i; - } - } - - if (max_idx == out_channels_idx || max_idx == in_channels_idx) { - // For input and output channels, there's no point in subtracting by just - // one - if (args[max_idx] % DIM != 0) { - args[max_idx] = (args[max_idx] / DIM) * DIM; - } else { - args[max_idx] -= DIM; - } - args[max_idx] = args[max_idx] == 0 ? 
1 : args[max_idx]; - } else { - args[max_idx]--; - } - - spad_rows = tiled_conv_total_spad_rows( - false, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, args[0], args[1], args[2], args[3], - args[4], args[5], args[6], pool_size, pool_stride); - acc_rows = tiled_conv_total_spad_rows( - true, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, args[0], args[1], args[2], args[3], - args[4], args[5], args[6], pool_size, pool_stride); - } - - // Check if we can increase ocols - bool not_increased = false; - while (!not_increased) { - not_increased = true; - - int args_candidate[] = {args[0], args[1], args[2], args[3], - args[4], args[5], args[6]}; - args_candidate[ocols_idx]++; - - if (args_candidate[ocols_idx] > max_args[ocols_idx]) - continue; - - spad_rows = tiled_conv_total_spad_rows( - false, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, args_candidate[0], - args_candidate[1], args_candidate[2], args_candidate[3], - args_candidate[4], args_candidate[5], args_candidate[6], pool_size, - pool_stride); - acc_rows = tiled_conv_total_spad_rows( - true, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, args_candidate[0], - args_candidate[1], args_candidate[2], args_candidate[3], - args_candidate[4], args_candidate[5], args_candidate[6], pool_size, - pool_stride); - - if (spad_rows <= max_spad_rows && acc_rows <= max_acc_rows) { - args[ocols_idx] = args_candidate[ocols_idx]; - not_increased = false; - } - } - - // Check if there are any parameters that we can currently still increase - bool nothing_increased = false; - while (!nothing_increased) { - nothing_increased = true; - - for (size_t i = 0; i < sizeof(args) / sizeof(args[0]); i++) { - int args_candidate[] = {args[0], args[1], args[2], args[3], - args[4], args[5], args[6]}; - args_candidate[i]++; - - if (args_candidate[i] > max_args[i]) - 
continue; - - spad_rows = tiled_conv_total_spad_rows( - false, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, args_candidate[0], - args_candidate[1], args_candidate[2], args_candidate[3], - args_candidate[4], args_candidate[5], args_candidate[6], pool_size, - pool_stride); - acc_rows = tiled_conv_total_spad_rows( - true, stride, input_dilation, kernel_dilation, downsample, - trans_weight_0132, trans_input_3120, args_candidate[0], - args_candidate[1], args_candidate[2], args_candidate[3], - args_candidate[4], args_candidate[5], args_candidate[6], pool_size, - pool_stride); - - if (spad_rows <= max_spad_rows && acc_rows <= max_acc_rows) { - args[i] = args_candidate[i]; - nothing_increased = false; - } - } - } - - const int batches = args[0]; - const int orows = args[1]; - const int ocols = args[2]; - const int ochs = args[3]; - const int krows = args[4]; - const int kcols = args[5]; - const int kchs = args[6]; - - /* - spad_rows = tiled_conv_total_spad_rows(false, - stride, input_dilation, kernel_dilation, downsample, trans_weight_0132, - trans_input_3120, args[0], args[1], args[2], args[3], args[4], args[5], - args[6], pool_size, pool_stride); acc_rows = tiled_conv_total_spad_rows(true, - stride, input_dilation, kernel_dilation, downsample, trans_weight_0132, - trans_input_3120, args[0], args[1], args[2], args[3], args[4], args[5], - args[6], pool_size, pool_stride); - */ - -#ifdef PRINT_TILE -#if PRINT_TILE - printf("batches = %d\n", batches); - printf("orows = %d\n", orows); - printf("ocols = %d\n", ocols); - printf("ochs = %d\n", ochs); - printf("krows = %d\n", krows); - printf("kcols = %d\n", kcols); - printf("kchs = %d\n\n", kchs); - - printf("total spad_rows reserved: %d\n", spad_rows); - printf("total acc_rows reserved: %d\n\n", acc_rows); - - printf("scratchpad row utilization: %d%%\n", - (spad_rows * 100) / max_spad_rows); - printf("accumulator row utilization: %d%%\n\n", - (acc_rows * 100) / max_acc_rows); - - 
printf("inner matmul size: i=%d, j=%d, k=%d\n\n", ocols, ochs, kchs); -#endif -#endif - - tiled_conv(batch_size, in_row_dim, in_col_dim, in_channels, out_channels, - out_row_dim, out_col_dim, stride, input_dilation, kernel_dilation, - padding, kernel_dim, in_stride, weight_stride, out_stride, wrot180, - trans_output_1203, trans_input_3120, trans_weight_1203, - trans_weight_0132, - - batches, orows, ocols, ochs, krows, kcols, kchs, - - input, weights, bias, output, - - act, scale, pool_size, no_pool ? 0 : pool_stride, pool_padding, - - tiled_conv_type); -} - -static void tiled_conv_auto(int batch_size, int in_row_dim, int in_col_dim, - int in_channels, int out_channels, int out_row_dim, - int out_col_dim, int stride, int input_dilation, - int kernel_dilation, int padding, int kernel_dim, - bool wrot180, bool trans_output_1203, - bool trans_input_3120, bool trans_weight_1203, - bool trans_weight_0132, - - const elem_t *input, const elem_t *weights, - const acc_t *bias, elem_t *output, - - int act, acc_scale_t scale, int pool_size, - int pool_stride, int pool_padding, - - enum tiled_matmul_type_t tiled_conv_type) { - - int in_stride = in_channels; - int out_stride = out_channels; - int weight_stride = out_channels; - tiled_conv_stride_auto( - batch_size, in_row_dim, in_col_dim, in_channels, out_channels, - out_row_dim, out_col_dim, stride, input_dilation, kernel_dilation, - padding, kernel_dim, in_stride, weight_stride, out_stride, wrot180, - trans_output_1203, trans_input_3120, trans_weight_1203, trans_weight_0132, - - input, weights, bias, output, - - act, scale, pool_size, pool_stride, pool_padding, tiled_conv_type); -} - -// This function is for a convolution with kernel_dim=1, stride==2, padding=0, -// and no pooling -static void tiled_conv_downsample(int batch_size, int in_row_dim, - int in_col_dim, int in_channels, - int out_channels, int out_row_dim, - int out_col_dim, int in_stride, - int weight_stride, int out_stride, - - const elem_t *input, const elem_t 
*weights, - const acc_t *bias, elem_t *output, - - int act, acc_scale_t scale, - - enum tiled_matmul_type_t tiled_conv_type) { - - // Rectangular dimensions for this function are currently not supported - if (in_row_dim != in_col_dim || out_row_dim != out_col_dim) { - printf("Rectangular convolutions for tiled_conv_downsample are currently " - "not supported.\n"); - exit(1); - } - - const int in_dim = in_row_dim; - const int out_dim = out_row_dim; - - const int stride = 2; - - for (int b = 0; b < batch_size; b++) { - for (int irow = 0; irow < in_row_dim; irow += stride) { - const int orow = irow / stride; - - const int I = in_col_dim / stride; // number of columns in row - const int J = out_channels; - const int K = in_channels; - - const elem_t *A = input + (b * in_dim + irow) * in_dim * in_stride; - const elem_t *B = weights; - const acc_t *D = bias; - elem_t *C = output + (b * out_dim + orow) * out_dim * out_stride; - - const int A_stride = in_stride * 2; - const int B_stride = weight_stride; - const int D_stride = out_stride; - const int C_stride = out_stride; - - tiled_matmul_auto(I, J, K, A, B, (void *)D, (void *)C, A_stride, B_stride, - D_stride, C_stride, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, act, scale, 0, - true, false, false, false, false, 0, tiled_conv_type); - } - } -} - -// for mobilenet's depthwise convs -static void tiled_conv_dw_auto(int batch_size, int in_row_dim, int in_col_dim, - int channels, int out_row_dim, int out_col_dim, - int stride, int padding, int kernel_dim, - - elem_t *input, elem_t *weights, acc_t *bias, - elem_t *output, - - int act, acc_scale_t scale, int pool_size, - int pool_stride, int pool_padding, - - enum tiled_matmul_type_t tiled_conv_type) { - - const bool no_pool = pool_stride == 0; - if (no_pool) { - pool_size = 1; - pool_stride = 1; - pool_padding = 0; - } - - const int pool_out_row_dim = - (out_row_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - const int pool_out_col_dim = - 
(out_col_dim + 2 * pool_padding - pool_size) / pool_stride + 1; - - // Tile convolution params - - // int args[] = {batch_size, porows, pocols, pochs, krows, kcols, kchs}; - int args[] = {batch_size, pool_out_row_dim, pool_out_col_dim, - 1, kernel_dim, kernel_dim, - 1}; - const int max_args[] = {batch_size, pool_out_row_dim, pool_out_col_dim, - 1, kernel_dim, kernel_dim, - 1}; - - const int orows_idx = 1; - const int ocols_idx = 2; - const int out_channels_idx = 3; - - // We divide by 2 for the sake of double-buffering - const int max_spad_rows = (BANK_NUM * BANK_ROWS / 2); - const int max_acc_rows = (ACC_ROWS / 2); - - int spad_rows = tiled_conv_total_spad_rows( - false, stride, 1, 1, false, false, false, args[0], args[1], args[2], - args[3], args[4], args[5], args[6], pool_size, pool_stride); - int acc_rows = tiled_conv_total_spad_rows( - true, stride, 1, 1, false, false, false, args[0], args[1], args[2], - args[3], args[4], args[5], args[6], pool_size, pool_stride); - - while (spad_rows > max_spad_rows || acc_rows > max_acc_rows) { - int max_val = -1; - int max_idx = -1; - - for (size_t i = 0; i < sizeof(args) / sizeof(args[0]); i++) { - // We avoid reducing ocols when possible to keep the spatial array fully - // utilized - if (!(i == ocols_idx && args[i] <= DIM && args[orows_idx] > 1) && - args[i] > max_val) { - max_val = args[i]; - max_idx = i; - } - } - - if (max_idx == out_channels_idx) { - // For input and output channels, there's no point in subtracting by just - // one - if (args[max_idx] % DIM != 0) { - args[max_idx] = (args[max_idx] / DIM) * DIM; - } else { - args[max_idx] -= DIM; - } - args[max_idx] = args[max_idx] == 0 ? 
1 : args[max_idx]; - } else { - args[max_idx]--; - } - - spad_rows = tiled_conv_total_spad_rows( - false, stride, 1, 1, false, false, false, args[0], args[1], args[2], - args[3], args[4], args[5], args[6], pool_size, pool_stride); - acc_rows = tiled_conv_total_spad_rows( - true, stride, 1, 1, false, false, false, args[0], args[1], args[2], - args[3], args[4], args[5], args[6], pool_size, pool_stride); - } - - // Check if we can increase ocols - bool not_increased = false; - while (!not_increased) { - not_increased = true; - - int args_candidate[] = {args[0], args[1], args[2], args[3], - args[4], args[5], args[6]}; - args_candidate[ocols_idx]++; - - if (args_candidate[ocols_idx] > max_args[ocols_idx]) - continue; - - spad_rows = tiled_conv_total_spad_rows( - false, stride, 1, 1, false, false, false, args_candidate[0], - args_candidate[1], args_candidate[2], args_candidate[3], - args_candidate[4], args_candidate[5], args_candidate[6], pool_size, - pool_stride); - acc_rows = tiled_conv_total_spad_rows( - true, stride, 1, 1, false, false, false, args_candidate[0], - args_candidate[1], args_candidate[2], args_candidate[3], - args_candidate[4], args_candidate[5], args_candidate[6], pool_size, - pool_stride); - - if (spad_rows <= max_spad_rows && acc_rows <= max_acc_rows) { - args[ocols_idx] = args_candidate[ocols_idx]; - not_increased = false; - } - } - - // Check if there are any parameters that we can currently still increase - bool nothing_increased = false; - while (!nothing_increased) { - nothing_increased = true; - - for (size_t i = 0; i < sizeof(args) / sizeof(args[0]); i++) { - int args_candidate[] = {args[0], args[1], args[2], args[3], - args[4], args[5], args[6]}; - args_candidate[i]++; - - if (args_candidate[i] > max_args[i]) - continue; - - spad_rows = tiled_conv_total_spad_rows( - false, stride, 1, 1, false, false, false, args_candidate[0], - args_candidate[1], args_candidate[2], args_candidate[3], - args_candidate[4], args_candidate[5], args_candidate[6], 
pool_size, - pool_stride); - acc_rows = tiled_conv_total_spad_rows( - true, stride, 1, 1, false, false, false, args_candidate[0], - args_candidate[1], args_candidate[2], args_candidate[3], - args_candidate[4], args_candidate[5], args_candidate[6], pool_size, - pool_stride); - - if (spad_rows <= max_spad_rows && acc_rows <= max_acc_rows) { - args[i] = args_candidate[i]; - nothing_increased = false; - } - } - } - - const int batches = args[0]; - const int orows = args[1]; - const int ocols = args[2]; - const int ochs = 1; // args[3]; - const int krows = args[4]; - const int kcols = args[5]; - const int kchs = 1; // args[6]; - - /* - spad_rows = tiled_conv_total_spad_rows(false, - stride, 1, 1, false, false, false, - args[0], args[1], args[2], args[3], args[4], args[5], args[6], pool_size, - pool_stride); acc_rows = tiled_conv_total_spad_rows(true, stride, 1, 1, false, - false, false, args[0], args[1], args[2], args[3], args[4], args[5], args[6], - pool_size, pool_stride); - - printf("batches = %d\n", batches); - printf("orows = %d\n", orows); - printf("ocols = %d\n", ocols); - printf("ochs = %d\n", ochs); - printf("krows = %d\n", krows); - printf("kcols = %d\n", kcols); - printf("kchs = %d\n\n", kchs); - - printf("total spad_rows reserved: %d\n", spad_rows); - printf("total acc_rows reserved: %d\n\n", acc_rows); - - printf("scratchpad row utilization: %d%%\n", (spad_rows*100) / max_spad_rows); - printf("accumulator row utilization: %d%%\n\n", (acc_rows*100) / - max_acc_rows); - - printf("inner matmul size: i=%d, j=%d, k=%d\n\n", ocols, ochs, kchs); - */ - - tiled_conv_dw(batch_size, in_row_dim, in_col_dim, channels, out_row_dim, - out_col_dim, stride, padding, kernel_dim, - - batches, orows, ocols, krows, kcols, - - input, weights, bias, output, - - act, scale, pool_size, no_pool ? 
0 : pool_stride, pool_padding, - - tiled_conv_type); -} - -static void resadd_cpu(const size_t I, const size_t J, const size_t stride, - const scale_t A_scale, const scale_t B_scale, - const acc_scale_t C_scale, const elem_t *A, - const elem_t *B, elem_t *C, bool relu) { - - const int minimum = relu ? 0 : elem_t_min; - - for (size_t i = 0; i < I; i++) { - for (size_t j = 0; j < J; j++) { - const elem_t *a = A + i * stride + j; - const elem_t *b = B + i * stride + j; - elem_t *c = C + i * stride + j; - - acc_t result = MVIN_SCALE(*a, A_scale) + MVIN_SCALE(*b, B_scale); - result = ACC_SCALE(result, C_scale); - result = result > elem_t_max ? elem_t_max - : (result < minimum ? minimum : result); - - *c = result; - } - } -} - -static void sp_tiled_resadd(const size_t I, const size_t J, - const scale_t A_scale, const scale_t B_scale, - const elem_t *A, const elem_t *B, elem_t *C, - size_t A_row_stride, size_t B_row_stride, - size_t C_row_stride, bool relu) { - - int pad_I = ((I % DIM) == 0) ? 0 : DIM - (I % DIM); - int pad_J = ((J % DIM) == 0) ? 0 : DIM - (J % DIM); - int tile_I = (I % DIM == 0) ? (int)(I / DIM) : (int)(I / DIM) + 1; - int tile_J = (J % DIM == 0) ? 
(int)(J / DIM) : (int)(J / DIM) + 1; - // printf("pad I: %d, pad_J: %d, tile_I: %d, tile_J: %d\n", pad_I, pad_J, - // tile_I, tile_J); - gemmini_loop_ws(tile_I, tile_J, 0, pad_I, pad_J, 0, A, B, NULL, C, - A_row_stride, B_row_stride, 0, C_row_stride, false, false, - false, false, false, relu, 0, 0, true); - /* - // Use the new mvin2 command to overlap mvin A, mvin B, and mvout C - - size_t blocks = (J/DIM + (J % DIM != 0)); - if (blocks > MAX_BLOCK_LEN) blocks = MAX_BLOCK_LEN; - - const uint32_t D_sp_addr_start = 1 << (ADDR_LEN-1); - const uint32_t C_sp_addr_start = 3 << (ADDR_LEN-2); - - const size_t rounded_up_J = (J / DIM + (J % DIM != 0)) * DIM; - - // Mvin A - // printf("Mving A\n"); - for (size_t i = 0; i < I; i += DIM) { - for (size_t j = 0; j < J; j += blocks * DIM) { - const size_t cols = j + blocks*DIM <= J ? blocks*DIM : J-j; - const size_t rows = i + DIM <= I ? DIM : I-i; - - const elem_t * const A_dram_addr = A + i * A_row_stride + j; - const uint32_t A_sp_addr = D_sp_addr_start + i * (rounded_up_J/DIM) + - j; - - gemmini_extended_mvin(A_dram_addr, A_sp_addr, cols, rows); - } - } - - // Mvin B - printf("Mving B\n"); - for (size_t i = 0; i < I; i += DIM) { - for (size_t j = 0; j < J; j += blocks * DIM) { - const size_t cols = j + blocks*DIM <= J ? blocks*DIM : J-j; - const size_t rows = i + DIM <= I ? DIM : I-i; - - const elem_t * const B_dram_addr = B + i * B_row_stride + j; - const uint32_t B_sp_addr = C_sp_addr_start + i * (rounded_up_J/DIM) + - j; gemmini_extended_mvin2(B_dram_addr, B_sp_addr, cols, rows); - } - } - - // Mvout C from accumulator - // printf("Mvout C from accumulator\n"); - for (size_t i = 0; i < I; i += DIM) { - for (size_t j = 0; j < J; j += blocks * DIM) { - const size_t cols = j + blocks*DIM <= J ? blocks*DIM : J-j; - const size_t rows = i + DIM <= I ? 
DIM : I-i; - - elem_t * const C_dram_addr = C + i * C_row_stride + j; - const uint32_t C_sp_addr = D_sp_addr_start + i * (rounded_up_J/DIM) + - j; gemmini_extended_mvout(C_dram_addr, C_sp_addr, cols, rows); - } - } - */ -} - -// Compute MVIN_SCALE(A, A_scale) + MVIN_SCALE(B, B_scale) = C -static void tiled_resadd(const size_t I, const size_t J, const size_t stride, - const size_t tile_I, const size_t tile_J, - const scale_t A_scale, const scale_t B_scale, - const acc_scale_t C_scale, const elem_t *A, - const elem_t *B, elem_t *C, bool relu, - enum tiled_matmul_type_t matadd_type) { - - gemmini_extended_config_st(stride * sizeof(elem_t), - relu ? RELU : NO_ACTIVATION, C_scale); - gemmini_config_ex(WS, 0, 0); - - gemmini_extended4_config_ld(stride * sizeof(elem_t), A_scale, true, DIM, 0); - gemmini_extended4_config_ld(stride * sizeof(elem_t), B_scale, true, DIM, 1); - - for (size_t i = 0; i < I; i += tile_I) { - for (size_t j = 0; j < J; j += tile_J) { - const size_t I_tile = i + tile_I <= I ? tile_I : I - i; - const size_t J_tile = j + tile_J <= J ? 
tile_J : J - j; - - const elem_t *a = A + i * stride + j; - const elem_t *b = B + i * stride + j; - elem_t *c = C + i * stride + j; - - sp_tiled_resadd(I_tile, J_tile, A_scale, B_scale, a, b, c, stride, stride, - stride, relu); - } - } - - gemmini_fence(); -} - -// Compute (A >> A_shift) + B = C -// specify stride -static void tiled_resadd_stride_auto(const size_t I, const size_t J, - const scale_t A_scale, - const scale_t B_scale, - const acc_scale_t C_scale, - const size_t stride, const elem_t *A, - const elem_t *B, elem_t *C, bool relu, - enum tiled_matmul_type_t matadd_type) { - - if (matadd_type == CPU) { - resadd_cpu(I, J, stride, A_scale, B_scale, C_scale, A, B, C, relu); - return; - } - - size_t tile_I = I, tile_J = J; - - // size_t total_spad_rows = 2 * (tile_I / DIM + (tile_I % DIM != 0))*DIM * - // (tile_J / DIM + (tile_J % DIM != 0)); - size_t total_acc_rows = (tile_I / DIM + (tile_I % DIM != 0)) * DIM * - (tile_J / DIM + (tile_J % DIM != 0)); - - // TODO this is a very inefficient way of doing this... 
- while (total_acc_rows > ACC_ROWS / 2) { - // if(tile_J > MAX_BLOCK_LEN * DIM) - // tile_J = MAX_BLOCK_LEN * DIM; - // else - if (tile_I >= tile_J || tile_J <= DIM) - tile_I /= 2; - else - tile_J -= DIM; - - total_acc_rows = (tile_I / DIM + (tile_I % DIM != 0)) * DIM * - (tile_J / DIM + (tile_J % DIM != 0)); - } - - // printf("tile_I: %llu\n", tile_I); - // printf("tile_J: %llu\n", tile_J); - - if (matadd_type == WS) { - tiled_resadd(I, J, stride, tile_I, tile_J, A_scale, B_scale, C_scale, A, B, - C, relu, matadd_type); - } else { - printf("Unsupported type\n"); - exit(1); - } -} - -static void tiled_resadd_auto(const size_t I, const size_t J, - const scale_t A_scale, const scale_t B_scale, - const acc_scale_t C_scale, const elem_t *A, - const elem_t *B, elem_t *C, bool relu, - enum tiled_matmul_type_t matadd_type) { - tiled_resadd_stride_auto(I, J, A_scale, B_scale, C_scale, J, A, B, C, relu, - matadd_type); -} - -static void global_average_cpu(const elem_t *input, elem_t *output, int batches, - int channels, int dim) { - const int count = dim * dim; - - for (int batch = 0; batch < batches; batch++) { - for (int channel = 0; channel < channels; channel++) { - acc_t sum = 0; - for (int row = 0; row < dim; row++) { - for (int col = 0; col < dim; col++) { - size_t pixel = batch * dim * dim + row * dim + col; - - sum += input[pixel * channels + channel]; - } - } - -#ifdef ELEM_T_IS_FLOAT - output[batch * channels + channel] = sum / count; -#else - output[batch * channels + channel] = (sum + count / 2) / count; -#endif - } - } -} - -static void sp_tiled_global_average(const elem_t *input, elem_t *output, - int batches, int channels, int dim, - int channel_tile_size) { - const uint32_t C_acc_addr_start = ((uint32_t)1 << 31); - - size_t blocks = channel_tile_size / DIM + (channel_tile_size % DIM != 0); - if (blocks > MAX_BLOCK_LEN) - blocks = MAX_BLOCK_LEN; - - for (int channel = 0; channel < channel_tile_size; channel += blocks * DIM) { - for (int row = 0; row < dim; 
row++) { - for (int col = 0; col < dim; col++) { - const elem_t *in = input + (row * dim + col) * channels + channel; - - const uint32_t acc_addr_start = - C_acc_addr_start | ((row != 0 || col != 0) << 30); - - const uint32_t acc_addr = acc_addr_start + channel / DIM; - - const size_t cols = channel + blocks * DIM <= channel_tile_size - ? blocks * DIM - : channel_tile_size - channel; - - const size_t rows = 1; - - gemmini_extended_mvin(in, acc_addr, cols, rows); - } - } - } - - for (int channel = 0; channel < channel_tile_size; channel += DIM) { - elem_t *out = output + channel; - - const uint32_t acc_addr = C_acc_addr_start + channel / DIM; - - const size_t cols = - channel + DIM <= channel_tile_size ? DIM : channel_tile_size - channel; - - const size_t rows = - 1; // TODO we should move out more than just one row here - - gemmini_extended_mvout(out, acc_addr, cols, rows); - } -} - -static void tiled_global_average(const elem_t *input, elem_t *output, - int batches, int channels, int dim, - int channel_tile_size) { - - gemmini_extended4_config_ld(DIM * sizeof(elem_t), MVIN_SCALE_IDENTITY, true, - 1, 0); - gemmini_config_ex(0, NO_ACTIVATION, 0); - gemmini_extended_config_st(0, NO_ACTIVATION, 1.0 / (dim * dim)); - - for (int batch = 0; batch < batches; batch++) { - for (int channel = 0; channel < channels; channel += channel_tile_size) { - const int tile_size = channel + channel_tile_size <= channels - ? 
channel_tile_size - : channels - channel; - - sp_tiled_global_average(input + batch * dim * dim * channels + channel, - output + batch * channels + channel, batches, - channels, dim, tile_size); - } - } -} - -static void tiled_global_average_auto(const elem_t *input, elem_t *output, - int batches, int channels, int dim, - enum tiled_matmul_type_t type) { - if (type == CPU) { - return global_average_cpu(input, output, batches, channels, dim); - } - - int channel_tile_size = channels; - - int acc_rows = channel_tile_size / DIM + (channel_tile_size % DIM != 0); - while (acc_rows > ACC_ROWS) { - channel_tile_size--; - acc_rows = channel_tile_size / DIM + (channel_tile_size % DIM != 0); - } - - tiled_global_average(input, output, batches, channels, dim, - channel_tile_size); -} - -static void sp_tiled_norm(const size_t I, const size_t J, const acc_t *in, - elem_t *out, size_t A_row_stride, size_t C_row_stride, - int act) { -#ifdef HAS_NORMALIZATIONS - size_t A_blocks = (J / DIM + (J % DIM != 0)); - if (A_blocks > MAX_BLOCK_LEN_ACC) - A_blocks = MAX_BLOCK_LEN_ACC; - size_t C_blocks = (J / DIM + (J % DIM != 0)); - if (C_blocks > MAX_BLOCK_LEN) - C_blocks = MAX_BLOCK_LEN; - - const uint32_t D_sp_addr_start = 1 << (ADDR_LEN - 1); - const uint32_t C_sp_addr_start = 3 << (ADDR_LEN - 2); - - const size_t rounded_up_J = (J / DIM + (J % DIM != 0)) * DIM; - - for (size_t i = 0; i < I; i += DIM) { - // Mvin - for (size_t j = 0; j < J; j += A_blocks * DIM) { - const size_t cols = j + A_blocks * DIM <= J ? A_blocks * DIM : J - j; - const size_t rows = i + DIM <= I ? DIM : I - i; - - const acc_t *const A_dram_addr = in + i * A_row_stride + j; - const uint32_t A_sp_addr = D_sp_addr_start + i * (rounded_up_J / DIM) + j; - - gemmini_extended_mvin(A_dram_addr, A_sp_addr, cols, rows); - } - - // Mvout - if (act == LAYERNORM) { - uint32_t norm_cmds[][2] = {{1, 2}, {3, 4}, {0, 0}}; - const int norm_cmds_size = sizeof(norm_cmds) / sizeof(norm_cmds[0]); - const size_t rows = I - i < DIM ? 
I - i : DIM; - for (size_t row = 0; row < rows; row += NORM_STAT_IDS) { - const size_t stat_ids = - rows - row > NORM_STAT_IDS ? NORM_STAT_IDS : rows - row; - for (int cmd = 0; cmd < norm_cmds_size; cmd++) { - for (size_t stat_id = 0; stat_id < stat_ids; stat_id++) { - gemmini_config_norm(0, 0, 0, 0, stat_id, 0, 0); - const size_t r = row + stat_id; - for (size_t jj = 0; jj < J; jj += C_blocks * DIM) { - uint32_t norm_C_sp_addr = - C_sp_addr_start + i * (rounded_up_J / DIM) + jj + r; - if (jj + C_blocks * DIM >= J) { - norm_C_sp_addr |= (norm_cmds[cmd][1] - << 26); // Final mean/inv-std-dev calculation - } else { - norm_C_sp_addr |= - (norm_cmds[cmd][0] << 26); // Accumulate sum/variance - } - void *const C_dram_addr = - (int8_t *)out + (i * C_row_stride + jj) * sizeof(elem_t) + - r * C_row_stride * sizeof(elem_t); - const size_t cols = - J - jj < C_blocks * DIM ? J - jj : C_blocks * DIM; - gemmini_extended_mvout(C_dram_addr, norm_C_sp_addr, cols, 1); - } - } - } - } - } else if (act == SOFTMAX) { - uint32_t norm_cmds[][2] = {{5, 5}, {6, 7}, {0, 0}}; - const int norm_cmds_size = sizeof(norm_cmds) / sizeof(norm_cmds[0]); - const size_t rows = I - i < DIM ? I - i : DIM; - for (size_t row = 0; row < rows; row += NORM_STAT_IDS) { - const size_t stat_ids = - rows - row > NORM_STAT_IDS ? 
NORM_STAT_IDS : rows - row; - for (int cmd = 0; cmd < norm_cmds_size; cmd++) { - for (size_t stat_id = 0; stat_id < stat_ids; stat_id++) { - // set stat id only - gemmini_config_norm(0, 0, 1, 0, stat_id, 0, 0); - const size_t r = row + stat_id; - for (size_t jj = 0; jj < J; jj += C_blocks * DIM) { - uint32_t norm_C_sp_addr = - C_sp_addr_start + i * (rounded_up_J / DIM) + jj + r; - if (jj + C_blocks * DIM >= J) { - norm_C_sp_addr |= (norm_cmds[cmd][1] - << 26); // Final mean/inv-std-dev calculation - } else { - norm_C_sp_addr |= - (norm_cmds[cmd][0] << 26); // Accumulate sum/variance - } - void *const C_dram_addr = - (int8_t *)out + (i * C_row_stride + jj) * sizeof(elem_t) + - r * C_row_stride * sizeof(elem_t); - const size_t cols = - J - jj < C_blocks * DIM ? J - jj : C_blocks * DIM; - gemmini_extended_mvout(C_dram_addr, norm_C_sp_addr, cols, 1); - } - } - } - } - } - } -#else - printf("Normalizations not supported in this Gemmini config\n"); - exit(1); -#endif -} - -static void tiled_norm(const size_t I, const size_t J, const size_t tile_I, - const size_t tile_J, const acc_t *in, elem_t *out, - const acc_scale_t C_scale, int act, - enum tiled_matmul_type_t norm_type) { - - gemmini_extended_config_st(J * sizeof(elem_t), act & 3, C_scale); - gemmini_config_ex(WS, 0, 0); // TODO is this actually required? 
- - gemmini_extended4_config_ld(J * sizeof(acc_t), MVIN_SCALE_IDENTITY, false, - DIM, 0); - gemmini_extended4_config_ld(J * sizeof(acc_t), MVIN_SCALE_IDENTITY, false, - DIM, 1); - - if (act == SOFTMAX) { - const scale_t a = 0.3585; - const scale_t b = 1.353; - const scale_t c = 0.344; - - // TODO let bert-scale be set by the programmer - acc_scale_t bert_scale = 0.05; - const acc_t qln2 = (int)(0.693147 / bert_scale); - const acc_t qln2_inv = 65536 / qln2; - const acc_t qb = b / bert_scale; - const acc_t qc = c / (a * bert_scale * bert_scale); - - gemmini_config_norm(qln2, 0, 0, 1, 0, qb, qc); - gemmini_config_norm(qln2_inv, 1, 0, 1, 0, qb, qc); - } - - for (size_t i = 0; i < I; i += tile_I) { - for (size_t j = 0; j < J; j += tile_J) { - const size_t I_tile = i + tile_I <= I ? tile_I : I - i; - const size_t J_tile = j + tile_J <= J ? tile_J : J - j; - - const acc_t *in_ = in + i * J + j; - elem_t *out_ = out + i * J + j; - - sp_tiled_norm(I_tile, J_tile, in_, out_, J, J, act); - } - } - - gemmini_fence(); -} - -static void tiled_norm_auto(const size_t I, const size_t J, const acc_t *in, - elem_t *out, const acc_scale_t C_scale, int act, - enum tiled_matmul_type_t norm_type) { - - size_t tile_I = I, tile_J = J; - size_t total_acc_rows = (tile_I / DIM + (tile_I % DIM != 0)) * DIM * - (tile_J / DIM + (tile_J % DIM != 0)); - - while (total_acc_rows > ACC_ROWS) { - if (tile_I > 1) { - tile_I--; - } else { - // TODO we should be able to tile over J as well to avoid this issue - printf("Can't fit pre-normalized tensor into accumulator"); - exit(1); - } - - total_acc_rows = (tile_I / DIM + (tile_I % DIM != 0)) * DIM * - (tile_J / DIM + (tile_J % DIM != 0)); - } - - if (norm_type) { - tiled_norm(I, J, tile_I, tile_J, in, out, C_scale, act, norm_type); - } else { - printf("Unsupported type\n"); - exit(1); - } -} - -#undef abs - -#endif // SRC_MAIN_C_GEMMINI_H diff --git a/bb-tests/workloads/src/CTest/gemmini/include/gemmini_counter.h 
b/bb-tests/workloads/src/CTest/gemmini/include/gemmini_counter.h deleted file mode 100644 index 6050ed74..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/include/gemmini_counter.h +++ /dev/null @@ -1,79 +0,0 @@ -// See LICENSE for license details. - -#ifndef COUNTER_H_ -#define COUNTER_H_ - -#define DISABLE 0 - -#define INCREMENTAL_COUNTERS 44 - -// All existing Gemmini performance counters - -#define MAIN_LD_CYCLES 1 -#define MAIN_ST_CYCLES 2 -#define MAIN_EX_CYCLES 3 -#define MAIN_LD_ST_CYCLES 4 -#define MAIN_LD_EX_CYCLES 5 -#define MAIN_ST_EX_CYCLES 6 -#define MAIN_LD_ST_EX_CYCLES 7 - -#define LOAD_DMA_WAIT_CYCLE 8 -#define LOAD_ACTIVE_CYCLE 9 -#define LOAD_SCRATCHPAD_WAIT_CYCLE 10 - -#define STORE_DMA_WAIT_CYCLE 11 -#define STORE_ACTIVE_CYCLE 12 -#define STORE_POOLING_CYCLE 13 -#define STORE_SCRATCHPAD_WAIT_CYCLE 14 - -#define DMA_TLB_MISS_CYCLE 15 -#define DMA_TLB_HIT_REQ 16 -#define DMA_TLB_TOTAL_REQ 17 - -#define RDMA_ACTIVE_CYCLE 18 -#define RDMA_TLB_WAIT_CYCLES 19 -#define RDMA_TL_WAIT_CYCLES 20 - -#define WDMA_ACTIVE_CYCLE 21 -#define WDMA_TLB_WAIT_CYCLES 22 -#define WDMA_TL_WAIT_CYCLES 23 - -#define EXE_ACTIVE_CYCLE 24 -#define EXE_FLUSH_CYCLE 25 -#define EXE_CONTROL_Q_BLOCK_CYCLE 26 -#define EXE_PRELOAD_HAZ_CYCLE 27 -#define EXE_OVERLAP_HAZ_CYCLE 28 - -#define SCRATCHPAD_A_WAIT_CYCLE 29 -#define SCRATCHPAD_B_WAIT_CYCLE 30 -#define SCRATCHPAD_D_WAIT_CYCLE 31 - -#define ACC_A_WAIT_CYCLE 32 -#define ACC_B_WAIT_CYCLE 33 -#define ACC_D_WAIT_CYCLE 34 - -#define A_GARBAGE_CYCLES 35 -#define B_GARBAGE_CYCLES 36 -#define D_GARBAGE_CYCLES 37 - -#define IM2COL_MEM_CYCLES 38 -#define IM2COL_ACTIVE_CYCLES 39 -#define IM2COL_TRANSPOSER_WAIT_CYCLE 40 - -#define RESERVATION_STATION_FULL_CYCLES 41 -#define RESERVATION_STATION_ACTIVE_CYCLES 42 - -#define LOOP_MATMUL_ACTIVE_CYCLES 43 -#define TRANSPOSE_PRELOAD_UNROLLER_ACTIVE_CYCLES 44 - -#define RESERVATION_STATION_LD_COUNT (INCREMENTAL_COUNTERS + 1) -#define RESERVATION_STATION_ST_COUNT (INCREMENTAL_COUNTERS + 2) 
-#define RESERVATION_STATION_EX_COUNT (INCREMENTAL_COUNTERS + 3) - -#define RDMA_BYTES_REC (INCREMENTAL_COUNTERS + 4) -#define WDMA_BYTES_SENT (INCREMENTAL_COUNTERS + 5) - -#define RDMA_TOTAL_LATENCY (INCREMENTAL_COUNTERS + 6) -#define WDMA_TOTAL_LATENCY (INCREMENTAL_COUNTERS + 7) - -#endif diff --git a/bb-tests/workloads/src/CTest/gemmini/include/gemmini_nn.h b/bb-tests/workloads/src/CTest/gemmini/include/gemmini_nn.h deleted file mode 100644 index 16b3ebbd..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/include/gemmini_nn.h +++ /dev/null @@ -1,584 +0,0 @@ -#ifndef GEMMINI_NN_H -#define GEMMINI_NN_H - -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini.h" -#include "include/gemmini_testutils.h" - -struct ConvParams { - int batch_size; - int in_row_dim; - int in_col_dim; - int out_row_dim; - int out_col_dim; - int kernel_size; - int in_channels; - int out_channels; - int in_stride; - int weight_stride; - int out_stride; - int stride; - int padding; - bool bias; - bool depthwise; - int n_patches; - int patch_size; - acc_scale_t output_scale; - scale_t res_scale; - int pool_size, pool_stride, pool_padding, out_dim_pooled; - - int I, J, K; -}; - -struct FcParams { - int batch_size; - int in_features; - int out_features; - acc_scale_t output_scale; - bool bias; - - int I, J, K; -}; - -#define HIST_IMAGES(IMAGES) \ - for (int num = -128; num <= 127; num++) { \ - int count = 0; \ - for (int i = 0; i < sizeof(IMAGES) / sizeof(IMAGES[0]); i++) { \ - for (int j = 0; j < sizeof(IMAGES[0]) / sizeof(IMAGES[0][0]); j++) { \ - for (int k = 0; k < sizeof(IMAGES[0][0]) / sizeof(IMAGES[0][0][0]); \ - k++) { \ - for (int l = 0; \ - l < sizeof(IMAGES[0][0][0]) / sizeof(IMAGES[0][0][0][0]); \ - l++) { \ - if (IMAGES[i][j][k][l] == num) { \ - count++; \ - } \ - } \ - } \ - } \ - } \ - if (count > 0) \ - printf("%d: %d times\n", num, count); \ - } - -#define HIST_MATRIX(MATRIX) \ - for (int num = -128; num <= 127; num++) { \ - int count = 
0; \ - for (int i = 0; i < sizeof(MATRIX) / sizeof(MATRIX[0]); i++) { \ - for (int j = 0; j < sizeof(MATRIX[0]) / sizeof(MATRIX[0][0]); j++) { \ - if (MATRIX[i][j] == num) { \ - count++; \ - } \ - } \ - } \ - if (count > 0) \ - printf("%d: %d times\n", num, count); \ - } - -// This function runs a tiled matrix multiplication, with explicit tiling -// factors -static void tiled_matmul_nn(size_t dim_I, size_t dim_J, size_t dim_K, - const elem_t A[dim_I][dim_K], - const elem_t B[dim_K][dim_J], const void *D, - elem_t C[dim_I][dim_J], int act, acc_scale_t scale, - bool repeating_bias, size_t tile_I, size_t tile_J, - size_t tile_K, - enum tiled_matmul_type_t tiled_matmul_type, - bool check, char *layer_name) { - if (check) - printf("%s: gemmini\n", layer_name); - - tiled_matmul(dim_I, dim_J, dim_K, (elem_t *)A, (elem_t *)B, D, (elem_t *)C, - dim_K, dim_J, dim_J, dim_J, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, act, scale, 0, - repeating_bias, tile_I, tile_J, tile_K, false, false, false, - false, 0, tiled_matmul_type); - - if (check) { - printf("%s: CPU\n", layer_name); - elem_t gold[dim_I][dim_J]; - tiled_matmul_auto(dim_I, dim_J, dim_K, (elem_t *)A, (elem_t *)B, D, - (elem_t *)gold, dim_K, dim_J, dim_J, dim_J, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, act, scale, 0, repeating_bias, false, - false, false, false, 0, CPU); - - if (!MAT_IS_EQUAL(dim_I, dim_J, C, gold)) { - printf("Layer calculated incorrectly: %s\n", layer_name); - exit(1); - } - } -} - -// This function runs a tiled matrix multiplication, with automatically -// calculated tiling factors -// With default auto-stride calc (A_stride = dim_K, B_stride/C_stride/D_stride = -// dim_J) -static void tiled_matmul_nn_auto(size_t dim_I, size_t dim_J, size_t dim_K, - const elem_t A[dim_I][dim_K], - const elem_t B[dim_K][dim_J], const void *D, - elem_t C[dim_I][dim_J], int act, - acc_scale_t scale, bool repeating_bias, - enum tiled_matmul_type_t tiled_matmul_type, - bool 
check, char *layer_name) { - if (check) - printf("%s: gemmini\n", layer_name); - - tiled_matmul_auto(dim_I, dim_J, dim_K, (elem_t *)A, (elem_t *)B, D, - (elem_t *)C, dim_K, dim_J, dim_J, dim_J, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, act, scale, 0, repeating_bias, false, - false, false, false, 0, tiled_matmul_type); - - if (check) { - printf("%s: CPU\n", layer_name); - elem_t gold[dim_I][dim_J]; - tiled_matmul_auto(dim_I, dim_J, dim_K, (elem_t *)A, (elem_t *)B, D, - (elem_t *)gold, dim_K, dim_J, dim_J, dim_J, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, act, scale, 0, repeating_bias, false, - false, false, false, 0, CPU); - - if (!MAT_IS_EQUAL(dim_I, dim_J, C, gold)) { - printf("Layer calculated incorrectly: %s\n", layer_name); - exit(1); - } - } -} - -// need to specify stride -// auto tiling calc -static void tiled_matmul_nn_stride_auto( - size_t dim_I, size_t dim_J, size_t dim_K, const size_t A_stride, - const size_t B_stride, const size_t C_stride, const elem_t *A, - const elem_t *B, const void *D, const elem_t *C, int act, acc_scale_t scale, - bool repeating_bias, enum tiled_matmul_type_t tiled_matmul_type) { - - tiled_matmul_auto(dim_I, dim_J, dim_K, (elem_t *)A, (elem_t *)B, D, - (elem_t *)C, A_stride, B_stride, C_stride, C_stride, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, act, scale, 0, repeating_bias, false, - false, false, false, 0, tiled_matmul_type); -} -static void conv_dw( - size_t I, size_t J, const size_t batch_size, const size_t channels, - const size_t in_row_dim, const size_t in_col_dim, const size_t out_row_dim, - const size_t out_col_dim, const size_t kernel_size, - const elem_t input[batch_size][in_row_dim][in_col_dim][channels], - const elem_t weight[channels][kernel_size][kernel_size], const acc_t *bias, - // elem_t output [batch_size][out_row_dim][out_col_dim][channels], - elem_t output[I][J], const struct ConvParams *params) { - for (int batch = 0; batch < 
batch_size; batch++) { - for (int channel = 0; channel < channels; channel++) { - for (int out_row = 0; out_row < out_row_dim; out_row++) { - for (int out_col = 0; out_col < out_col_dim; out_col++) { - int in_row = out_row * params->stride - params->padding; - - acc_t result = 0; - if (params->bias) { - result = bias[channel]; - } - - for (int kernel_row = 0; kernel_row < params->kernel_size; - kernel_row++) { - int in_col = out_col * params->stride - params->padding; - - for (int kernel_col = 0; kernel_col < params->kernel_size; - kernel_col++) { - if (in_row >= 0 && in_row < params->in_row_dim && in_col >= 0 && - in_col < params->in_col_dim) { - result += input[batch][in_row][in_col][channel] * - weight[channel][kernel_row][kernel_col]; - } - - in_col++; - } - - in_row++; - } - - if (result < 0) { - result = 0; - } - - acc_t scaled = ACC_SCALE(result, params->output_scale); - - if (scaled > elem_t_max) { - scaled = elem_t_max; - } else if (scaled < elem_t_min) { - scaled = elem_t_min; - } - - size_t r = batch * params->out_row_dim * params->out_col_dim + - out_row * params->out_col_dim + out_col; - output[r][channel] = scaled; - // output[batch][out_row][out_col][channel] = scaled; - } - } - } - } -} - -static void conv_dw_with_col2im( - size_t prev_I, size_t prev_J, size_t I, size_t J, const size_t batch_size, - const size_t channels, const size_t out_row_dim, const size_t out_col_dim, - const size_t kernel_size, const elem_t input[prev_I][prev_J], - const elem_t weight[channels][kernel_size][kernel_size], const acc_t *bias, - // elem_t output [batch_size][out_dim][out_dim][channels], - elem_t output[I][J], const struct ConvParams *params) { - for (int batch = 0; batch < batch_size; batch++) { - for (int channel = 0; channel < channels; channel++) { - for (int out_row = 0; out_row < out_row_dim; out_row++) { - for (int out_col = 0; out_col < out_col_dim; out_col++) { - int in_row = out_row * params->stride - params->padding; - - acc_t result = 0; - if 
(params->bias) { - result = bias[channel]; - } - - for (int kernel_row = 0; kernel_row < params->kernel_size; - kernel_row++) { - int in_col = out_col * params->stride - params->padding; - - for (int kernel_col = 0; kernel_col < params->kernel_size; - kernel_col++) { - if (in_row >= 0 && in_row < params->in_row_dim && in_col >= 0 && - in_col < params->in_col_dim) { - // result += input[batch][in_row][in_col][channel] * - // weight[channel][kernel_row][kernel_col]; - - size_t r = batch * params->in_row_dim * params->in_col_dim + - in_row * params->in_col_dim + in_col; - - result += - input[r][channel] * weight[channel][kernel_row][kernel_col]; - } - - in_col++; - } - - in_row++; - } - - if (result < 0) { - result = 0; - } - - acc_t scaled = ACC_SCALE(result, params->output_scale); - - if (scaled > elem_t_max) { - scaled = elem_t_max; - } else if (scaled < elem_t_min) { - scaled = elem_t_min; - } - - size_t r = batch * params->out_row_dim * params->out_col_dim + - out_row * params->out_col_dim + out_col; - output[r][channel] = scaled; - // output[batch][out_row][out_col][channel] = scaled; - } - } - } - } -} - -static void -im2col(size_t batch_size, size_t channels, size_t im_row_dim, size_t im_col_dim, - size_t I, size_t K, - const elem_t input[batch_size][im_row_dim][im_col_dim][channels], - elem_t output[I][K], const struct ConvParams *params) { - int patch_row = 0; - - for (int n_batch = 0; n_batch < params->batch_size; n_batch++) { - for (int im_row = -params->padding; - im_row < - params->in_row_dim - params->kernel_size + params->padding + 1; - im_row += params->stride) { - for (int im_col = -params->padding; - im_col < - params->in_col_dim - params->kernel_size + params->padding + 1; - im_col += params->stride) { - int patch_col = 0; - - for (int filter_row = 0; filter_row < params->kernel_size; - filter_row++) { - for (int filter_col = 0; filter_col < params->kernel_size; - filter_col++) { - for (int im_channel = 0; im_channel < params->in_channels; - 
im_channel++) { - int pixel_row = im_row + filter_row; - int pixel_col = im_col + filter_col; - - if (pixel_row < 0 || pixel_row >= params->in_row_dim || - pixel_col < 0 || pixel_col >= params->in_col_dim) { - // output[patch_row][patch_col] = 0; - } else { - output[patch_row][patch_col] = - input[n_batch][pixel_row][pixel_col][im_channel]; - } - - patch_col++; - } - } - } - - patch_row++; - } - } - } -} - -static void im2col_with_col2im(size_t prev_I, size_t prev_J, size_t next_I, - size_t next_K, - const elem_t input[prev_I][prev_J], - elem_t output[next_I][next_K], - const struct ConvParams *params) { - int out_row = 0; - - for (int n_batch = 0; n_batch < params->batch_size; n_batch++) { - for (int im_row = -params->padding; - im_row < - params->in_row_dim - params->kernel_size + params->padding + 1; - im_row += params->stride) { - for (int im_col = -params->padding; - im_col < - params->in_col_dim - params->kernel_size + params->padding + 1; - im_col += params->stride) { - int out_col = 0; - - for (int filter_row = 0; filter_row < params->kernel_size; - filter_row++) { - for (int filter_col = 0; filter_col < params->kernel_size; - filter_col++) { - for (int im_channel = 0; im_channel < params->in_channels; - im_channel++) { - int pixel_row = im_row + filter_row; - int pixel_col = im_col + filter_col; - - if (pixel_row < 0 || pixel_row >= params->in_row_dim || - pixel_col < 0 || pixel_col >= params->in_col_dim) { - // output[out_row][out_col] = 0; - } else { - int in_row = n_batch * params->in_row_dim * params->in_col_dim + - pixel_row * params->in_col_dim + pixel_col; - int in_col = im_channel; - - output[out_row][out_col] = input[in_row][in_col]; - } - - out_col++; - } - } - } - - out_row++; - } - } - } -} - -// Compute C = A + B with saturating add -void vecadd(size_t len, const elem_t *A, const elem_t *B, elem_t *C, - scale_t A_shift) { - for (size_t i = 0; i < len; i++) { - acc_t result = MVIN_SCALE(A[i], A_shift) + B[i]; - - if (result > elem_t_max) { - 
result = elem_t_max; - } else if (result < elem_t_min) { - result = elem_t_min; - } - - C[i] = result; - } -} - -void resadd1(const size_t batch_size, const size_t channels, - const size_t im_dim, - const elem_t A[batch_size][im_dim][im_dim][channels], - const elem_t B[batch_size][im_dim][im_dim][channels], - elem_t C[batch_size][im_dim][im_dim][channels], bool relu, - const struct ConvParams *params) { - - const int minimum = relu ? 0 : elem_t_min; - - for (size_t batch = 0; batch < params->batch_size; batch++) { - for (size_t row = 0; row < params->out_dim_pooled; row++) { - for (size_t col = 0; col < params->out_dim_pooled; col++) { - for (size_t channel = 0; channel < params->out_channels; channel++) { - acc_t result = - MVIN_SCALE(A[batch][row][col][channel], params->res_scale) + - B[batch][row][col][channel]; - - if (result > elem_t_max) { - result = elem_t_max; - } else if (result < minimum) { - result = minimum; - } - - C[batch][row][col][channel] = result; - } - } - } - } -} - -void resadd2(const size_t I, const size_t J, const size_t batch_size, - const size_t channels, const size_t im_dim, const elem_t A[I][J], - const elem_t B[batch_size][im_dim][im_dim][channels], - elem_t C[batch_size][im_dim][im_dim][channels], bool relu, - const struct ConvParams *params) { - - const int minimum = relu ? 
0 : elem_t_min; - - for (size_t batch = 0; batch < params->batch_size; batch++) { - for (size_t row = 0; row < params->out_dim_pooled; row++) { - for (size_t col = 0; col < params->out_dim_pooled; col++) { - for (size_t channel = 0; channel < params->out_channels; channel++) { - size_t r = batch * params->out_dim_pooled * params->out_dim_pooled + - row * params->out_dim_pooled + col; - - acc_t result = MVIN_SCALE(A[r][channel], params->res_scale) + - B[batch][row][col][channel]; - - if (result > elem_t_max) { - result = elem_t_max; - } else if (result < minimum) { - result = minimum; - } - - C[batch][row][col][channel] = result; - } - } - } - } -} - -void resadd3(const size_t I, const size_t J, const elem_t A[I][J], - const elem_t B[I][J], elem_t C[I][J], bool relu, - const struct ConvParams *params) { - - const int minimum = relu ? 0 : elem_t_min; - - for (size_t batch = 0; batch < params->batch_size; batch++) { - for (size_t row = 0; row < params->out_dim_pooled; row++) { - for (size_t col = 0; col < params->out_dim_pooled; col++) { - for (size_t channel = 0; channel < params->out_channels; channel++) { - size_t r = batch * params->out_dim_pooled * params->out_dim_pooled + - row * params->out_dim_pooled + col; - - acc_t result = - MVIN_SCALE(A[r][channel], params->res_scale) + B[r][channel]; - - if (result > elem_t_max) { - result = elem_t_max; - } else if (result < minimum) { - result = minimum; - } - - C[r][channel] = result; - } - } - } - } -} - -// Pooling -void pool(size_t batch_size, size_t channels, size_t in_row_dim, - size_t in_col_dim, size_t out_row_dim, size_t out_col_dim, - elem_t input[batch_size][in_row_dim][in_col_dim][channels], - elem_t output[batch_size][out_row_dim][out_col_dim][channels], - const struct ConvParams *params) { - size_t kernel_size = params->pool_size; - size_t stride = params->pool_stride; - // size_t in_dim = params->out_dim; - size_t padding = params->pool_padding; - - for (int batch = 0; batch < batch_size; batch++) { - for 
(int channel = 0; channel < channels; channel++) { - for (int out_row = 0; out_row < out_row_dim; out_row++) { - for (int out_col = 0; out_col < out_col_dim; out_col++) { - int in_row = out_row * stride - padding; - - elem_t result = elem_t_min; - - for (int kernel_row = 0; kernel_row < kernel_size; kernel_row++) { - int in_col = out_col * stride - padding; - - for (int kernel_col = 0; kernel_col < kernel_size; kernel_col++) { - if (in_row >= 0 && in_row < in_row_dim && in_col >= 0 && - in_col < in_col_dim) { - if (input[batch][in_row][in_col][channel] > result) { - result = input[batch][in_row][in_col][channel]; - } - } else if (0 > result) { - result = 0; - } - - in_col++; - } - - in_row++; - } - - output[batch][out_row][out_col][channel] = result; - } - } - } - } -} - -void pool_with_col2im( - size_t I, size_t J, size_t batch_size, size_t channels, size_t out_row_dim, - size_t out_col_dim, elem_t input[I][J], - elem_t output[batch_size][out_row_dim][out_col_dim][channels], - const struct ConvParams *params) { - size_t kernel_size = params->pool_size; - size_t stride = params->pool_stride; - size_t in_row_dim = params->out_row_dim; - size_t in_col_dim = params->out_col_dim; - size_t padding = params->pool_padding; - - for (int batch = 0; batch < batch_size; batch++) { - for (int channel = 0; channel < channels; channel++) { - for (int out_row = 0; out_row < out_row_dim; out_row++) { - for (int out_col = 0; out_col < out_col_dim; out_col++) { - int in_row = out_row * stride - padding; - - elem_t result = elem_t_min; - - for (int kernel_row = 0; kernel_row < kernel_size; kernel_row++) { - int in_col = out_col * stride - padding; - - for (int kernel_col = 0; kernel_col < kernel_size; kernel_col++) { - if (in_row >= 0 && in_row < in_row_dim && in_col >= 0 && - in_col < in_col_dim) { - if (input[batch * in_row_dim * in_col_dim + - in_row * in_col_dim + in_col][channel] > result) { - result = input[batch * in_row_dim * in_col_dim + - in_row * in_col_dim + 
in_col][channel]; - } - } else if (0 > result) { - result = 0; - } - - in_col++; - } - - in_row++; - } - - output[batch][out_row][out_col][channel] = result; - } - } - } - } -} - -#endif // GEMMINI_NN_H diff --git a/bb-tests/workloads/src/CTest/gemmini/include/gemmini_params.h b/bb-tests/workloads/src/CTest/gemmini/include/gemmini_params.h deleted file mode 100644 index 119859f7..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/include/gemmini_params.h +++ /dev/null @@ -1,102 +0,0 @@ -#ifndef GEMMINI_PARAMS_H -#define GEMMINI_PARAMS_H - -#include -#include - -#define XCUSTOM_ACC 3 -#define DIM 16 -#define ADDR_LEN 32 -#define BANK_NUM 4 -#define BANK_ROWS 4096 -#define ACC_ROWS 1024 -#define MAX_BYTES 64 -#define MAX_BLOCK_LEN (MAX_BYTES / (DIM * 1)) -#define MAX_BLOCK_LEN_ACC (MAX_BYTES / (DIM * 4)) - -typedef int8_t elem_t; -static const elem_t elem_t_max = 127; -static const elem_t elem_t_min = -128; -typedef int32_t acc_t; -typedef int64_t full_t; - -#define HAS_MVIN_SCALE -typedef float scale_t; -typedef uint32_t scale_t_bits; - -typedef int32_t scale_acc_t; -typedef uint32_t scale_acc_t_bits; - -typedef float acc_scale_t; -typedef uint32_t acc_scale_t_bits; - -#define row_align(blocks) \ - __attribute__((aligned(blocks * DIM * sizeof(elem_t)))) -#define row_align_acc(blocks) \ - __attribute__((aligned(blocks * DIM * sizeof(acc_t)))) - -#define MVIN_SCALE_IDENTITY 1.0 - -#define ACC_SCALE_IDENTITY 1.0 - -// Rounding right shift equation: -// https://riscv.github.io/documents/riscv-v-spec/#_vector_fixed_point_rounding_mode_register_vxrm -#define ROUNDING_RIGHT_SHIFT(x, shift) \ - ((shift) > 0 \ - ? (((x) >> (shift)) + \ - (((shift) == 0 ? 0 : (((x) >> ((shift) - 1)) & 1)) & \ - ((((shift) <= 1 ? 
0 : ((x) & ((1 << ((shift) - 1)) - 1))) != 0) | \ - (((x) >> (shift)) & 1)))) \ - : ((x) << (-(shift)))) - -#ifdef __cplusplus -#define SAME_TYPE(x) decltype(x) -#else -#define SAME_TYPE(x) typeof(x) -#endif - -#define ROUND_NEAR_EVEN(x) \ - ({ \ - const SAME_TYPE(x) x_ = (x); \ - const long long i = x_; \ - const long long next = x_ < 0 ? x_ - 1 : x_ + 1; \ - SAME_TYPE(x) rem = x_ - i; \ - rem = rem < 0 ? -rem : rem; \ - SAME_TYPE(x) \ - result = rem < 0.5 ? i : (rem > 0.5 ? next : (i % 2 == 0 ? i : next)); \ - result; \ - }) - -// Rounding right shift equation: -// https://riscv.github.io/documents/riscv-v-spec/#_vector_fixed_point_rounding_mode_register_vxrm -#define ROUNDING_RIGHT_SHIFT_BITS(x, shift) \ - ((shift) > 0 \ - ? (((x) >> (shift)) + \ - (((shift) == 0 ? 0 : (((x) >> ((shift) - 1)) & 1)) & \ - ((((shift) <= 1 ? 0 : ((x) & ((1 << ((shift) - 1)) - 1))) != 0) | \ - (((x) >> (shift)) & 1)))) \ - : ((x) << (-(shift)))) - -#define ACC_SCALE(x, scale) \ - ({ \ - float y = ROUND_NEAR_EVEN((x) * (scale)); \ - y > INT8_MAX ? INT8_MAX : (y < INT8_MIN ? INT8_MIN : (acc_t)y); \ - }) - -#define MVIN_SCALE(x, scale) \ - ({ \ - float y = ROUND_NEAR_EVEN((x) * (scale)); \ - y > INT8_MAX ? INT8_MAX : (y < INT8_MIN ? INT8_MIN : (elem_t)y); \ - }) - -#define MVIN_SCALE_ACC(x, scale) (x) - -#define ACC_SCALE_T_IS_FLOAT -#define ACC_SCALE_EXP_BITS 8 -#define ACC_SCALE_SIG_BITS 24 - -#define ACC_READ_SMALL_WIDTH - -#define HAS_FIRST_LAYER_OPTIMIZATIONS - -#endif // GEMMINI_PARAMS_H diff --git a/bb-tests/workloads/src/CTest/gemmini/include/gemmini_testutils.h b/bb-tests/workloads/src/CTest/gemmini/include/gemmini_testutils.h deleted file mode 100644 index 3cfd4d00..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/include/gemmini_testutils.h +++ /dev/null @@ -1,307 +0,0 @@ -// See LICENSE for license details. 
- -#ifndef SRC_MAIN_C_GEMMINI_TESTUTILS_H -#define SRC_MAIN_C_GEMMINI_TESTUTILS_H - -#undef abs - -#include -#include -#include -#include -#include -#include - -#include "include/gemmini.h" -#include "include/gemmini_params.h" - -#ifdef BAREMETAL -#undef assert -#define assert(expr) \ - if (!(expr)) { \ - printf("Failed assertion: " #expr "\n " __FILE__ ":%u\n", __LINE__); \ - exit(1); \ - } -#endif - -// #define GEMMINI_ASSERTIONS - -// Matmul utility functions -static void matmul(elem_t A[DIM][DIM], elem_t B[DIM][DIM], elem_t D[DIM][DIM], - full_t C_full[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C_full[r][c] += A[r][k] * B[k][c]; - } -} - -static void matmul_short(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - elem_t D[DIM][DIM], elem_t C[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C[r][c] += A[r][k] * B[k][c]; - } -} - -static void matmul_full(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - full_t D[DIM][DIM], full_t C_full[DIM][DIM]) { - // Identical to the other matmul function, but with a 64-bit bias - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C_full[r][c] += A[r][k] * B[k][c]; - } -} - -static void matmul_A_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - elem_t D[DIM][DIM], full_t C_full[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C_full[r][c] += A[k][r] * B[k][c]; - } -} - -static void matmul_short_A_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - elem_t D[DIM][DIM], elem_t C[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C[r][c] += A[k][r] * B[k][c]; - } -} - -static 
void matmul_full_A_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - full_t D[DIM][DIM], - full_t C_full[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C_full[r][c] += A[k][r] * B[k][c]; - } -} - -static void matmul_B_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - elem_t D[DIM][DIM], full_t C_full[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C_full[r][c] += A[r][k] * B[c][k]; - } -} - -static void matmul_short_B_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - elem_t D[DIM][DIM], elem_t C[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C[r][c] += A[r][k] * B[c][k]; - } -} - -static void matmul_full_B_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - full_t D[DIM][DIM], - full_t C_full[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C_full[r][c] += A[r][k] * B[c][k]; - } -} - -static void matmul_AB_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - elem_t D[DIM][DIM], full_t C_full[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C_full[r][c] += A[k][r] * B[c][k]; - } -} - -static void matmul_short_AB_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - elem_t D[DIM][DIM], elem_t C[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - C[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C[r][c] += A[k][r] * B[c][k]; - } -} - -static void matmul_full_AB_transposed(elem_t A[DIM][DIM], elem_t B[DIM][DIM], - full_t D[DIM][DIM], - full_t C_full[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - 
C_full[r][c] = D[r][c]; - for (size_t k = 0; k < DIM; k++) - C_full[r][c] += A[k][r] * B[c][k]; - } -} - -static void matadd(full_t sum[DIM][DIM], full_t m1[DIM][DIM], - full_t m2[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) - sum[r][c] = m1[r][c] + m2[r][c]; -} - -// THIS IS A ROUNDING SHIFT! It also performs a saturating cast -static void matshift(full_t full[DIM][DIM], elem_t out[DIM][DIM], int shift) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - // Bitshift and round element - full_t shifted = ROUNDING_RIGHT_SHIFT(full[r][c], shift); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = shifted > elem_t_max - ? elem_t_max - : (shifted < elem_t_min ? elem_t_min : shifted); - out[r][c] = elem; -#else - out[r][c] = shifted; // TODO should we also saturate when using floats? -#endif - } -} - -static void matscale(full_t full[DIM][DIM], elem_t out[DIM][DIM], - acc_scale_t scale) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) { - // Bitshift and round element - full_t scaled = ACC_SCALE(full[r][c], scale); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = scaled > elem_t_max - ? elem_t_max - : (scaled < elem_t_min ? elem_t_min : scaled); - out[r][c] = elem; -#else - out[r][c] = scaled; // TODO should we also saturate when using floats? -#endif - } -} - -static void matrelu(elem_t in[DIM][DIM], elem_t out[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) - out[r][c] = in[r][c] > 0 ? 
in[r][c] : 0; -} - -static void transpose(elem_t in[DIM][DIM], elem_t out[DIM][DIM]) { - for (size_t r = 0; r < DIM; r++) - for (size_t c = 0; c < DIM; c++) - out[c][r] = in[r][c]; -} - -int rand() { - static uint32_t x = 777; - x = x * 1664525 + 1013904223; - return x >> 24; -} - -#ifdef ELEM_T_IS_FLOAT -double rand_double() { - double a = (double)(rand() % 128) / (double)(1 + (rand() % 64)); - double b = (double)(rand() % 128) / (double)(1 + (rand() % 64)); - return a - b; -} -#endif - -static void printMatrix(elem_t m[DIM][DIM]) { - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", m[i][j]); -#else - printf("%x ", elem_t_to_elem_t_bits(m[i][j])); -#endif - printf("\n"); - } -} - -static void printMatrixAcc(acc_t m[DIM][DIM]) { - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", m[i][j]); -#else - printf("%x ", acc_t_to_acc_t_bits(m[i][j])); -#endif - printf("\n"); - } -} - -static int is_equal(elem_t x[DIM][DIM], elem_t y[DIM][DIM]) { - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - if (x[i][j] != y[i][j]) -#else - bool isnanx = elem_t_isnan(x[i][j]); - bool isnany = elem_t_isnan(y[i][j]); - - if (x[i][j] != y[i][j] && !(isnanx && isnany)) -#endif - return 0; - } - return 1; -} - -static int is_equal_transposed(elem_t x[DIM][DIM], elem_t y[DIM][DIM]) { - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - if (x[i][j] != y[j][i]) -#else - bool isnanx = elem_t_isnan(x[i][j]); - bool isnany = elem_t_isnan(y[j][i]); - - if (x[i][j] != y[j][i] && !(isnanx && isnany)) -#endif - return 0; - } - return 1; -} - -// This is a GNU extension known as statment expressions -#define MAT_IS_EQUAL(dim_i, dim_j, x, y) \ - ({ \ - int result = 1; \ - for (size_t i = 0; i < dim_i; i++) \ - for (size_t j = 0; j < dim_j; ++j) { \ - if (x[i][j] != y[i][j]) { \ - result = 
0; \ - break; \ - } \ - } \ - result; \ - }) - -static uint64_t read_cycles() { - uint64_t cycles; - asm volatile("rdcycle %0" : "=r"(cycles)); - return cycles; - - // const uint32_t * mtime = (uint32_t *)(33554432 + 0xbff8); - // const uint32_t * mtime = (uint32_t *)(33554432 + 0xbffc); - // return *mtime; -} - -#undef abs - -#endif // SRC_MAIN_C_GEMMINI_TESTUTILS_H diff --git a/bb-tests/workloads/src/CTest/gemmini/include/translator.h b/bb-tests/workloads/src/CTest/gemmini/include/translator.h deleted file mode 100644 index 748ad4d0..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/include/translator.h +++ /dev/null @@ -1,12 +0,0 @@ -// See LICENSE for license details. - -#ifndef SRC_MAIN_C_TRANSLATOR_H -#define SRC_MAIN_C_TRANSLATOR_H - -#include "rocc-software/src/xcustom.h" - -#define XCUSTOM_TRANS 1 - -#define doTranslate(y, vaddr) ROCC_INSTRUCTION(XCUSTOM_TRANS, y, vaddr, 0, 0); - -#endif // SRC_MAIN_C_TRANSLATOR_H diff --git a/bb-tests/workloads/src/CTest/gemmini/matmul.c b/bb-tests/workloads/src/CTest/gemmini/matmul.c deleted file mode 100644 index 8c37de4f..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/matmul.c +++ /dev/null @@ -1,460 +0,0 @@ -// See LICENSE for license details. 
-// The main point of this test is just to check whether we can switch between -// output- and weight-stationary dataflows - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" -#include - -static elem_t ZERO[DIM][DIM]; - -#ifdef FAST -#define AINIT RELU -#define SINIT 4 -#define RAND (rand()) -#define N 1 -#else -#define AINIT NO_ACTIVATION -#define SINIT 0 -#define RAND (rand()) -#define N 2 -#endif - -void operands(int c, int *a, int *b, int *d) { - *d = c % N; - *b = (c / N) % N; - *a = c / (N * N); -} - -void test_os(bool A_transpose, bool B_transpose) { - // Output stationary - printf("Output-stationary\n"); - - void (*matmul_ptr)(elem_t[DIM][DIM], elem_t[DIM][DIM], elem_t[DIM][DIM], - full_t[DIM][DIM]); - void (*matmul_full_ptr)(elem_t[DIM][DIM], elem_t[DIM][DIM], full_t[DIM][DIM], - full_t[DIM][DIM]); - - if (!A_transpose && !B_transpose) { - matmul_ptr = &matmul; - matmul_full_ptr = &matmul_full; - } else if (A_transpose && !B_transpose) { - matmul_ptr = &matmul_A_transposed; - matmul_full_ptr = &matmul_full_A_transposed; - } else if (!A_transpose && B_transpose) { - // Just return immediately because we don't support this - return; - } else if (A_transpose && B_transpose) { - matmul_ptr = &matmul_AB_transposed; - matmul_full_ptr = &matmul_full_AB_transposed; - } - - for (int activation = AINIT; activation <= RELU; ++activation) { - for (int shift = SINIT; shift <= 4; shift += 4) { - // printf("activation: %d, shift: %d\n", activation, shift); - - static elem_t A[N][DIM][DIM] row_align(1); - static elem_t B[N][DIM][DIM] row_align(1); - static elem_t D[N][DIM][DIM] row_align(1); - - // We will try out every combination of A, B, D possible - static elem_t C[N * N * N][DIM][DIM] row_align(1); - static full_t gold_full[N * N * N][DIM][DIM]; - static elem_t gold[N * N * N][DIM][DIM]; - - // ...taking into account the preloads or accumulates - static int preload[N * N * N] = {1}; - for 
(int i = 1; i < N * N * N; ++i) - preload[i] = 1; // rand() % 2; - - // ...and for the actual preloads, do we just preload zeros? - static int preload_zeros[N * N * N]; - for (int i = 0; i < N * N * N; ++i) - preload_zeros[i] = 1; // rand() % 2; - - // ...and finally, which results won't produce any output - static int no_output[N * N * N]; - for (int i = 0; i < N * N * N - 1; ++i) - no_output[i] = !preload[i + 1]; - no_output[N * N * N - 1] = 0; - - // Print the sequence out - /*printf("Preloads: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", preload[i]); - printf("\n"); - printf("Zeros: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", preload_zeros[i]); - printf("\n"); - printf("No outputs: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", no_output[i]); - printf("\n");*/ - - for (size_t n = 0; n < N; ++n) { - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) { - A[n][i][j] = (RAND % 64) - 32; - B[n][i][j] = (RAND % 64) - 32; - D[n][i][j] = (RAND % 64) - 32; - } - } - } -#ifdef FAST1 - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) { - gold[0][i][j] = 1; - } - } -#else - for (size_t g = 0; g < N * N * N; ++g) { - int a, b, d; - operands(g, &a, &b, &d); - - if (!preload[g]) - (*matmul_full_ptr)(A[a], B[b], gold_full[g - 1], gold_full[g]); - else if (preload_zeros[g]) { - (*matmul_ptr)(A[a], B[b], ZERO, gold_full[g]); - } else { - (*matmul_ptr)(A[a], B[b], D[d], gold_full[g]); - } - } - - for (size_t g = 0; g < N * N * N; ++g) { - matshift(gold_full[g], gold[g], shift); - if (activation == RELU) - matrelu(gold[g], gold[g]); - } -#endif - int A_addr = 0; - int B_addr = N * DIM; - int D_addr = 2 * N * DIM; - int C_addr = 3 * N * DIM; - - // printf("Moving in\n"); - for (size_t n = 0; n < N; ++n) - gemmini_mvin(A[n], A_addr + n * DIM); - - for (size_t n = 0; n < N; ++n) - gemmini_mvin(B[n], B_addr + n * DIM); - - for (size_t n = 0; n < N; ++n) { - gemmini_mvin(D[n], D_addr + n * DIM); - } - - // 
printf("Setting mode\n"); - gemmini_extended_config_ex(OUTPUT_STATIONARY, activation, shift, 1, - A_transpose, B_transpose); - - // printf("Matmulling\n"); - for (size_t c = 0; c < N * N * N; ++c) { - int a, b, d; - operands(c, &a, &b, &d); - - uint64_t out_addr = C_addr + c * DIM; - if (no_output[c]) - out_addr = GARBAGE_ADDR; - - if (!preload[c]) { - gemmini_preload_zeros(out_addr); - gemmini_compute_accumulated(A_addr + a * DIM, B_addr + b * DIM); - } else if (preload_zeros[c]) { - gemmini_preload_zeros(out_addr); - gemmini_compute_preloaded(A_addr + a * DIM, B_addr + b * DIM); - } else { - gemmini_preload(D_addr + d * DIM, out_addr); - gemmini_compute_preloaded(A_addr + a * DIM, B_addr + b * DIM); - } - } - - // printf("Moving out\n"); - for (size_t c = 0; c < N * N * N; ++c) - if (!no_output[c]) { - gemmini_mvout(C[c], C_addr + c * DIM); - } - - gemmini_fence(); - - /* - printf("Moved out\n"); - - printf("A[0]:\n"); - printMatrix(A[0]); - printf("B[0]:\n"); - printMatrix(B[0]); - - for (int n = 0; n < N*N*N; ++n) { - if (!no_output[n]) { - printf("C:\n"); - printMatrix(C[n]); - printf("Gold:\n"); - printMatrix(gold[n]); - printf("\n"); - } - } - */ - - for (int n = 0; n < N * N * N; ++n) - if (!no_output[n] && !is_equal(C[n], gold[n])) { - printf("activation: %d, shift: %d, n: %d, A_transpose: %d, " - "B_transpose: %d\n", - activation, shift, n, A_transpose, B_transpose); - - printf("A:\n"); - printMatrix(A[n]); - printf("B:\n"); - printMatrix(B[n]); - printf("C:\n"); - printMatrix(C[n]); - printf("Gold:\n"); - printMatrix(gold[n]); - printf("\n"); - - exit(1); - } - } - } -} - -void test_ws(bool A_transpose, bool B_transpose) { - // Weight-stationary - printf("Weight-stationary\n"); - - void (*matmul_ptr)(elem_t[DIM][DIM], elem_t[DIM][DIM], elem_t[DIM][DIM], - full_t[DIM][DIM]); - - if (!A_transpose && !B_transpose) { - matmul_ptr = &matmul; - } else if (A_transpose && !B_transpose) { - matmul_ptr = &matmul_A_transposed; - } else if (!A_transpose && 
B_transpose) { - matmul_ptr = &matmul_B_transposed; - } else if (A_transpose && B_transpose) { - return; - } - - for (int activation = AINIT; activation <= RELU; ++activation) { - for (int scale = SINIT; scale <= 4; scale += 4) { - static elem_t A[N][DIM][DIM] row_align(1); - static elem_t B[N][DIM][DIM] row_align(1); - static elem_t D[N][DIM][DIM] row_align(1); - - // We will try out every combination of A, B, D possible - static elem_t C[N * N * N][DIM][DIM] row_align(1); - static full_t gold_full[N * N * N][DIM][DIM]; - static elem_t gold[N * N * N][DIM][DIM]; - - // ...taking into account whether we preload new weights or re-use the old - // ones - static int preload[N * N * N] = {1}; - for (int i = 1; i < N * N * N; ++i) - preload[i] = RAND % 2; - - // ...whether we pass in a D or just use zeros - static int add_to_zeros[N * N * N]; - for (int i = 0; i < N * N * N; ++i) - add_to_zeros[i] = RAND % 2; - - // ...and whether we accumulate on top of the previous result - static int accumulate[N * N * N] = {0}; - for (int i = 1; i < N * N * N; ++i) - accumulate[i] = RAND % 2; - - static int no_output[N * N * N]; - for (int i = 0; i < N * N * N - 1; ++i) - no_output[i] = accumulate[i + 1]; - no_output[N * N * N - 1] = 0; - - // Print the sequence out - /*printf("Preloads: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", preload[i]); - printf("\n"); - printf("Zeros: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", add_to_zeros[i]); - printf("\n"); - printf("Accumulates: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", accumulate[i]); - printf("\n"); - printf("No outputs: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", no_output[i]); - printf("\n");*/ - - for (size_t n = 0; n < N; ++n) { - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) { - A[n][i][j] = (RAND % 64) - 32; - B[n][i][j] = (RAND % 64) - 32; - D[n][i][j] = (RAND % 64) - 32; - } - } - } -#ifdef FAST1 - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < 
DIM; ++j) { - gold[0][i][j] = 64; - } - } -#else - for (size_t g = 0; g < N * N * N; ++g) { - int a, b, d; - operands(g, &a, &b, &d); - - // We need to find the last B value in case we aren't preloading new - // weights - for (int last_g = g; last_g >= 0; --last_g) { - int tmp_a, tmp_d; - if (preload[last_g]) { - operands(last_g, &tmp_a, &b, &tmp_d); - break; - } - } - - if (add_to_zeros[g]) { - (*matmul_ptr)(A[a], B[b], ZERO, gold_full[g]); - } else { - (*matmul_ptr)(A[a], B[b], D[d], gold_full[g]); - } - - if (accumulate[g]) - matadd(gold_full[g], gold_full[g - 1], gold_full[g]); - } - - for (size_t g = 0; g < N * N * N; ++g) { - matscale(gold_full[g], gold[g], scale); - if (activation == RELU) - matrelu(gold[g], gold[g]); - } -#endif - uint32_t A_addr = 0; - uint32_t B_addr = N * DIM; - uint32_t D_addr = 2 * N * DIM; - uint32_t C_addr_acc = 1 << (ADDR_LEN - 1); - - // Calculate the proper destination addresses of everything - uint32_t C_addrs[N * N * N]; - for (size_t c = 0; c < N * N * N; ++c) - C_addrs[c] = C_addr_acc + c * DIM; - for (size_t c = 0; c < N * N * N; ++c) { - int last_c; - for (last_c = c; last_c >= 0; --last_c) - if (!accumulate[last_c]) - break; - if (c != last_c) - C_addrs[c] = C_addrs[last_c] | (1 << (ADDR_LEN - 2)); - } - - // printf("Moving in\n"); - for (size_t n = 0; n < N; ++n) - gemmini_mvin(A[n], A_addr + n * DIM); - - for (size_t n = 0; n < N; ++n) - gemmini_mvin(B[n], B_addr + n * DIM); - - for (size_t n = 0; n < N; ++n) - gemmini_mvin(D[n], D_addr + n * DIM); - - // printf("Setting mode\n"); - gemmini_extended_config_ex(WEIGHT_STATIONARY, 0, 0, 1, A_transpose, - B_transpose); - gemmini_extended_config_st(DIM * sizeof(elem_t), activation, scale); - - // printf("Matmulling\n"); - for (size_t c = 0; c < N * N * N; ++c) { - int a, b, d; - operands(c, &a, &b, &d); - - uint64_t d_addr = D_addr + d * DIM; - if (add_to_zeros[c]) - d_addr = GARBAGE_ADDR; - - if (!preload[c]) { - gemmini_preload_zeros(C_addrs[c]); - 
gemmini_compute_accumulated(A_addr + a * DIM, d_addr); - } else { - gemmini_preload(B_addr + b * DIM, C_addrs[c]); - gemmini_compute_preloaded(A_addr + a * DIM, d_addr); - } - } - - // printf("Moving out\n"); - for (size_t c = 0; c < N * N * N; ++c) - if (!no_output[c]) { - gemmini_mvout(C[c], C_addrs[c] & ~(1 << (ADDR_LEN - 2))); - } - - gemmini_fence(); - - /*printf("Moved out\n"); - for (int n = 0; n < N*N*N; ++n) { - if (!no_output[n]) { - printf("C:\n"); - printMatrix(C[n]); - printf("Gold:\n"); - printMatrix(gold[n]); - printf("\n"); - } - }*/ - - for (int n = 0; n < N * N * N; ++n) - if (!no_output[n] && !is_equal(C[n], gold[n])) { - printf("activation: %d, scale: %d, n: %d, A_transpose: %d, " - "B_transpose: %d\n", - activation, scale, n, A_transpose, B_transpose); - - printf("A:\n"); - printMatrix(A[n]); - printf("B:\n"); - printMatrix(B[n]); - printf("C:\n"); - printMatrix(C[n]); - printf("Gold:\n"); - printMatrix(gold[n]); - printf("\n"); - - exit(1); - } - } - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - - for (int A_transpose = 0; A_transpose < 2; A_transpose++) { - for (int B_transpose = 0; B_transpose < 2; B_transpose++) { - for (int dataflow = OUTPUT_STATIONARY; dataflow <= WEIGHT_STATIONARY; - dataflow++) { - printf("A_transpose: %d, B_transpose: %d, dataflow: %d\n", A_transpose, - B_transpose, dataflow); - - if (dataflow == OUTPUT_STATIONARY) - test_os(A_transpose, B_transpose); - else - test_ws(A_transpose, B_transpose); - } - } - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/matmul_os.c b/bb-tests/workloads/src/CTest/gemmini/matmul_os.c deleted file mode 100644 index e57733b2..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/matmul_os.c +++ /dev/null @@ -1,206 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" -#include - -#ifdef FAST -#define AINIT RELU -#define SINIT 12 -#define N 1 -#else -#define AINIT NO_ACTIVATION -#define SINIT 0 -#define N 2 -#endif - -void operands(int c, int *a, int *b, int *d) { - *d = c % N; - *b = (c / N) % N; - *a = c / (N * N); -} - -#if (3 * N + N * N * N) * DIM > (BANK_NUM * BANK_ROWS) -#error scratchpad not big enough -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - - static elem_t ZERO[DIM][DIM]; - - for (int activation = AINIT; activation <= RELU; ++activation) { - for (int shift = SINIT; shift <= 12; shift += 4) { - // printf("activation: %d, shift: %d\n", activation, shift); - - static elem_t A[N][DIM][DIM] row_align(1); - static elem_t B[N][DIM][DIM] row_align(1); - static elem_t D[N][DIM][DIM] row_align(1); - - // We will try out every combination of A, B, D possible - static elem_t C[N * N * N][DIM][DIM] row_align(1); - static full_t gold_full[N * N * N][DIM][DIM]; - static elem_t gold[N * N * N][DIM][DIM]; - - // ...taking into account the preloads or accumulates - static int preload[N * N * N] = {1}; - for (int i = 1; i < N * N * N; ++i) - preload[i] = rand() % 2; - - // ...and for the actual preloads, do we just preload zeros? 
- static int preload_zeros[N * N * N]; - for (int i = 0; i < N * N * N; ++i) - preload_zeros[i] = rand() % 2; - - // ...and finally, which results won't produce any output - static int no_output[N * N * N]; - for (int i = 0; i < N * N * N - 1; ++i) - no_output[i] = !preload[i + 1]; - no_output[N * N * N - 1] = 0; - - // Print the sequence out - /*printf("Preloads: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", preload[i]); - printf("\n"); - printf("Zeros: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", preload_zeros[i]); - printf("\n"); - printf("No outputs: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", no_output[i]); - printf("\n");*/ - - for (size_t n = 0; n < N; ++n) { - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) { - A[n][i][j] = (rand() % 64) - 32; - B[n][i][j] = (rand() % 64) - 32; - D[n][i][j] = (rand() % 64) - 32; - } - } - } - - for (size_t g = 0; g < N * N * N; ++g) { - int a, b, d; - operands(g, &a, &b, &d); - - if (!preload[g]) - matmul_full(A[a], B[b], gold_full[g - 1], gold_full[g]); - else if (preload_zeros[g]) - matmul(A[a], B[b], ZERO, gold_full[g]); - else - matmul(A[a], B[b], D[d], gold_full[g]); - } - - for (size_t g = 0; g < N * N * N; ++g) { - matshift(gold_full[g], gold[g], shift); - if (activation == RELU) - matrelu(gold[g], gold[g]); - } - - int A_addr = 0; - int B_addr = N * DIM; - int D_addr = 2 * N * DIM; - int C_addr = 3 * N * DIM; - - // printf("Moving in\n"); - for (size_t n = 0; n < N; ++n) - gemmini_mvin(A[n], A_addr + n * DIM); - - for (size_t n = 0; n < N; ++n) - gemmini_mvin(B[n], B_addr + n * DIM); - - for (size_t n = 0; n < N; ++n) { - gemmini_mvin(D[n], D_addr + n * DIM); - } - - // printf("Setting mode\n"); - gemmini_config_ex(OUTPUT_STATIONARY, activation, shift); - - // printf("Matmulling\n"); - for (size_t c = 0; c < N * N * N; ++c) { - // printf("\tc: %u\n", c); - - int a, b, d; - operands(c, &a, &b, &d); - - uint64_t out_addr = C_addr + c * DIM; - if (no_output[c]) - 
out_addr = GARBAGE_ADDR; - - if (!preload[c]) { - gemmini_preload_zeros(out_addr); - gemmini_compute_accumulated(A_addr + a * DIM, B_addr + b * DIM); - } else if (preload_zeros[c]) { - gemmini_preload_zeros(out_addr); - gemmini_compute_preloaded(A_addr + a * DIM, B_addr + b * DIM); - } else { - gemmini_preload(D_addr + d * DIM, out_addr); - gemmini_compute_preloaded(A_addr + a * DIM, B_addr + b * DIM); - } - } - - // printf("Moving out\n"); - for (size_t c = 0; c < N * N * N; ++c) - if (!no_output[c]) { - // printf("\tc: %u\n", c); - gemmini_mvout(&C[c][0][0], C_addr + c * DIM); - } - - // printf("Fencing\n"); - gemmini_fence(); - - /*printf("Moved out\n"); - for (int n = 0; n < N*N*N; ++n) { - if (!no_output[n]) { - printf("C:\n"); - printMatrix(C[n]); - printf("Gold:\n"); - printMatrix(gold[n]); - printf("\n"); - } - }*/ - - // printf("Checking\n"); - for (int n = 0; n < N * N * N; ++n) - if (!no_output[n] && !is_equal(C[n], gold[n])) { - printf("activation: %d, shift: %d\n", activation, shift); - - printf("C:\n"); - printMatrix(C[n]); - printf("Gold:\n"); - printMatrix(gold[n]); - printf("Gold_full:\n"); - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) { - printf("%lld ", gold_full[n][i][j]); - } - printf("\n"); - } - printf("\n"); - - exit(1); - } - } - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/matmul_ws.c b/bb-tests/workloads/src/CTest/gemmini/matmul_ws.c deleted file mode 100644 index cd30940c..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/matmul_ws.c +++ /dev/null @@ -1,227 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" -#include - -#ifdef FAST -#define AINIT RELU -#define SINIT 12 -#define N 1 -#else -#define AINIT NO_ACTIVATION -#define SINIT 0 -#define N 2 -#endif - -void operands(int c, int *a, int *b, int *d) { - *d = c % N; - *b = (c / N) % N; - *a = c / (N * N); -} - -#if 3 * N * DIM > (BANK_NUM * BANK_ROWS) || N * N * N * DIM > ACC_ROWS -// #error scratchpad or accumulator not big enough -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - static elem_t ZERO[DIM][DIM]; - - gemmini_flush(0); - gemmini_config_ld(DIM * sizeof(elem_t)); - - for (int activation = AINIT; activation <= RELU; ++activation) { -#ifdef ACC_SCALE_T_IS_FLOAT - for (acc_scale_t scale = 0; scale <= 1.5; scale += 0.5) { -#else - for (acc_scale_t scale = SINIT; scale <= 12; scale += 4) { -#endif - static elem_t A[N][DIM][DIM] row_align(1); - static elem_t B[N][DIM][DIM] row_align(1); - static elem_t D[N][DIM][DIM] row_align(1); - - // We will try out every combination of A, B, D possible - static elem_t C[N * N * N][DIM][DIM] row_align(1); - static full_t gold_full[N * N * N][DIM][DIM]; - static elem_t gold[N * N * N][DIM][DIM]; - - // ...taking into account whether we preload new weights or re-use the old - // ones - static int preload[N * N * N] = {1}; - for (int i = 1; i < N * N * N; ++i) - preload[i] = rand() % 2; - - // ...whether we pass in a D or just use zeros - static int add_to_zeros[N * N * N]; - for (int i = 0; i < N * N * N; ++i) - add_to_zeros[i] = rand() % 2; - - // ...and whether we accumulate on top of the previous result - static int accumulate[N * N * N] = {0}; - for (int i = 1; i < N * N * N; ++i) - accumulate[i] = rand() % 2; - - static int no_output[N * N * N]; - for (int i = 0; i < N * N * N - 1; ++i) - no_output[i] = accumulate[i + 1]; - no_output[N * N * N - 1] 
= 0; - - // Print the sequence out - /*printf("Preloads: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", preload[i]); - printf("\n"); - printf("Zeros: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", add_to_zeros[i]); - printf("\n"); - printf("Accumulates: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", accumulate[i]); - printf("\n"); - printf("No outputs: "); - for (int i = 0; i < N*N*N; ++i) - printf("%d, ", no_output[i]); - printf("\n");*/ - - for (size_t n = 0; n < N; ++n) { - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) { - A[n][i][j] = (rand() % 64) - 32; - B[n][i][j] = (rand() % 64) - 32; - D[n][i][j] = (rand() % 64) - 32; - } - } - } - - for (size_t g = 0; g < N * N * N; ++g) { - int a, b, d; - operands(g, &a, &b, &d); - - // We need to find the last B value in case we aren't preloading new - // weights - for (int last_g = g; last_g >= 0; --last_g) { - int tmp_a, tmp_d; - if (preload[last_g]) { - operands(last_g, &tmp_a, &b, &tmp_d); - break; - } - } - - if (add_to_zeros[g]) - matmul(A[a], B[b], ZERO, gold_full[g]); - else - matmul(A[a], B[b], D[d], gold_full[g]); - - if (accumulate[g]) - matadd(gold_full[g], gold_full[g - 1], gold_full[g]); - } - - for (size_t g = 0; g < N * N * N; ++g) { - matscale(gold_full[g], gold[g], scale); - if (activation == RELU) - matrelu(gold[g], gold[g]); - } - - uint32_t A_addr = 0; - uint32_t B_addr = N * DIM; - uint32_t D_addr = 2 * N * DIM; - uint32_t C_addr_acc = 1 << (ADDR_LEN - 1); - - // Calculate the proper destination addresses of everything - uint32_t C_addrs[N * N * N]; - for (size_t c = 0; c < N * N * N; ++c) - C_addrs[c] = C_addr_acc + c * DIM; - for (size_t c = 0; c < N * N * N; ++c) { - int last_c; - for (last_c = c; last_c >= 0; --last_c) - if (!accumulate[last_c]) - break; - if (c != last_c) - C_addrs[c] = C_addrs[last_c] | (1 << (ADDR_LEN - 2)); - } - - // printf("Moving in\n"); - for (size_t n = 0; n < N; ++n) - gemmini_mvin(A[n], A_addr + n * DIM); - - for 
(size_t n = 0; n < N; ++n) - gemmini_mvin(B[n], B_addr + n * DIM); - - for (size_t n = 0; n < N; ++n) - if (n == N - 1) { - gemmini_mvin(D[n], D_addr + n * DIM); - } else { - gemmini_mvin(D[n], D_addr + n * DIM); - } - - // printf("Setting mode\n"); - gemmini_config_ex(WEIGHT_STATIONARY, 0, 0); - gemmini_extended_config_st(DIM * sizeof(elem_t), activation, scale); - - // printf("Matmulling\n"); - for (size_t c = 0; c < N * N * N; ++c) { - int a, b, d; - operands(c, &a, &b, &d); - - uint32_t d_addr = D_addr + d * DIM; - if (add_to_zeros[c]) - d_addr = GARBAGE_ADDR; - - if (!preload[c]) { - gemmini_preload_zeros(C_addrs[c]); - gemmini_compute_accumulated(A_addr + a * DIM, d_addr); - } else { - gemmini_preload(B_addr + b * DIM, C_addrs[c]); - gemmini_compute_preloaded(A_addr + a * DIM, d_addr); - } - } - - // printf("Moving out\n"); - for (size_t c = 0; c < N * N * N; ++c) - if (!no_output[c]) { - gemmini_mvout(C[c], C_addrs[c] & ~(1 << (ADDR_LEN - 2))); - } - - gemmini_fence(); - - /*printf("Moved out\n"); - for (int n = 0; n < N*N*N; ++n) { - if (!no_output[n]) { - printf("C:\n"); - printMatrix(C[n]); - printf("Gold:\n"); - printMatrix(gold[n]); - printf("\n"); - } - }*/ - - // printf("Checking\n"); - for (int n = 0; n < N * N * N; ++n) - if (!no_output[n] && !is_equal(C[n], gold[n])) { - printf("activation: %d, scale: %d\n", activation, scale); - printf("Actual (%d):\n", n); - printMatrix(C[n]); - printf("\nGold:\n"); - printMatrix(gold[n]); - exit(1); - } - } - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/matrix_add.c b/bb-tests/workloads/src/CTest/gemmini/matrix_add.c deleted file mode 100644 index a54a719e..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/matrix_add.c +++ /dev/null @@ -1,72 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" -#include - -int main() { - elem_t A[DIM][DIM]; - elem_t B[DIM][DIM]; - elem_t C[DIM][DIM]; - elem_t gold[DIM][DIM]; - - for (size_t i = 0; i < DIM; i++) - for (size_t j = 0; j < DIM; j++) { - A[i][j] = (rand() % 16) - 8; - B[i][j] = (rand() % 16) - 8; - } - - for (int ascale = -2; ascale < 2; ascale++) { - for (int bscale = -2; bscale < 2; bscale++) { - for (size_t i = 0; i < DIM; i++) - for (size_t j = 0; j < DIM; j++) { - acc_t sum = MVIN_SCALE(A[i][j], ascale) + MVIN_SCALE(B[i][j], bscale); - gold[i][j] = sum > elem_t_max ? elem_t_max - : (sum < elem_t_min ? elem_t_min : sum); - } - - uint32_t A_acc_addr = 1 << (ADDR_LEN - 1); - uint32_t B_acc_addr = (1 << (ADDR_LEN - 1)) | (1 << (ADDR_LEN - 2)); - uint32_t C_acc_addr = 1 << (ADDR_LEN - 1); - - gemmini_extended2_config_ld(DIM * sizeof(elem_t), ascale, true); - gemmini_mvin(A, A_acc_addr); - - gemmini_extended2_config_ld(DIM * sizeof(elem_t), bscale, true); - gemmini_mvin(B, B_acc_addr); - - gemmini_config_ex(0, NO_ACTIVATION, 0); - gemmini_config_st(DIM * sizeof(elem_t)); - gemmini_mvout(C, C_acc_addr); - - gemmini_fence(); - - if (!is_equal(C, gold)) { - printf("Wrong (ascale: %d, bscale: %d)\n", ascale, bscale); - printf("\"C\" matrix:\n"); - printMatrix(C); - printf("\n"); - printf("\"Gold\" matrix:\n"); - printMatrix(gold); - printf("\n"); - printf("\"A\" matrix:\n"); - printMatrix(A); - printf("\n"); - printf("\"B\" matrix:\n"); - printMatrix(B); - printf("\n"); - printf("Wrong (ascale: %d, bscale: %d)\n", ascale, bscale); - exit(1); - } - } - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout.c deleted file mode 100644 index a93a258a..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout.c +++ /dev/null @@ -1,62 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define N 8 - -#if (N * DIM) > (BANK_NUM * BANK_ROWS) -#error not enough scratchpad space -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - // printf("Flush\n"); - gemmini_flush(0); - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - - static elem_t In[N][DIM][DIM] row_align(1); - static elem_t Out[N][DIM][DIM] row_align(1); - - for (size_t n = 0; n < N; ++n) - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) - In[n][i][j] = i * DIM + j + n; - - for (size_t n = 0; n < N; ++n) { - // printf("Mvin %d\n", n); - gemmini_mvin(In[n], n * DIM); - // printf("Mvout %d\n", n); - gemmini_mvout(Out[n], n * DIM); - } - - // printf("Fence"); - gemmini_fence(); - - for (size_t n = 0; n < N; ++n) - if (!is_equal(In[n], Out[n])) { - printf("Matrix %u:\n", n); - printMatrix(In[n]); - printf("Matrix %u output:\n", n); - printMatrix(Out[n]); - printf("\n"); - - exit(1); - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc.c deleted file mode 100644 index f3485e53..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc.c +++ /dev/null @@ -1,132 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifdef FAST -#define N 2 -#define AINIT RELU -#define SINIT 12 -#else -#define N 4 -#define AINIT NO_ACTIVATION -#define SINIT 12 -#endif - -#if (N * DIM) > ACC_ROWS -#error not enough accumulator space -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - for (int activation = AINIT; activation <= RELU; ++activation) { - for (int scale = SINIT; scale <= 12; scale += 4) { - // printf("activation: %d, scale: %d\n", activation, scale); - - static acc_t In[N][DIM][DIM] row_align_acc(1); - static full_t In_full[N][DIM][DIM]; - static elem_t Out[N][DIM][DIM] row_align(1); - static elem_t Out_gold[N][DIM][DIM]; - - // printf("Initializing matrices\n"); - for (size_t n = 0; n < N; ++n) - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - In[n][i][j] = 0; -#ifdef FAST -#define RAND (j + i) -#else -#define RAND rand() -#endif - int bytes = RAND % 2 ? sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - In[n][i][j] |= (RAND % 255) << (b * 8); - } -#else - acc_t_bits data; - - do { - data = 0; - - int bytes = rand() % 2 ? 
sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - data |= (uint64_t)(rand() % 255) << (b * 8); - } - - In[n][i][j] = acc_t_bits_to_acc_t(data); - } while (acc_t_isnan(In[n][i][j])); -#endif - - In_full[n][i][j] = In[n][i][j]; - } - - // printf("Shifting and activating matrices\n"); - for (size_t n = 0; n < N; ++n) { - matscale(In_full[n], Out_gold[n], scale); - - if (activation == RELU) - matrelu(Out_gold[n], Out_gold[n]); - } - - const uint32_t acc_addr = 1 << (ADDR_LEN - 1); - - // printf("Config\n"); - gemmini_config_ld(DIM * sizeof(acc_t)); - gemmini_config_ex(0, 0, 0); - gemmini_extended_config_st(DIM * sizeof(elem_t), activation, scale); - - // printf("Mvin and mvout\n"); - for (size_t n = 0; n < N; ++n) { - // printf("Mvin n: %u\n", n); - gemmini_mvin(In[n], acc_addr + n * DIM); - // printf("Mvout n: %u\n", n); - gemmini_mvout(Out[n], acc_addr + n * DIM); - } - - // printf("Fence\n"); - gemmini_fence(); - - // printf("Check\n"); - for (size_t n = 0; n < N; ++n) - if (!is_equal(Out[n], Out_gold[n])) { - printf("activation: %d, scale: %d\n", activation, scale); - - printf("Matrix %u:\n", n); - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", In[n][i][j]); -#else - printf("%llx ", acc_t_to_acc_t_bits(In[n][i][j])); -#endif - printf("\n"); - } - printf("Matrix %u output:\n", n); - printMatrix(Out[n]); - printf("Matrix %u gold output:\n", n); - printMatrix(Out_gold[n]); - printf("\n"); - - exit(1); - } - } - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_full.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_full.c deleted file mode 100644 index d0208af7..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_full.c +++ /dev/null @@ -1,119 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define N 4 - -#if (N * DIM) > ACC_ROWS -#error not enough accumulator space -#endif - -int main() { -#ifdef ACC_READ_FULL_WIDTH - -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static acc_t In[N][DIM][DIM] row_align_acc(1); - static acc_t Out[N][DIM][DIM] row_align_acc(1); - - // printf("Initializing matrices\n"); - for (size_t n = 0; n < N; ++n) - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - In[n][i][j] = 0; -#ifdef FAST -#define RAND (j + i) -#else -#define RAND rand() -#endif - - int bytes = RAND % 2 ? sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - In[n][i][j] |= (RAND % 255) << (b * 8); - } -#else - acc_t_bits data; - - do { - data = 0; - - int bytes = rand() % 2 ? sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - data |= (uint64_t)(rand() % 255) << (b * 8); - } - - In[n][i][j] = acc_t_bits_to_acc_t(data); - } while (acc_t_isnan(In[n][i][j])); -#endif - } - - const uint32_t acc_addr = 5 << (ADDR_LEN - 3); - - // printf("Config\n"); - gemmini_config_ld(DIM * sizeof(acc_t)); - gemmini_config_ex(0, NO_ACTIVATION, 0); - gemmini_config_st(DIM * sizeof(acc_t)); - - // printf("Mvin and mvout\n"); - for (size_t n = 0; n < N; ++n) { - // printf("Mvin n: %u\n", n); - gemmini_mvin(In[n], acc_addr + n * DIM); - // printf("Mvout n: %u\n", n); - gemmini_mvout(Out[n], acc_addr + n * DIM); - } - - // printf("Fence\n"); - gemmini_fence(); - - // printf("Check\n"); - for (size_t n = 0; n < N; ++n) - if (!MAT_IS_EQUAL(DIM, DIM, Out[n], In[n])) { - printf("Matrix %u:\n", n); - - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", In[n][i][j]); -#else - printf("%llx ", 
acc_t_to_acc_t_bits(In[n][i][j])); -#endif - printf("\n"); - } - - printf("\nMatrix %u output:\n", n); - - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", Out[n][i][j]); -#else - printf("%llx ", acc_t_to_acc_t_bits(Out[n][i][j])); -#endif - printf("\n"); - } - - printf("\n"); - - exit(1); - } - -#endif // #ifdef ACC_READ_FULL_WIDTH - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_full_stride.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_full_stride.c deleted file mode 100644 index 60ad293b..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_full_stride.c +++ /dev/null @@ -1,177 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifdef FAST -#define BIG_DIM (DIM * 2) -#define BINIT MIN(MAX_BLOCK_LEN_ACC, BIG_DIM / DIM) -#else -#define BIG_DIM 64 -#define BINIT 1 -#endif - -#if (BIG_DIM % DIM) != 0 -#error incorrect dimensions -#endif - -#if (BIG_DIM * BIG_DIM / DIM) > ACC_ROWS -#error not enough accumulator space -#endif - -#define MIN(a, b) ((a > b) ? 
b : a) - -int is_equal_big(acc_t x[BIG_DIM][BIG_DIM], acc_t y[BIG_DIM][BIG_DIM]) { - for (size_t i = 0; i < BIG_DIM; ++i) - for (size_t j = 0; j < BIG_DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - if (x[i][j] != y[i][j]) { -#else - bool isnanx = acc_t_isnan(x[i][j]); - bool isnany = acc_t_isnan(y[i][j]); - - if (x[i][j] != y[i][j] && !(isnanx && isnany)) { - printf("x[i][j] == %x\n", acc_t_to_acc_t_bits(x[i][j])); - printf("y[i][j] == %x\n", acc_t_to_acc_t_bits(y[i][j])); - -#endif - printf("Unequal in row %u and column %u\n", i, j); - return 0; - } - } - return 1; -} - -void printMatrix_acc_big(acc_t m[BIG_DIM][BIG_DIM]) { - for (size_t i = 0; i < BIG_DIM; ++i) { - for (size_t j = 0; j < BIG_DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", m[i][j]); -#else - printf("%llx ", acc_t_to_acc_t_bits(m[i][j])); -#endif - printf("\n"); - } -} - -int main() { -#ifdef ACC_READ_FULL_WIDTH - -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - for (int block_len = BINIT; - block_len <= BIG_DIM / DIM && block_len <= MAX_BLOCK_LEN_ACC; - block_len++) { - static acc_t In[BIG_DIM][BIG_DIM] row_align_acc(1); - static acc_t Out[BIG_DIM][BIG_DIM] row_align(1); - - for (size_t i = 0; i < BIG_DIM; ++i) { - for (size_t j = 0; j < BIG_DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - In[i][j] = 0; -#ifdef FAST -#define RAND (j + i) -#else -#define RAND rand() -#endif - int bytes = RAND % 2 ? sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - In[i][j] |= (RAND % 255) << (b * 8); - } -#else - acc_t_bits data; - - do { - data = 0; - - int bytes = rand() % 2 ? 
sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - data |= (uint64_t)(rand() % 255) << (b * 8); - } - - In[i][j] = acc_t_bits_to_acc_t(data); - } while (acc_t_isnan(In[i][j])); -#endif - } - } - - const uint32_t acc_addr = 5 << (ADDR_LEN - 3); - - gemmini_config_ld(BIG_DIM * sizeof(acc_t)); - gemmini_config_ex(0, NO_ACTIVATION, 0); - gemmini_config_st(BIG_DIM * sizeof(acc_t)); - - for (size_t i = 0; i < BIG_DIM; i += DIM) { - for (size_t j = 0; j < BIG_DIM; j += DIM) { - // printf("i: %u, j: %u\n", i, j); - - acc_t *dram_addr_in = &In[i][j]; - acc_t *dram_addr_out = &Out[i][j]; - uint32_t sp_addr = acc_addr + i * (BIG_DIM / DIM) + j; - - int already_moved_in = (j / DIM) % block_len != 0; - - if (!already_moved_in) { - int len = - j + block_len * DIM <= BIG_DIM ? block_len : (BIG_DIM - j) / DIM; - // printf("Moving in with len: %d\n", len); - gemmini_block_mvin(dram_addr_in, sp_addr, len); - gemmini_mvout(dram_addr_out, sp_addr); - } else { - // printf("Already moved in\n"); - gemmini_mvout(dram_addr_out, sp_addr); - } - } - } - - // printf("Fence\n"); - gemmini_fence(); - - // printf("Check\n"); - if (!is_equal_big(Out, In)) { - /*printf("Out:\n"); - for (size_t i = 0; i < BIG_DIM; i++) { - for (size_t j = 0; j < BIG_DIM; j++) { - printf("%d, ", Out[i][j]); - } - printf("\n"); - } - - printf("\n"); - - printf("Gold:\n"); - for (size_t i = 0; i < BIG_DIM; i++) { - for (size_t j = 0; j < BIG_DIM; j++) { - printf("%d, ", Out[i][j]); - } - printf("\n"); - }*/ - - printf("Matrix:\n"); - printMatrix_acc_big(In); - printf("Matrix output:\n"); - printMatrix_acc_big(Out); - printf("\n"); - - exit(1); - } - } - -#endif // #ifdef ACC_READ_FULL_WIDTH - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_stride.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_stride.c deleted file mode 100644 index 11e56285..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_stride.c +++ /dev/null @@ -1,235 +0,0 @@ -// See 
LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define MIN(a, b) ((a > b) ? b : a) - -#ifdef FAST -#define BIG_DIM (DIM * 2) -#define BINIT MIN(MAX_BLOCK_LEN_ACC, BIG_DIM / DIM) -#define AINIT 2 -#define SINIT 12 -#else -#define BIG_DIM 64 -#define BINIT 1 -#define AINIT 0 -#define SINIT 0 -#endif - -#if (BIG_DIM % DIM) != 0 -#error incorrect dimensions -#endif - -#if (BIG_DIM * BIG_DIM / DIM) > ACC_ROWS -#error not enough accumulator space -#endif - -int is_equal_big(elem_t x[BIG_DIM][BIG_DIM], elem_t y[BIG_DIM][BIG_DIM]) { - for (size_t i = 0; i < BIG_DIM; ++i) - for (size_t j = 0; j < BIG_DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - if (x[i][j] != y[i][j]) { -#else - bool isnanx = elem_t_isnan(x[i][j]); - bool isnany = elem_t_isnan(y[i][j]); - - if (x[i][j] != y[i][j] && !(isnanx && isnany)) { - printf("x[i][j] == %x\n", elem_t_to_elem_t_bits(x[i][j])); - printf("y[i][j] == %x\n", elem_t_to_elem_t_bits(y[i][j])); - -#endif - printf("Unequal in row %u and column %u\n", i, j); - return 0; - } - } - return 1; -} - -void matscale_big(full_t full[BIG_DIM][BIG_DIM], elem_t out[BIG_DIM][BIG_DIM], - acc_scale_t scale) { - for (size_t r = 0; r < BIG_DIM; r++) - for (size_t c = 0; c < BIG_DIM; c++) { - // Scale element - full_t scaled = ACC_SCALE(full[r][c], scale); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = scaled > elem_t_max - ? elem_t_max - : (scaled < elem_t_min ? elem_t_min : scaled); - out[r][c] = elem; -#else - out[r][c] = scaled; // TODO should we also saturate when using floats? -#endif - } -} - -void matrelu_big(elem_t in[BIG_DIM][BIG_DIM], elem_t out[BIG_DIM][BIG_DIM]) { - for (size_t r = 0; r < BIG_DIM; r++) - for (size_t c = 0; c < BIG_DIM; c++) - out[r][c] = in[r][c] > 0 ? 
in[r][c] : 0; -} - -void printMatrix_big(elem_t m[BIG_DIM][BIG_DIM]) { - for (size_t i = 0; i < BIG_DIM; ++i) { - for (size_t j = 0; j < BIG_DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", m[i][j]); -#else - printf("%x ", elem_t_to_elem_t_bits(m[i][j])); -#endif - printf("\n"); - } -} - -void printMatrix_acc_big(acc_t m[BIG_DIM][BIG_DIM]) { - for (size_t i = 0; i < BIG_DIM; ++i) { - for (size_t j = 0; j < BIG_DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", m[i][j]); -#else - printf("%llx ", acc_t_to_acc_t_bits(m[i][j])); -#endif - printf("\n"); - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - for (int block_len = BINIT; - block_len <= BIG_DIM / DIM && block_len <= MAX_BLOCK_LEN_ACC; - block_len++) { - for (int activation = AINIT; activation <= RELU; ++activation) { - for (int scale = SINIT; scale <= 12; scale += 4) { - // printf("block_len: %d, activation: %d, scale: %d\n", block_len, - // activation, scale); - - static acc_t In[BIG_DIM][BIG_DIM] row_align_acc(1); - static full_t In_full[BIG_DIM][BIG_DIM]; - static elem_t Out[BIG_DIM][BIG_DIM] row_align(1); - static elem_t Out_gold[BIG_DIM][BIG_DIM]; - - for (size_t i = 0; i < BIG_DIM; ++i) { - for (size_t j = 0; j < BIG_DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - In[i][j] = 0; -#ifdef FAST -#define RAND (j + i) -#else -#define RAND rand() -#endif - int bytes = RAND % 2 ? sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - In[i][j] |= (RAND % 255) << (b * 8); - } -#else - acc_t_bits data; - - do { - data = 0; - - int bytes = rand() % 2 ? 
sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - data |= (uint64_t)(rand() % 255) << (b * 8); - } - - In[i][j] = acc_t_bits_to_acc_t(data); - } while (acc_t_isnan(In[i][j])); -#endif - - In_full[i][j] = In[i][j]; - } - } - - matscale_big(In_full, Out_gold, scale); - - if (activation == RELU) - matrelu_big(Out_gold, Out_gold); - - const uint32_t acc_addr = 1 << (ADDR_LEN - 1); - - gemmini_config_ld(BIG_DIM * sizeof(acc_t)); - gemmini_config_ex(0, 0, 0); - gemmini_extended_config_st(BIG_DIM * sizeof(elem_t), activation, scale); - - for (size_t i = 0; i < BIG_DIM; i += DIM) { - for (size_t j = 0; j < BIG_DIM; j += DIM) { - // printf("i: %u, j: %u\n", i, j); - - acc_t *dram_addr_in = &In[i][j]; - elem_t *dram_addr_out = &Out[i][j]; - uint32_t sp_addr = acc_addr + i * (BIG_DIM / DIM) + j; - - int already_moved_in = (j / DIM) % block_len != 0; - - if (!already_moved_in) { - int len = j + block_len * DIM <= BIG_DIM ? block_len - : (BIG_DIM - j) / DIM; - // printf("Moving in with len: %d\n", len); - gemmini_block_mvin(dram_addr_in, sp_addr, len); - gemmini_mvout(dram_addr_out, sp_addr); - } else { - // printf("Already moved in\n"); - gemmini_mvout(dram_addr_out, sp_addr); - } - } - } - - // printf("Fence\n"); - gemmini_fence(); - - // printf("Check\n"); - if (!is_equal_big(Out, Out_gold)) { - printf("block_len: %d, activation: %d, scale: %d\n", block_len, - activation, scale); - - /*printf("Out:\n"); - for (size_t i = 0; i < BIG_DIM; i++) { - for (size_t j = 0; j < BIG_DIM; j++) { - printf("%d, ", Out[i][j]); - } - printf("\n"); - } - - printf("\n"); - - printf("Gold:\n"); - for (size_t i = 0; i < BIG_DIM; i++) { - for (size_t j = 0; j < BIG_DIM; j++) { - printf("%d, ", Out[i][j]); - } - printf("\n"); - }*/ - - printf("Matrix:\n"); - printMatrix_acc_big(In); - printf("Matrix output:\n"); - printMatrix_big(Out); - printf("Matrix gold output:\n"); - printMatrix_big(Out_gold); - printf("\n"); - - exit(1); - } - } - } - } - - exit(0); -} diff --git 
a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_zero_stride.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_zero_stride.c deleted file mode 100644 index bcb27b1e..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_acc_zero_stride.c +++ /dev/null @@ -1,104 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define N 4 - -#if (N * DIM) > ACC_ROWS -#error not enough accumulator space -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static acc_t In[N][DIM] row_align_acc(1); - static elem_t Out[N][DIM][DIM] row_align(1); - static elem_t Out_gold[N][DIM][DIM]; - - for (size_t n = 0; n < N; ++n) - for (size_t j = 0; j < DIM; ++j) { -#ifndef ELEM_T_IS_FLOAT - In[n][j] = 0; - - int bytes = rand() % 2 ? sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - In[n][j] |= (rand() % 255) << (b * 8); - } -#else - acc_t_bits data; - - do { - data = 0; - - int bytes = rand() % 2 ? 
sizeof(acc_t) : sizeof(elem_t); - for (size_t b = 0; b < bytes; ++b) { - data |= (uint64_t)(rand() % 255) << (b * 8); - } - - In[n][j] = acc_t_bits_to_acc_t(data); - } while (acc_t_isnan(In[n][j])); -#endif - } - - for (size_t n = 0; n < N; ++n) - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) { - Out_gold[n][i][j] = ACC_SCALE(In[n][j], ACC_SCALE_IDENTITY); - } - - const uint32_t acc_addr = 1 << (ADDR_LEN - 1); - - // printf("Config\n"); - gemmini_config_ld(0); - gemmini_config_ex(0, NO_ACTIVATION, 0); - gemmini_config_st(DIM * sizeof(elem_t)); - - // printf("Mvin and mvout\n"); - for (size_t n = 0; n < N; ++n) { - // printf("Mvin n: %u\n", n); - gemmini_mvin(In[n], acc_addr + n * DIM); - // printf("Mvout n: %u\n", n); - gemmini_mvout(Out[n], acc_addr + n * DIM); - } - - // printf("Fence\n"); - gemmini_fence(); - - // printf("Check\n"); - for (size_t n = 0; n < N; ++n) - if (!is_equal(Out[n], Out_gold[n])) { - printf("Matrix %u:\n", n); - for (size_t j = 0; j < DIM; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", In[n][j]); -#else - printf("%llx ", acc_t_to_acc_t_bits(In[n][j])); -#endif - printf("\n"); - - printf("Matrix %u output:\n", n); - printMatrix(Out[n]); - printf("Matrix %u gold output:\n", n); - printMatrix(Out_gold[n]); - printf("\n"); - - exit(1); - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_block_stride.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_block_stride.c deleted file mode 100644 index d014b32c..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_block_stride.c +++ /dev/null @@ -1,71 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define BLOCK_STRIDE (DIM * 2) -#define COLS (DIM * MAX_BLOCK_LEN) - -void printMatrixFull(elem_t m[DIM][COLS]) { - for (size_t i = 0; i < DIM; ++i) { - for (size_t j = 0; j < COLS; ++j) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", m[i][j]); -#else - printf("%x ", elem_t_to_elem_t_bits(m[i][j])); -#endif - printf("\n"); - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - static elem_t In[DIM][COLS] row_align(1); - static elem_t Out[DIM][COLS] row_align(1); - - // printf("Flush\n"); - gemmini_flush(0); - gemmini_extended4_config_ld(COLS * sizeof(elem_t), MVIN_SCALE_IDENTITY, false, - BLOCK_STRIDE, 0); - gemmini_config_st(COLS * sizeof(elem_t)); - - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < COLS; ++j) - In[i][j] = i * COLS + j; - - gemmini_block_mvin(In, 0, MAX_BLOCK_LEN); - - gemmini_fence(); - - for (size_t n = 0; n < MAX_BLOCK_LEN; ++n) { - gemmini_mvout((elem_t *)Out + n * DIM, n * BLOCK_STRIDE); - } - - // printf("Fence"); - gemmini_fence(); - - if (!MAT_IS_EQUAL(DIM, COLS, In, Out)) { - printf("Matrix:\n"); - printMatrixFull(In); - printf("\nMatrix output:\n"); - printMatrixFull(Out); - printf("\n"); - - exit(1); - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_stride.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_stride.c deleted file mode 100644 index 3b88b58d..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_stride.c +++ /dev/null @@ -1,116 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define MIN(a, b) ((a > b) ? 
b : a) -#ifdef FAST -#define BIG_DIM (DIM * 2) -#define BINIT MIN(MAX_BLOCK_LEN_ACC, BIG_DIM / DIM) -#else -#define BIG_DIM 64 -#define BINIT 1 -#endif - -#if (BIG_DIM % DIM) != 0 -#error incorrect dimensions -#endif - -#if (BIG_DIM * BIG_DIM / DIM) > (BANK_ROWS * BANK_NUM) -#error not enough rows -#endif - -int is_equal_big(elem_t x[BIG_DIM][BIG_DIM], elem_t y[BIG_DIM][BIG_DIM]) { - for (size_t i = 0; i < BIG_DIM; ++i) - for (size_t j = 0; j < BIG_DIM; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -void printMatrix_big(elem_t m[BIG_DIM][BIG_DIM]) { - for (size_t i = 0; i < BIG_DIM; ++i) { - for (size_t j = 0; j < BIG_DIM; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - for (int block_len = BINIT; - block_len <= BIG_DIM / DIM && block_len <= MAX_BLOCK_LEN; block_len++) { - // printf("block_len: %d\n", block_len); - - static elem_t In[BIG_DIM][BIG_DIM] row_align(1); - static elem_t Out[BIG_DIM][BIG_DIM] row_align(1); - - for (size_t i = 0; i < BIG_DIM; ++i) - for (size_t j = 0; j < BIG_DIM; ++j) { -#ifdef FAST -#define RAND (j + i) -#else -#define RAND rand() -#endif - In[i][j] = (RAND % 64) - 32; // i*BIG_DIM + j; - Out[i][j] = 0; - } - - gemmini_config_ld(BIG_DIM * sizeof(elem_t)); - gemmini_config_st(BIG_DIM * sizeof(elem_t)); - - for (size_t i = 0; i < BIG_DIM; i += DIM) { - for (size_t j = 0; j < BIG_DIM; j += DIM) { - // printf("i: %u, j: %u\n", i, j); - - elem_t *dram_addr_in = &In[i][j]; - elem_t *dram_addr_out = &Out[i][j]; - uint32_t sp_addr = i * (BIG_DIM / DIM) + j; - - int already_moved_in = (j / DIM) % block_len != 0; - - if (!already_moved_in) { - int len = - j + block_len * DIM <= BIG_DIM ? 
block_len : (BIG_DIM - j) / DIM; - // printf("Moving in with len: %d\n", len); - gemmini_block_mvin(dram_addr_in, sp_addr, len); - gemmini_mvout(dram_addr_out, sp_addr); - } else { - // printf("Already moved in, so moving out\n"); - gemmini_mvout(dram_addr_out, sp_addr); - } - } - } - - gemmini_fence(); - - if (!is_equal_big(In, Out)) { - printf("fails at block_len: %d\n", block_len); - - // printf("Matrix output:\n"); - // printMatrix_big(Out); - // printf("Matrix gold:\n"); - // printMatrix_big(In); - // printf("\n"); - - exit(1); - } - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_zeros.c b/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_zeros.c deleted file mode 100644 index f7272065..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_mvout_zeros.c +++ /dev/null @@ -1,52 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - // printf("Flush\n"); - gemmini_flush(0); - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - - static elem_t Out[DIM][DIM] row_align(1); - - // printf("Mvin %d\n", n); - gemmini_mvin(NULL, 0); - // printf("Mvout %d\n", n); - gemmini_mvout(Out, 0); - - // printf("Fence"); - gemmini_fence(); - - bool success = true; - - for (size_t i = 0; i < DIM; i++) - for (size_t j = 0; j < DIM; j++) - if (Out[i][j] != 0) - success = false; - - if (!success) { - printf("Matrix:\n"); - printMatrix(Out); - printf("\n"); - - exit(1); - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/mvin_scale.c b/bb-tests/workloads/src/CTest/gemmini/mvin_scale.c deleted file mode 100644 index 941328b3..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/mvin_scale.c +++ /dev/null @@ -1,126 +0,0 @@ -// See LICENSE for license 
details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define N 8 - -#if (N * DIM) > (BANK_NUM * BANK_ROWS) -#error not enough scratchpad space -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - // printf("Flush\n"); - gemmini_flush(0); - -#ifdef HAS_MVIN_SCALE - elem_t In[N][DIM][DIM] row_align(1); - elem_t Out[N][DIM][DIM] row_align(1); - - for (size_t n = 0; n < N; ++n) - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) - In[n][i][j] = i * DIM + j + n; - - gemmini_config_st(DIM * sizeof(elem_t)); - - for (int n = 0; n < N; ++n) { - gemmini_extended_config_ld(DIM * sizeof(elem_t), n); - gemmini_mvin(In[n], n * DIM); - gemmini_mvout(Out[n], n * DIM); - } - - gemmini_fence(); - - for (int n = 0; n < N; ++n) { - bool is_correct = true; - - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) { - if (Out[n][i][j] != (elem_t)(MVIN_SCALE(In[n][i][j], n))) { - is_correct = false; - break; - } - } - - if (!is_correct) { - printf("Matrix %u:\n", n); - printMatrix(In[n]); - printf("Matrix %u output:\n", n); - printMatrix(Out[n]); - printf("\n"); - printf("Scale: %d", n); - - exit(1); - } - } -#endif - -#ifdef HAS_MVIN_ACC_SCALE - acc_t In_acc[N][DIM][DIM] row_align_acc(1); - elem_t Out_acc[N][DIM][DIM] row_align(1); - - const uint64_t acc_addr = (uint64_t)(1) << (ADDR_LEN - 1); - - for (size_t n = 0; n < N; ++n) - for (size_t i = 0; i < DIM; ++i) - for (size_t j = 0; j < DIM; ++j) - In_acc[n][i][j] = i * DIM + j + n; - - gemmini_config_st(DIM * sizeof(elem_t)); - - for (int n = 0; n < N; ++n) { - gemmini_extended_config_ld(DIM * sizeof(acc_t), (n + 1)); - gemmini_mvin(In_acc[n], acc_addr | (n * DIM)); - gemmini_mvout(Out_acc[n], acc_addr | (n * DIM)); - } - - gemmini_fence(); - - for (int n = 0; n < N; ++n) { - bool is_correct = true; - - for (size_t i = 0; i 
< DIM; ++i) - for (size_t j = 0; j < DIM; ++j) { - // acc_t gold = (n+1) * In_acc[n][i][j]; - acc_t gold = ACC_SCALE(In_acc[n][i][j], n + 1); - if (gold > elem_t_max) { - gold = elem_t_max; - } else if (gold < elem_t_min) { - gold = elem_t_min; - } - - if (Out_acc[n][i][j] != gold) { - is_correct = false; - break; - } - } - - if (!is_correct) { - printf("Accumulator matrix %u:\n", n); - printMatrixAcc(In_acc[n]); - printf("Accumulator matrix %u output:\n", n); - printMatrix(Out_acc[n]); - printf("\n"); - - exit(1); - } - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/padded.c b/bb-tests/workloads/src/CTest/gemmini/padded.c deleted file mode 100644 index 2977d5dd..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/padded.c +++ /dev/null @@ -1,190 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -void print_matrix(size_t rows, size_t cols, elem_t mat[rows][cols]) { - for (size_t r = 0; r < rows; r++) { - for (size_t c = 0; c < cols; c++) -#ifndef ELEM_T_IS_FLOAT - printf("%d ", mat[r][c]); -#else - printf("%x ", elem_t_to_elem_t_bits(mat[r][c])); -#endif - printf("\n"); - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - // Test padded mvins - { - const size_t rows = rand() % (DIM - 1) + 1; - const size_t cols = rand() % (DIM - 1) + 1; - elem_t input[rows][cols]; - elem_t output[DIM][DIM]; - - for (size_t r = 0; r < rows; r++) - for (size_t c = 0; c < cols; c++) -#ifndef ELEM_T_IS_FLOAT - input[r][c] = rand() % elem_t_max; -#else - input[r][c] = rand_double(); -#endif - - const size_t sp_addr = 0; - - gemmini_config_ld(cols * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - - gemmini_extended_mvin(input, sp_addr, cols, rows); - gemmini_mvout(output, sp_addr); - gemmini_fence(); - - for (size_t r = 0; r < 
rows; r++) - for (size_t c = 0; c < cols; c++) - if (input[r][c] != output[r][c]) { - printf("Matrices don't match!\n"); - - printf("input:\n"); - print_matrix(rows, cols, input); - - printf("output:\n"); - printMatrix(output); - - exit(1); - } - } - - // Test padded mvins and padded mvouts - { - const size_t rows = rand() % (DIM - 1) + 1; - const size_t cols = rand() % (DIM - 1) + 1; - elem_t input[rows][cols]; - elem_t output[rows][cols]; - - for (size_t r = 0; r < rows; r++) - for (size_t c = 0; c < cols; c++) -#ifndef ELEM_T_IS_FLOAT - input[r][c] = rand() % elem_t_max; -#else - input[r][c] = rand_double(); -#endif - - const size_t sp_addr = 0; - - gemmini_config_ld(cols * sizeof(elem_t)); - gemmini_config_st(cols * sizeof(elem_t)); - - gemmini_extended_mvin(input, sp_addr, cols, rows); - gemmini_extended_mvout(output, sp_addr, cols, rows); - gemmini_fence(); - - for (size_t r = 0; r < rows; r++) - for (size_t c = 0; c < cols; c++) - if (input[r][c] != output[r][c]) { - printf("Matrices don't match!\n"); - - printf("input:\n"); - print_matrix(rows, cols, input); - - printf("output:\n"); - print_matrix(rows, cols, output); - - exit(1); - } - } - - // Test padded matmuls - for (int dataflow = 0; dataflow <= 1; dataflow++) { - const size_t I = rand() % (DIM - 1) + 1; - const size_t J = rand() % (DIM - 1) + 1; - const size_t K = rand() % (DIM - 1) + 1; - elem_t A[I][K]; - elem_t B[K][J]; - elem_t D[I][J]; - elem_t C[I][J]; - elem_t gold[I][J]; - - for (size_t i = 0; i < I; i++) - for (size_t k = 0; k < K; k++) - A[i][k] = rand() % 5; - - for (size_t k = 0; k < K; k++) - for (size_t j = 0; j < J; j++) - B[k][j] = rand() % 5; - - for (size_t i = 0; i < I; i++) - for (size_t j = 0; j < J; j++) - D[i][j] = rand() % 5; - - for (size_t i = 0; i < I; i++) - for (size_t j = 0; j < J; j++) { - acc_t result = D[i][j]; - for (size_t k = 0; k < K; k++) - result += A[i][k] * B[k][j]; - - gold[i][j] = result < elem_t_min - ? elem_t_min - : (result > elem_t_max ? 
elem_t_max : result); - } - - const size_t A_sp_addr = 0; - const size_t B_sp_addr = DIM; - const size_t D_sp_addr = 2 * DIM; - const size_t C_sp_addr = 3 * DIM; - - gemmini_config_ex(dataflow, NO_ACTIVATION, 0); - gemmini_config_st(J * sizeof(elem_t)); - - gemmini_config_ld(K * sizeof(elem_t)); - gemmini_extended_mvin(A, A_sp_addr, K, I); - - gemmini_config_ld(J * sizeof(elem_t)); - gemmini_extended_mvin(B, B_sp_addr, J, K); - - gemmini_config_ld(J * sizeof(elem_t)); - gemmini_extended_mvin(D, D_sp_addr, J, I); - - if (dataflow == OUTPUT_STATIONARY) { - gemmini_extended_preload(D_sp_addr, C_sp_addr, J, I, J, I); - gemmini_extended_compute_preloaded(A_sp_addr, B_sp_addr, K, I, J, K); - } else { - gemmini_extended_preload(B_sp_addr, C_sp_addr, J, K, J, I); - gemmini_extended_compute_preloaded(A_sp_addr, D_sp_addr, K, I, J, I); - } - - gemmini_extended_mvout(C, C_sp_addr, J, I); - - gemmini_fence(); - - for (size_t r = 0; r < I; r++) - for (size_t c = 0; c < J; c++) - if (C[r][c] != gold[r][c]) { - printf("Matrices don't match! (dataflow == %d)\n", dataflow); - - printf("C:\n"); - print_matrix(I, J, C); - - printf("gold:\n"); - print_matrix(I, J, gold); - - exit(1); - } - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/raw_hazard.c b/bb-tests/workloads/src/CTest/gemmini/raw_hazard.c deleted file mode 100644 index b4b57f4d..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/raw_hazard.c +++ /dev/null @@ -1,132 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#if BANK_NUM * BANK_ROWS < 5 * DIM -#error need more memory capacity -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - - const int a_additions = 10; - const int b_additions = 10; - const int d_additions = 10; - - static elem_t IDENTITY[DIM][DIM] row_align(1); - - static elem_t result_A[DIM][DIM] row_align(1); - static elem_t result_B[DIM][DIM] row_align(1); - static elem_t result_D[DIM][DIM] row_align(1); - - static elem_t gold_A[DIM][DIM]; - static elem_t gold_B[DIM][DIM]; - static elem_t gold_D[DIM][DIM]; - - for (size_t i = 0; i < DIM; i++) { - for (size_t j = 0; j < DIM; j++) { - IDENTITY[i][j] = i == j; - gold_A[i][j] = i == j ? (a_additions + 1) : 0; - gold_B[i][j] = i == j ? (b_additions + 1) : 0; - gold_D[i][j] = i == j ? 
(d_additions + 1) : 0; - } - } - - int IDENTITY1_addr = 0; - int IDENTITY2_addr = DIM; - int A_addr = 2 * DIM; - int B_addr = 3 * DIM; - int D_addr = 4 * DIM; - - // printf("Moving in\n"); - gemmini_mvin(IDENTITY, IDENTITY1_addr); - gemmini_mvin(IDENTITY, IDENTITY2_addr); - gemmini_mvin(IDENTITY, A_addr); - gemmini_mvin(IDENTITY, B_addr); - gemmini_mvin(IDENTITY, D_addr); - - // printf("Setting mode\n"); - gemmini_config_ex(OUTPUT_STATIONARY, 0, 0); - - // printf("RAW with A\n"); - for (int i = 0; i < a_additions; i++) { - // printf(" %d\n", i); - - gemmini_preload(IDENTITY1_addr, A_addr); - gemmini_compute_preloaded(A_addr, IDENTITY2_addr); - } - - // printf("RAW with B\n"); - for (int i = 0; i < b_additions; i++) { - gemmini_preload(IDENTITY1_addr, B_addr); - gemmini_compute_preloaded(IDENTITY2_addr, B_addr); - } - - // printf("RAW with D\n"); - for (int i = 0; i < d_additions; i++) { - gemmini_preload(D_addr, D_addr); - gemmini_compute_preloaded(IDENTITY1_addr, IDENTITY2_addr); - } - - // printf("Moving out A\n"); - gemmini_mvout(result_A, A_addr); - // printf("Moving out B\n"); - gemmini_mvout(result_B, B_addr); - // printf("Moving out D\n"); - gemmini_mvout(result_D, D_addr); - - // printf("Fencing\n"); - gemmini_fence(); - - // printf("Checking\n"); - int fail = 0; - - if (!is_equal(result_A, gold_A)) { - printf("A:\n"); - printMatrix(result_A); - printf("\n"); - // printMatrix(gold_A); - // printf("\n"); - - fail = 1; - } - - if (!is_equal(result_B, gold_B)) { - printf("B:\n"); - printMatrix(result_B); - printf("\n"); - // printMatrix(gold_B); - // printf("\n"); - - fail = 1; - } - - if (!is_equal(result_D, gold_D)) { - printf("D:\n"); - printMatrix(result_D); - printf("\n"); - // printMatrix(gold_D); - // printf("\n"); - - fail = 1; - } - - exit(fail); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/resadd.c b/bb-tests/workloads/src/CTest/gemmini/resadd.c deleted file mode 100644 index e5d462d7..00000000 --- 
a/bb-tests/workloads/src/CTest/gemmini/resadd.c +++ /dev/null @@ -1,105 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 0 // 1 - -#ifndef BAREMETAL - -#define MAT_DIM_I 128 -#define MAT_DIM_J 512 - -#else -#define MAT_DIM_I 35 -#define MAT_DIM_J 27 -#endif - -#define A_SCALE 2 -#define B_SCALE MVIN_SCALE_IDENTITY -#define C_SCALE ACC_SCALE_IDENTITY -#define USE_RELU true - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - printf("I: %d, J: %d\n", MAT_DIM_I, MAT_DIM_J); - - gemmini_flush(0); - - static elem_t A[MAT_DIM_I][MAT_DIM_J] row_align(1); - static elem_t B[MAT_DIM_I][MAT_DIM_J] row_align(1); - static elem_t C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - // printf("Init A and B\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - A[i][j] = (rand() % 64) - 32; - B[i][j] = (rand() % 8) - 4; - } - } - - printf("Starting slow CPU resadd\n"); - unsigned long cpu_start = read_cycles(); - resadd_cpu(MAT_DIM_I, MAT_DIM_J, A_SCALE, B_SCALE, C_SCALE, (elem_t *)A, - (elem_t *)B, (elem_t *)gold, USE_RELU); - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); -#endif - - printf("Starting gemmini resadd\n"); - unsigned long start = read_cycles(); - tiled_resadd_auto(MAT_DIM_I, MAT_DIM_J, 
A_SCALE, B_SCALE, C_SCALE, - (elem_t *)A, (elem_t *)B, (elem_t *)C, USE_RELU, WS); - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(C, gold)) { - printf("C:\n"); - full_printMatrix(C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("A:\n"); - full_printMatrix(A); - printf("B:\n"); - full_printMatrix(B); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/resadd_stride.c b/bb-tests/workloads/src/CTest/gemmini/resadd_stride.c deleted file mode 100644 index 94056538..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/resadd_stride.c +++ /dev/null @@ -1,108 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#ifndef BAREMETAL - -#define MAT_DIM_I 128 -#define MAT_DIM_J 251 - -#else -#define MAT_DIM_I 54 -#define MAT_DIM_J 27 -#endif - -#define J_STRIDE (MAT_DIM_J + 5) - -#define A_SCALE 2 -#define B_SCALE MVIN_SCALE_IDENTITY -#define C_SCALE ACC_SCALE_IDENTITY -#define USE_RELU true - -void full_printMatrix(elem_t m[MAT_DIM_I][J_STRIDE]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][J_STRIDE], - elem_t y[MAT_DIM_I][J_STRIDE]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - printf("I: %d, J: %d\n", MAT_DIM_I, MAT_DIM_J); - - gemmini_flush(0); - - static elem_t A[MAT_DIM_I][J_STRIDE] row_align(1) = {0}; - static elem_t B[MAT_DIM_I][J_STRIDE] row_align(1) = {0}; - static elem_t C[MAT_DIM_I][J_STRIDE] row_align(1) = {0}; - static 
elem_t gold[MAT_DIM_I][J_STRIDE] = {0}; - -#if CHECK_RESULT == 1 - // printf("Init A and B\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - A[i][j] = (rand() % 64) - 32; - B[i][j] = (rand() % 8) - 4; - } - } - - printf("Starting slow CPU resadd\n"); - unsigned long cpu_start = read_cycles(); - resadd_cpu(MAT_DIM_I, MAT_DIM_J, J_STRIDE, A_SCALE, B_SCALE, C_SCALE, - (elem_t *)A, (elem_t *)B, (elem_t *)gold, USE_RELU); - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); -#endif - - printf("Starting gemmini resadd\n"); - unsigned long start = read_cycles(); - tiled_resadd_stride_auto(MAT_DIM_I, MAT_DIM_J, A_SCALE, B_SCALE, C_SCALE, - J_STRIDE, (elem_t *)A, (elem_t *)B, (elem_t *)C, - USE_RELU, WS); - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(C, gold)) { - printf("C:\n"); - full_printMatrix(C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("A:\n"); - full_printMatrix(A); - printf("B:\n"); - full_printMatrix(B); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/rocc-software/.gitignore b/bb-tests/workloads/src/CTest/gemmini/rocc-software/.gitignore deleted file mode 100644 index 8bf29476..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/rocc-software/.gitignore +++ /dev/null @@ -1,3 +0,0 @@ -*~ -*# -*.#* diff --git a/bb-tests/workloads/src/CTest/gemmini/rocc-software/CONTRIBUTING.md b/bb-tests/workloads/src/CTest/gemmini/rocc-software/CONTRIBUTING.md deleted file mode 100644 index 6e97de44..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/rocc-software/CONTRIBUTING.md +++ /dev/null @@ -1,46 +0,0 @@ -All contributors must agree to the Developer Certificate of Origin Version 1.1. 
(DCO 1.1) by signing their commits with: - -``` -DCO 1.1 Signed-off-by: [NAME] <[EMAIL]> -``` - -The full text of the DCO 1.1 is as follows: - -``` -Developer Certificate of Origin -Version 1.1 - -Copyright (C) 2004, 2006 The Linux Foundation and its contributors. -660 York Street, Suite 102, -San Francisco, CA 94110 USA - -Everyone is permitted to copy and distribute verbatim copies of this -license document, but changing it is not allowed. - - -Developer's Certificate of Origin 1.1 - -By making a contribution to this project, I certify that: - -(a) The contribution was created in whole or in part by me and I -have the right to submit it under the open source license -indicated in the file; or - -(b) The contribution is based upon previous work that, to the best -of my knowledge, is covered under an appropriate open source -license and I have the right under that license to submit that -work with modifications, whether created in whole or in part -by me, under the same open source license (unless I am -permitted to submit under a different license), as indicated -in the file; or - -(c) The contribution was provided directly to me by some other -person who certified (a), (b) or (c) and I have not modified -it. - -(d) I understand and agree that this project and the contribution -are public and that a record of the contribution (including all -personal information I submit with it, including my sign-off) is -maintained indefinitely and may be redistributed consistent with -this project or the open source license(s) involved. -``` diff --git a/bb-tests/workloads/src/CTest/gemmini/rocc-software/LICENSE b/bb-tests/workloads/src/CTest/gemmini/rocc-software/LICENSE deleted file mode 100644 index 8dada3ed..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/rocc-software/LICENSE +++ /dev/null @@ -1,201 +0,0 @@ - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. 
- - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. 
- - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. 
Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of 
the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. - - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. 
Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. 
- - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "{}" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. - - Copyright {yyyy} {name of copyright owner} - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. diff --git a/bb-tests/workloads/src/CTest/gemmini/rocc-software/README.md b/bb-tests/workloads/src/CTest/gemmini/rocc-software/README.md deleted file mode 100644 index 237b3902..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/rocc-software/README.md +++ /dev/null @@ -1,4 +0,0 @@ -Rocket Custom Coprocessor (RoCC) Software -======================================== - -This is a set of C and RISC-V Assembly macros that help with emitting custom RISC-V instructions for talking with Rocket Custom Coprocessors (RoCCs). 
diff --git a/bb-tests/workloads/src/CTest/gemmini/rocc-software/src/riscv_test_rocc.h b/bb-tests/workloads/src/CTest/gemmini/rocc-software/src/riscv_test_rocc.h deleted file mode 100644 index 0dbd9109..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/rocc-software/src/riscv_test_rocc.h +++ /dev/null @@ -1,26 +0,0 @@ -// Copyright 2018 IBM -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -#ifndef ROCC_SOFTWARE_SRC_RISCV_TEST_ROCC_H_ -#define ROCC_SOFTWARE_SRC_RISCV_TEST_ROCC_H_ - -#define RVTEST_XS_ENABLE \ - li a0, MSTATUS_XS &(MSTATUS_XS >> 1); \ - csrs mstatus, a0; - -#define RVTEST_WITH_ROCC \ - .macro init; \ - RVTEST_XS_ENABLE.endm - -#endif // ROCC_SOFTWARE_SRC_RISCV_TEST_ROCC_H_ diff --git a/bb-tests/workloads/src/CTest/gemmini/rocc-software/src/xcustom.h b/bb-tests/workloads/src/CTest/gemmini/rocc-software/src/xcustom.h deleted file mode 100644 index 1880ff56..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/rocc-software/src/xcustom.h +++ /dev/null @@ -1,170 +0,0 @@ -// Copyright 2018--2020 IBM -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -#ifndef ROCC_SOFTWARE_SRC_XCUSTOM_H_ -#define ROCC_SOFTWARE_SRC_XCUSTOM_H_ - -#define STR1(x) #x -#ifndef STR -#define STR(x) STR1(x) -#endif - -#define CAT_(A, B) A##B -#define CAT(A, B) CAT_(A, B) - -/** Assembly macro for creating "raw" Rocket Custom Coproessor (RoCC) - * assembly language instructions that will return data in rd. These - * are to be used only in assembly language programs (not C/C++). - * - * Example: - * - * Consider the following macro consisting of a CUSTOM_0 instruction - * with func7 "42" that is doing some operation of "a0 = op(a1, a2)": - * - * ROCC_INSTRUCTION_RAW_R_R_R(0, a0, a1, a2, 42) - * - * This will produce the following pseudo assembly language - * instruction: - * - * .insn r CUSTOM_0, 7, 42, a0, a1, a2 - * - * @param x the custom instruction number: 0, 1, 2, or 3 - * @param rd the destination register, e.g., a0 or x10 - * @param rs1 the first source register, e.g., a0 or x10 - * @param rs2 the second source register, e.g., a0 or x10 - * @param func7 the value of the func7 field - * @return a raw .insn RoCC instruction - */ -#define ROCC_INSTRUCTION_RAW_R_R_R(x, rd, rs1, rs2, func7) \ - .insn r CAT(CUSTOM_, x), 7, func7, rd, rs1, rs2 - -/** Assembly macro for creating "raw" Rocket Custom Coproessor (RoCC) - * assembly language instructions that will *NOT* return data in rd. - * These are to be used only in assembly language programs (not - * C/C++). - * - * Example: - * - * Consider the following macro consisting of a CUSTOM_1 instruction - * with func7 "42" that is doing some operation of "op(a1, a2)". 
*NO* - * data is returned: - * - * ROCC_INSTRUCTION_RAW_R_R_R(1, a1, a2, 42) - * - * This will produce the following pseudo assembly language - * instruction: - * - * .insn r CUSTOM_1, 3, 42, x0, a1, a2 - * - * @param x the custom instruction number: 0, 1, 2, or 3 - * @param rs1 the first source register, e.g., a0 or x10 - * @param rs2 the second source register, e.g., a0 or x10 - * @param func7 the value of the func7 field - * @return a raw .insn RoCC instruction - */ -#define ROCC_INSTRUCTION_RAW_0_R_R(x, rs1, rs2, func7) \ - .insn r CAT(CUSTOM_, x), 3, func7, x0, rs1, rs2 - -/** C/C++ inline assembly macro for creating Rocket Custom Coprocessor - * (RoCC) instructions that return data in rd. These are to be used - * only in C/C++ programs (not bare assembly). - * - * This is equivalent to ROCC_INSTRUCTION_R_R_R. See it's - * documentation. - */ -#define ROCC_INSTRUCTION(x, rd, rs1, rs2, func7) \ - ROCC_INSTRUCTION_R_R_R(x, rd, rs1, rs2, func7) - -/** C/C++ inline assembly macro for creating Rocket Custom Coprocessor - * (RoCC) instructions that return data in C variable rd. - * These are to be used only in C/C++ programs (not bare assembly). 
- * - * Example: - * - * Consider the following macro consisting of a CUSTOM_2 instruction - * with func7 "42" that is doing some operation of "a0 = op(a1, a2)" - * (where a0, a1, and a2 are variables defined in C): - * - * ROCC_INSTRUCTION(2, a0, a1, a2, 42) - * - * This will produce the following inline assembly: - * - * asm volatile( - * ".insn r CUSTOM_2, 0x7, 42, %0, %1, %2" - * : "=r"(rd) - * : "r"(rs1), "r"(rs2)); - * - * @param x the custom instruction number: 0, 1, 2, or 3 - * @param rd the C variable to capture as destination operand - * @param rs1 the C variable to capture for first source register - * @param rs2 the C variable to capture for second source register - * @param func7 the value of the func7 field - * @return an inline assembly RoCC instruction - */ -#define ROCC_INSTRUCTION_R_R_R(x, rd, rs1, rs2, func7) \ - { \ - asm volatile(".insn r " STR(CAT(CUSTOM_, x)) ", " STR(0x7) ", " STR( \ - func7) ", %0, %1, %2" \ - : "=r"(rd) \ - : "r"(rs1), "r"(rs2)); \ - } - -/** C/C++ inline assembly macro for creating Rocket Custom Coprocessor - * (RoCC) instructions that return data in C variable rd. - * These are to be used only in C/C++ programs (not bare assembly). 
- * - * Example: - * - * Consider the following macro consisting of a CUSTOM_3 instruction - * with func7 "42" that is doing some operation of "a0 = op(a1, a2)" - * (where a0, a1, and a2 are variables defined in C): - * - * ROCC_INSTRUCTION(3, a0, a1, a2, 42) - * - * This will produce the following inline assembly: - * - * asm volatile( - * ".insn r CUSTOM_3, 0x7, 42, %0, %1, %2" - * :: "r"(rs1), "r"(rs2)); - * - * @param x the custom instruction number: 0, 1, 2, or 3 - * @param rs1 the C variable to capture for first source register - * @param rs2 the C variable to capture for second source register - * @param funct7 the value of the funct7 f - * @return an inline assembly RoCC instruction - */ -#define ROCC_INSTRUCTION_0_R_R(x, rs1, rs2, func7) \ - { \ - asm volatile(".insn r " STR(CAT(CUSTOM_, x)) ", " STR(0x3) ", " STR( \ - func7) ", x0, %0, %1" \ - : \ - : "r"(rs1), "r"(rs2)); \ - } - -// [TODO] fix these to align with the above approach -// Macro to pass rs2_ as an immediate -/* -#define ROCC_INSTRUCTION_R_R_I(XCUSTOM_, rd_, rs1_, rs2_, funct_) \ - asm volatile (XCUSTOM_" %[rd], %[rs1], %[rs2], %[funct]" \ - : [rd] "=r" (rd_) \ - : [rs1] "r" (rs1_), [rs2] "i" (rs2_), [funct] "i" (funct_)) - -// Macro to pass rs1_ and rs2_ as immediates -#define ROCC_INSTRUCTION_R_I_I(XCUSTOM_, rd_, rs1_, rs2_, funct_) \ - asm volatile (XCUSTOM_" %[rd], %[rs1], %[rs2], %[funct]" \ - : [rd] "=r" (rd_) \ - : [rs1] "i" (rs1_), [rs2] "i" (rs2_), [funct] "i" (funct_)) -*/ - -#endif // ROCC_SOFTWARE_SRC_XCUSTOM_H_ diff --git a/bb-tests/workloads/src/CTest/gemmini/template.c b/bb-tests/workloads/src/CTest/gemmini/template.c deleted file mode 100644 index f8026e6c..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/template.c +++ /dev/null @@ -1,75 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - printf("Flush Gemmini TLB of stale virtual addresses\n"); - gemmini_flush(0); - - printf("Initialize our input and output matrices in main memory\n"); - elem_t In[DIM][DIM]; - elem_t Out[DIM][DIM]; - - elem_t Identity[DIM][DIM]; - for (size_t i = 0; i < DIM; i++) - for (size_t j = 0; j < DIM; j++) - Identity[i][j] = i == j; - - printf("Calculate the scratchpad addresses of all our matrices\n"); - printf(" Note: The scratchpad is \"row-addressed\", where each address " - "contains one matrix row\n"); - size_t In_sp_addr = 0; - size_t Out_sp_addr = DIM; - size_t Identity_sp_addr = 2 * DIM; - - printf("Move \"In\" matrix from main memory into Gemmini's scratchpad\n"); - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - gemmini_mvin(In, In_sp_addr); - - printf( - "Move \"Identity\" matrix from main memory into Gemmini's scratchpad\n"); - gemmini_mvin(Identity, Identity_sp_addr); - - printf("Multiply \"In\" matrix with \"Identity\" matrix with a bias of 0\n"); - gemmini_config_ex(OUTPUT_STATIONARY, 0, 0); - gemmini_preload_zeros(Out_sp_addr); - gemmini_compute_preloaded(In_sp_addr, Identity_sp_addr); - - printf("Move \"Out\" matrix from Gemmini's scratchpad into main memory\n"); - gemmini_config_st(DIM * sizeof(elem_t)); - gemmini_mvout(Out, Out_sp_addr); - - printf("Fence till Gemmini completes all memory operations\n"); - gemmini_fence(); - - printf("Check whether \"In\" and \"Out\" matrices are identical\n"); - if (!is_equal(In, Out)) { - printf("Input and output matrices are different!\n"); - printf("\"In\" matrix:\n"); - printMatrix(In); - printf("\"Out\" matrix:\n"); - printMatrix(Out); - printf("\n"); - - exit(1); - } - - printf("Input and output matrices are 
identical, as expected\n"); - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_cpu.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_cpu.c deleted file mode 100644 index 10b6840f..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_cpu.c +++ /dev/null @@ -1,167 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 1 -#define FULL_BIAS_WIDTH 1 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#error variable-bitwidth bias not currently supported -#endif - -#ifndef BAREMETAL -#define MAT_DIM_I 512 -#define MAT_DIM_K 512 -#define MAT_DIM_J 512 -#else -#define MAT_DIM_I 64 -#define MAT_DIM_K 64 -#define MAT_DIM_J 64 -#endif - -void print_tile(elem_t *in, int tile_dim) { - for (size_t r = 0; r < tile_dim; r++) { - printf("row starts at: %p\n", in + r * MAT_DIM_J); - for (size_t c = 0; c < tile_dim; c++) { - printf("%d ", *(in + r * MAT_DIM_J + c)); - } - printf("\n"); - } -} - -void full_matmul(elem_t A[MAT_DIM_I][MAT_DIM_K], elem_t B[MAT_DIM_K][MAT_DIM_J], - ACC_T D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J]) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[r][k] * B[k][c]; - } -} - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -void full_matscale(full_t full[MAT_DIM_I][MAT_DIM_J], - elem_t out[MAT_DIM_I][MAT_DIM_J], acc_scale_t scale) { 
- for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - // Scale element - full_t scaled = ACC_SCALE(full[r][c], scale); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = scaled > elem_t_max - ? elem_t_max - : (scaled < elem_t_min ? elem_t_min : scaled); - out[r][c] = elem; -#else - out[r][c] = scaled; // TODO should we also saturate when using floats? -#endif - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc( - 1); // TODO don't use row_align_acc when ACC_T is elem_t - - static full_t gold_full[MAT_DIM_I][MAT_DIM_J]; - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = rand() % 2; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_B[i][j] = rand() % 2; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = NO_BIAS ? 0 : rand() % 2; - } - } - - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); - full_matmul(full_A, full_B, full_D, gold_full); - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - full_matscale(gold_full, gold, ACC_SCALE_IDENTITY); -#endif - - printf("Starting fast CPU matmul\n"); - unsigned long start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? 
NULL : &full_D[0][0], - (elem_t *)full_C, MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, - MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, - false, false, false, false, false, 0, CPU); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_option.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_option.c deleted file mode 100644 index bde638c8..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_option.c +++ /dev/null @@ -1,245 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#ifndef BAREMETAL -#define MAT_DIM_I 300 -#define MAT_DIM_K 200 -#define MAT_DIM_J 100 -#else -#define MAT_DIM_I 33 -#define MAT_DIM_K 28 -#define MAT_DIM_J 32 -#endif - -void full_matmul(elem_t A[MAT_DIM_I][MAT_DIM_K], elem_t B[MAT_DIM_K][MAT_DIM_J], - acc_t D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J], bool repeating_bias) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[repeating_bias ? 0 : r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[r][k] * B[k][c]; - } -} - -void full_matmul_At(elem_t A[MAT_DIM_K][MAT_DIM_I], - elem_t B[MAT_DIM_K][MAT_DIM_J], - acc_t D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J], bool repeating_bias) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[repeating_bias ? 
0 : r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[k][r] * B[k][c]; - } -} - -void full_matmul_Bt(elem_t A[MAT_DIM_I][MAT_DIM_K], - elem_t B[MAT_DIM_J][MAT_DIM_K], - acc_t D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J], bool repeating_bias) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[repeating_bias ? 0 : r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[r][k] * B[c][k]; - } -} - -void full_matmul_At_Bt(elem_t A[MAT_DIM_K][MAT_DIM_I], - elem_t B[MAT_DIM_J][MAT_DIM_K], - acc_t D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J], - bool repeating_bias) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[repeating_bias ? 0 : r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[k][r] * B[c][k]; - } -} - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -void full_printMatrix64Bit(full_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%lld ", m[i][j]); - printf("\n"); - } -} - -void full_matscale(full_t full[MAT_DIM_I][MAT_DIM_J], - elem_t out[MAT_DIM_I][MAT_DIM_J], acc_scale_t scale) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - // Bitshift and round element - full_t scaled = ACC_SCALE(full[r][c], scale); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = scaled > elem_t_max - ? elem_t_max - : (scaled < elem_t_min ? elem_t_min : scaled); - out[r][c] = elem; -#else - out[r][c] = scaled; // TODO should we also saturate when using floats? 
-#endif - } -} - -void full_matrelu(elem_t in[MAT_DIM_I][MAT_DIM_J], - elem_t out[MAT_DIM_I][MAT_DIM_J]) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) - out[r][c] = in[r][c] > 0 ? in[r][c] : 0; -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - -#ifdef BAREMETAL - for (enum tiled_matmul_type_t option = OS; option <= WS; option++) { - for (int activation = 0; activation <= 1; activation++) { - for (int scale = 0; scale <= 1; scale += 1) { -#else - for (enum tiled_matmul_type_t option = OS; option <= CPU; option++) { - for (int activation = 0; activation <= 2; activation++) { - for (int scale = 0; scale <= 12; scale += 6) { -#endif - for (int no_bias = 0; no_bias < 2; no_bias++) { - for (int repeating_bias = 0; repeating_bias < 2; repeating_bias++) { - for (int a_transpose = 0; a_transpose < 2; a_transpose++) { - for (int b_transpose = 0; b_transpose < 2; b_transpose++) { - - if (((option == OS || option == CPU) && - (a_transpose || b_transpose)) || - (option == WS && a_transpose && b_transpose)) { - continue; - } - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static acc_t full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static full_t gold_full[MAT_DIM_I][MAT_DIM_J]; - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = (rand() % 3) - 1; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; 
++j) { - full_B[i][j] = (rand() % 3) - 1; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < (repeating_bias ? 1 : MAT_DIM_I); ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = no_bias ? 0 : ((rand() % 3) - 1); - } - } - -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wincompatible-pointer-types" - printf("Starting CPU matmul\n"); - if (!a_transpose && !b_transpose) { - full_matmul(full_A, full_B, full_D, gold_full, - repeating_bias); - } else if (a_transpose && !b_transpose) { - full_matmul_At(full_A, full_B, full_D, gold_full, - repeating_bias); - } else if (!a_transpose && b_transpose) { - full_matmul_Bt(full_A, full_B, full_D, gold_full, - repeating_bias); - } else if (a_transpose && b_transpose) { - full_matmul_At_Bt(full_A, full_B, full_D, gold_full, - repeating_bias); - } - full_matscale(gold_full, gold, scale); -#pragma GCC diagnostic pop - - if (activation == RELU) { - full_matrelu(gold, gold); - } - - size_t stride_A = a_transpose ? MAT_DIM_I : MAT_DIM_K; - size_t stride_B = b_transpose ? MAT_DIM_K : MAT_DIM_J; - - printf("Starting gemmini matmul\n"); - tiled_matmul_auto( - MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, no_bias ? 
NULL : &full_D[0][0], - (elem_t *)full_C, stride_A, stride_B, MAT_DIM_J, MAT_DIM_J, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, activation, scale, 0, repeating_bias, - a_transpose, b_transpose, false, false, 0, option); - - if (!full_is_equal(full_C, gold)) { - printf("\nINCORRECT!\n"); - printf("option: %d\n", option); - printf("activation: %d\n", activation); - printf("scale: %d\n", scale); - printf("no_bias: %d\n", no_bias); - printf("repeating_bias: %d\n", repeating_bias); - printf("a_transpose: %d\n", a_transpose); - printf("b_transpose: %d\n", b_transpose); - - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("Gold full:\n"); - full_printMatrix64Bit(gold_full); - printf("\n"); - - exit(1); - } - } - } - } - } - } - } - } - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_os.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_os.c deleted file mode 100644 index ce28bfb4..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_os.c +++ /dev/null @@ -1,180 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 1 -#define FULL_BIAS_WIDTH 1 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#error variable-bitwidth bias not currently supported -#endif - -#ifndef BAREMETAL -#define MAT_DIM_I 512 -#define MAT_DIM_K 512 -#define MAT_DIM_J 512 -#else -#define MAT_DIM_I 64 -#define MAT_DIM_K 64 -#define MAT_DIM_J 64 -#endif - -void print_tile(elem_t *in, int tile_dim) { - for (size_t r = 0; r < tile_dim; r++) { - printf("row starts at: %p\n", in + r * MAT_DIM_J); - for (size_t c = 0; c < tile_dim; c++) { - printf("%d ", *(in + r * MAT_DIM_J + c)); - } - printf("\n"); - } -} - -void full_matmul(elem_t A[MAT_DIM_I][MAT_DIM_K], elem_t B[MAT_DIM_K][MAT_DIM_J], - ACC_T D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J]) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[r][k] * B[k][c]; - } -} - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -void full_matshift(full_t full[MAT_DIM_I][MAT_DIM_J], - elem_t out[MAT_DIM_I][MAT_DIM_J], int shift) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - // Bitshift and round element - full_t shifted = ROUNDING_RIGHT_SHIFT(full[r][c], shift); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = shifted > elem_t_max - ? elem_t_max - : (shifted < elem_t_min ? 
elem_t_min : shifted); - out[r][c] = elem; -#else - out[r][c] = shifted; // TODO should we also saturate when using floats? -#endif - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static full_t gold_full[MAT_DIM_I][MAT_DIM_J]; - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - // printf("Init A\n"); -#ifdef FAST -#define RAND 1 -#else -#define RAND rand() -#endif - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = RAND % 2; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_B[i][j] = RAND % 2; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = NO_BIAS ? 0 : RAND % 2; - } - } - - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); -#ifndef FAST - full_matmul(full_A, full_B, full_D, gold_full); -#else - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - gold_full[i][j] = MAT_DIM_K + (NO_BIAS ? 0 : (RAND % 2)); - } - } - -#endif - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - full_matshift(gold_full, gold, 0); -#endif - - printf("Starting gemmini matmul\n"); - unsigned long start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? 
NULL : &full_D[0][0], - (elem_t *)full_C, MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, - MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, - false, false, false, false, false, 0, OS); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws.c deleted file mode 100644 index dd07cd5f..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws.c +++ /dev/null @@ -1,182 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 1 -#define FULL_BIAS_WIDTH 1 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#endif - -#ifndef BAREMETAL -#define MAT_DIM_I 512 -#define MAT_DIM_K 512 -#define MAT_DIM_J 512 -#else -#define MAT_DIM_I 64 -#define MAT_DIM_K 64 -#define MAT_DIM_J 64 -#endif - -void print_tile(elem_t *in, int tile_dim) { - for (size_t r = 0; r < tile_dim; r++) { - printf("row starts at: %p\n", in + r * MAT_DIM_J); - for (size_t c = 0; c < tile_dim; c++) { - printf("%d ", *(in + r * MAT_DIM_J + c)); - } - printf("\n"); - } -} - -void full_matmul(elem_t A[MAT_DIM_I][MAT_DIM_K], elem_t B[MAT_DIM_K][MAT_DIM_J], - ACC_T D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J]) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[r][k] * B[k][c]; - } -} - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; 
++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -void full_matscale(full_t full[MAT_DIM_I][MAT_DIM_J], - elem_t out[MAT_DIM_I][MAT_DIM_J], acc_scale_t scale) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - // Scale element - full_t scaled = ACC_SCALE(full[r][c], scale); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = scaled > elem_t_max - ? elem_t_max - : (scaled < elem_t_min ? elem_t_min : scaled); - out[r][c] = elem; -#else - out[r][c] = scaled; // TODO should we also saturate when using floats? -#endif - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - printf("MAT_DIM_I: %d\n", MAT_DIM_I); - printf("MAT_DIM_J: %d\n", MAT_DIM_J); - printf("MAT_DIM_K: %d\n", MAT_DIM_K); - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static full_t gold_full[MAT_DIM_I][MAT_DIM_J]; - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 -#ifdef FAST -#define RAND 1 -#else -#define RAND rand() -#endif - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = RAND % 2; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_B[i][j] = RAND % 2; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = NO_BIAS ? 
0 : RAND % 2; - } - } - printf("Starting gemmini matmul\n"); - unsigned long start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? NULL : &full_D[0][0], - (elem_t *)full_C, MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, - MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, - false, false, false, false, !FULL_BIAS_WIDTH, 0, WS); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); -#ifdef FAST - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - gold_full[i][j] = MAT_DIM_K + (NO_BIAS ? 0 : (RAND % 2)); - } - } - -#else - full_matmul(full_A, full_B, full_D, gold_full); -#endif - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - full_matscale(gold_full, gold, ACC_SCALE_IDENTITY); -#endif - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_At.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_At.c deleted file mode 100644 index 847efb8a..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_At.c +++ /dev/null @@ -1,200 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 1 -#define FULL_BIAS_WIDTH 1 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#endif - -#ifdef FAST - -#define MAT_DIM_I 19 -#define MAT_DIM_K 18 -#define MAT_DIM_J 17 - -#else - -#ifndef BAREMETAL -#define MAT_DIM_I 500 -#define MAT_DIM_K 412 -#define MAT_DIM_J 300 -#else -#define MAT_DIM_I 60 -#define MAT_DIM_K 50 -#define MAT_DIM_J 30 -#endif - -#endif // ifdef FAST - -void print_tile(elem_t *in, int tile_dim) { - for (size_t r = 0; r < tile_dim; r++) { - printf("row starts at: %p\n", in + r * MAT_DIM_J); - for (size_t c = 0; c < tile_dim; c++) { - printf("%d ", *(in + r * MAT_DIM_J + c)); - } - printf("\n"); - } -} - -void full_matmul(elem_t A[MAT_DIM_K][MAT_DIM_I], elem_t B[MAT_DIM_K][MAT_DIM_J], - ACC_T D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J]) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[k][r] * B[k][c]; - } -} - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -void full_matscale(full_t full[MAT_DIM_I][MAT_DIM_J], - elem_t out[MAT_DIM_I][MAT_DIM_J], acc_scale_t scale) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - // Scale element - full_t scaled = ACC_SCALE(full[r][c], scale); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = scaled > elem_t_max - ? elem_t_max - : (scaled < elem_t_min ? 
elem_t_min : scaled); - out[r][c] = elem; -#else - out[r][c] = scaled; // TODO should we also saturate when using floats? -#endif - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_K][MAT_DIM_I] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static full_t gold_full[MAT_DIM_I][MAT_DIM_J]; - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_I; ++j) { -#ifdef FAST - full_A[i][j] = 1; -#else - full_A[i][j] = rand() % 2; -#endif - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { -#ifdef FAST - full_B[i][j] = 1; -#else - full_B[i][j] = rand() % 2; -#endif - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { -#ifdef FAST - full_D[i][j] = NO_BIAS ? 0 : 1; -#else - full_D[i][j] = NO_BIAS ? 0 : rand() % 2; -#endif - } - } - -#ifdef FAST - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - gold[i][j] = MAT_DIM_K + !NO_BIAS; - } - } -#else - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); - full_matmul(full_A, full_B, full_D, gold_full); - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - full_matscale(gold_full, gold, ACC_SCALE_IDENTITY); -#endif // #ifdef FAST - -#endif - - printf("Starting gemmini matmul\n"); - unsigned long start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? 
NULL : &full_D[0][0], - (elem_t *)full_C, MAT_DIM_I, MAT_DIM_J, MAT_DIM_J, - MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, - false, true, false, false, !FULL_BIAS_WIDTH, 0, WS); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); -#ifdef FAST - printf("All elements must be %d\n", MAT_DIM_K + !NO_BIAS); -#else - full_printMatrix(gold); - printf("\n"); -#endif // ifdef FAST - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_Bt.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_Bt.c deleted file mode 100644 index cb47be42..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_Bt.c +++ /dev/null @@ -1,201 +0,0 @@ - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 1 -#define FULL_BIAS_WIDTH 1 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#endif - -#ifdef FAST - -#define MAT_DIM_I 19 -#define MAT_DIM_K 18 -#define MAT_DIM_J 17 - -#else - -#ifndef BAREMETAL -#define MAT_DIM_I 500 -#define MAT_DIM_K 412 -#define MAT_DIM_J 300 -#else -#define MAT_DIM_I 60 -#define MAT_DIM_K 50 -#define MAT_DIM_J 30 - -#endif - -#endif // ifdef FAST - -void print_tile(elem_t *in, int tile_dim) { - for (size_t r = 0; r < tile_dim; r++) { - printf("row starts at: %p\n", in + r * MAT_DIM_J); - for (size_t c = 0; c < tile_dim; c++) { - printf("%d ", *(in + r * MAT_DIM_J + c)); - } - printf("\n"); - } -} - -void full_matmul(elem_t A[MAT_DIM_I][MAT_DIM_K], elem_t B[MAT_DIM_J][MAT_DIM_K], - ACC_T D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J]) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - 
C_full[r][c] = D[r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[r][k] * B[c][k]; - } -} - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -void full_matscale(full_t full[MAT_DIM_I][MAT_DIM_J], - elem_t out[MAT_DIM_I][MAT_DIM_J], acc_scale_t scale) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - // Scale element - full_t scaled = ACC_SCALE(full[r][c], scale); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = scaled > elem_t_max - ? elem_t_max - : (scaled < elem_t_min ? elem_t_min : scaled); - out[r][c] = elem; -#else - out[r][c] = scaled; // TODO should we also saturate when using floats? 
-#endif - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_J][MAT_DIM_K] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static full_t gold_full[MAT_DIM_I][MAT_DIM_J]; - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { -#ifdef FAST - full_A[i][j] = 1; -#else - full_A[i][j] = rand() % 2; -#endif - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_J; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { -#ifdef FAST - full_B[i][j] = 1; -#else - full_B[i][j] = rand() % 2; -#endif - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { -#ifdef FAST - full_D[i][j] = NO_BIAS ? 0 : 1; -#else - full_D[i][j] = NO_BIAS ? 0 : rand() % 2; -#endif - } - } - -#ifdef FAST - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - gold[i][j] = MAT_DIM_K + !NO_BIAS; - } - } -#else - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); - full_matmul(full_A, full_B, full_D, gold_full); - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - full_matscale(gold_full, gold, ACC_SCALE_IDENTITY); -#endif // #ifdef FAST - -#endif - - printf("Starting gemmini matmul\n"); - unsigned long start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? 
NULL : &full_D[0][0], - (elem_t *)full_C, MAT_DIM_K, MAT_DIM_K, MAT_DIM_J, - MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, - false, false, true, false, !FULL_BIAS_WIDTH, 0, WS); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - - printf("Gold:\n"); -#ifdef FAST - printf("All elements must be %d\n", MAT_DIM_K + !NO_BIAS); -#else - full_printMatrix(gold); - printf("\n"); -#endif // ifdef FAST - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_full_C.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_full_C.c deleted file mode 100644 index 87c7e277..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_full_C.c +++ /dev/null @@ -1,149 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 1 -#define FULL_BIAS_WIDTH 1 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#endif - -#ifndef BAREMETAL -#define MAT_DIM_I 512 -#define MAT_DIM_K 512 -#define MAT_DIM_J 512 -#else -#define MAT_DIM_I 60 -#define MAT_DIM_K 40 -#define MAT_DIM_J 30 -#endif - -void full_matmul(elem_t A[MAT_DIM_I][MAT_DIM_K], elem_t B[MAT_DIM_K][MAT_DIM_J], - ACC_T D[MAT_DIM_I][MAT_DIM_J], acc_t C[MAT_DIM_I][MAT_DIM_J]) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C[r][c] = D[r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C[r][c] += A[r][k] * B[k][c]; - } -} - -void full_printMatrix(acc_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(acc_t 
x[MAT_DIM_I][MAT_DIM_J], - acc_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static acc_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static acc_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - -#ifdef FAST -#define RAND 1 -#else -#define RAND rand() -#endif - - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = RAND % 2; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_B[i][j] = RAND % 2; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = NO_BIAS ? 0 : RAND % 2; - } - } - - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); -#ifdef FAST - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - gold[i][j] = MAT_DIM_K + (NO_BIAS ? 0 : (RAND % 2)); - } - } - -#else - full_matmul(full_A, full_B, full_D, gold); -#endif - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); -#endif - - printf("Starting gemmini matmul\n"); - unsigned long start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? 
NULL : &full_D[0][0], full_C, - MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, MAT_DIM_J, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, - false, false, false, true, !FULL_BIAS_WIDTH, 0, WS); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_igelu.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_igelu.c deleted file mode 100644 index 61b01ec8..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_igelu.c +++ /dev/null @@ -1,144 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 1 -#define FULL_BIAS_WIDTH 1 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#endif - -#define BERT_SCALE 0.8 - -#ifndef BAREMETAL - -#define MAT_DIM_I 128 -#define MAT_DIM_K 512 -#define MAT_DIM_J 512 - -#else -#define MAT_DIM_I 30 -#define MAT_DIM_K 30 -#define MAT_DIM_J 30 -#endif - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -int main() { -#if defined(FAST) || !defined(HAS_NORMALIZATIONS) - exit(0); -#endif - -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - printf("I: %d, J: %d, K: 
%d\n", MAT_DIM_I, MAT_DIM_J, MAT_DIM_K); - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = (rand() % 3) - 1; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_B[i][j] = (rand() % 3) - 1; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = NO_BIAS ? 0 : (rand() % 3) - 1; - } - } - - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? NULL : &full_D[0][0], - (elem_t *)gold, MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, MAT_DIM_J, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, IGELU, ACC_SCALE_IDENTITY, BERT_SCALE, - false, false, false, false, !FULL_BIAS_WIDTH, 0, CPU); - - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - -#endif - - printf("Starting gemmini matmul\n"); - printf("I: %d, J: %d, K: %d\n", MAT_DIM_I, MAT_DIM_J, MAT_DIM_K); - unsigned long start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? 
NULL : &full_D[0][0], - (elem_t *)full_C, MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, - MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, IGELU, ACC_SCALE_IDENTITY, BERT_SCALE, - false, false, false, false, !FULL_BIAS_WIDTH, 0, WS); - - gemmini_fence(); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_layernorm.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_layernorm.c deleted file mode 100644 index 5374d0c2..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_layernorm.c +++ /dev/null @@ -1,174 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 0 -#define FULL_BIAS_WIDTH 1 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#endif - -#ifndef BAREMETAL - -#define MAT_DIM_I 32 -#define MAT_DIM_K 240 -#define MAT_DIM_J 512 - -#else -#define MAT_DIM_I 31 -#define MAT_DIM_K 30 -#define MAT_DIM_J 66 -#endif - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -void full_printMatrix_acc(acc_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -int main() { -#if 
defined(FAST) || !defined(HAS_NORMALIZATIONS) - exit(0); -#endif - -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - printf("MAT_DIM_I: %d\n", MAT_DIM_I); - printf("MAT_DIM_J: %d\n", MAT_DIM_J); - printf("MAT_DIM_K: %d\n", MAT_DIM_K); - printf("NO_BIAS: %d\n", NO_BIAS); - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static acc_t unnormed_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = (rand() % 3) - 1; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_B[i][j] = (rand() % 3) - 1; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = NO_BIAS ? 0 : (rand() % 3) - 1; - } - } - - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? NULL : &full_D[0][0], - (elem_t *)gold, MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, MAT_DIM_J, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, LAYERNORM, ACC_SCALE_IDENTITY, 0, - false, false, false, false, !FULL_BIAS_WIDTH, 0, CPU); - - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - -#endif - - printf("Starting gemmini matmul\n"); - unsigned long start = read_cycles(); - - /* - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, - (elem_t*)full_A, (elem_t*)full_B, NO_BIAS ? 
NULL : &full_D[0][0], - (elem_t*)full_C, MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, MAT_DIM_J, - MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - LAYERNORM, ACC_SCALE_IDENTITY, 0, false, - - false, false, - false, !FULL_BIAS_WIDTH, - 0, - WS); - */ - - tiled_matmul_auto( - MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, (elem_t *)full_B, - NO_BIAS ? NULL : &full_D[0][0], (acc_t *)unnormed_C, MAT_DIM_K, MAT_DIM_J, - MAT_DIM_J, MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, false, - - false, false, true, !FULL_BIAS_WIDTH, 0, WS); - - gemmini_fence(); - - tiled_norm_auto(MAT_DIM_I, MAT_DIM_J, (acc_t *)unnormed_C, (elem_t *)full_C, - ACC_SCALE_IDENTITY, LAYERNORM, WS); - - gemmini_fence(); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("\nUnnormed:\n"); - full_printMatrix_acc(unnormed_C); - printf("\nGold:\n"); - full_printMatrix(gold); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_low_D.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_low_D.c deleted file mode 100644 index ed8c39f7..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_low_D.c +++ /dev/null @@ -1,179 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 0 -#define FULL_BIAS_WIDTH 0 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#endif - -#ifndef BAREMETAL -#define MAT_DIM_I 500 -#define MAT_DIM_K 400 -#define MAT_DIM_J 300 -#else -#define MAT_DIM_I 60 -#define MAT_DIM_K 40 -#define MAT_DIM_J 30 -#endif - -void print_tile(elem_t *in, int tile_dim) { - for (size_t r = 0; r < tile_dim; r++) { - printf("row starts at: %p\n", in + r * MAT_DIM_J); - for (size_t c = 0; c < tile_dim; c++) { - printf("%d ", *(in + r * MAT_DIM_J + c)); - } - printf("\n"); - } -} - -void full_matmul(elem_t A[MAT_DIM_I][MAT_DIM_K], elem_t B[MAT_DIM_K][MAT_DIM_J], - ACC_T D[MAT_DIM_I][MAT_DIM_J], - full_t C_full[MAT_DIM_I][MAT_DIM_J]) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - C_full[r][c] = D[r][c]; - for (size_t k = 0; k < MAT_DIM_K; k++) - C_full[r][c] += A[r][k] * B[k][c]; - } -} - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -void full_matscale(full_t full[MAT_DIM_I][MAT_DIM_J], - elem_t out[MAT_DIM_I][MAT_DIM_J], acc_scale_t scale) { - for (size_t r = 0; r < MAT_DIM_I; r++) - for (size_t c = 0; c < MAT_DIM_J; c++) { - // Scale element - full_t scaled = ACC_SCALE(full[r][c], scale); - - // Saturate and cast element -#ifndef ELEM_T_IS_FLOAT - full_t elem = scaled > elem_t_max - ? elem_t_max - : (scaled < elem_t_min ? 
elem_t_min : scaled); - out[r][c] = elem; -#else - out[r][c] = scaled; // TODO should we also saturate when using floats? -#endif - } -} - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static full_t gold_full[MAT_DIM_I][MAT_DIM_J]; - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 -#ifdef FAST -#define RAND 1 -#else -#define RAND rand() -#endif - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = RAND % 2; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_B[i][j] = RAND % 2; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = NO_BIAS ? 0 : RAND % 2; - } - } - - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); -#ifdef FAST - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - gold_full[i][j] = MAT_DIM_K + (NO_BIAS ? 0 : (RAND % 2)); - } - } - -#else - full_matmul(full_A, full_B, full_D, gold_full); -#endif - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - full_matscale(gold_full, gold, ACC_SCALE_IDENTITY); -#endif - - printf("Starting gemmini matmul\n"); - unsigned long start = read_cycles(); - - tiled_matmul_auto(MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, - (elem_t *)full_B, NO_BIAS ? 
NULL : &full_D[0][0], - (elem_t *)full_C, MAT_DIM_K, MAT_DIM_J, MAT_DIM_J, - MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, NO_ACTIVATION, ACC_SCALE_IDENTITY, 0, - false, false, false, false, !FULL_BIAS_WIDTH, 0, WS); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_perf.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_perf.c deleted file mode 100644 index 02b8ff8e..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_perf.c +++ /dev/null @@ -1,108 +0,0 @@ -// See LICENSE for license details. - -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define ACTIVATION NO_ACTIVATION - -#define NO_BIAS 0 -#define REPEATING_BIAS 1 - -#define A_TRANSPOSE 0 -#define B_TRANSPOSE 0 - -#ifndef BAREMETAL - -#define MAT_DIM_I 128 -#define MAT_DIM_K 512 -#define MAT_DIM_J 256 - -#else - -#define MAT_DIM_I 128 -#define MAT_DIM_K 256 -#define MAT_DIM_J 256 - -#endif - -#if A_TRANSPOSE == 0 -#define A_STRIDE MAT_DIM_K -#else -#define A_STRIDE MAT_DIM_I -#endif - -#if B_TRANSPOSE == 0 -#define B_STRIDE MAT_DIM_J -#else -#define B_STRIDE MAT_DIM_K -#endif - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - -#if A_TRANSPOSE == 0 - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); -#else - static elem_t full_A[MAT_DIM_K][MAT_DIM_I] row_align(1); -#endif - -#if B_TRANSPOSE == 0 - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); -#else - static elem_t full_B[MAT_DIM_J][MAT_DIM_K] row_align(1); -#endif - - static elem_t 
full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static acc_t full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static full_t gold_full[MAT_DIM_I][MAT_DIM_J]; - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - - counter_configure(0, RDMA_BYTES_REC); - counter_configure(1, WDMA_BYTES_SENT); - counter_reset(); - - printf("Starting gemmini matmul\n"); - printf("I: %d, J: %d, K: %d\n", MAT_DIM_I, MAT_DIM_J, MAT_DIM_K); - printf("NO_BIAS: %d, REPEATING_BIAS: %d\n", NO_BIAS, REPEATING_BIAS); - printf("A_TRANSPOSE: %d, B_TRANSPOSE: %d\n", A_TRANSPOSE, B_TRANSPOSE); - uint64_t start = read_cycles(); - - tiled_matmul_auto( - MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, (elem_t *)full_B, - NO_BIAS ? NULL : &full_D[0][0], (elem_t *)full_C, A_STRIDE, B_STRIDE, - MAT_DIM_J, MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, ACTIVATION, ACC_SCALE_IDENTITY, 0, REPEATING_BIAS, - A_TRANSPOSE, B_TRANSPOSE, false, false, 0, WS); - - gemmini_fence(); - - uint64_t end = read_cycles(); - printf("Cycles taken: %llu\n", end - start); - - const uint64_t total_macs = MAT_DIM_I * MAT_DIM_J * MAT_DIM_K; - const uint64_t ideal_cycles = total_macs / (DIM * DIM); - const uint64_t utilization = 100 * ideal_cycles / (end - start); - printf("Total macs: %llu\n", total_macs); - printf("Ideal cycles: %llu\n", ideal_cycles); - printf("Utilization: %llu%%\n", utilization); - - printf("RDMA_BYTES_REC: %u\n", counter_read(0)); - printf("WDMA_BYTES_SENT: %u\n", counter_read(1)); - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_softmax.c b/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_softmax.c deleted file mode 100644 index a36ccd01..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/tiled_matmul_ws_softmax.c +++ /dev/null @@ -1,136 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -#define CHECK_RESULT 1 - -#define NO_BIAS 1 -#define FULL_BIAS_WIDTH 1 -#define BERT_SCALE 0.05 - -#if FULL_BIAS_WIDTH -typedef acc_t ACC_T; -#else -typedef elem_t ACC_T; -#endif - -#ifndef BAREMETAL -#define MAT_DIM_I 128 -#define MAT_DIM_K 64 -#define MAT_DIM_J 128 -#else -#define MAT_DIM_I 31 -#define MAT_DIM_K 30 -#define MAT_DIM_J 66 -#endif - -void full_printMatrix(elem_t m[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) - printf("%d ", m[i][j]); - printf("\n"); - } -} - -int full_is_equal(elem_t x[MAT_DIM_I][MAT_DIM_J], - elem_t y[MAT_DIM_I][MAT_DIM_J]) { - for (size_t i = 0; i < MAT_DIM_I; ++i) - for (size_t j = 0; j < MAT_DIM_J; ++j) - if (x[i][j] != y[i][j]) - return 0; - return 1; -} - -int main() { -#if defined(FAST) || !defined(HAS_NORMALIZATIONS) - exit(0); -#endif - -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - gemmini_flush(0); - - static elem_t full_A[MAT_DIM_I][MAT_DIM_K] row_align(1); - static elem_t full_B[MAT_DIM_K][MAT_DIM_J] row_align(1); - static elem_t full_C[MAT_DIM_I][MAT_DIM_J] row_align(1); - static ACC_T full_D[MAT_DIM_I][MAT_DIM_J] row_align_acc(1); - - static elem_t gold[MAT_DIM_I][MAT_DIM_J]; - -#if CHECK_RESULT == 1 - // printf("Init A\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_K; ++j) { - full_A[i][j] = (rand() % 7) - 3; - } - } - - // printf("Init B\n"); - for (size_t i = 0; i < MAT_DIM_K; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_B[i][j] = (rand() % 7) - 3; - } - } - - // printf("Init D\n"); - for (size_t i = 0; i < MAT_DIM_I; ++i) { - for (size_t j = 0; j < MAT_DIM_J; ++j) { - full_D[i][j] = NO_BIAS ? 
0 : (rand() % 3) - 1; - } - } - - printf("Starting slow CPU matmul\n"); - unsigned long cpu_start = read_cycles(); - - tiled_matmul_auto( - MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, (elem_t *)full_B, - NO_BIAS ? NULL : &full_D[0][0], (elem_t *)gold, MAT_DIM_K, MAT_DIM_J, - MAT_DIM_J, MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, SOFTMAX, ACC_SCALE_IDENTITY, BERT_SCALE, false, - false, false, false, !FULL_BIAS_WIDTH, 0, CPU); - - unsigned long cpu_end = read_cycles(); - printf("Cycles taken: %u\n", cpu_end - cpu_start); - -#endif - - printf("Starting gemmini matmul\n"); - unsigned long start = read_cycles(); - - tiled_matmul_auto( - MAT_DIM_I, MAT_DIM_J, MAT_DIM_K, (elem_t *)full_A, (elem_t *)full_B, - NO_BIAS ? NULL : &full_D[0][0], (elem_t *)full_C, MAT_DIM_K, MAT_DIM_J, - MAT_DIM_J, MAT_DIM_J, MVIN_SCALE_IDENTITY, MVIN_SCALE_IDENTITY, - MVIN_SCALE_IDENTITY, SOFTMAX, ACC_SCALE_IDENTITY, BERT_SCALE, false, - false, false, false, !FULL_BIAS_WIDTH, 0, WS); - - unsigned long end = read_cycles(); - printf("Cycles taken: %u\n", end - start); - -#if CHECK_RESULT == 1 - if (!full_is_equal(full_C, gold)) { - printf("C:\n"); - full_printMatrix(full_C); - printf("Gold:\n"); - full_printMatrix(gold); - printf("\n"); - - exit(1); - } -#endif - - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/gemmini/transpose.c b/bb-tests/workloads/src/CTest/gemmini/transpose.c deleted file mode 100644 index 4ad4329b..00000000 --- a/bb-tests/workloads/src/CTest/gemmini/transpose.c +++ /dev/null @@ -1,77 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include -#include -#include -#ifndef BAREMETAL -#include -#endif -#include "include/gemmini_testutils.h" - -int main() { -#ifndef BAREMETAL - if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) { - perror("mlockall failed"); - exit(1); - } -#endif - - printf("Flush Gemmini TLB of stale virtual addresses\n"); - gemmini_flush(0); - - printf("Initialize our input and output matrices in main memory\n"); - elem_t In[DIM][DIM]; - elem_t Out[DIM][DIM]; - - elem_t Identity[DIM][DIM]; - for (size_t i = 0; i < DIM; i++) - for (size_t j = 0; j < DIM; j++) { - In[i][j] = rand() % 10; - Identity[i][j] = i == j; - } - - printf("Calculate the scratchpad addresses of all our matrices\n"); - printf(" Note: The scratchpad is \"row-addressed\", where each address " - "contains one matrix row\n"); - size_t In_sp_addr = 0; - size_t Out_sp_addr = DIM; - size_t Identity_sp_addr = 2 * DIM; - - printf("Move \"In\" matrix from main memory into Gemmini's scratchpad\n"); - gemmini_config_ld(DIM * sizeof(elem_t)); - gemmini_config_st(DIM * sizeof(elem_t)); - gemmini_mvin(In, In_sp_addr); - - printf( - "Move \"Identity\" matrix from main memory into Gemmini's scratchpad\n"); - gemmini_mvin(Identity, Identity_sp_addr); - - printf("Multiply \"In\" matrix with \"Identity\" matrix with a bias of 0\n"); - gemmini_extended_config_ex(OUTPUT_STATIONARY, 0, 0, 1, true, false) - gemmini_preload_zeros(Out_sp_addr); - gemmini_compute_preloaded(In_sp_addr, Identity_sp_addr); - - printf("Move \"Out\" matrix from Gemmini's scratchpad into main memory\n"); - gemmini_config_st(DIM * sizeof(elem_t)); - gemmini_mvout(Out, Out_sp_addr); - - printf("Fence till Gemmini completes all memory operations\n"); - gemmini_fence(); - - printf("Check whether \"In\" and \"Out\" matrices are identical\n"); - if (!is_equal_transposed(In, Out)) { - printf("Input and output matrices are different!\n"); - printf("\"In\" matrix:\n"); - printMatrix(In); - printf("\"Out\" matrix:\n"); - printMatrix(Out); - 
printf("\n"); - - exit(1); - } - - printf("Input and output matrices are identical, as expected\n"); - exit(0); -} diff --git a/bb-tests/workloads/src/CTest/goban/CMakeLists.txt b/bb-tests/workloads/src/CTest/goban/CMakeLists.txt new file mode 100644 index 00000000..ac0c7652 --- /dev/null +++ b/bb-tests/workloads/src/CTest/goban/CMakeLists.txt @@ -0,0 +1,56 @@ +set(ELF_CC "riscv64-unknown-elf-gcc") + +#------------------------------------------------------------------------------- +# Goban: multi-core (SPMD) baremetal workloads +# All cores run the ELF simultaneously; divergence is by mhartid. +#------------------------------------------------------------------------------- +set(CTEST_GOBAN_WORKLOAD_DIR ${CTEST_WORKLOAD_DIR}/goban) +set(GOBAN_LD ${CTEST_GOBAN_WORKLOAD_DIR}/goban.ld) + +set(GOBAN_C_FLAGS + -g -fno-common -O2 -static + -march=rv64gc -mcmodel=medany + -fno-builtin-printf -specs=nano.specs -specs=nosys.specs -nostartfiles + -Wl,-T,${GOBAN_LD} + -I${WORKLOAD_LIB_DIR} + -I${CTEST_GOBAN_WORKLOAD_DIR} +) + +# Generate a multicore baremetal ELF. +# start.S (goban version) launches all harts into main(). 
+function(add_goban_test_target TEST_NAME SOURCE_FILE) + set(EXECUTABLE "${TEST_NAME}_multicore-baremetal") + + add_custom_command( + OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} + COMMAND ${ELF_CC} ${GOBAN_C_FLAGS} + -o ${EXECUTABLE} + ${CTEST_GOBAN_WORKLOAD_DIR}/start.S + ${CTEST_GOBAN_WORKLOAD_DIR}/${SOURCE_FILE} + DEPENDS + ${CTEST_GOBAN_WORKLOAD_DIR}/${SOURCE_FILE} + ${CTEST_GOBAN_WORKLOAD_DIR}/start.S + COMMENT "Building goban multicore baremetal: ${EXECUTABLE}" + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} + ) + + add_custom_target(${TEST_NAME} + DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} + COMMENT "Building ${TEST_NAME}" + ) +endfunction() + +#------------------------------------------------------------------------------- +# Build list +#------------------------------------------------------------------------------- +add_goban_test_target(ctest_goban_barrier_test barrier_test.c) +add_goban_test_target(ctest_goban_barrier_mvin_test barrier_mvin_test.c) + +# Master build target +add_custom_target(buckyball-goban-CTest-build ALL + DEPENDS + ctest_goban_barrier_test + ctest_goban_barrier_mvin_test + COMMENT "Building all Goban workloads" + VERBATIM +) diff --git a/bb-tests/workloads/src/CTest/goban/barrier_mvin_test.c b/bb-tests/workloads/src/CTest/goban/barrier_mvin_test.c new file mode 100644 index 00000000..d67fe3f7 --- /dev/null +++ b/bb-tests/workloads/src/CTest/goban/barrier_mvin_test.c @@ -0,0 +1,83 @@ +/* + * barrier_mvin_test.c — Goban multi-core parallel mvin/mvout + barrier test. + * + * SPMD: all 4 cores run this program simultaneously. + * + * Each core: + * 1. Allocates its own private bank (bank = cid). + * 2. Fills the bank with a core-specific pattern (mvin). + * 3. Reads it back (mvout) and verifies locally. + * 4. Records its pass/fail flag in shared memory. + * 5. Calls bb_barrier() and waits for all cores. + * 6. Core 0 reads every flag and prints the summary.
+ * + * This test exercises: + * - Per-core private-bank mvin/mvout (no sharing needed) + * - bb_barrier() synchronization across all 4 cores + * - Parallel hardware accelerator utilization + */ + +#include "goban.h" +#include +#include + +#define DIM 16 +#define NCORES 4 + +/* Per-core source/destination buffers in main memory; the compiler places + them in BSS (all harts share one address space, each core uses its own slot). */ +static elem_t src[NCORES][DIM * DIM] __attribute__((aligned(128))); +static elem_t dst[NCORES][DIM * DIM] __attribute__((aligned(128))); +static volatile int core_ok[NCORES]; + +int main(void) { + int cid = bb_get_core_id(); + + printf("[core %d] starting mvin/mvout\n", cid); + + /* ---- Step 1: fill src with a core-specific pattern ---- */ + elem_t pat = (elem_t)(cid + 1); /* core 0 → 1, core 1 → 2, … */ + for (int i = 0; i < DIM * DIM; i++) { + src[cid][i] = pat; + } + + /* ---- Step 2: mvin → private bank ---- */ + int bank = cid; /* each core uses its own bank, no conflict */ + bb_mem_alloc(bank, 1, 1); + bb_mvin((uintptr_t)src[cid], bank, DIM, 1); + + /* ---- Step 3: mvout → dst ---- */ + memset(dst[cid], 0, sizeof(dst[cid])); + bb_mvout((uintptr_t)dst[cid], bank, DIM, 1); + bb_mem_release(bank); + + /* ---- Step 4: verify locally and record the result ---- */ + int ok = 1; + for (int i = 0; i < DIM * DIM; i++) { + if (dst[cid][i] != pat) { + printf("[core %d] ERROR at [%d]: got %d, expected %d\n", + cid, i, (int)dst[cid][i], (int)pat); + ok = 0; + break; + } + } + core_ok[cid] = ok; + printf("[core %d] mvin/mvout %s\n", cid, ok ? "PASSED" : "FAILED"); + + /* ============ BARRIER: wait for all cores to finish ============ */ + bb_barrier(); + + /* ---- Step 5: core 0 collects results ---- */ + if (cid == 0) { + int all_ok = 1; + for (int i = 0; i < NCORES; i++) { + if (!core_ok[i]) { + all_ok = 0; + printf("[core 0] core %d reported FAILED\n", i); + } + } + printf("=== barrier_mvin_test %s ===\n", all_ok ? "PASSED" : "FAILED"); + } + + return core_ok[cid] ? 
0 : 1; +} diff --git a/bb-tests/workloads/src/CTest/goban/barrier_test.c b/bb-tests/workloads/src/CTest/goban/barrier_test.c new file mode 100644 index 00000000..c10dfd60 --- /dev/null +++ b/bb-tests/workloads/src/CTest/goban/barrier_test.c @@ -0,0 +1,66 @@ +/* + * barrier_test.c - Goban multi-core barrier smoke test. + * + * All nCores harts run this program concurrently (SPMD). + * Test plan: + * 1. Each core sets its own arrival flag. + * 2. All cores execute bb_barrier(); hardware stalls until everyone arrives. + * 3. Each core checks that every arrival flag is set. + * 4. Core 0 prints a summary; all cores reach the final printf. + * + * Correctness criterion: if the simulation does not hang and all cores print + * their completion message, the barrier works. + */ + +#include "goban.h" +#include <stdio.h> + +/* Shared flags written by each core before a barrier to verify ordering. */ +static volatile int arrived[4] = {0, 0, 0, 0}; + +int main(void) { + int cid = bb_get_core_id(); + + printf("[core %d] starting\n", cid); + + /* --- Phase 1: mark arrival before barrier --- */ + arrived[cid] = 1; + + /* ============ BARRIER 1 ============ */ + bb_barrier(); + + /* --- Phase 2: all cores should see every arrival flag set --- */ + int ok = 1; + for (int i = 0; i < 4; i++) { + if (!arrived[i]) { + printf("[core %d] ERROR: arrived[%d] not set after barrier!\n", cid, i); + ok = 0; + } + } + + if (ok) { + printf("[core %d] after barrier: all arrival flags set - PASSED\n", cid); + } + + /* ============ BARRIER 2 ============ */ + /* Verify barrier can be used more than once in the same program. */ + arrived[cid] = 2; + bb_barrier(); + + for (int i = 0; i < 4; i++) { + if (arrived[i] != 2) { + printf("[core %d] ERROR: arrived[%d] != 2 after barrier 2!\n", cid, i); + ok = 0; + } + } + + if (ok) { + printf("[core %d] barrier 2 PASSED\n", cid); + } + + if (cid == 0) { + printf("=== barrier_test %s ===\n", ok ? "PASSED" : "FAILED"); + } + + return ok ?
0 : 1; +} diff --git a/bb-tests/workloads/src/CTest/goban/goban.h b/bb-tests/workloads/src/CTest/goban/goban.h new file mode 100644 index 00000000..45f40bb8 --- /dev/null +++ b/bb-tests/workloads/src/CTest/goban/goban.h @@ -0,0 +1,14 @@ +#ifndef GOBAN_H +#define GOBAN_H + +#include +#include + +/* Read hart ID from CSR mhartid */ +static inline int bb_get_core_id(void) { + int hartid; + asm volatile("csrr %0, mhartid" : "=r"(hartid)); + return hartid; +} + +#endif // GOBAN_H diff --git a/bb-tests/workloads/src/CTest/goban/goban.ld b/bb-tests/workloads/src/CTest/goban/goban.ld new file mode 100644 index 00000000..e3284422 --- /dev/null +++ b/bb-tests/workloads/src/CTest/goban/goban.ld @@ -0,0 +1,41 @@ +/* goban.ld — baremetal linker script for Goban multi-core workloads (BBSimHarness) */ +OUTPUT_ARCH("riscv") +ENTRY(_start) + +SECTIONS { + . = 0x80000000; + + .text : { + *(.text.init) + *(.text.startup .text.startup.*) + *(.text .text.*) + *(.gnu.linkonce.t.*) + } + + .rodata : { + *(.rodata .rodata.*) + *(.srodata .srodata.*) + *(.gnu.linkonce.r.*) + } + + .data : { + *(.data .data.*) + *(.sdata .sdata.*) + *(.gnu.linkonce.d.*) + *(.gnu.linkonce.s.*) + } + + .bss (NOLOAD) : { + PROVIDE(__bss_start = .); + *(.bss .bss.*) + *(.sbss .sbss.*) + *(.gnu.linkonce.b.*) + *(.gnu.linkonce.sb.*) + *(COMMON) + PROVIDE(__bss_end = .); + } + + . = ALIGN(16); + PROVIDE(end = .); + PROVIDE(_end = .); +} diff --git a/bb-tests/workloads/src/CTest/goban/start.S b/bb-tests/workloads/src/CTest/goban/start.S new file mode 100644 index 00000000..177d8504 --- /dev/null +++ b/bb-tests/workloads/src/CTest/goban/start.S @@ -0,0 +1,30 @@ +# Multi-core SPMD startup for Goban (nCores=4 by default). +# All harts jump to main(); divergence is done via mhartid inside main(). +# Stack layout: base = 0x80020000, each hart gets 4 KB downward. +# On return from main(), hart 0 writes exit code to BBSimHarness MMIO exit +# register (0x60000000); other harts spin forever. 
+ +.section .text.init +.global _start + +_start: + csrr t0, mhartid # t0 = hart id + + # Stack per hart: base - hartid * 0x1000 + li sp, 0x80020000 + li t1, 0x1000 + mul t1, t0, t1 + sub sp, sp, t1 + + call main + + # Only hart 0 writes the MMIO exit register + csrr t0, mhartid + bnez t0, 1f + li t1, 0x60000000 + sw a0, 0(t1) # write exit code; C++ mmio_tick() calls sim_exit() +1: j 1b # spin forever (all harts) + +.align 12 +stack_space: + .space 0x8000 # 32 KB total stack space (8 harts × 4 KB) diff --git a/bb-tests/workloads/src/CTest/rvv/.gitignore b/bb-tests/workloads/src/CTest/rvv/.gitignore deleted file mode 100644 index fb0d9c4e..00000000 --- a/bb-tests/workloads/src/CTest/rvv/.gitignore +++ /dev/null @@ -1 +0,0 @@ -*_data.S diff --git a/bb-tests/workloads/src/CTest/rvv/CMakeLists.txt b/bb-tests/workloads/src/CTest/rvv/CMakeLists.txt deleted file mode 100644 index 134263f6..00000000 --- a/bb-tests/workloads/src/CTest/rvv/CMakeLists.txt +++ /dev/null @@ -1,229 +0,0 @@ -set(ELF_CC "riscv64-unknown-elf-gcc") -set(ELF_CXX "riscv64-unknown-elf-g++") - -#------------------------------------------------------------------------------- -# Set baremetal compilation flags for RVV -#------------------------------------------------------------------------------- -set(RVV_C_FLAGS -g -DPREALLOCATE=1 -DNR_LANES=64 -mcmodel=medany -static -O2 -ffast-math - -fno-common -fno-builtin-printf -fno-tree-loop-distribute-patterns - -march=rv64gcv_zfh_zvfh -mabi=lp64d -std=gnu99 - -I${CTEST_RVV_WORKLOAD_DIR}/env - -I${CTEST_RVV_WORKLOAD_DIR}/common) - -set(RVV_CXX_FLAGS -g -DPREALLOCATE=1 -mcmodel=medany -static -O2 -ffast-math - -fno-common -fno-builtin-printf -fno-tree-loop-distribute-patterns - -march=rv64gcv_zfh_zvfh -mabi=lp64d -std=c++17 - -specs=htif_nano.specs - -I${CTEST_RVV_WORKLOAD_DIR}/env - -I${CTEST_RVV_WORKLOAD_DIR}/common - -I${CTEST_RVV_WORKLOAD_DIR}/utasks) - -# Link options for baremetal -set(RVV_LINK_OPTS -static -nostdlib -nostartfiles -lm -lgcc - -T 
${CTEST_RVV_WORKLOAD_DIR}/common/test.ld) - -# Common sources -file(GLOB RVV_COMMON_C_SRCS - ${CTEST_RVV_WORKLOAD_DIR}/common/*.c - ${CTEST_RVV_WORKLOAD_DIR}/common/ara/*.c) -file(GLOB RVV_COMMON_ASM_SRCS - ${CTEST_RVV_WORKLOAD_DIR}/common/*.S) -set(RVV_COMMON_SRCS ${RVV_COMMON_C_SRCS} ${RVV_COMMON_ASM_SRCS}) - -#------------------------------------------------------------------------------- -# Define compilation functions for RVV benchmarks -#------------------------------------------------------------------------------- - -# Generate baremetal executable for C benchmarks -function(add_rvv_baremetal_test_target TEST_NAME) - set(EXECUTABLE "${TEST_NAME}-baremetal") - set(TEST_DIR "${CTEST_RVV_WORKLOAD_DIR}/${TEST_NAME}") - - # Find all C sources in the test directory - file(GLOB TEST_C_SRCS ${TEST_DIR}/*.c) - - # Check if there's a data generation script (only for tests that generate .S files) - # Exclude tests that generate C header files instead - set(GEN_SCRIPT "") - if(NOT TEST_NAME MATCHES "vec-conv-3|vec-sep-conv-3|vec-slide-conv|vec-sgemm|vec-sgemv|vec-transpose-load|vec-transpose-store") - file(GLOB GEN_SCRIPT ${TEST_DIR}/gen_data.py ${TEST_DIR}/*_gendata.py ${TEST_DIR}/gendata.py) - endif() - - # Find existing assembly sources (non-data files like vec-*.S) - file(GLOB TEST_ASM_SRCS ${TEST_DIR}/*.S) - list(FILTER TEST_ASM_SRCS EXCLUDE REGEX "data\\.S$") - - set(TEST_SRCS ${TEST_C_SRCS} ${TEST_ASM_SRCS}) - set(DATA_ASM_FILE "") - - # If there's a generation script, add command to generate data.S - if(GEN_SCRIPT) - set(DATA_ASM_FILE ${CMAKE_CURRENT_BINARY_DIR}/${TEST_NAME}_data.S) - - # Determine default arguments for each test - set(GEN_ARGS "") - if(TEST_NAME MATCHES "vec-cos|vec-exp|vec-log") - set(GEN_ARGS "64") # N_f64 - elseif(TEST_NAME MATCHES "vec-jacobi2d") - set(GEN_ARGS "16 16") # R C - elseif(TEST_NAME MATCHES "vec-pathfinder") - set(GEN_ARGS "10 20 30") # num_runs cols rows - elseif(TEST_NAME MATCHES "vec-conjugate-gradient|vec-spmv") - 
set(GEN_ARGS "16 16 0.5") # S N D - elseif(TEST_NAME MATCHES "vec-igemm") - set(GEN_ARGS "8 8 8") # M N P - elseif(TEST_NAME MATCHES "vec-softmax") - set(GEN_ARGS "8 8") # channels innerSize - elseif(TEST_NAME MATCHES "vec-dropout") - set(GEN_ARGS "1024") # N - elseif(TEST_NAME MATCHES "vec-fconv2d|vec-iconv2d") - set(GEN_ARGS "64 3") # matrix_width filter_size - elseif(TEST_NAME MATCHES "vec-fconv3d") - set(GEN_ARGS "16 3") # matrix_width filter_size - elseif(TEST_NAME MATCHES "vec-fdotprod|vec-dotprod") - set(GEN_ARGS "1024") # N - elseif(TEST_NAME MATCHES "vec-roi-align") - set(GEN_ARGS "2 3 16 16 4 8 8") # batch_size depth height width n_boxes crop_h crop_w - endif() - - # Convert space-separated args to list - string(REPLACE " " ";" GEN_ARGS_LIST "${GEN_ARGS}") - - add_custom_command( - OUTPUT ${DATA_ASM_FILE} - COMMAND ${CMAKE_COMMAND} -E env python3 ${GEN_SCRIPT} ${GEN_ARGS_LIST} > ${DATA_ASM_FILE} - DEPENDS ${GEN_SCRIPT} - COMMENT "Generating data for ${TEST_NAME} with args: ${GEN_ARGS}" - WORKING_DIRECTORY ${TEST_DIR} - ) - list(APPEND TEST_SRCS ${DATA_ASM_FILE}) - else() - # If no generation script, use existing data.S if present - file(GLOB DATA_S ${TEST_DIR}/data.S) - if(DATA_S) - list(APPEND TEST_SRCS ${DATA_S}) - endif() - endif() - - add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} - COMMAND ${ELF_CC} ${RVV_C_FLAGS} - -I${TEST_DIR} - -o ${EXECUTABLE} - ${TEST_SRCS} - ${RVV_COMMON_SRCS} - ${RVV_LINK_OPTS} - DEPENDS ${TEST_SRCS} ${RVV_COMMON_SRCS} - COMMENT "Building RVV baremetal executable: ${EXECUTABLE}" - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - ) - - add_custom_target(${TEST_NAME}_baremetal - DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} - ) -endfunction() - -# Generate baremetal executable for C++ benchmarks -# Note: C++ programs don't link common sources, they rely on htif_nano.specs -function(add_rvv_cpp_baremetal_test_target TEST_NAME) - set(EXECUTABLE "${TEST_NAME}-baremetal") - set(TEST_DIR 
"${CTEST_RVV_WORKLOAD_DIR}/${TEST_NAME}") - - # Find all C++ and assembly sources in the test directory - file(GLOB TEST_CPP_SRCS ${TEST_DIR}/*.cc) - file(GLOB TEST_ASM_SRCS ${TEST_DIR}/*.S) - set(TEST_SRCS ${TEST_CPP_SRCS} ${TEST_ASM_SRCS}) - - add_custom_command( - OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} - COMMAND ${ELF_CXX} ${RVV_CXX_FLAGS} - -I${TEST_DIR} - -o ${EXECUTABLE} - ${TEST_SRCS} - ${RVV_LINK_OPTS} - DEPENDS ${TEST_SRCS} ${CTEST_RVV_WORKLOAD_DIR}/utasks/utasks.h - COMMENT "Building RVV C++ baremetal executable: ${EXECUTABLE}" - WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} - ) - - add_custom_target(${TEST_NAME}_baremetal - DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} - ) -endfunction() - -#------------------------------------------------------------------------------- -# Build list - C benchmarks (baremetal only) -#------------------------------------------------------------------------------- -add_rvv_baremetal_test_target(vec-conditional) -add_rvv_baremetal_test_target(vec-conjugate-gradient) -add_rvv_baremetal_test_target(vec-conv-3) -add_rvv_baremetal_test_target(vec-cos) -add_rvv_baremetal_test_target(vec-div-approx) -add_rvv_baremetal_test_target(vec-dotprod) -add_rvv_baremetal_test_target(vec-dropout) -add_rvv_baremetal_test_target(vec-exp) -add_rvv_baremetal_test_target(vec-fconv2d) -add_rvv_baremetal_test_target(vec-fconv3d) -add_rvv_baremetal_test_target(vec-fdotprod) -add_rvv_baremetal_test_target(vec-iconv2d) -add_rvv_baremetal_test_target(vec-igemm) -add_rvv_baremetal_test_target(vec-jacobi2d) -add_rvv_baremetal_test_target(vec-log) -add_rvv_baremetal_test_target(vec-mixed_width_mask) -add_rvv_baremetal_test_target(vec-pathfinder) -add_rvv_baremetal_test_target(vec-roi-align) -add_rvv_baremetal_test_target(vec-sep-conv-3) -add_rvv_baremetal_test_target(vec-sgemm) -add_rvv_baremetal_test_target(vec-sgemm-v2) -add_rvv_baremetal_test_target(vec-sgemm-v3) -add_rvv_baremetal_test_target(vec-sgemv) 
-add_rvv_baremetal_test_target(vec-slide-conv) -add_rvv_baremetal_test_target(vec-softmax) -add_rvv_baremetal_test_target(vec-strlen) -add_rvv_baremetal_test_target(vec-spmv) -add_rvv_baremetal_test_target(vec-square-root-approx) -add_rvv_baremetal_test_target(vec-transpose-load) -add_rvv_baremetal_test_target(vec-transpose-store) - -#------------------------------------------------------------------------------- -# Build list - C++ benchmarks -#------------------------------------------------------------------------------- -# add_rvv_cpp_baremetal_test_target(vec-tasks) -# add_rvv_cpp_baremetal_test_target(vec-daxpy) - -#------------------------------------------------------------------------------- -# Create master build target -#------------------------------------------------------------------------------- -add_custom_target(buckyball-rvv-build ALL DEPENDS - vec-conditional_baremetal - vec-conjugate-gradient_baremetal - vec-conv-3_baremetal - vec-cos_baremetal - vec-div-approx_baremetal - vec-dotprod_baremetal - vec-dropout_baremetal - vec-exp_baremetal - vec-fconv2d_baremetal - vec-fconv3d_baremetal - vec-fdotprod_baremetal - vec-iconv2d_baremetal - vec-igemm_baremetal - vec-jacobi2d_baremetal - vec-log_baremetal - vec-mixed_width_mask_baremetal - vec-pathfinder_baremetal - vec-roi-align_baremetal - vec-sep-conv-3_baremetal - vec-sgemm_baremetal - vec-sgemm-v2_baremetal - vec-sgemm-v3_baremetal - vec-sgemv_baremetal - vec-slide-conv_baremetal - vec-softmax_baremetal - vec-strlen_baremetal - vec-spmv_baremetal - vec-square-root-approx_baremetal - vec-transpose-load_baremetal - vec-transpose-store_baremetal - COMMENT "Building all RVV baremetal workloads for Buckyball" - VERBATIM) diff --git a/bb-tests/workloads/src/CTest/rvv/Makefile b/bb-tests/workloads/src/CTest/rvv/Makefile deleted file mode 100644 index 17417bdb..00000000 --- a/bb-tests/workloads/src/CTest/rvv/Makefile +++ /dev/null @@ -1,133 +0,0 @@ 
-#======================================================================= -# UCB VLSI FLOW: Makefile for riscv-bmarks -#----------------------------------------------------------------------- -# Yunsup Lee (yunsup@cs.berkeley.edu) -# - -XLEN ?= 64 - -default: all - -src_dir = . - -instname = riscv-bmarks -instbasedir = $(UCB_VLSI_HOME)/install - -#-------------------------------------------------------------------- -# Sources -#-------------------------------------------------------------------- - -bmarks = \ - vec-conditional \ - vec-conjugate-gradient \ - vec-conv-3 \ - vec-cos \ - vec-div-approx \ - vec-dotprod \ - vec-dropout \ - vec-exp \ - vec-fconv2d \ - vec-fconv3d \ - vec-fdotprod \ - vec-fft \ - vec-iconv2d \ - vec-igemm \ - vec-jacobi2d \ - vec-log \ - vec-mixed_width_mask \ - vec-pathfinder \ - vec-roi-align \ - vec-sep-conv-3 \ - vec-sgemm \ - vec-sgemm-v2 \ - vec-sgemm-v3 \ - vec-sgemv \ - vec-slide-conv \ - vec-softmax \ - vec-strlen \ - vec-spmv \ - vec-square-root-approx \ - vec-transpose-load \ - vec-transpose-store \ - -cpp_bmarks = \ - vec-tasks \ - vec-daxpy - -#-------------------------------------------------------------------- -# Build rules -#-------------------------------------------------------------------- - -RISCV_PREFIX ?= riscv$(XLEN)-unknown-elf- -RISCV_GCC ?= $(RISCV_PREFIX)gcc -RISCV_GXX ?= $(RISCV_PREFIX)g++ -RISCV_COMMON_OPTS ?= -DPREALLOCATE=1 -mcmodel=medany -static -O2 -g -ffast-math -fno-common -fno-builtin-printf -fno-tree-loop-distribute-patterns -march=rv$(XLEN)gcv_zfh_zvfh -mabi=lp64d -RISCV_GCC_OPTS ?= $(RISCV_COMMON_OPTS) -std=gnu99 -RISCV_GXX_OPTS ?= $(RISCV_COMMON_OPTS) -std=c++17 -specs=htif_nano.specs -RISCV_LINK ?= $(RISCV_GCC) -T $(src_dir)/common/test.ld $(incs) -RISCV_LINK_OPTS ?= -static -nostdlib -nostartfiles -lm -lgcc -T $(src_dir)/common/test.ld -RISCV_OBJDUMP ?= $(RISCV_PREFIX)objdump -C -D -S --disassemble-all --disassemble-zeroes --section=.text --section=.text.startup --section=.text.init 
--section=.data -RISCV_SIM ?= spike --isa=rv$(XLEN)gcv_zfh_zvfh -p4 -m0x70020000:0x20000,0x80000000:0x10000000 - -incs += -I$(src_dir)/env -I$(src_dir)/common $(addprefix -I$(src_dir)/, $(bmarks)) -objs := - -COMMON_SRCS = $(wildcard $(src_dir)/common/*.c) $(wildcard $(src_dir)/common/*.S) $(wildcard $(src_dir)/common/ara/*.c) - -define compile_template -$(1).riscv: $(wildcard $(src_dir)/$(1)/*) $(wildcard $(src_dir)/common/*) - $$(RISCV_GCC) $$(incs) $$(RISCV_GCC_OPTS) -o $$@ $(wildcard $(src_dir)/$(1)/*.c) $(wildcard $(src_dir)/$(1)/*.S) $(COMMON_SRCS) $$(RISCV_LINK_OPTS) -endef - -define compile_cpp_template -$(1).riscv: $(wildcard $(src_dir)/$(1)/*) $(src_dir)/utasks/utasks.h - $$(RISCV_GXX) $$(incs) $$(RISCV_GXX_OPTS) -I$(src_dir)/utasks -o $$@ $(wildcard $(src_dir)/$(1)/*.cc) $(wildcard $(src_dir)/$(1)/*.S) -endef - - -$(foreach bmark,$(bmarks),$(eval $(call compile_template,$(bmark)))) -$(foreach bmark,$(cpp_bmarks),$(eval $(call compile_cpp_template,$(bmark)))) - -#------------------------------------------------------------ -# Build and run benchmarks on riscv simulator - -bmarks_riscv_bin = $(addsuffix .riscv, $(bmarks) $(cpp_bmarks)) -bmarks_riscv_dump = $(addsuffix .riscv.dump, $(bmarks) $(cpp_bmarks)) -bmarks_riscv_out = $(addsuffix .riscv.out, $(bmarks) $(cpp_bmarks)) - -$(bmarks_riscv_dump): %.riscv.dump: %.riscv - $(RISCV_OBJDUMP) $< > $@ - -$(bmarks_riscv_out): %.riscv.out: %.riscv - $(RISCV_SIM) $< > $@ - -riscv: $(bmarks_riscv_dump) -run: $(bmarks_riscv_out) - -junk += $(bmarks_riscv_bin) $(bmarks_riscv_dump) $(bmarks_riscv_hex) $(bmarks_riscv_out) - -#------------------------------------------------------------ -# Default - -all: riscv - -#------------------------------------------------------------ -# Install - -date_suffix = $(shell date +%Y-%m-%d_%H-%M) -install_dir = $(instbasedir)/$(instname)-$(date_suffix) -latest_install = $(shell ls -1 -d $(instbasedir)/$(instname)* | tail -n 1) - -install: - mkdir $(install_dir) - cp -r 
$(bmarks_riscv_bin) $(bmarks_riscv_dump) $(install_dir) - -install-link: - rm -rf $(instbasedir)/$(instname) - ln -s $(latest_install) $(instbasedir)/$(instname) - -#------------------------------------------------------------ -# Clean up - -clean: - rm -rf $(objs) $(junk) diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/exp.h b/bb-tests/workloads/src/CTest/rvv/common/ara/exp.h deleted file mode 100644 index 1bdec349..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/exp.h +++ /dev/null @@ -1,99 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
-// -// Author: Matteo Perotti - -#include -#include - -#include "riscv_vector.h" - -#define EXP_IMPL(t, w, l, b, c1, c2) \ - void exp_f##w##m##l##_bmark(t *exponents, t *results, size_t len); \ - inline vfloat##w##m##l##_t __exp_f##w##m##l(vfloat##w##m##l##_t x, \ - size_t gvl) { \ - t exp_hi = 88.3762626##w##7949; \ - t exp_lo = -88.3762626##w##7949; \ - \ - t cephes_LOG2EF = 1.44269504088896341; \ - t cephes_exp_C1 = 0.693359375; \ - t cephes_exp_C2 = -2.12194440e-4; \ - \ - t cephes_exp_p0 = 1.9875691500E-4; \ - t cephes_exp_p1 = 1.3981999507E-3; \ - t cephes_exp_p2 = 8.3334519073E-3; \ - t cephes_exp_p3 = 4.1665795894E-2; \ - t cephes_exp_p4 = 1.6666665459E-1; \ - t cephes_exp_p5 = 5.0000001201E-1; \ - vfloat##w##m##l##_t tmp; \ - vfloat##w##m##l##_t tmp2; \ - vfloat##w##m##l##_t tmp4; \ - vfloat##w##m##l##_t fx; \ - \ - vfloat##w##m##l##_t one = __riscv_vfmv_v_f_f##w##m##l(1.0, gvl); \ - vfloat##w##m##l##_t zero = __riscv_vfmv_v_f_f##w##m##l(0.0, gvl); \ - vfloat##w##m##l##_t z; \ - vfloat##w##m##l##_t y; \ - \ - vbool##b##_t mask; \ - vint##w##m##l##_t imm0; \ - vint##w##m##l##_t tmp3; \ - \ - x = __riscv_vfmin_vf_f##w##m##l(x, exp_hi, gvl); \ - x = __riscv_vfmax_vf_f##w##m##l(x, exp_lo, gvl); \ - \ - fx = __riscv_vfmv_v_f_f##w##m##l(0.5, gvl); \ - fx = __riscv_vfmacc_vf_f##w##m##l(fx, cephes_LOG2EF, x, gvl); \ - \ - tmp3 = __riscv_vfcvt_x_f_v_i##w##m##l(fx, gvl); \ - tmp = __riscv_vfcvt_f_x_v_f##w##m##l(tmp3, gvl); \ - \ - mask = __riscv_vmflt_vv_f##w##m##l##_b##b(fx, tmp, gvl); \ - tmp2 = __riscv_vmerge_vvm_f##w##m##l(zero, one, mask, gvl); \ - fx = __riscv_vfsub_vv_f##w##m##l(tmp, tmp2, gvl); \ - tmp = __riscv_vfmul_vf_f##w##m##l(fx, cephes_exp_C1, gvl); \ - z = __riscv_vfmul_vf_f##w##m##l(fx, cephes_exp_C2, gvl); \ - x = __riscv_vfsub_vv_f##w##m##l(x, tmp, gvl); \ - x = __riscv_vfsub_vv_f##w##m##l(x, z, gvl); \ - \ - z = __riscv_vfmul_vv_f##w##m##l(x, x, gvl); \ - \ - y = __riscv_vfmv_v_f_f##w##m##l(cephes_exp_p0, gvl); \ - y = 
__riscv_vfmadd_vf_f##w##m##l(y, cephes_exp_p1, x, gvl); \ - y = __riscv_vfmadd_vf_f##w##m##l(y, cephes_exp_p2, x, gvl); \ - y = __riscv_vfmadd_vf_f##w##m##l(y, cephes_exp_p3, x, gvl); \ - y = __riscv_vfmadd_vf_f##w##m##l(y, cephes_exp_p4, x, gvl); \ - y = __riscv_vfmadd_vf_f##w##m##l(y, cephes_exp_p5, x, gvl); \ - y = __riscv_vfmadd_vv_f##w##m##l(y, z, x, gvl); \ - y = __riscv_vfadd_vv_f##w##m##l(y, one, gvl); \ - \ - imm0 = __riscv_vfcvt_x_f_v_i##w##m##l(fx, gvl); \ - imm0 = __riscv_vadd_vv_i##w##m##l( \ - imm0, __riscv_vmv_v_x_i##w##m##l(c1, gvl), gvl); \ - imm0 = __riscv_vsll_vv_i##w##m##l( \ - imm0, __riscv_vmv_v_x_u##w##m##l(c2, gvl), gvl); \ - \ - tmp4 = __riscv_vreinterpret_v_i##w##m##l##_f##w##m##l(imm0); \ - y = __riscv_vfmul_vv_f##w##m##l(y, tmp4, gvl); \ - return y; \ - } - -EXP_IMPL(double, 64, 1, 64, 1023, 52); -EXP_IMPL(double, 64, 2, 32, 1023, 52); -EXP_IMPL(double, 64, 4, 16, 1023, 52); -EXP_IMPL(float, 32, 1, 32, 0x7f, 23); -EXP_IMPL(float, 32, 2, 16, 0x7f, 23); -EXP_IMPL(float, 32, 4, 8, 0x7f, 23); diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/fdotproduct.c b/bb-tests/workloads/src/CTest/rvv/common/ara/fdotproduct.c deleted file mode 100644 index 305ad6d9..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/fdotproduct.c +++ /dev/null @@ -1,346 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Matteo Perotti - -#include "fdotproduct.h" -#define INTRINSICS -// 64-bit dot-product: a * b -double fdotp_v64b(const double *a, const double *b, size_t avl) { -#ifdef INTRINSICS - - size_t orig_avl = avl; - size_t vl = __riscv_vsetvlmax_e64m8(); - - vfloat64m8_t acc, buf_a, buf_b; - vfloat64m1_t red; - - double *a_ = (double *)a; - double *b_ = (double *)b; - - // Clean the accumulator - acc = __riscv_vfmv_s_f_f64m8(0, vl); - red = __riscv_vfmv_s_f_f64m1(0, vl); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - vl = __riscv_vsetvl_e64m8(avl); - // Load chunk a and b - buf_a = __riscv_vle64_v_f64m8(a_, vl); - buf_b = __riscv_vle64_v_f64m8(b_, vl); - // Multiply and accumulate - acc = __riscv_vfmacc_vv_f64m8(acc, buf_a, buf_b, vl); - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and return - vl = __riscv_vsetvlmax_e64m8(); - red = __riscv_vfredusum_vs_f64m8_f64m1(acc, red, vl); - return __riscv_vfmv_f_s_f64m1_f64(red); - -#else - - size_t orig_avl = avl; - size_t vl; - asm volatile("vsetvli %0, %1, e64, m8, ta, ma" : "=r"(vl) : "r"(avl)); - - double red; - - double *a_ = (double *)a; - double *b_ = (double *)b; - - // Clean the accumulator - asm volatile("vmv.s.x v0, zero"); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - asm volatile("vsetvli %0, %1, e64, m8, ta, ma" : "=r"(vl) : "r"(avl)); - // Load chunk a and b - asm volatile("vle64.v v8, (%0)" ::"r"(a_)); - asm volatile("vle64.v v16, (%0)" ::"r"(b_)); - // Multiply and accumulate - if (avl == orig_avl) { - asm volatile("vfmul.vv v24, v8, v16"); - } else { - asm volatile("vfmacc.vv v24, v8, v16"); - } - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and return - asm volatile("vfredusum.vs v0, v24, v0"); - asm volatile("vfmv.f.s %0, v0" : "=f"(red)); - return red; - -#endif -} - -// 32-bit dot-product: a * b -float fdotp_v32b(const float *a, const float *b, size_t avl) { -#ifdef INTRINSICS - - 
size_t orig_avl = avl; - size_t vl = __riscv_vsetvl_e32m8(avl); - - vfloat32m8_t acc, buf_a, buf_b; - vfloat32m1_t red; - - float *a_ = (float *)a; - float *b_ = (float *)b; - - // Clean the accumulator - acc = __riscv_vfmv_s_f_f32m8(0, vl); - red = __riscv_vfmv_s_f_f32m1(0, vl); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - vl = __riscv_vsetvl_e32m8(avl); - // Load chunk a and b - buf_a = __riscv_vle32_v_f32m8(a_, vl); - buf_b = __riscv_vle32_v_f32m8(b_, vl); - // Multiply and accumulate - acc = __riscv_vfmacc_vv_f32m8(acc, buf_a, buf_b, vl); - - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and return - vl = __riscv_vsetvlmax_e64m8(); - red = __riscv_vfredusum_vs_f32m8_f32m1(acc, red, vl); - return __riscv_vfmv_f_s_f32m1_f32(red); - -#else - - size_t orig_avl = avl; - size_t vl; - asm volatile("vsetvli %0, %1, e32, m8, ta, ma" : "=r"(vl) : "r"(avl)); - - float red; - - float *a_ = (float *)a; - float *b_ = (float *)b; - - // Clean the accumulator - asm volatile("vmv.s.x v0, zero"); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - asm volatile("vsetvli %0, %1, e32, m8, ta, ma" : "=r"(vl) : "r"(avl)); - // Load chunk a and b - asm volatile("vle32.v v8, (%0)" ::"r"(a_)); - asm volatile("vle32.v v16, (%0)" ::"r"(b_)); - // Multiply and accumulate - if (avl == orig_avl) { - asm volatile("vfmul.vv v24, v8, v16"); - } else { - asm volatile("vfmacc.vv v24, v8, v16"); - } - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and return - asm volatile("vfredusum.vs v0, v24, v0"); - asm volatile("vfmv.f.s %0, v0" : "=f"(red)); - return red; - -#endif -} - -// 16-bit dot-product: a * b -_Float16 fdotp_v16b(const _Float16 *a, const _Float16 *b, size_t avl) { - /* #ifdef INTRINSICS */ - - /* size_t orig_avl = avl; */ - /* size_t vl = __riscv_vsetvl_e16m8(avl); */ - - /* vfloat16m8_t acc, buf_a, buf_b; */ - /* vfloat16m1_t red; */ - - /* _Float16 *a_ = (_Float16 *)a; */ - /* 
_Float16 *b_ = (_Float16 *)b; */ - - /* // Clean the accumulator */ - /* red = __riscv_vfmv_s_f_f16m1(0, vl); */ - /* // Stripmine and accumulate a partial reduced vector */ - /* for (; avl > 0; avl -= vl) { */ - /* vl = __riscv_vsetvl_e16m8(avl); */ - /* // Load chunk a and b */ - /* buf_a = __riscv_vle16_v_f16m8(a_, vl); */ - /* buf_b = __riscv_vle16_v_f16m8(b_, vl); */ - /* // Multiply and accumulate */ - /* if (avl == orig_avl) { */ - /* acc = __riscv_vfmul_vv_f16m8(buf_a, buf_b, vl); */ - /* } else { */ - /* acc = __riscv_vfmacc_vv_f16m8(acc, buf_a, buf_b, vl); */ - /* } */ - /* // Bump pointers */ - /* a_ += vl; */ - /* b_ += vl; */ - /* } */ - - /* // Reduce and store */ - /* red = __riscv_vfredusum_vs_f16m8_f16m1(red, acc, red, vl); */ - /* return __riscv_vfmv_f_s_f16m1_f16(red); */ - - /* #else */ - - size_t orig_avl = avl; - size_t vl; - asm volatile("vsetvli %0, %1, e16, m8, ta, ma" : "=r"(vl) : "r"(avl)); - - _Float16 red; - - _Float16 *a_ = (_Float16 *)a; - _Float16 *b_ = (_Float16 *)b; - - // Clean the accumulator - asm volatile("vmv.s.x v0, zero"); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - asm volatile("vsetvli %0, %1, e16, m8, ta, ma" : "=r"(vl) : "r"(avl)); - // Load chunk a and b - asm volatile("vle16.v v8, (%0)" ::"r"(a_)); - asm volatile("vle16.v v16, (%0)" ::"r"(b_)); - // Multiply and accumulate - if (avl == orig_avl) { - asm volatile("vfmul.vv v24, v8, v16"); - } else { - asm volatile("vfmacc.vv v24, v8, v16"); - } - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and return - asm volatile("vfredusum.vs v0, v24, v0"); - asm volatile("vfmv.f.s %0, v0" : "=f"(red)); - return red; - - /* #endif */ -} - -double fdotp_s64b(const double *a, const double *b, size_t avl) { - double acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; - - acc0 = 0; - acc1 = 0; - acc2 = 0; - acc3 = 0; - acc4 = 0; - acc5 = 0; - acc6 = 0; - acc7 = 0; - - for (uint64_t i = 0; i < avl; i += 8) { - acc0 += a[i + 0] * b[i + 
0]; - acc1 += a[i + 1] * b[i + 1]; - acc2 += a[i + 2] * b[i + 2]; - acc3 += a[i + 3] * b[i + 3]; - acc4 += a[i + 4] * b[i + 4]; - acc5 += a[i + 5] * b[i + 5]; - acc6 += a[i + 6] * b[i + 6]; - acc7 += a[i + 7] * b[i + 7]; - } - - acc0 += acc1; - acc2 += acc3; - acc4 += acc5; - acc6 += acc7; - - acc0 += acc2; - acc4 += acc6; - - acc0 += acc4; - - return acc0; -} - -float fdotp_s32b(const float *a, const float *b, size_t avl) { - float acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; - - acc0 = 0; - acc1 = 0; - acc2 = 0; - acc3 = 0; - acc4 = 0; - acc5 = 0; - acc6 = 0; - acc7 = 0; - - for (uint64_t i = 0; i < avl; i += 8) { - acc0 += a[i + 0] * b[i + 0]; - acc1 += a[i + 1] * b[i + 1]; - acc2 += a[i + 2] * b[i + 2]; - acc3 += a[i + 3] * b[i + 3]; - acc4 += a[i + 4] * b[i + 4]; - acc5 += a[i + 5] * b[i + 5]; - acc6 += a[i + 6] * b[i + 6]; - acc7 += a[i + 7] * b[i + 7]; - } - - acc0 += acc1; - acc2 += acc3; - acc4 += acc5; - acc6 += acc7; - - acc0 += acc2; - acc4 += acc6; - - acc0 += acc4; - - return acc0; -} - -_Float16 fdotp_s16b(const _Float16 *a, const _Float16 *b, size_t avl) { - _Float16 acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; - - acc0 = 0; - acc1 = 0; - acc2 = 0; - acc3 = 0; - acc4 = 0; - acc5 = 0; - acc6 = 0; - acc7 = 0; - - for (uint64_t i = 0; i < avl; i += 8) { - acc0 += a[i + 0] * b[i + 0]; - acc1 += a[i + 1] * b[i + 1]; - acc2 += a[i + 2] * b[i + 2]; - acc3 += a[i + 3] * b[i + 3]; - acc4 += a[i + 4] * b[i + 4]; - acc5 += a[i + 5] * b[i + 5]; - acc6 += a[i + 6] * b[i + 6]; - acc7 += a[i + 7] * b[i + 7]; - } - - acc0 += acc1; - acc2 += acc3; - acc4 += acc5; - acc6 += acc7; - - acc0 += acc2; - acc4 += acc6; - - acc0 += acc4; - - return acc0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/fdotproduct.h b/bb-tests/workloads/src/CTest/rvv/common/ara/fdotproduct.h deleted file mode 100644 index 4f5c3279..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/fdotproduct.h +++ /dev/null @@ -1,35 +0,0 @@ -// Copyright 2022 ETH Zurich and 
University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#ifndef _FDOTPRODUCT_H_ -#define _FDOTPRODUCT_H_ - -#include -#include - -#include "riscv_vector.h" - -double fdotp_v64b(const double *a, const double *b, size_t avl); -float fdotp_v32b(const float *a, const float *b, size_t avl); -_Float16 fdotp_v16b(const _Float16 *a, const _Float16 *b, size_t avl); - -double fdotp_s64b(const double *a, const double *b, size_t avl); -float fdotp_s32b(const float *a, const float *b, size_t avl); -_Float16 fdotp_s16b(const _Float16 *a, const _Float16 *b, size_t avl); - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/gemv.c b/bb-tests/workloads/src/CTest/rvv/common/ara/gemv.c deleted file mode 100644 index a1e13ad7..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/gemv.c +++ /dev/null @@ -1,248 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Chi Zhang, ETH Zurich - -#include "gemv.h" -#include "util.h" -#include -#include -#include -#include - -#include - -void init_gemv_data(const unsigned long int m_row, - const unsigned long int v_len, double *matrix, - double *vector, double a, double b, double c) { - // initialize matrix - for (uint64_t i = 0; i < m_row; ++i) { - for (uint64_t j = 0; j < v_len; ++j) { - matrix[i * v_len + j] = a * (double)i + b * (double)j + c; - } - } - - // initialize vector - for (uint64_t i = 0; i < v_len; ++i) { - vector[i] = a * (double)i + b; - } -} - -//=====================================// -//========= GEMV ROW WISE KERNEL ======// -//=====================================// - -#define SLICE_SIZE 128 - -void clear_reduction_register() { - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(SLICE_SIZE)); - asm volatile("vmv.v.i v16, 0"); - asm volatile("vmv.v.i v24, 0"); -} - -void store_slice_results(double *dest, const unsigned long int slice_height) { - double tmp; - // round slide - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(slice_height)); - asm volatile("vfmv.f.s %0, v24" : "=f"(tmp)); - asm volatile("vfslide1down.vf v16, v24, %0" ::"f"(tmp)); - // store - asm volatile("vse64.v v16, (%0);" ::"r"(dest)); -} - -void gemv_rowwise_small_than_slice(const unsigned long int m_row, - const unsigned long int v_len, - double *matrix, double *vector, - double *dest) { - // setup - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(1)); - asm volatile("vmv.v.i v24, 0"); - asm volatile("vmv.v.i v16, 0"); - // load vector - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(v_len)); - asm volatile("vle64.v v0, (%0);" ::"r"(vector)); - for (uint64_t i = 0; i < m_row; ++i) { - double *_mat_ = matrix + i * v_len; - double *_dst_ = dest + i - 1; // delayed store - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(v_len)); - - if (i % 2 
== 0) { - // load matrix slice row - asm volatile("vle64.v v4, (%0);" ::"r"(_mat_)); - // multiply with vector - asm volatile("vfmul.vv v8, v4, v0"); - // reduction - asm volatile("vfredsum.vs v16, v8, v16"); - // store previous data - if (i != 0) { - // asm volatile("vse64.v v24, (%0);" ::"r"(_dst_)); - double tmp; - asm volatile("vfmv.f.s %0, v24" : "=f"(tmp)); - *_dst_ = tmp; - // clear reduction register - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(1)); - asm volatile("vmv.v.i v24, 0"); - } - - } else { - // load matrix slice row - asm volatile("vle64.v v2, (%0);" ::"r"(_mat_)); - // multiply with vector - asm volatile("vfmul.vv v6, v2, v0"); - // reduction - asm volatile("vfredsum.vs v24, v6, v24"); - // store previous data - // asm volatile("vse64.v v16, (%0);" ::"r"(_dst_)); - double tmp; - asm volatile("vfmv.f.s %0, v16" : "=f"(tmp)); - *_dst_ = tmp; - // clear reduction register - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(1)); - asm volatile("vmv.v.i v16, 0"); - } - } - - // store the last value - double *_dst_ = dest + m_row - 1; - if (m_row % 2 == 0) // even - { - double tmp; - asm volatile("vfmv.f.s %0, v24" : "=f"(tmp)); - *_dst_ = tmp; - } else { // odd - double tmp; - asm volatile("vfmv.f.s %0, v16" : "=f"(tmp)); - *_dst_ = tmp; - } -} - -void gemv_rowwise_kernel_slice(const unsigned long int v_len, - const unsigned long int slice_width, - const unsigned long int slice_height, - double *matrix, double *vector) { - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(slice_width)); - // load vector - asm volatile("vle64.v v0, (%0);" ::"r"(vector)); - - // for each row in slice - for (uint64_t i = 0; i < slice_height; ++i) { - double *_mat_ = matrix + i * v_len; - double tmp; // for round slice later - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(slice_width)); - // load matrix slice row - asm volatile("vle64.v v4, (%0);" ::"r"(_mat_)); - // multiply with vector - asm volatile("vfmul.vv v8, v4, v0"); - if (i % 2 == 
0) { - // round slide - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(slice_height)); - asm volatile("vfmv.f.s %0, v24" : "=f"(tmp)); - asm volatile("vfslide1down.vf v16, v24, %0" ::"f"(tmp)); - // reduction - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(slice_width)); - asm volatile("vfredsum.vs v16, v8, v16"); - } else { - // round slide - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(slice_height)); - asm volatile("vfmv.f.s %0, v16" : "=f"(tmp)); - asm volatile("vfslide1down.vf v24, v16, %0" ::"f"(tmp)); - // reduction - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(slice_width)); - asm volatile("vfredsum.vs v24, v8, v24"); - } - } - - if (slice_height % 2) { - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(slice_height)); - asm volatile("vmv.v.v v24, v16"); - } -} - -void gemv_rowwise(const unsigned long int m_row, const unsigned long int v_len, - double *matrix, double *vector, double *dest) { - // when matrix is samller than a slice - if (v_len <= SLICE_SIZE) { - gemv_rowwise_small_than_slice(m_row, v_len, matrix, vector, dest); - return; - } - - uint64_t num_slice_row = m_row / SLICE_SIZE; - uint64_t rest_row = m_row % SLICE_SIZE; - uint64_t num_slice_col = v_len / SLICE_SIZE; - uint64_t rest_col = v_len % SLICE_SIZE; - - // each slice row - for (uint64_t i = 0; i < num_slice_row; ++i) { - // clear reduction sum register file - clear_reduction_register(); - // each full slice - for (uint64_t j = 0; j < num_slice_col; ++j) { - double *_mat_ = matrix + i * SLICE_SIZE * v_len + j * SLICE_SIZE; - double *_vec_ = vector + j * SLICE_SIZE; - gemv_rowwise_kernel_slice(v_len, SLICE_SIZE, SLICE_SIZE, _mat_, _vec_); - } - // margin slice - if (rest_col > 0) { - double *_mat_ = - matrix + i * SLICE_SIZE * v_len + num_slice_col * SLICE_SIZE; - double *_vec_ = vector + num_slice_col * SLICE_SIZE; - gemv_rowwise_kernel_slice(v_len, rest_col, SLICE_SIZE, _mat_, _vec_); - } - // store dest vector value - double *_dst_ = dest + i * 
SLICE_SIZE; - store_slice_results(_dst_, SLICE_SIZE); - } - - // margin slice row - if (rest_row > 0) { - // clear reduction sum register file - clear_reduction_register(); - // each bottom slice - for (uint64_t j = 0; j < num_slice_col; ++j) { - double *_mat_ = - matrix + num_slice_row * SLICE_SIZE * v_len + j * SLICE_SIZE; - double *_vec_ = vector + j * SLICE_SIZE; - gemv_rowwise_kernel_slice(v_len, SLICE_SIZE, rest_row, _mat_, _vec_); - } - // margin slice - if (rest_col > 0) { - double *_mat_ = matrix + num_slice_row * SLICE_SIZE * v_len + - num_slice_col * SLICE_SIZE; - double *_vec_ = vector + num_slice_col * SLICE_SIZE; - gemv_rowwise_kernel_slice(v_len, rest_col, rest_row, _mat_, _vec_); - } - // store dest vector value - double *_dst_ = dest + num_slice_row * SLICE_SIZE; - store_slice_results(_dst_, rest_row); - } -} - -int gemv_verify(const unsigned long int m_row, const unsigned long int v_len, - double *matrix, double *vector, double *dest) { - for (uint64_t i = 0; i < m_row; ++i) { - double res = dest[i]; - double golden = 0; - for (uint64_t j = 0; j < v_len; ++j) { - golden = golden + matrix[i * v_len + j] * vector[j]; - } - if (golden != res) { - printf("Sorry, wrong value! at index %d, result = %f, golden = %f \n", i, - res, golden); - return i; - } - } - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/gemv.h b/bb-tests/workloads/src/CTest/rvv/common/ara/gemv.h deleted file mode 100644 index f38997c1..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/gemv.h +++ /dev/null @@ -1,37 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Chi Zhang, ETH Zurich - -#ifndef _GEMV_H -#define _GEMV_H - -void init_gemv_data(const unsigned long int m_row, - const unsigned long int v_len, double *matrix, - double *vector, double a, double b, double c); - -void gemv_rowwise(const unsigned long int m_row, const unsigned long int v_len, - double *matrix, double *vector, double *dest); - -void gemv_rowwise_small_than_slice(const unsigned long int m_row, - const unsigned long int v_len, - double *matrix, double *vector, - double *dest); - -int gemv_verify(const unsigned long int m_row, const unsigned long int v_len, - double *matrix, double *vector, double *dest); - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/rivec/vector_defines.h b/bb-tests/workloads/src/CTest/rvv/common/ara/rivec/vector_defines.h deleted file mode 100644 index e525b51d..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/rivec/vector_defines.h +++ /dev/null @@ -1,138 +0,0 @@ -// Modified version of: -// vector_defines.h -// https://github.com/RALC88/riscv-vectorized-benchmark-suite/blob/rvv-1.0/common/vector_defines.h -// Find details on the original version below -// Author: Matteo Perotti - -// RISC-V VECTOR intrinsics mapping by Cristóbal Ramírez Lazo, "Barcelona 2019" - -#include "riscv_vector.h" - -/* - Data-Type Intrinsics -*/ - -#define _MMR_f64 vfloat64m1_t -#define _MMR_f32 vfloat32m1_t - -#define _MMR_i64 vint64m1_t -#define _MMR_i32 vint32m1_t - -#define _MMR_u64 vuint64m1_t -#define _MMR_u32 vuint32m1_t - -#define _MMR_MASK_i64 vbool64_t -#define _MMR_MASK_i32 vbool32_t - 
-/* - Reinterpret Intrinsics -*/ - -#define _MM_CAST_i32_u32(op1) __riscv_vreinterpret_v_u32m1_i32m1(op1) -#define _MM_CAST_u32_i32(op1) __riscv_vreinterpret_v_i32m1_u32m1(op1) -#define _MM_CAST_i64_u64(op1) __riscv_vreinterpret_v_u64m1_i64m1(op1) -#define _MM_CAST_u64_i64(op1) __riscv_vreinterpret_v_i64m1_u64m1(op1) - -#define _MM_CAST_i32_f32(op1) __riscv_vreinterpret_v_f32m1_i32m1(op1) -#define _MM_CAST_u32_f32(op1) __riscv_vreinterpret_v_f32m1_u32m1(op1) -#define _MM_CAST_f32_i32(op1) __riscv_vreinterpret_v_i32m1_f32m1(op1) -#define _MM_CAST_f32_u32(op1) __riscv_vreinterpret_v_u32m1_f32m1(op1) - -#define _MM_CAST_i64_f64(op1) __riscv_vreinterpret_v_f64m1_i64m1(op1) -#define _MM_CAST_u64_f64(op1) __riscv_vreinterpret_v_f64m1_u64m1(op1) -#define _MM_CAST_f64_i64(op1) __riscv_vreinterpret_v_i64m1_f64m1(op1) -#define _MM_CAST_f64_u64(op1) __riscv_vreinterpret_v_u64m1_f64m1(op1) - -/* - Integer Intrinsics -*/ - -#define _MM_SET_i64(op1, vl) __riscv_vmv_v_x_i64m1(op1, vl) -#define _MM_SET_i32(op1, vl) __riscv_vmv_v_x_i32m1(op1, vl) - -#define _MM_MERGE_i64(op1, op2, op3, vl) \ - __riscv_vmerge_vvm_i64m1(op1, op2, op3, vl) -#define _MM_MERGE_i32(op1, op2, op3, vl) \ - __riscv_vmerge_vvm_i32m1(op1, op2, op3, vl) - -#define _MM_AND_i64(op1, op2, vl) __riscv_vand_vv_i64m1(op1, op2, vl) -#define _MM_AND_i32(op1, op2, vl) __riscv_vand_vv_i32m1(op1, op2, vl) - -#define _MM_OR_i64(op1, op2, vl) __riscv_vor_vv_i64m1(op1, op2, vl) -#define _MM_OR_i32(op1, op2, vl) __riscv_vor_vv_i32m1(op1, op2, vl) - -#define _MM_XOR_i64(op1, op2, vl) __riscv_vxor_vv_i64m1(op1, op2, vl) -#define _MM_XOR_i32(op1, op2, vl) __riscv_vxor_vv_i32m1(op1, op2, vl) - -#define _MM_SLL_i64(op1, op2, vl) __riscv_vsll_vv_i64m1(op1, op2, vl) -#define _MM_SLL_i32(op1, op2, vl) __riscv_vsll_vv_i32m1(op1, op2, vl) - -#define _MM_SRL_i64(op1, op2, vl) __riscv_vsrl_vv_u64m1(op1, op2, vl) -#define _MM_SRL_i32(op1, op2, vl) __riscv_vsrl_vv_u32m1(op1, op2, vl) - -#define _MM_ADD_i64(op1, op2, vl) 
__riscv_vadd_vv_i64m1(op1, op2, vl) -#define _MM_ADD_i32(op1, op2, vl) __riscv_vadd_vv_i32m1(op1, op2, vl) - -#define _MM_SUB_i64(op1, op2, vl) __riscv_vsub_vv_i64m1(op1, op2, vl) -#define _MM_SUB_i32(op1, op2, vl) __riscv_vsub_vv_i32m1(op1, op2, vl) - -#define _MM_MUL_i64(op1, op2, vl) __riscv_vmul_vv_i64m1(op1, op2, vl) -#define _MM_MUL_i32(op1, op2, vl) __riscv_vmul_vv_i32m1(op1, op2, vl) - -#define _MM_VMSEQ_i64(op1, op2, vl) __riscv_vmseq_vv_i64m1_b64(op1, op2, vl) -#define _MM_VMSEQ_i32(op1, op2, vl) __riscv_vmseq_vv_i32m1_b32(op1, op2, vl) - -/* - Floating-Point Intrinsics -*/ - -#define _MM_SET_f64(op1, vl) __riscv_vfmv_v_f_f64m1(op1, vl) -#define _MM_SET_f32(op1, vl) __riscv_vfmv_v_f_f32m1(op1, vl) - -#define _MM_MERGE_f64(op1, op2, op3, vl) \ - __riscv_vmerge_vvm_f64m1(op1, op2, op3, vl) -#define _MM_MERGE_f32(op1, op2, op3, vl) \ - __riscv_vmerge_vvm_f32m1(op1, op2, op3, vl) - -#define _MM_MAX_f64(op1, op2, vl) __riscv_vfmax_vv_f64m1(op1, op2, vl) -#define _MM_MAX_f32(op1, op2, vl) __riscv_vfmax_vv_f32m1(op1, op2, vl) - -#define _MM_ADD_f64(op1, op2, vl) __riscv_vfadd_vv_f64m1(op1, op2, vl) -#define _MM_ADD_f32(op1, op2, vl) __riscv_vfadd_vv_f32m1(op1, op2, vl) - -#define _MM_SUB_f64(op1, op2, vl) __riscv_vfsub_vv_f64m1(op1, op2, vl) -#define _MM_SUB_f32(op1, op2, vl) __riscv_vfsub_vv_f32m1(op1, op2, vl) - -#define _MM_MUL_f64(op1, op2, vl) __riscv_vfmul_vv_f64m1(op1, op2, vl) -#define _MM_MUL_f32(op1, op2, vl) __riscv_vfmul_vv_f32m1(op1, op2, vl) - -#define _MM_MACC_f64(op1, op2, op3, vl) \ - __riscv_vfmacc_vv_f64m1(op1, op2, op3, vl) -#define _MM_MACC_f32(op1, op2, op3, vl) \ - __riscv_vfmacc_vv_f32m1(op1, op2, op3, vl) - -#define _MM_MADD_f64(op1, op2, op3, vl) \ - __riscv_vfmadd_vv_f64m1(op1, op2, op3, vl) -#define _MM_MADD_f32(op1, op2, op3, vl) \ - __riscv_vfmadd_vv_f32m1(op1, op2, op3, vl) - -#define _MM_VFCVT_F_X_f64(op1, vl) __riscv_vfcvt_f_x_v_f64m1(op1, vl) -#define _MM_VFCVT_F_X_f32(op1, vl) __riscv_vfcvt_f_x_v_f32m1(op1, vl) -#define 
_MM_VFCVT_X_F_i64(op1, vl) __riscv_vfcvt_x_f_v_i64m1(op1, vl) -#define _MM_VFCVT_X_F_i32(op1, vl) __riscv_vfcvt_x_f_v_i32m1(op1, vl) -#define _MM_VFWCVT_F_F_f64(op1, vl) __riscv_vfwcvt_f_f_v_f64m2(op1, vl) -#define _MM_VFNCVT_F_F_f32(op1, vl) __riscv_vfncvt_f_f_w_f32m1(op1, vl) -#define _MM_VFWCVT_F_X_f64(op1, vl) __riscv_vfwcvt_f_x_v_f64m2(op1, vl) -#define _MM_VFCVT_f32_i32(op1, vl) __riscv_vfcvt_f_x_v_f32m1(op1, vl) - -#define _MM_VFLE_f64(op1, op2, vl) __riscv_vmfle_vv_f64m1_b64(op1, op2, vl) -#define _MM_VFLE_f32(op1, op2, vl) __riscv_vmfle_vv_f32m1_b32(op1, op2, vl) - -#define _MM_VFLT_f64(op1, op2, vl) __riscv_vmflt_vv_f64m1_b64(op1, op2, vl) -#define _MM_VFLT_f32(op1, op2, vl) __riscv_vmflt_vv_f32m1_b32(op1, op2, vl) - -/* - Ancillary Defines -*/ - -#define FENCE() asm volatile("fence"); diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/spmv.c b/bb-tests/workloads/src/CTest/rvv/common/ara/spmv.c deleted file mode 100644 index af2e8187..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/spmv.c +++ /dev/null @@ -1,116 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Chi Zhang, ETH Zurich - -#include "spmv.h" -#include "util.h" -#include -#include -#include -#include -#include -#include -#include - -void spmv_csr_idx32(int32_t N_ROW, int32_t *CSR_PROW, int32_t *CSR_INDEX, - double *CSR_DATA, double *IN_VEC, double *OUT_VEC) { - /* printf("Do spmv\n"); */ - for (int i = 0; i < N_ROW; ++i) { - int32_t len = CSR_PROW[i + 1] - CSR_PROW[i]; - double *data = CSR_DATA + CSR_PROW[i]; - int32_t *index = CSR_INDEX + CSR_PROW[i]; - - // clear register file - - volatile size_t maxvl = __riscv_vsetvl_e64m8(len); - asm volatile("vmv.v.i v24, 0"); - - double dbg = 0; - while (len) { - volatile size_t slice_size = __riscv_vsetvl_e64m8(len); - asm volatile("vle64.v v8, (%0)" ::"r"(data)); // fetch entries - asm volatile("vle32.v v16, (%0)" ::"r"(index)); // fetch indices - asm volatile("vloxei32.v v0, (%0), v16" ::"r"(IN_VEC)); // load data - asm volatile("vfmacc.vv v24, v8, v0"); // vector multiply - /* if (i == 0) { */ - /* printf("slice=%ld\n", slice_size); */ - /* for (size_t j = 0; j < slice_size; j++) { */ - /* double t, u, v; */ - /* asm volatile("vrgather.vx v16, v12, %0" :: "r"(j)); */ - /* asm volatile("vfmv.f.s %0, v16" : "=f"(t)); */ - - /* asm volatile("vrgather.vx v16, v4, %0" :: "r"(j)); */ - /* asm volatile("vfmv.f.s %0, v16" : "=f"(u)); */ - - /* asm volatile("vrgather.vx v16, v0, %0" :: "r"(j)); */ - /* asm volatile("vfmv.f.s %0, v16" : "=f"(v)); */ - /* dbg += data[j] * IN_VEC[index[j]/sizeof(double)]; */ - /* printf("i=%ld j=%ld id=%ld data=%lx vec=%lx out=%lx - * vdata=\"%lx\" vvec=\"%lx\" vout=\"%lx\"\n", */ - /* i, j, index[j] / sizeof(double), */ - /* *(uint64_t*)(&data[j]), - * *(uint64_t*)(&IN_VEC[index[j]/sizeof(double)]), */ - /* *(uint64_t*)(&dbg), */ - /* *(uint64_t*)(&u), *(uint64_t*)(&v), *(uint64_t*)(&t) */ - /* ); */ - - /* } */ - /* } */ - - len = len - slice_size; - data = data + slice_size; - index = index + slice_size; - } - - double tmp; - asm volatile("vsetvli x0, %0, e64, m8, ta, ma" 
::"r"(maxvl)); - asm volatile("vmv.v.i v8, 0"); - asm volatile("vfredusum.vs v24, v24, v8"); // reduction - asm volatile("vfmv.f.s %0, v24" : "=f"(tmp)); - OUT_VEC[i] = tmp; - } - /* printf("Verifying\n"); */ - // spmv_verify(N_ROW, CSR_PROW, CSR_INDEX, CSR_DATA, IN_VEC, OUT_VEC); -} - -int spmv_verify(int32_t N_ROW, int32_t *CSR_PROW, int32_t *CSR_INDEX, - double *CSR_DATA, double *IN_VEC, double *OUT_VEC) { - for (int32_t i = 0; i < N_ROW; ++i) { - double res = OUT_VEC[i]; - - int32_t len = CSR_PROW[i + 1] - CSR_PROW[i]; - double *data = CSR_DATA + CSR_PROW[i]; - int32_t *index = CSR_INDEX + CSR_PROW[i]; - - double golden = 0; - for (int32_t j = 0; j < len; ++j) { - int32_t idx = index[j] / sizeof(double); - double next = golden + data[j] * IN_VEC[idx]; - /* printf("index:%d, data: %lx, vec: %lx %lx\n", idx, - * *(uint64_t*)(&data[j]), *(uint64_t*)(&IN_VEC[idx]), */ - /* *(uint64_t*)(&next)); */ - golden = next; - } - if ((float)golden != (float)res) { - printf("Sorry, wrong value! at index %d, result = %lx, golden = %lx \n", - i, *(uint64_t *)(&res), *(uint64_t *)(&golden)); - exit(1); - return i; - } - } - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/spmv.h b/bb-tests/workloads/src/CTest/rvv/common/ara/spmv.h deleted file mode 100644 index 3a449966..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/spmv.h +++ /dev/null @@ -1,30 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Chi Zhang, ETH Zurich - -#ifndef _SPMV_H -#define _SPMV_H - -#include - -void spmv_csr_idx32(int32_t N_ROW, int32_t *CSR_PROW, int32_t *CSR_INDEX, - double *CSR_DATA, double *IN_VEC, double *OUT_VEC); - -int spmv_verify(int32_t N_ROW, int32_t *CSR_PROW, int32_t *CSR_INDEX, - double *CSR_DATA, double *IN_VEC, double *OUT_VEC); - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/util.c b/bb-tests/workloads/src/CTest/rvv/common/ara/util.c deleted file mode 100644 index ffaadb3a..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/util.c +++ /dev/null @@ -1,42 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
-// -// Author: Matteo Perotti -// -// Utility functions for Ara software environment - -#include "util.h" - -int *__dummy__errno__ptr__; - -// Floating-point similarity check with threshold -int similarity_check(double a, double b, double threshold) { - double diff = a - b; - if (FABS(diff) > threshold) - return 0; - else - return 1; -} -int similarity_check_32b(float a, float b, float threshold) { - float diff = a - b; - if (FABS(diff) > threshold) - return 0; - else - return 1; -} - -// Dummy declaration for libm exp -int *__errno(void) { return __dummy__errno__ptr__; } diff --git a/bb-tests/workloads/src/CTest/rvv/common/ara/util.h b/bb-tests/workloads/src/CTest/rvv/common/ara/util.h deleted file mode 100644 index 7118a9b6..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/ara/util.h +++ /dev/null @@ -1,58 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. -// -// Author: Matteo Perotti -// -// Utility functions for Ara software environment (header file) - -#ifndef __UTIL_H__ -#define __UTIL_H__ - -#ifdef GATE_SIM -#define NO_PRINTF -#endif - -#ifdef VCD_DUMP -#pragma message("VCD_DUMP successfully initialized") -#define NO_PRINTF -#define NO_TIMER -#endif - -// Don't measure performance -#ifdef NO_TIMER -#define start_timer() -#define stop_timer() -#define get_timer() 0 -#endif - -// Don't print anything -#ifdef NO_PRINTF -#define printf_(...) 
-#endif - -#define FABS(x) ((x < 0) ? -x : x) - -#define FABS(x) ((x < 0) ? -x : x) -#define MIN(a, b) ((a) <= (b) ? (a) : (b)) - -// Floating-point similarity check with threshold -int similarity_check(double a, double b, double threshold); -int similarity_check_32b(float a, float b, float threshold); - -// Dummy declaration for libm exp -int *__errno(void); - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/common/crt.S b/bb-tests/workloads/src/CTest/rvv/common/crt.S deleted file mode 100644 index 3f5bb2c1..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/crt.S +++ /dev/null @@ -1,225 +0,0 @@ -# See LICENSE for license details. - -#include "encoding.h" - -#if __riscv_xlen == 64 -# define LREG ld -# define SREG sd -# define REGBYTES 8 -#else -# define LREG lw -# define SREG sw -# define REGBYTES 4 -#endif - - .section ".text.init" - .globl _start -_start: - li x1, 0 - li x2, 0 - li x3, 0 - li x4, 0 - li x5, 0 - li x6, 0 - li x7, 0 - li x8, 0 - li x9, 0 - li x10,0 - li x11,0 - li x12,0 - li x13,0 - li x14,0 - li x15,0 - li x16,0 - li x17,0 - li x18,0 - li x19,0 - li x20,0 - li x21,0 - li x22,0 - li x23,0 - li x24,0 - li x25,0 - li x26,0 - li x27,0 - li x28,0 - li x29,0 - li x30,0 - li x31,0 - - # enable FPU, vector, and accelerator if present - li t0, MSTATUS_FS | MSTATUS_XS | MSTATUS_VS - csrs mstatus, t0 - - # make sure XLEN agrees with compilation choice - li t0, 1 - slli t0, t0, 31 -#if __riscv_xlen == 64 - bgez t0, 1f -#else - bltz t0, 1f -#endif -2: - li a0, 1 - sw a0, tohost, t0 - j 2b -1: - -#ifdef __riscv_flen - # initialize FPU if we have one - la t0, 1f - csrw mtvec, t0 - - fssr x0 - fmv.s.x f0, x0 - fmv.s.x f1, x0 - fmv.s.x f2, x0 - fmv.s.x f3, x0 - fmv.s.x f4, x0 - fmv.s.x f5, x0 - fmv.s.x f6, x0 - fmv.s.x f7, x0 - fmv.s.x f8, x0 - fmv.s.x f9, x0 - fmv.s.x f10,x0 - fmv.s.x f11,x0 - fmv.s.x f12,x0 - fmv.s.x f13,x0 - fmv.s.x f14,x0 - fmv.s.x f15,x0 - fmv.s.x f16,x0 - fmv.s.x f17,x0 - fmv.s.x f18,x0 - fmv.s.x f19,x0 - fmv.s.x f20,x0 - fmv.s.x 
f21,x0 - fmv.s.x f22,x0 - fmv.s.x f23,x0 - fmv.s.x f24,x0 - fmv.s.x f25,x0 - fmv.s.x f26,x0 - fmv.s.x f27,x0 - fmv.s.x f28,x0 - fmv.s.x f29,x0 - fmv.s.x f30,x0 - fmv.s.x f31,x0 -1: -#endif - - # initialize trap vector - la t0, trap_entry - csrw mtvec, t0 - - # initialize global pointer -.option push -.option norelax - la gp, __global_pointer$ -.option pop - - la tp, _end + 63 - and tp, tp, -64 - - # get core id - csrr a0, mhartid - # for now, assume only 1 core - li a1, 1 -1:bgeu a0, a1, 1b - - # give each core 128KB of stack + TLS -#define STKSHIFT 17 - add sp, a0, 1 - sll sp, sp, STKSHIFT - add sp, sp, tp - sll a2, a0, STKSHIFT - add tp, tp, a2 - - j _init - - .align 2 -trap_entry: - addi sp, sp, -272 - - SREG x1, 1*REGBYTES(sp) - SREG x2, 2*REGBYTES(sp) - SREG x3, 3*REGBYTES(sp) - SREG x4, 4*REGBYTES(sp) - SREG x5, 5*REGBYTES(sp) - SREG x6, 6*REGBYTES(sp) - SREG x7, 7*REGBYTES(sp) - SREG x8, 8*REGBYTES(sp) - SREG x9, 9*REGBYTES(sp) - SREG x10, 10*REGBYTES(sp) - SREG x11, 11*REGBYTES(sp) - SREG x12, 12*REGBYTES(sp) - SREG x13, 13*REGBYTES(sp) - SREG x14, 14*REGBYTES(sp) - SREG x15, 15*REGBYTES(sp) - SREG x16, 16*REGBYTES(sp) - SREG x17, 17*REGBYTES(sp) - SREG x18, 18*REGBYTES(sp) - SREG x19, 19*REGBYTES(sp) - SREG x20, 20*REGBYTES(sp) - SREG x21, 21*REGBYTES(sp) - SREG x22, 22*REGBYTES(sp) - SREG x23, 23*REGBYTES(sp) - SREG x24, 24*REGBYTES(sp) - SREG x25, 25*REGBYTES(sp) - SREG x26, 26*REGBYTES(sp) - SREG x27, 27*REGBYTES(sp) - SREG x28, 28*REGBYTES(sp) - SREG x29, 29*REGBYTES(sp) - SREG x30, 30*REGBYTES(sp) - SREG x31, 31*REGBYTES(sp) - - csrr a0, mcause - csrr a1, mepc - mv a2, sp - jal handle_trap - csrw mepc, a0 - - # Remain in M-mode after eret - li t0, MSTATUS_MPP - csrs mstatus, t0 - - LREG x1, 1*REGBYTES(sp) - LREG x2, 2*REGBYTES(sp) - LREG x3, 3*REGBYTES(sp) - LREG x4, 4*REGBYTES(sp) - LREG x5, 5*REGBYTES(sp) - LREG x6, 6*REGBYTES(sp) - LREG x7, 7*REGBYTES(sp) - LREG x8, 8*REGBYTES(sp) - LREG x9, 9*REGBYTES(sp) - LREG x10, 10*REGBYTES(sp) - LREG x11, 
11*REGBYTES(sp) - LREG x12, 12*REGBYTES(sp) - LREG x13, 13*REGBYTES(sp) - LREG x14, 14*REGBYTES(sp) - LREG x15, 15*REGBYTES(sp) - LREG x16, 16*REGBYTES(sp) - LREG x17, 17*REGBYTES(sp) - LREG x18, 18*REGBYTES(sp) - LREG x19, 19*REGBYTES(sp) - LREG x20, 20*REGBYTES(sp) - LREG x21, 21*REGBYTES(sp) - LREG x22, 22*REGBYTES(sp) - LREG x23, 23*REGBYTES(sp) - LREG x24, 24*REGBYTES(sp) - LREG x25, 25*REGBYTES(sp) - LREG x26, 26*REGBYTES(sp) - LREG x27, 27*REGBYTES(sp) - LREG x28, 28*REGBYTES(sp) - LREG x29, 29*REGBYTES(sp) - LREG x30, 30*REGBYTES(sp) - LREG x31, 31*REGBYTES(sp) - - addi sp, sp, 272 - mret - -.section ".tohost","aw",@progbits -.align 6 -.globl tohost -tohost: .dword 0 -.align 6 -.globl fromhost -fromhost: .dword 0 diff --git a/bb-tests/workloads/src/CTest/rvv/common/syscalls.c b/bb-tests/workloads/src/CTest/rvv/common/syscalls.c deleted file mode 100644 index 24689182..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/syscalls.c +++ /dev/null @@ -1,472 +0,0 @@ -// See LICENSE for license details. 
- -#include "util.h" -#include -#include -#include -#include -#include -#include - -#ifdef __cplusplus -extern "C" { -#endif - -#define SYS_write 64 - -#undef strcmp - -extern volatile uint64_t tohost; -extern volatile uint64_t fromhost; - -static uintptr_t syscall(uintptr_t which, uint64_t arg0, uint64_t arg1, - uint64_t arg2) { - volatile uint64_t magic_mem[8] __attribute__((aligned(64))); - magic_mem[0] = which; - magic_mem[1] = arg0; - magic_mem[2] = arg1; - magic_mem[3] = arg2; - __sync_synchronize(); - - tohost = (uintptr_t)magic_mem; - while (fromhost == 0) - ; - fromhost = 0; - - __sync_synchronize(); - return magic_mem[0]; -} - -#define NUM_COUNTERS 2 -static uintptr_t counters[NUM_COUNTERS]; -static const char *counter_names[NUM_COUNTERS]; - -void setStats(int enable) { - int i = 0; -#define READ_CTR(name) \ - do { \ - while (i >= NUM_COUNTERS) \ - ; \ - uintptr_t csr = read_csr(name); \ - if (!enable) { \ - csr -= counters[i]; \ - counter_names[i] = (const char *)#name; \ - } \ - counters[i++] = csr; \ - } while (0) - - READ_CTR(mcycle); - READ_CTR(minstret); - -#undef READ_CTR -} - -void __attribute__((noreturn)) tohost_exit(uintptr_t code) { - tohost = (code << 1) | 1; - while (1) - ; -} - -uintptr_t __attribute__((weak)) handle_trap(uintptr_t cause, uintptr_t epc, - uintptr_t regs[32]) { - tohost_exit(1337); -} - -void exit(int code) { tohost_exit(code); } - -void abort() { exit(128 + SIGABRT); } - -void printstr(const char *s) { syscall(SYS_write, 1, (uintptr_t)s, strlen(s)); } - -void __attribute__((weak)) thread_entry(int cid, int nc) { - // multi-threaded programs override this function. - // for the case of single-threaded programs, only let core 0 proceed. - while (cid != 0) - ; -} - -int __attribute__((weak)) main(int argc, char **argv) { - // single-threaded programs override this function. 
- printstr("Implement main(), foo!\n"); - return -1; -} - -static void init_tls() { - register void *thread_pointer asm("tp"); - extern char _tdata_begin, _tdata_end, _tbss_end; - size_t tdata_size = &_tdata_end - &_tdata_begin; - memcpy(thread_pointer, &_tdata_begin, tdata_size); - size_t tbss_size = &_tbss_end - &_tdata_end; - memset((char *)thread_pointer + tdata_size, 0, tbss_size); -} - -void _init(int cid, int nc) { - init_tls(); - thread_entry(cid, nc); - - // only single-threaded programs should ever get here. - int ret = main(0, 0); - - char buf[NUM_COUNTERS * 32] __attribute__((aligned(64))); - char *pbuf = buf; - for (int i = 0; i < NUM_COUNTERS; i++) - if (counters[i]) - pbuf += sprintf(pbuf, "%s = %d\n", counter_names[i], counters[i]); - if (pbuf != buf) - printstr(buf); - - exit(ret); -} - -#undef putchar -int putchar(int ch) { - static __thread char buf[64] __attribute__((aligned(64))); - static __thread int buflen = 0; - - buf[buflen++] = ch; - - if (ch == '\n' || buflen == sizeof(buf)) { - syscall(SYS_write, 1, (uintptr_t)buf, buflen); - buflen = 0; - } - - return 0; -} - -void printhex(uint64_t x) { - char str[17]; - int i; - for (i = 0; i < 16; i++) { - str[15 - i] = (x & 0xF) + ((x & 0xF) < 10 ? '0' : 'a' - 10); - x >>= 4; - } - str[16] = 0; - - printstr(str); -} - -static inline void printnum(void (*putch)(int, void **), void **putdat, - unsigned long long num, unsigned base, int width, - int padc) { - unsigned digs[sizeof(num) * CHAR_BIT]; - int pos = 0; - - while (1) { - digs[pos++] = num % base; - if (num < base) - break; - num /= base; - } - - while (width-- > pos) - putch(padc, putdat); - - while (pos-- > 0) - putch(digs[pos] + (digs[pos] >= 10 ? 
'a' - 10 : '0'), putdat); -} - -static unsigned long long getuint(va_list *ap, int lflag) { - if (lflag >= 2) - return va_arg(*ap, unsigned long long); - else if (lflag) - return va_arg(*ap, unsigned long); - else - return va_arg(*ap, unsigned int); -} - -static long long getint(va_list *ap, int lflag) { - if (lflag >= 2) - return va_arg(*ap, long long); - else if (lflag) - return va_arg(*ap, long); - else - return va_arg(*ap, int); -} - -static void vprintfmt(void (*putch)(int, void **), void **putdat, - const char *fmt, va_list ap) { - const char *p; - const char *last_fmt; - unsigned long long num; - int ch, err; - int base, lflag, width, precision, altflag; - char padc; - - while (1) { - while ((ch = *(unsigned char *)fmt) != '%') { - if (ch == '\0') - return; - fmt++; - putch(ch, putdat); - } - fmt++; - - // Process a %-escape sequence - last_fmt = fmt; - padc = ' '; - width = -1; - precision = -1; - lflag = 0; - altflag = 0; - reswitch: - switch (ch = *(unsigned char *)fmt++) { - - // flag to pad on the right - case '-': - padc = '-'; - goto reswitch; - - // flag to pad with 0's instead of spaces - case '0': - padc = '0'; - goto reswitch; - - // width field - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - case '8': - case '9': - for (precision = 0;; ++fmt) { - precision = precision * 10 + ch - '0'; - ch = *fmt; - if (ch < '0' || ch > '9') - break; - } - goto process_precision; - - case '*': - precision = va_arg(ap, int); - goto process_precision; - - case '.': - if (width < 0) - width = 0; - goto reswitch; - - case '#': - altflag = 1; - goto reswitch; - - process_precision: - if (width < 0) - width = precision, precision = -1; - goto reswitch; - - // long flag (doubled for long long) - case 'l': - lflag++; - goto reswitch; - - // character - case 'c': - putch(va_arg(ap, int), putdat); - break; - - // string - case 's': - if ((p = va_arg(ap, char *)) == NULL) - p = "(null)"; - if (width > 0 && padc != '-') { - size_t slen = 
strlen(p); - if (precision >= 0 && (size_t)precision < slen) - slen = precision; - for (width -= slen; width > 0; width--) - putch(padc, putdat); - } - for (; (ch = *p) != '\0' && (precision < 0 || --precision >= 0); - width--) { - putch(ch, putdat); - p++; - } - for (; width > 0; width--) - putch(' ', putdat); - break; - - // (signed) decimal - case 'd': - num = getint(&ap, lflag); - if ((long long)num < 0) { - putch('-', putdat); - num = -(long long)num; - } - base = 10; - goto signed_number; - - // unsigned decimal - case 'u': - base = 10; - goto unsigned_number; - - // (unsigned) octal - case 'o': - // should do something with padding so it's always 3 octits - base = 8; - goto unsigned_number; - - // pointer - case 'p': - static_assert(sizeof(long) == sizeof(void *)); - lflag = 1; - putch('0', putdat); - putch('x', putdat); - /* fall through to 'x' */ - - // (unsigned) hexadecimal - case 'x': - base = 16; - unsigned_number: - num = getuint(&ap, lflag); - signed_number: - printnum(putch, putdat, num, base, width, padc); - break; - - // escaped '%' character - case '%': - putch(ch, putdat); - break; - - // unrecognized escape sequence - just print it literally - default: - putch('%', putdat); - fmt = last_fmt; - break; - } - } -} - -int printf(const char *fmt, ...) { - va_list ap; - va_start(ap, fmt); - - vprintfmt((void (*)(int, void **))putchar, 0, fmt, ap); - - va_end(ap); - return 0; // incorrect return value, but who cares, anyway? -} - -static void sprintf_putch(int ch, void **data) { - char **pstr = (char **)data; - **pstr = ch; - (*pstr)++; -} - -int sprintf(char *str, const char *fmt, ...) 
{ - va_list ap; - char *str0 = str; - va_start(ap, fmt); - - vprintfmt(sprintf_putch, (void **)&str, fmt, ap); - *str = 0; - - va_end(ap); - return str - str0; -} - -void *memcpy(void *dest, const void *src, size_t len) { - if ((((uintptr_t)dest | (uintptr_t)src | len) & (sizeof(uintptr_t) - 1)) == - 0) { - const uintptr_t *s = (const uintptr_t *)src; - uintptr_t *d = (uintptr_t *)dest; - uintptr_t *end = (uintptr_t *)((char *)dest + len); - while (d + 8 < end) { - uintptr_t reg[8] = {s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7]}; - d[0] = reg[0]; - d[1] = reg[1]; - d[2] = reg[2]; - d[3] = reg[3]; - d[4] = reg[4]; - d[5] = reg[5]; - d[6] = reg[6]; - d[7] = reg[7]; - d += 8; - s += 8; - } - while (d < end) - *d++ = *s++; - } else { - const char *s = (const char *)src; - char *d = (char *)dest; - while (d < (char *)dest + len) - *d++ = *s++; - } - return dest; -} - -void *memset(void *dest, int byte, size_t len) { - if ((((uintptr_t)dest | len) & (sizeof(uintptr_t) - 1)) == 0) { - uintptr_t word = byte & 0xFF; - word |= word << 8; - word |= word << 16; - word |= word << 16 << 16; - - uintptr_t *d = (uintptr_t *)dest; - while (d < (uintptr_t *)((char *)dest + len)) - *d++ = word; - } else { - char *d = (char *)dest; - while (d < (char *)dest + len) - *d++ = byte; - } - return dest; -} - -size_t strlen(const char *s) { - const char *p = s; - while (*p) - p++; - return p - s; -} - -size_t strnlen(const char *s, size_t n) { - const char *p = s; - while (n-- && *p) - p++; - return p - s; -} - -int strcmp(const char *s1, const char *s2) { - unsigned char c1, c2; - - do { - c1 = *s1++; - c2 = *s2++; - } while (c1 != 0 && c1 == c2); - - return c1 - c2; -} - -char *strcpy(char *dest, const char *src) { - char *d = dest; - while ((*d++ = *src++)) - ; - return dest; -} - -long atol(const char *str) { - long res = 0; - int sign = 0; - - while (*str == ' ') - str++; - - if (*str == '-' || *str == '+') { - sign = *str == '-'; - str++; - } - - while (*str) { - res *= 10; - res += 
*str++ - '0'; - } - - return sign ? -res : res; -} - -#ifdef __cplusplus -} -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/common/test.ld b/bb-tests/workloads/src/CTest/rvv/common/test.ld deleted file mode 100644 index 08c6e819..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/test.ld +++ /dev/null @@ -1,65 +0,0 @@ -/*======================================================================*/ -/* Proxy kernel linker script */ -/*======================================================================*/ -/* This is the linker script used when building the proxy kernel. */ - -/*----------------------------------------------------------------------*/ -/* Setup */ -/*----------------------------------------------------------------------*/ - -/* The OUTPUT_ARCH command specifies the machine architecture where the - argument is one of the names used in the BFD library. More - specifically one of the entires in bfd/cpu-mips.c */ - -OUTPUT_ARCH( "riscv" ) -ENTRY(_start) - -/*----------------------------------------------------------------------*/ -/* Sections */ -/*----------------------------------------------------------------------*/ - -SECTIONS -{ - - /* text: test code section */ - . = 0x80000000; - .text.init : { *(.text.init) } - - . = ALIGN(0x1000); - .tohost : { *(.tohost) } - - . = ALIGN(0x1000); - .text : { *(.text) } - - /* data segment */ - .data : { *(.data) } - - .sdata : { - __global_pointer$ = . 
+ 0x800; - *(.srodata.cst16) *(.srodata.cst8) *(.srodata.cst4) *(.srodata.cst2) *(.srodata*) - *(.sdata .sdata.* .gnu.linkonce.s.*) - } - - /* bss segment */ - .sbss : { - *(.sbss .sbss.* .gnu.linkonce.sb.*) - *(.scommon) - } - .bss : { *(.bss) } - - /* thread-local data segment */ - .tdata : - { - _tdata_begin = .; - *(.tdata) - _tdata_end = .; - } - .tbss : - { - *(.tbss) - _tbss_end = .; - } - - /* End of uninitalized data segement */ - _end = .; -} diff --git a/bb-tests/workloads/src/CTest/rvv/common/util.h b/bb-tests/workloads/src/CTest/rvv/common/util.h deleted file mode 100644 index 1bebbd4e..00000000 --- a/bb-tests/workloads/src/CTest/rvv/common/util.h +++ /dev/null @@ -1,115 +0,0 @@ -// See LICENSE for license details. - -#ifndef __UTIL_H -#define __UTIL_H - -#ifdef __cplusplus -extern "C" { -#endif - -extern void setStats(int enable); - -#include - -#define static_assert(cond) \ - switch (0) { \ - case 0: \ - case !!(long)(cond):; \ - } - -static int verify(int n, const volatile int *test, const int *verify) { - int i; - // Unrolled for faster verification - for (i = 0; i < n / 2 * 2; i += 2) { - int t0 = test[i], t1 = test[i + 1]; - int v0 = verify[i], v1 = verify[i + 1]; - if (t0 != v0) - return i + 1; - if (t1 != v1) - return i + 2; - } - if (n % 2 != 0 && test[n - 1] != verify[n - 1]) - return n; - return 0; -} - -static int verifyDouble(int n, const volatile double *test, - const double *verify) { - int i; - // Unrolled for faster verification - for (i = 0; i < n / 2 * 2; i += 2) { - double t0 = test[i], t1 = test[i + 1]; - double v0 = verify[i], v1 = verify[i + 1]; - int eq1 = t0 == v0, eq2 = t1 == v1; - if (!(eq1 & eq2)) - return i + 1 + eq1; - } - if (n % 2 != 0 && test[n - 1] != verify[n - 1]) - return n; - return 0; -} - -static int verifyFloat(int n, const volatile float *test, const float *verify) { - int i; - // Unrolled for faster verification - for (i = 0; i < n / 2 * 2; i += 2) { - float t0 = test[i], t1 = test[i + 1]; - float v0 = 
verify[i], v1 = verify[i + 1]; - int eq1 = t0 == v0, eq2 = t1 == v1; - if (!(eq1 & eq2)) - return i + 1 + eq1; - } - if (n % 2 != 0 && test[n - 1] != verify[n - 1]) - return n; - return 0; -} - -static void __attribute__((noinline)) barrier(int ncores) { - static volatile int sense; - static volatile int count; - static __thread int threadsense; - - __sync_synchronize(); - - threadsense = !threadsense; - if (__sync_fetch_and_add(&count, 1) == ncores - 1) { - count = 0; - sense = threadsense; - } else - while (sense != threadsense) - ; - - __sync_synchronize(); -} - -static uint64_t lfsr(uint64_t x) { - uint64_t bit = (x ^ (x >> 1)) & 1; - return (x >> 1) | (bit << 62); -} - -static uintptr_t insn_len(uintptr_t pc) { - return (*(unsigned short *)pc & 3) ? 4 : 2; -} - -#ifdef __riscv -#include "encoding.h" -#endif - -#define stringify_1(s) #s -#define stringify(s) stringify_1(s) -#define stats(code, iter) \ - do { \ - unsigned long _c = -read_csr(mcycle), _i = -read_csr(minstret); \ - code; \ - _c += read_csr(mcycle), _i += read_csr(minstret); \ - if (cid == 0) \ - printf("\n%s: %ld cycles, %ld.%ld cycles/iter, %ld.%ld CPI\n", \ - stringify(code), _c, _c / iter, 10 * _c / iter % 10, _c / _i, \ - 10 * _c / _i % 10); \ - } while (0) - -#ifdef __cplusplus -} -#endif - -#endif //__UTIL_H diff --git a/bb-tests/workloads/src/CTest/rvv/env/LICENSE b/bb-tests/workloads/src/CTest/rvv/env/LICENSE deleted file mode 100644 index 48fe522a..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/LICENSE +++ /dev/null @@ -1,24 +0,0 @@ -Copyright (c) 2012-2015, The Regents of the University of California (Regents). -All Rights Reserved. - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: -1. Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. -2. 
Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. -3. Neither the name of the Regents nor the - names of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - -IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, -SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING -OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS -BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, -THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR -PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED -HEREUNDER IS PROVIDED "AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE -MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. 
diff --git a/bb-tests/workloads/src/CTest/rvv/env/encoding.h b/bb-tests/workloads/src/CTest/rvv/env/encoding.h deleted file mode 100644 index 2bef816d..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/encoding.h +++ /dev/null @@ -1,5024 +0,0 @@ -/* SPDX-License-Identifier: BSD-3-Clause */ - -/* Copyright (c) 2023 RISC-V International */ - -/* - * This file is auto-generated by running 'make' in - * https://github.com/riscv/riscv-opcodes (02b4866) - */ - -#ifndef RISCV_CSR_ENCODING_H -#define RISCV_CSR_ENCODING_H - -#define MSTATUS_UIE 0x00000001 -#define MSTATUS_SIE 0x00000002 -#define MSTATUS_HIE 0x00000004 -#define MSTATUS_MIE 0x00000008 -#define MSTATUS_UPIE 0x00000010 -#define MSTATUS_SPIE 0x00000020 -#define MSTATUS_UBE 0x00000040 -#define MSTATUS_MPIE 0x00000080 -#define MSTATUS_SPP 0x00000100 -#define MSTATUS_VS 0x00000600 -#define MSTATUS_MPP 0x00001800 -#define MSTATUS_FS 0x00006000 -#define MSTATUS_XS 0x00018000 -#define MSTATUS_MPRV 0x00020000 -#define MSTATUS_SUM 0x00040000 -#define MSTATUS_MXR 0x00080000 -#define MSTATUS_TVM 0x00100000 -#define MSTATUS_TW 0x00200000 -#define MSTATUS_TSR 0x00400000 -#define MSTATUS32_SD 0x80000000 -#define MSTATUS_UXL 0x0000000300000000 -#define MSTATUS_SXL 0x0000000C00000000 -#define MSTATUS_SBE 0x0000001000000000 -#define MSTATUS_MBE 0x0000002000000000 -#define MSTATUS_GVA 0x0000004000000000 -#define MSTATUS_MPV 0x0000008000000000 -#define MSTATUS64_SD 0x8000000000000000 - -#define MSTATUSH_SBE 0x00000010 -#define MSTATUSH_MBE 0x00000020 -#define MSTATUSH_GVA 0x00000040 -#define MSTATUSH_MPV 0x00000080 - -#define SSTATUS_UIE 0x00000001 -#define SSTATUS_SIE 0x00000002 -#define SSTATUS_UPIE 0x00000010 -#define SSTATUS_SPIE 0x00000020 -#define SSTATUS_UBE 0x00000040 -#define SSTATUS_SPP 0x00000100 -#define SSTATUS_VS 0x00000600 -#define SSTATUS_FS 0x00006000 -#define SSTATUS_XS 0x00018000 -#define SSTATUS_SUM 0x00040000 -#define SSTATUS_MXR 0x00080000 -#define SSTATUS32_SD 0x80000000 -#define SSTATUS_UXL 
0x0000000300000000 -#define SSTATUS64_SD 0x8000000000000000 - -#define HSTATUS_VSXL 0x300000000 -#define HSTATUS_VTSR 0x00400000 -#define HSTATUS_VTW 0x00200000 -#define HSTATUS_VTVM 0x00100000 -#define HSTATUS_VGEIN 0x0003f000 -#define HSTATUS_HU 0x00000200 -#define HSTATUS_SPVP 0x00000100 -#define HSTATUS_SPV 0x00000080 -#define HSTATUS_GVA 0x00000040 -#define HSTATUS_VSBE 0x00000020 - -#define USTATUS_UIE 0x00000001 -#define USTATUS_UPIE 0x00000010 - -#define MNSTATUS_NMIE 0x00000008 -#define MNSTATUS_MNPP 0x00001800 -#define MNSTATUS_MNPV 0x00000080 - -#define DCSR_XDEBUGVER (3U << 30) -#define DCSR_NDRESET (1 << 29) -#define DCSR_FULLRESET (1 << 28) -#define DCSR_EBREAKM (1 << 15) -#define DCSR_EBREAKH (1 << 14) -#define DCSR_EBREAKS (1 << 13) -#define DCSR_EBREAKU (1 << 12) -#define DCSR_STOPCYCLE (1 << 10) -#define DCSR_STOPTIME (1 << 9) -#define DCSR_CAUSE (7 << 6) -#define DCSR_DEBUGINT (1 << 5) -#define DCSR_HALT (1 << 3) -#define DCSR_STEP (1 << 2) -#define DCSR_PRV (3 << 0) - -#define DCSR_CAUSE_NONE 0 -#define DCSR_CAUSE_SWBP 1 -#define DCSR_CAUSE_HWBP 2 -#define DCSR_CAUSE_DEBUGINT 3 -#define DCSR_CAUSE_STEP 4 -#define DCSR_CAUSE_HALT 5 -#define DCSR_CAUSE_GROUP 6 - -#define MCONTROL_TYPE(xlen) (0xfULL << ((xlen) - 4)) -#define MCONTROL_DMODE(xlen) (1ULL << ((xlen) - 5)) -#define MCONTROL_MASKMAX(xlen) (0x3fULL << ((xlen) - 11)) - -#define MCONTROL_SELECT (1 << 19) -#define MCONTROL_TIMING (1 << 18) -#define MCONTROL_ACTION (0x3f << 12) -#define MCONTROL_CHAIN (1 << 11) -#define MCONTROL_MATCH (0xf << 7) -#define MCONTROL_M (1 << 6) -#define MCONTROL_H (1 << 5) -#define MCONTROL_S (1 << 4) -#define MCONTROL_U (1 << 3) -#define MCONTROL_EXECUTE (1 << 2) -#define MCONTROL_STORE (1 << 1) -#define MCONTROL_LOAD (1 << 0) - -#define MCONTROL_TYPE_NONE 0 -#define MCONTROL_TYPE_MATCH 2 - -#define MCONTROL_ACTION_DEBUG_EXCEPTION 0 -#define MCONTROL_ACTION_DEBUG_MODE 1 -#define MCONTROL_ACTION_TRACE_START 2 -#define MCONTROL_ACTION_TRACE_STOP 3 -#define 
MCONTROL_ACTION_TRACE_EMIT 4 - -#define MCONTROL_MATCH_EQUAL 0 -#define MCONTROL_MATCH_NAPOT 1 -#define MCONTROL_MATCH_GE 2 -#define MCONTROL_MATCH_LT 3 -#define MCONTROL_MATCH_MASK_LOW 4 -#define MCONTROL_MATCH_MASK_HIGH 5 - -#define MIP_USIP (1 << IRQ_U_SOFT) -#define MIP_SSIP (1 << IRQ_S_SOFT) -#define MIP_VSSIP (1 << IRQ_VS_SOFT) -#define MIP_MSIP (1 << IRQ_M_SOFT) -#define MIP_UTIP (1 << IRQ_U_TIMER) -#define MIP_STIP (1 << IRQ_S_TIMER) -#define MIP_VSTIP (1 << IRQ_VS_TIMER) -#define MIP_MTIP (1 << IRQ_M_TIMER) -#define MIP_UEIP (1 << IRQ_U_EXT) -#define MIP_SEIP (1 << IRQ_S_EXT) -#define MIP_VSEIP (1 << IRQ_VS_EXT) -#define MIP_MEIP (1 << IRQ_M_EXT) -#define MIP_SGEIP (1 << IRQ_S_GEXT) -#define MIP_LCOFIP (1 << IRQ_LCOF) - -#define MIP_S_MASK (MIP_SSIP | MIP_STIP | MIP_SEIP) -#define MIP_VS_MASK (MIP_VSSIP | MIP_VSTIP | MIP_VSEIP) -#define MIP_HS_MASK (MIP_VS_MASK | MIP_SGEIP) - -#define MIDELEG_FORCED_MASK MIP_HS_MASK - -#define SIP_SSIP MIP_SSIP -#define SIP_STIP MIP_STIP - -#define MENVCFG_FIOM 0x00000001 -#define MENVCFG_CBIE 0x00000030 -#define MENVCFG_CBCFE 0x00000040 -#define MENVCFG_CBZE 0x00000080 -#define MENVCFG_HADE 0x2000000000000000 -#define MENVCFG_PBMTE 0x4000000000000000 -#define MENVCFG_STCE 0x8000000000000000 - -#define MENVCFGH_HADE 0x20000000 -#define MENVCFGH_PBMTE 0x40000000 -#define MENVCFGH_STCE 0x80000000 - -#define MSTATEEN0_CS 0x00000001 -#define MSTATEEN0_FCSR 0x00000002 -#define MSTATEEN0_JVT 0x00000004 -#define MSTATEEN0_HCONTEXT 0x0200000000000000 -#define MSTATEEN0_HENVCFG 0x4000000000000000 -#define MSTATEEN_HSTATEEN 0x8000000000000000 - -#define MSTATEEN0H_HCONTEXT 0x02000000 -#define MSTATEEN0H_HENVCFG 0x40000000 -#define MSTATEENH_HSTATEEN 0x80000000 - -#define MHPMEVENT_VUINH 0x0400000000000000 -#define MHPMEVENT_VSINH 0x0800000000000000 -#define MHPMEVENT_UINH 0x1000000000000000 -#define MHPMEVENT_SINH 0x2000000000000000 -#define MHPMEVENT_MINH 0x4000000000000000 -#define MHPMEVENT_OF 0x8000000000000000 - -#define 
MHPMEVENTH_VUINH 0x04000000 -#define MHPMEVENTH_VSINH 0x08000000 -#define MHPMEVENTH_UINH 0x10000000 -#define MHPMEVENTH_SINH 0x20000000 -#define MHPMEVENTH_MINH 0x40000000 -#define MHPMEVENTH_OF 0x80000000 - -#define HENVCFG_FIOM 0x00000001 -#define HENVCFG_CBIE 0x00000030 -#define HENVCFG_CBCFE 0x00000040 -#define HENVCFG_CBZE 0x00000080 -#define HENVCFG_HADE 0x2000000000000000 -#define HENVCFG_PBMTE 0x4000000000000000 -#define HENVCFG_STCE 0x8000000000000000 - -#define HENVCFGH_HADE 0x20000000 -#define HENVCFGH_PBMTE 0x40000000 -#define HENVCFGH_STCE 0x80000000 - -#define HSTATEEN0_CS 0x00000001 -#define HSTATEEN0_FCSR 0x00000002 -#define HSTATEEN0_JVT 0x00000004 -#define HSTATEEN0_SCONTEXT 0x0200000000000000 -#define HSTATEEN0_SENVCFG 0x4000000000000000 -#define HSTATEEN_SSTATEEN 0x8000000000000000 - -#define HSTATEEN0H_SCONTEXT 0x02000000 -#define HSTATEEN0H_SENVCFG 0x40000000 -#define HSTATEENH_SSTATEEN 0x80000000 - -#define SENVCFG_FIOM 0x00000001 -#define SENVCFG_CBIE 0x00000030 -#define SENVCFG_CBCFE 0x00000040 -#define SENVCFG_CBZE 0x00000080 - -#define SSTATEEN0_CS 0x00000001 -#define SSTATEEN0_FCSR 0x00000002 -#define SSTATEEN0_JVT 0x00000004 - -#define MSECCFG_MML 0x00000001 -#define MSECCFG_MMWP 0x00000002 -#define MSECCFG_RLB 0x00000004 -#define MSECCFG_USEED 0x00000100 -#define MSECCFG_SSEED 0x00000200 - -/* jvt fields */ -#define JVT_MODE 0x3F -#define JVT_BASE (~0x3F) - -#define PRV_U 0 -#define PRV_S 1 -#define PRV_M 3 - -#define PRV_HS (PRV_S + 1) - -#define SATP32_MODE 0x80000000 -#define SATP32_ASID 0x7FC00000 -#define SATP32_PPN 0x003FFFFF -#define SATP64_MODE 0xF000000000000000 -#define SATP64_ASID 0x0FFFF00000000000 -#define SATP64_PPN 0x00000FFFFFFFFFFF - -#define SATP_MODE_OFF 0 -#define SATP_MODE_SV32 1 -#define SATP_MODE_SV39 8 -#define SATP_MODE_SV48 9 -#define SATP_MODE_SV57 10 -#define SATP_MODE_SV64 11 - -#define HGATP32_MODE 0x80000000 -#define HGATP32_VMID 0x1FC00000 -#define HGATP32_PPN 0x003FFFFF - -#define HGATP64_MODE 
0xF000000000000000 -#define HGATP64_VMID 0x03FFF00000000000 -#define HGATP64_PPN 0x00000FFFFFFFFFFF - -#define HGATP_MODE_OFF 0 -#define HGATP_MODE_SV32X4 1 -#define HGATP_MODE_SV39X4 8 -#define HGATP_MODE_SV48X4 9 -#define HGATP_MODE_SV57X4 10 - -#define PMP_R 0x01 -#define PMP_W 0x02 -#define PMP_X 0x04 -#define PMP_A 0x18 -#define PMP_L 0x80 -#define PMP_SHIFT 2 - -#define PMP_TOR 0x08 -#define PMP_NA4 0x10 -#define PMP_NAPOT 0x18 - -#define IRQ_U_SOFT 0 -#define IRQ_S_SOFT 1 -#define IRQ_VS_SOFT 2 -#define IRQ_M_SOFT 3 -#define IRQ_U_TIMER 4 -#define IRQ_S_TIMER 5 -#define IRQ_VS_TIMER 6 -#define IRQ_M_TIMER 7 -#define IRQ_U_EXT 8 -#define IRQ_S_EXT 9 -#define IRQ_VS_EXT 10 -#define IRQ_M_EXT 11 -#define IRQ_S_GEXT 12 -#define IRQ_COP 12 -#define IRQ_LCOF 13 - -#define DEFAULT_RSTVEC 0x00001000 -#define CLINT_BASE 0x02000000 -#define CLINT_SIZE 0x000c0000 -#define EXT_IO_BASE 0x40000000 -#define DRAM_BASE 0x80000000 - -/* page table entry (PTE) fields */ -#define PTE_V 0x001 /* Valid */ -#define PTE_R 0x002 /* Read */ -#define PTE_W 0x004 /* Write */ -#define PTE_X 0x008 /* Execute */ -#define PTE_U 0x010 /* User */ -#define PTE_G 0x020 /* Global */ -#define PTE_A 0x040 /* Accessed */ -#define PTE_D 0x080 /* Dirty */ -#define PTE_SOFT 0x300 /* Reserved for Software */ -#define PTE_RSVD 0x1FC0000000000000 /* Reserved for future standard use */ -#define PTE_PBMT 0x6000000000000000 /* Svpbmt: Page-based memory types */ -#define PTE_N 0x8000000000000000 /* Svnapot: NAPOT translation contiguity */ -#define PTE_ATTR 0xFFC0000000000000 /* All attributes and reserved bits */ - -#define PTE_PPN_SHIFT 10 - -#define PTE_TABLE(PTE) (((PTE) & (PTE_V | PTE_R | PTE_W | PTE_X)) == PTE_V) - -#ifdef __riscv - -#if __riscv_xlen == 64 -#define MSTATUS_SD MSTATUS64_SD -#define SSTATUS_SD SSTATUS64_SD -#define RISCV_PGLEVEL_BITS 9 -#define SATP_MODE SATP64_MODE -#else -#define MSTATUS_SD MSTATUS32_SD -#define SSTATUS_SD SSTATUS32_SD -#define RISCV_PGLEVEL_BITS 10 -#define SATP_MODE 
SATP32_MODE -#endif -#define RISCV_PGSHIFT 12 -#define RISCV_PGSIZE (1 << RISCV_PGSHIFT) - -#ifndef __ASSEMBLER__ - -#ifdef __GNUC__ - -#define read_csr(reg) \ - ({ \ - unsigned long __tmp; \ - asm volatile("csrr %0, " #reg : "=r"(__tmp)); \ - __tmp; \ - }) - -#define write_csr(reg, val) ({ asm volatile("csrw " #reg ", %0" ::"rK"(val)); }) - -#define swap_csr(reg, val) \ - ({ \ - unsigned long __tmp; \ - asm volatile("csrrw %0, " #reg ", %1" : "=r"(__tmp) : "rK"(val)); \ - __tmp; \ - }) - -#define set_csr(reg, bit) \ - ({ \ - unsigned long __tmp; \ - asm volatile("csrrs %0, " #reg ", %1" : "=r"(__tmp) : "rK"(bit)); \ - __tmp; \ - }) - -#define clear_csr(reg, bit) \ - ({ \ - unsigned long __tmp; \ - asm volatile("csrrc %0, " #reg ", %1" : "=r"(__tmp) : "rK"(bit)); \ - __tmp; \ - }) - -#define rdtime() read_csr(time) -#define rdcycle() read_csr(cycle) -#define rdinstret() read_csr(instret) - -#endif - -#endif - -#endif - -#endif - -/* Automatically generated by parse_opcodes. */ -#ifndef RISCV_ENCODING_H -#define RISCV_ENCODING_H -#define MATCH_ADD 0x33 -#define MASK_ADD 0xfe00707f -#define MATCH_ADD16 0x40000077 -#define MASK_ADD16 0xfe00707f -#define MATCH_ADD32 0x40002077 -#define MASK_ADD32 0xfe00707f -#define MATCH_ADD64 0xc0001077 -#define MASK_ADD64 0xfe00707f -#define MATCH_ADD8 0x48000077 -#define MASK_ADD8 0xfe00707f -#define MATCH_ADD_UW 0x800003b -#define MASK_ADD_UW 0xfe00707f -#define MATCH_ADDI 0x13 -#define MASK_ADDI 0x707f -#define MATCH_ADDIW 0x1b -#define MASK_ADDIW 0x707f -#define MATCH_ADDW 0x3b -#define MASK_ADDW 0xfe00707f -#define MATCH_AES32DSI 0x2a000033 -#define MASK_AES32DSI 0x3e00707f -#define MATCH_AES32DSMI 0x2e000033 -#define MASK_AES32DSMI 0x3e00707f -#define MATCH_AES32ESI 0x22000033 -#define MASK_AES32ESI 0x3e00707f -#define MATCH_AES32ESMI 0x26000033 -#define MASK_AES32ESMI 0x3e00707f -#define MATCH_AES64DS 0x3a000033 -#define MASK_AES64DS 0xfe00707f -#define MATCH_AES64DSM 0x3e000033 -#define MASK_AES64DSM 0xfe00707f -#define 
MATCH_AES64ES 0x32000033 -#define MASK_AES64ES 0xfe00707f -#define MATCH_AES64ESM 0x36000033 -#define MASK_AES64ESM 0xfe00707f -#define MATCH_AES64IM 0x30001013 -#define MASK_AES64IM 0xfff0707f -#define MATCH_AES64KS1I 0x31001013 -#define MASK_AES64KS1I 0xff00707f -#define MATCH_AES64KS2 0x7e000033 -#define MASK_AES64KS2 0xfe00707f -#define MATCH_AMOADD_D 0x302f -#define MASK_AMOADD_D 0xf800707f -#define MATCH_AMOADD_W 0x202f -#define MASK_AMOADD_W 0xf800707f -#define MATCH_AMOAND_D 0x6000302f -#define MASK_AMOAND_D 0xf800707f -#define MATCH_AMOAND_W 0x6000202f -#define MASK_AMOAND_W 0xf800707f -#define MATCH_AMOMAX_D 0xa000302f -#define MASK_AMOMAX_D 0xf800707f -#define MATCH_AMOMAX_W 0xa000202f -#define MASK_AMOMAX_W 0xf800707f -#define MATCH_AMOMAXU_D 0xe000302f -#define MASK_AMOMAXU_D 0xf800707f -#define MATCH_AMOMAXU_W 0xe000202f -#define MASK_AMOMAXU_W 0xf800707f -#define MATCH_AMOMIN_D 0x8000302f -#define MASK_AMOMIN_D 0xf800707f -#define MATCH_AMOMIN_W 0x8000202f -#define MASK_AMOMIN_W 0xf800707f -#define MATCH_AMOMINU_D 0xc000302f -#define MASK_AMOMINU_D 0xf800707f -#define MATCH_AMOMINU_W 0xc000202f -#define MASK_AMOMINU_W 0xf800707f -#define MATCH_AMOOR_D 0x4000302f -#define MASK_AMOOR_D 0xf800707f -#define MATCH_AMOOR_W 0x4000202f -#define MASK_AMOOR_W 0xf800707f -#define MATCH_AMOSWAP_D 0x800302f -#define MASK_AMOSWAP_D 0xf800707f -#define MATCH_AMOSWAP_W 0x800202f -#define MASK_AMOSWAP_W 0xf800707f -#define MATCH_AMOXOR_D 0x2000302f -#define MASK_AMOXOR_D 0xf800707f -#define MATCH_AMOXOR_W 0x2000202f -#define MASK_AMOXOR_W 0xf800707f -#define MATCH_AND 0x7033 -#define MASK_AND 0xfe00707f -#define MATCH_ANDI 0x7013 -#define MASK_ANDI 0x707f -#define MATCH_ANDN 0x40007033 -#define MASK_ANDN 0xfe00707f -#define MATCH_AUIPC 0x17 -#define MASK_AUIPC 0x7f -#define MATCH_AVE 0xe0000077 -#define MASK_AVE 0xfe00707f -#define MATCH_BCLR 0x48001033 -#define MASK_BCLR 0xfe00707f -#define MATCH_BCLRI 0x48001013 -#define MASK_BCLRI 0xfc00707f -#define 
MATCH_BCOMPRESS 0x8006033 -#define MASK_BCOMPRESS 0xfe00707f -#define MATCH_BCOMPRESSW 0x800603b -#define MASK_BCOMPRESSW 0xfe00707f -#define MATCH_BDECOMPRESS 0x48006033 -#define MASK_BDECOMPRESS 0xfe00707f -#define MATCH_BDECOMPRESSW 0x4800603b -#define MASK_BDECOMPRESSW 0xfe00707f -#define MATCH_BEQ 0x63 -#define MASK_BEQ 0x707f -#define MATCH_BEXT 0x48005033 -#define MASK_BEXT 0xfe00707f -#define MATCH_BEXTI 0x48005013 -#define MASK_BEXTI 0xfc00707f -#define MATCH_BFP 0x48007033 -#define MASK_BFP 0xfe00707f -#define MATCH_BFPW 0x4800703b -#define MASK_BFPW 0xfe00707f -#define MATCH_BGE 0x5063 -#define MASK_BGE 0x707f -#define MATCH_BGEU 0x7063 -#define MASK_BGEU 0x707f -#define MATCH_BINV 0x68001033 -#define MASK_BINV 0xfe00707f -#define MATCH_BINVI 0x68001013 -#define MASK_BINVI 0xfc00707f -#define MATCH_BLT 0x4063 -#define MASK_BLT 0x707f -#define MATCH_BLTU 0x6063 -#define MASK_BLTU 0x707f -#define MATCH_BMATFLIP 0x60301013 -#define MASK_BMATFLIP 0xfff0707f -#define MATCH_BMATOR 0x8003033 -#define MASK_BMATOR 0xfe00707f -#define MATCH_BMATXOR 0x48003033 -#define MASK_BMATXOR 0xfe00707f -#define MATCH_BNE 0x1063 -#define MASK_BNE 0x707f -#define MATCH_BSET 0x28001033 -#define MASK_BSET 0xfe00707f -#define MATCH_BSETI 0x28001013 -#define MASK_BSETI 0xfc00707f -#define MATCH_C_ADD 0x9002 -#define MASK_C_ADD 0xf003 -#define MATCH_C_ADDI 0x1 -#define MASK_C_ADDI 0xe003 -#define MATCH_C_ADDI16SP 0x6101 -#define MASK_C_ADDI16SP 0xef83 -#define MATCH_C_ADDI4SPN 0x0 -#define MASK_C_ADDI4SPN 0xe003 -#define MATCH_C_ADDIW 0x2001 -#define MASK_C_ADDIW 0xe003 -#define MATCH_C_ADDW 0x9c21 -#define MASK_C_ADDW 0xfc63 -#define MATCH_C_AND 0x8c61 -#define MASK_C_AND 0xfc63 -#define MATCH_C_ANDI 0x8801 -#define MASK_C_ANDI 0xec03 -#define MATCH_C_BEQZ 0xc001 -#define MASK_C_BEQZ 0xe003 -#define MATCH_C_BNEZ 0xe001 -#define MASK_C_BNEZ 0xe003 -#define MATCH_C_EBREAK 0x9002 -#define MASK_C_EBREAK 0xffff -#define MATCH_C_FLD 0x2000 -#define MASK_C_FLD 0xe003 -#define 
MATCH_C_FLDSP 0x2002 -#define MASK_C_FLDSP 0xe003 -#define MATCH_C_FLW 0x6000 -#define MASK_C_FLW 0xe003 -#define MATCH_C_FLWSP 0x6002 -#define MASK_C_FLWSP 0xe003 -#define MATCH_C_FSD 0xa000 -#define MASK_C_FSD 0xe003 -#define MATCH_C_FSDSP 0xa002 -#define MASK_C_FSDSP 0xe003 -#define MATCH_C_FSW 0xe000 -#define MASK_C_FSW 0xe003 -#define MATCH_C_FSWSP 0xe002 -#define MASK_C_FSWSP 0xe003 -#define MATCH_C_J 0xa001 -#define MASK_C_J 0xe003 -#define MATCH_C_JAL 0x2001 -#define MASK_C_JAL 0xe003 -#define MATCH_C_JALR 0x9002 -#define MASK_C_JALR 0xf07f -#define MATCH_C_JR 0x8002 -#define MASK_C_JR 0xf07f -#define MATCH_C_LBU 0x8000 -#define MASK_C_LBU 0xfc03 -#define MATCH_C_LD 0x6000 -#define MASK_C_LD 0xe003 -#define MATCH_C_LDSP 0x6002 -#define MASK_C_LDSP 0xe003 -#define MATCH_C_LH 0x8440 -#define MASK_C_LH 0xfc43 -#define MATCH_C_LHU 0x8400 -#define MASK_C_LHU 0xfc43 -#define MATCH_C_LI 0x4001 -#define MASK_C_LI 0xe003 -#define MATCH_C_LUI 0x6001 -#define MASK_C_LUI 0xe003 -#define MATCH_C_LW 0x4000 -#define MASK_C_LW 0xe003 -#define MATCH_C_LWSP 0x4002 -#define MASK_C_LWSP 0xe003 -#define MATCH_C_MUL 0x9c41 -#define MASK_C_MUL 0xfc63 -#define MATCH_C_MV 0x8002 -#define MASK_C_MV 0xf003 -#define MATCH_C_NOP 0x1 -#define MASK_C_NOP 0xef83 -#define MATCH_C_NOT 0x9c75 -#define MASK_C_NOT 0xfc7f -#define MATCH_C_OR 0x8c41 -#define MASK_C_OR 0xfc63 -#define MATCH_C_SB 0x8800 -#define MASK_C_SB 0xfc03 -#define MATCH_C_SD 0xe000 -#define MASK_C_SD 0xe003 -#define MATCH_C_SDSP 0xe002 -#define MASK_C_SDSP 0xe003 -#define MATCH_C_SEXT_B 0x9c65 -#define MASK_C_SEXT_B 0xfc7f -#define MATCH_C_SEXT_H 0x9c6d -#define MASK_C_SEXT_H 0xfc7f -#define MATCH_C_SH 0x8c00 -#define MASK_C_SH 0xfc43 -#define MATCH_C_SLLI 0x2 -#define MASK_C_SLLI 0xe003 -#define MATCH_C_SRAI 0x8401 -#define MASK_C_SRAI 0xec03 -#define MATCH_C_SRLI 0x8001 -#define MASK_C_SRLI 0xec03 -#define MATCH_C_SUB 0x8c01 -#define MASK_C_SUB 0xfc63 -#define MATCH_C_SUBW 0x9c01 -#define MASK_C_SUBW 0xfc63 -#define 
MATCH_C_SW 0xc000 -#define MASK_C_SW 0xe003 -#define MATCH_C_SWSP 0xc002 -#define MASK_C_SWSP 0xe003 -#define MATCH_C_XOR 0x8c21 -#define MASK_C_XOR 0xfc63 -#define MATCH_C_ZEXT_B 0x9c61 -#define MASK_C_ZEXT_B 0xfc7f -#define MATCH_C_ZEXT_H 0x9c69 -#define MASK_C_ZEXT_H 0xfc7f -#define MATCH_C_ZEXT_W 0x9c71 -#define MASK_C_ZEXT_W 0xfc7f -#define MATCH_CBO_CLEAN 0x10200f -#define MASK_CBO_CLEAN 0xfff07fff -#define MATCH_CBO_FLUSH 0x20200f -#define MASK_CBO_FLUSH 0xfff07fff -#define MATCH_CBO_INVAL 0x200f -#define MASK_CBO_INVAL 0xfff07fff -#define MATCH_CBO_ZERO 0x40200f -#define MASK_CBO_ZERO 0xfff07fff -#define MATCH_CLMUL 0xa001033 -#define MASK_CLMUL 0xfe00707f -#define MATCH_CLMULH 0xa003033 -#define MASK_CLMULH 0xfe00707f -#define MATCH_CLMULR 0xa002033 -#define MASK_CLMULR 0xfe00707f -#define MATCH_CLRS16 0xae800077 -#define MASK_CLRS16 0xfff0707f -#define MATCH_CLRS32 0xaf800077 -#define MASK_CLRS32 0xfff0707f -#define MATCH_CLRS8 0xae000077 -#define MASK_CLRS8 0xfff0707f -#define MATCH_CLZ 0x60001013 -#define MASK_CLZ 0xfff0707f -#define MATCH_CLZ16 0xae900077 -#define MASK_CLZ16 0xfff0707f -#define MATCH_CLZ32 0xaf900077 -#define MASK_CLZ32 0xfff0707f -#define MATCH_CLZ8 0xae100077 -#define MASK_CLZ8 0xfff0707f -#define MATCH_CLZW 0x6000101b -#define MASK_CLZW 0xfff0707f -#define MATCH_CM_JALT 0xa002 -#define MASK_CM_JALT 0xfc03 -#define MATCH_CM_MVA01S 0xac62 -#define MASK_CM_MVA01S 0xfc63 -#define MATCH_CM_MVSA01 0xac22 -#define MASK_CM_MVSA01 0xfc63 -#define MATCH_CM_POP 0xba02 -#define MASK_CM_POP 0xff03 -#define MATCH_CM_POPRET 0xbe02 -#define MASK_CM_POPRET 0xff03 -#define MATCH_CM_POPRETZ 0xbc02 -#define MASK_CM_POPRETZ 0xff03 -#define MATCH_CM_PUSH 0xb802 -#define MASK_CM_PUSH 0xff03 -#define MATCH_CMIX 0x6001033 -#define MASK_CMIX 0x600707f -#define MATCH_CMOV 0x6005033 -#define MASK_CMOV 0x600707f -#define MATCH_CMPEQ16 0x4c000077 -#define MASK_CMPEQ16 0xfe00707f -#define MATCH_CMPEQ8 0x4e000077 -#define MASK_CMPEQ8 0xfe00707f -#define MATCH_CPOP 
0x60201013 -#define MASK_CPOP 0xfff0707f -#define MATCH_CPOPW 0x6020101b -#define MASK_CPOPW 0xfff0707f -#define MATCH_CRAS16 0x44000077 -#define MASK_CRAS16 0xfe00707f -#define MATCH_CRAS32 0x44002077 -#define MASK_CRAS32 0xfe00707f -#define MATCH_CRC32_B 0x61001013 -#define MASK_CRC32_B 0xfff0707f -#define MATCH_CRC32_D 0x61301013 -#define MASK_CRC32_D 0xfff0707f -#define MATCH_CRC32_H 0x61101013 -#define MASK_CRC32_H 0xfff0707f -#define MATCH_CRC32_W 0x61201013 -#define MASK_CRC32_W 0xfff0707f -#define MATCH_CRC32C_B 0x61801013 -#define MASK_CRC32C_B 0xfff0707f -#define MATCH_CRC32C_D 0x61b01013 -#define MASK_CRC32C_D 0xfff0707f -#define MATCH_CRC32C_H 0x61901013 -#define MASK_CRC32C_H 0xfff0707f -#define MATCH_CRC32C_W 0x61a01013 -#define MASK_CRC32C_W 0xfff0707f -#define MATCH_CRSA16 0x46000077 -#define MASK_CRSA16 0xfe00707f -#define MATCH_CRSA32 0x46002077 -#define MASK_CRSA32 0xfe00707f -#define MATCH_CSRRC 0x3073 -#define MASK_CSRRC 0x707f -#define MATCH_CSRRCI 0x7073 -#define MASK_CSRRCI 0x707f -#define MATCH_CSRRS 0x2073 -#define MASK_CSRRS 0x707f -#define MATCH_CSRRSI 0x6073 -#define MASK_CSRRSI 0x707f -#define MATCH_CSRRW 0x1073 -#define MASK_CSRRW 0x707f -#define MATCH_CSRRWI 0x5073 -#define MASK_CSRRWI 0x707f -#define MATCH_CTZ 0x60101013 -#define MASK_CTZ 0xfff0707f -#define MATCH_CTZW 0x6010101b -#define MASK_CTZW 0xfff0707f -#define MATCH_CZERO_EQZ 0xe005033 -#define MASK_CZERO_EQZ 0xfe00707f -#define MATCH_CZERO_NEZ 0xe007033 -#define MASK_CZERO_NEZ 0xfe00707f -#define MATCH_DIV 0x2004033 -#define MASK_DIV 0xfe00707f -#define MATCH_DIVU 0x2005033 -#define MASK_DIVU 0xfe00707f -#define MATCH_DIVUW 0x200503b -#define MASK_DIVUW 0xfe00707f -#define MATCH_DIVW 0x200403b -#define MASK_DIVW 0xfe00707f -#define MATCH_DRET 0x7b200073 -#define MASK_DRET 0xffffffff -#define MATCH_EBREAK 0x100073 -#define MASK_EBREAK 0xffffffff -#define MATCH_ECALL 0x73 -#define MASK_ECALL 0xffffffff -#define MATCH_FADD_D 0x2000053 -#define MASK_FADD_D 0xfe00007f -#define 
MATCH_FADD_H 0x4000053 -#define MASK_FADD_H 0xfe00007f -#define MATCH_FADD_Q 0x6000053 -#define MASK_FADD_Q 0xfe00007f -#define MATCH_FADD_S 0x53 -#define MASK_FADD_S 0xfe00007f -#define MATCH_FCLASS_D 0xe2001053 -#define MASK_FCLASS_D 0xfff0707f -#define MATCH_FCLASS_H 0xe4001053 -#define MASK_FCLASS_H 0xfff0707f -#define MATCH_FCLASS_Q 0xe6001053 -#define MASK_FCLASS_Q 0xfff0707f -#define MATCH_FCLASS_S 0xe0001053 -#define MASK_FCLASS_S 0xfff0707f -#define MATCH_FCVT_D_H 0x42200053 -#define MASK_FCVT_D_H 0xfff0007f -#define MATCH_FCVT_D_L 0xd2200053 -#define MASK_FCVT_D_L 0xfff0007f -#define MATCH_FCVT_D_LU 0xd2300053 -#define MASK_FCVT_D_LU 0xfff0007f -#define MATCH_FCVT_D_Q 0x42300053 -#define MASK_FCVT_D_Q 0xfff0007f -#define MATCH_FCVT_D_S 0x42000053 -#define MASK_FCVT_D_S 0xfff0007f -#define MATCH_FCVT_D_W 0xd2000053 -#define MASK_FCVT_D_W 0xfff0007f -#define MATCH_FCVT_D_WU 0xd2100053 -#define MASK_FCVT_D_WU 0xfff0007f -#define MATCH_FCVT_H_D 0x44100053 -#define MASK_FCVT_H_D 0xfff0007f -#define MATCH_FCVT_H_L 0xd4200053 -#define MASK_FCVT_H_L 0xfff0007f -#define MATCH_FCVT_H_LU 0xd4300053 -#define MASK_FCVT_H_LU 0xfff0007f -#define MATCH_FCVT_H_Q 0x44300053 -#define MASK_FCVT_H_Q 0xfff0007f -#define MATCH_FCVT_H_S 0x44000053 -#define MASK_FCVT_H_S 0xfff0007f -#define MATCH_FCVT_H_W 0xd4000053 -#define MASK_FCVT_H_W 0xfff0007f -#define MATCH_FCVT_H_WU 0xd4100053 -#define MASK_FCVT_H_WU 0xfff0007f -#define MATCH_FCVT_L_D 0xc2200053 -#define MASK_FCVT_L_D 0xfff0007f -#define MATCH_FCVT_L_H 0xc4200053 -#define MASK_FCVT_L_H 0xfff0007f -#define MATCH_FCVT_L_Q 0xc6200053 -#define MASK_FCVT_L_Q 0xfff0007f -#define MATCH_FCVT_L_S 0xc0200053 -#define MASK_FCVT_L_S 0xfff0007f -#define MATCH_FCVT_LU_D 0xc2300053 -#define MASK_FCVT_LU_D 0xfff0007f -#define MATCH_FCVT_LU_H 0xc4300053 -#define MASK_FCVT_LU_H 0xfff0007f -#define MATCH_FCVT_LU_Q 0xc6300053 -#define MASK_FCVT_LU_Q 0xfff0007f -#define MATCH_FCVT_LU_S 0xc0300053 -#define MASK_FCVT_LU_S 0xfff0007f -#define 
MATCH_FCVT_Q_D 0x46100053 -#define MASK_FCVT_Q_D 0xfff0007f -#define MATCH_FCVT_Q_H 0x46200053 -#define MASK_FCVT_Q_H 0xfff0007f -#define MATCH_FCVT_Q_L 0xd6200053 -#define MASK_FCVT_Q_L 0xfff0007f -#define MATCH_FCVT_Q_LU 0xd6300053 -#define MASK_FCVT_Q_LU 0xfff0007f -#define MATCH_FCVT_Q_S 0x46000053 -#define MASK_FCVT_Q_S 0xfff0007f -#define MATCH_FCVT_Q_W 0xd6000053 -#define MASK_FCVT_Q_W 0xfff0007f -#define MATCH_FCVT_Q_WU 0xd6100053 -#define MASK_FCVT_Q_WU 0xfff0007f -#define MATCH_FCVT_S_D 0x40100053 -#define MASK_FCVT_S_D 0xfff0007f -#define MATCH_FCVT_S_H 0x40200053 -#define MASK_FCVT_S_H 0xfff0007f -#define MATCH_FCVT_S_L 0xd0200053 -#define MASK_FCVT_S_L 0xfff0007f -#define MATCH_FCVT_S_LU 0xd0300053 -#define MASK_FCVT_S_LU 0xfff0007f -#define MATCH_FCVT_S_Q 0x40300053 -#define MASK_FCVT_S_Q 0xfff0007f -#define MATCH_FCVT_S_W 0xd0000053 -#define MASK_FCVT_S_W 0xfff0007f -#define MATCH_FCVT_S_WU 0xd0100053 -#define MASK_FCVT_S_WU 0xfff0007f -#define MATCH_FCVT_W_D 0xc2000053 -#define MASK_FCVT_W_D 0xfff0007f -#define MATCH_FCVT_W_H 0xc4000053 -#define MASK_FCVT_W_H 0xfff0007f -#define MATCH_FCVT_W_Q 0xc6000053 -#define MASK_FCVT_W_Q 0xfff0007f -#define MATCH_FCVT_W_S 0xc0000053 -#define MASK_FCVT_W_S 0xfff0007f -#define MATCH_FCVT_WU_D 0xc2100053 -#define MASK_FCVT_WU_D 0xfff0007f -#define MATCH_FCVT_WU_H 0xc4100053 -#define MASK_FCVT_WU_H 0xfff0007f -#define MATCH_FCVT_WU_Q 0xc6100053 -#define MASK_FCVT_WU_Q 0xfff0007f -#define MATCH_FCVT_WU_S 0xc0100053 -#define MASK_FCVT_WU_S 0xfff0007f -#define MATCH_FDIV_D 0x1a000053 -#define MASK_FDIV_D 0xfe00007f -#define MATCH_FDIV_H 0x1c000053 -#define MASK_FDIV_H 0xfe00007f -#define MATCH_FDIV_Q 0x1e000053 -#define MASK_FDIV_Q 0xfe00007f -#define MATCH_FDIV_S 0x18000053 -#define MASK_FDIV_S 0xfe00007f -#define MATCH_FENCE 0xf -#define MASK_FENCE 0x707f -#define MATCH_FENCE_I 0x100f -#define MASK_FENCE_I 0x707f -#define MATCH_FEQ_D 0xa2002053 -#define MASK_FEQ_D 0xfe00707f -#define MATCH_FEQ_H 0xa4002053 -#define 
MASK_FEQ_H 0xfe00707f -#define MATCH_FEQ_Q 0xa6002053 -#define MASK_FEQ_Q 0xfe00707f -#define MATCH_FEQ_S 0xa0002053 -#define MASK_FEQ_S 0xfe00707f -#define MATCH_FLD 0x3007 -#define MASK_FLD 0x707f -#define MATCH_FLE_D 0xa2000053 -#define MASK_FLE_D 0xfe00707f -#define MATCH_FLE_H 0xa4000053 -#define MASK_FLE_H 0xfe00707f -#define MATCH_FLE_Q 0xa6000053 -#define MASK_FLE_Q 0xfe00707f -#define MATCH_FLE_S 0xa0000053 -#define MASK_FLE_S 0xfe00707f -#define MATCH_FLH 0x1007 -#define MASK_FLH 0x707f -#define MATCH_FLQ 0x4007 -#define MASK_FLQ 0x707f -#define MATCH_FLT_D 0xa2001053 -#define MASK_FLT_D 0xfe00707f -#define MATCH_FLT_H 0xa4001053 -#define MASK_FLT_H 0xfe00707f -#define MATCH_FLT_Q 0xa6001053 -#define MASK_FLT_Q 0xfe00707f -#define MATCH_FLT_S 0xa0001053 -#define MASK_FLT_S 0xfe00707f -#define MATCH_FLW 0x2007 -#define MASK_FLW 0x707f -#define MATCH_FMADD_D 0x2000043 -#define MASK_FMADD_D 0x600007f -#define MATCH_FMADD_H 0x4000043 -#define MASK_FMADD_H 0x600007f -#define MATCH_FMADD_Q 0x6000043 -#define MASK_FMADD_Q 0x600007f -#define MATCH_FMADD_S 0x43 -#define MASK_FMADD_S 0x600007f -#define MATCH_FMAX_D 0x2a001053 -#define MASK_FMAX_D 0xfe00707f -#define MATCH_FMAX_H 0x2c001053 -#define MASK_FMAX_H 0xfe00707f -#define MATCH_FMAX_Q 0x2e001053 -#define MASK_FMAX_Q 0xfe00707f -#define MATCH_FMAX_S 0x28001053 -#define MASK_FMAX_S 0xfe00707f -#define MATCH_FMIN_D 0x2a000053 -#define MASK_FMIN_D 0xfe00707f -#define MATCH_FMIN_H 0x2c000053 -#define MASK_FMIN_H 0xfe00707f -#define MATCH_FMIN_Q 0x2e000053 -#define MASK_FMIN_Q 0xfe00707f -#define MATCH_FMIN_S 0x28000053 -#define MASK_FMIN_S 0xfe00707f -#define MATCH_FMSUB_D 0x2000047 -#define MASK_FMSUB_D 0x600007f -#define MATCH_FMSUB_H 0x4000047 -#define MASK_FMSUB_H 0x600007f -#define MATCH_FMSUB_Q 0x6000047 -#define MASK_FMSUB_Q 0x600007f -#define MATCH_FMSUB_S 0x47 -#define MASK_FMSUB_S 0x600007f -#define MATCH_FMUL_D 0x12000053 -#define MASK_FMUL_D 0xfe00007f -#define MATCH_FMUL_H 0x14000053 -#define 
MASK_FMUL_H 0xfe00007f -#define MATCH_FMUL_Q 0x16000053 -#define MASK_FMUL_Q 0xfe00007f -#define MATCH_FMUL_S 0x10000053 -#define MASK_FMUL_S 0xfe00007f -#define MATCH_FMV_D_X 0xf2000053 -#define MASK_FMV_D_X 0xfff0707f -#define MATCH_FMV_H_X 0xf4000053 -#define MASK_FMV_H_X 0xfff0707f -#define MATCH_FMV_W_X 0xf0000053 -#define MASK_FMV_W_X 0xfff0707f -#define MATCH_FMV_X_D 0xe2000053 -#define MASK_FMV_X_D 0xfff0707f -#define MATCH_FMV_X_H 0xe4000053 -#define MASK_FMV_X_H 0xfff0707f -#define MATCH_FMV_X_W 0xe0000053 -#define MASK_FMV_X_W 0xfff0707f -#define MATCH_FNMADD_D 0x200004f -#define MASK_FNMADD_D 0x600007f -#define MATCH_FNMADD_H 0x400004f -#define MASK_FNMADD_H 0x600007f -#define MATCH_FNMADD_Q 0x600004f -#define MASK_FNMADD_Q 0x600007f -#define MATCH_FNMADD_S 0x4f -#define MASK_FNMADD_S 0x600007f -#define MATCH_FNMSUB_D 0x200004b -#define MASK_FNMSUB_D 0x600007f -#define MATCH_FNMSUB_H 0x400004b -#define MASK_FNMSUB_H 0x600007f -#define MATCH_FNMSUB_Q 0x600004b -#define MASK_FNMSUB_Q 0x600007f -#define MATCH_FNMSUB_S 0x4b -#define MASK_FNMSUB_S 0x600007f -#define MATCH_FSD 0x3027 -#define MASK_FSD 0x707f -#define MATCH_FSGNJ_D 0x22000053 -#define MASK_FSGNJ_D 0xfe00707f -#define MATCH_FSGNJ_H 0x24000053 -#define MASK_FSGNJ_H 0xfe00707f -#define MATCH_FSGNJ_Q 0x26000053 -#define MASK_FSGNJ_Q 0xfe00707f -#define MATCH_FSGNJ_S 0x20000053 -#define MASK_FSGNJ_S 0xfe00707f -#define MATCH_FSGNJN_D 0x22001053 -#define MASK_FSGNJN_D 0xfe00707f -#define MATCH_FSGNJN_H 0x24001053 -#define MASK_FSGNJN_H 0xfe00707f -#define MATCH_FSGNJN_Q 0x26001053 -#define MASK_FSGNJN_Q 0xfe00707f -#define MATCH_FSGNJN_S 0x20001053 -#define MASK_FSGNJN_S 0xfe00707f -#define MATCH_FSGNJX_D 0x22002053 -#define MASK_FSGNJX_D 0xfe00707f -#define MATCH_FSGNJX_H 0x24002053 -#define MASK_FSGNJX_H 0xfe00707f -#define MATCH_FSGNJX_Q 0x26002053 -#define MASK_FSGNJX_Q 0xfe00707f -#define MATCH_FSGNJX_S 0x20002053 -#define MASK_FSGNJX_S 0xfe00707f -#define MATCH_FSH 0x1027 -#define MASK_FSH 
0x707f -#define MATCH_FSL 0x4001033 -#define MASK_FSL 0x600707f -#define MATCH_FSLW 0x400103b -#define MASK_FSLW 0x600707f -#define MATCH_FSQ 0x4027 -#define MASK_FSQ 0x707f -#define MATCH_FSQRT_D 0x5a000053 -#define MASK_FSQRT_D 0xfff0007f -#define MATCH_FSQRT_H 0x5c000053 -#define MASK_FSQRT_H 0xfff0007f -#define MATCH_FSQRT_Q 0x5e000053 -#define MASK_FSQRT_Q 0xfff0007f -#define MATCH_FSQRT_S 0x58000053 -#define MASK_FSQRT_S 0xfff0007f -#define MATCH_FSR 0x4005033 -#define MASK_FSR 0x600707f -#define MATCH_FSRI 0x4005013 -#define MASK_FSRI 0x400707f -#define MATCH_FSRIW 0x400501b -#define MASK_FSRIW 0x600707f -#define MATCH_FSRW 0x400503b -#define MASK_FSRW 0x600707f -#define MATCH_FSUB_D 0xa000053 -#define MASK_FSUB_D 0xfe00007f -#define MATCH_FSUB_H 0xc000053 -#define MASK_FSUB_H 0xfe00007f -#define MATCH_FSUB_Q 0xe000053 -#define MASK_FSUB_Q 0xfe00007f -#define MATCH_FSUB_S 0x8000053 -#define MASK_FSUB_S 0xfe00007f -#define MATCH_FSW 0x2027 -#define MASK_FSW 0x707f -#define MATCH_GORC 0x28005033 -#define MASK_GORC 0xfe00707f -#define MATCH_GORCI 0x28005013 -#define MASK_GORCI 0xfc00707f -#define MATCH_GORCIW 0x2800501b -#define MASK_GORCIW 0xfe00707f -#define MATCH_GORCW 0x2800503b -#define MASK_GORCW 0xfe00707f -#define MATCH_GREV 0x68005033 -#define MASK_GREV 0xfe00707f -#define MATCH_GREVI 0x68005013 -#define MASK_GREVI 0xfc00707f -#define MATCH_GREVIW 0x6800501b -#define MASK_GREVIW 0xfe00707f -#define MATCH_GREVW 0x6800503b -#define MASK_GREVW 0xfe00707f -#define MATCH_HFENCE_GVMA 0x62000073 -#define MASK_HFENCE_GVMA 0xfe007fff -#define MATCH_HFENCE_VVMA 0x22000073 -#define MASK_HFENCE_VVMA 0xfe007fff -#define MATCH_HINVAL_GVMA 0x66000073 -#define MASK_HINVAL_GVMA 0xfe007fff -#define MATCH_HINVAL_VVMA 0x26000073 -#define MASK_HINVAL_VVMA 0xfe007fff -#define MATCH_HLV_B 0x60004073 -#define MASK_HLV_B 0xfff0707f -#define MATCH_HLV_BU 0x60104073 -#define MASK_HLV_BU 0xfff0707f -#define MATCH_HLV_D 0x6c004073 -#define MASK_HLV_D 0xfff0707f -#define 
MATCH_HLV_H 0x64004073 -#define MASK_HLV_H 0xfff0707f -#define MATCH_HLV_HU 0x64104073 -#define MASK_HLV_HU 0xfff0707f -#define MATCH_HLV_W 0x68004073 -#define MASK_HLV_W 0xfff0707f -#define MATCH_HLV_WU 0x68104073 -#define MASK_HLV_WU 0xfff0707f -#define MATCH_HLVX_HU 0x64304073 -#define MASK_HLVX_HU 0xfff0707f -#define MATCH_HLVX_WU 0x68304073 -#define MASK_HLVX_WU 0xfff0707f -#define MATCH_HSV_B 0x62004073 -#define MASK_HSV_B 0xfe007fff -#define MATCH_HSV_D 0x6e004073 -#define MASK_HSV_D 0xfe007fff -#define MATCH_HSV_H 0x66004073 -#define MASK_HSV_H 0xfe007fff -#define MATCH_HSV_W 0x6a004073 -#define MASK_HSV_W 0xfe007fff -#define MATCH_INSB 0xac000077 -#define MASK_INSB 0xff80707f -#define MATCH_JAL 0x6f -#define MASK_JAL 0x7f -#define MATCH_JALR 0x67 -#define MASK_JALR 0x707f -#define MATCH_KABS16 0xad100077 -#define MASK_KABS16 0xfff0707f -#define MATCH_KABS32 0xad200077 -#define MASK_KABS32 0xfff0707f -#define MATCH_KABS8 0xad000077 -#define MASK_KABS8 0xfff0707f -#define MATCH_KABSW 0xad400077 -#define MASK_KABSW 0xfff0707f -#define MATCH_KADD16 0x10000077 -#define MASK_KADD16 0xfe00707f -#define MATCH_KADD32 0x10002077 -#define MASK_KADD32 0xfe00707f -#define MATCH_KADD64 0x90001077 -#define MASK_KADD64 0xfe00707f -#define MATCH_KADD8 0x18000077 -#define MASK_KADD8 0xfe00707f -#define MATCH_KADDH 0x4001077 -#define MASK_KADDH 0xfe00707f -#define MATCH_KADDW 0x1077 -#define MASK_KADDW 0xfe00707f -#define MATCH_KCRAS16 0x14000077 -#define MASK_KCRAS16 0xfe00707f -#define MATCH_KCRAS32 0x14002077 -#define MASK_KCRAS32 0xfe00707f -#define MATCH_KCRSA16 0x16000077 -#define MASK_KCRSA16 0xfe00707f -#define MATCH_KCRSA32 0x16002077 -#define MASK_KCRSA32 0xfe00707f -#define MATCH_KDMABB 0xd2001077 -#define MASK_KDMABB 0xfe00707f -#define MATCH_KDMABB16 0xd8001077 -#define MASK_KDMABB16 0xfe00707f -#define MATCH_KDMABT 0xe2001077 -#define MASK_KDMABT 0xfe00707f -#define MATCH_KDMABT16 0xe8001077 -#define MASK_KDMABT16 0xfe00707f -#define MATCH_KDMATT 0xf2001077 
-#define MASK_KDMATT 0xfe00707f -#define MATCH_KDMATT16 0xf8001077 -#define MASK_KDMATT16 0xfe00707f -#define MATCH_KDMBB 0xa001077 -#define MASK_KDMBB 0xfe00707f -#define MATCH_KDMBB16 0xda001077 -#define MASK_KDMBB16 0xfe00707f -#define MATCH_KDMBT 0x1a001077 -#define MASK_KDMBT 0xfe00707f -#define MATCH_KDMBT16 0xea001077 -#define MASK_KDMBT16 0xfe00707f -#define MATCH_KDMTT 0x2a001077 -#define MASK_KDMTT 0xfe00707f -#define MATCH_KDMTT16 0xfa001077 -#define MASK_KDMTT16 0xfe00707f -#define MATCH_KHM16 0x86000077 -#define MASK_KHM16 0xfe00707f -#define MATCH_KHM8 0x8e000077 -#define MASK_KHM8 0xfe00707f -#define MATCH_KHMBB 0xc001077 -#define MASK_KHMBB 0xfe00707f -#define MATCH_KHMBB16 0xdc001077 -#define MASK_KHMBB16 0xfe00707f -#define MATCH_KHMBT 0x1c001077 -#define MASK_KHMBT 0xfe00707f -#define MATCH_KHMBT16 0xec001077 -#define MASK_KHMBT16 0xfe00707f -#define MATCH_KHMTT 0x2c001077 -#define MASK_KHMTT 0xfe00707f -#define MATCH_KHMTT16 0xfc001077 -#define MASK_KHMTT16 0xfe00707f -#define MATCH_KHMX16 0x96000077 -#define MASK_KHMX16 0xfe00707f -#define MATCH_KHMX8 0x9e000077 -#define MASK_KHMX8 0xfe00707f -#define MATCH_KMABB 0x5a001077 -#define MASK_KMABB 0xfe00707f -#define MATCH_KMABB32 0x5a002077 -#define MASK_KMABB32 0xfe00707f -#define MATCH_KMABT 0x6a001077 -#define MASK_KMABT 0xfe00707f -#define MATCH_KMABT32 0x6a002077 -#define MASK_KMABT32 0xfe00707f -#define MATCH_KMADA 0x48001077 -#define MASK_KMADA 0xfe00707f -#define MATCH_KMADRS 0x6c001077 -#define MASK_KMADRS 0xfe00707f -#define MATCH_KMADRS32 0x6c002077 -#define MASK_KMADRS32 0xfe00707f -#define MATCH_KMADS 0x5c001077 -#define MASK_KMADS 0xfe00707f -#define MATCH_KMADS32 0x5c002077 -#define MASK_KMADS32 0xfe00707f -#define MATCH_KMAR64 0x94001077 -#define MASK_KMAR64 0xfe00707f -#define MATCH_KMATT 0x7a001077 -#define MASK_KMATT 0xfe00707f -#define MATCH_KMATT32 0x7a002077 -#define MASK_KMATT32 0xfe00707f -#define MATCH_KMAXDA 0x4a001077 -#define MASK_KMAXDA 0xfe00707f -#define 
MATCH_KMAXDA32 0x4a002077 -#define MASK_KMAXDA32 0xfe00707f -#define MATCH_KMAXDS 0x7c001077 -#define MASK_KMAXDS 0xfe00707f -#define MATCH_KMAXDS32 0x7c002077 -#define MASK_KMAXDS32 0xfe00707f -#define MATCH_KMDA 0x38001077 -#define MASK_KMDA 0xfe00707f -#define MATCH_KMDA32 0x38002077 -#define MASK_KMDA32 0xfe00707f -#define MATCH_KMMAC 0x60001077 -#define MASK_KMMAC 0xfe00707f -#define MATCH_KMMAC_U 0x70001077 -#define MASK_KMMAC_U 0xfe00707f -#define MATCH_KMMAWB 0x46001077 -#define MASK_KMMAWB 0xfe00707f -#define MATCH_KMMAWB2 0xce001077 -#define MASK_KMMAWB2 0xfe00707f -#define MATCH_KMMAWB2_U 0xde001077 -#define MASK_KMMAWB2_U 0xfe00707f -#define MATCH_KMMAWB_U 0x56001077 -#define MASK_KMMAWB_U 0xfe00707f -#define MATCH_KMMAWT 0x66001077 -#define MASK_KMMAWT 0xfe00707f -#define MATCH_KMMAWT2 0xee001077 -#define MASK_KMMAWT2 0xfe00707f -#define MATCH_KMMAWT2_U 0xfe001077 -#define MASK_KMMAWT2_U 0xfe00707f -#define MATCH_KMMAWT_U 0x76001077 -#define MASK_KMMAWT_U 0xfe00707f -#define MATCH_KMMSB 0x42001077 -#define MASK_KMMSB 0xfe00707f -#define MATCH_KMMSB_U 0x52001077 -#define MASK_KMMSB_U 0xfe00707f -#define MATCH_KMMWB2 0x8e001077 -#define MASK_KMMWB2 0xfe00707f -#define MATCH_KMMWB2_U 0x9e001077 -#define MASK_KMMWB2_U 0xfe00707f -#define MATCH_KMMWT2 0xae001077 -#define MASK_KMMWT2 0xfe00707f -#define MATCH_KMMWT2_U 0xbe001077 -#define MASK_KMMWT2_U 0xfe00707f -#define MATCH_KMSDA 0x4c001077 -#define MASK_KMSDA 0xfe00707f -#define MATCH_KMSDA32 0x4c002077 -#define MASK_KMSDA32 0xfe00707f -#define MATCH_KMSR64 0x96001077 -#define MASK_KMSR64 0xfe00707f -#define MATCH_KMSXDA 0x4e001077 -#define MASK_KMSXDA 0xfe00707f -#define MATCH_KMSXDA32 0x4e002077 -#define MASK_KMSXDA32 0xfe00707f -#define MATCH_KMXDA 0x3a001077 -#define MASK_KMXDA 0xfe00707f -#define MATCH_KMXDA32 0x3a002077 -#define MASK_KMXDA32 0xfe00707f -#define MATCH_KSLL16 0x64000077 -#define MASK_KSLL16 0xfe00707f -#define MATCH_KSLL32 0x64002077 -#define MASK_KSLL32 0xfe00707f -#define 
MATCH_KSLL8 0x6c000077 -#define MASK_KSLL8 0xfe00707f -#define MATCH_KSLLI16 0x75000077 -#define MASK_KSLLI16 0xff00707f -#define MATCH_KSLLI32 0x84002077 -#define MASK_KSLLI32 0xfe00707f -#define MATCH_KSLLI8 0x7c800077 -#define MASK_KSLLI8 0xff80707f -#define MATCH_KSLLIW 0x36001077 -#define MASK_KSLLIW 0xfe00707f -#define MATCH_KSLLW 0x26001077 -#define MASK_KSLLW 0xfe00707f -#define MATCH_KSLRA16 0x56000077 -#define MASK_KSLRA16 0xfe00707f -#define MATCH_KSLRA16_U 0x66000077 -#define MASK_KSLRA16_U 0xfe00707f -#define MATCH_KSLRA32 0x56002077 -#define MASK_KSLRA32 0xfe00707f -#define MATCH_KSLRA32_U 0x66002077 -#define MASK_KSLRA32_U 0xfe00707f -#define MATCH_KSLRA8 0x5e000077 -#define MASK_KSLRA8 0xfe00707f -#define MATCH_KSLRA8_U 0x6e000077 -#define MASK_KSLRA8_U 0xfe00707f -#define MATCH_KSLRAW 0x6e001077 -#define MASK_KSLRAW 0xfe00707f -#define MATCH_KSLRAW_U 0x7e001077 -#define MASK_KSLRAW_U 0xfe00707f -#define MATCH_KSTAS16 0xc4002077 -#define MASK_KSTAS16 0xfe00707f -#define MATCH_KSTAS32 0xc0002077 -#define MASK_KSTAS32 0xfe00707f -#define MATCH_KSTSA16 0xc6002077 -#define MASK_KSTSA16 0xfe00707f -#define MATCH_KSTSA32 0xc2002077 -#define MASK_KSTSA32 0xfe00707f -#define MATCH_KSUB16 0x12000077 -#define MASK_KSUB16 0xfe00707f -#define MATCH_KSUB32 0x12002077 -#define MASK_KSUB32 0xfe00707f -#define MATCH_KSUB64 0x92001077 -#define MASK_KSUB64 0xfe00707f -#define MATCH_KSUB8 0x1a000077 -#define MASK_KSUB8 0xfe00707f -#define MATCH_KSUBH 0x6001077 -#define MASK_KSUBH 0xfe00707f -#define MATCH_KSUBW 0x2001077 -#define MASK_KSUBW 0xfe00707f -#define MATCH_KWMMUL 0x62001077 -#define MASK_KWMMUL 0xfe00707f -#define MATCH_KWMMUL_U 0x72001077 -#define MASK_KWMMUL_U 0xfe00707f -#define MATCH_LB 0x3 -#define MASK_LB 0x707f -#define MATCH_LBU 0x4003 -#define MASK_LBU 0x707f -#define MATCH_LD 0x3003 -#define MASK_LD 0x707f -#define MATCH_LH 0x1003 -#define MASK_LH 0x707f -#define MATCH_LHU 0x5003 -#define MASK_LHU 0x707f -#define MATCH_LR_D 0x1000302f -#define 
MASK_LR_D 0xf9f0707f -#define MATCH_LR_W 0x1000202f -#define MASK_LR_W 0xf9f0707f -#define MATCH_LUI 0x37 -#define MASK_LUI 0x7f -#define MATCH_LW 0x2003 -#define MASK_LW 0x707f -#define MATCH_LWU 0x6003 -#define MASK_LWU 0x707f -#define MATCH_MADDR32 0xc4001077 -#define MASK_MADDR32 0xfe00707f -#define MATCH_MAX 0xa006033 -#define MASK_MAX 0xfe00707f -#define MATCH_MAXU 0xa007033 -#define MASK_MAXU 0xfe00707f -#define MATCH_MIN 0xa004033 -#define MASK_MIN 0xfe00707f -#define MATCH_MINU 0xa005033 -#define MASK_MINU 0xfe00707f -#define MATCH_MNRET 0x70200073 -#define MASK_MNRET 0xffffffff -#define MATCH_MRET 0x30200073 -#define MASK_MRET 0xffffffff -#define MATCH_MSUBR32 0xc6001077 -#define MASK_MSUBR32 0xfe00707f -#define MATCH_MUL 0x2000033 -#define MASK_MUL 0xfe00707f -#define MATCH_MULH 0x2001033 -#define MASK_MULH 0xfe00707f -#define MATCH_MULHSU 0x2002033 -#define MASK_MULHSU 0xfe00707f -#define MATCH_MULHU 0x2003033 -#define MASK_MULHU 0xfe00707f -#define MATCH_MULR64 0xf0001077 -#define MASK_MULR64 0xfe00707f -#define MATCH_MULSR64 0xe0001077 -#define MASK_MULSR64 0xfe00707f -#define MATCH_MULW 0x200003b -#define MASK_MULW 0xfe00707f -#define MATCH_OR 0x6033 -#define MASK_OR 0xfe00707f -#define MATCH_ORI 0x6013 -#define MASK_ORI 0x707f -#define MATCH_ORN 0x40006033 -#define MASK_ORN 0xfe00707f -#define MATCH_PACK 0x8004033 -#define MASK_PACK 0xfe00707f -#define MATCH_PACKH 0x8007033 -#define MASK_PACKH 0xfe00707f -#define MATCH_PACKU 0x48004033 -#define MASK_PACKU 0xfe00707f -#define MATCH_PACKUW 0x4800403b -#define MASK_PACKUW 0xfe00707f -#define MATCH_PACKW 0x800403b -#define MASK_PACKW 0xfe00707f -#define MATCH_PAUSE 0x100000f -#define MASK_PAUSE 0xffffffff -#define MATCH_PBSAD 0xfc000077 -#define MASK_PBSAD 0xfe00707f -#define MATCH_PBSADA 0xfe000077 -#define MASK_PBSADA 0xfe00707f -#define MATCH_PKBB16 0xe001077 -#define MASK_PKBB16 0xfe00707f -#define MATCH_PKBT16 0x1e001077 -#define MASK_PKBT16 0xfe00707f -#define MATCH_PKBT32 0x1e002077 -#define 
MASK_PKBT32 0xfe00707f -#define MATCH_PKTB16 0x3e001077 -#define MASK_PKTB16 0xfe00707f -#define MATCH_PKTB32 0x3e002077 -#define MASK_PKTB32 0xfe00707f -#define MATCH_PKTT16 0x2e001077 -#define MASK_PKTT16 0xfe00707f -#define MATCH_PREFETCH_I 0x6013 -#define MASK_PREFETCH_I 0x1f07fff -#define MATCH_PREFETCH_R 0x106013 -#define MASK_PREFETCH_R 0x1f07fff -#define MATCH_PREFETCH_W 0x306013 -#define MASK_PREFETCH_W 0x1f07fff -#define MATCH_RADD16 0x77 -#define MASK_RADD16 0xfe00707f -#define MATCH_RADD32 0x2077 -#define MASK_RADD32 0xfe00707f -#define MATCH_RADD64 0x80001077 -#define MASK_RADD64 0xfe00707f -#define MATCH_RADD8 0x8000077 -#define MASK_RADD8 0xfe00707f -#define MATCH_RADDW 0x20001077 -#define MASK_RADDW 0xfe00707f -#define MATCH_RCRAS16 0x4000077 -#define MASK_RCRAS16 0xfe00707f -#define MATCH_RCRAS32 0x4002077 -#define MASK_RCRAS32 0xfe00707f -#define MATCH_RCRSA16 0x6000077 -#define MASK_RCRSA16 0xfe00707f -#define MATCH_RCRSA32 0x6002077 -#define MASK_RCRSA32 0xfe00707f -#define MATCH_REM 0x2006033 -#define MASK_REM 0xfe00707f -#define MATCH_REMU 0x2007033 -#define MASK_REMU 0xfe00707f -#define MATCH_REMUW 0x200703b -#define MASK_REMUW 0xfe00707f -#define MATCH_REMW 0x200603b -#define MASK_REMW 0xfe00707f -#define MATCH_ROL 0x60001033 -#define MASK_ROL 0xfe00707f -#define MATCH_ROLW 0x6000103b -#define MASK_ROLW 0xfe00707f -#define MATCH_ROR 0x60005033 -#define MASK_ROR 0xfe00707f -#define MATCH_RORI 0x60005013 -#define MASK_RORI 0xfc00707f -#define MATCH_RORIW 0x6000501b -#define MASK_RORIW 0xfe00707f -#define MATCH_RORW 0x6000503b -#define MASK_RORW 0xfe00707f -#define MATCH_RSTAS16 0xb4002077 -#define MASK_RSTAS16 0xfe00707f -#define MATCH_RSTAS32 0xb0002077 -#define MASK_RSTAS32 0xfe00707f -#define MATCH_RSTSA16 0xb6002077 -#define MASK_RSTSA16 0xfe00707f -#define MATCH_RSTSA32 0xb2002077 -#define MASK_RSTSA32 0xfe00707f -#define MATCH_RSUB16 0x2000077 -#define MASK_RSUB16 0xfe00707f -#define MATCH_RSUB32 0x2002077 -#define MASK_RSUB32 0xfe00707f 
-#define MATCH_RSUB64 0x82001077 -#define MASK_RSUB64 0xfe00707f -#define MATCH_RSUB8 0xa000077 -#define MASK_RSUB8 0xfe00707f -#define MATCH_RSUBW 0x22001077 -#define MASK_RSUBW 0xfe00707f -#define MATCH_SB 0x23 -#define MASK_SB 0x707f -#define MATCH_SC_D 0x1800302f -#define MASK_SC_D 0xf800707f -#define MATCH_SC_W 0x1800202f -#define MASK_SC_W 0xf800707f -#define MATCH_SCLIP16 0x84000077 -#define MASK_SCLIP16 0xff00707f -#define MATCH_SCLIP32 0xe4000077 -#define MASK_SCLIP32 0xfe00707f -#define MATCH_SCLIP8 0x8c000077 -#define MASK_SCLIP8 0xff80707f -#define MATCH_SCMPLE16 0x1c000077 -#define MASK_SCMPLE16 0xfe00707f -#define MATCH_SCMPLE8 0x1e000077 -#define MASK_SCMPLE8 0xfe00707f -#define MATCH_SCMPLT16 0xc000077 -#define MASK_SCMPLT16 0xfe00707f -#define MATCH_SCMPLT8 0xe000077 -#define MASK_SCMPLT8 0xfe00707f -#define MATCH_SD 0x3023 -#define MASK_SD 0x707f -#define MATCH_SEXT_B 0x60401013 -#define MASK_SEXT_B 0xfff0707f -#define MATCH_SEXT_H 0x60501013 -#define MASK_SEXT_H 0xfff0707f -#define MATCH_SFENCE_INVAL_IR 0x18100073 -#define MASK_SFENCE_INVAL_IR 0xffffffff -#define MATCH_SFENCE_VMA 0x12000073 -#define MASK_SFENCE_VMA 0xfe007fff -#define MATCH_SFENCE_W_INVAL 0x18000073 -#define MASK_SFENCE_W_INVAL 0xffffffff -#define MATCH_SH 0x1023 -#define MASK_SH 0x707f -#define MATCH_SH1ADD 0x20002033 -#define MASK_SH1ADD 0xfe00707f -#define MATCH_SH1ADD_UW 0x2000203b -#define MASK_SH1ADD_UW 0xfe00707f -#define MATCH_SH2ADD 0x20004033 -#define MASK_SH2ADD 0xfe00707f -#define MATCH_SH2ADD_UW 0x2000403b -#define MASK_SH2ADD_UW 0xfe00707f -#define MATCH_SH3ADD 0x20006033 -#define MASK_SH3ADD 0xfe00707f -#define MATCH_SH3ADD_UW 0x2000603b -#define MASK_SH3ADD_UW 0xfe00707f -#define MATCH_SHA256SIG0 0x10201013 -#define MASK_SHA256SIG0 0xfff0707f -#define MATCH_SHA256SIG1 0x10301013 -#define MASK_SHA256SIG1 0xfff0707f -#define MATCH_SHA256SUM0 0x10001013 -#define MASK_SHA256SUM0 0xfff0707f -#define MATCH_SHA256SUM1 0x10101013 -#define MASK_SHA256SUM1 0xfff0707f 
-#define MATCH_SHA512SIG0 0x10601013 -#define MASK_SHA512SIG0 0xfff0707f -#define MATCH_SHA512SIG0H 0x5c000033 -#define MASK_SHA512SIG0H 0xfe00707f -#define MATCH_SHA512SIG0L 0x54000033 -#define MASK_SHA512SIG0L 0xfe00707f -#define MATCH_SHA512SIG1 0x10701013 -#define MASK_SHA512SIG1 0xfff0707f -#define MATCH_SHA512SIG1H 0x5e000033 -#define MASK_SHA512SIG1H 0xfe00707f -#define MATCH_SHA512SIG1L 0x56000033 -#define MASK_SHA512SIG1L 0xfe00707f -#define MATCH_SHA512SUM0 0x10401013 -#define MASK_SHA512SUM0 0xfff0707f -#define MATCH_SHA512SUM0R 0x50000033 -#define MASK_SHA512SUM0R 0xfe00707f -#define MATCH_SHA512SUM1 0x10501013 -#define MASK_SHA512SUM1 0xfff0707f -#define MATCH_SHA512SUM1R 0x52000033 -#define MASK_SHA512SUM1R 0xfe00707f -#define MATCH_SHFL 0x8001033 -#define MASK_SHFL 0xfe00707f -#define MATCH_SHFLI 0x8001013 -#define MASK_SHFLI 0xfe00707f -#define MATCH_SHFLW 0x800103b -#define MASK_SHFLW 0xfe00707f -#define MATCH_SINVAL_VMA 0x16000073 -#define MASK_SINVAL_VMA 0xfe007fff -#define MATCH_SLL 0x1033 -#define MASK_SLL 0xfe00707f -#define MATCH_SLL16 0x54000077 -#define MASK_SLL16 0xfe00707f -#define MATCH_SLL32 0x54002077 -#define MASK_SLL32 0xfe00707f -#define MATCH_SLL8 0x5c000077 -#define MASK_SLL8 0xfe00707f -#define MATCH_SLLI 0x1013 -#define MASK_SLLI 0xfc00707f -#define MATCH_SLLI16 0x74000077 -#define MASK_SLLI16 0xff00707f -#define MATCH_SLLI32 0x74002077 -#define MASK_SLLI32 0xfe00707f -#define MATCH_SLLI8 0x7c000077 -#define MASK_SLLI8 0xff80707f -#define MATCH_SLLI_RV32 0x1013 -#define MASK_SLLI_RV32 0xfe00707f -#define MATCH_SLLI_UW 0x800101b -#define MASK_SLLI_UW 0xfc00707f -#define MATCH_SLLIW 0x101b -#define MASK_SLLIW 0xfe00707f -#define MATCH_SLLW 0x103b -#define MASK_SLLW 0xfe00707f -#define MATCH_SLO 0x20001033 -#define MASK_SLO 0xfe00707f -#define MATCH_SLOI 0x20001013 -#define MASK_SLOI 0xfc00707f -#define MATCH_SLOIW 0x2000101b -#define MASK_SLOIW 0xfe00707f -#define MATCH_SLOW 0x2000103b -#define MASK_SLOW 0xfe00707f -#define 
MATCH_SLT 0x2033 -#define MASK_SLT 0xfe00707f -#define MATCH_SLTI 0x2013 -#define MASK_SLTI 0x707f -#define MATCH_SLTIU 0x3013 -#define MASK_SLTIU 0x707f -#define MATCH_SLTU 0x3033 -#define MASK_SLTU 0xfe00707f -#define MATCH_SM3P0 0x10801013 -#define MASK_SM3P0 0xfff0707f -#define MATCH_SM3P1 0x10901013 -#define MASK_SM3P1 0xfff0707f -#define MATCH_SM4ED 0x30000033 -#define MASK_SM4ED 0x3e00707f -#define MATCH_SM4KS 0x34000033 -#define MASK_SM4KS 0x3e00707f -#define MATCH_SMAL 0x5e001077 -#define MASK_SMAL 0xfe00707f -#define MATCH_SMALBB 0x88001077 -#define MASK_SMALBB 0xfe00707f -#define MATCH_SMALBT 0x98001077 -#define MASK_SMALBT 0xfe00707f -#define MATCH_SMALDA 0x8c001077 -#define MASK_SMALDA 0xfe00707f -#define MATCH_SMALDRS 0x9a001077 -#define MASK_SMALDRS 0xfe00707f -#define MATCH_SMALDS 0x8a001077 -#define MASK_SMALDS 0xfe00707f -#define MATCH_SMALTT 0xa8001077 -#define MASK_SMALTT 0xfe00707f -#define MATCH_SMALXDA 0x9c001077 -#define MASK_SMALXDA 0xfe00707f -#define MATCH_SMALXDS 0xaa001077 -#define MASK_SMALXDS 0xfe00707f -#define MATCH_SMAQA 0xc8000077 -#define MASK_SMAQA 0xfe00707f -#define MATCH_SMAQA_SU 0xca000077 -#define MASK_SMAQA_SU 0xfe00707f -#define MATCH_SMAR64 0x84001077 -#define MASK_SMAR64 0xfe00707f -#define MATCH_SMAX16 0x82000077 -#define MASK_SMAX16 0xfe00707f -#define MATCH_SMAX32 0x92002077 -#define MASK_SMAX32 0xfe00707f -#define MATCH_SMAX8 0x8a000077 -#define MASK_SMAX8 0xfe00707f -#define MATCH_SMBB16 0x8001077 -#define MASK_SMBB16 0xfe00707f -#define MATCH_SMBT16 0x18001077 -#define MASK_SMBT16 0xfe00707f -#define MATCH_SMBT32 0x18002077 -#define MASK_SMBT32 0xfe00707f -#define MATCH_SMDRS 0x68001077 -#define MASK_SMDRS 0xfe00707f -#define MATCH_SMDRS32 0x68002077 -#define MASK_SMDRS32 0xfe00707f -#define MATCH_SMDS 0x58001077 -#define MASK_SMDS 0xfe00707f -#define MATCH_SMDS32 0x58002077 -#define MASK_SMDS32 0xfe00707f -#define MATCH_SMIN16 0x80000077 -#define MASK_SMIN16 0xfe00707f -#define MATCH_SMIN32 0x90002077 -#define 
MASK_SMIN32 0xfe00707f -#define MATCH_SMIN8 0x88000077 -#define MASK_SMIN8 0xfe00707f -#define MATCH_SMMUL 0x40001077 -#define MASK_SMMUL 0xfe00707f -#define MATCH_SMMUL_U 0x50001077 -#define MASK_SMMUL_U 0xfe00707f -#define MATCH_SMMWB 0x44001077 -#define MASK_SMMWB 0xfe00707f -#define MATCH_SMMWB_U 0x54001077 -#define MASK_SMMWB_U 0xfe00707f -#define MATCH_SMMWT 0x64001077 -#define MASK_SMMWT 0xfe00707f -#define MATCH_SMMWT_U 0x74001077 -#define MASK_SMMWT_U 0xfe00707f -#define MATCH_SMSLDA 0xac001077 -#define MASK_SMSLDA 0xfe00707f -#define MATCH_SMSLXDA 0xbc001077 -#define MASK_SMSLXDA 0xfe00707f -#define MATCH_SMSR64 0x86001077 -#define MASK_SMSR64 0xfe00707f -#define MATCH_SMTT16 0x28001077 -#define MASK_SMTT16 0xfe00707f -#define MATCH_SMTT32 0x28002077 -#define MASK_SMTT32 0xfe00707f -#define MATCH_SMUL16 0xa0000077 -#define MASK_SMUL16 0xfe00707f -#define MATCH_SMUL8 0xa8000077 -#define MASK_SMUL8 0xfe00707f -#define MATCH_SMULX16 0xa2000077 -#define MASK_SMULX16 0xfe00707f -#define MATCH_SMULX8 0xaa000077 -#define MASK_SMULX8 0xfe00707f -#define MATCH_SMXDS 0x78001077 -#define MASK_SMXDS 0xfe00707f -#define MATCH_SMXDS32 0x78002077 -#define MASK_SMXDS32 0xfe00707f -#define MATCH_SRA 0x40005033 -#define MASK_SRA 0xfe00707f -#define MATCH_SRA16 0x50000077 -#define MASK_SRA16 0xfe00707f -#define MATCH_SRA16_U 0x60000077 -#define MASK_SRA16_U 0xfe00707f -#define MATCH_SRA32 0x50002077 -#define MASK_SRA32 0xfe00707f -#define MATCH_SRA32_U 0x60002077 -#define MASK_SRA32_U 0xfe00707f -#define MATCH_SRA8 0x58000077 -#define MASK_SRA8 0xfe00707f -#define MATCH_SRA8_U 0x68000077 -#define MASK_SRA8_U 0xfe00707f -#define MATCH_SRA_U 0x24001077 -#define MASK_SRA_U 0xfe00707f -#define MATCH_SRAI 0x40005013 -#define MASK_SRAI 0xfc00707f -#define MATCH_SRAI16 0x70000077 -#define MASK_SRAI16 0xff00707f -#define MATCH_SRAI16_U 0x71000077 -#define MASK_SRAI16_U 0xff00707f -#define MATCH_SRAI32 0x70002077 -#define MASK_SRAI32 0xfe00707f -#define MATCH_SRAI32_U 0x80002077 
-#define MASK_SRAI32_U 0xfe00707f -#define MATCH_SRAI8 0x78000077 -#define MASK_SRAI8 0xff80707f -#define MATCH_SRAI8_U 0x78800077 -#define MASK_SRAI8_U 0xff80707f -#define MATCH_SRAI_RV32 0x40005013 -#define MASK_SRAI_RV32 0xfe00707f -#define MATCH_SRAI_U 0xd4001077 -#define MASK_SRAI_U 0xfc00707f -#define MATCH_SRAIW 0x4000501b -#define MASK_SRAIW 0xfe00707f -#define MATCH_SRAIW_U 0x34001077 -#define MASK_SRAIW_U 0xfe00707f -#define MATCH_SRAW 0x4000503b -#define MASK_SRAW 0xfe00707f -#define MATCH_SRET 0x10200073 -#define MASK_SRET 0xffffffff -#define MATCH_SRL 0x5033 -#define MASK_SRL 0xfe00707f -#define MATCH_SRL16 0x52000077 -#define MASK_SRL16 0xfe00707f -#define MATCH_SRL16_U 0x62000077 -#define MASK_SRL16_U 0xfe00707f -#define MATCH_SRL32 0x52002077 -#define MASK_SRL32 0xfe00707f -#define MATCH_SRL32_U 0x62002077 -#define MASK_SRL32_U 0xfe00707f -#define MATCH_SRL8 0x5a000077 -#define MASK_SRL8 0xfe00707f -#define MATCH_SRL8_U 0x6a000077 -#define MASK_SRL8_U 0xfe00707f -#define MATCH_SRLI 0x5013 -#define MASK_SRLI 0xfc00707f -#define MATCH_SRLI16 0x72000077 -#define MASK_SRLI16 0xff00707f -#define MATCH_SRLI16_U 0x73000077 -#define MASK_SRLI16_U 0xff00707f -#define MATCH_SRLI32 0x72002077 -#define MASK_SRLI32 0xfe00707f -#define MATCH_SRLI32_U 0x82002077 -#define MASK_SRLI32_U 0xfe00707f -#define MATCH_SRLI8 0x7a000077 -#define MASK_SRLI8 0xff80707f -#define MATCH_SRLI8_U 0x7a800077 -#define MASK_SRLI8_U 0xff80707f -#define MATCH_SRLI_RV32 0x5013 -#define MASK_SRLI_RV32 0xfe00707f -#define MATCH_SRLIW 0x501b -#define MASK_SRLIW 0xfe00707f -#define MATCH_SRLW 0x503b -#define MASK_SRLW 0xfe00707f -#define MATCH_SRO 0x20005033 -#define MASK_SRO 0xfe00707f -#define MATCH_SROI 0x20005013 -#define MASK_SROI 0xfc00707f -#define MATCH_SROIW 0x2000501b -#define MASK_SROIW 0xfe00707f -#define MATCH_SROW 0x2000503b -#define MASK_SROW 0xfe00707f -#define MATCH_STAS16 0xf4002077 -#define MASK_STAS16 0xfe00707f -#define MATCH_STAS32 0xf0002077 -#define MASK_STAS32 
0xfe00707f -#define MATCH_STSA16 0xf6002077 -#define MASK_STSA16 0xfe00707f -#define MATCH_STSA32 0xf2002077 -#define MASK_STSA32 0xfe00707f -#define MATCH_SUB 0x40000033 -#define MASK_SUB 0xfe00707f -#define MATCH_SUB16 0x42000077 -#define MASK_SUB16 0xfe00707f -#define MATCH_SUB32 0x42002077 -#define MASK_SUB32 0xfe00707f -#define MATCH_SUB64 0xc2001077 -#define MASK_SUB64 0xfe00707f -#define MATCH_SUB8 0x4a000077 -#define MASK_SUB8 0xfe00707f -#define MATCH_SUBW 0x4000003b -#define MASK_SUBW 0xfe00707f -#define MATCH_SUNPKD810 0xac800077 -#define MASK_SUNPKD810 0xfff0707f -#define MATCH_SUNPKD820 0xac900077 -#define MASK_SUNPKD820 0xfff0707f -#define MATCH_SUNPKD830 0xaca00077 -#define MASK_SUNPKD830 0xfff0707f -#define MATCH_SUNPKD831 0xacb00077 -#define MASK_SUNPKD831 0xfff0707f -#define MATCH_SUNPKD832 0xad300077 -#define MASK_SUNPKD832 0xfff0707f -#define MATCH_SW 0x2023 -#define MASK_SW 0x707f -#define MATCH_UCLIP16 0x85000077 -#define MASK_UCLIP16 0xff00707f -#define MATCH_UCLIP32 0xf4000077 -#define MASK_UCLIP32 0xfe00707f -#define MATCH_UCLIP8 0x8d000077 -#define MASK_UCLIP8 0xff80707f -#define MATCH_UCMPLE16 0x3c000077 -#define MASK_UCMPLE16 0xfe00707f -#define MATCH_UCMPLE8 0x3e000077 -#define MASK_UCMPLE8 0xfe00707f -#define MATCH_UCMPLT16 0x2c000077 -#define MASK_UCMPLT16 0xfe00707f -#define MATCH_UCMPLT8 0x2e000077 -#define MASK_UCMPLT8 0xfe00707f -#define MATCH_UKADD16 0x30000077 -#define MASK_UKADD16 0xfe00707f -#define MATCH_UKADD32 0x30002077 -#define MASK_UKADD32 0xfe00707f -#define MATCH_UKADD64 0xb0001077 -#define MASK_UKADD64 0xfe00707f -#define MATCH_UKADD8 0x38000077 -#define MASK_UKADD8 0xfe00707f -#define MATCH_UKADDH 0x14001077 -#define MASK_UKADDH 0xfe00707f -#define MATCH_UKADDW 0x10001077 -#define MASK_UKADDW 0xfe00707f -#define MATCH_UKCRAS16 0x34000077 -#define MASK_UKCRAS16 0xfe00707f -#define MATCH_UKCRAS32 0x34002077 -#define MASK_UKCRAS32 0xfe00707f -#define MATCH_UKCRSA16 0x36000077 -#define MASK_UKCRSA16 0xfe00707f -#define 
MATCH_UKCRSA32 0x36002077 -#define MASK_UKCRSA32 0xfe00707f -#define MATCH_UKMAR64 0xb4001077 -#define MASK_UKMAR64 0xfe00707f -#define MATCH_UKMSR64 0xb6001077 -#define MASK_UKMSR64 0xfe00707f -#define MATCH_UKSTAS16 0xe4002077 -#define MASK_UKSTAS16 0xfe00707f -#define MATCH_UKSTAS32 0xe0002077 -#define MASK_UKSTAS32 0xfe00707f -#define MATCH_UKSTSA16 0xe6002077 -#define MASK_UKSTSA16 0xfe00707f -#define MATCH_UKSTSA32 0xe2002077 -#define MASK_UKSTSA32 0xfe00707f -#define MATCH_UKSUB16 0x32000077 -#define MASK_UKSUB16 0xfe00707f -#define MATCH_UKSUB32 0x32002077 -#define MASK_UKSUB32 0xfe00707f -#define MATCH_UKSUB64 0xb2001077 -#define MASK_UKSUB64 0xfe00707f -#define MATCH_UKSUB8 0x3a000077 -#define MASK_UKSUB8 0xfe00707f -#define MATCH_UKSUBH 0x16001077 -#define MASK_UKSUBH 0xfe00707f -#define MATCH_UKSUBW 0x12001077 -#define MASK_UKSUBW 0xfe00707f -#define MATCH_UMAQA 0xcc000077 -#define MASK_UMAQA 0xfe00707f -#define MATCH_UMAR64 0xa4001077 -#define MASK_UMAR64 0xfe00707f -#define MATCH_UMAX16 0x92000077 -#define MASK_UMAX16 0xfe00707f -#define MATCH_UMAX32 0xa2002077 -#define MASK_UMAX32 0xfe00707f -#define MATCH_UMAX8 0x9a000077 -#define MASK_UMAX8 0xfe00707f -#define MATCH_UMIN16 0x90000077 -#define MASK_UMIN16 0xfe00707f -#define MATCH_UMIN32 0xa0002077 -#define MASK_UMIN32 0xfe00707f -#define MATCH_UMIN8 0x98000077 -#define MASK_UMIN8 0xfe00707f -#define MATCH_UMSR64 0xa6001077 -#define MASK_UMSR64 0xfe00707f -#define MATCH_UMUL16 0xb0000077 -#define MASK_UMUL16 0xfe00707f -#define MATCH_UMUL8 0xb8000077 -#define MASK_UMUL8 0xfe00707f -#define MATCH_UMULX16 0xb2000077 -#define MASK_UMULX16 0xfe00707f -#define MATCH_UMULX8 0xba000077 -#define MASK_UMULX8 0xfe00707f -#define MATCH_UNSHFL 0x8005033 -#define MASK_UNSHFL 0xfe00707f -#define MATCH_UNSHFLI 0x8005013 -#define MASK_UNSHFLI 0xfe00707f -#define MATCH_UNSHFLW 0x800503b -#define MASK_UNSHFLW 0xfe00707f -#define MATCH_URADD16 0x20000077 -#define MASK_URADD16 0xfe00707f -#define MATCH_URADD32 
0x20002077 -#define MASK_URADD32 0xfe00707f -#define MATCH_URADD64 0xa0001077 -#define MASK_URADD64 0xfe00707f -#define MATCH_URADD8 0x28000077 -#define MASK_URADD8 0xfe00707f -#define MATCH_URADDW 0x30001077 -#define MASK_URADDW 0xfe00707f -#define MATCH_URCRAS16 0x24000077 -#define MASK_URCRAS16 0xfe00707f -#define MATCH_URCRAS32 0x24002077 -#define MASK_URCRAS32 0xfe00707f -#define MATCH_URCRSA16 0x26000077 -#define MASK_URCRSA16 0xfe00707f -#define MATCH_URCRSA32 0x26002077 -#define MASK_URCRSA32 0xfe00707f -#define MATCH_URSTAS16 0xd4002077 -#define MASK_URSTAS16 0xfe00707f -#define MATCH_URSTAS32 0xd0002077 -#define MASK_URSTAS32 0xfe00707f -#define MATCH_URSTSA16 0xd6002077 -#define MASK_URSTSA16 0xfe00707f -#define MATCH_URSTSA32 0xd2002077 -#define MASK_URSTSA32 0xfe00707f -#define MATCH_URSUB16 0x22000077 -#define MASK_URSUB16 0xfe00707f -#define MATCH_URSUB32 0x22002077 -#define MASK_URSUB32 0xfe00707f -#define MATCH_URSUB64 0xa2001077 -#define MASK_URSUB64 0xfe00707f -#define MATCH_URSUB8 0x2a000077 -#define MASK_URSUB8 0xfe00707f -#define MATCH_URSUBW 0x32001077 -#define MASK_URSUBW 0xfe00707f -#define MATCH_VAADD_VV 0x24002057 -#define MASK_VAADD_VV 0xfc00707f -#define MATCH_VAADD_VX 0x24006057 -#define MASK_VAADD_VX 0xfc00707f -#define MATCH_VAADDU_VV 0x20002057 -#define MASK_VAADDU_VV 0xfc00707f -#define MATCH_VAADDU_VX 0x20006057 -#define MASK_VAADDU_VX 0xfc00707f -#define MATCH_VADC_VIM 0x40003057 -#define MASK_VADC_VIM 0xfe00707f -#define MATCH_VADC_VVM 0x40000057 -#define MASK_VADC_VVM 0xfe00707f -#define MATCH_VADC_VXM 0x40004057 -#define MASK_VADC_VXM 0xfe00707f -#define MATCH_VADD_VI 0x3057 -#define MASK_VADD_VI 0xfc00707f -#define MATCH_VADD_VV 0x57 -#define MASK_VADD_VV 0xfc00707f -#define MATCH_VADD_VX 0x4057 -#define MASK_VADD_VX 0xfc00707f -#define MATCH_VAMOADDEI16_V 0x502f -#define MASK_VAMOADDEI16_V 0xf800707f -#define MATCH_VAMOADDEI32_V 0x602f -#define MASK_VAMOADDEI32_V 0xf800707f -#define MATCH_VAMOADDEI64_V 0x702f -#define 
MASK_VAMOADDEI64_V 0xf800707f -#define MATCH_VAMOADDEI8_V 0x2f -#define MASK_VAMOADDEI8_V 0xf800707f -#define MATCH_VAMOANDEI16_V 0x6000502f -#define MASK_VAMOANDEI16_V 0xf800707f -#define MATCH_VAMOANDEI32_V 0x6000602f -#define MASK_VAMOANDEI32_V 0xf800707f -#define MATCH_VAMOANDEI64_V 0x6000702f -#define MASK_VAMOANDEI64_V 0xf800707f -#define MATCH_VAMOANDEI8_V 0x6000002f -#define MASK_VAMOANDEI8_V 0xf800707f -#define MATCH_VAMOMAXEI16_V 0xa000502f -#define MASK_VAMOMAXEI16_V 0xf800707f -#define MATCH_VAMOMAXEI32_V 0xa000602f -#define MASK_VAMOMAXEI32_V 0xf800707f -#define MATCH_VAMOMAXEI64_V 0xa000702f -#define MASK_VAMOMAXEI64_V 0xf800707f -#define MATCH_VAMOMAXEI8_V 0xa000002f -#define MASK_VAMOMAXEI8_V 0xf800707f -#define MATCH_VAMOMAXUEI16_V 0xe000502f -#define MASK_VAMOMAXUEI16_V 0xf800707f -#define MATCH_VAMOMAXUEI32_V 0xe000602f -#define MASK_VAMOMAXUEI32_V 0xf800707f -#define MATCH_VAMOMAXUEI64_V 0xe000702f -#define MASK_VAMOMAXUEI64_V 0xf800707f -#define MATCH_VAMOMAXUEI8_V 0xe000002f -#define MASK_VAMOMAXUEI8_V 0xf800707f -#define MATCH_VAMOMINEI16_V 0x8000502f -#define MASK_VAMOMINEI16_V 0xf800707f -#define MATCH_VAMOMINEI32_V 0x8000602f -#define MASK_VAMOMINEI32_V 0xf800707f -#define MATCH_VAMOMINEI64_V 0x8000702f -#define MASK_VAMOMINEI64_V 0xf800707f -#define MATCH_VAMOMINEI8_V 0x8000002f -#define MASK_VAMOMINEI8_V 0xf800707f -#define MATCH_VAMOMINUEI16_V 0xc000502f -#define MASK_VAMOMINUEI16_V 0xf800707f -#define MATCH_VAMOMINUEI32_V 0xc000602f -#define MASK_VAMOMINUEI32_V 0xf800707f -#define MATCH_VAMOMINUEI64_V 0xc000702f -#define MASK_VAMOMINUEI64_V 0xf800707f -#define MATCH_VAMOMINUEI8_V 0xc000002f -#define MASK_VAMOMINUEI8_V 0xf800707f -#define MATCH_VAMOOREI16_V 0x4000502f -#define MASK_VAMOOREI16_V 0xf800707f -#define MATCH_VAMOOREI32_V 0x4000602f -#define MASK_VAMOOREI32_V 0xf800707f -#define MATCH_VAMOOREI64_V 0x4000702f -#define MASK_VAMOOREI64_V 0xf800707f -#define MATCH_VAMOOREI8_V 0x4000002f -#define MASK_VAMOOREI8_V 0xf800707f 
-#define MATCH_VAMOSWAPEI16_V 0x800502f -#define MASK_VAMOSWAPEI16_V 0xf800707f -#define MATCH_VAMOSWAPEI32_V 0x800602f -#define MASK_VAMOSWAPEI32_V 0xf800707f -#define MATCH_VAMOSWAPEI64_V 0x800702f -#define MASK_VAMOSWAPEI64_V 0xf800707f -#define MATCH_VAMOSWAPEI8_V 0x800002f -#define MASK_VAMOSWAPEI8_V 0xf800707f -#define MATCH_VAMOXOREI16_V 0x2000502f -#define MASK_VAMOXOREI16_V 0xf800707f -#define MATCH_VAMOXOREI32_V 0x2000602f -#define MASK_VAMOXOREI32_V 0xf800707f -#define MATCH_VAMOXOREI64_V 0x2000702f -#define MASK_VAMOXOREI64_V 0xf800707f -#define MATCH_VAMOXOREI8_V 0x2000002f -#define MASK_VAMOXOREI8_V 0xf800707f -#define MATCH_VAND_VI 0x24003057 -#define MASK_VAND_VI 0xfc00707f -#define MATCH_VAND_VV 0x24000057 -#define MASK_VAND_VV 0xfc00707f -#define MATCH_VAND_VX 0x24004057 -#define MASK_VAND_VX 0xfc00707f -#define MATCH_VASUB_VV 0x2c002057 -#define MASK_VASUB_VV 0xfc00707f -#define MATCH_VASUB_VX 0x2c006057 -#define MASK_VASUB_VX 0xfc00707f -#define MATCH_VASUBU_VV 0x28002057 -#define MASK_VASUBU_VV 0xfc00707f -#define MATCH_VASUBU_VX 0x28006057 -#define MASK_VASUBU_VX 0xfc00707f -#define MATCH_VCOMPRESS_VM 0x5e002057 -#define MASK_VCOMPRESS_VM 0xfe00707f -#define MATCH_VCPOP_M 0x40082057 -#define MASK_VCPOP_M 0xfc0ff07f -#define MATCH_VDIV_VV 0x84002057 -#define MASK_VDIV_VV 0xfc00707f -#define MATCH_VDIV_VX 0x84006057 -#define MASK_VDIV_VX 0xfc00707f -#define MATCH_VDIVU_VV 0x80002057 -#define MASK_VDIVU_VV 0xfc00707f -#define MATCH_VDIVU_VX 0x80006057 -#define MASK_VDIVU_VX 0xfc00707f -#define MATCH_VFADD_VF 0x5057 -#define MASK_VFADD_VF 0xfc00707f -#define MATCH_VFADD_VV 0x1057 -#define MASK_VFADD_VV 0xfc00707f -#define MATCH_VFCLASS_V 0x4c081057 -#define MASK_VFCLASS_V 0xfc0ff07f -#define MATCH_VFCVT_F_X_V 0x48019057 -#define MASK_VFCVT_F_X_V 0xfc0ff07f -#define MATCH_VFCVT_F_XU_V 0x48011057 -#define MASK_VFCVT_F_XU_V 0xfc0ff07f -#define MATCH_VFCVT_RTZ_X_F_V 0x48039057 -#define MASK_VFCVT_RTZ_X_F_V 0xfc0ff07f -#define MATCH_VFCVT_RTZ_XU_F_V 
0x48031057 -#define MASK_VFCVT_RTZ_XU_F_V 0xfc0ff07f -#define MATCH_VFCVT_X_F_V 0x48009057 -#define MASK_VFCVT_X_F_V 0xfc0ff07f -#define MATCH_VFCVT_XU_F_V 0x48001057 -#define MASK_VFCVT_XU_F_V 0xfc0ff07f -#define MATCH_VFDIV_VF 0x80005057 -#define MASK_VFDIV_VF 0xfc00707f -#define MATCH_VFDIV_VV 0x80001057 -#define MASK_VFDIV_VV 0xfc00707f -#define MATCH_VFIRST_M 0x4008a057 -#define MASK_VFIRST_M 0xfc0ff07f -#define MATCH_VFMACC_VF 0xb0005057 -#define MASK_VFMACC_VF 0xfc00707f -#define MATCH_VFMACC_VV 0xb0001057 -#define MASK_VFMACC_VV 0xfc00707f -#define MATCH_VFMADD_VF 0xa0005057 -#define MASK_VFMADD_VF 0xfc00707f -#define MATCH_VFMADD_VV 0xa0001057 -#define MASK_VFMADD_VV 0xfc00707f -#define MATCH_VFMAX_VF 0x18005057 -#define MASK_VFMAX_VF 0xfc00707f -#define MATCH_VFMAX_VV 0x18001057 -#define MASK_VFMAX_VV 0xfc00707f -#define MATCH_VFMERGE_VFM 0x5c005057 -#define MASK_VFMERGE_VFM 0xfe00707f -#define MATCH_VFMIN_VF 0x10005057 -#define MASK_VFMIN_VF 0xfc00707f -#define MATCH_VFMIN_VV 0x10001057 -#define MASK_VFMIN_VV 0xfc00707f -#define MATCH_VFMSAC_VF 0xb8005057 -#define MASK_VFMSAC_VF 0xfc00707f -#define MATCH_VFMSAC_VV 0xb8001057 -#define MASK_VFMSAC_VV 0xfc00707f -#define MATCH_VFMSUB_VF 0xa8005057 -#define MASK_VFMSUB_VF 0xfc00707f -#define MATCH_VFMSUB_VV 0xa8001057 -#define MASK_VFMSUB_VV 0xfc00707f -#define MATCH_VFMUL_VF 0x90005057 -#define MASK_VFMUL_VF 0xfc00707f -#define MATCH_VFMUL_VV 0x90001057 -#define MASK_VFMUL_VV 0xfc00707f -#define MATCH_VFMV_F_S 0x42001057 -#define MASK_VFMV_F_S 0xfe0ff07f -#define MATCH_VFMV_S_F 0x42005057 -#define MASK_VFMV_S_F 0xfff0707f -#define MATCH_VFMV_V_F 0x5e005057 -#define MASK_VFMV_V_F 0xfff0707f -#define MATCH_VFNCVT_F_F_W 0x480a1057 -#define MASK_VFNCVT_F_F_W 0xfc0ff07f -#define MATCH_VFNCVT_F_X_W 0x48099057 -#define MASK_VFNCVT_F_X_W 0xfc0ff07f -#define MATCH_VFNCVT_F_XU_W 0x48091057 -#define MASK_VFNCVT_F_XU_W 0xfc0ff07f -#define MATCH_VFNCVT_ROD_F_F_W 0x480a9057 -#define MASK_VFNCVT_ROD_F_F_W 0xfc0ff07f 
-#define MATCH_VFNCVT_RTZ_X_F_W 0x480b9057 -#define MASK_VFNCVT_RTZ_X_F_W 0xfc0ff07f -#define MATCH_VFNCVT_RTZ_XU_F_W 0x480b1057 -#define MASK_VFNCVT_RTZ_XU_F_W 0xfc0ff07f -#define MATCH_VFNCVT_X_F_W 0x48089057 -#define MASK_VFNCVT_X_F_W 0xfc0ff07f -#define MATCH_VFNCVT_XU_F_W 0x48081057 -#define MASK_VFNCVT_XU_F_W 0xfc0ff07f -#define MATCH_VFNMACC_VF 0xb4005057 -#define MASK_VFNMACC_VF 0xfc00707f -#define MATCH_VFNMACC_VV 0xb4001057 -#define MASK_VFNMACC_VV 0xfc00707f -#define MATCH_VFNMADD_VF 0xa4005057 -#define MASK_VFNMADD_VF 0xfc00707f -#define MATCH_VFNMADD_VV 0xa4001057 -#define MASK_VFNMADD_VV 0xfc00707f -#define MATCH_VFNMSAC_VF 0xbc005057 -#define MASK_VFNMSAC_VF 0xfc00707f -#define MATCH_VFNMSAC_VV 0xbc001057 -#define MASK_VFNMSAC_VV 0xfc00707f -#define MATCH_VFNMSUB_VF 0xac005057 -#define MASK_VFNMSUB_VF 0xfc00707f -#define MATCH_VFNMSUB_VV 0xac001057 -#define MASK_VFNMSUB_VV 0xfc00707f -#define MATCH_VFRDIV_VF 0x84005057 -#define MASK_VFRDIV_VF 0xfc00707f -#define MATCH_VFREC7_V 0x4c029057 -#define MASK_VFREC7_V 0xfc0ff07f -#define MATCH_VFREDMAX_VS 0x1c001057 -#define MASK_VFREDMAX_VS 0xfc00707f -#define MATCH_VFREDMIN_VS 0x14001057 -#define MASK_VFREDMIN_VS 0xfc00707f -#define MATCH_VFREDOSUM_VS 0xc001057 -#define MASK_VFREDOSUM_VS 0xfc00707f -#define MATCH_VFREDUSUM_VS 0x4001057 -#define MASK_VFREDUSUM_VS 0xfc00707f -#define MATCH_VFRSQRT7_V 0x4c021057 -#define MASK_VFRSQRT7_V 0xfc0ff07f -#define MATCH_VFRSUB_VF 0x9c005057 -#define MASK_VFRSUB_VF 0xfc00707f -#define MATCH_VFSGNJ_VF 0x20005057 -#define MASK_VFSGNJ_VF 0xfc00707f -#define MATCH_VFSGNJ_VV 0x20001057 -#define MASK_VFSGNJ_VV 0xfc00707f -#define MATCH_VFSGNJN_VF 0x24005057 -#define MASK_VFSGNJN_VF 0xfc00707f -#define MATCH_VFSGNJN_VV 0x24001057 -#define MASK_VFSGNJN_VV 0xfc00707f -#define MATCH_VFSGNJX_VF 0x28005057 -#define MASK_VFSGNJX_VF 0xfc00707f -#define MATCH_VFSGNJX_VV 0x28001057 -#define MASK_VFSGNJX_VV 0xfc00707f -#define MATCH_VFSLIDE1DOWN_VF 0x3c005057 -#define 
MASK_VFSLIDE1DOWN_VF 0xfc00707f -#define MATCH_VFSLIDE1UP_VF 0x38005057 -#define MASK_VFSLIDE1UP_VF 0xfc00707f -#define MATCH_VFSQRT_V 0x4c001057 -#define MASK_VFSQRT_V 0xfc0ff07f -#define MATCH_VFSUB_VF 0x8005057 -#define MASK_VFSUB_VF 0xfc00707f -#define MATCH_VFSUB_VV 0x8001057 -#define MASK_VFSUB_VV 0xfc00707f -#define MATCH_VFWADD_VF 0xc0005057 -#define MASK_VFWADD_VF 0xfc00707f -#define MATCH_VFWADD_VV 0xc0001057 -#define MASK_VFWADD_VV 0xfc00707f -#define MATCH_VFWADD_WF 0xd0005057 -#define MASK_VFWADD_WF 0xfc00707f -#define MATCH_VFWADD_WV 0xd0001057 -#define MASK_VFWADD_WV 0xfc00707f -#define MATCH_VFWCVT_F_F_V 0x48061057 -#define MASK_VFWCVT_F_F_V 0xfc0ff07f -#define MATCH_VFWCVT_F_X_V 0x48059057 -#define MASK_VFWCVT_F_X_V 0xfc0ff07f -#define MATCH_VFWCVT_F_XU_V 0x48051057 -#define MASK_VFWCVT_F_XU_V 0xfc0ff07f -#define MATCH_VFWCVT_RTZ_X_F_V 0x48079057 -#define MASK_VFWCVT_RTZ_X_F_V 0xfc0ff07f -#define MATCH_VFWCVT_RTZ_XU_F_V 0x48071057 -#define MASK_VFWCVT_RTZ_XU_F_V 0xfc0ff07f -#define MATCH_VFWCVT_X_F_V 0x48049057 -#define MASK_VFWCVT_X_F_V 0xfc0ff07f -#define MATCH_VFWCVT_XU_F_V 0x48041057 -#define MASK_VFWCVT_XU_F_V 0xfc0ff07f -#define MATCH_VFWMACC_VF 0xf0005057 -#define MASK_VFWMACC_VF 0xfc00707f -#define MATCH_VFWMACC_VV 0xf0001057 -#define MASK_VFWMACC_VV 0xfc00707f -#define MATCH_VFWMSAC_VF 0xf8005057 -#define MASK_VFWMSAC_VF 0xfc00707f -#define MATCH_VFWMSAC_VV 0xf8001057 -#define MASK_VFWMSAC_VV 0xfc00707f -#define MATCH_VFWMUL_VF 0xe0005057 -#define MASK_VFWMUL_VF 0xfc00707f -#define MATCH_VFWMUL_VV 0xe0001057 -#define MASK_VFWMUL_VV 0xfc00707f -#define MATCH_VFWNMACC_VF 0xf4005057 -#define MASK_VFWNMACC_VF 0xfc00707f -#define MATCH_VFWNMACC_VV 0xf4001057 -#define MASK_VFWNMACC_VV 0xfc00707f -#define MATCH_VFWNMSAC_VF 0xfc005057 -#define MASK_VFWNMSAC_VF 0xfc00707f -#define MATCH_VFWNMSAC_VV 0xfc001057 -#define MASK_VFWNMSAC_VV 0xfc00707f -#define MATCH_VFWREDOSUM_VS 0xcc001057 -#define MASK_VFWREDOSUM_VS 0xfc00707f -#define 
MATCH_VFWREDUSUM_VS 0xc4001057 -#define MASK_VFWREDUSUM_VS 0xfc00707f -#define MATCH_VFWSUB_VF 0xc8005057 -#define MASK_VFWSUB_VF 0xfc00707f -#define MATCH_VFWSUB_VV 0xc8001057 -#define MASK_VFWSUB_VV 0xfc00707f -#define MATCH_VFWSUB_WF 0xd8005057 -#define MASK_VFWSUB_WF 0xfc00707f -#define MATCH_VFWSUB_WV 0xd8001057 -#define MASK_VFWSUB_WV 0xfc00707f -#define MATCH_VID_V 0x5008a057 -#define MASK_VID_V 0xfdfff07f -#define MATCH_VIOTA_M 0x50082057 -#define MASK_VIOTA_M 0xfc0ff07f -#define MATCH_VL1RE16_V 0x2805007 -#define MASK_VL1RE16_V 0xfff0707f -#define MATCH_VL1RE32_V 0x2806007 -#define MASK_VL1RE32_V 0xfff0707f -#define MATCH_VL1RE64_V 0x2807007 -#define MASK_VL1RE64_V 0xfff0707f -#define MATCH_VL1RE8_V 0x2800007 -#define MASK_VL1RE8_V 0xfff0707f -#define MATCH_VL2RE16_V 0x22805007 -#define MASK_VL2RE16_V 0xfff0707f -#define MATCH_VL2RE32_V 0x22806007 -#define MASK_VL2RE32_V 0xfff0707f -#define MATCH_VL2RE64_V 0x22807007 -#define MASK_VL2RE64_V 0xfff0707f -#define MATCH_VL2RE8_V 0x22800007 -#define MASK_VL2RE8_V 0xfff0707f -#define MATCH_VL4RE16_V 0x62805007 -#define MASK_VL4RE16_V 0xfff0707f -#define MATCH_VL4RE32_V 0x62806007 -#define MASK_VL4RE32_V 0xfff0707f -#define MATCH_VL4RE64_V 0x62807007 -#define MASK_VL4RE64_V 0xfff0707f -#define MATCH_VL4RE8_V 0x62800007 -#define MASK_VL4RE8_V 0xfff0707f -#define MATCH_VL8RE16_V 0xe2805007 -#define MASK_VL8RE16_V 0xfff0707f -#define MATCH_VL8RE32_V 0xe2806007 -#define MASK_VL8RE32_V 0xfff0707f -#define MATCH_VL8RE64_V 0xe2807007 -#define MASK_VL8RE64_V 0xfff0707f -#define MATCH_VL8RE8_V 0xe2800007 -#define MASK_VL8RE8_V 0xfff0707f -#define MATCH_VLE1024_V 0x10007007 -#define MASK_VLE1024_V 0x1df0707f -#define MATCH_VLE1024FF_V 0x11007007 -#define MASK_VLE1024FF_V 0x1df0707f -#define MATCH_VLE128_V 0x10000007 -#define MASK_VLE128_V 0x1df0707f -#define MATCH_VLE128FF_V 0x11000007 -#define MASK_VLE128FF_V 0x1df0707f -#define MATCH_VLE16_V 0x5007 -#define MASK_VLE16_V 0x1df0707f -#define MATCH_VLE16FF_V 0x1005007 
-#define MASK_VLE16FF_V 0x1df0707f -#define MATCH_VLE256_V 0x10005007 -#define MASK_VLE256_V 0x1df0707f -#define MATCH_VLE256FF_V 0x11005007 -#define MASK_VLE256FF_V 0x1df0707f -#define MATCH_VLE32_V 0x6007 -#define MASK_VLE32_V 0x1df0707f -#define MATCH_VLE32FF_V 0x1006007 -#define MASK_VLE32FF_V 0x1df0707f -#define MATCH_VLE512_V 0x10006007 -#define MASK_VLE512_V 0x1df0707f -#define MATCH_VLE512FF_V 0x11006007 -#define MASK_VLE512FF_V 0x1df0707f -#define MATCH_VLE64_V 0x7007 -#define MASK_VLE64_V 0x1df0707f -#define MATCH_VLE64FF_V 0x1007007 -#define MASK_VLE64FF_V 0x1df0707f -#define MATCH_VLE8_V 0x7 -#define MASK_VLE8_V 0x1df0707f -#define MATCH_VLE8FF_V 0x1000007 -#define MASK_VLE8FF_V 0x1df0707f -#define MATCH_VLM_V 0x2b00007 -#define MASK_VLM_V 0xfff0707f -#define MATCH_VLOXEI1024_V 0x1c007007 -#define MASK_VLOXEI1024_V 0x1c00707f -#define MATCH_VLOXEI128_V 0x1c000007 -#define MASK_VLOXEI128_V 0x1c00707f -#define MATCH_VLOXEI16_V 0xc005007 -#define MASK_VLOXEI16_V 0x1c00707f -#define MATCH_VLOXEI256_V 0x1c005007 -#define MASK_VLOXEI256_V 0x1c00707f -#define MATCH_VLOXEI32_V 0xc006007 -#define MASK_VLOXEI32_V 0x1c00707f -#define MATCH_VLOXEI512_V 0x1c006007 -#define MASK_VLOXEI512_V 0x1c00707f -#define MATCH_VLOXEI64_V 0xc007007 -#define MASK_VLOXEI64_V 0x1c00707f -#define MATCH_VLOXEI8_V 0xc000007 -#define MASK_VLOXEI8_V 0x1c00707f -#define MATCH_VLSE1024_V 0x18007007 -#define MASK_VLSE1024_V 0x1c00707f -#define MATCH_VLSE128_V 0x18000007 -#define MASK_VLSE128_V 0x1c00707f -#define MATCH_VLSE16_V 0x8005007 -#define MASK_VLSE16_V 0x1c00707f -#define MATCH_VLSE256_V 0x18005007 -#define MASK_VLSE256_V 0x1c00707f -#define MATCH_VLSE32_V 0x8006007 -#define MASK_VLSE32_V 0x1c00707f -#define MATCH_VLSE512_V 0x18006007 -#define MASK_VLSE512_V 0x1c00707f -#define MATCH_VLSE64_V 0x8007007 -#define MASK_VLSE64_V 0x1c00707f -#define MATCH_VLSE8_V 0x8000007 -#define MASK_VLSE8_V 0x1c00707f -#define MATCH_VLUXEI1024_V 0x14007007 -#define MASK_VLUXEI1024_V 0x1c00707f 
-#define MATCH_VLUXEI128_V 0x14000007 -#define MASK_VLUXEI128_V 0x1c00707f -#define MATCH_VLUXEI16_V 0x4005007 -#define MASK_VLUXEI16_V 0x1c00707f -#define MATCH_VLUXEI256_V 0x14005007 -#define MASK_VLUXEI256_V 0x1c00707f -#define MATCH_VLUXEI32_V 0x4006007 -#define MASK_VLUXEI32_V 0x1c00707f -#define MATCH_VLUXEI512_V 0x14006007 -#define MASK_VLUXEI512_V 0x1c00707f -#define MATCH_VLUXEI64_V 0x4007007 -#define MASK_VLUXEI64_V 0x1c00707f -#define MATCH_VLUXEI8_V 0x4000007 -#define MASK_VLUXEI8_V 0x1c00707f -#define MATCH_VMACC_VV 0xb4002057 -#define MASK_VMACC_VV 0xfc00707f -#define MATCH_VMACC_VX 0xb4006057 -#define MASK_VMACC_VX 0xfc00707f -#define MATCH_VMADC_VI 0x46003057 -#define MASK_VMADC_VI 0xfe00707f -#define MATCH_VMADC_VIM 0x44003057 -#define MASK_VMADC_VIM 0xfe00707f -#define MATCH_VMADC_VV 0x46000057 -#define MASK_VMADC_VV 0xfe00707f -#define MATCH_VMADC_VVM 0x44000057 -#define MASK_VMADC_VVM 0xfe00707f -#define MATCH_VMADC_VX 0x46004057 -#define MASK_VMADC_VX 0xfe00707f -#define MATCH_VMADC_VXM 0x44004057 -#define MASK_VMADC_VXM 0xfe00707f -#define MATCH_VMADD_VV 0xa4002057 -#define MASK_VMADD_VV 0xfc00707f -#define MATCH_VMADD_VX 0xa4006057 -#define MASK_VMADD_VX 0xfc00707f -#define MATCH_VMAND_MM 0x64002057 -#define MASK_VMAND_MM 0xfc00707f -#define MATCH_VMANDN_MM 0x60002057 -#define MASK_VMANDN_MM 0xfc00707f -#define MATCH_VMAX_VV 0x1c000057 -#define MASK_VMAX_VV 0xfc00707f -#define MATCH_VMAX_VX 0x1c004057 -#define MASK_VMAX_VX 0xfc00707f -#define MATCH_VMAXU_VV 0x18000057 -#define MASK_VMAXU_VV 0xfc00707f -#define MATCH_VMAXU_VX 0x18004057 -#define MASK_VMAXU_VX 0xfc00707f -#define MATCH_VMERGE_VIM 0x5c003057 -#define MASK_VMERGE_VIM 0xfe00707f -#define MATCH_VMERGE_VVM 0x5c000057 -#define MASK_VMERGE_VVM 0xfe00707f -#define MATCH_VMERGE_VXM 0x5c004057 -#define MASK_VMERGE_VXM 0xfe00707f -#define MATCH_VMFEQ_VF 0x60005057 -#define MASK_VMFEQ_VF 0xfc00707f -#define MATCH_VMFEQ_VV 0x60001057 -#define MASK_VMFEQ_VV 0xfc00707f -#define MATCH_VMFGE_VF 
0x7c005057 -#define MASK_VMFGE_VF 0xfc00707f -#define MATCH_VMFGT_VF 0x74005057 -#define MASK_VMFGT_VF 0xfc00707f -#define MATCH_VMFLE_VF 0x64005057 -#define MASK_VMFLE_VF 0xfc00707f -#define MATCH_VMFLE_VV 0x64001057 -#define MASK_VMFLE_VV 0xfc00707f -#define MATCH_VMFLT_VF 0x6c005057 -#define MASK_VMFLT_VF 0xfc00707f -#define MATCH_VMFLT_VV 0x6c001057 -#define MASK_VMFLT_VV 0xfc00707f -#define MATCH_VMFNE_VF 0x70005057 -#define MASK_VMFNE_VF 0xfc00707f -#define MATCH_VMFNE_VV 0x70001057 -#define MASK_VMFNE_VV 0xfc00707f -#define MATCH_VMIN_VV 0x14000057 -#define MASK_VMIN_VV 0xfc00707f -#define MATCH_VMIN_VX 0x14004057 -#define MASK_VMIN_VX 0xfc00707f -#define MATCH_VMINU_VV 0x10000057 -#define MASK_VMINU_VV 0xfc00707f -#define MATCH_VMINU_VX 0x10004057 -#define MASK_VMINU_VX 0xfc00707f -#define MATCH_VMNAND_MM 0x74002057 -#define MASK_VMNAND_MM 0xfc00707f -#define MATCH_VMNOR_MM 0x78002057 -#define MASK_VMNOR_MM 0xfc00707f -#define MATCH_VMOR_MM 0x68002057 -#define MASK_VMOR_MM 0xfc00707f -#define MATCH_VMORN_MM 0x70002057 -#define MASK_VMORN_MM 0xfc00707f -#define MATCH_VMSBC_VV 0x4e000057 -#define MASK_VMSBC_VV 0xfe00707f -#define MATCH_VMSBC_VVM 0x4c000057 -#define MASK_VMSBC_VVM 0xfe00707f -#define MATCH_VMSBC_VX 0x4e004057 -#define MASK_VMSBC_VX 0xfe00707f -#define MATCH_VMSBC_VXM 0x4c004057 -#define MASK_VMSBC_VXM 0xfe00707f -#define MATCH_VMSBF_M 0x5000a057 -#define MASK_VMSBF_M 0xfc0ff07f -#define MATCH_VMSEQ_VI 0x60003057 -#define MASK_VMSEQ_VI 0xfc00707f -#define MATCH_VMSEQ_VV 0x60000057 -#define MASK_VMSEQ_VV 0xfc00707f -#define MATCH_VMSEQ_VX 0x60004057 -#define MASK_VMSEQ_VX 0xfc00707f -#define MATCH_VMSGT_VI 0x7c003057 -#define MASK_VMSGT_VI 0xfc00707f -#define MATCH_VMSGT_VX 0x7c004057 -#define MASK_VMSGT_VX 0xfc00707f -#define MATCH_VMSGTU_VI 0x78003057 -#define MASK_VMSGTU_VI 0xfc00707f -#define MATCH_VMSGTU_VX 0x78004057 -#define MASK_VMSGTU_VX 0xfc00707f -#define MATCH_VMSIF_M 0x5001a057 -#define MASK_VMSIF_M 0xfc0ff07f -#define 
MATCH_VMSLE_VI 0x74003057 -#define MASK_VMSLE_VI 0xfc00707f -#define MATCH_VMSLE_VV 0x74000057 -#define MASK_VMSLE_VV 0xfc00707f -#define MATCH_VMSLE_VX 0x74004057 -#define MASK_VMSLE_VX 0xfc00707f -#define MATCH_VMSLEU_VI 0x70003057 -#define MASK_VMSLEU_VI 0xfc00707f -#define MATCH_VMSLEU_VV 0x70000057 -#define MASK_VMSLEU_VV 0xfc00707f -#define MATCH_VMSLEU_VX 0x70004057 -#define MASK_VMSLEU_VX 0xfc00707f -#define MATCH_VMSLT_VV 0x6c000057 -#define MASK_VMSLT_VV 0xfc00707f -#define MATCH_VMSLT_VX 0x6c004057 -#define MASK_VMSLT_VX 0xfc00707f -#define MATCH_VMSLTU_VV 0x68000057 -#define MASK_VMSLTU_VV 0xfc00707f -#define MATCH_VMSLTU_VX 0x68004057 -#define MASK_VMSLTU_VX 0xfc00707f -#define MATCH_VMSNE_VI 0x64003057 -#define MASK_VMSNE_VI 0xfc00707f -#define MATCH_VMSNE_VV 0x64000057 -#define MASK_VMSNE_VV 0xfc00707f -#define MATCH_VMSNE_VX 0x64004057 -#define MASK_VMSNE_VX 0xfc00707f -#define MATCH_VMSOF_M 0x50012057 -#define MASK_VMSOF_M 0xfc0ff07f -#define MATCH_VMUL_VV 0x94002057 -#define MASK_VMUL_VV 0xfc00707f -#define MATCH_VMUL_VX 0x94006057 -#define MASK_VMUL_VX 0xfc00707f -#define MATCH_VMULH_VV 0x9c002057 -#define MASK_VMULH_VV 0xfc00707f -#define MATCH_VMULH_VX 0x9c006057 -#define MASK_VMULH_VX 0xfc00707f -#define MATCH_VMULHSU_VV 0x98002057 -#define MASK_VMULHSU_VV 0xfc00707f -#define MATCH_VMULHSU_VX 0x98006057 -#define MASK_VMULHSU_VX 0xfc00707f -#define MATCH_VMULHU_VV 0x90002057 -#define MASK_VMULHU_VV 0xfc00707f -#define MATCH_VMULHU_VX 0x90006057 -#define MASK_VMULHU_VX 0xfc00707f -#define MATCH_VMV1R_V 0x9e003057 -#define MASK_VMV1R_V 0xfe0ff07f -#define MATCH_VMV2R_V 0x9e00b057 -#define MASK_VMV2R_V 0xfe0ff07f -#define MATCH_VMV4R_V 0x9e01b057 -#define MASK_VMV4R_V 0xfe0ff07f -#define MATCH_VMV8R_V 0x9e03b057 -#define MASK_VMV8R_V 0xfe0ff07f -#define MATCH_VMV_S_X 0x42006057 -#define MASK_VMV_S_X 0xfff0707f -#define MATCH_VMV_V_I 0x5e003057 -#define MASK_VMV_V_I 0xfff0707f -#define MATCH_VMV_V_V 0x5e000057 -#define MASK_VMV_V_V 0xfff0707f 
-#define MATCH_VMV_V_X 0x5e004057 -#define MASK_VMV_V_X 0xfff0707f -#define MATCH_VMV_X_S 0x42002057 -#define MASK_VMV_X_S 0xfe0ff07f -#define MATCH_VMXNOR_MM 0x7c002057 -#define MASK_VMXNOR_MM 0xfc00707f -#define MATCH_VMXOR_MM 0x6c002057 -#define MASK_VMXOR_MM 0xfc00707f -#define MATCH_VNCLIP_WI 0xbc003057 -#define MASK_VNCLIP_WI 0xfc00707f -#define MATCH_VNCLIP_WV 0xbc000057 -#define MASK_VNCLIP_WV 0xfc00707f -#define MATCH_VNCLIP_WX 0xbc004057 -#define MASK_VNCLIP_WX 0xfc00707f -#define MATCH_VNCLIPU_WI 0xb8003057 -#define MASK_VNCLIPU_WI 0xfc00707f -#define MATCH_VNCLIPU_WV 0xb8000057 -#define MASK_VNCLIPU_WV 0xfc00707f -#define MATCH_VNCLIPU_WX 0xb8004057 -#define MASK_VNCLIPU_WX 0xfc00707f -#define MATCH_VNMSAC_VV 0xbc002057 -#define MASK_VNMSAC_VV 0xfc00707f -#define MATCH_VNMSAC_VX 0xbc006057 -#define MASK_VNMSAC_VX 0xfc00707f -#define MATCH_VNMSUB_VV 0xac002057 -#define MASK_VNMSUB_VV 0xfc00707f -#define MATCH_VNMSUB_VX 0xac006057 -#define MASK_VNMSUB_VX 0xfc00707f -#define MATCH_VNSRA_WI 0xb4003057 -#define MASK_VNSRA_WI 0xfc00707f -#define MATCH_VNSRA_WV 0xb4000057 -#define MASK_VNSRA_WV 0xfc00707f -#define MATCH_VNSRA_WX 0xb4004057 -#define MASK_VNSRA_WX 0xfc00707f -#define MATCH_VNSRL_WI 0xb0003057 -#define MASK_VNSRL_WI 0xfc00707f -#define MATCH_VNSRL_WV 0xb0000057 -#define MASK_VNSRL_WV 0xfc00707f -#define MATCH_VNSRL_WX 0xb0004057 -#define MASK_VNSRL_WX 0xfc00707f -#define MATCH_VOR_VI 0x28003057 -#define MASK_VOR_VI 0xfc00707f -#define MATCH_VOR_VV 0x28000057 -#define MASK_VOR_VV 0xfc00707f -#define MATCH_VOR_VX 0x28004057 -#define MASK_VOR_VX 0xfc00707f -#define MATCH_VREDAND_VS 0x4002057 -#define MASK_VREDAND_VS 0xfc00707f -#define MATCH_VREDMAX_VS 0x1c002057 -#define MASK_VREDMAX_VS 0xfc00707f -#define MATCH_VREDMAXU_VS 0x18002057 -#define MASK_VREDMAXU_VS 0xfc00707f -#define MATCH_VREDMIN_VS 0x14002057 -#define MASK_VREDMIN_VS 0xfc00707f -#define MATCH_VREDMINU_VS 0x10002057 -#define MASK_VREDMINU_VS 0xfc00707f -#define MATCH_VREDOR_VS 
0x8002057 -#define MASK_VREDOR_VS 0xfc00707f -#define MATCH_VREDSUM_VS 0x2057 -#define MASK_VREDSUM_VS 0xfc00707f -#define MATCH_VREDXOR_VS 0xc002057 -#define MASK_VREDXOR_VS 0xfc00707f -#define MATCH_VREM_VV 0x8c002057 -#define MASK_VREM_VV 0xfc00707f -#define MATCH_VREM_VX 0x8c006057 -#define MASK_VREM_VX 0xfc00707f -#define MATCH_VREMU_VV 0x88002057 -#define MASK_VREMU_VV 0xfc00707f -#define MATCH_VREMU_VX 0x88006057 -#define MASK_VREMU_VX 0xfc00707f -#define MATCH_VRGATHER_VI 0x30003057 -#define MASK_VRGATHER_VI 0xfc00707f -#define MATCH_VRGATHER_VV 0x30000057 -#define MASK_VRGATHER_VV 0xfc00707f -#define MATCH_VRGATHER_VX 0x30004057 -#define MASK_VRGATHER_VX 0xfc00707f -#define MATCH_VRGATHEREI16_VV 0x38000057 -#define MASK_VRGATHEREI16_VV 0xfc00707f -#define MATCH_VRSUB_VI 0xc003057 -#define MASK_VRSUB_VI 0xfc00707f -#define MATCH_VRSUB_VX 0xc004057 -#define MASK_VRSUB_VX 0xfc00707f -#define MATCH_VS1R_V 0x2800027 -#define MASK_VS1R_V 0xfff0707f -#define MATCH_VS2R_V 0x22800027 -#define MASK_VS2R_V 0xfff0707f -#define MATCH_VS4R_V 0x62800027 -#define MASK_VS4R_V 0xfff0707f -#define MATCH_VS8R_V 0xe2800027 -#define MASK_VS8R_V 0xfff0707f -#define MATCH_VSADD_VI 0x84003057 -#define MASK_VSADD_VI 0xfc00707f -#define MATCH_VSADD_VV 0x84000057 -#define MASK_VSADD_VV 0xfc00707f -#define MATCH_VSADD_VX 0x84004057 -#define MASK_VSADD_VX 0xfc00707f -#define MATCH_VSADDU_VI 0x80003057 -#define MASK_VSADDU_VI 0xfc00707f -#define MATCH_VSADDU_VV 0x80000057 -#define MASK_VSADDU_VV 0xfc00707f -#define MATCH_VSADDU_VX 0x80004057 -#define MASK_VSADDU_VX 0xfc00707f -#define MATCH_VSBC_VVM 0x48000057 -#define MASK_VSBC_VVM 0xfe00707f -#define MATCH_VSBC_VXM 0x48004057 -#define MASK_VSBC_VXM 0xfe00707f -#define MATCH_VSE1024_V 0x10007027 -#define MASK_VSE1024_V 0x1df0707f -#define MATCH_VSE128_V 0x10000027 -#define MASK_VSE128_V 0x1df0707f -#define MATCH_VSE16_V 0x5027 -#define MASK_VSE16_V 0x1df0707f -#define MATCH_VSE256_V 0x10005027 -#define MASK_VSE256_V 0x1df0707f -#define 
MATCH_VSE32_V 0x6027 -#define MASK_VSE32_V 0x1df0707f -#define MATCH_VSE512_V 0x10006027 -#define MASK_VSE512_V 0x1df0707f -#define MATCH_VSE64_V 0x7027 -#define MASK_VSE64_V 0x1df0707f -#define MATCH_VSE8_V 0x27 -#define MASK_VSE8_V 0x1df0707f -#define MATCH_VSETIVLI 0xc0007057 -#define MASK_VSETIVLI 0xc000707f -#define MATCH_VSETVL 0x80007057 -#define MASK_VSETVL 0xfe00707f -#define MATCH_VSETVLI 0x7057 -#define MASK_VSETVLI 0x8000707f -#define MATCH_VSEXT_VF2 0x4803a057 -#define MASK_VSEXT_VF2 0xfc0ff07f -#define MATCH_VSEXT_VF4 0x4802a057 -#define MASK_VSEXT_VF4 0xfc0ff07f -#define MATCH_VSEXT_VF8 0x4801a057 -#define MASK_VSEXT_VF8 0xfc0ff07f -#define MATCH_VSLIDE1DOWN_VX 0x3c006057 -#define MASK_VSLIDE1DOWN_VX 0xfc00707f -#define MATCH_VSLIDE1UP_VX 0x38006057 -#define MASK_VSLIDE1UP_VX 0xfc00707f -#define MATCH_VSLIDEDOWN_VI 0x3c003057 -#define MASK_VSLIDEDOWN_VI 0xfc00707f -#define MATCH_VSLIDEDOWN_VX 0x3c004057 -#define MASK_VSLIDEDOWN_VX 0xfc00707f -#define MATCH_VSLIDEUP_VI 0x38003057 -#define MASK_VSLIDEUP_VI 0xfc00707f -#define MATCH_VSLIDEUP_VX 0x38004057 -#define MASK_VSLIDEUP_VX 0xfc00707f -#define MATCH_VSLL_VI 0x94003057 -#define MASK_VSLL_VI 0xfc00707f -#define MATCH_VSLL_VV 0x94000057 -#define MASK_VSLL_VV 0xfc00707f -#define MATCH_VSLL_VX 0x94004057 -#define MASK_VSLL_VX 0xfc00707f -#define MATCH_VSM_V 0x2b00027 -#define MASK_VSM_V 0xfff0707f -#define MATCH_VSMUL_VV 0x9c000057 -#define MASK_VSMUL_VV 0xfc00707f -#define MATCH_VSMUL_VX 0x9c004057 -#define MASK_VSMUL_VX 0xfc00707f -#define MATCH_VSOXEI1024_V 0x1c007027 -#define MASK_VSOXEI1024_V 0x1c00707f -#define MATCH_VSOXEI128_V 0x1c000027 -#define MASK_VSOXEI128_V 0x1c00707f -#define MATCH_VSOXEI16_V 0xc005027 -#define MASK_VSOXEI16_V 0x1c00707f -#define MATCH_VSOXEI256_V 0x1c005027 -#define MASK_VSOXEI256_V 0x1c00707f -#define MATCH_VSOXEI32_V 0xc006027 -#define MASK_VSOXEI32_V 0x1c00707f -#define MATCH_VSOXEI512_V 0x1c006027 -#define MASK_VSOXEI512_V 0x1c00707f -#define MATCH_VSOXEI64_V 
0xc007027 -#define MASK_VSOXEI64_V 0x1c00707f -#define MATCH_VSOXEI8_V 0xc000027 -#define MASK_VSOXEI8_V 0x1c00707f -#define MATCH_VSRA_VI 0xa4003057 -#define MASK_VSRA_VI 0xfc00707f -#define MATCH_VSRA_VV 0xa4000057 -#define MASK_VSRA_VV 0xfc00707f -#define MATCH_VSRA_VX 0xa4004057 -#define MASK_VSRA_VX 0xfc00707f -#define MATCH_VSRL_VI 0xa0003057 -#define MASK_VSRL_VI 0xfc00707f -#define MATCH_VSRL_VV 0xa0000057 -#define MASK_VSRL_VV 0xfc00707f -#define MATCH_VSRL_VX 0xa0004057 -#define MASK_VSRL_VX 0xfc00707f -#define MATCH_VSSE1024_V 0x18007027 -#define MASK_VSSE1024_V 0x1c00707f -#define MATCH_VSSE128_V 0x18000027 -#define MASK_VSSE128_V 0x1c00707f -#define MATCH_VSSE16_V 0x8005027 -#define MASK_VSSE16_V 0x1c00707f -#define MATCH_VSSE256_V 0x18005027 -#define MASK_VSSE256_V 0x1c00707f -#define MATCH_VSSE32_V 0x8006027 -#define MASK_VSSE32_V 0x1c00707f -#define MATCH_VSSE512_V 0x18006027 -#define MASK_VSSE512_V 0x1c00707f -#define MATCH_VSSE64_V 0x8007027 -#define MASK_VSSE64_V 0x1c00707f -#define MATCH_VSSE8_V 0x8000027 -#define MASK_VSSE8_V 0x1c00707f -#define MATCH_VSSRA_VI 0xac003057 -#define MASK_VSSRA_VI 0xfc00707f -#define MATCH_VSSRA_VV 0xac000057 -#define MASK_VSSRA_VV 0xfc00707f -#define MATCH_VSSRA_VX 0xac004057 -#define MASK_VSSRA_VX 0xfc00707f -#define MATCH_VSSRL_VI 0xa8003057 -#define MASK_VSSRL_VI 0xfc00707f -#define MATCH_VSSRL_VV 0xa8000057 -#define MASK_VSSRL_VV 0xfc00707f -#define MATCH_VSSRL_VX 0xa8004057 -#define MASK_VSSRL_VX 0xfc00707f -#define MATCH_VSSUB_VV 0x8c000057 -#define MASK_VSSUB_VV 0xfc00707f -#define MATCH_VSSUB_VX 0x8c004057 -#define MASK_VSSUB_VX 0xfc00707f -#define MATCH_VSSUBU_VV 0x88000057 -#define MASK_VSSUBU_VV 0xfc00707f -#define MATCH_VSSUBU_VX 0x88004057 -#define MASK_VSSUBU_VX 0xfc00707f -#define MATCH_VSUB_VV 0x8000057 -#define MASK_VSUB_VV 0xfc00707f -#define MATCH_VSUB_VX 0x8004057 -#define MASK_VSUB_VX 0xfc00707f -#define MATCH_VSUXEI1024_V 0x14007027 -#define MASK_VSUXEI1024_V 0x1c00707f -#define 
MATCH_VSUXEI128_V 0x14000027 -#define MASK_VSUXEI128_V 0x1c00707f -#define MATCH_VSUXEI16_V 0x4005027 -#define MASK_VSUXEI16_V 0x1c00707f -#define MATCH_VSUXEI256_V 0x14005027 -#define MASK_VSUXEI256_V 0x1c00707f -#define MATCH_VSUXEI32_V 0x4006027 -#define MASK_VSUXEI32_V 0x1c00707f -#define MATCH_VSUXEI512_V 0x14006027 -#define MASK_VSUXEI512_V 0x1c00707f -#define MATCH_VSUXEI64_V 0x4007027 -#define MASK_VSUXEI64_V 0x1c00707f -#define MATCH_VSUXEI8_V 0x4000027 -#define MASK_VSUXEI8_V 0x1c00707f -#define MATCH_VWADD_VV 0xc4002057 -#define MASK_VWADD_VV 0xfc00707f -#define MATCH_VWADD_VX 0xc4006057 -#define MASK_VWADD_VX 0xfc00707f -#define MATCH_VWADD_WV 0xd4002057 -#define MASK_VWADD_WV 0xfc00707f -#define MATCH_VWADD_WX 0xd4006057 -#define MASK_VWADD_WX 0xfc00707f -#define MATCH_VWADDU_VV 0xc0002057 -#define MASK_VWADDU_VV 0xfc00707f -#define MATCH_VWADDU_VX 0xc0006057 -#define MASK_VWADDU_VX 0xfc00707f -#define MATCH_VWADDU_WV 0xd0002057 -#define MASK_VWADDU_WV 0xfc00707f -#define MATCH_VWADDU_WX 0xd0006057 -#define MASK_VWADDU_WX 0xfc00707f -#define MATCH_VWMACC_VV 0xf4002057 -#define MASK_VWMACC_VV 0xfc00707f -#define MATCH_VWMACC_VX 0xf4006057 -#define MASK_VWMACC_VX 0xfc00707f -#define MATCH_VWMACCSU_VV 0xfc002057 -#define MASK_VWMACCSU_VV 0xfc00707f -#define MATCH_VWMACCSU_VX 0xfc006057 -#define MASK_VWMACCSU_VX 0xfc00707f -#define MATCH_VWMACCU_VV 0xf0002057 -#define MASK_VWMACCU_VV 0xfc00707f -#define MATCH_VWMACCU_VX 0xf0006057 -#define MASK_VWMACCU_VX 0xfc00707f -#define MATCH_VWMACCUS_VX 0xf8006057 -#define MASK_VWMACCUS_VX 0xfc00707f -#define MATCH_VWMUL_VV 0xec002057 -#define MASK_VWMUL_VV 0xfc00707f -#define MATCH_VWMUL_VX 0xec006057 -#define MASK_VWMUL_VX 0xfc00707f -#define MATCH_VWMULSU_VV 0xe8002057 -#define MASK_VWMULSU_VV 0xfc00707f -#define MATCH_VWMULSU_VX 0xe8006057 -#define MASK_VWMULSU_VX 0xfc00707f -#define MATCH_VWMULU_VV 0xe0002057 -#define MASK_VWMULU_VV 0xfc00707f -#define MATCH_VWMULU_VX 0xe0006057 -#define MASK_VWMULU_VX 
0xfc00707f -#define MATCH_VWREDSUM_VS 0xc4000057 -#define MASK_VWREDSUM_VS 0xfc00707f -#define MATCH_VWREDSUMU_VS 0xc0000057 -#define MASK_VWREDSUMU_VS 0xfc00707f -#define MATCH_VWSUB_VV 0xcc002057 -#define MASK_VWSUB_VV 0xfc00707f -#define MATCH_VWSUB_VX 0xcc006057 -#define MASK_VWSUB_VX 0xfc00707f -#define MATCH_VWSUB_WV 0xdc002057 -#define MASK_VWSUB_WV 0xfc00707f -#define MATCH_VWSUB_WX 0xdc006057 -#define MASK_VWSUB_WX 0xfc00707f -#define MATCH_VWSUBU_VV 0xc8002057 -#define MASK_VWSUBU_VV 0xfc00707f -#define MATCH_VWSUBU_VX 0xc8006057 -#define MASK_VWSUBU_VX 0xfc00707f -#define MATCH_VWSUBU_WV 0xd8002057 -#define MASK_VWSUBU_WV 0xfc00707f -#define MATCH_VWSUBU_WX 0xd8006057 -#define MASK_VWSUBU_WX 0xfc00707f -#define MATCH_VXOR_VI 0x2c003057 -#define MASK_VXOR_VI 0xfc00707f -#define MATCH_VXOR_VV 0x2c000057 -#define MASK_VXOR_VV 0xfc00707f -#define MATCH_VXOR_VX 0x2c004057 -#define MASK_VXOR_VX 0xfc00707f -#define MATCH_VZEXT_VF2 0x48032057 -#define MASK_VZEXT_VF2 0xfc0ff07f -#define MATCH_VZEXT_VF4 0x48022057 -#define MASK_VZEXT_VF4 0xfc0ff07f -#define MATCH_VZEXT_VF8 0x48012057 -#define MASK_VZEXT_VF8 0xfc0ff07f -#define MATCH_WFI 0x10500073 -#define MASK_WFI 0xffffffff -#define MATCH_WRS_NTO 0xd00073 -#define MASK_WRS_NTO 0xffffffff -#define MATCH_WRS_STO 0x1d00073 -#define MASK_WRS_STO 0xffffffff -#define MATCH_XNOR 0x40004033 -#define MASK_XNOR 0xfe00707f -#define MATCH_XOR 0x4033 -#define MASK_XOR 0xfe00707f -#define MATCH_XORI 0x4013 -#define MASK_XORI 0x707f -#define MATCH_XPERM16 0x28006033 -#define MASK_XPERM16 0xfe00707f -#define MATCH_XPERM32 0x28000033 -#define MASK_XPERM32 0xfe00707f -#define MATCH_XPERM4 0x28002033 -#define MASK_XPERM4 0xfe00707f -#define MATCH_XPERM8 0x28004033 -#define MASK_XPERM8 0xfe00707f -#define MATCH_ZUNPKD810 0xacc00077 -#define MASK_ZUNPKD810 0xfff0707f -#define MATCH_ZUNPKD820 0xacd00077 -#define MASK_ZUNPKD820 0xfff0707f -#define MATCH_ZUNPKD830 0xace00077 -#define MASK_ZUNPKD830 0xfff0707f -#define MATCH_ZUNPKD831 
0xacf00077 -#define MASK_ZUNPKD831 0xfff0707f -#define MATCH_ZUNPKD832 0xad700077 -#define MASK_ZUNPKD832 0xfff0707f - -#define CSR_FFLAGS 0x1 -#define CSR_FRM 0x2 -#define CSR_FCSR 0x3 -#define CSR_VSTART 0x8 -#define CSR_VXSAT 0x9 -#define CSR_VXRM 0xa -#define CSR_VCSR 0xf -#define CSR_SEED 0x15 -#define CSR_JVT 0x17 -#define CSR_CYCLE 0xc00 -#define CSR_TIME 0xc01 -#define CSR_INSTRET 0xc02 -#define CSR_HPMCOUNTER3 0xc03 -#define CSR_HPMCOUNTER4 0xc04 -#define CSR_HPMCOUNTER5 0xc05 -#define CSR_HPMCOUNTER6 0xc06 -#define CSR_HPMCOUNTER7 0xc07 -#define CSR_HPMCOUNTER8 0xc08 -#define CSR_HPMCOUNTER9 0xc09 -#define CSR_HPMCOUNTER10 0xc0a -#define CSR_HPMCOUNTER11 0xc0b -#define CSR_HPMCOUNTER12 0xc0c -#define CSR_HPMCOUNTER13 0xc0d -#define CSR_HPMCOUNTER14 0xc0e -#define CSR_HPMCOUNTER15 0xc0f -#define CSR_HPMCOUNTER16 0xc10 -#define CSR_HPMCOUNTER17 0xc11 -#define CSR_HPMCOUNTER18 0xc12 -#define CSR_HPMCOUNTER19 0xc13 -#define CSR_HPMCOUNTER20 0xc14 -#define CSR_HPMCOUNTER21 0xc15 -#define CSR_HPMCOUNTER22 0xc16 -#define CSR_HPMCOUNTER23 0xc17 -#define CSR_HPMCOUNTER24 0xc18 -#define CSR_HPMCOUNTER25 0xc19 -#define CSR_HPMCOUNTER26 0xc1a -#define CSR_HPMCOUNTER27 0xc1b -#define CSR_HPMCOUNTER28 0xc1c -#define CSR_HPMCOUNTER29 0xc1d -#define CSR_HPMCOUNTER30 0xc1e -#define CSR_HPMCOUNTER31 0xc1f -#define CSR_VL 0xc20 -#define CSR_VTYPE 0xc21 -#define CSR_VLENB 0xc22 -#define CSR_SSTATUS 0x100 -#define CSR_SEDELEG 0x102 -#define CSR_SIDELEG 0x103 -#define CSR_SIE 0x104 -#define CSR_STVEC 0x105 -#define CSR_SCOUNTEREN 0x106 -#define CSR_SENVCFG 0x10a -#define CSR_SSTATEEN0 0x10c -#define CSR_SSTATEEN1 0x10d -#define CSR_SSTATEEN2 0x10e -#define CSR_SSTATEEN3 0x10f -#define CSR_SSCRATCH 0x140 -#define CSR_SEPC 0x141 -#define CSR_SCAUSE 0x142 -#define CSR_STVAL 0x143 -#define CSR_SIP 0x144 -#define CSR_STIMECMP 0x14d -#define CSR_SISELECT 0x150 -#define CSR_SIREG 0x151 -#define CSR_STOPEI 0x15c -#define CSR_SATP 0x180 -#define CSR_SCONTEXT 0x5a8 -#define CSR_VSSTATUS 
0x200 -#define CSR_VSIE 0x204 -#define CSR_VSTVEC 0x205 -#define CSR_VSSCRATCH 0x240 -#define CSR_VSEPC 0x241 -#define CSR_VSCAUSE 0x242 -#define CSR_VSTVAL 0x243 -#define CSR_VSIP 0x244 -#define CSR_VSTIMECMP 0x24d -#define CSR_VSISELECT 0x250 -#define CSR_VSIREG 0x251 -#define CSR_VSTOPEI 0x25c -#define CSR_VSATP 0x280 -#define CSR_HSTATUS 0x600 -#define CSR_HEDELEG 0x602 -#define CSR_HIDELEG 0x603 -#define CSR_HIE 0x604 -#define CSR_HTIMEDELTA 0x605 -#define CSR_HCOUNTEREN 0x606 -#define CSR_HGEIE 0x607 -#define CSR_HVIEN 0x608 -#define CSR_HVICTL 0x609 -#define CSR_HENVCFG 0x60a -#define CSR_HSTATEEN0 0x60c -#define CSR_HSTATEEN1 0x60d -#define CSR_HSTATEEN2 0x60e -#define CSR_HSTATEEN3 0x60f -#define CSR_HTVAL 0x643 -#define CSR_HIP 0x644 -#define CSR_HVIP 0x645 -#define CSR_HVIPRIO1 0x646 -#define CSR_HVIPRIO2 0x647 -#define CSR_HTINST 0x64a -#define CSR_HGATP 0x680 -#define CSR_HCONTEXT 0x6a8 -#define CSR_HGEIP 0xe12 -#define CSR_VSTOPI 0xeb0 -#define CSR_SCOUNTOVF 0xda0 -#define CSR_STOPI 0xdb0 -#define CSR_UTVT 0x7 -#define CSR_UNXTI 0x45 -#define CSR_UINTSTATUS 0x46 -#define CSR_USCRATCHCSW 0x48 -#define CSR_USCRATCHCSWL 0x49 -#define CSR_STVT 0x107 -#define CSR_SNXTI 0x145 -#define CSR_SINTSTATUS 0x146 -#define CSR_SSCRATCHCSW 0x148 -#define CSR_SSCRATCHCSWL 0x149 -#define CSR_MTVT 0x307 -#define CSR_MNXTI 0x345 -#define CSR_MINTSTATUS 0x346 -#define CSR_MSCRATCHCSW 0x348 -#define CSR_MSCRATCHCSWL 0x349 -#define CSR_MSTATUS 0x300 -#define CSR_MISA 0x301 -#define CSR_MEDELEG 0x302 -#define CSR_MIDELEG 0x303 -#define CSR_MIE 0x304 -#define CSR_MTVEC 0x305 -#define CSR_MCOUNTEREN 0x306 -#define CSR_MVIEN 0x308 -#define CSR_MVIP 0x309 -#define CSR_MENVCFG 0x30a -#define CSR_MSTATEEN0 0x30c -#define CSR_MSTATEEN1 0x30d -#define CSR_MSTATEEN2 0x30e -#define CSR_MSTATEEN3 0x30f -#define CSR_MCOUNTINHIBIT 0x320 -#define CSR_MSCRATCH 0x340 -#define CSR_MEPC 0x341 -#define CSR_MCAUSE 0x342 -#define CSR_MTVAL 0x343 -#define CSR_MIP 0x344 -#define CSR_MTINST 0x34a 
-#define CSR_MTVAL2 0x34b -#define CSR_MISELECT 0x350 -#define CSR_MIREG 0x351 -#define CSR_MTOPEI 0x35c -#define CSR_PMPCFG0 0x3a0 -#define CSR_PMPCFG1 0x3a1 -#define CSR_PMPCFG2 0x3a2 -#define CSR_PMPCFG3 0x3a3 -#define CSR_PMPCFG4 0x3a4 -#define CSR_PMPCFG5 0x3a5 -#define CSR_PMPCFG6 0x3a6 -#define CSR_PMPCFG7 0x3a7 -#define CSR_PMPCFG8 0x3a8 -#define CSR_PMPCFG9 0x3a9 -#define CSR_PMPCFG10 0x3aa -#define CSR_PMPCFG11 0x3ab -#define CSR_PMPCFG12 0x3ac -#define CSR_PMPCFG13 0x3ad -#define CSR_PMPCFG14 0x3ae -#define CSR_PMPCFG15 0x3af -#define CSR_PMPADDR0 0x3b0 -#define CSR_PMPADDR1 0x3b1 -#define CSR_PMPADDR2 0x3b2 -#define CSR_PMPADDR3 0x3b3 -#define CSR_PMPADDR4 0x3b4 -#define CSR_PMPADDR5 0x3b5 -#define CSR_PMPADDR6 0x3b6 -#define CSR_PMPADDR7 0x3b7 -#define CSR_PMPADDR8 0x3b8 -#define CSR_PMPADDR9 0x3b9 -#define CSR_PMPADDR10 0x3ba -#define CSR_PMPADDR11 0x3bb -#define CSR_PMPADDR12 0x3bc -#define CSR_PMPADDR13 0x3bd -#define CSR_PMPADDR14 0x3be -#define CSR_PMPADDR15 0x3bf -#define CSR_PMPADDR16 0x3c0 -#define CSR_PMPADDR17 0x3c1 -#define CSR_PMPADDR18 0x3c2 -#define CSR_PMPADDR19 0x3c3 -#define CSR_PMPADDR20 0x3c4 -#define CSR_PMPADDR21 0x3c5 -#define CSR_PMPADDR22 0x3c6 -#define CSR_PMPADDR23 0x3c7 -#define CSR_PMPADDR24 0x3c8 -#define CSR_PMPADDR25 0x3c9 -#define CSR_PMPADDR26 0x3ca -#define CSR_PMPADDR27 0x3cb -#define CSR_PMPADDR28 0x3cc -#define CSR_PMPADDR29 0x3cd -#define CSR_PMPADDR30 0x3ce -#define CSR_PMPADDR31 0x3cf -#define CSR_PMPADDR32 0x3d0 -#define CSR_PMPADDR33 0x3d1 -#define CSR_PMPADDR34 0x3d2 -#define CSR_PMPADDR35 0x3d3 -#define CSR_PMPADDR36 0x3d4 -#define CSR_PMPADDR37 0x3d5 -#define CSR_PMPADDR38 0x3d6 -#define CSR_PMPADDR39 0x3d7 -#define CSR_PMPADDR40 0x3d8 -#define CSR_PMPADDR41 0x3d9 -#define CSR_PMPADDR42 0x3da -#define CSR_PMPADDR43 0x3db -#define CSR_PMPADDR44 0x3dc -#define CSR_PMPADDR45 0x3dd -#define CSR_PMPADDR46 0x3de -#define CSR_PMPADDR47 0x3df -#define CSR_PMPADDR48 0x3e0 -#define CSR_PMPADDR49 0x3e1 -#define 
CSR_PMPADDR50 0x3e2 -#define CSR_PMPADDR51 0x3e3 -#define CSR_PMPADDR52 0x3e4 -#define CSR_PMPADDR53 0x3e5 -#define CSR_PMPADDR54 0x3e6 -#define CSR_PMPADDR55 0x3e7 -#define CSR_PMPADDR56 0x3e8 -#define CSR_PMPADDR57 0x3e9 -#define CSR_PMPADDR58 0x3ea -#define CSR_PMPADDR59 0x3eb -#define CSR_PMPADDR60 0x3ec -#define CSR_PMPADDR61 0x3ed -#define CSR_PMPADDR62 0x3ee -#define CSR_PMPADDR63 0x3ef -#define CSR_MSECCFG 0x747 -#define CSR_TSELECT 0x7a0 -#define CSR_TDATA1 0x7a1 -#define CSR_TDATA2 0x7a2 -#define CSR_TDATA3 0x7a3 -#define CSR_TINFO 0x7a4 -#define CSR_TCONTROL 0x7a5 -#define CSR_MCONTEXT 0x7a8 -#define CSR_MSCONTEXT 0x7aa -#define CSR_DCSR 0x7b0 -#define CSR_DPC 0x7b1 -#define CSR_DSCRATCH0 0x7b2 -#define CSR_DSCRATCH1 0x7b3 -#define CSR_MCYCLE 0xb00 -#define CSR_MINSTRET 0xb02 -#define CSR_MHPMCOUNTER3 0xb03 -#define CSR_MHPMCOUNTER4 0xb04 -#define CSR_MHPMCOUNTER5 0xb05 -#define CSR_MHPMCOUNTER6 0xb06 -#define CSR_MHPMCOUNTER7 0xb07 -#define CSR_MHPMCOUNTER8 0xb08 -#define CSR_MHPMCOUNTER9 0xb09 -#define CSR_MHPMCOUNTER10 0xb0a -#define CSR_MHPMCOUNTER11 0xb0b -#define CSR_MHPMCOUNTER12 0xb0c -#define CSR_MHPMCOUNTER13 0xb0d -#define CSR_MHPMCOUNTER14 0xb0e -#define CSR_MHPMCOUNTER15 0xb0f -#define CSR_MHPMCOUNTER16 0xb10 -#define CSR_MHPMCOUNTER17 0xb11 -#define CSR_MHPMCOUNTER18 0xb12 -#define CSR_MHPMCOUNTER19 0xb13 -#define CSR_MHPMCOUNTER20 0xb14 -#define CSR_MHPMCOUNTER21 0xb15 -#define CSR_MHPMCOUNTER22 0xb16 -#define CSR_MHPMCOUNTER23 0xb17 -#define CSR_MHPMCOUNTER24 0xb18 -#define CSR_MHPMCOUNTER25 0xb19 -#define CSR_MHPMCOUNTER26 0xb1a -#define CSR_MHPMCOUNTER27 0xb1b -#define CSR_MHPMCOUNTER28 0xb1c -#define CSR_MHPMCOUNTER29 0xb1d -#define CSR_MHPMCOUNTER30 0xb1e -#define CSR_MHPMCOUNTER31 0xb1f -#define CSR_MHPMEVENT3 0x323 -#define CSR_MHPMEVENT4 0x324 -#define CSR_MHPMEVENT5 0x325 -#define CSR_MHPMEVENT6 0x326 -#define CSR_MHPMEVENT7 0x327 -#define CSR_MHPMEVENT8 0x328 -#define CSR_MHPMEVENT9 0x329 -#define CSR_MHPMEVENT10 0x32a -#define 
CSR_MHPMEVENT11 0x32b -#define CSR_MHPMEVENT12 0x32c -#define CSR_MHPMEVENT13 0x32d -#define CSR_MHPMEVENT14 0x32e -#define CSR_MHPMEVENT15 0x32f -#define CSR_MHPMEVENT16 0x330 -#define CSR_MHPMEVENT17 0x331 -#define CSR_MHPMEVENT18 0x332 -#define CSR_MHPMEVENT19 0x333 -#define CSR_MHPMEVENT20 0x334 -#define CSR_MHPMEVENT21 0x335 -#define CSR_MHPMEVENT22 0x336 -#define CSR_MHPMEVENT23 0x337 -#define CSR_MHPMEVENT24 0x338 -#define CSR_MHPMEVENT25 0x339 -#define CSR_MHPMEVENT26 0x33a -#define CSR_MHPMEVENT27 0x33b -#define CSR_MHPMEVENT28 0x33c -#define CSR_MHPMEVENT29 0x33d -#define CSR_MHPMEVENT30 0x33e -#define CSR_MHPMEVENT31 0x33f -#define CSR_MVENDORID 0xf11 -#define CSR_MARCHID 0xf12 -#define CSR_MIMPID 0xf13 -#define CSR_MHARTID 0xf14 -#define CSR_MCONFIGPTR 0xf15 -#define CSR_MTOPI 0xfb0 -#define CSR_SIEH 0x114 -#define CSR_SIPH 0x154 -#define CSR_STIMECMPH 0x15d -#define CSR_VSIEH 0x214 -#define CSR_VSIPH 0x254 -#define CSR_VSTIMECMPH 0x25d -#define CSR_HTIMEDELTAH 0x615 -#define CSR_HIDELEGH 0x613 -#define CSR_HVIENH 0x618 -#define CSR_HENVCFGH 0x61a -#define CSR_HVIPH 0x655 -#define CSR_HVIPRIO1H 0x656 -#define CSR_HVIPRIO2H 0x657 -#define CSR_HSTATEEN0H 0x61c -#define CSR_HSTATEEN1H 0x61d -#define CSR_HSTATEEN2H 0x61e -#define CSR_HSTATEEN3H 0x61f -#define CSR_CYCLEH 0xc80 -#define CSR_TIMEH 0xc81 -#define CSR_INSTRETH 0xc82 -#define CSR_HPMCOUNTER3H 0xc83 -#define CSR_HPMCOUNTER4H 0xc84 -#define CSR_HPMCOUNTER5H 0xc85 -#define CSR_HPMCOUNTER6H 0xc86 -#define CSR_HPMCOUNTER7H 0xc87 -#define CSR_HPMCOUNTER8H 0xc88 -#define CSR_HPMCOUNTER9H 0xc89 -#define CSR_HPMCOUNTER10H 0xc8a -#define CSR_HPMCOUNTER11H 0xc8b -#define CSR_HPMCOUNTER12H 0xc8c -#define CSR_HPMCOUNTER13H 0xc8d -#define CSR_HPMCOUNTER14H 0xc8e -#define CSR_HPMCOUNTER15H 0xc8f -#define CSR_HPMCOUNTER16H 0xc90 -#define CSR_HPMCOUNTER17H 0xc91 -#define CSR_HPMCOUNTER18H 0xc92 -#define CSR_HPMCOUNTER19H 0xc93 -#define CSR_HPMCOUNTER20H 0xc94 -#define CSR_HPMCOUNTER21H 0xc95 -#define 
CSR_HPMCOUNTER22H 0xc96 -#define CSR_HPMCOUNTER23H 0xc97 -#define CSR_HPMCOUNTER24H 0xc98 -#define CSR_HPMCOUNTER25H 0xc99 -#define CSR_HPMCOUNTER26H 0xc9a -#define CSR_HPMCOUNTER27H 0xc9b -#define CSR_HPMCOUNTER28H 0xc9c -#define CSR_HPMCOUNTER29H 0xc9d -#define CSR_HPMCOUNTER30H 0xc9e -#define CSR_HPMCOUNTER31H 0xc9f -#define CSR_MSTATUSH 0x310 -#define CSR_MIDELEGH 0x313 -#define CSR_MIEH 0x314 -#define CSR_MVIENH 0x318 -#define CSR_MVIPH 0x319 -#define CSR_MENVCFGH 0x31a -#define CSR_MSTATEEN0H 0x31c -#define CSR_MSTATEEN1H 0x31d -#define CSR_MSTATEEN2H 0x31e -#define CSR_MSTATEEN3H 0x31f -#define CSR_MIPH 0x354 -#define CSR_MHPMEVENT3H 0x723 -#define CSR_MHPMEVENT4H 0x724 -#define CSR_MHPMEVENT5H 0x725 -#define CSR_MHPMEVENT6H 0x726 -#define CSR_MHPMEVENT7H 0x727 -#define CSR_MHPMEVENT8H 0x728 -#define CSR_MHPMEVENT9H 0x729 -#define CSR_MHPMEVENT10H 0x72a -#define CSR_MHPMEVENT11H 0x72b -#define CSR_MHPMEVENT12H 0x72c -#define CSR_MHPMEVENT13H 0x72d -#define CSR_MHPMEVENT14H 0x72e -#define CSR_MHPMEVENT15H 0x72f -#define CSR_MHPMEVENT16H 0x730 -#define CSR_MHPMEVENT17H 0x731 -#define CSR_MHPMEVENT18H 0x732 -#define CSR_MHPMEVENT19H 0x733 -#define CSR_MHPMEVENT20H 0x734 -#define CSR_MHPMEVENT21H 0x735 -#define CSR_MHPMEVENT22H 0x736 -#define CSR_MHPMEVENT23H 0x737 -#define CSR_MHPMEVENT24H 0x738 -#define CSR_MHPMEVENT25H 0x739 -#define CSR_MHPMEVENT26H 0x73a -#define CSR_MHPMEVENT27H 0x73b -#define CSR_MHPMEVENT28H 0x73c -#define CSR_MHPMEVENT29H 0x73d -#define CSR_MHPMEVENT30H 0x73e -#define CSR_MHPMEVENT31H 0x73f -#define CSR_MNSCRATCH 0x740 -#define CSR_MNEPC 0x741 -#define CSR_MNCAUSE 0x742 -#define CSR_MNSTATUS 0x744 -#define CSR_MSECCFGH 0x757 -#define CSR_MCYCLEH 0xb80 -#define CSR_MINSTRETH 0xb82 -#define CSR_MHPMCOUNTER3H 0xb83 -#define CSR_MHPMCOUNTER4H 0xb84 -#define CSR_MHPMCOUNTER5H 0xb85 -#define CSR_MHPMCOUNTER6H 0xb86 -#define CSR_MHPMCOUNTER7H 0xb87 -#define CSR_MHPMCOUNTER8H 0xb88 -#define CSR_MHPMCOUNTER9H 0xb89 -#define CSR_MHPMCOUNTER10H 
0xb8a -#define CSR_MHPMCOUNTER11H 0xb8b -#define CSR_MHPMCOUNTER12H 0xb8c -#define CSR_MHPMCOUNTER13H 0xb8d -#define CSR_MHPMCOUNTER14H 0xb8e -#define CSR_MHPMCOUNTER15H 0xb8f -#define CSR_MHPMCOUNTER16H 0xb90 -#define CSR_MHPMCOUNTER17H 0xb91 -#define CSR_MHPMCOUNTER18H 0xb92 -#define CSR_MHPMCOUNTER19H 0xb93 -#define CSR_MHPMCOUNTER20H 0xb94 -#define CSR_MHPMCOUNTER21H 0xb95 -#define CSR_MHPMCOUNTER22H 0xb96 -#define CSR_MHPMCOUNTER23H 0xb97 -#define CSR_MHPMCOUNTER24H 0xb98 -#define CSR_MHPMCOUNTER25H 0xb99 -#define CSR_MHPMCOUNTER26H 0xb9a -#define CSR_MHPMCOUNTER27H 0xb9b -#define CSR_MHPMCOUNTER28H 0xb9c -#define CSR_MHPMCOUNTER29H 0xb9d -#define CSR_MHPMCOUNTER30H 0xb9e -#define CSR_MHPMCOUNTER31H 0xb9f - -#define CAUSE_MISALIGNED_FETCH 0x0 -#define CAUSE_FETCH_ACCESS 0x1 -#define CAUSE_ILLEGAL_INSTRUCTION 0x2 -#define CAUSE_BREAKPOINT 0x3 -#define CAUSE_MISALIGNED_LOAD 0x4 -#define CAUSE_LOAD_ACCESS 0x5 -#define CAUSE_MISALIGNED_STORE 0x6 -#define CAUSE_STORE_ACCESS 0x7 -#define CAUSE_USER_ECALL 0x8 -#define CAUSE_SUPERVISOR_ECALL 0x9 -#define CAUSE_VIRTUAL_SUPERVISOR_ECALL 0xa -#define CAUSE_MACHINE_ECALL 0xb -#define CAUSE_FETCH_PAGE_FAULT 0xc -#define CAUSE_LOAD_PAGE_FAULT 0xd -#define CAUSE_STORE_PAGE_FAULT 0xf -#define CAUSE_FETCH_GUEST_PAGE_FAULT 0x14 -#define CAUSE_LOAD_GUEST_PAGE_FAULT 0x15 -#define CAUSE_VIRTUAL_INSTRUCTION 0x16 -#define CAUSE_STORE_GUEST_PAGE_FAULT 0x17 - -#define INSN_FIELD_RD 0xf80 -#define INSN_FIELD_RT 0xf8000 -#define INSN_FIELD_RS1 0xf8000 -#define INSN_FIELD_RS2 0x1f00000 -#define INSN_FIELD_RS3 0xf8000000 -#define INSN_FIELD_AQRL 0x6000000 -#define INSN_FIELD_AQ 0x4000000 -#define INSN_FIELD_RL 0x2000000 -#define INSN_FIELD_FM 0xf0000000 -#define INSN_FIELD_PRED 0xf000000 -#define INSN_FIELD_SUCC 0xf00000 -#define INSN_FIELD_RM 0x7000 -#define INSN_FIELD_FUNCT3 0x7000 -#define INSN_FIELD_FUNCT2 0x6000000 -#define INSN_FIELD_IMM20 0xfffff000 -#define INSN_FIELD_JIMM20 0xfffff000 -#define INSN_FIELD_IMM12 0xfff00000 -#define 
INSN_FIELD_CSR 0xfff00000 -#define INSN_FIELD_IMM12HI 0xfe000000 -#define INSN_FIELD_BIMM12HI 0xfe000000 -#define INSN_FIELD_IMM12LO 0xf80 -#define INSN_FIELD_BIMM12LO 0xf80 -#define INSN_FIELD_ZIMM 0xf8000 -#define INSN_FIELD_SHAMTQ 0x7f00000 -#define INSN_FIELD_SHAMTW 0x1f00000 -#define INSN_FIELD_SHAMTW4 0xf00000 -#define INSN_FIELD_SHAMTD 0x3f00000 -#define INSN_FIELD_BS 0xc0000000 -#define INSN_FIELD_RNUM 0xf00000 -#define INSN_FIELD_RC 0x3e000000 -#define INSN_FIELD_IMM2 0x300000 -#define INSN_FIELD_IMM3 0x700000 -#define INSN_FIELD_IMM4 0xf00000 -#define INSN_FIELD_IMM5 0x1f00000 -#define INSN_FIELD_IMM6 0x3f00000 -#define INSN_FIELD_OPCODE 0x7f -#define INSN_FIELD_FUNCT7 0xfe000000 -#define INSN_FIELD_VD 0xf80 -#define INSN_FIELD_VS3 0xf80 -#define INSN_FIELD_VS1 0xf8000 -#define INSN_FIELD_VS2 0x1f00000 -#define INSN_FIELD_VM 0x2000000 -#define INSN_FIELD_WD 0x4000000 -#define INSN_FIELD_AMOOP 0xf8000000 -#define INSN_FIELD_NF 0xe0000000 -#define INSN_FIELD_SIMM5 0xf8000 -#define INSN_FIELD_ZIMM10 0x3ff00000 -#define INSN_FIELD_ZIMM11 0x7ff00000 -#define INSN_FIELD_C_NZUIMM10 0x1fe0 -#define INSN_FIELD_C_UIMM7LO 0x60 -#define INSN_FIELD_C_UIMM7HI 0x1c00 -#define INSN_FIELD_C_UIMM8LO 0x60 -#define INSN_FIELD_C_UIMM8HI 0x1c00 -#define INSN_FIELD_C_UIMM9LO 0x60 -#define INSN_FIELD_C_UIMM9HI 0x1c00 -#define INSN_FIELD_C_NZIMM6LO 0x7c -#define INSN_FIELD_C_NZIMM6HI 0x1000 -#define INSN_FIELD_C_IMM6LO 0x7c -#define INSN_FIELD_C_IMM6HI 0x1000 -#define INSN_FIELD_C_NZIMM10HI 0x1000 -#define INSN_FIELD_C_NZIMM10LO 0x7c -#define INSN_FIELD_C_NZIMM18HI 0x1000 -#define INSN_FIELD_C_NZIMM18LO 0x7c -#define INSN_FIELD_C_IMM12 0x1ffc -#define INSN_FIELD_C_BIMM9LO 0x7c -#define INSN_FIELD_C_BIMM9HI 0x1c00 -#define INSN_FIELD_C_NZUIMM5 0x7c -#define INSN_FIELD_C_NZUIMM6LO 0x7c -#define INSN_FIELD_C_NZUIMM6HI 0x1000 -#define INSN_FIELD_C_UIMM8SPLO 0x7c -#define INSN_FIELD_C_UIMM8SPHI 0x1000 -#define INSN_FIELD_C_UIMM8SP_S 0x1f80 -#define INSN_FIELD_C_UIMM10SPLO 0x7c 
-#define INSN_FIELD_C_UIMM10SPHI 0x1000 -#define INSN_FIELD_C_UIMM9SPLO 0x7c -#define INSN_FIELD_C_UIMM9SPHI 0x1000 -#define INSN_FIELD_C_UIMM10SP_S 0x1f80 -#define INSN_FIELD_C_UIMM9SP_S 0x1f80 -#define INSN_FIELD_C_UIMM2 0x60 -#define INSN_FIELD_C_UIMM1 0x20 -#define INSN_FIELD_C_RLIST 0xf0 -#define INSN_FIELD_C_SPIMM 0xc -#define INSN_FIELD_C_INDEX 0x3fc -#define INSN_FIELD_RS1_P 0x380 -#define INSN_FIELD_RS2_P 0x1c -#define INSN_FIELD_RD_P 0x1c -#define INSN_FIELD_RD_RS1_N0 0xf80 -#define INSN_FIELD_RD_RS1_P 0x380 -#define INSN_FIELD_RD_RS1 0xf80 -#define INSN_FIELD_RD_N2 0xf80 -#define INSN_FIELD_RD_N0 0xf80 -#define INSN_FIELD_RS1_N0 0xf80 -#define INSN_FIELD_C_RS2_N0 0x7c -#define INSN_FIELD_C_RS1_N0 0xf80 -#define INSN_FIELD_C_RS2 0x7c -#define INSN_FIELD_C_SREG1 0x380 -#define INSN_FIELD_C_SREG2 0x1c -#endif -#ifdef DECLARE_INSN -DECLARE_INSN(add, MATCH_ADD, MASK_ADD) -DECLARE_INSN(add16, MATCH_ADD16, MASK_ADD16) -DECLARE_INSN(add32, MATCH_ADD32, MASK_ADD32) -DECLARE_INSN(add64, MATCH_ADD64, MASK_ADD64) -DECLARE_INSN(add8, MATCH_ADD8, MASK_ADD8) -DECLARE_INSN(add_uw, MATCH_ADD_UW, MASK_ADD_UW) -DECLARE_INSN(addi, MATCH_ADDI, MASK_ADDI) -DECLARE_INSN(addiw, MATCH_ADDIW, MASK_ADDIW) -DECLARE_INSN(addw, MATCH_ADDW, MASK_ADDW) -DECLARE_INSN(aes32dsi, MATCH_AES32DSI, MASK_AES32DSI) -DECLARE_INSN(aes32dsmi, MATCH_AES32DSMI, MASK_AES32DSMI) -DECLARE_INSN(aes32esi, MATCH_AES32ESI, MASK_AES32ESI) -DECLARE_INSN(aes32esmi, MATCH_AES32ESMI, MASK_AES32ESMI) -DECLARE_INSN(aes64ds, MATCH_AES64DS, MASK_AES64DS) -DECLARE_INSN(aes64dsm, MATCH_AES64DSM, MASK_AES64DSM) -DECLARE_INSN(aes64es, MATCH_AES64ES, MASK_AES64ES) -DECLARE_INSN(aes64esm, MATCH_AES64ESM, MASK_AES64ESM) -DECLARE_INSN(aes64im, MATCH_AES64IM, MASK_AES64IM) -DECLARE_INSN(aes64ks1i, MATCH_AES64KS1I, MASK_AES64KS1I) -DECLARE_INSN(aes64ks2, MATCH_AES64KS2, MASK_AES64KS2) -DECLARE_INSN(amoadd_d, MATCH_AMOADD_D, MASK_AMOADD_D) -DECLARE_INSN(amoadd_w, MATCH_AMOADD_W, MASK_AMOADD_W) -DECLARE_INSN(amoand_d, 
MATCH_AMOAND_D, MASK_AMOAND_D) -DECLARE_INSN(amoand_w, MATCH_AMOAND_W, MASK_AMOAND_W) -DECLARE_INSN(amomax_d, MATCH_AMOMAX_D, MASK_AMOMAX_D) -DECLARE_INSN(amomax_w, MATCH_AMOMAX_W, MASK_AMOMAX_W) -DECLARE_INSN(amomaxu_d, MATCH_AMOMAXU_D, MASK_AMOMAXU_D) -DECLARE_INSN(amomaxu_w, MATCH_AMOMAXU_W, MASK_AMOMAXU_W) -DECLARE_INSN(amomin_d, MATCH_AMOMIN_D, MASK_AMOMIN_D) -DECLARE_INSN(amomin_w, MATCH_AMOMIN_W, MASK_AMOMIN_W) -DECLARE_INSN(amominu_d, MATCH_AMOMINU_D, MASK_AMOMINU_D) -DECLARE_INSN(amominu_w, MATCH_AMOMINU_W, MASK_AMOMINU_W) -DECLARE_INSN(amoor_d, MATCH_AMOOR_D, MASK_AMOOR_D) -DECLARE_INSN(amoor_w, MATCH_AMOOR_W, MASK_AMOOR_W) -DECLARE_INSN(amoswap_d, MATCH_AMOSWAP_D, MASK_AMOSWAP_D) -DECLARE_INSN(amoswap_w, MATCH_AMOSWAP_W, MASK_AMOSWAP_W) -DECLARE_INSN(amoxor_d, MATCH_AMOXOR_D, MASK_AMOXOR_D) -DECLARE_INSN(amoxor_w, MATCH_AMOXOR_W, MASK_AMOXOR_W) -DECLARE_INSN(and, MATCH_AND, MASK_AND) -DECLARE_INSN(andi, MATCH_ANDI, MASK_ANDI) -DECLARE_INSN(andn, MATCH_ANDN, MASK_ANDN) -DECLARE_INSN(auipc, MATCH_AUIPC, MASK_AUIPC) -DECLARE_INSN(ave, MATCH_AVE, MASK_AVE) -DECLARE_INSN(bclr, MATCH_BCLR, MASK_BCLR) -DECLARE_INSN(bclri, MATCH_BCLRI, MASK_BCLRI) -DECLARE_INSN(bcompress, MATCH_BCOMPRESS, MASK_BCOMPRESS) -DECLARE_INSN(bcompressw, MATCH_BCOMPRESSW, MASK_BCOMPRESSW) -DECLARE_INSN(bdecompress, MATCH_BDECOMPRESS, MASK_BDECOMPRESS) -DECLARE_INSN(bdecompressw, MATCH_BDECOMPRESSW, MASK_BDECOMPRESSW) -DECLARE_INSN(beq, MATCH_BEQ, MASK_BEQ) -DECLARE_INSN(bext, MATCH_BEXT, MASK_BEXT) -DECLARE_INSN(bexti, MATCH_BEXTI, MASK_BEXTI) -DECLARE_INSN(bfp, MATCH_BFP, MASK_BFP) -DECLARE_INSN(bfpw, MATCH_BFPW, MASK_BFPW) -DECLARE_INSN(bge, MATCH_BGE, MASK_BGE) -DECLARE_INSN(bgeu, MATCH_BGEU, MASK_BGEU) -DECLARE_INSN(binv, MATCH_BINV, MASK_BINV) -DECLARE_INSN(binvi, MATCH_BINVI, MASK_BINVI) -DECLARE_INSN(blt, MATCH_BLT, MASK_BLT) -DECLARE_INSN(bltu, MATCH_BLTU, MASK_BLTU) -DECLARE_INSN(bmatflip, MATCH_BMATFLIP, MASK_BMATFLIP) -DECLARE_INSN(bmator, MATCH_BMATOR, MASK_BMATOR) 
-DECLARE_INSN(bmatxor, MATCH_BMATXOR, MASK_BMATXOR) -DECLARE_INSN(bne, MATCH_BNE, MASK_BNE) -DECLARE_INSN(bset, MATCH_BSET, MASK_BSET) -DECLARE_INSN(bseti, MATCH_BSETI, MASK_BSETI) -DECLARE_INSN(c_add, MATCH_C_ADD, MASK_C_ADD) -DECLARE_INSN(c_addi, MATCH_C_ADDI, MASK_C_ADDI) -DECLARE_INSN(c_addi16sp, MATCH_C_ADDI16SP, MASK_C_ADDI16SP) -DECLARE_INSN(c_addi4spn, MATCH_C_ADDI4SPN, MASK_C_ADDI4SPN) -DECLARE_INSN(c_addiw, MATCH_C_ADDIW, MASK_C_ADDIW) -DECLARE_INSN(c_addw, MATCH_C_ADDW, MASK_C_ADDW) -DECLARE_INSN(c_and, MATCH_C_AND, MASK_C_AND) -DECLARE_INSN(c_andi, MATCH_C_ANDI, MASK_C_ANDI) -DECLARE_INSN(c_beqz, MATCH_C_BEQZ, MASK_C_BEQZ) -DECLARE_INSN(c_bnez, MATCH_C_BNEZ, MASK_C_BNEZ) -DECLARE_INSN(c_ebreak, MATCH_C_EBREAK, MASK_C_EBREAK) -DECLARE_INSN(c_fld, MATCH_C_FLD, MASK_C_FLD) -DECLARE_INSN(c_fldsp, MATCH_C_FLDSP, MASK_C_FLDSP) -DECLARE_INSN(c_flw, MATCH_C_FLW, MASK_C_FLW) -DECLARE_INSN(c_flwsp, MATCH_C_FLWSP, MASK_C_FLWSP) -DECLARE_INSN(c_fsd, MATCH_C_FSD, MASK_C_FSD) -DECLARE_INSN(c_fsdsp, MATCH_C_FSDSP, MASK_C_FSDSP) -DECLARE_INSN(c_fsw, MATCH_C_FSW, MASK_C_FSW) -DECLARE_INSN(c_fswsp, MATCH_C_FSWSP, MASK_C_FSWSP) -DECLARE_INSN(c_j, MATCH_C_J, MASK_C_J) -DECLARE_INSN(c_jal, MATCH_C_JAL, MASK_C_JAL) -DECLARE_INSN(c_jalr, MATCH_C_JALR, MASK_C_JALR) -DECLARE_INSN(c_jr, MATCH_C_JR, MASK_C_JR) -DECLARE_INSN(c_lbu, MATCH_C_LBU, MASK_C_LBU) -DECLARE_INSN(c_ld, MATCH_C_LD, MASK_C_LD) -DECLARE_INSN(c_ldsp, MATCH_C_LDSP, MASK_C_LDSP) -DECLARE_INSN(c_lh, MATCH_C_LH, MASK_C_LH) -DECLARE_INSN(c_lhu, MATCH_C_LHU, MASK_C_LHU) -DECLARE_INSN(c_li, MATCH_C_LI, MASK_C_LI) -DECLARE_INSN(c_lui, MATCH_C_LUI, MASK_C_LUI) -DECLARE_INSN(c_lw, MATCH_C_LW, MASK_C_LW) -DECLARE_INSN(c_lwsp, MATCH_C_LWSP, MASK_C_LWSP) -DECLARE_INSN(c_mul, MATCH_C_MUL, MASK_C_MUL) -DECLARE_INSN(c_mv, MATCH_C_MV, MASK_C_MV) -DECLARE_INSN(c_nop, MATCH_C_NOP, MASK_C_NOP) -DECLARE_INSN(c_not, MATCH_C_NOT, MASK_C_NOT) -DECLARE_INSN(c_or, MATCH_C_OR, MASK_C_OR) -DECLARE_INSN(c_sb, MATCH_C_SB, MASK_C_SB) 
-DECLARE_INSN(c_sd, MATCH_C_SD, MASK_C_SD) -DECLARE_INSN(c_sdsp, MATCH_C_SDSP, MASK_C_SDSP) -DECLARE_INSN(c_sext_b, MATCH_C_SEXT_B, MASK_C_SEXT_B) -DECLARE_INSN(c_sext_h, MATCH_C_SEXT_H, MASK_C_SEXT_H) -DECLARE_INSN(c_sh, MATCH_C_SH, MASK_C_SH) -DECLARE_INSN(c_slli, MATCH_C_SLLI, MASK_C_SLLI) -DECLARE_INSN(c_srai, MATCH_C_SRAI, MASK_C_SRAI) -DECLARE_INSN(c_srli, MATCH_C_SRLI, MASK_C_SRLI) -DECLARE_INSN(c_sub, MATCH_C_SUB, MASK_C_SUB) -DECLARE_INSN(c_subw, MATCH_C_SUBW, MASK_C_SUBW) -DECLARE_INSN(c_sw, MATCH_C_SW, MASK_C_SW) -DECLARE_INSN(c_swsp, MATCH_C_SWSP, MASK_C_SWSP) -DECLARE_INSN(c_xor, MATCH_C_XOR, MASK_C_XOR) -DECLARE_INSN(c_zext_b, MATCH_C_ZEXT_B, MASK_C_ZEXT_B) -DECLARE_INSN(c_zext_h, MATCH_C_ZEXT_H, MASK_C_ZEXT_H) -DECLARE_INSN(c_zext_w, MATCH_C_ZEXT_W, MASK_C_ZEXT_W) -DECLARE_INSN(cbo_clean, MATCH_CBO_CLEAN, MASK_CBO_CLEAN) -DECLARE_INSN(cbo_flush, MATCH_CBO_FLUSH, MASK_CBO_FLUSH) -DECLARE_INSN(cbo_inval, MATCH_CBO_INVAL, MASK_CBO_INVAL) -DECLARE_INSN(cbo_zero, MATCH_CBO_ZERO, MASK_CBO_ZERO) -DECLARE_INSN(clmul, MATCH_CLMUL, MASK_CLMUL) -DECLARE_INSN(clmulh, MATCH_CLMULH, MASK_CLMULH) -DECLARE_INSN(clmulr, MATCH_CLMULR, MASK_CLMULR) -DECLARE_INSN(clrs16, MATCH_CLRS16, MASK_CLRS16) -DECLARE_INSN(clrs32, MATCH_CLRS32, MASK_CLRS32) -DECLARE_INSN(clrs8, MATCH_CLRS8, MASK_CLRS8) -DECLARE_INSN(clz, MATCH_CLZ, MASK_CLZ) -DECLARE_INSN(clz16, MATCH_CLZ16, MASK_CLZ16) -DECLARE_INSN(clz32, MATCH_CLZ32, MASK_CLZ32) -DECLARE_INSN(clz8, MATCH_CLZ8, MASK_CLZ8) -DECLARE_INSN(clzw, MATCH_CLZW, MASK_CLZW) -DECLARE_INSN(cm_jalt, MATCH_CM_JALT, MASK_CM_JALT) -DECLARE_INSN(cm_mva01s, MATCH_CM_MVA01S, MASK_CM_MVA01S) -DECLARE_INSN(cm_mvsa01, MATCH_CM_MVSA01, MASK_CM_MVSA01) -DECLARE_INSN(cm_pop, MATCH_CM_POP, MASK_CM_POP) -DECLARE_INSN(cm_popret, MATCH_CM_POPRET, MASK_CM_POPRET) -DECLARE_INSN(cm_popretz, MATCH_CM_POPRETZ, MASK_CM_POPRETZ) -DECLARE_INSN(cm_push, MATCH_CM_PUSH, MASK_CM_PUSH) -DECLARE_INSN(cmix, MATCH_CMIX, MASK_CMIX) -DECLARE_INSN(cmov, MATCH_CMOV, MASK_CMOV) 
-DECLARE_INSN(cmpeq16, MATCH_CMPEQ16, MASK_CMPEQ16) -DECLARE_INSN(cmpeq8, MATCH_CMPEQ8, MASK_CMPEQ8) -DECLARE_INSN(cpop, MATCH_CPOP, MASK_CPOP) -DECLARE_INSN(cpopw, MATCH_CPOPW, MASK_CPOPW) -DECLARE_INSN(cras16, MATCH_CRAS16, MASK_CRAS16) -DECLARE_INSN(cras32, MATCH_CRAS32, MASK_CRAS32) -DECLARE_INSN(crc32_b, MATCH_CRC32_B, MASK_CRC32_B) -DECLARE_INSN(crc32_d, MATCH_CRC32_D, MASK_CRC32_D) -DECLARE_INSN(crc32_h, MATCH_CRC32_H, MASK_CRC32_H) -DECLARE_INSN(crc32_w, MATCH_CRC32_W, MASK_CRC32_W) -DECLARE_INSN(crc32c_b, MATCH_CRC32C_B, MASK_CRC32C_B) -DECLARE_INSN(crc32c_d, MATCH_CRC32C_D, MASK_CRC32C_D) -DECLARE_INSN(crc32c_h, MATCH_CRC32C_H, MASK_CRC32C_H) -DECLARE_INSN(crc32c_w, MATCH_CRC32C_W, MASK_CRC32C_W) -DECLARE_INSN(crsa16, MATCH_CRSA16, MASK_CRSA16) -DECLARE_INSN(crsa32, MATCH_CRSA32, MASK_CRSA32) -DECLARE_INSN(csrrc, MATCH_CSRRC, MASK_CSRRC) -DECLARE_INSN(csrrci, MATCH_CSRRCI, MASK_CSRRCI) -DECLARE_INSN(csrrs, MATCH_CSRRS, MASK_CSRRS) -DECLARE_INSN(csrrsi, MATCH_CSRRSI, MASK_CSRRSI) -DECLARE_INSN(csrrw, MATCH_CSRRW, MASK_CSRRW) -DECLARE_INSN(csrrwi, MATCH_CSRRWI, MASK_CSRRWI) -DECLARE_INSN(ctz, MATCH_CTZ, MASK_CTZ) -DECLARE_INSN(ctzw, MATCH_CTZW, MASK_CTZW) -DECLARE_INSN(czero_eqz, MATCH_CZERO_EQZ, MASK_CZERO_EQZ) -DECLARE_INSN(czero_nez, MATCH_CZERO_NEZ, MASK_CZERO_NEZ) -DECLARE_INSN(div, MATCH_DIV, MASK_DIV) -DECLARE_INSN(divu, MATCH_DIVU, MASK_DIVU) -DECLARE_INSN(divuw, MATCH_DIVUW, MASK_DIVUW) -DECLARE_INSN(divw, MATCH_DIVW, MASK_DIVW) -DECLARE_INSN(dret, MATCH_DRET, MASK_DRET) -DECLARE_INSN(ebreak, MATCH_EBREAK, MASK_EBREAK) -DECLARE_INSN(ecall, MATCH_ECALL, MASK_ECALL) -DECLARE_INSN(fadd_d, MATCH_FADD_D, MASK_FADD_D) -DECLARE_INSN(fadd_h, MATCH_FADD_H, MASK_FADD_H) -DECLARE_INSN(fadd_q, MATCH_FADD_Q, MASK_FADD_Q) -DECLARE_INSN(fadd_s, MATCH_FADD_S, MASK_FADD_S) -DECLARE_INSN(fclass_d, MATCH_FCLASS_D, MASK_FCLASS_D) -DECLARE_INSN(fclass_h, MATCH_FCLASS_H, MASK_FCLASS_H) -DECLARE_INSN(fclass_q, MATCH_FCLASS_Q, MASK_FCLASS_Q) -DECLARE_INSN(fclass_s, 
MATCH_FCLASS_S, MASK_FCLASS_S) -DECLARE_INSN(fcvt_d_h, MATCH_FCVT_D_H, MASK_FCVT_D_H) -DECLARE_INSN(fcvt_d_l, MATCH_FCVT_D_L, MASK_FCVT_D_L) -DECLARE_INSN(fcvt_d_lu, MATCH_FCVT_D_LU, MASK_FCVT_D_LU) -DECLARE_INSN(fcvt_d_q, MATCH_FCVT_D_Q, MASK_FCVT_D_Q) -DECLARE_INSN(fcvt_d_s, MATCH_FCVT_D_S, MASK_FCVT_D_S) -DECLARE_INSN(fcvt_d_w, MATCH_FCVT_D_W, MASK_FCVT_D_W) -DECLARE_INSN(fcvt_d_wu, MATCH_FCVT_D_WU, MASK_FCVT_D_WU) -DECLARE_INSN(fcvt_h_d, MATCH_FCVT_H_D, MASK_FCVT_H_D) -DECLARE_INSN(fcvt_h_l, MATCH_FCVT_H_L, MASK_FCVT_H_L) -DECLARE_INSN(fcvt_h_lu, MATCH_FCVT_H_LU, MASK_FCVT_H_LU) -DECLARE_INSN(fcvt_h_q, MATCH_FCVT_H_Q, MASK_FCVT_H_Q) -DECLARE_INSN(fcvt_h_s, MATCH_FCVT_H_S, MASK_FCVT_H_S) -DECLARE_INSN(fcvt_h_w, MATCH_FCVT_H_W, MASK_FCVT_H_W) -DECLARE_INSN(fcvt_h_wu, MATCH_FCVT_H_WU, MASK_FCVT_H_WU) -DECLARE_INSN(fcvt_l_d, MATCH_FCVT_L_D, MASK_FCVT_L_D) -DECLARE_INSN(fcvt_l_h, MATCH_FCVT_L_H, MASK_FCVT_L_H) -DECLARE_INSN(fcvt_l_q, MATCH_FCVT_L_Q, MASK_FCVT_L_Q) -DECLARE_INSN(fcvt_l_s, MATCH_FCVT_L_S, MASK_FCVT_L_S) -DECLARE_INSN(fcvt_lu_d, MATCH_FCVT_LU_D, MASK_FCVT_LU_D) -DECLARE_INSN(fcvt_lu_h, MATCH_FCVT_LU_H, MASK_FCVT_LU_H) -DECLARE_INSN(fcvt_lu_q, MATCH_FCVT_LU_Q, MASK_FCVT_LU_Q) -DECLARE_INSN(fcvt_lu_s, MATCH_FCVT_LU_S, MASK_FCVT_LU_S) -DECLARE_INSN(fcvt_q_d, MATCH_FCVT_Q_D, MASK_FCVT_Q_D) -DECLARE_INSN(fcvt_q_h, MATCH_FCVT_Q_H, MASK_FCVT_Q_H) -DECLARE_INSN(fcvt_q_l, MATCH_FCVT_Q_L, MASK_FCVT_Q_L) -DECLARE_INSN(fcvt_q_lu, MATCH_FCVT_Q_LU, MASK_FCVT_Q_LU) -DECLARE_INSN(fcvt_q_s, MATCH_FCVT_Q_S, MASK_FCVT_Q_S) -DECLARE_INSN(fcvt_q_w, MATCH_FCVT_Q_W, MASK_FCVT_Q_W) -DECLARE_INSN(fcvt_q_wu, MATCH_FCVT_Q_WU, MASK_FCVT_Q_WU) -DECLARE_INSN(fcvt_s_d, MATCH_FCVT_S_D, MASK_FCVT_S_D) -DECLARE_INSN(fcvt_s_h, MATCH_FCVT_S_H, MASK_FCVT_S_H) -DECLARE_INSN(fcvt_s_l, MATCH_FCVT_S_L, MASK_FCVT_S_L) -DECLARE_INSN(fcvt_s_lu, MATCH_FCVT_S_LU, MASK_FCVT_S_LU) -DECLARE_INSN(fcvt_s_q, MATCH_FCVT_S_Q, MASK_FCVT_S_Q) -DECLARE_INSN(fcvt_s_w, MATCH_FCVT_S_W, MASK_FCVT_S_W) 
-DECLARE_INSN(fcvt_s_wu, MATCH_FCVT_S_WU, MASK_FCVT_S_WU) -DECLARE_INSN(fcvt_w_d, MATCH_FCVT_W_D, MASK_FCVT_W_D) -DECLARE_INSN(fcvt_w_h, MATCH_FCVT_W_H, MASK_FCVT_W_H) -DECLARE_INSN(fcvt_w_q, MATCH_FCVT_W_Q, MASK_FCVT_W_Q) -DECLARE_INSN(fcvt_w_s, MATCH_FCVT_W_S, MASK_FCVT_W_S) -DECLARE_INSN(fcvt_wu_d, MATCH_FCVT_WU_D, MASK_FCVT_WU_D) -DECLARE_INSN(fcvt_wu_h, MATCH_FCVT_WU_H, MASK_FCVT_WU_H) -DECLARE_INSN(fcvt_wu_q, MATCH_FCVT_WU_Q, MASK_FCVT_WU_Q) -DECLARE_INSN(fcvt_wu_s, MATCH_FCVT_WU_S, MASK_FCVT_WU_S) -DECLARE_INSN(fdiv_d, MATCH_FDIV_D, MASK_FDIV_D) -DECLARE_INSN(fdiv_h, MATCH_FDIV_H, MASK_FDIV_H) -DECLARE_INSN(fdiv_q, MATCH_FDIV_Q, MASK_FDIV_Q) -DECLARE_INSN(fdiv_s, MATCH_FDIV_S, MASK_FDIV_S) -DECLARE_INSN(fence, MATCH_FENCE, MASK_FENCE) -DECLARE_INSN(fence_i, MATCH_FENCE_I, MASK_FENCE_I) -DECLARE_INSN(feq_d, MATCH_FEQ_D, MASK_FEQ_D) -DECLARE_INSN(feq_h, MATCH_FEQ_H, MASK_FEQ_H) -DECLARE_INSN(feq_q, MATCH_FEQ_Q, MASK_FEQ_Q) -DECLARE_INSN(feq_s, MATCH_FEQ_S, MASK_FEQ_S) -DECLARE_INSN(fld, MATCH_FLD, MASK_FLD) -DECLARE_INSN(fle_d, MATCH_FLE_D, MASK_FLE_D) -DECLARE_INSN(fle_h, MATCH_FLE_H, MASK_FLE_H) -DECLARE_INSN(fle_q, MATCH_FLE_Q, MASK_FLE_Q) -DECLARE_INSN(fle_s, MATCH_FLE_S, MASK_FLE_S) -DECLARE_INSN(flh, MATCH_FLH, MASK_FLH) -DECLARE_INSN(flq, MATCH_FLQ, MASK_FLQ) -DECLARE_INSN(flt_d, MATCH_FLT_D, MASK_FLT_D) -DECLARE_INSN(flt_h, MATCH_FLT_H, MASK_FLT_H) -DECLARE_INSN(flt_q, MATCH_FLT_Q, MASK_FLT_Q) -DECLARE_INSN(flt_s, MATCH_FLT_S, MASK_FLT_S) -DECLARE_INSN(flw, MATCH_FLW, MASK_FLW) -DECLARE_INSN(fmadd_d, MATCH_FMADD_D, MASK_FMADD_D) -DECLARE_INSN(fmadd_h, MATCH_FMADD_H, MASK_FMADD_H) -DECLARE_INSN(fmadd_q, MATCH_FMADD_Q, MASK_FMADD_Q) -DECLARE_INSN(fmadd_s, MATCH_FMADD_S, MASK_FMADD_S) -DECLARE_INSN(fmax_d, MATCH_FMAX_D, MASK_FMAX_D) -DECLARE_INSN(fmax_h, MATCH_FMAX_H, MASK_FMAX_H) -DECLARE_INSN(fmax_q, MATCH_FMAX_Q, MASK_FMAX_Q) -DECLARE_INSN(fmax_s, MATCH_FMAX_S, MASK_FMAX_S) -DECLARE_INSN(fmin_d, MATCH_FMIN_D, MASK_FMIN_D) -DECLARE_INSN(fmin_h, 
MATCH_FMIN_H, MASK_FMIN_H) -DECLARE_INSN(fmin_q, MATCH_FMIN_Q, MASK_FMIN_Q) -DECLARE_INSN(fmin_s, MATCH_FMIN_S, MASK_FMIN_S) -DECLARE_INSN(fmsub_d, MATCH_FMSUB_D, MASK_FMSUB_D) -DECLARE_INSN(fmsub_h, MATCH_FMSUB_H, MASK_FMSUB_H) -DECLARE_INSN(fmsub_q, MATCH_FMSUB_Q, MASK_FMSUB_Q) -DECLARE_INSN(fmsub_s, MATCH_FMSUB_S, MASK_FMSUB_S) -DECLARE_INSN(fmul_d, MATCH_FMUL_D, MASK_FMUL_D) -DECLARE_INSN(fmul_h, MATCH_FMUL_H, MASK_FMUL_H) -DECLARE_INSN(fmul_q, MATCH_FMUL_Q, MASK_FMUL_Q) -DECLARE_INSN(fmul_s, MATCH_FMUL_S, MASK_FMUL_S) -DECLARE_INSN(fmv_d_x, MATCH_FMV_D_X, MASK_FMV_D_X) -DECLARE_INSN(fmv_h_x, MATCH_FMV_H_X, MASK_FMV_H_X) -DECLARE_INSN(fmv_w_x, MATCH_FMV_W_X, MASK_FMV_W_X) -DECLARE_INSN(fmv_x_d, MATCH_FMV_X_D, MASK_FMV_X_D) -DECLARE_INSN(fmv_x_h, MATCH_FMV_X_H, MASK_FMV_X_H) -DECLARE_INSN(fmv_x_w, MATCH_FMV_X_W, MASK_FMV_X_W) -DECLARE_INSN(fnmadd_d, MATCH_FNMADD_D, MASK_FNMADD_D) -DECLARE_INSN(fnmadd_h, MATCH_FNMADD_H, MASK_FNMADD_H) -DECLARE_INSN(fnmadd_q, MATCH_FNMADD_Q, MASK_FNMADD_Q) -DECLARE_INSN(fnmadd_s, MATCH_FNMADD_S, MASK_FNMADD_S) -DECLARE_INSN(fnmsub_d, MATCH_FNMSUB_D, MASK_FNMSUB_D) -DECLARE_INSN(fnmsub_h, MATCH_FNMSUB_H, MASK_FNMSUB_H) -DECLARE_INSN(fnmsub_q, MATCH_FNMSUB_Q, MASK_FNMSUB_Q) -DECLARE_INSN(fnmsub_s, MATCH_FNMSUB_S, MASK_FNMSUB_S) -DECLARE_INSN(fsd, MATCH_FSD, MASK_FSD) -DECLARE_INSN(fsgnj_d, MATCH_FSGNJ_D, MASK_FSGNJ_D) -DECLARE_INSN(fsgnj_h, MATCH_FSGNJ_H, MASK_FSGNJ_H) -DECLARE_INSN(fsgnj_q, MATCH_FSGNJ_Q, MASK_FSGNJ_Q) -DECLARE_INSN(fsgnj_s, MATCH_FSGNJ_S, MASK_FSGNJ_S) -DECLARE_INSN(fsgnjn_d, MATCH_FSGNJN_D, MASK_FSGNJN_D) -DECLARE_INSN(fsgnjn_h, MATCH_FSGNJN_H, MASK_FSGNJN_H) -DECLARE_INSN(fsgnjn_q, MATCH_FSGNJN_Q, MASK_FSGNJN_Q) -DECLARE_INSN(fsgnjn_s, MATCH_FSGNJN_S, MASK_FSGNJN_S) -DECLARE_INSN(fsgnjx_d, MATCH_FSGNJX_D, MASK_FSGNJX_D) -DECLARE_INSN(fsgnjx_h, MATCH_FSGNJX_H, MASK_FSGNJX_H) -DECLARE_INSN(fsgnjx_q, MATCH_FSGNJX_Q, MASK_FSGNJX_Q) -DECLARE_INSN(fsgnjx_s, MATCH_FSGNJX_S, MASK_FSGNJX_S) -DECLARE_INSN(fsh, MATCH_FSH, 
MASK_FSH) -DECLARE_INSN(fsl, MATCH_FSL, MASK_FSL) -DECLARE_INSN(fslw, MATCH_FSLW, MASK_FSLW) -DECLARE_INSN(fsq, MATCH_FSQ, MASK_FSQ) -DECLARE_INSN(fsqrt_d, MATCH_FSQRT_D, MASK_FSQRT_D) -DECLARE_INSN(fsqrt_h, MATCH_FSQRT_H, MASK_FSQRT_H) -DECLARE_INSN(fsqrt_q, MATCH_FSQRT_Q, MASK_FSQRT_Q) -DECLARE_INSN(fsqrt_s, MATCH_FSQRT_S, MASK_FSQRT_S) -DECLARE_INSN(fsr, MATCH_FSR, MASK_FSR) -DECLARE_INSN(fsri, MATCH_FSRI, MASK_FSRI) -DECLARE_INSN(fsriw, MATCH_FSRIW, MASK_FSRIW) -DECLARE_INSN(fsrw, MATCH_FSRW, MASK_FSRW) -DECLARE_INSN(fsub_d, MATCH_FSUB_D, MASK_FSUB_D) -DECLARE_INSN(fsub_h, MATCH_FSUB_H, MASK_FSUB_H) -DECLARE_INSN(fsub_q, MATCH_FSUB_Q, MASK_FSUB_Q) -DECLARE_INSN(fsub_s, MATCH_FSUB_S, MASK_FSUB_S) -DECLARE_INSN(fsw, MATCH_FSW, MASK_FSW) -DECLARE_INSN(gorc, MATCH_GORC, MASK_GORC) -DECLARE_INSN(gorci, MATCH_GORCI, MASK_GORCI) -DECLARE_INSN(gorciw, MATCH_GORCIW, MASK_GORCIW) -DECLARE_INSN(gorcw, MATCH_GORCW, MASK_GORCW) -DECLARE_INSN(grev, MATCH_GREV, MASK_GREV) -DECLARE_INSN(grevi, MATCH_GREVI, MASK_GREVI) -DECLARE_INSN(greviw, MATCH_GREVIW, MASK_GREVIW) -DECLARE_INSN(grevw, MATCH_GREVW, MASK_GREVW) -DECLARE_INSN(hfence_gvma, MATCH_HFENCE_GVMA, MASK_HFENCE_GVMA) -DECLARE_INSN(hfence_vvma, MATCH_HFENCE_VVMA, MASK_HFENCE_VVMA) -DECLARE_INSN(hinval_gvma, MATCH_HINVAL_GVMA, MASK_HINVAL_GVMA) -DECLARE_INSN(hinval_vvma, MATCH_HINVAL_VVMA, MASK_HINVAL_VVMA) -DECLARE_INSN(hlv_b, MATCH_HLV_B, MASK_HLV_B) -DECLARE_INSN(hlv_bu, MATCH_HLV_BU, MASK_HLV_BU) -DECLARE_INSN(hlv_d, MATCH_HLV_D, MASK_HLV_D) -DECLARE_INSN(hlv_h, MATCH_HLV_H, MASK_HLV_H) -DECLARE_INSN(hlv_hu, MATCH_HLV_HU, MASK_HLV_HU) -DECLARE_INSN(hlv_w, MATCH_HLV_W, MASK_HLV_W) -DECLARE_INSN(hlv_wu, MATCH_HLV_WU, MASK_HLV_WU) -DECLARE_INSN(hlvx_hu, MATCH_HLVX_HU, MASK_HLVX_HU) -DECLARE_INSN(hlvx_wu, MATCH_HLVX_WU, MASK_HLVX_WU) -DECLARE_INSN(hsv_b, MATCH_HSV_B, MASK_HSV_B) -DECLARE_INSN(hsv_d, MATCH_HSV_D, MASK_HSV_D) -DECLARE_INSN(hsv_h, MATCH_HSV_H, MASK_HSV_H) -DECLARE_INSN(hsv_w, MATCH_HSV_W, MASK_HSV_W) 
-DECLARE_INSN(insb, MATCH_INSB, MASK_INSB) -DECLARE_INSN(jal, MATCH_JAL, MASK_JAL) -DECLARE_INSN(jalr, MATCH_JALR, MASK_JALR) -DECLARE_INSN(kabs16, MATCH_KABS16, MASK_KABS16) -DECLARE_INSN(kabs32, MATCH_KABS32, MASK_KABS32) -DECLARE_INSN(kabs8, MATCH_KABS8, MASK_KABS8) -DECLARE_INSN(kabsw, MATCH_KABSW, MASK_KABSW) -DECLARE_INSN(kadd16, MATCH_KADD16, MASK_KADD16) -DECLARE_INSN(kadd32, MATCH_KADD32, MASK_KADD32) -DECLARE_INSN(kadd64, MATCH_KADD64, MASK_KADD64) -DECLARE_INSN(kadd8, MATCH_KADD8, MASK_KADD8) -DECLARE_INSN(kaddh, MATCH_KADDH, MASK_KADDH) -DECLARE_INSN(kaddw, MATCH_KADDW, MASK_KADDW) -DECLARE_INSN(kcras16, MATCH_KCRAS16, MASK_KCRAS16) -DECLARE_INSN(kcras32, MATCH_KCRAS32, MASK_KCRAS32) -DECLARE_INSN(kcrsa16, MATCH_KCRSA16, MASK_KCRSA16) -DECLARE_INSN(kcrsa32, MATCH_KCRSA32, MASK_KCRSA32) -DECLARE_INSN(kdmabb, MATCH_KDMABB, MASK_KDMABB) -DECLARE_INSN(kdmabb16, MATCH_KDMABB16, MASK_KDMABB16) -DECLARE_INSN(kdmabt, MATCH_KDMABT, MASK_KDMABT) -DECLARE_INSN(kdmabt16, MATCH_KDMABT16, MASK_KDMABT16) -DECLARE_INSN(kdmatt, MATCH_KDMATT, MASK_KDMATT) -DECLARE_INSN(kdmatt16, MATCH_KDMATT16, MASK_KDMATT16) -DECLARE_INSN(kdmbb, MATCH_KDMBB, MASK_KDMBB) -DECLARE_INSN(kdmbb16, MATCH_KDMBB16, MASK_KDMBB16) -DECLARE_INSN(kdmbt, MATCH_KDMBT, MASK_KDMBT) -DECLARE_INSN(kdmbt16, MATCH_KDMBT16, MASK_KDMBT16) -DECLARE_INSN(kdmtt, MATCH_KDMTT, MASK_KDMTT) -DECLARE_INSN(kdmtt16, MATCH_KDMTT16, MASK_KDMTT16) -DECLARE_INSN(khm16, MATCH_KHM16, MASK_KHM16) -DECLARE_INSN(khm8, MATCH_KHM8, MASK_KHM8) -DECLARE_INSN(khmbb, MATCH_KHMBB, MASK_KHMBB) -DECLARE_INSN(khmbb16, MATCH_KHMBB16, MASK_KHMBB16) -DECLARE_INSN(khmbt, MATCH_KHMBT, MASK_KHMBT) -DECLARE_INSN(khmbt16, MATCH_KHMBT16, MASK_KHMBT16) -DECLARE_INSN(khmtt, MATCH_KHMTT, MASK_KHMTT) -DECLARE_INSN(khmtt16, MATCH_KHMTT16, MASK_KHMTT16) -DECLARE_INSN(khmx16, MATCH_KHMX16, MASK_KHMX16) -DECLARE_INSN(khmx8, MATCH_KHMX8, MASK_KHMX8) -DECLARE_INSN(kmabb, MATCH_KMABB, MASK_KMABB) -DECLARE_INSN(kmabb32, MATCH_KMABB32, MASK_KMABB32) 
-DECLARE_INSN(kmabt, MATCH_KMABT, MASK_KMABT) -DECLARE_INSN(kmabt32, MATCH_KMABT32, MASK_KMABT32) -DECLARE_INSN(kmada, MATCH_KMADA, MASK_KMADA) -DECLARE_INSN(kmadrs, MATCH_KMADRS, MASK_KMADRS) -DECLARE_INSN(kmadrs32, MATCH_KMADRS32, MASK_KMADRS32) -DECLARE_INSN(kmads, MATCH_KMADS, MASK_KMADS) -DECLARE_INSN(kmads32, MATCH_KMADS32, MASK_KMADS32) -DECLARE_INSN(kmar64, MATCH_KMAR64, MASK_KMAR64) -DECLARE_INSN(kmatt, MATCH_KMATT, MASK_KMATT) -DECLARE_INSN(kmatt32, MATCH_KMATT32, MASK_KMATT32) -DECLARE_INSN(kmaxda, MATCH_KMAXDA, MASK_KMAXDA) -DECLARE_INSN(kmaxda32, MATCH_KMAXDA32, MASK_KMAXDA32) -DECLARE_INSN(kmaxds, MATCH_KMAXDS, MASK_KMAXDS) -DECLARE_INSN(kmaxds32, MATCH_KMAXDS32, MASK_KMAXDS32) -DECLARE_INSN(kmda, MATCH_KMDA, MASK_KMDA) -DECLARE_INSN(kmda32, MATCH_KMDA32, MASK_KMDA32) -DECLARE_INSN(kmmac, MATCH_KMMAC, MASK_KMMAC) -DECLARE_INSN(kmmac_u, MATCH_KMMAC_U, MASK_KMMAC_U) -DECLARE_INSN(kmmawb, MATCH_KMMAWB, MASK_KMMAWB) -DECLARE_INSN(kmmawb2, MATCH_KMMAWB2, MASK_KMMAWB2) -DECLARE_INSN(kmmawb2_u, MATCH_KMMAWB2_U, MASK_KMMAWB2_U) -DECLARE_INSN(kmmawb_u, MATCH_KMMAWB_U, MASK_KMMAWB_U) -DECLARE_INSN(kmmawt, MATCH_KMMAWT, MASK_KMMAWT) -DECLARE_INSN(kmmawt2, MATCH_KMMAWT2, MASK_KMMAWT2) -DECLARE_INSN(kmmawt2_u, MATCH_KMMAWT2_U, MASK_KMMAWT2_U) -DECLARE_INSN(kmmawt_u, MATCH_KMMAWT_U, MASK_KMMAWT_U) -DECLARE_INSN(kmmsb, MATCH_KMMSB, MASK_KMMSB) -DECLARE_INSN(kmmsb_u, MATCH_KMMSB_U, MASK_KMMSB_U) -DECLARE_INSN(kmmwb2, MATCH_KMMWB2, MASK_KMMWB2) -DECLARE_INSN(kmmwb2_u, MATCH_KMMWB2_U, MASK_KMMWB2_U) -DECLARE_INSN(kmmwt2, MATCH_KMMWT2, MASK_KMMWT2) -DECLARE_INSN(kmmwt2_u, MATCH_KMMWT2_U, MASK_KMMWT2_U) -DECLARE_INSN(kmsda, MATCH_KMSDA, MASK_KMSDA) -DECLARE_INSN(kmsda32, MATCH_KMSDA32, MASK_KMSDA32) -DECLARE_INSN(kmsr64, MATCH_KMSR64, MASK_KMSR64) -DECLARE_INSN(kmsxda, MATCH_KMSXDA, MASK_KMSXDA) -DECLARE_INSN(kmsxda32, MATCH_KMSXDA32, MASK_KMSXDA32) -DECLARE_INSN(kmxda, MATCH_KMXDA, MASK_KMXDA) -DECLARE_INSN(kmxda32, MATCH_KMXDA32, MASK_KMXDA32) -DECLARE_INSN(ksll16, 
MATCH_KSLL16, MASK_KSLL16) -DECLARE_INSN(ksll32, MATCH_KSLL32, MASK_KSLL32) -DECLARE_INSN(ksll8, MATCH_KSLL8, MASK_KSLL8) -DECLARE_INSN(kslli16, MATCH_KSLLI16, MASK_KSLLI16) -DECLARE_INSN(kslli32, MATCH_KSLLI32, MASK_KSLLI32) -DECLARE_INSN(kslli8, MATCH_KSLLI8, MASK_KSLLI8) -DECLARE_INSN(kslliw, MATCH_KSLLIW, MASK_KSLLIW) -DECLARE_INSN(ksllw, MATCH_KSLLW, MASK_KSLLW) -DECLARE_INSN(kslra16, MATCH_KSLRA16, MASK_KSLRA16) -DECLARE_INSN(kslra16_u, MATCH_KSLRA16_U, MASK_KSLRA16_U) -DECLARE_INSN(kslra32, MATCH_KSLRA32, MASK_KSLRA32) -DECLARE_INSN(kslra32_u, MATCH_KSLRA32_U, MASK_KSLRA32_U) -DECLARE_INSN(kslra8, MATCH_KSLRA8, MASK_KSLRA8) -DECLARE_INSN(kslra8_u, MATCH_KSLRA8_U, MASK_KSLRA8_U) -DECLARE_INSN(kslraw, MATCH_KSLRAW, MASK_KSLRAW) -DECLARE_INSN(kslraw_u, MATCH_KSLRAW_U, MASK_KSLRAW_U) -DECLARE_INSN(kstas16, MATCH_KSTAS16, MASK_KSTAS16) -DECLARE_INSN(kstas32, MATCH_KSTAS32, MASK_KSTAS32) -DECLARE_INSN(kstsa16, MATCH_KSTSA16, MASK_KSTSA16) -DECLARE_INSN(kstsa32, MATCH_KSTSA32, MASK_KSTSA32) -DECLARE_INSN(ksub16, MATCH_KSUB16, MASK_KSUB16) -DECLARE_INSN(ksub32, MATCH_KSUB32, MASK_KSUB32) -DECLARE_INSN(ksub64, MATCH_KSUB64, MASK_KSUB64) -DECLARE_INSN(ksub8, MATCH_KSUB8, MASK_KSUB8) -DECLARE_INSN(ksubh, MATCH_KSUBH, MASK_KSUBH) -DECLARE_INSN(ksubw, MATCH_KSUBW, MASK_KSUBW) -DECLARE_INSN(kwmmul, MATCH_KWMMUL, MASK_KWMMUL) -DECLARE_INSN(kwmmul_u, MATCH_KWMMUL_U, MASK_KWMMUL_U) -DECLARE_INSN(lb, MATCH_LB, MASK_LB) -DECLARE_INSN(lbu, MATCH_LBU, MASK_LBU) -DECLARE_INSN(ld, MATCH_LD, MASK_LD) -DECLARE_INSN(lh, MATCH_LH, MASK_LH) -DECLARE_INSN(lhu, MATCH_LHU, MASK_LHU) -DECLARE_INSN(lr_d, MATCH_LR_D, MASK_LR_D) -DECLARE_INSN(lr_w, MATCH_LR_W, MASK_LR_W) -DECLARE_INSN(lui, MATCH_LUI, MASK_LUI) -DECLARE_INSN(lw, MATCH_LW, MASK_LW) -DECLARE_INSN(lwu, MATCH_LWU, MASK_LWU) -DECLARE_INSN(maddr32, MATCH_MADDR32, MASK_MADDR32) -DECLARE_INSN(max, MATCH_MAX, MASK_MAX) -DECLARE_INSN(maxu, MATCH_MAXU, MASK_MAXU) -DECLARE_INSN(min, MATCH_MIN, MASK_MIN) -DECLARE_INSN(minu, MATCH_MINU, 
MASK_MINU) -DECLARE_INSN(mnret, MATCH_MNRET, MASK_MNRET) -DECLARE_INSN(mret, MATCH_MRET, MASK_MRET) -DECLARE_INSN(msubr32, MATCH_MSUBR32, MASK_MSUBR32) -DECLARE_INSN(mul, MATCH_MUL, MASK_MUL) -DECLARE_INSN(mulh, MATCH_MULH, MASK_MULH) -DECLARE_INSN(mulhsu, MATCH_MULHSU, MASK_MULHSU) -DECLARE_INSN(mulhu, MATCH_MULHU, MASK_MULHU) -DECLARE_INSN(mulr64, MATCH_MULR64, MASK_MULR64) -DECLARE_INSN(mulsr64, MATCH_MULSR64, MASK_MULSR64) -DECLARE_INSN(mulw, MATCH_MULW, MASK_MULW) -DECLARE_INSN(or, MATCH_OR, MASK_OR) -DECLARE_INSN(ori, MATCH_ORI, MASK_ORI) -DECLARE_INSN(orn, MATCH_ORN, MASK_ORN) -DECLARE_INSN(pack, MATCH_PACK, MASK_PACK) -DECLARE_INSN(packh, MATCH_PACKH, MASK_PACKH) -DECLARE_INSN(packu, MATCH_PACKU, MASK_PACKU) -DECLARE_INSN(packuw, MATCH_PACKUW, MASK_PACKUW) -DECLARE_INSN(packw, MATCH_PACKW, MASK_PACKW) -DECLARE_INSN(pause, MATCH_PAUSE, MASK_PAUSE) -DECLARE_INSN(pbsad, MATCH_PBSAD, MASK_PBSAD) -DECLARE_INSN(pbsada, MATCH_PBSADA, MASK_PBSADA) -DECLARE_INSN(pkbb16, MATCH_PKBB16, MASK_PKBB16) -DECLARE_INSN(pkbt16, MATCH_PKBT16, MASK_PKBT16) -DECLARE_INSN(pkbt32, MATCH_PKBT32, MASK_PKBT32) -DECLARE_INSN(pktb16, MATCH_PKTB16, MASK_PKTB16) -DECLARE_INSN(pktb32, MATCH_PKTB32, MASK_PKTB32) -DECLARE_INSN(pktt16, MATCH_PKTT16, MASK_PKTT16) -DECLARE_INSN(prefetch_i, MATCH_PREFETCH_I, MASK_PREFETCH_I) -DECLARE_INSN(prefetch_r, MATCH_PREFETCH_R, MASK_PREFETCH_R) -DECLARE_INSN(prefetch_w, MATCH_PREFETCH_W, MASK_PREFETCH_W) -DECLARE_INSN(radd16, MATCH_RADD16, MASK_RADD16) -DECLARE_INSN(radd32, MATCH_RADD32, MASK_RADD32) -DECLARE_INSN(radd64, MATCH_RADD64, MASK_RADD64) -DECLARE_INSN(radd8, MATCH_RADD8, MASK_RADD8) -DECLARE_INSN(raddw, MATCH_RADDW, MASK_RADDW) -DECLARE_INSN(rcras16, MATCH_RCRAS16, MASK_RCRAS16) -DECLARE_INSN(rcras32, MATCH_RCRAS32, MASK_RCRAS32) -DECLARE_INSN(rcrsa16, MATCH_RCRSA16, MASK_RCRSA16) -DECLARE_INSN(rcrsa32, MATCH_RCRSA32, MASK_RCRSA32) -DECLARE_INSN(rem, MATCH_REM, MASK_REM) -DECLARE_INSN(remu, MATCH_REMU, MASK_REMU) -DECLARE_INSN(remuw, 
MATCH_REMUW, MASK_REMUW) -DECLARE_INSN(remw, MATCH_REMW, MASK_REMW) -DECLARE_INSN(rol, MATCH_ROL, MASK_ROL) -DECLARE_INSN(rolw, MATCH_ROLW, MASK_ROLW) -DECLARE_INSN(ror, MATCH_ROR, MASK_ROR) -DECLARE_INSN(rori, MATCH_RORI, MASK_RORI) -DECLARE_INSN(roriw, MATCH_RORIW, MASK_RORIW) -DECLARE_INSN(rorw, MATCH_RORW, MASK_RORW) -DECLARE_INSN(rstas16, MATCH_RSTAS16, MASK_RSTAS16) -DECLARE_INSN(rstas32, MATCH_RSTAS32, MASK_RSTAS32) -DECLARE_INSN(rstsa16, MATCH_RSTSA16, MASK_RSTSA16) -DECLARE_INSN(rstsa32, MATCH_RSTSA32, MASK_RSTSA32) -DECLARE_INSN(rsub16, MATCH_RSUB16, MASK_RSUB16) -DECLARE_INSN(rsub32, MATCH_RSUB32, MASK_RSUB32) -DECLARE_INSN(rsub64, MATCH_RSUB64, MASK_RSUB64) -DECLARE_INSN(rsub8, MATCH_RSUB8, MASK_RSUB8) -DECLARE_INSN(rsubw, MATCH_RSUBW, MASK_RSUBW) -DECLARE_INSN(sb, MATCH_SB, MASK_SB) -DECLARE_INSN(sc_d, MATCH_SC_D, MASK_SC_D) -DECLARE_INSN(sc_w, MATCH_SC_W, MASK_SC_W) -DECLARE_INSN(sclip16, MATCH_SCLIP16, MASK_SCLIP16) -DECLARE_INSN(sclip32, MATCH_SCLIP32, MASK_SCLIP32) -DECLARE_INSN(sclip8, MATCH_SCLIP8, MASK_SCLIP8) -DECLARE_INSN(scmple16, MATCH_SCMPLE16, MASK_SCMPLE16) -DECLARE_INSN(scmple8, MATCH_SCMPLE8, MASK_SCMPLE8) -DECLARE_INSN(scmplt16, MATCH_SCMPLT16, MASK_SCMPLT16) -DECLARE_INSN(scmplt8, MATCH_SCMPLT8, MASK_SCMPLT8) -DECLARE_INSN(sd, MATCH_SD, MASK_SD) -DECLARE_INSN(sext_b, MATCH_SEXT_B, MASK_SEXT_B) -DECLARE_INSN(sext_h, MATCH_SEXT_H, MASK_SEXT_H) -DECLARE_INSN(sfence_inval_ir, MATCH_SFENCE_INVAL_IR, MASK_SFENCE_INVAL_IR) -DECLARE_INSN(sfence_vma, MATCH_SFENCE_VMA, MASK_SFENCE_VMA) -DECLARE_INSN(sfence_w_inval, MATCH_SFENCE_W_INVAL, MASK_SFENCE_W_INVAL) -DECLARE_INSN(sh, MATCH_SH, MASK_SH) -DECLARE_INSN(sh1add, MATCH_SH1ADD, MASK_SH1ADD) -DECLARE_INSN(sh1add_uw, MATCH_SH1ADD_UW, MASK_SH1ADD_UW) -DECLARE_INSN(sh2add, MATCH_SH2ADD, MASK_SH2ADD) -DECLARE_INSN(sh2add_uw, MATCH_SH2ADD_UW, MASK_SH2ADD_UW) -DECLARE_INSN(sh3add, MATCH_SH3ADD, MASK_SH3ADD) -DECLARE_INSN(sh3add_uw, MATCH_SH3ADD_UW, MASK_SH3ADD_UW) -DECLARE_INSN(sha256sig0, 
MATCH_SHA256SIG0, MASK_SHA256SIG0) -DECLARE_INSN(sha256sig1, MATCH_SHA256SIG1, MASK_SHA256SIG1) -DECLARE_INSN(sha256sum0, MATCH_SHA256SUM0, MASK_SHA256SUM0) -DECLARE_INSN(sha256sum1, MATCH_SHA256SUM1, MASK_SHA256SUM1) -DECLARE_INSN(sha512sig0, MATCH_SHA512SIG0, MASK_SHA512SIG0) -DECLARE_INSN(sha512sig0h, MATCH_SHA512SIG0H, MASK_SHA512SIG0H) -DECLARE_INSN(sha512sig0l, MATCH_SHA512SIG0L, MASK_SHA512SIG0L) -DECLARE_INSN(sha512sig1, MATCH_SHA512SIG1, MASK_SHA512SIG1) -DECLARE_INSN(sha512sig1h, MATCH_SHA512SIG1H, MASK_SHA512SIG1H) -DECLARE_INSN(sha512sig1l, MATCH_SHA512SIG1L, MASK_SHA512SIG1L) -DECLARE_INSN(sha512sum0, MATCH_SHA512SUM0, MASK_SHA512SUM0) -DECLARE_INSN(sha512sum0r, MATCH_SHA512SUM0R, MASK_SHA512SUM0R) -DECLARE_INSN(sha512sum1, MATCH_SHA512SUM1, MASK_SHA512SUM1) -DECLARE_INSN(sha512sum1r, MATCH_SHA512SUM1R, MASK_SHA512SUM1R) -DECLARE_INSN(shfl, MATCH_SHFL, MASK_SHFL) -DECLARE_INSN(shfli, MATCH_SHFLI, MASK_SHFLI) -DECLARE_INSN(shflw, MATCH_SHFLW, MASK_SHFLW) -DECLARE_INSN(sinval_vma, MATCH_SINVAL_VMA, MASK_SINVAL_VMA) -DECLARE_INSN(sll, MATCH_SLL, MASK_SLL) -DECLARE_INSN(sll16, MATCH_SLL16, MASK_SLL16) -DECLARE_INSN(sll32, MATCH_SLL32, MASK_SLL32) -DECLARE_INSN(sll8, MATCH_SLL8, MASK_SLL8) -DECLARE_INSN(slli, MATCH_SLLI, MASK_SLLI) -DECLARE_INSN(slli16, MATCH_SLLI16, MASK_SLLI16) -DECLARE_INSN(slli32, MATCH_SLLI32, MASK_SLLI32) -DECLARE_INSN(slli8, MATCH_SLLI8, MASK_SLLI8) -DECLARE_INSN(slli_rv32, MATCH_SLLI_RV32, MASK_SLLI_RV32) -DECLARE_INSN(slli_uw, MATCH_SLLI_UW, MASK_SLLI_UW) -DECLARE_INSN(slliw, MATCH_SLLIW, MASK_SLLIW) -DECLARE_INSN(sllw, MATCH_SLLW, MASK_SLLW) -DECLARE_INSN(slo, MATCH_SLO, MASK_SLO) -DECLARE_INSN(sloi, MATCH_SLOI, MASK_SLOI) -DECLARE_INSN(sloiw, MATCH_SLOIW, MASK_SLOIW) -DECLARE_INSN(slow, MATCH_SLOW, MASK_SLOW) -DECLARE_INSN(slt, MATCH_SLT, MASK_SLT) -DECLARE_INSN(slti, MATCH_SLTI, MASK_SLTI) -DECLARE_INSN(sltiu, MATCH_SLTIU, MASK_SLTIU) -DECLARE_INSN(sltu, MATCH_SLTU, MASK_SLTU) -DECLARE_INSN(sm3p0, MATCH_SM3P0, MASK_SM3P0) 
-DECLARE_INSN(sm3p1, MATCH_SM3P1, MASK_SM3P1) -DECLARE_INSN(sm4ed, MATCH_SM4ED, MASK_SM4ED) -DECLARE_INSN(sm4ks, MATCH_SM4KS, MASK_SM4KS) -DECLARE_INSN(smal, MATCH_SMAL, MASK_SMAL) -DECLARE_INSN(smalbb, MATCH_SMALBB, MASK_SMALBB) -DECLARE_INSN(smalbt, MATCH_SMALBT, MASK_SMALBT) -DECLARE_INSN(smalda, MATCH_SMALDA, MASK_SMALDA) -DECLARE_INSN(smaldrs, MATCH_SMALDRS, MASK_SMALDRS) -DECLARE_INSN(smalds, MATCH_SMALDS, MASK_SMALDS) -DECLARE_INSN(smaltt, MATCH_SMALTT, MASK_SMALTT) -DECLARE_INSN(smalxda, MATCH_SMALXDA, MASK_SMALXDA) -DECLARE_INSN(smalxds, MATCH_SMALXDS, MASK_SMALXDS) -DECLARE_INSN(smaqa, MATCH_SMAQA, MASK_SMAQA) -DECLARE_INSN(smaqa_su, MATCH_SMAQA_SU, MASK_SMAQA_SU) -DECLARE_INSN(smar64, MATCH_SMAR64, MASK_SMAR64) -DECLARE_INSN(smax16, MATCH_SMAX16, MASK_SMAX16) -DECLARE_INSN(smax32, MATCH_SMAX32, MASK_SMAX32) -DECLARE_INSN(smax8, MATCH_SMAX8, MASK_SMAX8) -DECLARE_INSN(smbb16, MATCH_SMBB16, MASK_SMBB16) -DECLARE_INSN(smbt16, MATCH_SMBT16, MASK_SMBT16) -DECLARE_INSN(smbt32, MATCH_SMBT32, MASK_SMBT32) -DECLARE_INSN(smdrs, MATCH_SMDRS, MASK_SMDRS) -DECLARE_INSN(smdrs32, MATCH_SMDRS32, MASK_SMDRS32) -DECLARE_INSN(smds, MATCH_SMDS, MASK_SMDS) -DECLARE_INSN(smds32, MATCH_SMDS32, MASK_SMDS32) -DECLARE_INSN(smin16, MATCH_SMIN16, MASK_SMIN16) -DECLARE_INSN(smin32, MATCH_SMIN32, MASK_SMIN32) -DECLARE_INSN(smin8, MATCH_SMIN8, MASK_SMIN8) -DECLARE_INSN(smmul, MATCH_SMMUL, MASK_SMMUL) -DECLARE_INSN(smmul_u, MATCH_SMMUL_U, MASK_SMMUL_U) -DECLARE_INSN(smmwb, MATCH_SMMWB, MASK_SMMWB) -DECLARE_INSN(smmwb_u, MATCH_SMMWB_U, MASK_SMMWB_U) -DECLARE_INSN(smmwt, MATCH_SMMWT, MASK_SMMWT) -DECLARE_INSN(smmwt_u, MATCH_SMMWT_U, MASK_SMMWT_U) -DECLARE_INSN(smslda, MATCH_SMSLDA, MASK_SMSLDA) -DECLARE_INSN(smslxda, MATCH_SMSLXDA, MASK_SMSLXDA) -DECLARE_INSN(smsr64, MATCH_SMSR64, MASK_SMSR64) -DECLARE_INSN(smtt16, MATCH_SMTT16, MASK_SMTT16) -DECLARE_INSN(smtt32, MATCH_SMTT32, MASK_SMTT32) -DECLARE_INSN(smul16, MATCH_SMUL16, MASK_SMUL16) -DECLARE_INSN(smul8, MATCH_SMUL8, MASK_SMUL8) 
-DECLARE_INSN(smulx16, MATCH_SMULX16, MASK_SMULX16) -DECLARE_INSN(smulx8, MATCH_SMULX8, MASK_SMULX8) -DECLARE_INSN(smxds, MATCH_SMXDS, MASK_SMXDS) -DECLARE_INSN(smxds32, MATCH_SMXDS32, MASK_SMXDS32) -DECLARE_INSN(sra, MATCH_SRA, MASK_SRA) -DECLARE_INSN(sra16, MATCH_SRA16, MASK_SRA16) -DECLARE_INSN(sra16_u, MATCH_SRA16_U, MASK_SRA16_U) -DECLARE_INSN(sra32, MATCH_SRA32, MASK_SRA32) -DECLARE_INSN(sra32_u, MATCH_SRA32_U, MASK_SRA32_U) -DECLARE_INSN(sra8, MATCH_SRA8, MASK_SRA8) -DECLARE_INSN(sra8_u, MATCH_SRA8_U, MASK_SRA8_U) -DECLARE_INSN(sra_u, MATCH_SRA_U, MASK_SRA_U) -DECLARE_INSN(srai, MATCH_SRAI, MASK_SRAI) -DECLARE_INSN(srai16, MATCH_SRAI16, MASK_SRAI16) -DECLARE_INSN(srai16_u, MATCH_SRAI16_U, MASK_SRAI16_U) -DECLARE_INSN(srai32, MATCH_SRAI32, MASK_SRAI32) -DECLARE_INSN(srai32_u, MATCH_SRAI32_U, MASK_SRAI32_U) -DECLARE_INSN(srai8, MATCH_SRAI8, MASK_SRAI8) -DECLARE_INSN(srai8_u, MATCH_SRAI8_U, MASK_SRAI8_U) -DECLARE_INSN(srai_rv32, MATCH_SRAI_RV32, MASK_SRAI_RV32) -DECLARE_INSN(srai_u, MATCH_SRAI_U, MASK_SRAI_U) -DECLARE_INSN(sraiw, MATCH_SRAIW, MASK_SRAIW) -DECLARE_INSN(sraiw_u, MATCH_SRAIW_U, MASK_SRAIW_U) -DECLARE_INSN(sraw, MATCH_SRAW, MASK_SRAW) -DECLARE_INSN(sret, MATCH_SRET, MASK_SRET) -DECLARE_INSN(srl, MATCH_SRL, MASK_SRL) -DECLARE_INSN(srl16, MATCH_SRL16, MASK_SRL16) -DECLARE_INSN(srl16_u, MATCH_SRL16_U, MASK_SRL16_U) -DECLARE_INSN(srl32, MATCH_SRL32, MASK_SRL32) -DECLARE_INSN(srl32_u, MATCH_SRL32_U, MASK_SRL32_U) -DECLARE_INSN(srl8, MATCH_SRL8, MASK_SRL8) -DECLARE_INSN(srl8_u, MATCH_SRL8_U, MASK_SRL8_U) -DECLARE_INSN(srli, MATCH_SRLI, MASK_SRLI) -DECLARE_INSN(srli16, MATCH_SRLI16, MASK_SRLI16) -DECLARE_INSN(srli16_u, MATCH_SRLI16_U, MASK_SRLI16_U) -DECLARE_INSN(srli32, MATCH_SRLI32, MASK_SRLI32) -DECLARE_INSN(srli32_u, MATCH_SRLI32_U, MASK_SRLI32_U) -DECLARE_INSN(srli8, MATCH_SRLI8, MASK_SRLI8) -DECLARE_INSN(srli8_u, MATCH_SRLI8_U, MASK_SRLI8_U) -DECLARE_INSN(srli_rv32, MATCH_SRLI_RV32, MASK_SRLI_RV32) -DECLARE_INSN(srliw, MATCH_SRLIW, MASK_SRLIW) 
-DECLARE_INSN(srlw, MATCH_SRLW, MASK_SRLW) -DECLARE_INSN(sro, MATCH_SRO, MASK_SRO) -DECLARE_INSN(sroi, MATCH_SROI, MASK_SROI) -DECLARE_INSN(sroiw, MATCH_SROIW, MASK_SROIW) -DECLARE_INSN(srow, MATCH_SROW, MASK_SROW) -DECLARE_INSN(stas16, MATCH_STAS16, MASK_STAS16) -DECLARE_INSN(stas32, MATCH_STAS32, MASK_STAS32) -DECLARE_INSN(stsa16, MATCH_STSA16, MASK_STSA16) -DECLARE_INSN(stsa32, MATCH_STSA32, MASK_STSA32) -DECLARE_INSN(sub, MATCH_SUB, MASK_SUB) -DECLARE_INSN(sub16, MATCH_SUB16, MASK_SUB16) -DECLARE_INSN(sub32, MATCH_SUB32, MASK_SUB32) -DECLARE_INSN(sub64, MATCH_SUB64, MASK_SUB64) -DECLARE_INSN(sub8, MATCH_SUB8, MASK_SUB8) -DECLARE_INSN(subw, MATCH_SUBW, MASK_SUBW) -DECLARE_INSN(sunpkd810, MATCH_SUNPKD810, MASK_SUNPKD810) -DECLARE_INSN(sunpkd820, MATCH_SUNPKD820, MASK_SUNPKD820) -DECLARE_INSN(sunpkd830, MATCH_SUNPKD830, MASK_SUNPKD830) -DECLARE_INSN(sunpkd831, MATCH_SUNPKD831, MASK_SUNPKD831) -DECLARE_INSN(sunpkd832, MATCH_SUNPKD832, MASK_SUNPKD832) -DECLARE_INSN(sw, MATCH_SW, MASK_SW) -DECLARE_INSN(uclip16, MATCH_UCLIP16, MASK_UCLIP16) -DECLARE_INSN(uclip32, MATCH_UCLIP32, MASK_UCLIP32) -DECLARE_INSN(uclip8, MATCH_UCLIP8, MASK_UCLIP8) -DECLARE_INSN(ucmple16, MATCH_UCMPLE16, MASK_UCMPLE16) -DECLARE_INSN(ucmple8, MATCH_UCMPLE8, MASK_UCMPLE8) -DECLARE_INSN(ucmplt16, MATCH_UCMPLT16, MASK_UCMPLT16) -DECLARE_INSN(ucmplt8, MATCH_UCMPLT8, MASK_UCMPLT8) -DECLARE_INSN(ukadd16, MATCH_UKADD16, MASK_UKADD16) -DECLARE_INSN(ukadd32, MATCH_UKADD32, MASK_UKADD32) -DECLARE_INSN(ukadd64, MATCH_UKADD64, MASK_UKADD64) -DECLARE_INSN(ukadd8, MATCH_UKADD8, MASK_UKADD8) -DECLARE_INSN(ukaddh, MATCH_UKADDH, MASK_UKADDH) -DECLARE_INSN(ukaddw, MATCH_UKADDW, MASK_UKADDW) -DECLARE_INSN(ukcras16, MATCH_UKCRAS16, MASK_UKCRAS16) -DECLARE_INSN(ukcras32, MATCH_UKCRAS32, MASK_UKCRAS32) -DECLARE_INSN(ukcrsa16, MATCH_UKCRSA16, MASK_UKCRSA16) -DECLARE_INSN(ukcrsa32, MATCH_UKCRSA32, MASK_UKCRSA32) -DECLARE_INSN(ukmar64, MATCH_UKMAR64, MASK_UKMAR64) -DECLARE_INSN(ukmsr64, MATCH_UKMSR64, MASK_UKMSR64) 
-DECLARE_INSN(ukstas16, MATCH_UKSTAS16, MASK_UKSTAS16) -DECLARE_INSN(ukstas32, MATCH_UKSTAS32, MASK_UKSTAS32) -DECLARE_INSN(ukstsa16, MATCH_UKSTSA16, MASK_UKSTSA16) -DECLARE_INSN(ukstsa32, MATCH_UKSTSA32, MASK_UKSTSA32) -DECLARE_INSN(uksub16, MATCH_UKSUB16, MASK_UKSUB16) -DECLARE_INSN(uksub32, MATCH_UKSUB32, MASK_UKSUB32) -DECLARE_INSN(uksub64, MATCH_UKSUB64, MASK_UKSUB64) -DECLARE_INSN(uksub8, MATCH_UKSUB8, MASK_UKSUB8) -DECLARE_INSN(uksubh, MATCH_UKSUBH, MASK_UKSUBH) -DECLARE_INSN(uksubw, MATCH_UKSUBW, MASK_UKSUBW) -DECLARE_INSN(umaqa, MATCH_UMAQA, MASK_UMAQA) -DECLARE_INSN(umar64, MATCH_UMAR64, MASK_UMAR64) -DECLARE_INSN(umax16, MATCH_UMAX16, MASK_UMAX16) -DECLARE_INSN(umax32, MATCH_UMAX32, MASK_UMAX32) -DECLARE_INSN(umax8, MATCH_UMAX8, MASK_UMAX8) -DECLARE_INSN(umin16, MATCH_UMIN16, MASK_UMIN16) -DECLARE_INSN(umin32, MATCH_UMIN32, MASK_UMIN32) -DECLARE_INSN(umin8, MATCH_UMIN8, MASK_UMIN8) -DECLARE_INSN(umsr64, MATCH_UMSR64, MASK_UMSR64) -DECLARE_INSN(umul16, MATCH_UMUL16, MASK_UMUL16) -DECLARE_INSN(umul8, MATCH_UMUL8, MASK_UMUL8) -DECLARE_INSN(umulx16, MATCH_UMULX16, MASK_UMULX16) -DECLARE_INSN(umulx8, MATCH_UMULX8, MASK_UMULX8) -DECLARE_INSN(unshfl, MATCH_UNSHFL, MASK_UNSHFL) -DECLARE_INSN(unshfli, MATCH_UNSHFLI, MASK_UNSHFLI) -DECLARE_INSN(unshflw, MATCH_UNSHFLW, MASK_UNSHFLW) -DECLARE_INSN(uradd16, MATCH_URADD16, MASK_URADD16) -DECLARE_INSN(uradd32, MATCH_URADD32, MASK_URADD32) -DECLARE_INSN(uradd64, MATCH_URADD64, MASK_URADD64) -DECLARE_INSN(uradd8, MATCH_URADD8, MASK_URADD8) -DECLARE_INSN(uraddw, MATCH_URADDW, MASK_URADDW) -DECLARE_INSN(urcras16, MATCH_URCRAS16, MASK_URCRAS16) -DECLARE_INSN(urcras32, MATCH_URCRAS32, MASK_URCRAS32) -DECLARE_INSN(urcrsa16, MATCH_URCRSA16, MASK_URCRSA16) -DECLARE_INSN(urcrsa32, MATCH_URCRSA32, MASK_URCRSA32) -DECLARE_INSN(urstas16, MATCH_URSTAS16, MASK_URSTAS16) -DECLARE_INSN(urstas32, MATCH_URSTAS32, MASK_URSTAS32) -DECLARE_INSN(urstsa16, MATCH_URSTSA16, MASK_URSTSA16) -DECLARE_INSN(urstsa32, MATCH_URSTSA32, MASK_URSTSA32) 
-DECLARE_INSN(ursub16, MATCH_URSUB16, MASK_URSUB16) -DECLARE_INSN(ursub32, MATCH_URSUB32, MASK_URSUB32) -DECLARE_INSN(ursub64, MATCH_URSUB64, MASK_URSUB64) -DECLARE_INSN(ursub8, MATCH_URSUB8, MASK_URSUB8) -DECLARE_INSN(ursubw, MATCH_URSUBW, MASK_URSUBW) -DECLARE_INSN(vaadd_vv, MATCH_VAADD_VV, MASK_VAADD_VV) -DECLARE_INSN(vaadd_vx, MATCH_VAADD_VX, MASK_VAADD_VX) -DECLARE_INSN(vaaddu_vv, MATCH_VAADDU_VV, MASK_VAADDU_VV) -DECLARE_INSN(vaaddu_vx, MATCH_VAADDU_VX, MASK_VAADDU_VX) -DECLARE_INSN(vadc_vim, MATCH_VADC_VIM, MASK_VADC_VIM) -DECLARE_INSN(vadc_vvm, MATCH_VADC_VVM, MASK_VADC_VVM) -DECLARE_INSN(vadc_vxm, MATCH_VADC_VXM, MASK_VADC_VXM) -DECLARE_INSN(vadd_vi, MATCH_VADD_VI, MASK_VADD_VI) -DECLARE_INSN(vadd_vv, MATCH_VADD_VV, MASK_VADD_VV) -DECLARE_INSN(vadd_vx, MATCH_VADD_VX, MASK_VADD_VX) -DECLARE_INSN(vamoaddei16_v, MATCH_VAMOADDEI16_V, MASK_VAMOADDEI16_V) -DECLARE_INSN(vamoaddei32_v, MATCH_VAMOADDEI32_V, MASK_VAMOADDEI32_V) -DECLARE_INSN(vamoaddei64_v, MATCH_VAMOADDEI64_V, MASK_VAMOADDEI64_V) -DECLARE_INSN(vamoaddei8_v, MATCH_VAMOADDEI8_V, MASK_VAMOADDEI8_V) -DECLARE_INSN(vamoandei16_v, MATCH_VAMOANDEI16_V, MASK_VAMOANDEI16_V) -DECLARE_INSN(vamoandei32_v, MATCH_VAMOANDEI32_V, MASK_VAMOANDEI32_V) -DECLARE_INSN(vamoandei64_v, MATCH_VAMOANDEI64_V, MASK_VAMOANDEI64_V) -DECLARE_INSN(vamoandei8_v, MATCH_VAMOANDEI8_V, MASK_VAMOANDEI8_V) -DECLARE_INSN(vamomaxei16_v, MATCH_VAMOMAXEI16_V, MASK_VAMOMAXEI16_V) -DECLARE_INSN(vamomaxei32_v, MATCH_VAMOMAXEI32_V, MASK_VAMOMAXEI32_V) -DECLARE_INSN(vamomaxei64_v, MATCH_VAMOMAXEI64_V, MASK_VAMOMAXEI64_V) -DECLARE_INSN(vamomaxei8_v, MATCH_VAMOMAXEI8_V, MASK_VAMOMAXEI8_V) -DECLARE_INSN(vamomaxuei16_v, MATCH_VAMOMAXUEI16_V, MASK_VAMOMAXUEI16_V) -DECLARE_INSN(vamomaxuei32_v, MATCH_VAMOMAXUEI32_V, MASK_VAMOMAXUEI32_V) -DECLARE_INSN(vamomaxuei64_v, MATCH_VAMOMAXUEI64_V, MASK_VAMOMAXUEI64_V) -DECLARE_INSN(vamomaxuei8_v, MATCH_VAMOMAXUEI8_V, MASK_VAMOMAXUEI8_V) -DECLARE_INSN(vamominei16_v, MATCH_VAMOMINEI16_V, MASK_VAMOMINEI16_V) 
-DECLARE_INSN(vamominei32_v, MATCH_VAMOMINEI32_V, MASK_VAMOMINEI32_V) -DECLARE_INSN(vamominei64_v, MATCH_VAMOMINEI64_V, MASK_VAMOMINEI64_V) -DECLARE_INSN(vamominei8_v, MATCH_VAMOMINEI8_V, MASK_VAMOMINEI8_V) -DECLARE_INSN(vamominuei16_v, MATCH_VAMOMINUEI16_V, MASK_VAMOMINUEI16_V) -DECLARE_INSN(vamominuei32_v, MATCH_VAMOMINUEI32_V, MASK_VAMOMINUEI32_V) -DECLARE_INSN(vamominuei64_v, MATCH_VAMOMINUEI64_V, MASK_VAMOMINUEI64_V) -DECLARE_INSN(vamominuei8_v, MATCH_VAMOMINUEI8_V, MASK_VAMOMINUEI8_V) -DECLARE_INSN(vamoorei16_v, MATCH_VAMOOREI16_V, MASK_VAMOOREI16_V) -DECLARE_INSN(vamoorei32_v, MATCH_VAMOOREI32_V, MASK_VAMOOREI32_V) -DECLARE_INSN(vamoorei64_v, MATCH_VAMOOREI64_V, MASK_VAMOOREI64_V) -DECLARE_INSN(vamoorei8_v, MATCH_VAMOOREI8_V, MASK_VAMOOREI8_V) -DECLARE_INSN(vamoswapei16_v, MATCH_VAMOSWAPEI16_V, MASK_VAMOSWAPEI16_V) -DECLARE_INSN(vamoswapei32_v, MATCH_VAMOSWAPEI32_V, MASK_VAMOSWAPEI32_V) -DECLARE_INSN(vamoswapei64_v, MATCH_VAMOSWAPEI64_V, MASK_VAMOSWAPEI64_V) -DECLARE_INSN(vamoswapei8_v, MATCH_VAMOSWAPEI8_V, MASK_VAMOSWAPEI8_V) -DECLARE_INSN(vamoxorei16_v, MATCH_VAMOXOREI16_V, MASK_VAMOXOREI16_V) -DECLARE_INSN(vamoxorei32_v, MATCH_VAMOXOREI32_V, MASK_VAMOXOREI32_V) -DECLARE_INSN(vamoxorei64_v, MATCH_VAMOXOREI64_V, MASK_VAMOXOREI64_V) -DECLARE_INSN(vamoxorei8_v, MATCH_VAMOXOREI8_V, MASK_VAMOXOREI8_V) -DECLARE_INSN(vand_vi, MATCH_VAND_VI, MASK_VAND_VI) -DECLARE_INSN(vand_vv, MATCH_VAND_VV, MASK_VAND_VV) -DECLARE_INSN(vand_vx, MATCH_VAND_VX, MASK_VAND_VX) -DECLARE_INSN(vasub_vv, MATCH_VASUB_VV, MASK_VASUB_VV) -DECLARE_INSN(vasub_vx, MATCH_VASUB_VX, MASK_VASUB_VX) -DECLARE_INSN(vasubu_vv, MATCH_VASUBU_VV, MASK_VASUBU_VV) -DECLARE_INSN(vasubu_vx, MATCH_VASUBU_VX, MASK_VASUBU_VX) -DECLARE_INSN(vcompress_vm, MATCH_VCOMPRESS_VM, MASK_VCOMPRESS_VM) -DECLARE_INSN(vcpop_m, MATCH_VCPOP_M, MASK_VCPOP_M) -DECLARE_INSN(vdiv_vv, MATCH_VDIV_VV, MASK_VDIV_VV) -DECLARE_INSN(vdiv_vx, MATCH_VDIV_VX, MASK_VDIV_VX) -DECLARE_INSN(vdivu_vv, MATCH_VDIVU_VV, MASK_VDIVU_VV) 
-DECLARE_INSN(vdivu_vx, MATCH_VDIVU_VX, MASK_VDIVU_VX) -DECLARE_INSN(vfadd_vf, MATCH_VFADD_VF, MASK_VFADD_VF) -DECLARE_INSN(vfadd_vv, MATCH_VFADD_VV, MASK_VFADD_VV) -DECLARE_INSN(vfclass_v, MATCH_VFCLASS_V, MASK_VFCLASS_V) -DECLARE_INSN(vfcvt_f_x_v, MATCH_VFCVT_F_X_V, MASK_VFCVT_F_X_V) -DECLARE_INSN(vfcvt_f_xu_v, MATCH_VFCVT_F_XU_V, MASK_VFCVT_F_XU_V) -DECLARE_INSN(vfcvt_rtz_x_f_v, MATCH_VFCVT_RTZ_X_F_V, MASK_VFCVT_RTZ_X_F_V) -DECLARE_INSN(vfcvt_rtz_xu_f_v, MATCH_VFCVT_RTZ_XU_F_V, MASK_VFCVT_RTZ_XU_F_V) -DECLARE_INSN(vfcvt_x_f_v, MATCH_VFCVT_X_F_V, MASK_VFCVT_X_F_V) -DECLARE_INSN(vfcvt_xu_f_v, MATCH_VFCVT_XU_F_V, MASK_VFCVT_XU_F_V) -DECLARE_INSN(vfdiv_vf, MATCH_VFDIV_VF, MASK_VFDIV_VF) -DECLARE_INSN(vfdiv_vv, MATCH_VFDIV_VV, MASK_VFDIV_VV) -DECLARE_INSN(vfirst_m, MATCH_VFIRST_M, MASK_VFIRST_M) -DECLARE_INSN(vfmacc_vf, MATCH_VFMACC_VF, MASK_VFMACC_VF) -DECLARE_INSN(vfmacc_vv, MATCH_VFMACC_VV, MASK_VFMACC_VV) -DECLARE_INSN(vfmadd_vf, MATCH_VFMADD_VF, MASK_VFMADD_VF) -DECLARE_INSN(vfmadd_vv, MATCH_VFMADD_VV, MASK_VFMADD_VV) -DECLARE_INSN(vfmax_vf, MATCH_VFMAX_VF, MASK_VFMAX_VF) -DECLARE_INSN(vfmax_vv, MATCH_VFMAX_VV, MASK_VFMAX_VV) -DECLARE_INSN(vfmerge_vfm, MATCH_VFMERGE_VFM, MASK_VFMERGE_VFM) -DECLARE_INSN(vfmin_vf, MATCH_VFMIN_VF, MASK_VFMIN_VF) -DECLARE_INSN(vfmin_vv, MATCH_VFMIN_VV, MASK_VFMIN_VV) -DECLARE_INSN(vfmsac_vf, MATCH_VFMSAC_VF, MASK_VFMSAC_VF) -DECLARE_INSN(vfmsac_vv, MATCH_VFMSAC_VV, MASK_VFMSAC_VV) -DECLARE_INSN(vfmsub_vf, MATCH_VFMSUB_VF, MASK_VFMSUB_VF) -DECLARE_INSN(vfmsub_vv, MATCH_VFMSUB_VV, MASK_VFMSUB_VV) -DECLARE_INSN(vfmul_vf, MATCH_VFMUL_VF, MASK_VFMUL_VF) -DECLARE_INSN(vfmul_vv, MATCH_VFMUL_VV, MASK_VFMUL_VV) -DECLARE_INSN(vfmv_f_s, MATCH_VFMV_F_S, MASK_VFMV_F_S) -DECLARE_INSN(vfmv_s_f, MATCH_VFMV_S_F, MASK_VFMV_S_F) -DECLARE_INSN(vfmv_v_f, MATCH_VFMV_V_F, MASK_VFMV_V_F) -DECLARE_INSN(vfncvt_f_f_w, MATCH_VFNCVT_F_F_W, MASK_VFNCVT_F_F_W) -DECLARE_INSN(vfncvt_f_x_w, MATCH_VFNCVT_F_X_W, MASK_VFNCVT_F_X_W) -DECLARE_INSN(vfncvt_f_xu_w, 
MATCH_VFNCVT_F_XU_W, MASK_VFNCVT_F_XU_W) -DECLARE_INSN(vfncvt_rod_f_f_w, MATCH_VFNCVT_ROD_F_F_W, MASK_VFNCVT_ROD_F_F_W) -DECLARE_INSN(vfncvt_rtz_x_f_w, MATCH_VFNCVT_RTZ_X_F_W, MASK_VFNCVT_RTZ_X_F_W) -DECLARE_INSN(vfncvt_rtz_xu_f_w, MATCH_VFNCVT_RTZ_XU_F_W, MASK_VFNCVT_RTZ_XU_F_W) -DECLARE_INSN(vfncvt_x_f_w, MATCH_VFNCVT_X_F_W, MASK_VFNCVT_X_F_W) -DECLARE_INSN(vfncvt_xu_f_w, MATCH_VFNCVT_XU_F_W, MASK_VFNCVT_XU_F_W) -DECLARE_INSN(vfnmacc_vf, MATCH_VFNMACC_VF, MASK_VFNMACC_VF) -DECLARE_INSN(vfnmacc_vv, MATCH_VFNMACC_VV, MASK_VFNMACC_VV) -DECLARE_INSN(vfnmadd_vf, MATCH_VFNMADD_VF, MASK_VFNMADD_VF) -DECLARE_INSN(vfnmadd_vv, MATCH_VFNMADD_VV, MASK_VFNMADD_VV) -DECLARE_INSN(vfnmsac_vf, MATCH_VFNMSAC_VF, MASK_VFNMSAC_VF) -DECLARE_INSN(vfnmsac_vv, MATCH_VFNMSAC_VV, MASK_VFNMSAC_VV) -DECLARE_INSN(vfnmsub_vf, MATCH_VFNMSUB_VF, MASK_VFNMSUB_VF) -DECLARE_INSN(vfnmsub_vv, MATCH_VFNMSUB_VV, MASK_VFNMSUB_VV) -DECLARE_INSN(vfrdiv_vf, MATCH_VFRDIV_VF, MASK_VFRDIV_VF) -DECLARE_INSN(vfrec7_v, MATCH_VFREC7_V, MASK_VFREC7_V) -DECLARE_INSN(vfredmax_vs, MATCH_VFREDMAX_VS, MASK_VFREDMAX_VS) -DECLARE_INSN(vfredmin_vs, MATCH_VFREDMIN_VS, MASK_VFREDMIN_VS) -DECLARE_INSN(vfredosum_vs, MATCH_VFREDOSUM_VS, MASK_VFREDOSUM_VS) -DECLARE_INSN(vfredusum_vs, MATCH_VFREDUSUM_VS, MASK_VFREDUSUM_VS) -DECLARE_INSN(vfrsqrt7_v, MATCH_VFRSQRT7_V, MASK_VFRSQRT7_V) -DECLARE_INSN(vfrsub_vf, MATCH_VFRSUB_VF, MASK_VFRSUB_VF) -DECLARE_INSN(vfsgnj_vf, MATCH_VFSGNJ_VF, MASK_VFSGNJ_VF) -DECLARE_INSN(vfsgnj_vv, MATCH_VFSGNJ_VV, MASK_VFSGNJ_VV) -DECLARE_INSN(vfsgnjn_vf, MATCH_VFSGNJN_VF, MASK_VFSGNJN_VF) -DECLARE_INSN(vfsgnjn_vv, MATCH_VFSGNJN_VV, MASK_VFSGNJN_VV) -DECLARE_INSN(vfsgnjx_vf, MATCH_VFSGNJX_VF, MASK_VFSGNJX_VF) -DECLARE_INSN(vfsgnjx_vv, MATCH_VFSGNJX_VV, MASK_VFSGNJX_VV) -DECLARE_INSN(vfslide1down_vf, MATCH_VFSLIDE1DOWN_VF, MASK_VFSLIDE1DOWN_VF) -DECLARE_INSN(vfslide1up_vf, MATCH_VFSLIDE1UP_VF, MASK_VFSLIDE1UP_VF) -DECLARE_INSN(vfsqrt_v, MATCH_VFSQRT_V, MASK_VFSQRT_V) -DECLARE_INSN(vfsub_vf, 
MATCH_VFSUB_VF, MASK_VFSUB_VF) -DECLARE_INSN(vfsub_vv, MATCH_VFSUB_VV, MASK_VFSUB_VV) -DECLARE_INSN(vfwadd_vf, MATCH_VFWADD_VF, MASK_VFWADD_VF) -DECLARE_INSN(vfwadd_vv, MATCH_VFWADD_VV, MASK_VFWADD_VV) -DECLARE_INSN(vfwadd_wf, MATCH_VFWADD_WF, MASK_VFWADD_WF) -DECLARE_INSN(vfwadd_wv, MATCH_VFWADD_WV, MASK_VFWADD_WV) -DECLARE_INSN(vfwcvt_f_f_v, MATCH_VFWCVT_F_F_V, MASK_VFWCVT_F_F_V) -DECLARE_INSN(vfwcvt_f_x_v, MATCH_VFWCVT_F_X_V, MASK_VFWCVT_F_X_V) -DECLARE_INSN(vfwcvt_f_xu_v, MATCH_VFWCVT_F_XU_V, MASK_VFWCVT_F_XU_V) -DECLARE_INSN(vfwcvt_rtz_x_f_v, MATCH_VFWCVT_RTZ_X_F_V, MASK_VFWCVT_RTZ_X_F_V) -DECLARE_INSN(vfwcvt_rtz_xu_f_v, MATCH_VFWCVT_RTZ_XU_F_V, MASK_VFWCVT_RTZ_XU_F_V) -DECLARE_INSN(vfwcvt_x_f_v, MATCH_VFWCVT_X_F_V, MASK_VFWCVT_X_F_V) -DECLARE_INSN(vfwcvt_xu_f_v, MATCH_VFWCVT_XU_F_V, MASK_VFWCVT_XU_F_V) -DECLARE_INSN(vfwmacc_vf, MATCH_VFWMACC_VF, MASK_VFWMACC_VF) -DECLARE_INSN(vfwmacc_vv, MATCH_VFWMACC_VV, MASK_VFWMACC_VV) -DECLARE_INSN(vfwmsac_vf, MATCH_VFWMSAC_VF, MASK_VFWMSAC_VF) -DECLARE_INSN(vfwmsac_vv, MATCH_VFWMSAC_VV, MASK_VFWMSAC_VV) -DECLARE_INSN(vfwmul_vf, MATCH_VFWMUL_VF, MASK_VFWMUL_VF) -DECLARE_INSN(vfwmul_vv, MATCH_VFWMUL_VV, MASK_VFWMUL_VV) -DECLARE_INSN(vfwnmacc_vf, MATCH_VFWNMACC_VF, MASK_VFWNMACC_VF) -DECLARE_INSN(vfwnmacc_vv, MATCH_VFWNMACC_VV, MASK_VFWNMACC_VV) -DECLARE_INSN(vfwnmsac_vf, MATCH_VFWNMSAC_VF, MASK_VFWNMSAC_VF) -DECLARE_INSN(vfwnmsac_vv, MATCH_VFWNMSAC_VV, MASK_VFWNMSAC_VV) -DECLARE_INSN(vfwredosum_vs, MATCH_VFWREDOSUM_VS, MASK_VFWREDOSUM_VS) -DECLARE_INSN(vfwredusum_vs, MATCH_VFWREDUSUM_VS, MASK_VFWREDUSUM_VS) -DECLARE_INSN(vfwsub_vf, MATCH_VFWSUB_VF, MASK_VFWSUB_VF) -DECLARE_INSN(vfwsub_vv, MATCH_VFWSUB_VV, MASK_VFWSUB_VV) -DECLARE_INSN(vfwsub_wf, MATCH_VFWSUB_WF, MASK_VFWSUB_WF) -DECLARE_INSN(vfwsub_wv, MATCH_VFWSUB_WV, MASK_VFWSUB_WV) -DECLARE_INSN(vid_v, MATCH_VID_V, MASK_VID_V) -DECLARE_INSN(viota_m, MATCH_VIOTA_M, MASK_VIOTA_M) -DECLARE_INSN(vl1re16_v, MATCH_VL1RE16_V, MASK_VL1RE16_V) -DECLARE_INSN(vl1re32_v, 
MATCH_VL1RE32_V, MASK_VL1RE32_V) -DECLARE_INSN(vl1re64_v, MATCH_VL1RE64_V, MASK_VL1RE64_V) -DECLARE_INSN(vl1re8_v, MATCH_VL1RE8_V, MASK_VL1RE8_V) -DECLARE_INSN(vl2re16_v, MATCH_VL2RE16_V, MASK_VL2RE16_V) -DECLARE_INSN(vl2re32_v, MATCH_VL2RE32_V, MASK_VL2RE32_V) -DECLARE_INSN(vl2re64_v, MATCH_VL2RE64_V, MASK_VL2RE64_V) -DECLARE_INSN(vl2re8_v, MATCH_VL2RE8_V, MASK_VL2RE8_V) -DECLARE_INSN(vl4re16_v, MATCH_VL4RE16_V, MASK_VL4RE16_V) -DECLARE_INSN(vl4re32_v, MATCH_VL4RE32_V, MASK_VL4RE32_V) -DECLARE_INSN(vl4re64_v, MATCH_VL4RE64_V, MASK_VL4RE64_V) -DECLARE_INSN(vl4re8_v, MATCH_VL4RE8_V, MASK_VL4RE8_V) -DECLARE_INSN(vl8re16_v, MATCH_VL8RE16_V, MASK_VL8RE16_V) -DECLARE_INSN(vl8re32_v, MATCH_VL8RE32_V, MASK_VL8RE32_V) -DECLARE_INSN(vl8re64_v, MATCH_VL8RE64_V, MASK_VL8RE64_V) -DECLARE_INSN(vl8re8_v, MATCH_VL8RE8_V, MASK_VL8RE8_V) -DECLARE_INSN(vle1024_v, MATCH_VLE1024_V, MASK_VLE1024_V) -DECLARE_INSN(vle1024ff_v, MATCH_VLE1024FF_V, MASK_VLE1024FF_V) -DECLARE_INSN(vle128_v, MATCH_VLE128_V, MASK_VLE128_V) -DECLARE_INSN(vle128ff_v, MATCH_VLE128FF_V, MASK_VLE128FF_V) -DECLARE_INSN(vle16_v, MATCH_VLE16_V, MASK_VLE16_V) -DECLARE_INSN(vle16ff_v, MATCH_VLE16FF_V, MASK_VLE16FF_V) -DECLARE_INSN(vle256_v, MATCH_VLE256_V, MASK_VLE256_V) -DECLARE_INSN(vle256ff_v, MATCH_VLE256FF_V, MASK_VLE256FF_V) -DECLARE_INSN(vle32_v, MATCH_VLE32_V, MASK_VLE32_V) -DECLARE_INSN(vle32ff_v, MATCH_VLE32FF_V, MASK_VLE32FF_V) -DECLARE_INSN(vle512_v, MATCH_VLE512_V, MASK_VLE512_V) -DECLARE_INSN(vle512ff_v, MATCH_VLE512FF_V, MASK_VLE512FF_V) -DECLARE_INSN(vle64_v, MATCH_VLE64_V, MASK_VLE64_V) -DECLARE_INSN(vle64ff_v, MATCH_VLE64FF_V, MASK_VLE64FF_V) -DECLARE_INSN(vle8_v, MATCH_VLE8_V, MASK_VLE8_V) -DECLARE_INSN(vle8ff_v, MATCH_VLE8FF_V, MASK_VLE8FF_V) -DECLARE_INSN(vlm_v, MATCH_VLM_V, MASK_VLM_V) -DECLARE_INSN(vloxei1024_v, MATCH_VLOXEI1024_V, MASK_VLOXEI1024_V) -DECLARE_INSN(vloxei128_v, MATCH_VLOXEI128_V, MASK_VLOXEI128_V) -DECLARE_INSN(vloxei16_v, MATCH_VLOXEI16_V, MASK_VLOXEI16_V) 
-DECLARE_INSN(vloxei256_v, MATCH_VLOXEI256_V, MASK_VLOXEI256_V) -DECLARE_INSN(vloxei32_v, MATCH_VLOXEI32_V, MASK_VLOXEI32_V) -DECLARE_INSN(vloxei512_v, MATCH_VLOXEI512_V, MASK_VLOXEI512_V) -DECLARE_INSN(vloxei64_v, MATCH_VLOXEI64_V, MASK_VLOXEI64_V) -DECLARE_INSN(vloxei8_v, MATCH_VLOXEI8_V, MASK_VLOXEI8_V) -DECLARE_INSN(vlse1024_v, MATCH_VLSE1024_V, MASK_VLSE1024_V) -DECLARE_INSN(vlse128_v, MATCH_VLSE128_V, MASK_VLSE128_V) -DECLARE_INSN(vlse16_v, MATCH_VLSE16_V, MASK_VLSE16_V) -DECLARE_INSN(vlse256_v, MATCH_VLSE256_V, MASK_VLSE256_V) -DECLARE_INSN(vlse32_v, MATCH_VLSE32_V, MASK_VLSE32_V) -DECLARE_INSN(vlse512_v, MATCH_VLSE512_V, MASK_VLSE512_V) -DECLARE_INSN(vlse64_v, MATCH_VLSE64_V, MASK_VLSE64_V) -DECLARE_INSN(vlse8_v, MATCH_VLSE8_V, MASK_VLSE8_V) -DECLARE_INSN(vluxei1024_v, MATCH_VLUXEI1024_V, MASK_VLUXEI1024_V) -DECLARE_INSN(vluxei128_v, MATCH_VLUXEI128_V, MASK_VLUXEI128_V) -DECLARE_INSN(vluxei16_v, MATCH_VLUXEI16_V, MASK_VLUXEI16_V) -DECLARE_INSN(vluxei256_v, MATCH_VLUXEI256_V, MASK_VLUXEI256_V) -DECLARE_INSN(vluxei32_v, MATCH_VLUXEI32_V, MASK_VLUXEI32_V) -DECLARE_INSN(vluxei512_v, MATCH_VLUXEI512_V, MASK_VLUXEI512_V) -DECLARE_INSN(vluxei64_v, MATCH_VLUXEI64_V, MASK_VLUXEI64_V) -DECLARE_INSN(vluxei8_v, MATCH_VLUXEI8_V, MASK_VLUXEI8_V) -DECLARE_INSN(vmacc_vv, MATCH_VMACC_VV, MASK_VMACC_VV) -DECLARE_INSN(vmacc_vx, MATCH_VMACC_VX, MASK_VMACC_VX) -DECLARE_INSN(vmadc_vi, MATCH_VMADC_VI, MASK_VMADC_VI) -DECLARE_INSN(vmadc_vim, MATCH_VMADC_VIM, MASK_VMADC_VIM) -DECLARE_INSN(vmadc_vv, MATCH_VMADC_VV, MASK_VMADC_VV) -DECLARE_INSN(vmadc_vvm, MATCH_VMADC_VVM, MASK_VMADC_VVM) -DECLARE_INSN(vmadc_vx, MATCH_VMADC_VX, MASK_VMADC_VX) -DECLARE_INSN(vmadc_vxm, MATCH_VMADC_VXM, MASK_VMADC_VXM) -DECLARE_INSN(vmadd_vv, MATCH_VMADD_VV, MASK_VMADD_VV) -DECLARE_INSN(vmadd_vx, MATCH_VMADD_VX, MASK_VMADD_VX) -DECLARE_INSN(vmand_mm, MATCH_VMAND_MM, MASK_VMAND_MM) -DECLARE_INSN(vmandn_mm, MATCH_VMANDN_MM, MASK_VMANDN_MM) -DECLARE_INSN(vmax_vv, MATCH_VMAX_VV, MASK_VMAX_VV) 
-DECLARE_INSN(vmax_vx, MATCH_VMAX_VX, MASK_VMAX_VX) -DECLARE_INSN(vmaxu_vv, MATCH_VMAXU_VV, MASK_VMAXU_VV) -DECLARE_INSN(vmaxu_vx, MATCH_VMAXU_VX, MASK_VMAXU_VX) -DECLARE_INSN(vmerge_vim, MATCH_VMERGE_VIM, MASK_VMERGE_VIM) -DECLARE_INSN(vmerge_vvm, MATCH_VMERGE_VVM, MASK_VMERGE_VVM) -DECLARE_INSN(vmerge_vxm, MATCH_VMERGE_VXM, MASK_VMERGE_VXM) -DECLARE_INSN(vmfeq_vf, MATCH_VMFEQ_VF, MASK_VMFEQ_VF) -DECLARE_INSN(vmfeq_vv, MATCH_VMFEQ_VV, MASK_VMFEQ_VV) -DECLARE_INSN(vmfge_vf, MATCH_VMFGE_VF, MASK_VMFGE_VF) -DECLARE_INSN(vmfgt_vf, MATCH_VMFGT_VF, MASK_VMFGT_VF) -DECLARE_INSN(vmfle_vf, MATCH_VMFLE_VF, MASK_VMFLE_VF) -DECLARE_INSN(vmfle_vv, MATCH_VMFLE_VV, MASK_VMFLE_VV) -DECLARE_INSN(vmflt_vf, MATCH_VMFLT_VF, MASK_VMFLT_VF) -DECLARE_INSN(vmflt_vv, MATCH_VMFLT_VV, MASK_VMFLT_VV) -DECLARE_INSN(vmfne_vf, MATCH_VMFNE_VF, MASK_VMFNE_VF) -DECLARE_INSN(vmfne_vv, MATCH_VMFNE_VV, MASK_VMFNE_VV) -DECLARE_INSN(vmin_vv, MATCH_VMIN_VV, MASK_VMIN_VV) -DECLARE_INSN(vmin_vx, MATCH_VMIN_VX, MASK_VMIN_VX) -DECLARE_INSN(vminu_vv, MATCH_VMINU_VV, MASK_VMINU_VV) -DECLARE_INSN(vminu_vx, MATCH_VMINU_VX, MASK_VMINU_VX) -DECLARE_INSN(vmnand_mm, MATCH_VMNAND_MM, MASK_VMNAND_MM) -DECLARE_INSN(vmnor_mm, MATCH_VMNOR_MM, MASK_VMNOR_MM) -DECLARE_INSN(vmor_mm, MATCH_VMOR_MM, MASK_VMOR_MM) -DECLARE_INSN(vmorn_mm, MATCH_VMORN_MM, MASK_VMORN_MM) -DECLARE_INSN(vmsbc_vv, MATCH_VMSBC_VV, MASK_VMSBC_VV) -DECLARE_INSN(vmsbc_vvm, MATCH_VMSBC_VVM, MASK_VMSBC_VVM) -DECLARE_INSN(vmsbc_vx, MATCH_VMSBC_VX, MASK_VMSBC_VX) -DECLARE_INSN(vmsbc_vxm, MATCH_VMSBC_VXM, MASK_VMSBC_VXM) -DECLARE_INSN(vmsbf_m, MATCH_VMSBF_M, MASK_VMSBF_M) -DECLARE_INSN(vmseq_vi, MATCH_VMSEQ_VI, MASK_VMSEQ_VI) -DECLARE_INSN(vmseq_vv, MATCH_VMSEQ_VV, MASK_VMSEQ_VV) -DECLARE_INSN(vmseq_vx, MATCH_VMSEQ_VX, MASK_VMSEQ_VX) -DECLARE_INSN(vmsgt_vi, MATCH_VMSGT_VI, MASK_VMSGT_VI) -DECLARE_INSN(vmsgt_vx, MATCH_VMSGT_VX, MASK_VMSGT_VX) -DECLARE_INSN(vmsgtu_vi, MATCH_VMSGTU_VI, MASK_VMSGTU_VI) -DECLARE_INSN(vmsgtu_vx, MATCH_VMSGTU_VX, MASK_VMSGTU_VX) 
-DECLARE_INSN(vmsif_m, MATCH_VMSIF_M, MASK_VMSIF_M) -DECLARE_INSN(vmsle_vi, MATCH_VMSLE_VI, MASK_VMSLE_VI) -DECLARE_INSN(vmsle_vv, MATCH_VMSLE_VV, MASK_VMSLE_VV) -DECLARE_INSN(vmsle_vx, MATCH_VMSLE_VX, MASK_VMSLE_VX) -DECLARE_INSN(vmsleu_vi, MATCH_VMSLEU_VI, MASK_VMSLEU_VI) -DECLARE_INSN(vmsleu_vv, MATCH_VMSLEU_VV, MASK_VMSLEU_VV) -DECLARE_INSN(vmsleu_vx, MATCH_VMSLEU_VX, MASK_VMSLEU_VX) -DECLARE_INSN(vmslt_vv, MATCH_VMSLT_VV, MASK_VMSLT_VV) -DECLARE_INSN(vmslt_vx, MATCH_VMSLT_VX, MASK_VMSLT_VX) -DECLARE_INSN(vmsltu_vv, MATCH_VMSLTU_VV, MASK_VMSLTU_VV) -DECLARE_INSN(vmsltu_vx, MATCH_VMSLTU_VX, MASK_VMSLTU_VX) -DECLARE_INSN(vmsne_vi, MATCH_VMSNE_VI, MASK_VMSNE_VI) -DECLARE_INSN(vmsne_vv, MATCH_VMSNE_VV, MASK_VMSNE_VV) -DECLARE_INSN(vmsne_vx, MATCH_VMSNE_VX, MASK_VMSNE_VX) -DECLARE_INSN(vmsof_m, MATCH_VMSOF_M, MASK_VMSOF_M) -DECLARE_INSN(vmul_vv, MATCH_VMUL_VV, MASK_VMUL_VV) -DECLARE_INSN(vmul_vx, MATCH_VMUL_VX, MASK_VMUL_VX) -DECLARE_INSN(vmulh_vv, MATCH_VMULH_VV, MASK_VMULH_VV) -DECLARE_INSN(vmulh_vx, MATCH_VMULH_VX, MASK_VMULH_VX) -DECLARE_INSN(vmulhsu_vv, MATCH_VMULHSU_VV, MASK_VMULHSU_VV) -DECLARE_INSN(vmulhsu_vx, MATCH_VMULHSU_VX, MASK_VMULHSU_VX) -DECLARE_INSN(vmulhu_vv, MATCH_VMULHU_VV, MASK_VMULHU_VV) -DECLARE_INSN(vmulhu_vx, MATCH_VMULHU_VX, MASK_VMULHU_VX) -DECLARE_INSN(vmv1r_v, MATCH_VMV1R_V, MASK_VMV1R_V) -DECLARE_INSN(vmv2r_v, MATCH_VMV2R_V, MASK_VMV2R_V) -DECLARE_INSN(vmv4r_v, MATCH_VMV4R_V, MASK_VMV4R_V) -DECLARE_INSN(vmv8r_v, MATCH_VMV8R_V, MASK_VMV8R_V) -DECLARE_INSN(vmv_s_x, MATCH_VMV_S_X, MASK_VMV_S_X) -DECLARE_INSN(vmv_v_i, MATCH_VMV_V_I, MASK_VMV_V_I) -DECLARE_INSN(vmv_v_v, MATCH_VMV_V_V, MASK_VMV_V_V) -DECLARE_INSN(vmv_v_x, MATCH_VMV_V_X, MASK_VMV_V_X) -DECLARE_INSN(vmv_x_s, MATCH_VMV_X_S, MASK_VMV_X_S) -DECLARE_INSN(vmxnor_mm, MATCH_VMXNOR_MM, MASK_VMXNOR_MM) -DECLARE_INSN(vmxor_mm, MATCH_VMXOR_MM, MASK_VMXOR_MM) -DECLARE_INSN(vnclip_wi, MATCH_VNCLIP_WI, MASK_VNCLIP_WI) -DECLARE_INSN(vnclip_wv, MATCH_VNCLIP_WV, MASK_VNCLIP_WV) 
-DECLARE_INSN(vnclip_wx, MATCH_VNCLIP_WX, MASK_VNCLIP_WX) -DECLARE_INSN(vnclipu_wi, MATCH_VNCLIPU_WI, MASK_VNCLIPU_WI) -DECLARE_INSN(vnclipu_wv, MATCH_VNCLIPU_WV, MASK_VNCLIPU_WV) -DECLARE_INSN(vnclipu_wx, MATCH_VNCLIPU_WX, MASK_VNCLIPU_WX) -DECLARE_INSN(vnmsac_vv, MATCH_VNMSAC_VV, MASK_VNMSAC_VV) -DECLARE_INSN(vnmsac_vx, MATCH_VNMSAC_VX, MASK_VNMSAC_VX) -DECLARE_INSN(vnmsub_vv, MATCH_VNMSUB_VV, MASK_VNMSUB_VV) -DECLARE_INSN(vnmsub_vx, MATCH_VNMSUB_VX, MASK_VNMSUB_VX) -DECLARE_INSN(vnsra_wi, MATCH_VNSRA_WI, MASK_VNSRA_WI) -DECLARE_INSN(vnsra_wv, MATCH_VNSRA_WV, MASK_VNSRA_WV) -DECLARE_INSN(vnsra_wx, MATCH_VNSRA_WX, MASK_VNSRA_WX) -DECLARE_INSN(vnsrl_wi, MATCH_VNSRL_WI, MASK_VNSRL_WI) -DECLARE_INSN(vnsrl_wv, MATCH_VNSRL_WV, MASK_VNSRL_WV) -DECLARE_INSN(vnsrl_wx, MATCH_VNSRL_WX, MASK_VNSRL_WX) -DECLARE_INSN(vor_vi, MATCH_VOR_VI, MASK_VOR_VI) -DECLARE_INSN(vor_vv, MATCH_VOR_VV, MASK_VOR_VV) -DECLARE_INSN(vor_vx, MATCH_VOR_VX, MASK_VOR_VX) -DECLARE_INSN(vredand_vs, MATCH_VREDAND_VS, MASK_VREDAND_VS) -DECLARE_INSN(vredmax_vs, MATCH_VREDMAX_VS, MASK_VREDMAX_VS) -DECLARE_INSN(vredmaxu_vs, MATCH_VREDMAXU_VS, MASK_VREDMAXU_VS) -DECLARE_INSN(vredmin_vs, MATCH_VREDMIN_VS, MASK_VREDMIN_VS) -DECLARE_INSN(vredminu_vs, MATCH_VREDMINU_VS, MASK_VREDMINU_VS) -DECLARE_INSN(vredor_vs, MATCH_VREDOR_VS, MASK_VREDOR_VS) -DECLARE_INSN(vredsum_vs, MATCH_VREDSUM_VS, MASK_VREDSUM_VS) -DECLARE_INSN(vredxor_vs, MATCH_VREDXOR_VS, MASK_VREDXOR_VS) -DECLARE_INSN(vrem_vv, MATCH_VREM_VV, MASK_VREM_VV) -DECLARE_INSN(vrem_vx, MATCH_VREM_VX, MASK_VREM_VX) -DECLARE_INSN(vremu_vv, MATCH_VREMU_VV, MASK_VREMU_VV) -DECLARE_INSN(vremu_vx, MATCH_VREMU_VX, MASK_VREMU_VX) -DECLARE_INSN(vrgather_vi, MATCH_VRGATHER_VI, MASK_VRGATHER_VI) -DECLARE_INSN(vrgather_vv, MATCH_VRGATHER_VV, MASK_VRGATHER_VV) -DECLARE_INSN(vrgather_vx, MATCH_VRGATHER_VX, MASK_VRGATHER_VX) -DECLARE_INSN(vrgatherei16_vv, MATCH_VRGATHEREI16_VV, MASK_VRGATHEREI16_VV) -DECLARE_INSN(vrsub_vi, MATCH_VRSUB_VI, MASK_VRSUB_VI) 
-DECLARE_INSN(vrsub_vx, MATCH_VRSUB_VX, MASK_VRSUB_VX) -DECLARE_INSN(vs1r_v, MATCH_VS1R_V, MASK_VS1R_V) -DECLARE_INSN(vs2r_v, MATCH_VS2R_V, MASK_VS2R_V) -DECLARE_INSN(vs4r_v, MATCH_VS4R_V, MASK_VS4R_V) -DECLARE_INSN(vs8r_v, MATCH_VS8R_V, MASK_VS8R_V) -DECLARE_INSN(vsadd_vi, MATCH_VSADD_VI, MASK_VSADD_VI) -DECLARE_INSN(vsadd_vv, MATCH_VSADD_VV, MASK_VSADD_VV) -DECLARE_INSN(vsadd_vx, MATCH_VSADD_VX, MASK_VSADD_VX) -DECLARE_INSN(vsaddu_vi, MATCH_VSADDU_VI, MASK_VSADDU_VI) -DECLARE_INSN(vsaddu_vv, MATCH_VSADDU_VV, MASK_VSADDU_VV) -DECLARE_INSN(vsaddu_vx, MATCH_VSADDU_VX, MASK_VSADDU_VX) -DECLARE_INSN(vsbc_vvm, MATCH_VSBC_VVM, MASK_VSBC_VVM) -DECLARE_INSN(vsbc_vxm, MATCH_VSBC_VXM, MASK_VSBC_VXM) -DECLARE_INSN(vse1024_v, MATCH_VSE1024_V, MASK_VSE1024_V) -DECLARE_INSN(vse128_v, MATCH_VSE128_V, MASK_VSE128_V) -DECLARE_INSN(vse16_v, MATCH_VSE16_V, MASK_VSE16_V) -DECLARE_INSN(vse256_v, MATCH_VSE256_V, MASK_VSE256_V) -DECLARE_INSN(vse32_v, MATCH_VSE32_V, MASK_VSE32_V) -DECLARE_INSN(vse512_v, MATCH_VSE512_V, MASK_VSE512_V) -DECLARE_INSN(vse64_v, MATCH_VSE64_V, MASK_VSE64_V) -DECLARE_INSN(vse8_v, MATCH_VSE8_V, MASK_VSE8_V) -DECLARE_INSN(vsetivli, MATCH_VSETIVLI, MASK_VSETIVLI) -DECLARE_INSN(vsetvl, MATCH_VSETVL, MASK_VSETVL) -DECLARE_INSN(vsetvli, MATCH_VSETVLI, MASK_VSETVLI) -DECLARE_INSN(vsext_vf2, MATCH_VSEXT_VF2, MASK_VSEXT_VF2) -DECLARE_INSN(vsext_vf4, MATCH_VSEXT_VF4, MASK_VSEXT_VF4) -DECLARE_INSN(vsext_vf8, MATCH_VSEXT_VF8, MASK_VSEXT_VF8) -DECLARE_INSN(vslide1down_vx, MATCH_VSLIDE1DOWN_VX, MASK_VSLIDE1DOWN_VX) -DECLARE_INSN(vslide1up_vx, MATCH_VSLIDE1UP_VX, MASK_VSLIDE1UP_VX) -DECLARE_INSN(vslidedown_vi, MATCH_VSLIDEDOWN_VI, MASK_VSLIDEDOWN_VI) -DECLARE_INSN(vslidedown_vx, MATCH_VSLIDEDOWN_VX, MASK_VSLIDEDOWN_VX) -DECLARE_INSN(vslideup_vi, MATCH_VSLIDEUP_VI, MASK_VSLIDEUP_VI) -DECLARE_INSN(vslideup_vx, MATCH_VSLIDEUP_VX, MASK_VSLIDEUP_VX) -DECLARE_INSN(vsll_vi, MATCH_VSLL_VI, MASK_VSLL_VI) -DECLARE_INSN(vsll_vv, MATCH_VSLL_VV, MASK_VSLL_VV) -DECLARE_INSN(vsll_vx, 
MATCH_VSLL_VX, MASK_VSLL_VX) -DECLARE_INSN(vsm_v, MATCH_VSM_V, MASK_VSM_V) -DECLARE_INSN(vsmul_vv, MATCH_VSMUL_VV, MASK_VSMUL_VV) -DECLARE_INSN(vsmul_vx, MATCH_VSMUL_VX, MASK_VSMUL_VX) -DECLARE_INSN(vsoxei1024_v, MATCH_VSOXEI1024_V, MASK_VSOXEI1024_V) -DECLARE_INSN(vsoxei128_v, MATCH_VSOXEI128_V, MASK_VSOXEI128_V) -DECLARE_INSN(vsoxei16_v, MATCH_VSOXEI16_V, MASK_VSOXEI16_V) -DECLARE_INSN(vsoxei256_v, MATCH_VSOXEI256_V, MASK_VSOXEI256_V) -DECLARE_INSN(vsoxei32_v, MATCH_VSOXEI32_V, MASK_VSOXEI32_V) -DECLARE_INSN(vsoxei512_v, MATCH_VSOXEI512_V, MASK_VSOXEI512_V) -DECLARE_INSN(vsoxei64_v, MATCH_VSOXEI64_V, MASK_VSOXEI64_V) -DECLARE_INSN(vsoxei8_v, MATCH_VSOXEI8_V, MASK_VSOXEI8_V) -DECLARE_INSN(vsra_vi, MATCH_VSRA_VI, MASK_VSRA_VI) -DECLARE_INSN(vsra_vv, MATCH_VSRA_VV, MASK_VSRA_VV) -DECLARE_INSN(vsra_vx, MATCH_VSRA_VX, MASK_VSRA_VX) -DECLARE_INSN(vsrl_vi, MATCH_VSRL_VI, MASK_VSRL_VI) -DECLARE_INSN(vsrl_vv, MATCH_VSRL_VV, MASK_VSRL_VV) -DECLARE_INSN(vsrl_vx, MATCH_VSRL_VX, MASK_VSRL_VX) -DECLARE_INSN(vsse1024_v, MATCH_VSSE1024_V, MASK_VSSE1024_V) -DECLARE_INSN(vsse128_v, MATCH_VSSE128_V, MASK_VSSE128_V) -DECLARE_INSN(vsse16_v, MATCH_VSSE16_V, MASK_VSSE16_V) -DECLARE_INSN(vsse256_v, MATCH_VSSE256_V, MASK_VSSE256_V) -DECLARE_INSN(vsse32_v, MATCH_VSSE32_V, MASK_VSSE32_V) -DECLARE_INSN(vsse512_v, MATCH_VSSE512_V, MASK_VSSE512_V) -DECLARE_INSN(vsse64_v, MATCH_VSSE64_V, MASK_VSSE64_V) -DECLARE_INSN(vsse8_v, MATCH_VSSE8_V, MASK_VSSE8_V) -DECLARE_INSN(vssra_vi, MATCH_VSSRA_VI, MASK_VSSRA_VI) -DECLARE_INSN(vssra_vv, MATCH_VSSRA_VV, MASK_VSSRA_VV) -DECLARE_INSN(vssra_vx, MATCH_VSSRA_VX, MASK_VSSRA_VX) -DECLARE_INSN(vssrl_vi, MATCH_VSSRL_VI, MASK_VSSRL_VI) -DECLARE_INSN(vssrl_vv, MATCH_VSSRL_VV, MASK_VSSRL_VV) -DECLARE_INSN(vssrl_vx, MATCH_VSSRL_VX, MASK_VSSRL_VX) -DECLARE_INSN(vssub_vv, MATCH_VSSUB_VV, MASK_VSSUB_VV) -DECLARE_INSN(vssub_vx, MATCH_VSSUB_VX, MASK_VSSUB_VX) -DECLARE_INSN(vssubu_vv, MATCH_VSSUBU_VV, MASK_VSSUBU_VV) -DECLARE_INSN(vssubu_vx, MATCH_VSSUBU_VX, 
MASK_VSSUBU_VX) -DECLARE_INSN(vsub_vv, MATCH_VSUB_VV, MASK_VSUB_VV) -DECLARE_INSN(vsub_vx, MATCH_VSUB_VX, MASK_VSUB_VX) -DECLARE_INSN(vsuxei1024_v, MATCH_VSUXEI1024_V, MASK_VSUXEI1024_V) -DECLARE_INSN(vsuxei128_v, MATCH_VSUXEI128_V, MASK_VSUXEI128_V) -DECLARE_INSN(vsuxei16_v, MATCH_VSUXEI16_V, MASK_VSUXEI16_V) -DECLARE_INSN(vsuxei256_v, MATCH_VSUXEI256_V, MASK_VSUXEI256_V) -DECLARE_INSN(vsuxei32_v, MATCH_VSUXEI32_V, MASK_VSUXEI32_V) -DECLARE_INSN(vsuxei512_v, MATCH_VSUXEI512_V, MASK_VSUXEI512_V) -DECLARE_INSN(vsuxei64_v, MATCH_VSUXEI64_V, MASK_VSUXEI64_V) -DECLARE_INSN(vsuxei8_v, MATCH_VSUXEI8_V, MASK_VSUXEI8_V) -DECLARE_INSN(vwadd_vv, MATCH_VWADD_VV, MASK_VWADD_VV) -DECLARE_INSN(vwadd_vx, MATCH_VWADD_VX, MASK_VWADD_VX) -DECLARE_INSN(vwadd_wv, MATCH_VWADD_WV, MASK_VWADD_WV) -DECLARE_INSN(vwadd_wx, MATCH_VWADD_WX, MASK_VWADD_WX) -DECLARE_INSN(vwaddu_vv, MATCH_VWADDU_VV, MASK_VWADDU_VV) -DECLARE_INSN(vwaddu_vx, MATCH_VWADDU_VX, MASK_VWADDU_VX) -DECLARE_INSN(vwaddu_wv, MATCH_VWADDU_WV, MASK_VWADDU_WV) -DECLARE_INSN(vwaddu_wx, MATCH_VWADDU_WX, MASK_VWADDU_WX) -DECLARE_INSN(vwmacc_vv, MATCH_VWMACC_VV, MASK_VWMACC_VV) -DECLARE_INSN(vwmacc_vx, MATCH_VWMACC_VX, MASK_VWMACC_VX) -DECLARE_INSN(vwmaccsu_vv, MATCH_VWMACCSU_VV, MASK_VWMACCSU_VV) -DECLARE_INSN(vwmaccsu_vx, MATCH_VWMACCSU_VX, MASK_VWMACCSU_VX) -DECLARE_INSN(vwmaccu_vv, MATCH_VWMACCU_VV, MASK_VWMACCU_VV) -DECLARE_INSN(vwmaccu_vx, MATCH_VWMACCU_VX, MASK_VWMACCU_VX) -DECLARE_INSN(vwmaccus_vx, MATCH_VWMACCUS_VX, MASK_VWMACCUS_VX) -DECLARE_INSN(vwmul_vv, MATCH_VWMUL_VV, MASK_VWMUL_VV) -DECLARE_INSN(vwmul_vx, MATCH_VWMUL_VX, MASK_VWMUL_VX) -DECLARE_INSN(vwmulsu_vv, MATCH_VWMULSU_VV, MASK_VWMULSU_VV) -DECLARE_INSN(vwmulsu_vx, MATCH_VWMULSU_VX, MASK_VWMULSU_VX) -DECLARE_INSN(vwmulu_vv, MATCH_VWMULU_VV, MASK_VWMULU_VV) -DECLARE_INSN(vwmulu_vx, MATCH_VWMULU_VX, MASK_VWMULU_VX) -DECLARE_INSN(vwredsum_vs, MATCH_VWREDSUM_VS, MASK_VWREDSUM_VS) -DECLARE_INSN(vwredsumu_vs, MATCH_VWREDSUMU_VS, MASK_VWREDSUMU_VS) 
-DECLARE_INSN(vwsub_vv, MATCH_VWSUB_VV, MASK_VWSUB_VV) -DECLARE_INSN(vwsub_vx, MATCH_VWSUB_VX, MASK_VWSUB_VX) -DECLARE_INSN(vwsub_wv, MATCH_VWSUB_WV, MASK_VWSUB_WV) -DECLARE_INSN(vwsub_wx, MATCH_VWSUB_WX, MASK_VWSUB_WX) -DECLARE_INSN(vwsubu_vv, MATCH_VWSUBU_VV, MASK_VWSUBU_VV) -DECLARE_INSN(vwsubu_vx, MATCH_VWSUBU_VX, MASK_VWSUBU_VX) -DECLARE_INSN(vwsubu_wv, MATCH_VWSUBU_WV, MASK_VWSUBU_WV) -DECLARE_INSN(vwsubu_wx, MATCH_VWSUBU_WX, MASK_VWSUBU_WX) -DECLARE_INSN(vxor_vi, MATCH_VXOR_VI, MASK_VXOR_VI) -DECLARE_INSN(vxor_vv, MATCH_VXOR_VV, MASK_VXOR_VV) -DECLARE_INSN(vxor_vx, MATCH_VXOR_VX, MASK_VXOR_VX) -DECLARE_INSN(vzext_vf2, MATCH_VZEXT_VF2, MASK_VZEXT_VF2) -DECLARE_INSN(vzext_vf4, MATCH_VZEXT_VF4, MASK_VZEXT_VF4) -DECLARE_INSN(vzext_vf8, MATCH_VZEXT_VF8, MASK_VZEXT_VF8) -DECLARE_INSN(wfi, MATCH_WFI, MASK_WFI) -DECLARE_INSN(wrs_nto, MATCH_WRS_NTO, MASK_WRS_NTO) -DECLARE_INSN(wrs_sto, MATCH_WRS_STO, MASK_WRS_STO) -DECLARE_INSN(xnor, MATCH_XNOR, MASK_XNOR) -DECLARE_INSN(xor, MATCH_XOR, MASK_XOR) -DECLARE_INSN(xori, MATCH_XORI, MASK_XORI) -DECLARE_INSN(xperm16, MATCH_XPERM16, MASK_XPERM16) -DECLARE_INSN(xperm32, MATCH_XPERM32, MASK_XPERM32) -DECLARE_INSN(xperm4, MATCH_XPERM4, MASK_XPERM4) -DECLARE_INSN(xperm8, MATCH_XPERM8, MASK_XPERM8) -DECLARE_INSN(zunpkd810, MATCH_ZUNPKD810, MASK_ZUNPKD810) -DECLARE_INSN(zunpkd820, MATCH_ZUNPKD820, MASK_ZUNPKD820) -DECLARE_INSN(zunpkd830, MATCH_ZUNPKD830, MASK_ZUNPKD830) -DECLARE_INSN(zunpkd831, MATCH_ZUNPKD831, MASK_ZUNPKD831) -DECLARE_INSN(zunpkd832, MATCH_ZUNPKD832, MASK_ZUNPKD832) -#endif -#ifdef DECLARE_CSR -DECLARE_CSR(fflags, CSR_FFLAGS) -DECLARE_CSR(frm, CSR_FRM) -DECLARE_CSR(fcsr, CSR_FCSR) -DECLARE_CSR(vstart, CSR_VSTART) -DECLARE_CSR(vxsat, CSR_VXSAT) -DECLARE_CSR(vxrm, CSR_VXRM) -DECLARE_CSR(vcsr, CSR_VCSR) -DECLARE_CSR(seed, CSR_SEED) -DECLARE_CSR(jvt, CSR_JVT) -DECLARE_CSR(cycle, CSR_CYCLE) -DECLARE_CSR(time, CSR_TIME) -DECLARE_CSR(instret, CSR_INSTRET) -DECLARE_CSR(hpmcounter3, CSR_HPMCOUNTER3) 
-DECLARE_CSR(hpmcounter4, CSR_HPMCOUNTER4) -DECLARE_CSR(hpmcounter5, CSR_HPMCOUNTER5) -DECLARE_CSR(hpmcounter6, CSR_HPMCOUNTER6) -DECLARE_CSR(hpmcounter7, CSR_HPMCOUNTER7) -DECLARE_CSR(hpmcounter8, CSR_HPMCOUNTER8) -DECLARE_CSR(hpmcounter9, CSR_HPMCOUNTER9) -DECLARE_CSR(hpmcounter10, CSR_HPMCOUNTER10) -DECLARE_CSR(hpmcounter11, CSR_HPMCOUNTER11) -DECLARE_CSR(hpmcounter12, CSR_HPMCOUNTER12) -DECLARE_CSR(hpmcounter13, CSR_HPMCOUNTER13) -DECLARE_CSR(hpmcounter14, CSR_HPMCOUNTER14) -DECLARE_CSR(hpmcounter15, CSR_HPMCOUNTER15) -DECLARE_CSR(hpmcounter16, CSR_HPMCOUNTER16) -DECLARE_CSR(hpmcounter17, CSR_HPMCOUNTER17) -DECLARE_CSR(hpmcounter18, CSR_HPMCOUNTER18) -DECLARE_CSR(hpmcounter19, CSR_HPMCOUNTER19) -DECLARE_CSR(hpmcounter20, CSR_HPMCOUNTER20) -DECLARE_CSR(hpmcounter21, CSR_HPMCOUNTER21) -DECLARE_CSR(hpmcounter22, CSR_HPMCOUNTER22) -DECLARE_CSR(hpmcounter23, CSR_HPMCOUNTER23) -DECLARE_CSR(hpmcounter24, CSR_HPMCOUNTER24) -DECLARE_CSR(hpmcounter25, CSR_HPMCOUNTER25) -DECLARE_CSR(hpmcounter26, CSR_HPMCOUNTER26) -DECLARE_CSR(hpmcounter27, CSR_HPMCOUNTER27) -DECLARE_CSR(hpmcounter28, CSR_HPMCOUNTER28) -DECLARE_CSR(hpmcounter29, CSR_HPMCOUNTER29) -DECLARE_CSR(hpmcounter30, CSR_HPMCOUNTER30) -DECLARE_CSR(hpmcounter31, CSR_HPMCOUNTER31) -DECLARE_CSR(vl, CSR_VL) -DECLARE_CSR(vtype, CSR_VTYPE) -DECLARE_CSR(vlenb, CSR_VLENB) -DECLARE_CSR(sstatus, CSR_SSTATUS) -DECLARE_CSR(sedeleg, CSR_SEDELEG) -DECLARE_CSR(sideleg, CSR_SIDELEG) -DECLARE_CSR(sie, CSR_SIE) -DECLARE_CSR(stvec, CSR_STVEC) -DECLARE_CSR(scounteren, CSR_SCOUNTEREN) -DECLARE_CSR(senvcfg, CSR_SENVCFG) -DECLARE_CSR(sstateen0, CSR_SSTATEEN0) -DECLARE_CSR(sstateen1, CSR_SSTATEEN1) -DECLARE_CSR(sstateen2, CSR_SSTATEEN2) -DECLARE_CSR(sstateen3, CSR_SSTATEEN3) -DECLARE_CSR(sscratch, CSR_SSCRATCH) -DECLARE_CSR(sepc, CSR_SEPC) -DECLARE_CSR(scause, CSR_SCAUSE) -DECLARE_CSR(stval, CSR_STVAL) -DECLARE_CSR(sip, CSR_SIP) -DECLARE_CSR(stimecmp, CSR_STIMECMP) -DECLARE_CSR(siselect, CSR_SISELECT) -DECLARE_CSR(sireg, CSR_SIREG) 
-DECLARE_CSR(stopei, CSR_STOPEI) -DECLARE_CSR(satp, CSR_SATP) -DECLARE_CSR(scontext, CSR_SCONTEXT) -DECLARE_CSR(vsstatus, CSR_VSSTATUS) -DECLARE_CSR(vsie, CSR_VSIE) -DECLARE_CSR(vstvec, CSR_VSTVEC) -DECLARE_CSR(vsscratch, CSR_VSSCRATCH) -DECLARE_CSR(vsepc, CSR_VSEPC) -DECLARE_CSR(vscause, CSR_VSCAUSE) -DECLARE_CSR(vstval, CSR_VSTVAL) -DECLARE_CSR(vsip, CSR_VSIP) -DECLARE_CSR(vstimecmp, CSR_VSTIMECMP) -DECLARE_CSR(vsiselect, CSR_VSISELECT) -DECLARE_CSR(vsireg, CSR_VSIREG) -DECLARE_CSR(vstopei, CSR_VSTOPEI) -DECLARE_CSR(vsatp, CSR_VSATP) -DECLARE_CSR(hstatus, CSR_HSTATUS) -DECLARE_CSR(hedeleg, CSR_HEDELEG) -DECLARE_CSR(hideleg, CSR_HIDELEG) -DECLARE_CSR(hie, CSR_HIE) -DECLARE_CSR(htimedelta, CSR_HTIMEDELTA) -DECLARE_CSR(hcounteren, CSR_HCOUNTEREN) -DECLARE_CSR(hgeie, CSR_HGEIE) -DECLARE_CSR(hvien, CSR_HVIEN) -DECLARE_CSR(hvictl, CSR_HVICTL) -DECLARE_CSR(henvcfg, CSR_HENVCFG) -DECLARE_CSR(hstateen0, CSR_HSTATEEN0) -DECLARE_CSR(hstateen1, CSR_HSTATEEN1) -DECLARE_CSR(hstateen2, CSR_HSTATEEN2) -DECLARE_CSR(hstateen3, CSR_HSTATEEN3) -DECLARE_CSR(htval, CSR_HTVAL) -DECLARE_CSR(hip, CSR_HIP) -DECLARE_CSR(hvip, CSR_HVIP) -DECLARE_CSR(hviprio1, CSR_HVIPRIO1) -DECLARE_CSR(hviprio2, CSR_HVIPRIO2) -DECLARE_CSR(htinst, CSR_HTINST) -DECLARE_CSR(hgatp, CSR_HGATP) -DECLARE_CSR(hcontext, CSR_HCONTEXT) -DECLARE_CSR(hgeip, CSR_HGEIP) -DECLARE_CSR(vstopi, CSR_VSTOPI) -DECLARE_CSR(scountovf, CSR_SCOUNTOVF) -DECLARE_CSR(stopi, CSR_STOPI) -DECLARE_CSR(utvt, CSR_UTVT) -DECLARE_CSR(unxti, CSR_UNXTI) -DECLARE_CSR(uintstatus, CSR_UINTSTATUS) -DECLARE_CSR(uscratchcsw, CSR_USCRATCHCSW) -DECLARE_CSR(uscratchcswl, CSR_USCRATCHCSWL) -DECLARE_CSR(stvt, CSR_STVT) -DECLARE_CSR(snxti, CSR_SNXTI) -DECLARE_CSR(sintstatus, CSR_SINTSTATUS) -DECLARE_CSR(sscratchcsw, CSR_SSCRATCHCSW) -DECLARE_CSR(sscratchcswl, CSR_SSCRATCHCSWL) -DECLARE_CSR(mtvt, CSR_MTVT) -DECLARE_CSR(mnxti, CSR_MNXTI) -DECLARE_CSR(mintstatus, CSR_MINTSTATUS) -DECLARE_CSR(mscratchcsw, CSR_MSCRATCHCSW) -DECLARE_CSR(mscratchcswl, 
CSR_MSCRATCHCSWL) -DECLARE_CSR(mstatus, CSR_MSTATUS) -DECLARE_CSR(misa, CSR_MISA) -DECLARE_CSR(medeleg, CSR_MEDELEG) -DECLARE_CSR(mideleg, CSR_MIDELEG) -DECLARE_CSR(mie, CSR_MIE) -DECLARE_CSR(mtvec, CSR_MTVEC) -DECLARE_CSR(mcounteren, CSR_MCOUNTEREN) -DECLARE_CSR(mvien, CSR_MVIEN) -DECLARE_CSR(mvip, CSR_MVIP) -DECLARE_CSR(menvcfg, CSR_MENVCFG) -DECLARE_CSR(mstateen0, CSR_MSTATEEN0) -DECLARE_CSR(mstateen1, CSR_MSTATEEN1) -DECLARE_CSR(mstateen2, CSR_MSTATEEN2) -DECLARE_CSR(mstateen3, CSR_MSTATEEN3) -DECLARE_CSR(mcountinhibit, CSR_MCOUNTINHIBIT) -DECLARE_CSR(mscratch, CSR_MSCRATCH) -DECLARE_CSR(mepc, CSR_MEPC) -DECLARE_CSR(mcause, CSR_MCAUSE) -DECLARE_CSR(mtval, CSR_MTVAL) -DECLARE_CSR(mip, CSR_MIP) -DECLARE_CSR(mtinst, CSR_MTINST) -DECLARE_CSR(mtval2, CSR_MTVAL2) -DECLARE_CSR(miselect, CSR_MISELECT) -DECLARE_CSR(mireg, CSR_MIREG) -DECLARE_CSR(mtopei, CSR_MTOPEI) -DECLARE_CSR(pmpcfg0, CSR_PMPCFG0) -DECLARE_CSR(pmpcfg1, CSR_PMPCFG1) -DECLARE_CSR(pmpcfg2, CSR_PMPCFG2) -DECLARE_CSR(pmpcfg3, CSR_PMPCFG3) -DECLARE_CSR(pmpcfg4, CSR_PMPCFG4) -DECLARE_CSR(pmpcfg5, CSR_PMPCFG5) -DECLARE_CSR(pmpcfg6, CSR_PMPCFG6) -DECLARE_CSR(pmpcfg7, CSR_PMPCFG7) -DECLARE_CSR(pmpcfg8, CSR_PMPCFG8) -DECLARE_CSR(pmpcfg9, CSR_PMPCFG9) -DECLARE_CSR(pmpcfg10, CSR_PMPCFG10) -DECLARE_CSR(pmpcfg11, CSR_PMPCFG11) -DECLARE_CSR(pmpcfg12, CSR_PMPCFG12) -DECLARE_CSR(pmpcfg13, CSR_PMPCFG13) -DECLARE_CSR(pmpcfg14, CSR_PMPCFG14) -DECLARE_CSR(pmpcfg15, CSR_PMPCFG15) -DECLARE_CSR(pmpaddr0, CSR_PMPADDR0) -DECLARE_CSR(pmpaddr1, CSR_PMPADDR1) -DECLARE_CSR(pmpaddr2, CSR_PMPADDR2) -DECLARE_CSR(pmpaddr3, CSR_PMPADDR3) -DECLARE_CSR(pmpaddr4, CSR_PMPADDR4) -DECLARE_CSR(pmpaddr5, CSR_PMPADDR5) -DECLARE_CSR(pmpaddr6, CSR_PMPADDR6) -DECLARE_CSR(pmpaddr7, CSR_PMPADDR7) -DECLARE_CSR(pmpaddr8, CSR_PMPADDR8) -DECLARE_CSR(pmpaddr9, CSR_PMPADDR9) -DECLARE_CSR(pmpaddr10, CSR_PMPADDR10) -DECLARE_CSR(pmpaddr11, CSR_PMPADDR11) -DECLARE_CSR(pmpaddr12, CSR_PMPADDR12) -DECLARE_CSR(pmpaddr13, CSR_PMPADDR13) -DECLARE_CSR(pmpaddr14, 
CSR_PMPADDR14) -DECLARE_CSR(pmpaddr15, CSR_PMPADDR15) -DECLARE_CSR(pmpaddr16, CSR_PMPADDR16) -DECLARE_CSR(pmpaddr17, CSR_PMPADDR17) -DECLARE_CSR(pmpaddr18, CSR_PMPADDR18) -DECLARE_CSR(pmpaddr19, CSR_PMPADDR19) -DECLARE_CSR(pmpaddr20, CSR_PMPADDR20) -DECLARE_CSR(pmpaddr21, CSR_PMPADDR21) -DECLARE_CSR(pmpaddr22, CSR_PMPADDR22) -DECLARE_CSR(pmpaddr23, CSR_PMPADDR23) -DECLARE_CSR(pmpaddr24, CSR_PMPADDR24) -DECLARE_CSR(pmpaddr25, CSR_PMPADDR25) -DECLARE_CSR(pmpaddr26, CSR_PMPADDR26) -DECLARE_CSR(pmpaddr27, CSR_PMPADDR27) -DECLARE_CSR(pmpaddr28, CSR_PMPADDR28) -DECLARE_CSR(pmpaddr29, CSR_PMPADDR29) -DECLARE_CSR(pmpaddr30, CSR_PMPADDR30) -DECLARE_CSR(pmpaddr31, CSR_PMPADDR31) -DECLARE_CSR(pmpaddr32, CSR_PMPADDR32) -DECLARE_CSR(pmpaddr33, CSR_PMPADDR33) -DECLARE_CSR(pmpaddr34, CSR_PMPADDR34) -DECLARE_CSR(pmpaddr35, CSR_PMPADDR35) -DECLARE_CSR(pmpaddr36, CSR_PMPADDR36) -DECLARE_CSR(pmpaddr37, CSR_PMPADDR37) -DECLARE_CSR(pmpaddr38, CSR_PMPADDR38) -DECLARE_CSR(pmpaddr39, CSR_PMPADDR39) -DECLARE_CSR(pmpaddr40, CSR_PMPADDR40) -DECLARE_CSR(pmpaddr41, CSR_PMPADDR41) -DECLARE_CSR(pmpaddr42, CSR_PMPADDR42) -DECLARE_CSR(pmpaddr43, CSR_PMPADDR43) -DECLARE_CSR(pmpaddr44, CSR_PMPADDR44) -DECLARE_CSR(pmpaddr45, CSR_PMPADDR45) -DECLARE_CSR(pmpaddr46, CSR_PMPADDR46) -DECLARE_CSR(pmpaddr47, CSR_PMPADDR47) -DECLARE_CSR(pmpaddr48, CSR_PMPADDR48) -DECLARE_CSR(pmpaddr49, CSR_PMPADDR49) -DECLARE_CSR(pmpaddr50, CSR_PMPADDR50) -DECLARE_CSR(pmpaddr51, CSR_PMPADDR51) -DECLARE_CSR(pmpaddr52, CSR_PMPADDR52) -DECLARE_CSR(pmpaddr53, CSR_PMPADDR53) -DECLARE_CSR(pmpaddr54, CSR_PMPADDR54) -DECLARE_CSR(pmpaddr55, CSR_PMPADDR55) -DECLARE_CSR(pmpaddr56, CSR_PMPADDR56) -DECLARE_CSR(pmpaddr57, CSR_PMPADDR57) -DECLARE_CSR(pmpaddr58, CSR_PMPADDR58) -DECLARE_CSR(pmpaddr59, CSR_PMPADDR59) -DECLARE_CSR(pmpaddr60, CSR_PMPADDR60) -DECLARE_CSR(pmpaddr61, CSR_PMPADDR61) -DECLARE_CSR(pmpaddr62, CSR_PMPADDR62) -DECLARE_CSR(pmpaddr63, CSR_PMPADDR63) -DECLARE_CSR(mseccfg, CSR_MSECCFG) -DECLARE_CSR(tselect, CSR_TSELECT) 
-DECLARE_CSR(tdata1, CSR_TDATA1) -DECLARE_CSR(tdata2, CSR_TDATA2) -DECLARE_CSR(tdata3, CSR_TDATA3) -DECLARE_CSR(tinfo, CSR_TINFO) -DECLARE_CSR(tcontrol, CSR_TCONTROL) -DECLARE_CSR(mcontext, CSR_MCONTEXT) -DECLARE_CSR(mscontext, CSR_MSCONTEXT) -DECLARE_CSR(dcsr, CSR_DCSR) -DECLARE_CSR(dpc, CSR_DPC) -DECLARE_CSR(dscratch0, CSR_DSCRATCH0) -DECLARE_CSR(dscratch1, CSR_DSCRATCH1) -DECLARE_CSR(mcycle, CSR_MCYCLE) -DECLARE_CSR(minstret, CSR_MINSTRET) -DECLARE_CSR(mhpmcounter3, CSR_MHPMCOUNTER3) -DECLARE_CSR(mhpmcounter4, CSR_MHPMCOUNTER4) -DECLARE_CSR(mhpmcounter5, CSR_MHPMCOUNTER5) -DECLARE_CSR(mhpmcounter6, CSR_MHPMCOUNTER6) -DECLARE_CSR(mhpmcounter7, CSR_MHPMCOUNTER7) -DECLARE_CSR(mhpmcounter8, CSR_MHPMCOUNTER8) -DECLARE_CSR(mhpmcounter9, CSR_MHPMCOUNTER9) -DECLARE_CSR(mhpmcounter10, CSR_MHPMCOUNTER10) -DECLARE_CSR(mhpmcounter11, CSR_MHPMCOUNTER11) -DECLARE_CSR(mhpmcounter12, CSR_MHPMCOUNTER12) -DECLARE_CSR(mhpmcounter13, CSR_MHPMCOUNTER13) -DECLARE_CSR(mhpmcounter14, CSR_MHPMCOUNTER14) -DECLARE_CSR(mhpmcounter15, CSR_MHPMCOUNTER15) -DECLARE_CSR(mhpmcounter16, CSR_MHPMCOUNTER16) -DECLARE_CSR(mhpmcounter17, CSR_MHPMCOUNTER17) -DECLARE_CSR(mhpmcounter18, CSR_MHPMCOUNTER18) -DECLARE_CSR(mhpmcounter19, CSR_MHPMCOUNTER19) -DECLARE_CSR(mhpmcounter20, CSR_MHPMCOUNTER20) -DECLARE_CSR(mhpmcounter21, CSR_MHPMCOUNTER21) -DECLARE_CSR(mhpmcounter22, CSR_MHPMCOUNTER22) -DECLARE_CSR(mhpmcounter23, CSR_MHPMCOUNTER23) -DECLARE_CSR(mhpmcounter24, CSR_MHPMCOUNTER24) -DECLARE_CSR(mhpmcounter25, CSR_MHPMCOUNTER25) -DECLARE_CSR(mhpmcounter26, CSR_MHPMCOUNTER26) -DECLARE_CSR(mhpmcounter27, CSR_MHPMCOUNTER27) -DECLARE_CSR(mhpmcounter28, CSR_MHPMCOUNTER28) -DECLARE_CSR(mhpmcounter29, CSR_MHPMCOUNTER29) -DECLARE_CSR(mhpmcounter30, CSR_MHPMCOUNTER30) -DECLARE_CSR(mhpmcounter31, CSR_MHPMCOUNTER31) -DECLARE_CSR(mhpmevent3, CSR_MHPMEVENT3) -DECLARE_CSR(mhpmevent4, CSR_MHPMEVENT4) -DECLARE_CSR(mhpmevent5, CSR_MHPMEVENT5) -DECLARE_CSR(mhpmevent6, CSR_MHPMEVENT6) -DECLARE_CSR(mhpmevent7, 
CSR_MHPMEVENT7) -DECLARE_CSR(mhpmevent8, CSR_MHPMEVENT8) -DECLARE_CSR(mhpmevent9, CSR_MHPMEVENT9) -DECLARE_CSR(mhpmevent10, CSR_MHPMEVENT10) -DECLARE_CSR(mhpmevent11, CSR_MHPMEVENT11) -DECLARE_CSR(mhpmevent12, CSR_MHPMEVENT12) -DECLARE_CSR(mhpmevent13, CSR_MHPMEVENT13) -DECLARE_CSR(mhpmevent14, CSR_MHPMEVENT14) -DECLARE_CSR(mhpmevent15, CSR_MHPMEVENT15) -DECLARE_CSR(mhpmevent16, CSR_MHPMEVENT16) -DECLARE_CSR(mhpmevent17, CSR_MHPMEVENT17) -DECLARE_CSR(mhpmevent18, CSR_MHPMEVENT18) -DECLARE_CSR(mhpmevent19, CSR_MHPMEVENT19) -DECLARE_CSR(mhpmevent20, CSR_MHPMEVENT20) -DECLARE_CSR(mhpmevent21, CSR_MHPMEVENT21) -DECLARE_CSR(mhpmevent22, CSR_MHPMEVENT22) -DECLARE_CSR(mhpmevent23, CSR_MHPMEVENT23) -DECLARE_CSR(mhpmevent24, CSR_MHPMEVENT24) -DECLARE_CSR(mhpmevent25, CSR_MHPMEVENT25) -DECLARE_CSR(mhpmevent26, CSR_MHPMEVENT26) -DECLARE_CSR(mhpmevent27, CSR_MHPMEVENT27) -DECLARE_CSR(mhpmevent28, CSR_MHPMEVENT28) -DECLARE_CSR(mhpmevent29, CSR_MHPMEVENT29) -DECLARE_CSR(mhpmevent30, CSR_MHPMEVENT30) -DECLARE_CSR(mhpmevent31, CSR_MHPMEVENT31) -DECLARE_CSR(mvendorid, CSR_MVENDORID) -DECLARE_CSR(marchid, CSR_MARCHID) -DECLARE_CSR(mimpid, CSR_MIMPID) -DECLARE_CSR(mhartid, CSR_MHARTID) -DECLARE_CSR(mconfigptr, CSR_MCONFIGPTR) -DECLARE_CSR(mtopi, CSR_MTOPI) -DECLARE_CSR(sieh, CSR_SIEH) -DECLARE_CSR(siph, CSR_SIPH) -DECLARE_CSR(stimecmph, CSR_STIMECMPH) -DECLARE_CSR(vsieh, CSR_VSIEH) -DECLARE_CSR(vsiph, CSR_VSIPH) -DECLARE_CSR(vstimecmph, CSR_VSTIMECMPH) -DECLARE_CSR(htimedeltah, CSR_HTIMEDELTAH) -DECLARE_CSR(hidelegh, CSR_HIDELEGH) -DECLARE_CSR(hvienh, CSR_HVIENH) -DECLARE_CSR(henvcfgh, CSR_HENVCFGH) -DECLARE_CSR(hviph, CSR_HVIPH) -DECLARE_CSR(hviprio1h, CSR_HVIPRIO1H) -DECLARE_CSR(hviprio2h, CSR_HVIPRIO2H) -DECLARE_CSR(hstateen0h, CSR_HSTATEEN0H) -DECLARE_CSR(hstateen1h, CSR_HSTATEEN1H) -DECLARE_CSR(hstateen2h, CSR_HSTATEEN2H) -DECLARE_CSR(hstateen3h, CSR_HSTATEEN3H) -DECLARE_CSR(cycleh, CSR_CYCLEH) -DECLARE_CSR(timeh, CSR_TIMEH) -DECLARE_CSR(instreth, CSR_INSTRETH) 
-DECLARE_CSR(hpmcounter3h, CSR_HPMCOUNTER3H) -DECLARE_CSR(hpmcounter4h, CSR_HPMCOUNTER4H) -DECLARE_CSR(hpmcounter5h, CSR_HPMCOUNTER5H) -DECLARE_CSR(hpmcounter6h, CSR_HPMCOUNTER6H) -DECLARE_CSR(hpmcounter7h, CSR_HPMCOUNTER7H) -DECLARE_CSR(hpmcounter8h, CSR_HPMCOUNTER8H) -DECLARE_CSR(hpmcounter9h, CSR_HPMCOUNTER9H) -DECLARE_CSR(hpmcounter10h, CSR_HPMCOUNTER10H) -DECLARE_CSR(hpmcounter11h, CSR_HPMCOUNTER11H) -DECLARE_CSR(hpmcounter12h, CSR_HPMCOUNTER12H) -DECLARE_CSR(hpmcounter13h, CSR_HPMCOUNTER13H) -DECLARE_CSR(hpmcounter14h, CSR_HPMCOUNTER14H) -DECLARE_CSR(hpmcounter15h, CSR_HPMCOUNTER15H) -DECLARE_CSR(hpmcounter16h, CSR_HPMCOUNTER16H) -DECLARE_CSR(hpmcounter17h, CSR_HPMCOUNTER17H) -DECLARE_CSR(hpmcounter18h, CSR_HPMCOUNTER18H) -DECLARE_CSR(hpmcounter19h, CSR_HPMCOUNTER19H) -DECLARE_CSR(hpmcounter20h, CSR_HPMCOUNTER20H) -DECLARE_CSR(hpmcounter21h, CSR_HPMCOUNTER21H) -DECLARE_CSR(hpmcounter22h, CSR_HPMCOUNTER22H) -DECLARE_CSR(hpmcounter23h, CSR_HPMCOUNTER23H) -DECLARE_CSR(hpmcounter24h, CSR_HPMCOUNTER24H) -DECLARE_CSR(hpmcounter25h, CSR_HPMCOUNTER25H) -DECLARE_CSR(hpmcounter26h, CSR_HPMCOUNTER26H) -DECLARE_CSR(hpmcounter27h, CSR_HPMCOUNTER27H) -DECLARE_CSR(hpmcounter28h, CSR_HPMCOUNTER28H) -DECLARE_CSR(hpmcounter29h, CSR_HPMCOUNTER29H) -DECLARE_CSR(hpmcounter30h, CSR_HPMCOUNTER30H) -DECLARE_CSR(hpmcounter31h, CSR_HPMCOUNTER31H) -DECLARE_CSR(mstatush, CSR_MSTATUSH) -DECLARE_CSR(midelegh, CSR_MIDELEGH) -DECLARE_CSR(mieh, CSR_MIEH) -DECLARE_CSR(mvienh, CSR_MVIENH) -DECLARE_CSR(mviph, CSR_MVIPH) -DECLARE_CSR(menvcfgh, CSR_MENVCFGH) -DECLARE_CSR(mstateen0h, CSR_MSTATEEN0H) -DECLARE_CSR(mstateen1h, CSR_MSTATEEN1H) -DECLARE_CSR(mstateen2h, CSR_MSTATEEN2H) -DECLARE_CSR(mstateen3h, CSR_MSTATEEN3H) -DECLARE_CSR(miph, CSR_MIPH) -DECLARE_CSR(mhpmevent3h, CSR_MHPMEVENT3H) -DECLARE_CSR(mhpmevent4h, CSR_MHPMEVENT4H) -DECLARE_CSR(mhpmevent5h, CSR_MHPMEVENT5H) -DECLARE_CSR(mhpmevent6h, CSR_MHPMEVENT6H) -DECLARE_CSR(mhpmevent7h, CSR_MHPMEVENT7H) -DECLARE_CSR(mhpmevent8h, 
CSR_MHPMEVENT8H) -DECLARE_CSR(mhpmevent9h, CSR_MHPMEVENT9H) -DECLARE_CSR(mhpmevent10h, CSR_MHPMEVENT10H) -DECLARE_CSR(mhpmevent11h, CSR_MHPMEVENT11H) -DECLARE_CSR(mhpmevent12h, CSR_MHPMEVENT12H) -DECLARE_CSR(mhpmevent13h, CSR_MHPMEVENT13H) -DECLARE_CSR(mhpmevent14h, CSR_MHPMEVENT14H) -DECLARE_CSR(mhpmevent15h, CSR_MHPMEVENT15H) -DECLARE_CSR(mhpmevent16h, CSR_MHPMEVENT16H) -DECLARE_CSR(mhpmevent17h, CSR_MHPMEVENT17H) -DECLARE_CSR(mhpmevent18h, CSR_MHPMEVENT18H) -DECLARE_CSR(mhpmevent19h, CSR_MHPMEVENT19H) -DECLARE_CSR(mhpmevent20h, CSR_MHPMEVENT20H) -DECLARE_CSR(mhpmevent21h, CSR_MHPMEVENT21H) -DECLARE_CSR(mhpmevent22h, CSR_MHPMEVENT22H) -DECLARE_CSR(mhpmevent23h, CSR_MHPMEVENT23H) -DECLARE_CSR(mhpmevent24h, CSR_MHPMEVENT24H) -DECLARE_CSR(mhpmevent25h, CSR_MHPMEVENT25H) -DECLARE_CSR(mhpmevent26h, CSR_MHPMEVENT26H) -DECLARE_CSR(mhpmevent27h, CSR_MHPMEVENT27H) -DECLARE_CSR(mhpmevent28h, CSR_MHPMEVENT28H) -DECLARE_CSR(mhpmevent29h, CSR_MHPMEVENT29H) -DECLARE_CSR(mhpmevent30h, CSR_MHPMEVENT30H) -DECLARE_CSR(mhpmevent31h, CSR_MHPMEVENT31H) -DECLARE_CSR(mnscratch, CSR_MNSCRATCH) -DECLARE_CSR(mnepc, CSR_MNEPC) -DECLARE_CSR(mncause, CSR_MNCAUSE) -DECLARE_CSR(mnstatus, CSR_MNSTATUS) -DECLARE_CSR(mseccfgh, CSR_MSECCFGH) -DECLARE_CSR(mcycleh, CSR_MCYCLEH) -DECLARE_CSR(minstreth, CSR_MINSTRETH) -DECLARE_CSR(mhpmcounter3h, CSR_MHPMCOUNTER3H) -DECLARE_CSR(mhpmcounter4h, CSR_MHPMCOUNTER4H) -DECLARE_CSR(mhpmcounter5h, CSR_MHPMCOUNTER5H) -DECLARE_CSR(mhpmcounter6h, CSR_MHPMCOUNTER6H) -DECLARE_CSR(mhpmcounter7h, CSR_MHPMCOUNTER7H) -DECLARE_CSR(mhpmcounter8h, CSR_MHPMCOUNTER8H) -DECLARE_CSR(mhpmcounter9h, CSR_MHPMCOUNTER9H) -DECLARE_CSR(mhpmcounter10h, CSR_MHPMCOUNTER10H) -DECLARE_CSR(mhpmcounter11h, CSR_MHPMCOUNTER11H) -DECLARE_CSR(mhpmcounter12h, CSR_MHPMCOUNTER12H) -DECLARE_CSR(mhpmcounter13h, CSR_MHPMCOUNTER13H) -DECLARE_CSR(mhpmcounter14h, CSR_MHPMCOUNTER14H) -DECLARE_CSR(mhpmcounter15h, CSR_MHPMCOUNTER15H) -DECLARE_CSR(mhpmcounter16h, CSR_MHPMCOUNTER16H) 
-DECLARE_CSR(mhpmcounter17h, CSR_MHPMCOUNTER17H) -DECLARE_CSR(mhpmcounter18h, CSR_MHPMCOUNTER18H) -DECLARE_CSR(mhpmcounter19h, CSR_MHPMCOUNTER19H) -DECLARE_CSR(mhpmcounter20h, CSR_MHPMCOUNTER20H) -DECLARE_CSR(mhpmcounter21h, CSR_MHPMCOUNTER21H) -DECLARE_CSR(mhpmcounter22h, CSR_MHPMCOUNTER22H) -DECLARE_CSR(mhpmcounter23h, CSR_MHPMCOUNTER23H) -DECLARE_CSR(mhpmcounter24h, CSR_MHPMCOUNTER24H) -DECLARE_CSR(mhpmcounter25h, CSR_MHPMCOUNTER25H) -DECLARE_CSR(mhpmcounter26h, CSR_MHPMCOUNTER26H) -DECLARE_CSR(mhpmcounter27h, CSR_MHPMCOUNTER27H) -DECLARE_CSR(mhpmcounter28h, CSR_MHPMCOUNTER28H) -DECLARE_CSR(mhpmcounter29h, CSR_MHPMCOUNTER29H) -DECLARE_CSR(mhpmcounter30h, CSR_MHPMCOUNTER30H) -DECLARE_CSR(mhpmcounter31h, CSR_MHPMCOUNTER31H) -#endif -#ifdef DECLARE_CAUSE -DECLARE_CAUSE("misaligned fetch", CAUSE_MISALIGNED_FETCH) -DECLARE_CAUSE("fetch access", CAUSE_FETCH_ACCESS) -DECLARE_CAUSE("illegal instruction", CAUSE_ILLEGAL_INSTRUCTION) -DECLARE_CAUSE("breakpoint", CAUSE_BREAKPOINT) -DECLARE_CAUSE("misaligned load", CAUSE_MISALIGNED_LOAD) -DECLARE_CAUSE("load access", CAUSE_LOAD_ACCESS) -DECLARE_CAUSE("misaligned store", CAUSE_MISALIGNED_STORE) -DECLARE_CAUSE("store access", CAUSE_STORE_ACCESS) -DECLARE_CAUSE("user_ecall", CAUSE_USER_ECALL) -DECLARE_CAUSE("supervisor_ecall", CAUSE_SUPERVISOR_ECALL) -DECLARE_CAUSE("virtual_supervisor_ecall", CAUSE_VIRTUAL_SUPERVISOR_ECALL) -DECLARE_CAUSE("machine_ecall", CAUSE_MACHINE_ECALL) -DECLARE_CAUSE("fetch page fault", CAUSE_FETCH_PAGE_FAULT) -DECLARE_CAUSE("load page fault", CAUSE_LOAD_PAGE_FAULT) -DECLARE_CAUSE("store page fault", CAUSE_STORE_PAGE_FAULT) -DECLARE_CAUSE("fetch guest page fault", CAUSE_FETCH_GUEST_PAGE_FAULT) -DECLARE_CAUSE("load guest page fault", CAUSE_LOAD_GUEST_PAGE_FAULT) -DECLARE_CAUSE("virtual instruction", CAUSE_VIRTUAL_INSTRUCTION) -DECLARE_CAUSE("store guest page fault", CAUSE_STORE_GUEST_PAGE_FAULT) -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/env/p/link.ld 
b/bb-tests/workloads/src/CTest/rvv/env/p/link.ld deleted file mode 100644 index a64fffb6..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/p/link.ld +++ /dev/null @@ -1,16 +0,0 @@ -OUTPUT_ARCH( "riscv" ) -ENTRY(_start) - -SECTIONS -{ - . = 0x80000000; - .text.init : { *(.text.init) } - . = ALIGN(0x1000); - .tohost : { *(.tohost) } - . = ALIGN(0x1000); - .text : { *(.text) } - . = ALIGN(0x1000); - .data : { *(.data) } - .bss : { *(.bss) } - _end = .; -} diff --git a/bb-tests/workloads/src/CTest/rvv/env/p/riscv_test.h b/bb-tests/workloads/src/CTest/rvv/env/p/riscv_test.h deleted file mode 100644 index 8dd32ed8..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/p/riscv_test.h +++ /dev/null @@ -1,297 +0,0 @@ -// See LICENSE for license details. - -#ifndef _ENV_PHYSICAL_SINGLE_CORE_H -#define _ENV_PHYSICAL_SINGLE_CORE_H - -#include "../encoding.h" - -//----------------------------------------------------------------------- -// Begin Macro -//----------------------------------------------------------------------- - -#define RVTEST_RV64U \ - .macro init; \ - .endm - -#define RVTEST_RV64UF \ - .macro init; \ - RVTEST_FP_ENABLE; \ - .endm - -#define RVTEST_RV64UV \ - .macro init; \ - RVTEST_VECTOR_ENABLE; \ - .endm - -#define RVTEST_RV32U \ - .macro init; \ - .endm - -#define RVTEST_RV32UF \ - .macro init; \ - RVTEST_FP_ENABLE; \ - .endm - -#define RVTEST_RV32UV \ - .macro init; \ - RVTEST_VECTOR_ENABLE; \ - .endm - -#define RVTEST_RV64M \ - .macro init; \ - RVTEST_ENABLE_MACHINE; \ - .endm - -#define RVTEST_RV64S \ - .macro init; \ - RVTEST_ENABLE_SUPERVISOR; \ - .endm - -#define RVTEST_RV32M \ - .macro init; \ - RVTEST_ENABLE_MACHINE; \ - .endm - -#define RVTEST_RV32S \ - .macro init; \ - RVTEST_ENABLE_SUPERVISOR; \ - .endm - -#if __riscv_xlen == 64 -#define CHECK_XLEN \ - li a0, 1; \ - slli a0, a0, 31; \ - bgez a0, 1f; \ - RVTEST_PASS; \ - 1: -#else -#define CHECK_XLEN \ - li a0, 1; \ - slli a0, a0, 31; \ - bltz a0, 1f; \ - RVTEST_PASS; \ - 1: -#endif - -#define 
INIT_XREG \ - li x1, 0; \ - li x2, 0; \ - li x3, 0; \ - li x4, 0; \ - li x5, 0; \ - li x6, 0; \ - li x7, 0; \ - li x8, 0; \ - li x9, 0; \ - li x10, 0; \ - li x11, 0; \ - li x12, 0; \ - li x13, 0; \ - li x14, 0; \ - li x15, 0; \ - li x16, 0; \ - li x17, 0; \ - li x18, 0; \ - li x19, 0; \ - li x20, 0; \ - li x21, 0; \ - li x22, 0; \ - li x23, 0; \ - li x24, 0; \ - li x25, 0; \ - li x26, 0; \ - li x27, 0; \ - li x28, 0; \ - li x29, 0; \ - li x30, 0; \ - li x31, 0; - -#define INIT_PMP \ - la t0, 1f; \ - csrw mtvec, t0; \ - /* Set up a PMP to permit all accesses */ \ - li t0, (1 << (31 + (__riscv_xlen / 64) * (53 - 31))) - 1; \ - csrw pmpaddr0, t0; \ - li t0, PMP_NAPOT | PMP_R | PMP_W | PMP_X; \ - csrw pmpcfg0, t0; \ - .align 2; \ - 1: - -#define INIT_RNMI \ - la t0, 1f; \ - csrw mtvec, t0; \ - csrwi CSR_MNSTATUS, MNSTATUS_NMIE; \ - .align 2; \ - 1: - -#define INIT_SATP \ - la t0, 1f; \ - csrw mtvec, t0; \ - csrwi satp, 0; \ - .align 2; \ - 1: - -#define DELEGATE_NO_TRAPS \ - csrwi mie, 0; \ - la t0, 1f; \ - csrw mtvec, t0; \ - csrwi medeleg, 0; \ - csrwi mideleg, 0; \ - .align 2; \ - 1: - -#define RVTEST_ENABLE_SUPERVISOR \ - li a0, MSTATUS_MPP &(MSTATUS_MPP >> 1); \ - csrs mstatus, a0; \ - li a0, SIP_SSIP | SIP_STIP; \ - csrs mideleg, a0; - -#define RVTEST_ENABLE_MACHINE \ - li a0, MSTATUS_MPP; \ - csrs mstatus, a0; - -#define RVTEST_FP_ENABLE \ - li a0, MSTATUS_FS &(MSTATUS_FS >> 1); \ - csrs mstatus, a0; \ - csrwi fcsr, 0 - -#define RVTEST_VECTOR_ENABLE \ - li a0, (MSTATUS_VS & (MSTATUS_VS >> 1)) | (MSTATUS_FS & (MSTATUS_FS >> 1)); \ - csrs mstatus, a0; \ - csrwi fcsr, 0; \ - csrwi vcsr, 0; - -#define RISCV_MULTICORE_DISABLE \ - csrr a0, mhartid; \ - 1 : bnez a0, 1b - -#define EXTRA_TVEC_USER -#define EXTRA_TVEC_MACHINE -#define EXTRA_INIT -#define EXTRA_INIT_TIMER -#define FILTER_TRAP -#define FILTER_PAGE_FAULT - -#define INTERRUPT_HANDLER j other_exception /* No interrupts should occur */ - -#define RVTEST_CODE_BEGIN \ - .section.text.init; \ - .align 6; \ - .weak 
stvec_handler; \ - .weak mtvec_handler; \ - .globl _start; \ - _start: \ - /* reset vector */ \ - j reset_vector; \ - .align 2; \ - trap_vector: \ - /* test whether the test came from pass/fail */ \ - csrr t5, mcause; \ - li t6, CAUSE_USER_ECALL; \ - beq t5, t6, write_tohost; \ - li t6, CAUSE_SUPERVISOR_ECALL; \ - beq t5, t6, write_tohost; \ - li t6, CAUSE_MACHINE_ECALL; \ - beq t5, t6, write_tohost; \ - /* if an mtvec_handler is defined, jump to it */ \ - la t5, mtvec_handler; \ - beqz t5, 1f; \ - jr t5; \ - /* was it an interrupt or an exception? */ \ - 1 : csrr t5, mcause; \ - bgez t5, handle_exception; \ - INTERRUPT_HANDLER; \ - handle_exception: \ - /* we don't know how to handle whatever the exception was */ \ - other_exception: \ - /* some unhandlable exception occurred */ \ - 1 : ori TESTNUM, TESTNUM, 1337; \ - write_tohost: \ - sw TESTNUM, tohost, t5; \ - sw zero, tohost + 4, t5; \ - j write_tohost; \ - reset_vector: \ - INIT_XREG; \ - RISCV_MULTICORE_DISABLE; \ - INIT_RNMI; \ - INIT_SATP; \ - INIT_PMP; \ - DELEGATE_NO_TRAPS; \ - li TESTNUM, 0; \ - la t0, trap_vector; \ - csrw mtvec, t0; \ - CHECK_XLEN; \ - /* if an stvec_handler is defined, delegate exceptions to it */ \ - la t0, stvec_handler; \ - beqz t0, 1f; \ - csrw stvec, t0; \ - li t0, (1 << CAUSE_LOAD_PAGE_FAULT) | (1 << CAUSE_STORE_PAGE_FAULT) | \ - (1 << CAUSE_FETCH_PAGE_FAULT) | (1 << CAUSE_MISALIGNED_FETCH) | \ - (1 << CAUSE_USER_ECALL) | (1 << CAUSE_BREAKPOINT); \ - csrw medeleg, t0; \ - 1 : csrwi mstatus, 0; \ - init; \ - EXTRA_INIT; \ - EXTRA_INIT_TIMER; \ - la t0, 1f; \ - csrw mepc, t0; \ - csrr a0, mhartid; \ - mret; \ - 1: - -//----------------------------------------------------------------------- -// End Macro -//----------------------------------------------------------------------- - -#define RVTEST_CODE_END unimp - -//----------------------------------------------------------------------- -// Pass/Fail Macro -//----------------------------------------------------------------------- - 
-#define RVTEST_PASS \ - fence; \ - li TESTNUM, 1; \ - li a7, 93; \ - li a0, 0; \ - ecall - -#define TESTNUM gp -#define RVTEST_FAIL \ - fence; \ - 1 : beqz TESTNUM, 1b; \ - sll TESTNUM, TESTNUM, 1; \ - or TESTNUM, TESTNUM, 1; \ - li a7, 93; \ - addi a0, TESTNUM, 0; \ - ecall - -//----------------------------------------------------------------------- -// Data Section Macro -//----------------------------------------------------------------------- - -#define EXTRA_DATA - -#define RVTEST_DATA_BEGIN \ - EXTRA_DATA.pushsection.tohost, "aw", @progbits; \ - .align 6; \ - .global tohost; \ - tohost: \ - .dword 0; \ - .size tohost, 8; \ - .align 6; \ - .global fromhost; \ - fromhost: \ - .dword 0; \ - .size fromhost, 8; \ - .popsection; \ - .align 4; \ - .global begin_signature; \ - begin_signature: - -#define RVTEST_DATA_END \ - .align 4; \ - .global end_signature; \ - end_signature: - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/env/pm/link.ld b/bb-tests/workloads/src/CTest/rvv/env/pm/link.ld deleted file mode 120000 index 86b45f9f..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/pm/link.ld +++ /dev/null @@ -1 +0,0 @@ -../p/link.ld \ No newline at end of file diff --git a/bb-tests/workloads/src/CTest/rvv/env/pm/riscv_test.h b/bb-tests/workloads/src/CTest/rvv/env/pm/riscv_test.h deleted file mode 100644 index 38a0e86b..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/pm/riscv_test.h +++ /dev/null @@ -1,11 +0,0 @@ -// See LICENSE for license details. 
- -#ifndef _ENV_PHYSICAL_MULTI_CORE_H -#define _ENV_PHYSICAL_MULTI_CORE_H - -#include "../p/riscv_test.h" - -#undef RISCV_MULTICORE_DISABLE -#define RISCV_MULTICORE_DISABLE - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/env/pt/link.ld b/bb-tests/workloads/src/CTest/rvv/env/pt/link.ld deleted file mode 120000 index 86b45f9f..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/pt/link.ld +++ /dev/null @@ -1 +0,0 @@ -../p/link.ld \ No newline at end of file diff --git a/bb-tests/workloads/src/CTest/rvv/env/pt/riscv_test.h b/bb-tests/workloads/src/CTest/rvv/env/pt/riscv_test.h deleted file mode 100644 index 378406d7..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/pt/riscv_test.h +++ /dev/null @@ -1,69 +0,0 @@ -// See LICENSE for license details. - -#ifndef _ENV_PHYSICAL_SINGLE_CORE_TIMER_H -#define _ENV_PHYSICAL_SINGLE_CORE_TIMER_H - -#include "../p/riscv_test.h" - -#define TIMER_INTERVAL 2 - -#undef EXTRA_INIT_TIMER -#define EXTRA_INIT_TIMER \ - li a0, MIP_MTIP; \ - csrs mie, a0; \ - csrr a0, mtime; \ - addi a0, a0, TIMER_INTERVAL; \ - csrw mtimecmp, a0; - -#if SSTATUS_XS != 0x18000 -#error -#endif -#define XS_SHIFT 15 - -#undef INTERRUPT_HANDLER -#define INTERRUPT_HANDLER \ - slli t5, t5, 1; \ - srli t5, t5, 1; \ - add t5, t5, -IRQ_M_TIMER; \ - bnez t5, other_exception; /* other interrups shouldn't happen */ \ - csrr t5, mtime; \ - addi t5, t5, TIMER_INTERVAL; \ - csrw mtimecmp, t5; \ - mret; - -//----------------------------------------------------------------------- -// Data Section Macro -//----------------------------------------------------------------------- - -#undef EXTRA_DATA -#define EXTRA_DATA \ - .align 3; \ - regspill: \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 
0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - .dword 0xdeadbeefcafebabe; \ - evac: \ - .skip 32768; - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/env/v/entry.S b/bb-tests/workloads/src/CTest/rvv/env/v/entry.S deleted file mode 100644 index 13d46a34..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/v/entry.S +++ /dev/null @@ -1,164 +0,0 @@ -#include "riscv_test.h" - -#if __riscv_xlen == 64 -# define STORE sd -# define LOAD ld -# define REGBYTES 8 -#else -# define STORE sw -# define LOAD lw -# define REGBYTES 4 -#endif - -#define STACK_TOP (_end + RISCV_PGSIZE * 4) - - .section ".text.init","ax",@progbits - .globl _start - .align 2 -_start: - j handle_reset - - /* NMI vector */ - .align 2 -nmi_vector: - j wtf - - .align 2 -trap_vector: - j wtf - -handle_reset: - li x1, 0 - li x2, 0 - li x3, 0 - li x4, 0 - li x5, 0 - li x6, 0 - li x7, 0 - li x8, 0 - li x9, 0 - li x10, 0 - li x11, 0 - li x12, 0 - li x13, 0 - li x14, 0 - li x15, 0 - li x16, 0 - li x17, 0 - li x18, 0 - li x19, 0 - li x20, 0 - li x21, 0 - li x22, 0 - li x23, 0 - li x24, 0 - li x25, 0 - li x26, 0 - li x27, 0 - li x28, 0 - li x29, 0 - li x30, 0 - li x31, 0 - - INIT_RNMI - - la t0, trap_vector - csrw mtvec, t0 - la sp, STACK_TOP - SIZEOF_TRAPFRAME_T - csrr t0, mhartid - slli t0, t0, 12 - add sp, sp, t0 - csrw mscratch, sp - call extra_boot - la a0, userstart - j vm_boot - - .globl pop_tf -pop_tf: - LOAD t0,33*REGBYTES(a0) - csrw sepc,t0 - LOAD x1,1*REGBYTES(a0) - LOAD x2,2*REGBYTES(a0) - LOAD x3,3*REGBYTES(a0) - LOAD x4,4*REGBYTES(a0) - LOAD x5,5*REGBYTES(a0) - LOAD x6,6*REGBYTES(a0) - LOAD x7,7*REGBYTES(a0) - LOAD x8,8*REGBYTES(a0) - LOAD 
x9,9*REGBYTES(a0) - LOAD x11,11*REGBYTES(a0) - LOAD x12,12*REGBYTES(a0) - LOAD x13,13*REGBYTES(a0) - LOAD x14,14*REGBYTES(a0) - LOAD x15,15*REGBYTES(a0) - LOAD x16,16*REGBYTES(a0) - LOAD x17,17*REGBYTES(a0) - LOAD x18,18*REGBYTES(a0) - LOAD x19,19*REGBYTES(a0) - LOAD x20,20*REGBYTES(a0) - LOAD x21,21*REGBYTES(a0) - LOAD x22,22*REGBYTES(a0) - LOAD x23,23*REGBYTES(a0) - LOAD x24,24*REGBYTES(a0) - LOAD x25,25*REGBYTES(a0) - LOAD x26,26*REGBYTES(a0) - LOAD x27,27*REGBYTES(a0) - LOAD x28,28*REGBYTES(a0) - LOAD x29,29*REGBYTES(a0) - LOAD x30,30*REGBYTES(a0) - LOAD x31,31*REGBYTES(a0) - LOAD a0,10*REGBYTES(a0) - sret - - .global trap_entry - .align 2 -trap_entry: - csrrw sp, sscratch, sp - - # save gprs - STORE x1,1*REGBYTES(sp) - STORE x3,3*REGBYTES(sp) - STORE x4,4*REGBYTES(sp) - STORE x5,5*REGBYTES(sp) - STORE x6,6*REGBYTES(sp) - STORE x7,7*REGBYTES(sp) - STORE x8,8*REGBYTES(sp) - STORE x9,9*REGBYTES(sp) - STORE x10,10*REGBYTES(sp) - STORE x11,11*REGBYTES(sp) - STORE x12,12*REGBYTES(sp) - STORE x13,13*REGBYTES(sp) - STORE x14,14*REGBYTES(sp) - STORE x15,15*REGBYTES(sp) - STORE x16,16*REGBYTES(sp) - STORE x17,17*REGBYTES(sp) - STORE x18,18*REGBYTES(sp) - STORE x19,19*REGBYTES(sp) - STORE x20,20*REGBYTES(sp) - STORE x21,21*REGBYTES(sp) - STORE x22,22*REGBYTES(sp) - STORE x23,23*REGBYTES(sp) - STORE x24,24*REGBYTES(sp) - STORE x25,25*REGBYTES(sp) - STORE x26,26*REGBYTES(sp) - STORE x27,27*REGBYTES(sp) - STORE x28,28*REGBYTES(sp) - STORE x29,29*REGBYTES(sp) - STORE x30,30*REGBYTES(sp) - STORE x31,31*REGBYTES(sp) - - csrrw t0,sscratch,sp - STORE t0,2*REGBYTES(sp) - - # get sr, epc, badvaddr, cause - csrr t0,sstatus - STORE t0,32*REGBYTES(sp) - csrr t0,sepc - STORE t0,33*REGBYTES(sp) - csrr t0,stval - STORE t0,34*REGBYTES(sp) - csrr t0,scause - STORE t0,35*REGBYTES(sp) - - move a0, sp - j handle_trap diff --git a/bb-tests/workloads/src/CTest/rvv/env/v/link.ld b/bb-tests/workloads/src/CTest/rvv/env/v/link.ld deleted file mode 120000 index 86b45f9f..00000000 --- 
a/bb-tests/workloads/src/CTest/rvv/env/v/link.ld +++ /dev/null @@ -1 +0,0 @@ -../p/link.ld \ No newline at end of file diff --git a/bb-tests/workloads/src/CTest/rvv/env/v/riscv_test.h b/bb-tests/workloads/src/CTest/rvv/env/v/riscv_test.h deleted file mode 100644 index 39b98b8e..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/v/riscv_test.h +++ /dev/null @@ -1,101 +0,0 @@ -// See LICENSE for license details. - -#ifndef _ENV_VIRTUAL_SINGLE_CORE_H -#define _ENV_VIRTUAL_SINGLE_CORE_H - -#include "../p/riscv_test.h" - -//----------------------------------------------------------------------- -// Begin Macro -//----------------------------------------------------------------------- - -#undef RVTEST_FP_ENABLE -#define RVTEST_FP_ENABLE fssr x0 - -#undef RVTEST_VECTOR_ENABLE -#define RVTEST_VECTOR_ENABLE \ - csrwi fcsr, 0; \ - csrwi vcsr, 0; - -#undef RVTEST_CODE_BEGIN -#define RVTEST_CODE_BEGIN \ - .text; \ - .global extra_boot; \ - extra_boot: \ - EXTRA_INIT \ - ret; \ - .global trap_filter; \ - trap_filter: \ - FILTER_TRAP \ - li a0, 0; \ - ret; \ - .global pf_filter; \ - pf_filter: \ - FILTER_PAGE_FAULT \ - li a0, 0; \ - ret; \ - .global userstart; \ - userstart: \ - init - -//----------------------------------------------------------------------- -// Pass/Fail Macro -//----------------------------------------------------------------------- - -#undef RVTEST_PASS -#define RVTEST_PASS \ - li a0, 1; \ - scall - -#undef RVTEST_FAIL -#define RVTEST_FAIL \ - sll a0, TESTNUM, 1; \ - 1 : beqz a0, 1b; \ - or a0, a0, 1; \ - scall; - -//----------------------------------------------------------------------- -// Data Section Macro -//----------------------------------------------------------------------- - -#undef RVTEST_DATA_END -#define RVTEST_DATA_END - -//----------------------------------------------------------------------- -// Supervisor mode definitions and macros -//----------------------------------------------------------------------- - -#ifndef LFSR_BITS -#define 
LFSR_BITS 6 -#endif - -#define MAX_TEST_PAGES \ - ((1 << LFSR_BITS) - 1) // this must be the period of the LFSR below -#define LFSR_NEXT(x) \ - (((((x) ^ ((x) >> 1)) & 1) << (LFSR_BITS - 1)) | ((x) >> 1)) - -#define PGSHIFT 12 -#define PGSIZE (1UL << PGSHIFT) - -#define SIZEOF_TRAPFRAME_T ((__riscv_xlen / 8) * 36) - -#ifndef __ASSEMBLER__ - -typedef unsigned long pte_t; -#define LEVELS (sizeof(pte_t) == sizeof(uint64_t) ? 3 : 2) -#define PTIDXBITS (PGSHIFT - (sizeof(pte_t) == 8 ? 3 : 2)) -#define VPN_BITS (PTIDXBITS * LEVELS) -#define VA_BITS (VPN_BITS + PGSHIFT) -#define PTES_PER_PT (1UL << RISCV_PGLEVEL_BITS) -#define MEGAPAGE_SIZE (PTES_PER_PT * PGSIZE) - -typedef struct { - long gpr[32]; - long sr; - long epc; - long badvaddr; - long cause; -} trapframe_t; -#endif - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/env/v/string.c b/bb-tests/workloads/src/CTest/rvv/env/v/string.c deleted file mode 100644 index f934b194..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/v/string.c +++ /dev/null @@ -1,108 +0,0 @@ -#include -#include -#include - -void *memcpy(void *dest, const void *src, size_t len) { - if ((((uintptr_t)dest | (uintptr_t)src | len) & (sizeof(uintptr_t) - 1)) == - 0) { - const uintptr_t *s = src; - uintptr_t *d = dest; - while (d < (uintptr_t *)(dest + len)) - *d++ = *s++; - } else { - const char *s = src; - char *d = dest; - while (d < (char *)(dest + len)) - *d++ = *s++; - } - return dest; -} - -void *memset(void *dest, int byte, size_t len) { - if ((((uintptr_t)dest | len) & (sizeof(uintptr_t) - 1)) == 0) { - uintptr_t word = byte & 0xFF; - word |= word << 8; - word |= word << 16; - word |= word << 16 << 16; - - uintptr_t *d = dest; - while (d < (uintptr_t *)(dest + len)) - *d++ = word; - } else { - char *d = dest; - while (d < (char *)(dest + len)) - *d++ = byte; - } - return dest; -} - -size_t strlen(const char *s) { - const char *p = s; - while (*p) - p++; - return p - s; -} - -int strcmp(const char *s1, const char *s2) { - unsigned char 
c1, c2; - - do { - c1 = *s1++; - c2 = *s2++; - } while (c1 != 0 && c1 == c2); - - return c1 - c2; -} - -int memcmp(const void *s1, const void *s2, size_t n) { - if ((((uintptr_t)s1 | (uintptr_t)s2) & (sizeof(uintptr_t) - 1)) == 0) { - const uintptr_t *u1 = s1; - const uintptr_t *u2 = s2; - const uintptr_t *end = u1 + (n / sizeof(uintptr_t)); - while (u1 < end) { - if (*u1 != *u2) - break; - u1++; - u2++; - } - n -= (const void *)u1 - s1; - s1 = u1; - s2 = u2; - } - - while (n--) { - unsigned char c1 = *(const unsigned char *)s1++; - unsigned char c2 = *(const unsigned char *)s2++; - if (c1 != c2) - return c1 - c2; - } - - return 0; -} - -char *strcpy(char *dest, const char *src) { - char *d = dest; - while ((*d++ = *src++)) - ; - return dest; -} - -long atol(const char *str) { - long res = 0; - int sign = 0; - - while (*str == ' ') - str++; - - if (*str == '-' || *str == '+') { - sign = *str == '-'; - str++; - } - - while (*str) { - res *= 10; - res += *str++ - '0'; - } - - return sign ? -res : res; -} diff --git a/bb-tests/workloads/src/CTest/rvv/env/v/vm.c b/bb-tests/workloads/src/CTest/rvv/env/v/vm.c deleted file mode 100644 index a26781f4..00000000 --- a/bb-tests/workloads/src/CTest/rvv/env/v/vm.c +++ /dev/null @@ -1,310 +0,0 @@ -// See LICENSE for license details. 
- -#include -#include -#include - -#include "riscv_test.h" - -#if __riscv_xlen == 32 -#define SATP_MODE_CHOICE SATP_MODE_SV32 -#elif defined(Sv48) -#define SATP_MODE_CHOICE SATP_MODE_SV48 -#else -#define SATP_MODE_CHOICE SATP_MODE_SV39 -#endif - -void trap_entry(); -void pop_tf(trapframe_t *); - -extern volatile uint64_t tohost; -extern volatile uint64_t fromhost; - -static void do_tohost(uint64_t tohost_value) { - while (tohost) - fromhost = 0; - tohost = tohost_value; -} - -#define pa2kva(pa) ((void *)(pa) - DRAM_BASE - MEGAPAGE_SIZE) -#define uva2kva(pa) ((void *)(pa) - MEGAPAGE_SIZE) - -#define flush_page(addr) asm volatile("sfence.vma %0" : : "r"(addr) : "memory") - -static uint64_t lfsr63(uint64_t x) { - uint64_t bit = (x ^ (x >> 1)) & 1; - return (x >> 1) | (bit << 62); -} - -static void cputchar(int x) { - do_tohost(0x0101000000000000 | (unsigned char)x); -} - -static void cputstring(const char *s) { - while (*s) - cputchar(*s++); -} - -static void terminate(int code) { - do_tohost(code); - while (1) - ; -} - -void wtf() { terminate(841); } - -#define stringify1(x) #x -#define stringify(x) stringify1(x) -#define assert(x) \ - do { \ - if (x) \ - break; \ - cputstring("Assertion failed: " stringify(x) "\n"); \ - terminate(3); \ - } while (0) - -#define l1pt pt[0] -#define user_l2pt pt[1] -#if SATP_MODE_CHOICE == SATP_MODE_SV48 -#define NPT 6 -#define kernel_l2pt pt[2] -#define kernel_l3pt pt[3] -#define user_l3pt pt[4] -#define user_llpt pt[5] -#elif SATP_MODE_CHOICE == SATP_MODE_SV39 -#define NPT 4 -#define kernel_l2pt pt[2] -#define user_llpt pt[3] -#elif SATP_MODE_CHOICE == SATP_MODE_SV32 -#define NPT 2 -#define user_llpt user_l2pt -#else -#error Unknown SATP_MODE_CHOICE -#endif -pte_t pt[NPT][PTES_PER_PT] __attribute__((aligned(PGSIZE))); - -typedef struct { - pte_t addr; - void *next; -} freelist_t; - -freelist_t user_mapping[MAX_TEST_PAGES]; -freelist_t freelist_nodes[MAX_TEST_PAGES]; -freelist_t *freelist_head, *freelist_tail; - -void 
printhex(uint64_t x) { - char str[17]; - for (int i = 0; i < 16; i++) { - str[15 - i] = (x & 0xF) + ((x & 0xF) < 10 ? '0' : 'a' - 10); - x >>= 4; - } - str[16] = 0; - - cputstring(str); -} - -static void evict(unsigned long addr) { - assert(addr >= PGSIZE && addr < MAX_TEST_PAGES * PGSIZE); - addr = addr / PGSIZE * PGSIZE; - - freelist_t *node = &user_mapping[addr / PGSIZE]; - if (node->addr) { - // check accessed and dirty bits - assert(user_llpt[addr / PGSIZE] & PTE_A); - uintptr_t sstatus = set_csr(sstatus, SSTATUS_SUM); - if (memcmp((void *)addr, uva2kva(addr), PGSIZE)) { - assert(user_llpt[addr / PGSIZE] & PTE_D); - memcpy(uva2kva(addr), (void *)addr, PGSIZE); - } - write_csr(sstatus, sstatus); - - user_mapping[addr / PGSIZE].addr = 0; - - if (freelist_tail == 0) - freelist_head = freelist_tail = node; - else { - freelist_tail->next = node; - freelist_tail = node; - } - } -} - -extern int pf_filter(uintptr_t addr, uintptr_t *pte, int *copy); -extern int trap_filter(trapframe_t *tf); - -void handle_fault(uintptr_t addr, uintptr_t cause) { - uintptr_t filter_encodings = 0; - int copy_page = 1; - - assert(addr >= PGSIZE && addr < MAX_TEST_PAGES * PGSIZE); - addr = addr / PGSIZE * PGSIZE; - - if (user_llpt[addr / PGSIZE]) { - if (!(user_llpt[addr / PGSIZE] & PTE_A)) { - user_llpt[addr / PGSIZE] |= PTE_A; - } else { - assert(!(user_llpt[addr / PGSIZE] & PTE_D) && - cause == CAUSE_STORE_PAGE_FAULT); - user_llpt[addr / PGSIZE] |= PTE_D; - } - flush_page(addr); - return; - } - - freelist_t *node = freelist_head; - assert(node); - freelist_head = node->next; - if (freelist_head == freelist_tail) - freelist_tail = 0; - - uintptr_t new_pte = (node->addr >> PGSHIFT << PTE_PPN_SHIFT) | PTE_V | PTE_U | - PTE_R | PTE_W | PTE_X; - - if (pf_filter(addr, &filter_encodings, &copy_page)) { - new_pte = (node->addr >> PGSHIFT << PTE_PPN_SHIFT) | filter_encodings; - } - - user_llpt[addr / PGSIZE] = new_pte | PTE_A | PTE_D; - flush_page(addr); - - assert(user_mapping[addr / PGSIZE].addr
== 0); - user_mapping[addr / PGSIZE] = *node; - - uintptr_t sstatus = set_csr(sstatus, SSTATUS_SUM); - memcpy((void *)addr, uva2kva(addr), PGSIZE); - write_csr(sstatus, sstatus); - - user_llpt[addr / PGSIZE] = new_pte; - flush_page(addr); - - asm volatile("fence.i"); -} - -void handle_trap(trapframe_t *tf) { - if (trap_filter(tf)) { - pop_tf(tf); - } - - if (tf->cause == CAUSE_USER_ECALL) { - int n = tf->gpr[10]; - - for (long i = 1; i < MAX_TEST_PAGES; i++) - evict(i * PGSIZE); - - terminate(n); - } else if (tf->cause == CAUSE_ILLEGAL_INSTRUCTION) { - assert(tf->epc % 4 == 0); - - int *fssr; - asm("jal %0, 1f; fssr x0; 1:" : "=r"(fssr)); - - if (*(int *)tf->epc == *fssr) - terminate(1); // FP test on non-FP hardware. "succeed." - else - assert(!"illegal instruction"); - tf->epc += 4; - } else if (tf->cause == CAUSE_FETCH_PAGE_FAULT || - tf->cause == CAUSE_LOAD_PAGE_FAULT || - tf->cause == CAUSE_STORE_PAGE_FAULT) - handle_fault(tf->badvaddr, tf->cause); - else - assert(!"unexpected exception"); - - pop_tf(tf); -} - -static void coherence_torture() { - // cause coherence misses without affecting program semantics - uint64_t random = ENTROPY; - while (1) { - uintptr_t paddr = - DRAM_BASE + ((random % (2 * (MAX_TEST_PAGES + 1) * PGSIZE)) & -4); -#ifdef __riscv_atomic - if (random & 1) // perform a no-op write - asm volatile("amoadd.w zero, zero, (%0)" ::"r"(paddr)); - else // perform a read -#endif - asm volatile("lw zero, (%0)" ::"r"(paddr)); - random = lfsr63(random); - } -} - -void vm_boot(uintptr_t test_addr) { - uint64_t random = ENTROPY; - if (read_csr(mhartid) > 0) - coherence_torture(); - - _Static_assert(SIZEOF_TRAPFRAME_T == sizeof(trapframe_t), "???"); - -#if (MAX_TEST_PAGES > PTES_PER_PT) || (DRAM_BASE % MEGAPAGE_SIZE) != 0 -#error -#endif - // map user to lowermost megapage - l1pt[0] = ((pte_t)user_l2pt >> PGSHIFT << PTE_PPN_SHIFT) | PTE_V; - // map kernel to uppermost megapage -#if SATP_MODE_CHOICE == SATP_MODE_SV48 - l1pt[PTES_PER_PT - 1] = - 
((pte_t)kernel_l2pt >> PGSHIFT << PTE_PPN_SHIFT) | PTE_V; - kernel_l2pt[PTES_PER_PT - 1] = - ((pte_t)kernel_l3pt >> PGSHIFT << PTE_PPN_SHIFT) | PTE_V; - kernel_l3pt[PTES_PER_PT - 1] = (DRAM_BASE / RISCV_PGSIZE << PTE_PPN_SHIFT) | - PTE_V | PTE_R | PTE_W | PTE_X | PTE_A | PTE_D; - user_l2pt[0] = ((pte_t)user_l3pt >> PGSHIFT << PTE_PPN_SHIFT) | PTE_V; - user_l3pt[0] = ((pte_t)user_llpt >> PGSHIFT << PTE_PPN_SHIFT) | PTE_V; -#elif SATP_MODE_CHOICE == SATP_MODE_SV39 - l1pt[PTES_PER_PT - 1] = - ((pte_t)kernel_l2pt >> PGSHIFT << PTE_PPN_SHIFT) | PTE_V; - kernel_l2pt[PTES_PER_PT - 1] = (DRAM_BASE / RISCV_PGSIZE << PTE_PPN_SHIFT) | - PTE_V | PTE_R | PTE_W | PTE_X | PTE_A | PTE_D; - user_l2pt[0] = ((pte_t)user_llpt >> PGSHIFT << PTE_PPN_SHIFT) | PTE_V; -#elif SATP_MODE_CHOICE == SATP_MODE_SV32 - l1pt[PTES_PER_PT - 1] = (DRAM_BASE / RISCV_PGSIZE << PTE_PPN_SHIFT) | PTE_V | - PTE_R | PTE_W | PTE_X | PTE_A | PTE_D; -#else -#error -#endif - uintptr_t vm_choice = SATP_MODE_CHOICE; - uintptr_t satp_value = ((uintptr_t)l1pt >> PGSHIFT) | - (vm_choice * (SATP_MODE & ~(SATP_MODE << 1))); - write_csr(satp, satp_value); - if (read_csr(satp) != satp_value) - assert(!"unsupported satp mode"); - - // Set up PMPs if present, ignoring illegal instruction trap if not. - uintptr_t pmpc = PMP_NAPOT | PMP_R | PMP_W | PMP_X; - uintptr_t pmpa = ((uintptr_t)1 << (__riscv_xlen == 32 ? 
31 : 53)) - 1; - asm volatile("la t0, 1f\n\t" - "csrrw t0, mtvec, t0\n\t" - "csrw pmpaddr0, %1\n\t" - "csrw pmpcfg0, %0\n\t" - ".align 2\n\t" - "1: csrw mtvec, t0" - : - : "r"(pmpc), "r"(pmpa) - : "t0"); - - // set up supervisor trap handling - write_csr(stvec, pa2kva(trap_entry)); - write_csr(sscratch, pa2kva(read_csr(mscratch))); - write_csr(medeleg, (1 << CAUSE_USER_ECALL) | (1 << CAUSE_FETCH_PAGE_FAULT) | - (1 << CAUSE_LOAD_PAGE_FAULT) | - (1 << CAUSE_STORE_PAGE_FAULT)); - // FPU on; accelerator on; vector unit on - write_csr(mstatus, MSTATUS_FS | MSTATUS_XS | MSTATUS_VS); - write_csr(mie, 0); - - random = 1 + (random % MAX_TEST_PAGES); - freelist_head = pa2kva((void *)&freelist_nodes[0]); - freelist_tail = pa2kva(&freelist_nodes[MAX_TEST_PAGES - 1]); - for (long i = 0; i < MAX_TEST_PAGES; i++) { - freelist_nodes[i].addr = DRAM_BASE + (MAX_TEST_PAGES + random) * PGSIZE; - freelist_nodes[i].next = pa2kva(&freelist_nodes[i + 1]); - random = LFSR_NEXT(random); - } - freelist_nodes[MAX_TEST_PAGES - 1].next = 0; - - trapframe_t tf; - memset(&tf, 0, sizeof(tf)); - tf.epc = test_addr - DRAM_BASE; - pop_tf(&tf); -} diff --git a/bb-tests/workloads/src/CTest/rvv/start.S b/bb-tests/workloads/src/CTest/rvv/start.S deleted file mode 100644 index 53992e9d..00000000 --- a/bb-tests/workloads/src/CTest/rvv/start.S +++ /dev/null @@ -1,22 +0,0 @@ -.section .text.init -.global _start - -_start: - # Get hart ID - csrr t0, mhartid - - # Set up stack pointer for each hart - # Each hart gets 4KB stack space (0x1000 bytes) - li sp, 0x80020000 - li t1, 0x1000 - mul t1, t0, t1 - sub sp, sp, t1 - # Jump to main for all harts (let MLIR code handle hart filtering) - call main - - - -.align 12 -stack_space: - # 32KB stack space - .space 0x8000 diff --git a/bb-tests/workloads/src/CTest/rvv/utasks/utasks.h b/bb-tests/workloads/src/CTest/rvv/utasks/utasks.h deleted file mode 100644 index 80936ddd..00000000 --- a/bb-tests/workloads/src/CTest/rvv/utasks/utasks.h +++ /dev/null @@ -1,490 +0,0 @@ 
-#ifndef UTASKS_H -#define UTASKS_H - -#include "util.h" -#include -#include -#include -#include -#include -#include -#include - -class runner_t; - -class mutex_t { // can't use std::mutex if we don't have pthreads -public: - mutex_t() : next_ticket(0), now_serving(0) {} - void lock() { - size_t my_ticket = next_ticket.fetch_add(1, std::memory_order_relaxed); - while (my_ticket != now_serving.load(std::memory_order_acquire)) { - } - } - void unlock() { now_serving.fetch_add(1, std::memory_order_release); } - -private: - std::atomic next_ticket; - std::atomic now_serving; -}; - -mutex_t printf_lock; - -#define PRINTF(...) \ - ({ \ - printf_lock.lock(); \ - printf("hart %ld: ", read_csr(mhartid)); \ - printf(__VA_ARGS__); \ - printf_lock.unlock(); \ - }) - -class allocator_t { -public: - virtual void *allocate(size_t n) = 0; - virtual void deallocate(void *buf) = 0; -}; - -mutex_t heap_allocator_lock; -class heap_allocator_t : public allocator_t { -public: - void *allocate(size_t n) { - heap_allocator_lock.lock(); - void *buf = malloc(n); - heap_allocator_lock.unlock(); - return buf; - } - void deallocate(void *buf) { - heap_allocator_lock.lock(); - free(buf); - heap_allocator_lock.unlock(); - } -}; - -class region_allocator_t : public allocator_t { -public: - region_allocator_t(void *base, size_t size) - : base(base), size(size), tail(0) {} - void *allocate(size_t n) { - lock.lock(); - if (tail + n > size) { - lock.unlock(); - PRINTF("illegal allocation size %ld\n", size); - exit(1); - } - void *r = (uint8_t *)base + tail; - tail += n; - lock.unlock(); - return r; - } - void deallocate(void *buf) {} - -private: - mutex_t lock; - void *base; - size_t size; - size_t tail; -}; - -class alignas(64) task_t { - friend class runner_t; - -public: - task_t() : may_finish(false) {} - void set_may_finish() { may_finish = true; } - bool __attribute__((noinline)) is_finished() { - return may_finish && !has_work(); - } - void wait_for_finished() { - while (!this->is_finished()) { 
- }; - } - virtual void propagate_finished() = 0; - virtual void assign_runner(runner_t *runner) {}; - -private: - virtual void run() = 0; - virtual bool has_work() = 0; - -protected: - bool may_finish; -}; - -class circular_buffer_pointers_t { -public: - circular_buffer_pointers_t() : head_wide(0), tail_wide(0), size(0) {} - size_t head_wide; - size_t tail_wide; - size_t size; -}; - -template class alignas(64) circular_buffer_t { -public: - circular_buffer_t(size_t capacity, void *buffer, - circular_buffer_pointers_t *pointers) - : buffer((T *)buffer), capacity(capacity), pointers(pointers) { - if (capacity < 4 || (capacity & (capacity - 1)) != 0) { - PRINTF("Illegal capacity %ld\n", capacity); - exit(1); - } - mask = capacity - 1; - } - ~circular_buffer_t() { delete[] buffer; } - - std::pair push(size_t n) { - size_t tail_wide_read, size_read; - size_t mask = this->mask; - asm volatile("amoadd.d %[rd], %[incr], (%[addr])\n" - : [rd] "=r"(tail_wide_read) - : [incr] "r"(n), [addr] "r"(&pointers->tail_wide)); - asm volatile("amoadd.d %[rd], %[incr], (%[addr])\n" - : [rd] "=r"(size_read) - : [incr] "r"(n), [addr] "r"(&pointers->size)); - tail_wide_read += n; - size_read += n; - size_t tail = tail_wide_read & mask; - size_t head = pointers->head_wide & mask; - size_t remaining_capacity = capacity - size_read; - size_t limit = ((head > tail) ? head : capacity) - tail; - size_t r = (remaining_capacity > limit) ? 
limit : remaining_capacity; - return std::make_pair(buffer + tail, r); - } - - std::pair pop(size_t n) { - size_t head_wide_read, size_read; - size_t mask = this->mask; - asm volatile("amoadd.d %[rd], %[incr], (%[addr])\n" - : [rd] "=r"(head_wide_read) - : [incr] "r"(n), [addr] "r"(&pointers->head_wide)); - asm volatile("amoadd.d %[rd], %[incr], (%[addr])\n" - : [rd] "=r"(size_read) - : [incr] "r"((~n) + 1), [addr] "r"(&pointers->size)); - head_wide_read += n; - size_read -= n; - size_t head = head_wide_read & mask; - size_t tail = pointers->tail_wide & mask; - size_t limit = ((tail > head) ? tail : capacity) - head; - size_t r = (size_read > limit) ? limit : size_read; - return std::make_pair(buffer + head, r); - } - - bool busy() { return pointers->size != 0; } - - T *buffer; - size_t mask; - size_t capacity; - circular_buffer_pointers_t *pointers; -}; - -template class circular_buffer_helper_t { -public: - static std::tuple - push_pop(circular_buffer_t *const source, - circular_buffer_pointers_t *source_pointers, - circular_buffer_t *const sink, - circular_buffer_pointers_t *sink_pointers, size_t n) { - size_t sink_tail_wide_read, sink_size_read; - size_t source_head_wide_read, source_size_read; - asm volatile("amoadd.d %[rd], %[incr], (%[addr])\n" - : [rd] "=r"(sink_tail_wide_read) - : [incr] "r"(n), [addr] "r"(&sink_pointers->tail_wide)); - asm volatile("amoadd.d %[rd], %[incr], (%[addr])\n" - : [rd] "=r"(source_head_wide_read) - : [incr] "r"(n), [addr] "r"(&source_pointers->head_wide)); - asm volatile("amoadd.d %[rd], %[incr], (%[addr])\n" - : [rd] "=r"(sink_size_read) - : [incr] "r"(n), [addr] "r"(&sink_pointers->size)); - asm volatile("amoadd.d %[rd], %[incr], (%[addr])\n" - : [rd] "=r"(source_size_read) - : [incr] "r"((~n) + 1), [addr] "r"(&source_pointers->size)); - size_t sink_head_wide = sink_pointers->head_wide; - size_t source_tail_wide = source_pointers->tail_wide; - - asm volatile("\n"); - size_t sink_mask = sink->mask; - size_t source_mask = 
source->mask; - size_t sink_capacity = sink->capacity; - size_t source_capacity = source->capacity; - - asm volatile("add %[rd], %[src], %[incr]\n" - : [rd] "=r"(sink_tail_wide_read) - : [src] "r"(sink_tail_wide_read), [incr] "r"(n)); - asm volatile("add %[rd], %[src], %[incr]\n" - : [rd] "=r"(source_head_wide_read) - : [src] "r"(source_head_wide_read), [incr] "r"(n)); - asm volatile("add %[rd], %[src], %[incr]\n" - : [rd] "=r"(sink_size_read) - : [src] "r"(sink_size_read), [incr] "r"(n)); - asm volatile("sub %[rd], %[src], %[incr]\n" - : [rd] "=r"(source_size_read) - : [src] "r"(source_size_read), [incr] "r"(n)); - - T *source_buffer = source->buffer; - U *sink_buffer = sink->buffer; - size_t remaining_capacity = sink_capacity - sink_size_read; - size_t sink_tail = sink_tail_wide_read & sink_mask; - size_t sink_head = sink_head_wide & sink_mask; - size_t sink_limit = - ((sink_head > sink_tail) ? sink_head : sink_capacity) - sink_tail; - size_t sink_r = - (remaining_capacity > sink_limit) ? sink_limit : remaining_capacity; - - size_t source_head = source_head_wide_read & source_mask; - size_t source_tail = source_tail_wide & source_mask; - size_t source_limit = - ((source_tail > source_head) ? source_tail : source_capacity) - - source_head; - size_t source_r = - (source_size_read > source_limit) ? source_limit : source_size_read; - - size_t r = (sink_r > source_r) ? 
source_r : sink_r; - return std::make_tuple(source_buffer + source_head, sink_buffer + sink_tail, - r); - } -}; - -template class sink_t; -template class source_t; - -class runner_t { -public: - runner_t(size_t id, allocator_t *allocator) - : id(id), task_count(0), allocator(allocator) {}; - void run() { - while (1) { - task_t *scheduled_task = schedule_task(); - if (scheduled_task) { - scheduled_task->run(); - } - } - } - - void add_task(task_t *task) { - task->assign_runner(this); - task_lock.lock(); - tasks.push_back(task); - task_count++; - task_lock.unlock(); - } - - bool idle() { return task_count.load() == 0; } - - allocator_t *get_allocator() { return allocator; } - -private: - task_t *schedule_task() { - task_lock.lock(); - for (auto &t : tasks) { - if (t->has_work()) { - task_lock.unlock(); - return t; - } - } - std::list::iterator it = tasks.begin(); - while (it != tasks.end()) { - if ((*it)->is_finished()) { - (*it)->propagate_finished(); - it = tasks.erase(it); - task_count--; - } - it++; - } - task_lock.unlock(); - return nullptr; - } - - size_t id; - mutex_t task_lock; - std::atomic task_count; - std::list tasks; - allocator_t *allocator; -}; - -template class sink_t : virtual public task_t { - friend class source_t; - -public: - sink_t(size_t buffer_size) : buffer_size(buffer_size), buffer(nullptr) {} - ~sink_t() { - runner->get_allocator()->deallocate(buffer_data); - runner->get_allocator()->deallocate(buffer); - } - void assign_runner(runner_t *runner) { - this->runner = runner; - void *pointers_buff = - runner->get_allocator()->allocate(sizeof(circular_buffer_pointers_t)); - circular_buffer_pointers_t *pointers = - new (pointers_buff) circular_buffer_pointers_t; - buffer_data = runner->get_allocator()->allocate(buffer_size * sizeof(T)); - buffer = new circular_buffer_t(buffer_size, buffer_data, pointers); - } - size_t buffer_size; - alignas(64) circular_buffer_t *buffer; - -private: - runner_t *runner; - void *buffer_data; -}; - -template class 
source_t : virtual public task_t { -public: - source_t() : next(nullptr), output(nullptr) {} - void chain(sink_t *next) { - if (this->next) { - PRINTF("Failed chain\n"); - exit(1); - } - this->next = next; - } - void terminate(T *out) { - if (this->next) { - PRINTF("Failed chain\n"); - exit(1); - } - output = out; - } - void propagate_finished() { - if (this->next) { - this->next->set_may_finish(); - } - } - -protected: - sink_t *next; - alignas(64) T *output; -}; - -template -class pipe_task_t : public source_t, public sink_t { -public: - pipe_task_t(size_t buffer_size, size_t max_chunk) - : max_chunk(max_chunk), sink_t(buffer_size) {} - bool has_work() { return this->buffer->busy(); } - virtual size_t kernel(T *in, U *out, size_t n) = 0; - -private: - void run() { - bool has_next = this->next != nullptr; - circular_buffer_t *next_buffer = has_next ? this->next->buffer : nullptr; - size_t max_chunk = this->max_chunk; - circular_buffer_t *buffer = this->buffer; - if (has_next) { - circular_buffer_pointers_t *source_pointers = buffer->pointers; - circular_buffer_pointers_t *next_pointers = next_buffer->pointers; - std::tuple r = circular_buffer_helper_t::push_pop( - buffer, source_pointers, next_buffer, next_pointers, 0); - T *input = std::get<0>(r); - T *output = std::get<1>(r); - size_t n = std::get<2>(r); - while (n > 0) { - size_t completed = 0; - while (completed < max_chunk && n > 0) { - size_t finished = kernel(input, output, n); - n -= finished; - completed += finished; - input += finished; - output += finished; - } - asm volatile("fence"); - r = circular_buffer_helper_t::push_pop( - buffer, source_pointers, next_buffer, next_pointers, completed); - input = std::get<0>(r); - output = std::get<1>(r); - n = std::get<2>(r); - if (completed == 0) - break; - } - } else { - std::pair pop = buffer->pop(0); - T *input = pop.first; - size_t n = pop.second; - while (n > 0) { - size_t completed = 0; - while (completed < max_chunk && n > 0) { - size_t finished = 
kernel(input, this->output, n); - n -= finished; - completed += finished; - input += finished; - this->output += finished; - } - asm volatile("fence"); - pop = buffer->pop(completed); - input = pop.first; - n = pop.second; - if (completed == 0) - break; - } - } - } - size_t max_chunk; -}; - -template class source_task_t : public source_t { -public: - source_task_t(size_t max_chunk) : max_chunk(max_chunk) {} - virtual size_t kernel(T *out, size_t n) = 0; - virtual bool has_work() = 0; - -private: - void run() { - bool has_next = this->next != nullptr; - assert(has_next); - circular_buffer_t *next_buffer = has_next ? this->next->buffer : nullptr; - size_t max_chunk = this->max_chunk; - - std::pair push = next_buffer->push(0); - T *output = push.first; - size_t n = push.second; - while (n > 0) { - size_t completed = 0; - while (completed < max_chunk && n > 0) { - size_t finished = kernel(output, n); - n -= finished; - completed += finished; - output += finished; - if (finished == 0) - break; - } - asm volatile("fence"); - push = next_buffer->push(completed); - output = push.first; - n = push.second; - if (completed == 0) - break; - } - } - size_t max_chunk; -}; - -template class sink_task_t : public sink_t { -public: - sink_task_t(size_t buffer_size, size_t max_chunk) - : max_chunk(max_chunk), sink_t(buffer_size) {} - bool has_work() { return this->buffer->busy(); } - virtual size_t kernel(T *in, size_t n) = 0; - -private: - void run() { - size_t max_chunk = this->max_chunk; - circular_buffer_t buffer = this->buffer; - std::pair pop = buffer->pop(0); - T *input = pop.first; - size_t n = pop.second; - while (n > 0) { - size_t completed = 0; - while (completed < max_chunk && n > 0) { - size_t finished = kernel(input, n); - n -= finished; - completed += finished; - input += finished; - } - asm volatile("fence"); - pop = buffer->pop(completed); - input = pop.first; - n = pop.second; - if (completed == 0) - break; - } - } - size_t max_chunk; -}; - -#endif diff --git 
a/bb-tests/workloads/src/CTest/rvv/vec-conditional/conditional_gendata.pl b/bb-tests/workloads/src/CTest/rvv/vec-conditional/conditional_gendata.pl deleted file mode 100755 index bf29e81c..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conditional/conditional_gendata.pl +++ /dev/null @@ -1,142 +0,0 @@ -#!/usr/bin/perl -w -#========================================================================== -# conditional_gendata.pl -# -# Author: Generated -# Date: Today -# -(our $usageMsg = <<'ENDMSG') =~ s/^\#//gm; -# -# Simple script which creates an input data set and the reference data -# for the given conditional operation. -# -ENDMSG - -use strict "vars"; -use warnings; -no warnings("once"); -use Getopt::Long; - -#-------------------------------------------------------------------------- -# Command line processing -#-------------------------------------------------------------------------- - -our %opts; - -sub usage() -{ - - print "\n"; - print " Usage: conditional_gendata.pl [options] \n"; - print "\n"; - print " Options:\n"; - print " --help print this message\n"; - print " --size size of input data [1000]\n"; - print " --seed random seed [1]\n"; - print "$usageMsg"; - - exit(); -} - -sub processCommandLine() -{ - - $opts{"help"} = 0; - $opts{"size"} = 1000; - $opts{"seed"} = 1; - Getopt::Long::GetOptions( \%opts, 'help|?', 'size:i', 'seed:i' ) or usage(); - $opts{"help"} and usage(); - -} - -#-------------------------------------------------------------------------- -# Helper Functions -#-------------------------------------------------------------------------- -sub printArray -{ - my $arrayName = $_[0]; - my $arrayRef = $_[1]; - my $type = $_[2]; - - my $numCols = 20; - my $arrayLen = scalar(@{$arrayRef}); - - print $type." 
".$arrayName."[DATA_SIZE] = \n"; - print "{\n"; - - if ( $arrayLen <= $numCols ) { - print " "; - for ( my $i = 0; $i < $arrayLen; $i++ ) { - print sprintf("%3d",$arrayRef->[$i]); - if ( $i != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - else { - my $numRows = int($arrayLen/$numCols); - for ( my $j = 0; $j < $numRows; $j++ ) { - print " "; - for ( my $i = 0; $i < $numCols; $i++ ) { - my $index = $j*$numCols + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - if ( $arrayLen > ($numRows*$numCols) ) { - print " "; - for ( my $i = 0; $i < ($arrayLen-($numRows*$numCols)); $i++ ) { - my $index = $numCols*$numRows + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - } - - print "};\n\n"; -} - -#-------------------------------------------------------------------------- -# Main -#-------------------------------------------------------------------------- - -sub main() -{ - - processCommandLine(); - srand($opts{"seed"}); - - my @input1_data; # x - my @input2_data; # a - my @input3_data; # b - my @verify_data; # z - for ( my $i = 0; $i < $opts{"size"}; $i++ ) { - my $valueX = int(rand(10)); # x - my $valueA = int(rand(999)); # a - my $valueB = int(rand(999)); # b - - push( @input1_data, $valueX ); - push( @input2_data, $valueA ); - push( @input3_data, $valueB ); - push( @verify_data, ($valueX < 5) ? $valueA : $valueB ); - } - - print "\n\#define DATA_SIZE ".$opts{"size"}." 
\n\n"; - printArray( "input1_data", \@input1_data, "int8_t" ); # x - printArray( "input2_data", \@input2_data, "int16_t" ); # a - printArray( "input3_data", \@input3_data, "int16_t" ); # b - printArray( "verify_data", \@verify_data, "int16_t" ); # z - -} - -main(); diff --git a/bb-tests/workloads/src/CTest/rvv/vec-conditional/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-conditional/dataset1.h deleted file mode 100644 index 5cfbc028..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conditional/dataset1.h +++ /dev/null @@ -1,250 +0,0 @@ -#define DATA_SIZE 1000 - -int8_t input1_data[DATA_SIZE] = { - 0, 3, 1, 3, 1, 1, 8, 2, 9, 5, 6, 0, 1, 6, 3, 7, 3, 1, 3, 2, 5, 5, 2, 6, 5, - 9, 0, 9, 3, 6, 2, 3, 9, 3, 9, 9, 6, 0, 1, 6, 2, 9, 9, 4, 5, 7, 2, 1, 4, 4, - 7, 6, 2, 5, 1, 5, 3, 8, 2, 7, 5, 2, 4, 5, 5, 2, 2, 6, 1, 3, 9, 4, 5, 7, 9, - 6, 0, 8, 4, 0, 3, 0, 9, 2, 4, 1, 3, 9, 5, 6, 3, 1, 0, 3, 4, 7, 0, 6, 0, 8, - 8, 1, 3, 1, 8, 5, 2, 3, 8, 8, 6, 1, 5, 6, 5, 7, 2, 0, 7, 1, 3, 9, 9, 8, 0, - 6, 1, 1, 8, 2, 1, 9, 9, 3, 7, 6, 5, 6, 0, 5, 7, 6, 0, 3, 8, 2, 2, 6, 3, 3, - 5, 1, 7, 4, 0, 4, 2, 1, 7, 0, 9, 1, 4, 8, 2, 6, 8, 8, 2, 2, 5, 2, 8, 7, 0, - 7, 7, 7, 2, 0, 7, 7, 7, 7, 0, 6, 9, 1, 7, 9, 4, 3, 3, 1, 2, 1, 7, 4, 8, 1, - 6, 7, 1, 1, 3, 8, 1, 0, 7, 6, 5, 8, 1, 4, 7, 4, 0, 7, 5, 0, 0, 0, 1, 4, 6, - 9, 2, 5, 5, 3, 9, 0, 4, 4, 9, 6, 3, 1, 6, 4, 6, 6, 3, 1, 3, 0, 1, 7, 7, 4, - 1, 0, 0, 0, 1, 3, 1, 3, 4, 3, 8, 3, 3, 3, 8, 3, 4, 2, 5, 9, 7, 5, 8, 9, 4, - 9, 1, 6, 7, 9, 7, 3, 7, 1, 9, 8, 5, 7, 7, 5, 7, 5, 1, 8, 8, 7, 7, 9, 4, 0, - 7, 7, 3, 4, 9, 9, 8, 3, 8, 3, 0, 1, 7, 6, 7, 7, 3, 2, 5, 8, 9, 6, 9, 0, 5, - 9, 2, 9, 9, 4, 2, 9, 1, 5, 0, 8, 2, 6, 6, 2, 9, 1, 9, 2, 1, 5, 7, 7, 5, 7, - 2, 0, 4, 5, 2, 6, 6, 2, 4, 2, 0, 8, 8, 5, 7, 4, 3, 0, 4, 9, 3, 8, 4, 2, 9, - 8, 0, 9, 8, 2, 2, 4, 0, 9, 4, 5, 3, 4, 7, 7, 5, 0, 8, 0, 4, 0, 5, 0, 8, 7, - 1, 2, 9, 0, 8, 8, 5, 4, 1, 8, 9, 3, 4, 0, 1, 4, 3, 2, 0, 2, 1, 7, 3, 1, 9, - 2, 3, 1, 2, 6, 8, 4, 1, 2, 8, 8, 4, 9, 2, 5, 0, 7, 0, 2, 8, 1, 7, 7, 2, 6, - 
1, 1, 9, 0, 6, 7, 1, 9, 7, 6, 0, 7, 9, 0, 9, 3, 5, 5, 5, 5, 8, 4, 4, 8, 2, - 4, 3, 3, 1, 7, 8, 3, 3, 4, 6, 4, 9, 4, 9, 3, 7, 8, 5, 8, 3, 1, 1, 1, 1, 2, - 4, 6, 4, 8, 4, 8, 4, 3, 0, 5, 1, 9, 5, 8, 3, 4, 2, 8, 4, 6, 5, 3, 3, 5, 5, - 5, 7, 7, 0, 7, 0, 6, 3, 0, 1, 9, 7, 0, 9, 8, 9, 3, 7, 1, 6, 7, 8, 2, 1, 3, - 8, 7, 6, 1, 9, 3, 4, 2, 1, 5, 3, 0, 3, 4, 0, 4, 5, 3, 7, 4, 8, 2, 0, 1, 2, - 5, 3, 7, 0, 0, 5, 3, 3, 0, 4, 7, 3, 2, 1, 8, 9, 4, 9, 4, 3, 1, 2, 6, 9, 7, - 3, 1, 0, 6, 9, 9, 6, 4, 6, 7, 3, 0, 7, 1, 8, 8, 7, 0, 5, 5, 8, 1, 8, 2, 8, - 1, 6, 3, 7, 5, 0, 0, 6, 5, 2, 8, 5, 8, 7, 7, 9, 3, 7, 7, 2, 6, 4, 0, 0, 5, - 7, 8, 6, 3, 3, 8, 1, 9, 7, 6, 6, 5, 2, 4, 8, 8, 9, 5, 7, 4, 4, 6, 0, 2, 7, - 4, 8, 4, 2, 9, 9, 4, 4, 7, 6, 0, 2, 3, 8, 5, 4, 4, 2, 7, 2, 4, 5, 1, 3, 7, - 3, 4, 1, 1, 5, 4, 8, 2, 2, 5, 1, 1, 8, 4, 0, 4, 9, 3, 1, 8, 0, 7, 8, 0, 3, - 4, 4, 4, 6, 1, 2, 0, 2, 9, 4, 1, 2, 7, 7, 1, 8, 6, 7, 0, 9, 5, 5, 0, 2, 1, - 1, 8, 4, 2, 6, 0, 8, 3, 8, 9, 9, 3, 8, 4, 7, 3, 9, 0, 0, 5, 8, 1, 4, 7, 0, - 5, 5, 2, 8, 7, 3, 5, 5, 5, 4, 4, 1, 2, 0, 4, 2, 5, 4, 8, 7, 8, 2, 2, 6, 0, - 7, 8, 1, 3, 1, 2, 2, 9, 1, 1, 7, 5, 8, 3, 4, 2, 4, 4, 0, 2, 9, 1, 7, 7, 4, - 9, 2, 3, 7, 1, 8, 7, 4, 4, 8, 4, 8, 5, 9, 9, 1, 9, 2, 7, 6, 2, 3, 9, 6, 3, - 5, 2, 2, 1, 7, 9, 3, 8, 7, 4, 4, 7, 1, 3, 8, 8, 3, 9, 1, 3, 3, 6, 1, 6, 6, - 4, 6, 4, 4, 9, 8, 8, 2, 5, 1, 1, 4, 8, 5, 8, 7, 5, 9, 9, 3, 2, 1, 1, 7, 1, - 5, 8, 0, 6, 2, 0, 6, 7, 5, 0, 6, 3, 2, 4, 3, 7, 4, 0, 1, 5, 6, 8, 8, 5, 1, - 9, 9, 9, 6, 6, 0, 8, 4, 5, 0, 8, 6, 9, 3, 8, 7, 7, 6, 6, 4, 0, 7, 1, 4, 3, - 3, 1, 8, 6, 3, 2, 3, 9, 0, 9, 5, 5, 5, 1, 5, 8, 7, 0, 5, 9, 5, 4, 6, 8, 8, - 3, 2, 4, 5, 5, 0, 0, 2, 6, 8, 2, 6, 5, 6, 4, 5, 3, 4, 7, 5, 2, 7, 8, 4, 5}; - -int16_t input2_data[DATA_SIZE] = { - 454, 564, 989, 350, 64, 584, 140, 6, 339, 392, 228, 110, 750, 426, 436, - 774, 981, 875, 564, 938, 647, 203, 703, 606, 551, 195, 569, 229, 584, 902, - 489, 826, 592, 195, 649, 350, 814, 181, 998, 65, 704, 141, 715, 264, 538, - 761, 494, 499, 693, 444, 285, 317, 351, 
288, 169, 606, 796, 455, 758, 358, - 989, 985, 788, 517, 461, 775, 332, 598, 446, 125, 557, 799, 925, 648, 891, - 19, 557, 95, 440, 614, 396, 808, 170, 344, 536, 965, 539, 882, 54, 701, - 479, 891, 641, 665, 95, 158, 395, 247, 476, 792, 265, 414, 214, 518, 503, - 284, 643, 290, 68, 731, 619, 786, 828, 528, 487, 947, 626, 304, 403, 959, - 818, 294, 874, 748, 60, 105, 153, 651, 412, 356, 423, 140, 621, 533, 715, - 837, 886, 173, 493, 133, 969, 951, 415, 110, 860, 255, 988, 628, 990, 753, - 633, 19, 452, 777, 988, 81, 483, 153, 390, 943, 601, 496, 121, 641, 870, - 113, 871, 275, 404, 784, 526, 803, 226, 280, 726, 454, 54, 41, 397, 508, - 416, 480, 478, 709, 482, 873, 173, 526, 1, 232, 464, 30, 233, 350, 779, - 784, 743, 823, 403, 160, 289, 202, 636, 90, 70, 855, 954, 6, 698, 194, - 598, 526, 746, 698, 534, 49, 732, 893, 547, 492, 215, 83, 59, 921, 96, - 834, 966, 114, 625, 10, 409, 387, 305, 559, 592, 409, 497, 427, 995, 521, - 822, 185, 14, 274, 185, 486, 96, 899, 814, 80, 37, 946, 935, 870, 441, - 62, 554, 845, 639, 211, 249, 562, 974, 495, 470, 159, 419, 180, 542, 458, - 729, 795, 671, 104, 670, 984, 745, 584, 385, 528, 457, 378, 477, 236, 416, - 165, 270, 704, 825, 303, 623, 560, 294, 532, 398, 584, 375, 767, 589, 958, - 845, 837, 522, 824, 165, 171, 461, 155, 376, 694, 158, 747, 193, 124, 274, - 425, 619, 269, 156, 752, 987, 1, 32, 809, 399, 972, 554, 558, 600, 579, - 540, 849, 995, 716, 224, 181, 820, 793, 96, 320, 238, 754, 191, 142, 884, - 165, 140, 646, 23, 422, 480, 542, 201, 745, 119, 932, 81, 533, 880, 203, - 735, 365, 1, 680, 169, 947, 177, 101, 654, 679, 242, 471, 641, 713, 151, - 870, 601, 907, 397, 979, 454, 365, 962, 98, 655, 315, 90, 774, 70, 282, - 494, 594, 488, 290, 639, 449, 539, 644, 723, 767, 552, 491, 362, 617, 144, - 282, 335, 560, 343, 782, 339, 344, 465, 759, 485, 333, 746, 699, 336, 503, - 546, 268, 943, 345, 396, 908, 523, 545, 990, 351, 719, 312, 616, 567, 737, - 989, 316, 182, 796, 771, 115, 644, 165, 911, 742, 983, 671, 719, 962, 154, 
- 620, 447, 796, 325, 80, 175, 637, 838, 898, 230, 601, 324, 553, 17, 978, - 177, 185, 393, 651, 377, 349, 553, 844, 593, 980, 544, 50, 781, 716, 896, - 6, 548, 703, 829, 771, 613, 559, 313, 434, 607, 784, 896, 415, 431, 162, - 224, 595, 834, 964, 201, 154, 797, 943, 574, 652, 721, 362, 744, 206, 277, - 973, 757, 627, 517, 910, 904, 63, 394, 482, 244, 557, 172, 884, 594, 6, - 159, 562, 454, 506, 167, 382, 494, 72, 52, 932, 342, 961, 287, 985, 626, - 600, 860, 793, 478, 689, 914, 139, 974, 780, 548, 653, 270, 428, 626, 315, - 323, 317, 239, 377, 231, 70, 825, 576, 848, 303, 616, 708, 2, 967, 815, - 831, 410, 741, 422, 117, 9, 952, 64, 804, 811, 846, 335, 971, 415, 318, - 565, 647, 597, 294, 383, 866, 115, 673, 952, 5, 257, 948, 35, 222, 213, - 943, 621, 444, 934, 340, 462, 321, 522, 78, 313, 238, 972, 201, 879, 450, - 163, 143, 643, 771, 781, 797, 630, 966, 435, 413, 696, 116, 710, 538, 581, - 874, 615, 898, 261, 764, 686, 432, 560, 825, 317, 359, 386, 11, 504, 408, - 907, 368, 516, 359, 797, 69, 692, 300, 815, 829, 247, 243, 281, 713, 909, - 406, 932, 304, 373, 86, 412, 28, 850, 28, 707, 941, 682, 672, 43, 713, - 549, 186, 214, 44, 898, 456, 651, 408, 970, 877, 642, 368, 972, 109, 121, - 516, 631, 736, 315, 530, 296, 854, 668, 228, 881, 808, 0, 483, 212, 154, - 528, 668, 338, 334, 374, 746, 385, 545, 587, 456, 729, 971, 871, 304, 269, - 538, 491, 404, 231, 758, 644, 992, 57, 352, 776, 648, 95, 345, 198, 632, - 86, 95, 337, 87, 86, 580, 187, 230, 759, 547, 10, 798, 957, 450, 128, - 220, 590, 217, 249, 858, 799, 610, 157, 963, 974, 571, 783, 955, 772, 361, - 650, 552, 798, 563, 761, 167, 277, 829, 361, 397, 914, 478, 264, 390, 208, - 443, 813, 118, 964, 30, 101, 92, 453, 361, 23, 839, 890, 246, 622, 366, - 296, 529, 80, 260, 851, 243, 760, 631, 843, 333, 936, 594, 75, 329, 751, - 805, 48, 473, 638, 973, 474, 82, 923, 371, 280, 686, 451, 810, 620, 98, - 611, 376, 791, 298, 726, 249, 898, 5, 607, 208, 741, 258, 950, 31, 353, - 50, 700, 821, 774, 361, 263, 810, 940, 
180, 639, 33, 418, 716, 786, 408, - 494, 925, 391, 552, 798, 262, 367, 373, 850, 688, 857, 555, 422, 172, 830, - 834, 144, 126, 134, 804, 913, 424, 555, 756, 325, 302, 134, 683, 791, 533, - 86, 440, 471, 47, 777, 900, 459, 559, 685, 470, 437, 616, 106, 121, 694, - 937, 39, 229, 843, 252, 492, 419, 970, 456, 955, 623, 325, 527, 732, 104, - 980, 830, 689, 790, 343, 580, 48, 292, 724, 805, 510, 258, 723, 328, 48, - 450, 485, 515, 441, 346, 405, 609, 446, 154, 965, 993, 467, 480, 687, 942, - 725, 330, 741, 241, 279, 607, 517, 87, 146, 457, 78, 73, 423, 967, 87, - 33, 342, 430, 990, 178, 928, 160, 224, 234, 382, 163, 231, 366, 746, 733, - 722, 26, 440, 435, 667, 813, 924, 373, 877, 47, 321, 49, 286, 385, 215, - 238, 146, 289, 774, 977, 726, 637, 767, 55, 858}; - -int16_t input3_data[DATA_SIZE] = { - 833, 1, 749, 572, 949, 216, 621, 572, 890, 898, 961, 883, 296, 500, 659, - 812, 678, 696, 474, 258, 569, 88, 759, 375, 657, 592, 267, 800, 944, 368, - 913, 313, 985, 543, 566, 997, 657, 208, 859, 847, 349, 253, 886, 415, 979, - 4, 478, 864, 222, 296, 676, 78, 937, 646, 615, 289, 351, 720, 367, 92, - 62, 853, 346, 222, 908, 358, 778, 740, 33, 743, 933, 557, 431, 357, 287, - 514, 86, 853, 587, 678, 280, 17, 819, 380, 512, 917, 808, 887, 946, 951, - 567, 7, 568, 730, 434, 280, 84, 911, 435, 729, 486, 236, 548, 6, 682, - 173, 499, 599, 215, 658, 251, 131, 302, 433, 322, 125, 824, 10, 733, 703, - 722, 406, 653, 86, 378, 667, 381, 98, 840, 12, 54, 216, 343, 921, 521, - 299, 13, 36, 121, 537, 372, 434, 129, 438, 437, 478, 409, 719, 549, 450, - 171, 646, 501, 509, 753, 12, 853, 339, 357, 171, 328, 968, 516, 81, 786, - 603, 907, 610, 32, 745, 357, 441, 751, 943, 458, 201, 459, 53, 377, 141, - 860, 219, 499, 180, 371, 725, 992, 914, 435, 247, 697, 521, 250, 40, 573, - 161, 502, 355, 170, 633, 32, 320, 853, 852, 107, 388, 157, 16, 204, 677, - 218, 824, 239, 511, 706, 428, 799, 702, 373, 798, 621, 651, 794, 657, 880, - 659, 462, 62, 529, 714, 904, 728, 105, 901, 512, 796, 202, 940, 359, 
158, - 22, 680, 107, 797, 242, 718, 698, 739, 644, 326, 407, 605, 768, 88, 148, - 344, 838, 954, 141, 991, 821, 832, 615, 266, 916, 336, 803, 215, 840, 636, - 872, 820, 758, 418, 254, 568, 729, 139, 414, 173, 554, 183, 977, 399, 229, - 996, 105, 696, 140, 50, 635, 87, 8, 110, 37, 646, 619, 897, 57, 288, - 985, 99, 67, 432, 749, 28, 621, 272, 43, 898, 236, 443, 265, 935, 941, - 185, 320, 933, 763, 88, 14, 649, 995, 744, 873, 791, 509, 530, 405, 683, - 251, 120, 452, 622, 678, 530, 335, 226, 1, 569, 646, 610, 203, 796, 299, - 108, 974, 852, 619, 890, 565, 372, 991, 408, 372, 464, 34, 52, 908, 744, - 214, 167, 781, 321, 688, 101, 50, 708, 399, 625, 952, 795, 725, 245, 529, - 333, 545, 775, 143, 755, 105, 210, 54, 923, 764, 595, 697, 685, 324, 32, - 622, 389, 75, 409, 726, 950, 135, 303, 290, 125, 415, 107, 116, 869, 235, - 724, 913, 263, 242, 677, 728, 410, 613, 590, 309, 993, 310, 525, 487, 134, - 393, 367, 75, 629, 600, 429, 92, 355, 668, 991, 863, 84, 532, 707, 370, - 452, 912, 746, 160, 618, 854, 226, 779, 830, 137, 705, 565, 886, 785, 293, - 253, 173, 5, 678, 309, 685, 123, 241, 550, 372, 627, 398, 103, 174, 159, - 558, 372, 859, 745, 321, 591, 890, 345, 933, 908, 989, 456, 900, 495, 444, - 700, 285, 29, 652, 546, 534, 116, 507, 971, 42, 169, 421, 916, 539, 147, - 481, 580, 350, 948, 232, 396, 539, 769, 121, 339, 747, 821, 910, 594, 505, - 680, 354, 847, 971, 505, 770, 243, 396, 214, 646, 399, 67, 620, 972, 698, - 882, 809, 366, 93, 831, 853, 452, 78, 59, 10, 924, 680, 797, 92, 885, - 616, 232, 744, 773, 508, 494, 288, 431, 581, 22, 981, 815, 653, 603, 386, - 889, 164, 861, 865, 304, 152, 854, 390, 326, 402, 813, 206, 962, 890, 49, - 665, 321, 246, 176, 675, 764, 426, 531, 758, 611, 437, 854, 237, 639, 684, - 662, 449, 158, 882, 734, 268, 296, 50, 439, 397, 506, 667, 445, 443, 349, - 13, 366, 612, 720, 828, 911, 812, 50, 57, 950, 55, 226, 416, 877, 572, - 99, 12, 812, 968, 488, 242, 293, 795, 345, 599, 386, 93, 680, 112, 960, - 923, 506, 359, 495, 609, 286, 118, 
192, 20, 586, 573, 746, 663, 5, 888, - 441, 737, 169, 219, 672, 452, 298, 459, 923, 612, 497, 352, 917, 290, 503, - 625, 108, 685, 954, 646, 446, 664, 655, 392, 735, 523, 451, 156, 766, 124, - 869, 251, 568, 572, 139, 608, 566, 906, 573, 687, 506, 271, 950, 824, 856, - 468, 793, 416, 978, 626, 548, 28, 786, 578, 265, 415, 623, 934, 545, 40, - 712, 349, 307, 872, 225, 996, 136, 487, 837, 487, 114, 688, 667, 624, 729, - 250, 838, 213, 471, 132, 327, 327, 651, 519, 354, 647, 842, 713, 781, 943, - 615, 843, 942, 53, 263, 269, 68, 538, 330, 606, 320, 122, 646, 604, 880, - 175, 425, 346, 63, 514, 766, 939, 120, 172, 755, 323, 866, 100, 530, 696, - 69, 192, 91, 654, 544, 801, 25, 967, 168, 928, 336, 243, 477, 640, 328, - 295, 914, 345, 508, 606, 71, 435, 161, 70, 804, 276, 36, 542, 953, 773, - 133, 104, 462, 815, 237, 393, 196, 168, 613, 589, 197, 232, 306, 424, 690, - 876, 122, 770, 125, 176, 693, 606, 70, 632, 604, 300, 795, 711, 357, 789, - 93, 530, 1, 289, 834, 172, 123, 942, 322, 503, 183, 633, 594, 40, 213, - 794, 217, 267, 554, 199, 99, 946, 891, 834, 286, 256, 467, 918, 695, 529, - 207, 195, 951, 912, 33, 826, 835, 233, 910, 551, 24, 822, 917, 470, 208, - 206, 823, 638, 307, 439, 349, 996, 144, 226, 207, 535, 849, 695, 763, 623, - 724, 92, 86, 303, 652, 530, 321, 963, 369, 164, 957, 2, 296, 304, 935, - 40, 899, 637, 866, 531, 969, 278, 442, 503, 129, 842, 515, 84, 724, 703, - 109, 892, 884, 196, 828, 367, 435, 696, 908, 392, 163, 569, 232, 49, 35, - 900, 700, 759, 631, 637, 885, 147, 919, 995, 875, 734, 163, 293, 104, 363, - 163, 370, 952, 309, 336, 503, 368, 539, 233, 789, 192, 856, 945, 921, 553, - 628, 326, 563, 429, 139, 518, 291, 297, 704, 163, 675, 70, 722, 635, 59, - 874, 677, 204, 474, 548, 333, 508, 769, 448, 452, 665, 866, 442, 101, 713, - 659, 142, 340, 518, 579, 7, 597, 358, 415, 120}; - -int16_t verify_data[DATA_SIZE] = { - 454, 564, 989, 350, 64, 584, 621, 6, 890, 898, 961, 110, 750, 500, 436, - 812, 981, 875, 564, 938, 569, 88, 703, 375, 657, 592, 
569, 800, 584, 368, - 489, 826, 985, 195, 566, 997, 657, 181, 998, 847, 704, 253, 886, 264, 979, - 4, 494, 499, 693, 444, 676, 78, 351, 646, 169, 289, 796, 720, 758, 92, - 62, 985, 788, 222, 908, 775, 332, 740, 446, 125, 933, 799, 431, 357, 287, - 514, 557, 853, 440, 614, 396, 808, 819, 344, 536, 965, 539, 887, 946, 951, - 479, 891, 641, 665, 95, 280, 395, 911, 476, 729, 486, 414, 214, 518, 682, - 173, 643, 290, 215, 658, 251, 786, 302, 433, 322, 125, 626, 304, 733, 959, - 818, 406, 653, 86, 60, 667, 153, 651, 840, 356, 423, 216, 343, 533, 521, - 299, 13, 36, 493, 537, 372, 434, 415, 110, 437, 255, 988, 719, 990, 753, - 171, 19, 501, 777, 988, 81, 483, 153, 357, 943, 328, 496, 121, 81, 870, - 603, 907, 610, 404, 784, 357, 803, 751, 943, 726, 201, 459, 53, 397, 508, - 860, 219, 499, 180, 482, 725, 992, 526, 435, 247, 464, 30, 233, 350, 779, - 784, 502, 823, 170, 160, 32, 320, 636, 90, 70, 388, 954, 6, 204, 677, - 218, 824, 746, 698, 706, 49, 732, 702, 373, 492, 215, 83, 59, 921, 880, - 659, 966, 62, 529, 10, 904, 387, 305, 559, 512, 796, 497, 427, 359, 521, - 22, 680, 14, 274, 185, 486, 96, 739, 644, 80, 37, 946, 935, 870, 441, - 62, 554, 845, 639, 211, 821, 562, 974, 495, 916, 159, 419, 180, 840, 636, - 872, 820, 758, 418, 670, 568, 745, 139, 414, 173, 554, 378, 977, 236, 229, - 996, 105, 696, 140, 50, 635, 87, 294, 110, 37, 646, 619, 897, 589, 958, - 985, 99, 522, 824, 749, 28, 621, 155, 43, 694, 158, 747, 265, 935, 941, - 185, 619, 269, 763, 88, 14, 649, 995, 809, 873, 791, 554, 530, 405, 579, - 540, 120, 995, 622, 224, 530, 820, 226, 1, 320, 646, 754, 203, 142, 884, - 108, 974, 852, 619, 890, 480, 542, 201, 408, 119, 464, 34, 533, 880, 203, - 735, 167, 781, 321, 688, 947, 177, 101, 654, 625, 242, 795, 641, 713, 529, - 333, 601, 775, 143, 979, 454, 365, 962, 923, 655, 595, 90, 774, 324, 32, - 622, 594, 75, 290, 639, 449, 135, 644, 290, 125, 552, 491, 116, 617, 235, - 724, 913, 560, 343, 677, 728, 344, 465, 759, 485, 333, 746, 699, 336, 503, - 546, 367, 943, 345, 
600, 908, 523, 545, 990, 991, 863, 312, 616, 567, 370, - 452, 316, 746, 796, 618, 115, 226, 165, 911, 137, 983, 565, 886, 962, 293, - 620, 447, 5, 325, 309, 685, 637, 241, 550, 372, 601, 398, 103, 17, 159, - 177, 372, 859, 745, 321, 591, 553, 844, 933, 980, 544, 50, 781, 716, 444, - 700, 548, 703, 829, 546, 613, 116, 313, 971, 607, 169, 421, 916, 539, 162, - 224, 595, 834, 964, 201, 154, 539, 943, 121, 652, 747, 362, 744, 206, 505, - 973, 354, 847, 971, 910, 904, 63, 396, 482, 646, 399, 172, 884, 972, 698, - 882, 809, 366, 506, 831, 382, 452, 72, 52, 932, 924, 680, 287, 92, 885, - 616, 860, 744, 478, 508, 494, 288, 974, 780, 548, 981, 815, 653, 626, 386, - 323, 317, 239, 377, 304, 70, 825, 576, 848, 303, 616, 206, 2, 890, 815, - 665, 410, 741, 422, 117, 764, 952, 531, 804, 811, 437, 335, 971, 415, 318, - 662, 647, 597, 294, 734, 268, 115, 50, 952, 5, 257, 948, 445, 443, 349, - 943, 621, 444, 720, 828, 911, 812, 522, 57, 950, 238, 972, 416, 879, 572, - 99, 12, 643, 968, 488, 242, 630, 795, 435, 599, 696, 93, 710, 112, 960, - 874, 615, 359, 495, 764, 286, 118, 192, 20, 586, 573, 386, 663, 5, 408, - 441, 368, 516, 359, 672, 452, 298, 459, 815, 829, 497, 243, 917, 290, 503, - 625, 108, 304, 373, 646, 446, 664, 655, 392, 707, 941, 451, 672, 43, 124, - 549, 251, 214, 44, 139, 608, 651, 408, 573, 687, 642, 368, 972, 824, 856, - 516, 631, 736, 978, 530, 296, 28, 668, 228, 265, 808, 0, 483, 212, 40, - 528, 349, 338, 334, 225, 746, 385, 487, 587, 456, 729, 688, 871, 304, 729, - 538, 838, 213, 231, 758, 644, 992, 57, 519, 776, 648, 95, 345, 781, 632, - 86, 95, 942, 53, 86, 269, 68, 538, 759, 606, 320, 122, 957, 450, 128, - 220, 425, 217, 249, 514, 799, 939, 157, 172, 755, 323, 783, 100, 772, 696, - 650, 192, 798, 563, 544, 801, 277, 829, 168, 397, 336, 243, 264, 640, 328, - 443, 914, 345, 508, 30, 101, 92, 453, 361, 23, 839, 36, 246, 953, 773, - 133, 529, 80, 815, 851, 393, 196, 631, 843, 333, 936, 594, 306, 329, 751, - 876, 122, 770, 638, 973, 474, 82, 923, 371, 280, 300, 
451, 711, 357, 98, - 93, 376, 791, 289, 726, 172, 123, 5, 607, 503, 741, 633, 594, 40, 213, - 50, 217, 821, 554, 199, 263, 810, 891, 834, 639, 256, 418, 716, 786, 529, - 207, 925, 951, 912, 798, 262, 835, 373, 850, 551, 24, 555, 917, 172, 830, - 834, 823, 126, 307, 439, 913, 996, 555, 756, 207, 535, 849, 683, 763, 533, - 86, 440, 86, 303, 652, 530, 321, 963, 369, 470, 437, 616, 106, 304, 694, - 40, 899, 229, 866, 252, 492, 278, 442, 503, 955, 842, 325, 527, 732, 104, - 109, 830, 689, 790, 828, 367, 435, 696, 908, 805, 163, 569, 232, 49, 35, - 450, 700, 515, 631, 346, 885, 147, 919, 154, 875, 734, 163, 293, 104, 942, - 725, 370, 741, 241, 279, 607, 517, 539, 233, 457, 78, 73, 945, 967, 553, - 628, 326, 563, 990, 139, 518, 291, 224, 704, 163, 675, 231, 722, 635, 59, - 722, 26, 440, 474, 548, 813, 924, 373, 448, 452, 321, 866, 442, 101, 215, - 659, 146, 289, 518, 579, 726, 597, 358, 55, 120}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-conditional/vec-conditional.S b/bb-tests/workloads/src/CTest/rvv/vec-conditional/vec-conditional.S deleted file mode 100644 index 7b7cefc1..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conditional/vec-conditional.S +++ /dev/null @@ -1,23 +0,0 @@ - .text - .balign 4 - .global vec_conditional - -# (int16) z[i] = ((int8) x[i] < 5) ? (int16) a[i] : (int16) b[i]; -# -vec_conditional: - vsetvli t0, a0, e8, m1, ta, ma # Use 8b elements. - vle8.v v0, (a1) # Get x[i] - sub a0, a0, t0 # Decrement element count - add a1, a1, t0 # x[i] Bump pointer - vmslt.vi v0, v0, 5 # Set mask in v0 - vsetvli x0, x0, e16, m2, ta, mu # Use 16b elements. - slli t0, t0, 1 # Multiply by 2 bytes - vle16.v v2, (a2), v0.t # z[i] = a[i] case - vmnot.m v0, v0 # Invert v0 - add a2, a2, t0 # a[i] bump pointer - vle16.v v2, (a3), v0.t # z[i] = b[i] case - add a3, a3, t0 # b[i] bump pointer - vse16.v v2, (a4) # Store z - add a4, a4, t0 # z[i] bump pointer - bnez a0, vec_conditional # Any more? 
- ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-conditional/vec-conditional_main.c b/bb-tests/workloads/src/CTest/rvv/vec-conditional/vec-conditional_main.c deleted file mode 100644 index 4b0c903f..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conditional/vec-conditional_main.c +++ /dev/null @@ -1,60 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// Conditional benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized conditional implementation. -// The input data (and reference data) should be generated using -// the conditional_gendata.pl perl script and dumped to a file named -// dataset1.h. - -#include "util.h" -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" -#include - -//-------------------------------------------------------------------------- -// Main - -static int verify_short(int n, const volatile int16_t *test, - const int16_t *verify) { - int i; - // Unrolled for faster verification - for (i = 0; i < n / 2 * 2; i += 2) { - int t0 = test[i], t1 = test[i + 1]; - int v0 = verify[i], v1 = verify[i + 1]; - if (t0 != v0) - return i + 1; - if (t1 != v1) - return i + 2; - } - if (n % 2 != 0 && test[n - 1] != verify[n - 1]) - return n; - return 0; -} - -void vec_conditional(size_t n, int8_t x[], int16_t a[], int16_t b[], - int16_t z[]); - -int main(int argc, char *argv[]) { - int16_t results_data[DATA_SIZE]; - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_conditional(DATA_SIZE, input1_data, input2_data, input3_data, - results_data); -#endif - - // Do the conditional - setStats(1); - vec_conditional(DATA_SIZE, input1_data, input2_data, input3_data, - results_data); - setStats(0); - - return verify_short(DATA_SIZE, results_data, verify_data); -} diff --git 
a/bb-tests/workloads/src/CTest/rvv/vec-conjugate-gradient/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-conjugate-gradient/gen_data.py deleted file mode 100644 index bb55ba65..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conjugate-gradient/gen_data.py +++ /dev/null @@ -1,208 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# // Author: Chi Zhang, ETH Zurich - -# arg1: size, arg2: steps, arg3: sparse density - -import random -import numpy as np -import sys -from sklearn.datasets import make_spd_matrix -from sklearn.datasets import make_sparse_spd_matrix - - -# fun for froming file -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -def genSymetricPositveDenseMatrix(size, data_type): - A = make_spd_matrix(size) - M = np.array(A, dtype=data_type) - return M - pass - - -def genSymetricPositveSparseMatrix(size, data_type, idx_type, density, element_byte): - alpha = 1.0 - density - M = make_sparse_spd_matrix(size) - M = np.array(M, dtype=data_type) - insert_list = list(np.flatnonzero(M)) - non_zero = len(insert_list) - sparsity = float(non_zero) / float(size * size) - num_row = size - num_col = size - - # nz_coo = 
np.nonzero(M) - # nz_coo_row = nz_coo[0] - # nz_coo_col = nz_coo[1] - - coo = np.transpose(np.nonzero(M)) - - # generate data - data_list = [] - for x in range(non_zero): - # data_list.append(random.random()) - a = coo[x][0] - b = coo[x][1] - data_list.append(M[a][b]) - pass - - # Count for p_row - p_row = [] - p_row.append(0) - acc_bar = num_col - acc_cnt = 0 - for x in range(non_zero): - if insert_list[x] >= acc_bar: - p_row.append(x) - acc_bar = acc_bar + num_col - acc_cnt = acc_cnt + 1 - while insert_list[x] >= acc_bar: - p_row.append(x) - acc_bar = acc_bar + num_col - acc_cnt = acc_cnt + 1 - pass - pass - pass - for x in range(num_row - acc_cnt): - p_row.append(non_zero) - pass - - # generate indicies - index_list = [] - for x in range(non_zero): - idx = insert_list[x] - max_idx = num_col - while idx >= max_idx: - idx = idx - max_idx - pass - index_list.append(idx * element_byte) - pass - - # form an equvelant dense matrix - A = np.zeros([size, size], dtype=data_type) - cnt = 0 - for x in range(size): - row_len = p_row[x + 1] - p_row[x] - for y in range(row_len): - if cnt != y + p_row[x]: - print("Error. paralyze sparse matrix wrong") - sys.exit() - pass - idx = int((index_list[cnt]) / element_byte) - A[x][idx] = data_list[cnt] - cnt = cnt + 1 - pass - pass - - # check dialog elements - for x in range(size): - if A[x][x] == 0: - print("Error. paralyze dialog wrong") - sys.exit() - pass - pass - - # check symetric - for x in range(size): - for y in range(size): - if A[x][y] != A[y][x]: - print("Error. paralyze dialog wrong") - sys.exit() - pass - pass - pass - - # check transform correctness - for x in range(size): - for y in range(size): - if A[x][y] != M[x][y]: - print("Error. 
CSR transform wrong") - sys.exit() - pass - pass - pass - - # Form numpy data - A_PROW = np.array(p_row, dtype=idx_type) - A_IDX = np.array(index_list, dtype=idx_type) - A_DATA = np.array(data_list, dtype=data_type) - - return A_PROW, A_IDX, A_DATA, sparsity - pass - - -def genRandomVector(size, data_type): - v = np.zeros([size], dtype=data_type) - for x in range(size): - v[x] = random.random() - pass - return v - pass - - -# SCRIPT - - -if len(sys.argv) == 4: - S = int(sys.argv[1]) - N = int(sys.argv[2]) - D = float(sys.argv[3]) -else: - print("Error. Give me one argument: the number of vector elements.") - sys.exit() - -data_type = np.float64 -idx_type = np.int32 -element_byte = 8 - -A = genSymetricPositveDenseMatrix(S, data_type) -A_PROW, A_IDX, A_DATA, sparsity = genSymetricPositveSparseMatrix( - S, data_type, idx_type, D, element_byte -) -b = genRandomVector(S, data_type) -x = np.zeros([S], dtype=data_type) -r = np.zeros([S], dtype=data_type) -p = np.zeros([S], dtype=data_type) -Ax = np.zeros([S, S], dtype=data_type) -Ap = np.zeros([S, S], dtype=data_type) - - -print('.section .data,"aw",@progbits') -emit("size", np.array(S, dtype=np.uint64)) -emit("step", np.array(N, dtype=np.uint64)) -emit("sparsity", np.array(sparsity, dtype=np.float64)) -emit("A", A, "NR_LANES*4") -emit("b", b, "NR_LANES*4") -emit("x", x, "NR_LANES*4") -emit("r", r, "NR_LANES*4") -emit("p", p, "NR_LANES*4") -emit("Ax", Ax, "NR_LANES*4") -emit("Ap", Ap, "NR_LANES*4") -emit("A_PROW", A_PROW, "NR_LANES*4") -emit("A_IDX", A_IDX, "NR_LANES*4") -emit("A_DATA", A_DATA, "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-conjugate-gradient/main.c b/bb-tests/workloads/src/CTest/rvv/vec-conjugate-gradient/main.c deleted file mode 100644 index 1e66b4dc..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conjugate-gradient/main.c +++ /dev/null @@ -1,154 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. 
-// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Chi Zhang, ETH Zurich - -#include -#include - -#include "ara/fdotproduct.h" -#include "ara/spmv.h" -#include "util.h" - -#include - -#define MIN_LOSS 0.005 -#define MAX_ITERS 10 -#define abs(x) (x < 0 ? -x : x) - -extern uint64_t size; -extern uint64_t step; -extern double sparsity; - -extern double A[] __attribute__((aligned(16))); -extern double Ax[] __attribute__((aligned(16))); -extern double x[] __attribute__((aligned(16))); -extern double b[] __attribute__((aligned(16))); -extern double r[] __attribute__((aligned(16))); -extern double p[] __attribute__((aligned(16))); -extern double Ap[] __attribute__((aligned(16))); -extern int32_t A_PROW[] __attribute__((aligned(16))); -extern int32_t A_IDX[] __attribute__((aligned(16))); -extern double A_DATA[] __attribute__((aligned(16))); - -void daxpy(double *x, double a, double *y, double *dest, uint64_t len) { - while (len) { - size_t vl = __riscv_vsetvl_e64m8(len); - asm volatile("vle64.v v0, (%0);" ::"r"(x)); - asm volatile("vle64.v v8, (%0);" ::"r"(y)); - asm volatile("vfmacc.vf v8, %0, v0" ::"f"(a)); - asm volatile("vse64.v v8, (%0);" ::"r"(dest)); - x = x + vl; - y = y + vl; - dest = dest + vl; - len = len - vl; - } -} - -double CG_iteration_spmv(int32_t *A_PROW, int32_t *A_IDX, double *A_DATA, - double *x, double *b, double *r, double *p, double *Ap, - uint64_t size) { - /* - Calculate step length 
alpha - */ - double rk_norm = fdotp_v64b(r, r, size); - spmv_csr_idx32(size, A_PROW, A_IDX, A_DATA, p, Ap); - double pAp = fdotp_v64b(p, Ap, size); - // printf("pAp: %f\n", pAp); - if (abs(pAp) < MIN_LOSS) { - return rk_norm; - } - double alpha = rk_norm / pAp; - - /* - update x - */ - daxpy(p, alpha, x, x, size); - - /* - update loss r - */ - daxpy(Ap, (-1.0) * alpha, r, r, size); - - /* - calculate beta - */ - double rk_norm_new = fdotp_v64b(r, r, size); - double beta = rk_norm_new / rk_norm; - - /* - update p - */ - daxpy(p, beta, r, p, size); - - /* - return loss - */ - return rk_norm_new; -} - -int main() { - printf("Conjugate Gradient\n"); - printf("Solving a Ax=b equation with (%d x %d) Matrix size...\n", size, size); - printf("Sparse Matrix in CSR format, with %ld nonzeros per row\n", - (size_t)(sparsity * size)); - - printf("Initializing CGM parameters...\n"); - spmv_csr_idx32(size, A_PROW, A_IDX, A_DATA, x, Ax); - daxpy(Ax, -1.0, b, r, size); - daxpy(Ax, -1.0, b, p, size); - - printf("Start CGM ...\n"); - - // Start instruction and cycles count of the region of interest - unsigned long cycles1, cycles2, instr2, instr1; - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - uint64_t i = 0; - while (1) { - if (step > 0 && i >= step) { - break; - } - double loss = - CG_iteration_spmv(A_PROW, A_IDX, A_DATA, x, b, r, p, Ap, size); - if (loss < MIN_LOSS || i > MAX_ITERS) { - break; - } - i++; - } - - asm volatile("fence"); - - // End instruction and cycles count of the region of interest - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - - size_t rk_norm_ops = size; - size_t spmv_ops = sparsity * size * size; - size_t pAp_ops = size; - size_t daxpy_ops = size * 3; - size_t rk_norm_new_ops = size; - - size_t operations = - i * (rk_norm_ops + spmv_ops + pAp_ops + daxpy_ops + rk_norm_new_ops); - - // Instruction and cycles count of the region of interest - printf("NUMBER OF OPERATIONS %lu\n", operations); - printf("NUMBER OF EXEC CYCLES 
:%lu\n", cycles2 - cycles1); - printf("NUMBER OF INSTRUCTIONS EXECUTED :%lu\n", instr2 - instr1); - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-conv-3/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-conv-3/dataset1.h deleted file mode 100644 index 7c766b9e..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conv-3/dataset1.h +++ /dev/null @@ -1,1993 +0,0 @@ -#define K_DIM 3 -#define IH 100 -#define IW 100 -#define I_SIZE 10000 -#define OH 98 -#define OW 98 -#define O_SIZE 9604 - -float input_k[K_DIM * K_DIM] = { - 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -}; -float input_image[I_SIZE] = { - 20.0, 29.0, 16.0, 10.0, 44.0, 6.0, 2.0, 0.0, 4.0, 12.0, 30.0, - 44.0, 96.0, 20.0, 28.0, 384.0, 3.0, 184.0, 62.0, 20.0, 464.0, 60.0, - 62.0, 216.0, 14.0, 76.0, 304.0, 480.0, 16.0, 192.0, 44.0, 112.0, 2.0, - 27.0, 256.0, 5.0, 52.0, 124.0, 160.0, 112.0, 42.0, 64.0, 16.0, 27.0, - 26.0, 38.0, 23.0, 152.0, 60.0, 184.0, 232.0, 40.0, 176.0, 64.0, 248.0, - 3.0, 176.0, 208.0, 56.0, 31.0, 384.0, 48.0, 108.0, 0.0, 8.0, 48.0, - 26.0, 216.0, 8.0, 20.0, 8.0, 160.0, 336.0, 112.0, 42.0, 12.0, 9.0, - 28.0, 192.0, 112.0, 84.0, 17.0, 288.0, 120.0, 48.0, 28.0, 16.0, 23.0, - 256.0, 112.0, 208.0, 16.0, 8.0, 320.0, 80.0, 116.0, 50.0, 256.0, 16.0, - 248.0, 160.0, 448.0, 100.0, 4.0, 224.0, 54.0, 32.0, 76.0, 16.0, 240.0, - 2.0, 168.0, 32.0, 4.0, 20.0, 26.0, 18.0, 160.0, 4.0, 8.0, 28.0, - 30.0, 240.0, 6.0, 42.0, 24.0, 30.0, 26.0, 64.0, 16.0, 48.0, 320.0, - 6.0, 27.0, 80.0, 54.0, 40.0, 104.0, 3.0, 32.0, 4.0, 168.0, 12.0, - 17.0, 32.0, 112.0, 14.0, 22.0, 11.0, 336.0, 136.0, 68.0, 200.0, 12.0, - 0.0, 1.0, 40.0, 40.0, 0.0, 76.0, 16.0, 384.0, 0.0, 48.0, 240.0, - 144.0, 30.0, 12.0, 27.0, 1.0, 184.0, 192.0, 9.0, 104.0, 16.0, 62.0, - 2.0, 208.0, 32.0, 32.0, 160.0, 52.0, 44.0, 10.0, 100.0, 96.0, 52.0, - 26.0, 64.0, 6.0, 40.0, 30.0, 48.0, 184.0, 14.0, 320.0, 176.0, 2.0, - 84.0, 400.0, 32.0, 52.0, 48.0, 60.0, 208.0, 11.0, 464.0, 26.0, 36.0, - 304.0, 152.0, 64.0, 208.0, 0.0, 80.0, 416.0, 
26.0, 44.0, 108.0, 18.0, - 272.0, 64.0, 56.0, 17.0, 60.0, 8.0, 160.0, 320.0, 92.0, 32.0, 12.0, - 160.0, 224.0, 128.0, 0.0, 4.0, 88.0, 13.0, 40.0, 13.0, 6.0, 16.0, - 25.0, 5.0, 11.0, 232.0, 64.0, 336.0, 17.0, 56.0, 128.0, 48.0, 29.0, - 22.0, 18.0, 200.0, 4.0, 28.0, 432.0, 15.0, 416.0, 320.0, 6.0, 272.0, - 12.0, 8.0, 28.0, 480.0, 14.0, 104.0, 64.0, 28.0, 240.0, 20.0, 0.0, - 40.0, 248.0, 112.0, 62.0, 100.0, 136.0, 58.0, 272.0, 24.0, 44.0, 52.0, - 28.0, 64.0, 32.0, 496.0, 0.0, 124.0, 152.0, 40.0, 184.0, 38.0, 10.0, - 24.0, 432.0, 208.0, 48.0, 304.0, 26.0, 7.0, 336.0, 18.0, 112.0, 272.0, - 20.0, 144.0, 192.0, 304.0, 16.0, 34.0, 84.0, 416.0, 240.0, 4.0, 26.0, - 56.0, 23.0, 384.0, 12.0, 36.0, 184.0, 496.0, 10.0, 40.0, 128.0, 112.0, - 108.0, 248.0, 8.0, 0.0, 96.0, 18.0, 168.0, 72.0, 44.0, 176.0, 48.0, - 272.0, 192.0, 216.0, 88.0, 128.0, 32.0, 36.0, 104.0, 27.0, 50.0, 24.0, - 304.0, 31.0, 0.0, 60.0, 232.0, 23.0, 62.0, 0.0, 200.0, 20.0, 6.0, - 12.0, 176.0, 20.0, 432.0, 9.0, 60.0, 30.0, 80.0, 8.0, 24.0, 108.0, - 14.0, 4.0, 232.0, 12.0, 0.0, 224.0, 8.0, 18.0, 72.0, 25.0, 18.0, - 96.0, 32.0, 28.0, 32.0, 240.0, 21.0, 400.0, 0.0, 240.0, 184.0, 24.0, - 13.0, 40.0, 400.0, 432.0, 32.0, 240.0, 3.0, 16.0, 104.0, 22.0, 4.0, - 16.0, 14.0, 480.0, 38.0, 88.0, 58.0, 40.0, 32.0, 24.0, 288.0, 27.0, - 464.0, 38.0, 272.0, 12.0, 64.0, 56.0, 28.0, 144.0, 54.0, 232.0, 19.0, - 28.0, 272.0, 108.0, 496.0, 288.0, 60.0, 8.0, 0.0, 52.0, 60.0, 384.0, - 176.0, 17.0, 52.0, 224.0, 224.0, 12.0, 9.0, 52.0, 0.0, 38.0, 224.0, - 52.0, 480.0, 272.0, 24.0, 29.0, 0.0, 288.0, 4.0, 0.0, 32.0, 136.0, - 400.0, 56.0, 48.0, 30.0, 128.0, 88.0, 29.0, 8.0, 104.0, 29.0, 40.0, - 18.0, 116.0, 52.0, 16.0, 16.0, 26.0, 168.0, 64.0, 104.0, 34.0, 400.0, - 184.0, 72.0, 248.0, 15.0, 44.0, 32.0, 52.0, 46.0, 0.0, 320.0, 64.0, - 40.0, 62.0, 28.0, 112.0, 40.0, 184.0, 60.0, 28.0, 15.0, 416.0, 28.0, - 31.0, 192.0, 48.0, 272.0, 18.0, 44.0, 24.0, 56.0, 48.0, 76.0, 96.0, - 44.0, 62.0, 160.0, 96.0, 128.0, 20.0, 42.0, 240.0, 15.0, 32.0, 216.0, 
- 0.0, 84.0, 20.0, 25.0, 4.0, 56.0, 84.0, 464.0, 88.0, 16.0, 38.0, - 19.0, 48.0, 26.0, 64.0, 8.0, 448.0, 88.0, 176.0, 6.0, 80.0, 18.0, - 100.0, 160.0, 32.0, 96.0, 16.0, 208.0, 48.0, 52.0, 54.0, 8.0, 29.0, - 464.0, 104.0, 160.0, 21.0, 34.0, 152.0, 112.0, 16.0, 7.0, 216.0, 144.0, - 56.0, 2.0, 14.0, 12.0, 240.0, 6.0, 8.0, 112.0, 80.0, 2.0, 0.0, - 13.0, 25.0, 416.0, 48.0, 240.0, 104.0, 240.0, 84.0, 240.0, 240.0, 48.0, - 152.0, 10.0, 13.0, 176.0, 192.0, 76.0, 8.0, 160.0, 72.0, 32.0, 104.0, - 60.0, 58.0, 464.0, 4.0, 4.0, 248.0, 18.0, 9.0, 176.0, 256.0, 112.0, - 80.0, 4.0, 48.0, 1.0, 14.0, 8.0, 32.0, 92.0, 208.0, 256.0, 23.0, - 168.0, 16.0, 176.0, 76.0, 48.0, 20.0, 28.0, 16.0, 8.0, 64.0, 10.0, - 42.0, 58.0, 48.0, 120.0, 56.0, 19.0, 120.0, 12.0, 108.0, 42.0, 3.0, - 32.0, 32.0, 184.0, 104.0, 96.0, 32.0, 240.0, 208.0, 24.0, 62.0, 320.0, - 30.0, 24.0, 14.0, 44.0, 21.0, 10.0, 58.0, 152.0, 64.0, 240.0, 28.0, - 8.0, 12.0, 60.0, 31.0, 1.0, 56.0, 1.0, 6.0, 8.0, 8.0, 40.0, - 176.0, 160.0, 60.0, 216.0, 25.0, 16.0, 80.0, 208.0, 6.0, 248.0, 1.0, - 16.0, 30.0, 26.0, 14.0, 200.0, 184.0, 184.0, 496.0, 124.0, 12.0, 13.0, - 8.0, 496.0, 1.0, 38.0, 24.0, 288.0, 48.0, 9.0, 22.0, 24.0, 9.0, - 192.0, 38.0, 8.0, 224.0, 160.0, 22.0, 48.0, 42.0, 256.0, 352.0, 64.0, - 16.0, 25.0, 25.0, 6.0, 40.0, 20.0, 256.0, 58.0, 42.0, 30.0, 104.0, - 1.0, 52.0, 136.0, 21.0, 256.0, 50.0, 12.0, 432.0, 72.0, 20.0, 40.0, - 13.0, 30.0, 108.0, 1.0, 40.0, 224.0, 3.0, 24.0, 108.0, 32.0, 76.0, - 24.0, 16.0, 216.0, 96.0, 168.0, 64.0, 16.0, 34.0, 31.0, 48.0, 384.0, - 208.0, 21.0, 30.0, 88.0, 128.0, 52.0, 112.0, 46.0, 16.0, 56.0, 50.0, - 16.0, 152.0, 28.0, 36.0, 14.0, 128.0, 80.0, 92.0, 128.0, 6.0, 28.0, - 22.0, 136.0, 48.0, 104.0, 496.0, 224.0, 136.0, 54.0, 14.0, 42.0, 58.0, - 48.0, 14.0, 2.0, 176.0, 48.0, 304.0, 124.0, 112.0, 60.0, 68.0, 0.0, - 480.0, 48.0, 32.0, 62.0, 64.0, 44.0, 22.0, 16.0, 88.0, 4.0, 22.0, - 272.0, 60.0, 0.0, 96.0, 32.0, 25.0, 29.0, 96.0, 40.0, 56.0, 32.0, - 72.0, 27.0, 3.0, 192.0, 17.0, 16.0, 80.0, 
368.0, 12.0, 168.0, 16.0, - 40.0, 248.0, 20.0, 248.0, 288.0, 192.0, 54.0, 56.0, 96.0, 48.0, 24.0, - 30.0, 480.0, 18.0, 22.0, 14.0, 28.0, 72.0, 18.0, 16.0, 0.0, 176.0, - 2.0, 9.0, 288.0, 46.0, 248.0, 36.0, 92.0, 96.0, 32.0, 80.0, 480.0, - 144.0, 32.0, 272.0, 21.0, 48.0, 32.0, 144.0, 17.0, 17.0, 304.0, 40.0, - 96.0, 4.0, 128.0, 0.0, 48.0, 32.0, 25.0, 48.0, 22.0, 32.0, 31.0, - 22.0, 416.0, 1.0, 12.0, 144.0, 46.0, 116.0, 48.0, 124.0, 496.0, 16.0, - 304.0, 464.0, 12.0, 96.0, 2.0, 32.0, 13.0, 16.0, 13.0, 448.0, 36.0, - 136.0, 60.0, 8.0, 96.0, 0.0, 16.0, 304.0, 208.0, 20.0, 96.0, 100.0, - 28.0, 6.0, 4.0, 56.0, 136.0, 6.0, 5.0, 120.0, 216.0, 400.0, 21.0, - 48.0, 6.0, 256.0, 384.0, 224.0, 32.0, 96.0, 32.0, 304.0, 72.0, 104.0, - 288.0, 25.0, 272.0, 36.0, 27.0, 28.0, 40.0, 0.0, 48.0, 0.0, 124.0, - 112.0, 176.0, 34.0, 24.0, 32.0, 18.0, 320.0, 96.0, 96.0, 16.0, 248.0, - 0.0, 12.0, 384.0, 10.0, 168.0, 16.0, 112.0, 4.0, 23.0, 18.0, 384.0, - 84.0, 12.0, 104.0, 336.0, 208.0, 224.0, 0.0, 4.0, 28.0, 24.0, 1.0, - 160.0, 16.0, 192.0, 16.0, 32.0, 4.0, 124.0, 232.0, 18.0, 0.0, 32.0, - 9.0, 24.0, 0.0, 76.0, 6.0, 384.0, 92.0, 7.0, 24.0, 20.0, 224.0, - 0.0, 5.0, 128.0, 240.0, 144.0, 116.0, 64.0, 32.0, 352.0, 128.0, 19.0, - 12.0, 16.0, 12.0, 48.0, 0.0, 112.0, 30.0, 5.0, 160.0, 144.0, 16.0, - 48.0, 9.0, 9.0, 30.0, 4.0, 23.0, 416.0, 184.0, 176.0, 56.0, 26.0, - 480.0, 7.0, 248.0, 192.0, 60.0, 16.0, 152.0, 17.0, 88.0, 32.0, 120.0, - 192.0, 18.0, 24.0, 23.0, 16.0, 496.0, 40.0, 32.0, 16.0, 4.0, 8.0, - 88.0, 32.0, 216.0, 192.0, 496.0, 0.0, 40.0, 108.0, 12.0, 32.0, 216.0, - 44.0, 22.0, 36.0, 448.0, 24.0, 56.0, 8.0, 12.0, 176.0, 16.0, 20.0, - 216.0, 288.0, 88.0, 208.0, 96.0, 0.0, 56.0, 320.0, 216.0, 216.0, 18.0, - 336.0, 38.0, 256.0, 80.0, 192.0, 16.0, 32.0, 384.0, 56.0, 192.0, 32.0, - 52.0, 272.0, 1.0, 30.0, 22.0, 68.0, 62.0, 136.0, 48.0, 384.0, 28.0, - 64.0, 11.0, 29.0, 384.0, 9.0, 176.0, 160.0, 224.0, 232.0, 62.0, 84.0, - 0.0, 104.0, 16.0, 60.0, 96.0, 64.0, 2.0, 8.0, 108.0, 4.0, 20.0, - 30.0, 
416.0, 224.0, 232.0, 68.0, 272.0, 120.0, 128.0, 32.0, 216.0, 9.0, - 96.0, 4.0, 9.0, 48.0, 400.0, 36.0, 8.0, 4.0, 46.0, 192.0, 88.0, - 160.0, 8.0, 0.0, 46.0, 136.0, 108.0, 36.0, 64.0, 80.0, 24.0, 8.0, - 416.0, 14.0, 168.0, 20.0, 112.0, 32.0, 224.0, 216.0, 416.0, 4.0, 80.0, - 464.0, 24.0, 32.0, 19.0, 13.0, 480.0, 10.0, 17.0, 32.0, 48.0, 96.0, - 352.0, 184.0, 52.0, 42.0, 4.0, 24.0, 104.0, 124.0, 17.0, 448.0, 26.0, - 8.0, 20.0, 272.0, 432.0, 60.0, 416.0, 304.0, 168.0, 38.0, 336.0, 14.0, - 116.0, 2.0, 272.0, 104.0, 40.0, 144.0, 240.0, 32.0, 56.0, 92.0, 32.0, - 62.0, 56.0, 10.0, 168.0, 30.0, 104.0, 1.0, 48.0, 42.0, 14.0, 6.0, - 224.0, 15.0, 16.0, 432.0, 288.0, 448.0, 304.0, 112.0, 56.0, 27.0, 104.0, - 13.0, 88.0, 76.0, 56.0, 54.0, 112.0, 128.0, 5.0, 24.0, 92.0, 40.0, - 22.0, 7.0, 112.0, 128.0, 24.0, 28.0, 48.0, 464.0, 23.0, 0.0, 42.0, - 128.0, 60.0, 72.0, 1.0, 44.0, 16.0, 92.0, 232.0, 8.0, 8.0, 32.0, - 0.0, 24.0, 32.0, 40.0, 128.0, 496.0, 64.0, 16.0, 0.0, 29.0, 52.0, - 14.0, 58.0, 64.0, 152.0, 56.0, 64.0, 432.0, 52.0, 240.0, 208.0, 352.0, - 30.0, 8.0, 18.0, 34.0, 176.0, 16.0, 168.0, 224.0, 28.0, 40.0, 64.0, - 96.0, 64.0, 72.0, 34.0, 58.0, 40.0, 96.0, 16.0, 0.0, 208.0, 44.0, - 10.0, 496.0, 80.0, 0.0, 0.0, 14.0, 36.0, 62.0, 32.0, 496.0, 72.0, - 30.0, 14.0, 6.0, 16.0, 18.0, 160.0, 16.0, 8.0, 496.0, 112.0, 152.0, - 432.0, 96.0, 192.0, 14.0, 120.0, 224.0, 36.0, 336.0, 24.0, 7.0, 1.0, - 144.0, 26.0, 100.0, 36.0, 100.0, 18.0, 80.0, 6.0, 14.0, 120.0, 104.0, - 32.0, 128.0, 27.0, 192.0, 4.0, 104.0, 32.0, 48.0, 48.0, 10.0, 160.0, - 40.0, 224.0, 56.0, 64.0, 56.0, 40.0, 128.0, 384.0, 44.0, 32.0, 30.0, - 12.0, 384.0, 112.0, 168.0, 32.0, 26.0, 92.0, 160.0, 25.0, 248.0, 84.0, - 120.0, 1.0, 288.0, 224.0, 28.0, 32.0, 60.0, 480.0, 12.0, 352.0, 80.0, - 44.0, 136.0, 1.0, 21.0, 8.0, 12.0, 50.0, 34.0, 200.0, 52.0, 17.0, - 240.0, 68.0, 48.0, 48.0, 56.0, 176.0, 224.0, 240.0, 160.0, 240.0, 40.0, - 0.0, 480.0, 48.0, 8.0, 44.0, 27.0, 13.0, 30.0, 256.0, 0.0, 176.0, - 14.0, 10.0, 80.0, 32.0, 160.0, 
168.0, 12.0, 62.0, 4.0, 12.0, 38.0, - 152.0, 28.0, 432.0, 116.0, 160.0, 46.0, 8.0, 104.0, 46.0, 96.0, 46.0, - 21.0, 88.0, 200.0, 36.0, 64.0, 108.0, 120.0, 0.0, 136.0, 240.0, 224.0, - 128.0, 72.0, 72.0, 28.0, 52.0, 0.0, 30.0, 8.0, 0.0, 8.0, 116.0, - 40.0, 40.0, 128.0, 9.0, 44.0, 30.0, 104.0, 24.0, 58.0, 80.0, 9.0, - 23.0, 60.0, 48.0, 120.0, 96.0, 88.0, 28.0, 224.0, 320.0, 64.0, 10.0, - 176.0, 58.0, 100.0, 80.0, 248.0, 240.0, 336.0, 120.0, 336.0, 27.0, 80.0, - 224.0, 208.0, 60.0, 2.0, 30.0, 68.0, 88.0, 24.0, 152.0, 88.0, 208.0, - 496.0, 12.0, 116.0, 40.0, 46.0, 104.0, 0.0, 112.0, 108.0, 30.0, 200.0, - 208.0, 50.0, 24.0, 192.0, 26.0, 112.0, 0.0, 496.0, 248.0, 56.0, 176.0, - 104.0, 18.0, 7.0, 8.0, 0.0, 10.0, 432.0, 42.0, 20.0, 168.0, 0.0, - 48.0, 38.0, 2.0, 496.0, 0.0, 176.0, 88.0, 48.0, 10.0, 16.0, 432.0, - 176.0, 496.0, 15.0, 1.0, 208.0, 128.0, 64.0, 11.0, 64.0, 72.0, 68.0, - 128.0, 144.0, 8.0, 208.0, 8.0, 32.0, 34.0, 2.0, 4.0, 42.0, 30.0, - 27.0, 4.0, 12.0, 28.0, 80.0, 224.0, 21.0, 24.0, 62.0, 30.0, 6.0, - 30.0, 48.0, 9.0, 0.0, 22.0, 128.0, 30.0, 3.0, 32.0, 416.0, 72.0, - 25.0, 208.0, 3.0, 30.0, 40.0, 8.0, 152.0, 8.0, 40.0, 31.0, 448.0, - 40.0, 32.0, 44.0, 92.0, 80.0, 40.0, 0.0, 176.0, 336.0, 2.0, 26.0, - 14.0, 100.0, 200.0, 192.0, 224.0, 62.0, 52.0, 13.0, 12.0, 76.0, 336.0, - 304.0, 48.0, 27.0, 160.0, 32.0, 52.0, 40.0, 48.0, 48.0, 192.0, 416.0, - 12.0, 304.0, 58.0, 368.0, 304.0, 1.0, 224.0, 100.0, 32.0, 104.0, 48.0, - 136.0, 400.0, 48.0, 208.0, 48.0, 20.0, 8.0, 160.0, 4.0, 22.0, 128.0, - 12.0, 40.0, 36.0, 432.0, 152.0, 7.0, 144.0, 100.0, 256.0, 15.0, 304.0, - 6.0, 26.0, 7.0, 28.0, 25.0, 304.0, 496.0, 144.0, 9.0, 28.0, 240.0, - 12.0, 1.0, 6.0, 64.0, 40.0, 20.0, 216.0, 22.0, 128.0, 16.0, 80.0, - 2.0, 36.0, 16.0, 92.0, 336.0, 48.0, 56.0, 128.0, 15.0, 17.0, 496.0, - 160.0, 88.0, 24.0, 144.0, 20.0, 128.0, 18.0, 6.0, 320.0, 4.0, 4.0, - 26.0, 72.0, 46.0, 176.0, 288.0, 12.0, 52.0, 72.0, 128.0, 26.0, 48.0, - 23.0, 24.0, 64.0, 208.0, 46.0, 0.0, 160.0, 96.0, 16.0, 18.0, 
176.0, - 88.0, 17.0, 0.0, 26.0, 46.0, 14.0, 1.0, 40.0, 8.0, 16.0, 40.0, - 2.0, 116.0, 52.0, 416.0, 84.0, 60.0, 24.0, 54.0, 336.0, 76.0, 10.0, - 56.0, 200.0, 448.0, 0.0, 56.0, 48.0, 26.0, 12.0, 30.0, 464.0, 24.0, - 48.0, 112.0, 152.0, 216.0, 32.0, 24.0, 8.0, 40.0, 18.0, 24.0, 116.0, - 2.0, 32.0, 32.0, 4.0, 224.0, 64.0, 8.0, 31.0, 15.0, 22.0, 80.0, - 96.0, 92.0, 0.0, 80.0, 496.0, 116.0, 128.0, 496.0, 64.0, 144.0, 136.0, - 216.0, 304.0, 0.0, 480.0, 8.0, 8.0, 200.0, 25.0, 32.0, 6.0, 20.0, - 272.0, 64.0, 168.0, 108.0, 136.0, 12.0, 16.0, 152.0, 92.0, 4.0, 42.0, - 352.0, 28.0, 52.0, 144.0, 8.0, 29.0, 12.0, 32.0, 336.0, 104.0, 48.0, - 8.0, 116.0, 32.0, 26.0, 10.0, 80.0, 160.0, 9.0, 40.0, 176.0, 288.0, - 32.0, 58.0, 0.0, 16.0, 68.0, 64.0, 120.0, 13.0, 30.0, 72.0, 480.0, - 62.0, 112.0, 240.0, 272.0, 208.0, 56.0, 80.0, 30.0, 416.0, 26.0, 64.0, - 240.0, 48.0, 116.0, 0.0, 0.0, 400.0, 480.0, 92.0, 200.0, 320.0, 9.0, - 2.0, 200.0, 4.0, 16.0, 24.0, 52.0, 76.0, 4.0, 31.0, 36.0, 15.0, - 18.0, 100.0, 18.0, 24.0, 64.0, 368.0, 48.0, 28.0, 120.0, 32.0, 144.0, - 52.0, 40.0, 18.0, 80.0, 84.0, 22.0, 464.0, 16.0, 288.0, 40.0, 6.0, - 38.0, 28.0, 96.0, 19.0, 176.0, 24.0, 72.0, 336.0, 18.0, 400.0, 96.0, - 112.0, 108.0, 12.0, 44.0, 224.0, 208.0, 248.0, 320.0, 42.0, 80.0, 18.0, - 144.0, 32.0, 30.0, 224.0, 432.0, 32.0, 256.0, 112.0, 112.0, 3.0, 0.0, - 248.0, 10.0, 24.0, 27.0, 60.0, 184.0, 16.0, 24.0, 224.0, 88.0, 2.0, - 120.0, 12.0, 336.0, 0.0, 44.0, 384.0, 0.0, 42.0, 116.0, 16.0, 10.0, - 120.0, 112.0, 144.0, 136.0, 352.0, 84.0, 10.0, 184.0, 56.0, 31.0, 32.0, - 24.0, 192.0, 0.0, 108.0, 12.0, 80.0, 100.0, 29.0, 192.0, 168.0, 16.0, - 224.0, 32.0, 24.0, 232.0, 176.0, 240.0, 29.0, 22.0, 336.0, 68.0, 23.0, - 108.0, 144.0, 19.0, 112.0, 30.0, 16.0, 58.0, 60.0, 44.0, 24.0, 29.0, - 40.0, 7.0, 116.0, 112.0, 92.0, 152.0, 17.0, 14.0, 29.0, 18.0, 20.0, - 60.0, 6.0, 192.0, 23.0, 12.0, 256.0, 21.0, 8.0, 22.0, 40.0, 96.0, - 17.0, 320.0, 336.0, 480.0, 48.0, 8.0, 36.0, 336.0, 32.0, 288.0, 31.0, - 12.0, 100.0, 
30.0, 52.0, 8.0, 368.0, 40.0, 448.0, 100.0, 240.0, 80.0, - 248.0, 48.0, 18.0, 48.0, 248.0, 14.0, 32.0, 248.0, 128.0, 320.0, 10.0, - 23.0, 52.0, 16.0, 480.0, 22.0, 80.0, 120.0, 16.0, 9.0, 72.0, 8.0, - 112.0, 80.0, 320.0, 352.0, 4.0, 96.0, 12.0, 68.0, 4.0, 27.0, 224.0, - 112.0, 104.0, 32.0, 8.0, 42.0, 448.0, 152.0, 96.0, 32.0, 416.0, 58.0, - 6.0, 50.0, 152.0, 192.0, 0.0, 8.0, 8.0, 56.0, 3.0, 29.0, 272.0, - 20.0, 416.0, 80.0, 20.0, 22.0, 20.0, 62.0, 62.0, 84.0, 5.0, 24.0, - 32.0, 40.0, 84.0, 96.0, 336.0, 19.0, 96.0, 16.0, 416.0, 34.0, 21.0, - 100.0, 0.0, 13.0, 2.0, 14.0, 176.0, 32.0, 88.0, 12.0, 46.0, 25.0, - 40.0, 60.0, 14.0, 44.0, 0.0, 40.0, 8.0, 24.0, 304.0, 16.0, 124.0, - 28.0, 31.0, 42.0, 248.0, 42.0, 288.0, 54.0, 96.0, 50.0, 14.0, 18.0, - 168.0, 16.0, 32.0, 136.0, 24.0, 5.0, 5.0, 23.0, 200.0, 54.0, 184.0, - 48.0, 152.0, 56.0, 0.0, 11.0, 11.0, 38.0, 16.0, 0.0, 16.0, 40.0, - 58.0, 16.0, 304.0, 5.0, 0.0, 352.0, 168.0, 48.0, 32.0, 12.0, 0.0, - 352.0, 224.0, 96.0, 432.0, 72.0, 30.0, 54.0, 96.0, 368.0, 22.0, 28.0, - 480.0, 24.0, 12.0, 208.0, 168.0, 21.0, 46.0, 288.0, 25.0, 0.0, 31.0, - 23.0, 96.0, 240.0, 208.0, 176.0, 256.0, 36.0, 120.0, 104.0, 42.0, 400.0, - 44.0, 36.0, 272.0, 304.0, 224.0, 24.0, 352.0, 16.0, 25.0, 17.0, 46.0, - 304.0, 48.0, 464.0, 80.0, 224.0, 200.0, 104.0, 26.0, 3.0, 192.0, 34.0, - 208.0, 216.0, 224.0, 248.0, 40.0, 64.0, 432.0, 12.0, 24.0, 7.0, 31.0, - 27.0, 40.0, 16.0, 36.0, 16.0, 68.0, 28.0, 56.0, 304.0, 14.0, 144.0, - 40.0, 128.0, 12.0, 224.0, 25.0, 80.0, 160.0, 224.0, 4.0, 224.0, 36.0, - 52.0, 48.0, 160.0, 2.0, 480.0, 116.0, 72.0, 20.0, 24.0, 48.0, 136.0, - 20.0, 112.0, 496.0, 22.0, 48.0, 120.0, 14.0, 28.0, 46.0, 26.0, 0.0, - 40.0, 28.0, 32.0, 12.0, 4.0, 0.0, 40.0, 38.0, 0.0, 240.0, 16.0, - 120.0, 36.0, 3.0, 7.0, 32.0, 232.0, 2.0, 44.0, 54.0, 200.0, 16.0, - 4.0, 48.0, 216.0, 248.0, 352.0, 80.0, 6.0, 232.0, 21.0, 120.0, 23.0, - 480.0, 29.0, 58.0, 80.0, 96.0, 232.0, 176.0, 16.0, 9.0, 10.0, 27.0, - 40.0, 64.0, 23.0, 48.0, 32.0, 40.0, 76.0, 416.0, 
16.0, 8.0, 80.0, - 62.0, 18.0, 88.0, 26.0, 8.0, 400.0, 17.0, 38.0, 13.0, 4.0, 16.0, - 16.0, 464.0, 62.0, 448.0, 16.0, 40.0, 160.0, 208.0, 0.0, 240.0, 48.0, - 22.0, 58.0, 7.0, 176.0, 9.0, 128.0, 152.0, 32.0, 192.0, 24.0, 26.0, - 256.0, 27.0, 160.0, 52.0, 14.0, 64.0, 4.0, 288.0, 96.0, 9.0, 52.0, - 480.0, 16.0, 56.0, 112.0, 20.0, 368.0, 144.0, 52.0, 160.0, 25.0, 72.0, - 28.0, 32.0, 44.0, 352.0, 120.0, 416.0, 192.0, 60.0, 16.0, 27.0, 28.0, - 50.0, 144.0, 13.0, 160.0, 68.0, 2.0, 104.0, 352.0, 96.0, 4.0, 84.0, - 64.0, 176.0, 15.0, 64.0, 40.0, 192.0, 4.0, 112.0, 416.0, 240.0, 44.0, - 29.0, 480.0, 29.0, 32.0, 27.0, 28.0, 88.0, 12.0, 416.0, 8.0, 432.0, - 15.0, 30.0, 1.0, 176.0, 22.0, 480.0, 12.0, 320.0, 30.0, 64.0, 240.0, - 480.0, 56.0, 168.0, 176.0, 54.0, 192.0, 224.0, 432.0, 8.0, 19.0, 19.0, - 400.0, 208.0, 84.0, 32.0, 18.0, 0.0, 29.0, 0.0, 336.0, 12.0, 14.0, - 60.0, 19.0, 56.0, 64.0, 32.0, 8.0, 42.0, 128.0, 25.0, 26.0, 16.0, - 2.0, 56.0, 208.0, 100.0, 208.0, 58.0, 192.0, 112.0, 12.0, 100.0, 27.0, - 16.0, 20.0, 224.0, 22.0, 192.0, 58.0, 144.0, 108.0, 20.0, 24.0, 152.0, - 200.0, 28.0, 216.0, 48.0, 31.0, 80.0, 11.0, 62.0, 480.0, 136.0, 18.0, - 416.0, 116.0, 5.0, 208.0, 44.0, 496.0, 400.0, 4.0, 52.0, 108.0, 248.0, - 20.0, 104.0, 240.0, 108.0, 24.0, 0.0, 104.0, 112.0, 46.0, 64.0, 96.0, - 124.0, 44.0, 144.0, 4.0, 58.0, 48.0, 288.0, 52.0, 32.0, 68.0, 104.0, - 25.0, 18.0, 56.0, 15.0, 32.0, 96.0, 12.0, 20.0, 240.0, 42.0, 34.0, - 16.0, 416.0, 176.0, 15.0, 184.0, 192.0, 15.0, 184.0, 17.0, 88.0, 200.0, - 80.0, 112.0, 68.0, 72.0, 0.0, 11.0, 14.0, 30.0, 1.0, 144.0, 10.0, - 80.0, 96.0, 13.0, 36.0, 160.0, 52.0, 112.0, 22.0, 160.0, 25.0, 320.0, - 24.0, 34.0, 256.0, 64.0, 30.0, 26.0, 72.0, 16.0, 16.0, 16.0, 128.0, - 48.0, 28.0, 12.0, 10.0, 208.0, 128.0, 12.0, 20.0, 28.0, 8.0, 29.0, - 368.0, 288.0, 176.0, 28.0, 320.0, 4.0, 6.0, 16.0, 16.0, 240.0, 24.0, - 14.0, 32.0, 136.0, 10.0, 80.0, 240.0, 64.0, 208.0, 0.0, 224.0, 12.0, - 5.0, 64.0, 16.0, 7.0, 18.0, 5.0, 52.0, 288.0, 16.0, 432.0, 88.0, - 
24.0, 144.0, 336.0, 56.0, 352.0, 112.0, 160.0, 20.0, 0.0, 24.0, 124.0, - 224.0, 19.0, 248.0, 496.0, 128.0, 6.0, 14.0, 38.0, 11.0, 352.0, 224.0, - 56.0, 24.0, 19.0, 27.0, 128.0, 84.0, 44.0, 88.0, 16.0, 34.0, 52.0, - 80.0, 64.0, 4.0, 14.0, 32.0, 68.0, 224.0, 144.0, 16.0, 112.0, 32.0, - 8.0, 80.0, 248.0, 36.0, 120.0, 0.0, 304.0, 16.0, 8.0, 16.0, 17.0, - 60.0, 56.0, 44.0, 160.0, 8.0, 64.0, 16.0, 52.0, 192.0, 48.0, 496.0, - 80.0, 240.0, 10.0, 208.0, 24.0, 304.0, 54.0, 152.0, 32.0, 44.0, 16.0, - 432.0, 496.0, 10.0, 96.0, 288.0, 160.0, 336.0, 160.0, 44.0, 18.0, 248.0, - 32.0, 46.0, 18.0, 120.0, 152.0, 64.0, 12.0, 464.0, 80.0, 108.0, 80.0, - 192.0, 50.0, 16.0, 320.0, 8.0, 16.0, 46.0, 80.0, 152.0, 18.0, 2.0, - 9.0, 14.0, 14.0, 208.0, 32.0, 0.0, 48.0, 0.0, 6.0, 32.0, 432.0, - 208.0, 272.0, 0.0, 272.0, 54.0, 4.0, 0.0, 224.0, 128.0, 160.0, 8.0, - 48.0, 26.0, 46.0, 0.0, 0.0, 16.0, 4.0, 0.0, 40.0, 108.0, 56.0, - 24.0, 28.0, 13.0, 248.0, 64.0, 29.0, 480.0, 26.0, 44.0, 192.0, 16.0, - 416.0, 8.0, 26.0, 120.0, 4.0, 2.0, 62.0, 80.0, 240.0, 12.0, 200.0, - 96.0, 16.0, 30.0, 208.0, 2.0, 36.0, 54.0, 96.0, 224.0, 80.0, 0.0, - 128.0, 96.0, 16.0, 0.0, 18.0, 24.0, 26.0, 92.0, 136.0, 9.0, 12.0, - 68.0, 152.0, 416.0, 184.0, 3.0, 24.0, 480.0, 104.0, 36.0, 46.0, 24.0, - 38.0, 288.0, 10.0, 27.0, 104.0, 36.0, 2.0, 10.0, 248.0, 96.0, 304.0, - 88.0, 52.0, 1.0, 208.0, 14.0, 6.0, 124.0, 224.0, 17.0, 76.0, 19.0, - 176.0, 168.0, 136.0, 120.0, 240.0, 10.0, 60.0, 76.0, 40.0, 272.0, 248.0, - 15.0, 60.0, 4.0, 40.0, 22.0, 42.0, 68.0, 464.0, 8.0, 1.0, 32.0, - 160.0, 13.0, 26.0, 368.0, 448.0, 176.0, 112.0, 72.0, 0.0, 26.0, 80.0, - 24.0, 240.0, 72.0, 72.0, 60.0, 24.0, 88.0, 248.0, 120.0, 448.0, 232.0, - 108.0, 104.0, 48.0, 96.0, 56.0, 22.0, 11.0, 88.0, 136.0, 72.0, 8.0, - 30.0, 10.0, 496.0, 116.0, 24.0, 108.0, 224.0, 60.0, 76.0, 16.0, 368.0, - 96.0, 0.0, 48.0, 160.0, 24.0, 352.0, 96.0, 46.0, 2.0, 224.0, 44.0, - 48.0, 16.0, 0.0, 496.0, 48.0, 18.0, 36.0, 14.0, 7.0, 30.0, 25.0, - 288.0, 24.0, 104.0, 120.0, 29.0, 
160.0, 232.0, 192.0, 40.0, 208.0, 12.0, - 88.0, 6.0, 54.0, 128.0, 96.0, 192.0, 48.0, 29.0, 10.0, 28.0, 176.0, - 108.0, 448.0, 28.0, 16.0, 12.0, 116.0, 36.0, 15.0, 20.0, 12.0, 464.0, - 1.0, 44.0, 216.0, 60.0, 0.0, 28.0, 11.0, 108.0, 24.0, 320.0, 84.0, - 48.0, 7.0, 8.0, 200.0, 3.0, 176.0, 32.0, 54.0, 16.0, 108.0, 184.0, - 36.0, 54.0, 32.0, 120.0, 14.0, 60.0, 160.0, 88.0, 50.0, 8.0, 14.0, - 6.0, 0.0, 24.0, 108.0, 416.0, 56.0, 2.0, 80.0, 24.0, 160.0, 304.0, - 136.0, 4.0, 4.0, 36.0, 168.0, 96.0, 184.0, 84.0, 38.0, 30.0, 272.0, - 7.0, 5.0, 18.0, 192.0, 336.0, 46.0, 76.0, 16.0, 34.0, 48.0, 9.0, - 36.0, 0.0, 11.0, 10.0, 480.0, 28.0, 18.0, 52.0, 128.0, 256.0, 24.0, - 9.0, 24.0, 168.0, 52.0, 9.0, 240.0, 160.0, 216.0, 17.0, 29.0, 496.0, - 4.0, 24.0, 50.0, 320.0, 12.0, 448.0, 50.0, 0.0, 72.0, 108.0, 200.0, - 176.0, 272.0, 24.0, 12.0, 26.0, 208.0, 28.0, 160.0, 22.0, 192.0, 208.0, - 76.0, 160.0, 10.0, 12.0, 24.0, 368.0, 232.0, 6.0, 14.0, 0.0, 24.0, - 96.0, 216.0, 2.0, 88.0, 36.0, 31.0, 32.0, 60.0, 24.0, 24.0, 224.0, - 96.0, 88.0, 40.0, 232.0, 80.0, 304.0, 200.0, 23.0, 104.0, 104.0, 0.0, - 120.0, 224.0, 24.0, 336.0, 136.0, 480.0, 32.0, 100.0, 88.0, 40.0, 16.0, - 16.0, 27.0, 3.0, 18.0, 34.0, 72.0, 26.0, 44.0, 144.0, 480.0, 20.0, - 10.0, 18.0, 50.0, 416.0, 52.0, 16.0, 448.0, 304.0, 464.0, 26.0, 14.0, - 480.0, 48.0, 40.0, 12.0, 144.0, 16.0, 28.0, 80.0, 29.0, 26.0, 4.0, - 224.0, 36.0, 32.0, 32.0, 9.0, 88.0, 52.0, 84.0, 16.0, 17.0, 16.0, - 8.0, 108.0, 13.0, 448.0, 116.0, 400.0, 448.0, 480.0, 208.0, 0.0, 32.0, - 8.0, 16.0, 8.0, 16.0, 34.0, 104.0, 8.0, 34.0, 48.0, 128.0, 272.0, - 288.0, 76.0, 13.0, 32.0, 168.0, 18.0, 432.0, 120.0, 100.0, 1.0, 176.0, - 27.0, 72.0, 12.0, 32.0, 40.0, 72.0, 0.0, 72.0, 464.0, 13.0, 16.0, - 184.0, 42.0, 18.0, 448.0, 16.0, 52.0, 320.0, 272.0, 14.0, 52.0, 464.0, - 56.0, 144.0, 144.0, 104.0, 0.0, 8.0, 1.0, 60.0, 38.0, 80.0, 96.0, - 32.0, 5.0, 34.0, 15.0, 17.0, 68.0, 112.0, 12.0, 112.0, 0.0, 104.0, - 42.0, 50.0, 6.0, 96.0, 240.0, 80.0, 64.0, 23.0, 304.0, 96.0, 
56.0, - 6.0, 112.0, 256.0, 36.0, 6.0, 0.0, 240.0, 20.0, 176.0, 12.0, 10.0, - 17.0, 24.0, 15.0, 160.0, 20.0, 38.0, 240.0, 7.0, 20.0, 36.0, 92.0, - 31.0, 16.0, 248.0, 1.0, 480.0, 64.0, 88.0, 384.0, 17.0, 24.0, 120.0, - 56.0, 64.0, 104.0, 12.0, 36.0, 14.0, 17.0, 0.0, 240.0, 6.0, 48.0, - 60.0, 28.0, 124.0, 32.0, 112.0, 10.0, 34.0, 9.0, 21.0, 60.0, 352.0, - 80.0, 0.0, 64.0, 96.0, 104.0, 4.0, 24.0, 352.0, 14.0, 13.0, 31.0, - 304.0, 128.0, 184.0, 6.0, 112.0, 56.0, 76.0, 176.0, 16.0, 96.0, 0.0, - 16.0, 40.0, 20.0, 7.0, 4.0, 40.0, 464.0, 52.0, 10.0, 92.0, 16.0, - 16.0, 0.0, 144.0, 0.0, 16.0, 0.0, 6.0, 24.0, 13.0, 40.0, 60.0, - 16.0, 224.0, 24.0, 16.0, 36.0, 30.0, 46.0, 496.0, 68.0, 0.0, 20.0, - 9.0, 104.0, 30.0, 200.0, 21.0, 168.0, 18.0, 9.0, 16.0, 28.0, 384.0, - 3.0, 36.0, 10.0, 24.0, 72.0, 8.0, 31.0, 400.0, 2.0, 208.0, 116.0, - 29.0, 32.0, 27.0, 0.0, 12.0, 40.0, 80.0, 2.0, 104.0, 120.0, 288.0, - 400.0, 160.0, 13.0, 288.0, 20.0, 152.0, 28.0, 22.0, 12.0, 224.0, 248.0, - 16.0, 27.0, 64.0, 40.0, 10.0, 192.0, 20.0, 18.0, 144.0, 32.0, 8.0, - 168.0, 18.0, 152.0, 272.0, 7.0, 22.0, 432.0, 116.0, 120.0, 48.0, 32.0, - 288.0, 28.0, 272.0, 288.0, 9.0, 80.0, 20.0, 416.0, 2.0, 24.0, 464.0, - 28.0, 64.0, 40.0, 29.0, 272.0, 104.0, 5.0, 76.0, 384.0, 16.0, 62.0, - 88.0, 24.0, 168.0, 42.0, 160.0, 116.0, 208.0, 10.0, 48.0, 30.0, 48.0, - 22.0, 22.0, 68.0, 4.0, 1.0, 144.0, 144.0, 112.0, 8.0, 160.0, 112.0, - 64.0, 42.0, 18.0, 96.0, 56.0, 30.0, 24.0, 11.0, 30.0, 29.0, 36.0, - 68.0, 8.0, 96.0, 6.0, 26.0, 56.0, 6.0, 400.0, 80.0, 368.0, 8.0, - 108.0, 24.0, 20.0, 304.0, 128.0, 6.0, 62.0, 20.0, 40.0, 352.0, 1.0, - 208.0, 76.0, 8.0, 16.0, 8.0, 6.0, 200.0, 352.0, 18.0, 29.0, 20.0, - 208.0, 76.0, 44.0, 15.0, 60.0, 44.0, 52.0, 124.0, 92.0, 464.0, 56.0, - 9.0, 24.0, 0.0, 92.0, 288.0, 112.0, 32.0, 56.0, 26.0, 80.0, 32.0, - 52.0, 34.0, 40.0, 58.0, 40.0, 19.0, 64.0, 352.0, 216.0, 27.0, 24.0, - 60.0, 232.0, 104.0, 0.0, 12.0, 24.0, 72.0, 464.0, 108.0, 464.0, 208.0, - 0.0, 368.0, 30.0, 96.0, 112.0, 30.0, 3.0, 
50.0, 26.0, 192.0, 76.0, - 152.0, 5.0, 16.0, 76.0, 40.0, 160.0, 18.0, 84.0, 64.0, 96.0, 4.0, - 36.0, 20.0, 28.0, 152.0, 160.0, 28.0, 160.0, 176.0, 240.0, 108.0, 160.0, - 4.0, 52.0, 56.0, 17.0, 96.0, 136.0, 96.0, 116.0, 160.0, 60.0, 80.0, - 40.0, 58.0, 20.0, 144.0, 58.0, 76.0, 432.0, 224.0, 8.0, 176.0, 38.0, - 88.0, 16.0, 80.0, 28.0, 52.0, 232.0, 30.0, 0.0, 112.0, 144.0, 48.0, - 336.0, 240.0, 56.0, 240.0, 58.0, 58.0, 104.0, 384.0, 26.0, 32.0, 27.0, - 8.0, 58.0, 160.0, 20.0, 16.0, 20.0, 10.0, 56.0, 24.0, 48.0, 24.0, - 0.0, 80.0, 120.0, 184.0, 14.0, 4.0, 60.0, 160.0, 72.0, 13.0, 48.0, - 60.0, 44.0, 31.0, 42.0, 15.0, 12.0, 40.0, 4.0, 88.0, 46.0, 20.0, - 304.0, 40.0, 38.0, 40.0, 9.0, 10.0, 336.0, 272.0, 36.0, 4.0, 42.0, - 3.0, 72.0, 20.0, 112.0, 32.0, 25.0, 52.0, 6.0, 200.0, 136.0, 34.0, - 112.0, 24.0, 104.0, 16.0, 160.0, 288.0, 128.0, 352.0, 120.0, 16.0, 12.0, - 12.0, 480.0, 116.0, 0.0, 30.0, 48.0, 23.0, 272.0, 96.0, 480.0, 176.0, - 30.0, 19.0, 8.0, 24.0, 232.0, 112.0, 22.0, 208.0, 60.0, 224.0, 80.0, - 0.0, 30.0, 0.0, 16.0, 20.0, 0.0, 13.0, 124.0, 144.0, 58.0, 19.0, - 56.0, 52.0, 112.0, 50.0, 64.0, 224.0, 6.0, 176.0, 50.0, 0.0, 256.0, - 30.0, 100.0, 18.0, 176.0, 208.0, 136.0, 0.0, 15.0, 24.0, 208.0, 56.0, - 32.0, 64.0, 52.0, 368.0, 56.0, 31.0, 27.0, 112.0, 16.0, 16.0, 7.0, - 104.0, 44.0, 2.0, 5.0, 192.0, 20.0, 128.0, 76.0, 27.0, 17.0, 14.0, - 44.0, 12.0, 112.0, 4.0, 52.0, 416.0, 1.0, 36.0, 240.0, 25.0, 232.0, - 48.0, 28.0, 76.0, 4.0, 10.0, 16.0, 26.0, 248.0, 192.0, 240.0, 56.0, - 10.0, 0.0, 18.0, 400.0, 8.0, 200.0, 208.0, 128.0, 176.0, 56.0, 14.0, - 32.0, 72.0, 46.0, 48.0, 9.0, 448.0, 44.0, 28.0, 50.0, 16.0, 4.0, - 208.0, 232.0, 124.0, 4.0, 88.0, 192.0, 96.0, 32.0, 48.0, 56.0, 128.0, - 80.0, 128.0, 184.0, 26.0, 16.0, 14.0, 84.0, 16.0, 54.0, 224.0, 10.0, - 168.0, 12.0, 160.0, 21.0, 14.0, 8.0, 52.0, 29.0, 152.0, 20.0, 28.0, - 4.0, 72.0, 112.0, 40.0, 72.0, 48.0, 11.0, 192.0, 54.0, 192.0, 144.0, - 8.0, 48.0, 100.0, 23.0, 30.0, 248.0, 52.0, 62.0, 56.0, 28.0, 116.0, - 80.0, 
20.0, 32.0, 14.0, 116.0, 4.0, 88.0, 6.0, 4.0, 6.0, 288.0, - 256.0, 0.0, 0.0, 192.0, 128.0, 14.0, 25.0, 2.0, 272.0, 29.0, 22.0, - 88.0, 36.0, 56.0, 64.0, 176.0, 29.0, 72.0, 112.0, 128.0, 46.0, 40.0, - 50.0, 34.0, 64.0, 18.0, 30.0, 368.0, 4.0, 60.0, 56.0, 16.0, 22.0, - 8.0, 10.0, 176.0, 136.0, 16.0, 15.0, 20.0, 16.0, 26.0, 116.0, 116.0, - 56.0, 96.0, 92.0, 62.0, 116.0, 120.0, 4.0, 368.0, 40.0, 16.0, 24.0, - 8.0, 368.0, 72.0, 10.0, 16.0, 16.0, 52.0, 224.0, 224.0, 29.0, 34.0, - 54.0, 0.0, 15.0, 48.0, 80.0, 136.0, 96.0, 32.0, 18.0, 56.0, 60.0, - 22.0, 54.0, 0.0, 24.0, 128.0, 232.0, 9.0, 0.0, 14.0, 0.0, 72.0, - 56.0, 0.0, 8.0, 232.0, 32.0, 64.0, 13.0, 30.0, 4.0, 192.0, 216.0, - 12.0, 104.0, 14.0, 48.0, 0.0, 304.0, 9.0, 464.0, 96.0, 8.0, 320.0, - 9.0, 17.0, 54.0, 4.0, 4.0, 104.0, 48.0, 8.0, 48.0, 240.0, 28.0, - 108.0, 88.0, 124.0, 104.0, 72.0, 12.0, 34.0, 9.0, 32.0, 48.0, 44.0, - 24.0, 15.0, 10.0, 240.0, 14.0, 17.0, 44.0, 15.0, 160.0, 68.0, 8.0, - 80.0, 16.0, 20.0, 40.0, 128.0, 36.0, 20.0, 108.0, 54.0, 22.0, 10.0, - 4.0, 52.0, 108.0, 112.0, 0.0, 248.0, 96.0, 7.0, 15.0, 116.0, 20.0, - 0.0, 96.0, 496.0, 192.0, 128.0, 20.0, 240.0, 240.0, 50.0, 0.0, 232.0, - 34.0, 240.0, 232.0, 0.0, 120.0, 184.0, 288.0, 144.0, 288.0, 80.0, 4.0, - 124.0, 62.0, 40.0, 112.0, 48.0, 8.0, 48.0, 6.0, 208.0, 104.0, 26.0, - 64.0, 52.0, 58.0, 120.0, 76.0, 52.0, 88.0, 15.0, 50.0, 22.0, 128.0, - 10.0, 224.0, 64.0, 76.0, 6.0, 72.0, 5.0, 72.0, 48.0, 8.0, 108.0, - 32.0, 12.0, 168.0, 26.0, 50.0, 448.0, 30.0, 224.0, 448.0, 224.0, 20.0, - 28.0, 42.0, 36.0, 27.0, 232.0, 18.0, 9.0, 48.0, 16.0, 64.0, 176.0, - 17.0, 14.0, 26.0, 32.0, 320.0, 64.0, 60.0, 0.0, 84.0, 56.0, 80.0, - 96.0, 24.0, 12.0, 48.0, 2.0, 208.0, 320.0, 34.0, 10.0, 27.0, 19.0, - 108.0, 64.0, 116.0, 176.0, 12.0, 224.0, 48.0, 0.0, 4.0, 144.0, 496.0, - 14.0, 232.0, 80.0, 416.0, 12.0, 448.0, 76.0, 23.0, 50.0, 96.0, 96.0, - 11.0, 19.0, 240.0, 1.0, 4.0, 44.0, 16.0, 112.0, 32.0, 7.0, 4.0, - 112.0, 16.0, 22.0, 416.0, 208.0, 248.0, 152.0, 44.0, 84.0, 120.0, 
16.0, - 19.0, 20.0, 3.0, 23.0, 120.0, 124.0, 112.0, 0.0, 8.0, 336.0, 8.0, - 18.0, 32.0, 432.0, 36.0, 42.0, 21.0, 64.0, 21.0, 16.0, 6.0, 9.0, - 0.0, 16.0, 112.0, 0.0, 80.0, 4.0, 22.0, 17.0, 40.0, 464.0, 112.0, - 20.0, 38.0, 40.0, 80.0, 256.0, 80.0, 124.0, 32.0, 54.0, 88.0, 416.0, - 352.0, 124.0, 28.0, 112.0, 2.0, 144.0, 26.0, 36.0, 38.0, 8.0, 248.0, - 416.0, 304.0, 40.0, 22.0, 13.0, 4.0, 18.0, 12.0, 80.0, 136.0, 0.0, - 104.0, 0.0, 30.0, 152.0, 176.0, 3.0, 96.0, 116.0, 26.0, 336.0, 208.0, - 22.0, 3.0, 9.0, 22.0, 24.0, 15.0, 320.0, 54.0, 400.0, 384.0, 480.0, - 4.0, 124.0, 8.0, 19.0, 92.0, 64.0, 62.0, 224.0, 54.0, 112.0, 0.0, - 12.0, 0.0, 96.0, 60.0, 10.0, 304.0, 62.0, 44.0, 34.0, 248.0, 22.0, - 160.0, 12.0, 36.0, 80.0, 32.0, 15.0, 496.0, 18.0, 26.0, 192.0, 368.0, - 5.0, 272.0, 96.0, 224.0, 20.0, 116.0, 24.0, 5.0, 320.0, 416.0, 46.0, - 128.0, 112.0, 7.0, 144.0, 6.0, 464.0, 68.0, 10.0, 54.0, 176.0, 68.0, - 120.0, 30.0, 28.0, 16.0, 240.0, 20.0, 80.0, 32.0, 9.0, 48.0, 92.0, - 31.0, 28.0, 56.0, 22.0, 184.0, 152.0, 6.0, 0.0, 40.0, 32.0, 112.0, - 208.0, 18.0, 120.0, 40.0, 10.0, 44.0, 24.0, 32.0, 46.0, 496.0, 16.0, - 50.0, 44.0, 40.0, 20.0, 304.0, 0.0, 6.0, 96.0, 50.0, 80.0, 2.0, - 22.0, 34.0, 84.0, 6.0, 4.0, 48.0, 152.0, 40.0, 13.0, 32.0, 76.0, - 12.0, 240.0, 24.0, 3.0, 1.0, 24.0, 52.0, 88.0, 29.0, 128.0, 432.0, - 62.0, 28.0, 52.0, 448.0, 104.0, 19.0, 44.0, 17.0, 16.0, 12.0, 104.0, - 10.0, 10.0, 54.0, 22.0, 240.0, 216.0, 352.0, 232.0, 0.0, 24.0, 62.0, - 29.0, 480.0, 160.0, 72.0, 84.0, 84.0, 0.0, 8.0, 80.0, 0.0, 480.0, - 22.0, 448.0, 25.0, 31.0, 80.0, 496.0, 32.0, 116.0, 64.0, 46.0, 192.0, - 30.0, 8.0, 112.0, 168.0, 112.0, 17.0, 88.0, 20.0, 22.0, 352.0, 24.0, - 48.0, 4.0, 48.0, 52.0, 10.0, 24.0, 24.0, 104.0, 240.0, 64.0, 20.0, - 8.0, 112.0, 10.0, 52.0, 144.0, 22.0, 432.0, 24.0, 13.0, 76.0, 72.0, - 12.0, 384.0, 20.0, 176.0, 8.0, 0.0, 32.0, 24.0, 112.0, 40.0, 21.0, - 8.0, 11.0, 12.0, 14.0, 16.0, 24.0, 288.0, 6.0, 216.0, 304.0, 320.0, - 27.0, 8.0, 25.0, 30.0, 15.0, 24.0, 128.0, 
112.0, 144.0, 24.0, 0.0, - 28.0, 192.0, 16.0, 160.0, 8.0, 30.0, 48.0, 224.0, 36.0, 200.0, 22.0, - 31.0, 2.0, 11.0, 64.0, 44.0, 304.0, 30.0, 128.0, 10.0, 29.0, 144.0, - 8.0, 32.0, 64.0, 24.0, 24.0, 184.0, 16.0, 108.0, 0.0, 192.0, 108.0, - 2.0, 8.0, 44.0, 108.0, 48.0, 30.0, 120.0, 21.0, 184.0, 152.0, 272.0, - 152.0, 76.0, 32.0, 50.0, 20.0, 7.0, 104.0, 29.0, 32.0, 28.0, 8.0, - 16.0, 2.0, 22.0, 24.0, 60.0, 384.0, 11.0, 232.0, 25.0, 38.0, 8.0, - 216.0, 16.0, 32.0, 48.0, 88.0, 13.0, 3.0, 160.0, 30.0, 12.0, 40.0, - 40.0, 30.0, 36.0, 80.0, 24.0, 112.0, 32.0, 496.0, 2.0, 28.0, 52.0, - 216.0, 16.0, 112.0, 0.0, 136.0, 32.0, 304.0, 464.0, 368.0, 248.0, 52.0, - 32.0, 32.0, 21.0, 4.0, 96.0, 352.0, 232.0, 0.0, 7.0, 88.0, 16.0, - 0.0, 2.0, 464.0, 128.0, 28.0, 416.0, 56.0, 52.0, 288.0, 24.0, 13.0, - 0.0, 136.0, 16.0, 464.0, 48.0, 16.0, 16.0, 112.0, 384.0, 384.0, 80.0, - 160.0, 4.0, 96.0, 24.0, 144.0, 12.0, 24.0, 80.0, 9.0, 104.0, 9.0, - 72.0, 9.0, 31.0, 432.0, 84.0, 20.0, 96.0, 48.0, 48.0, 12.0, 40.0, - 2.0, 25.0, 7.0, 160.0, 16.0, 384.0, 168.0, 464.0, 216.0, 192.0, 92.0, - 112.0, 240.0, 240.0, 44.0, 192.0, 304.0, 12.0, 11.0, 224.0, 40.0, 160.0, - 240.0, 36.0, 40.0, 20.0, 136.0, 26.0, 80.0, 28.0, 58.0, 304.0, 240.0, - 8.0, 384.0, 64.0, 1.0, 36.0, 448.0, 168.0, 16.0, 20.0, 6.0, 176.0, - 52.0, 7.0, 88.0, 96.0, 16.0, 12.0, 27.0, 48.0, 56.0, 18.0, 208.0, - 44.0, 76.0, 56.0, 128.0, 40.0, 16.0, 480.0, 72.0, 248.0, 0.0, 6.0, - 288.0, 120.0, 28.0, 62.0, 240.0, 42.0, 18.0, 26.0, 12.0, 32.0, 216.0, - 112.0, 24.0, 10.0, 6.0, 12.0, 6.0, 54.0, 304.0, 50.0, 36.0, 28.0, - 27.0, 72.0, 432.0, 40.0, 34.0, 136.0, 34.0, 48.0, 10.0, 1.0, 1.0, - 10.0, 16.0, 0.0, 96.0, 54.0, 240.0, 25.0, 24.0, 464.0, 26.0, 14.0, - 16.0, 0.0, 28.0, 56.0, 72.0, 144.0, 19.0, 304.0, 224.0, 19.0, 112.0, - 12.0, 336.0, 224.0, 56.0, 288.0, 124.0, 96.0, 192.0, 224.0, 0.0, 480.0, - 40.0, 112.0, 160.0, 24.0, 76.0, 152.0, 336.0, 168.0, 56.0, 368.0, 32.0, - 30.0, 23.0, 160.0, 28.0, 88.0, 40.0, 192.0, 192.0, 30.0, 2.0, 88.0, - 88.0, 
32.0, 80.0, 48.0, 80.0, 31.0, 192.0, 15.0, 208.0, 116.0, 24.0, - 4.0, 34.0, 2.0, 24.0, 176.0, 144.0, 36.0, 0.0, 368.0, 16.0, 0.0, - 42.0, 368.0, 32.0, 152.0, 20.0, 92.0, 104.0, 40.0, 52.0, 160.0, 26.0, - 3.0, 56.0, 304.0, 40.0, 14.0, 26.0, 9.0, 29.0, 128.0, 48.0, 480.0, - 52.0, 28.0, 208.0, 60.0, 50.0, 64.0, 112.0, 320.0, 54.0, 48.0, 104.0, - 24.0, 1.0, 20.0, 64.0, 8.0, 112.0, 40.0, 24.0, 21.0, 4.0, 108.0, - 20.0, 12.0, 32.0, 18.0, 112.0, 2.0, 136.0, 44.0, 248.0, 384.0, 14.0, - 32.0, 232.0, 24.0, 11.0, 40.0, 21.0, 72.0, 432.0, 232.0, 48.0, 62.0, - 108.0, 168.0, 28.0, 19.0, 336.0, 48.0, 72.0, 28.0, 12.0, 24.0, 24.0, - 176.0, 32.0, 32.0, 304.0, 80.0, 88.0, 56.0, 29.0, 216.0, 112.0, 48.0, - 21.0, 24.0, 80.0, 336.0, 26.0, 4.0, 14.0, 14.0, 42.0, 248.0, 22.0, - 0.0, 160.0, 0.0, 72.0, 104.0, 64.0, 2.0, 21.0, 8.0, 216.0, 4.0, - 176.0, 272.0, 56.0, 160.0, 18.0, 152.0, 144.0, 72.0, 88.0, 60.0, 56.0, - 72.0, 88.0, 56.0, 24.0, 22.0, 58.0, 168.0, 22.0, 34.0, 160.0, 28.0, - 36.0, 10.0, 176.0, 92.0, 28.0, 23.0, 32.0, 176.0, 112.0, 4.0, 10.0, - 42.0, 240.0, 6.0, 116.0, 320.0, 24.0, 22.0, 62.0, 28.0, 160.0, 480.0, - 0.0, 304.0, 4.0, 28.0, 2.0, 16.0, 48.0, 16.0, 104.0, 7.0, 84.0, - 240.0, 120.0, 31.0, 32.0, 29.0, 2.0, 400.0, 18.0, 160.0, 0.0, 8.0, - 19.0, 16.0, 34.0, 0.0, 48.0, 18.0, 88.0, 26.0, 16.0, 18.0, 48.0, - 248.0, 24.0, 16.0, 272.0, 44.0, 8.0, 16.0, 22.0, 48.0, 0.0, 368.0, - 40.0, 112.0, 27.0, 10.0, 160.0, 32.0, 62.0, 11.0, 26.0, 88.0, 200.0, - 38.0, 10.0, 168.0, 88.0, 10.0, 112.0, 96.0, 288.0, 23.0, 27.0, 10.0, - 128.0, 15.0, 16.0, 26.0, 192.0, 12.0, 192.0, 104.0, 50.0, 10.0, 14.0, - 13.0, 160.0, 52.0, 448.0, 62.0, 400.0, 100.0, 27.0, 216.0, 4.0, 56.0, - 192.0, 18.0, 32.0, 184.0, 38.0, 8.0, 480.0, 25.0, 216.0, 76.0, 0.0, - 76.0, 368.0, 14.0, 72.0, 17.0, 92.0, 24.0, 480.0, 52.0, 48.0, 42.0, - 12.0, 400.0, 152.0, 256.0, 248.0, 112.0, 96.0, 160.0, 192.0, 52.0, 31.0, - 400.0, 27.0, 38.0, 1.0, 62.0, 52.0, 27.0, 28.0, 4.0, 26.0, 10.0, - 13.0, 288.0, 12.0, 184.0, 9.0, 432.0, 
224.0, 7.0, 64.0, 4.0, 40.0, - 44.0, 416.0, 52.0, 28.0, 40.0, 26.0, 20.0, 23.0, 40.0, 88.0, 44.0, - 168.0, 16.0, 192.0, 112.0, 6.0, 54.0, 40.0, 16.0, 12.0, 40.0, 152.0, - 52.0, 224.0, 224.0, 48.0, 46.0, 116.0, 496.0, 56.0, 72.0, 32.0, 6.0, - 124.0, 8.0, 124.0, 240.0, 108.0, 192.0, 100.0, 24.0, 38.0, 40.0, 88.0, - 3.0, 32.0, 60.0, 16.0, 8.0, 38.0, 11.0, 30.0, 68.0, 28.0, 192.0, - 88.0, 48.0, 24.0, 20.0, 48.0, 18.0, 10.0, 10.0, 96.0, 120.0, 16.0, - 7.0, 44.0, 160.0, 12.0, 25.0, 18.0, 160.0, 0.0, 58.0, 54.0, 13.0, - 8.0, 56.0, 38.0, 2.0, 224.0, 11.0, 50.0, 2.0, 60.0, 352.0, 144.0, - 352.0, 400.0, 104.0, 32.0, 108.0, 62.0, 58.0, 3.0, 11.0, 16.0, 6.0, - 4.0, 88.0, 104.0, 40.0, 5.0, 200.0, 240.0, 448.0, 14.0, 208.0, 84.0, - 176.0, 48.0, 19.0, 56.0, 112.0, 13.0, 30.0, 124.0, 8.0, 36.0, 24.0, - 34.0, 11.0, 152.0, 240.0, 72.0, 44.0, 20.0, 32.0, 32.0, 464.0, 336.0, - 448.0, 36.0, 17.0, 0.0, 92.0, 176.0, 12.0, 216.0, 224.0, 8.0, 42.0, - 240.0, 112.0, 88.0, 72.0, 24.0, 80.0, 24.0, 192.0, 124.0, 288.0, 62.0, - 8.0, 48.0, 240.0, 23.0, 200.0, 2.0, 112.0, 6.0, 30.0, 40.0, 120.0, - 128.0, 496.0, 10.0, 8.0, 352.0, 13.0, 320.0, 40.0, 368.0, 232.0, 13.0, - 104.0, 32.0, 28.0, 8.0, 112.0, 13.0, 80.0, 496.0, 48.0, 88.0, 24.0, - 192.0, 232.0, 16.0, 32.0, 464.0, 80.0, 200.0, 464.0, 1.0, 14.0, 144.0, - 31.0, 448.0, 16.0, 80.0, 4.0, 50.0, 240.0, 176.0, 4.0, 2.0, 120.0, - 8.0, 56.0, 304.0, 5.0, 8.0, 48.0, 4.0, 8.0, 72.0, 48.0, 240.0, - 0.0, 416.0, 448.0, 24.0, 384.0, 40.0, 6.0, 208.0, 12.0, 208.0, 25.0, - 352.0, 3.0, 176.0, 29.0, 8.0, 192.0, 28.0, 32.0, 60.0, 80.0, 64.0, - 128.0, 128.0, 400.0, 416.0, 24.0, 62.0, 8.0, 68.0, 416.0, 92.0, 92.0, - 0.0, 112.0, 32.0, 88.0, 18.0, 136.0, 30.0, 80.0, 21.0, 64.0, 240.0, - 24.0, 208.0, 54.0, 216.0, 88.0, 92.0, 19.0, 32.0, 256.0, 32.0, 32.0, - 80.0, 50.0, 248.0, 52.0, 11.0, 128.0, 3.0, 4.0, 10.0, 0.0, 52.0, - 0.0, 48.0, 30.0, 256.0, 18.0, 72.0, 32.0, 31.0, 136.0, 60.0, 40.0, - 29.0, 60.0, 432.0, 272.0, 27.0, 56.0, 480.0, 26.0, 22.0, 124.0, 50.0, - 0.0, 
4.0, 6.0, 68.0, 144.0, 24.0, 17.0, 16.0, 8.0, 32.0, 2.0, - 30.0, 76.0, 76.0, 29.0, 28.0, 8.0, 336.0, 30.0, 416.0, 48.0, 120.0, - 48.0, 4.0, 80.0, 8.0, 8.0, 23.0, 128.0, 68.0, 368.0, 27.0, 176.0, - 192.0, 304.0, 32.0, 32.0, 336.0, 20.0, 36.0, 64.0, 96.0, 8.0, 50.0, - 0.0, 104.0, 40.0, 12.0, 496.0, 108.0, 0.0, 240.0, 272.0, 72.0, 320.0, - 84.0, 416.0, 16.0, 176.0, 480.0, 30.0, 7.0, 20.0, 480.0, 464.0, 14.0, - 384.0, 20.0, 20.0, 116.0, 20.0, 256.0, 0.0, 28.0, 60.0, 208.0, 36.0, - 256.0, 12.0, 304.0, 4.0, 96.0, 17.0, 72.0, 2.0, 32.0, 14.0, 480.0, - 6.0, 60.0, 100.0, 8.0, 24.0, 19.0, 60.0, 100.0, 11.0, 13.0, 168.0, - 28.0, 28.0, 368.0, 84.0, 80.0, 224.0, 240.0, 18.0, 34.0, 120.0, 23.0, - 16.0, 192.0, 8.0, 32.0, 31.0, 17.0, 7.0, 192.0, 10.0, 152.0, 0.0, - 10.0, 116.0, 160.0, 448.0, 13.0, 16.0, 14.0, 76.0, 8.0, 14.0, 168.0, - 6.0, 14.0, 16.0, 480.0, 21.0, 72.0, 80.0, 27.0, 9.0, 116.0, 0.0, - 96.0, 44.0, 184.0, 52.0, 152.0, 192.0, 496.0, 40.0, 28.0, 28.0, 16.0, - 48.0, 208.0, 320.0, 304.0, 4.0, 24.0, 40.0, 32.0, 72.0, 92.0, 1.0, - 112.0, 368.0, 152.0, 288.0, 4.0, 12.0, 8.0, 320.0, 18.0, 240.0, 22.0, - 336.0, 320.0, 80.0, 128.0, 16.0, 64.0, 216.0, 48.0, 31.0, 22.0, 108.0, - 96.0, 232.0, 232.0, 136.0, 31.0, 36.0, 116.0, 120.0, 23.0, 232.0, 48.0, - 16.0, 496.0, 1.0, 368.0, 100.0, 0.0, 104.0, 30.0, 192.0, 40.0, 11.0, - 160.0, 16.0, 18.0, 416.0, 52.0, 0.0, 56.0, 26.0, 20.0, 14.0, 32.0, - 32.0, 68.0, 128.0, 304.0, 1.0, 20.0, 112.0, 7.0, 68.0, 352.0, 22.0, - 6.0, 152.0, 18.0, 40.0, 10.0, 240.0, 14.0, 84.0, 12.0, 72.0, 176.0, - 24.0, 30.0, 2.0, 8.0, 32.0, 2.0, 19.0, 23.0, 16.0, 12.0, 448.0, - 1.0, 0.0, 48.0, 27.0, 13.0, 76.0, 84.0, 144.0, 11.0, 12.0, 116.0, - 50.0, 448.0, 56.0, 17.0, 36.0, 384.0, 8.0, 29.0, 4.0, 56.0, 2.0, - 48.0, 16.0, 28.0, 25.0, 16.0, 15.0, 96.0, 104.0, 15.0, 7.0, 48.0, - 48.0, 96.0, 384.0, 464.0, 240.0, 128.0, 176.0, 12.0, 480.0, 26.0, 160.0, - 336.0, 50.0, 288.0, 176.0, 11.0, 352.0, 4.0, 80.0, 11.0, 26.0, 7.0, - 6.0, 48.0, 27.0, 0.0, 496.0, 52.0, 16.0, 23.0, 
21.0, 352.0, 16.0, - 480.0, 192.0, 112.0, 5.0, 64.0, 18.0, 8.0, 24.0, 50.0, 108.0, 17.0, - 24.0, 20.0, 28.0, 24.0, 200.0, 56.0, 4.0, 46.0, 26.0, 0.0, 0.0, - 28.0, 152.0, 120.0, 36.0, 112.0, 88.0, 12.0, 24.0, 17.0, 96.0, 80.0, - 96.0, 128.0, 14.0, 240.0, 80.0, 116.0, 8.0, 32.0, 112.0, 192.0, 23.0, - 144.0, 36.0, 384.0, 40.0, 16.0, 120.0, 320.0, 240.0, 38.0, 25.0, 30.0, - 8.0, 496.0, 30.0, 112.0, 200.0, 384.0, 48.0, 76.0, 8.0, 32.0, 384.0, - 88.0, 108.0, 184.0, 124.0, 16.0, 248.0, 1.0, 32.0, 168.0, 52.0, 18.0, - 124.0, 24.0, 400.0, 240.0, 56.0, 48.0, 368.0, 96.0, 30.0, 16.0, 84.0, - 88.0, 0.0, 8.0, 208.0, 4.0, 29.0, 80.0, 144.0, 384.0, 224.0, 4.0, - 12.0, 40.0, 92.0, 88.0, 160.0, 28.0, 96.0, 17.0, 96.0, 112.0, 224.0, - 64.0, 29.0, 48.0, 216.0, 26.0, 13.0, 104.0, 30.0, 48.0, 13.0, 144.0, - 40.0, 208.0, 14.0, 84.0, 72.0, 208.0, 6.0, 240.0, 44.0, 20.0, 92.0, - 40.0, 16.0, 116.0, 56.0, 0.0, 208.0, 88.0, 112.0, 5.0, 36.0, 160.0, - 448.0, 416.0, 28.0, 128.0, 40.0, 76.0, 14.0, 52.0, 352.0, 160.0, 20.0, - 80.0, 112.0, 50.0, 72.0, 2.0, 8.0, 56.0, 104.0, 4.0, 5.0, 38.0, - 192.0, 464.0, 68.0, 124.0, 52.0, 0.0, 112.0, 120.0, 320.0, 32.0, 20.0, - 112.0, 20.0, 7.0, 13.0, 68.0, 88.0, 184.0, 60.0, 128.0, 25.0, 224.0, - 29.0, 52.0, 352.0, 72.0, 168.0, 46.0, 44.0, 76.0, 54.0, 54.0, 32.0, - 16.0, 128.0, 120.0, 240.0, 8.0, 3.0, 28.0, 12.0, 104.0, 224.0, 176.0, - 448.0, 96.0, 34.0, 176.0, 36.0, 17.0, 88.0, 72.0, 11.0, 48.0, 200.0, - 124.0, 24.0, 208.0, 28.0, 6.0, 13.0, 40.0, 52.0, 400.0, 5.0, 232.0, - 0.0, 192.0, 232.0, 0.0, 4.0, 448.0, 36.0, 48.0, 16.0, 20.0, 18.0, - 8.0, 192.0, 116.0, 40.0, 7.0, 176.0, 18.0, 27.0, 40.0, 4.0, 116.0, - 40.0, 72.0, 56.0, 72.0, 144.0, 116.0, 3.0, 320.0, 29.0, 104.0, 224.0, - 12.0, 384.0, 144.0, 120.0, 9.0, 112.0, 240.0, 44.0, 0.0, 208.0, 76.0, - 68.0, 144.0, 104.0, 432.0, 52.0, 29.0, 0.0, 60.0, 400.0, 46.0, 116.0, - 208.0, 29.0, 64.0, 120.0, 368.0, 0.0, 32.0, 112.0, 416.0, 76.0, 304.0, - 44.0, 24.0, 416.0, 44.0, 30.0, 72.0, 16.0, 29.0, 16.0, 50.0, 112.0, - 
0.0, 240.0, 208.0, 128.0, 36.0, 44.0, 16.0, 9.0, 46.0, 5.0, 80.0, - 12.0, 208.0, 23.0, 28.0, 208.0, 10.0, 40.0, 32.0, 27.0, 15.0, 44.0, - 416.0, 62.0, 184.0, 240.0, 34.0, 11.0, 248.0, 200.0, 20.0, 144.0, 200.0, - 240.0, 24.0, 80.0, 52.0, 16.0, 3.0, 64.0, 3.0, 24.0, 62.0, 208.0, - 288.0, 0.0, 64.0, 56.0, 84.0, 20.0, 13.0, 464.0, 480.0, 92.0, 208.0, - 17.0, 92.0, 184.0, 464.0, 272.0, 72.0, 368.0, 128.0, 60.0, 22.0, 26.0, - 0.0, 192.0, 5.0, 44.0, 12.0, 20.0, 2.0, 7.0, 224.0, 26.0, 16.0, - 72.0, 8.0, 48.0, 216.0, 16.0, 0.0, 480.0, 80.0, 104.0, 1.0, 11.0, - 432.0, 24.0, 88.0, 208.0, 6.0, 256.0, 9.0, 8.0, 108.0, 32.0, 116.0, - 0.0, 80.0, 96.0, 0.0, 16.0, 32.0, 256.0, 192.0, 92.0, 0.0, 52.0, - 10.0, 7.0, 27.0, 22.0, 144.0, 30.0, 28.0, 48.0, 208.0, 18.0, 104.0, - 108.0, 16.0, 224.0, 40.0, 96.0, 184.0, 416.0, 352.0, 29.0, 6.0, 48.0, - 20.0, 60.0, 46.0, 176.0, 2.0, 76.0, 224.0, 12.0, 320.0, 96.0, 12.0, - 50.0, 96.0, 80.0, 208.0, 48.0, 7.0, 104.0, 46.0, 224.0, 352.0, 20.0, - 42.0, 168.0, 336.0, 24.0, 288.0, 104.0, 200.0, 36.0, 192.0, 200.0, 42.0, - 7.0, 88.0, 16.0, 144.0, 5.0, 96.0, 48.0, 96.0, 27.0, 12.0, 40.0, - 10.0, 200.0, 464.0, 3.0, 16.0, 7.0, 8.0, 32.0, 48.0, 240.0, 192.0, - 12.0, 64.0, 32.0, 0.0, 28.0, 36.0, 32.0, 12.0, 0.0, 112.0, 0.0, - 22.0, 2.0, 56.0, 80.0, 10.0, 13.0, 28.0, 128.0, 12.0, 80.0, 16.0, - 11.0, 8.0, 92.0, 272.0, 48.0, 22.0, 40.0, 6.0, 32.0, 22.0, 6.0, - 96.0, 28.0, 62.0, 12.0, 432.0, 2.0, 34.0, 2.0, 58.0, 112.0, 416.0, - 5.0, 8.0, 30.0, 26.0, 100.0, 4.0, 20.0, 24.0, 176.0, 144.0, 32.0, - 19.0, 2.0, 54.0, 128.0, 32.0, 19.0, 112.0, 64.0, 112.0, 52.0, 432.0, - 36.0, 24.0, 120.0, 56.0, 58.0, 80.0, 52.0, 34.0, 64.0, 32.0, 72.0, - 32.0, 64.0, 120.0, 100.0, 32.0, 24.0, 24.0, 76.0, 40.0, 108.0, 336.0, - 112.0, 11.0, 0.0, 60.0, 120.0, 22.0, 432.0, 88.0, 64.0, 464.0, 152.0, - 368.0, 8.0, 56.0, 80.0, 20.0, 28.0, 24.0, 112.0, 4.0, 24.0, 12.0, - 224.0, 36.0, 12.0, 56.0, 480.0, 56.0, 72.0, 42.0, 24.0, 30.0, 3.0, - 128.0, 60.0, 96.0, 6.0, 112.0, 72.0, 2.0, 120.0, 
3.0, 80.0, 3.0, - 320.0, 1.0, 400.0, 4.0, 64.0, 0.0, 128.0, 88.0, 64.0, 88.0, 34.0, - 30.0, 496.0, 42.0, 23.0, 96.0, 112.0, 88.0, 64.0, 40.0, 120.0, 3.0, - 320.0, 232.0, 48.0, 84.0, 52.0, 20.0, 18.0, 26.0, 432.0, 256.0, 16.0, - 96.0, 480.0, 3.0, 36.0, 200.0, 208.0, 176.0, 18.0, 48.0, 336.0, 56.0, - 28.0, 0.0, 2.0, 36.0, 42.0, 96.0, 80.0, 108.0, 46.0, 1.0, 104.0, - 1.0, 56.0, 432.0, 32.0, 36.0, 176.0, 80.0, 44.0, 248.0, 84.0, 14.0, - 248.0, 464.0, 208.0, 2.0, 96.0, 144.0, 248.0, 52.0, 120.0, 46.0, 352.0, - 448.0, 120.0, 4.0, 120.0, 15.0, 0.0, 208.0, 400.0, 27.0, 208.0, 40.0, - 50.0, 184.0, 48.0, 68.0, 50.0, 128.0, 5.0, 144.0, 144.0, 272.0, 128.0, - 28.0, 16.0, 19.0, 60.0, 0.0, 72.0, 2.0, 232.0, 464.0, 48.0, 84.0, - 32.0, 168.0, 116.0, 0.0, 48.0, 31.0, 28.0, 14.0, 0.0, 13.0, 14.0, - 5.0, 4.0, 12.0, 16.0, 16.0, 36.0, 112.0, 84.0, 19.0, 46.0, 76.0, - 288.0, 15.0, 200.0, 248.0, 272.0, 144.0, 192.0, 16.0, 240.0, 3.0, 256.0, - 112.0, 42.0, 192.0, 224.0, 3.0, 160.0, 32.0, 22.0, 20.0, 54.0, 92.0, - 208.0, 1.0, 44.0, 24.0, 24.0, 200.0, 32.0, 22.0, 96.0, 26.0, 18.0, - 400.0, 44.0, 0.0, 38.0, 160.0, 0.0, 0.0, 23.0, 136.0, 368.0, 16.0, - 52.0, 32.0, 160.0, 36.0, 192.0, 5.0, 208.0, 16.0, 0.0, 10.0, 30.0, - 108.0, 256.0, 64.0, 14.0, 54.0, 10.0, 38.0, 28.0, 184.0, 36.0, 32.0, - 112.0, 8.0, 88.0, 13.0, 22.0, 232.0, 15.0, 112.0, 80.0, 336.0, 400.0, - 24.0, 96.0, 32.0, 34.0, 108.0, 48.0, 168.0, 54.0, 240.0, 224.0, 256.0, - 464.0, 20.0, 16.0, 76.0, 168.0, 64.0, 27.0, 304.0, 22.0, 36.0, 26.0, - 22.0, 464.0, 192.0, 23.0, 7.0, 128.0, 36.0, 24.0, 232.0, 352.0, 16.0, - 96.0, 8.0, 16.0, 18.0, 14.0, 6.0, 26.0, 144.0, 128.0, 56.0, 384.0, - 76.0, 144.0, 16.0, 17.0, 42.0, 120.0, 80.0, 12.0, 0.0, 48.0, 248.0, - 32.0, 23.0, 40.0, 200.0, 30.0, 96.0, 5.0, 124.0, 104.0, 40.0, 19.0, - 232.0, 16.0, 0.0, 9.0, 256.0, 104.0, 28.0, 25.0, 88.0, 64.0, 26.0, - 52.0, 29.0, 232.0, 0.0, 72.0, 24.0, 176.0, 32.0, 25.0, 80.0, 368.0, - 16.0, 76.0, 24.0, 23.0, 8.0, 64.0, 16.0, 20.0, 16.0, 50.0, 352.0, - 58.0, 448.0, 
336.0, 12.0, 192.0, 8.0, 12.0, 42.0, 28.0, 88.0, 12.0, - 20.0, 48.0, 20.0, 76.0, 32.0, 128.0, 50.0, 2.0, 112.0, 80.0, 2.0, - 32.0, 144.0, 208.0, 152.0, 160.0, 368.0, 240.0, 30.0, 23.0, 28.0, 36.0, - 200.0, 100.0, 23.0, 144.0, 120.0, 384.0, 100.0, 10.0, 64.0, 48.0, 18.0, - 256.0, 88.0, 40.0, 48.0, 60.0, 12.0, 168.0, 20.0, 2.0, 18.0, 32.0, - 256.0, 84.0, 200.0, 352.0, 400.0, 7.0, 48.0, 96.0, 0.0, 36.0, 24.0, - 100.0, 320.0, 24.0, 26.0, 176.0, 46.0, 44.0, 5.0, 64.0, 6.0, 22.0, - 6.0, 50.0, 20.0, 58.0, 208.0, 32.0, 8.0, 12.0, 30.0, 56.0, 12.0, - 30.0, 34.0, 26.0, 12.0, 216.0, 192.0, 76.0, 36.0, 368.0, 384.0, 288.0, - 272.0, 16.0, 416.0, 28.0, 208.0, 44.0, 288.0, 24.0, 4.0, 19.0, 80.0, - 224.0, 36.0, 80.0, 16.0, 400.0, 24.0, 38.0, 232.0, 32.0, 88.0, 108.0, - 112.0, 12.0, 48.0, 6.0, 128.0, 48.0, 120.0, 26.0, 224.0, 8.0, 92.0, - 208.0, 96.0, 144.0, 60.0, 18.0, 24.0, 18.0, 432.0, 60.0, 68.0, 100.0, - 58.0, 224.0, 192.0, 30.0, 384.0, 208.0, 14.0, 26.0, 11.0, 80.0, 160.0, - 24.0, 400.0, 52.0, 6.0, 240.0, 320.0, 32.0, 18.0, 448.0, 40.0, 144.0, - 120.0, 128.0, 22.0, 80.0, 32.0, 136.0, 96.0, 4.0, 448.0, 32.0, 16.0, - 184.0, 26.0, 160.0, 224.0, 184.0, 0.0, 96.0, 272.0, 16.0, 29.0, 88.0, - 128.0, 124.0, 48.0, 4.0, 248.0, 16.0, 28.0, 31.0, 80.0, 30.0, 84.0, - 52.0, 200.0, 208.0, 352.0, 31.0, 120.0, 80.0, 32.0, 96.0, 40.0, 12.0, - 56.0, 40.0, 17.0, 100.0, 30.0, 50.0, 4.0, 24.0, 256.0, 28.0, 384.0, - 416.0, 496.0, 200.0, 1.0, 36.0, 56.0, 15.0, 44.0, 192.0, 54.0, 96.0, - 80.0, 22.0, 32.0, 184.0, 20.0, 18.0, 10.0, 0.0, 0.0, 36.0, 64.0, - 84.0, 68.0, 16.0, 3.0, 24.0, 168.0, 15.0, 52.0, 72.0, 18.0, 32.0, - 44.0, 24.0, 44.0, 20.0, 31.0, 224.0, 16.0, 108.0, 20.0, 8.0, 144.0, - 112.0, 5.0, 48.0, 10.0, 64.0, 76.0, 23.0, 56.0, 38.0, 23.0, 2.0, - 224.0, 496.0, 36.0, 4.0, 96.0, 240.0, 28.0, 64.0, 8.0, 88.0, 184.0, - 9.0, 0.0, 56.0, 144.0, 6.0, 152.0, 48.0, 0.0, 18.0, 18.0, 128.0, - 52.0, 6.0, 16.0, 112.0, 216.0, 46.0, 8.0, 56.0, 52.0, 0.0, 26.0, - 208.0, 48.0, 1.0, 29.0, 112.0, 104.0, 32.0, 
27.0, 96.0, 16.0, 432.0, - 232.0, 21.0, 216.0, 16.0, 31.0, 232.0, 28.0, 60.0, 160.0, 112.0, 64.0, - 128.0, 54.0, 112.0, 272.0, 22.0, 0.0, 20.0, 27.0, 20.0, 144.0, 120.0, - 176.0, 416.0, 8.0, 100.0, 84.0, 16.0, 448.0, 58.0, 68.0, 64.0, 32.0, - 0.0, 248.0, 16.0, 54.0, 288.0, 232.0, 96.0, 0.0, 42.0, 72.0, 176.0, - 31.0, 256.0, 62.0, 72.0, 80.0, 28.0, 30.0, 18.0, 10.0, 120.0, 26.0, - 176.0, 24.0, 176.0, 64.0, 44.0, 72.0, 28.0, 7.0, 2.0, 48.0, 288.0, - 104.0, 0.0, 8.0, 72.0, 208.0, 28.0, 120.0, 28.0, 272.0, 108.0, 0.0, - 31.0, 58.0, 16.0, 60.0, 192.0, 42.0, 36.0, 48.0, 16.0, 100.0, 432.0, - 1.0, 16.0, 0.0, 184.0, 28.0, 8.0, 104.0, 24.0, 288.0, 192.0, 30.0, - 32.0, 160.0, 124.0, 64.0, 224.0, 40.0, 16.0, 240.0, 0.0, 480.0, 16.0, - 208.0, 160.0, 480.0, 144.0, 96.0, 30.0, 24.0, 16.0, 32.0, 26.0, 18.0, - 14.0, 36.0, 4.0, 176.0, 4.0, 336.0, 58.0, 184.0, 24.0, 84.0, 128.0, - 34.0, 8.0, 54.0, 58.0, 54.0, 28.0, 18.0, 64.0, 192.0, 44.0, 112.0, - 416.0, 16.0, 20.0, 3.0, 76.0, 44.0, 52.0, 112.0, 0.0, 26.0, 32.0, - 208.0, 24.0, 56.0, 88.0, 50.0, 224.0, 42.0, 0.0, 160.0, 23.0, 60.0, - 112.0, 48.0, 336.0, 48.0, 200.0, 7.0, 7.0, 448.0, 496.0, 0.0, 192.0, - 18.0, 20.0, 4.0, 64.0, 240.0, 80.0, 24.0, 2.0, 84.0, 64.0, 16.0, - 240.0, 15.0, 29.0, 56.0, 68.0, 40.0, 11.0, 400.0, 16.0, 16.0, 32.0, - 116.0, 368.0, 32.0, 72.0, 224.0, 16.0, 5.0, 22.0, 52.0, 7.0, 136.0, - 384.0, 176.0, 8.0, 288.0, 160.0, 46.0, 84.0, 0.0, 288.0, 64.0, 104.0, - 108.0, 22.0, 80.0, 40.0, 8.0, 6.0, 96.0, 12.0, 40.0, 8.0, 84.0, - 56.0, 12.0, 42.0, 16.0, 24.0, 0.0, 48.0, 1.0, 56.0, 16.0, 28.0, - 96.0, 8.0, 0.0, 9.0, 44.0, 23.0, 200.0, 21.0, 24.0, 496.0, 0.0, - 24.0, 16.0, 176.0, 160.0, 36.0, 124.0, 288.0, 26.0, 20.0, 16.0, 128.0, - 36.0, 0.0, 120.0, 176.0, 96.0, 432.0, 7.0, 44.0, 112.0, 46.0, 288.0, - 44.0, 104.0, 72.0, 18.0, 224.0, 480.0, 0.0, 112.0, 128.0, 0.0, 124.0, - 10.0, 88.0, 192.0, 48.0, 176.0, 8.0, 400.0, 34.0, 9.0, 8.0, 96.0, - 26.0, 0.0, 60.0, 18.0, 28.0, 256.0, 88.0, 100.0, 304.0, 21.0, 72.0, - 176.0, 168.0, 
26.0, 16.0, 42.0, 36.0, 272.0, 28.0, 44.0, 64.0, 84.0, - 160.0, 32.0, 62.0, 14.0, 40.0, 26.0, 48.0, 104.0, 46.0, 92.0, 18.0, - 38.0, 400.0, 60.0, 36.0, 96.0, 36.0, 240.0, 304.0, 224.0, 26.0, 12.0, - 16.0, 15.0, 7.0, 72.0, 12.0, 48.0, 27.0, 34.0, 27.0, 336.0, 10.0, - 480.0, 24.0, 136.0, 352.0, 104.0, 4.0, 200.0, 48.0, 48.0, 88.0, 16.0, - 104.0, 12.0, 48.0, 21.0, 240.0, 448.0, 80.0, 18.0, 8.0, 6.0, 48.0, - 4.0, 27.0, 464.0, 54.0, 72.0, 80.0, 38.0, 128.0, 17.0, 216.0, 56.0, - 76.0, 112.0, 22.0, 120.0, 120.0, 24.0, 208.0, 176.0, 5.0, 28.0, 42.0, - 0.0, 26.0, 6.0, 23.0, 32.0, 32.0, 240.0, 28.0, 15.0, 112.0, 46.0, - 27.0, 136.0, 248.0, 100.0, 496.0, 15.0, 120.0, 464.0, 48.0, 112.0, 26.0, - 104.0, 128.0, 240.0, 20.0, 192.0, 4.0, 124.0, 352.0, 480.0, 10.0, 29.0, - 96.0, 96.0, 28.0, 16.0, 0.0, 16.0, 27.0, 88.0, 8.0, 16.0, 54.0, - 16.0, 108.0, 2.0, 448.0, 72.0, 72.0, 48.0, 192.0, 29.0, 160.0, 40.0, - 13.0, 16.0, 240.0, 248.0, 10.0, 6.0, 20.0, 6.0, 116.0, 60.0, 29.0, - 36.0, 16.0, 20.0, 0.0, 2.0, 28.0, 14.0, 62.0, 14.0, 30.0, 5.0, - 2.0, 8.0, 18.0, 46.0, 176.0, 384.0, 32.0, 22.0, 18.0, 21.0, 10.0, - 224.0, 32.0, 20.0, 80.0, 0.0, 80.0, 400.0, 6.0, 160.0, 9.0, 10.0, - 120.0, 12.0, 36.0, 10.0, 184.0, 52.0, 368.0, 20.0, 15.0, 44.0, 52.0, - 432.0, 16.0, 26.0, 124.0, 72.0, 24.0, 80.0, 12.0, 88.0, 11.0, 14.0, - 64.0, 76.0, 112.0, 232.0, 42.0, 108.0, 4.0, 31.0, 18.0, 48.0, 4.0, - 10.0, 4.0, 72.0, 432.0, 108.0, 17.0, 24.0, 40.0, 496.0, 6.0, 58.0, - 240.0, 240.0, 56.0, 42.0, 480.0, 1.0, 20.0, 0.0, 176.0, 8.0, 192.0, - 7.0, 10.0, 208.0, 30.0, 232.0, 76.0, 16.0, 112.0, 62.0, 176.0, 32.0, - 208.0, 100.0, 200.0, 240.0, 416.0, 272.0, 32.0, 96.0, 480.0, 7.0, 20.0, - 224.0, 64.0, 200.0, 32.0, 72.0, 31.0, 28.0, 19.0, 200.0, 44.0, 5.0, - 62.0, 40.0, 0.0, 56.0, 96.0, 28.0, 21.0, 120.0, 72.0, 3.0, 10.0, - 448.0, 16.0, 128.0, 176.0, 216.0, 72.0, 80.0, 58.0, 24.0, 80.0, 100.0, - 100.0, 62.0, 256.0, 52.0, 64.0, 2.0, 0.0, 8.0, 184.0, 2.0, 32.0, - 3.0, 432.0, 8.0, 32.0, 200.0, 28.0, 60.0, 0.0, 24.0, 
27.0, 8.0, - 208.0, 16.0, 272.0, 100.0, 88.0, 108.0, 496.0, 0.0, 8.0, 52.0, 12.0, - 12.0, 8.0, 64.0, 240.0, 14.0, 11.0, 0.0, 224.0, 29.0, 48.0, 28.0, - 54.0, 336.0, 112.0, 8.0, 200.0, 464.0, 38.0, 168.0, 24.0, 304.0, 26.0, - 40.0, 124.0, 31.0, 104.0, 100.0, 352.0, 72.0, 56.0, 52.0, 112.0, 80.0, - 496.0, 24.0, 216.0, 56.0, 384.0, 28.0, 0.0, 16.0, 28.0, 14.0, 112.0, - 52.0, 14.0, 40.0, 11.0, 224.0, 92.0, 26.0, 76.0, 160.0, 72.0, 416.0, - 24.0, 22.0, 32.0, 216.0, 56.0, 12.0, 26.0, 128.0, 176.0, 240.0, 272.0, - 400.0, 32.0, 24.0, 400.0, 44.0, 100.0, 18.0, 52.0, 24.0, 62.0, 2.0, - 16.0, 60.0, 160.0, 25.0, 480.0, 72.0, 4.0, 176.0, 0.0, 96.0, 52.0, - 7.0, 232.0, 200.0, 54.0, 120.0, 272.0, 176.0, 56.0, 272.0, 152.0, 56.0, - 144.0, 240.0, 56.0, 2.0, 24.0, 32.0, 48.0, 240.0, 54.0, 104.0, 144.0, - 19.0, 11.0, 336.0, 50.0, 14.0, 224.0, 464.0, 10.0, 20.0, 48.0, 15.0, - 88.0, 160.0, 176.0, 30.0, 48.0, 0.0, 28.0, 128.0, 48.0, 5.0, 32.0, - 192.0, 48.0, 240.0, 16.0, 56.0, 80.0, 20.0, 23.0, 464.0, 232.0, 100.0, - 28.0, 96.0, 56.0, 208.0, 96.0, 6.0, 32.0, 26.0, 28.0, 7.0, 464.0, - 288.0, 92.0, 11.0, 8.0, 50.0, 128.0, 15.0, 320.0, 9.0, 42.0, 224.0, - 32.0, 30.0, 10.0, 10.0, 168.0, 108.0, 4.0, 16.0, 44.0, 1.0, 32.0, - 9.0, 0.0, 15.0, 14.0, 256.0, 88.0, 30.0, 13.0, 0.0, 0.0, 88.0, - 108.0, 256.0, 9.0, 184.0, 128.0, 36.0, 432.0, 8.0, 40.0, 72.0, 80.0, - 224.0, 368.0, 32.0, 29.0, 48.0, 48.0, 72.0, 25.0, 48.0, 80.0, 27.0, - 200.0, 72.0, 44.0, 112.0, 36.0, 18.0, 24.0, 224.0, 26.0, 30.0, 120.0, - 62.0, 432.0, 4.0, 6.0, 13.0, 32.0, 52.0, 8.0, 432.0, 36.0, 31.0, - 15.0, 496.0, 0.0, 12.0, 14.0, 0.0, 26.0, 24.0, 88.0, 9.0, 12.0, - 96.0, 320.0, 400.0, 36.0, 240.0, 176.0, 288.0, 2.0, 32.0, 52.0, 64.0, - 20.0, 15.0, 24.0, 2.0, 42.0, 7.0, 42.0, 8.0, 30.0, 144.0, 48.0, - 0.0, 54.0, 40.0, 12.0, 192.0, 3.0, 352.0, 80.0, 14.0, 128.0, 1.0, - 92.0, 0.0, 208.0, 464.0, 4.0, 21.0, 0.0, 56.0, 80.0, 208.0, 30.0, - 192.0, 58.0, 76.0, 14.0, 56.0, 6.0, 24.0, 52.0, 9.0, 24.0, 4.0, - 128.0, 22.0, 152.0, 34.0, 56.0, 
52.0, 36.0, 80.0, 480.0, 46.0, 30.0, - 136.0, 240.0, 56.0, 14.0, 64.0, 240.0, 88.0, 272.0, 448.0, 0.0, 192.0, - 4.0, 144.0, 28.0, 32.0, 38.0, 336.0, 56.0, 96.0, 29.0, 116.0, 13.0, - 22.0, 216.0, 16.0, 0.0, 54.0, 272.0, 6.0, 96.0, 108.0, 23.0, 5.0, - 464.0, 464.0, 58.0, 64.0, 96.0, 52.0, 30.0, 240.0, 288.0, 12.0, 25.0, - 116.0, 56.0, 52.0, 88.0, 11.0, 25.0, 46.0, 176.0, 184.0, 23.0, 0.0, - 464.0, 288.0, 176.0, 124.0, 0.0, 496.0, 72.0, 64.0, 80.0, 0.0, 23.0, - 16.0, 80.0, 384.0, 22.0, 0.0, 368.0, 62.0, 62.0, 464.0, 20.0, 88.0, - 0.0, 464.0, 144.0, 416.0, 10.0, 480.0, 19.0, 72.0, 23.0, 11.0, 24.0, - 24.0, 80.0, 48.0, 11.0, 48.0, 432.0, 48.0, 16.0, 38.0, 32.0, 4.0, - 19.0, 176.0, 68.0, 480.0, 168.0, 152.0, 6.0, 23.0, 120.0, 16.0, 60.0, - 160.0, 96.0, 28.0, 12.0, 80.0, 18.0, 120.0, 36.0, 64.0, 21.0, 30.0, - 216.0, 23.0, 20.0, 240.0, 352.0, 72.0, 27.0, 160.0, 40.0, 20.0, 124.0, - 1.0, 96.0, 16.0, 16.0, 24.0, 100.0, 24.0, 2.0, 108.0, 36.0, 22.0, - 60.0, 96.0, 120.0, 32.0, 88.0, 128.0, 62.0, 136.0, 40.0, 272.0, 7.0, - 272.0, 368.0, 26.0, 31.0, 88.0, 160.0, 42.0, 36.0, 11.0, 17.0, 56.0, - 0.0, 62.0, 256.0, 32.0, 11.0, 16.0, 17.0, 368.0, 4.0, 84.0, 5.0, - 192.0, 56.0, 32.0, 32.0, 120.0, 72.0, 17.0, 120.0, 320.0, 152.0, 5.0, - 92.0, 13.0, 58.0, 120.0, 7.0, 88.0, 248.0, 17.0, 80.0, 80.0, 8.0, - 9.0, 104.0, 30.0, 32.0, 25.0, 15.0, 24.0, 448.0, 48.0, 22.0, 112.0, - 136.0, 6.0, 128.0, 64.0, 28.0, 0.0, 64.0, 152.0, 7.0, 12.0, 21.0, - 208.0, 40.0, 31.0, 0.0, 16.0, 240.0, 28.0, 96.0, 52.0, 10.0, 116.0, - 56.0, 192.0, 160.0, 32.0, 160.0, 176.0, 10.0, 38.0, 6.0, 120.0, 224.0, - 25.0, 10.0, 200.0, 8.0, 64.0, 16.0, 23.0, 304.0, 0.0, 60.0, 16.0, - 0.0, 31.0, 288.0, 30.0, 32.0, 30.0, 432.0, 176.0, 18.0, 24.0, 68.0, - 176.0, 32.0, 144.0, 10.0, 288.0, 58.0, 208.0, 112.0, 184.0, 240.0, 72.0, - 224.0, 25.0, 336.0, 192.0, 40.0, 24.0, 12.0, 36.0, 56.0, 20.0, 0.0, - 240.0, 144.0, 26.0, 22.0, 88.0, 12.0, 8.0, 6.0, 22.0, 18.0, 0.0, - 160.0, 2.0, 19.0, 8.0, 160.0, 29.0, 32.0, 384.0, 4.0, 12.0, 
144.0, - 1.0, 272.0, 208.0, 88.0, 40.0, 92.0, 4.0, 168.0, 34.0, 24.0, 42.0, - 432.0, 42.0, 352.0, 29.0, 92.0, 92.0, 144.0, 50.0, 44.0, 288.0, 28.0, - 288.0, 200.0, 200.0, 54.0, 23.0, 16.0, 50.0, 30.0, 432.0, 20.0, 168.0, - 16.0, 32.0, 12.0, 28.0, 384.0, 224.0, 144.0, 320.0, 208.0, 248.0, 48.0, - 496.0, 104.0, 34.0, 112.0, 8.0, 30.0, 0.0, 100.0, 22.0, 104.0, 58.0, - 320.0, 64.0, 80.0, 464.0, 4.0, 24.0, 48.0, 14.0, 20.0, 480.0, 184.0, - 26.0, 128.0, 10.0, 104.0, 14.0, 112.0, 5.0, 52.0, 31.0, 50.0, 19.0, - 16.0, 384.0, 248.0, 4.0, 30.0, 120.0, 22.0, 68.0, 22.0, 26.0, 20.0, - 96.0, 24.0, 4.0, 28.0, 48.0, 0.0, 160.0, 60.0, 124.0, 25.0, 464.0, - 80.0, 304.0, 88.0, 6.0, 64.0, 32.0, 80.0, 16.0, 12.0, 36.0, 216.0, - 368.0, 6.0, 20.0, 24.0, 16.0, 208.0, 29.0, 24.0, 19.0, 20.0, 80.0, - 26.0, 4.0, 112.0, 2.0, 112.0, 22.0, 13.0, 56.0, 136.0, 400.0, 192.0, - 20.0, 10.0, 68.0, 96.0, 31.0, 54.0, 232.0, 104.0, 60.0, 128.0, 0.0, - 208.0, 224.0, 12.0, 44.0, 200.0, 152.0, 60.0, 31.0, 176.0, 58.0, 25.0, - 40.0, 64.0, 42.0, 96.0, 120.0, 46.0, 16.0, 64.0, 28.0, 112.0, 40.0, - 176.0, 24.0, 22.0, 200.0, 168.0, 16.0, 176.0, 8.0, 216.0, 64.0, 28.0, - 72.0, 42.0, 128.0, 48.0, 24.0, 7.0, 48.0, 96.0, 24.0, 44.0, 256.0, - 304.0, 4.0, 4.0, 50.0, 60.0, 124.0, 30.0, 0.0, 120.0, 224.0, 7.0, - 80.0, 24.0, 112.0, 432.0, 16.0, 464.0, 16.0, 168.0, 14.0, 36.0, 288.0, - 64.0, 17.0, 72.0, 6.0, 36.0, 14.0, 8.0, 52.0, 108.0, 84.0, 80.0, - 160.0, 6.0, 4.0, 56.0, 15.0, 46.0, 352.0, 288.0, 152.0, 26.0, 7.0, - 176.0, 32.0, 17.0, 76.0, 24.0, 208.0, 6.0, 50.0, 25.0, 80.0, 12.0, - 352.0, 304.0, 14.0, 192.0, 480.0, 152.0, 4.0, 36.0, 192.0, 17.0, 48.0, - 0.0, 144.0, 20.0, 12.0, 15.0, 480.0, 18.0, 9.0, 20.0, 2.0, 72.0, - 48.0, 256.0, 4.0, 112.0, 30.0, 400.0, 152.0, 256.0, 30.0, 32.0, 48.0, - 416.0, 16.0, 31.0, 23.0, 176.0, 80.0, 176.0, 3.0, 128.0, 58.0, 112.0, - 13.0, 52.0, 96.0, 352.0, 50.0, 12.0, 288.0, 176.0, 64.0, 104.0, 4.0, - 28.0, 48.0, 10.0, 92.0, 448.0, 40.0, 64.0, 7.0, 16.0, 56.0, 120.0, - 30.0, 368.0, 12.0, 
384.0, 60.0, 72.0, 32.0, 23.0, 48.0, 64.0, 1.0, - 152.0, 24.0, 52.0, 84.0, 29.0, 60.0, 128.0, 64.0, 62.0, 336.0, 68.0, - 16.0, 24.0, 30.0, 16.0, 128.0, 2.0, 54.0, 14.0, 22.0, 32.0, 18.0, - 224.0, 26.0, 304.0, 18.0, 42.0, 36.0, 8.0, 5.0, 240.0, 96.0, 20.0, - 23.0, 60.0, 336.0, 44.0, 32.0, 60.0, 20.0, 20.0, 56.0, 14.0, 25.0, - 18.0, 9.0, 384.0, 64.0, 80.0, 28.0, 112.0, 6.0, 11.0, 100.0, 176.0, - 304.0, 128.0, 38.0, 64.0, 68.0, 8.0, 112.0, 56.0, 464.0, 72.0, 112.0, - 68.0, 13.0, 10.0, 10.0, 24.0, 208.0, 40.0, 124.0, 116.0, 40.0, 400.0, - 96.0, 80.0, 50.0, 96.0, 432.0, 20.0, 80.0, 124.0, 14.0, 464.0, 112.0, - 272.0, 384.0, 160.0, 24.0, 60.0, 22.0, 32.0, 80.0, 20.0, 352.0, 0.0, - 224.0, 68.0, 31.0, 17.0, 120.0, 248.0, 160.0, 16.0, 232.0, 12.0, 336.0, - 480.0, 208.0, 2.0, 27.0, 16.0, 16.0, 13.0, 10.0, 112.0, 432.0, 3.0, - 21.0, 224.0, 116.0, 40.0, 14.0, 208.0, 20.0, 76.0, 29.0, 416.0, 29.0, - 15.0, 62.0, 136.0, 400.0, 104.0, 42.0, 120.0, 416.0, 40.0, 2.0, 8.0, - 384.0, 80.0, 24.0, 128.0, 4.0, 224.0, 12.0, 4.0, 288.0, 384.0, 48.0, - 10.0, 192.0, 58.0, 256.0, 50.0, 168.0, 17.0, 17.0, 116.0, 128.0, 24.0, - 32.0, 144.0, 272.0, 208.0, 60.0, 26.0, 192.0, 496.0, 336.0, 64.0, 16.0, - 248.0, 44.0, 104.0, 42.0, 80.0, 4.0, 16.0, 152.0, 56.0, 100.0, 88.0, - 124.0, 8.0, 368.0, 24.0, 224.0, 112.0, 30.0, 28.0, 92.0, 34.0, 8.0, - 36.0, 13.0, 60.0, 28.0, 32.0, 16.0, 192.0, 104.0, 144.0, 14.0, 24.0, - 224.0, 72.0, 60.0, 8.0, 32.0, 20.0, 288.0, 60.0, 56.0, 16.0, 96.0, - 32.0, 29.0, 32.0, 80.0, 120.0, 60.0, 224.0, 34.0, 20.0, 100.0, 27.0, - 36.0, 30.0, 304.0, 20.0, 368.0, 120.0, 0.0, 1.0, 28.0, 16.0, 56.0, - 4.0, 26.0, 3.0, 80.0, 0.0, 192.0, 0.0, 0.0, 88.0, 25.0, 40.0, - 52.0, 22.0, 32.0, 108.0, 24.0, 96.0, 6.0, 216.0, 7.0, 48.0, 29.0, - 52.0, 384.0, 44.0, 432.0, 232.0, 384.0, 44.0, 208.0, 10.0, 224.0, 7.0, - 12.0, 40.0, 48.0, 96.0, 184.0, 248.0, 60.0, 128.0, 144.0, 19.0, 31.0, - 168.0, 6.0, 46.0, 12.0, 24.0, 8.0, 30.0, 21.0, 0.0, 34.0, 21.0, - 8.0, 16.0, 54.0, 25.0, 6.0, 0.0, 15.0, 6.0, 
120.0, 232.0, 116.0, - 16.0, 0.0, 116.0, 240.0, 136.0, 30.0, 17.0, 2.0, 52.0, 68.0, 88.0, - 128.0, 16.0, 46.0, 64.0, 18.0, 464.0, 28.0, 56.0, 28.0, 1.0, 464.0, - 32.0, 40.0, 6.0, 62.0, 400.0, 10.0, 480.0, 12.0, 120.0, 1.0, 224.0, - 184.0, 136.0, 29.0, 52.0, 0.0, 16.0, 30.0, 192.0, 128.0, 2.0, 208.0, - 15.0, 44.0, 416.0, 18.0, 56.0, 120.0, 62.0, 32.0, 152.0, 88.0, 48.0, - 304.0, 48.0, 64.0, 92.0, 36.0, 21.0, 28.0, 32.0, 240.0, 72.0, 112.0, - 52.0, 216.0, 2.0, 16.0, 7.0, 272.0, 46.0, 40.0, 224.0, 224.0, 22.0, - 8.0, 25.0, 14.0, 20.0, 34.0, 3.0, 120.0, 96.0, 400.0, 10.0, 52.0, - 11.0, 24.0, 31.0, 112.0, 14.0, 0.0, 21.0, 32.0, 18.0, 64.0, 20.0, - 27.0, 6.0, 52.0, 4.0, 192.0, 3.0, 48.0, 8.0, 28.0, 192.0, 120.0, - 36.0, 30.0, 3.0, 240.0, 19.0, 16.0, 288.0, 30.0, 0.0, 24.0, 31.0, - 20.0, 112.0, 84.0, 8.0, 448.0, 100.0, 36.0, 100.0, 60.0, 14.0, 116.0, - 62.0, 60.0, 96.0, 432.0, 0.0, 232.0, 4.0, 84.0, 18.0, 56.0, 0.0, - 192.0, 192.0, 20.0, 8.0, 112.0, 6.0, 336.0, 13.0, 24.0, 60.0, 8.0, - 38.0, 352.0, 24.0, 12.0, 22.0, 128.0, 416.0, 336.0, 304.0, 16.0, 448.0, - 168.0, 240.0, 19.0, 4.0, 56.0, 36.0, 46.0, 31.0, 1.0, 384.0, 29.0, - 19.0, 26.0, 176.0, 16.0, 19.0, 10.0, 32.0, 184.0, 8.0, 34.0, 12.0, - 384.0, 32.0, 176.0, 4.0, 352.0, 34.0, 400.0, 56.0, 16.0, 64.0, 72.0, - 84.0, 24.0, 128.0, 3.0, 48.0, 14.0, 72.0, 116.0, 152.0, 88.0, 21.0, - 80.0, 6.0, 9.0, 2.0, 31.0, 64.0, 112.0, 448.0, 12.0, 0.0, 336.0, - 32.0, 20.0, 20.0, 232.0, 256.0, 136.0, 128.0, 32.0, 176.0, 176.0, 144.0, - 26.0, 25.0, 0.0, 9.0, 144.0, 336.0, 10.0, 448.0, 16.0, 25.0, 15.0, - 16.0, 36.0, 21.0, 8.0, 0.0, 30.0, 48.0, 104.0, 16.0, 224.0, 12.0, - 88.0, 216.0, 208.0, 192.0, 104.0, 40.0, 320.0, 2.0, 40.0, 416.0, 352.0, - 8.0, 11.0, 24.0, 208.0, 176.0, 16.0, 104.0, 20.0, 0.0, 30.0, 320.0, - 20.0, 25.0, 216.0, 112.0, 72.0, 120.0, 104.0, 88.0, 23.0, 0.0, 208.0, - 17.0, 64.0, 336.0, 112.0, 54.0, 56.0, 48.0, 50.0, 88.0, 480.0, 32.0, - 200.0, 232.0, 27.0, 8.0, 24.0, 116.0, 48.0, 224.0, 46.0, 160.0, 6.0, - 100.0, 120.0, 
0.0, 72.0, 16.0, 400.0, 288.0, 44.0, 21.0, 8.0, 56.0, - 120.0, 152.0, 116.0, 272.0, 0.0, 4.0, 400.0, 7.0, 192.0, 92.0, 12.0, - 12.0, 29.0, 22.0, 12.0, 432.0, 56.0, 120.0, 64.0, 16.0, 116.0, 48.0, - 48.0, 24.0, 31.0, 21.0, 72.0, 58.0, 256.0, 32.0, 496.0, 448.0, 32.0, - 0.0, 9.0, 16.0, 50.0, 68.0, 224.0, 232.0, 32.0, 216.0, 84.0, 22.0, - 24.0, 44.0, 0.0, 128.0, 88.0, 36.0, 72.0, 224.0, 36.0, 31.0, 176.0, - 176.0, 31.0, 68.0, 11.0, 31.0, 24.0, 3.0, 8.0, 112.0, 8.0, 128.0, - 36.0, 192.0, 9.0, 56.0, 208.0, 30.0, 60.0, 20.0, 72.0, 176.0, 56.0, - 64.0, 30.0, 80.0, 16.0, 88.0, 56.0, 176.0, 80.0, 48.0, 128.0, 104.0, - 8.0, 58.0, 16.0, 7.0, 62.0, 224.0, 200.0, 64.0, 16.0, 14.0, 16.0, - 24.0, 200.0, 14.0, 60.0, 32.0, 60.0, 15.0, 30.0, 32.0, 32.0, 22.0, - 40.0, 10.0, 48.0, 1.0, 31.0, 7.0, 16.0, 24.0, 8.0, 400.0, 28.0, - 96.0, 50.0, 54.0, 124.0, 24.0, 15.0, 8.0, 14.0, 58.0, 56.0, 31.0, - 128.0, 88.0, 256.0, 2.0, 112.0, 9.0, 30.0, 20.0, 10.0, 104.0, 10.0, - 48.0, 448.0, 44.0, 12.0, 56.0, 15.0, 22.0, 12.0, 496.0, 192.0, 160.0, - 0.0, 56.0, 80.0, 56.0, 160.0, 6.0, 60.0, 24.0, 0.0, 44.0, 52.0, - 144.0, 16.0, 192.0, 38.0, 38.0, 256.0, 48.0, 144.0, 20.0, 15.0, 60.0, - 84.0, 19.0, 76.0, 400.0, 36.0, 432.0, 88.0, 25.0, 176.0, 272.0, 58.0, - 208.0, 20.0, 0.0, 4.0, 32.0, 32.0, 416.0, 16.0, 216.0, 38.0, 12.0, - 12.0, 34.0, 108.0, 8.0, 52.0, 4.0, 120.0, 36.0, 120.0, 124.0, 152.0, - 16.0, 64.0, 60.0, 112.0, 108.0, 192.0, 3.0, 112.0, 88.0, 224.0, 7.0, - 34.0, 400.0, 108.0, 32.0, 0.0, 4.0, 44.0, 432.0, 8.0, 8.0, 304.0, - 36.0, 56.0, 38.0, 38.0, 42.0, 4.0, 96.0, 116.0, 124.0, 76.0, 400.0, - 32.0, 44.0, 27.0, 24.0, 368.0, 16.0, 5.0, 24.0, 88.0, 68.0, 16.0, - 208.0, 38.0, 100.0, 2.0, 232.0, 32.0, 28.0, 6.0, 30.0, 20.0, 16.0, - 200.0, 3.0, 304.0, 200.0, 6.0, 116.0, 80.0, 16.0, 32.0, 128.0, 128.0, - 336.0, 2.0, 224.0, 32.0, 52.0, 0.0, 6.0, 168.0, 32.0, 124.0, 248.0, - 8.0, 144.0, 60.0, 16.0, 18.0, 4.0, 58.0, 8.0, 18.0, 8.0, 48.0, - 28.0, 176.0, 224.0, 112.0, 8.0, 11.0, 0.0, 17.0, 16.0, 72.0, 
168.0, - 62.0, 144.0, 96.0, 28.0, 10.0, 104.0, 34.0, 46.0, 256.0, 10.0, 84.0, - 28.0, 256.0, 4.0, 96.0, 56.0, 12.0, 304.0, 352.0, 62.0, 112.0, 48.0, - 4.0, 116.0, 104.0, 96.0, 24.0, 104.0, 192.0, 120.0, 352.0, 416.0, 8.0, - 56.0, 27.0, 88.0, 68.0, 168.0, 496.0, 400.0, 44.0, 22.0, 64.0, 112.0, - 52.0, 20.0, 8.0, 80.0, 10.0, 160.0, 48.0, 29.0, 384.0, 40.0, 24.0, - 24.0, 28.0, 16.0, 208.0, 22.0, 16.0, 160.0, 4.0, 40.0, 22.0, 1.0, - 23.0, 52.0, 136.0, 464.0, 16.0, 54.0, 184.0, 8.0, 160.0, 16.0, 304.0, - 0.0, 20.0, 22.0, 25.0, 104.0, 22.0, 8.0, 272.0, 20.0, 144.0, 1.0, - 96.0, 9.0, 336.0, 120.0, 92.0, 208.0, 32.0, 18.0, 224.0, 12.0, 160.0, - 25.0, 96.0, 52.0, 16.0, 116.0, 12.0, 88.0, 56.0, 48.0, 448.0, 248.0, - 14.0, 96.0, 44.0, 24.0, 24.0, 432.0, 40.0, 368.0, 60.0, 88.0, 144.0, - 320.0, 34.0, 16.0, 116.0, 80.0, 192.0, 64.0, 100.0, 272.0, 464.0, 464.0, - 17.0, 96.0, 14.0, 0.0, 464.0, 34.0, 400.0, 50.0, 176.0, 288.0, 24.0, - 34.0, 0.0, 30.0, 8.0, 112.0, 14.0, 496.0, 18.0, 0.0, 320.0, 160.0, - 48.0, 26.0, 56.0, 24.0, 108.0, 144.0, 128.0, 168.0, 20.0, 10.0, 208.0, - 84.0, 27.0, 8.0, 6.0, 18.0, 112.0, 160.0, 24.0, 0.0, 28.0, 64.0, - 320.0, 368.0, 208.0, 40.0, 27.0, 28.0, 272.0, 11.0, 40.0, 400.0, 192.0, - 368.0, 72.0, 0.0, 3.0, 36.0, 34.0, 1.0, 15.0, 200.0, 27.0, 112.0, - 104.0, 96.0, 8.0, 16.0, 432.0, 30.0, 1.0, 80.0, 2.0, 28.0, 24.0, - 30.0, 27.0, 28.0, 496.0, 9.0, 480.0, 20.0, 38.0, 76.0, 40.0, 10.0, - 14.0, 13.0, 12.0, 2.0, 136.0, 20.0, 32.0, 184.0, 168.0, 480.0, 28.0, - 54.0, 208.0, 256.0, 176.0, 11.0, 42.0, 176.0, 120.0, 192.0, 10.0, 28.0, - 25.0, 128.0, 368.0, 4.0, 8.0, 0.0, 8.0, 28.0, 8.0, 160.0, 36.0, - 64.0, 208.0, 62.0, 28.0, 28.0, 16.0, 192.0, 9.0, 27.0, 24.0, 6.0, - 44.0, 8.0, 62.0, 464.0, 48.0, 15.0, 32.0, 32.0, 224.0, 16.0, 200.0, - 4.0, 18.0, 36.0, 14.0, 24.0, 19.0, 0.0, 8.0, 46.0, 56.0, 76.0, - 48.0, 2.0, 496.0, 16.0, 16.0, 25.0, 48.0, 8.0, 80.0, 52.0, 50.0, - 136.0, 48.0, 304.0, 160.0, 28.0, 232.0, 9.0, 19.0, 88.0, 192.0, 36.0, - 232.0, 224.0, 18.0, 6.0, 
60.0, 14.0, 160.0, 8.0, 15.0, 72.0, 6.0, - 4.0, 14.0, 42.0, 14.0, 336.0, 288.0, 84.0, 14.0, 256.0, 24.0, 128.0, - 240.0, 16.0, 1.0, 20.0, 336.0, 56.0, 40.0, 80.0, 368.0, 320.0, 272.0, - 100.0, 208.0, 8.0, 248.0, 48.0, 11.0, 96.0, 60.0, 208.0, 128.0, 29.0, - 176.0, 6.0, 320.0, 24.0, 76.0, 20.0, 368.0, 112.0, 60.0, 48.0, 20.0, - 58.0, 496.0, 160.0, 272.0, 480.0, 10.0, 36.0, 200.0, 80.0, 29.0, 8.0, - 40.0, -}; -float verify_data[O_SIZE] = { - 905.0, 767.0, 714.0, 621.0, 1045.0, 671.0, 656.0, 714.0, 796.0, - 1016.0, 796.0, 636.0, 488.0, 978.0, 1001.0, 1261.0, 609.0, 608.0, - 984.0, 964.0, 1276.0, 751.0, 713.0, 463.0, 718.0, 1428.0, 1492.0, - 1238.0, 516.0, 936.0, 928.0, 1006.0, 750.0, 581.0, 579.0, 484.0, - 624.0, 601.0, 412.0, 457.0, 353.0, 350.0, 171.0, 500.0, 552.0, - 993.0, 699.0, 1174.0, 1160.0, 1228.0, 1057.0, 659.0, 769.0, 568.0, - 690.0, 700.0, 984.0, 886.0, 1426.0, 1690.0, 1682.0, 1186.0, 694.0, - 780.0, 544.0, 992.0, 841.0, 882.0, 430.0, 761.0, 1221.0, 1201.0, - 879.0, 408.0, 431.0, 721.0, 893.0, 878.0, 910.0, 751.0, 1111.0, - 885.0, 950.0, 522.0, 464.0, 385.0, 561.0, 1079.0, 1214.0, 1032.0, - 626.0, 922.0, 1030.0, 1296.0, 988.0, 992.0, 1050.0, 1670.0, 1218.0, - 1049.0, 1013.0, 922.0, 1459.0, 1065.0, 1054.0, 1134.0, 1106.0, 1570.0, - 1138.0, 830.0, 478.0, 1080.0, 1326.0, 1350.0, 630.0, 428.0, 543.0, - 883.0, 1109.0, 845.0, 653.0, 873.0, 1014.0, 1114.0, 870.0, 830.0, - 612.0, 1056.0, 1134.0, 1121.0, 569.0, 407.0, 548.0, 561.0, 572.0, - 497.0, 366.0, 735.0, 743.0, 923.0, 598.0, 841.0, 713.0, 976.0, - 636.0, 945.0, 865.0, 873.0, 987.0, 738.0, 616.0, 344.0, 555.0, - 628.0, 861.0, 676.0, 1217.0, 1447.0, 1368.0, 1068.0, 772.0, 932.0, - 1090.0, 1163.0, 1092.0, 737.0, 564.0, 691.0, 829.0, 733.0, 535.0, - 368.0, 618.0, 920.0, 908.0, 782.0, 754.0, 788.0, 820.0, 575.0, - 609.0, 465.0, 518.0, 474.0, 358.0, 988.0, 931.0, 1357.0, 815.0, - 1218.0, 1046.0, 1228.0, 963.0, 647.0, 1181.0, 2022.0, 785.0, 756.0, - 808.0, 782.0, 1279.0, 945.0, 964.0, 1312.0, 1380.0, 1766.0, 
1120.0, - 812.0, 552.0, 1126.0, 1606.0, 1485.0, 1227.0, 785.0, 1277.0, 1139.0, - 1159.0, 701.0, 513.0, 1029.0, 1144.0, 1464.0, 1055.0, 1003.0, 803.0, - 1080.0, 1636.0, 1660.0, 1300.0, 602.0, 442.0, 423.0, 537.0, 854.0, - 947.0, 1108.0, 804.0, 1019.0, 1037.0, 1140.0, 800.0, 901.0, 650.0, - 666.0, 644.0, 647.0, 1339.0, 1262.0, 1180.0, 656.0, 567.0, 864.0, - 1073.0, 852.0, 1161.0, 1139.0, 1536.0, 1228.0, 988.0, 634.0, 882.0, - 1223.0, 1268.0, 822.0, 493.0, 455.0, 617.0, 515.0, 580.0, 372.0, - 722.0, 732.0, 724.0, 720.0, 788.0, 880.0, 766.0, 1007.0, 1073.0, - 915.0, 774.0, 635.0, 523.0, 983.0, 949.0, 1411.0, 795.0, 1322.0, - 1184.0, 1134.0, 619.0, 279.0, 1121.0, 1716.0, 925.0, 699.0, 951.0, - 962.0, 1071.0, 695.0, 709.0, 1458.0, 1226.0, 1580.0, 782.0, 664.0, - 392.0, 810.0, 1304.0, 1215.0, 1251.0, 881.0, 1197.0, 1169.0, 1011.0, - 754.0, 682.0, 1241.0, 1203.0, 1239.0, 731.0, 859.0, 771.0, 1005.0, - 1289.0, 1233.0, 1092.0, 1074.0, 986.0, 886.0, 538.0, 861.0, 993.0, - 1166.0, 895.0, 1071.0, 1516.0, 1436.0, 1205.0, 539.0, 495.0, 361.0, - 641.0, 693.0, 1426.0, 1451.0, 1255.0, 736.0, 617.0, 940.0, 763.0, - 491.0, 389.0, 889.0, 1391.0, 1358.0, 983.0, 557.0, 1041.0, 1005.0, - 1026.0, 359.0, 550.0, 626.0, 701.0, 429.0, 392.0, 340.0, 700.0, - 590.0, 556.0, 572.0, 690.0, 780.0, 382.0, 668.0, 771.0, 1249.0, - 1139.0, 1195.0, 791.0, 975.0, 849.0, 1355.0, 1083.0, 1534.0, 1248.0, - 1082.0, 562.0, 406.0, 1036.0, 1496.0, 787.0, 626.0, 790.0, 797.0, - 827.0, 875.0, 831.0, 1494.0, 1126.0, 1210.0, 545.0, 513.0, 699.0, - 820.0, 1012.0, 751.0, 1113.0, 848.0, 1155.0, 729.0, 646.0, 454.0, - 782.0, 1081.0, 1000.0, 1140.0, 760.0, 939.0, 691.0, 837.0, 1069.0, - 1073.0, 1052.0, 1012.0, 792.0, 710.0, 370.0, 679.0, 873.0, 896.0, - 607.0, 586.0, 1215.0, 1155.0, 1197.0, 505.0, 476.0, 271.0, 527.0, - 840.0, 1368.0, 1476.0, 1152.0, 1013.0, 805.0, 1097.0, 740.0, 812.0, - 539.0, 1043.0, 1233.0, 1402.0, 868.0, 424.0, 502.0, 764.0, 799.0, - 716.0, 712.0, 784.0, 637.0, 369.0, 349.0, 306.0, 538.0, 400.0, - 
375.0, 351.0, 480.0, 586.0, 508.0, 929.0, 1052.0, 1546.0, 1294.0, - 1296.0, 820.0, 979.0, 850.0, 1156.0, 917.0, 1159.0, 871.0, 706.0, - 411.0, 569.0, 981.0, 1192.0, 1144.0, 516.0, 700.0, 1172.0, 1202.0, - 1368.0, 860.0, 1334.0, 954.0, 949.0, 440.0, 382.0, 624.0, 949.0, - 907.0, 650.0, 604.0, 711.0, 787.0, 637.0, 410.0, 668.0, 1284.0, - 1525.0, 1206.0, 815.0, 521.0, 716.0, 443.0, 495.0, 509.0, 515.0, - 564.0, 786.0, 900.0, 785.0, 415.0, 372.0, 462.0, 732.0, 689.0, - 611.0, 1209.0, 1211.0, 1476.0, 564.0, 488.0, 264.0, 416.0, 665.0, - 761.0, 937.0, 643.0, 939.0, 887.0, 944.0, 664.0, 652.0, 619.0, - 1131.0, 993.0, 1290.0, 692.0, 538.0, 410.0, 599.0, 667.0, 1054.0, - 1211.0, 1256.0, 723.0, 421.0, 421.0, 388.0, 646.0, 526.0, 491.0, - 259.0, 344.0, 372.0, 524.0, 587.0, 650.0, 968.0, 968.0, 1183.0, - 813.0, 1188.0, 948.0, 1188.0, 875.0, 979.0, 693.0, 570.0, 893.0, - 1263.0, 1635.0, 1426.0, 986.0, 561.0, 361.0, 777.0, 919.0, 1343.0, - 1117.0, 1298.0, 1156.0, 911.0, 594.0, 386.0, 1044.0, 1297.0, 1247.0, - 576.0, 560.0, 615.0, 599.0, 335.0, 292.0, 586.0, 1096.0, 1526.0, - 1273.0, 884.0, 429.0, 544.0, 492.0, 452.0, 610.0, 595.0, 612.0, - 310.0, 424.0, 348.0, 375.0, 521.0, 569.0, 864.0, 664.0, 977.0, - 1149.0, 1215.0, 960.0, 518.0, 530.0, 468.0, 734.0, 943.0, 1197.0, - 1183.0, 801.0, 825.0, 815.0, 804.0, 612.0, 1072.0, 1056.0, 1150.0, - 450.0, 626.0, 521.0, 441.0, 309.0, 335.0, 579.0, 1097.0, 1159.0, - 1188.0, 650.0, 801.0, 679.0, 736.0, 604.0, 488.0, 445.0, 725.0, - 848.0, 834.0, 890.0, 897.0, 953.0, 615.0, 703.0, 672.0, 599.0, - 942.0, 881.0, 1064.0, 451.0, 679.0, 385.0, 536.0, 798.0, 1169.0, - 1359.0, 1077.0, 831.0, 766.0, 592.0, 1010.0, 854.0, 963.0, 897.0, - 1036.0, 1188.0, 1309.0, 955.0, 999.0, 1387.0, 1533.0, 1371.0, 490.0, - 558.0, 609.0, 597.0, 354.0, 715.0, 951.0, 1384.0, 1202.0, 990.0, - 601.0, 326.0, 296.0, 544.0, 680.0, 998.0, 823.0, 764.0, 482.0, - 470.0, 304.0, 325.0, 607.0, 619.0, 785.0, 571.0, 1123.0, 1690.0, - 1701.0, 1189.0, 431.0, 687.0, 1037.0, 1531.0, 
1335.0, 1229.0, 959.0, - 1001.0, 865.0, 815.0, 796.0, 735.0, 1251.0, 977.0, 1111.0, 473.0, - 639.0, 510.0, 454.0, 268.0, 287.0, 541.0, 1053.0, 1149.0, 1146.0, - 692.0, 795.0, 946.0, 1078.0, 1028.0, 638.0, 742.0, 974.0, 1086.0, - 1174.0, 1072.0, 1083.0, 751.0, 475.0, 534.0, 554.0, 523.0, 1063.0, - 1073.0, 1082.0, 396.0, 866.0, 986.0, 1232.0, 1160.0, 1157.0, 993.0, - 565.0, 376.0, 985.0, 783.0, 733.0, 401.0, 588.0, 1194.0, 1060.0, - 1078.0, 1014.0, 917.0, 1009.0, 1365.0, 1408.0, 1214.0, 718.0, 770.0, - 700.0, 314.0, 175.0, 871.0, 849.0, 963.0, 663.0, 931.0, 1008.0, - 760.0, 564.0, 685.0, 1062.0, 1194.0, 988.0, 567.0, 399.0, 334.0, - 245.0, 228.0, 578.0, 552.0, 519.0, 439.0, 1114.0, 1516.0, 1393.0, - 738.0, 365.0, 662.0, 997.0, 1437.0, 1639.0, 1703.0, 1470.0, 1150.0, - 872.0, 1242.0, 1145.0, 1254.0, 1566.0, 1361.0, 1123.0, 373.0, 344.0, - 439.0, 343.0, 394.0, 550.0, 758.0, 824.0, 574.0, 596.0, 968.0, - 1208.0, 1268.0, 898.0, 788.0, 456.0, 668.0, 984.0, 1300.0, 1492.0, - 1758.0, 1575.0, 1071.0, 545.0, 516.0, 484.0, 483.0, 1055.0, 1129.0, - 1022.0, 846.0, 1188.0, 1308.0, 1032.0, 588.0, 529.0, 341.0, 363.0, - 854.0, 1421.0, 1055.0, 973.0, 361.0, 738.0, 1258.0, 1336.0, 988.0, - 1288.0, 1069.0, 1511.0, 1191.0, 1408.0, 942.0, 816.0, 1044.0, 1002.0, - 816.0, 373.0, 1021.0, 1097.0, 1174.0, 668.0, 630.0, 796.0, 756.0, - 702.0, 778.0, 1544.0, 1504.0, 1299.0, 478.0, 375.0, 598.0, 536.0, - 695.0, 701.0, 900.0, 910.0, 844.0, 1028.0, 1202.0, 1033.0, 662.0, - 321.0, 530.0, 909.0, 1083.0, 1157.0, 1093.0, 1056.0, 980.0, 720.0, - 1508.0, 1647.0, 2024.0, 1556.0, 1405.0, 1063.0, 839.0, 560.0, 701.0, - 482.0, 609.0, 625.0, 675.0, 707.0, 844.0, 781.0, 1069.0, 674.0, - 996.0, 764.0, 890.0, 676.0, 716.0, 560.0, 650.0, 1018.0, 1600.0, - 1530.0, 938.0, 624.0, 460.0, 403.0, 753.0, 1155.0, 1366.0, 784.0, - 1006.0, 1124.0, 1544.0, 1328.0, 1364.0, 1060.0, 736.0, 816.0, 844.0, - 1016.0, 1128.0, 1047.0, 711.0, 595.0, 1049.0, 1302.0, 1196.0, 1252.0, - 1021.0, 973.0, 505.0, 698.0, 502.0, 958.0, 
1159.0, 1544.0, 1246.0, - 813.0, 598.0, 900.0, 1278.0, 1200.0, 1334.0, 1412.0, 1540.0, 1100.0, - 1000.0, 1404.0, 1438.0, 1107.0, 652.0, 529.0, 880.0, 786.0, 1053.0, - 921.0, 1030.0, 943.0, 893.0, 873.0, 616.0, 524.0, 427.0, 454.0, - 522.0, 398.0, 372.0, 608.0, 845.0, 958.0, 792.0, 557.0, 1283.0, - 1646.0, 2343.0, 2139.0, 2112.0, 1592.0, 1220.0, 660.0, 820.0, 538.0, - 766.0, 630.0, 659.0, 481.0, 744.0, 841.0, 1224.0, 757.0, 747.0, - 486.0, 532.0, 537.0, 497.0, 543.0, 654.0, 802.0, 1294.0, 1508.0, - 1279.0, 815.0, 393.0, 434.0, 938.0, 990.0, 1013.0, 421.0, 867.0, - 824.0, 1236.0, 892.0, 1180.0, 880.0, 744.0, 816.0, 867.0, 1312.0, - 1592.0, 1383.0, 739.0, 480.0, 770.0, 1023.0, 1070.0, 1338.0, 1254.0, - 1180.0, 744.0, 1150.0, 968.0, 1216.0, 1177.0, 1861.0, 1713.0, 1152.0, - 386.0, 716.0, 1277.0, 1293.0, 1321.0, 1308.0, 1460.0, 1068.0, 920.0, - 1156.0, 1150.0, 840.0, 663.0, 646.0, 972.0, 904.0, 1145.0, 873.0, - 1112.0, 1048.0, 960.0, 1114.0, 882.0, 892.0, 434.0, 402.0, 524.0, - 467.0, 445.0, 755.0, 822.0, 780.0, 492.0, 349.0, 757.0, 1173.0, - 1802.0, 1886.0, 1796.0, 1844.0, 1608.0, 1235.0, 1259.0, 1081.0, 1246.0, - 588.0, 655.0, 605.0, 1059.0, 1374.0, 1085.0, 572.0, 211.0, 550.0, - 651.0, 779.0, 559.0, 651.0, 472.0, 560.0, 494.0, 920.0, 883.0, - 905.0, 489.0, 546.0, 865.0, 1045.0, 954.0, 619.0, 501.0, 500.0, - 836.0, 910.0, 1322.0, 894.0, 964.0, 924.0, 435.0, 952.0, 1752.0, - 1635.0, 1047.0, 210.0, 252.0, 697.0, 1128.0, 1432.0, 1174.0, 776.0, - 520.0, 900.0, 957.0, 1409.0, 1102.0, 1841.0, 1286.0, 1281.0, 623.0, - 900.0, 1236.0, 1110.0, 1840.0, 1740.0, 2152.0, 1246.0, 1150.0, 848.0, - 871.0, 522.0, 590.0, 583.0, 618.0, 578.0, 860.0, 814.0, 821.0, - 741.0, 767.0, 1092.0, 900.0, 856.0, 594.0, 678.0, 992.0, 871.0, - 923.0, 1121.0, 984.0, 1180.0, 888.0, 831.0, 391.0, 582.0, 1014.0, - 1432.0, 1523.0, 1670.0, 1520.0, 1145.0, 1083.0, 928.0, 1047.0, 751.0, - 906.0, 884.0, 844.0, 968.0, 719.0, 578.0, 355.0, 526.0, 937.0, - 915.0, 1011.0, 805.0, 632.0, 536.0, 362.0, 886.0, 863.0, 
888.0, - 476.0, 743.0, 741.0, 907.0, 564.0, 709.0, 429.0, 592.0, 844.0, - 1038.0, 1058.0, 682.0, 736.0, 548.0, 442.0, 926.0, 1278.0, 1148.0, - 664.0, 315.0, 351.0, 729.0, 809.0, 981.0, 669.0, 676.0, 580.0, - 1016.0, 987.0, 1304.0, 969.0, 1344.0, 926.0, 1027.0, 833.0, 904.0, - 724.0, 686.0, 1504.0, 1568.0, 1658.0, 986.0, 852.0, 794.0, 643.0, - 818.0, 768.0, 1029.0, 898.0, 1082.0, 919.0, 841.0, 824.0, 1073.0, - 1079.0, 1176.0, 842.0, 828.0, 546.0, 650.0, 954.0, 1000.0, 1218.0, - 1822.0, 1596.0, 1742.0, 812.0, 788.0, 326.0, 269.0, 494.0, 484.0, - 733.0, 1144.0, 1486.0, 1408.0, 1178.0, 1050.0, 1084.0, 904.0, 824.0, - 1306.0, 1366.0, 1474.0, 954.0, 757.0, 532.0, 499.0, 816.0, 861.0, - 888.0, 1000.0, 852.0, 850.0, 492.0, 534.0, 544.0, 487.0, 499.0, - 1109.0, 1009.0, 1319.0, 695.0, 904.0, 514.0, 514.0, 962.0, 1330.0, - 1914.0, 1321.0, 1208.0, 716.0, 445.0, 401.0, 737.0, 776.0, 852.0, - 610.0, 550.0, 994.0, 909.0, 1093.0, 469.0, 472.0, 348.0, 512.0, - 515.0, 679.0, 530.0, 587.0, 380.0, 757.0, 1109.0, 1169.0, 765.0, - 567.0, 1260.0, 1258.0, 1304.0, 778.0, 807.0, 651.0, 450.0, 736.0, - 778.0, 1026.0, 831.0, 1339.0, 1287.0, 1242.0, 905.0, 1057.0, 1058.0, - 699.0, 334.0, 452.0, 634.0, 836.0, 983.0, 1407.0, 1607.0, 1752.0, - 1112.0, 1312.0, 912.0, 950.0, 410.0, 445.0, 812.0, 804.0, 913.0, - 666.0, 1010.0, 962.0, 974.0, 986.0, 842.0, 940.0, 625.0, 1025.0, - 1087.0, 1302.0, 1274.0, 1078.0, 879.0, 582.0, 864.0, 835.0, 850.0, - 904.0, 834.0, 940.0, 1044.0, 1054.0, 1136.0, 623.0, 973.0, 1575.0, - 1495.0, 1501.0, 797.0, 960.0, 610.0, 514.0, 1122.0, 1808.0, 2280.0, - 1767.0, 1088.0, 672.0, 457.0, 363.0, 339.0, 382.0, 572.0, 592.0, - 984.0, 1188.0, 992.0, 732.0, 408.0, 746.0, 569.0, 809.0, 563.0, - 582.0, 212.0, 196.0, 235.0, 705.0, 1421.0, 1573.0, 1130.0, 628.0, - 965.0, 986.0, 713.0, 353.0, 402.0, 501.0, 393.0, 854.0, 1006.0, - 1351.0, 927.0, 1467.0, 1101.0, 1074.0, 690.0, 892.0, 1177.0, 819.0, - 610.0, 532.0, 553.0, 540.0, 871.0, 1456.0, 1711.0, 1584.0, 1088.0, - 980.0, 676.0, 
580.0, 462.0, 710.0, 1058.0, 1062.0, 648.0, 482.0, - 722.0, 1066.0, 1284.0, 1358.0, 1072.0, 804.0, 517.0, 911.0, 1047.0, - 1321.0, 1291.0, 1135.0, 973.0, 682.0, 506.0, 465.0, 398.0, 854.0, - 750.0, 992.0, 1168.0, 1089.0, 1053.0, 503.0, 890.0, 1352.0, 1232.0, - 1256.0, 638.0, 732.0, 446.0, 316.0, 904.0, 1378.0, 2272.0, 1895.0, - 1376.0, 668.0, 841.0, 747.0, 443.0, 632.0, 1152.0, 1076.0, 1292.0, - 1084.0, 945.0, 637.0, 393.0, 1074.0, 929.0, 1159.0, 585.0, 747.0, - 580.0, 504.0, 376.0, 541.0, 1229.0, 1335.0, 1000.0, 446.0, 535.0, - 528.0, 385.0, 171.0, 418.0, 459.0, 451.0, 529.0, 492.0, 595.0, - 348.0, 873.0, 886.0, 819.0, 531.0, 956.0, 1377.0, 1289.0, 1258.0, - 1120.0, 1071.0, 704.0, 1103.0, 1848.0, 1783.0, 1576.0, 860.0, 852.0, - 724.0, 611.0, 529.0, 623.0, 900.0, 1140.0, 754.0, 648.0, 624.0, - 1020.0, 1258.0, 1256.0, 1010.0, 734.0, 627.0, 441.0, 701.0, 943.0, - 1243.0, 1023.0, 879.0, 734.0, 522.0, 523.0, 760.0, 884.0, 754.0, - 658.0, 1110.0, 1057.0, 1011.0, 485.0, 918.0, 1066.0, 983.0, 793.0, - 599.0, 924.0, 796.0, 620.0, 536.0, 828.0, 1252.0, 1356.0, 1116.0, - 641.0, 1220.0, 1222.0, 950.0, 842.0, 1508.0, 1456.0, 1548.0, 1068.0, - 887.0, 915.0, 791.0, 1512.0, 1219.0, 1463.0, 913.0, 812.0, 635.0, - 861.0, 1212.0, 1393.0, 1669.0, 1622.0, 1260.0, 670.0, 630.0, 636.0, - 539.0, 131.0, 423.0, 554.0, 552.0, 490.0, 383.0, 516.0, 352.0, - 555.0, 502.0, 448.0, 332.0, 1176.0, 1616.0, 1660.0, 1376.0, 1100.0, - 1199.0, 732.0, 1260.0, 1439.0, 1402.0, 1238.0, 930.0, 1254.0, 1010.0, - 1167.0, 753.0, 741.0, 472.0, 698.0, 552.0, 749.0, 775.0, 925.0, - 1038.0, 1072.0, 958.0, 1150.0, 1014.0, 972.0, 916.0, 751.0, 691.0, - 615.0, 976.0, 1179.0, 1079.0, 889.0, 1078.0, 884.0, 860.0, 564.0, - 660.0, 723.0, 965.0, 799.0, 908.0, 736.0, 790.0, 491.0, 389.0, - 819.0, 818.0, 718.0, 309.0, 355.0, 939.0, 960.0, 1036.0, 629.0, - 1258.0, 1170.0, 1264.0, 1028.0, 1708.0, 1796.0, 1468.0, 874.0, 454.0, - 786.0, 682.0, 1158.0, 1090.0, 1264.0, 980.0, 1108.0, 1168.0, 1246.0, - 1430.0, 1286.0, 1115.0, 
797.0, 698.0, 737.0, 569.0, 656.0, 406.0, - 312.0, 544.0, 653.0, 749.0, 603.0, 501.0, 558.0, 458.0, 611.0, - 692.0, 762.0, 926.0, 1477.0, 1463.0, 1571.0, 1362.0, 1295.0, 1199.0, - 847.0, 1003.0, 1041.0, 819.0, 1124.0, 778.0, 1200.0, 880.0, 1129.0, - 698.0, 490.0, 218.0, 533.0, 753.0, 967.0, 987.0, 892.0, 711.0, - 656.0, 667.0, 1081.0, 860.0, 832.0, 972.0, 875.0, 823.0, 795.0, - 969.0, 1146.0, 876.0, 753.0, 980.0, 765.0, 1163.0, 1027.0, 1514.0, - 1306.0, 1396.0, 848.0, 1216.0, 1054.0, 1385.0, 787.0, 671.0, 898.0, - 896.0, 842.0, 241.0, 613.0, 771.0, 1264.0, 1064.0, 1249.0, 1168.0, - 1062.0, 1236.0, 1076.0, 1314.0, 1442.0, 1258.0, 1178.0, 1020.0, 1158.0, - 967.0, 737.0, 663.0, 1276.0, 1314.0, 1378.0, 910.0, 1062.0, 1303.0, - 1319.0, 1132.0, 923.0, 816.0, 1091.0, 1179.0, 1182.0, 792.0, 356.0, - 460.0, 445.0, 552.0, 755.0, 810.0, 930.0, 589.0, 557.0, 506.0, - 1072.0, 1396.0, 1597.0, 1051.0, 1375.0, 1128.0, 1087.0, 609.0, 711.0, - 901.0, 729.0, 499.0, 356.0, 362.0, 771.0, 752.0, 1200.0, 762.0, - 1135.0, 676.0, 751.0, 519.0, 525.0, 751.0, 624.0, 663.0, 643.0, - 600.0, 882.0, 708.0, 850.0, 794.0, 969.0, 842.0, 1022.0, 896.0, - 1493.0, 1293.0, 1151.0, 755.0, 414.0, 788.0, 882.0, 1371.0, 1342.0, - 1444.0, 1076.0, 1232.0, 950.0, 1219.0, 689.0, 571.0, 508.0, 518.0, - 522.0, 235.0, 587.0, 759.0, 1452.0, 1156.0, 1496.0, 687.0, 769.0, - 914.0, 1240.0, 1074.0, 1160.0, 922.0, 994.0, 936.0, 832.0, 697.0, - 447.0, 517.0, 1116.0, 1075.0, 1248.0, 827.0, 890.0, 700.0, 785.0, - 646.0, 695.0, 543.0, 968.0, 1035.0, 998.0, 632.0, 377.0, 422.0, - 325.0, 476.0, 758.0, 853.0, 1226.0, 845.0, 733.0, 727.0, 1450.0, - 1858.0, 1389.0, 663.0, 975.0, 1296.0, 1483.0, 985.0, 1235.0, 1265.0, - 1153.0, 517.0, 354.0, 694.0, 687.0, 668.0, 962.0, 950.0, 1317.0, - 836.0, 1067.0, 754.0, 617.0, 815.0, 764.0, 704.0, 267.0, 228.0, - 278.0, 553.0, 786.0, 1102.0, 1377.0, 1146.0, 1154.0, 680.0, 1079.0, - 1063.0, 1027.0, 793.0, 626.0, 1158.0, 1488.0, 1717.0, 1656.0, 1150.0, - 781.0, 570.0, 638.0, 1106.0, 860.0, 
1272.0, 849.0, 1028.0, 744.0, - 702.0, 806.0, 621.0, 1413.0, 1161.0, 1666.0, 1165.0, 1147.0, 798.0, - 1428.0, 1202.0, 1200.0, 537.0, 630.0, 843.0, 756.0, 638.0, 393.0, - 343.0, 860.0, 795.0, 768.0, 643.0, 818.0, 884.0, 733.0, 687.0, - 756.0, 820.0, 981.0, 1148.0, 963.0, 976.0, 565.0, 682.0, 380.0, - 467.0, 505.0, 737.0, 1028.0, 1215.0, 1051.0, 1107.0, 1226.0, 1326.0, - 1036.0, 580.0, 792.0, 1138.0, 1684.0, 1416.0, 1526.0, 1184.0, 1060.0, - 518.0, 284.0, 690.0, 625.0, 572.0, 902.0, 953.0, 1296.0, 808.0, - 920.0, 563.0, 375.0, 537.0, 781.0, 777.0, 583.0, 339.0, 370.0, - 501.0, 742.0, 1115.0, 1422.0, 1197.0, 963.0, 689.0, 1064.0, 1232.0, - 1025.0, 903.0, 985.0, 1541.0, 1495.0, 1019.0, 1110.0, 873.0, 1062.0, - 354.0, 857.0, 982.0, 1076.0, 1108.0, 940.0, 1294.0, 1066.0, 1036.0, - 579.0, 240.0, 603.0, 650.0, 1009.0, 892.0, 953.0, 832.0, 1646.0, - 1400.0, 1346.0, 347.0, 372.0, 307.0, 466.0, 417.0, 430.0, 686.0, - 737.0, 732.0, 254.0, 476.0, 635.0, 775.0, 1132.0, 1140.0, 1538.0, - 1146.0, 973.0, 612.0, 695.0, 892.0, 901.0, 794.0, 606.0, 496.0, - 337.0, 615.0, 780.0, 1280.0, 1196.0, 1337.0, 1104.0, 932.0, 582.0, - 606.0, 557.0, 1075.0, 1443.0, 1528.0, 1448.0, 872.0, 1072.0, 706.0, - 661.0, 775.0, 1099.0, 1032.0, 1150.0, 816.0, 776.0, 792.0, 936.0, - 1005.0, 669.0, 670.0, 894.0, 694.0, 564.0, 330.0, 737.0, 921.0, - 1474.0, 1623.0, 1574.0, 1014.0, 615.0, 629.0, 641.0, 988.0, 761.0, - 1065.0, 1105.0, 1658.0, 1654.0, 1448.0, 1470.0, 1103.0, 950.0, 374.0, - 1035.0, 1154.0, 1220.0, 1102.0, 1122.0, 1412.0, 1316.0, 1484.0, 1299.0, - 868.0, 580.0, 859.0, 1103.0, 878.0, 775.0, 628.0, 1584.0, 1452.0, - 1764.0, 602.0, 689.0, 271.0, 473.0, 414.0, 892.0, 1016.0, 1357.0, - 929.0, 634.0, 777.0, 1191.0, 1274.0, 1398.0, 1254.0, 1552.0, 1312.0, - 1235.0, 1393.0, 1337.0, 1291.0, 882.0, 1178.0, 1201.0, 1132.0, 547.0, - 635.0, 452.0, 1002.0, 916.0, 1345.0, 932.0, 726.0, 420.0, 607.0, - 648.0, 850.0, 1019.0, 960.0, 778.0, 450.0, 733.0, 729.0, 548.0, - 301.0, 687.0, 880.0, 984.0, 800.0, 626.0, 
1006.0, 910.0, 924.0, - 658.0, 454.0, 678.0, 444.0, 768.0, 542.0, 1025.0, 834.0, 1324.0, - 1309.0, 1206.0, 698.0, 399.0, 745.0, 755.0, 886.0, 567.0, 880.0, - 912.0, 1168.0, 1007.0, 1449.0, 1548.0, 1345.0, 1127.0, 866.0, 1484.0, - 1116.0, 1079.0, 1034.0, 1470.0, 1544.0, 1268.0, 1120.0, 1377.0, 1111.0, - 731.0, 994.0, 1121.0, 318.0, 479.0, 538.0, 1270.0, 1150.0, 1580.0, - 823.0, 939.0, 398.0, 581.0, 441.0, 1194.0, 1336.0, 1609.0, 969.0, - 686.0, 586.0, 964.0, 911.0, 1289.0, 1045.0, 1515.0, 1088.0, 1102.0, - 1336.0, 1374.0, 1143.0, 586.0, 1192.0, 1545.0, 1427.0, 786.0, 766.0, - 633.0, 751.0, 534.0, 966.0, 1029.0, 978.0, 720.0, 659.0, 696.0, - 722.0, 474.0, 355.0, 267.0, 305.0, 726.0, 722.0, 694.0, 387.0, - 804.0, 959.0, 1125.0, 948.0, 878.0, 1144.0, 1188.0, 1087.0, 1085.0, - 745.0, 778.0, 502.0, 746.0, 720.0, 986.0, 916.0, 1396.0, 1142.0, - 988.0, 580.0, 491.0, 651.0, 573.0, 716.0, 729.0, 958.0, 748.0, - 512.0, 387.0, 1067.0, 1295.0, 1491.0, 1439.0, 1534.0, 1353.0, 1108.0, - 864.0, 1197.0, 1262.0, 1174.0, 1036.0, 976.0, 1454.0, 1146.0, 867.0, - 1095.0, 1216.0, 727.0, 631.0, 822.0, 974.0, 883.0, 1221.0, 804.0, - 876.0, 279.0, 443.0, 384.0, 1417.0, 1258.0, 1920.0, 1050.0, 1162.0, - 787.0, 1435.0, 1411.0, 1537.0, 1023.0, 1165.0, 854.0, 778.0, 1164.0, - 1114.0, 1147.0, 505.0, 1395.0, 1998.0, 2171.0, 1329.0, 673.0, 499.0, - 501.0, 646.0, 1241.0, 1285.0, 1034.0, 577.0, 423.0, 561.0, 518.0, - 491.0, 345.0, 285.0, 361.0, 472.0, 500.0, 497.0, 378.0, 345.0, - 461.0, 687.0, 1088.0, 1126.0, 1028.0, 928.0, 683.0, 881.0, 628.0, - 857.0, 741.0, 1018.0, 772.0, 982.0, 720.0, 836.0, 454.0, 361.0, - 405.0, 521.0, 740.0, 728.0, 706.0, 754.0, 729.0, 639.0, 542.0, - 505.0, 1345.0, 1367.0, 1855.0, 1585.0, 1840.0, 1271.0, 1389.0, 991.0, - 1588.0, 1204.0, 1166.0, 820.0, 936.0, 1630.0, 1384.0, 1156.0, 936.0, - 1222.0, 1180.0, 710.0, 1004.0, 756.0, 773.0, 461.0, 533.0, 689.0, - 569.0, 464.0, 725.0, 1295.0, 1396.0, 1376.0, 1068.0, 1070.0, 711.0, - 1037.0, 979.0, 1177.0, 693.0, 909.0, 710.0, 
558.0, 488.0, 479.0, - 713.0, 496.0, 1193.0, 1625.0, 1719.0, 1085.0, 587.0, 503.0, 492.0, - 1087.0, 1548.0, 1849.0, 1152.0, 1035.0, 656.0, 756.0, 437.0, 567.0, - 593.0, 715.0, 479.0, 493.0, 403.0, 550.0, 406.0, 317.0, 211.0, - 343.0, 592.0, 804.0, 718.0, 770.0, 555.0, 765.0, 554.0, 1003.0, - 1003.0, 1099.0, 1079.0, 1079.0, 998.0, 704.0, 396.0, 713.0, 693.0, - 775.0, 518.0, 498.0, 388.0, 530.0, 578.0, 862.0, 752.0, 804.0, - 1100.0, 1001.0, 1363.0, 1269.0, 1530.0, 980.0, 1152.0, 920.0, 1214.0, - 664.0, 570.0, 572.0, 996.0, 1462.0, 1120.0, 818.0, 614.0, 838.0, - 1289.0, 651.0, 831.0, 766.0, 1203.0, 929.0, 872.0, 588.0, 884.0, - 866.0, 1235.0, 1087.0, 1114.0, 1112.0, 1266.0, 1202.0, 839.0, 1031.0, - 1047.0, 1230.0, 638.0, 1026.0, 924.0, 1078.0, 704.0, 621.0, 538.0, - 665.0, 950.0, 1245.0, 1256.0, 1064.0, 561.0, 429.0, 213.0, 1142.0, - 1622.0, 2024.0, 1208.0, 1139.0, 766.0, 814.0, 443.0, 660.0, 956.0, - 1220.0, 969.0, 641.0, 327.0, 420.0, 286.0, 232.0, 198.0, 708.0, - 884.0, 953.0, 573.0, 669.0, 553.0, 459.0, 592.0, 1467.0, 1681.0, - 1481.0, 1089.0, 1143.0, 968.0, 696.0, 422.0, 1009.0, 869.0, 967.0, - 518.0, 566.0, 510.0, 660.0, 688.0, 1330.0, 1392.0, 1532.0, 1488.0, - 1196.0, 1206.0, 784.0, 872.0, 577.0, 749.0, 803.0, 1180.0, 854.0, - 642.0, 348.0, 1252.0, 1806.0, 1686.0, 884.0, 792.0, 1004.0, 1237.0, - 859.0, 863.0, 674.0, 1170.0, 1080.0, 1323.0, 973.0, 1337.0, 980.0, - 1432.0, 1012.0, 1074.0, 484.0, 794.0, 1170.0, 1127.0, 1089.0, 613.0, - 554.0, 175.0, 453.0, 694.0, 1241.0, 997.0, 889.0, 414.0, 551.0, - 768.0, 903.0, 968.0, 898.0, 853.0, 631.0, 458.0, 847.0, 1183.0, - 1580.0, 1182.0, 1456.0, 1032.0, 909.0, 291.0, 471.0, 954.0, 1384.0, - 1563.0, 1123.0, 653.0, 280.0, 282.0, 314.0, 315.0, 665.0, 607.0, - 1013.0, 666.0, 906.0, 654.0, 627.0, 748.0, 1219.0, 1356.0, 1224.0, - 1076.0, 1171.0, 1076.0, 820.0, 521.0, 1031.0, 991.0, 1045.0, 737.0, - 517.0, 560.0, 530.0, 778.0, 1506.0, 1460.0, 1514.0, 874.0, 778.0, - 556.0, 648.0, 648.0, 643.0, 511.0, 567.0, 742.0, 644.0, 434.0, 
- 286.0, 892.0, 1410.0, 1328.0, 756.0, 536.0, 566.0, 1297.0, 1081.0, - 697.0, 420.0, 1052.0, 1284.0, 1587.0, 1047.0, 1307.0, 904.0, 1232.0, - 765.0, 706.0, 246.0, 629.0, 1394.0, 1443.0, 1161.0, 407.0, 336.0, - 203.0, 402.0, 509.0, 1036.0, 872.0, 881.0, 890.0, 1032.0, 1058.0, - 747.0, 926.0, 1254.0, 1213.0, 866.0, 477.0, 578.0, 755.0, 897.0, - 1003.0, 1321.0, 1322.0, 972.0, 495.0, 785.0, 1131.0, 1396.0, 1345.0, - 1301.0, 953.0, 940.0, 720.0, 766.0, 421.0, 825.0, 967.0, 1453.0, - 1166.0, 1174.0, 774.0, 581.0, 929.0, 1192.0, 1427.0, 1093.0, 877.0, - 1058.0, 1002.0, 1002.0, 515.0, 589.0, 597.0, 999.0, 1207.0, 973.0, - 686.0, 482.0, 672.0, 1244.0, 1464.0, 1376.0, 872.0, 592.0, 569.0, - 493.0, 531.0, 513.0, 527.0, 653.0, 712.0, 652.0, 354.0, 438.0, - 1020.0, 1786.0, 1800.0, 1249.0, 821.0, 729.0, 1404.0, 1292.0, 1192.0, - 684.0, 1368.0, 1180.0, 1596.0, 1056.0, 1028.0, 440.0, 684.0, 638.0, - 646.0, 186.0, 334.0, 1182.0, 1250.0, 1162.0, 454.0, 862.0, 799.0, - 652.0, 203.0, 466.0, 868.0, 955.0, 1233.0, 1287.0, 1603.0, 1735.0, - 1576.0, 1404.0, 1368.0, 1091.0, 933.0, 407.0, 588.0, 589.0, 767.0, - 949.0, 1089.0, 797.0, 408.0, 863.0, 1007.0, 1128.0, 910.0, 1051.0, - 1003.0, 985.0, 878.0, 814.0, 406.0, 300.0, 468.0, 1112.0, 1254.0, - 1550.0, 1146.0, 1346.0, 1486.0, 1678.0, 1571.0, 1045.0, 757.0, 914.0, - 960.0, 928.0, 425.0, 303.0, 415.0, 761.0, 1149.0, 907.0, 724.0, - 570.0, 904.0, 1064.0, 1041.0, 709.0, 641.0, 550.0, 939.0, 863.0, - 1009.0, 645.0, 683.0, 622.0, 691.0, 547.0, 360.0, 474.0, 628.0, - 1276.0, 1308.0, 1537.0, 1014.0, 830.0, 1188.0, 1320.0, 1210.0, 1056.0, - 1548.0, 1592.0, 1666.0, 922.0, 1064.0, 868.0, 1076.0, 712.0, 722.0, - 326.0, 382.0, 679.0, 775.0, 699.0, 530.0, 1008.0, 950.0, 734.0, - 212.0, 177.0, 597.0, 639.0, 1182.0, 1226.0, 1530.0, 1438.0, 1208.0, - 1086.0, 1124.0, 929.0, 777.0, 643.0, 856.0, 785.0, 656.0, 924.0, - 1176.0, 984.0, 479.0, 970.0, 1167.0, 1220.0, 476.0, 509.0, 757.0, - 1189.0, 1170.0, 858.0, 437.0, 268.0, 472.0, 672.0, 976.0, 1236.0, - 
1103.0, 1324.0, 1495.0, 1857.0, 1595.0, 1046.0, 773.0, 601.0, 827.0, - 741.0, 1015.0, 785.0, 832.0, 1086.0, 1259.0, 1121.0, 623.0, 668.0, - 966.0, 980.0, 893.0, 587.0, 581.0, 411.0, 804.0, 926.0, 1021.0, - 631.0, 499.0, 612.0, 831.0, 703.0, 608.0, 598.0, 652.0, 781.0, - 792.0, 1153.0, 1309.0, 1216.0, 748.0, 1140.0, 1270.0, 1144.0, 1720.0, - 1682.0, 1597.0, 616.0, 1106.0, 1179.0, 1352.0, 721.0, 740.0, 470.0, - 411.0, 441.0, 469.0, 529.0, 504.0, 994.0, 908.0, 719.0, 186.0, - 163.0, 601.0, 1126.0, 1237.0, 1234.0, 1158.0, 1458.0, 1134.0, 682.0, - 876.0, 784.0, 880.0, 458.0, 634.0, 586.0, 398.0, 592.0, 673.0, - 707.0, 517.0, 692.0, 902.0, 772.0, 480.0, 227.0, 947.0, 1019.0, - 1224.0, 448.0, 344.0, 221.0, 235.0, 522.0, 579.0, 1153.0, 1002.0, - 1457.0, 1292.0, 1648.0, 1627.0, 1251.0, 822.0, 228.0, 421.0, 403.0, - 873.0, 714.0, 1225.0, 1115.0, 1245.0, 841.0, 724.0, 825.0, 1016.0, - 919.0, 596.0, 325.0, 407.0, 407.0, 835.0, 1053.0, 1410.0, 1323.0, - 1239.0, 913.0, 948.0, 616.0, 844.0, 438.0, 502.0, 227.0, 466.0, - 1110.0, 1470.0, 1276.0, 518.0, 1014.0, 908.0, 878.0, 950.0, 1228.0, - 1169.0, 604.0, 1072.0, 1373.0, 1722.0, 1093.0, 995.0, 883.0, 926.0, - 985.0, 621.0, 587.0, 658.0, 674.0, 852.0, 797.0, 707.0, 462.0, - 226.0, 1124.0, 1191.0, 1160.0, 880.0, 758.0, 896.0, 310.0, 489.0, - 583.0, 717.0, 739.0, 623.0, 879.0, 686.0, 930.0, 702.0, 746.0, - 738.0, 672.0, 1008.0, 798.0, 864.0, 488.0, 1084.0, 958.0, 1126.0, - 396.0, 319.0, 284.0, 288.0, 463.0, 599.0, 873.0, 825.0, 757.0, - 608.0, 600.0, 827.0, 781.0, 706.0, 344.0, 535.0, 553.0, 943.0, - 721.0, 1136.0, 1039.0, 1194.0, 884.0, 626.0, 549.0, 438.0, 411.0, - 307.0, 292.0, 656.0, 675.0, 1065.0, 939.0, 1242.0, 1242.0, 1114.0, - 1057.0, 1125.0, 943.0, 924.0, 442.0, 480.0, 527.0, 715.0, 1135.0, - 1206.0, 1075.0, 304.0, 720.0, 984.0, 932.0, 961.0, 651.0, 820.0, - 570.0, 870.0, 936.0, 1177.0, 868.0, 759.0, 855.0, 1082.0, 1656.0, - 1164.0, 1017.0, 569.0, 493.0, 760.0, 1044.0, 1128.0, 840.0, 360.0, - 1138.0, 1156.0, 1106.0, 808.0, 
752.0, 806.0, 296.0, 431.0, 602.0, - 688.0, 832.0, 839.0, 1090.0, 786.0, 650.0, 595.0, 686.0, 916.0, - 614.0, 670.0, 502.0, 1126.0, 1090.0, 1874.0, 1478.0, 1362.0, 764.0, - 519.0, 739.0, 475.0, 645.0, 545.0, 761.0, 686.0, 727.0, 617.0, - 753.0, 997.0, 806.0, 644.0, 337.0, 516.0, 506.0, 476.0, 342.0, - 748.0, 667.0, 841.0, 519.0, 549.0, 549.0, 538.0, 527.0, 475.0, - 504.0, 1170.0, 1132.0, 1542.0, 954.0, 1212.0, 1060.0, 1125.0, 1090.0, - 1162.0, 1087.0, 1004.0, 660.0, 660.0, 774.0, 831.0, 1223.0, 891.0, - 805.0, 710.0, 1188.0, 1444.0, 1208.0, 803.0, 563.0, 583.0, 696.0, - 646.0, 633.0, 873.0, 864.0, 719.0, 823.0, 1094.0, 1652.0, 1424.0, - 1353.0, 1089.0, 917.0, 1058.0, 1324.0, 1281.0, 1355.0, 823.0, 1072.0, - 685.0, 647.0, 747.0, 860.0, 920.0, 460.0, 327.0, 504.0, 614.0, - 906.0, 945.0, 1156.0, 815.0, 677.0, 682.0, 954.0, 934.0, 552.0, - 484.0, 650.0, 1336.0, 1223.0, 1435.0, 989.0, 950.0, 811.0, 607.0, - 694.0, 401.0, 378.0, 350.0, 504.0, 617.0, 686.0, 944.0, 1064.0, - 951.0, 509.0, 308.0, 347.0, 801.0, 1018.0, 1016.0, 543.0, 391.0, - 283.0, 348.0, 288.0, 400.0, 536.0, 619.0, 577.0, 519.0, 710.0, - 1380.0, 1380.0, 1638.0, 898.0, 940.0, 396.0, 557.0, 981.0, 1277.0, - 1534.0, 1144.0, 948.0, 606.0, 752.0, 1077.0, 1347.0, 999.0, 660.0, - 939.0, 1337.0, 2070.0, 1730.0, 1307.0, 594.0, 456.0, 539.0, 716.0, - 663.0, 797.0, 775.0, 708.0, 854.0, 888.0, 1288.0, 1250.0, 1183.0, - 767.0, 605.0, 506.0, 769.0, 849.0, 1259.0, 1040.0, 777.0, 380.0, - 332.0, 477.0, 558.0, 590.0, 666.0, 488.0, 569.0, 441.0, 751.0, - 1066.0, 977.0, 725.0, 363.0, 810.0, 1182.0, 1174.0, 662.0, 265.0, - 371.0, 1099.0, 1177.0, 1465.0, 1053.0, 972.0, 1195.0, 991.0, 1037.0, - 421.0, 475.0, 356.0, 359.0, 256.0, 549.0, 819.0, 934.0, 666.0, - 490.0, 401.0, 531.0, 855.0, 1067.0, 1026.0, 536.0, 401.0, 283.0, - 421.0, 283.0, 456.0, 836.0, 978.0, 902.0, 708.0, 923.0, 1415.0, - 1199.0, 1098.0, 594.0, 564.0, 346.0, 435.0, 685.0, 1115.0, 1562.0, - 1628.0, 1348.0, 790.0, 406.0, 712.0, 1204.0, 1140.0, 976.0, 1421.0, - 
1483.0, 1758.0, 1262.0, 1026.0, 677.0, 365.0, 338.0, 893.0, 1029.0, - 1198.0, 778.0, 646.0, 704.0, 848.0, 1052.0, 1202.0, 1014.0, 894.0, - 856.0, 766.0, 709.0, 533.0, 963.0, 1072.0, 927.0, 554.0, 586.0, - 651.0, 666.0, 520.0, 648.0, 464.0, 606.0, 612.0, 604.0, 836.0, - 572.0, 798.0, 445.0, 689.0, 829.0, 912.0, 635.0, 462.0, 464.0, - 739.0, 585.0, 533.0, 461.0, 524.0, 843.0, 753.0, 674.0, 434.0, - 494.0, 649.0, 666.0, 521.0, 481.0, 681.0, 685.0, 586.0, 618.0, - 634.0, 761.0, 749.0, 995.0, 964.0, 594.0, 373.0, 251.0, 351.0, - 385.0, 506.0, 844.0, 736.0, 660.0, 376.0, 857.0, 1389.0, 1219.0, - 846.0, 514.0, 668.0, 568.0, 477.0, 557.0, 1165.0, 1537.0, 1603.0, - 1115.0, 600.0, 286.0, 688.0, 1322.0, 1291.0, 1031.0, 1141.0, 965.0, - 1230.0, 978.0, 952.0, 491.0, 335.0, 570.0, 1171.0, 1301.0, 1006.0, - 596.0, 478.0, 608.0, 632.0, 942.0, 996.0, 718.0, 429.0, 371.0, - 463.0, 415.0, 471.0, 675.0, 846.0, 753.0, 713.0, 769.0, 828.0, - 738.0, 522.0, 944.0, 820.0, 984.0, 606.0, 562.0, 1108.0, 924.0, - 1176.0, 439.0, 531.0, 529.0, 886.0, 933.0, 861.0, 527.0, 564.0, - 428.0, 469.0, 403.0, 515.0, 972.0, 948.0, 850.0, 511.0, 533.0, - 727.0, 672.0, 519.0, 403.0, 389.0, 473.0, 588.0, 869.0, 788.0, - 725.0, 408.0, 463.0, 448.0, 410.0, 355.0, 442.0, 506.0, 618.0, - 411.0, 787.0, 614.0, 777.0, 705.0, 1019.0, 1379.0, 979.0, 730.0, - 406.0, 780.0, 737.0, 974.0, 662.0, 1157.0, 1193.0, 1340.0, 973.0, - 532.0, 321.0, 246.0, 826.0, 851.0, 1045.0, 1114.0, 950.0, 606.0, - 546.0, 582.0, 566.0, 466.0, 637.0, 962.0, 1008.0, 729.0, 378.0, - 304.0, 199.0, 317.0, 903.0, 1150.0, 959.0, 458.0, 411.0, 646.0, - 625.0, 570.0, 550.0, 624.0, 648.0, 656.0, 830.0, 812.0, 708.0, - 460.0, 788.0, 710.0, 664.0, 410.0, 402.0, 966.0, 910.0, 1010.0, - 651.0, 581.0, 478.0, 484.0, 727.0, 861.0, 624.0, 433.0, 732.0, - 957.0, 1067.0, 707.0, 876.0, 972.0, 925.0, 687.0, 645.0, 838.0, - 1034.0, 986.0, 748.0, 586.0, 627.0, 1129.0, 1286.0, 1291.0, 897.0, - 556.0, 440.0, 518.0, 578.0, 494.0, 572.0, 506.0, 594.0, 305.0, - 577.0, 
463.0, 662.0, 622.0, 860.0, 1056.0, 904.0, 676.0, 502.0, - 888.0, 802.0, 1097.0, 697.0, 1067.0, 887.0, 1022.0, 783.0, 590.0, - 401.0, 372.0, 491.0, 574.0, 562.0, 750.0, 742.0, 572.0, 650.0, - 708.0, 940.0, 828.0, 1236.0, 1159.0, 1403.0, 901.0, 528.0, 300.0, - 235.0, 194.0, 754.0, 863.0, 858.0, 317.0, 200.0, 398.0, 561.0, - 651.0, 621.0, 449.0, 456.0, 698.0, 854.0, 918.0, 606.0, 548.0, - 814.0, 816.0, 742.0, 316.0, 246.0, 648.0, 782.0, 928.0, 988.0, - 948.0, 799.0, 481.0, 694.0, 782.0, 614.0, 521.0, 1036.0, 1157.0, - 1291.0, 767.0, 924.0, 864.0, 942.0, 1080.0, 1042.0, 1142.0, 970.0, - 1370.0, 1056.0, 1306.0, 992.0, 1523.0, 1134.0, 1130.0, 777.0, 589.0, - 420.0, 588.0, 614.0, 523.0, 489.0, 504.0, 604.0, 331.0, 520.0, - 408.0, 687.0, 738.0, 712.0, 960.0, 1006.0, 1292.0, 918.0, 1012.0, - 748.0, 1178.0, 876.0, 923.0, 639.0, 741.0, 690.0, 590.0, 488.0, - 572.0, 431.0, 425.0, 629.0, 522.0, 1010.0, 936.0, 1036.0, 659.0, - 951.0, 822.0, 921.0, 800.0, 1002.0, 796.0, 421.0, 334.0, 317.0, - 346.0, 644.0, 647.0, 573.0, 229.0, 670.0, 963.0, 1095.0, 663.0, - 461.0, 319.0, 564.0, 870.0, 1064.0, 884.0, 518.0, 482.0, 880.0, - 1260.0, 1210.0, 740.0, 462.0, 390.0, 592.0, 650.0, 1096.0, 1006.0, - 797.0, 483.0, 866.0, 1273.0, 1087.0, 770.0, 1023.0, 1127.0, 1263.0, - 658.0, 770.0, 780.0, 894.0, 1174.0, 1040.0, 1142.0, 1014.0, 1592.0, - 1311.0, 1503.0, 1055.0, 1377.0, 1243.0, 1459.0, 1320.0, 808.0, 368.0, - 494.0, 541.0, 520.0, 608.0, 621.0, 1050.0, 1060.0, 1677.0, 1229.0, - 1069.0, 462.0, 443.0, 747.0, 1051.0, 1344.0, 1206.0, 1000.0, 825.0, - 567.0, 431.0, 367.0, 323.0, 560.0, 510.0, 884.0, 789.0, 920.0, - 459.0, 595.0, 773.0, 414.0, 762.0, 860.0, 843.0, 886.0, 1180.0, - 1174.0, 1039.0, 1331.0, 1492.0, 1352.0, 670.0, 810.0, 574.0, 657.0, - 539.0, 528.0, 651.0, 895.0, 1376.0, 1334.0, 1138.0, 674.0, 568.0, - 372.0, 1062.0, 1332.0, 1418.0, 812.0, 574.0, 616.0, 1062.0, 1294.0, - 1302.0, 778.0, 680.0, 502.0, 660.0, 562.0, 857.0, 751.0, 595.0, - 536.0, 879.0, 1237.0, 1057.0, 916.0, 789.0, 
685.0, 605.0, 364.0, - 454.0, 464.0, 716.0, 1222.0, 1104.0, 1054.0, 678.0, 1180.0, 917.0, - 1251.0, 853.0, 1359.0, 1185.0, 1301.0, 918.0, 570.0, 264.0, 668.0, - 639.0, 616.0, 510.0, 605.0, 1172.0, 1130.0, 1519.0, 969.0, 871.0, - 392.0, 395.0, 631.0, 1025.0, 1330.0, 1163.0, 869.0, 791.0, 534.0, - 672.0, 443.0, 430.0, 226.0, 240.0, 597.0, 807.0, 935.0, 621.0, - 1035.0, 1270.0, 794.0, 1214.0, 1219.0, 804.0, 722.0, 733.0, 691.0, - 469.0, 755.0, 720.0, 734.0, 484.0, 1036.0, 946.0, 1360.0, 1044.0, - 835.0, 648.0, 906.0, 1418.0, 1777.0, 1551.0, 1129.0, 677.0, 555.0, - 1158.0, 1046.0, 1090.0, 456.0, 1010.0, 974.0, 1872.0, 1569.0, 1574.0, - 714.0, 1155.0, 1026.0, 1242.0, 516.0, 553.0, 491.0, 499.0, 695.0, - 973.0, 1371.0, 1258.0, 925.0, 650.0, 506.0, 323.0, 474.0, 580.0, - 836.0, 644.0, 678.0, 554.0, 422.0, 438.0, 510.0, 561.0, 743.0, - 725.0, 1136.0, 1128.0, 1272.0, 806.0, 541.0, 344.0, 616.0, 977.0, - 849.0, 930.0, 654.0, 1161.0, 1130.0, 1836.0, 1342.0, 1328.0, 464.0, - 429.0, 217.0, 435.0, 626.0, 731.0, 598.0, 580.0, 326.0, 483.0, - 325.0, 417.0, 238.0, 522.0, 769.0, 1050.0, 1105.0, 1225.0, 1566.0, - 1281.0, 806.0, 801.0, 886.0, 558.0, 1007.0, 886.0, 753.0, 420.0, - 932.0, 925.0, 1087.0, 643.0, 1106.0, 904.0, 1470.0, 1268.0, 1189.0, - 863.0, 1080.0, 952.0, 1205.0, 1032.0, 1078.0, 991.0, 775.0, 1244.0, - 798.0, 797.0, 403.0, 981.0, 984.0, 1418.0, 833.0, 794.0, 442.0, - 1115.0, 1192.0, 1108.0, 644.0, 647.0, 693.0, 535.0, 455.0, 461.0, - 603.0, 684.0, 757.0, 746.0, 792.0, 645.0, 1048.0, 1046.0, 1108.0, - 688.0, 596.0, 552.0, 365.0, 387.0, 292.0, 395.0, 557.0, 578.0, - 950.0, 676.0, 742.0, 288.0, 414.0, 778.0, 1037.0, 1549.0, 1056.0, - 866.0, 336.0, 649.0, 532.0, 836.0, 570.0, 888.0, 477.0, 382.0, - 274.0, 453.0, 610.0, 463.0, 350.0, 300.0, 266.0, 505.0, 453.0, - 525.0, 250.0, 996.0, 925.0, 1200.0, 777.0, 1381.0, 1524.0, 1321.0, - 766.0, 1145.0, 1538.0, 1567.0, 1544.0, 1025.0, 545.0, 300.0, 431.0, - 417.0, 563.0, 722.0, 1194.0, 1148.0, 1349.0, 1203.0, 1155.0, 618.0, - 357.0, 
636.0, 1209.0, 1366.0, 1403.0, 1228.0, 1142.0, 1026.0, 624.0, - 580.0, 308.0, 890.0, 838.0, 1670.0, 1143.0, 1144.0, 448.0, 975.0, - 1428.0, 1648.0, 1360.0, 1150.0, 848.0, 646.0, 408.0, 574.0, 668.0, - 758.0, 611.0, 501.0, 643.0, 609.0, 1187.0, 1064.0, 1036.0, 808.0, - 805.0, 742.0, 387.0, 381.0, 390.0, 425.0, 557.0, 530.0, 443.0, - 152.0, 372.0, 361.0, 840.0, 1242.0, 1689.0, 2073.0, 1618.0, 1264.0, - 580.0, 867.0, 992.0, 1256.0, 988.0, 1288.0, 861.0, 615.0, 463.0, - 524.0, 794.0, 698.0, 701.0, 495.0, 242.0, 373.0, 359.0, 500.0, - 356.0, 1134.0, 1238.0, 1638.0, 1160.0, 1768.0, 1391.0, 1148.0, 890.0, - 1173.0, 1171.0, 1442.0, 1666.0, 1182.0, 735.0, 315.0, 496.0, 493.0, - 613.0, 691.0, 965.0, 801.0, 663.0, 685.0, 841.0, 690.0, 447.0, - 781.0, 862.0, 881.0, 1227.0, 1480.0, 1702.0, 1178.0, 786.0, 786.0, - 634.0, 766.0, 546.0, 1050.0, 992.0, 940.0, 398.0, 424.0, 890.0, - 1264.0, 1508.0, 1276.0, 692.0, 418.0, 206.0, 448.0, 452.0, 730.0, - 722.0, 674.0, 632.0, 570.0, 920.0, 1197.0, 1156.0, 1238.0, 915.0, - 842.0, 495.0, 387.0, 391.0, 285.0, 201.0, 149.0, 145.0, 172.0, - 382.0, 621.0, 985.0, 1325.0, 1984.0, 1989.0, 1644.0, 851.0, 497.0, - 750.0, 916.0, 944.0, 844.0, 943.0, 1124.0, 978.0, 970.0, 823.0, - 769.0, 982.0, 1100.0, 1042.0, 770.0, 810.0, 830.0, 870.0, 814.0, - 1222.0, 1624.0, 1648.0, 1266.0, 1240.0, 1036.0, 1053.0, 1476.0, 1664.0, - 1596.0, 1634.0, 1712.0, 987.0, 780.0, 474.0, 552.0, 413.0, 565.0, - 931.0, 1181.0, 939.0, 481.0, 555.0, 589.0, 632.0, 354.0, 934.0, - 977.0, 1107.0, 1346.0, 1483.0, 1663.0, 1064.0, 762.0, 681.0, 491.0, - 645.0, 564.0, 1290.0, 1228.0, 1008.0, 570.0, 584.0, 966.0, 1198.0, - 1618.0, 1418.0, 942.0, 504.0, 416.0, 504.0, 488.0, 740.0, 776.0, - 741.0, 496.0, 298.0, 675.0, 1021.0, 1014.0, 1058.0, 806.0, 804.0, - 584.0, 461.0, 907.0, 700.0, 672.0, 369.0, 389.0, 464.0, 516.0, - 799.0, 1375.0, 1343.0, 1951.0, 1568.0, 1552.0, 685.0, 471.0, 573.0, - 768.0, 864.0, 908.0, 951.0, 1060.0, 923.0, 927.0, 762.0, 707.0, - 964.0, 1070.0, 1094.0, 796.0, 
914.0, 872.0, 1082.0, 1322.0, 1228.0, - 1524.0, 1400.0, 1472.0, 1211.0, 827.0, 781.0, 2044.0, 1904.0, 1138.0, - 716.0, 970.0, 623.0, 663.0, 741.0, 870.0, 812.0, 592.0, 591.0, - 565.0, 415.0, 466.0, 692.0, 718.0, 896.0, 752.0, 940.0, 607.0, - 660.0, 1075.0, 1340.0, 1515.0, 849.0, 491.0, 481.0, 894.0, 938.0, - 778.0, 718.0, 732.0, 550.0, 794.0, 752.0, 724.0, 500.0, 930.0, - 1026.0, 874.0, 484.0, 462.0, 327.0, 339.0, 805.0, 888.0, 1024.0, - 755.0, 680.0, 978.0, 1165.0, 1232.0, 900.0, 627.0, 572.0, 604.0, - 501.0, 903.0, 808.0, 788.0, 483.0, 424.0, 534.0, 572.0, 864.0, - 1039.0, 991.0, 1157.0, 944.0, 754.0, 407.0, 353.0, 425.0, 319.0, - 423.0, 663.0, 731.0, 844.0, 722.0, 736.0, 779.0, 571.0, 886.0, - 1076.0, 1238.0, 1066.0, 826.0, 802.0, 1090.0, 1856.0, 1702.0, 1918.0, - 1106.0, 1256.0, 613.0, 417.0, 398.0, 1519.0, 1467.0, 1265.0, 1118.0, - 1159.0, 572.0, 520.0, 657.0, 1110.0, 1032.0, 1046.0, 714.0, 646.0, - 311.0, 387.0, 479.0, 498.0, 650.0, 642.0, 834.0, 515.0, 606.0, - 599.0, 854.0, 1029.0, 849.0, 525.0, 499.0, 812.0, 826.0, 636.0, - 434.0, 474.0, 320.0, 1124.0, 1104.0, 1174.0, 419.0, 719.0, 871.0, - 930.0, 698.0, 539.0, 402.0, 392.0, 755.0, 806.0, 882.0, 857.0, - 855.0, 1117.0, 844.0, 906.0, 890.0, 824.0, 706.0, 446.0, 574.0, - 997.0, 955.0, 833.0, 690.0, 628.0, 818.0, 730.0, 820.0, 884.0, - 776.0, 681.0, 617.0, 475.0, 1011.0, 885.0, 1291.0, 797.0, 794.0, - 734.0, 743.0, 653.0, 427.0, 455.0, 666.0, 662.0, 680.0, 734.0, - 1148.0, 1011.0, 1079.0, 611.0, 970.0, 1496.0, 1730.0, 1672.0, 1040.0, - 727.0, 482.0, 254.0, 734.0, 1005.0, 1009.0, 1127.0, 1226.0, 1537.0, - 1143.0, 923.0, 902.0, 1202.0, 1324.0, 1130.0, 565.0, 715.0, 545.0, - 732.0, 367.0, 391.0, 565.0, 623.0, 733.0, 415.0, 361.0, 401.0, - 488.0, 1001.0, 814.0, 865.0, 642.0, 1397.0, 1431.0, 1097.0, 385.0, - 193.0, 248.0, 808.0, 1220.0, 1302.0, 857.0, 429.0, 523.0, 464.0, - 563.0, 358.0, 337.0, 328.0, 859.0, 782.0, 1020.0, 988.0, 1080.0, - 926.0, 544.0, 658.0, 878.0, 843.0, 846.0, 524.0, 797.0, 841.0, - 871.0, 
591.0, 612.0, 990.0, 1168.0, 1180.0, 754.0, 498.0, 452.0, - 397.0, 667.0, 671.0, 1354.0, 1380.0, 1606.0, 1021.0, 772.0, 676.0, - 733.0, 699.0, 501.0, 417.0, 642.0, 606.0, 678.0, 729.0, 1065.0, - 988.0, 955.0, 717.0, 850.0, 1148.0, 1244.0, 1334.0, 854.0, 525.0, - 291.0, 217.0, 778.0, 412.0, 364.0, 996.0, 1224.0, 1396.0, 894.0, - 911.0, 697.0, 1017.0, 980.0, 1107.0, 528.0, 728.0, 587.0, 604.0, - 399.0, 388.0, 482.0, 270.0, 373.0, 605.0, 744.0, 948.0, 1027.0, - 1481.0, 1169.0, 1016.0, 719.0, 1185.0, 1112.0, 803.0, 371.0, 194.0, - 204.0, 602.0, 1104.0, 1264.0, 824.0, 514.0, 736.0, 1176.0, 1025.0, - 858.0, 556.0, 765.0, 922.0, 797.0, 747.0, 723.0, 757.0, 593.0, - 477.0, 490.0, 732.0, 543.0, 636.0, 373.0, 790.0, 1056.0, 1119.0, - 731.0, 580.0, 984.0, 1148.0, 1460.0, 1338.0, 1522.0, 1056.0, 676.0, - 496.0, 706.0, 1400.0, 1382.0, 1714.0, 1330.0, 1137.0, 719.0, 703.0, - 801.0, 815.0, 633.0, 534.0, 494.0, 444.0, 583.0, 945.0, 1226.0, - 1321.0, 963.0, 718.0, 776.0, 915.0, 1013.0, 771.0, 503.0, 377.0, - 319.0, 788.0, 1029.0, 803.0, 1179.0, 1150.0, 1378.0, 1396.0, 1192.0, - 1362.0, 1226.0, 1173.0, 878.0, 499.0, 724.0, 628.0, 709.0, 463.0, - 543.0, 989.0, 828.0, 851.0, 633.0, 918.0, 1336.0, 1385.0, 1447.0, - 1361.0, 1304.0, 1151.0, 1597.0, 1453.0, 1214.0, 484.0, 297.0, 757.0, - 681.0, 1240.0, 844.0, 779.0, 659.0, 1005.0, 1394.0, 953.0, 879.0, - 587.0, 824.0, 976.0, 836.0, 816.0, 568.0, 551.0, 387.0, 351.0, - 400.0, 596.0, 424.0, 954.0, 1177.0, 1513.0, 1759.0, 1408.0, 1104.0, - 600.0, 980.0, 1180.0, 1397.0, 1577.0, 1738.0, 1513.0, 847.0, 522.0, - 710.0, 968.0, 1072.0, 924.0, 940.0, 814.0, 648.0, 776.0, 1181.0, - 1507.0, 1207.0, 794.0, 354.0, 328.0, 845.0, 995.0, 1313.0, 784.0, - 850.0, 570.0, 856.0, 609.0, 797.0, 501.0, 646.0, 327.0, 351.0, - 517.0, 1173.0, 1179.0, 1083.0, 982.0, 769.0, 883.0, 883.0, 1226.0, - 1178.0, 869.0, 636.0, 602.0, 591.0, 481.0, 435.0, 539.0, 577.0, - 891.0, 701.0, 806.0, 626.0, 960.0, 1374.0, 1670.0, 1440.0, 1394.0, - 942.0, 1081.0, 1171.0, 1015.0, 787.0, 
318.0, 351.0, 1170.0, 1357.0, - 1471.0, 687.0, 846.0, 1101.0, 1439.0, 1480.0, 1080.0, 970.0, 490.0, - 662.0, 754.0, 826.0, 676.0, 433.0, 298.0, 256.0, 307.0, 332.0, - 592.0, 464.0, 932.0, 1114.0, 1218.0, 1324.0, 1284.0, 1160.0, 1172.0, - 816.0, 1096.0, 989.0, 1589.0, 1760.0, 1443.0, 805.0, 305.0, 497.0, - 715.0, 1096.0, 987.0, 1195.0, 1047.0, 1218.0, 1138.0, 1418.0, 1784.0, - 1500.0, 1078.0, 390.0, 462.0, 956.0, 1070.0, 1262.0, 812.0, 706.0, - 418.0, 1076.0, 1065.0, 1309.0, 757.0, 1072.0, 835.0, 977.0, 877.0, - 1638.0, 1784.0, 1558.0, 1283.0, 629.0, 1335.0, 1644.0, 2006.0, 1822.0, - 1175.0, 935.0, 683.0, 670.0, 771.0, 615.0, 559.0, 428.0, 902.0, - 942.0, 1194.0, 516.0, 976.0, 846.0, 1178.0, 701.0, 1043.0, 789.0, - 985.0, 991.0, 1418.0, 1215.0, 834.0, 484.0, 1312.0, 1391.0, 1326.0, - 558.0, 876.0, 1027.0, 1118.0, 784.0, 587.0, 524.0, 608.0, 674.0, - 978.0, 971.0, 1097.0, 728.0, 409.0, 273.0, 317.0, 329.0, 655.0, - 612.0, 1070.0, 1116.0, 1101.0, 976.0, 1036.0, 1013.0, 1390.0, 882.0, - 1174.0, 587.0, 1043.0, 1236.0, 1244.0, 781.0, 295.0, 494.0, 545.0, - 914.0, 773.0, 931.0, 787.0, 980.0, 1358.0, 1541.0, 1917.0, 1401.0, - 1073.0, 330.0, 486.0, 785.0, 942.0, 798.0, 662.0, 628.0, 688.0, - 1176.0, 1594.0, 1574.0, 1096.0, 854.0, 787.0, 921.0, 1073.0, 1522.0, - 1482.0, 1112.0, 1009.0, 400.0, 846.0, 1436.0, 1483.0, 1663.0, 1194.0, - 1394.0, 978.0, 810.0, 727.0, 807.0, 772.0, 801.0, 593.0, 916.0, - 1240.0, 1092.0, 1200.0, 622.0, 946.0, 717.0, 859.0, 508.0, 342.0, - 408.0, 979.0, 1172.0, 1235.0, 895.0, 1088.0, 1099.0, 965.0, 730.0, - 1001.0, 1108.0, 955.0, 660.0, 965.0, 911.0, 1343.0, 959.0, 1078.0, - 810.0, 914.0, 993.0, 611.0, 456.0, 444.0, 388.0, 489.0, 774.0, - 900.0, 720.0, 321.0, 202.0, 690.0, 643.0, 1202.0, 734.0, 878.0, - 570.0, 958.0, 1289.0, 1038.0, 706.0, 221.0, 452.0, 744.0, 1104.0, - 1033.0, 939.0, 759.0, 918.0, 1106.0, 1175.0, 1237.0, 899.0, 681.0, - 404.0, 608.0, 565.0, 596.0, 254.0, 518.0, 466.0, 586.0, 997.0, - 1500.0, 1390.0, 963.0, 1084.0, 1117.0, 1205.0, 
797.0, 1352.0, 1177.0, - 1058.0, 852.0, 368.0, 842.0, 1307.0, 1777.0, 1897.0, 1571.0, 1341.0, - 1037.0, 888.0, 837.0, 657.0, 719.0, 728.0, 682.0, 968.0, 1270.0, - 1109.0, 1169.0, 600.0, 739.0, 628.0, 728.0, 512.0, 277.0, 312.0, - 944.0, 1464.0, 2050.0, 1854.0, 1399.0, 879.0, 550.0, 1043.0, 956.0, - 1212.0, 949.0, 1034.0, 1443.0, 1251.0, 1764.0, 1488.0, 1367.0, 1028.0, - 773.0, 925.0, 598.0, 454.0, 449.0, 427.0, 500.0, 1189.0, 1266.0, - 1103.0, 279.0, 197.0, 714.0, 658.0, 1268.0, 928.0, 1078.0, 663.0, - 967.0, 1244.0, 1036.0, 660.0, 264.0, 475.0, 700.0, 689.0, 631.0, - 440.0, 436.0, 498.0, 858.0, 1067.0, 943.0, 587.0, 361.0, 310.0, - 440.0, 577.0, 742.0, 504.0, 632.0, 558.0, 642.0, 573.0, 937.0, - 923.0, 808.0, 844.0, 837.0, 779.0, 703.0, 948.0, 661.0, 524.0, - 671.0, 638.0, 694.0, 546.0, 1383.0, 1495.0, 1593.0, 1093.0, 1337.0, - 1412.0, 1043.0, 684.0, 528.0, 703.0, 920.0, 1198.0, 1408.0, 1147.0, - 1293.0, 912.0, 843.0, 643.0, 659.0, 845.0, 675.0, 844.0, 798.0, - 1380.0, 1828.0, 2076.0, 1496.0, 1028.0, 700.0, 1192.0, 1015.0, 1235.0, - 991.0, 1390.0, 1898.0, 1723.0, 1684.0, 1480.0, 1347.0, 1134.0, 371.0, - 573.0, 494.0, 454.0, 368.0, 484.0, 489.0, 1214.0, 1147.0, 1285.0, - 807.0, 894.0, 1110.0, 689.0, 970.0, 910.0, 1136.0, 877.0, 957.0, - 804.0, 556.0, 392.0, 446.0, 801.0, 1002.0, 908.0, 582.0, 545.0, - 538.0, 717.0, 491.0, 697.0, 552.0, 505.0, 387.0, 391.0, 680.0, - 714.0, 836.0, 534.0, 672.0, 564.0, 708.0, 467.0, 401.0, 351.0, - 396.0, 896.0, 937.0, 875.0, 603.0, 728.0, 534.0, 609.0, 776.0, - 1138.0, 1522.0, 1273.0, 1750.0, 1210.0, 1205.0, 415.0, 1035.0, 1526.0, - 1583.0, 876.0, 442.0, 337.0, 882.0, 834.0, 854.0, 493.0, 831.0, - 856.0, 799.0, 460.0, 378.0, 785.0, 1268.0, 1407.0, 1228.0, 1188.0, - 1444.0, 1640.0, 1329.0, 1377.0, 989.0, 1292.0, 920.0, 1012.0, 827.0, - 1134.0, 1426.0, 1379.0, 1159.0, 1343.0, 1251.0, 1143.0, 614.0, 525.0, - 537.0, 644.0, 633.0, 889.0, 581.0, 1022.0, 827.0, 991.0, 883.0, - 952.0, 1110.0, 805.0, 1168.0, 1320.0, 1372.0, 900.0, 496.0, 
414.0, - 375.0, 599.0, 811.0, 1462.0, 1295.0, 1044.0, 508.0, 611.0, 591.0, - 648.0, 600.0, 578.0, 419.0, 426.0, 649.0, 571.0, 776.0, 702.0, - 848.0, 525.0, 691.0, 627.0, 1158.0, 871.0, 994.0, 530.0, 769.0, - 844.0, 900.0, 662.0, 1006.0, 712.0, 445.0, 359.0, 583.0, 1189.0, - 1699.0, 1443.0, 1299.0, 879.0, 885.0, 527.0, 683.0, 1169.0, 1322.0, - 995.0, 581.0, 443.0, 976.0, 1040.0, 1094.0, 687.0, 1201.0, 1152.0, - 1125.0, 602.0, 503.0, 1279.0, 1738.0, 1952.0, 1309.0, 901.0, 861.0, - 948.0, 781.0, 1085.0, 957.0, 976.0, 690.0, 662.0, 985.0, 1176.0, - 1265.0, 946.0, 773.0, 1264.0, 1390.0, 1269.0, 889.0, 761.0, 794.0, - 818.0, 1124.0, 1296.0, 906.0, 643.0, 839.0, 1031.0, 1588.0, 1316.0, - 1086.0, 900.0, 804.0, 1122.0, 734.0, 709.0, 432.0, 388.0, 380.0, - 727.0, 891.0, 1632.0, 1568.0, 1471.0, 819.0, 747.0, 615.0, 465.0, - 391.0, 378.0, 444.0, 447.0, 877.0, 788.0, 1007.0, 781.0, 794.0, - 475.0, 505.0, 490.0, 1020.0, 833.0, 1416.0, 915.0, 1238.0, 1058.0, - 1054.0, 709.0, 917.0, 872.0, 653.0, 791.0, 711.0, 1206.0, 1496.0, - 1388.0, 806.0, 502.0, 515.0, 442.0, 316.0, 783.0, 1282.0, 1188.0, - 840.0, 500.0, 646.0, 666.0, 573.0, 842.0, 1462.0, 1556.0, 1397.0, - 787.0, 704.0, 1148.0, 1974.0, 2292.0, 1737.0, 1197.0, 1105.0, 1116.0, - 726.0, 912.0, 804.0, 942.0, 649.0, 709.0, 880.0, 704.0, 635.0, - 279.0, 662.0, 1049.0, 1144.0, 889.0, 843.0, 759.0, 878.0, 926.0, - 1260.0, 1576.0, 1246.0, 1066.0, 911.0, 894.0, 1424.0, 1031.0, 1018.0, - 980.0, 1050.0, 1448.0, 785.0, 642.0, 281.0, 252.0, 495.0, 666.0, - 862.0, 1376.0, 1344.0, 1266.0, 726.0, 758.0, 805.0, 750.0, 532.0, - 375.0, 324.0, 425.0, 716.0, 647.0, 808.0, 715.0, 690.0, 411.0, - 425.0, 478.0, 896.0, 773.0, 1340.0, 1107.0, 1366.0, 1270.0, 1202.0, - 1217.0, 1697.0, 547.0, 522.0, 766.0, 636.0, 844.0, 696.0, 750.0, - 536.0, 618.0, 827.0, 740.0, 602.0, 523.0, 876.0, 882.0, 964.0, - 624.0, 667.0, 591.0, 606.0, 1134.0, 2018.0, 1984.0, 1647.0, 904.0, - 1203.0, 1441.0, 1928.0, 1984.0, 1673.0, 1293.0, 1357.0, 1380.0, 928.0, - 609.0, 469.0, 
681.0, 733.0, 722.0, 986.0, 813.0, 787.0, 281.0, - 457.0, 796.0, 834.0, 926.0, 1140.0, 1148.0, 1056.0, 519.0, 815.0, - 1031.0, 1048.0, 1128.0, 1225.0, 1164.0, 1508.0, 999.0, 1012.0, 864.0, - 850.0, 1056.0, 497.0, 435.0, 366.0, 321.0, 485.0, 350.0, 438.0, - 666.0, 770.0, 791.0, 471.0, 681.0, 744.0, 829.0, 463.0, 311.0, - 228.0, 277.0, 716.0, 711.0, 794.0, 565.0, 516.0, 442.0, 426.0, - 433.0, 528.0, 446.0, 889.0, 972.0, 1448.0, 1292.0, 1246.0, 1019.0, - 1339.0, 980.0, 867.0, 755.0, 654.0, 782.0, 510.0, 526.0, 421.0, - 615.0, 970.0, 871.0, 712.0, 505.0, 791.0, 906.0, 950.0, 635.0, - 630.0, 514.0, 562.0, 1099.0, 2175.0, 2152.0, 1686.0, 727.0, 1063.0, - 1055.0, 1582.0, 1526.0, 1566.0, 1202.0, 1126.0, 1152.0, 780.0, 525.0, - 401.0, 613.0, 697.0, 562.0, 386.0, 349.0, 414.0, 424.0, 852.0, - 892.0, 787.0, 487.0, 841.0, 975.0, 957.0, 880.0, 805.0, 1127.0, - 1264.0, 1664.0, 1649.0, 1088.0, 1144.0, 719.0, 796.0, 508.0, 438.0, - 730.0, 491.0, 457.0, 289.0, 520.0, 662.0, 444.0, 380.0, 862.0, - 914.0, 823.0, 269.0, 611.0, 744.0, 817.0, 553.0, 442.0, 381.0, - 342.0, 630.0, 658.0, 721.0, 500.0, 395.0, 387.0, 430.0, 737.0, - 778.0, 1081.0, 819.0, 918.0, 854.0, 998.0, 1004.0, 1014.0, 1286.0, - 1176.0, 1071.0, 732.0, 351.0, 669.0, 650.0, 642.0, 542.0, 756.0, - 1063.0, 1223.0, 1178.0, 811.0, 597.0, 594.0, 754.0, 605.0, 490.0, - 830.0, 1159.0, 1306.0, 1586.0, 1708.0, 1485.0, 929.0, 985.0, 1206.0, - 1426.0, 1008.0, 1000.0, 892.0, 998.0, 1016.0, 654.0, 447.0, 391.0, - 475.0, 674.0, 539.0, 609.0, 507.0, 535.0, 546.0, 725.0, 796.0, - 1010.0, 893.0, 1245.0, 1091.0, 977.0, 900.0, 945.0, 1007.0, 1050.0, - 1346.0, 2190.0, 1892.0, 1374.0, 558.0, 494.0, 676.0, 580.0, 680.0, - 438.0, 702.0, 1010.0, 1292.0, 978.0, 540.0, 323.0, 821.0, 961.0, - 1319.0, 856.0, 942.0, 539.0, 575.0, 543.0, 580.0, 619.0, 439.0, - 832.0, 785.0, 805.0, 597.0, 753.0, 825.0, 574.0, 635.0, 511.0, - 946.0, 670.0, 702.0, 648.0, 944.0, 1382.0, 1062.0, 930.0, 1409.0, - 1281.0, 768.0, 304.0, 494.0, 499.0, 430.0, 267.0, 471.0, 
539.0, - 818.0, 771.0, 685.0, 483.0, 436.0, 534.0, 501.0, 442.0, 820.0, - 1143.0, 1342.0, 1343.0, 1615.0, 1534.0, 1419.0, 1103.0, 1286.0, 1130.0, - 1040.0, 667.0, 1051.0, 941.0, 998.0, 566.0, 656.0, 673.0, 751.0, - 621.0, 588.0, 438.0, 454.0, 461.0, 729.0, 891.0, 970.0, 1017.0, - 735.0, 819.0, 680.0, 748.0, 1024.0, 1058.0, 1100.0, 1406.0, 1488.0, - 2154.0, 1530.0, 1304.0, 648.0, 558.0, 639.0, 675.0, 1111.0, 878.0, - 1094.0, 986.0, 1424.0, 1072.0, 904.0, 476.0, 1088.0, 1044.0, 1440.0, - 831.0, 813.0, 519.0, 749.0, 863.0, 807.0, 716.0, 406.0, 562.0, - 449.0, 713.0, 735.0, 937.0, 927.0, 666.0, 783.0, 496.0, 939.0, - 751.0, 869.0, 501.0, 705.0, 1442.0, 1840.0, 1596.0, 1038.0, 1026.0, - 915.0, 564.0, 608.0, 831.0, 824.0, 939.0, 1195.0, 935.0, 966.0, - 688.0, 892.0, 716.0, 511.0, 715.0, 675.0, 641.0, 709.0, 939.0, - 1626.0, 1425.0, 1774.0, 1264.0, 1397.0, 1074.0, 1240.0, 1228.0, 1458.0, - 1101.0, 1365.0, 931.0, 950.0, 472.0, 536.0, 495.0, 513.0, 545.0, - 730.0, 686.0, 898.0, 837.0, 1109.0, 643.0, 591.0, 633.0, 791.0, - 990.0, 712.0, 638.0, 510.0, 812.0, 844.0, 1093.0, 903.0, 1433.0, - 1272.0, 1198.0, 635.0, 627.0, 744.0, 871.0, 1110.0, 1029.0, 1221.0, - 1194.0, 1189.0, 1065.0, 1001.0, 760.0, 697.0, 593.0, 1009.0, 839.0, - 817.0, 530.0, 1005.0, 963.0, 920.0, 528.0, 516.0, 580.0, 492.0, - 664.0, 1014.0, 1276.0, 1184.0, 696.0, 503.0, 227.0, 313.0, 434.0, - 501.0, 485.0, 599.0, 1644.0, 2020.0, 2214.0, 1018.0, 670.0, 566.0, - 465.0, 459.0, 693.0, 688.0, 867.0, 1051.0, 860.0, 667.0, 261.0, - 528.0, 562.0, 507.0, 723.0, 779.0, 771.0, 347.0, 403.0, 1306.0, - 1561.0, 1702.0, 1365.0, 1646.0, 1473.0, 1089.0, 725.0, 1143.0, 1123.0, - 1299.0, 814.0, 797.0, 675.0, 1154.0, 1061.0, 927.0, 545.0, 634.0, - 532.0, 986.0, 1044.0, 1342.0, 713.0, 578.0, 264.0, 511.0, 690.0, - 658.0, 386.0, 262.0, 746.0, 840.0, 1287.0, 1193.0, 1659.0, 1111.0, - 979.0, 480.0, 529.0, 388.0, 487.0, 850.0, 1255.0, 1147.0, 718.0, - 495.0, 741.0, 1023.0, 716.0, 675.0, 445.0, 493.0, 238.0, 260.0, - 331.0, 835.0, 
975.0, 936.0, 476.0, 402.0, 384.0, 407.0, 485.0, - 819.0, 792.0, 730.0, 340.0, 585.0, 584.0, 702.0, 659.0, 849.0, - 1199.0, 1333.0, 1890.0, 1852.0, 2322.0, 982.0, 926.0, 758.0, 702.0, - 399.0, 723.0, 969.0, 1180.0, 1364.0, 960.0, 1140.0, 680.0, 958.0, - 812.0, 741.0, 911.0, 775.0, 864.0, 430.0, 434.0, 962.0, 1364.0, - 1381.0, 1198.0, 1120.0, 1179.0, 739.0, 697.0, 1003.0, 1260.0, 1248.0, - 743.0, 609.0, 431.0, 756.0, 1116.0, 1050.0, 910.0, 648.0, 684.0, - 1272.0, 1352.0, 1434.0, 1018.0, 947.0, 801.0, 667.0, 493.0, 519.0, - 383.0, 376.0, 1186.0, 1176.0, 1301.0, 1029.0, 1781.0, 1621.0, 1151.0, - 780.0, 875.0, 997.0, 632.0, 715.0, 1005.0, 941.0, 752.0, 499.0, - 777.0, 871.0, 1031.0, 754.0, 712.0, 501.0, 438.0, 590.0, 593.0, - 1009.0, 955.0, 882.0, 712.0, 708.0, 599.0, 464.0, 480.0, 911.0, - 840.0, 726.0, 488.0, 645.0, 767.0, 654.0, 531.0, 721.0, 1034.0, - 1292.0, 1698.0, 1496.0, 2322.0, 1051.0, 984.0, 776.0, 546.0, 277.0, - 369.0, 559.0, 573.0, 577.0, 367.0, 820.0, 652.0, 776.0, 788.0, - 790.0, 1184.0, 1250.0, 1798.0, 1458.0, 1047.0, 687.0, 779.0, 809.0, - 1091.0, 1213.0, 1298.0, 893.0, 635.0, 593.0, 794.0, 1022.0, 859.0, - 711.0, 437.0, 736.0, 1088.0, 1040.0, 834.0, 534.0, 572.0, 872.0, - 923.0, 873.0, 977.0, 977.0, 961.0, 627.0, 393.0, 429.0, 385.0, - 416.0, 1002.0, 936.0, 1093.0, 1209.0, 1789.0, 1699.0, 969.0, 785.0, - 822.0, 1028.0, 625.0, 717.0, 777.0, 796.0, 654.0, 637.0, 667.0, - 619.0, 760.0, 660.0, 820.0, 1046.0, 1016.0, 984.0, 622.0, 1036.0, - 1058.0, 910.0, 716.0, 596.0, 647.0, 512.0, 536.0, 503.0, 576.0, - 472.0, 674.0, 728.0, 912.0, 625.0, 479.0, 785.0, 1180.0, 1392.0, - 1354.0, 1170.0, 1808.0, 621.0, 888.0, 672.0, 562.0, 429.0, 493.0, - 688.0, 531.0, 639.0, 524.0, 924.0, 687.0, 695.0, 717.0, 1154.0, - 1700.0, 1741.0, 2073.0, 1597.0, 1132.0, 582.0, 566.0, 609.0, 659.0, - 777.0, 996.0, 904.0, 800.0, 800.0, 968.0, 1092.0, 830.0, 486.0, - 197.0, 155.0, 675.0, 830.0, 1100.0, 1124.0, 1042.0, 1074.0, 753.0, - 689.0, 1349.0, 1351.0, 1415.0, 577.0, 357.0, 335.0, 
625.0, 628.0, - 1014.0, 922.0, 1127.0, 1189.0, 1165.0, 1078.0, 628.0, 924.0, 957.0, - 1359.0, 914.0, 947.0, 547.0, 532.0, 422.0, 487.0, 477.0, 501.0, - 821.0, 869.0, 971.0, 1330.0, 1246.0, 1190.0, 726.0, 1052.0, 879.0, - 649.0, 525.0, 882.0, 1037.0, 806.0, 550.0, 485.0, 788.0, 690.0, - 958.0, 650.0, 912.0, 549.0, 555.0, 444.0, 481.0, 457.0, 544.0, - 862.0, 1398.0, 441.0, 512.0, 864.0, 779.0, 831.0, 407.0, 565.0, - 403.0, 519.0, 532.0, 564.0, 663.0, 737.0, 933.0, 1106.0, 1570.0, - 1829.0, 2113.0, 1777.0, 1288.0, 796.0, 680.0, 683.0, 1083.0, 1079.0, - 1330.0, 1030.0, 1324.0, 1276.0, 1292.0, 914.0, 680.0, 334.0, 167.0, - 169.0, 277.0, 378.0, 608.0, 950.0, 1032.0, 876.0, 795.0, 641.0, - 1321.0, 995.0, 1101.0, 565.0, 552.0, 388.0, 470.0, 484.0, 596.0, - 586.0, 769.0, 1001.0, 873.0, 786.0, 606.0, 998.0, 995.0, 1179.0, - 649.0, 654.0, 400.0, 474.0, 496.0, 403.0, 351.0, 323.0, 539.0, - 649.0, 763.0, 1266.0, 1214.0, 1182.0, 632.0, 750.0, 673.0, 552.0, - 400.0, 693.0, 940.0, 1169.0, 737.0, 729.0, 743.0, 728.0, 1120.0, - 1333.0, 1564.0, 1162.0, 626.0, 533.0, 329.0, 379.0, 516.0, 786.0, - 982.0, 360.0, 444.0, 976.0, 882.0, 967.0, 399.0, 610.0, 454.0, - 481.0, 836.0, 811.0, 1011.0, 723.0, 813.0, 1314.0, 1418.0, 1473.0, - 1145.0, 977.0, 836.0, 602.0, 666.0, 657.0, 1163.0, 1355.0, 1736.0, - 1256.0, 1566.0, 1534.0, 1652.0, 966.0, 574.0, 484.0, 471.0, 597.0, - 543.0, 576.0, 718.0, 908.0, 944.0, 762.0, 818.0, 712.0, 1274.0, - 848.0, 998.0, 574.0, 562.0, 376.0, 446.0, 466.0, 524.0, 570.0, - 723.0, 831.0, 675.0, 538.0, 602.0, 994.0, 927.0, 932.0, 441.0, - 565.0, 604.0, 596.0, 591.0, 781.0, 716.0, 726.0, 462.0, 802.0, - 866.0, 916.0, 778.0, 1094.0, 934.0, 744.0, 371.0, 384.0, 480.0, - 697.0, 816.0, 1184.0, 936.0, 1368.0, 1078.0, 1005.0, 981.0, 1329.0, - 1810.0, 1474.0, 1026.0, 589.0, 325.0, 507.0, 1164.0, 1356.0, 1230.0, - 502.0, 462.0, 1090.0, 1094.0, 1061.0, 533.0, 585.0, 960.0, 781.0, - 1034.0, 614.0, 961.0, 698.0, 796.0, 856.0, 816.0, 894.0, 978.0, - 1096.0, 1017.0, 815.0, 800.0, 
734.0, 1184.0, 1439.0, 1770.0, 1162.0, - 1404.0, 1334.0, 1564.0, 896.0, 624.0, 578.0, 616.0, 838.0, 628.0, - 546.0, 386.0, 312.0, 424.0, 352.0, 804.0, 710.0, 968.0, 482.0, - 572.0, 840.0, 896.0, 776.0, 358.0, 370.0, 578.0, 792.0, 917.0, - 769.0, 609.0, 454.0, 531.0, 742.0, 742.0, 560.0, 224.0, 262.0, - 499.0, 504.0, 850.0, 1078.0, 1484.0, 1092.0, 946.0, 992.0, 1232.0, - 1000.0, 822.0, 1062.0, 1050.0, 784.0, 416.0, 555.0, 555.0, 523.0, - 457.0, 1101.0, 1533.0, 2056.0, 1336.0, 803.0, 657.0, 1215.0, 1448.0, - 1145.0, 1141.0, 995.0, 826.0, 608.0, 1220.0, 1334.0, 1119.0, 750.0, - 542.0, 752.0, 815.0, 874.0, 780.0, 737.0, 1156.0, 950.0, 1103.0, - 553.0, 615.0, 262.0, 318.0, 657.0, 655.0, 665.0, 934.0, 984.0, - 972.0, 690.0, 677.0, 663.0, 673.0, 1354.0, 1550.0, 1622.0, 1167.0, - 1181.0, 1443.0, 1258.0, 1098.0, 694.0, 786.0, 1022.0, 1024.0, 876.0, - 770.0, 474.0, 528.0, 648.0, 1244.0, 1154.0, 909.0, 351.0, 501.0, - 824.0, 790.0, 650.0, 294.0, 293.0, 543.0, 775.0, 929.0, 737.0, - 421.0, 332.0, 309.0, 728.0, 720.0, 700.0, 377.0, 475.0, 645.0, - 713.0, 871.0, 1127.0, 1415.0, 1303.0, 1184.0, 1226.0, 1208.0, 868.0, - 660.0, 842.0, 916.0, 723.0, 339.0, 453.0, 384.0, 364.0, 259.0, - 635.0, 1145.0, 1576.0, 1171.0, 695.0, 244.0, 301.0, 519.0, 485.0, - 1003.0, 1005.0, 1390.0, 1112.0, 1350.0, 1022.0, 836.0, 866.0, 654.0, - 564.0, 644.0, 750.0, 1160.0, 1070.0, 1558.0, 1006.0, 831.0, 265.0, - 325.0, 366.0, 212.0, 371.0, 385.0, 797.0, 1046.0, 1075.0, 806.0, - 758.0, 1126.0, 1082.0, 952.0, 993.0, 1076.0, 1274.0, 871.0, 841.0, - 1129.0, 1079.0, 1081.0, 411.0, 588.0, 818.0, 1168.0, 1028.0, 942.0, - 486.0, 543.0, 647.0, 1231.0, 1110.0, 823.0, 309.0, 455.0, 1184.0, - 1250.0, 1097.0, 373.0, 292.0, 1063.0, 1245.0, 1440.0, 936.0, 886.0, - 768.0, 507.0, 1174.0, 1139.0, 1184.0, 345.0, 595.0, 562.0, 845.0, - 833.0, 795.0, 1120.0, 1031.0, 1614.0, 1348.0, 1180.0, 700.0, 530.0, - 744.0, 748.0, 805.0, 617.0, 797.0, 744.0, 1056.0, 1031.0, 1059.0, - 1153.0, 1480.0, 1219.0, 719.0, 332.0, 407.0, 561.0, 
403.0, 871.0, - 920.0, 1327.0, 876.0, 875.0, 581.0, 493.0, 716.0, 616.0, 494.0, - 534.0, 567.0, 913.0, 1051.0, 1169.0, 649.0, 849.0, 688.0, 804.0, - 556.0, 610.0, 749.0, 675.0, 901.0, 906.0, 865.0, 566.0, 546.0, - 963.0, 1103.0, 1053.0, 949.0, 824.0, 1130.0, 797.0, 939.0, 973.0, - 961.0, 774.0, 742.0, 839.0, 982.0, 1132.0, 1034.0, 1122.0, 458.0, - 547.0, 584.0, 1112.0, 1155.0, 813.0, 649.0, 695.0, 1188.0, 1048.0, - 1293.0, 785.0, 628.0, 751.0, 737.0, 748.0, 414.0, 708.0, 1026.0, - 782.0, 1401.0, 1070.0, 1328.0, 466.0, 809.0, 558.0, 887.0, 854.0, - 924.0, 750.0, 837.0, 1646.0, 1538.0, 1258.0, 470.0, 718.0, 846.0, - 822.0, 811.0, 660.0, 848.0, 847.0, 1448.0, 1474.0, 1230.0, 624.0, - 932.0, 917.0, 1301.0, 900.0, 1081.0, 799.0, 980.0, 844.0, 787.0, - 781.0, 714.0, 743.0, 489.0, 488.0, 433.0, 647.0, 611.0, 622.0, - 499.0, 911.0, 1007.0, 1409.0, 772.0, 1102.0, 691.0, 1004.0, 792.0, - 862.0, 788.0, 780.0, 1144.0, 1146.0, 1253.0, 1195.0, 1095.0, 1246.0, - 1374.0, 1312.0, 1082.0, 502.0, 456.0, 280.0, 446.0, 462.0, 409.0, - 228.0, 792.0, 842.0, 1389.0, 1237.0, 1202.0, 922.0, 422.0, 499.0, - 252.0, 311.0, 604.0, 733.0, 1000.0, 848.0, 1414.0, 1476.0, 1753.0, - 1257.0, 1065.0, 1100.0, 966.0, 1076.0, 776.0, 920.0, 930.0, 714.0, - 947.0, 868.0, 1078.0, 672.0, 799.0, 556.0, 680.0, 991.0, 1108.0, - 1081.0, 856.0, 1844.0, 1738.0, 1488.0, 512.0, 769.0, 855.0, 903.0, - 1030.0, 901.0, 1021.0, 853.0, 1488.0, 1608.0, 1404.0, 761.0, 913.0, - 1056.0, 1467.0, 1331.0, 1348.0, 1096.0, 1104.0, 928.0, 670.0, 682.0, - 841.0, 1101.0, 777.0, 651.0, 527.0, 681.0, 613.0, 586.0, 425.0, - 492.0, 1020.0, 1602.0, 1441.0, 1314.0, 663.0, 931.0, 810.0, 997.0, - 1021.0, 878.0, 911.0, 981.0, 1148.0, 1402.0, 1056.0, 768.0, 1062.0, - 1124.0, 1196.0, 408.0, 300.0, 165.0, 407.0, 324.0, 339.0, 139.0, - 732.0, 973.0, 1495.0, 1191.0, 947.0, 583.0, 281.0, 444.0, 395.0, - 666.0, 907.0, 1120.0, 1303.0, 1110.0, 1502.0, 1340.0, 1676.0, 1228.0, - 1176.0, 916.0, 1096.0, 1140.0, 901.0, 491.0, 519.0, 544.0, 514.0, - 490.0, 
730.0, 806.0, 910.0, 671.0, 620.0, 1012.0, 1091.0, 1022.0, - 686.0, 1640.0, 1674.0, 1444.0, 484.0, 791.0, 1119.0, 1131.0, 1056.0, - 584.0, 564.0, 410.0, 724.0, 1172.0, 1160.0, 860.0, 387.0, 1015.0, - 1471.0, 1588.0, 1066.0, 634.0, 848.0, 674.0, 673.0, 672.0, 872.0, - 971.0, 942.0, 1218.0, 877.0, 1289.0, 927.0, 728.0, 366.0, 471.0, - 943.0, 1506.0, 1415.0, 894.0, 257.0, 390.0, 581.0, 534.0, 637.0, - 692.0, 923.0, 1011.0, 1088.0, 1292.0, 882.0, 750.0, 851.0, 1301.0, - 1259.0, 736.0, 456.0, 298.0, 436.0, 223.0, 421.0, 774.0, 941.0, - 1019.0, 1048.0, 1028.0, 823.0, 639.0, 511.0, 790.0, 624.0, 933.0, - 812.0, 1034.0, 883.0, 808.0, 1124.0, 1129.0, 1069.0, 661.0, 828.0, - 1010.0, 1326.0, 1272.0, 1111.0, 549.0, 351.0, 394.0, 845.0, 1071.0, - 1051.0, 765.0, 1015.0, 998.0, 800.0, 728.0, 907.0, 958.0, 966.0, - 1776.0, 1692.0, 1414.0, 450.0, 635.0, 941.0, 965.0, 964.0, 795.0, - 735.0, 663.0, 349.0, 889.0, 838.0, 831.0, 418.0, 1025.0, 1015.0, - 1058.0, 656.0, 670.0, 566.0, 416.0, 488.0, 764.0, 1320.0, 1846.0, - 1774.0, 1626.0, 990.0, 1336.0, 1158.0, 926.0, 497.0, 362.0, 832.0, - 1082.0, 1099.0, 583.0, 303.0, 202.0, 524.0, 656.0, 926.0, 733.0, - 1080.0, 1219.0, 1328.0, 968.0, 478.0, 914.0, 963.0, 1465.0, 931.0, - 718.0, 389.0, 243.0, 417.0, 615.0, 827.0, 1102.0, 1095.0, 1204.0, - 875.0, 1039.0, 813.0, 959.0, 439.0, 1070.0, 1084.0, 1802.0, 1091.0, - 1501.0, 906.0, 1005.0, 792.0, 667.0, 623.0, 216.0, 476.0, 682.0, - 1113.0, 939.0, 1162.0, 779.0, 765.0, 438.0, 827.0, 825.0, 764.0, - 566.0, 976.0, 1455.0, 1342.0, 1162.0, 836.0, 739.0, 827.0, 1233.0, - 1190.0, 1156.0, 688.0, 836.0, 926.0, 822.0, 650.0, 647.0, 655.0, - 805.0, 394.0, 848.0, 901.0, 919.0, 592.0, 1079.0, 1355.0, 1242.0, - 803.0, 617.0, 641.0, 480.0, 549.0, 402.0, 822.0, 1163.0, 1542.0, - 1458.0, 756.0, 1160.0, 1170.0, 1028.0, 529.0, 479.0, 609.0, 571.0, - 495.0, 440.0, 470.0, 459.0, 576.0, 911.0, 782.0, 940.0, 1356.0, - 1610.0, 1455.0, 827.0, 685.0, 1154.0, 1013.0, 1268.0, 715.0, 674.0, - 398.0, 300.0, 658.0, 923.0, 
1085.0, 1137.0, 1110.0, 1320.0, 906.0, - 1121.0, 775.0, 1197.0, 679.0, 1262.0, 1008.0, 1534.0, 942.0, 1261.0, - 794.0, 1114.0, 788.0, 668.0, 392.0, 206.0, 447.0, 497.0, 626.0, - 530.0, 1076.0, 1023.0, 985.0, 447.0, 850.0, 777.0, 732.0, 554.0, - 835.0, 1243.0, 1098.0, 998.0, 1131.0, 1093.0, 1267.0, 1149.0, 1186.0, - 1130.0, 782.0, 822.0, 532.0, 416.0, 300.0, 840.0, 827.0, 879.0, - 342.0, 597.0, 694.0, 699.0, 581.0, 584.0, 1100.0, 1018.0, 1141.0, - 767.0, 759.0, 608.0, 593.0, 645.0, 1121.0, 1430.0, 1466.0, 1010.0, - 358.0, 620.0, 1054.0, 1075.0, 702.0, 566.0, 691.0, 744.0, 524.0, - 502.0, 745.0, 735.0, 849.0, 930.0, 778.0, 805.0, 1453.0, 1767.0, - 1703.0, 825.0, 1073.0, 1548.0, 1432.0, 939.0, 390.0, 496.0, 452.0, - 509.0, 623.0, 1272.0, 1141.0, 1019.0, 812.0, 1335.0, 1417.0, 1540.0, - 1175.0, 1174.0, 946.0, 1385.0, 1296.0, 1464.0, 870.0, 1187.0, 822.0, - 1140.0, 782.0, 843.0, 691.0, 579.0, 483.0, 479.0, 446.0, 430.0, - 860.0, 917.0, 887.0, 319.0, 432.0, 333.0, 357.0, 371.0, 616.0, - 1008.0, 1009.0, 1309.0, 1233.0, 1101.0, 827.0, 498.0, 883.0, 971.0, - 1154.0, 818.0, 576.0, 348.0, 466.0, 640.0, 623.0, 491.0, 659.0, - 872.0, 1362.0, 971.0, 803.0, 543.0, 1196.0, 1234.0, 1053.0, 817.0, - 745.0, 1002.0, 882.0, 1197.0, 1083.0, 774.0, 573.0, 513.0, 662.0, - 918.0, 700.0, 751.0, 437.0, 485.0, 918.0, 1156.0, 1080.0, 1039.0, - 1293.0, 1429.0, 1106.0, 1316.0, 1043.0, 1232.0, 1216.0, 1169.0, 925.0, - 275.0, 903.0, 1050.0, 1090.0, 491.0, 656.0, 794.0, 813.0, 1078.0, - 1052.0, 1284.0, 731.0, 699.0, 504.0, 1419.0, 1609.0, 1642.0, 967.0, - 766.0, 1080.0, 961.0, 918.0, 571.0, 469.0, 369.0, 446.0, 669.0, - 753.0, 1156.0, 1281.0, 1156.0, 637.0, 481.0, 479.0, 533.0, 481.0, - 505.0, 459.0, 359.0, 486.0, 383.0, 358.0, 252.0, 429.0, 492.0, - 513.0, 853.0, 1116.0, 1533.0, 1247.0, 1187.0, 1159.0, 1133.0, 996.0, - 636.0, 616.0, 356.0, 464.0, 486.0, 713.0, 891.0, 1128.0, 1151.0, - 1145.0, 762.0, 792.0, 513.0, 845.0, 642.0, 665.0, 677.0, 644.0, - 892.0, 840.0, 1170.0, 1088.0, 797.0, 592.0, 
548.0, 1264.0, 1396.0, - 776.0, 683.0, 493.0, 502.0, 823.0, 1225.0, 1230.0, 1187.0, 1307.0, - 1291.0, 1204.0, 1300.0, 1168.0, 961.0, 825.0, 899.0, 912.0, 373.0, - 891.0, 1025.0, 1111.0, 525.0, 721.0, 856.0, 942.0, 1218.0, 996.0, - 1116.0, 558.0, 748.0, 664.0, 1198.0, 1548.0, 1426.0, 1096.0, 731.0, - 1217.0, 1065.0, 1158.0, 587.0, 645.0, 448.0, 545.0, 376.0, 303.0, - 921.0, 1250.0, 1246.0, 553.0, 369.0, 445.0, 486.0, 302.0, 476.0, - 718.0, 746.0, 630.0, 344.0, 351.0, 343.0, 477.0, 559.0, 591.0, - 1133.0, 980.0, 1324.0, 840.0, 1221.0, 1457.0, 1439.0, 1638.0, 934.0, - 1044.0, 462.0, 590.0, 608.0, 878.0, 1089.0, 1241.0, 1005.0, 990.0, - 539.0, 779.0, 540.0, 757.0, 602.0, 573.0, 825.0, 732.0, 884.0, - 724.0, 881.0, 797.0, 826.0, 926.0, 972.0, 1419.0, 1447.0, 651.0, - 439.0, 351.0, 575.0, 843.0, 1217.0, 1223.0, 1239.0, 1081.0, 1408.0, - 1508.0, 1894.0, 1602.0, 1600.0, 1330.0, 1186.0, 754.0, 513.0, 642.0, - 644.0, 550.0, 499.0, 775.0, 764.0, 713.0, 1373.0, 1323.0, 1181.0, - 249.0, 225.0, 380.0, 816.0, 1388.0, 1238.0, 932.0, 556.0, 1174.0, - 1094.0, 1398.0, 769.0, 887.0, 486.0, 937.0, 744.0, 690.0, 731.0, - 1096.0, 1115.0, 793.0, 492.0, 630.0, 567.0, 574.0, 623.0, 849.0, - 867.0, 952.0, 680.0, 584.0, 664.0, 766.0, 890.0, 714.0, 860.0, - 696.0, 1004.0, 766.0, 1214.0, 1590.0, 1538.0, 1622.0, 709.0, 911.0, - 405.0, 518.0, 608.0, 1170.0, 1399.0, 1507.0, 945.0, 680.0, 280.0, - 433.0, 430.0, 564.0, 429.0, 552.0, 620.0, 600.0, 440.0, 373.0, - 366.0, 560.0, 801.0, 1087.0, 1345.0, 1045.0, 897.0, 517.0, 397.0, - 437.0, 687.0, 489.0, 671.0, 539.0, 623.0, 683.0, 900.0, 1558.0, - 1450.0, 1318.0, 1062.0, 1166.0, 1081.0, 857.0, 816.0, 868.0, 661.0, - 427.0, 734.0, 733.0, 734.0, 385.0, 877.0, 875.0, 785.0, 263.0, - 234.0, 355.0, 354.0, 1115.0, 1005.0, 1122.0, 564.0, 1152.0, 1112.0, - 1297.0, 755.0, 1005.0, 978.0, 1412.0, 1114.0, 835.0, 482.0, 588.0, - 667.0, 687.0, 970.0, 1050.0, 1005.0, 714.0, 700.0, 872.0, 758.0, - 856.0, 798.0, 800.0, 956.0, 970.0, 962.0, 1050.0, 1052.0, 1063.0, - 
617.0, 423.0, 944.0, 1666.0, 1672.0, 1688.0, 825.0, 1337.0, 867.0, - 1258.0, 1312.0, 1722.0, 1347.0, 1161.0, 657.0, 744.0, 354.0, 317.0, - 629.0, 675.0, 933.0, 781.0, 824.0, 590.0, 498.0, 616.0, 776.0, - 758.0, 1073.0, 1200.0, 1834.0, 1007.0, 522.0, 340.0, 358.0, 308.0, - 531.0, 443.0, 908.0, 696.0, 683.0, 639.0, 1073.0, 1602.0, 1188.0, - 1136.0, 1024.0, 1214.0, 810.0, 966.0, 1047.0, 1061.0, 502.0, 381.0, - 1209.0, 1244.0, 1134.0, 449.0, 1197.0, 1189.0, 1061.0, 187.0, 520.0, - 623.0, 662.0, 1019.0, 921.0, 1256.0, 558.0, 1002.0, 1032.0, 1613.0, - 1275.0, 1047.0, 940.0, 1364.0, 1456.0, 1057.0, 714.0, 605.0, 669.0, - 758.0, 1152.0, 1167.0, 1021.0, 750.0, 824.0, 892.0, 734.0, 838.0, - 1018.0, 1400.0, 1746.0, 1652.0, 1224.0, 1228.0, 1016.0, 1108.0, 496.0, - 538.0, 854.0, 1198.0, 1284.0, 1000.0, 637.0, 933.0, 981.0, 1260.0, - 1474.0, 1734.0, 1594.0, 1368.0, 928.0, 800.0, 448.0, 413.0, 689.0, - 585.0, 746.0, 618.0, 601.0, 464.0, 404.0, 790.0, 1013.0, 1081.0, - 922.0, 696.0, 1304.0, 938.0, 407.0, 175.0, 573.0, 559.0, 627.0, - 337.0, 812.0, 759.0, 685.0, 615.0, 770.0, 1166.0, 778.0, 1030.0, - 656.0, 668.0, 328.0, 921.0, 978.0, 909.0, 615.0, 670.0, 1709.0, - 1588.0, 1446.0, 523.0, 719.0, 721.0, 654.0, 216.0, 575.0, 562.0, - 649.0, 726.0, 885.0, 1076.0, 604.0, 548.0, 563.0, 958.0, 954.0, - 843.0, 936.0, 1030.0, 1140.0, 822.0, 770.0, 693.0, 619.0, 597.0, - 977.0, 989.0, 1297.0, 932.0, 1501.0, 1423.0, 1621.0, 998.0, 1156.0, - 1248.0, 1838.0, 1417.0, 939.0, 943.0, 944.0, 1156.0, 744.0, 980.0, - 1196.0, 1084.0, 1036.0, 739.0, 720.0, 1064.0, 1107.0, 1288.0, 1332.0, - 1298.0, 1228.0, 666.0, 531.0, 335.0, 339.0, 341.0, 649.0, 495.0, - 711.0, 496.0, 509.0, 267.0, 265.0, 646.0, 981.0, 1222.0, 1138.0, - 806.0, 974.0, 1322.0, 743.0, 288.0, 448.0, 484.0, 565.0, 475.0, - 1006.0, 923.0, 803.0, 467.0, 630.0, 1158.0, 940.0, 1214.0, 672.0, - 667.0, 772.0, 1165.0, 1173.0, 631.0, 584.0, 1035.0, 1762.0, 2038.0, - 1536.0, 999.0, 740.0, 966.0, 967.0, 670.0, 829.0, 722.0, 678.0, - 383.0, 474.0, 
786.0, 782.0, 650.0, 755.0, 1054.0, 1104.0, 1031.0, - 834.0, 912.0, 864.0, 830.0, 814.0, 799.0, 703.0, 709.0, 785.0, - 797.0, 1065.0, 884.0, 1500.0, 1481.0, 1673.0, 1035.0, 1214.0, 1320.0, - 1890.0, 1373.0, 1039.0, 657.0, 642.0, 605.0, 813.0, 1079.0, 976.0, - 846.0, 992.0, 985.0, 756.0, 517.0, 564.0, 499.0, 632.0, 539.0, - 817.0, 641.0, 1041.0, 727.0, 667.0, 282.0, 284.0, 189.0, 302.0, - 361.0, 312.0, 186.0, 150.0, 332.0, 567.0, 900.0, 841.0, 599.0, - 479.0, 831.0, 749.0, 302.0, 473.0, 667.0, 866.0, 688.0, 638.0, - 445.0, 620.0, 481.0, 544.0, 1101.0, 1104.0, 1270.0, 484.0, 418.0, - 722.0, 807.0, 915.0, 375.0, 1018.0, 1378.0, 1748.0, 1634.0, 1186.0, - 907.0, 352.0, 582.0, 747.0, 838.0, 1023.0, 778.0, 854.0, 387.0, - 638.0, 536.0, 700.0, 484.0, 699.0, 762.0, 788.0, 809.0, 724.0, - 778.0, 812.0, 821.0, 713.0, 661.0, 593.0, 665.0, 922.0, 943.0, - 1269.0, 742.0, 1214.0, 1423.0, 2013.0, 1797.0, 1592.0, 1374.0, 1498.0, - 1333.0, 1050.0, 592.0, 413.0, 305.0, 761.0, 966.0, 928.0, 1162.0, - 1234.0, 1193.0, 522.0, 494.0, 470.0, 490.0, 177.0, 200.0, 427.0, - 505.0, 901.0, 611.0, 947.0, 556.0, 742.0, 323.0, 777.0, 642.0, - 997.0, 556.0, 546.0, 228.0, 407.0, 680.0, 759.0, 653.0, 372.0, - 825.0, 851.0, 582.0, 489.0, 560.0, 651.0, 663.0, 565.0, 318.0, - 505.0, 485.0, 610.0, 1493.0, 1416.0, 1326.0, 514.0, 508.0, 956.0, - 732.0, 1024.0, 790.0, 1272.0, 1544.0, 1352.0, 1462.0, 1082.0, 1282.0, - 669.0, 732.0, 698.0, 796.0, 1090.0, 1234.0, 1235.0, 1098.0, 840.0, - 833.0, 564.0, 452.0, 653.0, 682.0, 736.0, 724.0, 656.0, 694.0, - 830.0, 761.0, 931.0, 595.0, 688.0, 710.0, 1350.0, 1430.0, 1308.0, - 598.0, 818.0, 1077.0, 1327.0, 1595.0, 1764.0, 1888.0, 1427.0, 1135.0, - 1050.0, 941.0, 713.0, 417.0, 573.0, 562.0, 486.0, 1076.0, 1272.0, - 1267.0, 589.0, 629.0, 665.0, 574.0, 409.0, 430.0, 598.0, 554.0, - 1073.0, 785.0, 1181.0, 918.0, 1191.0, 780.0, 921.0, 705.0, 1066.0, - 711.0, 1143.0, 807.0, 978.0, 786.0, 750.0, 556.0, 299.0, 721.0, - 763.0, 829.0, 652.0, 755.0, 755.0, 675.0, 473.0, 174.0, 
803.0, - 1063.0, 1214.0, 1300.0, 979.0, 863.0, 586.0, 751.0, 851.0, 775.0, - 876.0, 988.0, 1568.0, 1487.0, 1479.0, 863.0, 876.0, 786.0, 589.0, - 450.0, 352.0, 718.0, 1241.0, 1625.0, 1394.0, 1230.0, 990.0, 775.0, - 426.0, 250.0, 418.0, 533.0, 593.0, 400.0, 564.0, 550.0, 1420.0, - 1499.0, 1693.0, 829.0, 457.0, 447.0, 985.0, 1164.0, 1234.0, 918.0, - 1114.0, 1408.0, 1574.0, 1836.0, 1594.0, 1634.0, 1071.0, 1071.0, 886.0, - 923.0, 675.0, 724.0, 610.0, 528.0, 371.0, 1149.0, 1167.0, 1072.0, - 445.0, 684.0, 684.0, 573.0, 376.0, 496.0, 569.0, 583.0, 629.0, - 635.0, 956.0, 1102.0, 1377.0, 1008.0, 1052.0, 658.0, 1092.0, 944.0, - 1394.0, 1032.0, 1014.0, 858.0, 765.0, 687.0, 374.0, 826.0, 776.0, - 1050.0, 808.0, 697.0, 497.0, 408.0, 372.0, 398.0, 1016.0, 1289.0, - 1219.0, 1071.0, 691.0, 599.0, 772.0, 934.0, 1050.0, 718.0, 812.0, - 891.0, 1133.0, 1008.0, 989.0, 713.0, 774.0, 684.0, 497.0, 317.0, - 194.0, 539.0, 707.0, 1144.0, 778.0, 1426.0, 1106.0, 1193.0, 442.0, - 376.0, 398.0, 351.0, 352.0, 227.0, 461.0, 504.0, 1094.0, 1289.0, - 1535.0, 979.0, 837.0, 687.0, 957.0, 873.0, 997.0, 919.0, 1012.0, - 976.0, 818.0, 942.0, 1444.0, 1406.0, 943.0, 327.0, 542.0, 753.0, - 645.0, 1158.0, 1172.0, 1263.0, 645.0, 949.0, 889.0, 832.0, 667.0, - 685.0, 692.0, 452.0, 415.0, 503.0, 439.0, 599.0, 615.0, 933.0, - 772.0, 942.0, 1117.0, 1138.0, 968.0, 480.0, 485.0, 549.0, 1081.0, - 1059.0, 1041.0, 1133.0, 1097.0, 1319.0, 775.0, 1165.0, 1080.0, 1248.0, - 738.0, 664.0, 332.0, 337.0, 345.0, 861.0, 1438.0, 1840.0, 1282.0, - 713.0, 181.0, 197.0, 578.0, 716.0, 830.0, 710.0, 716.0, 543.0, - 785.0, 768.0, 1089.0, 669.0, 622.0, 328.0, 387.0, 402.0, 555.0, - 808.0, 861.0, 858.0, 712.0, 951.0, 897.0, 1145.0, 928.0, 860.0, - 471.0, 314.0, 335.0, 678.0, 907.0, 874.0, 1232.0, 1469.0, 1587.0, - 857.0, 645.0, 489.0, 529.0, 399.0, 709.0, 919.0, 864.0, 1214.0, - 964.0, 960.0, 739.0, 725.0, 991.0, 692.0, 688.0, 390.0, 362.0, - 1042.0, 1204.0, 1431.0, 857.0, 945.0, 659.0, 801.0, 672.0, 624.0, - 358.0, 212.0, 175.0, 257.0, 
460.0, 707.0, 891.0, 1215.0, 993.0, - 847.0, 807.0, 848.0, 874.0, 498.0, 619.0, 955.0, 929.0, 1021.0, - 587.0, 977.0, 722.0, 1110.0, 890.0, 1181.0, 1142.0, 1218.0, 738.0, - 618.0, 200.0, 155.0, 233.0, 843.0, 1034.0, 1170.0, 624.0, 444.0, - 360.0, 540.0, 906.0, 732.0, 573.0, 189.0, 356.0, 300.0, 486.0, - 613.0, 792.0, 752.0, 628.0, 480.0, 468.0, 481.0, 640.0, 526.0, - 697.0, 562.0, 822.0, 873.0, 1069.0, 1253.0, 1072.0, 896.0, 532.0, - 610.0, 879.0, 1245.0, 1047.0, 750.0, 612.0, 661.0, 835.0, 693.0, - 828.0, 688.0, 774.0, 681.0, 1031.0, 1283.0, 1152.0, 1214.0, 723.0, - 809.0, 792.0, 959.0, 1655.0, 1584.0, 1412.0, 604.0, 296.0, 908.0, - 1100.0, 1324.0, 694.0, 670.0, 374.0, 776.0, 780.0, 751.0, 753.0, - 607.0, 588.0, 222.0, 408.0, 527.0, 971.0, 1105.0, 1002.0, 788.0, - 714.0, 758.0, 642.0, 451.0, 513.0, 763.0, 836.0, 1377.0, 1053.0, - 1337.0, 850.0, 1172.0, 1082.0, 1197.0, 1210.0, 980.0, 565.0, 599.0, - 269.0, 208.0, 450.0, 850.0, 984.0, 847.0, 585.0, 456.0, 755.0, - 951.0, 1214.0, 914.0, 631.0, 341.0, 478.0, 447.0, 777.0, 733.0, - 979.0, 839.0, 698.0, 592.0, 514.0, 638.0, 716.0, 679.0, 1195.0, - 1259.0, 1484.0, 799.0, 787.0, 893.0, 990.0, 1176.0, 800.0, 1248.0, - 1184.0, 1714.0, 1302.0, 1222.0, 982.0, 886.0, 786.0, 658.0, 744.0, - 678.0, 760.0, 994.0, 1716.0, 2424.0, 2037.0, 1657.0, 726.0, 757.0, - 764.0, 917.0, 2049.0, 1956.0, 1955.0, 1025.0, 735.0, 724.0, 458.0, - 540.0, 380.0, 604.0, 372.0, 1206.0, 1012.0, 1043.0, 865.0, 997.0, - 1032.0, 388.0, 442.0, 393.0, 947.0, 1029.0, 1136.0, 960.0, 698.0, - 614.0, 432.0, 541.0, 653.0, 787.0, 718.0, 1246.0, 1010.0, 1132.0, - 634.0, 488.0, 578.0, 1476.0, 1600.0, 1058.0, 554.0, 466.0, 572.0, - 483.0, 705.0, 821.0, 1152.0, 1159.0, 947.0, 630.0, 768.0, 932.0, - 1133.0, 835.0, 513.0, 493.0, 544.0, 626.0, 744.0, 765.0, 791.0, - 667.0, 922.0, 930.0, 741.0, 469.0, 387.0, 486.0, 942.0, 1138.0, - 1141.0, 565.0, 1073.0, 985.0, 1433.0, 1145.0, 1198.0, 1346.0, 1290.0, - 1360.0, 882.0, 811.0, 701.0, 565.0, 540.0, 686.0, 800.0, 796.0, - 
1060.0, 1684.0, 2176.0, 2650.0, 2011.0, 1575.0, 858.0, 724.0, 890.0, - 1051.0, 1968.0, 2036.0, 1888.0, 1210.0, 681.0, 725.0, 807.0, 748.0, - 498.0, 270.0, 248.0, 908.0, 790.0, 947.0, 1003.0, 1193.0, 1284.0, - 656.0, 504.0, 292.0, 512.0, 758.0, 843.0, 866.0, 556.0, 459.0, - 378.0, 423.0, 479.0, 729.0, 826.0, 1211.0, 847.0, 903.0, 838.0, - 702.0, 844.0, 1130.0, 1288.0, 732.0, 399.0, 289.0, 505.0, 499.0, - 735.0, 929.0, 1248.0, 1251.0, 1419.0, 1060.0, 1044.0, 561.0, 710.0, - 572.0, 518.0, 614.0, 698.0, 831.0, 873.0, 997.0, 1001.0, 785.0, - 1040.0, 931.0, 867.0, 443.0, 538.0, 618.0, 1066.0, 1318.0, 1265.0, - 691.0, 789.0, 777.0, 1311.0, 1171.0, 1217.0, 1069.0, 715.0, 724.0, - 380.0, 649.0, 607.0, 789.0, 954.0, 1078.0, 962.0, 926.0, 1034.0, - 1676.0, 1904.0, 2146.0, 1492.0, 1132.0, 1124.0, 965.0, 1139.0, 903.0, - 1724.0, 1740.0, 1908.0, 1436.0, 1131.0, 843.0, 1043.0, 868.0, 725.0, - 317.0, 317.0, 1022.0, 968.0, 1075.0, 875.0, 951.0, 1338.0, 918.0, - 848.0, 344.0, 724.0, 1012.0, 1137.0, 888.0, 504.0, 381.0, 886.0, - 1074.0, 1361.0, 1565.0, 1377.0, 1085.0, 477.0, 685.0, 893.0, 573.0, - 569.0, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-conv-3/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-conv-3/gendata.py deleted file mode 100755 index ef83ab20..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conv-3/gendata.py +++ /dev/null @@ -1,60 +0,0 @@ -#!/usr/bin/env python3 - -import numpy as np - -K_DIM = 3 -IH = 100 -IW = 100 -OH = IH - K_DIM + 1 -OW = IW - K_DIM + 1 - -info = np.finfo(np.float32) -nmant = 5 # Limit precision to avoid rounding errors -maxmant = 1 << nmant -minexp = 0 -maxexp = 5 - - -# Generate floating-point values with exact mantissa and exponent -def randf(n): - return np.ldexp( - np.random.randint(maxmant, size=n), np.random.randint(minexp, maxexp, size=n) - ) - - -inputs = randf((IH, IW)).astype(np.float32) -weights = np.ones((K_DIM, K_DIM), dtype=np.float32) -outputs = np.full((OH, OW), np.float32(0.0)) - -# Convolution -for kh in 
range(K_DIM): - for kw in range(K_DIM): - outputs += inputs[kh : (kh + OH), kw : (kw + OW)] * weights[kh][kw] - -print( - """#define K_DIM {} -#define IH {} -#define IW {} -#define I_SIZE {} -#define OH {} -#define OW {} -#define O_SIZE {} - -""".format( - K_DIM, IH, IW, IH * IW, OH, OW, OH * OW - ) -) - - -def print_array(name, data, data_size, data_type="float", data_fmt="{}", fold=10): - print("{} {}[{}] = {{".format(data_type, name, data_size)) - for i in range(0, len(data), fold): - print( - " ", ", ".join(data_fmt.format(x) for x in data[i : i + fold]), ",", sep="" - ) - print("};") - - -print_array("input_k", weights.flatten(), "K_DIM*K_DIM") -print_array("input_image", inputs.flatten(), "I_SIZE") -print_array("verify_data", outputs.flatten(), "O_SIZE") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-conv-3/vec-conv.S b/bb-tests/workloads/src/CTest/rvv/vec-conv-3/vec-conv.S deleted file mode 100644 index c2e987a8..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conv-3/vec-conv.S +++ /dev/null @@ -1,212 +0,0 @@ -// See LICENSE for license details. 
- -//************************************************************************** -// Vectorized 2D 3x3 convolution -//-------------------------------------------------------------------------- - - .text - .balign 4 - - .global vec_conv -/* - * Calling convention: - * a0: size_t rows - * a1: size_t cols - * a2: size_t a_stride - * a3: size_t b_stride - * a4: const float *k - * a5: const float *a - * a6: float *b - */ - -#define rows a0 -#define cols a1 -#define a_stride a2 -#define b_stride a3 -#define k a4 -#define a a5 -#define b a6 - -#define ap t0 -#define bp t1 -#define vlen t2 -#define row_count t3 -#define VLEN_stride t4 -#define ap_4 t5 -#define ap_8 t6 - -#define row_check s0 -#define rows_odd s1 - -#define k0 ft0 -#define k1 ft1 -#define k2 ft2 -#define k3 ft3 -#define k4 ft4 -#define k5 ft5 -#define k6 ft6 -#define k7 ft7 -#define k8 ft8 - -#define vload0 v0 -#define vload1 v4 -#define vload2 v8 -#define vrow0 v16 -#define vrow1 v20 - -#define FRAMESIZE 32 - -vec_conv: - addi sp, sp, -FRAMESIZE - sd s0, 0(sp) - sd s1, 8(sp) - - # load the kernel into scalar registers - flw k0, 0(k) - flw k1, 4(k) - flw k2, 8(k) - flw k3, 12(k) - flw k4, 16(k) - flw k5, 20(k) - flw k6, 24(k) - flw k7, 28(k) - flw k8, 32(k) - - slli a_stride, a_stride, 2 - slli b_stride, b_stride, 2 - - mv row_check, rows - addi row_check, row_check, -2 - - andi rows_odd, rows, 1 - -# Prolog -loop_prolog: - mv ap, a - addi ap_4, ap, 4 - addi ap_8, ap, 8 - mv bp, b - mv row_count, row_check - - vsetvli vlen, cols, e32, m4, ta, ma - slli VLEN_stride, vlen, 2 - - # Load the first row and compute horizontal - vle32.v vload0, (ap) - vfmul.vf vrow0, vload0, k0 - vle32.v vload1, (ap_4) - vfmacc.vf vrow0, k1, vload1 - vle32.v vload2, (ap_8) - vfmacc.vf vrow0, k2, vload2 - - add ap, ap, a_stride - addi ap_4, ap, 4 - addi ap_8, ap, 8 - - # Load the second row and compute horizontal - vle32.v vload0, (ap) - vfmacc.vf vrow0, k3, vload0 - vle32.v vload1, (ap_4) - vfmacc.vf vrow0, k4, vload1 - vle32.v 
vload2, (ap_8) - vfmacc.vf vrow0, k5, vload2 - add ap, ap, a_stride - - # Load the third row - vfmul.vf vrow1, vload0, k0 - vle32.v vload0, (ap) - addi ap_4, ap, 4 - vfmacc.vf vrow1, k1, vload1 - vle32.v vload1, (ap_4) - vfmacc.vf vrow1, k2, vload2 - addi ap_8, ap, 8 - vle32.v vload2, (ap_8) - -# Main Loop -conv_loop: - vfmacc.vf vrow0, k6, vload0 - add ap, ap, a_stride - vfmacc.vf vrow0, k7, vload1 - addi ap_4, ap, 4 - vfmacc.vf vrow0, k8, vload2 - addi ap_8, ap, 8 - - vse32.v vrow0, (bp) - - vfmacc.vf vrow1, k3, vload0 - vfmacc.vf vrow1, k4, vload1 - vfmacc.vf vrow1, k5, vload2 - - vfmul.vf vrow0, vload0, k0 - vle32.v vload0, (ap) - vfmacc.vf vrow0, k1, vload1 - vle32.v vload1, (ap_4) - add bp, bp, b_stride - vfmacc.vf vrow0, k2, vload2 - - vle32.v vload2, (ap_8) - - vfmacc.vf vrow1, k6, vload0 - add ap, ap, a_stride - vfmacc.vf vrow1, k7, vload1 - addi ap_4, ap, 4 - vfmacc.vf vrow1, k8, vload2 - addi ap_8, ap, 8 - - vfmacc.vf vrow0, k3, vload0 - vfmacc.vf vrow0, k4, vload1 - vfmacc.vf vrow0, k5, vload2 - - vse32.v vrow1, (bp) - - vfmul.vf vrow1, vload0, k0 - vle32.v vload0, (ap) - vfmacc.vf vrow1, k1, vload1 - vle32.v vload1, (ap_4) - vfmacc.vf vrow1, k2, vload2 - vle32.v vload2, (ap_8) - - add bp, bp, b_stride - addi row_count, row_count, -2 - - bgtz row_count, conv_loop - -epilog: - vfmacc.vf vrow0, k6, vload0 - vfmacc.vf vrow0, k7, vload1 - vfmacc.vf vrow0, k8, vload2 - vse32.v vrow0, (bp) - - bnez rows_odd, row_loop_complete - - vfmacc.vf vrow1, k3, vload0 - vfmacc.vf vrow1, k4, vload1 - vfmacc.vf vrow1, k5, vload2 - - add ap, ap, a_stride - addi ap_4, ap, 4 - addi ap_8, ap, 8 - add bp, bp, b_stride - - vle32.v vload0, (ap) - vfmacc.vf vrow1, k6, vload0 - vle32.v vload1, (ap_4) - vfmacc.vf vrow1, k7, vload1 - vle32.v vload2, (ap_8) - vfmacc.vf vrow1, k8, vload2 - - vse32.v vrow1, (bp) - -row_loop_complete: - add a, a, VLEN_stride - add b, b, VLEN_stride - - sub cols, cols, vlen - bnez cols, loop_prolog - -exit: - ld s0, 0(sp) - ld s1, 8(sp) - addi sp, sp, 
FRAMESIZE - - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-conv-3/vec-conv_main.c b/bb-tests/workloads/src/CTest/rvv/vec-conv-3/vec-conv_main.c deleted file mode 100644 index 619f0d00..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-conv-3/vec-conv_main.c +++ /dev/null @@ -1,39 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// 3x3 2D Convolution Benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized 2D 3x3 convolution implementation. - -#include "util.h" -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_conv(size_t, size_t, size_t, size_t, const float *, const float *, - float *); - -int main(int argc, char *argv[]) { - float results_data[O_SIZE] = {0}; - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_conv(OH, OW, IW, OW, input_k, input_image, results_data); - memset(results_data, 0, sizeof(results_data)); -#endif - - // Do the convolution - setStats(1); - vec_conv(OH, OW, IW, OW, input_k, input_image, results_data); - setStats(0); - - // Check the results - return verifyFloat(O_SIZE, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-cos/cos.c b/bb-tests/workloads/src/CTest/rvv/vec-cos/cos.c deleted file mode 100644 index a79a0405..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-cos/cos.c +++ /dev/null @@ -1,56 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. -// -// Author: Matteo Perotti - -#include "cos.h" - -#define COS64_IMPL(m) \ - void cos_f64m##m##_bmark(double *angles, double *results, size_t len) { \ - size_t avl = len; \ - vfloat64m##m##_t cos_vec, res_vec; \ - \ - for (size_t vl = __riscv_vsetvl_e64m##m(avl); avl > 0; avl -= vl) { \ - vl = __riscv_vsetvl_e64m##m(avl); \ - cos_vec = __riscv_vle64_v_f64m##m(angles, vl); \ - res_vec = __cos_f64m##m(cos_vec, vl); \ - __riscv_vse64_v_f64m##m(results, res_vec, vl); \ - angles += vl; \ - results += vl; \ - } \ - } - -#define COS32_IMPL(m) \ - void cos_f32m##m##_bmark(float *angles, float *results, size_t len) { \ - size_t avl = len; \ - vfloat32m##m##_t cos_vec, res_vec; \ - \ - for (size_t vl = __riscv_vsetvl_e32m##m(avl); avl > 0; avl -= vl) { \ - vl = __riscv_vsetvl_e32m##m(avl); \ - cos_vec = __riscv_vle32_v_f32m##m(angles, vl); \ - res_vec = __cos_f32m##m(cos_vec, vl); \ - __riscv_vse32_v_f32m##m(results, res_vec, vl); \ - angles += vl; \ - results += vl; \ - } \ - } - -COS64_IMPL(1) -COS64_IMPL(2) -COS64_IMPL(4) -COS32_IMPL(1) -COS32_IMPL(2) -COS32_IMPL(4) diff --git a/bb-tests/workloads/src/CTest/rvv/vec-cos/cos.h b/bb-tests/workloads/src/CTest/rvv/vec-cos/cos.h deleted file mode 100644 index a2839eda..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-cos/cos.h +++ /dev/null @@ -1,284 +0,0 @@ -// Modified version of: -// "RISC-V VECTOR COS FUNCTION Version by Cristóbal Ramírez Lazo, "Barcelona -// 2019"" Find details on the original version below Author: Matteo Perotti -// - -// RISC-V VECTOR COS FUNCTION Version by Cristóbal Ramírez Lazo, 
"Barcelona -// 2019" This RISC-V Vector implementation is based on the original code -// presented by Julien Pommier - -/* - AVX implementation of sin, cos, sincos, exp and log - Based on "sse_mathfun.h", by Julien Pommier - http://gruntthepeon.free.fr/ssemath/ - Copyright (C) 2012 Giovanni Garberoglio - Interdisciplinary Laboratory for Computational Science (LISC) - Fondazione Bruno Kessler and University of Trento - via Sommarive, 18 - I-38123 Trento (Italy) - This software is provided 'as-is', without any express or implied - warranty. In no event will the authors be held liable for any damages - arising from the use of this software. - Permission is granted to anyone to use this software for any purpose, - including commercial applications, and to alter it and redistribute it - freely, subject to the following restrictions: - 1. The origin of this software must not be misrepresented; you must not - claim that you wrote the original software. If you use this software - in a product, an acknowledgment in the product documentation would be - appreciated but is not required. - 2. Altered source versions must be plainly marked as such, and must not be - misrepresented as being the original software. - 3. This notice may not be removed or altered from any source distribution. 
- (this is the zlib license) -*/ - -#include -#include -#include - -#define COS64_INLINE(m, md) \ - void cos_f64m##m##_bmark(double *angles, double *results, size_t len); \ - static inline vfloat64m##m##_t __cos_f64m##m(vfloat64m##m##_t x, \ - size_t gvl) { \ - int64_t _ps_inv_sign_mask = ~0x8000000000000000; \ - double _ps_cephes_FOPI = 1.27323954473516; /* 4 / M_PI */ \ - int64_t _pi32_1 = 1; \ - int64_t _pi32_inv1 = ~0x0000000000000001; \ - int64_t _pi32_2 = 2; \ - int64_t _pi32_4 = 4; \ - int64_t _Zero = 0; \ - \ - vfloat64m##m##_t xmm2; \ - vfloat64m##m##_t xmm1; \ - vfloat64m##m##_t xmm3; \ - vfloat64m##m##_t y; \ - \ - vint64m##m##_t emm0; \ - vint64m##m##_t emm2; \ - \ - vbool##md##_t xMask; \ - /* take the absolute value */ \ - vint64m##m##_t xf = __riscv_vreinterpret_v_f64m##m##_i64m##m(x); \ - vint64m##m##_t xa = __riscv_vand_vx_i64m##m(xf, _ps_inv_sign_mask, gvl); \ - x = __riscv_vreinterpret_v_i64m##m##_f64m##m(xa); \ - \ - /* scale by 4/Pi */ \ - y = __riscv_vfmul_vf_f64m##m(x, _ps_cephes_FOPI, gvl); \ - \ - /* store the integer part of y in mm0 */ \ - emm2 = __riscv_vfcvt_x_f_v_i64m##m(y, gvl); \ - \ - /* j=(j+1) & (~1) (see the cephes sources) */ \ - emm2 = __riscv_vadd_vx_i64m##m(emm2, _pi32_1, gvl); \ - emm2 = __riscv_vand_vx_i64m##m(emm2, _pi32_inv1, gvl); \ - y = __riscv_vfcvt_f_x_v_f64m##m(emm2, gvl); \ - \ - emm2 = __riscv_vsub_vx_i64m##m(emm2, _pi32_2, gvl); \ - \ - /* get the swap sign flag */ \ - emm0 = __riscv_vxor_vx_i64m##m(emm2, 0xffffffffffffffff, gvl); \ - emm0 = __riscv_vand_vx_i64m##m(emm0, _pi32_4, gvl); \ - \ - emm0 = __riscv_vsll_vx_i64m##m(emm0, 61, gvl); \ - \ - /* get the polynom selection mask */ \ - emm2 = __riscv_vand_vx_i64m##m(emm2, _pi32_2, gvl); \ - xMask = __riscv_vmseq_vx_i64m##m##_b##md(emm2, _Zero, gvl); \ - vint64m##m##_t zv = __riscv_vmv_v_x_i64m##m(_Zero, gvl); \ - emm2 = __riscv_vmerge_vxm_i64m##m(zv, 0xffffffffffffffff, xMask, gvl); \ - \ - vfloat64m##m##_t sign_bit = \ - 
__riscv_vreinterpret_v_i64m##m##_f64m##m(emm0); \ - vfloat64m##m##_t poly_mask = \ - __riscv_vreinterpret_v_i64m##m##_f64m##m(emm2); \ - \ - /* The magic pass: "Extended precision modular arithmetic" \ - x = ((x - y * DP1) - y * DP2) - y * DP3; */ \ - \ - double _ps_minus_cephes_DP1 = -0.78515625; \ - double _ps_minus_cephes_DP2 = -2.4187564849853515625E-4; \ - double _ps_minus_cephes_DP3 = -3.77489497744594108E-8; \ - \ - x = __riscv_vfmacc_vf_f64m##m(x, _ps_minus_cephes_DP1, y, gvl); \ - x = __riscv_vfmacc_vf_f64m##m(x, _ps_minus_cephes_DP2, y, gvl); \ - x = __riscv_vfmacc_vf_f64m##m(x, _ps_minus_cephes_DP3, y, gvl); \ - \ - /* Evaluate the first polynom (0 <= x <= Pi/4) */ \ - double _ps_coscof_p0 = 2.443315711809948E-005; \ - double _ps_coscof_p1 = -1.388731625493765E-003; \ - double _ps_coscof_p2 = 4.166664568298827E-002; \ - double _ps_0p5 = 0.5f; \ - \ - vfloat64m##m##_t z; \ - vfloat64m##m##_t tmp; \ - \ - z = __riscv_vfmul_vv_f64m##m(x, x, gvl); \ - \ - vfloat64m##m##_t vcp1 = __riscv_vfmv_v_f_f64m##m(_ps_coscof_p1, gvl); \ - vfloat64m##m##_t vcp2 = __riscv_vfmv_v_f_f64m##m(_ps_coscof_p2, gvl); \ - y = __riscv_vfmacc_vf_f64m##m(vcp1, _ps_coscof_p0, y, gvl); \ - y = __riscv_vfmacc_vv_f64m##m(vcp2, z, y, gvl); \ - y = __riscv_vfmul_vv_f64m##m(y, z, gvl); \ - y = __riscv_vfmul_vv_f64m##m(y, z, gvl); \ - y = __riscv_vfnmsub_vf_f64m##m(y, _ps_0p5, z, gvl); \ - y = __riscv_vfadd_vf_f64m##m(y, 1.0, gvl); \ - \ - /* Evaluate the second polynom (Pi/4 <= x <= 0) */ \ - double _ps_sincof_p0 = -1.9515295891E-4; \ - double _ps_sincof_p1 = 8.3321608736E-3; \ - double _ps_sincof_p2 = -1.6666654611E-1; \ - vfloat64m##m##_t y2; \ - \ - vfloat64m##m##_t vsp1 = __riscv_vfmv_v_f_f64m##m(_ps_sincof_p1, gvl); \ - vfloat64m##m##_t vsp2 = __riscv_vfmv_v_f_f64m##m(_ps_sincof_p2, gvl); \ - y2 = __riscv_vfmacc_vf_f64m##m(vsp1, _ps_sincof_p0, z, gvl); \ - y2 = __riscv_vfmacc_vv_f64m##m(vsp2, z, y2, gvl); \ - y2 = __riscv_vfmul_vv_f64m##m(y2, z, gvl); \ - y2 = 
__riscv_vfmacc_vv_f64m##m(x, y2, x, gvl); \ - \ - /* select the correct result from the two polynoms */ \ - xmm3 = poly_mask; \ - vint64m##m##_t t1 = __riscv_vreinterpret_v_f64m##m##_i64m##m(xmm3); \ - vint64m##m##_t t2 = __riscv_vreinterpret_v_f64m##m##_i64m##m(y2); \ - vint64m##m##_t t3 = __riscv_vreinterpret_v_f64m##m##_i64m##m(y); \ - vint64m##m##_t t4 = __riscv_vxor_vx_i64m##m(t1, 0xffffffffffffffff, gvl); \ - vint64m##m##_t at1t2 = __riscv_vand_vv_i64m##m(t1, t2, gvl); \ - vint64m##m##_t at3t4 = __riscv_vand_vv_i64m##m(t3, t4, gvl); \ - y2 = __riscv_vreinterpret_v_i64m##m##_f64m##m(at1t2); \ - y = __riscv_vreinterpret_v_i64m##m##_f64m##m(at3t4); \ - y = __riscv_vfadd_vv_f64m##m(y, y2, gvl); \ - /* update the sign */ \ - t1 = __riscv_vreinterpret_v_f64m##m##_i64m##m(y); \ - t2 = __riscv_vreinterpret_v_f64m##m##_i64m##m(sign_bit); \ - vint64m##m##_t xt1t2 = __riscv_vxor_vv_i64m##m(t1, t2, gvl); \ - y = __riscv_vreinterpret_v_i64m##m##_f64m##m(xt1t2); \ - \ - return y; \ - } - -#define COS32_INLINE(m, md) \ - void cos_f32m##m##_bmark(float *angles, float *results, size_t len); \ - static inline vfloat32m##m##_t __cos_f32m##m(vfloat32m##m##_t x, \ - size_t gvl) { \ - int32_t _ps_inv_sign_mask = ~0x80000000; \ - float _ps_cephes_FOPI = 1.27323954473516; /* 4 / M_PI */ \ - int32_t _pi32_1 = 1; \ - int32_t _pi32_inv1 = ~0x00000001; \ - int32_t _pi32_2 = 2; \ - int32_t _pi32_4 = 4; \ - int32_t _Zero = 0; \ - \ - vfloat32m##m##_t xmm2; \ - vfloat32m##m##_t xmm1; \ - vfloat32m##m##_t xmm3; \ - vfloat32m##m##_t y; \ - \ - vint32m##m##_t emm0; \ - vint32m##m##_t emm2; \ - \ - vbool##md##_t xMask; \ - /* take the absolute value */ \ - vint32m##m##_t xf = __riscv_vreinterpret_v_f32m##m##_i32m##m(x); \ - vint32m##m##_t xa = __riscv_vand_vx_i32m##m(xf, _ps_inv_sign_mask, gvl); \ - x = __riscv_vreinterpret_v_i32m##m##_f32m##m(xa); \ - \ - /* scale by 4/Pi */ \ - y = __riscv_vfmul_vf_f32m##m(x, _ps_cephes_FOPI, gvl); \ - \ - /* store the integer part of y in mm0 */ \ - emm2 = 
__riscv_vfcvt_x_f_v_i32m##m(y, gvl); \ - \ - /* j=(j+1) & (~1) (see the cephes sources) */ \ - emm2 = __riscv_vadd_vx_i32m##m(emm2, _pi32_1, gvl); \ - emm2 = __riscv_vand_vx_i32m##m(emm2, _pi32_inv1, gvl); \ - y = __riscv_vfcvt_f_x_v_f32m##m(emm2, gvl); \ - \ - emm2 = __riscv_vsub_vx_i32m##m(emm2, _pi32_2, gvl); \ - \ - /* get the swap sign flag */ \ - emm0 = __riscv_vxor_vx_i32m##m(emm2, 0xffffffff, gvl); \ - emm0 = __riscv_vand_vx_i32m##m(emm0, _pi32_4, gvl); \ - \ - emm0 = __riscv_vsll_vx_i32m##m(emm0, 61, gvl); \ - \ - /* get the polynom selection mask */ \ - emm2 = __riscv_vand_vx_i32m##m(emm2, _pi32_2, gvl); \ - xMask = __riscv_vmseq_vx_i32m##m##_b##md(emm2, _Zero, gvl); \ - vint32m##m##_t zv = __riscv_vmv_v_x_i32m##m(_Zero, gvl); \ - emm2 = __riscv_vmerge_vxm_i32m##m(zv, 0xffffffff, xMask, gvl); \ - \ - vfloat32m##m##_t sign_bit = \ - __riscv_vreinterpret_v_i32m##m##_f32m##m(emm0); \ - vfloat32m##m##_t poly_mask = \ - __riscv_vreinterpret_v_i32m##m##_f32m##m(emm2); \ - \ - /* The magic pass: "Extended precision modular arithmetic" \ - x = ((x - y * DP1) - y * DP2) - y * DP3; */ \ - \ - float _ps_minus_cephes_DP1 = -0.78515625; \ - float _ps_minus_cephes_DP2 = -2.4187532849853515625E-4; \ - float _ps_minus_cephes_DP3 = -3.77489497744594108E-8; \ - \ - x = __riscv_vfmacc_vf_f32m##m(x, _ps_minus_cephes_DP1, y, gvl); \ - x = __riscv_vfmacc_vf_f32m##m(x, _ps_minus_cephes_DP2, y, gvl); \ - x = __riscv_vfmacc_vf_f32m##m(x, _ps_minus_cephes_DP3, y, gvl); \ - \ - /* Evaluate the first polynom (0 <= x <= Pi/4) */ \ - float _ps_coscof_p0 = 2.443315711809948E-005; \ - float _ps_coscof_p1 = -1.388731625493765E-003; \ - float _ps_coscof_p2 = 4.166632568298827E-002; \ - float _ps_0p5 = 0.5f; \ - \ - vfloat32m##m##_t z; \ - vfloat32m##m##_t tmp; \ - \ - z = __riscv_vfmul_vv_f32m##m(x, x, gvl); \ - \ - vfloat32m##m##_t vcp1 = __riscv_vfmv_v_f_f32m##m(_ps_coscof_p1, gvl); \ - vfloat32m##m##_t vcp2 = __riscv_vfmv_v_f_f32m##m(_ps_coscof_p2, gvl); \ - y = 
__riscv_vfmacc_vf_f32m##m(vcp1, _ps_coscof_p0, y, gvl); \ - y = __riscv_vfmacc_vv_f32m##m(vcp2, z, y, gvl); \ - y = __riscv_vfmul_vv_f32m##m(y, z, gvl); \ - y = __riscv_vfmul_vv_f32m##m(y, z, gvl); \ - y = __riscv_vfnmsub_vf_f32m##m(y, _ps_0p5, z, gvl); \ - y = __riscv_vfadd_vf_f32m##m(y, 1.0, gvl); \ - \ - /* Evaluate the second polynom (Pi/4 <= x <= 0) */ \ - float _ps_sincof_p0 = -1.9515295891E-4; \ - float _ps_sincof_p1 = 8.3321608736E-3; \ - float _ps_sincof_p2 = -1.6666654611E-1; \ - vfloat32m##m##_t y2; \ - \ - vfloat32m##m##_t vsp1 = __riscv_vfmv_v_f_f32m##m(_ps_sincof_p1, gvl); \ - vfloat32m##m##_t vsp2 = __riscv_vfmv_v_f_f32m##m(_ps_sincof_p2, gvl); \ - y2 = __riscv_vfmacc_vf_f32m##m(vsp1, _ps_sincof_p0, z, gvl); \ - y2 = __riscv_vfmacc_vv_f32m##m(vsp2, z, y2, gvl); \ - y2 = __riscv_vfmul_vv_f32m##m(y2, z, gvl); \ - y2 = __riscv_vfmacc_vv_f32m##m(x, y2, x, gvl); \ - \ - /* select the correct result from the two polynoms */ \ - xmm3 = poly_mask; \ - vint32m##m##_t t1 = __riscv_vreinterpret_v_f32m##m##_i32m##m(xmm3); \ - vint32m##m##_t t2 = __riscv_vreinterpret_v_f32m##m##_i32m##m(y2); \ - vint32m##m##_t t3 = __riscv_vreinterpret_v_f32m##m##_i32m##m(y); \ - vint32m##m##_t t4 = __riscv_vxor_vx_i32m##m(t1, 0xffffffff, gvl); \ - vint32m##m##_t at1t2 = __riscv_vand_vv_i32m##m(t1, t2, gvl); \ - vint32m##m##_t at3t4 = __riscv_vand_vv_i32m##m(t3, t4, gvl); \ - y2 = __riscv_vreinterpret_v_i32m##m##_f32m##m(at1t2); \ - y = __riscv_vreinterpret_v_i32m##m##_f32m##m(at3t4); \ - y = __riscv_vfadd_vv_f32m##m(y, y2, gvl); \ - /* update the sign */ \ - t1 = __riscv_vreinterpret_v_f32m##m##_i32m##m(y); \ - t2 = __riscv_vreinterpret_v_f32m##m##_i32m##m(sign_bit); \ - vint32m##m##_t xt1t2 = __riscv_vxor_vv_i32m##m(t1, t2, gvl); \ - y = __riscv_vreinterpret_v_i32m##m##_f32m##m(xt1t2); \ - \ - return y; \ - } - -COS64_INLINE(1, 64) -COS64_INLINE(2, 32) -COS64_INLINE(4, 16) -COS32_INLINE(1, 32) -COS32_INLINE(2, 16) -COS32_INLINE(4, 8) diff --git 
a/bb-tests/workloads/src/CTest/rvv/vec-cos/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-cos/gen_data.py deleted file mode 100644 index 82914e15..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-cos/gen_data.py +++ /dev/null @@ -1,71 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# arg1: vector size, arg2: filter size - -import random as rand -import numpy as np -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -def rand_matrix(N, dtype): - return np.random.rand(N).astype(dtype) - - -# SCRIPT - -if len(sys.argv) == 2: - N_f64 = int(sys.argv[1]) - N_f32 = 2 * N_f64 -else: - print("Error. 
Give me one argument: the number of vector elements.") - sys.exit() - -# Vector of samples -angles_f64 = rand_matrix(N_f64, np.float64).astype(np.float64) -angles_f32 = rand_matrix(N_f32, np.float32).astype(np.float32) - -# Results buffer -results_f64 = np.zeros(N_f64, dtype=np.float64) -results_f32 = np.zeros(N_f32, dtype=np.float32) - -# Gold results -gold_results_f64 = np.cos(angles_f64, dtype=np.float64) -gold_results_f32 = np.cos(angles_f32, dtype=np.float32) - -# Create the file -print('.section .data,"aw",@progbits') -emit("N_f64", np.array(N_f64, dtype=np.uint64)) -emit("angles_f64", angles_f64, "32") -emit("results_f64", results_f64, "32") -emit("gold_results_f64", gold_results_f64, "32") -emit("N_f32", np.array(N_f32, dtype=np.uint32)) -emit("angles_f32", angles_f32, "32") -emit("results_f32", results_f32, "32") -emit("gold_results_f32", gold_results_f32, "32") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-cos/main.c b/bb-tests/workloads/src/CTest/rvv/vec-cos/main.c deleted file mode 100644 index b3cba5e2..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-cos/main.c +++ /dev/null @@ -1,190 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
-// -// Author: Matteo Perotti - -#include -#include -#include - -#include "ara/util.h" -#include "cos.h" -#include "util.h" - -#define N_F64 (512) -extern size_t N_f64; -extern double angles_f64[] __attribute__((aligned(16))); -extern double results_f64[] __attribute__((aligned(16))); -extern double gold_results_f64[] __attribute__((aligned(16))); -double results_f64m1[N_F64] __attribute__((aligned(16))); -double results_f64m2[N_F64] __attribute__((aligned(16))); -double results_f64m4[N_F64] __attribute__((aligned(16))); - -#define N_F32 (1024) -extern size_t N_f32; -extern float angles_f32[] __attribute__((aligned(16))); -extern float results_f32[] __attribute__((aligned(16))); -extern float gold_results_f32[] __attribute__((aligned(16))); -float results_f32m1[N_F32] __attribute__((aligned(16))); -float results_f32m2[N_F32] __attribute__((aligned(16))); -float results_f32m4[N_F32] __attribute__((aligned(16))); - -#define THRESHOLD 0.3 - -int check64(double *results) { - int error = 0; - for (uint64_t i = 0; i < N_f64; ++i) { - if (!similarity_check(results[i], gold_results_f64[i], THRESHOLD)) { - error = 1; - printf("64-bit error at index %d. %lx != %lx\n", i, - *(uint64_t *)(&results[i]), *(uint64_t *)(&gold_results_f64[i])); - } - } - return error; -} - -int check32(float *results) { - int error = 0; - for (uint64_t i = 0; i < N_f32; ++i) { - if (!similarity_check(results[i], gold_results_f32[i], THRESHOLD)) { - error = 1; - printf("32-bit error at index %d. 
%x != %x\n", i, - *(uint32_t *)(&results[i]), *(uint32_t *)(&gold_results_f32[i])); - } - } - return error; -} - -int main() { - if (N_F64 != N_f64 || N_F32 != N_f32) - exit(1); - - printf("FCOS\n"); - - int error = 0; - unsigned long cycles1, cycles2, instr2, instr1; - - if (N_f32 >= 256) { - for (size_t t = 8; t <= 256; t += 31) { - /* cycles1 = read_csr(mcycle); */ - /* cos_f32m1_bmark(angles_f32, results_f32m1, t); */ - /* asm volatile("fence"); */ - /* cycles2 = read_csr(mcycle); */ - /* printf("32b LMUL=1 n=%ld cycles=%ld\n", t, cycles2 - cycles1); */ - - cycles1 = read_csr(mcycle); - cos_f32m2_bmark(angles_f32, results_f32m2, t); - asm volatile("fence"); - cycles2 = read_csr(mcycle); - printf("32b LMUL=2 n=%ld cycles=%ld\n", t, cycles2 - cycles1); - - /* cycles1 = read_csr(mcycle); */ - /* cos_f32m4_bmark(angles_f32, results_f32m4, t); */ - /* asm volatile("fence"); */ - /* cycles2 = read_csr(mcycle); */ - /* printf("32b LMUL=4 n=%ld cycles=%ld\n", t, cycles2 - cycles1); */ - } - } - - printf("Executing cosine on %d 64-bit data LMUL1...\n", N_f64); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - cos_f64m1_bmark(angles_f64, results_f64m1, N_f64); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %ld cycles %ld instructions.\n", cycles2 - cycles1, - instr2 - instr1); - - printf("Executing cosine on %d 64-bit data LMUL2...\n", N_f64); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - cos_f64m2_bmark(angles_f64, results_f64m2, N_f64); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %ld cycles %ld instructions.\n", cycles2 - cycles1, - instr2 - instr1); - - printf("Executing cosine on %d 64-bit data LMUL4...\n", N_f64); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - cos_f64m4_bmark(angles_f64, results_f64m4, N_f64); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = 
read_csr(mcycle); - printf("The execution took %ld cycles %ld instructions.\n", cycles2 - cycles1, - instr2 - instr1); - - printf("Executing cosine on %d 32-bit data LMUL1...\n", N_f32); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - cos_f32m1_bmark(angles_f32, results_f32m1, N_f32); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %ld cycles %ld instructions.\n", cycles2 - cycles1, - instr2 - instr1); - - printf("Executing cosine on %d 32-bit data LMUL2...\n", N_f32); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - cos_f32m2_bmark(angles_f32, results_f32m2, N_f32); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %ld cycles %ld instructions.\n", cycles2 - cycles1, - instr2 - instr1); - - printf("Executing cosine on %d 32-bit data LMUL4...\n", N_f32); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - cos_f32m4_bmark(angles_f32, results_f32m4, N_f32); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %ld cycles %ld instructions.\n", cycles2 - cycles1, - instr2 - instr1); - - printf("Checking results:\n"); - - error = check64(results_f64m1); - if (error) { - return error; - } - error = check64(results_f64m2); - if (error) { - return error; - } - error = check64(results_f64m4); - if (error) { - return error; - } - error = check32(results_f32m1); - if (error) { - return error; - } - error = check32(results_f32m2); - if (error) { - return error; - } - error = check32(results_f32m4); - if (error) { - return error; - } - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-div-approx/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-div-approx/dataset1.h deleted file mode 100644 index 6f0afd6a..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-div-approx/dataset1.h +++ /dev/null @@ -1,322 +0,0 @@ - -#define DATA_SIZE 
3000 - -float input1_data[DATA_SIZE] = { - 2, 41, 28, 9, 37, 17, 6, 47, 29, 40, 31, 0, 46, 44, 19, 34, 48, 5, 5, - 14, 21, 15, 33, 38, 15, 33, 43, 18, 23, 46, 27, 28, 10, 14, 37, 30, 25, 32, - 9, 4, 13, 11, 16, 47, 45, 12, 45, 41, 46, 49, 9, 48, 28, 17, 32, 32, 9, - 5, 42, 3, 14, 17, 7, 45, 44, 13, 28, 49, 38, 12, 23, 25, 20, 11, 22, 36, - 33, 15, 11, 46, 14, 5, 30, 30, 19, 17, 22, 13, 18, 17, 29, 3, 49, 20, 17, - 25, 27, 45, 38, 12, 38, 29, 7, 1, 6, 47, 46, 40, 27, 21, 32, 47, 14, 0, - 2, 4, 4, 22, 29, 30, 19, 14, 40, 48, 41, 17, 22, 25, 48, 17, 40, 44, 26, - 47, 35, 17, 28, 44, 1, 28, 33, 21, 21, 7, 0, 4, 12, 2, 21, 39, 43, 24, - 20, 18, 27, 25, 44, 34, 14, 13, 24, 14, 44, 10, 36, 34, 12, 39, 27, 15, 26, - 27, 16, 47, 14, 41, 15, 39, 36, 48, 18, 36, 14, 48, 32, 37, 4, 18, 5, 5, - 19, 32, 41, 42, 17, 7, 2, 7, 47, 17, 26, 37, 26, 41, 29, 0, 8, 3, 6, - 6, 37, 18, 47, 1, 6, 5, 42, 21, 12, 13, 20, 31, 19, 27, 37, 28, 8, 0, - 36, 25, 38, 2, 37, 4, 10, 42, 7, 38, 17, 47, 46, 16, 24, 22, 25, 32, 13, - 39, 5, 42, 45, 13, 11, 1, 39, 28, 17, 40, 40, 37, 14, 4, 22, 22, 35, 22, - 2, 12, 18, 25, 35, 43, 24, 37, 25, 35, 2, 18, 43, 47, 49, 26, 36, 21, 11, - 24, 34, 1, 17, 12, 17, 12, 28, 39, 37, 25, 41, 41, 8, 8, 33, 1, 10, 7, - 42, 4, 19, 5, 42, 5, 7, 0, 38, 10, 9, 28, 10, 26, 8, 11, 34, 37, 35, - 2, 4, 40, 44, 25, 18, 24, 0, 31, 4, 5, 39, 46, 32, 44, 41, 12, 23, 5, - 28, 26, 0, 49, 45, 19, 20, 5, 27, 46, 25, 20, 15, 10, 21, 30, 17, 26, 34, - 1, 9, 15, 5, 13, 19, 12, 24, 5, 34, 45, 38, 32, 4, 8, 20, 47, 1, 38, - 43, 5, 7, 3, 7, 41, 42, 21, 7, 10, 40, 41, 28, 18, 30, 24, 40, 45, 7, - 21, 40, 9, 27, 42, 22, 39, 43, 39, 40, 37, 5, 20, 12, 49, 6, 36, 29, 39, - 20, 26, 35, 27, 18, 38, 48, 11, 47, 11, 8, 25, 5, 35, 39, 7, 15, 39, 31, - 28, 5, 0, 26, 40, 1, 29, 37, 30, 38, 23, 2, 47, 39, 49, 41, 15, 3, 41, - 49, 37, 8, 41, 31, 7, 41, 2, 34, 4, 11, 37, 37, 13, 6, 36, 47, 21, 18, - 16, 13, 27, 38, 37, 45, 0, 0, 45, 49, 40, 28, 43, 48, 14, 25, 27, 48, 20, - 29, 14, 12, 42, 6, 22, 35, 4, 
33, 9, 12, 16, 39, 32, 0, 16, 49, 32, 37, - 47, 10, 7, 9, 15, 8, 38, 48, 32, 28, 31, 21, 13, 28, 27, 24, 49, 37, 10, - 18, 46, 33, 1, 26, 23, 45, 10, 1, 10, 18, 44, 39, 34, 35, 34, 47, 19, 2, - 5, 23, 19, 33, 16, 47, 23, 23, 36, 35, 46, 26, 43, 3, 27, 45, 42, 7, 49, - 11, 5, 18, 4, 2, 4, 22, 38, 15, 16, 34, 38, 36, 16, 14, 26, 31, 29, 44, - 3, 14, 24, 36, 22, 27, 6, 32, 41, 14, 38, 8, 20, 24, 49, 5, 30, 43, 11, - 14, 28, 45, 28, 9, 12, 39, 49, 36, 17, 21, 30, 38, 8, 15, 16, 17, 15, 35, - 2, 24, 25, 5, 19, 13, 15, 3, 17, 49, 30, 45, 19, 4, 27, 13, 33, 17, 42, - 43, 15, 5, 26, 28, 41, 18, 49, 23, 45, 9, 13, 8, 38, 1, 42, 32, 0, 39, - 45, 42, 6, 49, 35, 28, 35, 12, 39, 7, 9, 12, 22, 45, 0, 16, 30, 15, 8, - 8, 6, 41, 35, 27, 11, 4, 31, 16, 46, 5, 0, 48, 7, 8, 29, 18, 19, 29, - 37, 18, 43, 29, 27, 22, 17, 29, 14, 45, 27, 18, 22, 39, 5, 24, 44, 40, 35, - 27, 17, 1, 41, 31, 27, 30, 48, 5, 15, 45, 48, 30, 39, 8, 44, 25, 45, 21, - 17, 7, 11, 5, 29, 41, 6, 47, 10, 24, 19, 39, 23, 38, 28, 24, 16, 36, 22, - 41, 37, 0, 29, 13, 6, 34, 37, 27, 42, 25, 18, 25, 45, 12, 12, 19, 21, 10, - 12, 26, 20, 8, 15, 31, 29, 26, 34, 7, 37, 40, 22, 3, 4, 8, 0, 42, 24, - 16, 3, 2, 5, 0, 17, 36, 34, 14, 48, 4, 31, 45, 30, 43, 37, 37, 23, 30, - 25, 45, 40, 14, 48, 6, 29, 27, 43, 49, 13, 31, 32, 31, 49, 19, 16, 23, 8, - 12, 9, 43, 11, 17, 7, 41, 16, 19, 42, 1, 20, 30, 27, 10, 0, 39, 44, 40, - 41, 33, 20, 4, 12, 21, 10, 33, 0, 18, 21, 3, 2, 37, 40, 25, 21, 16, 16, - 11, 20, 23, 34, 28, 15, 22, 29, 6, 44, 19, 46, 13, 5, 45, 2, 47, 18, 19, - 12, 11, 33, 1, 49, 22, 10, 19, 0, 31, 2, 30, 46, 47, 41, 23, 31, 40, 26, - 33, 2, 15, 17, 2, 48, 37, 20, 44, 43, 28, 8, 36, 0, 32, 25, 48, 39, 42, - 12, 31, 40, 39, 21, 44, 30, 34, 32, 4, 35, 39, 5, 29, 0, 46, 30, 32, 18, - 13, 11, 30, 34, 27, 5, 28, 37, 1, 15, 48, 28, 19, 38, 33, 25, 10, 44, 45, - 21, 36, 25, 1, 11, 39, 35, 22, 34, 34, 23, 40, 16, 30, 12, 5, 17, 14, 36, - 14, 45, 32, 31, 46, 11, 34, 18, 43, 32, 20, 47, 33, 42, 39, 19, 35, 21, 26, - 34, 1, 7, 2, 
38, 6, 27, 40, 12, 10, 10, 28, 44, 47, 30, 32, 21, 45, 48, - 32, 34, 32, 13, 13, 48, 40, 41, 6, 23, 23, 31, 14, 20, 15, 11, 31, 14, 25, - 1, 33, 16, 28, 44, 15, 20, 0, 9, 46, 10, 26, 2, 26, 41, 17, 16, 11, 43, - 18, 7, 49, 19, 42, 24, 29, 0, 24, 36, 47, 34, 43, 7, 31, 13, 2, 12, 24, - 44, 10, 11, 16, 6, 32, 21, 16, 2, 33, 25, 38, 11, 32, 4, 13, 35, 9, 24, - 47, 4, 11, 42, 16, 37, 2, 4, 44, 13, 9, 36, 26, 38, 48, 30, 0, 28, 6, - 47, 14, 30, 6, 5, 8, 29, 20, 17, 12, 31, 25, 40, 44, 47, 7, 44, 8, 48, - 46, 16, 39, 41, 5, 38, 37, 34, 32, 49, 9, 39, 1, 32, 38, 44, 40, 13, 22, - 48, 18, 3, 46, 45, 26, 12, 13, 44, 32, 10, 15, 14, 40, 28, 17, 48, 24, 30, - 5, 6, 21, 22, 1, 3, 1, 12, 13, 44, 21, 27, 31, 35, 38, 14, 13, 5, 4, - 31, 40, 42, 36, 19, 38, 6, 8, 42, 9, 29, 46, 12, 11, 3, 9, 21, 37, 37, - 43, 2, 42, 38, 31, 24, 8, 23, 23, 30, 46, 0, 31, 14, 46, 15, 22, 35, 35, - 31, 22, 39, 30, 11, 26, 39, 36, 14, 36, 40, 8, 44, 22, 47, 30, 40, 25, 37, - 40, 31, 47, 47, 2, 17, 9, 39, 35, 14, 13, 38, 32, 10, 13, 15, 47, 47, 30, - 41, 31, 26, 12, 20, 13, 45, 39, 39, 26, 24, 15, 9, 19, 36, 45, 39, 23, 41, - 18, 5, 11, 42, 42, 27, 42, 15, 41, 21, 8, 23, 41, 15, 10, 7, 5, 31, 6, - 31, 21, 45, 30, 49, 27, 24, 11, 16, 42, 26, 6, 10, 34, 39, 8, 31, 4, 24, - 4, 23, 26, 15, 38, 35, 26, 23, 45, 48, 34, 16, 8, 21, 7, 0, 5, 37, 15, - 34, 25, 2, 1, 3, 31, 42, 12, 26, 24, 34, 13, 48, 27, 25, 47, 34, 42, 16, - 11, 4, 36, 16, 35, 49, 22, 44, 34, 5, 9, 17, 31, 18, 2, 43, 34, 36, 9, - 19, 25, 48, 28, 36, 34, 2, 2, 4, 45, 24, 21, 38, 22, 2, 31, 20, 31, 7, - 22, 18, 49, 48, 36, 36, 23, 34, 14, 34, 21, 18, 36, 36, 18, 37, 22, 15, 13, - 16, 25, 25, 44, 27, 7, 16, 39, 3, 18, 42, 21, 4, 46, 4, 26, 31, 17, 28, - 28, 49, 25, 6, 46, 38, 14, 11, 28, 35, 19, 29, 33, 11, 30, 36, 37, 41, 2, - 36, 12, 33, 22, 29, 23, 33, 4, 16, 46, 10, 38, 43, 41, 22, 16, 30, 43, 14, - 34, 5, 10, 27, 33, 7, 22, 17, 38, 27, 28, 36, 37, 29, 38, 23, 20, 42, 42, - 28, 2, 48, 40, 48, 21, 46, 40, 4, 18, 26, 6, 33, 18, 11, 34, 25, 30, 
2, - 30, 9, 18, 11, 43, 34, 34, 29, 0, 33, 20, 31, 36, 31, 23, 7, 0, 34, 11, - 21, 26, 48, 16, 49, 9, 13, 27, 3, 33, 23, 47, 14, 17, 6, 7, 13, 43, 2, - 42, 20, 22, 39, 41, 7, 6, 28, 22, 17, 39, 6, 32, 42, 21, 34, 9, 25, 43, - 17, 30, 42, 25, 10, 11, 16, 49, 30, 45, 16, 16, 4, 27, 18, 36, 39, 23, 36, - 45, 44, 36, 26, 33, 48, 5, 7, 37, 4, 33, 2, 11, 36, 37, 0, 42, 46, 37, - 3, 17, 25, 26, 28, 16, 22, 29, 47, 4, 6, 33, 46, 46, 48, 7, 35, 3, 37, - 23, 30, 47, 13, 46, 18, 37, 37, 43, 0, 19, 5, 25, 42, 4, 45, 4, 9, 21, - 35, 27, 40, 18, 21, 42, 38, 27, 27, 11, 33, 31, 15, 46, 48, 29, 37, 33, 27, - 37, 11, 32, 7, 3, 22, 28, 41, 30, 0, 18, 6, 8, 10, 22, 45, 41, 8, 18, - 20, 15, 19, 7, 47, 22, 14, 49, 45, 14, 48, 9, 39, 6, 2, 3, 43, 15, 9, - 25, 35, 45, 17, 27, 8, 29, 9, 30, 48, 35, 35, 21, 48, 47, 7, 46, 0, 39, - 46, 48, 31, 0, 0, 2, 26, 12, 29, 43, 10, 35, 35, 17, 22, 47, 7, 4, 7, - 10, 4, 35, 7, 28, 26, 17, 27, 30, 26, 37, 44, 45, 9, 18, 35, 31, 29, 34, - 30, 7, 39, 30, 23, 39, 8, 22, 25, 2, 9, 3, 22, 10, 9, 8, 47, 12, 34, - 7, 30, 11, 39, 39, 39, 8, 2, 41, 16, 34, 15, 20, 47, 48, 6, 34, 7, 6, - 36, 45, 4, 36, 26, 23, 44, 35, 22, 38, 19, 44, 35, 12, 28, 18, 11, 47, 5, - 23, 2, 8, 32, 19, 49, 32, 16, 32, 3, 38, 14, 36, 35, 4, 43, 31, 44, 0, - 31, 30, 30, 24, 23, 8, 23, 23, 30, 5, 15, 9, 36, 38, 7, 36, 21, 47, 28, - 23, 19, 39, 35, 23, 36, 28, 7, 17, 39, 41, 49, 1, 24, 15, 25, 29, 40, 48, - 22, 19, 38, 14, 31, 42, 21, 3, 36, 45, 45, 3, 49, 12, 46, 48, 31, 7, 0, - 21, 29, 8, 9, 12, 11, 33, 22, 49, 35, 4, 29, 5, 18, 36, 12, 47, 22, 17, - 36, 23, 1, 49, 2, 35, 20, 32, 28, 2, 0, 5, 9, 3, 8, 19, 9, 23, 37, - 21, 24, 46, 18, 8, 37, 23, 37, 10, 2, 2, 16, 5, 36, 27, 27, 14, 38, 7, - 35, 47, 21, 47, 3, 49, 7, 22, 41, 40, 13, 38, 24, 37, 23, 11, 14, 9, 19, - 11, 4, 26, 38, 25, 48, 34, 24, 49, 1, 18, 37, 15, 10, 18, 10, 27, 31, 18, - 47, 19, 45, 45, 6, 10, 33, 49, 27, 34, 13, 39, 19, 24, 1, 32, 32, 14, 26, - 14, 21, 24, 20, 36, 12, 5, 32, 48, 27, 42, 49, 33, 33, 20, 44, 46, 44, 16, 
- 26, 33, 30, 44, 38, 6, 24, 1, 44, 11, 1, 48, 43, 15, 8, 39, 9, 26, 11, - 23, 18, 24, 24, 21, 8, 15, 11, 42, 22, 10, 17, 37, 12, 46, 4, 38, 34, 30, - 15, 20, 19, 26, 39, 22, 14, 16, 17, 7, 49, 37, 40, 8, 42, 33, 19, 7, 7, - 3, 34, 5, 38, 31, 31, 12, 3, 5, 47, 3, 23, 34, 43, 11, 34, 14, 12, 40, - 22, 14, 2, 44, 17, 21, 44, 9, 19, 34, 42, 2, 27, 37, 34, 17, 1, 25, 4, - 46, 44, 44, 47, 20, 26, 47, 11, 39, 32, 1, 13, 35, 26, 43, 8, 17, 43, 15, - 0, 14, 4, 47, 4, 27, 28, 49, 6, 11, 10, 28, 18, 41, 38, 15, 12, 16, 35, - 27, 22, 39, 14, 31, 27, 34, 28, 9, 9, 45, 19, 33, 28, 33, 2, 47, 26, 45, - 1, 27, 6, 6, 7, 15, 40, 28, 2, 15, 3, 35, 13, 35, 33, 19, 29, 0, 10, - 46, 13, 36, 10, 21, 35, 15, 7, 4, 46, 32, 24, 42, 39, 27, 14, 18, 39, 23, - 1, 19, 19, 22, 37, 26, 45, 35, 34, 32, 23, 8, 34, 16, 37, 22, 5, 42, 14, - 49, 9, 7, 29, 42, 16, 28, 20, 8, 5, 41, 39, 43, 39, 20, 16, 40, 3, 30, - 6, 41, 26, 46, 2, 23, 34, 25, 41, 13, 3, 4, 19, 45, 44, 37, 2, 48, 36, - 42, 48, 28, 33, 42, 10, 21, 8, 4, 46, 10, 34, 44, 41, 43, 19, 41, 7, 5, - 33, 24, 16, 35, 17, 5, 17, 31, 18, 45, 40, 2, 10, 27, 10, 47, 32, 13, 29, - 24, 42, 1, 35, 32, 0, 38, 9, 31, 41, 5, 4, 46, 16, 14, 42, 41, 10, 17, - 8, 0, 49, 40, 11, 45, 48, 12, 49, 1, 23, 37, 14, 17, 37, 44, 42, 0, 2, - 37, 34, 43, 10, 7, 31, 19, 22, 43, 30, 16, 11, 14, 17, 36, 4, 4, 15, 34, - 47, 10, 39, 29, 40, 21, 45, 1, 46, 35, 32, 2, 46, 21, 28, 14, 15, 20, 11, - 42, 0, 23, 31, 5, 36, 19, 44, 31, 42, 13, 30, 29, 3, 28, 35, 22, 13, 35, - 11, 18, 33, 18, 9, 25, 8, 27, 15, 1, 18, 14, 30, 4, 26, 47, 4, 36, 25, - 7, 33, 37, 21, 33, 27, 39, 41, 36, 26, 26, 20, 17, 2, 12, 40, 18, 14, 33, - 31, 21, 30, 8, 6, 5, 23, 2, 28, 28, 4, 28, 45, 26, 49, 13, 16, 16, 40, - 36, 5, 3, 4, 14, 41, 41, 35, 12, 23, 38, 35, 49, 21, 34, 16, 31, 18, 28, - 23, 42, 33, 46, 25, 39, 48, 27, 22, 43, 4, 16, 5, 30, 11, 44, 47, 16, 35, - 48, 20, 14, 37, 24, 27, 2, 13, 0, 48, 13, 24, 6, 27, 4, 17, 11, 5, 43, - 47, 6, 43, 18, 10, 16, 4, 1, 10, 30, 14, 17, 25, 33, 5, 40, 49, 38, 
1, - 37, 44, 23, 12, 13, 26, 48, 17, 13, 25, 32, 2, 13, 3, 13, 38, 21, 8, 17, - 35, 5, 19, 6, 48, 8, 47, 6, 29, 48, 4, 47, 17, 42, 47, 38, 44, 33, 14, - 46, 3, 33, 47, 9, 25, 14, 27, 20, 42, 36, 0, 37, 25, 30, 5, 30, 46, 10, - 8, 3, 30, 13, 3, 23, 44, 27, 47, 0, 36, 14, 37, 46, 17, 14, 12, 24, 28, - 12, 35, 0, 41, 12, 38, 28, 26, 19, 46, 15, 44, 6, 8, 21, 30, 14, 49, 1, - 34, 23, 26, 26, 8, 41, 39, 22, 32, 5, 41, 9, 19, 1, 32, 23, 24, 43, 28, - 0, 4, 15, 39, 44, 6, 30, 30, 5, 18, 36, 4, 12, 7, 46, 49, 5, 31, 9, - 5, 17, 32, 31, 14, 14, 30, 25, 39, 10, 1, 6, 31, 11, 11, 49, 31, 36, 35, - 32, 35, 6, 3, 37, 3, 11, 45, 15, 5, 15, 25, 5, 16, 48, 3, 30, 48, 44, - 4, 36, 0, 5, 30, 21, 35, 19, 47, 11, 8, 34, 38, 42, 20, 47, 32, 24, 38, - 36, 4, 38, 4, 9, 11, 0, 26, 21, 33, 7, 42, 9, 35, 45, 42, 1, 8, 10, - 39, 46, 8, 18, 0, 48, 10, 1, 46, 2, 12, 43, 49, 12, 36, 10, 42, 8, 29, - 5, 43, 5, 23, 41, 28, 34, 27, 43, 47, 39, 26, 32, 45, 1, 25, 4, 40, 21, - 28, 9, 21, 39, 2, 29, 34, 31, 23, 24, 29, 9, 17, 38, 23, 30, 14, 16, 37, - 31, 20, 40, 41, 10, 31, 19, 16, 24, 29, 16, 0, 41, 5, 32, 46, 25, 14, 24, - 11, 39, 25, 43, 24, 7, 8, 4, 35, 31, 23, 7, 32, 36, 14, 4, 44, 25, 9, - 34, 46, 34, 26, 28, 24, 35, 14, 16, 29, 41, 28, 2, 3, 35, 37, 33, 40, 24, - 27, 40, 25, 21, 46, 31, 45, 9, 25, 26, 31, 48, 23, 35, 44, 44, 33, 16, 34, - 9, 13, 25, 5, 12, 17, 11, 31, 30, 16, 13, 4, 7, 16, 18, 7, 17, 16, 9, - 35, 39, 34, 34, 6, 36, 17, 4, 25, 38, 23, 40, 44, 11, 44, 34, 23, 0, 40, - 21, 16, 30, 10, 14, 34, 37, 10, 14, 4, 27, 34, 43, 4, 26, 11, 32, 40, 48, - 32, 39, 21, 9, 40, 49, 38, 49, 37, 32, 43, 10, 39, 43, 4, 15, 9, 2, 31, - 44, 30, 37, 8, 7, 3, 5, 24, 7, 1, 46, 20, 12, 34, 7, 48, 10, 21, 38, - 11, 30, 31, 46, 46, 4, 8, 5, 40, 44, 4, 12, 24, 45, 39, 25, 3}; - -float input2_data[DATA_SIZE] = { - 23, 17, 1, 50, 19, 29, 4, 8, 11, 8, 11, 29, 18, 30, 45, 12, 1, 45, 38, - 33, 26, 22, 36, 41, 50, 8, 35, 29, 14, 13, 33, 26, 5, 36, 34, 19, 28, 47, - 30, 29, 48, 41, 30, 33, 19, 25, 17, 16, 30, 20, 
28, 33, 50, 50, 41, 4, 11, - 50, 32, 43, 36, 50, 13, 36, 22, 21, 27, 36, 1, 25, 6, 44, 35, 21, 15, 15, - 32, 4, 18, 28, 33, 9, 27, 15, 40, 41, 37, 38, 38, 5, 50, 14, 43, 40, 27, - 12, 24, 13, 18, 17, 35, 38, 23, 16, 38, 28, 24, 28, 47, 40, 18, 45, 34, 26, - 28, 44, 43, 23, 4, 34, 20, 1, 1, 9, 15, 20, 27, 10, 46, 28, 50, 45, 3, - 31, 48, 25, 8, 1, 33, 17, 37, 5, 37, 15, 20, 35, 46, 24, 41, 37, 14, 7, - 12, 11, 10, 1, 26, 30, 9, 33, 18, 31, 4, 43, 33, 32, 7, 7, 42, 34, 22, - 25, 38, 7, 32, 1, 1, 21, 6, 36, 41, 49, 21, 44, 43, 5, 4, 34, 34, 8, - 7, 5, 21, 12, 1, 22, 49, 11, 32, 19, 47, 36, 33, 16, 45, 35, 2, 25, 28, - 27, 49, 33, 22, 21, 16, 22, 44, 11, 24, 50, 34, 36, 50, 17, 23, 32, 8, 33, - 23, 22, 26, 50, 22, 1, 25, 8, 18, 20, 3, 9, 31, 6, 49, 7, 45, 5, 44, - 31, 31, 44, 45, 31, 21, 12, 38, 27, 14, 23, 12, 38, 48, 37, 36, 11, 3, 38, - 3, 20, 3, 8, 21, 38, 12, 24, 39, 10, 25, 35, 37, 9, 10, 46, 1, 49, 13, - 24, 19, 27, 12, 7, 3, 40, 10, 9, 38, 22, 18, 21, 8, 32, 15, 40, 17, 32, - 6, 43, 4, 41, 20, 48, 2, 1, 35, 35, 34, 30, 45, 42, 38, 24, 26, 27, 23, - 22, 37, 37, 36, 28, 5, 40, 11, 2, 33, 3, 22, 33, 5, 49, 34, 49, 27, 4, - 32, 16, 36, 21, 3, 37, 16, 22, 46, 30, 35, 40, 25, 9, 48, 50, 25, 8, 42, - 31, 35, 1, 6, 40, 10, 1, 36, 5, 38, 37, 41, 22, 17, 2, 2, 31, 47, 2, - 5, 23, 17, 18, 28, 19, 48, 32, 20, 50, 13, 17, 42, 49, 20, 14, 24, 18, 17, - 21, 13, 11, 28, 46, 32, 37, 30, 42, 34, 50, 21, 34, 47, 29, 38, 34, 7, 20, - 47, 9, 23, 16, 10, 24, 10, 21, 21, 41, 50, 14, 37, 35, 42, 27, 3, 32, 27, - 5, 15, 44, 6, 20, 40, 33, 19, 48, 45, 30, 3, 15, 43, 38, 5, 27, 23, 22, - 9, 47, 2, 24, 17, 14, 19, 18, 45, 8, 9, 23, 10, 32, 47, 14, 40, 10, 32, - 15, 47, 8, 42, 5, 50, 32, 33, 2, 1, 38, 20, 46, 40, 28, 50, 27, 31, 21, - 35, 28, 46, 7, 50, 27, 32, 12, 45, 27, 42, 33, 12, 5, 14, 29, 12, 7, 31, - 10, 12, 40, 45, 29, 6, 8, 40, 43, 2, 36, 45, 25, 3, 19, 11, 28, 21, 7, - 33, 24, 5, 15, 3, 45, 12, 38, 37, 45, 9, 1, 27, 17, 9, 25, 6, 9, 3, - 36, 33, 46, 32, 13, 42, 40, 33, 13, 13, 
8, 44, 17, 31, 47, 39, 20, 12, 38, - 23, 22, 11, 49, 46, 47, 33, 27, 30, 5, 24, 35, 4, 38, 2, 25, 2, 20, 25, - 2, 21, 33, 2, 48, 28, 5, 16, 37, 37, 7, 28, 13, 6, 19, 3, 44, 8, 43, - 37, 17, 24, 14, 18, 41, 34, 17, 17, 21, 24, 3, 30, 25, 21, 50, 38, 12, 27, - 17, 13, 7, 28, 37, 19, 48, 6, 32, 20, 11, 22, 27, 7, 18, 50, 34, 50, 37, - 23, 5, 31, 11, 36, 37, 43, 23, 16, 49, 38, 40, 27, 31, 6, 37, 12, 9, 11, - 42, 38, 10, 36, 34, 40, 45, 49, 33, 15, 32, 8, 9, 40, 4, 34, 5, 40, 35, - 32, 49, 13, 45, 35, 19, 31, 37, 20, 28, 4, 9, 49, 17, 28, 10, 26, 44, 33, - 29, 17, 18, 21, 45, 43, 45, 47, 50, 23, 50, 3, 17, 46, 36, 38, 23, 1, 20, - 15, 36, 23, 33, 39, 25, 27, 28, 25, 26, 22, 18, 3, 40, 44, 22, 21, 44, 28, - 9, 6, 25, 30, 6, 18, 49, 15, 12, 8, 31, 28, 48, 44, 7, 33, 41, 38, 19, - 18, 46, 11, 29, 26, 49, 49, 18, 32, 43, 49, 46, 21, 39, 4, 44, 20, 25, 32, - 33, 28, 17, 4, 45, 26, 49, 1, 26, 45, 29, 39, 19, 26, 40, 42, 20, 32, 23, - 4, 5, 3, 47, 47, 47, 49, 4, 40, 50, 43, 45, 31, 20, 12, 40, 10, 39, 35, - 39, 25, 7, 12, 22, 40, 19, 2, 33, 38, 41, 22, 7, 31, 16, 18, 45, 16, 14, - 44, 19, 26, 16, 4, 2, 43, 29, 25, 17, 16, 22, 41, 36, 20, 49, 49, 23, 3, - 42, 13, 17, 38, 9, 9, 6, 27, 39, 48, 36, 27, 41, 2, 31, 43, 19, 43, 49, - 2, 32, 16, 37, 34, 33, 15, 8, 15, 45, 37, 44, 24, 15, 34, 24, 22, 1, 8, - 26, 48, 31, 23, 12, 40, 18, 48, 8, 19, 23, 31, 37, 18, 49, 46, 17, 22, 3, - 4, 39, 48, 12, 2, 12, 11, 9, 44, 23, 45, 5, 8, 2, 41, 39, 27, 25, 40, - 10, 15, 49, 14, 18, 21, 10, 20, 6, 17, 35, 27, 28, 49, 44, 3, 26, 45, 28, - 25, 39, 41, 15, 22, 42, 10, 42, 39, 30, 18, 18, 38, 1, 38, 1, 21, 33, 23, - 19, 5, 9, 18, 27, 34, 4, 45, 15, 16, 17, 47, 42, 41, 25, 13, 50, 46, 36, - 33, 26, 21, 28, 6, 16, 24, 48, 5, 41, 23, 2, 28, 33, 2, 24, 37, 48, 31, - 23, 34, 12, 39, 36, 21, 44, 10, 24, 29, 3, 50, 7, 23, 24, 29, 21, 38, 29, - 44, 2, 26, 19, 20, 48, 6, 30, 43, 26, 24, 40, 37, 36, 49, 27, 22, 28, 43, - 10, 40, 12, 37, 14, 41, 25, 32, 25, 6, 28, 8, 24, 36, 34, 12, 16, 17, 27, - 12, 38, 8, 
7, 28, 22, 42, 23, 22, 6, 49, 16, 34, 16, 41, 37, 27, 38, 42, - 21, 3, 24, 38, 25, 17, 50, 22, 33, 18, 10, 18, 33, 1, 43, 18, 46, 40, 32, - 8, 31, 5, 38, 48, 5, 7, 14, 30, 34, 4, 12, 4, 17, 28, 30, 17, 40, 2, - 33, 23, 9, 45, 12, 41, 22, 11, 11, 4, 43, 1, 39, 31, 19, 7, 49, 50, 38, - 29, 18, 44, 48, 25, 27, 19, 17, 4, 28, 3, 5, 29, 29, 28, 9, 7, 2, 42, - 40, 9, 20, 29, 17, 24, 13, 24, 20, 36, 17, 23, 27, 46, 6, 27, 26, 2, 21, - 4, 5, 15, 9, 19, 21, 41, 43, 30, 2, 13, 44, 48, 19, 41, 7, 27, 12, 24, - 14, 2, 12, 13, 44, 10, 32, 16, 31, 17, 13, 10, 30, 48, 16, 17, 8, 35, 41, - 28, 7, 24, 20, 7, 49, 11, 35, 5, 22, 4, 19, 11, 31, 35, 7, 40, 41, 36, - 18, 5, 46, 5, 19, 16, 1, 15, 7, 42, 13, 36, 7, 1, 24, 17, 11, 22, 10, - 13, 27, 30, 2, 50, 11, 3, 49, 11, 42, 37, 28, 19, 12, 5, 41, 50, 45, 10, - 17, 15, 2, 11, 24, 36, 7, 35, 21, 47, 11, 47, 41, 48, 28, 22, 2, 14, 38, - 42, 19, 18, 46, 35, 45, 2, 28, 50, 46, 9, 18, 11, 42, 33, 42, 7, 35, 16, - 41, 22, 18, 22, 22, 8, 38, 50, 11, 16, 42, 43, 35, 28, 39, 27, 10, 37, 23, - 44, 5, 3, 41, 33, 46, 27, 17, 28, 49, 19, 24, 15, 48, 31, 6, 15, 7, 7, - 47, 47, 45, 46, 12, 33, 44, 13, 3, 49, 21, 39, 23, 23, 4, 7, 32, 19, 26, - 27, 24, 37, 6, 40, 6, 42, 2, 45, 40, 26, 42, 30, 43, 22, 15, 27, 46, 41, - 48, 9, 13, 47, 12, 17, 33, 2, 23, 43, 36, 26, 28, 32, 18, 41, 45, 31, 48, - 47, 8, 41, 44, 50, 39, 9, 25, 35, 6, 48, 2, 9, 17, 7, 48, 13, 20, 17, - 31, 7, 19, 5, 35, 12, 23, 13, 10, 4, 50, 48, 49, 50, 28, 2, 29, 17, 22, - 6, 22, 9, 43, 26, 9, 4, 15, 12, 48, 9, 9, 22, 4, 19, 45, 32, 37, 19, - 44, 2, 23, 11, 22, 30, 28, 41, 3, 26, 19, 35, 23, 3, 14, 34, 3, 29, 23, - 20, 24, 36, 12, 18, 8, 15, 39, 26, 49, 12, 1, 32, 42, 18, 3, 26, 7, 50, - 30, 38, 47, 6, 49, 2, 36, 5, 11, 20, 32, 50, 50, 35, 26, 5, 10, 8, 5, - 22, 18, 43, 39, 35, 23, 47, 14, 6, 48, 14, 32, 30, 39, 5, 50, 21, 15, 30, - 34, 10, 46, 29, 26, 7, 27, 45, 46, 31, 48, 11, 10, 50, 30, 36, 1, 24, 38, - 49, 44, 32, 46, 20, 14, 18, 9, 37, 46, 37, 44, 25, 4, 34, 34, 7, 29, 45, - 18, 42, 
19, 42, 12, 11, 27, 4, 47, 14, 41, 15, 32, 5, 48, 50, 25, 12, 21, - 21, 2, 6, 43, 25, 20, 38, 34, 8, 39, 32, 38, 5, 20, 23, 3, 28, 4, 43, - 36, 2, 5, 35, 34, 7, 18, 22, 46, 21, 22, 43, 47, 15, 44, 19, 8, 35, 33, - 26, 23, 28, 5, 28, 21, 8, 39, 50, 15, 1, 12, 15, 46, 12, 6, 21, 31, 22, - 21, 11, 28, 9, 3, 13, 8, 1, 36, 37, 30, 40, 27, 19, 47, 24, 3, 40, 3, - 42, 15, 23, 31, 11, 25, 28, 3, 49, 31, 34, 41, 31, 31, 19, 49, 20, 7, 8, - 39, 36, 38, 12, 29, 36, 21, 10, 41, 38, 31, 16, 8, 25, 22, 20, 37, 30, 20, - 18, 10, 32, 5, 20, 11, 35, 13, 21, 27, 22, 48, 39, 11, 19, 6, 30, 39, 10, - 36, 50, 38, 18, 39, 9, 14, 19, 28, 49, 9, 47, 6, 11, 17, 9, 45, 13, 26, - 16, 34, 26, 10, 18, 24, 48, 10, 44, 46, 19, 49, 13, 41, 8, 32, 21, 9, 36, - 28, 13, 8, 9, 40, 26, 40, 31, 14, 35, 47, 49, 27, 46, 48, 42, 46, 29, 49, - 34, 23, 13, 29, 20, 6, 7, 3, 44, 39, 29, 46, 7, 15, 13, 35, 14, 37, 27, - 42, 34, 46, 27, 5, 44, 26, 13, 2, 11, 26, 1, 40, 27, 45, 13, 21, 24, 30, - 12, 10, 42, 29, 30, 35, 6, 40, 3, 20, 36, 35, 34, 32, 40, 31, 14, 31, 39, - 50, 3, 40, 48, 31, 10, 16, 35, 40, 1, 23, 11, 47, 26, 46, 11, 21, 8, 14, - 39, 31, 10, 23, 48, 11, 6, 29, 43, 50, 15, 42, 48, 39, 1, 33, 44, 1, 40, - 19, 5, 20, 11, 23, 13, 28, 47, 34, 39, 40, 50, 48, 3, 44, 36, 19, 44, 21, - 30, 7, 21, 38, 22, 7, 29, 3, 47, 20, 39, 4, 22, 32, 34, 35, 15, 10, 36, - 42, 19, 42, 22, 29, 10, 32, 2, 47, 50, 38, 12, 47, 42, 18, 11, 3, 44, 10, - 10, 3, 18, 48, 1, 30, 28, 25, 6, 26, 47, 10, 36, 24, 32, 1, 45, 36, 41, - 37, 37, 12, 47, 2, 18, 5, 11, 29, 5, 41, 16, 23, 32, 33, 3, 45, 14, 39, - 41, 4, 37, 38, 46, 16, 25, 5, 37, 32, 28, 2, 34, 38, 4, 28, 12, 40, 15, - 14, 32, 15, 44, 25, 25, 40, 49, 22, 8, 39, 39, 24, 22, 24, 28, 49, 3, 50, - 19, 49, 25, 41, 39, 19, 24, 39, 6, 12, 6, 3, 46, 32, 35, 21, 41, 2, 40, - 28, 47, 43, 15, 48, 26, 39, 2, 18, 36, 8, 29, 23, 17, 12, 17, 21, 15, 50, - 7, 21, 27, 50, 4, 3, 27, 4, 4, 33, 1, 2, 8, 37, 23, 40, 4, 27, 1, - 35, 30, 33, 6, 43, 28, 16, 30, 8, 16, 22, 47, 3, 21, 43, 16, 34, 7, 
50, - 31, 13, 11, 11, 14, 15, 17, 24, 24, 14, 41, 36, 2, 11, 34, 36, 17, 16, 36, - 21, 26, 33, 23, 29, 26, 33, 31, 19, 40, 26, 3, 18, 41, 12, 6, 41, 4, 21, - 14, 38, 1, 21, 22, 9, 37, 31, 49, 23, 32, 10, 24, 42, 43, 11, 15, 6, 6, - 40, 44, 41, 42, 21, 7, 10, 11, 48, 34, 38, 50, 38, 40, 19, 26, 34, 38, 45, - 2, 45, 1, 1, 27, 13, 10, 11, 2, 49, 34, 46, 23, 39, 15, 27, 15, 33, 44, - 43, 42, 13, 44, 21, 5, 36, 27, 10, 21, 20, 44, 9, 12, 29, 9, 21, 13, 25, - 36, 19, 1, 47, 40, 44, 31, 44, 35, 28, 42, 39, 15, 43, 19, 3, 30, 48, 41, - 9, 6, 26, 48, 24, 1, 41, 40, 38, 43, 35, 37, 16, 7, 3, 48, 36, 31, 26, - 49, 5, 18, 43, 40, 36, 7, 4, 14, 37, 21, 14, 47, 6, 31, 49, 11, 18, 14, - 34, 25, 15, 5, 9, 36, 12, 41, 1, 32, 11, 11, 37, 22, 20, 5, 5, 25, 10, - 31, 7, 47, 41, 41, 4, 50, 32, 10, 16, 25, 14, 4, 20, 41, 21, 12, 47, 7, - 28, 35, 20, 32, 40, 41, 49, 43, 48, 35, 21, 32, 18, 9, 50, 11, 45, 27, 25, - 32, 11, 3, 37, 44, 40, 21, 21, 22, 22, 44, 31, 37, 23, 39, 7, 10, 30, 33, - 28, 36, 42, 20, 30, 8, 34, 35, 2, 11, 42, 13, 30, 31, 2, 31, 17, 50, 22, - 38, 12, 9, 39, 12, 26, 50, 15, 44, 50, 2, 2, 22, 13, 6, 41, 40, 24, 4, - 4, 41, 50, 16, 8, 26, 23, 32, 38, 47, 40, 10, 39, 42, 43, 6, 10, 28, 39, - 9, 9, 38, 20, 24, 40, 39, 9, 46, 25, 45, 22, 39, 18, 28, 29, 16, 36, 41, - 15, 40, 24, 22, 21, 38, 28, 50, 31, 42, 21, 41, 49, 42, 36, 13, 46, 50, 36, - 4, 41, 11, 36, 23, 24, 25, 39, 8, 48, 7, 32, 50, 45, 3, 49, 38, 12, 26, - 15, 30, 5, 6, 39, 43, 45, 19, 4, 47, 38, 37, 13, 27, 23, 5, 6, 29, 8, - 2, 1, 15, 39, 18, 24, 49, 16, 1, 2, 50, 30, 37, 49, 34, 25, 41, 14, 24, - 30, 45, 17, 39, 29, 26, 3, 26, 33, 23, 48, 27, 6, 37, 13, 18, 13, 22, 33, - 31, 43, 1, 14, 6, 42, 3, 50, 1, 1, 41, 46, 3, 29, 21, 47, 12, 28, 50, - 34, 1, 11, 15, 17, 45, 29, 37, 32, 3, 33, 40, 8, 40, 9, 18, 4, 33, 11, - 45, 21, 14, 33, 29, 4, 8, 23, 11, 28, 27, 37, 2, 41, 28, 12, 7, 23, 41, - 15, 20, 40, 20, 42, 23, 19, 45, 40, 45, 35, 31, 11, 4, 26, 27, 11, 29, 22, - 34, 32, 23, 8, 8, 28, 22, 26, 9, 20, 36, 15, 32, 
36, 3, 45, 33, 13, 35, - 17, 24, 38, 41, 45, 32, 46, 25, 21, 43, 29, 14, 45, 49, 21, 4, 11, 31, 32, - 38, 41, 39, 32, 46, 21, 36, 13, 2, 32, 44, 8, 7, 8, 2, 28, 32, 22, 25, - 40, 17, 8, 21, 7, 7, 42, 18, 11, 46, 18, 26, 45, 28, 27, 31, 48, 50, 6, - 29, 50, 41, 37, 21, 40, 49, 33, 15, 19, 28, 12, 37, 34, 14, 45, 32, 30, 1, - 34, 47, 10, 7, 3, 40, 43, 40, 21, 12, 27, 2, 38, 7, 26, 27, 27, 41, 16, - 24, 14, 29, 24, 48, 21, 29, 45, 10, 19, 30, 50, 40, 34, 27, 45, 42, 41, 12, - 10, 25, 18, 20, 1, 30, 22, 19, 5, 37, 44, 49, 25, 37, 16, 18, 44, 11, 42, - 7, 42, 44, 34, 45, 45, 9, 41, 44, 20, 44, 17, 16, 27, 32, 5, 22, 10, 34, - 37, 7, 49, 17, 15, 21, 2, 38, 16, 16, 16, 37, 38, 44, 33, 4, 13, 21, 46, - 41, 12, 13, 28, 6, 6, 8, 28, 18, 48, 40, 40, 48, 48, 13, 5, 40, 18, 1, - 42, 28, 35, 16, 25, 10, 44, 11, 9, 48, 3, 37, 25, 33, 31, 5, 27, 37, 14, - 39, 14, 16, 49, 45, 30, 9, 21, 43, 44, 36, 17, 40, 36, 43, 38, 50, 20, 9, - 21, 27, 9, 27, 7, 21, 43, 29, 1, 2, 28, 43, 29, 37, 30, 21, 10, 5, 37, - 44, 50, 46, 22, 22, 3, 5, 19, 36, 40, 11, 4, 45, 32, 14, 43, 31, 14, 43, - 17, 8, 44, 9, 1, 24, 35, 40, 10, 34, 16, 4, 18, 17, 9, 9, 45, 6, 5, - 45, 23, 28, 27, 7, 28, 11, 28, 24, 25, 39, 3, 7, 38, 45, 12, 27, 34, 28, - 42, 49, 24, 22, 21, 18, 22, 38, 49, 8, 2, 45, 10, 21, 49, 24, 41, 12, 50, - 2, 45, 19, 26, 33, 3, 15, 16, 24, 19, 41, 13, 22, 23, 43, 50, 37, 5, 6, - 43, 35, 19, 28, 8, 19, 38, 4, 1, 37, 26, 40, 9, 36, 14, 16, 32, 35, 18, - 42, 24, 20, 21, 19, 10, 23, 18, 2, 48, 49, 43, 11, 39, 42, 48, 40, 12, 21, - 47, 28, 33, 23, 35, 26, 9, 47, 31, 3, 17, 39, 11, 15, 14, 40, 23}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-div-approx/div_approx_gendata.pl b/bb-tests/workloads/src/CTest/rvv/vec-div-approx/div_approx_gendata.pl deleted file mode 100755 index 5ee73828..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-div-approx/div_approx_gendata.pl +++ /dev/null @@ -1,135 +0,0 @@ -#!/usr/bin/perl -w -#========================================================================== 
-# div_approx.pl -# -# Author: Generated -# Date: Today -# -(our $usageMsg = <<'ENDMSG') =~ s/^\#//gm; -# -# Simple script which creates an input data set and the reference data -# for the given conditional operation. -# -ENDMSG - -use strict "vars"; -use warnings; -no warnings("once"); -use Getopt::Long; - -#-------------------------------------------------------------------------- -# Command line processing -#-------------------------------------------------------------------------- - -our %opts; - -sub usage() -{ - - print "\n"; - print " Usage: conditional_gendata.pl [options] \n"; - print "\n"; - print " Options:\n"; - print " --help print this message\n"; - print " --size size of input data [1000]\n"; - print " --seed random seed [1]\n"; - print "$usageMsg"; - - exit(); -} - -sub processCommandLine() -{ - - $opts{"help"} = 0; - $opts{"size"} = 300; - $opts{"seed"} = 1; - Getopt::Long::GetOptions( \%opts, 'help|?', 'size:i', 'seed:i' ) or usage(); - $opts{"help"} and usage(); - -} - -#-------------------------------------------------------------------------- -# Helper Functions -#-------------------------------------------------------------------------- -sub printArray -{ - my $arrayName = $_[0]; - my $arrayRef = $_[1]; - my $type = $_[2]; - - my $numCols = 20; - my $arrayLen = scalar(@{$arrayRef}); - - print $type." 
".$arrayName."[DATA_SIZE] = \n"; - print "{\n"; - - if ( $arrayLen <= $numCols ) { - print " "; - for ( my $i = 0; $i < $arrayLen; $i++ ) { - print sprintf("%3d",$arrayRef->[$i]); - if ( $i != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - else { - my $numRows = int($arrayLen/$numCols); - for ( my $j = 0; $j < $numRows; $j++ ) { - print " "; - for ( my $i = 0; $i < $numCols; $i++ ) { - my $index = $j*$numCols + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - if ( $arrayLen > ($numRows*$numCols) ) { - print " "; - for ( my $i = 0; $i < ($arrayLen-($numRows*$numCols)); $i++ ) { - my $index = $numCols*$numRows + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - } - - print "};\n\n"; -} - -#-------------------------------------------------------------------------- -# Main -#-------------------------------------------------------------------------- - -sub main() -{ - - processCommandLine(); - srand($opts{"seed"}); - - my @input1_data; # x - my @input2_data; # y - for ( my $i = 0; $i < $opts{"size"}; $i++ ) { - my $valueX = int(rand(50)); # x - my $valueY = 1 + int(rand(50)); # y - - - push( @input1_data, $valueX ); - push( @input2_data, $valueY ); - } - - print "\n\#define DATA_SIZE ".$opts{"size"}." \n\n"; - printArray( "input1_data", \@input1_data, "float" ); # x - printArray( "input2_data", \@input2_data, "float" ); # y -} - -main(); diff --git a/bb-tests/workloads/src/CTest/rvv/vec-div-approx/vec-div_approx.S b/bb-tests/workloads/src/CTest/rvv/vec-div-approx/vec-div_approx.S deleted file mode 100644 index 2281e1b9..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-div-approx/vec-div_approx.S +++ /dev/null @@ -1,26 +0,0 @@ - .text - .balign 4 - .global vec_div_approx - -# v1 = v1 / v2 to almost 23 bits of precision. 
- -vec_div_approx: - vsetvli t1, a0, e32, m4, ta, mu - vle32.v v4, (a1) # load x values - vle32.v v8, (a2) # load y values - sub a0, a0, t1 - slli t1, t1, 2 - vfrec7.v v12, v8 # Estimate 1/v2 - li t0, 0x40000000 - vmv.v.x v16, t0 # Splat 2.0 - vfnmsac.vv v16, v8, v12 # 2.0 - v2 * est(1/v2) - vfmul.vv v12, v12, v16 # Better estimate of 1/v2 - vmv.v.x v16, t0 # Splat 2.0 - vfnmsac.vv v16, v8, v12 # 2.0 - v2 * est(1/v2) - vfmul.vv v12, v12, v16 # Better estimate of 1/v2 - vfmul.vv v4, v4, v12 # Estimate of v1/v2 - vse32.v v4, (a1) - add a2, a2, t1 # Bump pointer - add a1, a1, t1 # Bump pointer - bnez a0, vec_div_approx - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-div-approx/vec-div_approx_main.c b/bb-tests/workloads/src/CTest/rvv/vec-div-approx/vec-div_approx_main.c deleted file mode 100644 index 416ef7ef..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-div-approx/vec-div_approx_main.c +++ /dev/null @@ -1,48 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// division approximation benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized conditional implementation. -// The input data (and reference data) should be generated using -// the div_approx_gendata.pl perl script and dumped to a file named -// dataset1.h. 
- -#include "util.h" -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" -#include - -//-------------------------------------------------------------------------- -// Main - -void vec_div_approx(size_t n, float x[], float y[]); - -int main(int argc, char *argv[]) { - printf("Div approx size = %ld\n", DATA_SIZE); -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_div_approx(DATA_SIZE, input1_data, input2_data); -#endif - - // Do the division - setStats(1); - vec_div_approx(DATA_SIZE, input1_data, input2_data); - setStats(0); - - // int i; - // // Unrolled for faster verification - // for (i = 0; i < 17/2*2; i+=2) - // { - // float t0 = input1_data[i], t1 = input1_data[i+1]; - // printf("test_val: %.2f\n", t0); - // printf("test_val: %.2f\n", t1); - // } - // if (17 % 2 != 0) printf("test_val: %.2f\n\n", input1_data[17-1]); - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-dotprod/dotproduct.c b/bb-tests/workloads/src/CTest/rvv/vec-dotprod/dotproduct.c deleted file mode 100644 index 7b976a11..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-dotprod/dotproduct.c +++ /dev/null @@ -1,299 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Matteo Perotti - -#include "dotproduct.h" - -int64_t dotp_v64b(int64_t *a, int64_t *b, uint64_t avl) { - size_t orig_avl = avl; - size_t vl = __riscv_vsetvl_e64m8(avl); - - vint64m8_t acc, buf_a, buf_b; - vint64m1_t red; - - int64_t *a_ = (int64_t *)a; - int64_t *b_ = (int64_t *)b; - - // Clean the accumulator - red = __riscv_vmv_s_x_i64m1(0, vl); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - vl = __riscv_vsetvl_e64m8(avl); - // Load chunk a and b - buf_a = __riscv_vle64_v_i64m8(a_, vl); - buf_b = __riscv_vle64_v_i64m8(b_, vl); - // Multiply and accumulate - if (avl == orig_avl) { - acc = __riscv_vmul_vv_i64m8(buf_a, buf_b, vl); - } else { - acc = __riscv_vmacc_vv_i64m8(acc, buf_a, buf_b, vl); - } - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and return - red = __riscv_vredsum_vs_i64m8_i64m1(acc, red, vl); - return __riscv_vmv_x_s_i64m1_i64(red); -} - -int32_t dotp_v32b(int32_t *a, int32_t *b, uint64_t avl) { - size_t orig_avl = avl; - size_t vl = __riscv_vsetvl_e32m8(avl); - - vint32m8_t acc, buf_a, buf_b; - vint32m1_t red; - - int32_t *a_ = (int32_t *)a; - int32_t *b_ = (int32_t *)b; - - // Clean the accumulator - red = __riscv_vmv_s_x_i32m1(0, vl); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - vl = __riscv_vsetvl_e32m8(avl); - // Load chunk a and b - buf_a = __riscv_vle32_v_i32m8(a_, vl); - buf_b = __riscv_vle32_v_i32m8(b_, vl); - // Multiply and accumulate - if (avl == orig_avl) { - acc = __riscv_vmul_vv_i32m8(buf_a, buf_b, vl); - } else { - acc = __riscv_vmacc_vv_i32m8(acc, buf_a, buf_b, vl); - } - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and return - red = __riscv_vredsum_vs_i32m8_i32m1(acc, red, vl); - return __riscv_vmv_x_s_i32m1_i32(red); -} - -int16_t dotp_v16b(int16_t *a, int16_t *b, uint64_t avl) { - size_t orig_avl = avl; - size_t vl = __riscv_vsetvl_e16m8(avl); - - vint16m8_t acc, buf_a, buf_b; - vint16m1_t red; - - 
int16_t *a_ = (int16_t *)a; - int16_t *b_ = (int16_t *)b; - - // Clean the accumulator - red = __riscv_vmv_s_x_i16m1(0, vl); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - vl = __riscv_vsetvl_e16m8(avl); - // Load chunk a and b - buf_a = __riscv_vle16_v_i16m8(a_, vl); - buf_b = __riscv_vle16_v_i16m8(b_, vl); - // Multiply and accumulate - if (avl == orig_avl) { - acc = __riscv_vmul_vv_i16m8(buf_a, buf_b, vl); - } else { - acc = __riscv_vmacc_vv_i16m8(acc, buf_a, buf_b, vl); - } - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and store - red = __riscv_vredsum_vs_i16m8_i16m1(acc, red, vl); - return __riscv_vmv_x_s_i16m1_i16(red); -} - -int8_t dotp_v8b(int8_t *a, int8_t *b, uint64_t avl) { - size_t orig_avl = avl; - size_t vl = __riscv_vsetvl_e8m8(avl); - - vint8m8_t acc, buf_a, buf_b; - vint8m1_t red; - - int8_t *a_ = (int8_t *)a; - int8_t *b_ = (int8_t *)b; - - // Clean the accumulator - red = __riscv_vmv_s_x_i8m1(0, vl); - // Stripmine and accumulate a partial reduced vector - for (; avl > 0; avl -= vl) { - vl = __riscv_vsetvl_e8m8(avl); - // Load chunk a and b - buf_a = __riscv_vle8_v_i8m8(a_, vl); - buf_b = __riscv_vle8_v_i8m8(b_, vl); - // Multiply and accumulate - if (avl == orig_avl) { - acc = __riscv_vmul_vv_i8m8(buf_a, buf_b, vl); - } else { - acc = __riscv_vmacc_vv_i8m8(acc, buf_a, buf_b, vl); - } - // Bump pointers - a_ += vl; - b_ += vl; - } - - // Reduce and store - red = __riscv_vredsum_vs_i8m8_i8m1(acc, red, vl); - return __riscv_vmv_x_s_i8m1_i8(red); -} - -int64_t dotp_s64b(int64_t *a, int64_t *b, uint64_t avl) { - int64_t acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; - - acc0 = 0; - acc1 = 0; - acc2 = 0; - acc3 = 0; - acc4 = 0; - acc5 = 0; - acc6 = 0; - acc7 = 0; - - for (uint64_t i = 0; i < avl; i += 8) { - acc0 += a[i + 0] * b[i + 0]; - acc1 += a[i + 1] * b[i + 1]; - acc2 += a[i + 2] * b[i + 2]; - acc3 += a[i + 3] * b[i + 3]; - acc4 += a[i + 4] * b[i + 4]; - acc5 += a[i + 5] * b[i + 5]; - acc6 
+= a[i + 6] * b[i + 6]; - acc7 += a[i + 7] * b[i + 7]; - } - - acc0 += acc1; - acc2 += acc3; - acc4 += acc5; - acc6 += acc7; - - acc0 += acc2; - acc4 += acc6; - - acc0 += acc4; - - return acc0; -} - -int32_t dotp_s32b(int32_t *a, int32_t *b, uint64_t avl) { - int32_t acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; - - acc0 = 0; - acc1 = 0; - acc2 = 0; - acc3 = 0; - acc4 = 0; - acc5 = 0; - acc6 = 0; - acc7 = 0; - - for (uint64_t i = 0; i < avl; i += 8) { - acc0 += a[i + 0] * b[i + 0]; - acc1 += a[i + 1] * b[i + 1]; - acc2 += a[i + 2] * b[i + 2]; - acc3 += a[i + 3] * b[i + 3]; - acc4 += a[i + 4] * b[i + 4]; - acc5 += a[i + 5] * b[i + 5]; - acc6 += a[i + 6] * b[i + 6]; - acc7 += a[i + 7] * b[i + 7]; - } - - acc0 += acc1; - acc2 += acc3; - acc4 += acc5; - acc6 += acc7; - - acc0 += acc2; - acc4 += acc6; - - acc0 += acc4; - - return acc0; -} - -int16_t dotp_s16b(int16_t *a, int16_t *b, uint64_t avl) { - int16_t acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; - - acc0 = 0; - acc1 = 0; - acc2 = 0; - acc3 = 0; - acc4 = 0; - acc5 = 0; - acc6 = 0; - acc7 = 0; - - for (uint64_t i = 0; i < avl; i += 8) { - acc0 += a[i + 0] * b[i + 0]; - acc1 += a[i + 1] * b[i + 1]; - acc2 += a[i + 2] * b[i + 2]; - acc3 += a[i + 3] * b[i + 3]; - acc4 += a[i + 4] * b[i + 4]; - acc5 += a[i + 5] * b[i + 5]; - acc6 += a[i + 6] * b[i + 6]; - acc7 += a[i + 7] * b[i + 7]; - } - - acc0 += acc1; - acc2 += acc3; - acc4 += acc5; - acc6 += acc7; - - acc0 += acc2; - acc4 += acc6; - - acc0 += acc4; - - return acc0; -} - -int8_t dotp_s8b(int8_t *a, int8_t *b, uint64_t avl) { - int8_t acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; - - acc0 = 0; - acc1 = 0; - acc2 = 0; - acc3 = 0; - acc4 = 0; - acc5 = 0; - acc6 = 0; - acc7 = 0; - - for (uint64_t i = 0; i < avl; i += 8) { - acc0 += a[i + 0] * b[i + 0]; - acc1 += a[i + 1] * b[i + 1]; - acc2 += a[i + 2] * b[i + 2]; - acc3 += a[i + 3] * b[i + 3]; - acc4 += a[i + 4] * b[i + 4]; - acc5 += a[i + 5] * b[i + 5]; - acc6 += a[i + 6] * b[i + 6]; - acc7 += a[i + 7] * b[i + 
7]; - } - - acc0 += acc1; - acc2 += acc3; - acc4 += acc5; - acc6 += acc7; - - acc0 += acc2; - acc4 += acc6; - - acc0 += acc4; - - return acc0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-dotprod/dotproduct.h b/bb-tests/workloads/src/CTest/rvv/vec-dotprod/dotproduct.h deleted file mode 100644 index 7139417c..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-dotprod/dotproduct.h +++ /dev/null @@ -1,37 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Matteo Perotti - -#ifndef _DOTPRODUCT_H_ -#define _DOTPRODUCT_H_ - -#include -#include - -#include - -int64_t dotp_v64b(int64_t *a, int64_t *b, uint64_t avl); -int32_t dotp_v32b(int32_t *a, int32_t *b, uint64_t avl); -int16_t dotp_v16b(int16_t *a, int16_t *b, uint64_t avl); -int8_t dotp_v8b(int8_t *a, int8_t *b, uint64_t avl); - -int64_t dotp_s64b(int64_t *a, int64_t *b, uint64_t avl); -int32_t dotp_s32b(int32_t *a, int32_t *b, uint64_t avl); -int16_t dotp_s16b(int16_t *a, int16_t *b, uint64_t avl); -int8_t dotp_s8b(int8_t *a, int8_t *b, uint64_t avl); - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/vec-dotprod/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-dotprod/gen_data.py deleted file mode 100755 index 2a29ca53..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-dotprod/gen_data.py +++ /dev/null @@ -1,96 +0,0 @@ -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# -# Generate input data for fdotp benchmark -# arg: #elements per vector - -import numpy as np -import random -from functools import reduce -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# Vector length -if len(sys.argv) > 1: - vsize = int(sys.argv[1]) -else: - # Default: no stripmine - vsize = 64 - -avl64 = int(vsize) -avl32 = int(vsize) -avl16 = int(vsize) -avl8 = int(vsize) - -# Create the vectors -v64a = np.random.randint(-(2 ** (50)), high=2 ** (50) - 1, size=avl64, dtype=np.int64) -v64b = np.random.randint(-(2 ** (50)), high=2 ** (50) - 1, size=avl64, dtype=np.int64) -v32a = np.random.randint(-(2 ** (20)), high=2 ** (20) - 1, size=avl32, dtype=np.int32) -v32b = np.random.randint(-(2 ** (20)), high=2 ** (20) - 1, size=avl32, dtype=np.int32) -v16a = np.random.randint(-(2 ** (10)), high=2 ** (10) - 1, size=avl16, dtype=np.int16) -v16b = np.random.randint(-(2 ** (10)), high=2 ** (10) - 1, size=avl16, dtype=np.int16) -v8a = np.random.randint(-(2 ** (2)), high=2 ** (2) - 1, size=avl8, dtype=np.int8) -v8b = np.random.randint(-(2 ** (2)), high=2 ** (2) - 1, size=avl8, dtype=np.int8) - -# Create the golden output -gold64 = reduce(lambda a, b: a + b, np.multiply(v64a, v64b)) -gold32 = reduce(lambda a, b: a + b, np.multiply(v32a, v32b)) -gold16 = reduce(lambda a, b: a + b, np.multiply(v16a, v16b)) -gold16 = np.array([gold16, gold16]) -gold8 = reduce(lambda a, b: a + b, np.multiply(v8a, v8b)) -gold8 = np.array([gold8, gold8, gold8, gold8]) - -# Create the empty result vectors -res64 = 0 -res32 = 0 -res16 = 0 -res8 = 0 - -# Print information on file -print('.section .data,"aw",@progbits') -emit("vsize", np.array(vsize, dtype=np.uint64)) -emit("v64a", v64a, "32") -emit("v64b", v64b, "32") -emit("v32a", v32a, "32") -emit("v32b", v32b, 
"32") -emit("v16a", v16a, "32") -emit("v16b", v16b, "32") -emit("v8a", v8a, "32") -emit("v8b", v8b, "32") -# emit("gold64", np.array(gold64, dtype=np.int64)); -# emit("gold32", np.array(gold32, dtype=np.int32)); -# emit("gold16", gold16, '32'); -# emit("gold8", gold8, '32'); -emit("res64_v", np.array(res64, dtype=np.int64)) -emit("res32_v", np.array(res32, dtype=np.int32)) -emit("res16_v", np.array(res16, dtype=np.int32)) -emit("res8_v", np.array(res8, dtype=np.int32)) -emit("res64_s", np.array(res64, dtype=np.int64)) -emit("res32_s", np.array(res32, dtype=np.int32)) -emit("res16_s", np.array(res16, dtype=np.int32)) -emit("res8_s", np.array(res8, dtype=np.int32)) diff --git a/bb-tests/workloads/src/CTest/rvv/vec-dotprod/main.c b/bb-tests/workloads/src/CTest/rvv/vec-dotprod/main.c deleted file mode 100644 index c33e18ad..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-dotprod/main.c +++ /dev/null @@ -1,108 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Matteo Perotti - -#include "dotproduct.h" -#include "util.h" -#include -#include -#include - -// Run also the scalar benchmark -#define SCALAR 1 - -// Check the vector results against golden vectors -#define CHECK 1 - -// Vector size (Byte) -extern uint64_t vsize; -// Vectors for benchmarks -extern int64_t v64a[] __attribute__((aligned(256), section(".l2"))); -extern int64_t v64b[] __attribute__((aligned(256), section(".l2"))); -extern int32_t v32a[] __attribute__((aligned(256), section(".l2"))); -extern int32_t v32b[] __attribute__((aligned(256), section(".l2"))); -extern int16_t v16a[] __attribute__((aligned(256), section(".l2"))); -extern int16_t v16b[] __attribute__((aligned(256), section(".l2"))); -extern int8_t v8a[] __attribute__((aligned(256), section(".l2"))); -extern int8_t v8b[] __attribute__((aligned(256), section(".l2"))); -// Output vectors -extern int64_t res64_v, res64_s; -extern int32_t res32_v, res32_s; -extern int16_t res16_v, res16_s; -extern int8_t res8_v, res8_s; - -int main() { - printf("DOTP %ld\n", vsize); - - unsigned long cycles1, cycles2, instr2, instr1; - - for (uint64_t avl = 8; avl <= vsize; avl *= 8) { - // Dotp - printf("Calulating 64b dotp with vectors with length = %lu\n", avl); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - res64_v = dotp_v64b(v64a, v64b, avl); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("Vector cycles: %ld instructions: %ld\n", cycles2 - cycles1, - instr2 - instr1); - } - - for (uint64_t avl = 8; avl <= vsize; avl *= 8) { - // Dotp - printf("Calulating 32b dotp with vectors with length = %lu\n", avl); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - res32_v = dotp_v32b(v32a, v32b, avl); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("Vector cycles: %ld instructions: %ld\n", cycles2 - cycles1, - instr2 - instr1); - } - - for (uint64_t avl = 8; avl <= vsize; avl *= 8) { - 
// Dotp - printf("Calulating 16b dotp with vectors with length = %lu\n", avl); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - res16_v = dotp_v16b(v16a, v16b, avl); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("Vector cycles: %ld instructions: %ld\n", cycles2 - cycles1, - instr2 - instr1); - } - - for (uint64_t avl = 8; avl <= vsize; avl *= 8) { - // Dotp - printf("Calulating 8b dotp with vectors with length = %lu\n", avl); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - res8_v = dotp_v8b(v8a, v8b, avl); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("Vector cycles: %ld instructions: %ld\n", cycles2 - cycles1, - instr2 - instr1); - } - - printf("SUCCESS.\n"); - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-dropout/dropout.c b/bb-tests/workloads/src/CTest/rvv/vec-dropout/dropout.c deleted file mode 100644 index 8a1ec7b6..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-dropout/dropout.c +++ /dev/null @@ -1,61 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
-// -// Author: Matteo Perotti - -#include "dropout.h" - -// Scalar dropout -void dropout_gold(const unsigned int n, const float *i, const float scale, - const uint8_t *sel_ptr, float *o) { - uint8_t buf_sel, sel; - for (unsigned int k = 0; k < n; ++k) { - if (!(k % 8)) - buf_sel = sel_ptr[k >> 3]; - sel = buf_sel & 0x01; - o[k] = sel ? (i[k] * scale) : 0; - buf_sel >>= 1; - } -} - -void dropout_vec(const unsigned int n, const float *i, const float scale, - const uint8_t *sel_ptr, float *o) { - unsigned int vl; - - asm volatile("vsetvli %[vl], %[n], e32, m8, ta, ma" - : [vl] "=r"(vl) - : [n] "r"(n)); - - for (unsigned int avl = n; avl > 0; avl -= vl) { - // Find next vl - asm volatile("vsetvli %[vl], %[avl], e32, m8, ta, ma" - : [vl] "=r"(vl) - : [avl] "r"(avl)); - // Load the mask vector (1 = keep, 0 = drop) - asm volatile("vlm.v v0, (%[sel_ptr])" ::[sel_ptr] "r"(sel_ptr)); - // Initialize output vector with zeroes - asm volatile("vmv.v.i v24, 0"); - // Load input vector - asm volatile("vle32.v v8, (%[i])" ::[i] "r"(i)); - // Calculate output vector - asm volatile("vfmul.vf v24, v8, %[scale], v0.t" ::[scale] "f"(scale)); - asm volatile("vse32.v v24, (%[o])" ::[o] "r"(o)); - // Bump pointers - i += vl; - sel_ptr += vl >> 3; - o += vl; - } -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-dropout/dropout.h b/bb-tests/workloads/src/CTest/rvv/vec-dropout/dropout.h deleted file mode 100644 index 812b7080..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-dropout/dropout.h +++ /dev/null @@ -1,31 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#ifndef _DROPOUT_H_ -#define _DROPOUT_H_ - -#include - -#include - -void dropout_gold(const unsigned int n, const float *i, const float scale, - const uint8_t *sel_ptr, float *o); -void dropout_vec(const unsigned int n, const float *i, const float scale, - const uint8_t *sel_ptr, float *o); - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/vec-dropout/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-dropout/gen_data.py deleted file mode 100644 index 1abdf9e4..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-dropout/gen_data.py +++ /dev/null @@ -1,66 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# arg1: image size, arg2: filter size - -import numpy as np -import random -import sys - - -def rand_array(N, dtype): - return np.random.rand(N).astype(dtype) - - -def rand_sel(N, dtype): - return np.random.randint(0, 256, N, dtype) - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -if len(sys.argv) > 1: - N = int(sys.argv[1]) -else: - N = 1024 - -# Generate inputs -input_array = rand_array(N, np.float32) -SCALE = rand_array(1, np.float32)[0] -SEL = rand_sel(N, np.uint8) - -# Create the empty o matrix -o = np.zeros(N).astype(np.float32) -o_gold = o - -# Print information on file -print('.section .data,"aw",@progbits') -emit("N", np.array(N, dtype=np.uint64)) -emit("SCALE", np.array(SCALE, dtype=np.float32)) -emit("I", input_array, "NR_LANES*4") -emit("SEL", SEL, "NR_LANES*4") -emit("o", o, "NR_LANES*4") -emit("o_gold", o_gold, "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-dropout/main.c b/bb-tests/workloads/src/CTest/rvv/vec-dropout/main.c deleted file mode 100644 index 94fcd334..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-dropout/main.c +++ /dev/null @@ -1,66 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -// Include to use vector intrinsics -// Documentation: https://github.com/riscv/rvv-intrinsic-doc -// Compiler support: -// https://github.com/riscv/riscv-gnu-toolchain/tree/rvv-intrinsic - -#include "dropout.h" -#include "util.h" - -extern const unsigned int N; -extern const float SCALE; -extern const float I[] __attribute__((aligned(16))); -extern const uint8_t SEL[] __attribute__((aligned(16))); -extern float o[] __attribute__((aligned(16))); -extern float o_gold[] __attribute__((aligned(16))); - -int main() { - printf("DROPOU\n"); - unsigned long cycles1, cycles2, instr2, instr1; - - printf("Running Dropout with %d elements.\n", N); - - // Call the main kernel, and measure cycles - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - dropout_vec(N, I, SCALE, SEL, o); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - - // Only count effective SPFLOP/cycle - float performance = (float)N / (cycles2 - cycles1); - printf("The execution took %ld cycles.\n", cycles2 - cycles1); - printf("The execution performed %d SFLOPs per 1000 cycles\n", - (int)(performance * 1000)); - - // Verify correctness - dropout_gold(N, I, SCALE, SEL, o_gold); - - for (unsigned int k = 0; k < N; ++k) { - if (o[k] != o_gold[k]) { - printf("Error: o[%d] = %f != %f\n", k, o[k], o_gold[k]); - return k ? 
k : -1; - } - } - printf("Passed.\n"); - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-exp/exp.c b/bb-tests/workloads/src/CTest/rvv/vec-exp/exp.c deleted file mode 100644 index 6fb2e277..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-exp/exp.c +++ /dev/null @@ -1,58 +0,0 @@ -// Modified version of: -// "RISC-V VECTOR EXP FUNCTION Version by Cristóbal Ramírez Lazo, "Barcelona -// 2019"" Find details on the original version below Author: Matteo Perotti -// - -// -// RISC-V VECTOR EXP FUNCTION Version by Cristóbal Ramírez Lazo, "Barcelona -// 2019" This RISC-V Vector implementation is based on the original code -// presented by Julien Pommier - -/* - AVX implementation of sin, cos, sincos, exp and log - Based on "sse_mathfun.h", by Julien Pommier - http://gruntthepeon.free.fr/ssemath/ - Copyright (C) 2012 Giovanni Garberoglio - Interdisciplinary Laboratory for Computational Science (LISC) - Fondazione Bruno Kessler and University of Trento - via Sommarive, 18 - I-38123 Trento (Italy) - This software is provided 'as-is', without any express or implied - warranty. In no event will the authors be held liable for any damages - arising from the use of this software. - Permission is granted to anyone to use this software for any purpose, - including commercial applications, and to alter it and redistribute it - freely, subject to the following restrictions: - 1. The origin of this software must not be misrepresented; you must not - claim that you wrote the original software. If you use this software - in a product, an acknowledgment in the product documentation would be - appreciated but is not required. - 2. Altered source versions must be plainly marked as such, and must not be - misrepresented as being the original software. - 3. This notice may not be removed or altered from any source distribution. 
- (this is the zlib license) -*/ - -#include "ara/exp.h" - -#define EXP_BMARK(t, w, l) \ - void exp_f##w##m##l##_bmark(t *exponents, t *results, size_t len) { \ - size_t avl = len; \ - vfloat##w##m##l##_t exp_vec, res_vec; \ - \ - for (size_t vl = __riscv_vsetvl_e##w##m##l(avl); avl > 0; avl -= vl) { \ - vl = __riscv_vsetvl_e##w##m##l(avl); \ - exp_vec = __riscv_vle##w##_v_f##w##m##l(exponents, vl); \ - res_vec = __exp_f##w##m##l(exp_vec, vl); \ - __riscv_vse##w##_v_f##w##m##l(results, res_vec, vl); \ - exponents += vl; \ - results += vl; \ - } \ - } - -EXP_BMARK(double, 64, 1) -EXP_BMARK(double, 64, 2) -EXP_BMARK(double, 64, 4) -EXP_BMARK(float, 32, 1) -EXP_BMARK(float, 32, 2) -EXP_BMARK(float, 32, 4) diff --git a/bb-tests/workloads/src/CTest/rvv/vec-exp/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-exp/gen_data.py deleted file mode 100644 index 0532139a..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-exp/gen_data.py +++ /dev/null @@ -1,71 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# arg1: vector size, arg2: filter size - -import random as rand -import numpy as np -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -def rand_matrix(N, dtype): - return np.random.rand(N).astype(dtype) - - -# SCRIPT - -if len(sys.argv) == 2: - N_f64 = int(sys.argv[1]) - N_f32 = 2 * N_f64 -else: - print("Error. Give me one argument: the number of vector elements.") - sys.exit() - -# Vector of samples -exponents_f64 = rand_matrix(N_f64, np.float64).astype(np.float64) -exponents_f32 = rand_matrix(N_f32, np.float32).astype(np.float32) - -# Results buffer -results_f64 = np.zeros(N_f64, dtype=np.float64) -results_f32 = np.zeros(N_f32, dtype=np.float32) - -# Gold results -gold_results_f64 = np.exp(exponents_f64, dtype=np.float64) -gold_results_f32 = np.exp(exponents_f32, dtype=np.float32) - -# Create the file -print('.section .data,"aw",@progbits') -emit("N_f64", np.array(N_f64, dtype=np.uint64)) -emit("exponents_f64", exponents_f64, "32") -emit("results_f64", results_f64, "32") -emit("gold_results_f64", gold_results_f64, "32") -emit("N_f32", np.array(N_f32, dtype=np.uint32)) -emit("exponents_f32", exponents_f32, "32") -emit("results_f32", results_f32, "32") -emit("gold_results_f32", gold_results_f32, "32") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-exp/main.c b/bb-tests/workloads/src/CTest/rvv/vec-exp/main.c deleted file mode 100644 index a84f85bb..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-exp/main.c +++ /dev/null @@ -1,174 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. -// -// Author: Matteo Perotti - -#include -#include - -#include "ara/exp.h" -#include "ara/util.h" -#include "util.h" -#include - -#define N_F64 512 -#define N_F32 1024 - -extern size_t N_f64; -extern double exponents_f64[] __attribute__((aligned(32))); -extern double results_f64[] __attribute__((aligned(32))); -extern double gold_results_f64[] __attribute__((aligned(32))); - -extern size_t N_f32; -extern float exponents_f32[] __attribute__((aligned(32))); -extern float results_f32[] __attribute__((aligned(32))); -extern float gold_results_f32[] __attribute__((aligned(32))); - -double results_f64m1[N_F64] __attribute__((aligned(32))); -double results_f64m2[N_F64] __attribute__((aligned(32))); -double results_f64m4[N_F64] __attribute__((aligned(32))); -float results_f32m1[N_F32] __attribute__((aligned(32))); -float results_f32m2[N_F32] __attribute__((aligned(32))); -float results_f32m4[N_F32] __attribute__((aligned(32))); - -#define THRESHOLD 1.0 - -int check64(double *results) { - int error = 0; - for (uint64_t i = 0; i < N_f64; ++i) { - if (!similarity_check(results[i], gold_results_f64[i], THRESHOLD)) { - error = 1; - printf("64-bit error at index %d. %lx != %lx\n", i, - *(uint64_t *)(&results[i]), *(uint64_t *)(&gold_results_f64[i])); - } - } - return error; -} - -int check32(float *results) { - int error = 0; - for (uint64_t i = 0; i < N_f32; ++i) { - if (!similarity_check(results[i], gold_results_f32[i], THRESHOLD)) { - error = 1; - printf("32-bit error at index %d. 
%x != %x\n", i, - *(uint32_t *)(&results[i]), *(uint32_t *)(&gold_results_f32[i])); - } - } - return error; -} - -int main() { - printf("FEXP N_64=%ld N_32=%ld\n", N_f64, N_f32); - - if (N_F64 != N_f64 || N_F32 != N_f32) - return 1; - - int error = 0; - unsigned long cycles1, cycles2, instr2, instr1; - - if (N_f32 >= 256) { - for (size_t t = 8; t <= 256; t += 31) { - cycles1 = read_csr(mcycle); - exp_f32m4_bmark(exponents_f32, results_f32m2, t); - asm volatile("fence"); - cycles2 = read_csr(mcycle); - printf("32b LMUL=4 n=%ld cycles=%ld\n", t, cycles2 - cycles1); - } - } - - printf("Executing exponential on %d 64-bit data LMUL=1...\n", N_f64); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - exp_f64m1_bmark(exponents_f64, results_f64m1, N_f64); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %d cycles.\n", cycles2 - cycles1); - - printf("Executing exponential on %d 64-bit data LMUL=2...\n", N_f64); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - exp_f64m2_bmark(exponents_f64, results_f64m2, N_f64); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %d cycles.\n", cycles2 - cycles1); - - printf("Executing exponential on %d 64-bit data LMUL=4...\n", N_f64); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - exp_f64m4_bmark(exponents_f64, results_f64m4, N_f64); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %d cycles.\n", cycles2 - cycles1); - - printf("Executing exponential on %d 32-bit data LMUL=1...\n", N_f32); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - exp_f32m1_bmark(exponents_f32, results_f32m1, N_f32); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %d cycles.\n", cycles2 - cycles1); - - printf("Executing exponential on %d 32-bit data 
LMUL=2...\n", N_f32); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - exp_f32m2_bmark(exponents_f32, results_f32m2, N_f32); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %d cycles.\n", cycles2 - cycles1); - - printf("Executing exponential on %d 32-bit data LMUL=4...\n", N_f32); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - exp_f32m4_bmark(exponents_f32, results_f32m4, N_f32); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("The execution took %d cycles.\n", cycles2 - cycles1); - - printf("Checking results:\n"); - - error = check64(results_f64m1); - if (error) { - return error; - } - error = check64(results_f64m2); - if (error) { - return error; - } - error = check64(results_f64m4); - if (error) { - return error; - } - error = check32(results_f32m1); - if (error) { - return error; - } - error = check32(results_f32m2); - if (error) { - return error; - } - error = check32(results_f32m4); - if (error) { - return error; - } - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d.h b/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d.h deleted file mode 100644 index 888d93a2..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d.h +++ /dev/null @@ -1,43 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#ifndef FCONV2D_H -#define FCONV2D_H - -#include -#include - -void fconv2d_3x3(double *o, double *i, double *f, int64_t R, int64_t C, - int64_t F); -void fconv2d_vec_4xC_slice_init_3x3(double *o, int64_t C); -void fconv2d_vec_4xC_slice_preload_3x3(double *i, int64_t C, int64_t F); -void fconv2d_vec_4xC_slice_move_3x3(int64_t C, int64_t F); -void fconv2d_vec_4xC_3x3(double *o, double *i, double *f, int64_t C, int64_t F); - -void fconv2d_7x7(double *o, double *i, double *f, int64_t R, int64_t C, - int64_t F); -void fconv2d_7x7_block(double *o, double *i, double *f, int64_t R, int64_t C, - int64_t n_, int64_t F); - -#define MIN(a, b) ((a) < (b) ? (a) : (b)) - -// Threshold for FP numbers comparison during the final check -#define THRESHOLD 0.000000000001 -// #define THRESHOLD 0 - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d_3x3.c b/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d_3x3.c deleted file mode 100644 index f4e614e2..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d_3x3.c +++ /dev/null @@ -1,334 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Matteo Perotti - -#include "fconv2d.h" - -void fconv2d_3x3(double *o, double *i, double *f, int64_t R, int64_t C, - int64_t F) { - // We work on 4 rows of the output matrix at once - int64_t block_size_o = 4; - // We work on block_size_o + F - 1 rows of the input matrix at once - - // First iteration round, r = 0 - double *i_ = i; - double *o_ = o; - - // Preload the first two input rows -> This is not needed in the other rounds - fconv2d_vec_4xC_slice_preload_3x3(i_, C, F); - // The first F-1 rows have already been loaded by - // fconv2d_vec_4xC_slice_preload_3x3() - double *i__ = i_ + (F - 1) * (C + F - 1); - fconv2d_vec_4xC_3x3(o_, i__, f, C, F); - // Re-use some of the already-loaded input rows - fconv2d_vec_4xC_slice_move_3x3(C, F); - - i_ = i + block_size_o * (C + F - 1); - i__ = i_ + (F - 1) * (C + F - 1); - - int64_t ldi = (C + F - 1) << 3; - int64_t ldf = F << 3; - - // Temporary variables - double t0, t1, t2; - // Helper variables - double *f_; - f_ = f; - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t0) - : "r"(ldf)); - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t1) - : "r"(ldf)); - asm volatile("fld %1, (%0);" : "+&r"(f_), "=&f"(t2)); - - // Iterate over the output rows - for (int64_t r = block_size_o; r < R; r += block_size_o) { - - // The first F-1 rows have already been loaded by - // fconv2d_vec_4xC_slice_init() - - double t3, t4, t5; - - // Fetch C + F - 1 elements (padding included) - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - f_ = f; - - // Fetch the first column of the filter, and start calculating its - // contribution on the four output rows (v0, v2, v4, v6) - - // Fetch 4 + F - 1 - 2 rows of the input matrix - // Compute on C + F - 1 elements, instead of C elements, to cover the - // latency of the load instructions - asm volatile("vmv.v.v v8, v16"); - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i__) : "r"(ldi)); - asm volatile("vfmul.vf v0, v8, 
%0" ::"f"(t0)); - - asm volatile("vmv.v.v v10, v18"); - asm volatile("vfmul.vf v2, v10, %0" ::"f"(t0)); - asm volatile("vle64.v v14, (%0); add %0, %0, %1" : "+&r"(i__) : "r"(ldi)); - asm volatile("vfmacc.vf v0, %0, v10" ::"f"(t1)); - - asm volatile("vfmacc.vf v2, %0, v12" ::"f"(t1)); - asm volatile("vle64.v v16, (%0); add %0, %0, %1" : "+&r"(i__) : "r"(ldi)); - asm volatile("vfmacc.vf v0, %0, v12" ::"f"(t2)); - asm volatile("vslidedown.vi v20, v8, 1"); - asm volatile("vfmul.vf v4, v12, %0" ::"f"(t0)); - - asm volatile("vle64.v v18, (%0); add %0, %0, %1" : "+&r"(i__) : "r"(ldi)); - - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C)); - - asm volatile("vfmul.vf v6, v14, %0" ::"f"(t0)); - asm volatile("vslidedown.vi v22, v10, 1"); - asm volatile("vfmacc.vf v4, %0, v14" ::"f"(t1)); - asm volatile("vfmacc.vf v2, %0, v14" ::"f"(t2)); - asm volatile("vslidedown.vi v24, v12, 1"); - - asm volatile("vfmacc.vf v6, %0, v16" ::"f"(t1)); - asm volatile("vfmacc.vf v4, %0, v16" ::"f"(t2)); - - asm volatile("vslidedown.vi v26, v14, 1"); - - asm volatile("vfmacc.vf v6, %0, v18" ::"f"(t2)); - - f_ = f + 1; - // Fetch the middle column of the filter, and start calculating its - // contributions on the output rows To do so, slide down the input rows by - // one - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t3) - : "r"(ldf)); - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t4) - : "r"(ldf)); - asm volatile("fld %1, (%0);" : "+&r"(f_), "=&f"(t5)); - - asm volatile("vfmacc.vf v0, %0, v20" ::"f"(t3)); - - asm volatile("vfmacc.vf v0, %0, v22" ::"f"(t4)); - asm volatile("vslidedown.vi v28, v16, 1"); - asm volatile("vfmacc.vf v2, %0, v22" ::"f"(t3)); - - i_ = i + (r + block_size_o) * (C + F - 1); - asm volatile("vfmacc.vf v0, %0, v24" ::"f"(t5)); - asm volatile("vslidedown.vi v30, v18, 1"); - asm volatile("vfmacc.vf v2, %0, v24" ::"f"(t4)); - asm volatile("vfmacc.vf v4, %0, v24" ::"f"(t3)); - asm volatile("vslidedown.vi v20, v8, 2"); - - asm 
volatile("vfmacc.vf v2, %0, v26" ::"f"(t5)); - asm volatile("vfmacc.vf v4, %0, v26" ::"f"(t4)); - asm volatile("vslidedown.vi v22, v10, 2"); - asm volatile("vfmacc.vf v6, %0, v26" ::"f"(t3)); - i__ = i_ + (F - 1) * (C + F - 1); - - asm volatile("vfmacc.vf v4, %0, v28" ::"f"(t5)); - f_ = f + 2; - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t3) - : "r"(ldf)); - asm volatile("vfmacc.vf v6, %0, v28" ::"f"(t4)); - asm volatile("vslidedown.vi v24, v12, 2"); - - asm volatile("vfmacc.vf v6, %0, v30" ::"f"(t5)); - asm volatile("vfmacc.vf v0, %0, v20" ::"f"(t3)); - asm volatile("vslidedown.vi v26, v14, 2"); - - // Repeat for the last filter column, and then store the output rows - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t4) - : "r"(ldf)); - asm volatile("fld %1, (%0);" : "+&r"(f_), "=&f"(t5)); - - asm volatile("vfmacc.vf v0, %0, v22" ::"f"(t4)); - o_ = o + r * C; - - // Compute on C elements - int64_t ldo = C << 3; - asm volatile("vfmacc.vf v2, %0, v22" ::"f"(t3)); - asm volatile("vslidedown.vi v28, v16, 2"); - - asm volatile("vfmacc.vf v0, %0, v24" ::"f"(t5)); - asm volatile("vfmacc.vf v2, %0, v24" ::"f"(t4)); - asm volatile("vslidedown.vi v30, v18, 2"); - asm volatile("vse64.v v0, (%0); add %0, %0, %1" : "+&r"(o_) : "r"(ldo)); - asm volatile("vfmacc.vf v4, %0, v24" ::"f"(t3)); - - asm volatile("vfmacc.vf v2, %0, v26" ::"f"(t5)); - asm volatile("vse64.v v2, (%0); add %0, %0, %1" : "+&r"(o_) : "r"(ldo)); - asm volatile("vfmacc.vf v4, %0, v26" ::"f"(t4)); - asm volatile("vfmacc.vf v6, %0, v26" ::"f"(t3)); - - asm volatile("vfmacc.vf v4, %0, v28" ::"f"(t5)); - asm volatile("vse64.v v4, (%0); add %0, %0, %1" : "+&r"(o_) : "r"(ldo)); - asm volatile("vfmacc.vf v6, %0, v28" ::"f"(t4)); - - asm volatile("vfmacc.vf v6, %0, v30" ::"f"(t5)); - asm volatile("vse64.v v6, (%0);" : "+r"(o_)); - } -} - -// Load 4 rows of the output matrix -void fconv2d_vec_4xC_slice_preload_3x3(double *i, int64_t C, int64_t F) { - // Helper variables - int64_t 
ldi = (C + F - 1) << 3; - - // Set the vector configuration - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - // Fetch the first floor(F/2) + 1 input rows - asm volatile("vle64.v v8, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" : "+r"(i)); -} - -// Calculate 4 output matrix rows -void fconv2d_vec_4xC_3x3(double *o, double *i, double *f, int64_t C, - int64_t F) { - - // Temporary variables - double t0, t1, t2; - - // Helper variables - int64_t ldo = C << 3; - int64_t ldi = (C + F - 1) << 3; - int64_t ldf = F << 3; - double *f_; - - // Fetch C + F - 1 elements (padding included) - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - f_ = f; - // Fetch the first column of the filter, and start calculating its - // contribution on the four output rows (v0, v2, v4, v6) - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t0) - : "r"(ldf)); - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t1) - : "r"(ldf)); - asm volatile("fld %1, (%0);" : "+&r"(f_), "=&f"(t2)); - - // Fetch 4 + F - 1 - 2 rows of the input matrix - // Compute on C + F - 1 elements, instead of C elements, to cover the latency - // of the load instructions - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vfmul.vf v0, v8, %0" ::"f"(t0)); - - asm volatile("vfmul.vf v2, v10, %0" ::"f"(t0)); - asm volatile("vle64.v v14, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vfmacc.vf v0, %0, v10" ::"f"(t1)); - - asm volatile("vfmacc.vf v2, %0, v12" ::"f"(t1)); - asm volatile("vle64.v v16, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vfmacc.vf v0, %0, v12" ::"f"(t2)); - asm volatile("vslidedown.vi v20, v8, 1"); - asm volatile("vfmul.vf v4, v12, %0" ::"f"(t0)); - - asm volatile("vle64.v v18, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C)); - - asm volatile("vfmul.vf 
v6, v14, %0" ::"f"(t0)); - asm volatile("vfmacc.vf v4, %0, v14" ::"f"(t1)); - asm volatile("vslidedown.vi v22, v10, 1"); - asm volatile("vfmacc.vf v2, %0, v14" ::"f"(t2)); - - asm volatile("vfmacc.vf v6, %0, v16" ::"f"(t1)); - asm volatile("vfmacc.vf v4, %0, v16" ::"f"(t2)); - - asm volatile("vslidedown.vi v24, v12, 1"); - asm volatile("vfmacc.vf v6, %0, v18" ::"f"(t2)); - - f_ = f + 1; - // Fetch the middle column of the filter, and start calculating its - // contributions on the output rows To do so, slide down the input rows by one - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t0) - : "r"(ldf)); - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t1) - : "r"(ldf)); - asm volatile("fld %1, (%0);" : "+&r"(f_), "=&f"(t2)); - - asm volatile("vfmacc.vf v0, %0, v20" ::"f"(t0)); - - asm volatile("vfmacc.vf v0, %0, v22" ::"f"(t1)); - asm volatile("vslidedown.vi v26, v14, 1"); - asm volatile("vfmacc.vf v2, %0, v22" ::"f"(t0)); - - asm volatile("vfmacc.vf v0, %0, v24" ::"f"(t2)); - asm volatile("vfmacc.vf v2, %0, v24" ::"f"(t1)); - asm volatile("vslidedown.vi v28, v16, 1"); - asm volatile("vfmacc.vf v4, %0, v24" ::"f"(t0)); - - asm volatile("vfmacc.vf v2, %0, v26" ::"f"(t2)); - asm volatile("vfmacc.vf v4, %0, v26" ::"f"(t1)); - asm volatile("vslidedown.vi v30, v18, 1"); - asm volatile("vfmacc.vf v6, %0, v26" ::"f"(t0)); - - asm volatile("vfmacc.vf v4, %0, v28" ::"f"(t2)); - asm volatile("vslidedown.vi v20, v8, 2"); - asm volatile("vfmacc.vf v6, %0, v28" ::"f"(t1)); - - asm volatile("vfmacc.vf v6, %0, v30" ::"f"(t2)); - asm volatile("vslidedown.vi v22, v10, 2"); - - f_ = f + 2; - // Repeat for the last filter column, and then store the output rows - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t0) - : "r"(ldf)); - asm volatile("fld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&f"(t1) - : "r"(ldf)); - asm volatile("fld %1, (%0);" : "+&r"(f_), "=&f"(t2)); - - asm volatile("vfmacc.vf v0, %0, v20" ::"f"(t0)); - - asm 
volatile("vfmacc.vf v0, %0, v22" ::"f"(t1)); - asm volatile("vslidedown.vi v24, v12, 2"); - asm volatile("vfmacc.vf v2, %0, v22" ::"f"(t0)); - - // Compute on C elements - - asm volatile("vfmacc.vf v0, %0, v24" ::"f"(t2)); - asm volatile("vse64.v v0, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vslidedown.vi v26, v14, 2"); - asm volatile("vfmacc.vf v2, %0, v24" ::"f"(t1)); - asm volatile("vfmacc.vf v4, %0, v24" ::"f"(t0)); - - asm volatile("vfmacc.vf v2, %0, v26" ::"f"(t2)); - asm volatile("vse64.v v2, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vslidedown.vi v28, v16, 2"); - asm volatile("vfmacc.vf v4, %0, v26" ::"f"(t1)); - asm volatile("vfmacc.vf v6, %0, v26" ::"f"(t0)); - - asm volatile("vfmacc.vf v4, %0, v28" ::"f"(t2)); - asm volatile("vslidedown.vi v30, v18, 2"); - asm volatile("vse64.v v4, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v6, %0, v28" ::"f"(t1)); - - asm volatile("vfmacc.vf v6, %0, v30" ::"f"(t2)); - asm volatile("vse64.v v6, (%0);" : "+r"(o)); -} - -void fconv2d_vec_4xC_slice_move_3x3(int64_t C, int64_t F) { - // Move C+F-1 elements - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - // Move the last floor(F/2) + 1 input rows - asm volatile("vmv.v.v v8, v16"); - asm volatile("vmv.v.v v10, v18"); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d_7x7.c b/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d_7x7.c deleted file mode 100644 index 6b69f3de..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/fconv2d_7x7.c +++ /dev/null @@ -1,636 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -/* - Optimized convolution for Ara - The code is long only because of: - 1) Special cases related to the first/last 7 rows - 2) Unrolling of the loops to hide the latency of the moves, slides, mem ops - - At the end of the file, you can find the not-unrolled main loop in a comment, - without the edge-code. - - Algorithm: - a) Load the next input row - b) Calculate its contributions to the F = 7 output rows using one column of - the filter c) Slide down the input row by 1, injecting the next input scalar - element in the tail d) Repeat from b), taking the next colum of the filter, - until the last column is fetched e) Store the first output row, the one that - is complete f) Move all the output rows up by one register, to restore the - initial condition g) Repeat from a) - - Every time a new input row is loaded, a new output row is created. - - The first 6 rows and the last 6 rows do not follow this pattern, thus we wrote - dedicated code. Because of the unrolling, we counted for this the first and - last 7 rows, instead of 6 - - This algorithm helps in minimizing the data dependencies, as every input rows - is used To calculate 7 different output rows. 
-*/ - -#include "fconv2d.h" - -void fconv2d_7x7(double *o, double *i, double *f, int64_t M, int64_t N, - int64_t F) { - - unsigned long int block_size_n; - - // Set the vector configuration - asm volatile("vsetvli %0, %1, e64, m2, ta, ma" : "=r"(block_size_n) : "r"(N)); - - // Slice the matrix into a manageable number of columns n_ - for (unsigned long int n = 0; n < N; n += block_size_n) { - // Set the vector length - const unsigned long int n_ = MIN(N - n, block_size_n); - - // Find pointers to the submatrices - const double *i_ = i + n; - double *o_ = o + n; - - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(n_)); - - fconv2d_7x7_block(o_, i_, f, M, N, n_, F); - } -} -void fconv2d_7x7_block(double *o, double *i, double *f, int64_t R, int64_t C, - int64_t n_, int64_t F) { - - // Helper variables - int64_t ldo = C << 3; - int64_t ldi_pad = (C + F - 1) << 3; - - double *i_ = i; - - double f6, f13, f20, f27, f34, f41, f48; - - double *i_slide_ptr_0; - double *i_slide_ptr_1; - double *i_slide_ptr_2; - double *i_slide_ptr_3; - - // Buffer some of the filter coefficients not to lose efficiency after a - // vector store (CVA6 cannot issue memory operations if there is a pending - // store!) 
- f6 = f[6]; - f13 = f[13]; - f20 = f[20]; - f27 = f[27]; - f34 = f[34]; - f41 = f[41]; - f48 = f[48]; - - // Point to the scalar elements to insert during a slide - i_slide_ptr_0 = i_ + n_ + 0 * (C + F - 1); - i_slide_ptr_1 = i_ + n_ + 1 * (C + F - 1); - i_slide_ptr_2 = i_ + n_ + 2 * (C + F - 1); - i_slide_ptr_3 = i_ + n_ + 3 * (C + F - 1); - - //////////////// - // Row 0 -> 3 // - //////////////// - - // Load one input row - asm volatile("vle64.v v0, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vle64.v v4, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vle64.v v8, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - - // Main kernel, unrolled by 2 - for (int k = 0; k < F / 2; ++k) { - if (k == 0) - asm volatile("vfmul.vf v16, v0, %0" ::"f"(f[0 + (2 * k)])); - else - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f[0 + (2 * k)])); - if (k == 0) - asm volatile("vfmul.vf v18, v4, %0" ::"f"(f[0 + (2 * k)])); - else - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f[0 + (2 * k)])); - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v16, %0, v4" ::"f"(f[7 + (2 * k)])); - if (k == 0) - asm volatile("vfmul.vf v22, v12, %0" ::"f"(f[0 + (2 * k)])); - else - asm volatile("vfmacc.vf v22, %0, v12" ::"f"(f[0 + (2 * k)])); - asm volatile("vfslide1down.vf v6, v4, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v18, %0, v8" ::"f"(f[7 + (2 * k)])); - asm volatile("vfmacc.vf v16, %0, v8" ::"f"(f[14 + (2 * k)])); - asm volatile("vfslide1down.vf v10, v8, %0" ::"f"(*i_slide_ptr_2++)); - if (k == 0) - asm volatile("vfmul.vf v20, v8, %0" ::"f"(f[0 + (2 * k)])); - else - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f[0 + (2 * k)])); - asm volatile("vfmacc.vf v18, %0, v12" ::"f"(f[14 + (2 * k)])); - asm volatile("vfmacc.vf v16, %0, v12" ::"f"(f[21 + (2 * k)])); - asm volatile("vfslide1down.vf v14, v12, %0" 
::"f"(*i_slide_ptr_3++)); - asm volatile("vfmacc.vf v20, %0, v12" ::"f"(f[7 + (2 * k)])); - - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f[0 + (2 * k + 1)])); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(f[0 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v16, %0, v6" ::"f"(f[7 + (2 * k + 1)])); - asm volatile("vfmacc.vf v18, %0, v10" ::"f"(f[7 + (2 * k + 1)])); - asm volatile("vfmacc.vf v20, %0, v10" ::"f"(f[0 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v4, v6, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v16, %0, v10" ::"f"(f[14 + (2 * k + 1)])); - asm volatile("vfmacc.vf v18, %0, v14" ::"f"(f[14 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v8, v10, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v22, %0, v14" ::"f"(f[0 + (2 * k + 1)])); - asm volatile("vfmacc.vf v16, %0, v14" ::"f"(f[21 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v12, v14, %0" ::"f"(*i_slide_ptr_3++)); - asm volatile("vfmacc.vf v20, %0, v14" ::"f"(f[7 + (2 * k + 1)])); - } - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i_ + n_ + 0 * (C + F - 1); - i_slide_ptr_1 = i_ + n_ + 1 * (C + F - 1); - i_slide_ptr_2 = i_ + n_ + 2 * (C + F - 1); - - // Main kernel, last iteration with filter coefficients reuse - // Start loading next rows, from 4 to 6 - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f6)); - asm volatile("vle64.v v2, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f6)); - asm volatile("vfmacc.vf v22, %0, v12" ::"f"(f6)); - asm volatile("vfmacc.vf v16, %0, v4" ::"f"(f13)); - asm volatile("vle64.v v6, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vfmacc.vf v18, %0, v8" ::"f"(f13)); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f6)); - asm volatile("vfmacc.vf v16, %0, v8" ::"f"(f20)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm 
volatile("vfmacc.vf v18, %0, v12" ::"f"(f20)); - asm volatile("vfmacc.vf v20, %0, v12" ::"f"(f13)); - asm volatile("vfmacc.vf v16, %0, v12" ::"f"(f27)); - - //////////////// - // Row 4 -> 6 // - //////////////// - - // Main kernel, unrolled by 2 - for (int k = 0; k < F / 2; ++k) { - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f[28 + (2 * k)])); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f[21 + (2 * k)])); - asm volatile("vfmacc.vf v16, %0, v6" ::"f"(f[35 + (2 * k)])); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(f[28 + (2 * k)])); - asm volatile("vfmacc.vf v16, %0, v10" ::"f"(f[42 + (2 * k)])); - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_0++)); - - asm volatile("vfmacc.vf v18, %0, v10" ::"f"(f[35 + (2 * k)])); - asm volatile("vfslide1down.vf v4, v6, %0" ::"f"(*i_slide_ptr_1++)); - - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f[14 + (2 * k)])); - asm volatile("vfmacc.vf v20, %0, v6" ::"f"(f[21 + (2 * k)])); - asm volatile("vfmacc.vf v20, %0, v10" ::"f"(f[28 + (2 * k)])); - asm volatile("vfslide1down.vf v8, v10, %0" ::"f"(*i_slide_ptr_2++)); - - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f[7 + (2 * k)])); - asm volatile("vfmacc.vf v22, %0, v6" ::"f"(f[14 + (2 * k)])); - asm volatile("vfmacc.vf v22, %0, v10" ::"f"(f[21 + (2 * k)])); - - if (k == 0) - asm volatile("vfmul.vf v24, v2, %0" ::"f"(f[0 + (2 * k)])); - else - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[0 + (2 * k)])); - asm volatile("vfmacc.vf v24, %0, v6" ::"f"(f[7 + (2 * k)])); - asm volatile("vfmacc.vf v24, %0, v10" ::"f"(f[14 + (2 * k)])); - - if (k == 0) - asm volatile("vfmul.vf v26, v6, %0" ::"f"(f[0 + (2 * k)])); - else - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f[0 + (2 * k)])); - asm volatile("vfmacc.vf v26, %0, v10" ::"f"(f[7 + (2 * k)])); - - if (k == 0) - asm volatile("vfmul.vf v28, v10, %0" ::"f"(f[0 + (2 * k)])); - else - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f[0 + (2 * k)])); - - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f[28 + (2 * k + 1)])); - asm 
volatile("vfmacc.vf v16, %0, v4" ::"f"(f[35 + (2 * k + 1)])); - asm volatile("vfmacc.vf v16, %0, v8" ::"f"(f[42 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(*i_slide_ptr_0++)); - - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f[21 + (2 * k + 1)])); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f[28 + (2 * k + 1)])); - asm volatile("vfmacc.vf v18, %0, v8" ::"f"(f[35 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v6, v4, %0" ::"f"(*i_slide_ptr_1++)); - - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f[14 + (2 * k + 1)])); - asm volatile("vfmacc.vf v20, %0, v4" ::"f"(f[21 + (2 * k + 1)])); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f[28 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v10, v8, %0" ::"f"(*i_slide_ptr_2++)); - - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f[7 + (2 * k + 1)])); - asm volatile("vfmacc.vf v22, %0, v4" ::"f"(f[14 + (2 * k + 1)])); - asm volatile("vfmacc.vf v22, %0, v8" ::"f"(f[21 + (2 * k + 1)])); - - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f[0 + (2 * k + 1)])); - asm volatile("vfmacc.vf v24, %0, v4" ::"f"(f[7 + (2 * k + 1)])); - asm volatile("vfmacc.vf v24, %0, v8" ::"f"(f[14 + (2 * k + 1)])); - - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(f[0 + (2 * k + 1)])); - asm volatile("vfmacc.vf v26, %0, v8" ::"f"(f[7 + (2 * k + 1)])); - - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(f[0 + (2 * k + 1)])); - } - - // Main kernel, last iteration with filter coefficients reuse - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f34)); - asm volatile("vfmacc.vf v16, %0, v6" ::"f"(f41)); - asm volatile("vfmacc.vf v16, %0, v10" ::"f"(f48)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f27)); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(f34)); - asm volatile("vfmacc.vf v18, %0, v10" ::"f"(f41)); - asm volatile("vmv.v.v v16, v18"); - - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f20)); - asm volatile("vfmacc.vf v20, %0, v6" ::"f"(f27)); - asm volatile("vfmacc.vf 
v20, %0, v10" ::"f"(f34)); - asm volatile("vmv.v.v v18, v20"); - - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f13)); - asm volatile("vfmacc.vf v22, %0, v6" ::"f"(f20)); - asm volatile("vfmacc.vf v22, %0, v10" ::"f"(f27)); - asm volatile("vmv.v.v v20, v22"); - - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f6)); - asm volatile("vfmacc.vf v24, %0, v6" ::"f"(f13)); - asm volatile("vfmacc.vf v24, %0, v10" ::"f"(f20)); - asm volatile("vmv.v.v v22, v24"); - - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f6)); - asm volatile("vfmacc.vf v26, %0, v10" ::"f"(f13)); - asm volatile("vmv.v.v v24, v26"); - - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f6)); - asm volatile("vmv.v.v v26, v28"); - - //////////// - // REGIME // - //////////// - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i_ + n_; - - asm volatile("vle64.v v0, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - - // The following loop is unrolled by 2 - // The input matrix has R + F - 1 rows - // We have computed F input rows already - // Compute now until only F input rows are left - // (The last F-1 rows do not contribute to F output rows each, so keep them - // outside of this loop) (We keep F rows outside because of the unrolling by - // 2, just for easeness) - for (int j = 0; j < ((R + F - 1) - 2 * F) / 2; ++j) { - // Work on F output rows - - ////////////// - // UNROLL 0 // - ////////////// - - // Main loop - for (int k = 0; k < F / 2; ++k) { - double f42, f35, f28, f21, f14, f7, f0, fs; - f42 = f[42 + (2 * k)]; - f35 = f[35 + (2 * k)]; - f28 = f[28 + (2 * k)]; - f21 = f[21 + (2 * k)]; - f14 = f[14 + (2 * k)]; - f7 = f[7 + (2 * k)]; - f0 = f[0 + (2 * k)]; - fs = *i_slide_ptr_0++; - asm volatile("" ::: "memory"); - // Calculate F contributions of the input rows, on F different output rows - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f42)); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f35)); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f28)); - asm 
volatile("vfslide1down.vf v2, v0, %0" ::"f"(fs)); - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f21)); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f14)); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f7)); - if (k == 0) - asm volatile("vfmul.vf v28, v0, %0" ::"f"(f0)); - else - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f0)); - - f42 = f[42 + (2 * k + 1)]; - f35 = f[35 + (2 * k + 1)]; - f28 = f[28 + (2 * k + 1)]; - f21 = f[21 + (2 * k + 1)]; - f14 = f[14 + (2 * k + 1)]; - f7 = f[7 + (2 * k + 1)]; - f0 = f[0 + (2 * k + 1)]; - fs = *i_slide_ptr_0++; - asm volatile("" ::: "memory"); - // Calculate F contributions of the input rows, on F different output rows - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f42)); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f35)); - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f28)); - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(fs)); - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f21)); - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f14)); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f7)); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f0)); - } - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_1 = i_ + n_; - - // The last iteration is used to mask the latency of the store and the moves - // Use buffered coefficients not to stall CVA6 for coherency - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f48)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f41)); - asm volatile("vmv.v.v v16, v18"); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f34)); - asm volatile("vle64.v v2, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vmv.v.v v18, v20"); - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f27)); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f20)); - asm volatile("vmv.v.v v20, v22"); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f13)); - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f6)); - asm volatile("vmv.v.v v22, v24"); - - 
////////////// - // UNROLL 1 // - ////////////// - - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f[42])); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f[35])); - asm volatile("vmv.v.v v24, v26"); - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f[28])); - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f[21])); - asm volatile("vmv.v.v v26, v28"); - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[14])); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f[7])); - asm volatile("vfmul.vf v28, v2, %0" ::"f"(f[0])); - - for (int k = 1; k < F; k += 2) { - double f42, f35, f28, f21, f14, f7, f0, fs; - f42 = f[42 + k]; - f35 = f[35 + k]; - f28 = f[28 + k]; - f21 = f[21 + k]; - f14 = f[14 + k]; - f7 = f[7 + k]; - f0 = f[0 + k]; - fs = *i_slide_ptr_1++; - asm volatile("" ::: "memory"); - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f42)); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f35)); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f28)); - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(fs)); - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f21)); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f14)); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f7)); - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f0)); - - if (k == F - 2) - break; - - f42 = f[42 + k + 1]; - f35 = f[35 + k + 1]; - f28 = f[28 + k + 1]; - f21 = f[21 + k + 1]; - f14 = f[14 + k + 1]; - f7 = f[7 + k + 1]; - f0 = f[0 + k + 1]; - fs = *i_slide_ptr_1++; - asm volatile("" ::: "memory"); - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f42)); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f35)); - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f28)); - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(fs)); - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f21)); - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f14)); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f7)); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f0)); - } - - // Start calculating the next pointers to the elements to be slided in 
- i_slide_ptr_0 = i_ + n_; - - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f48)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f41)); - asm volatile("vmv.v.v v16, v18"); - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f34)); - asm volatile("vle64.v v0, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vmv.v.v v18, v20"); - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f27)); - asm volatile("vmv.v.v v20, v22"); - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f20)); - asm volatile("vmv.v.v v22, v24"); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f13)); - asm volatile("vmv.v.v v24, v26"); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f6)); - asm volatile("vmv.v.v v26, v28"); - } - - //////////////////////// - // Row I-F -> (I-1)-3 // - //////////////////////// - - // Point to the scalar elements to insert during a slide - // i_slide_ptr_0 has already been computed - i_slide_ptr_1 = i_ + n_ + 0 * (C + F - 1); - i_slide_ptr_2 = i_ + n_ + 1 * (C + F - 1); - i_slide_ptr_3 = i_ + n_ + 2 * (C + F - 1); - - // Load other three input rows (one was already loaded) - asm volatile("vle64.v v4, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vle64.v v8, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - - // Main kernel, unrolled by 2 - // Process 4 input rows - for (int k = 0; k < F / 2; ++k) { - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f[42 + (2 * k)])); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f[35 + (2 * k)])); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f[28 + (2 * k)])); - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f[21 + (2 * k)])); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f[14 + (2 * k)])); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f[7 + (2 * k)])); - if (k == 0) - asm volatile("vfmul.vf v28, v0, %0" ::"f"(f[0 
+ (2 * k)])); - else - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f[0 + (2 * k)])); - asm volatile("vfslide1down.vf v6, v4, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f[42 + (2 * k)])); - asm volatile("vfmacc.vf v20, %0, v4" ::"f"(f[35 + (2 * k)])); - asm volatile("vfmacc.vf v22, %0, v4" ::"f"(f[28 + (2 * k)])); - asm volatile("vfmacc.vf v24, %0, v4" ::"f"(f[21 + (2 * k)])); - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(f[14 + (2 * k)])); - asm volatile("vfmacc.vf v28, %0, v4" ::"f"(f[7 + (2 * k)])); - asm volatile("vfslide1down.vf v10, v8, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f[42 + (2 * k)])); - asm volatile("vfmacc.vf v22, %0, v8" ::"f"(f[35 + (2 * k)])); - asm volatile("vfmacc.vf v24, %0, v8" ::"f"(f[28 + (2 * k)])); - asm volatile("vfmacc.vf v26, %0, v8" ::"f"(f[21 + (2 * k)])); - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(f[14 + (2 * k)])); - asm volatile("vfslide1down.vf v14, v12, %0" ::"f"(*i_slide_ptr_3++)); - asm volatile("vfmacc.vf v22, %0, v12" ::"f"(f[42 + (2 * k)])); - asm volatile("vfmacc.vf v24, %0, v12" ::"f"(f[35 + (2 * k)])); - asm volatile("vfmacc.vf v26, %0, v12" ::"f"(f[28 + (2 * k)])); - asm volatile("vfmacc.vf v28, %0, v12" ::"f"(f[21 + (2 * k)])); - - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f[42 + (2 * k + 1)])); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f[35 + (2 * k + 1)])); - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f[28 + (2 * k + 1)])); - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f[21 + (2 * k + 1)])); - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[14 + (2 * k + 1)])); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f[7 + (2 * k + 1)])); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f[0 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v4, v6, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(f[42 + (2 * k + 1)])); - asm volatile("vfmacc.vf v20, %0, v6" ::"f"(f[35 + (2 
* k + 1)])); - asm volatile("vfmacc.vf v22, %0, v6" ::"f"(f[28 + (2 * k + 1)])); - asm volatile("vfmacc.vf v24, %0, v6" ::"f"(f[21 + (2 * k + 1)])); - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f[14 + (2 * k + 1)])); - asm volatile("vfmacc.vf v28, %0, v6" ::"f"(f[7 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v8, v10, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v20, %0, v10" ::"f"(f[42 + (2 * k + 1)])); - asm volatile("vfmacc.vf v22, %0, v10" ::"f"(f[35 + (2 * k + 1)])); - asm volatile("vfmacc.vf v24, %0, v10" ::"f"(f[28 + (2 * k + 1)])); - asm volatile("vfmacc.vf v26, %0, v10" ::"f"(f[21 + (2 * k + 1)])); - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f[14 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v12, v14, %0" ::"f"(*i_slide_ptr_3++)); - asm volatile("vfmacc.vf v22, %0, v14" ::"f"(f[42 + (2 * k + 1)])); - asm volatile("vfmacc.vf v24, %0, v14" ::"f"(f[35 + (2 * k + 1)])); - asm volatile("vfmacc.vf v26, %0, v14" ::"f"(f[28 + (2 * k + 1)])); - asm volatile("vfmacc.vf v28, %0, v14" ::"f"(f[21 + (2 * k + 1)])); - } - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i_ + n_ + 0 * (C + F - 1); - i_slide_ptr_1 = i_ + n_ + 1 * (C + F - 1); - i_slide_ptr_2 = i_ + n_ + 2 * (C + F - 1); - - asm volatile("vle64.v v2, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f48)); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f41)); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f34)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f27)); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f20)); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f13)); - asm volatile("vle64.v v6, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f6)); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f48)); - asm volatile("vfmacc.vf v20, %0, v4" ::"f"(f41)); - asm volatile("vse64.v v18, (%0); 
add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v22, %0, v4" ::"f"(f34)); - asm volatile("vfmacc.vf v24, %0, v4" ::"f"(f27)); - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(f20)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" : "+&r"(i_) : "r"(ldi_pad)); - asm volatile("vfmacc.vf v28, %0, v4" ::"f"(f13)); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f48)); - asm volatile("vfmacc.vf v22, %0, v8" ::"f"(f41)); - asm volatile("vse64.v v20, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v24, %0, v8" ::"f"(f34)); - asm volatile("vfmacc.vf v26, %0, v8" ::"f"(f27)); - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(f20)); - asm volatile("vfmacc.vf v22, %0, v12" ::"f"(f48)); - asm volatile("vse64.v v22, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v24, %0, v12" ::"f"(f41)); - asm volatile("vfmacc.vf v26, %0, v12" ::"f"(f34)); - asm volatile("vfmacc.vf v28, %0, v12" ::"f"(f27)); - - ////////////////////////// - // Row (I-1)-3 -> (I-1) // - ////////////////////////// - - // Main kernel, unrolled by 2 - for (int k = 0; k < F / 2; ++k) { - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[42 + (2 * k)])); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f[35 + (2 * k)])); - asm volatile("vfslide1down.vf v4, v6, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f[28 + (2 * k)])); - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f[42 + (2 * k)])); - asm volatile("vfslide1down.vf v8, v10, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v28, %0, v6" ::"f"(f[35 + (2 * k)])); - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f[42 + (2 * k)])); - - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f[42 + (2 * k + 1)])); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f[35 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v6, v4, %0" ::"f"(*i_slide_ptr_1++)); - asm 
volatile("vfmacc.vf v28, %0, v0" ::"f"(f[28 + (2 * k + 1)])); - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(f[42 + (2 * k + 1)])); - asm volatile("vfslide1down.vf v10, v8, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v28, %0, v4" ::"f"(f[35 + (2 * k + 1)])); - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(f[42 + (2 * k + 1)])); - } - - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f48)); - asm volatile("vse64.v v24, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f41)); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f34)); - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f48)); - asm volatile("vse64.v v26, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v28, %0, v6" ::"f"(f41)); - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f48)); - asm volatile("vse64.v v28, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); -} - -/* - //////////////////// - // MAIN ALGORITHM // - //////////////////// - - // Start calculating the pointer to the next element to be slided in - i_slide_ptr_0 = i + C; - - // Load one input row - asm volatile("vle64.v v0, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - - // Kernel - for (int k = 0; k < F; ++k) { - // Calculate F*F contributions of the input rows, on F different output rows - // v28 should be initialized during the first iteration - asm volatile("vfmacc.vf v16, %0, v0" :: "f"(f[42 + (2*k)])); - asm volatile("vfmacc.vf v18, %0, v0" :: "f"(f[35 + (2*k)])); - asm volatile("vfmacc.vf v20, %0, v0" :: "f"(f[28 + (2*k)])); - asm volatile("vfmacc.vf v22, %0, v0" :: "f"(f[21 + (2*k)])); - asm volatile("vfmacc.vf v24, %0, v0" :: "f"(f[14 + (2*k)])); - asm volatile("vfmacc.vf v26, %0, v0" :: "f"(f[7 + (2*k)])); - if (k == 0) - asm volatile("vfmul.vf v28, v0, %0" :: "f"(f[0 + (2*k)])); - else - asm volatile("vfmacc.vf v28, %0, v0" :: "f"(f[0 + (2*k)])); - - // Slide the input row by one, and inject the next scalar element of the row - asm volatile("vfslide1down.vf v0, v0, %0" :: 
"f"(*i_slide_ptr_0++)); - } - - // Store one output row - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - - // Move all the input rows to return to the initial situation - // To avoid these operations, unroll the loop via software, renaming the - registers manually asm volatile("vmv.v.v v16, v18"); asm volatile("vmv.v.v - v18, v20"); asm volatile("vmv.v.v v20, v22"); asm volatile("vmv.v.v v22, - v24"); asm volatile("vmv.v.v v24, v26"); asm volatile("vmv.v.v v26, v28"); -*/ diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/gen_data.py deleted file mode 100755 index 9bf90b96..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/gen_data.py +++ /dev/null @@ -1,123 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# arg1: image size, arg2: filter size - -import numpy as np -import sys - - -def convolve2D(kernel, image, padding): - # Default stride - strides = 1 - - # Gather Shapes of Kernel + Image + Padding - xKernShape = kernel.shape[0] - yKernShape = kernel.shape[1] - xImgShape = image.shape[0] - yImgShape = image.shape[1] - - # Shape of Output Convolution - xOutput = xImgShape - xKernShape + 1 - yOutput = yImgShape - yKernShape + 1 - output = np.zeros((xOutput, yOutput)) - - # Iterate through image - for y in range(image.shape[1]): - # Exit Convolution - if y > image.shape[1] - yKernShape: - break - # Only Convolve if y has gone down by the specified Strides - if y % strides == 0: - for x in range(image.shape[0]): - # Go to next row once kernel is out of bounds - if x > image.shape[0] - xKernShape: - break - try: - # Only Convolve if x has moved by the specified Strides - if x % strides == 0: - output[x, y] = ( - kernel * image[x : x + xKernShape, y : y + yKernShape] - ).sum() - except Exception: - break - - return output - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# Define the filter size and the matrix dimension (max, for now, is 128 64-bit elements) -if len(sys.argv) > 1: - matrix_width = int(sys.argv[1]) - assert ( - matrix_width <= 128 - ), "The width of the image cannot be greater than 128 64-bit \ - elements. If this is not enough, modify the algorithm." - f = int(sys.argv[2]) - # Filter size must be odd - assert f % 2 == 1, "The filter size must be an odd integer number" -else: - matrix_width = 64 - f = 3 - -dtype = np.float64 - -# Input image. 
Take a square image -M = matrix_width -N = matrix_width -padding = int(f / 2) -M_pad = M + 2 * padding -N_pad = N + 2 * padding -assert ( - M % 4 == 0 -), "Output image dimension must be divisible by 4, pad the input image accordingly" -assert ( - N % 4 == 0 -), "Output image dimension must be divisible by 4, pad the input image accordingly" - -# Generate a random float64 input padded image -image = np.random.rand(M_pad, N_pad).astype(dtype) - -# Generate a random float64 filter -gen_filter = np.random.rand(f, f).astype(dtype) - -# Create the empty o matrix -empty_o = np.zeros((M, N)).astype(dtype) - -# Calculate the output matrix -result = convolve2D(gen_filter, image, padding).astype(dtype) - -# Print information on file -print('.section .data,"aw",@progbits') -emit("M", np.array(M, dtype=np.uint64)) -emit("N", np.array(N, dtype=np.uint64)) -emit("F", np.array(f, dtype=np.uint64)) -emit("i", image, "NR_LANES*4") -emit("f", gen_filter, "NR_LANES*4") -emit("o", empty_o, "NR_LANES*4") -emit("golden_o", result, "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/main.c b/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/main.c deleted file mode 100644 index 026f1a73..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fconv2d/main.c +++ /dev/null @@ -1,103 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Matteo Perotti - -#include -#include -#include - -#include "ara/util.h" -#include "fconv2d.h" -#include "util.h" - -// Define Matrix dimensions: -// o = i ° f, with i=[MxN], f=[FxF], o=[MxN] -// The filter is a square matrix, and F is odd - -// Matrices defined in data.S -extern double i[] - __attribute__((aligned(32))); // [ (M+floor(F/2)) * (N+floor(F/2)) ] -extern double f[] __attribute__((aligned(32))); // [ F*F ] -extern double o[] __attribute__((aligned(32))); // [ M*N ] -extern double golden_o[] __attribute__((aligned(32))); // [ M*N ] -// M, N, F defined in data.S -extern int64_t M; -extern int64_t N; -extern int64_t F; - -// Verify the matrices -int verify_matrix(double *matrix, double *golden_matrix, int64_t R, int64_t C, - double threshold) { - for (int r = 0; r < R; ++r) - for (int c = 0; c < C; ++c) - if (!similarity_check(matrix[c + C * r], golden_matrix[c + C * r], - threshold)) { - printf("Error: o[%d][%d] = %lf, instead of %lf\n", r, c, - matrix[c + C * r], golden_matrix[c + C * r]); - return 1; - } - return 0; -} - -int main() { - printf("FCONV2D M=%ld N=%ld F=%ld\n", M, N, F); - -#if PREALLOCATE - if (F == 3) - fconv2d_3x3(o, i, f, M, N, F); - else if (F == 7) - fconv2d_7x7(o, i, f, M, N, F); - else - printf("Error: the filter size is different from 3 or 5 or 7.\n"); -#endif - - unsigned long cycles1, cycles2, instr2, instr1; - - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - - // Call the main kernel, and measure cycles - if (F == 3) - fconv2d_3x3(o, i, f, M, N, F); - else if (F == 7) - fconv2d_7x7(o, i, f, M, N, F); - else - printf("Error: the filter size is different from 3 or 5 or 7.\n"); - - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - - // Performance metrics - int64_t runtime = cycles2 - cycles1; - float performance = 2.0 * F * F * M * N / runtime; - printf("operations: %d\n", F * F * M * N); - printf("The execution took %d cycles.\n", runtime); - printf("The performance 
is %ld DPFLOP/1000 cycles.\n", - (uint64_t)(1000.0 * performance)); - - // Verify correctness - printf("Verifying result...\n"); - int error = verify_matrix(o, golden_o, M, N, THRESHOLD); - if (error != 0) { - printf("Fail.\n"); - } else { - printf("Passed.\n"); - } - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/fconv3d.h b/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/fconv3d.h deleted file mode 100644 index 386e5506..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/fconv3d.h +++ /dev/null @@ -1,37 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#ifndef FCONV3D_H -#define FCONV3D_H - -#include -#include - -void fconv3d_CHx7x7(double *o, double *i, double *f, int64_t M, int64_t N, - int64_t C, int64_t F); - -void fconv3d_CHx7x7_block(double *o, double *i, double *f, int64_t M, int64_t N, - int64_t n_, int64_t C, int64_t F); - -#define MIN(a, b) ((a) < (b) ? 
(a) : (b))
-
-// Threshold for FP numbers comparison during the final check
-#define THRESHOLD 0.000000000001
-// #define THRESHOLD 0
-
-#endif
diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/fconv3d_3x7x7.c b/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/fconv3d_3x7x7.c
deleted file mode 100644
index 422f04a8..00000000
--- a/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/fconv3d_3x7x7.c
+++ /dev/null
@@ -1,894 +0,0 @@
-// Nopyright 2020 ETH Zurich and University of Bologna.
-//
-// SPDX-License-Identifier: Apache-2.0
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LINENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WAMMANTIES OM NONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-// Author: Matteo Perotti
-
-/*
-  Optimized convolution for Ara
-  The code is long only because of:
-  1) Special cases related to the first/last 7 rows
-  2) Unrolling of the loops to hide the latency of the moves, slides, mem ops
-
-  At the end of the file, you can find the not-unrolled main loop in a comment,
-  without the edge-code.
- - Algorithm: - a) Load the next input row - b) Calculate its contributions to the F = 7 output rows using one column of - the filter - c) Slide down the input row by 1, injecting the next input scalar - element in the tail - d) Repeat from b), taking the next colum of the filter, - until the last column is fetched - d2) Repeat from a) for all the channels - e) Store the first output row, the one that is complete - f) Move all the output rows up by one register, to restore the initial - condition g) Repeat from a) - - Every time a new input row is loaded, a new output row is created. - - The first 6 rows and the last 6 rows do not follow this pattern, thus we wrote - dedicated code. Because of the unrolling, we counted for this the first and - last 7 rows, instead of 6 - - This algorithm helps in minimizing the data dependencies, as every input rows - is used To calculate 7 different output rows. -*/ - -#include "fconv3d.h" - -extern int64_t event_trigger; - -void fconv3d_CHx7x7(double *o, double *i, double *f, int64_t M, int64_t N, - int64_t C, int64_t F) { - - unsigned long int block_size_n; - - // Set the vector configuration - asm volatile("vsetvli %0, %1, e64, m2, ta, ma" : "=r"(block_size_n) : "r"(N)); - - // Slice the matrix into a manageable number of columns n_ - for (unsigned long int n = 0; n < N; n += block_size_n) { - // Set the vector length - const unsigned long int n_ = MIN(N - n, block_size_n); - - // Find pointers to the submatrices - const double *i_ = i + n; - double *o_ = o + n; - - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(n_)); - - fconv3d_CHx7x7_block(o_, i_, f, M, N, n_, C, F); - } -} - -void fconv3d_CHx7x7_block(double *o, double *i, double *f, int64_t M, int64_t N, - int64_t n_, int64_t C, int64_t F) { - - // Helper variables - int64_t ldo = N << 3; - int64_t ldi_pad = (N + F - 1) << 3; - - // Number of elements that separates two adjacent channels - int64_t ich_len = (M + F - 1) * (N + F - 1); - int64_t fch_len = F * F; - - 
double *i_ = i; - double *i__ = i; - - // Very last column of coefficients - double fl0, fl1, fl2, fl3, fl4, fl5, fl6; - // Buffers for coefficients preloading (solve 16-lane starvation problem) - double f0_buf, f1_buf, f2_buf, f3_buf, f4_buf, f5_buf, f6_buf; - - double *i_slide_ptr_0; - double *i_slide_ptr_1; - double *i_slide_ptr_2; - double *i_slide_ptr_3; - - // Buffer some of the filter coefficients not to lose efficiency after a - // vector store (CVA6 cannot issue memory operations if there is a pending - // store!) - int64_t last_f_column = (C - 1) * fch_len + F - 1; - - fl0 = f[last_f_column + 0 * F]; - fl1 = f[last_f_column + 1 * F]; - fl2 = f[last_f_column + 2 * F]; - fl3 = f[last_f_column + 3 * F]; - fl4 = f[last_f_column + 4 * F]; - fl5 = f[last_f_column + 5 * F]; - fl6 = f[last_f_column + 6 * F]; - - //////////////// - // Row 0 -> 3 // - //////////////// - - // Loop on the channels - for (int ch = 0; ch < C; ++ch) { - - // Point to the first element of the channel ch - i__ = i_ + ch * ich_len; - - // Point to the scalar elements to insert during a slide - i_slide_ptr_0 = i__ + n_ + 0 * (N + F - 1); - i_slide_ptr_1 = i__ + n_ + 1 * (N + F - 1); - i_slide_ptr_2 = i__ + n_ + 2 * (N + F - 1); - i_slide_ptr_3 = i__ + n_ + 3 * (N + F - 1); - - // Load four input rows belonging to channel ch - asm volatile("vle64.v v0, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v4, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v8, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v12, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - - // Main kernel, unrolled by 2 - // Unrolled because of double buffering - // With HW renaming, this unroll is not needed - for (int64_t k = 0; k < F / 2; ++k) { - // Two base indexes because of the unrolling - // Point to the first element of the current column (k) of the current - // channel (ch) of the filter (f) - int64_t base_idx_0 
= (2 * k) + (ch * fch_len); - // Point to the first element of the current column (k+1) of the current - // channel (ch) of the filter (f) - int64_t base_idx_1 = (2 * k + 1) + (ch * fch_len); - - if ((k | ch) == 0) - asm volatile("vfmul.vf v16, v0, %0" ::"f"(f[0 + base_idx_0])); - else - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f[0 + base_idx_0])); - if ((k | ch) == 0) - asm volatile("vfmul.vf v18, v4, %0" ::"f"(f[0 + base_idx_0])); - else - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f[0 + base_idx_0])); - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v16, %0, v4" ::"f"(f[7 + base_idx_0])); - if ((k | ch) == 0) - asm volatile("vfmul.vf v22, v12, %0" ::"f"(f[0 + base_idx_0])); - else - asm volatile("vfmacc.vf v22, %0, v12" ::"f"(f[0 + base_idx_0])); - asm volatile("vfslide1down.vf v6, v4, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v18, %0, v8" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v8" ::"f"(f[14 + base_idx_0])); - asm volatile("vfslide1down.vf v10, v8, %0" ::"f"(*i_slide_ptr_2++)); - if ((k | ch) == 0) - asm volatile("vfmul.vf v20, v8, %0" ::"f"(f[0 + base_idx_0])); - else - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v12" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v12" ::"f"(f[21 + base_idx_0])); - asm volatile("vfslide1down.vf v14, v12, %0" ::"f"(*i_slide_ptr_3++)); - asm volatile("vfmacc.vf v20, %0, v12" ::"f"(f[7 + base_idx_0])); - - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f[0 + base_idx_1])); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(f[0 + base_idx_1])); - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v16, %0, v6" ::"f"(f[7 + base_idx_1])); - asm volatile("vfmacc.vf v18, %0, v10" ::"f"(f[7 + base_idx_1])); - asm volatile("vfmacc.vf v20, %0, v10" ::"f"(f[0 + base_idx_1])); - asm volatile("vfslide1down.vf v4, v6, %0" ::"f"(*i_slide_ptr_1++)); - asm 
volatile("vfmacc.vf v16, %0, v10" ::"f"(f[14 + base_idx_1])); - asm volatile("vfmacc.vf v18, %0, v14" ::"f"(f[14 + base_idx_1])); - asm volatile("vfslide1down.vf v8, v10, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v22, %0, v14" ::"f"(f[0 + base_idx_1])); - asm volatile("vfmacc.vf v16, %0, v14" ::"f"(f[21 + base_idx_1])); - asm volatile("vfslide1down.vf v12, v14, %0" ::"f"(*i_slide_ptr_3++)); - asm volatile("vfmacc.vf v20, %0, v14" ::"f"(f[7 + base_idx_1])); - } - - int64_t base_idx_0 = (F - 1) + (ch * fch_len); - - // Don't slide during the last iteration - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v12" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v4" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v8" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v8" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v12" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v12" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v12" ::"f"(f[21 + base_idx_0])); - } - - // Bump the input ptr - i_ += 4 * (N + F - 1); - - //////////////// - // Row 4 -> 6 // - //////////////// - - // Loop on the channels - for (int ch = 0; ch < C; ++ch) { - - // Point to the first element of the channel ch - i__ = i_ + ch * ich_len; - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i__ + n_ + 0 * (N + F - 1); - i_slide_ptr_1 = i__ + n_ + 1 * (N + F - 1); - i_slide_ptr_2 = i__ + n_ + 2 * (N + F - 1); - - asm volatile("vle64.v v2, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v6, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - - // Main kernel, unrolled by 2 
- for (int k = 0; k < F / 2; ++k) { - // Two base indexes because of the unrolling - // Point to the first element of the current column (k) of the current - // channel (ch) of the filter (f) - int64_t base_idx_0 = (2 * k) + (ch * fch_len); - // Point to the first element of the current column (k+1) of the current - // channel (ch) of the filter (f) - int64_t base_idx_1 = (2 * k + 1) + (ch * fch_len); - - // Unroll 0 - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v6" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v10" ::"f"(f[42 + base_idx_0])); - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_0++)); - - asm volatile("vfmacc.vf v18, %0, v10" ::"f"(f[35 + base_idx_0])); - asm volatile("vfslide1down.vf v4, v6, %0" ::"f"(*i_slide_ptr_1++)); - - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v6" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v10" ::"f"(f[28 + base_idx_0])); - asm volatile("vfslide1down.vf v8, v10, %0" ::"f"(*i_slide_ptr_2++)); - - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v6" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v10" ::"f"(f[21 + base_idx_0])); - - if ((k | ch) == 0) - asm volatile("vfmul.vf v24, v2, %0" ::"f"(f[0 + base_idx_0])); - else - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v6" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v10" ::"f"(f[14 + base_idx_0])); - - if ((k | ch) == 0) - asm volatile("vfmul.vf v26, v6, %0" ::"f"(f[0 + base_idx_0])); - else - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v10" ::"f"(f[7 + base_idx_0])); - - if ((k | ch) == 0) - asm 
volatile("vfmul.vf v28, v10, %0" ::"f"(f[0 + base_idx_0])); - else - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f[0 + base_idx_0])); - - // Unroll 1 - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f[28 + base_idx_1])); - asm volatile("vfmacc.vf v16, %0, v4" ::"f"(f[35 + base_idx_1])); - asm volatile("vfmacc.vf v16, %0, v8" ::"f"(f[42 + base_idx_1])); - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(*i_slide_ptr_0++)); - - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f[21 + base_idx_1])); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f[28 + base_idx_1])); - asm volatile("vfmacc.vf v18, %0, v8" ::"f"(f[35 + base_idx_1])); - asm volatile("vfslide1down.vf v6, v4, %0" ::"f"(*i_slide_ptr_1++)); - - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f[14 + base_idx_1])); - asm volatile("vfmacc.vf v20, %0, v4" ::"f"(f[21 + base_idx_1])); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f[28 + base_idx_1])); - asm volatile("vfslide1down.vf v10, v8, %0" ::"f"(*i_slide_ptr_2++)); - - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f[7 + base_idx_1])); - asm volatile("vfmacc.vf v22, %0, v4" ::"f"(f[14 + base_idx_1])); - asm volatile("vfmacc.vf v22, %0, v8" ::"f"(f[21 + base_idx_1])); - - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f[0 + base_idx_1])); - asm volatile("vfmacc.vf v24, %0, v4" ::"f"(f[7 + base_idx_1])); - asm volatile("vfmacc.vf v24, %0, v8" ::"f"(f[14 + base_idx_1])); - - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(f[0 + base_idx_1])); - asm volatile("vfmacc.vf v26, %0, v8" ::"f"(f[7 + base_idx_1])); - - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(f[0 + base_idx_1])); - } - - // The very last iterations require mixing the instructions with the store - // and the moves - if (ch != C - 1) { - // Point to the first element of the current column (k) of the current - // channel (ch) of the filter (f) - int64_t base_idx_0 = (F - 1) + (ch * fch_len); - - // Don't slide the elements here - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v6" 
::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v16, %0, v10" ::"f"(f[42 + base_idx_0])); - - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v10" ::"f"(f[35 + base_idx_0])); - - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v6" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v10" ::"f"(f[28 + base_idx_0])); - - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v6" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v10" ::"f"(f[21 + base_idx_0])); - - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v6" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v10" ::"f"(f[14 + base_idx_0])); - - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v10" ::"f"(f[7 + base_idx_0])); - - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f[0 + base_idx_0])); - } - } - - // Reuse preloaded coefficients - // Buffer the next coefficients for faster use - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(fl4)); - f6_buf = f[42]; - asm volatile("vfmacc.vf v16, %0, v6" ::"f"(fl5)); - f5_buf = f[35]; - asm volatile("vfmacc.vf v16, %0, v10" ::"f"(fl6)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(fl3)); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(fl4)); - asm volatile("vfmacc.vf v18, %0, v10" ::"f"(fl5)); - asm volatile("vmv.v.v v16, v18"); - - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(fl2)); - f4_buf = f[28]; - asm volatile("vfmacc.vf v20, %0, v6" ::"f"(fl3)); - f3_buf = f[21]; - asm volatile("vfmacc.vf v20, %0, v10" ::"f"(fl4)); - asm volatile("vmv.v.v v18, v20"); - - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(fl1)); - f2_buf = f[14]; - asm volatile("vfmacc.vf 
v22, %0, v6" ::"f"(fl2)); - f1_buf = f[7]; - asm volatile("vfmacc.vf v22, %0, v10" ::"f"(fl3)); - asm volatile("vmv.v.v v20, v22"); - - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(fl0)); - f0_buf = f[0]; - asm volatile("vfmacc.vf v24, %0, v6" ::"f"(fl1)); - asm volatile("vfmacc.vf v24, %0, v10" ::"f"(fl2)); - asm volatile("vmv.v.v v22, v24"); - - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(fl0)); - asm volatile("vfmacc.vf v26, %0, v10" ::"f"(fl1)); - asm volatile("vmv.v.v v24, v26"); - - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(fl0)); - asm volatile("vmv.v.v v26, v28"); - - // Bump the input ptr - i_ += 3 * (N + F - 1); - - //////////// - // REGIME // - //////////// - - // The following loop is unrolled by 2 - // The input matrix has M + F - 1 rows - // We have computed F input rows already - // Compute now until only F input rows are left - // (The last F-1 rows do not contribute to F output rows each, so keep them - // outside of this loop) (We keep F rows outside because of the unrolling by - // 2, just for easeness) - for (int j = 0; j < ((M + F - 1) - 2 * F) / 2; ++j) { -#ifdef VCD_DUMP - // Start dumping VCD - event_trigger = +1; -#endif - - // Work on F output rows - - // Loop on the channels - for (int ch = 0; ch < C; ++ch) { - // Point to the first element of the channel ch - i__ = i_ + ch * ich_len; - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i__ + n_; - - asm volatile("vle64.v v0, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - - ////////////// - // UNROLL 0 // - ////////////// - - // Main loop - // Use double buffering on the filter coefficients for 16-lanes config - // The computation is too fast, and every coefficient belongs to a - // different $line At every fld, CVA6 misses, and until it does not get - // the new coefficient, it cannot dispatch the next V instruction - for (int k = 0; k < F / 2; ++k) { - // Two base indexes because of the unrolling - // Look ahead to the first element of 
the current column (k+2) of the - // current channel (ch) of the filter (f) - int64_t base_idx_0 = (2 * k + 2) + (ch * fch_len); - // Point to the first element of the current column (k+1) of the current - // channel (ch) of the filter (f) - int64_t base_idx_1 = (2 * k + 1) + (ch * fch_len); - double fs; - fs = *i_slide_ptr_0++; - asm volatile("" ::: "memory"); - - // Calculate F contributions of the input rows, on F different output - // rows - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f6_buf)); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f5_buf)); - f6_buf = f[42 + base_idx_1]; - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f4_buf)); - f5_buf = f[35 + base_idx_1]; - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(fs)); - f4_buf = f[28 + base_idx_1]; - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f3_buf)); - f3_buf = f[21 + base_idx_1]; - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f2_buf)); - f2_buf = f[14 + base_idx_1]; - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f1_buf)); - f1_buf = f[7 + base_idx_1]; - if ((k | ch) == 0) - asm volatile("vfmul.vf v28, v0, %0" ::"f"(f0_buf)); - else - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f0_buf)); - f0_buf = f[0 + base_idx_1]; - - fs = *i_slide_ptr_0++; - asm volatile("" ::: "memory"); - - // Calculate F contributions of the input rows, on F different output - // rows - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f6_buf)); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f5_buf)); - f6_buf = f[42 + base_idx_0]; - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f4_buf)); - f5_buf = f[35 + base_idx_0]; - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(fs)); - f4_buf = f[28 + base_idx_0]; - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f3_buf)); - f3_buf = f[21 + base_idx_0]; - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f2_buf)); - f2_buf = f[14 + base_idx_0]; - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f1_buf)); - f1_buf = f[7 + base_idx_0]; - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f0_buf)); - f0_buf = f[0 + base_idx_0]; - } - - if 
(ch != C - 1) { - int64_t base_idx_0 = (ch + 1) * fch_len; - - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f6_buf)); - f6_buf = f[42 + base_idx_0]; - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f5_buf)); - f5_buf = f[35 + base_idx_0]; - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f4_buf)); - f4_buf = f[28 + base_idx_0]; - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f3_buf)); - f3_buf = f[21 + base_idx_0]; - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f2_buf)); - f2_buf = f[14 + base_idx_0]; - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f1_buf)); - f1_buf = f[7 + base_idx_0]; - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f0_buf)); - f0_buf = f[0 + base_idx_0]; - } - } - - // The last iteration is used to mask the latency of the store and the moves - // Use buffered coefficients not to stall CVA6 for coherency - f6_buf = f[42]; - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(fl6)); - f5_buf = f[35]; - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(fl5)); - asm volatile("vmv.v.v v16, v18"); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(fl4)); - asm volatile("vmv.v.v v18, v20"); - f4_buf = f[28]; - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(fl3)); - asm volatile("vmv.v.v v20, v22"); - f3_buf = f[21]; - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(fl2)); - asm volatile("vmv.v.v v22, v24"); - f2_buf = f[14]; - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(fl1)); - asm volatile("vmv.v.v v24, v26"); - f1_buf = f[7]; - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(fl0)); - asm volatile("vmv.v.v v26, v28"); - f0_buf = f[0]; - - // Bump the input ptr - i_ += N + F - 1; - -#ifdef VCD_DUMP - // Stop dumping VCD - event_trigger = -1; -#endif - - ////////////// - // UNROLL 1 // - ////////////// - - // Loop on the channels - for (int ch = 0; ch < C; ++ch) { - - // Point to the first element of the channel ch - i__ = i_ + ch * ich_len; - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_1 = i__ 
+ n_; - - asm volatile("vle64.v v2, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - - for (int k = 0; k < F / 2; ++k) { - // Two base indexes because of the unrolling - // Point to the first element of the current column (k) of the current - // channel (ch) of the filter (f) - int64_t base_idx_0 = (2 * k + 2) + (ch * fch_len); - // Point to the first element of the current column (k+1) of the current - // channel (ch) of the filter (f) - int64_t base_idx_1 = (2 * k + 1) + (ch * fch_len); - double fs; - fs = *i_slide_ptr_1++; - asm volatile("" ::: "memory"); - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f6_buf)); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f5_buf)); - f6_buf = f[42 + base_idx_1]; - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f4_buf)); - f5_buf = f[35 + base_idx_1]; - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(fs)); - f4_buf = f[28 + base_idx_1]; - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f3_buf)); - f3_buf = f[21 + base_idx_1]; - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f2_buf)); - f2_buf = f[14 + base_idx_1]; - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f1_buf)); - f1_buf = f[7 + base_idx_1]; - if ((k | ch) == 0) - asm volatile("vfmul.vf v28, v2, %0" ::"f"(f0_buf)); - else - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f0_buf)); - f0_buf = f[0 + base_idx_1]; - - fs = *i_slide_ptr_1++; - asm volatile("" ::: "memory"); - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f6_buf)); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f5_buf)); - f6_buf = f[42 + base_idx_0]; - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f4_buf)); - f5_buf = f[35 + base_idx_0]; - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(fs)); - f4_buf = f[28 + base_idx_0]; - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f3_buf)); - f3_buf = f[21 + base_idx_0]; - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f2_buf)); - f2_buf = f[14 + base_idx_0]; - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f1_buf)); - f1_buf = f[7 + base_idx_0]; - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f0_buf)); - 
f0_buf = f[0 + base_idx_0]; - } - - if (ch != C - 1) { - int64_t base_idx_0 = (ch + 1) * fch_len; - - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f6_buf)); - f6_buf = f[42 + base_idx_0]; - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f5_buf)); - f5_buf = f[35 + base_idx_0]; - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f4_buf)); - f4_buf = f[28 + base_idx_0]; - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f3_buf)); - f3_buf = f[21 + base_idx_0]; - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f2_buf)); - f2_buf = f[14 + base_idx_0]; - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f1_buf)); - f1_buf = f[7 + base_idx_0]; - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f0_buf)); - f0_buf = f[0 + base_idx_0]; - } - } - - // The last iteration is used to mask the latency of the store and the moves - // Use buffered coefficients not to stall CVA6 for coherency - f6_buf = f[42]; - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(fl6)); - f5_buf = f[35]; - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(fl5)); - asm volatile("vmv.v.v v16, v18"); - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(fl4)); - asm volatile("vmv.v.v v18, v20"); - f4_buf = f[28]; - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(fl3)); - asm volatile("vmv.v.v v20, v22"); - f3_buf = f[21]; - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(fl2)); - asm volatile("vmv.v.v v22, v24"); - f2_buf = f[14]; - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(fl1)); - asm volatile("vmv.v.v v24, v26"); - f1_buf = f[7]; - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(fl0)); - asm volatile("vmv.v.v v26, v28"); - f0_buf = f[0]; - - // Bump the input ptr - i_ += N + F - 1; - } - - //////////////////////// - // Row I-F -> (I-1)-3 // - //////////////////////// - - for (int64_t ch = 0; ch < C; ++ch) { - - // Point to the first element of the channel ch - i__ = i_ + ch * ich_len; - - // Point to the scalar elements to insert during a slide - // i_slide_ptr_0 has already been computed - 
i_slide_ptr_0 = i__ + n_ + 0 * (N + F - 1); - i_slide_ptr_1 = i__ + n_ + 1 * (N + F - 1); - i_slide_ptr_2 = i__ + n_ + 2 * (N + F - 1); - i_slide_ptr_3 = i__ + n_ + 3 * (N + F - 1); - - // Load other three input rows (one was already loaded) - asm volatile("vle64.v v0, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v4, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v8, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v12, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - - // Main kernel, unrolled by 2 - // Process 4 input rows - for (int k = 0; k < F / 2; ++k) { - // Two base indexes because of the unrolling - // Point to the first element of the current column (k) of the current - // channel (ch) of the filter (f) - int64_t base_idx_0 = (2 * k) + (ch * fch_len); - // Point to the first element of the current column (k+1) of the current - // channel (ch) of the filter (f) - int64_t base_idx_1 = (2 * k + 1) + (ch * fch_len); - - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f[7 + base_idx_0])); - if ((k | ch) == 0) - asm volatile("vfmul.vf v28, v0, %0" ::"f"(f[0 + base_idx_0])); - else - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f[0 + base_idx_0])); - asm volatile("vfslide1down.vf v6, v4, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v4" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v4" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v4" ::"f"(f[21 + 
base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v4" ::"f"(f[7 + base_idx_0])); - asm volatile("vfslide1down.vf v10, v8, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v8" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v8" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v8" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(f[14 + base_idx_0])); - asm volatile("vfslide1down.vf v14, v12, %0" ::"f"(*i_slide_ptr_3++)); - asm volatile("vfmacc.vf v22, %0, v12" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v12" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v12" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v12" ::"f"(f[21 + base_idx_0])); - - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v16, %0, v2" ::"f"(f[42 + base_idx_1])); - asm volatile("vfmacc.vf v18, %0, v2" ::"f"(f[35 + base_idx_1])); - asm volatile("vfmacc.vf v20, %0, v2" ::"f"(f[28 + base_idx_1])); - asm volatile("vfmacc.vf v22, %0, v2" ::"f"(f[21 + base_idx_1])); - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[14 + base_idx_1])); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f[7 + base_idx_1])); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f[0 + base_idx_1])); - asm volatile("vfslide1down.vf v4, v6, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v18, %0, v6" ::"f"(f[42 + base_idx_1])); - asm volatile("vfmacc.vf v20, %0, v6" ::"f"(f[35 + base_idx_1])); - asm volatile("vfmacc.vf v22, %0, v6" ::"f"(f[28 + base_idx_1])); - asm volatile("vfmacc.vf v24, %0, v6" ::"f"(f[21 + base_idx_1])); - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f[14 + base_idx_1])); - asm volatile("vfmacc.vf v28, %0, v6" ::"f"(f[7 + base_idx_1])); - asm volatile("vfslide1down.vf v8, v10, %0" ::"f"(*i_slide_ptr_2++)); - asm 
volatile("vfmacc.vf v20, %0, v10" ::"f"(f[42 + base_idx_1])); - asm volatile("vfmacc.vf v22, %0, v10" ::"f"(f[35 + base_idx_1])); - asm volatile("vfmacc.vf v24, %0, v10" ::"f"(f[28 + base_idx_1])); - asm volatile("vfmacc.vf v26, %0, v10" ::"f"(f[21 + base_idx_1])); - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f[14 + base_idx_1])); - asm volatile("vfslide1down.vf v12, v14, %0" ::"f"(*i_slide_ptr_3++)); - asm volatile("vfmacc.vf v22, %0, v14" ::"f"(f[42 + base_idx_1])); - asm volatile("vfmacc.vf v24, %0, v14" ::"f"(f[35 + base_idx_1])); - asm volatile("vfmacc.vf v26, %0, v14" ::"f"(f[28 + base_idx_1])); - asm volatile("vfmacc.vf v28, %0, v14" ::"f"(f[21 + base_idx_1])); - } - - if (ch != C - 1) { - int64_t base_idx_0 = (F - 1) + (ch * fch_len); - - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f[0 + base_idx_0])); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v4" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v4" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v4" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v4" ::"f"(f[7 + base_idx_0])); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v8" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v8" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v8" ::"f"(f[21 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(f[14 + base_idx_0])); - asm volatile("vfmacc.vf v22, %0, v12" 
::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v24, %0, v12" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v12" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v12" ::"f"(f[21 + base_idx_0])); - } - } - - asm volatile("vfmacc.vf v16, %0, v0" ::"f"(fl6)); - asm volatile("vfmacc.vf v18, %0, v0" ::"f"(fl5)); - asm volatile("vfmacc.vf v20, %0, v0" ::"f"(fl4)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v22, %0, v0" ::"f"(fl3)); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(fl2)); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(fl1)); - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(fl0)); - asm volatile("vfmacc.vf v18, %0, v4" ::"f"(fl6)); - asm volatile("vfmacc.vf v20, %0, v4" ::"f"(fl5)); - asm volatile("vse64.v v18, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v22, %0, v4" ::"f"(fl4)); - asm volatile("vfmacc.vf v24, %0, v4" ::"f"(fl3)); - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(fl2)); - asm volatile("vfmacc.vf v28, %0, v4" ::"f"(fl1)); - asm volatile("vfmacc.vf v20, %0, v8" ::"f"(fl6)); - asm volatile("vfmacc.vf v22, %0, v8" ::"f"(fl5)); - asm volatile("vse64.v v20, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v24, %0, v8" ::"f"(fl4)); - asm volatile("vfmacc.vf v26, %0, v8" ::"f"(fl3)); - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(fl2)); - asm volatile("vfmacc.vf v22, %0, v12" ::"f"(fl6)); - asm volatile("vse64.v v22, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v24, %0, v12" ::"f"(fl5)); - asm volatile("vfmacc.vf v26, %0, v12" ::"f"(fl4)); - asm volatile("vfmacc.vf v28, %0, v12" ::"f"(fl3)); - - // Bump the input ptr - i_ += 4 * (N + F - 1); - - ////////////////////////// - // Row (I-1)-3 -> (I-1) // - ////////////////////////// - - for (int64_t ch = 0; ch < C; ++ch) { - - // Point to the first element of the channel ch - i__ = i_ + ch * ich_len; - - // Start calculating the next pointers to the 
elements to be slided in - i_slide_ptr_0 = i__ + n_ + 0 * (N + F - 1); - i_slide_ptr_1 = i__ + n_ + 1 * (N + F - 1); - i_slide_ptr_2 = i__ + n_ + 2 * (N + F - 1); - - asm volatile("vle64.v v2, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v6, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" - : "+&r"(i__) - : "r"(ldi_pad)); - - // Main kernel, unrolled by 2 - for (int k = 0; k < F / 2; ++k) { - // Two base indexes because of the unrolling - // Point to the first element of the current column (k) of the current - // channel (ch) of the filter (f) - int64_t base_idx_0 = (2 * k) + (ch * fch_len); - // Point to the first element of the current column (k+1) of the current - // channel (ch) of the filter (f) - int64_t base_idx_1 = (2 * k + 1) + (ch * fch_len); - - asm volatile("vfslide1down.vf v0, v2, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f[35 + base_idx_0])); - asm volatile("vfslide1down.vf v4, v6, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f[42 + base_idx_0])); - asm volatile("vfslide1down.vf v8, v10, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v28, %0, v6" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f[42 + base_idx_0])); - - asm volatile("vfslide1down.vf v2, v0, %0" ::"f"(*i_slide_ptr_0++)); - asm volatile("vfmacc.vf v24, %0, v0" ::"f"(f[42 + base_idx_1])); - asm volatile("vfmacc.vf v26, %0, v0" ::"f"(f[35 + base_idx_1])); - asm volatile("vfslide1down.vf v6, v4, %0" ::"f"(*i_slide_ptr_1++)); - asm volatile("vfmacc.vf v28, %0, v0" ::"f"(f[28 + base_idx_1])); - asm volatile("vfmacc.vf v26, %0, v4" ::"f"(f[42 + base_idx_1])); - asm volatile("vfslide1down.vf v10, v8, %0" ::"f"(*i_slide_ptr_2++)); - asm volatile("vfmacc.vf v28, %0, v4" 
::"f"(f[35 + base_idx_1])); - asm volatile("vfmacc.vf v28, %0, v8" ::"f"(f[42 + base_idx_1])); - } - - if (ch != C - 1) { - int64_t base_idx_0 = (F - 1) + (ch * fch_len); - - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(f[28 + base_idx_0])); - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(f[42 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v6" ::"f"(f[35 + base_idx_0])); - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(f[42 + base_idx_0])); - } - } - - asm volatile("vfmacc.vf v24, %0, v2" ::"f"(fl6)); - asm volatile("vse64.v v24, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v26, %0, v2" ::"f"(fl5)); - asm volatile("vfmacc.vf v28, %0, v2" ::"f"(fl4)); - asm volatile("vfmacc.vf v26, %0, v6" ::"f"(fl6)); - asm volatile("vse64.v v26, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vfmacc.vf v28, %0, v6" ::"f"(fl5)); - asm volatile("vfmacc.vf v28, %0, v10" ::"f"(fl6)); - asm volatile("vse64.v v28, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/gen_data.py deleted file mode 100755 index ae7d6274..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/gen_data.py +++ /dev/null @@ -1,131 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -# arg1: image size, arg2: filter size - -import numpy as np -import sys - - -def convolve2D(kernel, image, padding): - # Default stride - strides = 1 - - # Gather Shapes of Kernel + Image + Padding - xKernShape = kernel.shape[0] - yKernShape = kernel.shape[1] - xImgShape = image.shape[0] - yImgShape = image.shape[1] - - # Shape of Output Convolution - xOutput = xImgShape - xKernShape + 1 - yOutput = yImgShape - yKernShape + 1 - output = np.zeros((xOutput, yOutput)) - - # Iterate through image - for y in range(image.shape[1]): - # Exit Convolution - if y > image.shape[1] - yKernShape: - break - # Only Convolve if y has gone down by the specified Strides - if y % strides == 0: - for x in range(image.shape[0]): - # Go to next row once kernel is out of bounds - if x > image.shape[0] - xKernShape: - break - try: - # Only Convolve if x has moved by the specified Strides - if x % strides == 0: - output[x, y] = ( - kernel * image[x : x + xKernShape, y : y + yKernShape] - ).sum() - except Exception: - break - - return output - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# Define the filter size and the matrix dimension (max, for now, is 128 64-bit elements) -if len(sys.argv) > 1: - matrix_width = int(sys.argv[1]) - # assert(matrix_width <= 128), "The width of the image cannot be greater than 128 64-bit \ - # elements. If this is not enough, modify the algorithm." - F = int(sys.argv[2]) - # Filter size must be odd - assert F % 2 == 1, "The filter size must be an odd integer number" -else: - matrix_width = 64 - F = 3 - -# 64-bit data -dtype = np.float64 - -# Input image. 
Take a square image -M = matrix_width -N = matrix_width -# 3 Channels -CH = 3 -padding = int(F / 2) -M_pad = M + 2 * padding -N_pad = N + 2 * padding -assert ( - M % 4 == 0 -), "Output image dimension must be divisible by 4, pad the input image accordingly" -assert ( - N % 4 == 0 -), "Output image dimension must be divisible by 4, pad the input image accordingly" - -image = list() -# Generate a random float64 input padded image -for ch in range(CH): - image += [np.random.rand(M_pad, N_pad).astype(dtype)] - -gen_filter = list() -# Generate a random float64 filter -for ch in range(CH): - gen_filter += [np.random.rand(F, F).astype(dtype)] - -# Create the empty o matrix -empty_o = np.zeros((M, N)).astype(dtype) - -# Calculate the output matrix -result = np.zeros((M, N)).astype(dtype) -for ch in range(CH): - result += convolve2D(gen_filter[ch], image[ch], padding).astype(dtype) - -# Print information on file -print('.section .data,"aw",@progbits') -emit("M", np.array(M, dtype=np.uint64)) -emit("N", np.array(N, dtype=np.uint64)) -emit("F", np.array(F, dtype=np.uint64)) -emit("CH", np.array(CH, dtype=np.uint64)) -emit("i", np.concatenate(image), "NR_LANES*4") -emit("f", np.concatenate(gen_filter), "NR_LANES*4") -emit("o", empty_o, "NR_LANES*4") -emit("golden_o", result, "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/main.c b/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/main.c deleted file mode 100644 index eec76a4d..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fconv3d/main.c +++ /dev/null @@ -1,101 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#include -#include -#include - -#include "ara/util.h" -#include "fconv3d.h" -#include "util.h" - -// Define Matrix dimensions: -// o = i ° f, with i=[(M+F-1)x(N+f-1)xCH], f=[FxFxCH], o=[MxN] -// The filter is a square matrix, and F is odd - -// Matrices defined in data.S -extern double i[] - __attribute__((aligned(32))); // [ (M+floor(F/2)) * (N+floor(F/2)) * CH ] -extern double f[] __attribute__((aligned(32))); // [ F*F*CH ] -extern double o[] __attribute__((aligned(32))); // [ M*N ] -extern double golden_o[] __attribute__((aligned(32))); // [ M*N ] -// M, N, F defined in data.S -extern int64_t M; -extern int64_t N; -extern int64_t CH; -extern int64_t F; - -// Verify the matrices -int verify_matrix(double *matrix, double *golden_matrix, int64_t R, int64_t C, - double threshold) { - for (int r = 0; r < R; ++r) - for (int c = 0; c < C; ++c) - if (!similarity_check(matrix[c + C * r], golden_matrix[c + C * r], - threshold)) { - printf("Error: o[%d][%d] = %lf, instead of %lf\n", r, c, - matrix[c + C * r], golden_matrix[c + C * r]); - return 1; - } - return 0; -} - -int main() { - printf("FCONV3D float64\n"); - printf("Input Mtx size: %dx%d\n", M + F - 1, N + F - 1); - printf("Output Mtx size: %dx%d\n", M, N); - printf("Filter size: %dx%d\n", F, F); - printf("Channels: %d\n", CH); - -#if PREALLOCATE - if (F == 7) - fconv3d_CHx7x7(o, i, f, M, N, CH, F); - else - printf("Error: the filter size is different from 7.\n"); -#endif - - unsigned long cycles1, cycles2, instr2, instr1; - instr1 = read_csr(minstret); - cycles1 = 
read_csr(mcycle); - // Call the main kernel, and measure cycles - if (F == 7) - fconv3d_CHx7x7(o, i, f, M, N, CH, F); - else - printf("Error: the filter size is different from 7.\n"); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - - // Performance metrics - int64_t runtime = cycles2 - cycles1; - float performance = 2.0 * CH * F * F * M * N / runtime; - printf("Operations: %ld\n", CH * F * F * M * N); - printf("The execution took %d cycles.\n", runtime); - printf("The performance is %ld DPFLOP/1000 cycles.\n", - (uint64_t)(1000.0 * performance)); - - // Verify correctness - printf("Verifying result...\n"); - int error = verify_matrix(o, golden_o, M, N, THRESHOLD); - if (error != 0) { - printf("Fail.\n"); - } else { - printf("Passed.\n"); - } - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fdotprod/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-fdotprod/gen_data.py deleted file mode 100755 index 8c2f4b48..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fdotprod/gen_data.py +++ /dev/null @@ -1,85 +0,0 @@ -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# -# Generate input data for fdotp benchmark -# arg: #elements per vector - -import numpy as np -import random -from functools import reduce -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# Vector length -if len(sys.argv) > 1: - vsize = int(sys.argv[1]) -else: - # Default: no stripmine - vsize = 64 - -avl64 = int(vsize) -avl32 = int(vsize) -avl16 = int(vsize) - -# Create the vectors -v64a = np.random.rand(avl64).astype(np.float64) -v64b = np.random.rand(avl64).astype(np.float64) -v32a = np.random.rand(avl32).astype(np.float32) -v32b = np.random.rand(avl32).astype(np.float32) -v16a = np.random.rand(avl16).astype(np.float16) -v16b = np.random.rand(avl16).astype(np.float16) - -# Create the golden output -gold64 = reduce(lambda a, b: a + b, np.multiply(v64a, v64b)) -gold32 = reduce(lambda a, b: a + b, np.multiply(v32a, v32b)) -gold16 = reduce(lambda a, b: a + b, np.multiply(v16a, v16b)) -gold16 = np.array([gold16, gold16]) - -# Create the empty result vectors -res64 = 0 -res32 = 0 -res16 = 0 - -# Print information on file -print('.section .data,"aw",@progbits') -emit("vsize", np.array(vsize, dtype=np.uint64)) -emit("v64a", v64a, "32") -emit("v64b", v64b, "32") -emit("v32a", v32a, "32") -emit("v32b", v32b, "32") -emit("v16a", v16a, "32") -emit("v16b", v16b, "32") -emit("gold64", np.array(gold64, dtype=np.float64)) -emit("gold32", np.array(gold32, dtype=np.float32)) -emit("gold16", gold16, "32") -emit("res64_v", np.array(res64, dtype=np.float64)) -emit("res32_v", np.array(res32, dtype=np.float32)) -emit("res16_v", np.array(res16, dtype=np.float32)) -emit("res64_s", np.array(res64, dtype=np.float64)) -emit("res32_s", np.array(res32, dtype=np.float32)) -emit("res16_s", np.array(res16, dtype=np.float32)) diff --git 
a/bb-tests/workloads/src/CTest/rvv/vec-fdotprod/main.c b/bb-tests/workloads/src/CTest/rvv/vec-fdotprod/main.c deleted file mode 100644 index 4e18c7fc..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fdotprod/main.c +++ /dev/null @@ -1,91 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#include "util.h" -#include -#include - -#include "ara/fdotproduct.h" -#include "ara/util.h" -#include - -// Threshold for FP comparisons -#define THRESHOLD_64b 0.0000000001 -#define THRESHOLD_32b 0.0001 -#define THRESHOLD_16b 1 - -// Vector size (Byte) -extern uint64_t vsize; -// Input vectors -extern double v64a[] __attribute__((aligned(256))); -extern double v64b[] __attribute__((aligned(256))); -extern float v32a[] __attribute__((aligned(256))); -extern float v32b[] __attribute__((aligned(256))); -extern _Float16 v16a[] __attribute__((aligned(256))); -extern _Float16 v16b[] __attribute__((aligned(256))); -// Golden outputs -extern double gold64; -extern float gold32; -extern _Float16 gold16; -// Output vectors -extern double res64_v, res64_s; -extern float res32_v, res32_s; -extern _Float16 res16_v, res16_s; - -int main() { - printf("FDOTP\n"); - - unsigned long cycles1, cycles2, instr2, instr1; - - for (uint64_t avl = 8; avl <= vsize; avl *= 8) { - printf("Calulating 64b dotp with vectors with length = %lu\n", avl); - instr1 = 
read_csr(minstret); - cycles1 = read_csr(mcycle); - res64_v = fdotp_v64b(v64a, v64b, avl); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("Vector runtime: %ld\n", cycles2 - cycles1); - } - - for (uint64_t avl = 8; avl <= vsize; avl *= 8) { - printf("Calulating 32b dotp with vectors with length = %lu\n", avl); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - res32_v = fdotp_v32b(v32a, v32b, avl); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("Vector runtime: %ld\n", cycles2 - cycles1); - } - - for (uint64_t avl = 8; avl <= vsize; avl *= 8) { - printf("Calulating 16b dotp with vectors with length = %lu\n", avl); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - res16_v = fdotp_v16b(v16a, v16b, avl); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("Vector runtime: %ld\n", cycles2 - cycles1); - } - - printf("SUCCESS.\n"); - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fft/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-fft/dataset1.h deleted file mode 100644 index e24c6c7c..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fft/dataset1.h +++ /dev/null @@ -1,4450 +0,0 @@ -#define LOG2_DATA_SIZE 10 -#define DATA_SIZE 1024 -float input_Xr[DATA_SIZE] = { - 0.171875f, 1.21875f, 0.765625f, 6.625f, 0.0f, 2.40625f, - 12.8125f, 0.265625f, 1.15625f, 0.125f, 6.625f, 0.0f, - 1.640625f, 4.3125f, 0.515625f, 3.09375f, 4.40625f, 3.8125f, - 7.875f, 1.890625f, 2.703125f, 14.0f, 4.75f, 3.203125f, - 3.0f, 6.0f, 3.6875f, 7.03125f, 0.328125f, 2.25f, - 1.6640625f, 14.375f, 7.5f, 0.4375f, 0.953125f, 3.40625f, - 0.8359375f, 1.5625f, 12.0f, 15.4375f, 1.21875f, 0.1640625f, - 0.90625f, 12.125f, 1.9765625f, 5.0f, 2.453125f, 1.796875f, - 2.03125f, 0.3671875f, 0.453125f, 3.15625f, 1.1875f, 0.0625f, - 1.1484375f, 0.265625f, 1.21875f, 0.34375f, 5.625f, 0.328125f, - 3.921875f, 9.25f, 2.21875f, 5.0f, 5.625f, 
0.125f, - 12.8125f, 0.3125f, 11.4375f, 1.6484375f, 7.4375f, 0.7734375f, - 3.15625f, 5.9375f, 0.5625f, 5.1875f, 3.75f, 0.421875f, - 15.875f, 4.75f, 6.0625f, 3.90625f, 7.8125f, 5.71875f, - 0.3125f, 5.9375f, 1.7578125f, 2.875f, 0.1328125f, 0.953125f, - 1.453125f, 3.34375f, 13.3125f, 0.015625f, 7.8125f, 4.9375f, - 0.59375f, 0.5f, 1.21875f, 12.9375f, 0.78125f, 9.9375f, - 0.203125f, 0.4375f, 3.625f, 6.875f, 1.6796875f, 0.609375f, - 0.046875f, 1.703125f, 1.765625f, 4.3125f, 1.1015625f, 1.03125f, - 4.4375f, 0.625f, 11.6875f, 1.5859375f, 3.421875f, 9.1875f, - 7.4375f, 0.5703125f, 6.9375f, 1.2734375f, 5.25f, 1.1484375f, - 2.46875f, 1.25f, 4.59375f, 0.40625f, 0.703125f, 1.609375f, - 1.8046875f, 4.40625f, 11.5f, 4.5f, 0.734375f, 3.3125f, - 3.734375f, 1.125f, 0.75f, 2.28125f, 5.40625f, 0.1875f, - 0.4375f, 7.0625f, 1.8359375f, 1.625f, 1.3984375f, 3.75f, - 3.40625f, 1.921875f, 2.5f, 3.578125f, 0.5390625f, 3.875f, - 7.8125f, 12.8125f, 2.1875f, 0.0f, 3.28125f, 3.0625f, - 0.6953125f, 1.1171875f, 0.8125f, 2.6875f, 1.171875f, 1.3359375f, - 7.75f, 0.8125f, 0.3203125f, 0.03125f, 0.8125f, 7.0f, - 1.125f, 0.796875f, 1.640625f, 3.03125f, 2.78125f, 0.484375f, - 1.828125f, 3.25f, 2.6875f, 7.625f, 0.890625f, 15.625f, - 1.609375f, 2.84375f, 0.6953125f, 1.203125f, 3.515625f, 0.703125f, - 1.9375f, 10.9375f, 0.1796875f, 3.0625f, 1.640625f, 1.625f, - 3.015625f, 1.828125f, 3.0625f, 0.984375f, 0.5234375f, 0.34375f, - 4.5f, 7.46875f, 3.9375f, 0.78125f, 0.59375f, 1.6015625f, - 12.0625f, 3.96875f, 2.53125f, 4.0625f, 0.4609375f, 12.375f, - 1.03125f, 1.953125f, 1.875f, 3.0f, 1.3515625f, 3.46875f, - 0.8125f, 1.953125f, 10.9375f, 0.109375f, 3.375f, 1.5f, - 1.8046875f, 1.8125f, 1.09375f, 3.84375f, 0.8671875f, 15.375f, - 1.03125f, 2.1875f, 3.875f, 4.375f, 3.515625f, 3.640625f, - 7.65625f, 2.875f, 1.4375f, 6.65625f, 1.6875f, 0.0625f, - 1.34375f, 0.9375f, 1.84375f, 2.375f, 0.5f, 2.625f, - 10.3125f, 0.53125f, 3.6875f, 0.25f, 3.65625f, 7.40625f, - 1.375f, 7.375f, 3.8125f, 4.0f, 0.953125f, 1.0546875f, - 
0.1171875f, 2.65625f, 3.40625f, 3.59375f, 0.90625f, 3.96875f, - 8.625f, 13.9375f, 0.9375f, 3.28125f, 3.5625f, 14.25f, - 2.859375f, 5.625f, 1.4375f, 5.46875f, 0.234375f, 6.5f, - 1.0859375f, 8.625f, 5.71875f, 6.625f, 3.71875f, 3.921875f, - 5.625f, 12.3125f, 0.09375f, 2.015625f, 0.28125f, 7.96875f, - 8.9375f, 2.59375f, 3.109375f, 12.9375f, 0.625f, 6.96875f, - 2.3125f, 0.578125f, 2.21875f, 6.25f, 0.328125f, 3.03125f, - 0.171875f, 3.90625f, 5.1875f, 1.90625f, 1.5f, 6.125f, - 14.0625f, 1.96875f, 15.0f, 2.375f, 2.421875f, 1.8515625f, - 0.25f, 4.375f, 1.6953125f, 1.46875f, 1.8515625f, 1.75f, - 0.5625f, 1.1484375f, 0.90625f, 12.3125f, 3.875f, 3.78125f, - 11.8125f, 1.46875f, 0.546875f, 1.2890625f, 1.90625f, 0.921875f, - 0.96875f, 15.9375f, 9.3125f, 6.59375f, 0.625f, 12.9375f, - 0.171875f, 0.84375f, 6.5f, 7.5625f, 1.125f, 1.5f, - 0.8203125f, 13.9375f, 5.8125f, 1.09375f, 3.0f, 1.3984375f, - 15.6875f, 0.375f, 1.03125f, 0.2890625f, 3.140625f, 0.953125f, - 3.125f, 1.6171875f, 0.5859375f, 1.5625f, 0.5546875f, 6.0625f, - 0.4765625f, 14.1875f, 3.90625f, 0.734375f, 2.609375f, 15.0f, - 1.171875f, 4.0f, 9.0625f, 1.984375f, 2.265625f, 1.453125f, - 1.75f, 0.625f, 5.125f, 8.375f, 2.875f, 14.625f, - 0.125f, 2.703125f, 0.515625f, 1.109375f, 3.90625f, 2.515625f, - 0.140625f, 1.5234375f, 0.0625f, 2.375f, 1.328125f, 0.6484375f, - 0.4296875f, 5.0f, 2.0f, 11.25f, 1.3125f, 1.6015625f, - 0.6328125f, 2.25f, 11.125f, 6.96875f, 2.9375f, 1.546875f, - 0.0078125f, 14.3125f, 3.40625f, 1.1328125f, 4.3125f, 3.0625f, - 1.875f, 9.125f, 1.640625f, 13.8125f, 2.578125f, 0.7265625f, - 10.75f, 0.625f, 3.625f, 1.921875f, 1.0546875f, 0.0546875f, - 6.125f, 8.5f, 14.3125f, 0.7265625f, 2.125f, 0.84375f, - 0.3125f, 1.921875f, 10.0f, 4.9375f, 5.625f, 0.0f, - 0.8125f, 1.4765625f, 7.0625f, 1.1953125f, 0.75f, 0.921875f, - 4.59375f, 1.09375f, 0.75f, 1.015625f, 15.4375f, 1.03125f, - 0.8125f, 5.875f, 14.9375f, 0.796875f, 0.5703125f, 15.625f, - 7.3125f, 0.78125f, 0.3984375f, 1.765625f, 4.125f, 0.546875f, - 4.0f, 2.3125f, 
4.125f, 9.5625f, 5.65625f, 2.125f, - 0.1875f, 3.765625f, 7.21875f, 2.53125f, 1.578125f, 9.375f, - 6.4375f, 14.6875f, 0.9296875f, 7.0625f, 2.9375f, 2.546875f, - 11.6875f, 1.6796875f, 12.9375f, 0.328125f, 3.53125f, 14.875f, - 13.5625f, 4.71875f, 6.625f, 12.75f, 1.796875f, 0.96875f, - 1.703125f, 3.046875f, 0.8125f, 0.984375f, 3.453125f, 2.0f, - 5.65625f, 7.90625f, 0.875f, 2.6875f, 0.125f, 0.921875f, - 6.65625f, 14.8125f, 2.78125f, 0.65625f, 0.3359375f, 0.1015625f, - 2.484375f, 12.6875f, 1.25f, 7.625f, 1.0f, 1.53125f, - 0.71875f, 0.4375f, 2.75f, 15.4375f, 1.328125f, 3.0f, - 6.65625f, 2.5625f, 0.8125f, 12.375f, 6.125f, 2.0625f, - 0.703125f, 11.6875f, 1.9921875f, 0.6171875f, 6.875f, 1.59375f, - 0.3515625f, 4.78125f, 1.28125f, 0.296875f, 2.65625f, 2.21875f, - 0.5234375f, 0.921875f, 1.0625f, 5.0625f, 1.0390625f, 1.453125f, - 11.125f, 3.5625f, 0.0546875f, 1.7109375f, 2.328125f, 2.390625f, - 9.8125f, 3.953125f, 2.296875f, 1.6015625f, 0.65625f, 3.25f, - 2.75f, 12.8125f, 1.625f, 2.109375f, 0.84375f, 3.625f, - 0.53125f, 0.15625f, 6.4375f, 3.4375f, 13.4375f, 1.171875f, - 1.890625f, 0.9921875f, 1.90625f, 1.15625f, 0.8046875f, 3.90625f, - 4.75f, 2.75f, 7.28125f, 14.0f, 3.71875f, 2.15625f, - 15.25f, 13.4375f, 0.8828125f, 7.90625f, 0.828125f, 12.8125f, - 2.390625f, 1.71875f, 2.5625f, 0.375f, 1.53125f, 0.390625f, - 1.5625f, 0.078125f, 1.359375f, 10.125f, 0.453125f, 1.296875f, - 11.375f, 1.0f, 1.5f, 1.625f, 4.65625f, 9.4375f, - 0.9609375f, 3.265625f, 1.796875f, 6.6875f, 0.1640625f, 0.5703125f, - 1.2734375f, 15.9375f, 1.1953125f, 3.703125f, 0.3125f, 0.21875f, - 1.15625f, 0.734375f, 5.46875f, 0.7890625f, 6.1875f, 0.96875f, - 0.0625f, 9.5f, 1.46875f, 2.953125f, 4.375f, 4.125f, - 3.171875f, 1.5f, 10.25f, 0.921875f, 0.484375f, 6.25f, - 2.0f, 12.0f, 1.9296875f, 3.46875f, 2.90625f, 2.09375f, - 2.625f, 0.90625f, 11.8125f, 9.5f, 2.6875f, 1.15625f, - 4.0f, 1.40625f, 9.4375f, 1.515625f, 3.390625f, 0.4453125f, - 9.9375f, 3.03125f, 1.984375f, 1.0625f, 0.546875f, 5.5f, - 0.09375f, 4.9375f, 
1.3984375f, 3.53125f, 3.34375f, 0.09375f, - 0.3359375f, 6.46875f, 2.375f, 2.0625f, 1.90625f, 8.625f, - 3.171875f, 1.4453125f, 3.0f, 11.0625f, 11.75f, 2.9375f, - 5.9375f, 3.296875f, 2.328125f, 3.5f, 3.15625f, 0.640625f, - 6.125f, 0.9453125f, 0.375f, 1.8515625f, 1.96875f, 0.78125f, - 1.6171875f, 0.609375f, 15.875f, 2.984375f, 0.3359375f, 8.1875f, - 0.71875f, 0.125f, 2.59375f, 6.15625f, 0.9921875f, 2.890625f, - 0.375f, 2.65625f, 1.3203125f, 3.640625f, 0.3828125f, 3.875f, - 0.6875f, 6.0625f, 0.90625f, 0.609375f, 5.09375f, 0.625f, - 0.1875f, 0.0234375f, 1.765625f, 0.671875f, 2.65625f, 1.3125f, - 1.5625f, 13.5f, 15.8125f, 12.375f, 3.0f, 0.4375f, - 4.125f, 1.734375f, 0.1875f, 0.921875f, 0.78125f, 0.96875f, - 1.5078125f, 0.21875f, 1.8359375f, 3.6875f, 1.0625f, 0.6015625f, - 13.8125f, 1.8125f, 5.75f, 3.28125f, 13.5f, 4.3125f, - 0.890625f, 1.03125f, 15.1875f, 1.5625f, 3.90625f, 6.125f, - 3.0625f, 2.109375f, 0.328125f, 0.8515625f, 1.953125f, 3.625f, - 6.3125f, 0.15625f, 1.0625f, 1.15625f, 5.5f, 2.75f, - 7.75f, 3.25f, 6.96875f, 2.203125f, 1.5f, 1.1953125f, - 0.265625f, 11.125f, 0.1484375f, 3.953125f, 15.125f, 1.5625f, - 2.390625f, 0.859375f, 0.109375f, 3.859375f, 3.890625f, 5.125f, - 0.4921875f, 6.75f, 3.703125f, 6.4375f, 13.5f, 1.6796875f, - 11.125f, 1.90625f, 0.40625f, 3.8125f, 3.90625f, 7.3125f, - 5.9375f, 5.375f, 0.8125f, 0.7421875f, 0.140625f, 2.3125f, - 7.6875f, 13.1875f, 10.8125f, 6.3125f, 1.203125f, 5.8125f, - 1.71875f, 1.625f, 1.40625f, 11.4375f, 0.1171875f, 1.703125f, - 3.046875f, 2.78125f, 2.546875f, 1.046875f, 5.0625f, 1.203125f, - 1.078125f, 0.890625f, 1.5390625f, 0.109375f, 3.125f, 10.5f, - 2.125f, 3.25f, 1.0f, 5.75f, 0.4765625f, 1.546875f, - 2.0f, 14.8125f, 2.53125f, 0.46875f, 1.8515625f, 0.484375f, - 0.421875f, 1.125f, 3.125f, 3.859375f, 2.171875f, 1.28125f, - 2.765625f, 3.515625f, 3.875f, 7.875f, 1.0546875f, 3.03125f, - 1.1015625f, 0.8359375f, 0.15625f, 3.03125f, 4.8125f, 7.0f, - 3.0f, 3.296875f, 2.609375f, 3.53125f, 1.9453125f, 3.390625f, - 0.84375f, 
1.546875f, 1.875f, 6.90625f, 3.40625f, 1.375f, - 0.09375f, 2.171875f, 0.859375f, 0.9765625f, 14.8125f, 1.0625f, - 14.0625f, 2.65625f, 1.328125f, 0.03125f, 1.640625f, 0.984375f, - 0.9375f, 0.9375f, 0.1875f, 0.828125f, 7.625f, 4.46875f, - 4.0625f, 1.78125f, 4.46875f, 1.8046875f, 1.8828125f, 0.734375f, - 14.9375f, 1.296875f, 3.84375f, 2.0625f, 3.96875f, 4.375f, - 1.4375f, 1.21875f, 0.96875f, 1.4375f, 6.40625f, 1.859375f, - 2.3125f, 13.6875f, 5.125f, 1.828125f, 6.1875f, 0.25f, - 1.8125f, 0.375f, 0.921875f, 5.625f, 12.4375f, 4.75f, - 0.890625f, 8.9375f, 6.125f, 15.6875f, 0.5390625f, 1.5f, - 14.375f, 1.4296875f, 4.875f, 3.09375f, 0.0859375f, 1.3828125f, - 12.875f, 0.25f, 0.8125f, 1.28125f, 10.1875f, 6.6875f, - 6.9375f, 2.296875f, 0.703125f, 1.984375f, 1.125f, 1.71875f, - 4.5625f, 0.609375f, 1.0f, 1.0859375f, 3.078125f, 0.375f, - 1.03125f, 0.140625f, 8.0f, 2.15625f, 4.8125f, 1.21875f, - 0.6875f, 1.25f, 2.15625f, 1.625f, 2.0f, 1.9296875f, - 5.4375f, 2.125f, 9.0f, 0.3125f, 3.921875f, 0.0546875f, - 6.5625f, 0.3671875f, 6.78125f, 3.375f, 2.90625f, 1.234375f, - 0.625f, 3.3125f, 1.8671875f, 0.6953125f, 6.125f, 7.0625f, - 6.9375f, 0.5f, 0.4140625f, 3.453125f, 0.5f, 4.9375f, - 7.8125f, 0.6796875f, 0.5625f, 2.125f, 3.28125f, 1.671875f, - 0.015625f, 0.5390625f, 2.65625f, 7.1875f, 6.5f, 1.828125f, - 2.4375f, 0.5f, 1.3984375f, 9.125f, 0.875f, 4.6875f, - 1.453125f, 2.453125f, 0.796875f, 5.34375f, 3.125f, 1.1328125f, - 9.9375f, 2.5625f, 5.40625f, 2.4375f, 2.34375f, 0.0078125f, - 1.890625f, 6.625f, 1.46875f, 1.6875f, 2.21875f, 0.828125f, - 2.90625f, 11.0f, 0.515625f, 3.296875f, 1.140625f, 4.25f, - 8.4375f, 1.5f, 0.2890625f, 0.546875f, -}; -float input_Xi[DATA_SIZE] = { - 0.078125f, 1.671875f, 9.3125f, 1.75f, 1.5625f, 6.96875f, - 0.59375f, 1.59375f, 2.5625f, 5.5f, 3.21875f, 1.09375f, - 4.9375f, 3.15625f, 0.109375f, 7.59375f, 3.359375f, 2.25f, - 1.2109375f, 2.03125f, 6.84375f, 1.40625f, 7.59375f, 15.375f, - 1.1875f, 1.03125f, 2.390625f, 0.5f, 6.4375f, 2.96875f, - 0.921875f, 2.609375f, 
3.421875f, 2.890625f, 3.375f, 13.6875f, - 2.625f, 0.9609375f, 2.0625f, 5.21875f, 10.9375f, 0.828125f, - 4.125f, 1.4375f, 7.84375f, 1.0703125f, 1.84375f, 3.125f, - 6.0f, 4.75f, 10.8125f, 2.75f, 0.8203125f, 10.6875f, - 0.453125f, 0.03125f, 0.109375f, 0.5234375f, 1.25f, 0.2265625f, - 0.6875f, 12.3125f, 0.671875f, 4.40625f, 1.734375f, 6.96875f, - 6.03125f, 0.765625f, 1.1953125f, 7.4375f, 2.3125f, 1.7265625f, - 6.75f, 3.5f, 2.15625f, 0.34375f, 7.28125f, 1.8828125f, - 3.6875f, 2.9375f, 3.28125f, 2.625f, 0.265625f, 7.75f, - 1.0f, 2.96875f, 9.1875f, 2.171875f, 5.03125f, 6.4375f, - 8.25f, 2.1875f, 0.0234375f, 1.8671875f, 1.25f, 7.78125f, - 3.59375f, 3.46875f, 1.640625f, 1.4453125f, 11.5625f, 3.875f, - 3.6875f, 2.0f, 0.0f, 1.9375f, 0.546875f, 11.0f, - 7.9375f, 1.5f, 0.765625f, 0.0f, 11.6875f, 2.875f, - 2.40625f, 6.125f, 0.5f, 1.4453125f, 1.296875f, 3.75f, - 7.75f, 0.25f, 1.046875f, 1.9375f, 14.6875f, 0.34375f, - 3.9375f, 2.515625f, 2.765625f, 5.875f, 15.5f, 7.15625f, - 12.8125f, 7.40625f, 2.21875f, 1.0859375f, 1.15625f, 0.953125f, - 4.25f, 1.453125f, 2.5625f, 4.8125f, 5.5625f, 1.875f, - 0.3125f, 1.0f, 2.0625f, 2.734375f, 1.84375f, 0.328125f, - 2.046875f, 2.40625f, 3.609375f, 4.6875f, 0.3125f, 7.46875f, - 1.265625f, 5.4375f, 1.0625f, 5.375f, 0.53125f, 0.59375f, - 3.75f, 0.875f, 0.59375f, 8.5f, 0.8125f, 2.171875f, - 0.5234375f, 0.59375f, 7.15625f, 12.1875f, 0.6015625f, 6.4375f, - 0.3125f, 0.71875f, 0.9609375f, 0.03125f, 2.953125f, 3.375f, - 7.0f, 13.3125f, 3.078125f, 1.1875f, 0.234375f, 2.90625f, - 0.15625f, 1.140625f, 2.046875f, 12.875f, 0.515625f, 2.609375f, - 1.2109375f, 1.84375f, 1.4765625f, 2.5f, 1.4453125f, 3.0625f, - 3.53125f, 4.9375f, 0.3515625f, 9.375f, 2.265625f, 4.8125f, - 1.7734375f, 5.78125f, 7.6875f, 2.796875f, 7.84375f, 8.5f, - 1.8203125f, 1.21875f, 1.4375f, 14.9375f, 5.15625f, 0.34375f, - 0.125f, 1.8203125f, 1.3125f, 0.5625f, 0.7265625f, 0.375f, - 2.15625f, 3.84375f, 1.765625f, 0.703125f, 1.90625f, 0.765625f, - 14.6875f, 1.8671875f, 2.3125f, 0.75f, 1.1484375f, 
1.34375f, - 1.96875f, 3.09375f, 1.84375f, 1.765625f, 1.21875f, 0.5390625f, - 11.375f, 1.03125f, 4.65625f, 1.546875f, 1.9296875f, 7.25f, - 0.203125f, 2.65625f, 0.859375f, 5.5625f, 2.546875f, 5.6875f, - 3.953125f, 0.703125f, 3.046875f, 0.625f, 7.6875f, 4.3125f, - 0.265625f, 1.390625f, 1.265625f, 1.9375f, 3.59375f, 3.78125f, - 6.6875f, 1.640625f, 3.84375f, 2.53125f, 6.0625f, 1.390625f, - 1.609375f, 3.3125f, 1.15625f, 0.65625f, 2.125f, 6.9375f, - 4.40625f, 3.75f, 13.0f, 0.609375f, 0.90625f, 6.53125f, - 3.0f, 14.125f, 2.75f, 2.890625f, 0.625f, 1.375f, - 14.75f, 1.671875f, 2.15625f, 2.125f, 0.171875f, 4.1875f, - 3.34375f, 1.40625f, 1.7890625f, 2.203125f, 2.5f, 11.75f, - 0.109375f, 3.4375f, 3.40625f, 4.5f, 1.6875f, 3.3125f, - 5.125f, 6.6875f, 4.375f, 1.3828125f, 1.5078125f, 1.234375f, - 10.25f, 2.25f, 3.53125f, 3.0625f, 0.625f, 0.796875f, - 3.875f, 0.1875f, 15.625f, 4.8125f, 1.625f, 4.5f, - 3.0625f, 0.15625f, 0.96875f, 1.78125f, 7.1875f, 13.0625f, - 12.25f, 2.578125f, 0.765625f, 3.546875f, 0.6875f, 2.21875f, - 1.296875f, 4.78125f, 1.453125f, 1.484375f, 0.4453125f, 2.15625f, - 1.75f, 0.875f, 0.0859375f, 3.390625f, 0.7578125f, 1.359375f, - 1.125f, 1.5625f, 0.9296875f, 7.5f, 0.125f, 0.5703125f, - 1.921875f, 6.125f, 3.8125f, 1.3515625f, 3.5f, 1.25f, - 1.703125f, 0.828125f, 2.84375f, 2.46875f, 1.875f, 9.125f, - 2.65625f, 12.3125f, 3.9375f, 2.8125f, 0.09375f, 3.59375f, - 14.6875f, 1.5625f, 0.6015625f, 0.65625f, 0.25f, 14.125f, - 10.375f, 2.84375f, 0.65625f, 13.375f, 3.5f, 7.0f, - 0.2421875f, 0.671875f, 8.75f, 1.71875f, 2.078125f, 1.84375f, - 3.5f, 0.9375f, 1.6328125f, 12.6875f, 5.21875f, 11.5625f, - 3.96875f, 1.6484375f, 2.5f, 7.0625f, 0.6015625f, 0.90625f, - 9.875f, 3.90625f, 1.7109375f, 13.5625f, 0.1875f, 1.6640625f, - 8.125f, 3.5f, 0.359375f, 6.875f, 7.8125f, 3.5f, - 9.0625f, 1.0625f, 3.6875f, 13.0f, 1.390625f, 1.8125f, - 1.265625f, 0.390625f, 1.515625f, 3.265625f, 0.40625f, 0.8125f, - 7.75f, 1.421875f, 6.53125f, 4.75f, 0.2890625f, 9.9375f, - 2.5625f, 1.75f, 0.84375f, 
5.4375f, 3.0625f, 1.7265625f, - 1.984375f, 9.9375f, 2.703125f, 1.921875f, 5.90625f, 12.75f, - 0.3828125f, 3.125f, 7.75f, 3.375f, 0.1875f, 5.21875f, - 3.390625f, 7.625f, 3.5625f, 0.46875f, 1.625f, 2.3125f, - 7.0f, 2.25f, 1.625f, 0.3671875f, 1.375f, 1.015625f, - 0.4765625f, 6.625f, 1.1328125f, 3.0f, 7.375f, 0.640625f, - 1.25f, 2.875f, 2.5625f, 0.5f, 2.765625f, 4.25f, - 13.625f, 6.875f, 2.78125f, 6.96875f, 1.3515625f, 0.375f, - 1.1875f, 4.875f, 1.34375f, 3.125f, 3.46875f, 4.40625f, - 1.921875f, 5.5625f, 5.8125f, 1.0f, 2.546875f, 1.671875f, - 0.5390625f, 1.765625f, 12.3125f, 1.5859375f, 0.109375f, 1.1328125f, - 3.625f, 0.84375f, 2.8125f, 0.46875f, 1.203125f, 5.53125f, - 1.5f, 0.578125f, 0.015625f, 0.8125f, 11.75f, 10.875f, - 0.0f, 2.15625f, 2.515625f, 10.8125f, 6.5f, 1.65625f, - 3.4375f, 5.25f, 15.0625f, 7.0f, 7.625f, 0.265625f, - 1.875f, 11.3125f, 5.625f, 6.28125f, 1.1875f, 2.03125f, - 0.3125f, 9.3125f, 0.375f, 0.765625f, 2.3125f, 5.8125f, - 6.875f, 1.71875f, 6.1875f, 6.09375f, 1.3125f, 8.1875f, - 0.03125f, 9.25f, 0.3046875f, 3.03125f, 6.25f, 1.53125f, - 3.15625f, 13.6875f, 6.125f, 0.953125f, 1.734375f, 1.3125f, - 0.8671875f, 5.9375f, 3.90625f, 1.875f, 2.0f, 6.9375f, - 2.328125f, 12.9375f, 9.1875f, 1.8203125f, 3.84375f, 0.125f, - 0.234375f, 0.3125f, 14.5625f, 3.609375f, 14.0f, 15.1875f, - 15.5f, 6.125f, 3.828125f, 9.3125f, 1.109375f, 3.34375f, - 9.875f, 5.34375f, 6.5625f, 1.8984375f, 7.09375f, 0.6171875f, - 6.15625f, 5.5625f, 3.0f, 3.578125f, 1.46875f, 9.6875f, - 3.578125f, 13.9375f, 2.9375f, 15.75f, 2.265625f, 1.484375f, - 3.40625f, 0.8125f, 1.484375f, 10.8125f, 0.6796875f, 2.3125f, - 0.9375f, 3.4375f, 6.625f, 3.6875f, 0.6484375f, 0.0625f, - 0.609375f, 0.28125f, 0.125f, 14.5f, 1.6875f, 1.4765625f, - 0.640625f, 11.125f, 1.40625f, 15.3125f, 3.46875f, 0.2734375f, - 3.5625f, 0.25f, 2.4375f, 2.3125f, 9.25f, 3.0625f, - 4.875f, 15.75f, 1.8125f, 3.90625f, 2.375f, 1.71875f, - 1.21875f, 4.90625f, 2.625f, 4.5625f, 7.21875f, 0.015625f, - 1.7265625f, 0.375f, 2.421875f, 0.328125f, 
0.875f, 7.84375f, - 3.921875f, 1.5f, 1.578125f, 3.03125f, 3.15625f, 0.1875f, - 5.9375f, 12.375f, 9.3125f, 1.7109375f, 6.8125f, 0.21875f, - 0.703125f, 1.03125f, 5.21875f, 0.453125f, 15.25f, 10.1875f, - 1.53125f, 5.3125f, 1.3125f, 1.140625f, 3.734375f, 5.25f, - 10.6875f, 7.78125f, 1.46875f, 1.359375f, 0.6015625f, 3.109375f, - 1.515625f, 1.5859375f, 3.40625f, 5.09375f, 7.84375f, 8.625f, - 3.8125f, 3.25f, 0.5078125f, 6.15625f, 0.703125f, 1.3125f, - 2.625f, 3.875f, 1.53125f, 3.8125f, 9.875f, 0.3515625f, - 0.03125f, 0.796875f, 2.71875f, 1.875f, 6.09375f, 1.140625f, - 3.8125f, 0.3125f, 1.828125f, 7.5625f, 11.8125f, 9.25f, - 0.5546875f, 9.625f, 0.0625f, 0.8125f, 14.9375f, 11.8125f, - 3.375f, 0.09375f, 2.8125f, 1.125f, 0.7109375f, 3.625f, - 1.8125f, 3.234375f, 2.1875f, 0.7890625f, 2.65625f, 10.125f, - 2.84375f, 2.3125f, 12.5f, 0.515625f, 3.296875f, 6.375f, - 5.40625f, 1.3125f, 0.8203125f, 5.6875f, 2.015625f, 5.75f, - 2.65625f, 0.8984375f, 12.75f, 3.25f, 6.5625f, 0.1484375f, - 0.25f, 0.03125f, 3.9375f, 2.890625f, 3.5625f, 0.0f, - 3.484375f, 1.09375f, 6.5f, 12.25f, 1.6875f, 3.25f, - 0.90625f, 2.90625f, 1.4140625f, 2.21875f, 1.875f, 8.375f, - 5.25f, 1.5f, 6.59375f, 0.9296875f, 4.4375f, 4.46875f, - 0.71875f, 6.875f, 0.1875f, 0.984375f, 15.4375f, 0.453125f, - 1.25f, 9.375f, 1.8125f, 4.40625f, 2.40625f, 7.625f, - 3.1875f, 0.078125f, 1.2265625f, 0.234375f, 6.9375f, 11.5625f, - 0.5f, 1.1875f, 0.640625f, 7.5f, 15.625f, 2.46875f, - 0.6875f, 0.875f, 2.875f, 0.0703125f, 5.09375f, 4.0f, - 3.1875f, 8.0f, 1.5f, 2.25f, 1.15625f, 1.75f, - 7.3125f, 5.03125f, 10.9375f, 6.84375f, 12.375f, 1.0f, - 7.9375f, 3.359375f, 3.671875f, 0.09375f, 1.265625f, 1.75f, - 6.1875f, 0.78125f, 1.5078125f, 1.4765625f, 1.078125f, 0.09375f, - 3.921875f, 0.7890625f, 0.515625f, 15.9375f, 1.8125f, 0.5625f, - 7.0625f, 1.5625f, 5.25f, 0.640625f, 1.328125f, 0.8125f, - 1.65625f, 1.9140625f, 0.0625f, 8.0f, 1.09375f, 14.25f, - 1.9140625f, 2.625f, 0.46875f, 4.53125f, 1.8515625f, 0.046875f, - 7.6875f, 5.4375f, 2.9375f, 
1.59375f, 7.15625f, 0.734375f, - 3.40625f, 4.21875f, 1.765625f, 1.03125f, 1.5546875f, 1.0625f, - 0.390625f, 0.6484375f, 2.96875f, 1.640625f, 0.046875f, 0.0625f, - 6.6875f, 6.875f, 15.3125f, 11.1875f, 6.65625f, 6.0625f, - 4.78125f, 2.65625f, 4.625f, 1.8515625f, 2.203125f, 1.5625f, - 0.9765625f, 4.09375f, 8.8125f, 5.875f, 7.40625f, 0.5f, - 11.0625f, 2.890625f, 5.9375f, 1.859375f, 1.65625f, 6.3125f, - 0.78125f, 1.015625f, 3.75f, 1.7578125f, 0.953125f, 2.5625f, - 1.96875f, 0.875f, 3.21875f, 12.6875f, 15.5f, 0.3515625f, - 3.625f, 0.1875f, 0.59375f, 0.40625f, 1.1640625f, 0.0859375f, - 1.515625f, 0.625f, 1.046875f, 1.15625f, 0.890625f, 0.5234375f, - 3.75f, 3.3125f, 0.9296875f, 10.8125f, 5.53125f, 6.0625f, - 1.0859375f, 7.1875f, 4.9375f, 4.71875f, 13.125f, 0.15625f, - 1.828125f, 1.296875f, 1.84375f, 5.375f, 1.46875f, 1.7578125f, - 1.5703125f, 0.0625f, 0.4375f, 1.3125f, 1.1484375f, 0.1015625f, - 15.1875f, 0.8671875f, 1.6171875f, 5.96875f, 3.421875f, 3.1875f, - 7.0f, 0.9375f, 12.0f, 14.9375f, 4.75f, 3.40625f, - 1.96875f, 1.390625f, 7.40625f, 3.5625f, 1.9453125f, 1.28125f, - 0.71875f, 11.125f, 2.09375f, 1.703125f, 7.875f, 3.53125f, - 0.5625f, 3.859375f, 7.75f, 2.34375f, 0.25f, 1.984375f, - 2.03125f, 0.265625f, 7.25f, 7.5625f, 4.21875f, 5.40625f, - 3.3125f, 5.4375f, 0.09375f, 6.875f, 1.703125f, 4.375f, - 1.1328125f, 3.828125f, 7.21875f, 3.5625f, 0.0234375f, 1.71875f, - 7.28125f, 1.609375f, 1.1875f, 0.703125f, 13.3125f, 13.5625f, - 2.46875f, 4.1875f, 3.5f, 3.09375f, 2.75f, 6.125f, - 5.375f, 1.75f, 1.140625f, 8.6875f, 3.984375f, 1.640625f, - 0.75f, 2.125f, 3.28125f, 1.078125f, 1.1953125f, 0.8515625f, - 4.625f, 3.140625f, 1.546875f, 0.390625f, 5.9375f, 0.9375f, - 0.875f, 1.65625f, 7.1875f, 4.40625f, 3.6875f, 4.53125f, - 1.953125f, 5.59375f, 15.75f, 8.8125f, 5.4375f, 2.40625f, - 14.0625f, 2.21875f, 1.0390625f, 3.5f, -}; -float input_Wr[DATA_SIZE - 1] = { - 1.0f, - 0.999981164932251f, - 0.9999247193336487f, - 0.9998306035995483f, - 0.99969881772995f, - 0.9995294213294983f, - 
0.9993223547935486f, - 0.9990777373313904f, - 0.9987954497337341f, - 0.9984755516052246f, - 0.9981181025505066f, - 0.9977230429649353f, - 0.9972904324531555f, - 0.9968202710151672f, - 0.9963126182556152f, - 0.9957674145698547f, - 0.9951847195625305f, - 0.9945645928382874f, - 0.9939069747924805f, - 0.9932119250297546f, - 0.9924795627593994f, - 0.9917097687721252f, - 0.9909026622772217f, - 0.990058183670044f, - 0.9891765117645264f, - 0.9882575869560242f, - 0.9873014092445374f, - 0.9863080978393555f, - 0.9852776527404785f, - 0.9842100739479065f, - 0.983105480670929f, - 0.9819638729095459f, - 0.9807852506637573f, - 0.9795697927474976f, - 0.978317379951477f, - 0.9770281314849854f, - 0.9757021069526672f, - 0.9743393659591675f, - 0.9729399681091309f, - 0.9715039134025574f, - 0.9700312614440918f, - 0.9685220718383789f, - 0.9669764637947083f, - 0.9653944373130798f, - 0.9637760519981384f, - 0.9621214270591736f, - 0.9604305028915405f, - 0.9587034583091736f, - 0.9569403529167175f, - 0.9551411867141724f, - 0.9533060193061829f, - 0.9514350295066833f, - 0.949528157711029f, - 0.9475855827331543f, - 0.9456073045730591f, - 0.943593442440033f, - 0.9415440559387207f, - 0.9394592046737671f, - 0.9373390078544617f, - 0.9351835250854492f, - 0.9329928159713745f, - 0.9307669401168823f, - 0.928506076335907f, - 0.9262102246284485f, - 0.9238795042037964f, - 0.9215140342712402f, - 0.9191138744354248f, - 0.9166790843009949f, - 0.91420978307724f, - 0.9117060303688049f, - 0.909168004989624f, - 0.9065957069396973f, - 0.903989315032959f, - 0.9013488292694092f, - 0.898674488067627f, - 0.8959662318229675f, - 0.89322429895401f, - 0.8904487490653992f, - 0.8876396417617798f, - 0.8847970962524414f, - 0.8819212913513184f, - 0.8790122270584106f, - 0.8760700821876526f, - 0.8730949759483337f, - 0.8700869679450989f, - 0.8670462369918823f, - 0.8639728426933289f, - 0.8608669638633728f, - 0.8577286005020142f, - 0.854557991027832f, - 0.8513551950454712f, - 0.8481203317642212f, - 0.8448535799980164f, - 
0.8415549993515015f, - 0.8382247090339661f, - 0.8348628878593445f, - 0.8314695954322815f, - 0.8280450701713562f, - 0.8245893120765686f, - 0.821102499961853f, - 0.8175848126411438f, - 0.8140363097190857f, - 0.810457170009613f, - 0.8068475723266602f, - 0.803207516670227f, - 0.7995372414588928f, - 0.7958369255065918f, - 0.792106568813324f, - 0.7883464097976685f, - 0.7845565676689148f, - 0.7807372212409973f, - 0.7768884897232056f, - 0.7730104327201843f, - 0.7691033482551575f, - 0.765167236328125f, - 0.7612023949623108f, - 0.7572088241577148f, - 0.753186821937561f, - 0.7491363883018494f, - 0.7450577616691589f, - 0.7409511208534241f, - 0.7368165850639343f, - 0.7326542735099792f, - 0.7284643650054932f, - 0.7242470979690552f, - 0.7200025320053101f, - 0.7157308459281921f, - 0.7114322185516357f, - 0.7071067690849304f, - 0.7027547359466553f, - 0.6983762383460999f, - 0.6939714550971985f, - 0.6895405650138855f, - 0.6850836873054504f, - 0.6806010007858276f, - 0.6760926842689514f, - 0.6715589761734009f, - 0.6669999361038208f, - 0.6624158024787903f, - 0.6578066945075989f, - 0.6531728506088257f, - 0.6485143899917603f, - 0.6438315510749817f, - 0.6391244530677795f, - 0.6343932747840881f, - 0.6296382546424866f, - 0.6248595118522644f, - 0.620057225227356f, - 0.6152315735816956f, - 0.6103827953338623f, - 0.6055110692977905f, - 0.600616455078125f, - 0.5956993103027344f, - 0.5907596945762634f, - 0.5857978463172913f, - 0.5808139443397522f, - 0.5758081674575806f, - 0.5707807540893555f, - 0.5657318234443665f, - 0.5606615543365479f, - 0.5555702447891235f, - 0.5504579544067383f, - 0.545324981212616f, - 0.5401714444160461f, - 0.5349976420402527f, - 0.5298036336898804f, - 0.5245896577835083f, - 0.5193560123443604f, - 0.5141027569770813f, - 0.5088301301002502f, - 0.5035383701324463f, - 0.49822765588760376f, - 0.49289819598197937f, - 0.48755016922950745f, - 0.4821837842464447f, - 0.47679921984672546f, - 0.4713967442512512f, - 0.4659765064716339f, - 0.46053871512413025f, - 0.45508357882499695f, - 
0.4496113359928131f, - 0.4441221356391907f, - 0.43861624598503113f, - 0.4330938160419464f, - 0.4275550842285156f, - 0.4220002591609955f, - 0.4164295494556427f, - 0.410843163728714f, - 0.40524131059646606f, - 0.39962419867515564f, - 0.39399203658103943f, - 0.38834503293037415f, - 0.3826834261417389f, - 0.3770074248313904f, - 0.37131720781326294f, - 0.3656129837036133f, - 0.3598950505256653f, - 0.3541635274887085f, - 0.3484186828136444f, - 0.34266072511672974f, - 0.3368898630142212f, - 0.3311063051223755f, - 0.32531028985977173f, - 0.3195020258426666f, - 0.3136817514896393f, - 0.307849645614624f, - 0.30200594663619995f, - 0.29615089297294617f, - 0.290284663438797f, - 0.28440752625465393f, - 0.2785196900367737f, - 0.27262136340141296f, - 0.2667127549648285f, - 0.26079410314559937f, - 0.2548656463623047f, - 0.24892760813236237f, - 0.24298018217086792f, - 0.23702360689640045f, - 0.23105810582637787f, - 0.22508391737937927f, - 0.21910123527050018f, - 0.2131103128194809f, - 0.20711137354373932f, - 0.20110464096069336f, - 0.19509032368659973f, - 0.18906866014003754f, - 0.18303988873958588f, - 0.17700421810150146f, - 0.1709618866443634f, - 0.1649131178855896f, - 0.15885815024375916f, - 0.15279719233512878f, - 0.1467304676771164f, - 0.14065824449062347f, - 0.13458070158958435f, - 0.1284981071949005f, - 0.12241067737340927f, - 0.11631862819194794f, - 0.11022220551967621f, - 0.104121632874012f, - 0.0980171412229538f, - 0.09190895408391953f, - 0.08579730987548828f, - 0.07968243956565857f, - 0.0735645666718483f, - 0.06744392216205597f, - 0.06132073700428009f, - 0.055195245891809464f, - 0.049067676067352295f, - 0.04293825849890709f, - 0.03680722415447235f, - 0.030674804002046585f, - 0.024541229009628296f, - 0.018406730145215988f, - 0.012271538376808167f, - 0.006135884672403336f, - 6.123234262925839e-17f, - -0.006135884672403336f, - -0.012271538376808167f, - -0.018406730145215988f, - -0.024541229009628296f, - -0.030674804002046585f, - -0.03680722415447235f, - 
-0.04293825849890709f, - -0.049067676067352295f, - -0.055195245891809464f, - -0.06132073700428009f, - -0.06744392216205597f, - -0.0735645666718483f, - -0.07968243956565857f, - -0.08579730987548828f, - -0.09190895408391953f, - -0.0980171412229538f, - -0.104121632874012f, - -0.11022220551967621f, - -0.11631862819194794f, - -0.12241067737340927f, - -0.1284981071949005f, - -0.13458070158958435f, - -0.14065824449062347f, - -0.1467304676771164f, - -0.15279719233512878f, - -0.15885815024375916f, - -0.1649131178855896f, - -0.1709618866443634f, - -0.17700421810150146f, - -0.18303988873958588f, - -0.18906866014003754f, - -0.19509032368659973f, - -0.20110464096069336f, - -0.20711137354373932f, - -0.2131103128194809f, - -0.21910123527050018f, - -0.22508391737937927f, - -0.23105810582637787f, - -0.23702360689640045f, - -0.24298018217086792f, - -0.24892760813236237f, - -0.2548656463623047f, - -0.26079410314559937f, - -0.2667127549648285f, - -0.27262136340141296f, - -0.2785196900367737f, - -0.28440752625465393f, - -0.290284663438797f, - -0.29615089297294617f, - -0.30200594663619995f, - -0.307849645614624f, - -0.3136817514896393f, - -0.3195020258426666f, - -0.32531028985977173f, - -0.3311063051223755f, - -0.3368898630142212f, - -0.34266072511672974f, - -0.3484186828136444f, - -0.3541635274887085f, - -0.3598950505256653f, - -0.3656129837036133f, - -0.37131720781326294f, - -0.3770074248313904f, - -0.3826834261417389f, - -0.38834503293037415f, - -0.39399203658103943f, - -0.39962419867515564f, - -0.40524131059646606f, - -0.410843163728714f, - -0.4164295494556427f, - -0.4220002591609955f, - -0.4275550842285156f, - -0.4330938160419464f, - -0.43861624598503113f, - -0.4441221356391907f, - -0.4496113359928131f, - -0.45508357882499695f, - -0.46053871512413025f, - -0.4659765064716339f, - -0.4713967442512512f, - -0.47679921984672546f, - -0.4821837842464447f, - -0.48755016922950745f, - -0.49289819598197937f, - -0.49822765588760376f, - -0.5035383701324463f, - -0.5088301301002502f, - 
-0.5141027569770813f, - -0.5193560123443604f, - -0.5245896577835083f, - -0.5298036336898804f, - -0.5349976420402527f, - -0.5401714444160461f, - -0.545324981212616f, - -0.5504579544067383f, - -0.5555702447891235f, - -0.5606615543365479f, - -0.5657318234443665f, - -0.5707807540893555f, - -0.5758081674575806f, - -0.5808139443397522f, - -0.5857978463172913f, - -0.5907596945762634f, - -0.5956993103027344f, - -0.600616455078125f, - -0.6055110692977905f, - -0.6103827953338623f, - -0.6152315735816956f, - -0.620057225227356f, - -0.6248595118522644f, - -0.6296382546424866f, - -0.6343932747840881f, - -0.6391244530677795f, - -0.6438315510749817f, - -0.6485143899917603f, - -0.6531728506088257f, - -0.6578066945075989f, - -0.6624158024787903f, - -0.6669999361038208f, - -0.6715589761734009f, - -0.6760926842689514f, - -0.6806010007858276f, - -0.6850836873054504f, - -0.6895405650138855f, - -0.6939714550971985f, - -0.6983762383460999f, - -0.7027547359466553f, - -0.7071067690849304f, - -0.7114322185516357f, - -0.7157308459281921f, - -0.7200025320053101f, - -0.7242470979690552f, - -0.7284643650054932f, - -0.7326542735099792f, - -0.7368165850639343f, - -0.7409511208534241f, - -0.7450577616691589f, - -0.7491363883018494f, - -0.753186821937561f, - -0.7572088241577148f, - -0.7612023949623108f, - -0.765167236328125f, - -0.7691033482551575f, - -0.7730104327201843f, - -0.7768884897232056f, - -0.7807372212409973f, - -0.7845565676689148f, - -0.7883464097976685f, - -0.792106568813324f, - -0.7958369255065918f, - -0.7995372414588928f, - -0.803207516670227f, - -0.8068475723266602f, - -0.810457170009613f, - -0.8140363097190857f, - -0.8175848126411438f, - -0.821102499961853f, - -0.8245893120765686f, - -0.8280450701713562f, - -0.8314695954322815f, - -0.8348628878593445f, - -0.8382247090339661f, - -0.8415549993515015f, - -0.8448535799980164f, - -0.8481203317642212f, - -0.8513551950454712f, - -0.854557991027832f, - -0.8577286005020142f, - -0.8608669638633728f, - -0.8639728426933289f, - 
-0.8670462369918823f, - -0.8700869679450989f, - -0.8730949759483337f, - -0.8760700821876526f, - -0.8790122270584106f, - -0.8819212913513184f, - -0.8847970962524414f, - -0.8876396417617798f, - -0.8904487490653992f, - -0.89322429895401f, - -0.8959662318229675f, - -0.898674488067627f, - -0.9013488292694092f, - -0.903989315032959f, - -0.9065957069396973f, - -0.909168004989624f, - -0.9117060303688049f, - -0.91420978307724f, - -0.9166790843009949f, - -0.9191138744354248f, - -0.9215140342712402f, - -0.9238795042037964f, - -0.9262102246284485f, - -0.928506076335907f, - -0.9307669401168823f, - -0.9329928159713745f, - -0.9351835250854492f, - -0.9373390078544617f, - -0.9394592046737671f, - -0.9415440559387207f, - -0.943593442440033f, - -0.9456073045730591f, - -0.9475855827331543f, - -0.949528157711029f, - -0.9514350295066833f, - -0.9533060193061829f, - -0.9551411867141724f, - -0.9569403529167175f, - -0.9587034583091736f, - -0.9604305028915405f, - -0.9621214270591736f, - -0.9637760519981384f, - -0.9653944373130798f, - -0.9669764637947083f, - -0.9685220718383789f, - -0.9700312614440918f, - -0.9715039134025574f, - -0.9729399681091309f, - -0.9743393659591675f, - -0.9757021069526672f, - -0.9770281314849854f, - -0.978317379951477f, - -0.9795697927474976f, - -0.9807852506637573f, - -0.9819638729095459f, - -0.983105480670929f, - -0.9842100739479065f, - -0.9852776527404785f, - -0.9863080978393555f, - -0.9873014092445374f, - -0.9882575869560242f, - -0.9891765117645264f, - -0.990058183670044f, - -0.9909026622772217f, - -0.9917097687721252f, - -0.9924795627593994f, - -0.9932119250297546f, - -0.9939069747924805f, - -0.9945645928382874f, - -0.9951847195625305f, - -0.9957674145698547f, - -0.9963126182556152f, - -0.9968202710151672f, - -0.9972904324531555f, - -0.9977230429649353f, - -0.9981181025505066f, - -0.9984755516052246f, - -0.9987954497337341f, - -0.9990777373313904f, - -0.9993223547935486f, - -0.9995294213294983f, - -0.99969881772995f, - -0.9998306035995483f, - -0.9999247193336487f, 
- -0.999981164932251f, - 1.0f, - 0.9999247193336487f, - 0.99969881772995f, - 0.9993223547935486f, - 0.9987954497337341f, - 0.9981181025505066f, - 0.9972904324531555f, - 0.9963126182556152f, - 0.9951847195625305f, - 0.9939069747924805f, - 0.9924795627593994f, - 0.9909026622772217f, - 0.9891765117645264f, - 0.9873014092445374f, - 0.9852776527404785f, - 0.983105480670929f, - 0.9807852506637573f, - 0.978317379951477f, - 0.9757021069526672f, - 0.9729399681091309f, - 0.9700312614440918f, - 0.9669764637947083f, - 0.9637760519981384f, - 0.9604305028915405f, - 0.9569403529167175f, - 0.9533060193061829f, - 0.949528157711029f, - 0.9456073045730591f, - 0.9415440559387207f, - 0.9373390078544617f, - 0.9329928159713745f, - 0.928506076335907f, - 0.9238795042037964f, - 0.9191138744354248f, - 0.91420978307724f, - 0.909168004989624f, - 0.903989315032959f, - 0.898674488067627f, - 0.89322429895401f, - 0.8876396417617798f, - 0.8819212913513184f, - 0.8760700821876526f, - 0.8700869679450989f, - 0.8639728426933289f, - 0.8577286005020142f, - 0.8513551950454712f, - 0.8448535799980164f, - 0.8382247090339661f, - 0.8314695954322815f, - 0.8245893120765686f, - 0.8175848126411438f, - 0.810457170009613f, - 0.803207516670227f, - 0.7958369255065918f, - 0.7883464097976685f, - 0.7807372212409973f, - 0.7730104327201843f, - 0.765167236328125f, - 0.7572088241577148f, - 0.7491363883018494f, - 0.7409511208534241f, - 0.7326542735099792f, - 0.7242470979690552f, - 0.7157308459281921f, - 0.7071067690849304f, - 0.6983762383460999f, - 0.6895405650138855f, - 0.6806010007858276f, - 0.6715589761734009f, - 0.6624158024787903f, - 0.6531728506088257f, - 0.6438315510749817f, - 0.6343932747840881f, - 0.6248595118522644f, - 0.6152315735816956f, - 0.6055110692977905f, - 0.5956993103027344f, - 0.5857978463172913f, - 0.5758081674575806f, - 0.5657318234443665f, - 0.5555702447891235f, - 0.545324981212616f, - 0.5349976420402527f, - 0.5245896577835083f, - 0.5141027569770813f, - 0.5035383701324463f, - 0.49289819598197937f, - 
0.4821837842464447f, - 0.4713967442512512f, - 0.46053871512413025f, - 0.4496113359928131f, - 0.43861624598503113f, - 0.4275550842285156f, - 0.4164295494556427f, - 0.40524131059646606f, - 0.39399203658103943f, - 0.3826834261417389f, - 0.37131720781326294f, - 0.3598950505256653f, - 0.3484186828136444f, - 0.3368898630142212f, - 0.32531028985977173f, - 0.3136817514896393f, - 0.30200594663619995f, - 0.290284663438797f, - 0.2785196900367737f, - 0.2667127549648285f, - 0.2548656463623047f, - 0.24298018217086792f, - 0.23105810582637787f, - 0.21910123527050018f, - 0.20711137354373932f, - 0.19509032368659973f, - 0.18303988873958588f, - 0.1709618866443634f, - 0.15885815024375916f, - 0.1467304676771164f, - 0.13458070158958435f, - 0.12241067737340927f, - 0.11022220551967621f, - 0.0980171412229538f, - 0.08579730987548828f, - 0.0735645666718483f, - 0.06132073700428009f, - 0.049067676067352295f, - 0.03680722415447235f, - 0.024541229009628296f, - 0.012271538376808167f, - 6.123234262925839e-17f, - -0.012271538376808167f, - -0.024541229009628296f, - -0.03680722415447235f, - -0.049067676067352295f, - -0.06132073700428009f, - -0.0735645666718483f, - -0.08579730987548828f, - -0.0980171412229538f, - -0.11022220551967621f, - -0.12241067737340927f, - -0.13458070158958435f, - -0.1467304676771164f, - -0.15885815024375916f, - -0.1709618866443634f, - -0.18303988873958588f, - -0.19509032368659973f, - -0.20711137354373932f, - -0.21910123527050018f, - -0.23105810582637787f, - -0.24298018217086792f, - -0.2548656463623047f, - -0.2667127549648285f, - -0.2785196900367737f, - -0.290284663438797f, - -0.30200594663619995f, - -0.3136817514896393f, - -0.32531028985977173f, - -0.3368898630142212f, - -0.3484186828136444f, - -0.3598950505256653f, - -0.37131720781326294f, - -0.3826834261417389f, - -0.39399203658103943f, - -0.40524131059646606f, - -0.4164295494556427f, - -0.4275550842285156f, - -0.43861624598503113f, - -0.4496113359928131f, - -0.46053871512413025f, - -0.4713967442512512f, - 
-0.4821837842464447f, - -0.49289819598197937f, - -0.5035383701324463f, - -0.5141027569770813f, - -0.5245896577835083f, - -0.5349976420402527f, - -0.545324981212616f, - -0.5555702447891235f, - -0.5657318234443665f, - -0.5758081674575806f, - -0.5857978463172913f, - -0.5956993103027344f, - -0.6055110692977905f, - -0.6152315735816956f, - -0.6248595118522644f, - -0.6343932747840881f, - -0.6438315510749817f, - -0.6531728506088257f, - -0.6624158024787903f, - -0.6715589761734009f, - -0.6806010007858276f, - -0.6895405650138855f, - -0.6983762383460999f, - -0.7071067690849304f, - -0.7157308459281921f, - -0.7242470979690552f, - -0.7326542735099792f, - -0.7409511208534241f, - -0.7491363883018494f, - -0.7572088241577148f, - -0.765167236328125f, - -0.7730104327201843f, - -0.7807372212409973f, - -0.7883464097976685f, - -0.7958369255065918f, - -0.803207516670227f, - -0.810457170009613f, - -0.8175848126411438f, - -0.8245893120765686f, - -0.8314695954322815f, - -0.8382247090339661f, - -0.8448535799980164f, - -0.8513551950454712f, - -0.8577286005020142f, - -0.8639728426933289f, - -0.8700869679450989f, - -0.8760700821876526f, - -0.8819212913513184f, - -0.8876396417617798f, - -0.89322429895401f, - -0.898674488067627f, - -0.903989315032959f, - -0.909168004989624f, - -0.91420978307724f, - -0.9191138744354248f, - -0.9238795042037964f, - -0.928506076335907f, - -0.9329928159713745f, - -0.9373390078544617f, - -0.9415440559387207f, - -0.9456073045730591f, - -0.949528157711029f, - -0.9533060193061829f, - -0.9569403529167175f, - -0.9604305028915405f, - -0.9637760519981384f, - -0.9669764637947083f, - -0.9700312614440918f, - -0.9729399681091309f, - -0.9757021069526672f, - -0.978317379951477f, - -0.9807852506637573f, - -0.983105480670929f, - -0.9852776527404785f, - -0.9873014092445374f, - -0.9891765117645264f, - -0.9909026622772217f, - -0.9924795627593994f, - -0.9939069747924805f, - -0.9951847195625305f, - -0.9963126182556152f, - -0.9972904324531555f, - -0.9981181025505066f, - -0.9987954497337341f, 
- -0.9993223547935486f, - -0.99969881772995f, - -0.9999247193336487f, - 1.0f, - 0.99969881772995f, - 0.9987954497337341f, - 0.9972904324531555f, - 0.9951847195625305f, - 0.9924795627593994f, - 0.9891765117645264f, - 0.9852776527404785f, - 0.9807852506637573f, - 0.9757021069526672f, - 0.9700312614440918f, - 0.9637760519981384f, - 0.9569403529167175f, - 0.949528157711029f, - 0.9415440559387207f, - 0.9329928159713745f, - 0.9238795042037964f, - 0.91420978307724f, - 0.903989315032959f, - 0.89322429895401f, - 0.8819212913513184f, - 0.8700869679450989f, - 0.8577286005020142f, - 0.8448535799980164f, - 0.8314695954322815f, - 0.8175848126411438f, - 0.803207516670227f, - 0.7883464097976685f, - 0.7730104327201843f, - 0.7572088241577148f, - 0.7409511208534241f, - 0.7242470979690552f, - 0.7071067690849304f, - 0.6895405650138855f, - 0.6715589761734009f, - 0.6531728506088257f, - 0.6343932747840881f, - 0.6152315735816956f, - 0.5956993103027344f, - 0.5758081674575806f, - 0.5555702447891235f, - 0.5349976420402527f, - 0.5141027569770813f, - 0.49289819598197937f, - 0.4713967442512512f, - 0.4496113359928131f, - 0.4275550842285156f, - 0.40524131059646606f, - 0.3826834261417389f, - 0.3598950505256653f, - 0.3368898630142212f, - 0.3136817514896393f, - 0.290284663438797f, - 0.2667127549648285f, - 0.24298018217086792f, - 0.21910123527050018f, - 0.19509032368659973f, - 0.1709618866443634f, - 0.1467304676771164f, - 0.12241067737340927f, - 0.0980171412229538f, - 0.0735645666718483f, - 0.049067676067352295f, - 0.024541229009628296f, - 6.123234262925839e-17f, - -0.024541229009628296f, - -0.049067676067352295f, - -0.0735645666718483f, - -0.0980171412229538f, - -0.12241067737340927f, - -0.1467304676771164f, - -0.1709618866443634f, - -0.19509032368659973f, - -0.21910123527050018f, - -0.24298018217086792f, - -0.2667127549648285f, - -0.290284663438797f, - -0.3136817514896393f, - -0.3368898630142212f, - -0.3598950505256653f, - -0.3826834261417389f, - -0.40524131059646606f, - -0.4275550842285156f, - 
-0.4496113359928131f, - -0.4713967442512512f, - -0.49289819598197937f, - -0.5141027569770813f, - -0.5349976420402527f, - -0.5555702447891235f, - -0.5758081674575806f, - -0.5956993103027344f, - -0.6152315735816956f, - -0.6343932747840881f, - -0.6531728506088257f, - -0.6715589761734009f, - -0.6895405650138855f, - -0.7071067690849304f, - -0.7242470979690552f, - -0.7409511208534241f, - -0.7572088241577148f, - -0.7730104327201843f, - -0.7883464097976685f, - -0.803207516670227f, - -0.8175848126411438f, - -0.8314695954322815f, - -0.8448535799980164f, - -0.8577286005020142f, - -0.8700869679450989f, - -0.8819212913513184f, - -0.89322429895401f, - -0.903989315032959f, - -0.91420978307724f, - -0.9238795042037964f, - -0.9329928159713745f, - -0.9415440559387207f, - -0.949528157711029f, - -0.9569403529167175f, - -0.9637760519981384f, - -0.9700312614440918f, - -0.9757021069526672f, - -0.9807852506637573f, - -0.9852776527404785f, - -0.9891765117645264f, - -0.9924795627593994f, - -0.9951847195625305f, - -0.9972904324531555f, - -0.9987954497337341f, - -0.99969881772995f, - 1.0f, - 0.9987954497337341f, - 0.9951847195625305f, - 0.9891765117645264f, - 0.9807852506637573f, - 0.9700312614440918f, - 0.9569403529167175f, - 0.9415440559387207f, - 0.9238795042037964f, - 0.903989315032959f, - 0.8819212913513184f, - 0.8577286005020142f, - 0.8314695954322815f, - 0.803207516670227f, - 0.7730104327201843f, - 0.7409511208534241f, - 0.7071067690849304f, - 0.6715589761734009f, - 0.6343932747840881f, - 0.5956993103027344f, - 0.5555702447891235f, - 0.5141027569770813f, - 0.4713967442512512f, - 0.4275550842285156f, - 0.3826834261417389f, - 0.3368898630142212f, - 0.290284663438797f, - 0.24298018217086792f, - 0.19509032368659973f, - 0.1467304676771164f, - 0.0980171412229538f, - 0.049067676067352295f, - 6.123234262925839e-17f, - -0.049067676067352295f, - -0.0980171412229538f, - -0.1467304676771164f, - -0.19509032368659973f, - -0.24298018217086792f, - -0.290284663438797f, - -0.3368898630142212f, - 
-0.3826834261417389f, - -0.4275550842285156f, - -0.4713967442512512f, - -0.5141027569770813f, - -0.5555702447891235f, - -0.5956993103027344f, - -0.6343932747840881f, - -0.6715589761734009f, - -0.7071067690849304f, - -0.7409511208534241f, - -0.7730104327201843f, - -0.803207516670227f, - -0.8314695954322815f, - -0.8577286005020142f, - -0.8819212913513184f, - -0.903989315032959f, - -0.9238795042037964f, - -0.9415440559387207f, - -0.9569403529167175f, - -0.9700312614440918f, - -0.9807852506637573f, - -0.9891765117645264f, - -0.9951847195625305f, - -0.9987954497337341f, - 1.0f, - 0.9951847195625305f, - 0.9807852506637573f, - 0.9569403529167175f, - 0.9238795042037964f, - 0.8819212913513184f, - 0.8314695954322815f, - 0.7730104327201843f, - 0.7071067690849304f, - 0.6343932747840881f, - 0.5555702447891235f, - 0.4713967442512512f, - 0.3826834261417389f, - 0.290284663438797f, - 0.19509032368659973f, - 0.0980171412229538f, - 6.123234262925839e-17f, - -0.0980171412229538f, - -0.19509032368659973f, - -0.290284663438797f, - -0.3826834261417389f, - -0.4713967442512512f, - -0.5555702447891235f, - -0.6343932747840881f, - -0.7071067690849304f, - -0.7730104327201843f, - -0.8314695954322815f, - -0.8819212913513184f, - -0.9238795042037964f, - -0.9569403529167175f, - -0.9807852506637573f, - -0.9951847195625305f, - 1.0f, - 0.9807852506637573f, - 0.9238795042037964f, - 0.8314695954322815f, - 0.7071067690849304f, - 0.5555702447891235f, - 0.3826834261417389f, - 0.19509032368659973f, - 6.123234262925839e-17f, - -0.19509032368659973f, - -0.3826834261417389f, - -0.5555702447891235f, - -0.7071067690849304f, - -0.8314695954322815f, - -0.9238795042037964f, - -0.9807852506637573f, - 1.0f, - 0.9238795042037964f, - 0.7071067690849304f, - 0.3826834261417389f, - 6.123234262925839e-17f, - -0.3826834261417389f, - -0.7071067690849304f, - -0.9238795042037964f, - 1.0f, - 0.7071067690849304f, - 6.123234262925839e-17f, - -0.7071067690849304f, - 1.0f, - 6.123234262925839e-17f, - 1.0f, -}; -float 
input_Wi[DATA_SIZE - 1] = { - 0.0f, - -0.006135884672403336f, - -0.012271538376808167f, - -0.018406730145215988f, - -0.024541229009628296f, - -0.030674804002046585f, - -0.03680722415447235f, - -0.04293825849890709f, - -0.049067676067352295f, - -0.055195245891809464f, - -0.06132073700428009f, - -0.06744392216205597f, - -0.0735645666718483f, - -0.07968243956565857f, - -0.08579730987548828f, - -0.09190895408391953f, - -0.0980171412229538f, - -0.104121632874012f, - -0.11022220551967621f, - -0.11631862819194794f, - -0.12241067737340927f, - -0.1284981071949005f, - -0.13458070158958435f, - -0.14065824449062347f, - -0.1467304676771164f, - -0.15279719233512878f, - -0.15885815024375916f, - -0.1649131178855896f, - -0.1709618866443634f, - -0.17700421810150146f, - -0.18303988873958588f, - -0.18906866014003754f, - -0.19509032368659973f, - -0.20110464096069336f, - -0.20711137354373932f, - -0.2131103128194809f, - -0.21910123527050018f, - -0.22508391737937927f, - -0.23105810582637787f, - -0.23702360689640045f, - -0.24298018217086792f, - -0.24892760813236237f, - -0.2548656463623047f, - -0.26079410314559937f, - -0.2667127549648285f, - -0.27262136340141296f, - -0.2785196900367737f, - -0.28440752625465393f, - -0.290284663438797f, - -0.29615089297294617f, - -0.30200594663619995f, - -0.307849645614624f, - -0.3136817514896393f, - -0.3195020258426666f, - -0.32531028985977173f, - -0.3311063051223755f, - -0.3368898630142212f, - -0.34266072511672974f, - -0.3484186828136444f, - -0.3541635274887085f, - -0.3598950505256653f, - -0.3656129837036133f, - -0.37131720781326294f, - -0.3770074248313904f, - -0.3826834261417389f, - -0.38834503293037415f, - -0.39399203658103943f, - -0.39962419867515564f, - -0.40524131059646606f, - -0.410843163728714f, - -0.4164295494556427f, - -0.4220002591609955f, - -0.4275550842285156f, - -0.4330938160419464f, - -0.43861624598503113f, - -0.4441221356391907f, - -0.4496113359928131f, - -0.45508357882499695f, - -0.46053871512413025f, - -0.4659765064716339f, - 
-0.4713967442512512f, - -0.47679921984672546f, - -0.4821837842464447f, - -0.48755016922950745f, - -0.49289819598197937f, - -0.49822765588760376f, - -0.5035383701324463f, - -0.5088301301002502f, - -0.5141027569770813f, - -0.5193560123443604f, - -0.5245896577835083f, - -0.5298036336898804f, - -0.5349976420402527f, - -0.5401714444160461f, - -0.545324981212616f, - -0.5504579544067383f, - -0.5555702447891235f, - -0.5606615543365479f, - -0.5657318234443665f, - -0.5707807540893555f, - -0.5758081674575806f, - -0.5808139443397522f, - -0.5857978463172913f, - -0.5907596945762634f, - -0.5956993103027344f, - -0.600616455078125f, - -0.6055110692977905f, - -0.6103827953338623f, - -0.6152315735816956f, - -0.620057225227356f, - -0.6248595118522644f, - -0.6296382546424866f, - -0.6343932747840881f, - -0.6391244530677795f, - -0.6438315510749817f, - -0.6485143899917603f, - -0.6531728506088257f, - -0.6578066945075989f, - -0.6624158024787903f, - -0.6669999361038208f, - -0.6715589761734009f, - -0.6760926842689514f, - -0.6806010007858276f, - -0.6850836873054504f, - -0.6895405650138855f, - -0.6939714550971985f, - -0.6983762383460999f, - -0.7027547359466553f, - -0.7071067690849304f, - -0.7114322185516357f, - -0.7157308459281921f, - -0.7200025320053101f, - -0.7242470979690552f, - -0.7284643650054932f, - -0.7326542735099792f, - -0.7368165850639343f, - -0.7409511208534241f, - -0.7450577616691589f, - -0.7491363883018494f, - -0.753186821937561f, - -0.7572088241577148f, - -0.7612023949623108f, - -0.765167236328125f, - -0.7691033482551575f, - -0.7730104327201843f, - -0.7768884897232056f, - -0.7807372212409973f, - -0.7845565676689148f, - -0.7883464097976685f, - -0.792106568813324f, - -0.7958369255065918f, - -0.7995372414588928f, - -0.803207516670227f, - -0.8068475723266602f, - -0.810457170009613f, - -0.8140363097190857f, - -0.8175848126411438f, - -0.821102499961853f, - -0.8245893120765686f, - -0.8280450701713562f, - -0.8314695954322815f, - -0.8348628878593445f, - -0.8382247090339661f, - 
-0.8415549993515015f, - -0.8448535799980164f, - -0.8481203317642212f, - -0.8513551950454712f, - -0.854557991027832f, - -0.8577286005020142f, - -0.8608669638633728f, - -0.8639728426933289f, - -0.8670462369918823f, - -0.8700869679450989f, - -0.8730949759483337f, - -0.8760700821876526f, - -0.8790122270584106f, - -0.8819212913513184f, - -0.8847970962524414f, - -0.8876396417617798f, - -0.8904487490653992f, - -0.89322429895401f, - -0.8959662318229675f, - -0.898674488067627f, - -0.9013488292694092f, - -0.903989315032959f, - -0.9065957069396973f, - -0.909168004989624f, - -0.9117060303688049f, - -0.91420978307724f, - -0.9166790843009949f, - -0.9191138744354248f, - -0.9215140342712402f, - -0.9238795042037964f, - -0.9262102246284485f, - -0.928506076335907f, - -0.9307669401168823f, - -0.9329928159713745f, - -0.9351835250854492f, - -0.9373390078544617f, - -0.9394592046737671f, - -0.9415440559387207f, - -0.943593442440033f, - -0.9456073045730591f, - -0.9475855827331543f, - -0.949528157711029f, - -0.9514350295066833f, - -0.9533060193061829f, - -0.9551411867141724f, - -0.9569403529167175f, - -0.9587034583091736f, - -0.9604305028915405f, - -0.9621214270591736f, - -0.9637760519981384f, - -0.9653944373130798f, - -0.9669764637947083f, - -0.9685220718383789f, - -0.9700312614440918f, - -0.9715039134025574f, - -0.9729399681091309f, - -0.9743393659591675f, - -0.9757021069526672f, - -0.9770281314849854f, - -0.978317379951477f, - -0.9795697927474976f, - -0.9807852506637573f, - -0.9819638729095459f, - -0.983105480670929f, - -0.9842100739479065f, - -0.9852776527404785f, - -0.9863080978393555f, - -0.9873014092445374f, - -0.9882575869560242f, - -0.9891765117645264f, - -0.990058183670044f, - -0.9909026622772217f, - -0.9917097687721252f, - -0.9924795627593994f, - -0.9932119250297546f, - -0.9939069747924805f, - -0.9945645928382874f, - -0.9951847195625305f, - -0.9957674145698547f, - -0.9963126182556152f, - -0.9968202710151672f, - -0.9972904324531555f, - -0.9977230429649353f, - -0.9981181025505066f, 
- -0.9984755516052246f, - -0.9987954497337341f, - -0.9990777373313904f, - -0.9993223547935486f, - -0.9995294213294983f, - -0.99969881772995f, - -0.9998306035995483f, - -0.9999247193336487f, - -0.999981164932251f, - -1.0f, - -0.999981164932251f, - -0.9999247193336487f, - -0.9998306035995483f, - -0.99969881772995f, - -0.9995294213294983f, - -0.9993223547935486f, - -0.9990777373313904f, - -0.9987954497337341f, - -0.9984755516052246f, - -0.9981181025505066f, - -0.9977230429649353f, - -0.9972904324531555f, - -0.9968202710151672f, - -0.9963126182556152f, - -0.9957674145698547f, - -0.9951847195625305f, - -0.9945645928382874f, - -0.9939069747924805f, - -0.9932119250297546f, - -0.9924795627593994f, - -0.9917097687721252f, - -0.9909026622772217f, - -0.990058183670044f, - -0.9891765117645264f, - -0.9882575869560242f, - -0.9873014092445374f, - -0.9863080978393555f, - -0.9852776527404785f, - -0.9842100739479065f, - -0.983105480670929f, - -0.9819638729095459f, - -0.9807852506637573f, - -0.9795697927474976f, - -0.978317379951477f, - -0.9770281314849854f, - -0.9757021069526672f, - -0.9743393659591675f, - -0.9729399681091309f, - -0.9715039134025574f, - -0.9700312614440918f, - -0.9685220718383789f, - -0.9669764637947083f, - -0.9653944373130798f, - -0.9637760519981384f, - -0.9621214270591736f, - -0.9604305028915405f, - -0.9587034583091736f, - -0.9569403529167175f, - -0.9551411867141724f, - -0.9533060193061829f, - -0.9514350295066833f, - -0.949528157711029f, - -0.9475855827331543f, - -0.9456073045730591f, - -0.943593442440033f, - -0.9415440559387207f, - -0.9394592046737671f, - -0.9373390078544617f, - -0.9351835250854492f, - -0.9329928159713745f, - -0.9307669401168823f, - -0.928506076335907f, - -0.9262102246284485f, - -0.9238795042037964f, - -0.9215140342712402f, - -0.9191138744354248f, - -0.9166790843009949f, - -0.91420978307724f, - -0.9117060303688049f, - -0.909168004989624f, - -0.9065957069396973f, - -0.903989315032959f, - -0.9013488292694092f, - -0.898674488067627f, - 
-0.8959662318229675f, - -0.89322429895401f, - -0.8904487490653992f, - -0.8876396417617798f, - -0.8847970962524414f, - -0.8819212913513184f, - -0.8790122270584106f, - -0.8760700821876526f, - -0.8730949759483337f, - -0.8700869679450989f, - -0.8670462369918823f, - -0.8639728426933289f, - -0.8608669638633728f, - -0.8577286005020142f, - -0.854557991027832f, - -0.8513551950454712f, - -0.8481203317642212f, - -0.8448535799980164f, - -0.8415549993515015f, - -0.8382247090339661f, - -0.8348628878593445f, - -0.8314695954322815f, - -0.8280450701713562f, - -0.8245893120765686f, - -0.821102499961853f, - -0.8175848126411438f, - -0.8140363097190857f, - -0.810457170009613f, - -0.8068475723266602f, - -0.803207516670227f, - -0.7995372414588928f, - -0.7958369255065918f, - -0.792106568813324f, - -0.7883464097976685f, - -0.7845565676689148f, - -0.7807372212409973f, - -0.7768884897232056f, - -0.7730104327201843f, - -0.7691033482551575f, - -0.765167236328125f, - -0.7612023949623108f, - -0.7572088241577148f, - -0.753186821937561f, - -0.7491363883018494f, - -0.7450577616691589f, - -0.7409511208534241f, - -0.7368165850639343f, - -0.7326542735099792f, - -0.7284643650054932f, - -0.7242470979690552f, - -0.7200025320053101f, - -0.7157308459281921f, - -0.7114322185516357f, - -0.7071067690849304f, - -0.7027547359466553f, - -0.6983762383460999f, - -0.6939714550971985f, - -0.6895405650138855f, - -0.6850836873054504f, - -0.6806010007858276f, - -0.6760926842689514f, - -0.6715589761734009f, - -0.6669999361038208f, - -0.6624158024787903f, - -0.6578066945075989f, - -0.6531728506088257f, - -0.6485143899917603f, - -0.6438315510749817f, - -0.6391244530677795f, - -0.6343932747840881f, - -0.6296382546424866f, - -0.6248595118522644f, - -0.620057225227356f, - -0.6152315735816956f, - -0.6103827953338623f, - -0.6055110692977905f, - -0.600616455078125f, - -0.5956993103027344f, - -0.5907596945762634f, - -0.5857978463172913f, - -0.5808139443397522f, - -0.5758081674575806f, - -0.5707807540893555f, - 
-0.5657318234443665f, - -0.5606615543365479f, - -0.5555702447891235f, - -0.5504579544067383f, - -0.545324981212616f, - -0.5401714444160461f, - -0.5349976420402527f, - -0.5298036336898804f, - -0.5245896577835083f, - -0.5193560123443604f, - -0.5141027569770813f, - -0.5088301301002502f, - -0.5035383701324463f, - -0.49822765588760376f, - -0.49289819598197937f, - -0.48755016922950745f, - -0.4821837842464447f, - -0.47679921984672546f, - -0.4713967442512512f, - -0.4659765064716339f, - -0.46053871512413025f, - -0.45508357882499695f, - -0.4496113359928131f, - -0.4441221356391907f, - -0.43861624598503113f, - -0.4330938160419464f, - -0.4275550842285156f, - -0.4220002591609955f, - -0.4164295494556427f, - -0.410843163728714f, - -0.40524131059646606f, - -0.39962419867515564f, - -0.39399203658103943f, - -0.38834503293037415f, - -0.3826834261417389f, - -0.3770074248313904f, - -0.37131720781326294f, - -0.3656129837036133f, - -0.3598950505256653f, - -0.3541635274887085f, - -0.3484186828136444f, - -0.34266072511672974f, - -0.3368898630142212f, - -0.3311063051223755f, - -0.32531028985977173f, - -0.3195020258426666f, - -0.3136817514896393f, - -0.307849645614624f, - -0.30200594663619995f, - -0.29615089297294617f, - -0.290284663438797f, - -0.28440752625465393f, - -0.2785196900367737f, - -0.27262136340141296f, - -0.2667127549648285f, - -0.26079410314559937f, - -0.2548656463623047f, - -0.24892760813236237f, - -0.24298018217086792f, - -0.23702360689640045f, - -0.23105810582637787f, - -0.22508391737937927f, - -0.21910123527050018f, - -0.2131103128194809f, - -0.20711137354373932f, - -0.20110464096069336f, - -0.19509032368659973f, - -0.18906866014003754f, - -0.18303988873958588f, - -0.17700421810150146f, - -0.1709618866443634f, - -0.1649131178855896f, - -0.15885815024375916f, - -0.15279719233512878f, - -0.1467304676771164f, - -0.14065824449062347f, - -0.13458070158958435f, - -0.1284981071949005f, - -0.12241067737340927f, - -0.11631862819194794f, - -0.11022220551967621f, - -0.104121632874012f, 
- -0.0980171412229538f, - -0.09190895408391953f, - -0.08579730987548828f, - -0.07968243956565857f, - -0.0735645666718483f, - -0.06744392216205597f, - -0.06132073700428009f, - -0.055195245891809464f, - -0.049067676067352295f, - -0.04293825849890709f, - -0.03680722415447235f, - -0.030674804002046585f, - -0.024541229009628296f, - -0.018406730145215988f, - -0.012271538376808167f, - -0.006135884672403336f, - 0.0f, - -0.012271538376808167f, - -0.024541229009628296f, - -0.03680722415447235f, - -0.049067676067352295f, - -0.06132073700428009f, - -0.0735645666718483f, - -0.08579730987548828f, - -0.0980171412229538f, - -0.11022220551967621f, - -0.12241067737340927f, - -0.13458070158958435f, - -0.1467304676771164f, - -0.15885815024375916f, - -0.1709618866443634f, - -0.18303988873958588f, - -0.19509032368659973f, - -0.20711137354373932f, - -0.21910123527050018f, - -0.23105810582637787f, - -0.24298018217086792f, - -0.2548656463623047f, - -0.2667127549648285f, - -0.2785196900367737f, - -0.290284663438797f, - -0.30200594663619995f, - -0.3136817514896393f, - -0.32531028985977173f, - -0.3368898630142212f, - -0.3484186828136444f, - -0.3598950505256653f, - -0.37131720781326294f, - -0.3826834261417389f, - -0.39399203658103943f, - -0.40524131059646606f, - -0.4164295494556427f, - -0.4275550842285156f, - -0.43861624598503113f, - -0.4496113359928131f, - -0.46053871512413025f, - -0.4713967442512512f, - -0.4821837842464447f, - -0.49289819598197937f, - -0.5035383701324463f, - -0.5141027569770813f, - -0.5245896577835083f, - -0.5349976420402527f, - -0.545324981212616f, - -0.5555702447891235f, - -0.5657318234443665f, - -0.5758081674575806f, - -0.5857978463172913f, - -0.5956993103027344f, - -0.6055110692977905f, - -0.6152315735816956f, - -0.6248595118522644f, - -0.6343932747840881f, - -0.6438315510749817f, - -0.6531728506088257f, - -0.6624158024787903f, - -0.6715589761734009f, - -0.6806010007858276f, - -0.6895405650138855f, - -0.6983762383460999f, - -0.7071067690849304f, - -0.7157308459281921f, - 
-0.7242470979690552f, - -0.7326542735099792f, - -0.7409511208534241f, - -0.7491363883018494f, - -0.7572088241577148f, - -0.765167236328125f, - -0.7730104327201843f, - -0.7807372212409973f, - -0.7883464097976685f, - -0.7958369255065918f, - -0.803207516670227f, - -0.810457170009613f, - -0.8175848126411438f, - -0.8245893120765686f, - -0.8314695954322815f, - -0.8382247090339661f, - -0.8448535799980164f, - -0.8513551950454712f, - -0.8577286005020142f, - -0.8639728426933289f, - -0.8700869679450989f, - -0.8760700821876526f, - -0.8819212913513184f, - -0.8876396417617798f, - -0.89322429895401f, - -0.898674488067627f, - -0.903989315032959f, - -0.909168004989624f, - -0.91420978307724f, - -0.9191138744354248f, - -0.9238795042037964f, - -0.928506076335907f, - -0.9329928159713745f, - -0.9373390078544617f, - -0.9415440559387207f, - -0.9456073045730591f, - -0.949528157711029f, - -0.9533060193061829f, - -0.9569403529167175f, - -0.9604305028915405f, - -0.9637760519981384f, - -0.9669764637947083f, - -0.9700312614440918f, - -0.9729399681091309f, - -0.9757021069526672f, - -0.978317379951477f, - -0.9807852506637573f, - -0.983105480670929f, - -0.9852776527404785f, - -0.9873014092445374f, - -0.9891765117645264f, - -0.9909026622772217f, - -0.9924795627593994f, - -0.9939069747924805f, - -0.9951847195625305f, - -0.9963126182556152f, - -0.9972904324531555f, - -0.9981181025505066f, - -0.9987954497337341f, - -0.9993223547935486f, - -0.99969881772995f, - -0.9999247193336487f, - -1.0f, - -0.9999247193336487f, - -0.99969881772995f, - -0.9993223547935486f, - -0.9987954497337341f, - -0.9981181025505066f, - -0.9972904324531555f, - -0.9963126182556152f, - -0.9951847195625305f, - -0.9939069747924805f, - -0.9924795627593994f, - -0.9909026622772217f, - -0.9891765117645264f, - -0.9873014092445374f, - -0.9852776527404785f, - -0.983105480670929f, - -0.9807852506637573f, - -0.978317379951477f, - -0.9757021069526672f, - -0.9729399681091309f, - -0.9700312614440918f, - -0.9669764637947083f, - 
-0.9637760519981384f, - -0.9604305028915405f, - -0.9569403529167175f, - -0.9533060193061829f, - -0.949528157711029f, - -0.9456073045730591f, - -0.9415440559387207f, - -0.9373390078544617f, - -0.9329928159713745f, - -0.928506076335907f, - -0.9238795042037964f, - -0.9191138744354248f, - -0.91420978307724f, - -0.909168004989624f, - -0.903989315032959f, - -0.898674488067627f, - -0.89322429895401f, - -0.8876396417617798f, - -0.8819212913513184f, - -0.8760700821876526f, - -0.8700869679450989f, - -0.8639728426933289f, - -0.8577286005020142f, - -0.8513551950454712f, - -0.8448535799980164f, - -0.8382247090339661f, - -0.8314695954322815f, - -0.8245893120765686f, - -0.8175848126411438f, - -0.810457170009613f, - -0.803207516670227f, - -0.7958369255065918f, - -0.7883464097976685f, - -0.7807372212409973f, - -0.7730104327201843f, - -0.765167236328125f, - -0.7572088241577148f, - -0.7491363883018494f, - -0.7409511208534241f, - -0.7326542735099792f, - -0.7242470979690552f, - -0.7157308459281921f, - -0.7071067690849304f, - -0.6983762383460999f, - -0.6895405650138855f, - -0.6806010007858276f, - -0.6715589761734009f, - -0.6624158024787903f, - -0.6531728506088257f, - -0.6438315510749817f, - -0.6343932747840881f, - -0.6248595118522644f, - -0.6152315735816956f, - -0.6055110692977905f, - -0.5956993103027344f, - -0.5857978463172913f, - -0.5758081674575806f, - -0.5657318234443665f, - -0.5555702447891235f, - -0.545324981212616f, - -0.5349976420402527f, - -0.5245896577835083f, - -0.5141027569770813f, - -0.5035383701324463f, - -0.49289819598197937f, - -0.4821837842464447f, - -0.4713967442512512f, - -0.46053871512413025f, - -0.4496113359928131f, - -0.43861624598503113f, - -0.4275550842285156f, - -0.4164295494556427f, - -0.40524131059646606f, - -0.39399203658103943f, - -0.3826834261417389f, - -0.37131720781326294f, - -0.3598950505256653f, - -0.3484186828136444f, - -0.3368898630142212f, - -0.32531028985977173f, - -0.3136817514896393f, - -0.30200594663619995f, - -0.290284663438797f, - 
-0.2785196900367737f, - -0.2667127549648285f, - -0.2548656463623047f, - -0.24298018217086792f, - -0.23105810582637787f, - -0.21910123527050018f, - -0.20711137354373932f, - -0.19509032368659973f, - -0.18303988873958588f, - -0.1709618866443634f, - -0.15885815024375916f, - -0.1467304676771164f, - -0.13458070158958435f, - -0.12241067737340927f, - -0.11022220551967621f, - -0.0980171412229538f, - -0.08579730987548828f, - -0.0735645666718483f, - -0.06132073700428009f, - -0.049067676067352295f, - -0.03680722415447235f, - -0.024541229009628296f, - -0.012271538376808167f, - 0.0f, - -0.024541229009628296f, - -0.049067676067352295f, - -0.0735645666718483f, - -0.0980171412229538f, - -0.12241067737340927f, - -0.1467304676771164f, - -0.1709618866443634f, - -0.19509032368659973f, - -0.21910123527050018f, - -0.24298018217086792f, - -0.2667127549648285f, - -0.290284663438797f, - -0.3136817514896393f, - -0.3368898630142212f, - -0.3598950505256653f, - -0.3826834261417389f, - -0.40524131059646606f, - -0.4275550842285156f, - -0.4496113359928131f, - -0.4713967442512512f, - -0.49289819598197937f, - -0.5141027569770813f, - -0.5349976420402527f, - -0.5555702447891235f, - -0.5758081674575806f, - -0.5956993103027344f, - -0.6152315735816956f, - -0.6343932747840881f, - -0.6531728506088257f, - -0.6715589761734009f, - -0.6895405650138855f, - -0.7071067690849304f, - -0.7242470979690552f, - -0.7409511208534241f, - -0.7572088241577148f, - -0.7730104327201843f, - -0.7883464097976685f, - -0.803207516670227f, - -0.8175848126411438f, - -0.8314695954322815f, - -0.8448535799980164f, - -0.8577286005020142f, - -0.8700869679450989f, - -0.8819212913513184f, - -0.89322429895401f, - -0.903989315032959f, - -0.91420978307724f, - -0.9238795042037964f, - -0.9329928159713745f, - -0.9415440559387207f, - -0.949528157711029f, - -0.9569403529167175f, - -0.9637760519981384f, - -0.9700312614440918f, - -0.9757021069526672f, - -0.9807852506637573f, - -0.9852776527404785f, - -0.9891765117645264f, - -0.9924795627593994f, - 
-0.9951847195625305f, - -0.9972904324531555f, - -0.9987954497337341f, - -0.99969881772995f, - -1.0f, - -0.99969881772995f, - -0.9987954497337341f, - -0.9972904324531555f, - -0.9951847195625305f, - -0.9924795627593994f, - -0.9891765117645264f, - -0.9852776527404785f, - -0.9807852506637573f, - -0.9757021069526672f, - -0.9700312614440918f, - -0.9637760519981384f, - -0.9569403529167175f, - -0.949528157711029f, - -0.9415440559387207f, - -0.9329928159713745f, - -0.9238795042037964f, - -0.91420978307724f, - -0.903989315032959f, - -0.89322429895401f, - -0.8819212913513184f, - -0.8700869679450989f, - -0.8577286005020142f, - -0.8448535799980164f, - -0.8314695954322815f, - -0.8175848126411438f, - -0.803207516670227f, - -0.7883464097976685f, - -0.7730104327201843f, - -0.7572088241577148f, - -0.7409511208534241f, - -0.7242470979690552f, - -0.7071067690849304f, - -0.6895405650138855f, - -0.6715589761734009f, - -0.6531728506088257f, - -0.6343932747840881f, - -0.6152315735816956f, - -0.5956993103027344f, - -0.5758081674575806f, - -0.5555702447891235f, - -0.5349976420402527f, - -0.5141027569770813f, - -0.49289819598197937f, - -0.4713967442512512f, - -0.4496113359928131f, - -0.4275550842285156f, - -0.40524131059646606f, - -0.3826834261417389f, - -0.3598950505256653f, - -0.3368898630142212f, - -0.3136817514896393f, - -0.290284663438797f, - -0.2667127549648285f, - -0.24298018217086792f, - -0.21910123527050018f, - -0.19509032368659973f, - -0.1709618866443634f, - -0.1467304676771164f, - -0.12241067737340927f, - -0.0980171412229538f, - -0.0735645666718483f, - -0.049067676067352295f, - -0.024541229009628296f, - 0.0f, - -0.049067676067352295f, - -0.0980171412229538f, - -0.1467304676771164f, - -0.19509032368659973f, - -0.24298018217086792f, - -0.290284663438797f, - -0.3368898630142212f, - -0.3826834261417389f, - -0.4275550842285156f, - -0.4713967442512512f, - -0.5141027569770813f, - -0.5555702447891235f, - -0.5956993103027344f, - -0.6343932747840881f, - -0.6715589761734009f, - 
-0.7071067690849304f, - -0.7409511208534241f, - -0.7730104327201843f, - -0.803207516670227f, - -0.8314695954322815f, - -0.8577286005020142f, - -0.8819212913513184f, - -0.903989315032959f, - -0.9238795042037964f, - -0.9415440559387207f, - -0.9569403529167175f, - -0.9700312614440918f, - -0.9807852506637573f, - -0.9891765117645264f, - -0.9951847195625305f, - -0.9987954497337341f, - -1.0f, - -0.9987954497337341f, - -0.9951847195625305f, - -0.9891765117645264f, - -0.9807852506637573f, - -0.9700312614440918f, - -0.9569403529167175f, - -0.9415440559387207f, - -0.9238795042037964f, - -0.903989315032959f, - -0.8819212913513184f, - -0.8577286005020142f, - -0.8314695954322815f, - -0.803207516670227f, - -0.7730104327201843f, - -0.7409511208534241f, - -0.7071067690849304f, - -0.6715589761734009f, - -0.6343932747840881f, - -0.5956993103027344f, - -0.5555702447891235f, - -0.5141027569770813f, - -0.4713967442512512f, - -0.4275550842285156f, - -0.3826834261417389f, - -0.3368898630142212f, - -0.290284663438797f, - -0.24298018217086792f, - -0.19509032368659973f, - -0.1467304676771164f, - -0.0980171412229538f, - -0.049067676067352295f, - 0.0f, - -0.0980171412229538f, - -0.19509032368659973f, - -0.290284663438797f, - -0.3826834261417389f, - -0.4713967442512512f, - -0.5555702447891235f, - -0.6343932747840881f, - -0.7071067690849304f, - -0.7730104327201843f, - -0.8314695954322815f, - -0.8819212913513184f, - -0.9238795042037964f, - -0.9569403529167175f, - -0.9807852506637573f, - -0.9951847195625305f, - -1.0f, - -0.9951847195625305f, - -0.9807852506637573f, - -0.9569403529167175f, - -0.9238795042037964f, - -0.8819212913513184f, - -0.8314695954322815f, - -0.7730104327201843f, - -0.7071067690849304f, - -0.6343932747840881f, - -0.5555702447891235f, - -0.4713967442512512f, - -0.3826834261417389f, - -0.290284663438797f, - -0.19509032368659973f, - -0.0980171412229538f, - 0.0f, - -0.19509032368659973f, - -0.3826834261417389f, - -0.5555702447891235f, - -0.7071067690849304f, - -0.8314695954322815f, 
- -0.9238795042037964f, - -0.9807852506637573f, - -1.0f, - -0.9807852506637573f, - -0.9238795042037964f, - -0.8314695954322815f, - -0.7071067690849304f, - -0.5555702447891235f, - -0.3826834261417389f, - -0.19509032368659973f, - 0.0f, - -0.3826834261417389f, - -0.7071067690849304f, - -0.9238795042037964f, - -1.0f, - -0.9238795042037964f, - -0.7071067690849304f, - -0.3826834261417389f, - 0.0f, - -0.7071067690849304f, - -1.0f, - -0.7071067690849304f, - 0.0f, - -1.0f, - 0.0f, -}; -float verify_Xr[DATA_SIZE] = { - 3716.4453125f, - -267.9983825683594f, - 98.32199096679688f, - -3.9653897285461426f, - 156.2613983154297f, - -215.6190948486328f, - -40.383750915527344f, - -128.69554138183594f, - 150.4287872314453f, - 40.17384719848633f, - -49.59769058227539f, - 218.05001831054688f, - 17.241098403930664f, - -64.37892150878906f, - -80.90656280517578f, - -119.32131958007812f, - -101.27069091796875f, - 39.86737060546875f, - 79.09239959716797f, - -35.4920539855957f, - -64.3996810913086f, - -167.54385375976562f, - 0.4107278287410736f, - -218.21878051757812f, - -92.39613342285156f, - -32.023468017578125f, - 92.21080780029297f, - 40.778953552246094f, - -117.93646240234375f, - -216.8610382080078f, - 127.96993255615234f, - -79.22346496582031f, - 21.981021881103516f, - 16.1799373626709f, - -56.68623733520508f, - 36.67020797729492f, - -88.30011749267578f, - -85.9017105102539f, - -124.73710632324219f, - -282.87811279296875f, - -20.9166316986084f, - 15.774201393127441f, - -65.98809814453125f, - -105.0404052734375f, - 249.93673706054688f, - -58.16583251953125f, - -51.16581726074219f, - 79.26528930664062f, - -67.16885375976562f, - -43.098323822021484f, - 30.136682510375977f, - 119.19956970214844f, - -6.615887641906738f, - 95.95442199707031f, - 28.836986541748047f, - 4.779397487640381f, - -20.376508712768555f, - -300.6998291015625f, - -122.56625366210938f, - 73.70780944824219f, - 4.307649612426758f, - 44.70223617553711f, - -64.6448974609375f, - 14.843615531921387f, - -73.48257446289062f, - 
148.53822326660156f, - -68.0318374633789f, - 192.67015075683594f, - 131.0720672607422f, - -19.413494110107422f, - -219.27980041503906f, - -348.4984130859375f, - 68.71867370605469f, - 31.098865509033203f, - 48.53592300415039f, - -63.057884216308594f, - -96.89049530029297f, - -154.32130432128906f, - -7.252927303314209f, - 100.28987884521484f, - -228.52822875976562f, - -22.02593231201172f, - 24.110275268554688f, - -171.10226440429688f, - 68.65274047851562f, - 26.319927215576172f, - -36.66766357421875f, - 126.6988525390625f, - -114.3982162475586f, - -40.64214324951172f, - -48.92008972167969f, - 173.309326171875f, - 4.272372722625732f, - -67.78557586669922f, - 221.308837890625f, - -124.54756164550781f, - -12.350656509399414f, - 53.30644226074219f, - -28.111221313476562f, - 112.14599609375f, - 118.84486389160156f, - 57.616783142089844f, - -130.2318878173828f, - -34.73903274536133f, - -80.29009246826172f, - 279.3127746582031f, - 207.20697021484375f, - -145.65826416015625f, - -69.60079956054688f, - -25.42017364501953f, - 95.73529052734375f, - -128.13534545898438f, - 3.367725372314453f, - -114.2160415649414f, - 12.591071128845215f, - 203.12730407714844f, - -178.6352996826172f, - 42.915985107421875f, - 60.196266174316406f, - 110.21049499511719f, - -48.13032913208008f, - -91.4577407836914f, - 108.74422454833984f, - -124.69869995117188f, - -264.6165466308594f, - -8.151089668273926f, - -67.97704315185547f, - -51.68428039550781f, - 175.5607147216797f, - -162.81996154785156f, - 71.17991638183594f, - -37.44194030761719f, - 322.8318786621094f, - -56.17836380004883f, - -35.8679313659668f, - -12.838441848754883f, - 75.71537780761719f, - 223.6103515625f, - 2.763267755508423f, - -4.584329128265381f, - -171.41603088378906f, - 187.4361572265625f, - 40.88155746459961f, - -118.55718231201172f, - 77.58524322509766f, - 104.25991821289062f, - -46.09897232055664f, - -217.0098419189453f, - 16.38774299621582f, - -124.71113586425781f, - 158.76336669921875f, - -86.4174575805664f, - 
4.181918621063232f, - -27.771739959716797f, - 189.92918395996094f, - -77.45915222167969f, - -50.57688903808594f, - -78.34752655029297f, - 138.12245178222656f, - -78.32986450195312f, - 174.60963439941406f, - -211.41314697265625f, - 75.97240447998047f, - -94.54586029052734f, - 160.9453582763672f, - 96.19084167480469f, - -38.41650390625f, - 222.4024200439453f, - -182.30453491210938f, - -144.46377563476562f, - 43.61826705932617f, - -234.02035522460938f, - -137.03924560546875f, - -142.681640625f, - 29.008403778076172f, - 23.3663387298584f, - 37.032958984375f, - 39.966739654541016f, - -185.57571411132812f, - -135.69219970703125f, - -22.468799591064453f, - -401.6009521484375f, - 161.05062866210938f, - -64.17134094238281f, - -20.798065185546875f, - 192.5962677001953f, - 7.344879150390625f, - -44.931819915771484f, - 11.586076736450195f, - -31.446372985839844f, - 158.1278533935547f, - 76.51516723632812f, - -34.20839309692383f, - 174.81085205078125f, - 269.4658508300781f, - 38.61288833618164f, - 218.0538787841797f, - 78.40435791015625f, - -32.705352783203125f, - 23.128578186035156f, - 151.4041290283203f, - -264.2483825683594f, - 6.105564594268799f, - -53.721656799316406f, - 71.58146667480469f, - 17.168991088867188f, - 27.277376174926758f, - 55.880924224853516f, - -37.79456329345703f, - -124.24323272705078f, - -111.4734878540039f, - -72.12679290771484f, - 70.06756591796875f, - 111.41484069824219f, - 124.65719604492188f, - 88.25847625732422f, - 206.87484741210938f, - 35.45157241821289f, - 81.11978149414062f, - -149.1204833984375f, - 81.3038330078125f, - -57.826255798339844f, - -43.81435775756836f, - 83.97855377197266f, - 221.73239135742188f, - 64.00222778320312f, - -67.90376281738281f, - 6.598657608032227f, - -58.06986999511719f, - 79.76872253417969f, - -237.78717041015625f, - -16.716285705566406f, - 206.3964080810547f, - -95.24481201171875f, - 271.76666259765625f, - 79.9280776977539f, - 3.824159860610962f, - 216.51751708984375f, - 151.5911865234375f, - -118.24484252929688f, - 
309.4955749511719f, - 152.53187561035156f, - -83.72059631347656f, - 141.96697998046875f, - -19.267358779907227f, - -234.8446807861328f, - -59.2127571105957f, - -2.7918202877044678f, - -145.12274169921875f, - 59.45796203613281f, - -151.22183227539062f, - 67.78755950927734f, - -267.6466369628906f, - -110.92325592041016f, - -14.01571273803711f, - -108.51387023925781f, - 28.90625f, - -31.722383499145508f, - 30.998132705688477f, - -58.94423294067383f, - 13.604656219482422f, - 284.51983642578125f, - 37.304237365722656f, - -290.47698974609375f, - 108.97459411621094f, - -23.286916732788086f, - 75.91149139404297f, - 79.21886444091797f, - 199.45970153808594f, - 132.3271026611328f, - 8.498868942260742f, - -116.3739242553711f, - -15.861201286315918f, - 42.807952880859375f, - -42.02463150024414f, - -276.7430725097656f, - 134.4761505126953f, - -22.05470848083496f, - -185.7274627685547f, - -74.75767517089844f, - -94.84724426269531f, - -20.810901641845703f, - -254.74266052246094f, - 285.9656677246094f, - -38.4069938659668f, - 83.486328125f, - -77.92860412597656f, - -86.39501190185547f, - 264.1016540527344f, - -15.31369686126709f, - 212.34625244140625f, - -148.6055450439453f, - 121.60713958740234f, - 94.39031982421875f, - -59.807945251464844f, - 7.396492004394531f, - -193.93150329589844f, - -44.8328857421875f, - -81.45255279541016f, - 161.16818237304688f, - -12.082969665527344f, - -8.949947357177734f, - 77.4330062866211f, - -382.95562744140625f, - 18.183481216430664f, - -328.0595397949219f, - 3.782353639602661f, - -269.7502746582031f, - 24.702268600463867f, - -127.29301452636719f, - 83.09827423095703f, - -133.48577880859375f, - 85.69451904296875f, - 61.983585357666016f, - -166.6085662841797f, - 89.13935089111328f, - -48.88228988647461f, - -68.81436157226562f, - -10.889283180236816f, - -116.15495300292969f, - -85.010009765625f, - 105.73049926757812f, - -52.646114349365234f, - -165.97984313964844f, - 241.7491455078125f, - 61.7586784362793f, - 75.14899444580078f, - 
-62.79006576538086f, - 124.43060302734375f, - 158.90757751464844f, - -150.20494079589844f, - 2.0644400119781494f, - 86.75874328613281f, - 160.28855895996094f, - -1.0678155422210693f, - 160.57864379882812f, - -102.02857208251953f, - 232.5377960205078f, - -2.1763575077056885f, - 52.75016784667969f, - -58.68678283691406f, - -23.688016891479492f, - -3.6604325771331787f, - 92.83557891845703f, - -229.3317108154297f, - 145.25921630859375f, - -227.78736877441406f, - 18.35646629333496f, - -13.128929138183594f, - 85.8497543334961f, - -42.664268493652344f, - -209.39785766601562f, - 41.946311950683594f, - -55.164676666259766f, - -56.866676330566406f, - 20.113664627075195f, - -14.598816871643066f, - 153.2086944580078f, - 50.850433349609375f, - 188.54708862304688f, - -221.94485473632812f, - 155.04615783691406f, - -51.46888732910156f, - -190.162353515625f, - -141.369384765625f, - -40.86743927001953f, - 43.31480407714844f, - 111.43016815185547f, - 65.90547943115234f, - 175.8692169189453f, - -124.55435943603516f, - -43.304405212402344f, - -95.52716827392578f, - 39.70339584350586f, - -145.7320556640625f, - 113.78180694580078f, - -64.5067138671875f, - -61.62473678588867f, - -25.680763244628906f, - 58.27463150024414f, - 13.465123176574707f, - -115.64102935791016f, - -122.65838623046875f, - -24.242862701416016f, - -111.2925796508789f, - 37.94166564941406f, - -85.5996322631836f, - -29.948413848876953f, - 101.77119445800781f, - -228.21238708496094f, - -39.54848861694336f, - 62.068973541259766f, - 145.73410034179688f, - -34.55966567993164f, - 37.06189727783203f, - 116.53771209716797f, - -131.80372619628906f, - -95.345458984375f, - 59.29605484008789f, - -119.4374771118164f, - -121.610595703125f, - -180.873046875f, - -211.82196044921875f, - -7.555936336517334f, - -14.749756813049316f, - 272.78424072265625f, - -109.26966857910156f, - -2.6984620094299316f, - 45.46072769165039f, - -92.12094116210938f, - 26.404687881469727f, - -119.10208892822266f, - -163.9833526611328f, - 128.92665100097656f, 
- 184.5361328125f, - -46.76160430908203f, - -22.133018493652344f, - -192.9971160888672f, - -84.17024230957031f, - -1.84059739112854f, - 239.0219268798828f, - 34.37836456298828f, - 10.844440460205078f, - 340.9754333496094f, - 106.36224365234375f, - 74.96183013916016f, - -15.301036834716797f, - -189.72930908203125f, - -165.38990783691406f, - -181.21461486816406f, - -143.1200408935547f, - 34.488956451416016f, - -38.14014434814453f, - 251.35348510742188f, - 97.41740417480469f, - 12.338766098022461f, - 75.62059783935547f, - -329.05816650390625f, - 67.9797134399414f, - -11.97373104095459f, - -125.91352844238281f, - -150.73419189453125f, - -151.3784637451172f, - 141.75218200683594f, - 158.54180908203125f, - -157.11302185058594f, - 36.320003509521484f, - 137.31776428222656f, - -127.21408081054688f, - 160.99888610839844f, - 46.550048828125f, - -21.136625289916992f, - -40.46889114379883f, - 153.88656616210938f, - 50.84693908691406f, - -99.83378601074219f, - -38.282073974609375f, - 19.900785446166992f, - 28.735605239868164f, - 64.42896270751953f, - 89.98438262939453f, - 45.73869705200195f, - -69.83159637451172f, - 149.3732452392578f, - -107.57123565673828f, - 76.0601806640625f, - 29.771875381469727f, - -63.514183044433594f, - -3.7916922569274902f, - 149.35874938964844f, - -172.8159637451172f, - 81.9051284790039f, - -46.2247428894043f, - 18.939220428466797f, - -50.25407409667969f, - -79.6228256225586f, - 74.16687774658203f, - 9.875847816467285f, - 91.94304656982422f, - -15.942201614379883f, - 16.493213653564453f, - -22.30901527404785f, - -26.23944854736328f, - 64.68000793457031f, - -77.1461181640625f, - -150.03860473632812f, - 38.89314270019531f, - 136.08029174804688f, - -31.66204071044922f, - 7.8224287033081055f, - -30.954023361206055f, - 122.1265640258789f, - -73.90191650390625f, - 127.4166030883789f, - -102.27446746826172f, - 18.749908447265625f, - 188.90773010253906f, - 182.96279907226562f, - 320.8580322265625f, - 3.4644596576690674f, - 136.3708038330078f, - 
-68.52808380126953f, - -233.33721923828125f, - 109.71868133544922f, - -35.075660705566406f, - 145.01583862304688f, - -93.46265411376953f, - -243.19422912597656f, - -136.68853759765625f, - -33.333587646484375f, - 116.13785552978516f, - 250.43563842773438f, - -116.2578125f, - 64.16232299804688f, - 25.666641235351562f, - 12.60000991821289f, - -86.0320053100586f, - -37.34809494018555f, - -255.72695922851562f, - -14.907633781433105f, - -83.75979614257812f, - 109.7326431274414f, - -92.25244903564453f, - 78.03236389160156f, - -96.92267608642578f, - 227.78262329101562f, - -24.218843460083008f, - 18.733007431030273f, - 30.255895614624023f, - 110.47029876708984f, - -11.414347648620605f, - -0.8711560964584351f, - -31.576143264770508f, - -77.83340454101562f, - -193.951416015625f, - -37.830570220947266f, - -25.405086517333984f, - -71.5651626586914f, - -49.03468704223633f, - 83.08433532714844f, - -85.48650360107422f, - 58.10995101928711f, - 31.280187606811523f, - 0.04943599924445152f, - 39.50444412231445f, - 69.45149993896484f, - -40.92319107055664f, - -176.72042846679688f, - 134.76194763183594f, - -250.06788635253906f, - -29.67942237854004f, - -90.71533203125f, - 8.054924011230469f, - -14.922624588012695f, - -159.2633819580078f, - 174.2520751953125f, - -0.76315838098526f, - -17.414867401123047f, - 293.4519348144531f, - -129.2104034423828f, - -139.7736053466797f, - -15.642581939697266f, - 84.23212432861328f, - -102.75594329833984f, - -39.68460464477539f, - 75.42041778564453f, - -6.906343460083008f, - 187.524658203125f, - -178.32521057128906f, - -135.93215942382812f, - 79.09184265136719f, - -15.3114595413208f, - 276.1877136230469f, - 49.678863525390625f, - -101.88280487060547f, - -86.98890686035156f, - 53.159507751464844f, - 43.015380859375f, - 21.89893341064453f, - -284.0716247558594f, - 0.38634511828422546f, - -121.55339813232422f, - -182.25286865234375f, - -169.5706024169922f, - -163.08074951171875f, - 110.75883483886719f, - -46.9610481262207f, - 43.89743423461914f, - 
-19.169172286987305f, - 132.5880584716797f, - 198.7369384765625f, - -275.69158935546875f, - -43.074214935302734f, - 136.45791625976562f, - -52.473426818847656f, - 71.65959930419922f, - -381.06597900390625f, - 127.48417663574219f, - 185.041015625f, - 118.1220932006836f, - -245.19395446777344f, - -33.291748046875f, - 85.84770965576172f, - 65.02568054199219f, - -33.77978515625f, - -53.001949310302734f, - 94.92362213134766f, - -63.16624450683594f, - -41.262229919433594f, - -17.5167236328125f, - -1.7106685638427734f, - -52.2843017578125f, - -174.19161987304688f, - -182.76206970214844f, - 132.1415557861328f, - -167.4803009033203f, - 90.92243194580078f, - -9.168347358703613f, - 9.301093101501465f, - 108.76580047607422f, - 62.61652374267578f, - 116.02909088134766f, - -128.9445343017578f, - 266.7103576660156f, - 14.737152099609375f, - -20.74768829345703f, - 146.28184509277344f, - 127.03099822998047f, - 58.494266510009766f, - 7.257815837860107f, - 28.500375747680664f, - 144.65966796875f, - -276.5377502441406f, - -11.430262565612793f, - 148.80093383789062f, - 18.164194107055664f, - -135.44407653808594f, - 85.2668228149414f, - 76.359619140625f, - -6.770626068115234f, - -1.7794605493545532f, - 131.64955139160156f, - 189.0666046142578f, - -84.97420501708984f, - -176.89639282226562f, - 83.90332794189453f, - 26.913190841674805f, - -23.49361228942871f, - 70.26067352294922f, - 88.14080047607422f, - -225.23545837402344f, - 103.89613342285156f, - -152.700927734375f, - 56.8355598449707f, - 35.84552001953125f, - -169.8487091064453f, - -81.34219360351562f, - 97.16031646728516f, - 61.42207717895508f, - 26.09053611755371f, - -173.17767333984375f, - -146.08279418945312f, - 88.75503540039062f, - 149.96218872070312f, - -25.60020637512207f, - -35.27156066894531f, - 114.19355010986328f, - -153.427734375f, - -56.803749084472656f, - -3.8066279888153076f, - 86.96688079833984f, - 102.71210479736328f, - -15.364280700683594f, - 224.45150756835938f, - -52.2860221862793f, - -43.42850112915039f, - 
-8.903968811035156f, - -19.921058654785156f, - 151.2247314453125f, - 42.96928024291992f, - -82.81979370117188f, - -19.180442810058594f, - 71.85559844970703f, - 176.02688598632812f, - 32.765296936035156f, - 135.79933166503906f, - 44.87236022949219f, - -23.774145126342773f, - 133.3619842529297f, - 9.101649284362793f, - 121.32196807861328f, - 111.6192398071289f, - -113.06497955322266f, - 184.81663513183594f, - -153.46603393554688f, - 230.6988067626953f, - -89.84378814697266f, - 17.414491653442383f, - 202.9366912841797f, - 8.737259864807129f, - -34.381431579589844f, - 69.1857681274414f, - -122.71168518066406f, - 152.33656311035156f, - 99.53600311279297f, - 73.8495101928711f, - 13.505208969116211f, - 38.027740478515625f, - 188.8500518798828f, - -2.5176286697387695f, - 28.31868553161621f, - -54.47079086303711f, - 18.86955451965332f, - -118.70134735107422f, - -3.5161025524139404f, - 21.982622146606445f, - 149.87969970703125f, - 13.80260944366455f, - 174.12692260742188f, - -85.37506103515625f, - 145.75296020507812f, - -11.743408203125f, - 234.7858428955078f, - 1.138235092163086f, - 71.95234680175781f, - -116.95512390136719f, - 173.62339782714844f, - 40.180484771728516f, - -29.161800384521484f, - 19.680877685546875f, - 20.55025291442871f, - 64.88135528564453f, - -18.270795822143555f, - 95.78907012939453f, - 129.6099395751953f, - -75.50790405273438f, - 85.50211334228516f, - 223.43417358398438f, - -41.03782272338867f, - -126.09185791015625f, - -164.7236328125f, - 29.41657257080078f, - 97.71292114257812f, - 12.223085403442383f, - -100.67781066894531f, - 85.59883880615234f, - -96.4440689086914f, - -121.69043731689453f, - 56.89594650268555f, - -79.05570983886719f, - -137.75701904296875f, - 65.38069152832031f, - 86.17481231689453f, - -249.42633056640625f, - -34.02916717529297f, - -31.07555389404297f, - 35.15608215332031f, - -55.366668701171875f, - -149.71397399902344f, - 93.68408203125f, - 13.069314002990723f, - 67.02820587158203f, - -101.64639282226562f, - -49.998680114746094f, 
- -112.3769302368164f, - 52.92754364013672f, - -11.714064598083496f, - -131.51829528808594f, - -43.875f, - 63.630496978759766f, - -39.22026824951172f, - -35.74212646484375f, - 50.25226974487305f, - 40.541542053222656f, - -140.16587829589844f, - 231.16973876953125f, - -86.7997055053711f, - 159.0029296875f, - 0.8330770134925842f, - -92.62281799316406f, - 100.59178161621094f, - 47.539852142333984f, - -17.978498458862305f, - -87.43820190429688f, - -104.4014663696289f, - -244.84527587890625f, - 22.510719299316406f, - -25.61060905456543f, - 29.320112228393555f, - 135.25115966796875f, - -71.51119995117188f, - 154.22097778320312f, - 42.11758041381836f, - 26.23859977722168f, - 50.29695129394531f, - -192.14952087402344f, - -139.32801818847656f, - 122.04437255859375f, - -44.85347366333008f, - 27.507734298706055f, - -1.8671587705612183f, - 45.51864242553711f, - -154.51206970214844f, - -147.97552490234375f, - 138.759765625f, - -17.840024948120117f, - -262.289306640625f, - -80.66183471679688f, - -30.47651481628418f, - 63.28193664550781f, - -92.01419067382812f, - 110.24069213867188f, - -355.8740539550781f, - 180.4039764404297f, - -86.51178741455078f, - 149.55552673339844f, - -170.4729766845703f, - 63.914310455322266f, - 6.286461353302002f, - 199.44725036621094f, - -173.6062774658203f, - 175.8278045654297f, - -116.53065490722656f, - 88.60388946533203f, - 33.15425109863281f, - -90.26959991455078f, - -67.87789154052734f, - 193.117431640625f, - 63.18465042114258f, - 312.9288635253906f, - -123.77703857421875f, - -27.51091766357422f, - -78.10443115234375f, - -30.127527236938477f, - -55.962554931640625f, - 7.956935882568359f, - -68.44017791748047f, - 2.924241542816162f, - -62.82518768310547f, - 100.80918884277344f, - -50.56694030761719f, - 108.91692352294922f, - -57.98463821411133f, - 32.00202560424805f, - -16.503559112548828f, - 63.616458892822266f, - -16.54847526550293f, - 4.1085710525512695f, - -69.63945770263672f, - -79.22288513183594f, - -166.32130432128906f, - 
-31.279300689697266f, - 119.0594253540039f, - 58.89280700683594f, - 303.51123046875f, - -123.64029693603516f, - -7.160595417022705f, - 4.807007789611816f, - 38.48084259033203f, - -52.0728759765625f, - 35.93497848510742f, - 89.69671630859375f, - -3.7185006141662598f, - -145.78851318359375f, - -7.911993026733398f, - 120.19866180419922f, - -14.273785591125488f, - -143.8704376220703f, - 214.9861602783203f, - -61.68754577636719f, - -81.26407623291016f, - -45.98223876953125f, - 88.20536041259766f, - 14.377686500549316f, - -53.34117126464844f, - -160.9559326171875f, - -231.17153930664062f, - -173.10906982421875f, - 201.5467987060547f, - -110.07127380371094f, - -97.08951568603516f, - -117.51844024658203f, - 201.6609649658203f, - -128.12538146972656f, - 49.61946487426758f, - -107.62490844726562f, - -78.12398529052734f, - 152.9134979248047f, - 50.76310729980469f, - -67.79439544677734f, - 187.232421875f, - 43.671077728271484f, - 217.22227478027344f, - -138.4818115234375f, - 156.10812377929688f, - -29.20357894897461f, - -168.26991271972656f, - -69.33748626708984f, - -14.261480331420898f, - -254.51373291015625f, - 242.31539916992188f, - -85.15767669677734f, - -61.03107452392578f, - -29.2766170501709f, - -183.45358276367188f, - -113.76705932617188f, - 180.55242919921875f, - -207.7382354736328f, - -102.73787689208984f, - 86.01400756835938f, - -77.5783462524414f, - -25.485754013061523f, - -22.199670791625977f, - 64.99531555175781f, - 17.64375877380371f, - -245.71104431152344f, - 102.23419952392578f, - -61.552032470703125f, - 104.89251708984375f, - 23.665218353271484f, - -210.1998291015625f, - -152.0834503173828f, - 39.67820739746094f, - 227.91554260253906f, - -10.562061309814453f, - 44.48756790161133f, - -103.19779205322266f, - -44.18059539794922f, - -194.83229064941406f, - 2.623906135559082f, - -199.0507354736328f, - 50.16276931762695f, - 16.36436653137207f, - -2.457519769668579f, - 87.26914978027344f, - -93.98115539550781f, - 121.28546905517578f, - -380.8306884765625f, - 
76.85733032226562f, - 10.825291633605957f, - -35.28277587890625f, - -56.3471794128418f, - -34.314517974853516f, - -142.5583038330078f, - -68.0180892944336f, - -66.32486724853516f, - 247.0606689453125f, - 154.88973999023438f, - 2.552262783050537f, - 99.9106674194336f, - -128.2970733642578f, - -158.5529022216797f, - 149.9820556640625f, - -28.713106155395508f, - -218.87530517578125f, - -100.13453674316406f, - -90.79178619384766f, - -144.76596069335938f, - -78.1505355834961f, - 92.39896392822266f, - 119.88646697998047f, - -195.92398071289062f, - -33.41939163208008f, - 114.09896087646484f, - -92.72386932373047f, - 22.77845001220703f, - 20.617828369140625f, - 14.405352592468262f, - 1.1235179901123047f, - 16.037588119506836f, - -148.62826538085938f, - 84.35147094726562f, - 189.62689208984375f, - 81.88655853271484f, - -80.06804656982422f, - -43.18307113647461f, - -20.14699363708496f, - 106.05152893066406f, - 65.7607421875f, - 31.98207664489746f, - -21.91515350341797f, - 110.77540588378906f, - -166.47952270507812f, - 63.88624572753906f, - -4.501167297363281f, - 302.5154724121094f, - 45.891178131103516f, - 40.41679000854492f, - -10.10234260559082f, - -110.00862121582031f, - 32.05299758911133f, - 93.86236572265625f, - 138.10086059570312f, - -144.97265625f, - 184.87110900878906f, - 149.15052795410156f, - 122.22515106201172f, - 183.83311462402344f, - -128.37950134277344f, - -26.775157928466797f, - -58.92021942138672f, - 10.440747261047363f, - -98.1972885131836f, - 164.5303497314453f, - -195.93344116210938f, - -28.368824005126953f, - 27.745210647583008f, - -39.71839904785156f, - -137.37872314453125f, - 81.43755340576172f, - 100.70350646972656f, - -72.3413314819336f, - -99.10224914550781f, - -144.1120147705078f, - 25.696548461914062f, - 131.39773559570312f, - -44.74735641479492f, - 136.8644256591797f, - -194.6734161376953f, - 13.941021919250488f, - -13.088021278381348f, - 51.315494537353516f, - -5.307824611663818f, - -55.42025375366211f, -}; -float verify_Xi[DATA_SIZE] = { - 
3859.5234375f, - -162.10719299316406f, - 216.35067749023438f, - -42.63295364379883f, - -107.55464172363281f, - -81.96739196777344f, - -23.154935836791992f, - -117.87298583984375f, - -13.819063186645508f, - -2.4805359840393066f, - 137.46368408203125f, - -20.401477813720703f, - 1.3199840784072876f, - -35.12789535522461f, - 137.55300903320312f, - -188.75289916992188f, - -53.85108184814453f, - -113.4060287475586f, - -95.153076171875f, - 122.8521957397461f, - -8.896185874938965f, - 92.5451889038086f, - 135.66143798828125f, - 138.63369750976562f, - 45.68700408935547f, - 233.44322204589844f, - 36.69723129272461f, - 7.478953838348389f, - 14.228840827941895f, - -35.5638313293457f, - -10.12244987487793f, - 11.051090240478516f, - 44.25132751464844f, - 66.73035430908203f, - -162.56544494628906f, - -52.060394287109375f, - -108.2804946899414f, - -70.13434600830078f, - 305.9521179199219f, - -133.90663146972656f, - 13.145288467407227f, - -71.51019287109375f, - 24.34937858581543f, - -220.2060546875f, - 71.12361907958984f, - 176.25631713867188f, - -96.35183715820312f, - 203.01805114746094f, - 186.58401489257812f, - -14.370003700256348f, - -43.88832092285156f, - -77.20586395263672f, - 67.40638732910156f, - -407.44207763671875f, - 58.04410934448242f, - 18.208463668823242f, - 77.89894104003906f, - -148.797607421875f, - -58.10478973388672f, - 65.6527099609375f, - -44.31429672241211f, - 29.472503662109375f, - -17.508182525634766f, - 83.27496337890625f, - -161.7346649169922f, - 18.19756317138672f, - -173.54904174804688f, - -91.16741180419922f, - 102.01688385009766f, - 67.13031005859375f, - -62.773399353027344f, - -65.17326354980469f, - -119.4058609008789f, - -181.1643829345703f, - -18.88604164123535f, - 108.04225158691406f, - -180.22561645507812f, - 57.99738311767578f, - -27.24393081665039f, - -85.58831787109375f, - -159.4800262451172f, - 60.384918212890625f, - -19.268644332885742f, - -82.28206634521484f, - 34.856746673583984f, - 36.77248764038086f, - 174.96009826660156f, - 
-84.91166687011719f, - 38.32730484008789f, - -194.20938110351562f, - 53.525299072265625f, - -253.4040069580078f, - 75.84454345703125f, - 42.586177825927734f, - -65.00910949707031f, - 139.08119201660156f, - 89.42534637451172f, - 111.02899932861328f, - -177.09559631347656f, - 106.34598541259766f, - 4.885265827178955f, - 209.70419311523438f, - 102.48601531982422f, - 163.53994750976562f, - -240.7526092529297f, - -231.74630737304688f, - 85.8790054321289f, - -48.83012771606445f, - -14.494956016540527f, - 50.40317153930664f, - -3.5898590087890625f, - -255.33428955078125f, - -42.62025833129883f, - -268.8581848144531f, - -92.89601135253906f, - 189.10411071777344f, - -202.51751708984375f, - 41.35236358642578f, - 80.72704315185547f, - 49.50514221191406f, - 194.8304443359375f, - 67.56221771240234f, - -119.60233306884766f, - 140.35537719726562f, - 19.76124382019043f, - -7.212024211883545f, - -40.494110107421875f, - -26.146860122680664f, - -67.55929565429688f, - -124.36994934082031f, - 142.58139038085938f, - 70.3119888305664f, - -100.26668548583984f, - 247.9021759033203f, - 50.10114288330078f, - -166.2733154296875f, - -132.74000549316406f, - 73.29490661621094f, - -31.767108917236328f, - -8.095459938049316f, - -56.1514892578125f, - 1.6728588342666626f, - 64.86177825927734f, - 201.24720764160156f, - 44.31425857543945f, - -79.04727172851562f, - 77.4909896850586f, - -167.1195831298828f, - 149.49549865722656f, - 44.96175003051758f, - 22.878067016601562f, - 36.186370849609375f, - 101.68644714355469f, - 197.0634307861328f, - 80.28263092041016f, - 263.1461181640625f, - -136.82252502441406f, - -64.13824462890625f, - -127.88070678710938f, - -53.07760238647461f, - -38.42239761352539f, - -157.55482482910156f, - 62.477542877197266f, - 14.90542984008789f, - -24.007083892822266f, - -120.66967010498047f, - -156.54144287109375f, - -47.69419860839844f, - -13.861870765686035f, - -249.0836639404297f, - 155.2084197998047f, - -90.05982971191406f, - -38.34132385253906f, - -108.60028076171875f, - 
294.6951599121094f, - -71.52525329589844f, - -68.72283935546875f, - -63.87487030029297f, - -68.7685317993164f, - 17.60015869140625f, - 184.40444946289062f, - 160.1727294921875f, - 45.90409851074219f, - 6.924811363220215f, - 177.18492126464844f, - -122.43462371826172f, - -313.75921630859375f, - 126.83333587646484f, - -26.18158531188965f, - -42.28096389770508f, - -229.7159423828125f, - -239.4152069091797f, - 20.892427444458008f, - -49.03478240966797f, - -247.34117126464844f, - 14.298881530761719f, - 10.22720718383789f, - -170.78298950195312f, - -0.20347070693969727f, - -104.31876373291016f, - 35.001930236816406f, - -77.7791519165039f, - -17.722915649414062f, - -87.74828338623047f, - 9.097053527832031f, - 42.380855560302734f, - 0.5503619313240051f, - 44.6552619934082f, - -60.78847885131836f, - 109.84307861328125f, - -107.83258819580078f, - -105.59030151367188f, - -192.1314239501953f, - -106.77371215820312f, - -109.34872436523438f, - 43.107398986816406f, - -54.129371643066406f, - 5.410218715667725f, - 50.18254470825195f, - -56.018707275390625f, - -79.51019287109375f, - -32.840545654296875f, - 56.35155487060547f, - 26.266202926635742f, - -193.1510009765625f, - 71.05934143066406f, - 86.17952728271484f, - 8.82178783416748f, - 208.0476531982422f, - 22.50674819946289f, - 315.8392028808594f, - 8.642745018005371f, - -168.06356811523438f, - 151.88966369628906f, - 105.09650421142578f, - -67.96394348144531f, - 21.7277774810791f, - 144.8367462158203f, - 60.80427551269531f, - -76.60871887207031f, - -33.02667236328125f, - -222.5613555908203f, - 73.507568359375f, - -159.29383850097656f, - -28.219532012939453f, - 11.770097732543945f, - -138.0048370361328f, - 114.90724182128906f, - -8.10396957397461f, - 12.715537071228027f, - -89.48042297363281f, - 2.3293402194976807f, - 1.3461135625839233f, - -51.01222229003906f, - 83.78270721435547f, - 115.69560241699219f, - -38.796875f, - 40.63555145263672f, - -11.338504791259766f, - 165.01864624023438f, - 128.85240173339844f, - 
-145.48365783691406f, - -31.586021423339844f, - 8.898234367370605f, - -38.66812515258789f, - 304.38134765625f, - -17.463777542114258f, - -29.357486724853516f, - 2.3181469440460205f, - 78.80506896972656f, - 109.3966064453125f, - -38.00449752807617f, - 211.4761962890625f, - 18.49785041809082f, - 92.4312744140625f, - 52.957489013671875f, - -114.36100769042969f, - -193.0952606201172f, - 92.93617248535156f, - -2.9809720516204834f, - 17.200666427612305f, - 87.44407653808594f, - -49.91697692871094f, - -45.632022857666016f, - 264.6940612792969f, - 149.47315979003906f, - 117.99211883544922f, - -73.39962005615234f, - 155.99913024902344f, - -2.365872621536255f, - 146.9036407470703f, - -154.1237030029297f, - 5.841152191162109f, - -179.5755615234375f, - 233.66294860839844f, - 11.12192440032959f, - -9.423892974853516f, - -22.0430965423584f, - 68.12129974365234f, - -43.38506317138672f, - 75.87794494628906f, - -53.93659591674805f, - -3.226472854614258f, - 135.43951416015625f, - -333.9949035644531f, - -141.47996520996094f, - 112.2854232788086f, - -183.3912353515625f, - -149.4689483642578f, - 209.281494140625f, - -15.698558807373047f, - 131.81045532226562f, - -83.71649932861328f, - 42.94691467285156f, - -61.98582077026367f, - -56.409210205078125f, - -31.598636627197266f, - 41.57673645019531f, - -20.41191864013672f, - 21.38064956665039f, - -27.18816375732422f, - -86.78993225097656f, - 107.39463806152344f, - 1.556180477142334f, - 75.82249450683594f, - -13.068035125732422f, - 22.255216598510742f, - -284.30084228515625f, - 19.261274337768555f, - 131.0461883544922f, - -133.93312072753906f, - 141.090087890625f, - -19.26228904724121f, - 89.3337173461914f, - -30.812822341918945f, - -65.74747467041016f, - 41.438026428222656f, - -77.89169311523438f, - -76.03744506835938f, - 117.61126708984375f, - -224.61460876464844f, - 187.72314453125f, - 138.2435760498047f, - -249.70970153808594f, - -81.5892105102539f, - -158.05593872070312f, - 62.37664031982422f, - -41.13130569458008f, - 
-14.999872207641602f, - 49.11982727050781f, - 312.8498229980469f, - 26.108434677124023f, - -115.0377426147461f, - -120.07964324951172f, - 28.61427116394043f, - 39.45253372192383f, - -82.32657623291016f, - -228.00531005859375f, - 163.6329803466797f, - 101.8342514038086f, - -28.19927406311035f, - -59.467063903808594f, - -48.61459732055664f, - -5.734185695648193f, - -62.037235260009766f, - 54.25400924682617f, - -138.86416625976562f, - -52.57071304321289f, - 28.12044334411621f, - 310.1101989746094f, - 45.119361877441406f, - 87.28075408935547f, - -10.705041885375977f, - -66.69117736816406f, - -0.46381324529647827f, - -33.68659591674805f, - -208.1666717529297f, - 16.515682220458984f, - -70.44041442871094f, - -75.04505920410156f, - 241.7930908203125f, - -39.15034484863281f, - -106.08187866210938f, - -139.19581604003906f, - 121.72572326660156f, - 7.031482219696045f, - -55.58714294433594f, - 121.48421478271484f, - -47.54794692993164f, - -46.38457489013672f, - 98.83296203613281f, - -197.46034240722656f, - -62.538578033447266f, - -20.493175506591797f, - -59.41530227661133f, - 39.36539077758789f, - -64.60511779785156f, - -0.8515464663505554f, - -92.49464416503906f, - -79.47791290283203f, - -29.38449478149414f, - 87.85887908935547f, - 152.54405212402344f, - 96.42279052734375f, - -114.98602294921875f, - -222.06919860839844f, - -83.46102142333984f, - 26.983272552490234f, - -44.08611297607422f, - 218.93521118164062f, - -74.8515853881836f, - -161.33145141601562f, - -227.23471069335938f, - 48.57564163208008f, - -105.96250915527344f, - -206.54971313476562f, - 98.10832214355469f, - -125.15953826904297f, - 119.8016586303711f, - 99.37368774414062f, - 79.8921890258789f, - -73.67032623291016f, - -44.51357650756836f, - -23.366985321044922f, - -8.54898738861084f, - -156.1937713623047f, - -293.222900390625f, - 194.49366760253906f, - -142.75254821777344f, - 3.9505903720855713f, - 86.56517028808594f, - 43.396114349365234f, - -103.13311767578125f, - -39.961490631103516f, - -26.91535186767578f, 
- 139.69610595703125f, - -88.43083953857422f, - -73.12378692626953f, - -14.680524826049805f, - -47.00327682495117f, - -113.6279296875f, - 85.4216079711914f, - -189.7412109375f, - 320.2787170410156f, - -2.6356499195098877f, - 53.53076171875f, - -33.93054962158203f, - -0.27021336555480957f, - -8.65054988861084f, - -13.951908111572266f, - -139.52134704589844f, - -66.98359680175781f, - 68.11567687988281f, - -225.55091857910156f, - 219.04132080078125f, - -99.8638916015625f, - 172.65960693359375f, - -107.35255432128906f, - -255.43182373046875f, - -102.2039794921875f, - -139.47637939453125f, - 26.18199348449707f, - 103.39767456054688f, - 30.30826187133789f, - -46.62534713745117f, - 250.52838134765625f, - 203.4808807373047f, - -42.22444152832031f, - -20.038286209106445f, - 190.86148071289062f, - 68.15234375f, - 42.235538482666016f, - -29.986183166503906f, - 83.13088989257812f, - 131.00238037109375f, - -72.70922088623047f, - 44.12586212158203f, - -23.398818969726562f, - -133.02188110351562f, - 17.0338134765625f, - 76.3660888671875f, - -41.137386322021484f, - -22.563138961791992f, - 113.41226196289062f, - -61.34068298339844f, - 161.10055541992188f, - 36.345985412597656f, - 37.33025360107422f, - 102.1360855102539f, - -31.668241500854492f, - 23.508710861206055f, - 41.72523498535156f, - 84.09330749511719f, - 11.673369407653809f, - 145.71868896484375f, - 94.37517547607422f, - -124.03079223632812f, - -93.1677017211914f, - 56.02840042114258f, - 104.00797271728516f, - -153.0567169189453f, - -29.876150131225586f, - 154.9228515625f, - -8.432499885559082f, - 45.505455017089844f, - 90.9654312133789f, - 23.51136016845703f, - -134.15127563476562f, - -96.28108215332031f, - 85.21797943115234f, - -79.01165008544922f, - -34.771522521972656f, - -64.6953125f, - 323.95269775390625f, - -28.56310272216797f, - -72.33696746826172f, - 225.65081787109375f, - -168.097412109375f, - -19.08945655822754f, - -33.46170425415039f, - 199.44422912597656f, - 319.9913635253906f, - -196.61436462402344f, - 
-3.5490829944610596f, - -32.589351654052734f, - 85.96110534667969f, - 97.52603149414062f, - -41.00606155395508f, - 146.2366485595703f, - -269.1156311035156f, - -15.457045555114746f, - -34.35690689086914f, - -117.60065460205078f, - -278.73382568359375f, - 271.447021484375f, - 228.63180541992188f, - -10.905858039855957f, - 127.27762603759766f, - -155.43377685546875f, - -73.31597900390625f, - 28.972776412963867f, - -38.188270568847656f, - -127.81822204589844f, - 118.17975616455078f, - -49.494728088378906f, - -151.9487762451172f, - 84.42776489257812f, - -58.65167999267578f, - -64.57534790039062f, - -192.86288452148438f, - -352.2412109375f, - 82.50288391113281f, - 70.57123565673828f, - 83.0187759399414f, - 28.126449584960938f, - -5.961607933044434f, - -17.430923461914062f, - 8.66455364227295f, - 126.03501892089844f, - -104.26994323730469f, - 112.5761489868164f, - -36.26558303833008f, - 45.606727600097656f, - 4.59487771987915f, - 20.76837730407715f, - -21.595640182495117f, - -1.6269980669021606f, - 193.50173950195312f, - 105.55724334716797f, - -50.26815414428711f, - -93.21055603027344f, - -132.32032775878906f, - -82.77812194824219f, - 129.45541381835938f, - -72.15545654296875f, - 241.36087036132812f, - -55.44118118286133f, - 37.212764739990234f, - -126.32389068603516f, - 103.30265045166016f, - 108.17375183105469f, - 15.69797134399414f, - -168.99253845214844f, - 33.36368179321289f, - 75.35752868652344f, - -31.316875457763672f, - 289.2848815917969f, - -58.30707550048828f, - 3.0277023315429688f, - 57.63889694213867f, - -56.961402893066406f, - -73.54338836669922f, - -152.86378479003906f, - 72.8802719116211f, - -78.78392028808594f, - 5.551248550415039f, - 102.1882095336914f, - -12.372392654418945f, - 35.12169647216797f, - 13.023275375366211f, - -32.699363708496094f, - 84.06828308105469f, - -118.74812316894531f, - -15.617579460144043f, - 36.95216369628906f, - -98.35597229003906f, - 2.6237080097198486f, - 144.97764587402344f, - -102.6312255859375f, - 74.2625503540039f, - 
-137.870361328125f, - 76.1920394897461f, - 216.6727294921875f, - -42.4135627746582f, - 123.97212982177734f, - 280.18499755859375f, - 94.32286071777344f, - -48.4082145690918f, - 33.86771774291992f, - 247.26234436035156f, - 79.92304992675781f, - -4.578495979309082f, - -112.900146484375f, - 47.7380485534668f, - -15.120208740234375f, - -52.205902099609375f, - -37.073184967041016f, - 94.64490509033203f, - 67.50112915039062f, - 109.04624938964844f, - -112.78507995605469f, - 230.94007873535156f, - -132.72340393066406f, - 245.00465393066406f, - -74.5984878540039f, - -55.08383560180664f, - 153.66915893554688f, - -199.5551300048828f, - -23.83885955810547f, - 14.643333435058594f, - 7.824918270111084f, - 50.7313117980957f, - -156.41860961914062f, - 40.06352996826172f, - 47.176631927490234f, - -165.0926513671875f, - -189.6812744140625f, - -129.0862274169922f, - -44.684425354003906f, - -329.1989440917969f, - 10.748456001281738f, - 156.52296447753906f, - 105.1468734741211f, - -5.233786106109619f, - 39.888763427734375f, - -52.88683319091797f, - -70.18294525146484f, - 50.95967483520508f, - 57.3264045715332f, - -174.33384704589844f, - 68.8270492553711f, - 25.070758819580078f, - -170.18702697753906f, - 92.45588684082031f, - -125.59127807617188f, - 58.35456466674805f, - -9.600170135498047f, - 113.51036071777344f, - 183.36505126953125f, - 22.675222396850586f, - 157.54107666015625f, - -115.71614837646484f, - -72.87358856201172f, - -160.70028686523438f, - -69.5691146850586f, - 150.5703582763672f, - 8.418537139892578f, - 88.89569854736328f, - 180.59140014648438f, - 63.89979553222656f, - -21.135499954223633f, - -207.58445739746094f, - -70.88751220703125f, - -94.33464050292969f, - -137.97003173828125f, - 111.3921127319336f, - 19.499685287475586f, - -312.1326599121094f, - 150.3218231201172f, - -165.3433837890625f, - -194.05770874023438f, - -83.79118347167969f, - -100.244873046875f, - 45.19735336303711f, - -92.67762756347656f, - 196.86610412597656f, - -0.8776497840881348f, - 
-207.4056854248047f, - 254.46568298339844f, - 46.04387664794922f, - 25.560640335083008f, - 81.80233001708984f, - -87.09173583984375f, - -51.44218444824219f, - -109.4305191040039f, - -87.77290344238281f, - -72.5025634765625f, - -71.03917694091797f, - 81.08611297607422f, - 36.95709991455078f, - 11.51680850982666f, - 182.03323364257812f, - -212.05760192871094f, - -31.009309768676758f, - -118.43470001220703f, - -195.50328063964844f, - 8.653175354003906f, - -49.39155197143555f, - -29.82746124267578f, - 115.71063995361328f, - 76.51995086669922f, - -112.41429901123047f, - -100.7438735961914f, - 85.10140991210938f, - -22.545175552368164f, - -69.88835906982422f, - 106.33526611328125f, - -154.49871826171875f, - -9.744109153747559f, - 68.9471435546875f, - -54.2943000793457f, - -54.66527557373047f, - -56.717220306396484f, - -95.66127014160156f, - -168.70849609375f, - -42.78828430175781f, - -65.67985534667969f, - -79.04722595214844f, - 80.04450225830078f, - -8.90677547454834f, - 91.45559692382812f, - 50.986045837402344f, - -95.42179870605469f, - -154.49038696289062f, - -240.11776733398438f, - -231.41041564941406f, - -122.27427673339844f, - 202.0795440673828f, - -169.81570434570312f, - 43.557979583740234f, - -70.55223083496094f, - 47.67177200317383f, - -218.0948944091797f, - 264.9474792480469f, - 28.309864044189453f, - 14.498138427734375f, - 325.55743408203125f, - -46.86637496948242f, - 178.78509521484375f, - -9.062460899353027f, - 231.8539581298828f, - -77.2197494506836f, - -20.854524612426758f, - 91.53836822509766f, - -186.99847412109375f, - -99.1507339477539f, - -157.4418182373047f, - 9.972236633300781f, - -6.28125f, - 99.97623443603516f, - -30.271888732910156f, - -201.14129638671875f, - 25.63542938232422f, - -109.6525650024414f, - 223.48802185058594f, - 116.15115356445312f, - -32.53921890258789f, - -24.171070098876953f, - 112.84081268310547f, - -109.77652740478516f, - -32.968360900878906f, - 126.21894073486328f, - -202.82025146484375f, - 100.95269012451172f, - 
209.97239685058594f, - 271.3077697753906f, - 73.13675689697266f, - -242.15115356445312f, - 36.10845947265625f, - -82.5064926147461f, - 56.885032653808594f, - -160.25302124023438f, - -10.426532745361328f, - 183.02537536621094f, - 60.179805755615234f, - 185.84054565429688f, - -173.71913146972656f, - -71.13534545898438f, - 31.757095336914062f, - 78.82966613769531f, - 125.65397644042969f, - -13.152907371520996f, - -214.11570739746094f, - 142.90155029296875f, - 70.62444305419922f, - -267.61810302734375f, - -1.0453978776931763f, - 203.69168090820312f, - 211.11415100097656f, - -16.614988327026367f, - -52.523948669433594f, - 26.819725036621094f, - -105.91644287109375f, - -58.762123107910156f, - 36.64875793457031f, - 94.743408203125f, - -171.47760009765625f, - -11.505159378051758f, - 184.693359375f, - -5.477065086364746f, - -1.5112241506576538f, - 146.1390380859375f, - -5.636251449584961f, - 93.68453216552734f, - -83.00010681152344f, - 135.9005584716797f, - -151.1685791015625f, - -36.205116271972656f, - 62.70370101928711f, - 18.818185806274414f, - -98.87806701660156f, - 92.42321014404297f, - 188.27024841308594f, - 44.4658317565918f, - -14.097587585449219f, - -143.35324096679688f, - 115.63300323486328f, - 36.08750534057617f, - -89.73224639892578f, - 23.46270179748535f, - -139.22267150878906f, - -1.9937586784362793f, - 46.304161071777344f, - -157.25872802734375f, - 163.53012084960938f, - -23.19097900390625f, - -27.195619583129883f, - 4.113766193389893f, - -198.32443237304688f, - -30.747196197509766f, - 33.31110382080078f, - -53.90256118774414f, - 1.8770688772201538f, - 297.3330078125f, - 103.519287109375f, - -100.82967376708984f, - 162.2014617919922f, - 51.338871002197266f, - -97.9660873413086f, - -133.7017822265625f, - -75.39482116699219f, - 141.11669921875f, - 28.595518112182617f, - -21.463674545288086f, - -202.76638793945312f, - -131.42169189453125f, - -40.25505447387695f, - -131.302734375f, - 53.51456832885742f, - -6.649378776550293f, - -107.54181671142578f, - 
-132.5585479736328f, - -38.88276290893555f, - 31.4736385345459f, - -71.8187255859375f, - 184.7156982421875f, - 83.46792602539062f, - -103.20989990234375f, - -25.910274505615234f, - 135.04061889648438f, - -8.511241912841797f, - 2.7380611896514893f, - -124.47846984863281f, - 129.15823364257812f, - 172.5139923095703f, - -6.357154846191406f, - -84.99467468261719f, - 82.92466735839844f, - -201.97467041015625f, - 104.57106018066406f, - 24.37415885925293f, - -152.82064819335938f, - 222.5098876953125f, - 94.37055969238281f, - -10.682762145996094f, - 6.171597003936768f, - -92.80384826660156f, - -231.78701782226562f, - 19.302297592163086f, - 184.43209838867188f, - 170.4208221435547f, - 110.23370361328125f, - -211.06044006347656f, - -85.98785400390625f, - -129.76553344726562f, - 26.75244140625f, - 98.64550018310547f, - -68.53064727783203f, - -64.21843719482422f, - 60.90932846069336f, - -58.2801399230957f, - -78.43678283691406f, - 46.193946838378906f, - -147.91542053222656f, - -69.74618530273438f, - -91.16255187988281f, - 30.660913467407227f, - 41.74235916137695f, - 122.64801788330078f, - -149.9815673828125f, - 10.667651176452637f, - -4.11725378036499f, - -49.88005065917969f, - 194.25961303710938f, - 10.81910514831543f, - 52.20478820800781f, - -194.96707153320312f, - 103.95787048339844f, - -56.2220458984375f, - 34.992122650146484f, - 24.17713737487793f, - -67.56669616699219f, - 98.42030334472656f, - 5.869609355926514f, - -185.66201782226562f, - -22.619779586791992f, - -99.78618621826172f, - -29.17544937133789f, - -2.734194040298462f, - 90.64649963378906f, - 137.74049377441406f, - -76.53794860839844f, - -33.14202117919922f, - 176.7542724609375f, - 114.79841613769531f, - -21.431236267089844f, - -126.58631896972656f, - -0.08337810635566711f, - 20.011159896850586f, - -205.6652069091797f, - -36.403812408447266f, - 106.2046127319336f, - -41.90856170654297f, - -27.340274810791016f, - 171.6712188720703f, - -83.71830749511719f, - -104.601318359375f, - 15.443634986877441f, - 
-325.96636962890625f, - -90.53739166259766f, - 194.4698944091797f, - -38.57292556762695f, - 87.65369415283203f, - -78.615478515625f, - -41.28123474121094f, - -80.71792602539062f, - -206.88754272460938f, - 168.9356231689453f, - -10.628336906433105f, - 203.1552276611328f, - 201.97047424316406f, - 16.276418685913086f, - -230.58273315429688f, - -72.89488983154297f, - 138.2285919189453f, - -58.731693267822266f, - -170.4008331298828f, - 44.836326599121094f, - 150.89585876464844f, - 57.03733825683594f, - -29.133432388305664f, - -55.0482177734375f, - -390.0728454589844f, - -33.58816146850586f, - -39.95915222167969f, - 26.748714447021484f, - -66.64178466796875f, - -23.946762084960938f, - 129.82733154296875f, - -37.920310974121094f, - 96.69943237304688f, - -111.68416595458984f, - 45.59977340698242f, - -53.79838180541992f, - 99.45844268798828f, - -146.32188415527344f, - 211.4451141357422f, - -143.8907470703125f, - 98.45091247558594f, - 154.1499481201172f, - -100.17235565185547f, - 22.179594039916992f, - 15.100101470947266f, - -194.14572143554688f, - 126.51385498046875f, - -98.1740951538086f, - 107.97393798828125f, - -61.86323165893555f, - 253.83168029785156f, - -8.708842277526855f, - 0.42567136883735657f, - 111.22328186035156f, - -97.56349182128906f, - 78.77286529541016f, - -99.5571060180664f, - -18.407638549804688f, - 4.615132808685303f, - 234.16668701171875f, - -119.48499298095703f, - 111.75059509277344f, - 84.89192962646484f, - 104.53251647949219f, - 65.72364807128906f, - -56.023189544677734f, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2.h b/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2.h deleted file mode 100644 index e6a660a0..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2.h +++ /dev/null @@ -1,11 +0,0 @@ -// See LICENSE for license details. 
- -#ifndef __FFT2_H -#define __FFT2_H - -#include - -extern void fft2(float[], float[], const float[], const float[], size_t, - size_t); - -#endif /* __FFT2_H */ diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2_gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2_gendata.py deleted file mode 100644 index 7eb4cb8f..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2_gendata.py +++ /dev/null @@ -1,71 +0,0 @@ -#!/usr/bin/env python3 - -import numpy -import argparse - -parser = argparse.ArgumentParser(description="Generate fft2 dataset") -parser.add_argument( - "log2size", metavar="M", type=int, nargs="?", default=6, help="log2(FFT size)" -) -args = parser.parse_args() - -M = args.log2size -N = 1 << M - -dtype = numpy.float32 -info = numpy.finfo(dtype) -nmant = 8 # Limit precision to avoid rounding errors -maxmant = 1 << nmant -minexp = 1 - nmant # info.minexp / 2 -maxexp = -3 # (info.maxexp / 2) - nmant - - -# Generate floating-point values with exact mantissa and exponent -def randf(n): - return numpy.ldexp( - numpy.random.randint(maxmant, size=n), - numpy.random.randint(minexp, maxexp, size=n), - ) - - -Xr = randf(N).astype(dtype) -Xi = randf(N).astype(dtype) -X = Xr.astype(numpy.complex128 if dtype == numpy.float64 else numpy.complex64) -X.imag = Xi -Y = numpy.fft.fft(X) - -# Precompute "long weight vector" (Baily 1987), which stores twiddle -# factors for each stage separately to enable unit-stride access -# -# for stage p = 1 to M -# for k = 0 to N/(2^p) - 1 -# A = k * 2^{p-1} -# W = e^{-j 2\pi A / N} = e^{-j \pi k 2^p / N} -# -# Omit final two stages (i = 1, 0) with trivial twiddle factor -# components {-1, 0, 1} -angles = [] -for i in range(M - 1, -1, -1): - size = 1 << i # N / 2^p - for k in range(0, size): - angles.append((-k / size) * numpy.pi) - -Wr = numpy.cos(angles).astype(dtype) -Wi = numpy.sin(angles).astype(dtype) - - -def print_array(name, data, data_size="DATA_SIZE", fold=10): - print("float {}[{}] = {{".format(name, 
data_size)) - for i in range(0, len(data), fold): - print(" ", ", ".join("{}f".format(x) for x in data[i : i + fold]), ",", sep="") - print("};") - - -print("#define LOG2_DATA_SIZE {}".format(M)) -print("#define DATA_SIZE {}".format(N)) -print_array("input_Xr", Xr) -print_array("input_Xi", Xi) -print_array("input_Wr", Wr, "DATA_SIZE-1") -print_array("input_Wi", Wi, "DATA_SIZE-1") -print_array("verify_Xr", Y.real.astype(dtype)) -print_array("verify_Xi", Y.imag.astype(dtype)) diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2_main.c b/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2_main.c deleted file mode 100644 index 1567c577..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fft/fft2_main.c +++ /dev/null @@ -1,54 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// fft2 benchmark -//-------------------------------------------------------------------------- -// - -#include "fft2.h" -#include "util.h" - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -int main(int argc, char *argv[]) { -#if PREALLOCATE - for (size_t i = 0; i < DATA_SIZE - 1; i++) { - volatile float tmp; - tmp = input_Xr[i]; - tmp = input_Xi[i]; - tmp = input_Wr[i]; - tmp = input_Wi[i]; - } -#endif - - // Do the FFT - setStats(1); - fft2(input_Xr, input_Xi, input_Wr, input_Wi, DATA_SIZE, LOG2_DATA_SIZE); - setStats(0); - -#define VERIFY -#ifdef VERIFY -#define FFT_MAX_ERROR (10e-8f) - // Check the results - { - size_t i; - for (i = 0; i < DATA_SIZE; i++) { - float rdiff, idiff, err; - rdiff = input_Xr[i] - verify_Xr[i]; - idiff = input_Xi[i] - verify_Xi[i]; - - err = (rdiff * rdiff) + (idiff * idiff); - if (err > FFT_MAX_ERROR) { - return (i + 1); - } - } - } -#endif - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-fft/vec-fft2.c 
b/bb-tests/workloads/src/CTest/rvv/vec-fft/vec-fft2.c deleted file mode 100644 index b93aeae3..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-fft/vec-fft2.c +++ /dev/null @@ -1,225 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// Vectorized decimation-in-frequency radix-2 FFT -//-------------------------------------------------------------------------- -#include "fft2.h" - -void fft2(float Xr[], float Xi[], const float Wr[], const float Wi[], size_t N, - size_t M) { - { - size_t N1, N2; - float *end; - - end = Xr + N; - - for (N1 = N; N1 > 4;) { - float *Xr0; - float *Xi0; - float *Xr1; - float *Xi1; - const float *wr; - const float *wi; - - N2 = N1 / 2; - - Xr0 = Xr; // Lower half - Xi0 = Xi; - Xr1 = Xr + N2; // Upper half - Xi1 = Xi + N2; - - // Iterate over butterfly groups - do { - size_t n; - - n = N2; - wr = Wr; - wi = Wi; - // Iterate over butterflies in group - do { - size_t vl; - __asm__ __volatile__("vsetvli %0, %1, e32, m4, ta, ma" - "\n\t" - : "=r"(vl) - : "r"(n)); - - __asm__ __volatile__("vle32.v v8, %0" - "\n\t" // ar - "vle32.v v12, %1" - "\n\t" // br - "vle32.v v16, %2" - "\n\t" // ai - "vle32.v v20, %3" - "\n\t" // bi - - "vfsub.vv v24, v8, v12" - "\n\t" // ar - br - "vle32.v v4, %5" - "\n\t" // Wi - "vfsub.vv v28, v16, v20" - "\n\t" // ai - bi - "vle32.v v0, %4" - "\n\t" // Wr - "vfadd.vv v16, v16, v20" - "\n\t" // ai' = ai + bi - "vfmul.vv v20, v24, v4" - "\n\t" // Wi * (ar - br) - "vfmul.vv v4, v28, v4" - "\n\t" // Wi * (ai - bi) - "vfadd.vv v8, v8, v12" - "\n\t" // ar' = ar + br - - "vse32.v v16, %2" - "\n\t" // ai' - - "vfmadd.vv v28, v0, v20" - "\n\t" // bi' = Wr * (ai - bi) + Wi * (ar - br) - "vfmsub.vv v24, v0, v4" - "\n\t" // br' = Wr * (ar - br) - Wi * (ai - bi) - - "vse32.v v8, %0" - "\n\t" // ar' - "vse32.v v28, %3" - "\n\t" // bi' - "vse32.v v24, %1" - "\n\t" // br' - - : - : "A"(*Xr0), "A"(*Xr1), "A"(*Xi0), "A"(*Xi1), - "A"(*wr), "A"(*wi)); - - n -= 
vl; - wr += vl; - wi += vl; - Xr0 += vl; - Xi0 += vl; - Xr1 += vl; - Xi1 += vl; - - } while (n > 0); - - Xr0 = Xr1; - Xi0 = Xi1; - Xr1 += N2; - Xi1 += N2; - - } while (Xr1 < end); - - Wr = wr; - Wi = wi; - N1 = N2; - } - } - - { - float *xr; - float *xi; - size_t n; - - /* Stage M-2 */ - xr = Xr; - xi = Xi; - n = N / 4; - do { - size_t vl; - __asm__ __volatile__("vsetvli %0, %1, e32, m2, ta, ma" - "\n\t" - : "=r"(vl) - : "r"(n)); - - __asm__ __volatile__("vlseg4e32.v v0, %0" - "\n\t" - "vlseg4e32.v v8, %1" - "\n\t" - - "vfadd.vv v16, v0, v4" - "\n\t" // xr[0] + xr[2] - "vfadd.vv v18, v2, v6" - "\n\t" // xr[1] + xr[3] - "vfsub.vv v20, v0, v4" - "\n\t" // xr[0] - xr[2] - "vfsub.vv v22, v10, v14" - "\n\t" // xi[1] - xi[3] - - "vfsub.vv v30, v6, v2" - "\n\t" // xr[3] - xr[1] - "vfadd.vv v24, v8, v12" - "\n\t" // xi[0] + xi[2] - "vfadd.vv v26, v10, v14" - "\n\t" // xi[1] + xi[3] - "vfsub.vv v28, v8, v12" - "\n\t" // xi[0] - xi[2] - - "vsseg4e32.v v16, %0" - "\n\t" - "vsseg4e32.v v24, %1" - "\n\t" - : - : "A"(*xr), "A"(*xi)); - - n -= vl; - xr += 4 * vl; - xi += 4 * vl; - } while (n > 0); - - /* Stage M-1 */ - xr = Xr; - xi = Xi; - n = N / 2; - do { - size_t vl; - __asm__ __volatile__("vsetvli %0, %1, e32, m4, ta, ma" - "\n\t" - : "=r"(vl) - : "r"(n)); - - __asm__ __volatile__("vlseg2e32.v v0, %0" - "\n\t" - "vlseg2e32.v v8, %1" - "\n\t" - - "vfadd.vv v16, v0, v4" - "\n\t" // xr[0] + xr[1] - "vfsub.vv v20, v0, v4" - "\n\t" // xr[0] - xr[1] - "vfadd.vv v24, v8, v12" - "\n\t" // xi[0] + xi[1] - "vfsub.vv v28, v8, v12" - "\n\t" // xi[0] - xi[1] - - "vsseg2e32.v v16, %0" - "\n\t" - "vsseg2e32.v v24, %1" - "\n\t" - : - : "A"(*xr), "A"(*xi)); - - n -= vl; - xr += 2 * vl; - xi += 2 * vl; - } while (n > 0); - } - - /* Bit-reversal unscrambler */ - { - size_t i, j, b; - size_t N1, N2; - N1 = N - 1; - N2 = N >> 1; - for (i = 0, j = 0; i < N1; i++) { - if (i < j) { - float z; - z = Xr[j]; - Xr[j] = Xr[i]; - Xr[i] = z; - - z = Xi[j]; - Xi[j] = Xi[i]; - Xi[i] = z; - } - b = ~i & (i + 
1); - b = N2 / b; - j ^= N1 & ~(b - 1); - } - } -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/gen_data.py deleted file mode 100644 index 9f43f697..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/gen_data.py +++ /dev/null @@ -1,127 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# arg1: image size, arg2: filter size - -import numpy as np -import sys - - -def convolve2D(kernel, image, padding): - # Default stride - strides = 1 - - # Gather Shapes of Kernel + Image + Padding - xKernShape = kernel.shape[0] - yKernShape = kernel.shape[1] - xImgShape = image.shape[0] - yImgShape = image.shape[1] - - # Shape of Output Convolution - xOutput = xImgShape - xKernShape + 1 - yOutput = yImgShape - yKernShape + 1 - output = np.zeros((xOutput, yOutput)) - - # Iterate through image - for y in range(image.shape[1]): - # Exit Convolution - if y > image.shape[1] - yKernShape: - break - # Only Convolve if y has gone down by the specified Strides - if y % strides == 0: - for x in range(image.shape[0]): - # Go to next row once kernel is out of bounds - if x > image.shape[0] - xKernShape: - break - try: - # Only Convolve if x has moved by the specified Strides - if x % strides == 0: - output[x, y] = ( - kernel * image[x : x + xKernShape, y : y + yKernShape] - ).sum() - except 
Exception: - break - - return output - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# Define the filter size and the matrix dimension (max, for now, is 128 64-bit elements) -if len(sys.argv) > 1: - matrix_width = int(sys.argv[1]) - assert ( - matrix_width <= 128 - ), "The width of the image cannot be greater than 128 64-bit \ - elements. If this is not enough, modify the algorithm." - F = int(sys.argv[2]) - # Filter size must be odd - assert F % 2 == 1, "The filter size must be an odd integer number" -else: - matrix_width = 64 - F = 3 - -dtype = np.int64 -MIN_DTYPE = -(2**20) -MAX_DTYPE = +(2**20) - -# Input image. Take a square image -M = matrix_width -N = matrix_width -padding = int(F / 2) -M_pad = M + 2 * padding -N_pad = N + 2 * padding -assert ( - M % 4 == 0 -), "Output image dimension must be divisible by 4, pad the input image accordingly" -assert ( - N % 4 == 0 -), "Output image dimension must be divisible by 4, pad the input image accordingly" - -# Generate a random int64 input padded image -image = np.random.randint(MIN_DTYPE, MAX_DTYPE, M_pad * N_pad, dtype).reshape( - M_pad, N_pad -) - -# Generate a random int64 filter -gen_filter = np.random.randint(MIN_DTYPE, MAX_DTYPE, F * F, dtype).reshape(F, F) - -# Create the empty o matrix -empty_o = np.zeros((M, N)).astype(np.int64) - -# Calculate the output matrix -result = np.around(convolve2D(gen_filter, image, padding)).astype(np.int64) - -# Print information on file -print('.section .data,"aw",@progbits') -emit("M", np.array(M, dtype=np.uint64)) -emit("N", np.array(N, dtype=np.uint64)) -emit("F", np.array(F, dtype=np.uint64)) -emit("i", image, "NR_LANES*4") -emit("f", gen_filter, "NR_LANES*4") -emit("o", empty_o, "NR_LANES*4") -emit("golden_o", result, "NR_LANES*4") diff --git 
a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d.h b/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d.h deleted file mode 100644 index 1234e3ff..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d.h +++ /dev/null @@ -1,47 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#ifndef ICONV2D_H -#define ICONV2D_H - -#include - -void iconv2d_3x3(int64_t *o, int64_t *i, int64_t *f, int64_t R, int64_t C, - int64_t F); -void iconv2d_vec_4xC_slice_init_3x3(int64_t *o, int64_t C); -void iconv2d_vec_4xC_slice_preload_3x3(int64_t *i, int64_t C, int64_t F); -void iconv2d_vec_4xC_slice_move_3x3(int64_t C, int64_t F); -void iconv2d_vec_4xC_3x3(int64_t *o, int64_t *i, int64_t *f, int64_t C, - int64_t F); - -void iconv2d_5x5(int64_t *o, int64_t *i, int64_t *f, int64_t R, int64_t C, - int64_t F); -void iconv2d_vec_4xC_slice_init_5x5(int64_t *o, int64_t C); -void iconv2d_vec_4xC_slice_preload_5x5(int64_t *i, int64_t C, int64_t F); -void iconv2d_vec_4xC_slice_move_5x5(int64_t C, int64_t F); -void iconv2d_vec_4xC_5x5(int64_t *o, int64_t *i, int64_t *f, int64_t C, - int64_t F); - -void iconv2d_7x7(int64_t *o, int64_t *i, int64_t *f, int64_t M, int64_t N, - int64_t F); -void iconv2d_7x7_block(int64_t *o, int64_t *i, int64_t *f, int64_t R, int64_t C, - int64_t n_, int64_t F); - -#define MIN(a, b) ((a) < (b) ? 
(a) : (b)) - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_3x3.c b/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_3x3.c deleted file mode 100644 index f8176fee..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_3x3.c +++ /dev/null @@ -1,321 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#include "iconv2d.h" -#include - -#define MIN(a, b) ((a) < (b) ? 
(a) : (b)) - -void iconv2d_3x3(int64_t *o, int64_t *i, int64_t *f, int64_t R, int64_t C, - int64_t F) { - // We work on 4 rows of the output matrix at once - int64_t block_size_o = 4; - // We work on block_size_o + F - 1 rows of the input matrix at once - - // First iteration round, r = 0 - int64_t *i_ = i; - int64_t *o_ = o; - - // Preload the first two input rows -> This is not needed in the other rounds - iconv2d_vec_4xC_slice_preload_3x3(i_, C, F); - // The first F-1 rows have already been loaded by - // iconv2d_vec_4xC_slice_preload_3x3() - int64_t *i__ = i_ + (F - 1) * (C + F - 1); - iconv2d_vec_4xC_3x3(o_, i__, f, C, F); - // Re-use some of the already-loaded input rows - iconv2d_vec_4xC_slice_move_3x3(C, F); - - i_ = i + block_size_o * (C + F - 1); - i__ = i_ + (F - 1) * (C + F - 1); - - int64_t ldi = (C + F - 1) << 3; - int64_t ldf = F << 3; - - // Temporary variables - int64_t t0, t1, t2; - // Helper variables - int64_t *f_; - f_ = f; - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t0) : "r"(ldf)); - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t1) : "r"(ldf)); - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t2)); - - // Iterate over the output rows - for (int64_t r = block_size_o; r < R; r += block_size_o) { - - // The first F-1 rows have already been loaded by - // iconv2d_vec_4xC_slice_init() - - int64_t t3, t4, t5; - - // Fetch C + F - 1 elements (padding included) - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - f_ = f; - - // Fetch the first column of the filter, and start calculating its - // contribution on the four output rows (v0, v2, v4, v6) - - // Fetch 4 + F - 1 - 2 rows of the input matrix - // Compute on C + F - 1 elements, instead of C elements, to cover the - // latency of the load instructions - asm volatile("vmv.v.v v8, v16"); - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i__) : "r"(ldi)); - asm volatile("vmul.vx v0, v8, %0" ::"r"(t0)); - - asm volatile("vmv.v.v v10, 
v18"); - asm volatile("vmul.vx v2, v10, %0" ::"r"(t0)); - asm volatile("vle64.v v14, (%0); add %0, %0, %1" : "+&r"(i__) : "r"(ldi)); - asm volatile("vmacc.vx v0, %0, v10" ::"r"(t1)); - - asm volatile("vmacc.vx v2, %0, v12" ::"r"(t1)); - asm volatile("vle64.v v16, (%0); add %0, %0, %1" : "+&r"(i__) : "r"(ldi)); - asm volatile("vmacc.vx v0, %0, v12" ::"r"(t2)); - asm volatile("vslidedown.vi v20, v8, 1"); - asm volatile("vmul.vx v4, v12, %0" ::"r"(t0)); - - asm volatile("vle64.v v18, (%0); add %0, %0, %1" : "+&r"(i__) : "r"(ldi)); - - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C)); - - asm volatile("vmul.vx v6, v14, %0" ::"r"(t0)); - asm volatile("vslidedown.vi v22, v10, 1"); - asm volatile("vmacc.vx v4, %0, v14" ::"r"(t1)); - asm volatile("vmacc.vx v2, %0, v14" ::"r"(t2)); - asm volatile("vslidedown.vi v24, v12, 1"); - - asm volatile("vmacc.vx v6, %0, v16" ::"r"(t1)); - asm volatile("vmacc.vx v4, %0, v16" ::"r"(t2)); - - asm volatile("vslidedown.vi v26, v14, 1"); - - asm volatile("vmacc.vx v6, %0, v18" ::"r"(t2)); - - f_ = f + 1; - // Fetch the middle column of the filter, and start calculating its - // contributions on the output rows To do so, slide down the input rows by - // one - asm volatile("ld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&r"(t3) - : "r"(ldf)); - asm volatile("ld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&r"(t4) - : "r"(ldf)); - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t5)); - - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t3)); - - asm volatile("vmacc.vx v0, %0, v22" ::"r"(t4)); - asm volatile("vslidedown.vi v28, v16, 1"); - asm volatile("vmacc.vx v2, %0, v22" ::"r"(t3)); - - i_ = i + (r + block_size_o) * (C + F - 1); - asm volatile("vmacc.vx v0, %0, v24" ::"r"(t5)); - asm volatile("vslidedown.vi v30, v18, 1"); - asm volatile("vmacc.vx v2, %0, v24" ::"r"(t4)); - asm volatile("vmacc.vx v4, %0, v24" ::"r"(t3)); - asm volatile("vslidedown.vi v20, v8, 2"); - - asm volatile("vmacc.vx v2, %0, v26" ::"r"(t5)); - asm volatile("vmacc.vx 
v4, %0, v26" ::"r"(t4)); - asm volatile("vslidedown.vi v22, v10, 2"); - asm volatile("vmacc.vx v6, %0, v26" ::"r"(t3)); - i__ = i_ + (F - 1) * (C + F - 1); - - asm volatile("vmacc.vx v4, %0, v28" ::"r"(t5)); - f_ = f + 2; - asm volatile("ld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&r"(t3) - : "r"(ldf)); - asm volatile("vmacc.vx v6, %0, v28" ::"r"(t4)); - asm volatile("vslidedown.vi v24, v12, 2"); - - asm volatile("vmacc.vx v6, %0, v30" ::"r"(t5)); - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t3)); - asm volatile("vslidedown.vi v26, v14, 2"); - - // Repeat for the last filter column, and then store the output rows - asm volatile("ld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&r"(t4) - : "r"(ldf)); - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t5)); - - asm volatile("vmacc.vx v0, %0, v22" ::"r"(t4)); - o_ = o + r * C; - - // Compute on C elements - int64_t ldo = C << 3; - asm volatile("vmacc.vx v2, %0, v22" ::"r"(t3)); - asm volatile("vslidedown.vi v28, v16, 2"); - - asm volatile("vmacc.vx v0, %0, v24" ::"r"(t5)); - asm volatile("vmacc.vx v2, %0, v24" ::"r"(t4)); - asm volatile("vslidedown.vi v30, v18, 2"); - asm volatile("vse64.v v0, (%0); add %0, %0, %1" : "+&r"(o_) : "r"(ldo)); - asm volatile("vmacc.vx v4, %0, v24" ::"r"(t3)); - - asm volatile("vmacc.vx v2, %0, v26" ::"r"(t5)); - asm volatile("vse64.v v2, (%0); add %0, %0, %1" : "+&r"(o_) : "r"(ldo)); - asm volatile("vmacc.vx v4, %0, v26" ::"r"(t4)); - asm volatile("vmacc.vx v6, %0, v26" ::"r"(t3)); - - asm volatile("vmacc.vx v4, %0, v28" ::"r"(t5)); - asm volatile("vse64.v v4, (%0); add %0, %0, %1" : "+&r"(o_) : "r"(ldo)); - asm volatile("vmacc.vx v6, %0, v28" ::"r"(t4)); - - asm volatile("vmacc.vx v6, %0, v30" ::"r"(t5)); - asm volatile("vse64.v v6, (%0);" : "+r"(o_)); - } -} - -// Load 4 rows of the output matrix -void iconv2d_vec_4xC_slice_preload_3x3(int64_t *i, int64_t C, int64_t F) { - // Helper variables - int64_t ldi = (C + F - 1) << 3; - - // Set the vector configuration - asm volatile("vsetvli zero, 
%0, e64, m2, ta, ma" ::"r"(C + F - 1)); - // Fetch the first floor(F/2) + 1 input rows - asm volatile("vle64.v v8, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" : "+r"(i)); -} - -// Calculate 4 output matrix rows -void iconv2d_vec_4xC_3x3(int64_t *o, int64_t *i, int64_t *f, int64_t C, - int64_t F) { - - // Temporary variables - int64_t t0, t1, t2; - - // Helper variables - int64_t ldo = C << 3; - int64_t ldi = (C + F - 1) << 3; - int64_t ldf = F << 3; - int64_t *f_; - - // Fetch C + F - 1 elements (padding included) - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - f_ = f; - // Fetch the first column of the filter, and start calculating its - // contribution on the four output rows (v0, v2, v4, v6) - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t0) : "r"(ldf)); - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t1) : "r"(ldf)); - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t2)); - - // Fetch 4 + F - 1 - 2 rows of the input matrix - // Compute on C + F - 1 elements, instead of C elements, to cover the latency - // of the load instructions - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vmul.vx v0, v8, %0" ::"r"(t0)); - - asm volatile("vmul.vx v2, v10, %0" ::"r"(t0)); - asm volatile("vle64.v v14, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vmacc.vx v0, %0, v10" ::"r"(t1)); - - asm volatile("vmacc.vx v2, %0, v12" ::"r"(t1)); - asm volatile("vle64.v v16, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vmacc.vx v0, %0, v12" ::"r"(t2)); - asm volatile("vslidedown.vi v20, v8, 1"); - asm volatile("vmul.vx v4, v12, %0" ::"r"(t0)); - - asm volatile("vle64.v v18, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C)); - - asm volatile("vmul.vx v6, v14, %0" ::"r"(t0)); - asm volatile("vmacc.vx v4, %0, v14" ::"r"(t1)); - asm 
volatile("vslidedown.vi v22, v10, 1"); - asm volatile("vmacc.vx v2, %0, v14" ::"r"(t2)); - - asm volatile("vmacc.vx v6, %0, v16" ::"r"(t1)); - asm volatile("vmacc.vx v4, %0, v16" ::"r"(t2)); - - asm volatile("vslidedown.vi v24, v12, 1"); - asm volatile("vmacc.vx v6, %0, v18" ::"r"(t2)); - - f_ = f + 1; - // Fetch the middle column of the filter, and start calculating its - // contributions on the output rows To do so, slide down the input rows by one - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t0) : "r"(ldf)); - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t1) : "r"(ldf)); - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t2)); - - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t0)); - - asm volatile("vmacc.vx v0, %0, v22" ::"r"(t1)); - asm volatile("vslidedown.vi v26, v14, 1"); - asm volatile("vmacc.vx v2, %0, v22" ::"r"(t0)); - - asm volatile("vmacc.vx v0, %0, v24" ::"r"(t2)); - asm volatile("vmacc.vx v2, %0, v24" ::"r"(t1)); - asm volatile("vslidedown.vi v28, v16, 1"); - asm volatile("vmacc.vx v4, %0, v24" ::"r"(t0)); - - asm volatile("vmacc.vx v2, %0, v26" ::"r"(t2)); - asm volatile("vmacc.vx v4, %0, v26" ::"r"(t1)); - asm volatile("vslidedown.vi v30, v18, 1"); - asm volatile("vmacc.vx v6, %0, v26" ::"r"(t0)); - - asm volatile("vmacc.vx v4, %0, v28" ::"r"(t2)); - asm volatile("vslidedown.vi v20, v8, 2"); - asm volatile("vmacc.vx v6, %0, v28" ::"r"(t1)); - - asm volatile("vmacc.vx v6, %0, v30" ::"r"(t2)); - asm volatile("vslidedown.vi v22, v10, 2"); - - f_ = f + 2; - // Repeat for the last filter column, and then store the output rows - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t0) : "r"(ldf)); - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t1) : "r"(ldf)); - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t2)); - - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t0)); - - asm volatile("vmacc.vx v0, %0, v22" ::"r"(t1)); - asm volatile("vslidedown.vi v24, v12, 2"); - asm volatile("vmacc.vx v2, %0, v22" 
::"r"(t0)); - - // Compute on C elements - - asm volatile("vmacc.vx v0, %0, v24" ::"r"(t2)); - asm volatile("vse64.v v0, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vslidedown.vi v26, v14, 2"); - asm volatile("vmacc.vx v2, %0, v24" ::"r"(t1)); - asm volatile("vmacc.vx v4, %0, v24" ::"r"(t0)); - - asm volatile("vmacc.vx v2, %0, v26" ::"r"(t2)); - asm volatile("vse64.v v2, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vslidedown.vi v28, v16, 2"); - asm volatile("vmacc.vx v4, %0, v26" ::"r"(t1)); - asm volatile("vmacc.vx v6, %0, v26" ::"r"(t0)); - - asm volatile("vmacc.vx v4, %0, v28" ::"r"(t2)); - asm volatile("vslidedown.vi v30, v18, 2"); - asm volatile("vse64.v v4, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v6, %0, v28" ::"r"(t1)); - - asm volatile("vmacc.vx v6, %0, v30" ::"r"(t2)); - asm volatile("vse64.v v6, (%0);" : "+r"(o)); -} - -void iconv2d_vec_4xC_slice_move_3x3(int64_t C, int64_t F) { - // Move C+F-1 elements - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - // Move the last floor(F/2) + 1 input rows - asm volatile("vmv.v.v v8, v16"); - asm volatile("vmv.v.v v10, v18"); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_5x5.c b/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_5x5.c deleted file mode 100644 index 07735a8b..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_5x5.c +++ /dev/null @@ -1,216 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#include "iconv2d.h" -#include - -#define MIN(a, b) ((a) < (b) ? (a) : (b)) - -void iconv2d_5x5(int64_t *o, int64_t *i, int64_t *f, int64_t R, int64_t C, - int64_t F) { - // We work on 2 rows of the output matrix at once - int64_t block_size_o = 2; - // We work on block_size_o + F - 1 rows of the input matrix at once - - // First iteration round, r = 0 - int64_t *i_ = i; - int64_t *o_ = o; - - // For simplicity, compute over the padding rows as well - iconv2d_vec_4xC_slice_init_5x5(o_, C); - // Preload the first two input rows -> This is not needed in the other rounds - iconv2d_vec_4xC_slice_preload_5x5(i_, C, F); - // The first (floor(F/2) + 1 = 2) rows have already been loaded by - // iconv2d_vec_4xC_slice_init() - int64_t *i__ = i_ + (F - 1) * (C + F - 1); - iconv2d_vec_4xC_5x5(o_, i__, f, C, F); - // Re-use some of the already-loaded input rows - iconv2d_vec_4xC_slice_move_5x5(C, F); - - // Iterate over the output rows - for (int64_t r = block_size_o; r < R; r += block_size_o) { - i_ = i + r * (C + F - 1); - o_ = o + r * C; - - // For simplicity, compute over the padding rows as well - iconv2d_vec_4xC_slice_init_5x5(o_, C); - // The first F-1 rows have already been loaded by - // iconv2d_vec_4xC_slice_init() - i__ = i_ + (F - 1) * (C + F - 1); - iconv2d_vec_4xC_5x5(o_, i__, f, C, F); - // Re-use some of the already-loaded input rows - iconv2d_vec_4xC_slice_move_5x5(C, F); - } -} - -// Load 4 rows of the output matrix -void iconv2d_vec_4xC_slice_init_5x5(int64_t *o, int64_t C) { - // Helper variables - int64_t ldo 
= C << 3; - - // Set the vector configuration - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C)); - // Fetch 2 output rows - asm volatile("vmv.v.i v0, 0; add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmv.v.i v2, 0;" : "+r"(o)); -} - -// Load 4 rows of the output matrix -void iconv2d_vec_4xC_slice_preload_5x5(int64_t *i, int64_t C, int64_t F) { - // Helper variables - int64_t ldi = (C + F - 1) << 3; - - // Set the vector configuration - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - // Fetch the first F-1 = 4 input rows - asm volatile("vle64.v v4, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vle64.v v6, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vle64.v v8, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" : "+r"(i)); -} - -// Calculate 4 output matrix rows -void iconv2d_vec_4xC_5x5(int64_t *o, int64_t *i, int64_t *f, int64_t C, - int64_t F) { - - // Temporary variables (one filter column) - int64_t t0, t1, t2, t3, t4; - int64_t slamt; - - // Helper variables - int64_t ldo = C << 3; - int64_t ldi = (C + F - 1) << 3; - int64_t ldf = F << 3; - int64_t *f_; - - // Compute on C elements - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - // Fetch other 2 rows of the input matrix - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - asm volatile("vle64.v v14, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi)); - - // Compute on C elements - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C)); - f_ = f; - // Fetch the first column of the filter, and start calculating its - // contribution on the two output rows (v0, v2) - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t0) : "r"(ldf)); - asm volatile("vmacc.vx v0, %0, v4" ::"r"(t0)); - asm volatile("vmacc.vx v2, %0, v6" ::"r"(t0)); - - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t1) : "r"(ldf)); - asm volatile("vmacc.vx v0, 
%0, v6" ::"r"(t1)); - asm volatile("vmacc.vx v2, %0, v8" ::"r"(t1)); - - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t2) : "r"(ldf)); - asm volatile("vmacc.vx v0, %0, v8" ::"r"(t2)); - asm volatile("vmacc.vx v2, %0, v10" ::"r"(t2)); - - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t3) : "r"(ldf)); - asm volatile("vmacc.vx v0, %0, v10" ::"r"(t3)); - asm volatile("vmacc.vx v2, %0, v12" ::"r"(t3)); - - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t4)); - asm volatile("vmacc.vx v0, %0, v12" ::"r"(t4)); - asm volatile("vmacc.vx v2, %0, v14" ::"r"(t4)); - - for (int64_t idx = 1; idx < F - 1; ++idx) { - // Adjust filter mtx pointer and slide-amount - f_ = f + idx; - slamt = idx; - // Fetch the other columns of the filter (except for the last one), and - // start calculating their contributions on the two output rows (v0, v2) To - // do so, at each iteration slide down the input rows by one - asm volatile("ld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&r"(t0) - : "r"(ldf)); - asm volatile("vslidedown.vx v16, v4, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v16" ::"r"(t0)); - - asm volatile("ld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&r"(t1) - : "r"(ldf)); - asm volatile("vslidedown.vx v18, v6, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v18" ::"r"(t1)); - asm volatile("vmacc.vx v2, %0, v18" ::"r"(t0)); - - asm volatile("ld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&r"(t2) - : "r"(ldf)); - asm volatile("vslidedown.vx v20, v8, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t2)); - asm volatile("vmacc.vx v2, %0, v20" ::"r"(t1)); - - asm volatile("ld %1, (%0); add %0, %0, %2" - : "+&r"(f_), "=&r"(t3) - : "r"(ldf)); - asm volatile("vslidedown.vx v22, v10, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v22" ::"r"(t3)); - asm volatile("vmacc.vx v2, %0, v22" ::"r"(t2)); - - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t4)); - asm volatile("vslidedown.vx v24, v12, %0" ::"r"(slamt)); - asm 
volatile("vmacc.vx v0, %0, v24" ::"r"(t4)); - asm volatile("vmacc.vx v2, %0, v24" ::"r"(t3)); - - asm volatile("vslidedown.vx v26, v14, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v2, %0, v26" ::"r"(t4)); - } - - f_ = f + (F - 1); - slamt = (F - 1); - // Repeat for the last filter column, and then store the output rows - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t0) : "r"(ldf)); - asm volatile("vslidedown.vx v16, v4, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v16" ::"r"(t0)); - - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t1) : "r"(ldf)); - asm volatile("vslidedown.vx v18, v6, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v18" ::"r"(t1)); - asm volatile("vmacc.vx v2, %0, v18" ::"r"(t0)); - - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t2) : "r"(ldf)); - asm volatile("vslidedown.vx v20, v8, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t2)); - asm volatile("vmacc.vx v2, %0, v20" ::"r"(t1)); - - asm volatile("ld %1, (%0); add %0, %0, %2" : "+&r"(f_), "=&r"(t3) : "r"(ldf)); - asm volatile("vslidedown.vx v22, v10, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v22" ::"r"(t3)); - asm volatile("vmacc.vx v2, %0, v22" ::"r"(t2)); - - asm volatile("ld %1, (%0);" : "+&r"(f_), "=&r"(t4)); - asm volatile("vslidedown.vx v24, v12, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v0, %0, v24" ::"r"(t4)); - asm volatile("vse64.v v0, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v2, %0, v24" ::"r"(t3)); - - asm volatile("vslidedown.vx v26, v14, %0" ::"r"(slamt)); - asm volatile("vmacc.vx v2, %0, v26" ::"r"(t4)); - asm volatile("vse64.v v2, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); -} - -void iconv2d_vec_4xC_slice_move_5x5(int64_t C, int64_t F) { - // Move C+F-1 elements - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(C + F - 1)); - // Move the last floor(F/2) + 1 input rows - asm volatile("vmv.v.v v4, v8"); - asm volatile("vmv.v.v v6, v10"); - asm 
volatile("vmv.v.v v8, v12"); - asm volatile("vmv.v.v v10, v14"); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_7x7.c b/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_7x7.c deleted file mode 100644 index 019b6540..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/iconv2d_7x7.c +++ /dev/null @@ -1,599 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -/* - Optimized convolution for Ara - The code is long only because of: - 1) Special cases related to the first/last 7 rows - 2) Unrolling of the loops to hide the latency of the moves, slides, mem ops - - At the end of the file, you can find the not-unrolled main loop in a comment, - without the edge-code. - - Algorithm: - a) Load the next input row - b) Calculate its contributions to the F = 7 output rows using one column of - the filter c) Slide down the input row by 1, injecting the next input scalar - element in the tail d) Repeat from b), taking the next colum of the filter, - until the last column is fetched e) Store the first output row, the one that - is complete f) Move all the output rows up by one register, to restore the - initial condition g) Repeat from a) - - Every time a new input row is loaded, a new output row is created. - - The first 6 rows and the last 6 rows do not follow this pattern, thus we wrote - dedicated code. 
Because of the unrolling, we counted for this the first and - last 7 rows, instead of 6 - - This algorithm helps in minimizing the data dependencies, as every input rows - is used To calculate 7 different output rows. -*/ - -#include "iconv2d.h" - -void iconv2d_7x7(int64_t *o, int64_t *i, int64_t *f, int64_t M, int64_t N, - int64_t F) { - - unsigned long int block_size_n; - - // Set the vector configuration - asm volatile("vsetvli %0, %1, e64, m2, ta, ma" : "=r"(block_size_n) : "r"(N)); - - // Slice the matrix into a manageable number of columns n_ - for (unsigned long int n = 0; n < N; n += block_size_n) { - // Set the vector length - const unsigned long int n_ = MIN(N - n, block_size_n); - - // Find pointers to the submatrices - const int64_t *i_ = i + n; - int64_t *o_ = o + n; - - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(n_)); - - iconv2d_7x7_block(o_, i_, f, M, N, n_, F); - } -} - -void iconv2d_7x7_block(int64_t *o, int64_t *i, int64_t *f, int64_t R, int64_t C, - int64_t n_, int64_t F) { - - // Helper variables - int64_t ldo = C << 3; - int64_t ldi_pad = (C + F - 1) << 3; - - int64_t *i_ = i; - - int64_t t6, t13, t20, t27, t34, t41, t48; - - int64_t *i_slide_ptr_0; - int64_t *i_slide_ptr_1; - int64_t *i_slide_ptr_2; - int64_t *i_slide_ptr_3; - - // Buffer some of the filter coefficients not to lose efficiency after a - // vector store (CVA6 cannot issue memory operations if there is a pending - // store!) 
- t6 = f[6]; - t13 = f[13]; - t20 = f[20]; - t27 = f[27]; - t34 = f[34]; - t41 = f[41]; - t48 = f[48]; - - // Point to the scalar elements to insert during a slide - i_slide_ptr_0 = i + n_ + 0 * (C + F - 1); - i_slide_ptr_1 = i + n_ + 1 * (C + F - 1); - i_slide_ptr_2 = i + n_ + 2 * (C + F - 1); - i_slide_ptr_3 = i + n_ + 3 * (C + F - 1); - - //////////////// - // Row 0 -> 3 // - //////////////// - - // Load one input row - asm volatile("vle64.v v0, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vle64.v v4, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vle64.v v8, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - - // Main kernel, unrolled by 2 - for (int k = 0; k < F / 2; ++k) { - if (k == 0) - asm volatile("vmul.vx v16, v0, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v16, %0, v0" ::"r"(f[0 + (2 * k)])); - if (k == 0) - asm volatile("vmul.vx v18, v4, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v18, %0, v4" ::"r"(f[0 + (2 * k)])); - asm volatile("vslide1down.vx v2, v0, %0" ::"r"(*i_slide_ptr_0++)); - asm volatile("vmacc.vx v16, %0, v4" ::"r"(f[7 + (2 * k)])); - if (k == 0) - asm volatile("vmul.vx v22, v12, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v22, %0, v12" ::"r"(f[0 + (2 * k)])); - asm volatile("vslide1down.vx v6, v4, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v18, %0, v8" ::"r"(f[7 + (2 * k)])); - asm volatile("vmacc.vx v16, %0, v8" ::"r"(f[14 + (2 * k)])); - asm volatile("vslide1down.vx v10, v8, %0" ::"r"(*i_slide_ptr_2++)); - if (k == 0) - asm volatile("vmul.vx v20, v8, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v20, %0, v8" ::"r"(f[0 + (2 * k)])); - asm volatile("vmacc.vx v18, %0, v12" ::"r"(f[14 + (2 * k)])); - asm volatile("vmacc.vx v16, %0, v12" ::"r"(f[21 + (2 * k)])); - asm volatile("vslide1down.vx v14, v12, %0" ::"r"(*i_slide_ptr_3++)); - asm 
volatile("vmacc.vx v20, %0, v12" ::"r"(f[7 + (2 * k)])); - - asm volatile("vmacc.vx v16, %0, v2" ::"r"(f[0 + (2 * k + 1)])); - asm volatile("vmacc.vx v18, %0, v6" ::"r"(f[0 + (2 * k + 1)])); - asm volatile("vslide1down.vx v0, v2, %0" ::"r"(*i_slide_ptr_0++)); - asm volatile("vmacc.vx v16, %0, v6" ::"r"(f[7 + (2 * k + 1)])); - asm volatile("vmacc.vx v18, %0, v10" ::"r"(f[7 + (2 * k + 1)])); - asm volatile("vmacc.vx v20, %0, v10" ::"r"(f[0 + (2 * k + 1)])); - asm volatile("vslide1down.vx v4, v6, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v16, %0, v10" ::"r"(f[14 + (2 * k + 1)])); - asm volatile("vmacc.vx v18, %0, v14" ::"r"(f[14 + (2 * k + 1)])); - asm volatile("vslide1down.vx v8, v10, %0" ::"r"(*i_slide_ptr_2++)); - asm volatile("vmacc.vx v22, %0, v14" ::"r"(f[0 + (2 * k + 1)])); - asm volatile("vmacc.vx v16, %0, v14" ::"r"(f[21 + (2 * k + 1)])); - asm volatile("vslide1down.vx v12, v14, %0" ::"r"(*i_slide_ptr_3++)); - asm volatile("vmacc.vx v20, %0, v14" ::"r"(f[7 + (2 * k + 1)])); - } - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i + n_ + 0 * (C + F - 1); - i_slide_ptr_1 = i + n_ + 1 * (C + F - 1); - i_slide_ptr_2 = i + n_ + 2 * (C + F - 1); - - // Main kernel, last iteration with filter coefficients reuse - // Start loading next rows, from 4 to 6 - asm volatile("vmacc.vx v16, %0, v0" ::"r"(t6)); - asm volatile("vle64.v v2, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vmacc.vx v18, %0, v4" ::"r"(t6)); - asm volatile("vmacc.vx v22, %0, v12" ::"r"(t6)); - asm volatile("vmacc.vx v16, %0, v4" ::"r"(t13)); - asm volatile("vle64.v v6, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vmacc.vx v18, %0, v8" ::"r"(t13)); - asm volatile("vmacc.vx v20, %0, v8" ::"r"(t6)); - asm volatile("vmacc.vx v16, %0, v8" ::"r"(t20)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vmacc.vx v18, %0, v12" ::"r"(t20)); - asm volatile("vmacc.vx v20, 
%0, v12" ::"r"(t13)); - asm volatile("vmacc.vx v16, %0, v12" ::"r"(t27)); - - //////////////// - // Row 4 -> 6 // - //////////////// - - // Main kernel, unrolled by 2 - for (int k = 0; k < F / 2; ++k) { - asm volatile("vmacc.vx v16, %0, v2" ::"r"(f[28 + (2 * k)])); - asm volatile("vmacc.vx v18, %0, v2" ::"r"(f[21 + (2 * k)])); - asm volatile("vmacc.vx v16, %0, v6" ::"r"(f[35 + (2 * k)])); - asm volatile("vmacc.vx v18, %0, v6" ::"r"(f[28 + (2 * k)])); - asm volatile("vmacc.vx v16, %0, v10" ::"r"(f[42 + (2 * k)])); - asm volatile("vslide1down.vx v0, v2, %0" ::"r"(*i_slide_ptr_0++)); - - asm volatile("vmacc.vx v18, %0, v10" ::"r"(f[35 + (2 * k)])); - asm volatile("vslide1down.vx v4, v6, %0" ::"r"(*i_slide_ptr_1++)); - - asm volatile("vmacc.vx v20, %0, v2" ::"r"(f[14 + (2 * k)])); - asm volatile("vmacc.vx v20, %0, v6" ::"r"(f[21 + (2 * k)])); - asm volatile("vmacc.vx v20, %0, v10" ::"r"(f[28 + (2 * k)])); - asm volatile("vslide1down.vx v8, v10, %0" ::"r"(*i_slide_ptr_2++)); - - asm volatile("vmacc.vx v22, %0, v2" ::"r"(f[7 + (2 * k)])); - asm volatile("vmacc.vx v22, %0, v6" ::"r"(f[14 + (2 * k)])); - asm volatile("vmacc.vx v22, %0, v10" ::"r"(f[21 + (2 * k)])); - - if (k == 0) - asm volatile("vmul.vx v24, v2, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v24, %0, v2" ::"r"(f[0 + (2 * k)])); - asm volatile("vmacc.vx v24, %0, v6" ::"r"(f[7 + (2 * k)])); - asm volatile("vmacc.vx v24, %0, v10" ::"r"(f[14 + (2 * k)])); - - if (k == 0) - asm volatile("vmul.vx v26, v6, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v26, %0, v6" ::"r"(f[0 + (2 * k)])); - asm volatile("vmacc.vx v26, %0, v10" ::"r"(f[7 + (2 * k)])); - - if (k == 0) - asm volatile("vmul.vx v28, v10, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v28, %0, v10" ::"r"(f[0 + (2 * k)])); - - asm volatile("vmacc.vx v16, %0, v0" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vmacc.vx v16, %0, v4" ::"r"(f[35 + (2 * k + 1)])); - asm volatile("vmacc.vx v16, %0, v8" ::"r"(f[42 + (2 * k 
+ 1)])); - asm volatile("vslide1down.vx v2, v0, %0" ::"r"(*i_slide_ptr_0++)); - - asm volatile("vmacc.vx v18, %0, v0" ::"r"(f[21 + (2 * k + 1)])); - asm volatile("vmacc.vx v18, %0, v4" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vmacc.vx v18, %0, v8" ::"r"(f[35 + (2 * k + 1)])); - asm volatile("vslide1down.vx v6, v4, %0" ::"r"(*i_slide_ptr_1++)); - - asm volatile("vmacc.vx v20, %0, v0" ::"r"(f[14 + (2 * k + 1)])); - asm volatile("vmacc.vx v20, %0, v4" ::"r"(f[21 + (2 * k + 1)])); - asm volatile("vmacc.vx v20, %0, v8" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vslide1down.vx v10, v8, %0" ::"r"(*i_slide_ptr_2++)); - - asm volatile("vmacc.vx v22, %0, v0" ::"r"(f[7 + (2 * k + 1)])); - asm volatile("vmacc.vx v22, %0, v4" ::"r"(f[14 + (2 * k + 1)])); - asm volatile("vmacc.vx v22, %0, v8" ::"r"(f[21 + (2 * k + 1)])); - - asm volatile("vmacc.vx v24, %0, v0" ::"r"(f[0 + (2 * k + 1)])); - asm volatile("vmacc.vx v24, %0, v4" ::"r"(f[7 + (2 * k + 1)])); - asm volatile("vmacc.vx v24, %0, v8" ::"r"(f[14 + (2 * k + 1)])); - - asm volatile("vmacc.vx v26, %0, v4" ::"r"(f[0 + (2 * k + 1)])); - asm volatile("vmacc.vx v26, %0, v8" ::"r"(f[7 + (2 * k + 1)])); - - asm volatile("vmacc.vx v28, %0, v8" ::"r"(f[0 + (2 * k + 1)])); - } - - // Main kernel, last iteration with filter coefficients reuse - asm volatile("vmacc.vx v16, %0, v2" ::"r"(t34)); - asm volatile("vmacc.vx v16, %0, v6" ::"r"(t41)); - asm volatile("vmacc.vx v16, %0, v10" ::"r"(t48)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - - asm volatile("vmacc.vx v18, %0, v2" ::"r"(t27)); - asm volatile("vmacc.vx v18, %0, v6" ::"r"(t34)); - asm volatile("vmacc.vx v18, %0, v10" ::"r"(t41)); - asm volatile("vmv.v.v v16, v18"); - - asm volatile("vmacc.vx v20, %0, v2" ::"r"(t20)); - asm volatile("vmacc.vx v20, %0, v6" ::"r"(t27)); - asm volatile("vmacc.vx v20, %0, v10" ::"r"(t34)); - asm volatile("vmv.v.v v18, v20"); - - asm volatile("vmacc.vx v22, %0, v2" ::"r"(t13)); - asm volatile("vmacc.vx v22, %0, 
v6" ::"r"(t20)); - asm volatile("vmacc.vx v22, %0, v10" ::"r"(t27)); - asm volatile("vmv.v.v v20, v22"); - - asm volatile("vmacc.vx v24, %0, v2" ::"r"(t6)); - asm volatile("vmacc.vx v24, %0, v6" ::"r"(t13)); - asm volatile("vmacc.vx v24, %0, v10" ::"r"(t20)); - asm volatile("vmv.v.v v22, v24"); - - asm volatile("vmacc.vx v26, %0, v6" ::"r"(t6)); - asm volatile("vmacc.vx v26, %0, v10" ::"r"(t13)); - asm volatile("vmv.v.v v24, v26"); - - asm volatile("vmacc.vx v28, %0, v10" ::"r"(t6)); - asm volatile("vmv.v.v v26, v28"); - - //////////// - // REGIME // - //////////// - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i + n_; - - asm volatile("vle64.v v0, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - - // The following loop is unrolled by 2 - // The input matrix has R + F - 1 rows - // We have computed F input rows already - // Compute now until only F input rows are left - // (The last F-1 rows do not contribute to F output rows each, so keep them - // outside of this loop) (We keep F rows outside because of the unrolling by - // 2, just for easeness) - for (int j = 0; j < ((R + F - 1) - 2 * F) / 2; ++j) { - // Work on F output rows - - ////////////// - // UNROLL 0 // - ////////////// - - // Main loop - for (int k = 0; k < F / 2; ++k) { - // Calculate F contributions of the input rows, on F different output rows - asm volatile("vmacc.vx v16, %0, v0" ::"r"(f[42 + (2 * k)])); - asm volatile("vmacc.vx v18, %0, v0" ::"r"(f[35 + (2 * k)])); - asm volatile("vmacc.vx v20, %0, v0" ::"r"(f[28 + (2 * k)])); - asm volatile("vslide1down.vx v2, v0, %0" ::"r"(*i_slide_ptr_0++)); - asm volatile("vmacc.vx v22, %0, v0" ::"r"(f[21 + (2 * k)])); - asm volatile("vmacc.vx v24, %0, v0" ::"r"(f[14 + (2 * k)])); - asm volatile("vmacc.vx v26, %0, v0" ::"r"(f[7 + (2 * k)])); - if (k == 0) - asm volatile("vmul.vx v28, v0, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v28, %0, v0" ::"r"(f[0 + (2 * k)])); - - // Calculate F 
contributions of the input rows, on F different output rows - asm volatile("vmacc.vx v16, %0, v2" ::"r"(f[42 + (2 * k + 1)])); - asm volatile("vmacc.vx v18, %0, v2" ::"r"(f[35 + (2 * k + 1)])); - asm volatile("vmacc.vx v20, %0, v2" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vslide1down.vx v0, v2, %0" ::"r"(*i_slide_ptr_0++)); - asm volatile("vmacc.vx v22, %0, v2" ::"r"(f[21 + (2 * k + 1)])); - asm volatile("vmacc.vx v24, %0, v2" ::"r"(f[14 + (2 * k + 1)])); - asm volatile("vmacc.vx v26, %0, v2" ::"r"(f[7 + (2 * k + 1)])); - asm volatile("vmacc.vx v28, %0, v2" ::"r"(f[0 + (2 * k + 1)])); - } - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_1 = i + n_; - - // The last iteration is used to mask the latency of the store and the moves - // Use buffered coefficients not to stall CVA6 for coherency - asm volatile("vmacc.vx v16, %0, v0" ::"r"(t48)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v18, %0, v0" ::"r"(t41)); - asm volatile("vmv.v.v v16, v18"); - asm volatile("vmacc.vx v20, %0, v0" ::"r"(t34)); - asm volatile("vle64.v v2, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vmv.v.v v18, v20"); - asm volatile("vmacc.vx v22, %0, v0" ::"r"(t27)); - asm volatile("vmacc.vx v24, %0, v0" ::"r"(t20)); - asm volatile("vmv.v.v v20, v22"); - asm volatile("vmacc.vx v26, %0, v0" ::"r"(t13)); - asm volatile("vmacc.vx v28, %0, v0" ::"r"(t6)); - asm volatile("vmv.v.v v22, v24"); - - ////////////// - // UNROLL 1 // - ////////////// - - asm volatile("vmacc.vx v16, %0, v2" ::"r"(f[42])); - asm volatile("vmacc.vx v18, %0, v2" ::"r"(f[35])); - asm volatile("vmv.v.v v24, v26"); - asm volatile("vmacc.vx v20, %0, v2" ::"r"(f[28])); - asm volatile("vslide1down.vx v0, v2, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v22, %0, v2" ::"r"(f[21])); - asm volatile("vmv.v.v v26, v28"); - asm volatile("vmacc.vx v24, %0, v2" ::"r"(f[14])); - asm volatile("vmacc.vx v26, %0, v2" 
::"r"(f[7])); - asm volatile("vmul.vx v28, v2, %0" ::"r"(f[0])); - - for (int k = 1; k < F; k += 2) { - asm volatile("vmacc.vx v16, %0, v0" ::"r"(f[42 + k])); - asm volatile("vmacc.vx v18, %0, v0" ::"r"(f[35 + k])); - asm volatile("vmacc.vx v20, %0, v0" ::"r"(f[28 + k])); - asm volatile("vslide1down.vx v2, v0, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v22, %0, v0" ::"r"(f[21 + k])); - asm volatile("vmacc.vx v24, %0, v0" ::"r"(f[14 + k])); - asm volatile("vmacc.vx v26, %0, v0" ::"r"(f[7 + k])); - asm volatile("vmacc.vx v28, %0, v0" ::"r"(f[0 + k])); - - if (k == F - 2) - break; - - asm volatile("vmacc.vx v16, %0, v2" ::"r"(f[42 + (k + 1)])); - asm volatile("vmacc.vx v18, %0, v2" ::"r"(f[35 + (k + 1)])); - asm volatile("vmacc.vx v20, %0, v2" ::"r"(f[28 + (k + 1)])); - asm volatile("vslide1down.vx v0, v2, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v22, %0, v2" ::"r"(f[21 + (k + 1)])); - asm volatile("vmacc.vx v24, %0, v2" ::"r"(f[14 + (k + 1)])); - asm volatile("vmacc.vx v26, %0, v2" ::"r"(f[7 + (k + 1)])); - asm volatile("vmacc.vx v28, %0, v2" ::"r"(f[0 + (k + 1)])); - } - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i + n_; - - asm volatile("vmacc.vx v16, %0, v2" ::"r"(t48)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v18, %0, v2" ::"r"(t41)); - asm volatile("vmv.v.v v16, v18"); - asm volatile("vmacc.vx v20, %0, v2" ::"r"(t34)); - asm volatile("vle64.v v0, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vmv.v.v v18, v20"); - asm volatile("vmacc.vx v22, %0, v2" ::"r"(t27)); - asm volatile("vmv.v.v v20, v22"); - asm volatile("vmacc.vx v24, %0, v2" ::"r"(t20)); - asm volatile("vmv.v.v v22, v24"); - asm volatile("vmacc.vx v26, %0, v2" ::"r"(t13)); - asm volatile("vmv.v.v v24, v26"); - asm volatile("vmacc.vx v28, %0, v2" ::"r"(t6)); - asm volatile("vmv.v.v v26, v28"); - } - - //////////////////////// - // Row I-F -> (I-1)-3 // 
- //////////////////////// - - // Point to the scalar elements to insert during a slide - // i_slide_ptr_0 has already been computed - i_slide_ptr_1 = i + n_ + 0 * (C + F - 1); - i_slide_ptr_2 = i + n_ + 1 * (C + F - 1); - i_slide_ptr_3 = i + n_ + 2 * (C + F - 1); - - // Load other three input rows (one was already loaded) - asm volatile("vle64.v v4, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vle64.v v8, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vle64.v v12, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - - // Main kernel, unrolled by 2 - // Process 4 input rows - for (int k = 0; k < F / 2; ++k) { - asm volatile("vslide1down.vx v2, v0, %0" ::"r"(*i_slide_ptr_0++)); - asm volatile("vmacc.vx v16, %0, v0" ::"r"(f[42 + (2 * k)])); - asm volatile("vmacc.vx v18, %0, v0" ::"r"(f[35 + (2 * k)])); - asm volatile("vmacc.vx v20, %0, v0" ::"r"(f[28 + (2 * k)])); - asm volatile("vmacc.vx v22, %0, v0" ::"r"(f[21 + (2 * k)])); - asm volatile("vmacc.vx v24, %0, v0" ::"r"(f[14 + (2 * k)])); - asm volatile("vmacc.vx v26, %0, v0" ::"r"(f[7 + (2 * k)])); - if (k == 0) - asm volatile("vmul.vx v28, v0, %0" ::"r"(f[0 + (2 * k)])); - else - asm volatile("vmacc.vx v28, %0, v0" ::"r"(f[0 + (2 * k)])); - asm volatile("vslide1down.vx v6, v4, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v18, %0, v4" ::"r"(f[42 + (2 * k)])); - asm volatile("vmacc.vx v20, %0, v4" ::"r"(f[35 + (2 * k)])); - asm volatile("vmacc.vx v22, %0, v4" ::"r"(f[28 + (2 * k)])); - asm volatile("vmacc.vx v24, %0, v4" ::"r"(f[21 + (2 * k)])); - asm volatile("vmacc.vx v26, %0, v4" ::"r"(f[14 + (2 * k)])); - asm volatile("vmacc.vx v28, %0, v4" ::"r"(f[7 + (2 * k)])); - asm volatile("vslide1down.vx v10, v8, %0" ::"r"(*i_slide_ptr_2++)); - asm volatile("vmacc.vx v20, %0, v8" ::"r"(f[42 + (2 * k)])); - asm volatile("vmacc.vx v22, %0, v8" ::"r"(f[35 + (2 * k)])); - asm volatile("vmacc.vx v24, %0, v8" ::"r"(f[28 + (2 * k)])); - asm volatile("vmacc.vx v26, %0, v8" ::"r"(f[21 
+ (2 * k)])); - asm volatile("vmacc.vx v28, %0, v8" ::"r"(f[14 + (2 * k)])); - asm volatile("vslide1down.vx v14, v12, %0" ::"r"(*i_slide_ptr_3++)); - asm volatile("vmacc.vx v22, %0, v12" ::"r"(f[42 + (2 * k)])); - asm volatile("vmacc.vx v24, %0, v12" ::"r"(f[35 + (2 * k)])); - asm volatile("vmacc.vx v26, %0, v12" ::"r"(f[28 + (2 * k)])); - asm volatile("vmacc.vx v28, %0, v12" ::"r"(f[21 + (2 * k)])); - - asm volatile("vslide1down.vx v0, v2, %0" ::"r"(*i_slide_ptr_0++)); - asm volatile("vmacc.vx v16, %0, v2" ::"r"(f[42 + (2 * k + 1)])); - asm volatile("vmacc.vx v18, %0, v2" ::"r"(f[35 + (2 * k + 1)])); - asm volatile("vmacc.vx v20, %0, v2" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vmacc.vx v22, %0, v2" ::"r"(f[21 + (2 * k + 1)])); - asm volatile("vmacc.vx v24, %0, v2" ::"r"(f[14 + (2 * k + 1)])); - asm volatile("vmacc.vx v26, %0, v2" ::"r"(f[7 + (2 * k + 1)])); - asm volatile("vmacc.vx v28, %0, v2" ::"r"(f[0 + (2 * k + 1)])); - asm volatile("vslide1down.vx v4, v6, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v18, %0, v6" ::"r"(f[42 + (2 * k + 1)])); - asm volatile("vmacc.vx v20, %0, v6" ::"r"(f[35 + (2 * k + 1)])); - asm volatile("vmacc.vx v22, %0, v6" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vmacc.vx v24, %0, v6" ::"r"(f[21 + (2 * k + 1)])); - asm volatile("vmacc.vx v26, %0, v6" ::"r"(f[14 + (2 * k + 1)])); - asm volatile("vmacc.vx v28, %0, v6" ::"r"(f[7 + (2 * k + 1)])); - asm volatile("vslide1down.vx v8, v10, %0" ::"r"(*i_slide_ptr_2++)); - asm volatile("vmacc.vx v20, %0, v10" ::"r"(f[42 + (2 * k + 1)])); - asm volatile("vmacc.vx v22, %0, v10" ::"r"(f[35 + (2 * k + 1)])); - asm volatile("vmacc.vx v24, %0, v10" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vmacc.vx v26, %0, v10" ::"r"(f[21 + (2 * k + 1)])); - asm volatile("vmacc.vx v28, %0, v10" ::"r"(f[14 + (2 * k + 1)])); - asm volatile("vslide1down.vx v12, v14, %0" ::"r"(*i_slide_ptr_3++)); - asm volatile("vmacc.vx v22, %0, v14" ::"r"(f[42 + (2 * k + 1)])); - asm volatile("vmacc.vx v24, %0, v14" 
::"r"(f[35 + (2 * k + 1)])); - asm volatile("vmacc.vx v26, %0, v14" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vmacc.vx v28, %0, v14" ::"r"(f[21 + (2 * k + 1)])); - } - - // Start calculating the next pointers to the elements to be slided in - i_slide_ptr_0 = i + n_ + 0 * (C + F - 1); - i_slide_ptr_1 = i + n_ + 1 * (C + F - 1); - i_slide_ptr_2 = i + n_ + 2 * (C + F - 1); - - asm volatile("vle64.v v2, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vmacc.vx v16, %0, v0" ::"r"(t48)); - asm volatile("vmacc.vx v18, %0, v0" ::"r"(t41)); - asm volatile("vmacc.vx v20, %0, v0" ::"r"(t34)); - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v22, %0, v0" ::"r"(t27)); - asm volatile("vmacc.vx v24, %0, v0" ::"r"(t20)); - asm volatile("vmacc.vx v26, %0, v0" ::"r"(t13)); - asm volatile("vle64.v v6, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vmacc.vx v28, %0, v0" ::"r"(t6)); - asm volatile("vmacc.vx v18, %0, v4" ::"r"(t48)); - asm volatile("vmacc.vx v20, %0, v4" ::"r"(t41)); - asm volatile("vse64.v v18, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v22, %0, v4" ::"r"(t34)); - asm volatile("vmacc.vx v24, %0, v4" ::"r"(t27)); - asm volatile("vmacc.vx v26, %0, v4" ::"r"(t20)); - asm volatile("vle64.v v10, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - asm volatile("vmacc.vx v28, %0, v4" ::"r"(t13)); - asm volatile("vmacc.vx v20, %0, v8" ::"r"(t48)); - asm volatile("vmacc.vx v22, %0, v8" ::"r"(t41)); - asm volatile("vse64.v v20, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v24, %0, v8" ::"r"(t34)); - asm volatile("vmacc.vx v26, %0, v8" ::"r"(t27)); - asm volatile("vmacc.vx v28, %0, v8" ::"r"(t20)); - asm volatile("vmacc.vx v22, %0, v12" ::"r"(t48)); - asm volatile("vse64.v v22, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v24, %0, v12" ::"r"(t41)); - asm volatile("vmacc.vx v26, %0, v12" ::"r"(t34)); - asm 
volatile("vmacc.vx v28, %0, v12" ::"r"(t27)); - - ////////////////////////// - // Row (I-1)-3 -> (I-1) // - ////////////////////////// - - // Main kernel, unrolled by 2 - for (int k = 0; k < F / 2; ++k) { - asm volatile("vslide1down.vx v0, v2, %0" ::"r"(*i_slide_ptr_0++)); - asm volatile("vmacc.vx v24, %0, v2" ::"r"(f[42 + (2 * k)])); - asm volatile("vmacc.vx v26, %0, v2" ::"r"(f[35 + (2 * k)])); - asm volatile("vslide1down.vx v4, v6, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v28, %0, v2" ::"r"(f[28 + (2 * k)])); - asm volatile("vmacc.vx v26, %0, v6" ::"r"(f[42 + (2 * k)])); - asm volatile("vslide1down.vx v8, v10, %0" ::"r"(*i_slide_ptr_2++)); - asm volatile("vmacc.vx v28, %0, v6" ::"r"(f[35 + (2 * k)])); - asm volatile("vmacc.vx v28, %0, v10" ::"r"(f[42 + (2 * k)])); - - asm volatile("vslide1down.vx v2, v0, %0" ::"r"(*i_slide_ptr_0++)); - asm volatile("vmacc.vx v24, %0, v0" ::"r"(f[42 + (2 * k + 1)])); - asm volatile("vmacc.vx v26, %0, v0" ::"r"(f[35 + (2 * k + 1)])); - asm volatile("vslide1down.vx v6, v4, %0" ::"r"(*i_slide_ptr_1++)); - asm volatile("vmacc.vx v28, %0, v0" ::"r"(f[28 + (2 * k + 1)])); - asm volatile("vmacc.vx v26, %0, v4" ::"r"(f[42 + (2 * k + 1)])); - asm volatile("vslide1down.vx v10, v8, %0" ::"r"(*i_slide_ptr_2++)); - asm volatile("vmacc.vx v28, %0, v4" ::"r"(f[35 + (2 * k + 1)])); - asm volatile("vmacc.vx v28, %0, v8" ::"r"(f[42 + (2 * k + 1)])); - } - - asm volatile("vmacc.vx v24, %0, v2" ::"r"(t48)); - asm volatile("vse64.v v24, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v26, %0, v2" ::"r"(t41)); - asm volatile("vmacc.vx v28, %0, v2" ::"r"(t34)); - asm volatile("vmacc.vx v26, %0, v6" ::"r"(t48)); - asm volatile("vse64.v v26, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - asm volatile("vmacc.vx v28, %0, v6" ::"r"(t41)); - asm volatile("vmacc.vx v28, %0, v10" ::"r"(t48)); - asm volatile("vse64.v v28, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); -} - -/* - //////////////////// - // MAIN ALGORITHM // - 
//////////////////// - - // Start calculating the pointer to the next element to be slided in - i_slide_ptr_0 = i + C; - - // Load one input row - asm volatile("vle64.v v0, (%0); add %0, %0, %1" : "+&r"(i) : "r"(ldi_pad)); - - // Kernel - for (int k = 0; k < F; ++k) { - // Calculate F*F contributions of the input rows, on F different output rows - // v28 should be initialized during the first iteration - asm volatile("vmacc.vx v16, %0, v0" :: "r"(f[42 + (2*k)])); - asm volatile("vmacc.vx v18, %0, v0" :: "r"(f[35 + (2*k)])); - asm volatile("vmacc.vx v20, %0, v0" :: "r"(f[28 + (2*k)])); - asm volatile("vmacc.vx v22, %0, v0" :: "r"(f[21 + (2*k)])); - asm volatile("vmacc.vx v24, %0, v0" :: "r"(f[14 + (2*k)])); - asm volatile("vmacc.vx v26, %0, v0" :: "r"(f[7 + (2*k)])); - if (k == 0) - asm volatile("vmul.vx v28, v0, %0" :: "r"(f[0 + (2*k)])); - else - asm volatile("vmacc.vx v28, %0, v0" :: "r"(f[0 + (2*k)])); - - // Slide the input row by one, and inject the next scalar element of the row - asm volatile("vslide1down.vx v0, v0, %0" :: "r"(*i_slide_ptr_0++)); - } - - // Store one output row - asm volatile("vse64.v v16, (%0); add %0, %0, %1" : "+&r"(o) : "r"(ldo)); - - // Move all the input rows to return to the initial situation - // To avoid these operations, unroll the loop via software, renaming the - registers manually asm volatile("vmv.v.v v16, v18"); asm volatile("vmv.v.v - v18, v20"); asm volatile("vmv.v.v v20, v22"); asm volatile("vmv.v.v v22, - v24"); asm volatile("vmv.v.v v24, v26"); asm volatile("vmv.v.v v26, v28"); -*/ diff --git a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/main.c b/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/main.c deleted file mode 100644 index 09b075b5..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-iconv2d/main.c +++ /dev/null @@ -1,91 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. 
-// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matteo Perotti - -#include -#include -#include - -#include "iconv2d.h" -#include "util.h" - -// Define Matrix dimensions: -// o = i ° f, with i=[MxN], f=[FxF], o=[MxN] -// The filter is a square matrix, and F is odd - -// Matrices defined in data.S -extern int64_t i[] - __attribute__((aligned(32))); // [ (M+floor(F/2)) * (N+floor(F/2)) ] -extern int64_t f[] __attribute__((aligned(32))); // [ F*F ] -extern int64_t o[] __attribute__((aligned(32))); // [ M*N ] -extern int64_t golden_o[] __attribute__((aligned(32))); // [ M*N ] -// M, N, F defined in data.S -extern int64_t M; -extern int64_t N; -extern int64_t F; - -// Verify the matrices -int verify_matrix(int64_t *matrix, int64_t *golden_matrix, int64_t R, - int64_t C) { - for (int r = 0; r < R; ++r) - for (int c = 0; c < C; ++c) - if (matrix[c + C * r] != golden_matrix[c + C * r]) { - printf("Error: o[%d][%d] = %ld, instead of %ld\n", r, c, - matrix[c + C * r], golden_matrix[c + C * r]); - return 1; - } - return 0; -} - -int main() { - printf("ICONV2D M=%ld N=%ld F=%ld\n", M, N, F); - - unsigned long cycles1, cycles2, instr2, instr1; - // Call the main kernel, and measure cycles - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - if (F == 3) - iconv2d_3x3(o, i, f, M, N, F); - else if (F == 5) - iconv2d_5x5(o, i, f, M, N, F); - else if (F == 7) - iconv2d_7x7(o, i, f, M, N, F); - else - 
printf("Error: the filter size is different from 3 or 5 or 7.\n"); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - - // Performance metrics - int64_t runtime = cycles2 - cycles1; - float performance = 2.0 * F * F * M * N / runtime; - printf("operations %ld\n", F * F * M * N); - printf("The execution took %d cycles.\n", runtime); - printf("The performance is %ld OPs/1000 cycles\n", - (uint64_t)(1000.0 * performance)); - - // Verify correctness - printf("Verifying result...\n"); - int error = verify_matrix(o, golden_o, M, N); - if (error != 0) { - printf("Fail.\n"); - } else { - printf("Passed.\n"); - } - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-igemm/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-igemm/gen_data.py deleted file mode 100644 index 8f073ad3..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-igemm/gen_data.py +++ /dev/null @@ -1,71 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2022 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Author: Matteo Perotti - -# C = AB with A=[MxN], B=[NxP], C=[MxP] -# arg1, arg2, arg3: M, N, P - -import random as rand -import numpy as np -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# SCRIPT - -if len(sys.argv) == 4: - M = int(sys.argv[1]) - N = int(sys.argv[2]) - P = int(sys.argv[3]) -else: - print("Error. Give me three argument: M, N, P.") - print("C = AB with A=[MxN], B=[NxP], C=[MxP]") - sys.exit() - -dtype = np.int64 - -UPPER_LIMIT = 10000 -LOWER_LIMIT = -10000 - -# Matrices and results -A = np.random.randint(LOWER_LIMIT, UPPER_LIMIT, size=(M, N)).astype(dtype) -B = np.random.randint(LOWER_LIMIT, UPPER_LIMIT, size=(N, P)).astype(dtype) -C = np.zeros([M, P], dtype=dtype) -# Golden result matrix -G = np.matmul(A, B).astype(dtype) - -# Create the file -print('.section .data,"aw",@progbits') -emit("M", np.array(M, dtype=np.uint64)) -emit("N", np.array(N, dtype=np.uint64)) -emit("P", np.array(P, dtype=np.uint64)) -emit("a", A, "32") -emit("b", B, "32") -emit("c", C, "32") -emit("g", G, "32") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-igemm/imatmul.c b/bb-tests/workloads/src/CTest/rvv/vec-igemm/imatmul.c deleted file mode 100644 index 2be76e72..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-igemm/imatmul.c +++ /dev/null @@ -1,324 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -// Author: Matheus Cavalcante, ETH Zurich -// Samuel Riedel, ETH Zurich - -#include "imatmul.h" - -#define MIN(a, b) ((a) < (b) ? (a) : (b)) - -void imatmul(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int M, const unsigned long int N, - const unsigned long int P) { - if (M <= 4) { - imatmul_4x4(c, a, b, M, N, P); - } else if (M <= 128) { - imatmul_8x8(c, a, b, M, N, P); - } else { - // Vector length is 64 elements. With an 4x4 matmul, - // we can use LMUL=4, having a vl of 256. - imatmul_4x4(c, a, b, M, N, P); - } -} - -// --------------- -// 4x4 -// --------------- - -void imatmul_4x4(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int M, const unsigned long int N, - const unsigned long int P) { - // We work on 4 rows of the matrix at once - const unsigned long int block_size = 4; - unsigned long int block_size_p; - - // Set the vector configuration - asm volatile("vsetvli %0, %1, e64, m4, ta, ma" : "=r"(block_size_p) : "r"(P)); - - // Slice the matrix into a manageable number of columns p_ - for (unsigned long int p = 0; p < P; p += block_size_p) { - // Set the vector length - const unsigned long int p_ = MIN(P - p, block_size_p); - - // Find pointers to the submatrices - const int64_t *b_ = b + p; - int64_t *c_ = c + p; - - asm volatile("vsetvli zero, %0, e64, m4, ta, ma" ::"r"(p_)); - - // Iterate over the rows - for (unsigned long int m = 0; m < M; m += block_size) { - // Find pointer to the submatrices - const int64_t *a_ = a + m * N; - int64_t *c__ = c_ + m * P; - - 
imatmul_vec_4x4_slice_init(); - imatmul_vec_4x4(c__, a_, b_, N, P); - } - } -} - -void imatmul_vec_4x4_slice_init() { - asm volatile("vmv.v.i v0, 0"); - asm volatile("vmv.v.i v4, 0"); - asm volatile("vmv.v.i v8, 0"); - asm volatile("vmv.v.i v12, 0"); -} - -void imatmul_vec_4x4(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int N, const unsigned long int P) { - // Temporary variables - int64_t t0, t1, t2, t3; - - // Original pointer - const int64_t *a_ = a; - - // Prefetch one row of matrix B - asm volatile("vle64.v v16, (%0);" ::"r"(b)); - b += P; - - // Prefetch one row of scalar values - t0 = *a, a += N; - t1 = *a, a += N; - t2 = *a, a += N; - t3 = *a; - - // Compute the multiplication - unsigned long int n = 0; - - while (n < N) { -#ifdef VCD_DUMP - // Start dumping VCD - if (n == 8) - event_trigger = +1; - // Stop dumping VCD - if (n == 12) - event_trigger = -1; -#endif - - // Calculate pointer to the matrix A - a = a_ + ++n; - - asm volatile("vmacc.vx v0, %0, v16" ::"r"(t0)); - t0 = *a, a += N; - - // Load one row of B - asm volatile("vle64.v v20, (%0);" ::"r"(b)); - b += P; - - asm volatile("vmacc.vx v4, %0, v16" ::"r"(t1)); - t1 = *a, a += N; - asm volatile("vmacc.vx v8, %0, v16" ::"r"(t2)); - t2 = *a, a += N; - asm volatile("vmacc.vx v12, %0, v16" ::"r"(t3)); - t3 = *a; - - a = a_ + ++n; - - if (n == N) - break; - - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t0)); - t0 = *a, a += N; - - // Load one row of B - asm volatile("vle64.v v16, (%0);" ::"r"(b)); - b += P; - - asm volatile("vmacc.vx v4, %0, v20" ::"r"(t1)); - t1 = *a, a += N; - asm volatile("vmacc.vx v8, %0, v20" ::"r"(t2)); - t2 = *a, a += N; - asm volatile("vmacc.vx v12, %0, v20" ::"r"(t3)); - t3 = *a; - } - - // Last iteration: store results - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t0)); - asm volatile("vse64.v v0, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v4, %0, v20" ::"r"(t1)); - asm volatile("vse64.v v4, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx 
v8, %0, v20" ::"r"(t2)); - asm volatile("vse64.v v8, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v12, %0, v20" ::"r"(t3)); - asm volatile("vse64.v v12, (%0);" ::"r"(c)); -} - -// --------------- -// 8x8 -// --------------- - -void imatmul_8x8(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int M, const unsigned long int N, - const unsigned long int P) { - // We work on 4 rows of the matrix at once - const unsigned long int block_size = 8; - unsigned long int block_size_p; - - // Set the vector configuration - asm volatile("vsetvli %0, %1, e64, m2, ta, ma" : "=r"(block_size_p) : "r"(P)); - - // Slice the matrix into a manageable number of columns p_ - for (unsigned long int p = 0; p < P; p += block_size_p) { - // Set the vector length - const unsigned long int p_ = MIN(P - p, block_size_p); - - // Find pointers to the submatrices - const int64_t *b_ = b + p; - int64_t *c_ = c + p; - - asm volatile("vsetvli zero, %0, e64, m2, ta, ma" ::"r"(p_)); - - // Iterate over the rows - for (unsigned long int m = 0; m < M; m += block_size) { - // Find pointer to the submatrices - const int64_t *a_ = a + m * N; - int64_t *c__ = c_ + m * P; - - imatmul_vec_8x8_slice_init(); - imatmul_vec_8x8(c__, a_, b_, N, P); - } - } -} - -void imatmul_vec_8x8_slice_init() { - asm volatile("vmv.v.i v0, 0"); - asm volatile("vmv.v.i v2, 0"); - asm volatile("vmv.v.i v4, 0"); - asm volatile("vmv.v.i v6, 0"); - asm volatile("vmv.v.i v8, 0"); - asm volatile("vmv.v.i v10, 0"); - asm volatile("vmv.v.i v12, 0"); - asm volatile("vmv.v.i v14, 0"); -} - -void imatmul_vec_8x8(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int N, const unsigned long int P) { - // Temporary variables - int64_t t0, t1, t2, t3, t4, t5, t6, t7; - - // Original pointer - const int64_t *a_ = a; - - // Prefetch one row of matrix B - asm volatile("vle64.v v18, (%0);" ::"r"(b)); - b += P; - - // Prefetch one row of scalar values - t0 = *a, a += N; - t1 = *a, a += N; - t2 = *a, a 
+= N; - t3 = *a, a += N; - t4 = *a, a += N; - t5 = *a, a += N; - t6 = *a, a += N; - t7 = *a; - - // Compute the multiplication - unsigned long int n = 0; - - while (n < N) { -#ifdef VCD_DUMP - // Start dumping VCD - if (n == 8) - event_trigger = +1; - // Stop dumping VCD - if (n == 12) - event_trigger = -1; -#endif - - // Calculate pointer to the matrix A - a = a_ + ++n; - - asm volatile("vmacc.vx v0, %0, v18" ::"r"(t0)); - t0 = *a, a += N; - - // Load one row of B - asm volatile("vle64.v v20, (%0);" ::"r"(b)); - b += P; - - asm volatile("vmacc.vx v2, %0, v18" ::"r"(t1)); - t1 = *a, a += N; - asm volatile("vmacc.vx v4, %0, v18" ::"r"(t2)); - t2 = *a, a += N; - asm volatile("vmacc.vx v6, %0, v18" ::"r"(t3)); - t3 = *a, a += N; - asm volatile("vmacc.vx v8, %0, v18" ::"r"(t4)); - t4 = *a, a += N; - asm volatile("vmacc.vx v10, %0, v18" ::"r"(t5)); - t5 = *a, a += N; - asm volatile("vmacc.vx v12, %0, v18" ::"r"(t6)); - t6 = *a, a += N; - asm volatile("vmacc.vx v14, %0, v18" ::"r"(t7)); - t7 = *a; - - a = a_ + ++n; - - if (n == N) - break; - - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t0)); - t0 = *a, a += N; - - // Load one row of B - asm volatile("vle64.v v18, (%0);" ::"r"(b)); - b += P; - - asm volatile("vmacc.vx v2, %0, v20" ::"r"(t1)); - t1 = *a, a += N; - asm volatile("vmacc.vx v4, %0, v20" ::"r"(t2)); - t2 = *a, a += N; - asm volatile("vmacc.vx v6, %0, v20" ::"r"(t3)); - t3 = *a, a += N; - asm volatile("vmacc.vx v8, %0, v20" ::"r"(t4)); - t4 = *a, a += N; - asm volatile("vmacc.vx v10, %0, v20" ::"r"(t5)); - t5 = *a, a += N; - asm volatile("vmacc.vx v12, %0, v20" ::"r"(t6)); - t6 = *a, a += N; - asm volatile("vmacc.vx v14, %0, v20" ::"r"(t7)); - t7 = *a; - } - - // Last iteration: store results - asm volatile("vmacc.vx v0, %0, v20" ::"r"(t0)); - asm volatile("vse64.v v0, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v2, %0, v20" ::"r"(t1)); - asm volatile("vse64.v v2, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v4, %0, v20" ::"r"(t2)); - asm 
volatile("vse64.v v4, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v6, %0, v20" ::"r"(t3)); - asm volatile("vse64.v v6, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v8, %0, v20" ::"r"(t4)); - asm volatile("vse64.v v8, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v10, %0, v20" ::"r"(t5)); - asm volatile("vse64.v v10, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v12, %0, v20" ::"r"(t6)); - asm volatile("vse64.v v12, (%0);" ::"r"(c)); - c += P; - asm volatile("vmacc.vx v14, %0, v20" ::"r"(t7)); - asm volatile("vse64.v v14, (%0);" ::"r"(c)); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-igemm/imatmul.h b/bb-tests/workloads/src/CTest/rvv/vec-igemm/imatmul.h deleted file mode 100644 index 145b498d..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-igemm/imatmul.h +++ /dev/null @@ -1,45 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Matheus Cavalcante, ETH Zurich -// Samuel Riedel, ETH Zurich - -#ifndef IMATMUL_H -#define IMATMUL_H - -#include - -void imatmul(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int m, const unsigned long int n, - const unsigned long int p); - -void imatmul_4x4(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int m, const unsigned long int n, - const unsigned long int p); -void imatmul_vec_4x4_slice_init(); -void imatmul_vec_4x4(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int n, const unsigned long int p); - -void imatmul_8x8(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int m, const unsigned long int n, - const unsigned long int p); -void imatmul_vec_8x8_slice_init(); -void imatmul_vec_8x8(int64_t *c, const int64_t *a, const int64_t *b, - const unsigned long int n, const unsigned long int p); - -extern int64_t event_trigger; - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/vec-igemm/main.c b/bb-tests/workloads/src/CTest/rvv/vec-igemm/main.c deleted file mode 100644 index 01d51fb8..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-igemm/main.c +++ /dev/null @@ -1,93 +0,0 @@ -// Copyright 2020 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Matheus Cavalcante, ETH Zurich -// Samuel Riedel, ETH Zurich - -#include -#include -#include - -#include "imatmul.h" -#include "util.h" - -// Define Matrix dimensions: -// C = AB with A=[MxN], B=[NxP], C=[MxP] -extern uint64_t M; -extern uint64_t N; -extern uint64_t P; - -extern int64_t a[] __attribute__((aligned(256))); -extern int64_t b[] __attribute__((aligned(256))); -extern int64_t c[] __attribute__((aligned(256))); -// Gold results -extern int64_t g[] __attribute__((aligned(256))); - -// Verify the matrix -int verify_matrix(int64_t *result, int64_t *gold, size_t R, size_t C) { - for (uint64_t i = 0; i < R; ++i) { - for (uint64_t j = 0; j < C; ++j) { - uint64_t idx = i * C + j; - if (result[idx] != gold[idx]) { - return (i + j) == 0 ? -1 : idx; - } - } - } - return 0; -} - -int main() { - printf("IMATMUL\n"); - unsigned long cycles1, cycles2, instr2, instr1; - - for (int s = 4; s <= M; s *= 2) { - printf("Calculating a (%d x %d) x (%d x %d) matrix multiplication...\n", s, - s, s, s); - - // Matrices are initialized --> Start calculating - printf("Calculating imatmul...\n"); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - imatmul(c, a, b, s, s, s); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - - // Metrics - int64_t runtime = cycles2 - cycles1; - float performance = 2.0 * s * s * s / runtime; - - printf("The execution took %d cycles.\n", runtime); - printf("The performance is %ld OPs/1000 cycles.\n", - (uint64_t)(1000.0 * performance)); - - // Verify the result only for s == M (to keep it simple) - if (s == M) { - // Verify the result - printf("Verifying result...\n"); - int error = verify_matrix(c, g, s, s); - if (error != 0) { - printf("Error code %d\n", error); - printf("c[%d]=%d\n", error, c[error]); - return error; - } else { - printf("Passed.\n"); - } - } - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/gen_data.py 
b/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/gen_data.py deleted file mode 100755 index e0774213..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/gen_data.py +++ /dev/null @@ -1,68 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# arg1: vector size, arg2: filter size - -import numpy as np -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# SCRIPT - -if len(sys.argv) == 3: - R = int(sys.argv[1]) - C = int(sys.argv[2]) -else: - print("Error. 
Give me one argument: the number of vector elements.") - sys.exit() - -dtype = np.float64 - -TSTEPS = 1 - -# Fill in the extra data to align the matrices to 4*NrLanes in SW -maxNrLanes = 16 -maxAlignment = 4 * maxNrLanes # [B] -sizeOfDType = np.dtype(dtype).itemsize # [B] -R_ext = int(R + (maxAlignment / sizeOfDType)) -C_ext = int(C + (maxAlignment / sizeOfDType)) - -# Vector of samples (padding is random since it does not impact performance) -A = np.random.rand(R_ext, C_ext).astype(dtype) -B = np.zeros([R_ext, C_ext], dtype=dtype) - -# Create the file -print('.section .data,"aw",@progbits') -emit("R", np.array(R, dtype=np.uint64)) -emit("C", np.array(C, dtype=np.uint64)) -emit("TSTEPS", np.array(TSTEPS, dtype=np.uint64)) -emit("A_v", A, "NR_LANES*4") -emit("B_v", B, "NR_LANES*4") -emit("A_s", A, "NR_LANES*4") -emit("B_s", B, "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/jacobi2d.c b/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/jacobi2d.c deleted file mode 100644 index 50d4a646..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/jacobi2d.c +++ /dev/null @@ -1,146 +0,0 @@ -/* -OHIO STATE UNIVERSITY SOFTWARE DISTRIBUTION LICENSE - -PolyBench/C, a collection of benchmarks containing static control -parts (the "Software") -Copyright (c) 2010-2016, Ohio State University. All rights reserved. - -The Software is available for download and use subject to the terms -and conditions of this License. Access or use of the Software -constitutes acceptance and agreement to the terms and conditions of -this License. Redistribution and use of the Software in source and -binary forms, with or without modification, are permitted provided -that the following conditions are met: - -1. Redistributions of source code must retain the above copyright -notice, this list of conditions and the capitalized paragraph below. - -2. 
Redistributions in binary form must reproduce the above copyright -notice, this list of conditions and the capitalized paragraph below in -the documentation and/or other materials provided with the -distribution. - -3. The name of Ohio State University, or its faculty, staff or -students may not be used to endorse or promote products derived from -the Software without specific prior written permission. - -This software was produced with support from the U.S. Defense Advanced -Research Projects Agency (DARPA), the U.S. Department of Energy (DoE) -and the U.S. National Science Foundation. Nothing in this work should -be construed as reflecting the official policy or position of the -Defense Department, the United States government or Ohio State -University. - -THIS SOFTWARE HAS BEEN APPROVED FOR PUBLIC RELEASE, UNLIMITED -DISTRIBUTION. THE SOFTWARE IS PROVIDED ?AS IS? AND WITHOUT ANY -EXPRESS, IMPLIED OR STATUTORY WARRANTIES, INCLUDING, BUT NOT LIMITED -TO, WARRANTIES OF ACCURACY, COMPLETENESS, NONINFRINGEMENT, -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. -ACCESS OR USE OF THE SOFTWARE IS ENTIRELY AT THE USER?S RISK. IN NO -EVENT SHALL OHIO STATE UNIVERSITY OR ITS FACULTY, STAFF OR STUDENTS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR -BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, -WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE -OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN -IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE SOFTWARE USER SHALL -INDEMNIFY, DEFEND AND HOLD HARMLESS OHIO STATE UNIVERSITY AND ITS -FACULTY, STAFF AND STUDENTS FROM ANY AND ALL CLAIMS, ACTIONS, DAMAGES, -LOSSES, LIABILITIES, COSTS AND EXPENSES, INCLUDING ATTORNEYS? 
FEES AND -COURT COSTS, DIRECTLY OR INDIRECTLY ARISING OUT OF OR IN CONNECTION -WITH ACCESS OR USE OF THE SOFTWARE. -*/ - -/** - * This version is stamped on May 10, 2016 - * - * Contact: - * Louis-Noel Pouchet - * Tomofumi Yuki - * - * Web address: http://polybench.sourceforge.net - */ -/* jacobi-2d.c: this file is part of PolyBench/C */ - -/************************************************************************* - * RISC-V Vectorized Version - * Author: Cristóbal Ramírez Lazo - * email: cristobal.ramirez@bsc.es - * Barcelona Supercomputing Center (2020) - *************************************************************************/ - -// Porting to Ara SW environment and Optimization -// Author: Matteo Perotti, ETH Zurich, - -#include "jacobi2d.h" -#define DOUBLE_BUFFERING - -void j2d_s(uint64_t r, uint64_t c, DATA_TYPE *A, DATA_TYPE *B, - uint64_t tsteps) { - for (uint32_t t = 0; t < tsteps; t++) { - for (uint32_t i = 1; i < r - 1; i++) - for (uint32_t j = 1; j < c - 1; j++) - B[i * c + j] = - (0.2) * (A[i * c + j] + A[i * c + j - 1] + A[i * c + j + 1] + - A[(i + 1) * c + j] + A[(i - 1) * c + j]); -#ifdef DOUBLE_BUFFERING - for (uint32_t i = 1; i < r - 1; i++) - for (uint32_t j = 1; j < c - 1; j++) - A[i * c + j] = - (0.2) * (B[i * c + j] + B[i * c + j - 1] + B[i * c + j + 1] + - B[(i + 1) * c + j] + B[(i - 1) * c + j]); -#endif - } -} - -void j2d_v(uint64_t r, uint64_t c, DATA_TYPE *A, DATA_TYPE *B, - uint64_t tsteps) { - for (uint32_t t = 0; t < tsteps; t++) { - j2d_kernel_v(r, c, B, A); - } -} - -void j2d_kernel_v(uint64_t r, uint64_t c, DATA_TYPE *A, DATA_TYPE *B) { - vfloat64m4_t xU; - vfloat64m4_t xUtmp; - vfloat64m4_t xUleft; - vfloat64m4_t xUright; - vfloat64m4_t xUtop; - vfloat64m4_t xUbottom; - - DATA_TYPE izq, der; - uint32_t size_x = c - 2; - uint32_t size_y = r - 2; - - size_t gvl = __riscv_vsetvl_e64m4(size_x); - - for (uint32_t j = 1; j <= size_x; j = j + gvl) { - gvl = __riscv_vsetvl_e64m4(size_x - j + 1); - xU = __riscv_vle64_v_f64m4(&A[1 * c + j], 
gvl); - xUtop = __riscv_vle64_v_f64m4(&A[0 * c + j], gvl); - xUbottom = __riscv_vle64_v_f64m4(&A[2 * c + j], gvl); - - for (uint32_t i = 1; i <= size_y; i++) { - if (i != 1) { - xUtop = xU; - xU = xUbottom; - xUbottom = __riscv_vle64_v_f64m4(&A[(i + 1) * c + j], gvl); - } - izq = A[i * c + j - 1]; - der = A[i * c + j + gvl]; - if (i != size_y) { - asm volatile("ld x0, 0(%0)" : : "r"(A + (i + 1) * c + j - 1)); - asm volatile("ld x0, 0(%0)" : : "r"(A + (i + 1) * c + j + gvl)); - } - xUleft = __riscv_vfslide1up_vf_f64m4(xU, izq, gvl); - xUright = __riscv_vfslide1down_vf_f64m4(xU, der, gvl); - xUtmp = __riscv_vfadd_vv_f64m4(xUleft, xUright, gvl); - xUtmp = __riscv_vfadd_vv_f64m4(xUtmp, xUtop, gvl); - xUtmp = __riscv_vfadd_vv_f64m4(xUtmp, xUbottom, gvl); - xUtmp = __riscv_vfadd_vv_f64m4(xUtmp, xU, gvl); - xUtmp = __riscv_vfmul_vf_f64m4(xUtmp, (float)0.2, gvl); - __riscv_vse64_v_f64m4(&B[i * c + j], xUtmp, gvl); - } - } -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/jacobi2d.h b/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/jacobi2d.h deleted file mode 100644 index 3c029375..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/jacobi2d.h +++ /dev/null @@ -1,108 +0,0 @@ -/* -OHIO STATE UNIVERSITY SOFTWARE DISTRIBUTION LICENSE - -PolyBench/C, a collection of benchmarks containing static control -parts (the "Software") -Copyright (c) 2010-2016, Ohio State University. All rights reserved. - -The Software is available for download and use subject to the terms -and conditions of this License. Access or use of the Software -constitutes acceptance and agreement to the terms and conditions of -this License. Redistribution and use of the Software in source and -binary forms, with or without modification, are permitted provided -that the following conditions are met: - -1. Redistributions of source code must retain the above copyright -notice, this list of conditions and the capitalized paragraph below. - -2. 
Redistributions in binary form must reproduce the above copyright -notice, this list of conditions and the capitalized paragraph below in -the documentation and/or other materials provided with the -distribution. - -3. The name of Ohio State University, or its faculty, staff or -students may not be used to endorse or promote products derived from -the Software without specific prior written permission. - -This software was produced with support from the U.S. Defense Advanced -Research Projects Agency (DARPA), the U.S. Department of Energy (DoE) -and the U.S. National Science Foundation. Nothing in this work should -be construed as reflecting the official policy or position of the -Defense Department, the United States government or Ohio State -University. - -THIS SOFTWARE HAS BEEN APPROVED FOR PUBLIC RELEASE, UNLIMITED -DISTRIBUTION. THE SOFTWARE IS PROVIDED ?AS IS? AND WITHOUT ANY -EXPRESS, IMPLIED OR STATUTORY WARRANTIES, INCLUDING, BUT NOT LIMITED -TO, WARRANTIES OF ACCURACY, COMPLETENESS, NONINFRINGEMENT, -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. -ACCESS OR USE OF THE SOFTWARE IS ENTIRELY AT THE USER?S RISK. IN NO -EVENT SHALL OHIO STATE UNIVERSITY OR ITS FACULTY, STAFF OR STUDENTS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR -BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, -WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE -OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN -IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE SOFTWARE USER SHALL -INDEMNIFY, DEFEND AND HOLD HARMLESS OHIO STATE UNIVERSITY AND ITS -FACULTY, STAFF AND STUDENTS FROM ANY AND ALL CLAIMS, ACTIONS, DAMAGES, -LOSSES, LIABILITIES, COSTS AND EXPENSES, INCLUDING ATTORNEYS? 
FEES AND -COURT COSTS, DIRECTLY OR INDIRECTLY ARISING OUT OF OR IN CONNECTION -WITH ACCESS OR USE OF THE SOFTWARE. -*/ - -/** - * This version is stamped on May 10, 2016 - * - * Contact: - * Louis-Noel Pouchet - * Tomofumi Yuki - * - * Web address: http://polybench.sourceforge.net - */ -/* jacobi-2d.c: this file is part of PolyBench/C */ - -/************************************************************************* - * RISC-V Vectorized Version - * Author: Cristóbal Ramírez Lazo - * email: cristobal.ramirez@bsc.es - * Barcelona Supercomputing Center (2020) - *************************************************************************/ - -// Porting to Ara SW environment -// Author: Matteo Perotti, ETH Zurich, - -#ifndef _JACOBI2D_H_ - -#define _JACOBI2D_H_ - -#include -#include -#include - -#include - -#include "util.h" - -// The vector algorithm seems not to be parametrized on the data type -// So, don't change this parameter if also the vector implementation is used -#define DATA_TYPE double - -// Threshold for FP numbers comparison during the final check -#define THRESHOLD 0.000001 - -// #define SOURCE_PRINT -// #define RESULT_PRINT - -void j2d_s(uint64_t r, uint64_t c, DATA_TYPE *A, DATA_TYPE *B, uint64_t tsteps); -void j2d_v(uint64_t r, uint64_t c, DATA_TYPE *A, DATA_TYPE *B, uint64_t tsteps); -void j2d_kernel_v(uint64_t r, uint64_t c, DATA_TYPE *A, DATA_TYPE *B); -void j2d_kernel_opt_v(uint64_t r, uint64_t c, DATA_TYPE *A, DATA_TYPE *B); -void j2d_kernel_asm_v(uint64_t r, uint64_t c, DATA_TYPE *A, DATA_TYPE *B); - -int check_result(uint64_t r, uint64_t c, DATA_TYPE *A_s, DATA_TYPE *B_s, - DATA_TYPE *A_v, DATA_TYPE *B_v); -void output_printfile(uint64_t r, uint64_t c, DATA_TYPE *A); - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/main.c b/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/main.c deleted file mode 100644 index 7fbefbad..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-jacobi2d/main.c +++ /dev/null @@ -1,124 +0,0 @@ -/* -OHIO 
STATE UNIVERSITY SOFTWARE DISTRIBUTION LICENSE - -PolyBench/C, a collection of benchmarks containing static control -parts (the "Software") -Copyright (c) 2010-2016, Ohio State University. All rights reserved. - -The Software is available for download and use subject to the terms -and conditions of this License. Access or use of the Software -constitutes acceptance and agreement to the terms and conditions of -this License. Redistribution and use of the Software in source and -binary forms, with or without modification, are permitted provided -that the following conditions are met: - -1. Redistributions of source code must retain the above copyright -notice, this list of conditions and the capitalized paragraph below. - -2. Redistributions in binary form must reproduce the above copyright -notice, this list of conditions and the capitalized paragraph below in -the documentation and/or other materials provided with the -distribution. - -3. The name of Ohio State University, or its faculty, staff or -students may not be used to endorse or promote products derived from -the Software without specific prior written permission. - -This software was produced with support from the U.S. Defense Advanced -Research Projects Agency (DARPA), the U.S. Department of Energy (DoE) -and the U.S. National Science Foundation. Nothing in this work should -be construed as reflecting the official policy or position of the -Defense Department, the United States government or Ohio State -University. - -THIS SOFTWARE HAS BEEN APPROVED FOR PUBLIC RELEASE, UNLIMITED -DISTRIBUTION. THE SOFTWARE IS PROVIDED ?AS IS? AND WITHOUT ANY -EXPRESS, IMPLIED OR STATUTORY WARRANTIES, INCLUDING, BUT NOT LIMITED -TO, WARRANTIES OF ACCURACY, COMPLETENESS, NONINFRINGEMENT, -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. -ACCESS OR USE OF THE SOFTWARE IS ENTIRELY AT THE USER?S RISK. 
IN NO -EVENT SHALL OHIO STATE UNIVERSITY OR ITS FACULTY, STAFF OR STUDENTS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR -BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, -WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE -OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN -IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE SOFTWARE USER SHALL -INDEMNIFY, DEFEND AND HOLD HARMLESS OHIO STATE UNIVERSITY AND ITS -FACULTY, STAFF AND STUDENTS FROM ANY AND ALL CLAIMS, ACTIONS, DAMAGES, -LOSSES, LIABILITIES, COSTS AND EXPENSES, INCLUDING ATTORNEYS? FEES AND -COURT COSTS, DIRECTLY OR INDIRECTLY ARISING OUT OF OR IN CONNECTION -WITH ACCESS OR USE OF THE SOFTWARE. -*/ - -/** - * This version is stamped on May 10, 2016 - * - * Contact: - * Louis-Noel Pouchet - * Tomofumi Yuki - * - * Web address: http://polybench.sourceforge.net - */ -/* jacobi-2d.c: this file is part of PolyBench/C */ - -/************************************************************************* - * RISC-V Vectorized Version - * Author: Cristóbal Ramírez Lazo - * email: cristobal.ramirez@bsc.es - * Barcelona Supercomputing Center (2020) - *************************************************************************/ - -// Porting to Ara SW environment -// Author: Matteo Perotti, ETH Zurich, - -#include -#include - -#include "ara/util.h" -#include "jacobi2d.h" -#include "util.h" - -// The padded matrices should be aligned in SW not on the padding, -// but on the actual data. -// R and C contain the padding as well. 
-extern uint64_t R; -extern uint64_t C; - -extern uint64_t TSTEPS; - -extern DATA_TYPE A_s[] __attribute__((aligned(32), section(".l2"))); -extern DATA_TYPE B_s[] __attribute__((aligned(32), section(".l2"))); -extern DATA_TYPE A_v[] __attribute__((aligned(32), section(".l2"))); -extern DATA_TYPE B_v[] __attribute__((aligned(32), section(".l2"))); - -int main() { - printf("JACOBI2D\n"); - - int error = 0; - unsigned long cycles1, cycles2, instr2, instr1; - -#if PREALLOCATE - j2d_v(R, C, A_v, B_v, TSTEPS); -#endif - - // Measure vector kernel execution - printf("Processing the vector benchmark\n"); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - j2d_v(R, C, A_v, B_v, TSTEPS); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - int64_t runtime = cycles2 - cycles1; - // 2* since we have 2 jacobi kernels, one on A_fixed_v, one on B_fixed_v - // TSTEPS*5*N*N is the number of DPFLOP to compute - float performance = (2.0 * TSTEPS * 5.0 * (R - 1) * (C - 1) / runtime); - printf("operations = %ld\n", (size_t)(TSTEPS * 5.0 * (R - 1) * (C - 1))); - printf("Vector jacobi2d (R=%ld C=%ld) cycle count: %d\n", R, C, runtime); - printf("The performance is %ld DPFLOP/1000 cycles\n", - (uint64_t)(1000.0 * performance)); - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-log/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-log/gen_data.py deleted file mode 100644 index 0c7f239f..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-log/gen_data.py +++ /dev/null @@ -1,71 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# arg1: vector size, arg2: filter size - -import random as rand -import numpy as np -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -def rand_matrix(N, dtype): - return np.random.rand(N).astype(dtype) - - -# SCRIPT - -if len(sys.argv) == 2: - N_f64 = int(sys.argv[1]) - N_f32 = 2 * N_f64 -else: - print("Error. Give me one argument: the number of vector elements.") - sys.exit() - -# Vector of samples -args_f64 = rand_matrix(N_f64, np.float64).astype(np.float64) -args_f32 = rand_matrix(N_f32, np.float32).astype(np.float32) - -# Results buffer -results_f64 = np.zeros(N_f64, dtype=np.float64) -results_f32 = np.zeros(N_f32, dtype=np.float32) - -# Gold results -gold_results_f64 = np.log(args_f64, dtype=np.float64) -gold_results_f32 = np.log(args_f32, dtype=np.float32) - -# Create the file -print('.section .data,"aw",@progbits') -emit("N_f64", np.array(N_f64, dtype=np.uint64)) -emit("args_f64", args_f64, "NR_LANES*4") -emit("results_f64", results_f64, "NR_LANES*4") -emit("gold_results_f64", gold_results_f64, "NR_LANES*4") -emit("N_f32", np.array(N_f32, dtype=np.uint32)) -emit("args_f32", args_f32, "NR_LANES*4") -emit("results_f32", results_f32, "NR_LANES*4") -emit("gold_results_f32", gold_results_f32, "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-log/log.c b/bb-tests/workloads/src/CTest/rvv/vec-log/log.c deleted 
file mode 100644 index a6253160..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-log/log.c +++ /dev/null @@ -1,59 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. -// -// Author: Matteo Perotti - -#include "log.h" - -void log_1xf64_bmark(double *args, double *results, size_t len) { - - size_t avl = len; - vfloat64m1_t log_vec, res_vec; - - for (size_t vl = __riscv_vsetvl_e64m1(avl); avl > 0; avl -= vl) { - // Strip-mine - vl = __riscv_vsetvl_e64m1(avl); - // Load vector - log_vec = __riscv_vle64_v_f64m1(args, vl); - // Compute - res_vec = __log_1xf64(log_vec, vl); - // Store - __riscv_vse64_v_f64m1(results, res_vec, vl); - // Bump pointers - args += vl; - results += vl; - } -} - -void log_2xf32_bmark(float *args, float *results, size_t len) { - - size_t avl = len; - vfloat32m1_t log_vec, res_vec; - - for (size_t vl = __riscv_vsetvl_e32m1(avl); avl > 0; avl -= vl) { - // Strip-mine - vl = __riscv_vsetvl_e32m1(avl); - // Load vector - log_vec = __riscv_vle32_v_f32m1(args, vl); - // Compute - res_vec = __log_2xf32(log_vec, vl); - // Store - __riscv_vse32_v_f32m1(results, res_vec, vl); - // Bump pointers - args += vl; - results += vl; - } -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-log/log.h b/bb-tests/workloads/src/CTest/rvv/vec-log/log.h deleted file mode 100644 index af6685a6..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-log/log.h 
+++ /dev/null @@ -1,157 +0,0 @@ -// Modified version of: -// "RISC-V VECTOR LOG FUNCTION Version by Cristóbal Ramírez Lazo, "Barcelona -// 2019"" Find details on the original version below Author: Matteo Perotti -// - -// RISC-V VECTOR LOG FUNCTION Version by Cristóbal Ramírez Lazo, "Barcelona -// 2019" This RISC-V Vector implementation is based on the original code -// presented by Julien Pommier - -/* - AVX implementation of sin, cos, sincos, exp and log - Based on "sse_mathfun.h", by Julien Pommier - http://gruntthepeon.free.fr/ssemath/ - Copyright (C) 2012 Giovanni Garberoglio - Interdisciplinary Laboratory for Computational Science (LISC) - Fondazione Bruno Kessler and University of Trento - via Sommarive, 18 - I-38123 Trento (Italy) - This software is provided 'as-is', without any express or implied - warranty. In no event will the authors be held liable for any damages - arising from the use of this software. - Permission is granted to anyone to use this software for any purpose, - including commercial applications, and to alter it and redistribute it - freely, subject to the following restrictions: - 1. The origin of this software must not be misrepresented; you must not - claim that you wrote the original software. If you use this software - in a product, an acknowledgment in the product documentation would be - appreciated but is not required. - 2. Altered source versions must be plainly marked as such, and must not be - misrepresented as being the original software. - 3. This notice may not be removed or altered from any source distribution. 
- (this is the zlib license) -*/ - -#include -#include - -#include "ara/rivec/vector_defines.h" - -void log_1xf64_bmark(double *args, double *results, size_t len); -void log_2xf32_bmark(float *args, float *results, size_t len); - -inline _MMR_f64 __log_1xf64(_MMR_f64 x, unsigned long int gvl) { - - _MMR_i64 _x_i; - _MMR_u64 imm0_u; - _MMR_i64 imm0; - _MMR_f64 e; - _MMR_MASK_i64 invalid_mask = _MM_VFLE_f64(x, _MM_SET_f64(0.0f, gvl), gvl); - - x = _MM_MAX_f64(x, _MM_CAST_f64_i64(_MM_SET_i64(0x0010000000000000, gvl)), - gvl); /* cut off denormalized stuff */ - imm0_u = _MM_SRL_i64(_MM_CAST_u64_f64(x), - _MM_CAST_u64_i64(_MM_SET_i64(52, gvl)), gvl); - imm0 = _MM_CAST_i64_u64(imm0_u); - /* keep only the fractional part */ - _x_i = _MM_AND_i64(_MM_CAST_i64_f64(x), _MM_SET_i64(~0x7ff0000000000000, gvl), - gvl); - _x_i = _MM_OR_i64(_x_i, _MM_CAST_i64_f64(_MM_SET_f64(0.5f, gvl)), gvl); - x = _MM_CAST_f64_i64(_x_i); - imm0 = _MM_SUB_i64(imm0, _MM_SET_i64(1023, gvl), gvl); - e = _MM_VFCVT_F_X_f64(imm0, gvl); - e = _MM_ADD_f64(e, _MM_SET_f64(1.0f, gvl), gvl); - - _MMR_MASK_i64 mask = - _MM_VFLT_f64(x, _MM_SET_f64(0.707106781186547524, gvl), gvl); - _MMR_f64 tmp = _MM_MERGE_f64(_MM_SET_f64(0.0f, gvl), x, mask, gvl); - - x = _MM_SUB_f64(x, _MM_SET_f64(1.0f, gvl), gvl); - e = _MM_SUB_f64( - e, - _MM_MERGE_f64(_MM_SET_f64(0.0f, gvl), _MM_SET_f64(1.0f, gvl), mask, gvl), - gvl); - x = _MM_ADD_f64(x, tmp, gvl); - - _MMR_f64 z = _MM_MUL_f64(x, x, gvl); - _MMR_f64 y; - - y = _MM_MADD_f64(_MM_SET_f64(7.0376836292E-2, gvl), x, - _MM_SET_f64(-1.1514610310E-1, gvl), gvl); - y = _MM_MADD_f64(y, x, _MM_SET_f64(1.1676998740E-1, gvl), gvl); - y = _MM_MADD_f64(y, x, _MM_SET_f64(-1.2420140846E-1, gvl), gvl); - y = _MM_MADD_f64(y, x, _MM_SET_f64(1.4249322787E-1, gvl), gvl); - y = _MM_MADD_f64(y, x, _MM_SET_f64(-1.6668057665E-1, gvl), gvl); - y = _MM_MADD_f64(y, x, _MM_SET_f64(2.0000714765E-1, gvl), gvl); - y = _MM_MADD_f64(y, x, _MM_SET_f64(-2.4999993993E-1, gvl), gvl); - y = _MM_MADD_f64(y, x, 
_MM_SET_f64(3.3333331174E-1, gvl), gvl); - y = _MM_MUL_f64(y, z, gvl); - y = _MM_MACC_f64(y, e, _MM_SET_f64(-2.12194440e-4, gvl), gvl); - tmp = _MM_MUL_f64(z, _MM_SET_f64(0.5f, gvl), gvl); - y = _MM_SUB_f64(y, tmp, gvl); - tmp = _MM_MUL_f64(e, _MM_SET_f64(0.693359375, gvl), gvl); - x = _MM_ADD_f64(x, y, gvl); - x = _MM_ADD_f64(x, tmp, gvl); - x = _MM_MERGE_f64(x, _MM_CAST_f64_i64(_MM_SET_i64(0xffffffffffffffff, gvl)), - invalid_mask, gvl); - - return x; -} - -inline _MMR_f32 __log_2xf32(_MMR_f32 x, unsigned long int gvl) { - - _MMR_i32 _x_i; - _MMR_u32 imm0_u; - _MMR_i32 imm0; - _MMR_f32 e; - - _MMR_MASK_i32 invalid_mask = _MM_VFLE_f32(x, _MM_SET_f32(0.0f, gvl), gvl); - - x = _MM_MAX_f32(x, _MM_CAST_f32_i32(_MM_SET_i32(0x00800000, gvl)), - gvl); /* cut off denormalized stuff */ - imm0_u = _MM_SRL_i32(_MM_CAST_u32_f32(x), - _MM_CAST_u32_i32(_MM_SET_i32(23, gvl)), gvl); - imm0 = _MM_CAST_i32_u32(imm0_u); - /* keep only the fractional part */ - _x_i = _MM_AND_i32(_MM_CAST_i32_f32(x), _MM_SET_i32(~0x7f800000, gvl), gvl); - _x_i = _MM_OR_i32(_x_i, _MM_CAST_i32_f32(_MM_SET_f32(0.5f, gvl)), gvl); - x = _MM_CAST_f32_i32(_x_i); - imm0 = _MM_SUB_i32(imm0, _MM_SET_i32(0x7f, gvl), gvl); - e = _MM_VFCVT_F_X_f32(imm0, gvl); - e = _MM_ADD_f32(e, _MM_SET_f32(1.0f, gvl), gvl); - - _MMR_MASK_i32 mask = - _MM_VFLT_f32(x, _MM_SET_f32(0.707106781186547524, gvl), gvl); - _MMR_f32 tmp = _MM_MERGE_f32(_MM_SET_f32(0.0f, gvl), x, mask, gvl); - - x = _MM_SUB_f32(x, _MM_SET_f32(1.0f, gvl), gvl); - e = _MM_SUB_f32( - e, - _MM_MERGE_f32(_MM_SET_f32(0.0f, gvl), _MM_SET_f32(1.0f, gvl), mask, gvl), - gvl); - x = _MM_ADD_f32(x, tmp, gvl); - - _MMR_f32 z = _MM_MUL_f32(x, x, gvl); - _MMR_f32 y; - - y = _MM_MADD_f32(_MM_SET_f32(7.0376836292E-2, gvl), x, - _MM_SET_f32(-1.1514610310E-1, gvl), gvl); - y = _MM_MADD_f32(y, x, _MM_SET_f32(1.1676998740E-1, gvl), gvl); - y = _MM_MADD_f32(y, x, _MM_SET_f32(-1.2420140846E-1, gvl), gvl); - y = _MM_MADD_f32(y, x, _MM_SET_f32(1.4249322787E-1, gvl), gvl); - y = 
_MM_MADD_f32(y, x, _MM_SET_f32(-1.6668057665E-1, gvl), gvl); - y = _MM_MADD_f32(y, x, _MM_SET_f32(2.0000714765E-1, gvl), gvl); - y = _MM_MADD_f32(y, x, _MM_SET_f32(-2.4999993993E-1, gvl), gvl); - y = _MM_MADD_f32(y, x, _MM_SET_f32(3.3333331174E-1, gvl), gvl); - y = _MM_MUL_f32(y, z, gvl); - y = _MM_MACC_f32(y, e, _MM_SET_f32(-2.12194440e-4, gvl), gvl); - tmp = _MM_MUL_f32(z, _MM_SET_f32(0.5f, gvl), gvl); - y = _MM_SUB_f32(y, tmp, gvl); - tmp = _MM_MUL_f32(e, _MM_SET_f32(0.693359375, gvl), gvl); - x = _MM_ADD_f32(x, y, gvl); - x = _MM_ADD_f32(x, tmp, gvl); - x = _MM_MERGE_f32(x, _MM_CAST_f32_i32(_MM_SET_i32(0xffffffff, gvl)), - invalid_mask, gvl); - - return x; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-log/main.c b/bb-tests/workloads/src/CTest/rvv/vec-log/main.c deleted file mode 100644 index 30cf36a0..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-log/main.c +++ /dev/null @@ -1,87 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
-// -// Author: Matteo Perotti - -#include -#include - -#include "ara/util.h" -#include "log.h" -#include "util.h" - -#define THRESHOLD 0.1 - -#define CHECK - -extern size_t N_f64; -extern double args_f64[] __attribute__((aligned(16))); -extern double results_f64[] __attribute__((aligned(16))); -extern double gold_results_f64[] __attribute__((aligned(16))); - -extern size_t N_f32; -extern float args_f32[] __attribute__((aligned(16))); -extern float results_f32[] __attribute__((aligned(16))); -extern float gold_results_f32[] __attribute__((aligned(16))); - -// Natural logarithm (base e) -int main() { - printf("FLOG\n"); - unsigned long cycles1, cycles2, instr2, instr1; - - int error = 0; - int64_t runtime; - - printf("Executing natural log (base e) on %d 64-bit data...\n", N_f64); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - log_1xf64_bmark(args_f64, results_f64, N_f64); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - runtime = cycles2 = cycles1; - printf("The execution took %d cycles.\n", runtime); - - printf("Executing natural log (base e) on %d 32-bit data...\n", N_f32); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - log_2xf32_bmark(args_f32, results_f32, N_f32); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - runtime = cycles2 = cycles1; - printf("The execution took %d cycles.\n", runtime); - -#ifdef CHECK - printf("Checking results:\n"); - - for (uint64_t i = 0; i < N_f64; ++i) { - if (!similarity_check(results_f64[i], gold_results_f64[i], THRESHOLD)) { - error = 1; - printf("64-bit error at index %d. %f != %f\n", i, results_f64[i], - gold_results_f64[i]); - } - } - for (uint64_t i = 0; i < N_f32; ++i) { - if (!similarity_check(results_f32[i], gold_results_f32[i], THRESHOLD)) { - error = 1; - printf("32-bit error at index %d. 
%f != %f\n", i, results_f32[i], - gold_results_f32[i]); - } - } -#endif - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/dataset1.h deleted file mode 100644 index d93de1e0..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/dataset1.h +++ /dev/null @@ -1,181 +0,0 @@ -#define DATA_SIZE 1000 - -int8_t input1_data[DATA_SIZE] = { - 0, 8, 5, 1, 7, 3, 1, 9, 5, 8, 6, 0, 9, 8, 3, 6, 9, 1, 1, 2, 4, 3, 6, 7, 3, - 6, 8, 3, 4, 9, 5, 5, 2, 2, 7, 6, 5, 6, 1, 0, 2, 2, 3, 9, 9, 2, 9, 8, 9, 9, - 1, 9, 5, 3, 6, 6, 1, 1, 8, 0, 2, 3, 1, 9, 8, 2, 5, 9, 7, 2, 4, 5, 4, 2, 4, - 7, 6, 3, 2, 9, 2, 1, 6, 6, 3, 3, 4, 2, 3, 3, 5, 0, 9, 4, 3, 5, 5, 9, 7, 2, - 7, 5, 1, 0, 1, 9, 9, 8, 5, 4, 6, 9, 2, 0, 0, 0, 0, 4, 5, 6, 3, 2, 8, 9, 8, - 3, 4, 5, 9, 3, 8, 8, 5, 9, 7, 3, 5, 8, 0, 5, 6, 4, 4, 1, 0, 0, 2, 0, 4, 7, - 8, 4, 4, 3, 5, 5, 8, 6, 2, 2, 4, 2, 8, 2, 7, 6, 2, 7, 5, 3, 5, 5, 3, 9, 2, - 8, 3, 7, 7, 9, 3, 7, 2, 9, 6, 7, 0, 3, 1, 1, 3, 6, 8, 8, 3, 1, 0, 1, 9, 3, - 5, 7, 5, 8, 5, 0, 1, 0, 1, 1, 7, 3, 9, 0, 1, 1, 8, 4, 2, 2, 4, 6, 3, 5, 7, - 5, 1, 0, 7, 5, 7, 0, 7, 0, 2, 8, 1, 7, 3, 9, 9, 3, 4, 4, 5, 6, 2, 7, 1, 8, - 9, 2, 2, 0, 7, 5, 3, 8, 8, 7, 2, 0, 4, 4, 7, 4, 0, 2, 3, 5, 7, 8, 4, 7, 5, - 7, 0, 3, 8, 9, 9, 5, 7, 4, 2, 4, 6, 0, 3, 2, 3, 2, 5, 7, 7, 5, 8, 8, 1, 1, - 6, 0, 2, 1, 8, 0, 3, 1, 8, 1, 1, 0, 7, 2, 1, 5, 2, 5, 1, 2, 6, 7, 7, 0, 0, - 8, 8, 5, 3, 4, 0, 6, 0, 1, 7, 9, 6, 8, 8, 2, 4, 1, 5, 5, 0, 9, 9, 3, 4, 1, - 5, 9, 5, 4, 3, 2, 4, 6, 3, 5, 6, 0, 1, 3, 1, 2, 3, 2, 4, 1, 6, 9, 7, 6, 0, - 1, 4, 9, 0, 7, 8, 1, 1, 0, 1, 8, 8, 4, 1, 2, 8, 8, 5, 3, 6, 4, 8, 9, 1, 4, - 8, 1, 5, 8, 4, 7, 8, 7, 8, 7, 1, 4, 2, 9, 1, 7, 5, 7, 4, 5, 7, 5, 3, 7, 9, - 2, 9, 2, 1, 5, 1, 7, 7, 1, 3, 7, 6, 5, 1, 0, 5, 8, 0, 5, 7, 6, 7, 4, 0, 9, - 7, 9, 8, 3, 0, 8, 9, 7, 1, 8, 6, 1, 8, 0, 6, 0, 2, 7, 7, 2, 1, 7, 9, 4, 3, - 3, 2, 5, 7, 7, 9, 0, 0, 9, 9, 8, 5, 8, 9, 2, 5, 5, 9, 4, 5, 2, 2, 8, 1, 4, - 7, 0, 
6, 1, 2, 3, 7, 6, 0, 3, 9, 6, 7, 9, 2, 1, 1, 3, 1, 7, 9, 6, 5, 6, 4, - 2, 5, 5, 4, 9, 7, 2, 3, 9, 6, 0, 5, 4, 9, 2, 0, 2, 3, 8, 7, 6, 7, 6, 9, 3, - 0, 1, 4, 3, 6, 3, 9, 4, 4, 7, 7, 9, 5, 8, 0, 5, 9, 8, 1, 9, 2, 1, 3, 0, 0, - 0, 4, 7, 3, 3, 6, 7, 7, 3, 2, 5, 6, 5, 8, 0, 2, 4, 7, 4, 5, 1, 6, 8, 2, 7, - 1, 4, 4, 9, 1, 6, 8, 2, 2, 5, 9, 5, 1, 2, 7, 9, 7, 3, 4, 6, 7, 1, 3, 3, 3, - 3, 7, 0, 4, 5, 1, 3, 2, 3, 0, 3, 9, 6, 9, 3, 0, 5, 2, 6, 3, 8, 8, 3, 1, 5, - 5, 8, 3, 9, 4, 9, 1, 2, 1, 7, 0, 8, 6, 0, 7, 9, 8, 1, 9, 7, 5, 7, 2, 7, 1, - 1, 2, 4, 9, 0, 3, 6, 3, 1, 1, 1, 8, 7, 5, 2, 0, 6, 3, 9, 1, 0, 9, 1, 1, 5, - 3, 3, 5, 7, 3, 8, 5, 5, 4, 3, 5, 2, 9, 5, 3, 4, 7, 1, 4, 8, 8, 7, 5, 3, 0, - 8, 6, 5, 6, 9, 1, 3, 9, 9, 6, 7, 1, 8, 5, 9, 4, 3, 1, 2, 1, 5, 8, 1, 9, 2, - 4, 3, 7, 4, 7, 5, 4, 3, 7, 4, 8, 7, 0, 5, 2, 1, 6, 7, 5, 8, 5, 3, 5, 9, 2, - 2, 3, 4, 2, 2, 5, 4, 1, 3, 6, 5, 5, 6, 1, 7, 8, 4, 0, 0, 1, 0, 8, 4, 3, 0, - 0, 1, 0, 3, 7, 6, 2, 9, 0, 6, 9, 6, 8, 7, 7, 4, 6, 5, 9, 8, 2, 9, 1, 5, 5, - 8, 9, 2, 6, 6, 6, 9, 3, 3, 4, 1, 2, 1, 8, 2, 3, 1, 8, 3, 3, 8, 0, 4, 6, 5, - 2, 0, 7, 8, 8, 8, 6, 4, 0, 2, 4, 2, 6, 0, 3, 4, 0, 0, 7, 8, 5, 4, 3, 3, 2, - 4, 4, 6, 5, 3, 4, 5, 1, 8, 3, 9, 2, 1, 9, 0, 9, 3, 3, 2, 2, 6, 0, 9, 4, 2, - 3, 0, 6, 0, 6, 9, 9, 8, 4, 6, 8, 5, 6, 0, 3, 3, 0, 9, 7, 4, 8, 8, 5, 1, 7, - 0, 6, 5, 9, 7, 8, 2, 6, 8, 7, 4, 8, 6, 6, 6, 0, 7, 7, 1, 5, 0, 9, 6, 6, 3, - 2, 2, 6, 6, 5, 1, 5, 7, 0, 3, 9, 5, 3, 7, 6, 5, 2, 8, 9, 4, 7, 5, 0, 2, 7, - 7, 4, 6, 6, 4, 8, 3, 6, 2, 1, 3, 2, 7, 2, 9, 6, 6, 9, 2, 6, 3, 8, 6, 4, 9}; - -int input2_data[DATA_SIZE] = { - 454, 335, 1, 989, 365, 572, 64, 153, 216, 140, 210, 572, 339, 593, 898, - 228, 12, 883, 750, 646, 500, 436, 701, 812, 981, 150, 696, 564, 272, 258, - 647, 509, 88, 703, 669, 375, 551, 936, 592, 569, 952, 800, 584, 643, 368, - 489, 328, 313, 592, 388, 543, 649, 979, 997, 814, 79, 208, 998, 629, 847, - 704, 997, 253, 715, 430, 415, 538, 700, 4, 494, 100, 864, 693, 416, 296, - 285, 620, 78, 351, 540, 646, 169, 527, 289, 796, 801, 
720, 758, 745, 92, - 989, 271, 853, 788, 531, 222, 461, 241, 358, 332, 684, 740, 446, 311, 743, - 557, 479, 557, 925, 796, 357, 891, 666, 514, 557, 870, 853, 440, 61, 678, - 396, 9, 17, 170, 291, 380, 536, 185, 917, 539, 983, 887, 54, 612, 951, - 479, 151, 7, 641, 335, 730, 95, 728, 280, 395, 688, 911, 476, 815, 729, - 265, 127, 236, 214, 180, 6, 503, 596, 173, 643, 346, 599, 68, 849, 658, - 619, 121, 131, 828, 667, 433, 487, 753, 125, 626, 14, 10, 403, 106, 703, - 818, 964, 406, 874, 856, 86, 60, 660, 667, 153, 121, 98, 412, 236, 12, - 423, 965, 216, 621, 361, 921, 715, 647, 299, 886, 682, 36, 493, 551, 537, - 969, 643, 434, 415, 303, 438, 860, 203, 478, 988, 675, 719, 990, 338, 450, - 633, 155, 646, 452, 427, 509, 988, 426, 12, 483, 142, 339, 390, 50, 171, - 601, 105, 968, 121, 879, 81, 870, 600, 603, 871, 887, 610, 404, 234, 745, - 526, 275, 441, 226, 752, 943, 726, 709, 201, 54, 758, 53, 397, 41, 141, - 416, 747, 219, 478, 770, 180, 482, 691, 725, 173, 186, 914, 1, 963, 247, - 464, 362, 521, 233, 120, 40, 779, 195, 161, 743, 439, 355, 403, 141, 633, - 289, 782, 320, 636, 118, 852, 70, 816, 388, 954, 36, 16, 698, 695, 677, - 598, 883, 824, 746, 462, 511, 534, 440, 428, 732, 726, 702, 547, 86, 798, - 215, 21, 651, 59, 429, 657, 96, 973, 659, 966, 524, 62, 625, 303, 714, - 409, 55, 728, 305, 436, 901, 592, 691, 796, 497, 177, 940, 995, 480, 158, - 822, 611, 680, 14, 111, 797, 185, 0, 718, 96, 749, 739, 814, 435, 326, - 37, 33, 605, 935, 27, 88, 441, 339, 344, 554, 365, 954, 639, 396, 991, - 249, 338, 832, 974, 393, 266, 470, 348, 336, 419, 249, 215, 542, 903, 636, - 729, 581, 820, 671, 979, 418, 670, 920, 568, 745, 662, 139, 385, 927, 173, - 457, 316, 183, 477, 196, 399, 416, 805, 996, 270, 735, 696, 825, 528, 50, - 623, 537, 87, 294, 867, 110, 398, 781, 646, 375, 943, 897, 589, 44, 288, - 845, 742, 99, 522, 443, 432, 165, 930, 28, 461, 323, 272, 376, 340, 898, - 158, 168, 443, 193, 631, 935, 274, 781, 185, 619, 292, 933, 156, 827, 88, - 987, 629, 649, 32, 1, 744, 
399, 915, 791, 554, 984, 530, 600, 401, 683, - 540, 903, 120, 995, 521, 622, 224, 895, 530, 820, 651, 226, 96, 262, 569, - 238, 126, 610, 191, 238, 796, 884, 573, 108, 140, 789, 852, 23, 704, 890, - 480, 52, 372, 201, 546, 408, 119, 645, 464, 81, 293, 52, 880, 224, 744, - 735, 886, 167, 1, 532, 321, 169, 485, 101, 177, 42, 708, 654, 915, 625, - 242, 822, 795, 641, 252, 245, 151, 876, 333, 601, 938, 775, 397, 233, 755, - 454, 424, 210, 962, 900, 923, 655, 529, 595, 90, 464, 685, 70, 754, 32, - 494, 25, 389, 488, 37, 409, 639, 27, 950, 539, 80, 303, 723, 734, 125, - 552, 248, 107, 362, 48, 869, 144, 841, 724, 335, 470, 263, 343, 809, 677, - 339, 336, 410, 465, 56, 590, 485, 406, 993, 746, 238, 525, 336, 256, 134, - 546, 722, 367, 943, 106, 629, 396, 208, 429, 523, 130, 355, 990, 673, 991, - 719, 449, 84, 616, 211, 707, 737, 847, 452, 316, 974, 746, 796, 522, 618, - 115, 727, 226, 165, 200, 830, 742, 187, 705, 671, 785, 886, 962, 657, 293, - 620, 144, 173, 796, 72, 678, 80, 793, 685, 637, 967, 241, 898, 693, 372, - 601, 721, 398, 553, 72, 174, 978, 325, 558, 185, 505, 859, 651, 573, 321, - 349, 400, 890, 844, 885, 933, 980, 448, 989, 50, 332, 900, 716, 747, 444, - 6, 394, 285, 703, 450, 652, 771, 485, 534, 559, 481, 507, 434, 343, 42, - 784, 865, 421, 415, 871, 539, 162, 105, 481, 595, 115, 350, 964, 287, 232, - 154, 602, 539, 943, 872, 121, 652, 811, 747, 362, 340, 910, 206, 572, 505, - 973, 961, 354, 627, 849, 971, 910, 410, 770, 63, 874, 396, 482, 619, 646, - 557, 328, 67, 884, 512, 972, 6, 513, 882, 562, 764, 366, 506, 786, 831, - 382, 638, 452, 72, 83, 59, 932, 929, 924, 961, 69, 797, 985, 854, 885, - 600, 389, 232, 793, 179, 773, 689, 775, 494, 139, 234, 431, 780, 371, 22, - 653, 741, 815, 428, 139, 603, 315, 344, 889, 317, 260, 861, 377, 511, 304, - 70, 35, 854, 576, 490, 326, 303, 431, 813, 708, 388, 962, 967, 442, 49, - 831, 251, 321, 741, 179, 176, 117, 523, 764, 952, 704, 531, 804, 23, 611, - 846, 375, 854, 971, 24, 639, 318, 723, 662, 647, 281, 158, 294, 
885, 734, - 866, 471, 296, 673, 472, 439, 5, 155, 506, 948, 600, 445, 222, 784, 349, - 943, 150, 366, 444, 604, 720, 340, 972, 911, 321, 435, 50, 78, 761, 950, - 238, 27, 226, 201, 176, 877, 450, 879, 99, 143, 31, 812, 771, 527, 488, - 797, 194, 293, 966, 276, 345, 413, 197, 386, 116, 322, 680, 538, 553, 960, - 874, 48, 506, 898, 539, 495, 764, 805, 286, 432, 836, 192, 825, 778, 586, - 359, 352, 746, 11, 749, 5, 408, 643, 441, 368, 97, 169, 359, 527, 672, - 69, 880, 298, 300, 327, 923, 829, 816, 497, 243, 981, 917, 713, 653, 503, - 406, 543, 108, 304, 464, 954, 86, 802, 446, 28}; - -int verify_data[DATA_SIZE] = { - 454, 1, 1, 989, 1, 572, 64, 1, 1, 1, 1, 572, 1, 1, 898, - 1, 1, 883, 750, 646, 500, 436, 1, 1, 981, 1, 1, 564, 272, 1, - 1, 1, 88, 703, 1, 1, 1, 1, 592, 569, 952, 800, 584, 1, 1, - 489, 1, 1, 1, 1, 543, 1, 1, 997, 1, 1, 208, 998, 1, 847, - 704, 997, 253, 1, 1, 415, 1, 1, 1, 494, 100, 1, 693, 416, 296, - 1, 1, 78, 351, 1, 646, 169, 1, 1, 796, 801, 720, 758, 745, 92, - 1, 271, 1, 788, 531, 1, 1, 1, 1, 332, 1, 1, 446, 311, 743, - 1, 1, 1, 1, 796, 1, 1, 666, 514, 557, 870, 853, 440, 1, 1, - 396, 9, 1, 1, 1, 380, 536, 1, 1, 539, 1, 1, 1, 1, 1, - 479, 1, 1, 641, 1, 1, 95, 728, 280, 395, 688, 911, 476, 815, 1, - 1, 127, 236, 214, 1, 1, 1, 1, 173, 643, 346, 599, 1, 849, 1, - 1, 121, 1, 1, 667, 1, 1, 753, 1, 626, 1, 10, 1, 1, 1, - 818, 1, 406, 1, 1, 1, 60, 660, 667, 153, 121, 1, 1, 1, 12, - 423, 965, 216, 1, 361, 1, 1, 1, 1, 1, 682, 36, 493, 551, 537, - 1, 643, 1, 415, 303, 438, 1, 203, 478, 988, 675, 1, 990, 1, 1, - 1, 155, 646, 1, 1, 1, 988, 1, 12, 483, 1, 339, 1, 50, 1, - 1, 105, 968, 121, 1, 1, 870, 1, 603, 1, 1, 610, 404, 234, 1, - 1, 275, 1, 1, 1, 943, 726, 709, 201, 1, 758, 53, 397, 41, 1, - 1, 1, 219, 1, 1, 1, 482, 691, 1, 1, 1, 1, 1, 963, 247, - 464, 1, 521, 233, 120, 40, 779, 1, 1, 1, 1, 1, 1, 141, 633, - 1, 782, 320, 636, 1, 852, 70, 816, 1, 954, 36, 16, 1, 695, 677, - 1, 883, 1, 746, 462, 1, 1, 1, 428, 732, 1, 1, 1, 86, 798, - 215, 1, 651, 59, 1, 1, 
1, 1, 1, 966, 524, 62, 1, 1, 714, - 1, 1, 728, 305, 436, 1, 1, 1, 796, 497, 177, 940, 1, 480, 1, - 1, 611, 680, 14, 111, 797, 185, 0, 718, 96, 1, 1, 1, 1, 326, - 37, 33, 1, 935, 1, 1, 441, 339, 344, 554, 1, 1, 639, 396, 991, - 1, 1, 1, 974, 1, 266, 1, 1, 336, 419, 1, 215, 1, 1, 636, - 1, 1, 1, 1, 1, 418, 670, 920, 1, 745, 1, 1, 1, 927, 1, - 1, 1, 183, 1, 1, 399, 1, 805, 996, 1, 735, 1, 1, 528, 50, - 1, 1, 1, 294, 867, 1, 1, 781, 1, 1, 1, 1, 589, 44, 1, - 1, 1, 1, 522, 443, 1, 1, 1, 28, 1, 1, 272, 1, 340, 1, - 158, 168, 1, 1, 631, 935, 1, 1, 185, 619, 292, 933, 1, 1, 1, - 1, 629, 649, 1, 1, 1, 1, 1, 1, 554, 1, 1, 1, 401, 1, - 540, 903, 1, 995, 521, 1, 224, 1, 530, 820, 651, 1, 1, 262, 569, - 1, 1, 1, 1, 238, 796, 884, 573, 108, 1, 1, 1, 1, 1, 890, - 480, 1, 1, 201, 1, 1, 119, 645, 1, 1, 293, 1, 880, 1, 744, - 735, 886, 167, 1, 1, 1, 1, 1, 1, 177, 42, 708, 654, 915, 1, - 242, 1, 795, 641, 1, 1, 1, 1, 1, 601, 1, 1, 1, 233, 1, - 454, 424, 210, 962, 900, 923, 655, 1, 595, 90, 1, 1, 1, 754, 32, - 1, 1, 1, 1, 37, 409, 639, 1, 950, 1, 80, 1, 1, 734, 1, - 552, 248, 107, 1, 48, 1, 1, 841, 724, 1, 1, 1, 343, 809, 1, - 1, 1, 410, 465, 1, 1, 485, 406, 993, 746, 238, 1, 336, 256, 1, - 546, 722, 367, 943, 106, 629, 1, 1, 1, 523, 130, 1, 990, 1, 991, - 1, 1, 84, 616, 1, 1, 1, 847, 1, 316, 1, 746, 796, 522, 1, - 115, 1, 1, 165, 1, 1, 1, 187, 1, 1, 1, 1, 962, 1, 293, - 620, 144, 173, 1, 72, 678, 1, 793, 685, 637, 967, 1, 1, 1, 372, - 601, 1, 398, 1, 72, 174, 1, 325, 558, 1, 505, 859, 1, 1, 321, - 1, 1, 1, 844, 885, 1, 980, 1, 1, 50, 332, 1, 716, 747, 1, - 1, 1, 1, 703, 450, 1, 1, 1, 1, 1, 481, 507, 1, 1, 1, - 1, 865, 1, 1, 1, 539, 162, 105, 481, 595, 1, 1, 964, 1, 232, - 154, 602, 1, 943, 1, 1, 652, 811, 1, 362, 1, 1, 206, 1, 505, - 973, 1, 1, 1, 1, 1, 910, 1, 1, 63, 874, 396, 482, 619, 646, - 1, 328, 67, 884, 1, 1, 1, 1, 882, 1, 1, 366, 506, 786, 831, - 382, 1, 452, 72, 83, 59, 932, 929, 924, 1, 1, 797, 1, 854, 1, - 1, 1, 1, 1, 1, 773, 1, 1, 1, 1, 234, 1, 780, 1, 1, - 1, 1, 815, 1, 
1, 1, 1, 344, 889, 317, 260, 861, 377, 1, 304, - 70, 35, 1, 576, 490, 1, 303, 431, 1, 1, 388, 962, 1, 1, 1, - 1, 1, 321, 741, 179, 176, 117, 1, 764, 952, 704, 531, 804, 1, 1, - 1, 375, 854, 971, 24, 639, 318, 1, 1, 647, 281, 1, 294, 1, 734, - 1, 471, 296, 1, 472, 1, 5, 155, 506, 948, 1, 445, 1, 784, 349, - 943, 150, 1, 444, 1, 1, 1, 1, 911, 1, 1, 1, 1, 761, 950, - 238, 27, 1, 1, 176, 1, 1, 1, 99, 1, 31, 1, 1, 1, 1, - 1, 194, 1, 1, 1, 345, 1, 1, 1, 1, 322, 1, 1, 553, 1, - 874, 1, 1, 1, 539, 495, 764, 1, 1, 1, 836, 1, 1, 778, 586, - 1, 1, 746, 1, 1, 1, 408, 1, 1, 368, 1, 1, 359, 527, 1, - 1, 880, 1, 1, 327, 1, 829, 1, 497, 243, 981, 917, 1, 653, 1, - 1, 1, 1, 304, 1, 954, 1, 1, 446, 1}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/mixed_with_mask_gendata.pl b/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/mixed_with_mask_gendata.pl deleted file mode 100755 index dc9aa1a2..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/mixed_with_mask_gendata.pl +++ /dev/null @@ -1,191 +0,0 @@ -#!/usr/bin/perl -w -#========================================================================== -# conditional_gendata.pl -# -# Author: Generated -# Date: Today -# -(our $usageMsg = <<'ENDMSG') =~ s/^\#//gm; -# -# Simple script which creates an input data set and the reference data -# for the given conditional operation. 
-# -ENDMSG - -use strict "vars"; -use warnings; -no warnings("once"); -use Getopt::Long; - -#-------------------------------------------------------------------------- -# Command line processing -#-------------------------------------------------------------------------- - -our %opts; - -sub usage() -{ - - print "\n"; - print " Usage: conditional_gendata.pl [options] \n"; - print "\n"; - print " Options:\n"; - print " --help print this message\n"; - print " --size size of input data [1000]\n"; - print " --seed random seed [1]\n"; - print "$usageMsg"; - - exit(); -} - -sub processCommandLine() -{ - - $opts{"help"} = 0; - $opts{"size"} = 1000; - $opts{"seed"} = 1; - Getopt::Long::GetOptions( \%opts, 'help|?', 'size:i', 'seed:i' ) or usage(); - $opts{"help"} and usage(); - -} - -#-------------------------------------------------------------------------- -# Helper Functions -#-------------------------------------------------------------------------- - -sub printArray -{ - my $arrayName = $_[0]; - my $arrayRef = $_[1]; - - my $numCols = 20; - my $arrayLen = scalar(@{$arrayRef}); - - print "int ".$arrayName."[DATA_SIZE] = \n"; - print "{\n"; - - if ( $arrayLen <= $numCols ) { - print " "; - for ( my $i = 0; $i < $arrayLen; $i++ ) { - print sprintf("%3d",$arrayRef->[$i]); - if ( $i != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - else { - my $numRows = int($arrayLen/$numCols); - for ( my $j = 0; $j < $numRows; $j++ ) { - print " "; - for ( my $i = 0; $i < $numCols; $i++ ) { - my $index = $j*$numCols + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - if ( $arrayLen > ($numRows*$numCols) ) { - print " "; - for ( my $i = 0; $i < ($arrayLen-($numRows*$numCols)); $i++ ) { - my $index = $numCols*$numRows + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - } - - print "};\n\n"; -} - -sub print8bitArray -{ - my 
$arrayName = $_[0]; - my $arrayRef = $_[1]; - - my $numCols = 20; - my $arrayLen = scalar(@{$arrayRef}); - - print "int8_t ".$arrayName."[DATA_SIZE] = \n"; - print "{\n"; - - if ( $arrayLen <= $numCols ) { - print " "; - for ( my $i = 0; $i < $arrayLen; $i++ ) { - print sprintf("%3d",$arrayRef->[$i]); - if ( $i != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - else { - my $numRows = int($arrayLen/$numCols); - for ( my $j = 0; $j < $numRows; $j++ ) { - print " "; - for ( my $i = 0; $i < $numCols; $i++ ) { - my $index = $j*$numCols + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - if ( $arrayLen > ($numRows*$numCols) ) { - print " "; - for ( my $i = 0; $i < ($arrayLen-($numRows*$numCols)); $i++ ) { - my $index = $numCols*$numRows + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - } - - print "};\n\n"; -} - -#-------------------------------------------------------------------------- -# Main -#-------------------------------------------------------------------------- - -sub main() -{ - - processCommandLine(); - srand($opts{"seed"}); - - my @input1_data; - my @input2_data; - my @verify_data; - for ( my $i = 0; $i < $opts{"size"}; $i++ ) { - my $valueA = int(rand(10)); # Ensure fluctuation around 5 - my $valueC = int(rand(999)); - - push( @input1_data, $valueA ); - push( @input2_data, $valueC ); - push( @verify_data, ($valueA < 5) ? $valueC : 1 ); - } - - print "\n\#define DATA_SIZE ".$opts{"size"}." 
\n\n"; - print8bitArray( "input1_data", \@input1_data ); - printArray( "input2_data", \@input2_data ); - printArray( "verify_data", \@verify_data ); - -} - -main(); diff --git a/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/vec-mixed_width_mask.S b/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/vec-mixed_width_mask.S deleted file mode 100644 index 8af7edae..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-mixed_width_mask/vec-mixed_width_mask.S +++ /dev/null @@ -1,25 +0,0 @@ - .text - .balign 4 - .global vec_mixed_width_mask - -# Code using one width for predicate and different width for masked -# compute. -# int8_t a[]; int32_t b[], c[]; -# for (i=0; i - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void vec_mixed_width_mask(size_t n, int8_t x[], int y[], int z[]); - -int main(int argc, char *argv[]) { - int results_data[DATA_SIZE]; - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_mixed_width_mask(DATA_SIZE, input1_data, results_data, input2_data); -#endif - - // Do the compute - setStats(1); - vec_mixed_width_mask(DATA_SIZE, input1_data, results_data, input2_data); - setStats(0); - - // Check the results - return verify(DATA_SIZE, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/LICENSE_0 b/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/LICENSE_0 deleted file mode 100644 index a7393e56..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/LICENSE_0 +++ /dev/null @@ -1,41 +0,0 @@ -Pathfinder is derived by RODINIA. -Original RODINIA License: - -LICENSE TERMS - -Copyright (c)2008-2011 University of Virginia -All rights reserved. 
- -Redistribution and use in source and binary forms, with or without modification, are permitted without royalty fees or other restrictions, provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. - * Neither the name of the University of Virginia, the Dept. of Computer Science, nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF VIRGINIA OR THE SOFTWARE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -If you use this software or a modified version of it, please cite the most relevant among the following papers: - -- M. A. Goodrum, M. J. Trotter, A. Aksel, S. T. Acton, and K. Skadron. Parallelization of Particle Filter Algorithms. In Proceedings -of the 3rd Workshop on Emerging Applications and Many-core Architecture (EAMA), in conjunction with the IEEE/ACM International -Symposium on Computer Architecture (ISCA), June 2010. - -- S. Che, M. Boyer, J. Meng, D. 
Tarjan, J. W. Sheaffer, Sang-Ha Lee and K. Skadron. -"Rodinia: A Benchmark Suite for Heterogeneous Computing". IEEE International Symposium -on Workload Characterization, Oct 2009. - -- J. Meng and K. Skadron. "Performance Modeling and Automatic Ghost Zone Optimization -for Iterative Stencil Loops on GPUs." In Proceedings of the 23rd Annual ACM International -Conference on Supercomputing (ICS), June 2009. - -- L.G. Szafaryn, K. Skadron and J. Saucerman. "Experiences Accelerating MATLAB Systems -Biology Applications." in Workshop on Biomedicine in Computing (BiC) at the International -Symposium on Computer Architecture (ISCA), June 2009. - -- M. Boyer, D. Tarjan, S. T. Acton, and K. Skadron. "Accelerating Leukocyte Tracking using CUDA: -A Case Study in Leveraging Manycore Coprocessors." In Proceedings of the International Parallel -and Distributed Processing Symposium (IPDPS), May 2009. - -- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. "A Performance -Study of General Purpose Applications on Graphics Processors using CUDA" Journal of -Parallel and Distributed Computing, Elsevier, June 2008. diff --git a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/LICENSE_1 b/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/LICENSE_1 deleted file mode 100644 index 107b78e1..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/LICENSE_1 +++ /dev/null @@ -1,28 +0,0 @@ -The vectorized RODINIA comes from RiVEC Benchmark Suite. -RiVEC License: - -Copyright (c) 2020, Barcelona Supercomputing Center -All rights reserved. 
- -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are -met: redistributions of source code must retain the above copyright -notice, this list of conditions and the following disclaimer; -redistributions in binary form must reproduce the above copyright -notice, this list of conditions and the following disclaimer in the -documentation and/or other materials provided with the distribution; -neither the name of the copyright holders nor the names of its -contributors may be used to endorse or promote products derived from -this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/gen_data.py deleted file mode 100644 index 13741047..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/gen_data.py +++ /dev/null @@ -1,70 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# arg1: vector size, arg2: filter size - -import random as rand -import numpy as np -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -def rand_matrix(N, dtype): - return np.random.rand(N).astype(dtype) - - -# SCRIPT - -if len(sys.argv) == 4: - num_runs = int(sys.argv[1]) - cols = int(sys.argv[2]) - rows = int(sys.argv[3]) -else: - print("Error. 
Give me three arguments: num_runs, cols, and rows.") - sys.exit() - -dtype = np.int32 -dmax = np.iinfo(dtype).max - -# Vector of samples -wall = np.random.randint(10, size=rows * cols, dtype=dtype) - -# Buffers -result_s = np.zeros(cols, dtype=dtype) -result_v = np.zeros(cols, dtype=dtype) -src = np.zeros(cols, dtype=dtype) - -# Create the file -print('.section .data,"aw",@progbits') -emit("num_runs", np.array(num_runs, dtype=np.uint32)) -emit("rows", np.array(rows, dtype=np.uint32)) -emit("cols", np.array(cols, dtype=np.uint32)) -emit("wall", wall, "NR_LANES*4") -emit("result_s", result_s, "NR_LANES*4") -emit("result_v", result_v, "NR_LANES*4") -emit("src", src, "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/main.c b/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/main.c deleted file mode 100644 index 60a19193..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/main.c +++ /dev/null @@ -1,89 +0,0 @@ -// Modified version of pathfinder from RODINIA and then RiVEC, adapted to Ara -// environment. 
Author: Matteo Perotti Check LICENSE_0 -// and LICENCE_1 for additional information - -/************************************************************************* - * RISC-V Vectorized Version - * Author: Cristóbal Ramírez Lazo - * email: cristobal.ramirez@bsc.es - * Barcelona Supercomputing Center (2020) - *************************************************************************/ - -#include -#include - -#include "pathfinder.h" -#include "util.h" -#include - -// #define CHECK - -extern int32_t num_runs; -extern int32_t rows; -extern int32_t cols; -extern int src[] __attribute__((aligned(32), section(".l2"))); -extern int wall[] __attribute__((aligned(32), section(".l2"))); -extern int result_v[] __attribute__((aligned(32), section(".l2"))); -extern int result_s[] __attribute__((aligned(32), section(".l2"))); - -int verify_result(int *result_s, int *result_v, uint32_t cols) { -#ifdef CHECK - // Check vector with scalar result - for (uint32_t i = 0; i < cols; i++) { - if (result_v[i] != result_s[i]) { - printf("Error. result_v[%d]=%d != result_s[%d]=%d \n", i, result_v[i], i, - result_s[i]); - return 1; - } - } - - printf("Test result: PASS. 
No errors found.\n"); -#else - volatile uint32_t x; - for (uint32_t i = 0; i < cols; i++) { - x = result_v[i]; - } -#endif - return 0; -} - -int main() { - printf("PATHFINDER\n"); - - int error; - int *s_ptr; - unsigned long cycles1, cycles2, instr2, instr1; - - printf("Number of runs: %d\n", num_runs); - printf("rows=%ld cols=%ld\n", rows, cols); - printf("operations=%ld\n", num_runs * cols * rows * 3); - -#ifdef CHECK - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - s_ptr = run(wall, result_s, src, cols, rows, num_runs); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - printf("Scalar code cycles: %d\n", cycles2 - cycles1); -#endif - -#define TEST(l) \ - instr1 = read_csr(minstret); \ - cycles1 = read_csr(mcycle); \ - run_vectorm##l(wall, result_v, cols, rows, num_runs); \ - asm volatile("fence"); \ - instr2 = read_csr(minstret); \ - cycles2 = read_csr(mcycle); \ - printf("Vector code LMUL=%d, cycles: %d\n", l, cycles2 - cycles1); \ - error = verify_result(s_ptr, result_v, cols); \ - if (error) \ - return error; - - TEST(1); - TEST(2); - TEST(4); - TEST(8); - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/pathfinder.c b/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/pathfinder.c deleted file mode 100644 index 8cae18b8..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/pathfinder.c +++ /dev/null @@ -1,100 +0,0 @@ -// Modified version of pathfinder from RODINIA and then RiVEC, adapted to Ara -// environment. Author: Matteo Perotti Check LICENSE_0 -// and LICENCE_1 for additional information - -/************************************************************************* - * RISC-V Vectorized Version - * Author: Cristóbal Ramírez Lazo - * email: cristobal.ramirez@bsc.es - * Barcelona Supercomputing Center (2020) - *************************************************************************/ - -#include "pathfinder.h" - -#define MIN(a, b) ((a < b) ? 
a : b) - -int *run(int *wall, int *result_s, int *src, uint32_t cols, uint32_t rows, - uint32_t num_runs) { - int min; - int *temp; - int *dst; - - for (uint32_t j = 0; j < num_runs; j++) { - for (uint32_t x = 0; x < cols; x++) { - result_s[x] = wall[x]; - } - - dst = result_s; - - for (uint32_t t = 0; t < rows - 1; t++) { - temp = src; - src = dst; - dst = temp; - for (uint32_t n = 0; n < cols; n++) { - min = src[n]; - if (n > 0) - min = MIN(min, src[n - 1]); - if (n < cols - 1) - min = MIN(min, src[n + 1]); - dst[n] = wall[(t + 1) * cols + n] + min; - } - } - // Reset the pointer not to lose it - src = temp; - } - return dst; -} - -#define IMPL(l) \ - void run_vectorm##l(int *wall, int *result_v, uint32_t cols, uint32_t rows, \ - uint32_t num_runs) { \ - \ - size_t gvl; \ - \ - vint32m##l##_t temp; \ - vint32m##l##_t xSrc_slideup; \ - vint32m##l##_t xSrc_slidedown; \ - vint32m##l##_t xSrc; \ - vint32m##l##_t xNextrow; \ - \ - int aux, aux2; \ - int *dst; \ - \ - for (uint32_t j = 0; j < num_runs; j++) { \ - for (uint32_t n = 0; n < cols; n += gvl) { \ - gvl = __riscv_vsetvl_e32m##l(cols); \ - temp = __riscv_vle32_v_i32m##l(&wall[n], gvl); \ - __riscv_vse32_v_i32m##l(&result_v[n], temp, gvl); \ - } \ - dst = result_v; \ - \ - gvl = __riscv_vsetvl_e32m##l(cols); \ - \ - for (uint32_t t = 0; t < rows - 1; t++) { \ - aux = dst[0]; \ - for (uint32_t n = 0; n < cols; n = n + gvl) { \ - gvl = __riscv_vsetvl_e32m##l(cols - n); \ - xNextrow = __riscv_vle32_v_i32m##l(&dst[n], gvl); \ - \ - xSrc = xNextrow; \ - aux2 = (n + gvl >= cols) ? 
dst[n + gvl - 1] : dst[n + gvl]; \ - xSrc_slideup = __riscv_vslide1up_vx_i32m##l(xSrc, aux, gvl); \ - xSrc_slidedown = __riscv_vslide1down_vx_i32m##l(xSrc, aux2, gvl); \ - \ - xSrc = __riscv_vmin_vv_i32m##l(xSrc, xSrc_slideup, gvl); \ - xSrc = __riscv_vmin_vv_i32m##l(xSrc, xSrc_slidedown, gvl); \ - \ - xNextrow = __riscv_vle32_v_i32m##l(&wall[(t + 1) * cols + n], gvl); \ - xNextrow = __riscv_vadd_vv_i32m##l(xNextrow, xSrc, gvl); \ - \ - aux = dst[n + gvl - 1]; \ - __riscv_vse32_v_i32m##l(&dst[n], xNextrow, gvl); \ - } \ - } \ - } \ - } - -IMPL(1) -IMPL(2) -IMPL(4) -IMPL(8) diff --git a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/pathfinder.h b/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/pathfinder.h deleted file mode 100644 index b1524b32..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-pathfinder/pathfinder.h +++ /dev/null @@ -1,19 +0,0 @@ -#ifndef _PATHFINDER_H_ -#define _PATHFINDER_H_ - -#include -#include -#include - -int *run(int *wall, int *result_s, int *src, uint32_t cols, uint32_t rows, - uint32_t num_runs); - -#define DECLARE(l) \ - void run_vectorm##l(int *wall, int *result_v, uint32_t cols, uint32_t rows, \ - uint32_t num_runs); - -DECLARE(1) -DECLARE(2) -DECLARE(4) -DECLARE(8) -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/vec-roi-align/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-roi-align/gen_data.py deleted file mode 100755 index c403e2b4..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-roi-align/gen_data.py +++ /dev/null @@ -1,96 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# arg1: image size, arg2: filter size - -import numpy as np -import sys - - -# Batch * Depth * Height * Width -def rand_matrix(dims): - mtx = np.random.rand(*dims).astype(dtype=np.float32) - return mtx - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# Define the filter size and the matrix dimension (max, for now, is 128 64-bit elements) -if len(sys.argv) > 1: - batch_size = int(sys.argv[1]) - depth = int(sys.argv[2]) - height = int(sys.argv[3]) - width = int(sys.argv[4]) - n_boxes = int(sys.argv[5]) - crop_h = int(sys.argv[6]) - crop_w = int(sys.argv[7]) -else: - print( - "Give me 7 arguments. Batch_size, depth, height, width, n_boxes (in total), crop_h, crop_w." 
- ) - sys.exit(-1) - -# Generate a random batch of feature maps -image_data = np.random.rand(batch_size, depth, height, width).astype(np.float32) - -# Generate random coordinates for the boxes -xs = np.random.uniform(0, width, size=(n_boxes, 2)) -ys = np.random.uniform(0, height, size=(n_boxes, 2)) - -xs /= height - 1 -ys /= width - 1 - -xs.sort(axis=1) -ys.sort(axis=1) - -boxes_data = np.stack((ys[:, 0], xs[:, 0], ys[:, 1], xs[:, 1]), axis=-1).astype( - np.float32 -) - -# Generate box indexes -box_index_data = np.random.randint(0, batch_size, size=n_boxes, dtype=np.int32) - -# Generate mem space for output crops -# Randomize to enhance verification -dims = (n_boxes, depth, crop_h, crop_w) -crops_data = rand_matrix(dims) -crops_data_vec = rand_matrix(dims) - -# Print information on file -print('.section .data,"aw",@progbits') -emit("BATCH_SIZE", np.array(batch_size, dtype=np.uint64)) -emit("DEPTH", np.array(depth, dtype=np.uint64)) -emit("IMAGE_HEIGHT", np.array(height, dtype=np.uint64)) -emit("IMAGE_WIDTH", np.array(width, dtype=np.uint64)) -emit("N_BOXES", np.array(n_boxes, dtype=np.uint64)) -emit("CROP_HEIGHT", np.array(crop_h, dtype=np.uint64)) -emit("CROP_WIDTH", np.array(crop_w, dtype=np.uint64)) -emit("image_data", np.concatenate(image_data), "NR_LANES*4") -emit("boxes_data", boxes_data, "NR_LANES*4") -emit("box_index_data", box_index_data, "NR_LANES*4") -emit("crops_data", np.concatenate(crops_data), "NR_LANES*4") -emit("crops_data_vec", np.concatenate(crops_data_vec), "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-roi-align/main.c b/bb-tests/workloads/src/CTest/rvv/vec-roi-align/main.c deleted file mode 100644 index 74727413..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-roi-align/main.c +++ /dev/null @@ -1,113 +0,0 @@ -/* - Original implementation taken from - https://github.com/longcw/RoIAlign.pytorch No license found on the website. 
- A question about the license was made here - https://github.com/longcw/RoIAlign.pytorch/issues/48 Following the answer to - this question, a correct header will be added also here - Adaptation by: Matteo Perotti, ETH Zurich, -*/ - -#include -#include - -#include "ara/util.h" -#include "roi_align.h" -#include "util.h" -#include - -#define EXTRAPOLATION_VALUE 0 - -extern uint64_t BATCH_SIZE; -extern uint64_t DEPTH; -extern uint64_t IMAGE_HEIGHT; -extern uint64_t IMAGE_WIDTH; -extern uint64_t N_BOXES; -extern uint64_t CROP_HEIGHT; -extern uint64_t CROP_WIDTH; - -extern float image_data[]; -extern float boxes_data[]; -extern int box_index_data[]; -extern float crops_data[]; -extern float crops_data_vec[]; - -// Compare the vector and scalar implementation. -// Return 0 if no error is found -// Return -1 if we have an error on the first element -// A positive return value indicates the index of the faulty element -int verify_result(float *s_crops_data, float *v_crops_data, size_t size, - float delta) { - int ret; - - for (unsigned long int i = 0; i < size; ++i) { - if (!similarity_check_32b(s_crops_data[i], v_crops_data[i], delta)) { - ret = (!i) ? 
-1 : i; - return ret; - } - } - - return 0; -} - -int main() { - printf("RoI Align\n"); - - int64_t err; - unsigned long cycles1, cycles2, instr2, instr1; - uint64_t runtime_s, runtime_v; - uint64_t result_size = N_BOXES * DEPTH * CROP_HEIGHT * CROP_WIDTH; - - // Parameters - printf("BATCH_SIZE = %ld\nDEPTH = %ld\nIMAGE_HEIGHT = %ld\nIMAGE_WIDTH = " - "%ld\nN_BOXES = %ld\nCROP_HEIGHT = %ld\nCROP_WIDTH = " - "%ld\nEXTRAPOLATION_VALUE = %ld\n", - BATCH_SIZE, DEPTH, IMAGE_HEIGHT, IMAGE_WIDTH, N_BOXES, CROP_HEIGHT, - CROP_WIDTH, EXTRAPOLATION_VALUE); - - // Scalar benchmark - printf("Starting scalar benchmark...\n"); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - CropAndResizePerBox(image_data, BATCH_SIZE, DEPTH, IMAGE_HEIGHT, IMAGE_WIDTH, - boxes_data, box_index_data, 0, N_BOXES, crops_data, - CROP_HEIGHT, CROP_WIDTH, EXTRAPOLATION_VALUE); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - runtime_s = cycles2 - cycles1; - printf("Scalar benchmark complete.\n"); - printf("Cycles: %d\n", runtime_s); - - // Vector benchmark - printf("Starting vector benchmark...\n"); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - CropAndResizePerBox_BCHW_vec(image_data, BATCH_SIZE, DEPTH, IMAGE_HEIGHT, - IMAGE_WIDTH, boxes_data, box_index_data, 0, - N_BOXES, crops_data_vec, CROP_HEIGHT, CROP_WIDTH, - EXTRAPOLATION_VALUE); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - runtime_v = cycles2 - cycles1; - printf("Vector benchmark complete.\n"); - printf("Cycles: %d\n", runtime_v); - printf("Operations: %ld\n", N_BOXES * CROP_WIDTH * CROP_HEIGHT * DEPTH * 6); - printf("Loads: %ld\n", N_BOXES * CROP_WIDTH * CROP_HEIGHT * DEPTH * 4); - printf("Stores: %ld\n", N_BOXES * CROP_WIDTH * CROP_HEIGHT * DEPTH); - - // Check for errors - err = verify_result(crops_data, crops_data_vec, result_size, DELTA); - - if (err != 0) { - // Fix return code to match the index of the faulty element - err = 
(err == -1) ? 0 : err; - printf("Failed. Index %d: %x != %x\n", err, *((uint32_t *)&crops_data[err]), - *((uint32_t *)&crops_data_vec[err])); - return err; - } else { - printf("Passed.\n"); - } - - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-roi-align/roi_align.c b/bb-tests/workloads/src/CTest/rvv/vec-roi-align/roi_align.c deleted file mode 100644 index c58099a3..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-roi-align/roi_align.c +++ /dev/null @@ -1,562 +0,0 @@ -/* - Original implementation taken from - https://github.com/longcw/RoIAlign.pytorch No license found on the website. - A question about the license was made here - https://github.com/longcw/RoIAlign.pytorch/issues/48 Following the answer to - this question, a correct header will be added also here - Adaptation by: Matteo Perotti, ETH Zurich, -*/ - -#include "roi_align.h" - -void printf_fx(float num) { printf("%x\n", *((uint32_t *)&num)); } - -int64_t CropAndResizePerBox(const float *image_data, const int batch_size, - const int depth, const int image_height, - const int image_width, - - const float *boxes_data, const int *box_index_data, - const int start_box, const int limit_box, - - float *crops_data, const int crop_height, - const int crop_width, - const float extrapolation_value) { - - const int image_channel_elements = image_height * image_width; - const int image_elements = depth * image_channel_elements; - - const int channel_elements = crop_height * crop_width; - const int crop_elements = depth * channel_elements; - - int b; - // #pragma omp parallel for - - for (b = start_box; b < limit_box; ++b) { - const float *box = boxes_data + b * 4; - const float y1 = box[0]; - const float x1 = box[1]; - const float y2 = box[2]; - const float x2 = box[3]; - - const int b_in = box_index_data[b]; - if (b_in < 0 || b_in >= batch_size) { - printf("Error: batch_index %d out of range [0, %d)\n", b_in, batch_size); - return -1; - } -#ifdef PRINTF - printf("box = %d; y1, x1, y2, x2 
--------------------------:\n", b); - printf_fx(y1); - printf_fx(x1); - printf_fx(y2); - printf_fx(y2); -#endif - - const float height_scale = - (crop_height > 1) ? (y2 - y1) * (image_height - 1) / (crop_height - 1) - : 0; - const float width_scale = - (crop_width > 1) ? (x2 - x1) * (image_width - 1) / (crop_width - 1) : 0; - -#ifdef PRINTF - printf("h_scale, w_scale:\n"); - printf_fx(height_scale); - printf_fx(width_scale); -#endif - for (int y = 0; y < crop_height; ++y) { - const float in_y = (crop_height > 1) - ? y1 * (image_height - 1) + y * height_scale - : 0.5 * (y1 + y2) * (image_height - 1); - - if (in_y < 0 || in_y > image_height - 1) { - for (int x = 0; x < crop_width; ++x) { - for (int d = 0; d < depth; ++d) { - // crops(b, y, x, d) = extrapolation_value; - crops_data[crop_elements * b + channel_elements * d + - y * crop_width + x] = extrapolation_value; - } - } - continue; - } - - const int top_y_index = floorf(in_y); - const int bottom_y_index = ceilf(in_y); - const float y_lerp = in_y - top_y_index; - -#ifdef PRINTF - printf("in_y, top_y_idx, bottom_y_idx, y_lerp\n"); - printf_fx(in_y); - printf("%d\n", top_y_index); - printf("%d\n", bottom_y_index); - printf_fx(y_lerp); -#endif - - for (int x = 0; x < crop_width; ++x) { - const float in_x = (crop_width > 1) - ? 
x1 * (image_width - 1) + x * width_scale - : 0.5 * (x1 + x2) * (image_width - 1); - if (in_x < 0 || in_x > image_width - 1) { - for (int d = 0; d < depth; ++d) { - crops_data[crop_elements * b + channel_elements * d + - y * crop_width + x] = extrapolation_value; - } - continue; - } - const int left_x_index = floorf(in_x); - const int right_x_index = ceilf(in_x); - const float x_lerp = in_x - left_x_index; - -#ifdef PRINTF - printf("in_x, left_x_idx, right_x_idx, x_lerp\n"); - printf_fx(in_x); - printf("%d\n", left_x_index); - printf("%d\n", right_x_index); - printf_fx(x_lerp); -#endif - for (int d = 0; d < depth; ++d) { - const float *pimage = - image_data + b_in * image_elements + d * image_channel_elements; - - const float top_left = - pimage[top_y_index * image_width + left_x_index]; - const float top_right = - pimage[top_y_index * image_width + right_x_index]; - const float bottom_left = - pimage[bottom_y_index * image_width + left_x_index]; - const float bottom_right = - pimage[bottom_y_index * image_width + right_x_index]; - - const float top = top_left + (top_right - top_left) * x_lerp; - const float bottom = - bottom_left + (bottom_right - bottom_left) * x_lerp; - - crops_data[crop_elements * b + channel_elements * d + y * crop_width + - x] = top + (bottom - top) * y_lerp; - } - } // end for x - } // end for y - } // end for b -#ifdef PRINTF - printf("End of the scalar function\n"); -#endif - return 0; -} - -int64_t CropAndResizePerBox_BCHW_vec( - const float *image_data, const int batch_size, const int depth, - const int image_height, const int image_width, - - const float *boxes_data, const int *box_index_data, const int start_box, - const int limit_box, - - float *crops_data, const int crop_height, const int crop_width, - const float extrapolation_value) { - - const int image_channel_elements = image_height * image_width; - const int image_elements = depth * image_channel_elements; - - const int channel_elements = crop_height * crop_width; - const int 
crop_elements = depth * channel_elements; - - float *prev_pimage; - float *prev_crops_data = crops_data; - - int b; - // #pragma omp parallel for - for (b = start_box; b < limit_box; ++b) { - const float *box = boxes_data + b * 4; - const float y1 = box[0]; - const float x1 = box[1]; - const float y2 = box[2]; - const float x2 = box[3]; - - const int b_in = box_index_data[b]; - if (b_in < 0 || b_in >= batch_size) { - printf("Error: batch_index %d out of range [0, %d)\n", b_in, batch_size); - return -1; - } - const float *pimage = image_data + b_in * image_elements; - const float *prev_pimage = pimage; - - const float height_scale = - (crop_height > 1) ? (y2 - y1) * (image_height - 1) / (crop_height - 1) - : 0; - const float width_scale = - (crop_width > 1) ? (x2 - x1) * (image_width - 1) / (crop_width - 1) : 0; - - for (int y = 0; y < crop_height; ++y) { - const float in_y = (crop_height > 1) - ? y1 * (image_height - 1) + y * height_scale - : 0.5 * (y1 + y2) * (image_height - 1); - - if (in_y < 0 || in_y > image_height - 1) { - for (int x = 0; x < crop_width; ++x) { - for (int d = 0; d < depth; ++d) { - // crops(b, y, x, d) = extrapolation_value; - crops_data[crop_elements * b + channel_elements * d + - y * crop_width + x] = extrapolation_value; - } - } - continue; - } - - const int top_y_index = floorf(in_y); - const int bottom_y_index = ceilf(in_y); - const float y_lerp = in_y - top_y_index; - - for (int x = 0; x < crop_width; ++x) { - const float in_x = (crop_width > 1) - ? 
x1 * (image_width - 1) + x * width_scale - : 0.5 * (x1 + x2) * (image_width - 1); - if (in_x < 0 || in_x > image_width - 1) { - for (int d = 0; d < depth; ++d) { - crops_data[crop_elements * b + channel_elements * d + - y * crop_width + x] = extrapolation_value; - } - continue; - } - - const int left_x_index = floorf(in_x); - const int right_x_index = ceilf(in_x); - const float x_lerp = in_x - left_x_index; - - // Load the elements that belong to the same channel - vfloat32m1_t top_left, top_right, bottom_left, bottom_right, top, - bottom, result; - ptrdiff_t cstride_pimage = image_channel_elements * sizeof(pimage[0]); - ptrdiff_t cstride_crops = channel_elements * sizeof(crops_data[0]); - size_t avl, vl; - - avl = depth; -#ifdef INTRINSICS - vl = vsetvl_e32m1(avl); - - for (avl = depth; avl > 0; avl -= vl) { - vl = vsetvl_e32m1(avl); - top_left = - vlse32_v_f32m1(&pimage[top_y_index * image_width + left_x_index], - cstride_pimage, vl); - top_right = - vlse32_v_f32m1(&pimage[top_y_index * image_width + right_x_index], - cstride_pimage, vl); - - top = vfsub_vv_f32m1(top_right, top_left, vl); - top = vfmadd_vf_f32m1(top, x_lerp, top_left, vl); - - bottom_left = vlse32_v_f32m1( - &pimage[bottom_y_index * image_width + left_x_index], - cstride_pimage, vl); - bottom_right = vlse32_v_f32m1( - &pimage[bottom_y_index * image_width + right_x_index], - cstride_pimage, vl); - - bottom = vfsub_vv_f32m1(bottom_right, bottom_left, vl); - bottom = vfmadd_vf_f32m1(bottom, x_lerp, bottom_left, vl); - - result = vfsub_vv_f32m1(bottom, top, vl); - result = vfmadd_vf_f32m1(result, y_lerp, top, vl); - - vsse32_v_f32m1(&crops_data[crop_elements * b + y * crop_width + x], - cstride_crops, result, vl); - - // Bump pointers - pimage += vl * image_channel_elements; - crops_data += vl * channel_elements; - } -#else - asm volatile("vsetvli %0, %1, e32, m1, ta, ma" : "=r"(vl) : "r"(avl)); - - for (avl = depth; avl > 0; avl -= vl) { - asm volatile("vsetvli %0, %1, e32, m1, ta, ma" : "=r"(vl) : 
"r"(avl)); - asm volatile("vlse32.v v0, (%0), %1" ::"r"( - &pimage[top_y_index * image_width + left_x_index]), - "r"(cstride_pimage) - : "v0"); // top left - asm volatile("vlse32.v v1, (%0), %1" ::"r"( - &pimage[top_y_index * image_width + right_x_index]), - "r"(cstride_pimage) - : "v1"); // top right - - asm volatile("vfsub.vv v2, v1, v0"); // top - asm volatile("vfmadd.vf v2, %0, v0" ::"f"(x_lerp)); // top - - asm volatile( - "vlse32.v v3, (%0), %1" ::"r"( - &pimage[bottom_y_index * image_width + left_x_index]), - "r"(cstride_pimage) - : "v3"); // bottom left - asm volatile( - "vlse32.v v4, (%0), %1" ::"r"( - &pimage[bottom_y_index * image_width + right_x_index]), - "r"(cstride_pimage) - : "v4"); // bottom right - - asm volatile("vfsub.vv v5, v4, v3"); // bottom - asm volatile("vfmadd.vf v5, %0, v3" ::"f"(x_lerp)); // bottom - - asm volatile("vfsub.vv v6, v5, v2"); // bottom - asm volatile("vfmadd.vf v6, %0, v2" ::"f"(y_lerp)); // bottom - - asm volatile("vsse32.v v6, (%0), %1" ::"r"( - &crops_data[crop_elements * b + y * crop_width + x]), - "r"(cstride_crops)); - - // Bump pointers - pimage += vl * image_channel_elements; - crops_data += vl * channel_elements; - } -#endif - pimage = prev_pimage; - - crops_data = prev_crops_data; - } // end for x - } // end for y - } // end for b - return 0; -} - -int64_t CropAndResizePerBox_BHWC_vec( - const float *image_data, const int batch_size, const int depth, - const int image_height, const int image_width, - - const float *boxes_data, const int *box_index_data, const int start_box, - const int limit_box, - - float *crops_data, const int crop_height, const int crop_width, - const float extrapolation_value) { - - const int image_channel_elements = image_height * image_width; - const int image_elements = depth * image_channel_elements; - - const int channel_elements = crop_height * crop_width; - const int crop_elements = depth * channel_elements; - - float *prev_pimage; - float *prev_crops_data = crops_data; - - int b; - // 
#pragma omp parallel for - for (b = start_box; b < limit_box; ++b) { - const float *box = boxes_data + b * 4; - const float y1 = box[0]; - const float x1 = box[1]; - const float y2 = box[2]; - const float x2 = box[3]; - - const int b_in = box_index_data[b]; - if (b_in < 0 || b_in >= batch_size) { - printf("Error: batch_index %d out of range [0, %d)\n", b_in, batch_size); - return -1; - } - const float *pimage = image_data + b_in * image_elements; - const float *prev_pimage = pimage; - - const float height_scale = - (crop_height > 1) ? (y2 - y1) * (image_height - 1) / (crop_height - 1) - : 0; - const float width_scale = - (crop_width > 1) ? (x2 - x1) * (image_width - 1) / (crop_width - 1) : 0; - - for (int y = 0; y < crop_height; ++y) { - const float in_y = (crop_height > 1) - ? y1 * (image_height - 1) + y * height_scale - : 0.5 * (y1 + y2) * (image_height - 1); - - if (in_y < 0 || in_y > image_height - 1) { - for (int x = 0; x < crop_width; ++x) { - for (int d = 0; d < depth; ++d) { - // crops(b, y, x, d) = extrapolation_value; - crops_data[crop_elements * b + y * crop_width * depth + x * depth + - d] = extrapolation_value; - } - } - continue; - } - - const int top_y_index = floorf(in_y); - const int bottom_y_index = ceilf(in_y); - const float y_lerp = in_y - top_y_index; - - for (int x = 0; x < crop_width; ++x) { - const float in_x = (crop_width > 1) - ? 
x1 * (image_width - 1) + x * width_scale - : 0.5 * (x1 + x2) * (image_width - 1); - if (in_x < 0 || in_x > image_width - 1) { - for (int d = 0; d < depth; ++d) { - crops_data[crop_elements * b + y * crop_width * depth + x * depth + - d] = extrapolation_value; - } - continue; - } - - const int left_x_index = floorf(in_x); - const int right_x_index = ceilf(in_x); - const float x_lerp = in_x - left_x_index; - - // Load the elements that belong to the same channel - vfloat32m1_t top_left, top_right, bottom_left, bottom_right, top, - bottom, result; - size_t avl, vl; - - avl = depth; - -#ifdef INTRINSICS - vl = vsetvl_e32m1(avl); - - for (avl = depth; avl > 0; avl -= vl) { - vl = vsetvl_e32m1(avl); - top_left = vle32_v_f32m1( - &pimage[depth * (top_y_index * image_width + left_x_index)], vl); - top_right = vle32_v_f32m1( - &pimage[depth * (top_y_index * image_width + right_x_index)], vl); - - top = vfsub_vv_f32m1(top_right, top_left, vl); - top = vfmadd_vf_f32m1(top, x_lerp, top_left, vl); - - bottom_left = vle32_v_f32m1( - &pimage[depth * (bottom_y_index * image_width + left_x_index)], - vl); - bottom_right = vle32_v_f32m1( - &pimage[depth * (bottom_y_index * image_width + right_x_index)], - vl); - - bottom = vfsub_vv_f32m1(bottom_right, bottom_left, vl); - bottom = vfmadd_vf_f32m1(bottom, x_lerp, bottom_left, vl); - - result = vfsub_vv_f32m1(bottom, top, vl); - result = vfmadd_vf_f32m1(result, y_lerp, top, vl); - - vse32_v_f32m1(&crops_data[crop_elements * b + y * crop_width * depth + - x * depth], - result, vl); - - // Bump pointers - pimage += vl; - crops_data += vl; - } -#else - asm volatile("vsetvli %0, %1, e32, m1, ta, ma" : "=r"(vl) : "r"(avl)); - - for (avl = depth; avl > 0; avl -= vl) { - asm volatile("vsetvli %0, %1, e32, m1, ta, ma" : "=r"(vl) : "r"(avl)); - asm volatile("vle32.v v0, (%0)" ::"r"( - &pimage[top_y_index * image_width + left_x_index]) - : "v0"); // top left - asm volatile("vle32.v v1, (%0)" ::"r"( - &pimage[top_y_index * image_width + 
right_x_index]) - : "v1"); // top right - - asm volatile("vfsub.vv v2, v1, v0"); // top - asm volatile("vfmadd.vf v2, %0, v0" ::"f"(x_lerp)); // top - - asm volatile("vle32.v v3, (%0)" ::"r"( - &pimage[bottom_y_index * image_width + left_x_index]) - : "v3"); // bottom left - asm volatile( - "vle32.v v4, (%0)" ::"r"( - &pimage[bottom_y_index * image_width + right_x_index]) - : "v4"); // bottom right - - asm volatile("vfsub.vv v5, v4, v3"); // bottom - asm volatile("vfmadd.vf v5, %0, v3" ::"f"(x_lerp)); // bottom - - asm volatile("vfsub.vv v6, v5, v2"); // bottom - asm volatile("vfmadd.vf v6, %0, v2" ::"f"(y_lerp)); // bottom - - asm volatile("vse32.v v6, (%0)" ::"r"( - &crops_data[crop_elements * b + y * crop_width + x])); - - // Bump pointers - pimage += vl * image_channel_elements; - crops_data += vl * channel_elements; - } -#endif - pimage = prev_pimage; - - crops_data = prev_crops_data; - } // end for x - } // end for y - } // end for b - return 0; -} - -// Normalized image -void init_image(float *vec, size_t size) { - for (unsigned long int i = 0; i < size; ++i) - vec[i] = (float)((i + 5) % size) / size; -} - -// Boxes must have meaningful coordinates -void init_boxes(float *vec, size_t size) { - // 4 coordinates per box: y1 x1 y2 x2 - for (unsigned long int i = 0; i < size; i += 4) { - vec[i] = 3; // y1 - vec[i + 1] = 7; // x1 - vec[i + 2] = 35; // y2 - vec[i + 3] = 39; // x2 - } -} - -// Each box can belong to one of the #BATCH_SIZE images -void init_boxes_idx(int *vec, size_t size, uint64_t batch_size) { - for (unsigned long int i = 0; i < size; ++i) - vec[i] = i % batch_size; -} - -// Crops initialized to zero -void init_crops(float *vec, size_t size) { - for (unsigned long int i = 0; i < size; ++i) - vec[i] = 0; -} - -// Roi Align vector kernel -// Fake just for timing measurements and comparison with Ideal Dispatcher -void roi_align_fake_kernel_asm(float *pimage, float *crops_data, - int left_x_index, int right_x_index, int b, - int y, size_t depth) { - - 
volatile float x_lerp = 0.14135; - volatile float y_lerp = 0.4363; - volatile int image_channel_elements = 64; - volatile int channel_elements = 32; - volatile int crop_elements = 5; - volatile int crop_width = 3; - volatile int tyiw = 9; - volatile int byiw = 7; - volatile int x = 11; - volatile int k = 0; - volatile int z = 0; - - size_t vl; - size_t avl = depth; - - asm volatile("vsetvli %0, %1, e32, m4, ta, ma" : "=r"(vl) : "r"(avl)); - - for (avl = depth; avl > 0; avl -= vl) { - asm volatile("vsetvli %0, %1, e32, m4, ta, ma" : "=r"(vl) : "r"(avl)); - asm volatile("vle32.v v0, (%0)" ::"r"(&pimage[tyiw + left_x_index]) - : "v0"); // top left - asm volatile("vle32.v v8, (%0)" ::"r"(&pimage[tyiw + right_x_index]) - : "v8"); // top right - - asm volatile("vfsub.vv v12, v8, v0"); // top - asm volatile("vfmadd.vf v12, %0, v0" ::"f"(x_lerp)); // top - - asm volatile("vle32.v v16, (%0)" ::"r"(&pimage[byiw + left_x_index]) - : "v16"); // bottom left - asm volatile("vle32.v v20, (%0)" ::"r"(&pimage[byiw + right_x_index]) - : "v20"); // bottom right - - asm volatile("vfsub.vv v24, v20, v16"); // bottom - asm volatile("vfmadd.vf v24, %0, v16" ::"f"(x_lerp)); // bottom - - asm volatile("vfsub.vv v28, v24, v12"); // bottom - asm volatile("vfmadd.vf v28, %0, v12" ::"f"(y_lerp)); // bottom - - asm volatile("vse32.v v28, (%0)" ::"r"( - &crops_data[crop_elements * b + y * crop_width + x])); - - // Bump pointers - k += vl * image_channel_elements; - z += vl * channel_elements; - } -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-roi-align/roi_align.h b/bb-tests/workloads/src/CTest/rvv/vec-roi-align/roi_align.h deleted file mode 100644 index 8a843663..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-roi-align/roi_align.h +++ /dev/null @@ -1,70 +0,0 @@ -/* - Original implementation taken from - https://github.com/longcw/RoIAlign.pytorch No license found on the website. 
- A question about the license was made here - https://github.com/longcw/RoIAlign.pytorch/issues/48 Following the answer to - this question, a correct header will be added also here - Adaptation by: Matteo Perotti, ETH Zurich, -*/ - -#ifndef _ROI_ALIGN_H_ -#define _ROI_ALIGN_H_ - -#include - -#include "util.h" -#include -#include - -#define DELTA 0.0001 - -void printf_fx(float num); - -int64_t CropAndResizePerBox(const float *image_data, const int batch_size, - const int depth, const int image_height, - const int image_width, - - const float *boxes_data, const int *box_index_data, - const int start_box, const int limit_box, - - float *crops_data, const int crop_height, - const int crop_width, - const float extrapolation_value); - -int64_t CropAndResizePerBox_BCHW_vec( - const float *image_data, const int batch_size, const int depth, - const int image_height, const int image_width, - - const float *boxes_data, const int *box_index_data, const int start_box, - const int limit_box, - - float *crops_data, const int crop_height, const int crop_width, - const float extrapolation_value); - -int64_t CropAndResizePerBox_BHWC_vec( - const float *image_data, const int batch_size, const int depth, - const int image_height, const int image_width, - - const float *boxes_data, const int *box_index_data, const int start_box, - const int limit_box, - - float *crops_data, const int crop_height, const int crop_width, - const float extrapolation_value); - -void roi_align_fake_kernel_asm(float *pimage, float *crops_data, - int left_x_index, int right_x_index, int b, - int y, size_t depth); - -// Normalized image -void init_image(float *vec, size_t size); - -// Boxes must have meaningful coordinates -void init_boxes(float *vec, size_t size); - -// Each box can belong to one of the #BATCH_SIZE images -void init_boxes_idx(int *vec, size_t size, uint64_t batch_size); - -// Crops initialized to zero -void init_crops(float *vec, size_t size); - -#endif diff --git 
a/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/dataset1.h deleted file mode 100644 index 90d2eca1..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/dataset1.h +++ /dev/null @@ -1,2928 +0,0 @@ -#define KH 3 -#define KW 3 -#define IH 121 -#define IW 121 -#define I_SIZE 14641 -#define OH 119 -#define OW 119 -#define O_SIZE 14161 - -float input_k1[IW] = { - 1.0, - 1.0, - 1.0, -}; -float input_k2[IH] = { - 1.0, - 1.0, - 1.0, -}; -float input_image[I_SIZE] = { - 92.0, 28.0, 120.0, 18.0, 20.0, 12.0, 88.0, 112.0, 56.0, 32.0, 9.0, - 48.0, 216.0, 240.0, 96.0, 4.0, 20.0, 20.0, 4.0, 168.0, 4.0, 28.0, - 22.0, 6.0, 3.0, 64.0, 216.0, 20.0, 48.0, 8.0, 56.0, 208.0, 52.0, - 64.0, 44.0, 176.0, 240.0, 96.0, 24.0, 12.0, 32.0, 464.0, 76.0, 32.0, - 88.0, 32.0, 8.0, 496.0, 4.0, 9.0, 352.0, 4.0, 76.0, 16.0, 20.0, - 12.0, 480.0, 480.0, 208.0, 60.0, 224.0, 96.0, 416.0, 320.0, 24.0, 24.0, - 112.0, 16.0, 15.0, 200.0, 72.0, 3.0, 160.0, 96.0, 24.0, 31.0, 480.0, - 208.0, 52.0, 19.0, 104.0, 448.0, 248.0, 8.0, 42.0, 108.0, 12.0, 4.0, - 38.0, 0.0, 22.0, 288.0, 100.0, 40.0, 15.0, 48.0, 8.0, 16.0, 448.0, - 52.0, 56.0, 27.0, 24.0, 24.0, 84.0, 248.0, 128.0, 16.0, 112.0, 3.0, - 72.0, 12.0, 240.0, 124.0, 200.0, 12.0, 120.0, 104.0, 40.0, 240.0, 32.0, - 192.0, 31.0, 200.0, 336.0, 48.0, 96.0, 14.0, 4.0, 96.0, 112.0, 216.0, - 136.0, 152.0, 9.0, 96.0, 24.0, 50.0, 144.0, 400.0, 25.0, 32.0, 68.0, - 14.0, 48.0, 1.0, 320.0, 46.0, 0.0, 24.0, 92.0, 46.0, 240.0, 28.0, - 224.0, 2.0, 0.0, 88.0, 100.0, 26.0, 20.0, 56.0, 400.0, 48.0, 22.0, - 128.0, 14.0, 64.0, 176.0, 120.0, 168.0, 16.0, 32.0, 128.0, 176.0, 80.0, - 116.0, 4.0, 88.0, 84.0, 304.0, 48.0, 112.0, 464.0, 32.0, 168.0, 192.0, - 56.0, 432.0, 40.0, 16.0, 36.0, 176.0, 112.0, 60.0, 56.0, 14.0, 46.0, - 112.0, 2.0, 48.0, 92.0, 60.0, 76.0, 2.0, 112.0, 30.0, 464.0, 11.0, - 28.0, 20.0, 2.0, 48.0, 240.0, 32.0, 200.0, 20.0, 20.0, 124.0, 24.0, - 128.0, 304.0, 14.0, 288.0, 192.0, 0.0, 304.0, 14.0, 
176.0, 28.0, 24.0, - 20.0, 3.0, 14.0, 29.0, 29.0, 368.0, 496.0, 16.0, 32.0, 15.0, 10.0, - 112.0, 54.0, 2.0, 80.0, 26.0, 240.0, 46.0, 232.0, 416.0, 23.0, 40.0, - 72.0, 64.0, 88.0, 12.0, 6.0, 22.0, 184.0, 29.0, 36.0, 256.0, 80.0, - 62.0, 76.0, 96.0, 8.0, 136.0, 224.0, 7.0, 31.0, 0.0, 464.0, 1.0, - 22.0, 76.0, 464.0, 2.0, 50.0, 31.0, 8.0, 32.0, 48.0, 24.0, 60.0, - 13.0, 2.0, 160.0, 16.0, 84.0, 96.0, 80.0, 22.0, 176.0, 72.0, 14.0, - 30.0, 88.0, 92.0, 24.0, 4.0, 72.0, 120.0, 40.0, 29.0, 15.0, 152.0, - 28.0, 240.0, 28.0, 7.0, 32.0, 16.0, 176.0, 6.0, 26.0, 4.0, 36.0, - 112.0, 52.0, 25.0, 136.0, 31.0, 16.0, 480.0, 104.0, 232.0, 28.0, 20.0, - 20.0, 6.0, 8.0, 24.0, 120.0, 13.0, 38.0, 240.0, 208.0, 34.0, 28.0, - 60.0, 112.0, 16.0, 224.0, 54.0, 29.0, 192.0, 9.0, 384.0, 46.0, 54.0, - 208.0, 232.0, 13.0, 80.0, 96.0, 72.0, 0.0, 72.0, 40.0, 80.0, 200.0, - 128.0, 32.0, 72.0, 24.0, 384.0, 62.0, 12.0, 256.0, 200.0, 52.0, 432.0, - 54.0, 116.0, 11.0, 20.0, 56.0, 160.0, 120.0, 80.0, 2.0, 64.0, 96.0, - 128.0, 52.0, 192.0, 24.0, 38.0, 30.0, 76.0, 192.0, 26.0, 384.0, 112.0, - 24.0, 288.0, 240.0, 144.0, 0.0, 24.0, 84.0, 16.0, 2.0, 0.0, 12.0, - 28.0, 32.0, 80.0, 10.0, 16.0, 304.0, 136.0, 40.0, 40.0, 32.0, 13.0, - 18.0, 2.0, 112.0, 10.0, 416.0, 432.0, 0.0, 3.0, 304.0, 0.0, 24.0, - 40.0, 88.0, 64.0, 21.0, 32.0, 64.0, 64.0, 13.0, 120.0, 80.0, 8.0, - 13.0, 248.0, 52.0, 80.0, 18.0, 56.0, 14.0, 17.0, 38.0, 26.0, 64.0, - 80.0, 62.0, 224.0, 432.0, 320.0, 144.0, 48.0, 0.0, 16.0, 232.0, 48.0, - 13.0, 112.0, 232.0, 432.0, 32.0, 32.0, 23.0, 20.0, 42.0, 76.0, 200.0, - 40.0, 13.0, 432.0, 352.0, 6.0, 496.0, 136.0, 20.0, 160.0, 96.0, 72.0, - 16.0, 40.0, 116.0, 15.0, 0.0, 12.0, 60.0, 6.0, 28.0, 84.0, 18.0, - 160.0, 21.0, 20.0, 31.0, 216.0, 20.0, 21.0, 224.0, 64.0, 192.0, 48.0, - 0.0, 128.0, 20.0, 5.0, 44.0, 48.0, 200.0, 0.0, 25.0, 0.0, 176.0, - 22.0, 13.0, 136.0, 22.0, 448.0, 160.0, 104.0, 52.0, 16.0, 11.0, 112.0, - 128.0, 0.0, 64.0, 352.0, 18.0, 208.0, 1.0, 32.0, 28.0, 104.0, 1.0, - 15.0, 64.0, 19.0, 48.0, 
40.0, 352.0, 48.0, 224.0, 240.0, 62.0, 26.0, - 64.0, 0.0, 80.0, 80.0, 56.0, 480.0, 416.0, 4.0, 192.0, 24.0, 384.0, - 38.0, 88.0, 32.0, 176.0, 26.0, 144.0, 48.0, 168.0, 8.0, 136.0, 1.0, - 32.0, 64.0, 384.0, 92.0, 8.0, 0.0, 36.0, 68.0, 64.0, 56.0, 8.0, - 25.0, 48.0, 23.0, 2.0, 128.0, 112.0, 240.0, 128.0, 16.0, 60.0, 64.0, - 128.0, 36.0, 29.0, 62.0, 42.0, 88.0, 8.0, 60.0, 4.0, 20.0, 320.0, - 168.0, 16.0, 4.0, 20.0, 16.0, 8.0, 0.0, 32.0, 256.0, 448.0, 38.0, - 92.0, 3.0, 54.0, 62.0, 62.0, 2.0, 40.0, 3.0, 368.0, 160.0, 8.0, - 80.0, 12.0, 400.0, 29.0, 48.0, 480.0, 12.0, 19.0, 16.0, 64.0, 240.0, - 62.0, 288.0, 368.0, 52.0, 22.0, 128.0, 0.0, 15.0, 72.0, 12.0, 6.0, - 8.0, 12.0, 9.0, 25.0, 104.0, 3.0, 36.0, 16.0, 208.0, 54.0, 17.0, - 27.0, 176.0, 384.0, 2.0, 168.0, 38.0, 40.0, 11.0, 50.0, 80.0, 336.0, - 96.0, 2.0, 2.0, 240.0, 4.0, 26.0, 6.0, 160.0, 400.0, 120.0, 72.0, - 9.0, 64.0, 54.0, 40.0, 21.0, 20.0, 120.0, 2.0, 44.0, 18.0, 16.0, - 176.0, 128.0, 30.0, 16.0, 32.0, 96.0, 272.0, 12.0, 26.0, 54.0, 100.0, - 384.0, 144.0, 160.0, 0.0, 56.0, 5.0, 40.0, 5.0, 60.0, 144.0, 48.0, - 11.0, 17.0, 15.0, 52.0, 16.0, 2.0, 352.0, 14.0, 48.0, 184.0, 104.0, - 0.0, 448.0, 72.0, 13.0, 14.0, 1.0, 368.0, 100.0, 28.0, 216.0, 17.0, - 112.0, 36.0, 64.0, 288.0, 104.0, 0.0, 368.0, 16.0, 28.0, 136.0, 17.0, - 14.0, 22.0, 200.0, 16.0, 8.0, 176.0, 0.0, 464.0, 88.0, 176.0, 64.0, - 224.0, 22.0, 192.0, 42.0, 32.0, 3.0, 304.0, 128.0, 72.0, 112.0, 208.0, - 19.0, 16.0, 20.0, 20.0, 29.0, 100.0, 12.0, 448.0, 92.0, 200.0, 30.0, - 0.0, 16.0, 208.0, 32.0, 8.0, 20.0, 72.0, 34.0, 52.0, 26.0, 112.0, - 240.0, 144.0, 17.0, 12.0, 192.0, 40.0, 64.0, 200.0, 96.0, 12.0, 224.0, - 24.0, 304.0, 18.0, 4.0, 144.0, 4.0, 38.0, 2.0, 8.0, 100.0, 7.0, - 24.0, 256.0, 16.0, 8.0, 384.0, 20.0, 52.0, 464.0, 124.0, 24.0, 60.0, - 224.0, 20.0, 14.0, 144.0, 0.0, 10.0, 22.0, 368.0, 64.0, 0.0, 24.0, - 208.0, 26.0, 24.0, 192.0, 416.0, 120.0, 13.0, 2.0, 16.0, 9.0, 352.0, - 27.0, 248.0, 22.0, 160.0, 432.0, 25.0, 50.0, 36.0, 27.0, 224.0, 96.0, - 
2.0, 416.0, 64.0, 60.0, 76.0, 92.0, 16.0, 0.0, 88.0, 304.0, 40.0, - 3.0, 108.0, 124.0, 19.0, 56.0, 22.0, 88.0, 44.0, 116.0, 448.0, 38.0, - 72.0, 56.0, 8.0, 44.0, 48.0, 64.0, 100.0, 240.0, 11.0, 48.0, 248.0, - 23.0, 27.0, 14.0, 4.0, 16.0, 64.0, 128.0, 12.0, 30.0, 3.0, 60.0, - 42.0, 352.0, 152.0, 64.0, 124.0, 0.0, 112.0, 34.0, 14.0, 12.0, 8.0, - 40.0, 7.0, 352.0, 272.0, 368.0, 40.0, 120.0, 38.0, 40.0, 4.0, 208.0, - 2.0, 160.0, 32.0, 28.0, 40.0, 42.0, 16.0, 4.0, 15.0, 464.0, 96.0, - 50.0, 4.0, 0.0, 58.0, 22.0, 112.0, 52.0, 64.0, 26.0, 104.0, 400.0, - 26.0, 112.0, 224.0, 152.0, 96.0, 9.0, 72.0, 8.0, 184.0, 29.0, 30.0, - 27.0, 72.0, 100.0, 36.0, 200.0, 48.0, 60.0, 248.0, 112.0, 224.0, 24.0, - 240.0, 26.0, 64.0, 272.0, 16.0, 128.0, 31.0, 54.0, 192.0, 3.0, 40.0, - 136.0, 184.0, 6.0, 44.0, 23.0, 52.0, 56.0, 416.0, 496.0, 50.0, 176.0, - 176.0, 304.0, 96.0, 54.0, 96.0, 80.0, 104.0, 16.0, 0.0, 80.0, 28.0, - 26.0, 30.0, 28.0, 21.0, 224.0, 36.0, 100.0, 336.0, 448.0, 24.0, 40.0, - 0.0, 22.0, 248.0, 26.0, 0.0, 104.0, 464.0, 28.0, 368.0, 184.0, 124.0, - 18.0, 48.0, 54.0, 21.0, 184.0, 20.0, 20.0, 72.0, 336.0, 0.0, 12.0, - 16.0, 2.0, 104.0, 24.0, 40.0, 24.0, 15.0, 26.0, 176.0, 72.0, 160.0, - 56.0, 88.0, 36.0, 20.0, 16.0, 208.0, 20.0, 60.0, 56.0, 3.0, 64.0, - 496.0, 448.0, 2.0, 80.0, 0.0, 4.0, 64.0, 128.0, 176.0, 104.0, 52.0, - 96.0, 72.0, 16.0, 144.0, 31.0, 17.0, 184.0, 128.0, 88.0, 108.0, 0.0, - 32.0, 192.0, 76.0, 14.0, 112.0, 104.0, 26.0, 384.0, 176.0, 12.0, 232.0, - 8.0, 5.0, 208.0, 208.0, 184.0, 12.0, 48.0, 192.0, 304.0, 72.0, 22.0, - 116.0, 4.0, 40.0, 44.0, 54.0, 96.0, 68.0, 19.0, 48.0, 72.0, 52.0, - 28.0, 40.0, 96.0, 16.0, 56.0, 5.0, 80.0, 48.0, 40.0, 152.0, 448.0, - 480.0, 368.0, 19.0, 64.0, 1.0, 10.0, 8.0, 352.0, 336.0, 8.0, 25.0, - 80.0, 23.0, 56.0, 40.0, 14.0, 272.0, 40.0, 80.0, 5.0, 3.0, 12.0, - 5.0, 56.0, 256.0, 176.0, 24.0, 22.0, 8.0, 56.0, 58.0, 14.0, 16.0, - 62.0, 56.0, 10.0, 1.0, 116.0, 16.0, 30.0, 232.0, 44.0, 24.0, 5.0, - 56.0, 192.0, 2.0, 72.0, 2.0, 60.0, 8.0, 
7.0, 68.0, 56.0, 0.0, - 12.0, 224.0, 160.0, 224.0, 25.0, 224.0, 5.0, 80.0, 10.0, 20.0, 56.0, - 25.0, 16.0, 432.0, 6.0, 336.0, 368.0, 17.0, 4.0, 112.0, 352.0, 496.0, - 36.0, 16.0, 432.0, 80.0, 116.0, 11.0, 176.0, 15.0, 80.0, 256.0, 144.0, - 0.0, 26.0, 176.0, 62.0, 80.0, 12.0, 336.0, 7.0, 16.0, 120.0, 12.0, - 56.0, 2.0, 352.0, 44.0, 52.0, 72.0, 88.0, 28.0, 96.0, 320.0, 8.0, - 232.0, 0.0, 31.0, 14.0, 368.0, 368.0, 64.0, 20.0, 120.0, 12.0, 62.0, - 232.0, 26.0, 32.0, 24.0, 21.0, 11.0, 80.0, 72.0, 14.0, 48.0, 80.0, - 80.0, 16.0, 56.0, 28.0, 240.0, 16.0, 14.0, 76.0, 92.0, 384.0, 1.0, - 15.0, 48.0, 16.0, 160.0, 9.0, 24.0, 22.0, 240.0, 38.0, 92.0, 72.0, - 112.0, 144.0, 34.0, 6.0, 68.0, 32.0, 144.0, 160.0, 112.0, 84.0, 16.0, - 80.0, 232.0, 10.0, 320.0, 108.0, 16.0, 13.0, 80.0, 60.0, 6.0, 16.0, - 64.0, 224.0, 4.0, 56.0, 20.0, 32.0, 400.0, 40.0, 304.0, 400.0, 2.0, - 116.0, 23.0, 72.0, 54.0, 32.0, 464.0, 0.0, 56.0, 16.0, 232.0, 152.0, - 124.0, 16.0, 72.0, 36.0, 16.0, 56.0, 240.0, 12.0, 18.0, 12.0, 26.0, - 112.0, 21.0, 46.0, 0.0, 30.0, 192.0, 68.0, 168.0, 16.0, 104.0, 25.0, - 17.0, 56.0, 68.0, 32.0, 256.0, 16.0, 36.0, 34.0, 16.0, 12.0, 136.0, - 29.0, 152.0, 48.0, 168.0, 22.0, 24.0, 304.0, 56.0, 28.0, 40.0, 480.0, - 56.0, 336.0, 52.0, 1.0, 64.0, 16.0, 11.0, 208.0, 20.0, 128.0, 38.0, - 152.0, 240.0, 32.0, 240.0, 104.0, 15.0, 8.0, 352.0, 116.0, 36.0, 496.0, - 0.0, 116.0, 496.0, 34.0, 10.0, 44.0, 32.0, 62.0, 60.0, 44.0, 96.0, - 26.0, 21.0, 18.0, 200.0, 40.0, 176.0, 176.0, 18.0, 24.0, 4.0, 16.0, - 12.0, 10.0, 160.0, 448.0, 50.0, 21.0, 64.0, 2.0, 24.0, 26.0, 368.0, - 464.0, 104.0, 128.0, 136.0, 100.0, 38.0, 2.0, 30.0, 112.0, 448.0, 6.0, - 192.0, 25.0, 168.0, 10.0, 224.0, 2.0, 25.0, 88.0, 160.0, 40.0, 0.0, - 12.0, 116.0, 168.0, 0.0, 3.0, 32.0, 72.0, 7.0, 26.0, 96.0, 176.0, - 5.0, 112.0, 120.0, 29.0, 32.0, 200.0, 112.0, 108.0, 16.0, 24.0, 46.0, - 22.0, 2.0, 80.0, 18.0, 144.0, 208.0, 128.0, 216.0, 76.0, 16.0, 32.0, - 0.0, 52.0, 36.0, 3.0, 3.0, 100.0, 0.0, 32.0, 32.0, 6.0, 28.0, - 6.0, 
72.0, 144.0, 16.0, 64.0, 0.0, 0.0, 34.0, 6.0, 200.0, 72.0, - 208.0, 10.0, 14.0, 16.0, 256.0, 128.0, 24.0, 80.0, 304.0, 27.0, 18.0, - 108.0, 24.0, 20.0, 32.0, 40.0, 136.0, 240.0, 11.0, 18.0, 272.0, 64.0, - 84.0, 48.0, 42.0, 160.0, 16.0, 68.0, 416.0, 464.0, 64.0, 48.0, 21.0, - 448.0, 224.0, 68.0, 88.0, 112.0, 116.0, 56.0, 88.0, 128.0, 48.0, 0.0, - 16.0, 32.0, 496.0, 18.0, 23.0, 100.0, 4.0, 16.0, 58.0, 4.0, 56.0, - 48.0, 0.0, 4.0, 14.0, 184.0, 9.0, 29.0, 116.0, 56.0, 54.0, 12.0, - 352.0, 7.0, 8.0, 320.0, 84.0, 3.0, 10.0, 48.0, 13.0, 448.0, 160.0, - 480.0, 16.0, 240.0, 0.0, 52.0, 3.0, 22.0, 44.0, 34.0, 48.0, 24.0, - 144.0, 400.0, 2.0, 104.0, 0.0, 48.0, 21.0, 72.0, 54.0, 416.0, 3.0, - 7.0, 80.0, 31.0, 416.0, 112.0, 248.0, 248.0, 168.0, 9.0, 0.0, 0.0, - 32.0, 62.0, 256.0, 6.0, 2.0, 30.0, 54.0, 336.0, 224.0, 9.0, 192.0, - 50.0, 0.0, 16.0, 30.0, 124.0, 21.0, 208.0, 14.0, 88.0, 32.0, 80.0, - 64.0, 40.0, 28.0, 32.0, 160.0, 88.0, 288.0, 32.0, 104.0, 240.0, 23.0, - 8.0, 50.0, 1.0, 176.0, 5.0, 9.0, 23.0, 28.0, 80.0, 16.0, 6.0, - 72.0, 44.0, 46.0, 240.0, 24.0, 5.0, 48.0, 4.0, 46.0, 31.0, 96.0, - 16.0, 232.0, 24.0, 192.0, 34.0, 224.0, 3.0, 40.0, 96.0, 27.0, 7.0, - 5.0, 46.0, 24.0, 46.0, 18.0, 27.0, 288.0, 464.0, 256.0, 92.0, 76.0, - 72.0, 32.0, 24.0, 144.0, 17.0, 56.0, 64.0, 112.0, 16.0, 52.0, 176.0, - 36.0, 16.0, 48.0, 2.0, 104.0, 400.0, 0.0, 104.0, 7.0, 96.0, 7.0, - 320.0, 24.0, 8.0, 72.0, 0.0, 13.0, 24.0, 11.0, 34.0, 3.0, 40.0, - 208.0, 15.0, 32.0, 72.0, 168.0, 336.0, 44.0, 432.0, 48.0, 23.0, 240.0, - 120.0, 136.0, 8.0, 50.0, 120.0, 0.0, 432.0, 54.0, 52.0, 112.0, 384.0, - 14.0, 224.0, 36.0, 2.0, 27.0, 496.0, 112.0, 28.0, 128.0, 248.0, 80.0, - 232.0, 0.0, 64.0, 0.0, 6.0, 40.0, 40.0, 32.0, 88.0, 24.0, 68.0, - 232.0, 56.0, 48.0, 42.0, 304.0, 144.0, 64.0, 68.0, 248.0, 136.0, 44.0, - 128.0, 248.0, 30.0, 22.0, 60.0, 16.0, 12.0, 3.0, 120.0, 7.0, 112.0, - 6.0, 12.0, 0.0, 92.0, 10.0, 32.0, 96.0, 192.0, 16.0, 22.0, 30.0, - 28.0, 96.0, 16.0, 32.0, 0.0, 12.0, 3.0, 112.0, 48.0, 48.0, 20.0, 
- 14.0, 16.0, 40.0, 208.0, 224.0, 32.0, 48.0, 160.0, 26.0, 136.0, 128.0, - 68.0, 14.0, 12.0, 20.0, 56.0, 26.0, 36.0, 36.0, 48.0, 28.0, 112.0, - 14.0, 44.0, 80.0, 44.0, 496.0, 112.0, 152.0, 128.0, 48.0, 88.0, 46.0, - 216.0, 0.0, 24.0, 30.0, 4.0, 16.0, 6.0, 368.0, 3.0, 29.0, 128.0, - 80.0, 272.0, 18.0, 240.0, 108.0, 52.0, 10.0, 152.0, 256.0, 176.0, 8.0, - 52.0, 8.0, 128.0, 208.0, 192.0, 100.0, 4.0, 24.0, 25.0, 16.0, 13.0, - 416.0, 76.0, 52.0, 96.0, 384.0, 12.0, 28.0, 100.0, 384.0, 100.0, 52.0, - 18.0, 0.0, 4.0, 48.0, 16.0, 3.0, 32.0, 13.0, 272.0, 6.0, 64.0, - 11.0, 368.0, 24.0, 1.0, 60.0, 24.0, 62.0, 304.0, 22.0, 27.0, 40.0, - 160.0, 48.0, 25.0, 15.0, 64.0, 30.0, 232.0, 58.0, 0.0, 112.0, 72.0, - 16.0, 256.0, 136.0, 30.0, 176.0, 20.0, 112.0, 64.0, 6.0, 448.0, 16.0, - 88.0, 16.0, 36.0, 120.0, 10.0, 84.0, 384.0, 32.0, 60.0, 320.0, 152.0, - 42.0, 44.0, 32.0, 64.0, 496.0, 20.0, 416.0, 68.0, 48.0, 88.0, 96.0, - 22.0, 384.0, 42.0, 16.0, 84.0, 50.0, 8.0, 44.0, 136.0, 36.0, 2.0, - 8.0, 0.0, 384.0, 76.0, 28.0, 10.0, 28.0, 240.0, 64.0, 48.0, 4.0, - 8.0, 56.0, 6.0, 58.0, 224.0, 208.0, 28.0, 54.0, 0.0, 20.0, 384.0, - 120.0, 52.0, 16.0, 16.0, 208.0, 26.0, 13.0, 144.0, 368.0, 84.0, 8.0, - 168.0, 48.0, 112.0, 44.0, 18.0, 2.0, 0.0, 10.0, 168.0, 40.0, 288.0, - 144.0, 14.0, 208.0, 0.0, 80.0, 80.0, 216.0, 104.0, 96.0, 144.0, 13.0, - 14.0, 14.0, 8.0, 0.0, 448.0, 56.0, 120.0, 10.0, 54.0, 0.0, 84.0, - 24.0, 56.0, 320.0, 48.0, 4.0, 22.0, 12.0, 27.0, 2.0, 60.0, 56.0, - 22.0, 52.0, 224.0, 0.0, 52.0, 18.0, 8.0, 50.0, 60.0, 80.0, 34.0, - 232.0, 2.0, 112.0, 104.0, 12.0, 50.0, 72.0, 176.0, 208.0, 288.0, 10.0, - 24.0, 0.0, 24.0, 56.0, 6.0, 18.0, 128.0, 12.0, 76.0, 18.0, 176.0, - 112.0, 192.0, 32.0, 96.0, 448.0, 432.0, 256.0, 0.0, 96.0, 10.0, 76.0, - 384.0, 2.0, 144.0, 184.0, 30.0, 8.0, 58.0, 176.0, 60.0, 28.0, 18.0, - 112.0, 8.0, 56.0, 8.0, 12.0, 26.0, 112.0, 496.0, 288.0, 2.0, 72.0, - 40.0, 18.0, 16.0, 12.0, 128.0, 160.0, 192.0, 9.0, 52.0, 56.0, 22.0, - 20.0, 25.0, 48.0, 17.0, 28.0, 112.0, 30.0, 
72.0, 0.0, 160.0, 23.0, - 152.0, 46.0, 116.0, 208.0, 0.0, 120.0, 20.0, 10.0, 112.0, 18.0, 20.0, - 100.0, 248.0, 72.0, 48.0, 56.0, 29.0, 48.0, 60.0, 124.0, 40.0, 144.0, - 44.0, 24.0, 80.0, 19.0, 16.0, 18.0, 76.0, 44.0, 216.0, 192.0, 64.0, - 0.0, 320.0, 18.0, 16.0, 24.0, 2.0, 46.0, 96.0, 48.0, 200.0, 96.0, - 288.0, 80.0, 80.0, 84.0, 32.0, 120.0, 46.0, 40.0, 160.0, 12.0, 48.0, - 184.0, 120.0, 96.0, 30.0, 176.0, 432.0, 44.0, 56.0, 224.0, 0.0, 288.0, - 20.0, 52.0, 80.0, 6.0, 224.0, 52.0, 40.0, 320.0, 8.0, 24.0, 1.0, - 144.0, 80.0, 256.0, 108.0, 56.0, 20.0, 12.0, 384.0, 144.0, 28.0, 46.0, - 92.0, 96.0, 56.0, 42.0, 22.0, 384.0, 248.0, 176.0, 88.0, 240.0, 24.0, - 200.0, 248.0, 4.0, 224.0, 3.0, 80.0, 68.0, 22.0, 104.0, 116.0, 336.0, - 124.0, 176.0, 152.0, 216.0, 12.0, 4.0, 7.0, 208.0, 23.0, 80.0, 64.0, - 13.0, 120.0, 27.0, 32.0, 44.0, 216.0, 36.0, 160.0, 92.0, 88.0, 352.0, - 80.0, 20.0, 120.0, 18.0, 0.0, 19.0, 240.0, 192.0, 8.0, 32.0, 50.0, - 34.0, 288.0, 496.0, 416.0, 62.0, 12.0, 52.0, 10.0, 2.0, 4.0, 3.0, - 80.0, 60.0, 112.0, 0.0, 50.0, 1.0, 92.0, 48.0, 28.0, 100.0, 96.0, - 20.0, 76.0, 64.0, 256.0, 144.0, 30.0, 15.0, 124.0, 20.0, 44.0, 24.0, - 92.0, 448.0, 14.0, 0.0, 0.0, 6.0, 19.0, 480.0, 208.0, 16.0, 26.0, - 24.0, 12.0, 112.0, 108.0, 32.0, 216.0, 11.0, 104.0, 36.0, 29.0, 28.0, - 16.0, 32.0, 48.0, 7.0, 216.0, 240.0, 38.0, 34.0, 52.0, 144.0, 16.0, - 96.0, 160.0, 136.0, 48.0, 208.0, 56.0, 44.0, 15.0, 88.0, 144.0, 200.0, - 128.0, 336.0, 336.0, 12.0, 64.0, 26.0, 24.0, 320.0, 232.0, 24.0, 48.0, - 20.0, 48.0, 192.0, 22.0, 80.0, 240.0, 54.0, 432.0, 416.0, 48.0, 64.0, - 36.0, 16.0, 272.0, 24.0, 40.0, 124.0, 0.0, 80.0, 22.0, 192.0, 88.0, - 4.0, 24.0, 160.0, 384.0, 128.0, 4.0, 240.0, 96.0, 23.0, 8.0, 104.0, - 224.0, 28.0, 80.0, 18.0, 184.0, 200.0, 28.0, 40.0, 1.0, 64.0, 36.0, - 29.0, 368.0, 464.0, 32.0, 16.0, 10.0, 28.0, 19.0, 112.0, 80.0, 15.0, - 160.0, 6.0, 32.0, 24.0, 224.0, 120.0, 26.0, 44.0, 88.0, 24.0, 72.0, - 64.0, 432.0, 192.0, 96.0, 30.0, 23.0, 496.0, 80.0, 5.0, 14.0, 
240.0, - 32.0, 3.0, 416.0, 20.0, 224.0, 160.0, 40.0, 448.0, 46.0, 8.0, 34.0, - 80.0, 12.0, 48.0, 56.0, 28.0, 2.0, 26.0, 120.0, 0.0, 124.0, 304.0, - 112.0, 54.0, 80.0, 160.0, 192.0, 28.0, 24.0, 144.0, 12.0, 208.0, 32.0, - 27.0, 24.0, 20.0, 8.0, 68.0, 96.0, 16.0, 320.0, 64.0, 52.0, 20.0, - 14.0, 64.0, 64.0, 23.0, 19.0, 17.0, 14.0, 124.0, 24.0, 0.0, 48.0, - 28.0, 184.0, 152.0, 28.0, 208.0, 20.0, 6.0, 448.0, 36.0, 38.0, 24.0, - 416.0, 176.0, 448.0, 112.0, 4.0, 104.0, 8.0, 96.0, 320.0, 4.0, 400.0, - 116.0, 320.0, 16.0, 104.0, 320.0, 208.0, 36.0, 17.0, 24.0, 18.0, 24.0, - 116.0, 88.0, 112.0, 16.0, 96.0, 64.0, 320.0, 31.0, 68.0, 26.0, 25.0, - 128.0, 36.0, 34.0, 48.0, 152.0, 128.0, 24.0, 24.0, 2.0, 0.0, 29.0, - 1.0, 0.0, 34.0, 42.0, 23.0, 352.0, 22.0, 13.0, 12.0, 0.0, 48.0, - 80.0, 304.0, 50.0, 68.0, 80.0, 44.0, 68.0, 54.0, 432.0, 10.0, 40.0, - 16.0, 48.0, 58.0, 84.0, 32.0, 72.0, 42.0, 32.0, 108.0, 48.0, 24.0, - 7.0, 32.0, 12.0, 56.0, 24.0, 1.0, 36.0, 192.0, 48.0, 432.0, 0.0, - 1.0, 15.0, 128.0, 68.0, 48.0, 112.0, 16.0, 304.0, 272.0, 184.0, 92.0, - 64.0, 22.0, 208.0, 60.0, 16.0, 96.0, 8.0, 56.0, 17.0, 6.0, 152.0, - 84.0, 28.0, 80.0, 108.0, 6.0, 448.0, 48.0, 24.0, 30.0, 34.0, 64.0, - 16.0, 44.0, 152.0, 42.0, 88.0, 23.0, 6.0, 96.0, 24.0, 10.0, 6.0, - 19.0, 40.0, 42.0, 60.0, 23.0, 0.0, 0.0, 10.0, 84.0, 160.0, 96.0, - 112.0, 16.0, 12.0, 21.0, 40.0, 0.0, 100.0, 16.0, 136.0, 31.0, 29.0, - 36.0, 448.0, 160.0, 144.0, 25.0, 30.0, 144.0, 32.0, 384.0, 168.0, 12.0, - 120.0, 48.0, 168.0, 52.0, 60.0, 96.0, 4.0, 272.0, 168.0, 14.0, 80.0, - 4.0, 14.0, 2.0, 4.0, 216.0, 64.0, 160.0, 20.0, 8.0, 36.0, 152.0, - 48.0, 336.0, 16.0, 20.0, 104.0, 256.0, 72.0, 208.0, 96.0, 28.0, 96.0, - 2.0, 15.0, 8.0, 16.0, 11.0, 20.0, 20.0, 144.0, 96.0, 120.0, 36.0, - 224.0, 128.0, 36.0, 18.0, 184.0, 320.0, 100.0, 80.0, 22.0, 62.0, 96.0, - 124.0, 384.0, 144.0, 32.0, 200.0, 2.0, 40.0, 14.0, 216.0, 176.0, 112.0, - 62.0, 432.0, 0.0, 240.0, 22.0, 400.0, 64.0, 136.0, 22.0, 112.0, 96.0, - 256.0, 96.0, 12.0, 24.0, 44.0, 
288.0, 104.0, 120.0, 23.0, 24.0, 54.0, - 28.0, 80.0, 18.0, 26.0, 56.0, 48.0, 24.0, 320.0, 112.0, 64.0, 21.0, - 128.0, 60.0, 72.0, 16.0, 144.0, 112.0, 128.0, 10.0, 28.0, 0.0, 92.0, - 0.0, 20.0, 11.0, 24.0, 28.0, 20.0, 25.0, 5.0, 56.0, 52.0, 11.0, - 144.0, 92.0, 8.0, 64.0, 208.0, 50.0, 72.0, 480.0, 100.0, 48.0, 19.0, - 96.0, 184.0, 64.0, 23.0, 4.0, 36.0, 240.0, 24.0, 48.0, 16.0, 24.0, - 28.0, 160.0, 8.0, 3.0, 1.0, 25.0, 136.0, 104.0, 24.0, 64.0, 320.0, - 0.0, 128.0, 208.0, 26.0, 336.0, 29.0, 44.0, 23.0, 160.0, 224.0, 12.0, - 112.0, 50.0, 25.0, 62.0, 224.0, 19.0, 48.0, 2.0, 42.0, 104.0, 216.0, - 24.0, 54.0, 84.0, 120.0, 108.0, 22.0, 112.0, 22.0, 184.0, 28.0, 232.0, - 48.0, 10.0, 36.0, 2.0, 80.0, 54.0, 224.0, 112.0, 12.0, 6.0, 54.0, - 232.0, 4.0, 24.0, 38.0, 64.0, 13.0, 16.0, 21.0, 29.0, 46.0, 120.0, - 64.0, 48.0, 16.0, 11.0, 92.0, 0.0, 0.0, 10.0, 304.0, 64.0, 240.0, - 216.0, 48.0, 96.0, 32.0, 2.0, 5.0, 64.0, 1.0, 0.0, 26.0, 31.0, - 216.0, 12.0, 56.0, 152.0, 56.0, 60.0, 15.0, 17.0, 60.0, 304.0, 20.0, - 46.0, 152.0, 8.0, 48.0, 32.0, 68.0, 12.0, 50.0, 256.0, 8.0, 4.0, - 52.0, 192.0, 16.0, 5.0, 16.0, 448.0, 19.0, 448.0, 48.0, 40.0, 224.0, - 8.0, 108.0, 108.0, 176.0, 96.0, 120.0, 120.0, 88.0, 24.0, 416.0, 24.0, - 128.0, 176.0, 64.0, 112.0, 8.0, 18.0, 128.0, 72.0, 256.0, 92.0, 68.0, - 120.0, 240.0, 7.0, 20.0, 30.0, 22.0, 240.0, 6.0, 192.0, 24.0, 208.0, - 8.0, 4.0, 16.0, 44.0, 38.0, 336.0, 8.0, 108.0, 29.0, 10.0, 384.0, - 6.0, 176.0, 176.0, 96.0, 216.0, 40.0, 112.0, 5.0, 16.0, 24.0, 31.0, - 5.0, 14.0, 256.0, 336.0, 120.0, 29.0, 44.0, 144.0, 32.0, 0.0, 40.0, - 56.0, 184.0, 72.0, 12.0, 27.0, 72.0, 256.0, 15.0, 60.0, 22.0, 84.0, - 62.0, 56.0, 24.0, 56.0, 128.0, 448.0, 44.0, 176.0, 76.0, 64.0, 0.0, - 496.0, 16.0, 168.0, 144.0, 18.0, 160.0, 9.0, 120.0, 23.0, 100.0, 96.0, - 0.0, 48.0, 88.0, 20.0, 176.0, 15.0, 12.0, 16.0, 3.0, 124.0, 16.0, - 2.0, 11.0, 48.0, 36.0, 116.0, 256.0, 160.0, 120.0, 10.0, 23.0, 304.0, - 17.0, 192.0, 272.0, 12.0, 32.0, 480.0, 28.0, 12.0, 60.0, 80.0, 24.0, - 
232.0, 28.0, 31.0, 304.0, 480.0, 36.0, 30.0, 29.0, 400.0, 27.0, 208.0, - 44.0, 24.0, 496.0, 21.0, 168.0, 32.0, 4.0, 23.0, 48.0, 288.0, 304.0, - 14.0, 176.0, 13.0, 24.0, 224.0, 26.0, 28.0, 496.0, 32.0, 56.0, 256.0, - 64.0, 72.0, 14.0, 112.0, 2.0, 120.0, 240.0, 464.0, 40.0, 32.0, 176.0, - 48.0, 32.0, 368.0, 0.0, 136.0, 168.0, 62.0, 176.0, 8.0, 368.0, 7.0, - 28.0, 52.0, 26.0, 24.0, 24.0, 2.0, 272.0, 128.0, 26.0, 8.0, 48.0, - 224.0, 10.0, 256.0, 144.0, 38.0, 48.0, 24.0, 40.0, 120.0, 120.0, 54.0, - 34.0, 50.0, 144.0, 232.0, 96.0, 200.0, 304.0, 21.0, 16.0, 40.0, 1.0, - 24.0, 48.0, 46.0, 8.0, 104.0, 19.0, 10.0, 448.0, 8.0, 46.0, 88.0, - 12.0, 20.0, 496.0, 26.0, 19.0, 120.0, 192.0, 32.0, 24.0, 20.0, 24.0, - 58.0, 26.0, 80.0, 48.0, 128.0, 9.0, 192.0, 68.0, 28.0, 50.0, 25.0, - 272.0, 160.0, 176.0, 12.0, 10.0, 4.0, 22.0, 10.0, 80.0, 26.0, 17.0, - 48.0, 60.0, 56.0, 44.0, 96.0, 10.0, 96.0, 42.0, 48.0, 128.0, 368.0, - 144.0, 29.0, 8.0, 62.0, 256.0, 116.0, 28.0, 8.0, 88.0, 42.0, 108.0, - 28.0, 84.0, 52.0, 0.0, 160.0, 120.0, 368.0, 32.0, 224.0, 22.0, 128.0, - 208.0, 200.0, 248.0, 4.0, 12.0, 48.0, 25.0, 200.0, 8.0, 240.0, 38.0, - 16.0, 368.0, 14.0, 336.0, 18.0, 100.0, 96.0, 120.0, 224.0, 20.0, 96.0, - 13.0, 8.0, 10.0, 72.0, 112.0, 24.0, 12.0, 40.0, 29.0, 288.0, 32.0, - 176.0, 32.0, 84.0, 416.0, 304.0, 18.0, 112.0, 22.0, 248.0, 64.0, 184.0, - 20.0, 27.0, 208.0, 16.0, 19.0, 44.0, 176.0, 48.0, 11.0, 104.0, 200.0, - 0.0, 24.0, 27.0, 116.0, 34.0, 32.0, 464.0, 40.0, 2.0, 16.0, 17.0, - 288.0, 48.0, 10.0, 0.0, 96.0, 16.0, 160.0, 200.0, 92.0, 38.0, 5.0, - 272.0, 18.0, 416.0, 40.0, 80.0, 38.0, 176.0, 48.0, 36.0, 48.0, 128.0, - 240.0, 76.0, 112.0, 6.0, 88.0, 2.0, 2.0, 30.0, 8.0, 104.0, 12.0, - 13.0, 28.0, 62.0, 40.0, 104.0, 16.0, 336.0, 28.0, 4.0, 60.0, 27.0, - 2.0, 29.0, 240.0, 54.0, 200.0, 84.0, 248.0, 208.0, 224.0, 80.0, 124.0, - 100.0, 96.0, 128.0, 31.0, 64.0, 0.0, 112.0, 23.0, 92.0, 48.0, 32.0, - 16.0, 0.0, 16.0, 112.0, 44.0, 24.0, 20.0, 0.0, 4.0, 144.0, 20.0, - 160.0, 24.0, 44.0, 18.0, 
120.0, 120.0, 2.0, 248.0, 0.0, 26.0, 42.0, - 80.0, 0.0, 14.0, 8.0, 68.0, 20.0, 58.0, 232.0, 400.0, 368.0, 7.0, - 9.0, 208.0, 384.0, 224.0, 104.0, 24.0, 0.0, 28.0, 368.0, 34.0, 496.0, - 58.0, 104.0, 208.0, 3.0, 18.0, 36.0, 38.0, 48.0, 64.0, 4.0, 16.0, - 8.0, 12.0, 22.0, 22.0, 24.0, 48.0, 72.0, 104.0, 2.0, 24.0, 25.0, - 104.0, 60.0, 20.0, 34.0, 44.0, 12.0, 60.0, 92.0, 320.0, 6.0, 240.0, - 48.0, 54.0, 116.0, 112.0, 23.0, 10.0, 84.0, 32.0, 32.0, 29.0, 224.0, - 28.0, 9.0, 8.0, 304.0, 112.0, 30.0, 12.0, 15.0, 176.0, 16.0, 116.0, - 42.0, 48.0, 96.0, 28.0, 120.0, 14.0, 448.0, 16.0, 168.0, 240.0, 22.0, - 46.0, 80.0, 23.0, 160.0, 92.0, 76.0, 84.0, 54.0, 240.0, 5.0, 40.0, - 80.0, 42.0, 68.0, 480.0, 14.0, 40.0, 58.0, 116.0, 124.0, 36.0, 352.0, - 84.0, 32.0, 64.0, 112.0, 208.0, 184.0, 200.0, 2.0, 5.0, 240.0, 48.0, - 11.0, 3.0, 5.0, 54.0, 76.0, 112.0, 28.0, 17.0, 32.0, 32.0, 16.0, - 124.0, 80.0, 26.0, 32.0, 432.0, 112.0, 0.0, 8.0, 176.0, 248.0, 232.0, - 48.0, 36.0, 52.0, 38.0, 100.0, 22.0, 17.0, 16.0, 22.0, 24.0, 14.0, - 7.0, 5.0, 104.0, 1.0, 28.0, 256.0, 0.0, 128.0, 144.0, 8.0, 336.0, - 144.0, 2.0, 240.0, 192.0, 0.0, 16.0, 112.0, 24.0, 29.0, 72.0, 88.0, - 16.0, 52.0, 18.0, 112.0, 16.0, 224.0, 23.0, 32.0, 216.0, 12.0, 88.0, - 34.0, 28.0, 108.0, 0.0, 4.0, 20.0, 32.0, 120.0, 18.0, 12.0, 368.0, - 6.0, 0.0, 40.0, 272.0, 208.0, 24.0, 52.0, 448.0, 168.0, 42.0, 1.0, - 336.0, 8.0, 56.0, 96.0, 248.0, 44.0, 36.0, 336.0, 248.0, 96.0, 11.0, - 14.0, 48.0, 120.0, 216.0, 336.0, 64.0, 10.0, 40.0, 16.0, 24.0, 480.0, - 22.0, 112.0, 48.0, 6.0, 88.0, 96.0, 104.0, 72.0, 4.0, 24.0, 36.0, - 464.0, 28.0, 24.0, 36.0, 50.0, 480.0, 32.0, 28.0, 48.0, 160.0, 28.0, - 352.0, 26.0, 336.0, 2.0, 29.0, 12.0, 208.0, 4.0, 7.0, 256.0, 12.0, - 62.0, 32.0, 48.0, 26.0, 5.0, 46.0, 0.0, 336.0, 400.0, 32.0, 16.0, - 240.0, 24.0, 176.0, 11.0, 128.0, 200.0, 5.0, 208.0, 48.0, 76.0, 40.0, - 64.0, 40.0, 88.0, 38.0, 38.0, 0.0, 160.0, 352.0, 304.0, 288.0, 30.0, - 34.0, 480.0, 13.0, 10.0, 224.0, 10.0, 152.0, 176.0, 448.0, 16.0, 
48.0, - 8.0, 144.0, 0.0, 96.0, 16.0, 108.0, 23.0, 3.0, 24.0, 108.0, 21.0, - 46.0, 432.0, 54.0, 20.0, 24.0, 84.0, 128.0, 52.0, 26.0, 30.0, 352.0, - 6.0, 44.0, 160.0, 4.0, 108.0, 240.0, 60.0, 8.0, 36.0, 32.0, 232.0, - 12.0, 24.0, 18.0, 116.0, 64.0, 112.0, 32.0, 36.0, 224.0, 18.0, 12.0, - 80.0, 352.0, 304.0, 88.0, 18.0, 80.0, 64.0, 64.0, 192.0, 240.0, 0.0, - 18.0, 20.0, 24.0, 36.0, 26.0, 44.0, 15.0, 32.0, 184.0, 112.0, 224.0, - 20.0, 248.0, 96.0, 144.0, 38.0, 176.0, 32.0, 16.0, 40.0, 16.0, 31.0, - 36.0, 88.0, 18.0, 31.0, 34.0, 32.0, 56.0, 21.0, 56.0, 16.0, 224.0, - 15.0, 136.0, 384.0, 64.0, 40.0, 21.0, 104.0, 384.0, 144.0, 160.0, 8.0, - 416.0, 44.0, 192.0, 136.0, 0.0, 8.0, 0.0, 16.0, 8.0, 144.0, 26.0, - 128.0, 23.0, 34.0, 42.0, 16.0, 208.0, 144.0, 208.0, 112.0, 72.0, 88.0, - 40.0, 208.0, 40.0, 272.0, 18.0, 4.0, 256.0, 40.0, 19.0, 52.0, 0.0, - 72.0, 96.0, 4.0, 50.0, 496.0, 18.0, 80.0, 416.0, 76.0, 12.0, 46.0, - 0.0, 160.0, 25.0, 0.0, 12.0, 58.0, 112.0, 48.0, 112.0, 46.0, 58.0, - 80.0, 80.0, 48.0, 4.0, 12.0, 56.0, 0.0, 32.0, 38.0, 108.0, 80.0, - 2.0, 11.0, 64.0, 4.0, 208.0, 400.0, 13.0, 58.0, 27.0, 4.0, 24.0, - 0.0, 112.0, 224.0, 22.0, 40.0, 15.0, 21.0, 28.0, 384.0, 76.0, 14.0, - 432.0, 64.0, 14.0, 92.0, 100.0, 32.0, 32.0, 256.0, 160.0, 72.0, 64.0, - 224.0, 36.0, 16.0, 6.0, 56.0, 248.0, 15.0, 48.0, 60.0, 256.0, 64.0, - 4.0, 21.0, 30.0, 8.0, 62.0, 208.0, 32.0, 32.0, 52.0, 60.0, 18.0, - 240.0, 28.0, 92.0, 216.0, 22.0, 18.0, 28.0, 32.0, 104.0, 160.0, 112.0, - 224.0, 56.0, 32.0, 30.0, 88.0, 22.0, 28.0, 144.0, 112.0, 13.0, 4.0, - 42.0, 168.0, 32.0, 52.0, 80.0, 168.0, 26.0, 16.0, 120.0, 76.0, 80.0, - 368.0, 416.0, 108.0, 192.0, 104.0, 32.0, 44.0, 8.0, 40.0, 30.0, 480.0, - 6.0, 80.0, 400.0, 40.0, 240.0, 56.0, 448.0, 48.0, 72.0, 14.0, 84.0, - 8.0, 19.0, 10.0, 24.0, 432.0, 56.0, 34.0, 34.0, 44.0, 10.0, 7.0, - 40.0, 5.0, 0.0, 6.0, 208.0, 32.0, 0.0, 42.0, 240.0, 432.0, 7.0, - 36.0, 24.0, 20.0, 24.0, 50.0, 128.0, 20.0, 112.0, 232.0, 21.0, 8.0, - 58.0, 80.0, 27.0, 12.0, 32.0, 60.0, 
7.0, 8.0, 64.0, 15.0, 32.0, - 42.0, 64.0, 120.0, 25.0, 80.0, 36.0, 21.0, 216.0, 20.0, 256.0, 54.0, - 320.0, 496.0, 152.0, 144.0, 224.0, 64.0, 24.0, 2.0, 32.0, 7.0, 48.0, - 384.0, 40.0, 22.0, 240.0, 26.0, 0.0, 16.0, 6.0, 152.0, 7.0, 34.0, - 42.0, 80.0, 12.0, 12.0, 208.0, 56.0, 21.0, 1.0, 448.0, 208.0, 2.0, - 320.0, 192.0, 50.0, 48.0, 96.0, 1.0, 68.0, 24.0, 8.0, 384.0, 72.0, - 22.0, 56.0, 11.0, 224.0, 29.0, 8.0, 18.0, 304.0, 54.0, 21.0, 92.0, - 192.0, 40.0, 208.0, 496.0, 320.0, 44.0, 80.0, 38.0, 304.0, 80.0, 80.0, - 8.0, 23.0, 16.0, 96.0, 384.0, 88.0, 76.0, 10.0, 80.0, 48.0, 52.0, - 144.0, 152.0, 31.0, 62.0, 40.0, 112.0, 8.0, 26.0, 16.0, 24.0, 168.0, - 2.0, 32.0, 24.0, 496.0, 2.0, 288.0, 144.0, 272.0, 24.0, 34.0, 18.0, - 62.0, 42.0, 32.0, 72.0, 240.0, 108.0, 88.0, 24.0, 30.0, 320.0, 416.0, - 248.0, 24.0, 20.0, 480.0, 16.0, 216.0, 32.0, 108.0, 27.0, 0.0, 496.0, - 224.0, 80.0, 7.0, 58.0, 24.0, 22.0, 22.0, 88.0, 21.0, 160.0, 21.0, - 208.0, 80.0, 0.0, 88.0, 124.0, 21.0, 20.0, 108.0, 80.0, 3.0, 56.0, - 0.0, 92.0, 40.0, 56.0, 0.0, 0.0, 4.0, 58.0, 104.0, 8.0, 112.0, - 96.0, 192.0, 12.0, 92.0, 352.0, 48.0, 72.0, 13.0, 4.0, 60.0, 50.0, - 25.0, 54.0, 26.0, 21.0, 208.0, 72.0, 16.0, 240.0, 0.0, 416.0, 11.0, - 16.0, 60.0, 8.0, 62.0, 64.0, 128.0, 4.0, 56.0, 72.0, 1.0, 100.0, - 224.0, 42.0, 56.0, 38.0, 54.0, 32.0, 144.0, 352.0, 4.0, 28.0, 320.0, - 62.0, 120.0, 8.0, 11.0, 48.0, 10.0, 496.0, 3.0, 9.0, 120.0, 240.0, - 184.0, 28.0, 15.0, 24.0, 96.0, 216.0, 4.0, 6.0, 8.0, 144.0, 20.0, - 11.0, 64.0, 6.0, 320.0, 40.0, 30.0, 7.0, 112.0, 34.0, 48.0, 192.0, - 40.0, 38.0, 11.0, 80.0, 52.0, 176.0, 192.0, 448.0, 448.0, 32.0, 124.0, - 232.0, 160.0, 62.0, 50.0, 192.0, 72.0, 16.0, 24.0, 120.0, 96.0, 27.0, - 0.0, 30.0, 29.0, 46.0, 64.0, 32.0, 19.0, 56.0, 16.0, 176.0, 52.0, - 384.0, 30.0, 92.0, 12.0, 0.0, 8.0, 464.0, 30.0, 84.0, 224.0, 23.0, - 76.0, 240.0, 224.0, 9.0, 36.0, 224.0, 72.0, 32.0, 28.0, 60.0, 8.0, - 20.0, 136.0, 60.0, 16.0, 68.0, 400.0, 16.0, 24.0, 31.0, 9.0, 96.0, - 22.0, 10.0, 15.0, 
60.0, 92.0, 2.0, 224.0, 7.0, 16.0, 240.0, 18.0, - 20.0, 22.0, 56.0, 0.0, 4.0, 30.0, 384.0, 18.0, 448.0, 464.0, 76.0, - 29.0, 8.0, 240.0, 32.0, 18.0, 144.0, 22.0, 20.0, 24.0, 96.0, 5.0, - 80.0, 200.0, 30.0, 216.0, 10.0, 144.0, 28.0, 28.0, 0.0, 17.0, 160.0, - 232.0, 112.0, 448.0, 88.0, 16.0, 24.0, 56.0, 12.0, 17.0, 38.0, 9.0, - 16.0, 25.0, 10.0, 52.0, 48.0, 0.0, 32.0, 44.0, 496.0, 8.0, 14.0, - 104.0, 64.0, 480.0, 120.0, 16.0, 18.0, 4.0, 48.0, 56.0, 40.0, 232.0, - 208.0, 384.0, 184.0, 136.0, 40.0, 50.0, 0.0, 29.0, 11.0, 192.0, 208.0, - 48.0, 120.0, 10.0, 176.0, 0.0, 34.0, 64.0, 12.0, 20.0, 60.0, 8.0, - 0.0, 80.0, 6.0, 184.0, 480.0, 368.0, 224.0, 38.0, 320.0, 480.0, 112.0, - 24.0, 30.0, 104.0, 224.0, 20.0, 76.0, 40.0, 184.0, 14.0, 14.0, 9.0, - 432.0, 96.0, 52.0, 48.0, 13.0, 44.0, 400.0, 112.0, 22.0, 216.0, 120.0, - 200.0, 50.0, 24.0, 48.0, 84.0, 8.0, 72.0, 16.0, 8.0, 88.0, 208.0, - 24.0, 40.0, 144.0, 144.0, 48.0, 80.0, 64.0, 200.0, 160.0, 160.0, 0.0, - 68.0, 24.0, 42.0, 112.0, 46.0, 2.0, 160.0, 5.0, 22.0, 48.0, 52.0, - 112.0, 40.0, 2.0, 76.0, 14.0, 128.0, 48.0, 208.0, 29.0, 80.0, 10.0, - 12.0, 20.0, 6.0, 496.0, 0.0, 464.0, 224.0, 120.0, 128.0, 42.0, 92.0, - 46.0, 34.0, 44.0, 42.0, 24.0, 72.0, 320.0, 11.0, 16.0, 108.0, 60.0, - 19.0, 88.0, 384.0, 100.0, 3.0, 96.0, 232.0, 4.0, 480.0, 44.0, 54.0, - 108.0, 6.0, 116.0, 2.0, 72.0, 80.0, 36.0, 240.0, 192.0, 464.0, 27.0, - 72.0, 136.0, 160.0, 12.0, 14.0, 0.0, 17.0, 56.0, 216.0, 432.0, 18.0, - 84.0, 384.0, 112.0, 120.0, 20.0, 184.0, 12.0, 38.0, 96.0, 124.0, 496.0, - 104.0, 4.0, 68.0, 144.0, 62.0, 16.0, 40.0, 336.0, 96.0, 184.0, 80.0, - 6.0, 8.0, 8.0, 28.0, 28.0, 64.0, 18.0, 304.0, 48.0, 128.0, 112.0, - 4.0, 96.0, 4.0, 28.0, 12.0, 192.0, 44.0, 176.0, 6.0, 44.0, 10.0, - 496.0, 192.0, 96.0, 50.0, 32.0, 68.0, 23.0, 15.0, 336.0, 32.0, 31.0, - 144.0, 224.0, 18.0, 9.0, 52.0, 28.0, 16.0, 19.0, 24.0, 448.0, 248.0, - 100.0, 192.0, 96.0, 168.0, 36.0, 48.0, 160.0, 168.0, 24.0, 224.0, 38.0, - 16.0, 48.0, 3.0, 12.0, 6.0, 22.0, 208.0, 24.0, 
336.0, 24.0, 136.0, - 16.0, 0.0, 64.0, 40.0, 24.0, 8.0, 336.0, 36.0, 30.0, 144.0, 4.0, - 16.0, 48.0, 14.0, 36.0, 24.0, 96.0, 400.0, 24.0, 58.0, 176.0, 208.0, - 25.0, 96.0, 48.0, 112.0, 112.0, 0.0, 68.0, 320.0, 23.0, 108.0, 336.0, - 18.0, 76.0, 20.0, 20.0, 400.0, 496.0, 304.0, 4.0, 64.0, 60.0, 128.0, - 16.0, 240.0, 38.0, 96.0, 16.0, 8.0, 25.0, 96.0, 84.0, 304.0, 13.0, - 108.0, 128.0, 32.0, 160.0, 27.0, 104.0, 62.0, 56.0, 48.0, 9.0, 100.0, - 14.0, 14.0, 64.0, 160.0, 336.0, 76.0, 28.0, 4.0, 96.0, 240.0, 22.0, - 144.0, 416.0, 96.0, 16.0, 30.0, 5.0, 160.0, 46.0, 22.0, 88.0, 40.0, - 21.0, 40.0, 184.0, 4.0, 128.0, 56.0, 52.0, 232.0, 36.0, 32.0, 272.0, - 176.0, 208.0, 216.0, 72.0, 28.0, 80.0, 124.0, 240.0, 136.0, 20.0, 40.0, - 40.0, 144.0, 18.0, 28.0, 18.0, 18.0, 14.0, 112.0, 16.0, 23.0, 216.0, - 20.0, 18.0, 192.0, 448.0, 46.0, 104.0, 6.0, 16.0, 27.0, 48.0, 4.0, - 96.0, 8.0, 26.0, 16.0, 4.0, 8.0, 480.0, 104.0, 16.0, 144.0, 104.0, - 16.0, 56.0, 108.0, 20.0, 92.0, 0.0, 36.0, 20.0, 19.0, 44.0, 288.0, - 32.0, 192.0, 40.0, 4.0, 18.0, 22.0, 27.0, 21.0, 4.0, 22.0, 432.0, - 1.0, 16.0, 104.0, 23.0, 52.0, 96.0, 208.0, 12.0, 0.0, 144.0, 19.0, - 224.0, 8.0, 30.0, 42.0, 124.0, 124.0, 224.0, 84.0, 27.0, 192.0, 28.0, - 6.0, 8.0, 13.0, 25.0, 30.0, 52.0, 480.0, 128.0, 31.0, 368.0, 232.0, - 16.0, 88.0, 36.0, 9.0, 0.0, 68.0, 26.0, 42.0, 15.0, 176.0, 2.0, - 120.0, 48.0, 0.0, 56.0, 8.0, 52.0, 44.0, 124.0, 20.0, 10.0, 136.0, - 0.0, 23.0, 432.0, 58.0, 116.0, 112.0, 48.0, 336.0, 208.0, 416.0, 256.0, - 116.0, 16.0, 0.0, 480.0, 120.0, 416.0, 272.0, 18.0, 92.0, 124.0, 80.0, - 192.0, 304.0, 60.0, 240.0, 52.0, 88.0, 22.0, 176.0, 432.0, 88.0, 44.0, - 30.0, 48.0, 44.0, 10.0, 10.0, 2.0, 128.0, 8.0, 4.0, 64.0, 232.0, - 60.0, 16.0, 208.0, 13.0, 56.0, 8.0, 24.0, 13.0, 28.0, 56.0, 3.0, - 18.0, 26.0, 8.0, 64.0, 52.0, 108.0, 48.0, 14.0, 80.0, 16.0, 62.0, - 216.0, 30.0, 144.0, 6.0, 5.0, 12.0, 10.0, 19.0, 8.0, 6.0, 0.0, - 6.0, 42.0, 216.0, 58.0, 200.0, 26.0, 160.0, 7.0, 496.0, 240.0, 38.0, - 56.0, 40.0, 52.0, 
88.0, 58.0, 44.0, 21.0, 32.0, 384.0, 52.0, 9.0, - 384.0, 368.0, 48.0, 192.0, 10.0, 44.0, 44.0, 128.0, 448.0, 40.0, 108.0, - 13.0, 30.0, 96.0, 12.0, 21.0, 72.0, 2.0, 112.0, 48.0, 15.0, 14.0, - 96.0, 216.0, 248.0, 38.0, 14.0, 96.0, 368.0, 224.0, 80.0, 224.0, 8.0, - 14.0, 24.0, 23.0, 64.0, 496.0, 18.0, 96.0, 320.0, 4.0, 38.0, 464.0, - 16.0, 272.0, 384.0, 14.0, 136.0, 44.0, 168.0, 32.0, 34.0, 0.0, 18.0, - 16.0, 0.0, 136.0, 20.0, 80.0, 248.0, 0.0, 11.0, 112.0, 304.0, 104.0, - 112.0, 38.0, 32.0, 32.0, 68.0, 100.0, 68.0, 160.0, 2.0, 32.0, 432.0, - 4.0, 14.0, 56.0, 0.0, 32.0, 0.0, 0.0, 5.0, 120.0, 96.0, 36.0, - 30.0, 5.0, 48.0, 480.0, 208.0, 18.0, 192.0, 48.0, 16.0, 144.0, 76.0, - 240.0, 40.0, 10.0, 248.0, 23.0, 27.0, 26.0, 112.0, 40.0, 224.0, 46.0, - 28.0, 8.0, 176.0, 26.0, 22.0, 496.0, 40.0, 50.0, 6.0, 24.0, 26.0, - 240.0, 30.0, 20.0, 0.0, 23.0, 4.0, 120.0, 30.0, 31.0, 7.0, 6.0, - 42.0, 30.0, 26.0, 88.0, 72.0, 9.0, 8.0, 100.0, 24.0, 192.0, 80.0, - 96.0, 112.0, 232.0, 36.0, 32.0, 62.0, 26.0, 56.0, 320.0, 32.0, 40.0, - 100.0, 40.0, 16.0, 14.0, 352.0, 416.0, 8.0, 336.0, 18.0, 46.0, 0.0, - 224.0, 62.0, 12.0, 128.0, 26.0, 0.0, 368.0, 19.0, 11.0, 96.0, 80.0, - 120.0, 60.0, 60.0, 2.0, 23.0, 64.0, 62.0, 60.0, 496.0, 60.0, 176.0, - 128.0, 112.0, 12.0, 26.0, 120.0, 15.0, 120.0, 32.0, 17.0, 42.0, 32.0, - 64.0, 64.0, 8.0, 24.0, 8.0, 464.0, 9.0, 7.0, 80.0, 20.0, 54.0, - 52.0, 46.0, 4.0, 84.0, 88.0, 18.0, 14.0, 40.0, 160.0, 16.0, 368.0, - 14.0, 8.0, 32.0, 68.0, 21.0, 46.0, 14.0, 400.0, 54.0, 14.0, 480.0, - 256.0, 64.0, 192.0, 46.0, 18.0, 32.0, 30.0, 18.0, 104.0, 240.0, 0.0, - 36.0, 26.0, 0.0, 24.0, 56.0, 224.0, 18.0, 120.0, 76.0, 62.0, 92.0, - 40.0, 120.0, 28.0, 192.0, 11.0, 24.0, 4.0, 64.0, 48.0, 176.0, 16.0, - 124.0, 22.0, 12.0, 80.0, 116.0, 192.0, 24.0, 11.0, 54.0, 31.0, 128.0, - 60.0, 30.0, 192.0, 96.0, 40.0, 48.0, 8.0, 208.0, 5.0, 12.0, 30.0, - 8.0, 40.0, 40.0, 27.0, 46.0, 176.0, 60.0, 224.0, 36.0, 52.0, 120.0, - 72.0, 352.0, 240.0, 29.0, 100.0, 44.0, 60.0, 92.0, 4.0, 34.0, 34.0, 
- 368.0, 88.0, 320.0, 416.0, 3.0, 40.0, 112.0, 144.0, 80.0, 112.0, 0.0, - 96.0, 176.0, 20.0, 30.0, 56.0, 20.0, 12.0, 208.0, 24.0, 400.0, 80.0, - 496.0, 32.0, 240.0, 40.0, 8.0, 54.0, 0.0, 11.0, 34.0, 44.0, 88.0, - 124.0, 400.0, 16.0, 36.0, 0.0, 16.0, 432.0, 68.0, 62.0, 176.0, 29.0, - 9.0, 10.0, 6.0, 16.0, 80.0, 36.0, 30.0, 36.0, 88.0, 96.0, 336.0, - 160.0, 20.0, 12.0, 38.0, 18.0, 16.0, 8.0, 160.0, 11.0, 112.0, 192.0, - 5.0, 17.0, 84.0, 88.0, 2.0, 48.0, 1.0, 104.0, 232.0, 0.0, 40.0, - 56.0, 84.0, 384.0, 28.0, 8.0, 44.0, 26.0, 192.0, 3.0, 124.0, 88.0, - 7.0, 384.0, 64.0, 32.0, 10.0, 11.0, 168.0, 54.0, 38.0, 32.0, 320.0, - 24.0, 30.0, 72.0, 30.0, 96.0, 112.0, 0.0, 58.0, 352.0, 16.0, 36.0, - 40.0, 160.0, 14.0, 24.0, 116.0, 14.0, 52.0, 8.0, 368.0, 144.0, 144.0, - 112.0, 80.0, 44.0, 272.0, 8.0, 240.0, 24.0, 224.0, 0.0, 20.0, 22.0, - 25.0, 64.0, 16.0, 16.0, 48.0, 24.0, 2.0, 96.0, 56.0, 46.0, 76.0, - 11.0, 8.0, 80.0, 48.0, 116.0, 104.0, 48.0, 8.0, 16.0, 23.0, 58.0, - 96.0, 40.0, 24.0, 6.0, 13.0, 24.0, 12.0, 64.0, 8.0, 32.0, 416.0, - 4.0, 42.0, 200.0, 36.0, 80.0, 25.0, 64.0, 80.0, 320.0, 38.0, 16.0, - 464.0, 32.0, 32.0, 13.0, 224.0, 32.0, 17.0, 0.0, 64.0, 1.0, 24.0, - 60.0, 27.0, 176.0, 56.0, 240.0, 32.0, 112.0, 0.0, 56.0, 56.0, 72.0, - 120.0, 496.0, 28.0, 108.0, 14.0, 88.0, 416.0, 96.0, 116.0, 100.0, 100.0, - 240.0, 24.0, 124.0, 30.0, 19.0, 48.0, 160.0, 4.0, 0.0, 40.0, 50.0, - 21.0, 184.0, 116.0, 240.0, 240.0, 16.0, 208.0, 20.0, 9.0, 120.0, 36.0, - 1.0, 96.0, 26.0, 28.0, 0.0, 96.0, 8.0, 184.0, 30.0, 416.0, 112.0, - 144.0, 32.0, 80.0, 18.0, 36.0, 448.0, 152.0, 68.0, 14.0, 58.0, 144.0, - 12.0, 184.0, 96.0, 48.0, 23.0, 30.0, 100.0, 40.0, 6.0, 416.0, 4.0, - 32.0, 96.0, 40.0, 124.0, 9.0, 5.0, 4.0, 352.0, 26.0, 8.0, 58.0, - 96.0, 192.0, 304.0, 19.0, 272.0, 4.0, 64.0, 14.0, 6.0, 224.0, 7.0, - 224.0, 24.0, 224.0, 10.0, 224.0, 32.0, 32.0, 6.0, 25.0, 88.0, 80.0, - 64.0, 38.0, 0.0, 48.0, 29.0, 23.0, 20.0, 23.0, 12.0, 120.0, 6.0, - 17.0, 0.0, 400.0, 2.0, 80.0, 144.0, 36.0, 96.0, 320.0, 
336.0, 108.0, - 4.0, 176.0, 48.0, 192.0, 10.0, 62.0, 62.0, 17.0, 160.0, 208.0, 12.0, - 48.0, 288.0, 108.0, 108.0, 120.0, 40.0, 26.0, 16.0, 4.0, 128.0, 12.0, - 28.0, 192.0, 6.0, 176.0, 124.0, 36.0, 152.0, 24.0, 27.0, 288.0, 224.0, - 16.0, 24.0, 25.0, 416.0, 5.0, 5.0, 84.0, 168.0, 0.0, 22.0, 30.0, - 25.0, 384.0, 8.0, 160.0, 20.0, 25.0, 56.0, 72.0, 14.0, 8.0, 80.0, - 50.0, 38.0, 416.0, 0.0, 52.0, 24.0, 240.0, 18.0, 124.0, 32.0, 60.0, - 136.0, 168.0, 40.0, 32.0, 88.0, 120.0, 31.0, 56.0, 40.0, 22.0, 120.0, - 496.0, 27.0, 34.0, 176.0, 72.0, 18.0, 14.0, 8.0, 36.0, 12.0, 8.0, - 64.0, 26.0, 42.0, 64.0, 10.0, 48.0, 176.0, 104.0, 432.0, 14.0, 96.0, - 100.0, 8.0, 25.0, 248.0, 32.0, 23.0, 10.0, 38.0, 128.0, 80.0, 112.0, - 160.0, 6.0, 16.0, 56.0, 96.0, 112.0, 144.0, 16.0, 16.0, 184.0, 23.0, - 0.0, 18.0, 128.0, 0.0, 26.0, 128.0, 96.0, 31.0, 52.0, 40.0, 4.0, - 84.0, 46.0, 160.0, 56.0, 248.0, 368.0, 176.0, 12.0, 48.0, 60.0, 56.0, - 28.0, 30.0, 496.0, 40.0, 12.0, 24.0, 22.0, 208.0, 16.0, 416.0, 108.0, - 50.0, 6.0, 20.0, 14.0, 120.0, 104.0, 29.0, 60.0, 64.0, 208.0, 8.0, - 304.0, 64.0, 448.0, 26.0, 56.0, 224.0, 84.0, 34.0, 13.0, 352.0, 0.0, - 20.0, 336.0, 20.0, 112.0, 40.0, 176.0, 48.0, 8.0, 26.0, 40.0, 6.0, - 128.0, 12.0, 44.0, 24.0, 6.0, 56.0, 10.0, 8.0, 64.0, 40.0, 160.0, - 192.0, 27.0, 128.0, 40.0, 34.0, 288.0, 10.0, 28.0, 56.0, 2.0, 52.0, - 4.0, 224.0, 120.0, 100.0, 184.0, 64.0, 48.0, 232.0, 32.0, 432.0, 320.0, - 192.0, 100.0, 2.0, 12.0, 176.0, 352.0, 24.0, 4.0, 128.0, 240.0, 11.0, - 240.0, 21.0, 208.0, 496.0, 21.0, 304.0, 100.0, 7.0, 60.0, 104.0, 96.0, - 232.0, 112.0, 384.0, 104.0, 44.0, 24.0, 288.0, 0.0, 108.0, 56.0, 496.0, - 22.0, 192.0, 320.0, 58.0, 3.0, 72.0, 432.0, 76.0, 168.0, 68.0, 16.0, - 48.0, 32.0, 16.0, 6.0, 68.0, 44.0, 144.0, 240.0, 208.0, 320.0, 30.0, - 224.0, 152.0, 96.0, 2.0, 88.0, 38.0, 16.0, 32.0, 4.0, 4.0, 36.0, - 20.0, 128.0, 36.0, 44.0, 34.0, 336.0, 232.0, 36.0, 28.0, 12.0, 416.0, - 400.0, 52.0, 176.0, 22.0, 13.0, 128.0, 10.0, 88.0, 26.0, 108.0, 84.0, - 29.0, 
136.0, 76.0, 12.0, 16.0, 160.0, 17.0, 432.0, 48.0, 50.0, 29.0, - 22.0, 56.0, 112.0, 19.0, 30.0, 120.0, 100.0, 24.0, 8.0, 384.0, 16.0, - 10.0, 8.0, 144.0, 30.0, 72.0, 17.0, 42.0, 48.0, 304.0, 40.0, 48.0, - 1.0, 34.0, 62.0, 40.0, 16.0, 14.0, 48.0, 200.0, 200.0, 116.0, 96.0, - 96.0, 56.0, 8.0, 29.0, 10.0, 320.0, 288.0, 108.0, 56.0, 100.0, 40.0, - 368.0, 8.0, 4.0, 88.0, 3.0, 5.0, 44.0, 4.0, 16.0, 20.0, 8.0, - 208.0, 40.0, 368.0, 36.0, 124.0, 48.0, 48.0, 96.0, 12.0, 112.0, 112.0, - 192.0, 20.0, 168.0, 144.0, 64.0, 176.0, 92.0, 25.0, 88.0, 24.0, 48.0, - 320.0, 28.0, 184.0, 20.0, 52.0, 36.0, 38.0, 10.0, 18.0, 80.0, 68.0, - 240.0, 0.0, 56.0, 30.0, 56.0, 88.0, 29.0, 48.0, 40.0, 20.0, 84.0, - 64.0, 80.0, 16.0, 108.0, 13.0, 19.0, 3.0, 27.0, 11.0, 10.0, 5.0, - 30.0, 6.0, 368.0, 28.0, 14.0, 100.0, 30.0, 80.0, 4.0, 80.0, 26.0, - 72.0, 11.0, 144.0, 336.0, 8.0, 80.0, 16.0, 72.0, 20.0, 232.0, 288.0, - 176.0, 52.0, 144.0, 40.0, 112.0, 80.0, 48.0, 176.0, 2.0, 448.0, 48.0, - 96.0, 40.0, 100.0, 112.0, 104.0, 16.0, 21.0, 248.0, 160.0, 12.0, 24.0, - 116.0, 64.0, 24.0, 288.0, 400.0, 176.0, 152.0, 28.0, 30.0, 24.0, 0.0, - 416.0, 24.0, 60.0, 21.0, 80.0, 168.0, 20.0, 30.0, 52.0, 20.0, 224.0, - 136.0, 44.0, 32.0, 40.0, 36.0, 60.0, 9.0, 116.0, 72.0, 12.0, 12.0, - 8.0, 5.0, 60.0, 36.0, 34.0, 124.0, 232.0, 128.0, 62.0, 2.0, 8.0, - 352.0, 224.0, 8.0, 24.0, 100.0, 112.0, 26.0, 44.0, 18.0, 11.0, 400.0, - 112.0, 4.0, 21.0, 256.0, 6.0, 80.0, 108.0, 16.0, 30.0, 240.0, 248.0, - 22.0, 104.0, 68.0, 25.0, 8.0, 22.0, 112.0, 44.0, 108.0, 288.0, 0.0, - 16.0, 14.0, 7.0, 128.0, 80.0, 48.0, 24.0, 16.0, 96.0, 0.0, 7.0, - 32.0, 160.0, 76.0, 28.0, 56.0, 30.0, 116.0, 20.0, 25.0, 124.0, 22.0, - 18.0, 176.0, 432.0, 352.0, 23.0, 0.0, 8.0, 128.0, 224.0, 384.0, 128.0, - 96.0, 336.0, 5.0, 48.0, 64.0, 23.0, 13.0, 184.0, 58.0, 432.0, 72.0, - 16.0, 6.0, 88.0, 304.0, 14.0, 168.0, 3.0, 80.0, 20.0, 0.0, 120.0, - 48.0, 464.0, 6.0, 10.0, 112.0, 17.0, 32.0, 4.0, 14.0, 400.0, 352.0, - 15.0, 400.0, 72.0, 352.0, 32.0, 208.0, 248.0, 
4.0, 28.0, 20.0, 176.0, - 60.0, 12.0, 64.0, 1.0, 29.0, 30.0, 29.0, 9.0, 0.0, 80.0, 34.0, - 22.0, 128.0, 336.0, 68.0, 28.0, 448.0, 8.0, 46.0, 21.0, 28.0, 48.0, - 10.0, 44.0, 16.0, 56.0, 76.0, 480.0, 64.0, 8.0, 4.0, 200.0, 24.0, - 104.0, 384.0, 76.0, 168.0, 92.0, 20.0, 12.0, 336.0, 288.0, 32.0, 448.0, - 448.0, 168.0, 26.0, 56.0, 80.0, 48.0, 272.0, 72.0, 40.0, 152.0, 10.0, - 224.0, 0.0, 464.0, 208.0, 64.0, 21.0, 152.0, 352.0, 30.0, 34.0, 46.0, - 200.0, 8.0, 84.0, 272.0, 36.0, 38.0, 44.0, 4.0, 480.0, 432.0, 27.0, - 72.0, 24.0, 46.0, 52.0, 124.0, 96.0, 4.0, 112.0, 22.0, 24.0, 20.0, - 192.0, 29.0, 46.0, 216.0, 1.0, 128.0, 32.0, 116.0, 46.0, 60.0, 28.0, - 0.0, 192.0, 176.0, 76.0, 27.0, 52.0, 15.0, 160.0, 96.0, 120.0, 32.0, - 52.0, 184.0, 124.0, 56.0, 48.0, 480.0, 30.0, 28.0, 248.0, 2.0, 72.0, - 14.0, 272.0, 5.0, 76.0, 56.0, 100.0, 5.0, 20.0, 8.0, 480.0, 184.0, - 152.0, 21.0, 24.0, 184.0, 160.0, 40.0, 56.0, 80.0, 100.0, 23.0, 100.0, - 21.0, 11.0, 224.0, 160.0, 48.0, 108.0, 0.0, 40.0, 42.0, 32.0, 40.0, - 96.0, 27.0, 152.0, 4.0, 22.0, 80.0, 84.0, 24.0, 124.0, 96.0, 0.0, - 96.0, 128.0, 34.0, 52.0, 240.0, 432.0, 64.0, 5.0, 352.0, 16.0, 29.0, - 7.0, 42.0, 8.0, 208.0, 256.0, 0.0, 6.0, 16.0, 20.0, 100.0, 12.0, - 48.0, 144.0, 64.0, 128.0, 384.0, 3.0, 464.0, 21.0, 2.0, 496.0, 0.0, - 56.0, 22.0, 432.0, 88.0, 36.0, 432.0, 30.0, 32.0, 0.0, 8.0, 208.0, - 432.0, 12.0, 240.0, 160.0, 20.0, 14.0, 34.0, 80.0, 44.0, 14.0, 44.0, - 8.0, 240.0, 6.0, 84.0, 1.0, 18.0, 31.0, 44.0, 496.0, 20.0, 18.0, - 120.0, 20.0, 8.0, 24.0, 26.0, 18.0, 216.0, 3.0, 208.0, 30.0, 240.0, - 96.0, 17.0, 12.0, 20.0, 128.0, 52.0, 104.0, 54.0, 21.0, 60.0, 42.0, - 4.0, 84.0, 56.0, 40.0, 22.0, 34.0, 88.0, 176.0, 20.0, 28.0, 248.0, - 448.0, 32.0, 64.0, 144.0, 192.0, 36.0, 44.0, 34.0, 42.0, 0.0, 40.0, - 14.0, 192.0, 15.0, 8.0, 128.0, 128.0, 21.0, 48.0, 88.0, 48.0, 192.0, - 480.0, 16.0, 48.0, 304.0, 400.0, 92.0, 3.0, 44.0, 23.0, 58.0, 160.0, - 38.0, 72.0, 40.0, 4.0, 17.0, 240.0, 8.0, 96.0, 30.0, 26.0, 96.0, - 9.0, 21.0, 56.0, 
50.0, 32.0, 92.0, 160.0, 56.0, 144.0, 496.0, 60.0, - 160.0, 288.0, 64.0, 52.0, 21.0, 112.0, 24.0, 0.0, 27.0, 50.0, 68.0, - 23.0, 8.0, 9.0, 144.0, 96.0, 88.0, 144.0, 8.0, 6.0, 7.0, 14.0, - 9.0, 116.0, 448.0, 21.0, 72.0, 26.0, 48.0, 320.0, 432.0, 248.0, 19.0, - 120.0, 20.0, 27.0, 304.0, 44.0, 31.0, 232.0, 46.0, 288.0, 62.0, 232.0, - 8.0, 208.0, 40.0, 272.0, 84.0, 416.0, 34.0, 20.0, 176.0, 288.0, 20.0, - 16.0, 224.0, 2.0, 2.0, 40.0, 432.0, 60.0, 232.0, 64.0, 24.0, 368.0, - 26.0, 22.0, 32.0, 352.0, 46.0, 56.0, 192.0, 192.0, 104.0, 12.0, 272.0, - 208.0, 25.0, 208.0, 40.0, 80.0, 464.0, 4.0, 416.0, 400.0, 224.0, 0.0, - 240.0, 92.0, 60.0, 48.0, 16.0, 22.0, 20.0, 26.0, 112.0, 34.0, 21.0, - 24.0, 96.0, 112.0, 232.0, 320.0, 240.0, 18.0, 184.0, 8.0, 36.0, 48.0, - 80.0, 12.0, 208.0, 40.0, 24.0, 58.0, 52.0, 10.0, 240.0, 84.0, 128.0, - 46.0, 52.0, 416.0, 25.0, 240.0, 48.0, 12.0, 176.0, 0.0, 36.0, 32.0, - 64.0, 80.0, 48.0, 16.0, 208.0, 1.0, 34.0, 20.0, 152.0, 12.0, 36.0, - 368.0, 288.0, 52.0, 12.0, 9.0, 480.0, 136.0, 400.0, 448.0, 8.0, 40.0, - 4.0, 4.0, 352.0, 22.0, 4.0, 22.0, 4.0, 0.0, 12.0, 32.0, 56.0, - 116.0, 22.0, 14.0, 232.0, 112.0, 80.0, 48.0, 24.0, 432.0, 192.0, 6.0, - 14.0, 36.0, 4.0, 272.0, 56.0, 96.0, 18.0, 48.0, 16.0, 20.0, 29.0, - 32.0, 16.0, 20.0, 28.0, 136.0, 64.0, 36.0, 50.0, 0.0, 26.0, 16.0, - 54.0, 128.0, 4.0, 42.0, 496.0, 160.0, 112.0, 320.0, 44.0, 4.0, 6.0, - 44.0, 6.0, 2.0, 29.0, 0.0, 3.0, 36.0, 24.0, 104.0, 96.0, 448.0, - 224.0, 3.0, 54.0, 18.0, 72.0, 24.0, 32.0, 124.0, 96.0, 22.0, 64.0, - 80.0, 31.0, 400.0, 4.0, 120.0, 24.0, 80.0, 6.0, 176.0, 42.0, 16.0, - 84.0, 24.0, 240.0, 208.0, 16.0, 62.0, 76.0, 50.0, 72.0, 184.0, 32.0, - 96.0, 56.0, 256.0, 15.0, 16.0, 96.0, 272.0, 52.0, 240.0, 176.0, 19.0, - 48.0, 44.0, 13.0, 6.0, 160.0, 36.0, 216.0, 56.0, 2.0, 100.0, 136.0, - 16.0, 52.0, 232.0, 14.0, 128.0, 288.0, 304.0, 12.0, 464.0, 16.0, 36.0, - 30.0, 29.0, 216.0, 48.0, 44.0, 84.0, 0.0, 30.0, 64.0, 20.0, 128.0, - 28.0, 216.0, 9.0, 0.0, 256.0, 432.0, 23.0, 0.0, 120.0, 
56.0, 68.0, - 64.0, 16.0, 3.0, 76.0, 40.0, 23.0, 30.0, 64.0, 64.0, 0.0, 416.0, - 200.0, 14.0, 10.0, 336.0, 68.0, 232.0, 96.0, 0.0, 20.0, 48.0, 496.0, - 10.0, 44.0, 1.0, 16.0, 368.0, 128.0, 5.0, 52.0, 248.0, 240.0, 144.0, - 208.0, 176.0, 19.0, 104.0, 104.0, 96.0, 160.0, 124.0, 56.0, 4.0, 304.0, - 10.0, 416.0, 19.0, 448.0, 8.0, 224.0, 15.0, 29.0, 9.0, 5.0, 248.0, - 232.0, 92.0, 4.0, 20.0, 34.0, 8.0, 36.0, 64.0, 16.0, 18.0, 4.0, - 4.0, 2.0, 384.0, 16.0, 288.0, 64.0, 80.0, 320.0, 21.0, 36.0, 336.0, - 64.0, 200.0, 152.0, 24.0, 36.0, 14.0, 76.0, 0.0, 44.0, 176.0, 96.0, - 272.0, 64.0, 38.0, 0.0, 48.0, 8.0, 208.0, 144.0, 32.0, 112.0, 11.0, - 240.0, 21.0, 208.0, 432.0, 336.0, 88.0, 19.0, 160.0, 200.0, 1.0, 40.0, - 64.0, 24.0, 54.0, 128.0, 8.0, 40.0, 20.0, 464.0, 8.0, 28.0, 96.0, - 320.0, 1.0, 248.0, 28.0, 60.0, 0.0, 7.0, 448.0, 304.0, 88.0, 320.0, - 64.0, 208.0, 160.0, 232.0, 30.0, 0.0, 464.0, 30.0, 104.0, 22.0, 272.0, - 4.0, 240.0, 200.0, 56.0, 10.0, 40.0, 480.0, 13.0, 30.0, 400.0, 120.0, - 52.0, 88.0, 448.0, 92.0, 62.0, 34.0, 25.0, 40.0, 30.0, 68.0, 16.0, - 27.0, 16.0, 40.0, 84.0, 50.0, 40.0, 48.0, 11.0, 24.0, 368.0, 32.0, - 2.0, 112.0, 48.0, 14.0, 64.0, 3.0, 24.0, 32.0, 20.0, 25.0, 104.0, - 192.0, 30.0, 84.0, 208.0, 76.0, 96.0, 176.0, 44.0, 64.0, 272.0, 480.0, - 40.0, 124.0, 12.0, 104.0, 28.0, 34.0, 38.0, 100.0, 36.0, 384.0, 20.0, - 192.0, 34.0, 64.0, 120.0, 17.0, 224.0, 48.0, 2.0, 26.0, 124.0, 112.0, - 48.0, 288.0, 52.0, 160.0, 24.0, 432.0, 88.0, 48.0, 5.0, 0.0, 16.0, - 168.0, 248.0, 24.0, 12.0, 28.0, 40.0, 38.0, 14.0, 96.0, 6.0, 2.0, - 6.0, 11.0, 84.0, 16.0, 36.0, 36.0, 12.0, 320.0, 48.0, 16.0, 24.0, - 12.0, 18.0, 6.0, 22.0, 29.0, 480.0, 124.0, 80.0, 21.0, 3.0, 96.0, - 336.0, 4.0, 48.0, 72.0, 36.0, 36.0, 26.0, 80.0, 23.0, 32.0, 200.0, - 88.0, 7.0, 200.0, 352.0, 56.0, 15.0, 104.0, 21.0, 0.0, 54.0, 10.0, - 52.0, 2.0, 16.0, 352.0, 38.0, 80.0, 18.0, 9.0, 108.0, 0.0, 46.0, - 0.0, 10.0, 192.0, 54.0, 42.0, 116.0, 14.0, 22.0, 36.0, 176.0, 16.0, - 15.0, 40.0, 6.0, 30.0, 6.0, 
100.0, 4.0, 112.0, 320.0, 32.0, 30.0, - 240.0, 5.0, 4.0, 8.0, 224.0, 92.0, 12.0, 17.0, 0.0, 3.0, 400.0, - 6.0, 240.0, 4.0, 44.0, 400.0, 240.0, 480.0, 42.0, 208.0, 23.0, 124.0, - 48.0, 40.0, 16.0, 40.0, 18.0, 32.0, 144.0, 29.0, 52.0, 24.0, 16.0, - 52.0, 240.0, 192.0, 14.0, 92.0, 30.0, 88.0, 124.0, 80.0, 14.0, 6.0, - 84.0, 104.0, 40.0, 4.0, 160.0, 58.0, 22.0, 13.0, 208.0, 44.0, 36.0, - 4.0, 4.0, 28.0, 104.0, 96.0, 24.0, 48.0, 36.0, 152.0, 6.0, 18.0, - 240.0, 80.0, 8.0, 52.0, 0.0, 27.0, 256.0, 62.0, 0.0, 84.0, 26.0, - 72.0, 0.0, 32.0, 32.0, 176.0, 18.0, 20.0, 72.0, 60.0, 31.0, 20.0, - 42.0, 128.0, 19.0, 248.0, 0.0, 40.0, 4.0, 200.0, 84.0, 26.0, 21.0, - 44.0, 36.0, 176.0, 88.0, 32.0, 224.0, 32.0, 76.0, 16.0, 31.0, 224.0, - 224.0, 30.0, 368.0, 38.0, 56.0, 256.0, 22.0, 32.0, 62.0, 120.0, 184.0, - 52.0, 112.0, 224.0, 116.0, 36.0, 176.0, 144.0, 26.0, 104.0, 44.0, 30.0, - 52.0, 368.0, 40.0, 12.0, 224.0, 32.0, 26.0, 8.0, 20.0, 100.0, 248.0, - 248.0, 20.0, 192.0, 4.0, 44.0, 8.0, 8.0, 232.0, 32.0, 480.0, 22.0, - 28.0, 168.0, 184.0, 96.0, 40.0, 200.0, 29.0, 12.0, 200.0, 2.0, 42.0, - 52.0, 9.0, 34.0, 304.0, 96.0, 384.0, 30.0, 168.0, 120.0, 160.0, 28.0, - 384.0, 22.0, 88.0, 384.0, 464.0, 40.0, 80.0, 18.0, 30.0, 0.0, 8.0, - 216.0, 168.0, 168.0, 216.0, 0.0, 480.0, 72.0, 28.0, 28.0, 16.0, 24.0, - 0.0, 36.0, 480.0, 24.0, 96.0, 336.0, 16.0, 56.0, 200.0, 11.0, 31.0, - 24.0, 184.0, 4.0, 124.0, 176.0, 48.0, 112.0, 54.0, 100.0, 24.0, 10.0, - 36.0, 108.0, 14.0, 68.0, 36.0, 16.0, 200.0, 15.0, 240.0, 76.0, 116.0, - 15.0, 26.0, 208.0, 2.0, 256.0, 29.0, 6.0, 10.0, 32.0, 176.0, 496.0, - 480.0, 176.0, 224.0, 52.0, 76.0, 36.0, 18.0, 3.0, 72.0, 176.0, 64.0, - 16.0, 128.0, 92.0, 416.0, 1.0, 176.0, 8.0, 46.0, 96.0, 9.0, 96.0, - 176.0, 17.0, 464.0, 80.0, 10.0, 14.0, 7.0, 19.0, 48.0, 24.0, 176.0, - 23.0, 80.0, 32.0, 36.0, 92.0, 50.0, 12.0, 116.0, 8.0, 16.0, 112.0, - 288.0, 144.0, 384.0, 24.0, 72.0, 26.0, 64.0, 240.0, 448.0, 20.0, 28.0, - 496.0, 168.0, 16.0, 48.0, 152.0, 52.0, 288.0, 0.0, 80.0, 19.0, 
124.0, - 4.0, 80.0, 0.0, 4.0, 136.0, 304.0, 50.0, 15.0, 32.0, 80.0, 50.0, - 216.0, 272.0, 160.0, 128.0, 24.0, 368.0, 26.0, 240.0, 224.0, 124.0, 104.0, - 29.0, 12.0, 320.0, 432.0, 4.0, 8.0, 200.0, 40.0, 240.0, 28.0, 76.0, - 8.0, 2.0, 32.0, 96.0, 23.0, 16.0, 144.0, 8.0, 112.0, 232.0, 44.0, - 28.0, 60.0, 84.0, 96.0, 58.0, 208.0, 320.0, 6.0, 21.0, 54.0, 14.0, - 320.0, 16.0, 100.0, 24.0, 8.0, 304.0, 464.0, 56.0, 480.0, 384.0, 416.0, - 3.0, 80.0, 12.0, 46.0, 92.0, 40.0, 64.0, 38.0, 17.0, 240.0, 11.0, - 216.0, 36.0, 8.0, 40.0, 112.0, 0.0, 224.0, 24.0, 10.0, 16.0, 124.0, - 18.0, 1.0, 50.0, 46.0, 20.0, 31.0, 44.0, 448.0, 5.0, 104.0, 12.0, - 124.0, 272.0, 12.0, 8.0, 3.0, 120.0, 19.0, 464.0, 12.0, 34.0, 160.0, - 144.0, 5.0, 13.0, 224.0, 32.0, 136.0, 432.0, 12.0, 2.0, 416.0, 16.0, - 16.0, 58.0, 16.0, 8.0, 44.0, 240.0, 18.0, 42.0, 64.0, 19.0, 64.0, - 5.0, 50.0, 19.0, 16.0, 432.0, 96.0, 22.0, 400.0, 232.0, 25.0, 50.0, - 12.0, 12.0, 10.0, 120.0, 52.0, 44.0, 56.0, 68.0, 18.0, 19.0, 88.0, - 34.0, 336.0, 64.0, 8.0, 84.0, 32.0, 20.0, 104.0, 42.0, 29.0, 352.0, - 160.0, 8.0, 52.0, 288.0, 14.0, 52.0, 184.0, 18.0, 208.0, 38.0, 112.0, - 72.0, 32.0, 68.0, 84.0, 5.0, 36.0, 8.0, 34.0, 224.0, 30.0, 20.0, - 20.0, 16.0, 18.0, 224.0, 0.0, 68.0, 80.0, 10.0, 23.0, 34.0, 0.0, - 42.0, 432.0, 58.0, 17.0, 48.0, 96.0, 144.0, 0.0, 160.0, 232.0, 42.0, - 6.0, 160.0, 56.0, 50.0, 7.0, 24.0, 52.0, 24.0, 52.0, 176.0, 68.0, - 24.0, 20.0, 64.0, 304.0, 4.0, 80.0, 28.0, 26.0, 80.0, 72.0, 28.0, - 200.0, 8.0, 416.0, 16.0, 30.0, 288.0, 20.0, 12.0, 368.0, 124.0, 56.0, - 32.0, 14.0, 22.0, 448.0, 64.0, 232.0, 192.0, 40.0, 12.0, 38.0, 0.0, - 60.0, 0.0, 60.0, 24.0, 160.0, 32.0, 20.0, 26.0, 20.0, 6.0, 120.0, - 116.0, 240.0, 9.0, 20.0, 208.0, 26.0, 124.0, 64.0, 10.0, 48.0, 44.0, - 432.0, 50.0, 12.0, 22.0, 0.0, 5.0, 48.0, 108.0, 120.0, 2.0, 13.0, - 19.0, 32.0, 24.0, 304.0, 100.0, 432.0, 200.0, 12.0, 54.0, 72.0, 3.0, - 352.0, 64.0, 240.0, 0.0, 288.0, 21.0, 20.0, 4.0, 14.0, 15.0, 80.0, - 432.0, 76.0, 4.0, 176.0, 272.0, 8.0, 
0.0, 336.0, 288.0, 304.0, 16.0, - 21.0, 22.0, 42.0, 8.0, 20.0, 28.0, 38.0, 32.0, 12.0, 160.0, 116.0, - 8.0, 40.0, 18.0, 176.0, 26.0, 224.0, 64.0, 116.0, 48.0, 240.0, 32.0, - 16.0, 0.0, 368.0, 0.0, 19.0, 144.0, 9.0, 15.0, 68.0, 160.0, 192.0, - 12.0, 26.0, 108.0, 208.0, 28.0, 144.0, 416.0, 22.0, 28.0, 14.0, 80.0, - 352.0, 76.0, 9.0, 100.0, 92.0, 4.0, 62.0, 88.0, 72.0, 16.0, 88.0, - 22.0, 72.0, 4.0, 8.0, 272.0, 16.0, 0.0, 5.0, 76.0, 64.0, 54.0, - 136.0, 54.0, 208.0, 17.0, 16.0, 272.0, 88.0, 56.0, 240.0, 4.0, 56.0, - 136.0, 40.0, 2.0, 1.0, 28.0, 64.0, 13.0, 60.0, 496.0, 32.0, 176.0, - 28.0, 72.0, 116.0, 7.0, 76.0, 120.0, 200.0, 32.0, 384.0, 320.0, 58.0, - 12.0, 40.0, 60.0, 48.0, 42.0, 192.0, 496.0, 8.0, 0.0, 24.0, 432.0, - 25.0, 64.0, 80.0, 16.0, 52.0, 496.0, 224.0, 38.0, 24.0, 124.0, 27.0, - 17.0, 54.0, 496.0, 112.0, 272.0, 24.0, 68.0, 6.0, 28.0, 64.0, 68.0, - 160.0, 44.0, 4.0, 16.0, 192.0, 76.0, 120.0, 14.0, 384.0, 104.0, 272.0, - 464.0, 136.0, 18.0, 304.0, 4.0, 40.0, 52.0, 54.0, 30.0, 272.0, 16.0, - 8.0, 16.0, 31.0, 60.0, 128.0, 336.0, 8.0, 5.0, 56.0, 416.0, 8.0, - 10.0, 116.0, 0.0, 96.0, 26.0, 18.0, 8.0, 10.0, 20.0, 48.0, 18.0, - 10.0, 46.0, 0.0, 17.0, 200.0, 18.0, 20.0, 16.0, 192.0, 144.0, 56.0, - 4.0, 52.0, 25.0, 0.0, 64.0, 5.0, 7.0, 62.0, 352.0, 5.0, 30.0, - 112.0, 160.0, 5.0, 60.0, 40.0, 9.0, 44.0, 54.0, 56.0, 20.0, 0.0, - 34.0, 240.0, 12.0, 144.0, 92.0, 30.0, 27.0, 496.0, 2.0, 144.0, 40.0, - 32.0, 192.0, 16.0, 11.0, 84.0, 34.0, 2.0, 224.0, 22.0, 92.0, 464.0, - 42.0, 104.0, 464.0, 160.0, 24.0, 80.0, 16.0, 192.0, 48.0, 216.0, 19.0, - 26.0, 40.0, 27.0, 224.0, 192.0, 72.0, 184.0, 54.0, 304.0, 24.0, 30.0, - 7.0, 30.0, 28.0, 124.0, 124.0, 84.0, 30.0, 0.0, 248.0, 0.0, 208.0, - 84.0, 18.0, 496.0, 9.0, 56.0, 40.0, 28.0, 24.0, 6.0, 28.0, 224.0, - 208.0, 96.0, 28.0, 30.0, 224.0, 0.0, 44.0, 272.0, 38.0, 5.0, 28.0, - 76.0, 17.0, 96.0, 36.0, 48.0, 44.0, 4.0, 18.0, 20.0, 18.0, 11.0, - 368.0, 4.0, 48.0, 104.0, 84.0, 240.0, 4.0, 0.0, 26.0, 24.0, 64.0, - 56.0, 384.0, 40.0, 0.0, 
7.0, 64.0, 46.0, 11.0, 56.0, 48.0, 21.0, - 108.0, 176.0, 44.0, 42.0, 100.0, 480.0, 4.0, 448.0, 16.0, 124.0, 16.0, - 56.0, 22.0, 0.0, 24.0, 88.0, 464.0, 48.0, 12.0, 80.0, 18.0, 8.0, - 14.0, 22.0, 176.0, 48.0, 17.0, 17.0, 28.0, 60.0, 216.0, 28.0, 120.0, - 256.0, 128.0, 62.0, 152.0, 8.0, 448.0, 96.0, 208.0, 168.0, 60.0, 25.0, - 88.0, 184.0, 92.0, 56.0, 26.0, 144.0, 96.0, 40.0, 20.0, 224.0, 80.0, - 32.0, 62.0, 92.0, 5.0, 1.0, 48.0, 80.0, 64.0, 248.0, 31.0, 10.0, - 6.0, 27.0, 120.0, 136.0, 120.0, 2.0, 48.0, 432.0, 22.0, 60.0, 72.0, - 58.0, 52.0, 32.0, 28.0, 64.0, 96.0, 184.0, 16.0, 42.0, 200.0, 54.0, - 16.0, 100.0, 26.0, 168.0, 120.0, 448.0, 72.0, 120.0, 336.0, 24.0, 464.0, - 48.0, 40.0, 432.0, 48.0, 192.0, 128.0, 52.0, 224.0, 0.0, 124.0, 8.0, - 6.0, 120.0, 96.0, 80.0, 64.0, 416.0, 42.0, 64.0, 96.0, 10.0, 26.0, - 104.0, 32.0, 25.0, 496.0, 304.0, 64.0, 128.0, 0.0, 22.0, 448.0, 32.0, - 88.0, 3.0, 8.0, 40.0, 7.0, 18.0, 64.0, 68.0, 12.0, 16.0, 42.0, - 192.0, 52.0, 16.0, 80.0, 12.0, 4.0, 192.0, 384.0, 480.0, 56.0, 9.0, - 92.0, 28.0, 52.0, 240.0, 84.0, 34.0, 352.0, 68.0, 50.0, 72.0, 4.0, - 0.0, 0.0, 120.0, 184.0, 160.0, 160.0, 28.0, 304.0, 6.0, 16.0, 80.0, - 10.0, 248.0, 48.0, 200.0, 120.0, 144.0, 88.0, 1.0, 84.0, 104.0, 20.0, - 28.0, 31.0, 0.0, 52.0, 176.0, 40.0, 136.0, 28.0, 8.0, 288.0, 116.0, - 112.0, 208.0, 40.0, 124.0, 20.0, 16.0, 8.0, 104.0, 112.0, 18.0, 25.0, - 10.0, 58.0, 27.0, 128.0, 12.0, 20.0, 23.0, 19.0, 13.0, 23.0, 19.0, - 288.0, 28.0, 32.0, 240.0, 232.0, 240.0, 448.0, 56.0, 224.0, 76.0, 192.0, - 4.0, 52.0, 3.0, 96.0, 13.0, 96.0, 176.0, 48.0, 304.0, 28.0, 0.0, - 352.0, 448.0, 68.0, 16.0, 60.0, 17.0, 16.0, 176.0, 17.0, 24.0, 384.0, - 25.0, 16.0, 9.0, 48.0, 48.0, 124.0, 27.0, 100.0, 12.0, 152.0, 112.0, - 104.0, 336.0, 72.0, 128.0, 4.0, 400.0, 96.0, 124.0, 184.0, 184.0, 7.0, - 88.0, 0.0, 200.0, 52.0, 28.0, 168.0, 16.0, 480.0, 240.0, 120.0, 96.0, - 112.0, 7.0, 12.0, 50.0, 24.0, 12.0, 4.0, 20.0, 26.0, 25.0, 0.0, - 36.0, 10.0, 272.0, 0.0, 46.0, 224.0, 120.0, 31.0, 
68.0, 2.0, 30.0, - 40.0, 54.0, 92.0, 8.0, 224.0, 448.0, 54.0, 128.0, 368.0, 1.0, 224.0, - 152.0, 2.0, 368.0, 120.0, 2.0, 112.0, 432.0, 27.0, 0.0, 384.0, 88.0, - 352.0, 320.0, 8.0, 5.0, 4.0, 30.0, 88.0, 8.0, 0.0, 76.0, 208.0, - 2.0, 184.0, 13.0, 100.0, 12.0, 88.0, 24.0, 2.0, 32.0, 176.0, 400.0, - 13.0, 13.0, 50.0, 2.0, 10.0, 448.0, 448.0, 72.0, 368.0, 432.0, 272.0, - 176.0, 208.0, 64.0, 10.0, 14.0, 96.0, 7.0, 60.0, 88.0, 0.0, 104.0, - 38.0, 208.0, 168.0, 116.0, 10.0, 336.0, 144.0, 5.0, 28.0, 13.0, 58.0, - 16.0, 32.0, 56.0, 104.0, 29.0, 11.0, 24.0, 52.0, 30.0, 240.0, 10.0, - 18.0, 10.0, 144.0, 12.0, 20.0, 8.0, 11.0, 72.0, 24.0, 96.0, 480.0, - 160.0, 14.0, 200.0, 352.0, 52.0, 16.0, 76.0, 40.0, 8.0, 56.0, 144.0, - 14.0, 31.0, 136.0, 30.0, 112.0, 416.0, 72.0, 48.0, 48.0, 0.0, 24.0, - 52.0, 16.0, 96.0, 28.0, 448.0, 68.0, 56.0, 108.0, 200.0, 40.0, 26.0, - 112.0, 256.0, 240.0, 56.0, 48.0, 12.0, 112.0, 48.0, 40.0, 10.0, 304.0, - 34.0, 120.0, 14.0, 32.0, 400.0, 352.0, 84.0, 144.0, 5.0, 24.0, 42.0, - 54.0, 28.0, 400.0, 56.0, 136.0, 80.0, 240.0, 152.0, 52.0, 36.0, 50.0, - 192.0, 1.0, 136.0, 76.0, 216.0, 60.0, 50.0, 8.0, 216.0, 8.0, 32.0, - 272.0, 40.0, 224.0, 21.0, 30.0, 34.0, 144.0, 416.0, 20.0, 38.0, 44.0, - 100.0, 144.0, 464.0, 29.0, 416.0, 16.0, 272.0, 1.0, 54.0, 72.0, 480.0, - 12.0, 464.0, 16.0, 23.0, 2.0, 240.0, 32.0, 64.0, 42.0, 120.0, 84.0, - 29.0, 17.0, 80.0, 92.0, 0.0, 14.0, 384.0, 128.0, 28.0, 108.0, 248.0, - 144.0, 23.0, 48.0, 26.0, 42.0, 64.0, 12.0, 104.0, 20.0, 24.0, 48.0, - 2.0, 20.0, 56.0, 216.0, 6.0, 48.0, 80.0, 8.0, 24.0, 64.0, 352.0, - 80.0, 128.0, 208.0, 28.0, 10.0, 416.0, 20.0, 32.0, 24.0, 92.0, 9.0, - 9.0, 60.0, 56.0, 36.0, 14.0, 320.0, 22.0, 26.0, 16.0, 16.0, 120.0, - 38.0, 21.0, 6.0, 76.0, 0.0, 8.0, 184.0, 208.0, 34.0, 80.0, 20.0, - 23.0, 72.0, 416.0, 56.0, 80.0, 27.0, 38.0, 160.0, 12.0, 22.0, 16.0, - 6.0, 32.0, 0.0, 256.0, 26.0, 24.0, 6.0, 16.0, 16.0, 100.0, 26.0, - 64.0, 20.0, 192.0, 88.0, 11.0, 288.0, 176.0, 8.0, 7.0, 104.0, 26.0, - 60.0, 34.0, 
336.0, 0.0, 304.0, 72.0, 144.0, 6.0, 58.0, 104.0, 16.0, - 3.0, 27.0, 12.0, 25.0, 8.0, 12.0, 2.0, 0.0, 16.0, 24.0, 208.0, - 42.0, 24.0, 0.0, 16.0, 17.0, 112.0, 368.0, 42.0, 20.0, 104.0, 0.0, - 336.0, 25.0, 464.0, 72.0, 108.0, 36.0, 120.0, 104.0, 96.0, 416.0, 272.0, - 4.0, 0.0, 8.0, 208.0, 128.0, 200.0, 112.0, 124.0, 20.0, 50.0, 128.0, - 96.0, 0.0, 12.0, 24.0, 26.0, 8.0, 14.0, 0.0, 19.0, 96.0, 10.0, - 20.0, 60.0, 24.0, 26.0, 56.0, 416.0, 64.0, 216.0, 56.0, 24.0, 0.0, - 144.0, 4.0, 68.0, 5.0, 0.0, 384.0, 200.0, 60.0, 320.0, 256.0, 62.0, - 100.0, 17.0, 128.0, 18.0, 29.0, 64.0, 17.0, 21.0, 26.0, 48.0, 120.0, - 76.0, 2.0, 31.0, 32.0, 68.0, 15.0, 50.0, 152.0, 224.0, 26.0, 176.0, - 31.0, 192.0, 25.0, 240.0, 168.0, 80.0, 10.0, 58.0, 240.0, 28.0, 9.0, - 13.0, 336.0, 48.0, 24.0, 23.0, 160.0, 0.0, 25.0, 12.0, 23.0, 192.0, - 88.0, 108.0, 16.0, 3.0, 104.0, 60.0, 14.0, 12.0, 208.0, 7.0, 16.0, - 448.0, 64.0, 368.0, 480.0, 0.0, 32.0, 0.0, 25.0, 56.0, 31.0, 108.0, - 0.0, 1.0, 56.0, 116.0, 216.0, 12.0, 8.0, 13.0, 48.0, 112.0, 192.0, - 30.0, 36.0, 248.0, 104.0, 152.0, 48.0, 144.0, 38.0, 128.0, 92.0, 112.0, - 50.0, 200.0, 96.0, 64.0, 32.0, 22.0, 48.0, 34.0, 208.0, 96.0, 32.0, - 120.0, 16.0, 20.0, 96.0, 272.0, 54.0, 29.0, 416.0, 144.0, 0.0, 232.0, - 48.0, 10.0, 24.0, 336.0, 22.0, 14.0, 6.0, 12.0, 400.0, 160.0, 32.0, - 12.0, 58.0, 216.0, 144.0, 208.0, 116.0, 16.0, 44.0, 15.0, 320.0, 88.0, - 21.0, 12.0, 25.0, 10.0, 96.0, 232.0, 144.0, 9.0, 384.0, 100.0, 9.0, - 200.0, 432.0, 8.0, 60.0, 224.0, 62.0, 92.0, 88.0, 124.0, 16.0, 56.0, - 464.0, 40.0, 368.0, 29.0, 448.0, 2.0, 92.0, 36.0, 0.0, 448.0, 480.0, - 128.0, 12.0, 14.0, 24.0, 8.0, 28.0, 18.0, 42.0, 288.0, 50.0, 10.0, - 192.0, 16.0, 72.0, 40.0, 25.0, 224.0, 23.0, 72.0, 272.0, 50.0, 96.0, - 232.0, 112.0, 80.0, 216.0, 44.0, 56.0, 56.0, 14.0, 496.0, 38.0, 16.0, - 24.0, 28.0, 144.0, 3.0, 14.0, 48.0, 0.0, 20.0, 400.0, 38.0, 12.0, - 2.0, 112.0, 448.0, 26.0, 28.0, 192.0, 64.0, 2.0, 224.0, 160.0, 240.0, - 112.0, 128.0, 52.0, 40.0, 224.0, 48.0, 
200.0, 112.0, 20.0, 18.0, 448.0, - 12.0, 17.0, 176.0, 352.0, 40.0, 10.0, 4.0, 0.0, 62.0, 52.0, 432.0, - 224.0, 24.0, 60.0, 40.0, 40.0, 44.0, 24.0, 17.0, 22.0, 16.0, 112.0, - 29.0, 96.0, 29.0, 9.0, 56.0, 13.0, 36.0, 480.0, 76.0, 13.0, 8.0, - 224.0, 144.0, 36.0, 16.0, 23.0, 368.0, 248.0, 25.0, 16.0, 30.0, 32.0, - 19.0, 16.0, 4.0, 32.0, 104.0, 26.0, 3.0, 16.0, 30.0, 18.0, 14.0, - 104.0, 80.0, 24.0, 160.0, 68.0, 432.0, 48.0, 44.0, 26.0, 240.0, 352.0, - 10.0, 14.0, 104.0, 240.0, 240.0, 232.0, 80.0, 96.0, 160.0, 28.0, 72.0, - 4.0, 248.0, 12.0, 20.0, 30.0, 28.0, 26.0, 84.0, 120.0, 24.0, 240.0, - 26.0, 64.0, 4.0, 42.0, 152.0, 6.0, 48.0, 29.0, 42.0, 288.0, 40.0, - 10.0, 176.0, 15.0, 160.0, 56.0, 27.0, 2.0, 384.0, 32.0, 232.0, 0.0, - 88.0, 40.0, 30.0, 44.0, 32.0, 8.0, 176.0, 208.0, 416.0, 224.0, 416.0, - 16.0, 28.0, 68.0, 52.0, 15.0, 52.0, 44.0, 8.0, 112.0, 240.0, 3.0, - 116.0, 4.0, 40.0, 58.0, 48.0, 2.0, 432.0, 16.0, 26.0, 80.0, 400.0, - 8.0, 24.0, 54.0, 8.0, 352.0, 104.0, 256.0, 64.0, 21.0, 208.0, 36.0, - 27.0, 336.0, 400.0, 68.0, 22.0, 24.0, 30.0, 13.0, 14.0, 0.0, 304.0, - 464.0, 34.0, 112.0, 28.0, 116.0, 184.0, 96.0, 14.0, 32.0, 12.0, 248.0, - 8.0, 27.0, 14.0, 104.0, 22.0, 48.0, 16.0, 96.0, 62.0, 32.0, 48.0, - 400.0, 8.0, 496.0, 48.0, 36.0, 144.0, 5.0, 176.0, 320.0, 160.0, 176.0, - 120.0, 100.0, 28.0, 24.0, 48.0, 0.0, 24.0, 24.0, 52.0, 12.0, 16.0, - 12.0, 22.0, 168.0, 40.0, 88.0, 128.0, 496.0, 20.0, 240.0, 28.0, 16.0, - 104.0, 48.0, 44.0, 416.0, 96.0, 48.0, 248.0, 304.0, 336.0, 10.0, 8.0, - 64.0, 32.0, 160.0, 64.0, 40.0, 32.0, 88.0, 0.0, 4.0, 27.0, 22.0, - 8.0, 152.0, 320.0, 6.0, 19.0, 384.0, 18.0, 16.0, 20.0, 17.0, 48.0, - 16.0, 112.0, 304.0, 160.0, 320.0, 272.0, 200.0, 56.0, 64.0, 16.0, 56.0, - 30.0, 54.0, 176.0, 22.0, 208.0, 72.0, 496.0, 56.0, 368.0, 128.0, 108.0, - 40.0, 240.0, 176.0, 144.0, 24.0, 96.0, 0.0, 22.0, 208.0, 304.0, 52.0, - 14.0, 19.0, 120.0, 128.0, 256.0, 4.0, 6.0, 18.0, 38.0, 48.0, 0.0, - 104.0, 108.0, 304.0, 64.0, 6.0, 232.0, 19.0, 30.0, 216.0, 52.0, 
46.0, - 17.0, 32.0, 10.0, 8.0, 76.0, 96.0, 36.0, 320.0, 44.0, 28.0, 54.0, - 240.0, 36.0, 8.0, 29.0, 16.0, 15.0, 5.0, 100.0, 10.0, 3.0, 60.0, - 124.0, 54.0, 288.0, 8.0, 72.0, 62.0, 32.0, 152.0, 11.0, 26.0, 42.0, - 6.0, 104.0, 28.0, 0.0, 19.0, 4.0, 10.0, 76.0, 8.0, 16.0, 208.0, - 224.0, 352.0, 108.0, 96.0, 16.0, 168.0, 112.0, 368.0, 80.0, 56.0, 12.0, - 320.0, 208.0, 0.0, 8.0, 0.0, 31.0, 480.0, 8.0, 216.0, 8.0, 38.0, - 11.0, 11.0, 80.0, 29.0, 0.0, 160.0, 176.0, 248.0, 112.0, 52.0, 14.0, - 44.0, 6.0, 17.0, 96.0, 80.0, 8.0, 400.0, 352.0, 256.0, 40.0, 192.0, - 60.0, 96.0, 52.0, 20.0, 416.0, 15.0, 248.0, 19.0, 240.0, 72.0, 416.0, - 22.0, 6.0, 14.0, 38.0, 30.0, 11.0, 2.0, 16.0, 48.0, 48.0, 108.0, - 168.0, 32.0, 304.0, 16.0, 112.0, 10.0, 0.0, 64.0, 40.0, 72.0, 60.0, - 496.0, 46.0, 10.0, 176.0, 136.0, 10.0, 76.0, 384.0, 112.0, 192.0, 192.0, - 26.0, 128.0, 192.0, 8.0, 120.0, 7.0, 34.0, 32.0, 24.0, 8.0, 56.0, - 1.0, 16.0, 50.0, 36.0, 44.0, 272.0, 32.0, 96.0, 28.0, 400.0, 9.0, - 432.0, 256.0, 27.0, 24.0, 112.0, 416.0, 192.0, 20.0, 116.0, 40.0, 320.0, - 0.0, 15.0, 144.0, 17.0, 48.0, 58.0, 136.0, 368.0, 16.0, 26.0, 0.0, - 304.0, 116.0, 11.0, 224.0, 368.0, 100.0, 84.0, 240.0, 88.0, 58.0, 448.0, - 160.0, 52.0, 16.0, 60.0, 128.0, 30.0, 100.0, 416.0, 64.0, 48.0, 224.0, - 336.0, 0.0, 200.0, 6.0, 128.0, 144.0, 112.0, 192.0, 44.0, 112.0, 48.0, - 0.0, 14.0, 4.0, 0.0, 3.0, 28.0, 0.0, 16.0, 120.0, 496.0, 44.0, - 336.0, 44.0, 224.0, 80.0, 184.0, 31.0, 464.0, 160.0, 256.0, 44.0, 120.0, - 12.0, 48.0, 192.0, 32.0, 60.0, 56.0, 46.0, 38.0, 288.0, 232.0, 496.0, - 27.0, 152.0, 24.0, 9.0, 26.0, 21.0, 0.0, 72.0, 12.0, 0.0, 30.0, - 48.0, 36.0, 200.0, 4.0, 40.0, 64.0, 17.0, 248.0, 224.0, 9.0, 56.0, - 24.0, 88.0, 48.0, 184.0, 36.0, 32.0, 21.0, 18.0, 152.0, 128.0, 72.0, - 12.0, 56.0, 62.0, 184.0, 2.0, 40.0, 384.0, 352.0, 208.0, 24.0, 116.0, - 416.0, 6.0, 216.0, 52.0, 92.0, 0.0, 288.0, 38.0, 62.0, 13.0, 54.0, - 30.0, 144.0, 52.0, 448.0, 38.0, 400.0, 256.0, 80.0, 10.0, 22.0, 9.0, - 0.0, 68.0, 48.0, 128.0, 
176.0, 400.0, 16.0, 2.0, 92.0, 18.0, 40.0, - 4.0, 112.0, 24.0, 36.0, 29.0, 100.0, 40.0, 256.0, 160.0, 256.0, 128.0, - 60.0, 76.0, 160.0, 32.0, 26.0, 48.0, 18.0, 216.0, 84.0, 42.0, 52.0, - 5.0, 9.0, 240.0, 240.0, 16.0, 144.0, 4.0, 0.0, 31.0, 52.0, 40.0, - 72.0, 144.0, 13.0, 224.0, 112.0, 3.0, 32.0, 24.0, 116.0, 384.0, 64.0, - 8.0, 12.0, 16.0, 0.0, 464.0, 42.0, 24.0, 0.0, 480.0, 112.0, 288.0, - 52.0, 16.0, 10.0, 7.0, 240.0, 368.0, 17.0, 16.0, 0.0, 124.0, 24.0, - 128.0, 15.0, 21.0, 28.0, 16.0, 224.0, 8.0, 54.0, 25.0, 224.0, 384.0, - 80.0, 6.0, 192.0, 28.0, 248.0, 20.0, 72.0, 112.0, 27.0, 64.0, 16.0, - 240.0, 56.0, 4.0, 432.0, 26.0, 432.0, 36.0, 5.0, 56.0, 192.0, 32.0, - 124.0, 400.0, 128.0, 23.0, 60.0, 38.0, 19.0, 160.0, 92.0, 16.0, 12.0, - 336.0, 10.0, 20.0, 152.0, 32.0, 72.0, 0.0, 92.0, 128.0, 256.0, 168.0, - 216.0, 62.0, 16.0, 46.0, 192.0, 64.0, 29.0, 208.0, 120.0, 56.0, 52.0, - 2.0, 62.0, 112.0, 16.0, 4.0, 40.0, 16.0, 31.0, 176.0, 192.0, 208.0, - 9.0, 104.0, 96.0, 24.0, 68.0, 384.0, 17.0, 10.0, 320.0, 216.0, 12.0, - 24.0, 480.0, 20.0, 24.0, 12.0, 100.0, 12.0, 48.0, 124.0, 5.0, 42.0, - 192.0, 24.0, 52.0, 72.0, 26.0, 58.0, 32.0, 28.0, 192.0, 84.0, 104.0, - 224.0, 272.0, 68.0, 30.0, 52.0, 14.0, 336.0, 128.0, 5.0, 14.0, 12.0, - 8.0, 29.0, 52.0, 464.0, 54.0, 14.0, 128.0, 29.0, 62.0, 22.0, 76.0, - 48.0, 76.0, 112.0, 44.0, 72.0, 7.0, 192.0, 3.0, 0.0, 200.0, 32.0, - 384.0, 16.0, 76.0, 26.0, 240.0, 400.0, 192.0, 50.0, 5.0, 240.0, 124.0, - 42.0, 48.0, 32.0, 92.0, 104.0, 336.0, 4.0, 256.0, 48.0, 240.0, 120.0, - 80.0, 88.0, 40.0, 36.0, 112.0, 23.0, 60.0, 104.0, 14.0, 40.0, 58.0, - 2.0, 11.0, 30.0, 31.0, 184.0, 14.0, 5.0, 32.0, 16.0, 0.0, 20.0, - 31.0, 12.0, 52.0, 16.0, 28.0, 12.0, 27.0, 22.0, 216.0, 16.0, 432.0, - 52.0, 10.0, 192.0, 72.0, 28.0, 416.0, 24.0, 34.0, 52.0, 80.0, 7.0, - 0.0, 32.0, 36.0, 16.0, 92.0, 50.0, 432.0, 6.0, 2.0, 240.0, 48.0, - 120.0, 30.0, 152.0, 20.0, 32.0, 48.0, 480.0, 96.0, 232.0, 112.0, 416.0, - 108.0, 50.0, 40.0, 34.0, 192.0, 44.0, 52.0, 2.0, 72.0, 
46.0, 384.0, - 17.0, 48.0, 80.0, 116.0, 84.0, 112.0, 116.0, 4.0, 320.0, 72.0, 116.0, - 20.0, 200.0, 28.0, 176.0, 2.0, 20.0, 176.0, 12.0, 56.0, 56.0, 20.0, - 76.0, 6.0, 28.0, 6.0, 2.0, 240.0, 11.0, 22.0, 50.0, 11.0, 22.0, - 4.0, 96.0, 13.0, 88.0, 40.0, 288.0, 24.0, 14.0, 80.0, 15.0, 52.0, - 8.0, 0.0, 26.0, 92.0, 24.0, 16.0, 104.0, 64.0, 24.0, 28.0, 112.0, - 88.0, 272.0, 208.0, 128.0, 2.0, 16.0, 16.0, 24.0, 240.0, 176.0, 64.0, - 448.0, 18.0, 124.0, 208.0, 21.0, 20.0, 136.0, 19.0, 384.0, 80.0, 40.0, - 25.0, 27.0, 16.0, 58.0, 200.0, 8.0, 52.0, 5.0, 400.0, 6.0, 128.0, - 216.0, 40.0, 0.0, 27.0, 288.0, 96.0, 128.0, 25.0, 17.0, 12.0, 50.0, - 40.0, 20.0, 13.0, 84.0, 208.0, 42.0, 15.0, 32.0, 4.0, 46.0, 16.0, - 26.0, 32.0, 8.0, 28.0, 16.0, 120.0, 0.0, 224.0, 248.0, 24.0, 2.0, - 40.0, 176.0, 72.0, 168.0, 116.0, 288.0, 432.0, 56.0, 216.0, 116.0, 52.0, - 34.0, 448.0, 32.0, 208.0, 96.0, 80.0, 20.0, 9.0, 28.0, 18.0, 160.0, - 208.0, 216.0, 26.0, 40.0, 1.0, 19.0, 64.0, 2.0, 92.0, 28.0, 22.0, - 44.0, 256.0, 3.0, 64.0, 29.0, 20.0, 88.0, 88.0, 160.0, 80.0, 104.0, - 168.0, 14.0, 36.0, 6.0, 6.0, 116.0, 64.0, 64.0, 92.0, 0.0, 62.0, - 18.0, 144.0, 4.0, 448.0, 21.0, 13.0, 52.0, 40.0, 27.0, 32.0, 480.0, - 0.0, 23.0, 256.0, 496.0, 120.0, 32.0, 10.0, 120.0, 15.0, 28.0, 32.0, - 20.0, 58.0, 40.0, 40.0, 100.0, 24.0, 0.0, 40.0, 4.0, 320.0, 29.0, - 160.0, 88.0, 8.0, 248.0, 104.0, 9.0, 14.0, 12.0, 46.0, 8.0, 48.0, - 76.0, 8.0, 256.0, 184.0, 240.0, 56.0, 120.0, 100.0, 9.0, 60.0, 160.0, - 144.0, 24.0, 40.0, 18.0, 50.0, 48.0, 96.0, 10.0, 6.0, 34.0, 14.0, - 160.0, 108.0, 336.0, 192.0, 56.0, 496.0, 20.0, 400.0, 16.0, 1.0, 48.0, - 25.0, 416.0, 168.0, 272.0, 112.0, 21.0, 8.0, 32.0, 88.0, 8.0, 112.0, - 0.0, 1.0, 100.0, 104.0, 336.0, 36.0, 64.0, 36.0, 112.0, 18.0, 17.0, - 368.0, 232.0, 16.0, 4.0, 26.0, 48.0, 48.0, 21.0, 124.0, 62.0, 136.0, - 192.0, 224.0, 432.0, 5.0, 23.0, 8.0, 80.0, 120.0, 120.0, 29.0, 128.0, - 54.0, 8.0, 168.0, 352.0, 3.0, 50.0, 112.0, 120.0, 4.0, 24.0, 20.0, - 368.0, 25.0, 6.0, 36.0, 
64.0, 36.0, 0.0, 96.0, 6.0, 192.0, 80.0, - 100.0, 64.0, 40.0, 30.0, 16.0, 22.0, 32.0, 24.0, 32.0, 44.0, 208.0, - 192.0, 6.0, 14.0, 7.0, 4.0, 26.0, 288.0, 72.0, 52.0, 8.0, 112.0, - 14.0, 0.0, 5.0, 96.0, 60.0, 24.0, 54.0, 60.0, 24.0, 24.0, 200.0, - 400.0, 58.0, 50.0, 40.0, 124.0, 216.0, 432.0, 26.0, 448.0, 1.0, 0.0, - 17.0, 1.0, 18.0, 16.0, 22.0, 128.0, 200.0, 54.0, 240.0, 32.0, 52.0, - 44.0, 28.0, 34.0, 64.0, 368.0, 29.0, 56.0, 168.0, 112.0, 20.0, 44.0, - 5.0, 480.0, 16.0, 48.0, 432.0, 52.0, 7.0, 176.0, 36.0, 12.0, 32.0, - 40.0, 22.0, 56.0, 116.0, 116.0, 100.0, 0.0, 14.0, 480.0, 42.0, 40.0, - 32.0, 224.0, 30.0, 112.0, 16.0, 152.0, 46.0, 272.0, 84.0, 6.0, 11.0, - 50.0, 232.0, 40.0, 416.0, 16.0, 224.0, 56.0, 1.0, 120.0, 4.0, 29.0, - 40.0, 9.0, 104.0, 48.0, 12.0, 208.0, 144.0, 19.0, 8.0, 224.0, 144.0, - 5.0, 20.0, 56.0, 0.0, 0.0, 26.0, 304.0, 80.0, 464.0, 4.0, 32.0, - 18.0, 384.0, 120.0, 0.0, 112.0, 15.0, 128.0, 72.0, 38.0, 32.0, 104.0, - 104.0, 48.0, 17.0, 384.0, 144.0, 108.0, 21.0, 368.0, 368.0, 16.0, 36.0, - 46.0, 3.0, 200.0, 0.0, 32.0, 464.0, 4.0, 168.0, 10.0, 64.0, 480.0, - 12.0, 13.0, 336.0, 24.0, 44.0, 11.0, 32.0, 352.0, 16.0, 416.0, 16.0, - 36.0, 4.0, 52.0, 28.0, 29.0, 7.0, 31.0, 336.0, 20.0, 24.0, 68.0, - 50.0, 400.0, 240.0, 21.0, 30.0, 0.0, 224.0, 336.0, 112.0, 36.0, 232.0, - 224.0, 20.0, 10.0, 288.0, 56.0, 192.0, 20.0, 88.0, 32.0, 96.0, 464.0, - 384.0, 26.0, 28.0, 23.0, 5.0, 40.0, 22.0, 56.0, 368.0, 18.0, 192.0, - 192.0, 480.0, 80.0, 176.0, 0.0, 31.0, 112.0, 32.0, 16.0, 30.0, 60.0, - 72.0, 224.0, 46.0, 192.0, 44.0, 0.0, 144.0, 14.0, 52.0, 216.0, 30.0, - 11.0, 11.0, 10.0, 1.0, 432.0, 30.0, 36.0, 32.0, 8.0, 84.0, 76.0, - 13.0, 16.0, 368.0, 120.0, 0.0, 144.0, 104.0, 30.0, 16.0, 12.0, 240.0, - 11.0, 232.0, 52.0, 26.0, 48.0, 6.0, 160.0, 64.0, 152.0, 12.0, 48.0, - 4.0, 248.0, 40.0, 108.0, 48.0, 56.0, 4.0, 56.0, 120.0, 384.0, 176.0, - 25.0, 144.0, 96.0, 4.0, 0.0, 60.0, 192.0, 160.0, 2.0, 14.0, 12.0, - 1.0, 1.0, 8.0, 0.0, 26.0, 76.0, 104.0, 116.0, 84.0, 192.0, 21.0, 
- 23.0, 18.0, 9.0, 18.0, 384.0, 480.0, 8.0, 44.0, 208.0, 144.0, 416.0, - 320.0, 240.0, 104.0, 80.0, 27.0, 192.0, 32.0, 20.0, 208.0, 14.0, 26.0, - 4.0, 0.0, 7.0, 58.0, 30.0, 200.0, 2.0, 36.0, 480.0, 24.0, 25.0, - 14.0, 112.0, 240.0, 2.0, 80.0, 27.0, 0.0, 42.0, 48.0, 20.0, 92.0, - 24.0, 38.0, 216.0, 80.0, 176.0, 72.0, 176.0, 8.0, 48.0, 240.0, 352.0, - 8.0, 84.0, 26.0, 20.0, 72.0, 96.0, 56.0, 5.0, 128.0, 0.0, 14.0, - 23.0, 52.0, 28.0, 120.0, 216.0, 28.0, 152.0, 16.0, 0.0, 8.0, 11.0, - 104.0, 200.0, 10.0, 0.0, 352.0, 96.0, 19.0, 432.0, 14.0, 176.0, 24.0, - 11.0, 15.0, 72.0, 44.0, 13.0, 48.0, 368.0, 20.0, 288.0, 176.0, 3.0, - 30.0, 84.0, 28.0, 52.0, 40.0, 304.0, 56.0, 15.0, 32.0, 168.0, 200.0, - 192.0, 192.0, 12.0, 32.0, 4.0, 12.0, 336.0, 2.0, 240.0, 304.0, 19.0, - 72.0, 60.0, 368.0, 144.0, 34.0, 272.0, 54.0, 20.0, 56.0, 54.0, 14.0, - 26.0, 3.0, 14.0, 4.0, 6.0, 192.0, 40.0, 6.0, 12.0, 32.0, 128.0, - 36.0, 96.0, 0.0, 0.0, 56.0, 64.0, 0.0, 200.0, 208.0, 192.0, 14.0, - 48.0, 20.0, 240.0, 80.0, 224.0, 52.0, 34.0, 256.0, 448.0, 96.0, 12.0, - 120.0, 40.0, 100.0, 3.0, 96.0, 320.0, 124.0, 20.0, 128.0, 17.0, 32.0, - 0.0, 18.0, 2.0, 176.0, 72.0, 108.0, 72.0, 20.0, 0.0, 208.0, 168.0, - 20.0, 50.0, 400.0, 0.0, 10.0, 200.0, 368.0, 22.0, 200.0, 112.0, 20.0, - 272.0, 480.0, 256.0, 16.0, 11.0, 58.0, 36.0, 60.0, 19.0, 128.0, 4.0, - 68.0, 72.0, 20.0, 25.0, 32.0, 192.0, 136.0, 24.0, 92.0, 400.0, 12.0, - 19.0, 9.0, 240.0, 136.0, 9.0, 46.0, 96.0, 17.0, 112.0, 48.0, 168.0, - 20.0, 44.0, 14.0, 14.0, 48.0, 16.0, 336.0, 11.0, 14.0, 44.0, 96.0, - 2.0, 56.0, 152.0, 62.0, 192.0, 56.0, 0.0, 160.0, 112.0, 104.0, 112.0, - 13.0, 24.0, 3.0, 96.0, 0.0, 0.0, 24.0, 44.0, 25.0, 44.0, 176.0, - 22.0, 12.0, 288.0, 8.0, 152.0, 28.0, 128.0, 124.0, 28.0, 56.0, 25.0, - 320.0, 96.0, 9.0, 16.0, 12.0, 26.0, 8.0, 9.0, 152.0, 464.0, 60.0, - 192.0, 29.0, 60.0, 20.0, 50.0, 400.0, 272.0, 240.0, 40.0, 50.0, 8.0, - 120.0, 29.0, 38.0, 216.0, 40.0, 8.0, 18.0, 60.0, 24.0, 80.0, 96.0, - 52.0, 20.0, 28.0, 0.0, 5.0, 5.0, 29.0, 
44.0, 8.0, 5.0, 56.0, - 76.0, 17.0, 80.0, 20.0, 60.0, 120.0, 432.0, 112.0, 24.0, 20.0, 22.0, - 56.0, 12.0, 0.0, 304.0, 120.0, 0.0, 28.0, 116.0, 64.0, 8.0, 96.0, - 8.0, 192.0, 92.0, 23.0, 12.0, 10.0, 496.0, 72.0, 112.0, 48.0, 152.0, - 72.0, 232.0, 0.0, 32.0, 38.0, 232.0, 352.0, 40.0, 84.0, 88.0, 32.0, - 56.0, 9.0, 2.0, 16.0, 46.0, 29.0, 72.0, 16.0, 25.0, 240.0, 1.0, - 108.0, 144.0, 112.0, 3.0, 216.0, 58.0, 9.0, 10.0, 48.0, 2.0, 7.0, - 12.0, 336.0, 62.0, 14.0, 52.0, 24.0, 58.0, 384.0, 29.0, 2.0, 24.0, - 56.0, 112.0, 3.0, 4.0, 4.0, 30.0, 14.0, 4.0, 54.0, 496.0, 24.0, - 128.0, 7.0, 96.0, 60.0, 20.0, 52.0, 40.0, 272.0, 30.0, 16.0, 46.0, - 56.0, 5.0, 80.0, 30.0, 40.0, 24.0, 76.0, 56.0, 84.0, 64.0, 24.0, - 28.0, 58.0, 12.0, 42.0, 216.0, 16.0, 104.0, 160.0, 12.0, 24.0, 40.0, - 25.0, 80.0, 4.0, 84.0, 64.0, 14.0, 72.0, 17.0, 54.0, 8.0, 48.0, - 72.0, 192.0, 80.0, 58.0, 0.0, 124.0, 224.0, 120.0, 11.0, 200.0, 38.0, - 4.0, 10.0, 184.0, 232.0, 12.0, 48.0, 1.0, 12.0, 8.0, 7.0, 11.0, - 4.0, 26.0, 4.0, 4.0, 34.0, 144.0, 30.0, 58.0, 80.0, 168.0, 4.0, - 416.0, 64.0, 60.0, 216.0, 12.0, 14.0, 384.0, 27.0, 15.0, 248.0, 20.0, - 0.0, 112.0, 64.0, 136.0, 28.0, 304.0, 12.0, 44.0, 100.0, 16.0, 56.0, - 176.0, 56.0, 13.0, 9.0, 152.0, 352.0, 240.0, 52.0, 2.0, 352.0, 96.0, - 29.0, 17.0, 24.0, 1.0, 32.0, 248.0, 128.0, 27.0, 5.0, 31.0, 208.0, - 68.0, 27.0, 144.0, 104.0, 368.0, 84.0, 5.0, 9.0, 12.0, 32.0, 0.0, - 12.0, 24.0, 20.0, 40.0, 432.0, 116.0, 44.0, 184.0, 192.0, 192.0, 14.0, - 336.0, 368.0, 352.0, 38.0, 144.0, 14.0, 76.0, 96.0, 20.0, 120.0, 64.0, - 464.0, 12.0, 112.0, 128.0, 124.0, 64.0, 88.0, 32.0, 240.0, 60.0, 14.0, - 0.0, 32.0, 144.0, 56.0, 18.0, 44.0, 64.0, 112.0, 4.0, 88.0, 32.0, - 80.0, 34.0, 320.0, 0.0, 8.0, 12.0, 0.0, 14.0, 168.0, 12.0, 34.0, - 32.0, 23.0, 184.0, 28.0, 18.0, 0.0, 224.0, 208.0, 136.0, 336.0, 0.0, - 29.0, 58.0, 11.0, 31.0, 96.0, 240.0, 6.0, 104.0, 64.0, 12.0, 120.0, - 128.0, 24.0, 21.0, 18.0, 84.0, 50.0, 144.0, 0.0, 2.0, 12.0, 25.0, - 336.0, 64.0, 19.0, 7.0, 46.0, 4.0, 
240.0, 17.0, 72.0, 15.0, 56.0, - 384.0, 96.0, 232.0, 46.0, 6.0, 20.0, 108.0, 18.0, 16.0, 28.0, 48.0, - 184.0, 58.0, 60.0, 160.0, 272.0, 0.0, 44.0, 19.0, 34.0, 288.0, 224.0, - 12.0, 240.0, 34.0, 15.0, 88.0, 52.0, 80.0, 54.0, 10.0, 216.0, 14.0, - 40.0, 304.0, 152.0, 40.0, 28.0, 44.0, 40.0, 416.0, 10.0, 8.0, 2.0, - 29.0, 136.0, 8.0, 96.0, 13.0, 4.0, 112.0, 0.0, 100.0, 48.0, 3.0, - 2.0, 96.0, 50.0, 28.0, 128.0, 120.0, 36.0, 0.0, 24.0, 200.0, 80.0, - 16.0, 14.0, 40.0, 20.0, 432.0, 60.0, 104.0, 29.0, 120.0, 11.0, 160.0, - 64.0, 10.0, 20.0, 64.0, 10.0, 208.0, 20.0, 56.0, 116.0, 272.0, 42.0, - 48.0, 3.0, 10.0, 3.0, 28.0, 368.0, 72.0, 80.0, 52.0, 192.0, 28.0, - 128.0, 14.0, 68.0, 120.0, 400.0, 2.0, 0.0, 224.0, 28.0, 320.0, 176.0, - 176.0, 240.0, 224.0, 48.0, 26.0, 416.0, 4.0, 336.0, 20.0, 104.0, 0.0, - 4.0, 0.0, 11.0, 80.0, 80.0, 72.0, 32.0, 19.0, 34.0, 176.0, 124.0, - 304.0, 0.0, 13.0, 80.0, 62.0, 32.0, 20.0, 160.0, 336.0, 216.0, 52.0, - 28.0, 92.0, 16.0, 480.0, 16.0, 14.0, 96.0, 18.0, 46.0, 16.0, 96.0, - 12.0, 240.0, 0.0, 216.0, 80.0, 16.0, 30.0, 144.0, 12.0, 40.0, 32.0, - 5.0, 4.0, 144.0, 36.0, 336.0, 36.0, 11.0, 44.0, 80.0, 1.0, 144.0, - 40.0, 72.0, 232.0, 26.0, 58.0, 216.0, 16.0, 144.0, 6.0, 10.0, 368.0, - 216.0, 20.0, 21.0, 40.0, 8.0, 16.0, 32.0, 120.0, 448.0, 240.0, 112.0, - 160.0, 80.0, 26.0, 7.0, 80.0, 248.0, 21.0, 28.0, 64.0, 56.0, 18.0, - 19.0, 176.0, 24.0, 20.0, 320.0, 0.0, 2.0, 48.0, 80.0, 8.0, 368.0, - 48.0, 48.0, 448.0, 10.0, 42.0, 80.0, 400.0, 44.0, 13.0, 0.0, 32.0, - 480.0, 30.0, 44.0, 136.0, 76.0, 240.0, 52.0, 3.0, 80.0, 192.0, 192.0, - 160.0, 16.0, 96.0, 64.0, 32.0, 20.0, 2.0, 256.0, 176.0, 176.0, 200.0, - 72.0, 108.0, 31.0, 38.0, 14.0, 58.0, 256.0, 128.0, 232.0, 116.0, 4.0, - 112.0, 24.0, 17.0, 46.0, 136.0, 216.0, 32.0, 30.0, 14.0, 92.0, 26.0, - 56.0, 208.0, 31.0, 17.0, 12.0, 30.0, 12.0, 92.0, 64.0, 24.0, 368.0, - 168.0, 20.0, 4.0, 0.0, 84.0, 32.0, 80.0, 36.0, 52.0, 20.0, 34.0, - 19.0, 8.0, 192.0, 112.0, 288.0, 8.0, 0.0, 12.0, 26.0, 8.0, 10.0, - 8.0, 
96.0, 136.0, 128.0, 56.0, 42.0, 42.0, 88.0, 80.0, 64.0, 32.0, - 480.0, 4.0, 112.0, 84.0, 14.0, 72.0, 40.0, 29.0, 112.0, 192.0, 136.0, - 384.0, 240.0, 36.0, 352.0, 32.0, 152.0, 26.0, 12.0, 448.0, 240.0, 8.0, - 23.0, 32.0, 304.0, 18.0, 400.0, 23.0, 36.0, 30.0, 0.0, 16.0, 432.0, - 160.0, 24.0, 216.0, 28.0, 38.0, 200.0, 42.0, 416.0, 432.0, 72.0, 56.0, - 96.0, 100.0, 32.0, 84.0, 112.0, 40.0, 248.0, 152.0, 68.0, 80.0, 25.0, - 32.0, 26.0, 40.0, 9.0, 40.0, 0.0, 80.0, 144.0, 2.0, 208.0, 96.0, - 56.0, 72.0, 176.0, 56.0, 104.0, 128.0, 32.0, 108.0, 18.0, 26.0, 64.0, - 54.0, 48.0, 0.0, 38.0, 32.0, 28.0, 320.0, 19.0, 27.0, 248.0, 384.0, - 160.0, 400.0, 80.0, 176.0, 22.0, 216.0, 16.0, 32.0, 30.0, 56.0, 46.0, - 192.0, 2.0, 304.0, 5.0, 62.0, 240.0, 448.0, 0.0, 80.0, 88.0, 176.0, - 13.0, 116.0, 54.0, 46.0, 30.0, 112.0, 58.0, 320.0, 64.0, 18.0, 256.0, - 0.0, 400.0, 240.0, 200.0, 40.0, 38.0, 92.0, 48.0, 112.0, 100.0, 96.0, - 32.0, 28.0, 28.0, 19.0, 464.0, 208.0, 29.0, 12.0, 216.0, 24.0, 160.0, - 208.0, 58.0, 160.0, 96.0, 38.0, 14.0, 30.0, 32.0, 20.0, 240.0, 14.0, - 480.0, 176.0, 200.0, 32.0, 32.0, 288.0, 144.0, 24.0, 416.0, 20.0, 54.0, - 176.0, 80.0, 52.0, 20.0, 26.0, 52.0, 14.0, 20.0, 30.0, 16.0, 112.0, - 17.0, 23.0, 16.0, 60.0, 16.0, 1.0, 46.0, 112.0, 32.0, 17.0, 8.0, - 24.0, 62.0, 224.0, 2.0, 31.0, 0.0, 13.0, 52.0, 48.0, 448.0, 480.0, - 0.0, 30.0, 104.0, 18.0, 4.0, 72.0, 25.0, 31.0, 432.0, 448.0, 8.0, - 40.0, 96.0, 56.0, 432.0, 27.0, 11.0, 11.0, 20.0, 2.0, 128.0, 24.0, - 112.0, 100.0, 22.0, 56.0, 208.0, 96.0, 30.0, 352.0, 40.0, 12.0, 64.0, - 62.0, 400.0, 8.0, 22.0, 38.0, 29.0, 120.0, 112.0, 16.0, 40.0, 12.0, - 42.0, 1.0, 432.0, 40.0, 112.0, 116.0, 56.0, 88.0, 9.0, 2.0, 100.0, - 18.0, 176.0, 11.0, 320.0, 248.0, 36.0, 6.0, 22.0, 112.0, 20.0, 64.0, - 23.0, 76.0, 26.0, 160.0, 48.0, 84.0, 56.0, 14.0, 224.0, 0.0, 96.0, - 52.0, 30.0, 116.0, 36.0, 336.0, 4.0, 368.0, 100.0, 5.0, 56.0, 320.0, - 60.0, 80.0, 15.0, 12.0, 80.0, 62.0, 2.0, 88.0, 42.0, 96.0, 160.0, - 256.0, 48.0, 32.0, 88.0, 11.0, 
62.0, 76.0, 48.0, 34.0, 17.0, 28.0, - 18.0, 17.0, 23.0, 160.0, 208.0, 9.0, 68.0, 52.0, 14.0, 0.0, 272.0, - 80.0, 10.0, 448.0, 50.0, 92.0, 16.0, 320.0, 8.0, 14.0, 320.0, 9.0, - 12.0, 23.0, 26.0, 432.0, 31.0, 18.0, 1.0, 240.0, 400.0, 18.0, 12.0, - 38.0, 36.0, 116.0, 24.0, 432.0, 2.0, 16.0, 128.0, 31.0, 16.0, 48.0, - 76.0, 32.0, 192.0, 10.0, 6.0, 104.0, 88.0, 48.0, 108.0, 16.0, 112.0, - 3.0, 54.0, 208.0, 8.0, 64.0, 42.0, 136.0, 22.0, 4.0, 32.0, 40.0, - 4.0, 92.0, 496.0, 28.0, 216.0, 18.0, 4.0, 416.0, 26.0, 160.0, 46.0, - 38.0, 48.0, 52.0, 60.0, 48.0, 1.0, 52.0, 72.0, 4.0, 272.0, 448.0, - 30.0, 9.0, 5.0, 160.0, 144.0, 9.0, 42.0, 100.0, 32.0, 184.0, 36.0, - 22.0, 40.0, 256.0, 496.0, 20.0, 5.0, 14.0, 22.0, 42.0, 56.0, 36.0, - 108.0, 256.0, 26.0, 336.0, 96.0, 60.0, 19.0, 84.0, 232.0, 16.0, 32.0, - 62.0, 12.0, 224.0, 0.0, 2.0, 16.0, 144.0, 272.0, 368.0, 56.0, 7.0, - 40.0, 160.0, 9.0, 104.0, 240.0, 28.0, 416.0, 96.0, 52.0, 36.0, 88.0, - 29.0, 9.0, 15.0, 96.0, 26.0, 5.0, 44.0, 192.0, 16.0, 28.0, 160.0, - 30.0, 168.0, 32.0, 32.0, 64.0, 76.0, 4.0, 60.0, 240.0, 64.0, 8.0, - 24.0, 25.0, 64.0, 10.0, 19.0, 58.0, 40.0, 24.0, 96.0, 16.0, 108.0, - 1.0, 8.0, 240.0, 16.0, 30.0, 3.0, 4.0, 112.0, 32.0, 29.0, 26.0, - 208.0, 56.0, 144.0, 112.0, 112.0, 52.0, 416.0, 48.0, 50.0, 432.0, 48.0, - 40.0, 64.0, 256.0, 0.0, 8.0, 12.0, 24.0, 64.0, 54.0, 29.0, 240.0, - 448.0, 24.0, 52.0, 96.0, 232.0, 58.0, 416.0, 21.0, 0.0, 10.0, 44.0, - 9.0, 9.0, 496.0, 0.0, 0.0, 12.0, 14.0, 12.0, 496.0, 24.0, 40.0, - 168.0, 20.0, 160.0, 248.0, 28.0, 56.0, 0.0, 62.0, 216.0, 84.0, 8.0, - 27.0, 29.0, 16.0, 208.0, 0.0, 112.0, 62.0, 23.0, 176.0, 56.0, 1.0, - 72.0, 124.0, 60.0, 96.0, 28.0, 432.0, 34.0, 192.0, 112.0, 176.0, 40.0, - 54.0, 36.0, 224.0, 40.0, 128.0, 416.0, 104.0, 124.0, 24.0, 10.0, 112.0, - 64.0, 32.0, 4.0, 8.0, 352.0, 120.0, 0.0, 64.0, 160.0, 19.0, 80.0, - 40.0, 208.0, 96.0, 288.0, 256.0, 22.0, 368.0, 10.0, 36.0, 60.0, 192.0, - 256.0, 184.0, 2.0, 5.0, 40.0, 29.0, 27.0, 8.0, 16.0, 25.0, 42.0, - 336.0, 52.0, 
8.0, 56.0, 108.0, 19.0, 320.0, 68.0, 14.0, 56.0, 20.0, - 124.0, 40.0, 60.0, 64.0, 80.0, 288.0, 0.0, 19.0, 18.0, 42.0, 136.0, - 28.0, 64.0, 56.0, 496.0, 26.0, 336.0, 0.0, 1.0, 120.0, 8.0, 120.0, - 80.0, 168.0, 100.0, 72.0, 104.0, 9.0, 352.0, 11.0, 144.0, 128.0, 168.0, - 56.0, 76.0, 2.0, 72.0, 176.0, 52.0, 30.0, 16.0, 248.0, 112.0, 42.0, - 232.0, 4.0, 88.0, 96.0, 8.0, 40.0, 32.0, 32.0, 384.0, 52.0, 12.0, - 24.0, 184.0, 0.0, 176.0, 112.0, 29.0, 48.0, 6.0, 92.0, 34.0, 5.0, - 16.0, 56.0, 8.0, 48.0, 84.0, 30.0, 52.0, 60.0, 124.0, 10.0, 0.0, - 6.0, 40.0, 112.0, 8.0, 256.0, 16.0, 24.0, 72.0, 56.0, 40.0, 96.0, - 0.0, 26.0, 17.0, 16.0, 25.0, 64.0, 320.0, 50.0, 9.0, 4.0, 160.0, - 40.0, 184.0, 272.0, 72.0, 13.0, 176.0, 124.0, 320.0, 17.0, 336.0, 80.0, - 288.0, 28.0, 56.0, 208.0, 20.0, 76.0, 120.0, 48.0, 100.0, 128.0, 80.0, - 27.0, 128.0, 336.0, 8.0, 24.0, 336.0, 12.0, 48.0, 304.0, 24.0, 34.0, - 0.0, 2.0, 448.0, 20.0, 272.0, 480.0, 224.0, 2.0, 128.0, 62.0, 2.0, - 272.0, 50.0, 22.0, 144.0, 144.0, 56.0, 32.0, 400.0, 8.0, 64.0, 0.0, - 40.0, 48.0, 24.0, 96.0, 38.0, 128.0, 100.0, 68.0, 0.0, 64.0, 192.0, - 416.0, 432.0, 480.0, 84.0, 124.0, 4.0, 3.0, 22.0, 2.0, 208.0, 400.0, - 64.0, 12.0, 0.0, 0.0, 480.0, 240.0, 80.0, 36.0, 26.0, 31.0, 44.0, - 176.0, 128.0, 32.0, 20.0, 29.0, 320.0, 14.0, 72.0, 56.0, 112.0, 52.0, - 76.0, 17.0, 76.0, 24.0, 32.0, 4.0, 8.0, 25.0, 29.0, 368.0, 112.0, - 120.0, 19.0, 72.0, 0.0, 184.0, 80.0, 4.0, 112.0, 36.0, 36.0, 216.0, - 496.0, 30.0, 88.0, 76.0, 24.0, 28.0, 16.0, 184.0, 52.0, 32.0, 288.0, - 14.0, 24.0, 100.0, 192.0, 30.0, 48.0, 29.0, 3.0, 464.0, 24.0, 7.0, - 17.0, 160.0, 2.0, 84.0, 18.0, 24.0, 96.0, 20.0, 208.0, 224.0, 8.0, - 0.0, 304.0, 96.0, 80.0, 480.0, 58.0, 46.0, 136.0, 56.0, 80.0, 464.0, - 8.0, 22.0, 1.0, 8.0, 80.0, 1.0, 216.0, 112.0, 216.0, 104.0, 416.0, - 48.0, 224.0, 56.0, 34.0, 48.0, 52.0, 124.0, 36.0, 32.0, 0.0, 62.0, - 192.0, 29.0, 6.0, 128.0, 272.0, 176.0, 6.0, 112.0, 12.0, 11.0, 288.0, - 36.0, 128.0, 72.0, 416.0, 22.0, 48.0, 176.0, 1.0, 27.0, 
28.0, 184.0, - 240.0, 168.0, 32.0, 28.0, 0.0, 48.0, 352.0, 29.0, 184.0, 24.0, 8.0, - 8.0, 19.0, 24.0, 58.0, 21.0, 464.0, 1.0, 50.0, 80.0, 168.0, 60.0, - 68.0, 26.0, 12.0, 1.0, 72.0, 48.0, 112.0, 64.0, 336.0, 30.0, 92.0, - 24.0, 29.0, 32.0, 8.0, 448.0, 4.0, 10.0, 256.0, 368.0, 120.0, 23.0, - 2.0, 48.0, 40.0, 256.0, 9.0, 216.0, 96.0, 384.0, 240.0, 208.0, 124.0, - 48.0, 200.0, 38.0, 26.0, 64.0, 112.0, 192.0, 30.0, 1.0, 240.0, 29.0, - 416.0, 16.0, 92.0, 42.0, 38.0, 64.0, 30.0, 16.0, 38.0, 56.0, 384.0, - 72.0, 0.0, 52.0, 16.0, 18.0, 12.0, 108.0, 160.0, 288.0, 32.0, 30.0, - 96.0, 36.0, 13.0, 464.0, 92.0, 128.0, 2.0, 184.0, 384.0, 464.0, 60.0, - 288.0, 0.0, 16.0, 136.0, 16.0, 32.0, 8.0, 16.0, 11.0, 28.0, 50.0, - 16.0, 384.0, 8.0, 320.0, 2.0, 8.0, 48.0, 28.0, 352.0, 26.0, 352.0, - 20.0, 108.0, 22.0, 288.0, 31.0, 16.0, 96.0, 50.0, 20.0, 320.0, 108.0, - 29.0, 64.0, 56.0, 12.0, 240.0, 240.0, 16.0, 30.0, 17.0, 40.0, 32.0, - 48.0, 5.0, 30.0, 400.0, 4.0, 120.0, 96.0, 40.0, 120.0, 192.0, 42.0, - 44.0, 20.0, 4.0, 4.0, 48.0, 240.0, 26.0, 25.0, 26.0, 112.0, 15.0, - 12.0, 15.0, 240.0, 26.0, 32.0, 24.0, 92.0, 480.0, 30.0, 8.0, 112.0, - 22.0, 432.0, 26.0, 448.0, 224.0, 9.0, 152.0, 24.0, 8.0, 128.0, 0.0, - 28.0, 50.0, 0.0, 40.0, 0.0, 48.0, 0.0, 29.0, 100.0, 76.0, 60.0, - 40.0, 48.0, 200.0, 25.0, 64.0, 16.0, 96.0, 18.0, 0.0, 272.0, 23.0, - 192.0, 208.0, 0.0, 192.0, 8.0, 84.0, 8.0, 13.0, 176.0, 10.0, 32.0, - 6.0, 48.0, 208.0, 28.0, 64.0, 44.0, 80.0, 464.0, 144.0, 46.0, 96.0, - 11.0, 22.0, 40.0, 68.0, 124.0, 58.0, 40.0, 80.0, 32.0, 32.0, 48.0, - 17.0, 30.0, 480.0, 208.0, 480.0, 8.0, 36.0, 112.0, 54.0, 32.0, 21.0, - 108.0, 144.0, 240.0, 48.0, 192.0, 12.0, 160.0, 5.0, 4.0, 200.0, 60.0, - 168.0, 288.0, 21.0, 4.0, 232.0, 34.0, 92.0, 20.0, 44.0, 368.0, 27.0, - 1.0, 3.0, 32.0, 7.0, 320.0, 288.0, 60.0, 7.0, 92.0, 4.0, 124.0, - 320.0, 76.0, 480.0, 19.0, 96.0, 44.0, 8.0, 320.0, 84.0, 56.0, 416.0, - 56.0, 28.0, 36.0, 8.0, 18.0, 0.0, 448.0, 112.0, 20.0, 36.0, 28.0, - 144.0, 32.0, 120.0, 28.0, 80.0, 
22.0, 28.0, 64.0, 50.0, 108.0, 112.0, - 56.0, 12.0, 20.0, 248.0, 104.0, 120.0, 24.0, 24.0, 136.0, 208.0, 20.0, - 32.0, 16.0, 256.0, 4.0, 22.0, 168.0, 120.0, 21.0, 14.0, 400.0, 14.0, - 96.0, 24.0, 112.0, 7.0, 120.0, 248.0, 48.0, 32.0, 29.0, 16.0, 32.0, - 14.0, 0.0, 4.0, 32.0, 0.0, 232.0, 16.0, 32.0, 38.0, 288.0, 176.0, - 10.0, 192.0, 13.0, 124.0, 20.0, 88.0, 64.0, 256.0, 80.0, 26.0, 100.0, - 22.0, 0.0, 26.0, 48.0, 48.0, 168.0, 14.0, 144.0, 136.0, 6.0, 28.0, - 56.0, 12.0, 92.0, 52.0, 496.0, 3.0, 30.0, 152.0, 416.0, 16.0, 176.0, - 14.0, 232.0, 288.0, 7.0, 28.0, 96.0, 50.0, 4.0, 27.0, 62.0, 46.0, - 104.0, 16.0, 144.0, 64.0, 0.0, 32.0, 128.0, 0.0, 4.0, 368.0, 32.0, - 128.0, 64.0, 104.0, 30.0, 64.0, 30.0, 384.0, 27.0, 23.0, 112.0, 144.0, - 0.0, 248.0, 8.0, 96.0, 10.0, 144.0, 256.0, 62.0, 48.0, 128.0, 88.0, - 272.0, 10.0, 416.0, 288.0, 168.0, 5.0, 44.0, 30.0, 52.0, 0.0, 88.0, - 152.0, 104.0, 216.0, 12.0, 432.0, 68.0, 124.0, 232.0, 50.0, 68.0, 128.0, - 96.0, 116.0, 42.0, 272.0, 28.0, 12.0, 128.0, 88.0, 24.0, 208.0, 38.0, - 36.0, 92.0, 464.0, 80.0, 26.0, 400.0, 496.0, 58.0, 32.0, 28.0, 32.0, - 23.0, 224.0, 336.0, 20.0, 18.0, 168.0, 24.0, 48.0, 5.0, 62.0, 0.0, - 72.0, 54.0, 216.0, 15.0, 2.0, 160.0, 23.0, 496.0, 108.0, 30.0, 6.0, - 144.0, 26.0, 92.0, 18.0, 48.0, 112.0, 176.0, 68.0, 272.0, 208.0, 27.0, - 152.0, 232.0, 26.0, 48.0, 23.0, 16.0, 9.0, 368.0, 24.0, 14.0, 36.0, - 8.0, 28.0, 18.0, 224.0, 48.0, 336.0, 24.0, 2.0, 52.0, 96.0, 64.0, - 256.0, 40.0, 144.0, 112.0, 496.0, 84.0, 208.0, 272.0, 56.0, 84.0, 20.0, - 16.0, 432.0, 144.0, 496.0, 16.0, 56.0, 120.0, 9.0, 29.0, 336.0, 26.0, - 30.0, 5.0, 352.0, 48.0, 144.0, 36.0, 34.0, 256.0, 28.0, 108.0, 14.0, - 104.0, 80.0, 11.0, 416.0, 4.0, 40.0, 32.0, 30.0, 31.0, 224.0, 50.0, - 84.0, 29.0, 48.0, 0.0, 18.0, 224.0, 48.0, 208.0, 31.0, 112.0, 152.0, - 144.0, 176.0, 144.0, 64.0, 25.0, 34.0, 80.0, 56.0, 19.0, 304.0, 4.0, - 112.0, 52.0, 40.0, 240.0, 32.0, 12.0, 18.0, 104.0, 50.0, 24.0, 184.0, - 28.0, 108.0, 40.0, 272.0, 19.0, 18.0, 13.0, 
22.0, 96.0, 272.0, 8.0, - 112.0, 224.0, 40.0, 19.0, 44.0, 232.0, 4.0, 304.0, 16.0, 62.0, 152.0, - 12.0, 21.0, 384.0, 184.0, 432.0, 80.0, 6.0, 400.0, 400.0, 272.0, 84.0, - 25.0, 40.0, 50.0, 60.0, 224.0, 0.0, 124.0, 27.0, 21.0, 288.0, 12.0, - 48.0, 16.0, 192.0, 168.0, 15.0, 96.0, 92.0, 112.0, 96.0, 16.0, 76.0, - 64.0, 80.0, 432.0, 16.0, 54.0, 64.0, 144.0, 22.0, 4.0, 232.0, 44.0, - 48.0, 160.0, 16.0, 136.0, 96.0, 32.0, 24.0, 200.0, 116.0, 8.0, 108.0, - 92.0, 8.0, 6.0, 56.0, 176.0, 496.0, 224.0, 10.0, 10.0, 56.0, 256.0, - 224.0, 32.0, 124.0, 42.0, 192.0, 24.0, 48.0, 240.0, 2.0, 0.0, 336.0, - 352.0, 104.0, 208.0, 18.0, 480.0, 8.0, 10.0, 32.0, 7.0, 46.0, 192.0, - 384.0, 16.0, 50.0, 30.0, 18.0, 36.0, 20.0, 72.0, 184.0, 46.0, 108.0, - 19.0, 26.0, 112.0, 60.0, 29.0, 7.0, 6.0, 168.0, 40.0, 112.0, 36.0, - 240.0, 32.0, 72.0, 12.0, 16.0, 160.0, 104.0, 15.0, 6.0, 128.0, 44.0, - 448.0, 184.0, 0.0, 48.0, 168.0, 64.0, 52.0, 152.0, 16.0, 27.0, 26.0, - 22.0, 144.0, 44.0, 224.0, 176.0, 24.0, 40.0, 200.0, 72.0, 16.0, 108.0, - 13.0, 104.0, 0.0, 50.0, 52.0, 184.0, 60.0, 192.0, 36.0, 38.0, 108.0, - 80.0, 160.0, 46.0, 40.0, 176.0, 464.0, 18.0, 24.0, 30.0, 108.0, 0.0, - 160.0, 13.0, 27.0, 6.0, 20.0, 96.0, 88.0, 12.0, 6.0, 72.0, 5.0, - 100.0, 42.0, 136.0, 16.0, 15.0, 2.0, 240.0, 480.0, 184.0, 16.0, 44.0, - 34.0, 23.0, 48.0, 4.0, 200.0, 18.0, 0.0, 54.0, 46.0, 16.0, 368.0, - 400.0, 9.0, 10.0, 20.0, 14.0, 36.0, 124.0, 20.0, 72.0, 29.0, 0.0, - 5.0, 336.0, 192.0, 18.0, 0.0, 32.0, 112.0, 104.0, 22.0, 32.0, 16.0, - 120.0, 0.0, 4.0, 80.0, 20.0, 8.0, 17.0, 44.0, 8.0, 16.0, 304.0, - 9.0, 30.0, 80.0, 16.0, 36.0, 112.0, 104.0, 240.0, 92.0, 304.0, 16.0, - 432.0, 12.0, 6.0, 42.0, 2.0, 28.0, 5.0, 64.0, 136.0, 88.0, 30.0, - 38.0, 80.0, 112.0, 4.0, 19.0, 23.0, 21.0, 5.0, 24.0, 19.0, 24.0, - 20.0, 168.0, 14.0, 128.0, 54.0, 96.0, 64.0, 36.0, 7.0, 144.0, 8.0, - 24.0, 104.0, 72.0, 0.0, 2.0, 18.0, 76.0, 46.0, 416.0, 40.0, 60.0, - 32.0, 32.0, 30.0, 1.0, 32.0, 24.0, 8.0, 10.0, 336.0, 36.0, 7.0, - 120.0, 416.0, 
88.0, 0.0, 96.0, 88.0, 21.0, 44.0, 208.0, 232.0, 18.0, - 84.0, 56.0, 120.0, 30.0, 10.0, 16.0, 56.0, 60.0, 104.0, 184.0, 16.0, - 16.0, 0.0, 480.0, 448.0, 28.0, 64.0, 4.0, 8.0, 0.0, 240.0, 272.0, - 16.0, 6.0, 108.0, 272.0, 168.0, 216.0, 88.0, 48.0, 12.0, 11.0, 10.0, - 46.0, 272.0, 32.0, 176.0, 384.0, 240.0, 176.0, 192.0, 13.0, 52.0, 40.0, - 464.0, 448.0, 28.0, 3.0, 80.0, 128.0, 96.0, 320.0, 14.0, 2.0, 18.0, - 28.0, 50.0, 5.0, 24.0, 96.0, 320.0, 304.0, 48.0, 192.0, 56.0, 256.0, - 20.0, 40.0, 18.0, 40.0, 64.0, 136.0, 64.0, 42.0, 80.0, 16.0, 34.0, - 368.0, 11.0, 7.0, 36.0, 8.0, 240.0, 64.0, 10.0, 44.0, 44.0, 184.0, - 28.0, 1.0, 44.0, 2.0, 17.0, 54.0, 12.0, 36.0, 116.0, 52.0, 16.0, - 96.0, 48.0, 40.0, 48.0, 3.0, 1.0, 56.0, 24.0, 168.0, 32.0, 208.0, - 27.0, 0.0, 416.0, 232.0, 7.0, 22.0, 48.0, 224.0, 22.0, 72.0, 112.0, - 48.0, 15.0, 0.0, 80.0, 160.0, 5.0, 16.0, 12.0, 22.0, 36.0, 80.0, - 18.0, 22.0, 24.0, 168.0, 9.0, 128.0, 80.0, 40.0, 42.0, 24.0, 5.0, - 11.0, 2.0, 72.0, 5.0, 16.0, 32.0, 0.0, 21.0, 496.0, 19.0, 40.0, - 29.0, 136.0, 72.0, 52.0, 13.0, 18.0, 34.0, 216.0, 24.0, 80.0, 352.0, - 44.0, 12.0, 128.0, 25.0, 28.0, 160.0, 8.0, 448.0, 128.0, 88.0, 384.0, - 6.0, 10.0, 304.0, 30.0, 64.0, 464.0, 48.0, 136.0, 240.0, 34.0, 92.0, - 224.0, 8.0, 38.0, 232.0, 42.0, 92.0, 8.0, 120.0, 48.0, 320.0, 272.0, - 38.0, 20.0, 160.0, 50.0, 29.0, 448.0, 416.0, 176.0, 232.0, 24.0, 24.0, - 108.0, 16.0, 112.0, 144.0, 112.0, 240.0, 144.0, 128.0, 400.0, 108.0, 88.0, - 104.0, 36.0, 8.0, 72.0, 46.0, 72.0, 4.0, 22.0, 30.0, 56.0, 27.0, - 184.0, 30.0, 112.0, 248.0, 10.0, 20.0, 68.0, 320.0, 9.0, 176.0, 10.0, - 256.0, 36.0, 30.0, 192.0, 3.0, 152.0, 21.0, 272.0, 54.0, 112.0, 20.0, - 8.0, 2.0, 0.0, 88.0, 32.0, 208.0, 448.0, 27.0, 32.0, 8.0, 128.0, - 0.0, 16.0, 8.0, 160.0, 120.0, 100.0, 48.0, 58.0, 160.0, 24.0, 56.0, - 46.0, 6.0, 20.0, 88.0, 1.0, 24.0, 224.0, 48.0, 8.0, 112.0, 40.0, - 136.0, 19.0, 88.0, 8.0, 16.0, 448.0, 176.0, 0.0, 14.0, 400.0, 24.0, - 160.0, 104.0, 52.0, 40.0, 96.0, 0.0, 1.0, 17.0, 
128.0, 0.0, 0.0, - 42.0, 64.0, 62.0, 3.0, 256.0, 448.0, 320.0, 16.0, 96.0, 6.0, 28.0, - 10.0, 1.0, 68.0, 208.0, 52.0, 26.0, 84.0, 60.0, 11.0, 11.0, 44.0, - 26.0, 96.0, 88.0, 64.0, 52.0, 20.0, 480.0, 8.0, 22.0, 11.0, 104.0, - 92.0, 6.0, 80.0, 46.0, 80.0, 2.0, 29.0, 26.0, 48.0, 16.0, 30.0, - 152.0, 160.0, 16.0, 56.0, 50.0, 0.0, 16.0, 336.0, 29.0, 2.0, 448.0, - 46.0, 26.0, 22.0, 320.0, 24.0, 4.0, 46.0, 29.0, 256.0, 14.0, 12.0, - 480.0, 112.0, 24.0, 96.0, 112.0, 232.0, 4.0, 24.0, 64.0, 56.0, 6.0, - 48.0, 56.0, 160.0, 28.0, 28.0, 48.0, 40.0, 50.0, 2.0, 32.0, 24.0, - 20.0, 112.0, 54.0, 28.0, 1.0, 48.0, 224.0, 120.0, 64.0, 12.0, 64.0, - 92.0, 384.0, 24.0, 480.0, 304.0, 0.0, 88.0, 32.0, 256.0, 5.0, 76.0, - 464.0, 24.0, 13.0, 240.0, 104.0, 88.0, 96.0, 19.0, 432.0, 48.0, 12.0, - 27.0, 92.0, 40.0, 18.0, 232.0, 8.0, 72.0, 15.0, 17.0, 21.0, 80.0, - 1.0, 13.0, 48.0, 30.0, 104.0, 80.0, 0.0, 100.0, 464.0, 80.0, 4.0, - 28.0, 52.0, 80.0, 6.0, 232.0, 416.0, 192.0, 46.0, 36.0, 176.0, 192.0, - 28.0, 32.0, 184.0, 36.0, 224.0, 0.0, 32.0, 2.0, 248.0, 352.0, 176.0, - 14.0, 2.0, 84.0, 288.0, 104.0, 22.0, 12.0, 384.0, 15.0, 200.0, 240.0, - 80.0, 24.0, 18.0, 9.0, 0.0, 22.0, 160.0, 12.0, 304.0, 208.0, 400.0, - 2.0, 36.0, 96.0, 480.0, 144.0, 4.0, 14.0, 0.0, 20.0, 124.0, 28.0, - 136.0, 104.0, 96.0, 8.0, 24.0, 368.0, 240.0, 18.0, 22.0, 3.0, 31.0, - 480.0, 28.0, 56.0, 40.0, 232.0, 0.0, 256.0, 144.0, 64.0, 208.0, 14.0, - 112.0, 62.0, 25.0, 7.0, 20.0, 28.0, 48.0, 0.0, 76.0, 112.0, 96.0, - 25.0, 48.0, 120.0, 120.0, 27.0, 8.0, 13.0, 88.0, 30.0, 48.0, 240.0, - 240.0, 52.0, 80.0, 20.0, 29.0, 32.0, 0.0, 24.0, 12.0, 25.0, 104.0, - 56.0, 400.0, 96.0, 88.0, 16.0, 200.0, 4.0, 54.0, 112.0, 136.0, 0.0, - 52.0, 40.0, 0.0, 13.0, 72.0, 40.0, 184.0, 72.0, 22.0, 48.0, 40.0, - 304.0, 64.0, 4.0, 200.0, 52.0, 432.0, 56.0, 192.0, 96.0, 12.0, 50.0, - 96.0, 320.0, 96.0, 400.0, 11.0, 20.0, 0.0, 1.0, 68.0, 160.0, 184.0, - 8.0, 48.0, 216.0, 92.0, 168.0, 4.0, 44.0, 288.0, 1.0, 160.0, 36.0, - 112.0, 4.0, 12.0, 42.0, 168.0, 
80.0, 24.0, 496.0, 288.0, 72.0, 8.0, - 30.0, 60.0, 64.0, 0.0, 62.0, 7.0, 368.0, 14.0, 56.0, 56.0, 8.0, - 176.0, 40.0, 5.0, 19.0, 0.0, 68.0, 52.0, 48.0, 128.0, 40.0, 120.0, - 18.0, 176.0, 16.0, 15.0, 16.0, 6.0, 25.0, 248.0, 22.0, 40.0, 60.0, - 60.0, 30.0, 64.0, 28.0, 144.0, 64.0, 22.0, 36.0, 12.0, 44.0, 44.0, - 24.0, 9.0, 48.0, 200.0, 9.0, 27.0, 52.0, 56.0, 192.0, 60.0, 44.0, - 19.0, 21.0, 240.0, 176.0, 168.0, 28.0, 496.0, 116.0, 80.0, 50.0, 48.0, - 11.0, 80.0, 56.0, 464.0, 48.0, 62.0, 96.0, 32.0, 72.0, 5.0, 96.0, - 52.0, 14.0, 16.0, 4.0, 21.0, 20.0, 6.0, 320.0, 24.0, 92.0, 80.0, - 100.0, 40.0, 24.0, 64.0, 62.0, 232.0, 6.0, 22.0, 464.0, 104.0, 42.0, - 248.0, 20.0, 400.0, 144.0, 272.0, 40.0, 80.0, 13.0, 20.0, 40.0, 116.0, - 216.0, 52.0, 216.0, 96.0, 216.0, 42.0, 0.0, 72.0, 8.0, 16.0, 1.0, - 16.0, 48.0, 320.0, 30.0, 28.0, 64.0, 160.0, 10.0, 28.0, 0.0, 52.0, - 2.0, 40.0, 136.0, 2.0, 64.0, 120.0, 13.0, 56.0, 16.0, 16.0, 192.0, - 144.0, 32.0, 44.0, 136.0, 8.0, 10.0, 368.0, 8.0, 16.0, 13.0, 224.0, - 7.0, 88.0, 304.0, 128.0, 80.0, 84.0, 224.0, 64.0, 52.0, 120.0, 40.0, - 384.0, 34.0, 42.0, 8.0, 72.0, 32.0, 208.0, 144.0, 40.0, 8.0, 58.0, - 40.0, 12.0, 56.0, 52.0, 26.0, 32.0, 56.0, 16.0, 216.0, 36.0, 36.0, - 464.0, 72.0, 224.0, 7.0, 240.0, 4.0, 16.0, 26.0, 20.0, 11.0, 100.0, - 0.0, 27.0, 56.0, 160.0, 44.0, 54.0, 54.0, 16.0, 24.0, 31.0, 54.0, - 336.0, 144.0, 23.0, 20.0, 56.0, 32.0, 15.0, 22.0, 28.0, 64.0, 40.0, - 120.0, 22.0, 100.0, 16.0, 144.0, 120.0, 0.0, 224.0, 400.0, 240.0, 42.0, - 17.0, 58.0, 288.0, 120.0, 2.0, 496.0, 128.0, 184.0, 14.0, 96.0, 46.0, - 30.0, 36.0, 464.0, 16.0, 28.0, 84.0, 46.0, 62.0, 38.0, 88.0, 64.0, - 28.0, 216.0, 192.0, 2.0, 384.0, 80.0, 32.0, 64.0, 13.0, 104.0, 116.0, - 216.0, 31.0, 384.0, 352.0, 40.0, 48.0, 48.0, 96.0, 56.0, 8.0, 84.0, - 68.0, 52.0, 5.0, 28.0, 48.0, 0.0, 40.0, 10.0, 30.0, 32.0, 25.0, - 11.0, 48.0, 18.0, 50.0, 0.0, 48.0, 40.0, 0.0, 24.0, 208.0, 64.0, - 184.0, 4.0, 12.0, 46.0, 16.0, 432.0, 208.0, 64.0, 29.0, 64.0, 31.0, - 168.0, 22.0, 
11.0, 26.0, 6.0, 208.0, 176.0, 448.0, 11.0, 3.0, 36.0, - 84.0, 48.0, 34.0, 8.0, 56.0, 96.0, 480.0, 19.0, 21.0, 26.0, 16.0, - 0.0, 4.0, 62.0, 144.0, 160.0, 8.0, 96.0, 28.0, 64.0, 13.0, 216.0, - 31.0, 144.0, 320.0, 28.0, 0.0, 0.0, 36.0, 62.0, 48.0, 64.0, 25.0, - 24.0, 16.0, 20.0, 336.0, 8.0, 22.0, 34.0, 26.0, 1.0, 92.0, 448.0, - 32.0, 62.0, 32.0, 400.0, 272.0, 128.0, 60.0, 62.0, 10.0, 96.0, 144.0, - 50.0, 160.0, 8.0, 22.0, 16.0, 112.0, 64.0, 58.0, 17.0, 62.0, 320.0, - 18.0, 48.0, 11.0, 248.0, 52.0, 6.0, 40.0, 64.0, 240.0, 76.0, 24.0, - 96.0, 29.0, 112.0, 8.0, 42.0, 72.0, 60.0, 52.0, 2.0, 6.0, 176.0, - 6.0, 16.0, 116.0, 496.0, 23.0, 112.0, 184.0, 96.0, 60.0, 368.0, 16.0, - 58.0, 11.0, 32.0, 400.0, 432.0, 52.0, 384.0, 34.0, 22.0, 22.0, 13.0, - 248.0, 12.0, 48.0, 7.0, 496.0, 104.0, 176.0, 26.0, 29.0, 80.0, 128.0, - 64.0, 16.0, 20.0, 112.0, 176.0, 8.0, 29.0, 432.0, 52.0, 48.0, 144.0, - 50.0, 21.0, 80.0, 62.0, 84.0, 56.0, 48.0, 288.0, 28.0, 28.0, 248.0, - 72.0, 120.0, 8.0, 208.0, 7.0, 100.0, 29.0, 12.0, 416.0, 10.0, 6.0, - 100.0, 22.0, 88.0, 24.0, 32.0, 16.0, 20.0, 32.0, 320.0, 68.0, 224.0, - 27.0, 352.0, 120.0, 24.0, 368.0, 184.0, 42.0, 128.0, 124.0, 432.0, 25.0, - 38.0, 48.0, 32.0, 40.0, 84.0, 32.0, 240.0, 30.0, 72.0, 6.0, 27.0, - 68.0, 23.0, 21.0, 17.0, 92.0, 168.0, 176.0, 464.0, 232.0, 64.0, 176.0, - 56.0, 0.0, 28.0, 19.0, 17.0, 54.0, 112.0, 30.0, 0.0, 88.0, 184.0, - 256.0, 0.0, 24.0, 48.0, 112.0, 224.0, 24.0, 1.0, 30.0, 320.0, 20.0, - 6.0, 38.0, 144.0, 10.0, 432.0, 432.0, 20.0, 7.0, 208.0, 96.0, 34.0, - 29.0, 12.0, 56.0, 128.0, 27.0, 4.0, 72.0, 192.0, 208.0, 96.0, 16.0, - 28.0, 92.0, 38.0, 248.0, 9.0, 20.0, 304.0, 80.0, 36.0, 8.0, 14.0, - 21.0, 176.0, 96.0, 8.0, 288.0, 100.0, 192.0, 13.0, 28.0, 216.0, 124.0, -}; -float verify_data[O_SIZE] = { - 831.0, 869.0, 850.0, 876.0, 590.0, 844.0, 1064.0, 1083.0, 1000.0, - 688.0, 953.0, 1025.0, 973.0, 575.0, 330.0, 474.0, 873.0, 1010.0, - 954.0, 697.0, 566.0, 404.0, 328.0, 622.0, 890.0, 1034.0, 721.0, - 454.0, 312.0, 1145.0, 
1095.0, 1303.0, 513.0, 1072.0, 1092.0, 1216.0, - 657.0, 367.0, 241.0, 1072.0, 1180.0, 1174.0, 491.0, 391.0, 509.0, - 968.0, 1128.0, 1169.0, 929.0, 779.0, 886.0, 702.0, 758.0, 536.0, - 844.0, 1390.0, 1548.0, 1344.0, 1028.0, 1040.0, 1592.0, 1629.0, 1508.0, - 956.0, 771.0, 1252.0, 967.0, 994.0, 446.0, 558.0, 783.0, 805.0, - 716.0, 317.0, 717.0, 1043.0, 1100.0, 630.0, 530.0, 963.0, 1211.0, - 1369.0, 1088.0, 1118.0, 1132.0, 909.0, 625.0, 147.0, 144.0, 418.0, - 852.0, 905.0, 798.0, 646.0, 797.0, 718.0, 910.0, 914.0, 1212.0, - 769.0, 1065.0, 863.0, 919.0, 1127.0, 1008.0, 1471.0, 913.0, 843.0, - 567.0, 628.0, 814.0, 747.0, 825.0, 1010.0, 1393.0, 1260.0, 920.0, - 639.0, 689.0, 823.0, 831.0, 1172.0, 1296.0, 928.0, 962.0, 1276.0, - 1391.0, 1587.0, 1137.0, 1282.0, 702.0, 568.0, 322.0, 446.0, 766.0, - 1189.0, 1020.0, 924.0, 659.0, 800.0, 624.0, 669.0, 817.0, 861.0, - 826.0, 581.0, 676.0, 494.0, 1475.0, 1301.0, 1499.0, 777.0, 1340.0, - 1304.0, 1088.0, 465.0, 343.0, 297.0, 666.0, 626.0, 616.0, 335.0, - 311.0, 521.0, 554.0, 726.0, 990.0, 1020.0, 894.0, 670.0, 718.0, - 731.0, 551.0, 365.0, 550.0, 504.0, 1134.0, 1394.0, 1508.0, 1291.0, - 1104.0, 1055.0, 916.0, 675.0, 1252.0, 1016.0, 936.0, 276.0, 400.0, - 708.0, 687.0, 633.0, 379.0, 390.0, 425.0, 629.0, 664.0, 735.0, - 542.0, 565.0, 753.0, 877.0, 1029.0, 1051.0, 913.0, 741.0, 311.0, - 450.0, 826.0, 1418.0, 1373.0, 1155.0, 735.0, 790.0, 894.0, 734.0, - 691.0, 829.0, 991.0, 1734.0, 1484.0, 1283.0, 858.0, 623.0, 1164.0, - 795.0, 1030.0, 696.0, 794.0, 975.0, 1168.0, 1051.0, 1528.0, 1699.0, - 1676.0, 972.0, 531.0, 705.0, 572.0, 435.0, 719.0, 843.0, 842.0, - 926.0, 1256.0, 1297.0, 1293.0, 935.0, 977.0, 606.0, 383.0, 460.0, - 543.0, 805.0, 860.0, 760.0, 947.0, 838.0, 926.0, 670.0, 754.0, - 601.0, 563.0, 557.0, 803.0, 808.0, 557.0, 1122.0, 1188.0, 1205.0, - 734.0, 1285.0, 1385.0, 1506.0, 881.0, 909.0, 511.0, 362.0, 201.0, - 285.0, 388.0, 387.0, 507.0, 716.0, 800.0, 1104.0, 943.0, 919.0, - 555.0, 546.0, 480.0, 299.0, 245.0, 440.0, 459.0, 
765.0, 1398.0, - 1484.0, 1291.0, 1008.0, 917.0, 852.0, 411.0, 662.0, 632.0, 608.0, - 400.0, 788.0, 1336.0, 1239.0, 1017.0, 469.0, 874.0, 699.0, 979.0, - 660.0, 889.0, 576.0, 683.0, 833.0, 1047.0, 1109.0, 757.0, 553.0, - 407.0, 349.0, 880.0, 1296.0, 1612.0, 1153.0, 727.0, 587.0, 718.0, - 918.0, 694.0, 504.0, 454.0, 641.0, 1201.0, 1143.0, 1045.0, 842.0, - 785.0, 1054.0, 781.0, 942.0, 876.0, 975.0, 1131.0, 1249.0, 1112.0, - 1294.0, 944.0, 952.0, 500.0, 552.0, 992.0, 592.0, 339.0, 651.0, - 541.0, 554.0, 448.0, 850.0, 1362.0, 1556.0, 1378.0, 934.0, 531.0, - 338.0, 532.0, 629.0, 697.0, 670.0, 922.0, 1157.0, 1002.0, 776.0, - 552.0, 1012.0, 862.0, 800.0, 746.0, 976.0, 1057.0, 566.0, 726.0, - 1043.0, 1084.0, 1225.0, 1441.0, 1551.0, 1432.0, 1000.0, 970.0, 583.0, - 361.0, 196.0, 243.0, 317.0, 338.0, 361.0, 584.0, 678.0, 1040.0, - 826.0, 776.0, 537.0, 554.0, 497.0, 281.0, 333.0, 817.0, 817.0, - 1199.0, 1506.0, 1534.0, 1148.0, 920.0, 974.0, 1122.0, 728.0, 676.0, - 436.0, 577.0, 579.0, 1003.0, 1148.0, 1233.0, 1375.0, 1113.0, 1400.0, - 748.0, 924.0, 598.0, 834.0, 499.0, 581.0, 467.0, 589.0, 459.0, - 457.0, 351.0, 549.0, 623.0, 1180.0, 1432.0, 1538.0, 1140.0, 956.0, - 676.0, 542.0, 528.0, 604.0, 920.0, 882.0, 1141.0, 1153.0, 1065.0, - 799.0, 668.0, 605.0, 574.0, 551.0, 710.0, 771.0, 557.0, 721.0, - 1008.0, 1006.0, 1116.0, 1146.0, 1176.0, 802.0, 606.0, 1008.0, 880.0, - 744.0, 270.0, 99.0, 479.0, 587.0, 878.0, 1198.0, 1133.0, 1185.0, - 497.0, 562.0, 579.0, 901.0, 785.0, 833.0, 694.0, 1132.0, 1191.0, - 1021.0, 655.0, 329.0, 876.0, 832.0, 770.0, 854.0, 1016.0, 1399.0, - 824.0, 852.0, 849.0, 1028.0, 1111.0, 1327.0, 1135.0, 1314.0, 909.0, - 1201.0, 894.0, 763.0, 490.0, 621.0, 616.0, 509.0, 276.0, 518.0, - 641.0, 859.0, 511.0, 856.0, 873.0, 1182.0, 734.0, 448.0, 346.0, - 909.0, 949.0, 909.0, 708.0, 786.0, 839.0, 771.0, 779.0, 984.0, - 1042.0, 1020.0, 645.0, 577.0, 683.0, 1130.0, 1284.0, 1396.0, 1538.0, - 1208.0, 1524.0, 907.0, 1207.0, 631.0, 780.0, 515.0, 579.0, 565.0, - 546.0, 438.0, 
486.0, 338.0, 510.0, 704.0, 1110.0, 994.0, 970.0, - 656.0, 900.0, 1020.0, 1118.0, 892.0, 516.0, 935.0, 1013.0, 1042.0, - 555.0, 527.0, 457.0, 613.0, 930.0, 943.0, 845.0, 480.0, 687.0, - 562.0, 494.0, 453.0, 848.0, 990.0, 1057.0, 659.0, 517.0, 357.0, - 1057.0, 1005.0, 1003.0, 753.0, 689.0, 914.0, 620.0, 897.0, 1367.0, - 1350.0, 1245.0, 812.0, 843.0, 1047.0, 834.0, 746.0, 760.0, 537.0, - 927.0, 1103.0, 1149.0, 762.0, 304.0, 963.0, 930.0, 900.0, 854.0, - 890.0, 1305.0, 847.0, 1435.0, 1250.0, 1388.0, 1066.0, 1292.0, 1072.0, - 808.0, 435.0, 701.0, 982.0, 942.0, 710.0, 789.0, 684.0, 567.0, - 148.0, 147.0, 241.0, 365.0, 492.0, 819.0, 982.0, 1063.0, 694.0, - 433.0, 720.0, 1357.0, 1386.0, 1142.0, 456.0, 582.0, 361.0, 419.0, - 313.0, 690.0, 950.0, 985.0, 900.0, 1048.0, 1459.0, 1194.0, 860.0, - 694.0, 1124.0, 1070.0, 1176.0, 675.0, 1067.0, 667.0, 704.0, 381.0, - 343.0, 445.0, 248.0, 249.0, 657.0, 768.0, 951.0, 757.0, 684.0, - 516.0, 566.0, 748.0, 1042.0, 1144.0, 1092.0, 898.0, 918.0, 1376.0, - 1470.0, 1308.0, 970.0, 846.0, 472.0, 310.0, 539.0, 823.0, 862.0, - 583.0, 521.0, 463.0, 500.0, 534.0, 1051.0, 1082.0, 1227.0, 859.0, - 865.0, 857.0, 1073.0, 1147.0, 1325.0, 1065.0, 1061.0, 1065.0, 793.0, - 886.0, 880.0, 843.0, 846.0, 1039.0, 1020.0, 1162.0, 729.0, 739.0, - 787.0, 1016.0, 1484.0, 1534.0, 1335.0, 916.0, 860.0, 1047.0, 943.0, - 669.0, 527.0, 630.0, 994.0, 920.0, 1432.0, 1038.0, 1156.0, 560.0, - 658.0, 443.0, 639.0, 514.0, 911.0, 1311.0, 1739.0, 1419.0, 1211.0, - 722.0, 603.0, 389.0, 397.0, 377.0, 363.0, 917.0, 1360.0, 1582.0, - 1365.0, 1091.0, 661.0, 690.0, 890.0, 947.0, 847.0, 473.0, 560.0, - 384.0, 746.0, 580.0, 572.0, 466.0, 581.0, 922.0, 934.0, 1381.0, - 1012.0, 903.0, 567.0, 775.0, 664.0, 992.0, 762.0, 1226.0, 720.0, - 690.0, 338.0, 506.0, 528.0, 394.0, 219.0, 712.0, 813.0, 1304.0, - 1445.0, 1296.0, 872.0, 570.0, 688.0, 710.0, 960.0, 1150.0, 1214.0, - 1070.0, 1090.0, 1062.0, 804.0, 898.0, 821.0, 603.0, 441.0, 818.0, - 1118.0, 977.0, 527.0, 458.0, 611.0, 757.0, 732.0, 
1170.0, 1242.0, - 1099.0, 1005.0, 1037.0, 1183.0, 1157.0, 848.0, 1213.0, 1566.0, 1437.0, - 926.0, 576.0, 934.0, 1104.0, 980.0, 711.0, 1016.0, 968.0, 862.0, - 411.0, 541.0, 533.0, 815.0, 1207.0, 1493.0, 1326.0, 901.0, 927.0, - 975.0, 857.0, 613.0, 404.0, 587.0, 487.0, 536.0, 944.0, 1350.0, - 1772.0, 1546.0, 1087.0, 638.0, 457.0, 512.0, 591.0, 1246.0, 1931.0, - 1803.0, 1188.0, 496.0, 488.0, 493.0, 460.0, 418.0, 540.0, 1102.0, - 1192.0, 1155.0, 713.0, 789.0, 451.0, 717.0, 983.0, 1179.0, 1055.0, - 635.0, 514.0, 344.0, 710.0, 596.0, 470.0, 180.0, 219.0, 649.0, - 828.0, 1287.0, 901.0, 769.0, 541.0, 721.0, 656.0, 733.0, 587.0, - 927.0, 624.0, 630.0, 248.0, 488.0, 412.0, 425.0, 254.0, 733.0, - 822.0, 1241.0, 1394.0, 1396.0, 1200.0, 571.0, 749.0, 508.0, 733.0, - 605.0, 712.0, 948.0, 883.0, 855.0, 1019.0, 1174.0, 1437.0, 1159.0, - 1130.0, 807.0, 797.0, 1013.0, 1399.0, 1110.0, 901.0, 983.0, 1018.0, - 1166.0, 721.0, 853.0, 1072.0, 1277.0, 1507.0, 1260.0, 753.0, 1047.0, - 1270.0, 974.0, 847.0, 820.0, 1180.0, 960.0, 781.0, 577.0, 572.0, - 896.0, 720.0, 659.0, 481.0, 561.0, 895.0, 1315.0, 1545.0, 1318.0, - 1114.0, 1016.0, 1003.0, 651.0, 827.0, 1057.0, 1221.0, 785.0, 492.0, - 488.0, 942.0, 1520.0, 1700.0, 1241.0, 612.0, 434.0, 412.0, 491.0, - 1005.0, 1746.0, 1638.0, 1023.0, 385.0, 366.0, 581.0, 515.0, 708.0, - 740.0, 1164.0, 1094.0, 1167.0, 1220.0, 1173.0, 746.0, 327.0, 516.0, - 835.0, 900.0, 640.0, 333.0, 484.0, 850.0, 906.0, 638.0, 396.0, - 492.0, 540.0, 381.0, 403.0, 327.0, 485.0, 679.0, 939.0, 930.0, - 693.0, 553.0, 885.0, 752.0, 972.0, 586.0, 822.0, 451.0, 472.0, - 372.0, 396.0, 329.0, 717.0, 1548.0, 1634.0, 1422.0, 571.0, 665.0, - 774.0, 977.0, 1207.0, 1262.0, 1124.0, 871.0, 458.0, 868.0, 835.0, - 1123.0, 1452.0, 1449.0, 1238.0, 605.0, 1096.0, 1556.0, 1532.0, 1064.0, - 996.0, 934.0, 954.0, 545.0, 857.0, 1024.0, 1127.0, 965.0, 956.0, - 602.0, 752.0, 994.0, 780.0, 962.0, 1035.0, 1155.0, 999.0, 691.0, - 544.0, 310.0, 711.0, 642.0, 942.0, 666.0, 738.0, 457.0, 433.0, - 645.0, 
760.0, 889.0, 677.0, 656.0, 565.0, 819.0, 1041.0, 1291.0, - 969.0, 760.0, 516.0, 1382.0, 1962.0, 2488.0, 1601.0, 922.0, 278.0, - 212.0, 222.0, 768.0, 1101.0, 1186.0, 697.0, 639.0, 734.0, 735.0, - 731.0, 810.0, 969.0, 723.0, 873.0, 783.0, 1144.0, 1145.0, 952.0, - 749.0, 1008.0, 1358.0, 1181.0, 503.0, 195.0, 510.0, 576.0, 664.0, - 490.0, 534.0, 605.0, 483.0, 490.0, 493.0, 655.0, 798.0, 984.0, - 940.0, 702.0, 329.0, 297.0, 619.0, 754.0, 1446.0, 1172.0, 1097.0, - 342.0, 271.0, 326.0, 329.0, 624.0, 1012.0, 1476.0, 1384.0, 1260.0, - 853.0, 855.0, 846.0, 851.0, 983.0, 1444.0, 1358.0, 1265.0, 461.0, - 1069.0, 806.0, 1334.0, 1496.0, 1468.0, 1024.0, 478.0, 1060.0, 1560.0, - 1444.0, 968.0, 992.0, 936.0, 923.0, 350.0, 722.0, 621.0, 646.0, - 522.0, 834.0, 618.0, 592.0, 575.0, 637.0, 1062.0, 1203.0, 847.0, - 579.0, 379.0, 426.0, 238.0, 655.0, 654.0, 1046.0, 842.0, 1000.0, - 826.0, 718.0, 814.0, 712.0, 785.0, 641.0, 580.0, 504.0, 693.0, - 1070.0, 1253.0, 968.0, 656.0, 346.0, 808.0, 922.0, 1298.0, 956.0, - 703.0, 418.0, 217.0, 267.0, 432.0, 445.0, 730.0, 606.0, 1006.0, - 896.0, 808.0, 652.0, 986.0, 1043.0, 805.0, 713.0, 1066.0, 1467.0, - 1474.0, 1085.0, 826.0, 843.0, 946.0, 817.0, 489.0, 557.0, 811.0, - 723.0, 837.0, 756.0, 862.0, 667.0, 529.0, 673.0, 584.0, 766.0, - 1136.0, 1654.0, 1578.0, 978.0, 389.0, 729.0, 1059.0, 1244.0, 1560.0, - 1364.0, 1279.0, 556.0, 456.0, 515.0, 462.0, 676.0, 1008.0, 1288.0, - 1532.0, 1198.0, 981.0, 523.0, 719.0, 662.0, 966.0, 1412.0, 1390.0, - 1272.0, 468.0, 648.0, 370.0, 762.0, 993.0, 969.0, 789.0, 546.0, - 818.0, 722.0, 978.0, 791.0, 875.0, 743.0, 707.0, 550.0, 516.0, - 480.0, 446.0, 680.0, 975.0, 1152.0, 584.0, 549.0, 538.0, 711.0, - 917.0, 588.0, 562.0, 337.0, 454.0, 736.0, 791.0, 762.0, 704.0, - 826.0, 857.0, 779.0, 653.0, 912.0, 761.0, 651.0, 491.0, 435.0, - 986.0, 839.0, 1096.0, 1061.0, 1180.0, 877.0, 371.0, 623.0, 648.0, - 1072.0, 1016.0, 945.0, 605.0, 199.0, 241.0, 689.0, 893.0, 1165.0, - 889.0, 1049.0, 962.0, 722.0, 598.0, 832.0, 934.0, 
888.0, 850.0, - 1194.0, 1049.0, 1197.0, 861.0, 946.0, 896.0, 822.0, 852.0, 576.0, - 1038.0, 933.0, 847.0, 843.0, 921.0, 857.0, 420.0, 298.0, 716.0, - 658.0, 850.0, 929.0, 1378.0, 1293.0, 746.0, 279.0, 643.0, 853.0, - 1084.0, 1328.0, 1236.0, 1104.0, 496.0, 404.0, 460.0, 397.0, 767.0, - 1065.0, 1328.0, 1512.0, 1362.0, 1151.0, 865.0, 528.0, 457.0, 361.0, - 831.0, 814.0, 793.0, 385.0, 512.0, 337.0, 692.0, 534.0, 806.0, - 1048.0, 1482.0, 1326.0, 746.0, 710.0, 679.0, 791.0, 819.0, 768.0, - 659.0, 341.0, 404.0, 368.0, 818.0, 1163.0, 1073.0, 583.0, 627.0, - 822.0, 925.0, 993.0, 447.0, 481.0, 302.0, 731.0, 989.0, 1002.0, - 710.0, 428.0, 607.0, 586.0, 741.0, 636.0, 898.0, 674.0, 725.0, - 437.0, 461.0, 737.0, 873.0, 1458.0, 1259.0, 1608.0, 1013.0, 750.0, - 386.0, 455.0, 696.0, 836.0, 750.0, 666.0, 288.0, 702.0, 940.0, - 1192.0, 1027.0, 1251.0, 1241.0, 1154.0, 572.0, 348.0, 521.0, 1100.0, - 1396.0, 1111.0, 986.0, 949.0, 1005.0, 889.0, 646.0, 580.0, 240.0, - 382.0, 534.0, 1038.0, 907.0, 853.0, 821.0, 901.0, 1015.0, 633.0, - 569.0, 623.0, 794.0, 924.0, 1049.0, 1284.0, 1455.0, 1152.0, 663.0, - 919.0, 1235.0, 1308.0, 1010.0, 690.0, 683.0, 449.0, 348.0, 505.0, - 475.0, 588.0, 332.0, 522.0, 834.0, 1098.0, 889.0, 725.0, 526.0, - 707.0, 521.0, 471.0, 316.0, 227.0, 316.0, 267.0, 278.0, 338.0, - 342.0, 570.0, 1060.0, 1372.0, 1246.0, 662.0, 740.0, 601.0, 565.0, - 799.0, 1069.0, 1088.0, 538.0, 533.0, 497.0, 1011.0, 1155.0, 930.0, - 368.0, 534.0, 663.0, 699.0, 671.0, 331.0, 445.0, 404.0, 793.0, - 1089.0, 1036.0, 778.0, 806.0, 889.0, 866.0, 581.0, 544.0, 854.0, - 732.0, 1027.0, 615.0, 613.0, 700.0, 889.0, 1402.0, 1182.0, 1866.0, - 1326.0, 1080.0, 480.0, 652.0, 1070.0, 984.0, 1048.0, 808.0, 608.0, - 808.0, 1120.0, 1570.0, 1371.0, 1413.0, 997.0, 932.0, 528.0, 652.0, - 763.0, 1200.0, 1284.0, 1007.0, 631.0, 603.0, 710.0, 1181.0, 1001.0, - 972.0, 388.0, 822.0, 818.0, 1046.0, 660.0, 1096.0, 1104.0, 1083.0, - 765.0, 507.0, 417.0, 425.0, 644.0, 747.0, 600.0, 384.0, 828.0, - 867.0, 872.0, 483.0, 
985.0, 971.0, 1023.0, 507.0, 452.0, 311.0, - 478.0, 621.0, 564.0, 501.0, 495.0, 722.0, 523.0, 640.0, 456.0, - 693.0, 725.0, 907.0, 733.0, 563.0, 382.0, 319.0, 556.0, 623.0, - 682.0, 478.0, 361.0, 656.0, 1102.0, 1353.0, 1538.0, 1010.0, 874.0, - 350.0, 338.0, 636.0, 823.0, 895.0, 919.0, 972.0, 902.0, 914.0, - 1066.0, 312.0, 252.0, 834.0, 1188.0, 1554.0, 1106.0, 763.0, 523.0, - 530.0, 783.0, 1023.0, 938.0, 714.0, 842.0, 887.0, 939.0, 542.0, - 585.0, 528.0, 433.0, 647.0, 535.0, 887.0, 633.0, 818.0, 740.0, - 640.0, 1480.0, 1233.0, 1255.0, 587.0, 680.0, 1044.0, 704.0, 844.0, - 832.0, 1060.0, 1182.0, 990.0, 1038.0, 876.0, 1392.0, 1270.0, 1246.0, - 650.0, 690.0, 833.0, 1275.0, 1178.0, 947.0, 846.0, 1065.0, 970.0, - 1265.0, 1041.0, 1168.0, 492.0, 776.0, 602.0, 530.0, 264.0, 850.0, - 946.0, 1212.0, 966.0, 872.0, 724.0, 420.0, 750.0, 717.0, 939.0, - 724.0, 1113.0, 1087.0, 1023.0, 560.0, 904.0, 845.0, 715.0, 653.0, - 687.0, 858.0, 607.0, 707.0, 547.0, 466.0, 460.0, 542.0, 651.0, - 616.0, 578.0, 317.0, 502.0, 701.0, 635.0, 489.0, 370.0, 418.0, - 628.0, 846.0, 842.0, 666.0, 340.0, 401.0, 399.0, 463.0, 916.0, - 760.0, 980.0, 438.0, 556.0, 654.0, 866.0, 844.0, 916.0, 1038.0, - 1166.0, 1406.0, 1328.0, 260.0, 266.0, 766.0, 762.0, 1202.0, 760.0, - 868.0, 422.0, 690.0, 666.0, 1152.0, 922.0, 930.0, 1338.0, 1778.0, - 2038.0, 1182.0, 868.0, 586.0, 538.0, 866.0, 734.0, 1162.0, 844.0, - 904.0, 386.0, 188.0, 910.0, 1003.0, 1016.0, 382.0, 455.0, 686.0, - 616.0, 722.0, 730.0, 936.0, 780.0, 1138.0, 1396.0, 1444.0, 1206.0, - 874.0, 754.0, 450.0, 474.0, 924.0, 1050.0, 1023.0, 672.0, 831.0, - 778.0, 644.0, 803.0, 796.0, 965.0, 518.0, 799.0, 713.0, 614.0, - 366.0, 792.0, 1034.0, 1215.0, 977.0, 737.0, 702.0, 644.0, 680.0, - 555.0, 567.0, 598.0, 875.0, 775.0, 745.0, 390.0, 852.0, 859.0, - 783.0, 717.0, 722.0, 903.0, 713.0, 804.0, 641.0, 535.0, 563.0, - 624.0, 781.0, 635.0, 591.0, 236.0, 474.0, 519.0, 667.0, 711.0, - 774.0, 594.0, 858.0, 1044.0, 1052.0, 676.0, 338.0, 458.0, 416.0, - 490.0, 1052.0, 
988.0, 1482.0, 852.0, 934.0, 634.0, 590.0, 616.0, - 810.0, 1004.0, 1178.0, 1296.0, 1258.0, 566.0, 466.0, 980.0, 1298.0, - 1736.0, 1194.0, 1072.0, 590.0, 1014.0, 820.0, 1342.0, 936.0, 900.0, - 1028.0, 1408.0, 1594.0, 1202.0, 908.0, 674.0, 389.0, 685.0, 697.0, - 1402.0, 1234.0, 1266.0, 520.0, 250.0, 936.0, 1166.0, 1172.0, 440.0, - 384.0, 440.0, 490.0, 386.0, 484.0, 984.0, 1264.0, 1732.0, 1490.0, - 1364.0, 1118.0, 1102.0, 1158.0, 714.0, 606.0, 627.0, 857.0, 878.0, - 714.0, 972.0, 955.0, 1146.0, 934.0, 927.0, 873.0, 838.0, 647.0, - 453.0, 213.0, 445.0, 518.0, 761.0, 846.0, 964.0, 864.0, 840.0, - 771.0, 715.0, 780.0, 812.0, 962.0, 846.0, 824.0, 935.0, 829.0, - 861.0, 676.0, 548.0, 770.0, 674.0, 1016.0, 774.0, 856.0, 520.0, - 536.0, 452.0, 748.0, 1366.0, 1747.0, 1461.0, 617.0, 274.0, 273.0, - 441.0, 557.0, 599.0, 481.0, 657.0, 888.0, 802.0, 496.0, 163.0, - 293.0, 361.0, 476.0, 710.0, 742.0, 1146.0, 924.0, 954.0, 858.0, - 888.0, 832.0, 521.0, 673.0, 861.0, 1072.0, 814.0, 1002.0, 788.0, - 402.0, 724.0, 829.0, 1195.0, 1247.0, 1090.0, 1032.0, 680.0, 902.0, - 636.0, 690.0, 1138.0, 1614.0, 1711.0, 1431.0, 871.0, 627.0, 308.0, - 712.0, 763.0, 1106.0, 861.0, 1049.0, 869.0, 678.0, 970.0, 958.0, - 1050.0, 536.0, 580.0, 644.0, 812.0, 610.0, 588.0, 806.0, 1112.0, - 1557.0, 1555.0, 1537.0, 1146.0, 1050.0, 1266.0, 1326.0, 1206.0, 799.0, - 709.0, 745.0, 901.0, 1023.0, 935.0, 990.0, 766.0, 819.0, 805.0, - 896.0, 767.0, 731.0, 567.0, 1159.0, 1242.0, 1439.0, 878.0, 640.0, - 534.0, 798.0, 861.0, 763.0, 808.0, 600.0, 766.0, 532.0, 774.0, - 984.0, 942.0, 936.0, 828.0, 1094.0, 986.0, 686.0, 764.0, 928.0, - 1031.0, 583.0, 533.0, 680.0, 940.0, 1298.0, 1449.0, 1371.0, 945.0, - 648.0, 480.0, 469.0, 573.0, 582.0, 472.0, 960.0, 1451.0, 1390.0, - 732.0, 151.0, 269.0, 342.0, 517.0, 731.0, 775.0, 1055.0, 837.0, - 806.0, 702.0, 940.0, 1034.0, 757.0, 565.0, 563.0, 556.0, 492.0, - 1642.0, 1428.0, 634.0, 793.0, 1226.0, 1636.0, 1612.0, 1083.0, 1021.0, - 660.0, 697.0, 751.0, 809.0, 1222.0, 1042.0, 999.0, 
1391.0, 1053.0, - 1023.0, 214.0, 364.0, 427.0, 716.0, 647.0, 823.0, 733.0, 638.0, - 876.0, 810.0, 1030.0, 858.0, 962.0, 976.0, 882.0, 832.0, 944.0, - 1140.0, 1206.0, 1119.0, 839.0, 1115.0, 1036.0, 1203.0, 1219.0, 1323.0, - 1212.0, 739.0, 581.0, 445.0, 972.0, 1170.0, 1254.0, 996.0, 754.0, - 850.0, 854.0, 957.0, 780.0, 633.0, 447.0, 1100.0, 1302.0, 1355.0, - 767.0, 381.0, 573.0, 848.0, 855.0, 827.0, 736.0, 694.0, 1090.0, - 880.0, 1156.0, 932.0, 1282.0, 1184.0, 1448.0, 1462.0, 1374.0, 773.0, - 747.0, 999.0, 1223.0, 779.0, 949.0, 972.0, 1564.0, 1602.0, 1766.0, - 1696.0, 1524.0, 1102.0, 603.0, 210.0, 180.0, 176.0, 374.0, 804.0, - 1429.0, 1252.0, 898.0, 285.0, 677.0, 613.0, 746.0, 512.0, 550.0, - 650.0, 562.0, 556.0, 576.0, 978.0, 1126.0, 863.0, 535.0, 367.0, - 370.0, 303.0, 1277.0, 1258.0, 431.0, 572.0, 971.0, 1491.0, 1335.0, - 828.0, 569.0, 480.0, 769.0, 1033.0, 1093.0, 1110.0, 952.0, 875.0, - 1145.0, 1239.0, 1167.0, 663.0, 261.0, 306.0, 358.0, 393.0, 577.0, - 737.0, 696.0, 606.0, 452.0, 662.0, 820.0, 875.0, 805.0, 689.0, - 738.0, 916.0, 773.0, 613.0, 540.0, 603.0, 1283.0, 1164.0, 1172.0, - 763.0, 1015.0, 947.0, 752.0, 502.0, 470.0, 1234.0, 1568.0, 1772.0, - 988.0, 518.0, 392.0, 696.0, 703.0, 684.0, 573.0, 544.0, 1041.0, - 1145.0, 1123.0, 775.0, 466.0, 640.0, 880.0, 892.0, 918.0, 1006.0, - 900.0, 1198.0, 694.0, 904.0, 528.0, 876.0, 856.0, 1440.0, 1542.0, - 1518.0, 889.0, 605.0, 673.0, 909.0, 677.0, 899.0, 891.0, 1257.0, - 885.0, 708.0, 847.0, 1117.0, 999.0, 539.0, 240.0, 418.0, 507.0, - 655.0, 885.0, 1317.0, 1129.0, 809.0, 295.0, 674.0, 588.0, 830.0, - 519.0, 522.0, 530.0, 883.0, 1040.0, 932.0, 843.0, 895.0, 873.0, - 572.0, 768.0, 766.0, 779.0, 1059.0, 1064.0, 697.0, 774.0, 1106.0, - 1358.0, 1072.0, 578.0, 581.0, 512.0, 805.0, 905.0, 881.0, 1080.0, - 880.0, 1056.0, 1058.0, 1276.0, 1062.0, 766.0, 424.0, 766.0, 662.0, - 678.0, 446.0, 654.0, 634.0, 830.0, 704.0, 764.0, 828.0, 745.0, - 646.0, 322.0, 433.0, 559.0, 508.0, 356.0, 609.0, 716.0, 1396.0, - 984.0, 1080.0, 487.0, 
603.0, 445.0, 578.0, 922.0, 960.0, 1364.0, - 1194.0, 1360.0, 864.0, 708.0, 880.0, 1088.0, 1001.0, 766.0, 465.0, - 412.0, 371.0, 513.0, 633.0, 751.0, 668.0, 1130.0, 1050.0, 1252.0, - 844.0, 1480.0, 1222.0, 1594.0, 814.0, 880.0, 456.0, 1056.0, 1188.0, - 1616.0, 1106.0, 926.0, 729.0, 669.0, 845.0, 797.0, 717.0, 865.0, - 661.0, 1063.0, 679.0, 706.0, 665.0, 845.0, 715.0, 663.0, 627.0, - 809.0, 603.0, 739.0, 661.0, 716.0, 413.0, 529.0, 509.0, 1004.0, - 781.0, 837.0, 346.0, 435.0, 367.0, 814.0, 873.0, 925.0, 626.0, - 599.0, 576.0, 432.0, 696.0, 723.0, 714.0, 615.0, 508.0, 659.0, - 947.0, 887.0, 1361.0, 1143.0, 1107.0, 489.0, 389.0, 829.0, 798.0, - 713.0, 511.0, 539.0, 912.0, 710.0, 1054.0, 648.0, 766.0, 370.0, - 852.0, 718.0, 733.0, 326.0, 597.0, 740.0, 947.0, 822.0, 712.0, - 808.0, 589.0, 624.0, 412.0, 501.0, 697.0, 519.0, 521.0, 509.0, - 763.0, 1439.0, 1128.0, 1161.0, 578.0, 719.0, 530.0, 793.0, 1055.0, - 1071.0, 1001.0, 886.0, 1072.0, 1090.0, 966.0, 1076.0, 1108.0, 1108.0, - 972.0, 656.0, 604.0, 372.0, 669.0, 719.0, 1123.0, 900.0, 1160.0, - 780.0, 936.0, 574.0, 1360.0, 1346.0, 1510.0, 672.0, 488.0, 430.0, - 870.0, 862.0, 836.0, 436.0, 488.0, 624.0, 646.0, 687.0, 439.0, - 393.0, 336.0, 371.0, 459.0, 355.0, 341.0, 344.0, 316.0, 243.0, - 412.0, 864.0, 1128.0, 1145.0, 1101.0, 937.0, 760.0, 373.0, 435.0, - 372.0, 595.0, 436.0, 483.0, 248.0, 373.0, 461.0, 884.0, 959.0, - 1027.0, 656.0, 539.0, 403.0, 348.0, 738.0, 1078.0, 1067.0, 786.0, - 640.0, 648.0, 678.0, 602.0, 1104.0, 1414.0, 1396.0, 697.0, 325.0, - 645.0, 624.0, 504.0, 350.0, 816.0, 1203.0, 1459.0, 1015.0, 688.0, - 596.0, 576.0, 1088.0, 820.0, 935.0, 532.0, 801.0, 930.0, 1129.0, - 872.0, 1052.0, 1092.0, 1078.0, 889.0, 729.0, 753.0, 789.0, 576.0, - 614.0, 498.0, 943.0, 1187.0, 1064.0, 1008.0, 990.0, 942.0, 586.0, - 606.0, 899.0, 1187.0, 837.0, 732.0, 534.0, 966.0, 866.0, 1118.0, - 842.0, 882.0, 786.0, 902.0, 866.0, 664.0, 733.0, 787.0, 1371.0, - 1058.0, 1462.0, 946.0, 1168.0, 868.0, 1150.0, 1212.0, 1147.0, 703.0, - 
445.0, 373.0, 816.0, 788.0, 899.0, 804.0, 918.0, 956.0, 722.0, - 779.0, 533.0, 439.0, 368.0, 432.0, 674.0, 566.0, 467.0, 330.0, - 344.0, 575.0, 745.0, 1101.0, 971.0, 971.0, 901.0, 915.0, 762.0, - 460.0, 570.0, 943.0, 1075.0, 988.0, 527.0, 381.0, 317.0, 925.0, - 883.0, 995.0, 603.0, 657.0, 662.0, 391.0, 431.0, 330.0, 737.0, - 722.0, 586.0, 528.0, 652.0, 681.0, 645.0, 775.0, 1001.0, 1085.0, - 578.0, 369.0, 576.0, 665.0, 579.0, 328.0, 940.0, 1295.0, 1751.0, - 1117.0, 777.0, 737.0, 684.0, 1065.0, 901.0, 1039.0, 708.0, 945.0, - 1038.0, 1113.0, 596.0, 872.0, 1036.0, 1288.0, 1060.0, 995.0, 1077.0, - 1569.0, 1349.0, 1109.0, 409.0, 1142.0, 1283.0, 1447.0, 907.0, 878.0, - 1118.0, 945.0, 1053.0, 598.0, 787.0, 396.0, 605.0, 729.0, 1426.0, - 1190.0, 1008.0, 393.0, 535.0, 671.0, 942.0, 902.0, 1158.0, 1019.0, - 965.0, 1211.0, 1084.0, 1248.0, 602.0, 694.0, 734.0, 722.0, 1088.0, - 1371.0, 1225.0, 711.0, 391.0, 608.0, 596.0, 983.0, 1072.0, 1342.0, - 904.0, 652.0, 673.0, 532.0, 824.0, 650.0, 729.0, 599.0, 546.0, - 445.0, 304.0, 264.0, 745.0, 755.0, 1071.0, 637.0, 856.0, 968.0, - 988.0, 992.0, 722.0, 776.0, 901.0, 801.0, 850.0, 545.0, 623.0, - 491.0, 1041.0, 909.0, 1192.0, 974.0, 1066.0, 1118.0, 918.0, 906.0, - 585.0, 701.0, 660.0, 460.0, 466.0, 530.0, 490.0, 448.0, 650.0, - 815.0, 959.0, 553.0, 352.0, 397.0, 849.0, 850.0, 778.0, 1042.0, - 1346.0, 1795.0, 1053.0, 765.0, 717.0, 718.0, 961.0, 869.0, 1022.0, - 952.0, 1101.0, 1205.0, 1117.0, 620.0, 826.0, 731.0, 1251.0, 1069.0, - 1267.0, 1063.0, 1197.0, 984.0, 736.0, 349.0, 1027.0, 992.0, 1174.0, - 650.0, 829.0, 1095.0, 968.0, 938.0, 443.0, 698.0, 475.0, 699.0, - 799.0, 1608.0, 1486.0, 1255.0, 412.0, 376.0, 685.0, 1126.0, 1060.0, - 1154.0, 825.0, 869.0, 1005.0, 954.0, 1178.0, 672.0, 782.0, 828.0, - 866.0, 1378.0, 1501.0, 1501.0, 859.0, 693.0, 674.0, 842.0, 1379.0, - 1458.0, 1480.0, 853.0, 644.0, 896.0, 699.0, 1176.0, 741.0, 793.0, - 789.0, 816.0, 1088.0, 553.0, 615.0, 867.0, 1061.0, 1197.0, 623.0, - 588.0, 577.0, 601.0, 663.0, 636.0, 
840.0, 1070.0, 878.0, 856.0, - 561.0, 953.0, 783.0, 1264.0, 890.0, 1200.0, 1286.0, 1606.0, 1588.0, - 1221.0, 966.0, 875.0, 654.0, 772.0, 509.0, 509.0, 685.0, 421.0, - 575.0, 788.0, 732.0, 808.0, 600.0, 592.0, 373.0, 640.0, 804.0, - 918.0, 755.0, 1393.0, 1416.0, 1044.0, 287.0, 440.0, 767.0, 974.0, - 991.0, 688.0, 678.0, 821.0, 1141.0, 1165.0, 840.0, 628.0, 402.0, - 998.0, 1036.0, 1605.0, 1185.0, 1549.0, 1004.0, 876.0, 393.0, 831.0, - 704.0, 970.0, 786.0, 845.0, 1156.0, 895.0, 1087.0, 467.0, 498.0, - 241.0, 301.0, 719.0, 1308.0, 1375.0, 1088.0, 487.0, 442.0, 793.0, - 868.0, 1134.0, 1082.0, 1048.0, 814.0, 673.0, 643.0, 670.0, 585.0, - 657.0, 834.0, 852.0, 1542.0, 1884.0, 2048.0, 1326.0, 1050.0, 918.0, - 1112.0, 1428.0, 1107.0, 991.0, 463.0, 627.0, 814.0, 706.0, 1163.0, - 841.0, 793.0, 557.0, 536.0, 948.0, 614.0, 684.0, 600.0, 762.0, - 890.0, 674.0, 590.0, 733.0, 603.0, 749.0, 586.0, 814.0, 696.0, - 500.0, 558.0, 515.0, 911.0, 711.0, 852.0, 500.0, 614.0, 980.0, - 1366.0, 1362.0, 1180.0, 987.0, 1413.0, 1411.0, 1328.0, 974.0, 1169.0, - 1113.0, 562.0, 500.0, 797.0, 1097.0, 1095.0, 1355.0, 1038.0, 1002.0, - 949.0, 1024.0, 947.0, 404.0, 953.0, 1002.0, 904.0, 250.0, 187.0, - 451.0, 497.0, 552.0, 268.0, 430.0, 391.0, 745.0, 869.0, 918.0, - 606.0, 289.0, 815.0, 941.0, 1498.0, 936.0, 832.0, 274.0, 446.0, - 462.0, 844.0, 666.0, 901.0, 801.0, 911.0, 810.0, 636.0, 653.0, - 391.0, 411.0, 308.0, 374.0, 453.0, 953.0, 1050.0, 855.0, 329.0, - 550.0, 956.0, 1040.0, 1010.0, 589.0, 695.0, 437.0, 637.0, 441.0, - 484.0, 621.0, 631.0, 950.0, 780.0, 1762.0, 1538.0, 1936.0, 1214.0, - 1232.0, 970.0, 1004.0, 1129.0, 970.0, 762.0, 487.0, 513.0, 622.0, - 838.0, 910.0, 743.0, 515.0, 632.0, 620.0, 1436.0, 1102.0, 1168.0, - 414.0, 574.0, 762.0, 788.0, 1020.0, 925.0, 789.0, 439.0, 384.0, - 760.0, 970.0, 982.0, 832.0, 538.0, 878.0, 710.0, 943.0, 424.0, - 405.0, 616.0, 1029.0, 1076.0, 796.0, 619.0, 1149.0, 1415.0, 1351.0, - 1086.0, 1205.0, 1445.0, 1007.0, 911.0, 440.0, 815.0, 1025.0, 1869.0, - 1420.0, 
1198.0, 557.0, 608.0, 596.0, 399.0, 761.0, 713.0, 711.0, - 236.0, 179.0, 394.0, 415.0, 504.0, 224.0, 307.0, 491.0, 700.0, - 984.0, 902.0, 740.0, 674.0, 956.0, 966.0, 1276.0, 1022.0, 1066.0, - 456.0, 538.0, 578.0, 897.0, 675.0, 967.0, 886.0, 942.0, 732.0, - 658.0, 603.0, 593.0, 472.0, 439.0, 459.0, 495.0, 725.0, 544.0, - 464.0, 318.0, 587.0, 742.0, 630.0, 666.0, 609.0, 741.0, 449.0, - 797.0, 649.0, 638.0, 503.0, 807.0, 1258.0, 1004.0, 1398.0, 1542.0, - 1980.0, 1594.0, 1069.0, 991.0, 813.0, 873.0, 678.0, 898.0, 811.0, - 756.0, 765.0, 1225.0, 1142.0, 812.0, 342.0, 283.0, 404.0, 1102.0, - 1406.0, 1330.0, 610.0, 372.0, 388.0, 504.0, 1200.0, 1322.0, 1286.0, - 590.0, 460.0, 708.0, 952.0, 1122.0, 1028.0, 637.0, 621.0, 425.0, - 971.0, 712.0, 629.0, 172.0, 335.0, 904.0, 924.0, 1007.0, 875.0, - 1317.0, 1091.0, 1545.0, 1318.0, 1569.0, 971.0, 921.0, 396.0, 799.0, - 1129.0, 1829.0, 1446.0, 1080.0, 648.0, 547.0, 498.0, 294.0, 282.0, - 559.0, 941.0, 946.0, 592.0, 361.0, 342.0, 598.0, 377.0, 516.0, - 718.0, 761.0, 1021.0, 711.0, 742.0, 703.0, 821.0, 815.0, 762.0, - 714.0, 694.0, 374.0, 442.0, 828.0, 1453.0, 1487.0, 1377.0, 822.0, - 1042.0, 831.0, 967.0, 644.0, 741.0, 766.0, 743.0, 1195.0, 993.0, - 1113.0, 487.0, 611.0, 367.0, 697.0, 648.0, 690.0, 357.0, 363.0, - 423.0, 492.0, 859.0, 735.0, 1079.0, 764.0, 990.0, 862.0, 794.0, - 1102.0, 1266.0, 1506.0, 1190.0, 1049.0, 1075.0, 895.0, 759.0, 631.0, - 947.0, 1068.0, 988.0, 938.0, 1102.0, 1055.0, 940.0, 522.0, 503.0, - 426.0, 1132.0, 1432.0, 1442.0, 730.0, 508.0, 656.0, 634.0, 1286.0, - 1108.0, 1526.0, 1098.0, 1118.0, 936.0, 880.0, 1042.0, 866.0, 707.0, - 843.0, 789.0, 1081.0, 628.0, 597.0, 230.0, 331.0, 914.0, 863.0, - 788.0, 416.0, 645.0, 836.0, 1308.0, 990.0, 1135.0, 977.0, 1039.0, - 568.0, 491.0, 771.0, 1018.0, 941.0, 577.0, 420.0, 369.0, 352.0, - 334.0, 312.0, 546.0, 924.0, 923.0, 804.0, 588.0, 681.0, 1091.0, - 905.0, 936.0, 749.0, 782.0, 1306.0, 1165.0, 1300.0, 964.0, 1252.0, - 1094.0, 1230.0, 972.0, 924.0, 428.0, 334.0, 688.0, 
1005.0, 1237.0, - 989.0, 826.0, 877.0, 798.0, 784.0, 485.0, 862.0, 1017.0, 1177.0, - 1511.0, 1292.0, 1100.0, 406.0, 686.0, 610.0, 896.0, 554.0, 538.0, - 481.0, 606.0, 535.0, 396.0, 622.0, 685.0, 1041.0, 750.0, 968.0, - 1168.0, 1196.0, 1114.0, 1302.0, 1446.0, 1270.0, 753.0, 825.0, 953.0, - 795.0, 553.0, 709.0, 810.0, 918.0, 942.0, 996.0, 962.0, 871.0, - 581.0, 559.0, 444.0, 674.0, 934.0, 980.0, 686.0, 382.0, 428.0, - 536.0, 1000.0, 826.0, 1151.0, 995.0, 989.0, 828.0, 988.0, 1071.0, - 951.0, 598.0, 685.0, 551.0, 810.0, 702.0, 914.0, 526.0, 482.0, - 749.0, 723.0, 695.0, 772.0, 1052.0, 1230.0, 1588.0, 1022.0, 851.0, - 625.0, 659.0, 768.0, 755.0, 827.0, 658.0, 773.0, 585.0, 560.0, - 301.0, 240.0, 484.0, 492.0, 802.0, 992.0, 1225.0, 1124.0, 867.0, - 744.0, 1030.0, 854.0, 903.0, 742.0, 800.0, 1194.0, 1009.0, 1164.0, - 606.0, 1082.0, 898.0, 1204.0, 874.0, 822.0, 476.0, 274.0, 614.0, - 1004.0, 1408.0, 1176.0, 1146.0, 1113.0, 1024.0, 720.0, 489.0, 650.0, - 892.0, 1092.0, 1524.0, 1301.0, 913.0, 331.0, 750.0, 682.0, 1012.0, - 606.0, 814.0, 699.0, 644.0, 527.0, 458.0, 500.0, 823.0, 1531.0, - 1596.0, 1372.0, 1052.0, 1020.0, 1010.0, 862.0, 870.0, 690.0, 1092.0, - 962.0, 1174.0, 881.0, 913.0, 989.0, 758.0, 1334.0, 1078.0, 944.0, - 416.0, 686.0, 566.0, 597.0, 299.0, 343.0, 728.0, 876.0, 798.0, - 392.0, 474.0, 544.0, 541.0, 357.0, 589.0, 858.0, 834.0, 900.0, - 1044.0, 1023.0, 753.0, 700.0, 1299.0, 1166.0, 761.0, 241.0, 478.0, - 506.0, 466.0, 385.0, 359.0, 415.0, 1028.0, 1181.0, 1255.0, 1039.0, - 777.0, 555.0, 686.0, 509.0, 619.0, 615.0, 647.0, 494.0, 532.0, - 617.0, 644.0, 404.0, 386.0, 548.0, 578.0, 693.0, 513.0, 949.0, - 1006.0, 1209.0, 1334.0, 1558.0, 1435.0, 1108.0, 835.0, 779.0, 871.0, - 806.0, 873.0, 529.0, 1341.0, 1226.0, 1458.0, 1010.0, 946.0, 666.0, - 118.0, 124.0, 362.0, 629.0, 747.0, 877.0, 725.0, 631.0, 321.0, - 474.0, 682.0, 791.0, 832.0, 1218.0, 1318.0, 1059.0, 789.0, 1064.0, - 1092.0, 1062.0, 688.0, 739.0, 717.0, 603.0, 577.0, 739.0, 811.0, - 1126.0, 1182.0, 1153.0, 
1157.0, 1218.0, 1153.0, 829.0, 928.0, 1040.0, - 961.0, 851.0, 879.0, 1096.0, 1111.0, 1449.0, 1741.0, 1266.0, 1370.0, - 932.0, 1262.0, 762.0, 850.0, 458.0, 440.0, 292.0, 320.0, 1026.0, - 1152.0, 1138.0, 386.0, 348.0, 404.0, 467.0, 491.0, 493.0, 449.0, - 335.0, 623.0, 1072.0, 1021.0, 691.0, 430.0, 869.0, 942.0, 697.0, - 405.0, 474.0, 978.0, 902.0, 1065.0, 708.0, 1028.0, 1237.0, 1183.0, - 811.0, 811.0, 435.0, 621.0, 748.0, 699.0, 615.0, 669.0, 949.0, - 1173.0, 1433.0, 1150.0, 794.0, 791.0, 819.0, 1163.0, 720.0, 940.0, - 547.0, 991.0, 1233.0, 1674.0, 1759.0, 1334.0, 996.0, 709.0, 814.0, - 682.0, 494.0, 305.0, 454.0, 419.0, 1146.0, 1067.0, 1094.0, 806.0, - 830.0, 755.0, 275.0, 249.0, 546.0, 652.0, 708.0, 638.0, 696.0, - 578.0, 410.0, 478.0, 472.0, 427.0, 334.0, 920.0, 1096.0, 1011.0, - 805.0, 1128.0, 1104.0, 838.0, 814.0, 937.0, 911.0, 436.0, 351.0, - 705.0, 854.0, 1137.0, 1143.0, 1086.0, 1108.0, 923.0, 890.0, 531.0, - 742.0, 724.0, 1113.0, 1144.0, 1264.0, 977.0, 1010.0, 1394.0, 1838.0, - 1450.0, 1384.0, 902.0, 1122.0, 685.0, 807.0, 599.0, 588.0, 406.0, - 324.0, 1110.0, 1204.0, 1300.0, 826.0, 778.0, 610.0, 593.0, 711.0, - 902.0, 562.0, 395.0, 414.0, 529.0, 954.0, 729.0, 840.0, 912.0, - 1256.0, 1213.0, 721.0, 365.0, 687.0, 751.0, 1324.0, 948.0, 1190.0, - 822.0, 853.0, 509.0, 382.0, 655.0, 781.0, 914.0, 612.0, 444.0, - 374.0, 655.0, 1151.0, 1353.0, 1096.0, 607.0, 862.0, 884.0, 1161.0, - 821.0, 1445.0, 1512.0, 1555.0, 1457.0, 1738.0, 2186.0, 1733.0, 1209.0, - 913.0, 850.0, 660.0, 334.0, 349.0, 550.0, 532.0, 951.0, 838.0, - 793.0, 575.0, 639.0, 641.0, 322.0, 278.0, 473.0, 604.0, 576.0, - 754.0, 770.0, 772.0, 426.0, 432.0, 352.0, 761.0, 642.0, 1214.0, - 1165.0, 1213.0, 1069.0, 1253.0, 1402.0, 1059.0, 919.0, 906.0, 969.0, - 554.0, 321.0, 613.0, 674.0, 701.0, 443.0, 410.0, 604.0, 663.0, - 1046.0, 835.0, 1098.0, 703.0, 1099.0, 730.0, 875.0, 539.0, 571.0, - 959.0, 1325.0, 1268.0, 958.0, 583.0, 801.0, 814.0, 911.0, 771.0, - 537.0, 467.0, 349.0, 704.0, 726.0, 1196.0, 1134.0, 
1516.0, 1452.0, - 1520.0, 1223.0, 963.0, 794.0, 664.0, 490.0, 477.0, 898.0, 841.0, - 624.0, 338.0, 702.0, 919.0, 939.0, 595.0, 1065.0, 913.0, 1492.0, - 932.0, 1130.0, 514.0, 533.0, 425.0, 1009.0, 1184.0, 1262.0, 938.0, - 609.0, 461.0, 380.0, 635.0, 1104.0, 1327.0, 1008.0, 432.0, 740.0, - 769.0, 1120.0, 764.0, 1248.0, 1827.0, 1611.0, 1645.0, 1234.0, 1498.0, - 1413.0, 1081.0, 1009.0, 572.0, 386.0, 314.0, 399.0, 653.0, 773.0, - 992.0, 1190.0, 1123.0, 977.0, 711.0, 601.0, 370.0, 335.0, 339.0, - 671.0, 794.0, 1119.0, 990.0, 816.0, 628.0, 386.0, 286.0, 574.0, - 674.0, 840.0, 600.0, 643.0, 607.0, 827.0, 926.0, 1039.0, 1395.0, - 1793.0, 1876.0, 1091.0, 803.0, 1035.0, 1122.0, 839.0, 459.0, 479.0, - 671.0, 747.0, 1105.0, 916.0, 1068.0, 565.0, 932.0, 600.0, 1025.0, - 752.0, 711.0, 411.0, 414.0, 513.0, 971.0, 977.0, 913.0, 742.0, - 805.0, 1139.0, 796.0, 694.0, 336.0, 364.0, 298.0, 812.0, 1056.0, - 1446.0, 1426.0, 1644.0, 1299.0, 887.0, 675.0, 747.0, 693.0, 535.0, - 930.0, 1039.0, 998.0, 792.0, 814.0, 953.0, 829.0, 671.0, 691.0, - 591.0, 866.0, 706.0, 593.0, 261.0, 278.0, 471.0, 1027.0, 1156.0, - 1010.0, 736.0, 363.0, 625.0, 523.0, 578.0, 457.0, 445.0, 362.0, - 178.0, 738.0, 755.0, 1368.0, 1188.0, 1700.0, 2132.0, 1766.0, 1384.0, - 694.0, 870.0, 1226.0, 1056.0, 1030.0, 606.0, 734.0, 585.0, 615.0, - 519.0, 755.0, 790.0, 1048.0, 1326.0, 1381.0, 986.0, 567.0, 536.0, - 518.0, 847.0, 1008.0, 1233.0, 1266.0, 1010.0, 914.0, 564.0, 480.0, - 384.0, 758.0, 968.0, 1142.0, 1326.0, 1102.0, 954.0, 662.0, 994.0, - 1051.0, 1125.0, 1327.0, 1435.0, 1031.0, 1003.0, 1662.0, 1674.0, 1238.0, - 816.0, 954.0, 1186.0, 744.0, 1128.0, 836.0, 974.0, 455.0, 534.0, - 889.0, 1306.0, 1269.0, 803.0, 497.0, 554.0, 481.0, 893.0, 1181.0, - 1253.0, 1229.0, 992.0, 1084.0, 524.0, 394.0, 244.0, 280.0, 294.0, - 692.0, 914.0, 1316.0, 1522.0, 1580.0, 1133.0, 597.0, 589.0, 736.0, - 670.0, 698.0, 624.0, 942.0, 716.0, 886.0, 505.0, 959.0, 1075.0, - 1228.0, 962.0, 634.0, 680.0, 513.0, 473.0, 617.0, 503.0, 698.0, - 1332.0, 
1017.0, 723.0, 435.0, 382.0, 539.0, 429.0, 875.0, 903.0, - 961.0, 632.0, 477.0, 1065.0, 912.0, 1312.0, 1012.0, 1260.0, 1396.0, - 1254.0, 1066.0, 584.0, 456.0, 839.0, 847.0, 747.0, 332.0, 690.0, - 727.0, 1023.0, 663.0, 1008.0, 843.0, 1143.0, 1347.0, 1380.0, 975.0, - 497.0, 789.0, 791.0, 1158.0, 970.0, 1167.0, 818.0, 612.0, 486.0, - 528.0, 450.0, 520.0, 806.0, 986.0, 1046.0, 1246.0, 1213.0, 1040.0, - 652.0, 623.0, 834.0, 1128.0, 1282.0, 1283.0, 1091.0, 1282.0, 1993.0, - 2045.0, 1612.0, 1082.0, 852.0, 1090.0, 1040.0, 1560.0, 1552.0, 1338.0, - 756.0, 598.0, 1005.0, 1383.0, 1525.0, 1050.0, 786.0, 537.0, 447.0, - 624.0, 1077.0, 1211.0, 1450.0, 1119.0, 1231.0, 713.0, 564.0, 486.0, - 439.0, 551.0, 467.0, 704.0, 632.0, 705.0, 749.0, 687.0, 612.0, - 404.0, 694.0, 940.0, 1076.0, 880.0, 864.0, 778.0, 1086.0, 738.0, - 1184.0, 1372.0, 1574.0, 1044.0, 520.0, 361.0, 526.0, 484.0, 789.0, - 614.0, 671.0, 785.0, 597.0, 487.0, 495.0, 522.0, 787.0, 664.0, - 1108.0, 1179.0, 1378.0, 1238.0, 1026.0, 1474.0, 1118.0, 1392.0, 1164.0, - 1628.0, 1324.0, 1102.0, 744.0, 558.0, 498.0, 393.0, 373.0, 195.0, - 242.0, 702.0, 801.0, 1057.0, 670.0, 935.0, 622.0, 573.0, 801.0, - 1334.0, 1301.0, 869.0, 855.0, 838.0, 1167.0, 829.0, 835.0, 518.0, - 344.0, 438.0, 272.0, 310.0, 338.0, 1200.0, 1468.0, 1550.0, 1418.0, - 1389.0, 1236.0, 740.0, 717.0, 748.0, 678.0, 362.0, 339.0, 517.0, - 775.0, 1238.0, 1484.0, 1360.0, 1428.0, 958.0, 968.0, 754.0, 1284.0, - 1483.0, 1108.0, 570.0, 433.0, 1426.0, 1383.0, 1437.0, 591.0, 733.0, - 603.0, 513.0, 523.0, 837.0, 897.0, 1256.0, 924.0, 1082.0, 594.0, - 552.0, 444.0, 479.0, 701.0, 775.0, 1040.0, 871.0, 896.0, 692.0, - 593.0, 382.0, 223.0, 412.0, 672.0, 911.0, 1250.0, 1180.0, 993.0, - 1093.0, 1049.0, 1572.0, 1616.0, 1580.0, 999.0, 365.0, 278.0, 412.0, - 453.0, 685.0, 772.0, 742.0, 799.0, 583.0, 459.0, 393.0, 436.0, - 623.0, 567.0, 945.0, 1226.0, 1422.0, 1355.0, 1443.0, 1465.0, 1222.0, - 718.0, 752.0, 1316.0, 1444.0, 1772.0, 1362.0, 1166.0, 714.0, 401.0, - 749.0, 685.0, 
1120.0, 1094.0, 1104.0, 1092.0, 769.0, 1047.0, 831.0, - 982.0, 866.0, 1366.0, 1166.0, 1050.0, 686.0, 792.0, 1081.0, 997.0, - 821.0, 474.0, 298.0, 330.0, 250.0, 184.0, 206.0, 1152.0, 1250.0, - 1222.0, 598.0, 1006.0, 1029.0, 813.0, 633.0, 677.0, 769.0, 413.0, - 396.0, 489.0, 551.0, 631.0, 905.0, 903.0, 989.0, 430.0, 450.0, - 626.0, 1184.0, 1475.0, 1044.0, 566.0, 285.0, 868.0, 953.0, 1141.0, - 805.0, 697.0, 484.0, 314.0, 432.0, 486.0, 462.0, 673.0, 578.0, - 824.0, 548.0, 794.0, 716.0, 889.0, 865.0, 1051.0, 847.0, 1164.0, - 1159.0, 1178.0, 683.0, 304.0, 267.0, 464.0, 826.0, 869.0, 1125.0, - 865.0, 1204.0, 1335.0, 1434.0, 1467.0, 1679.0, 1596.0, 1269.0, 437.0, - 374.0, 387.0, 563.0, 931.0, 1005.0, 939.0, 552.0, 470.0, 509.0, - 409.0, 435.0, 746.0, 670.0, 629.0, 583.0, 751.0, 1141.0, 1615.0, - 1511.0, 1222.0, 614.0, 986.0, 1628.0, 1764.0, 1884.0, 1388.0, 1134.0, - 658.0, 395.0, 797.0, 1247.0, 1658.0, 1468.0, 1284.0, 944.0, 747.0, - 1057.0, 1173.0, 1582.0, 1458.0, 1932.0, 1572.0, 1172.0, 666.0, 656.0, - 913.0, 853.0, 695.0, 344.0, 264.0, 404.0, 308.0, 346.0, 398.0, - 960.0, 989.0, 863.0, 767.0, 1084.0, 1140.0, 738.0, 646.0, 523.0, - 629.0, 389.0, 452.0, 429.0, 370.0, 374.0, 904.0, 909.0, 1009.0, - 390.0, 404.0, 274.0, 300.0, 307.0, 245.0, 319.0, 378.0, 868.0, - 911.0, 828.0, 594.0, 856.0, 1070.0, 900.0, 801.0, 615.0, 513.0, - 397.0, 413.0, 859.0, 655.0, 816.0, 694.0, 951.0, 872.0, 934.0, - 790.0, 1176.0, 1422.0, 1331.0, 858.0, 258.0, 387.0, 436.0, 490.0, - 841.0, 1243.0, 1343.0, 1172.0, 1075.0, 1132.0, 1351.0, 1393.0, 1230.0, - 791.0, 338.0, 350.0, 339.0, 506.0, 884.0, 917.0, 833.0, 405.0, - 386.0, 379.0, 390.0, 288.0, 523.0, 482.0, 645.0, 539.0, 639.0, - 773.0, 1455.0, 1395.0, 1206.0, 564.0, 874.0, 1328.0, 1666.0, 1896.0, - 1584.0, 1206.0, 614.0, 349.0, 677.0, 1565.0, 2376.0, 2194.0, 1900.0, - 1164.0, 996.0, 866.0, 1184.0, 1614.0, 1526.0, 1476.0, 1052.0, 728.0, - 904.0, 917.0, 1262.0, 888.0, 803.0, 492.0, 416.0, 514.0, 380.0, - 385.0, 459.0, 617.0, 583.0, 881.0, 1119.0, 
1552.0, 1240.0, 978.0, - 718.0, 489.0, 567.0, 438.0, 579.0, 540.0, 464.0, 382.0, 644.0, - 683.0, 657.0, 262.0, 264.0, 252.0, 752.0, 721.0, 655.0, 363.0, - 438.0, 564.0, 582.0, 531.0, 575.0, 847.0, 1067.0, 919.0, 565.0, - 371.0, 507.0, 457.0, 794.0, 870.0, 794.0, 608.0, 722.0, 876.0, - 717.0, 543.0, 818.0, 1309.0, 1587.0, 1632.0, 1382.0, 1016.0, 872.0, - 692.0, 678.0, 830.0, 761.0, 763.0, 685.0, 910.0, 845.0, 1011.0, - 1119.0, 1152.0, 708.0, 373.0, 577.0, 543.0, 732.0, 1015.0, 942.0, - 870.0, 425.0, 622.0, 546.0, 501.0, 223.0, 395.0, 410.0, 779.0, - 613.0, 809.0, 776.0, 1158.0, 996.0, 808.0, 666.0, 930.0, 1059.0, - 1163.0, 1032.0, 917.0, 637.0, 444.0, 499.0, 499.0, 1293.0, 1544.0, - 1482.0, 1458.0, 1003.0, 987.0, 617.0, 838.0, 1116.0, 1058.0, 979.0, - 813.0, 597.0, 1024.0, 1091.0, 952.0, 504.0, 447.0, 574.0, 838.0, - 1056.0, 899.0, 690.0, 610.0, 681.0, 641.0, 897.0, 1173.0, 1324.0, - 1320.0, 1160.0, 1210.0, 1076.0, 1029.0, 820.0, 646.0, 791.0, 735.0, - 621.0, 749.0, 814.0, 882.0, 502.0, 392.0, 234.0, 634.0, 601.0, - 725.0, 465.0, 960.0, 910.0, 1264.0, 831.0, 953.0, 979.0, 1200.0, - 998.0, 600.0, 395.0, 515.0, 513.0, 946.0, 1114.0, 1358.0, 884.0, - 858.0, 454.0, 485.0, 605.0, 1141.0, 1208.0, 1150.0, 1125.0, 1262.0, - 930.0, 749.0, 544.0, 582.0, 772.0, 784.0, 768.0, 402.0, 662.0, - 920.0, 1158.0, 874.0, 544.0, 170.0, 191.0, 403.0, 487.0, 700.0, - 574.0, 609.0, 589.0, 392.0, 673.0, 591.0, 534.0, 179.0, 362.0, - 585.0, 940.0, 808.0, 780.0, 630.0, 1122.0, 990.0, 928.0, 598.0, - 530.0, 633.0, 712.0, 823.0, 820.0, 610.0, 877.0, 893.0, 868.0, - 816.0, 1019.0, 1061.0, 1257.0, 843.0, 749.0, 501.0, 696.0, 738.0, - 512.0, 441.0, 477.0, 641.0, 884.0, 1017.0, 1128.0, 864.0, 799.0, - 632.0, 1040.0, 1118.0, 941.0, 608.0, 416.0, 535.0, 456.0, 1202.0, - 1266.0, 1460.0, 1200.0, 1242.0, 1264.0, 1370.0, 1221.0, 1140.0, 682.0, - 951.0, 753.0, 671.0, 325.0, 413.0, 543.0, 533.0, 418.0, 226.0, - 690.0, 643.0, 842.0, 494.0, 937.0, 836.0, 1235.0, 855.0, 969.0, - 582.0, 708.0, 560.0, 450.0, 
297.0, 331.0, 352.0, 807.0, 831.0, - 1196.0, 754.0, 630.0, 216.0, 230.0, 578.0, 1076.0, 1114.0, 878.0, - 1271.0, 1416.0, 1310.0, 783.0, 612.0, 674.0, 369.0, 395.0, 351.0, - 770.0, 1020.0, 1238.0, 1386.0, 1090.0, 782.0, 197.0, 417.0, 645.0, - 613.0, 595.0, 474.0, 606.0, 634.0, 557.0, 788.0, 877.0, 693.0, - 474.0, 389.0, 621.0, 736.0, 696.0, 660.0, 1030.0, 1326.0, 1242.0, - 778.0, 678.0, 928.0, 1089.0, 938.0, 727.0, 744.0, 878.0, 1061.0, - 1125.0, 976.0, 607.0, 334.0, 512.0, 709.0, 645.0, 393.0, 527.0, - 537.0, 707.0, 535.0, 779.0, 907.0, 983.0, 954.0, 874.0, 967.0, - 887.0, 777.0, 493.0, 913.0, 1001.0, 969.0, 577.0, 451.0, 490.0, - 558.0, 806.0, 1280.0, 1286.0, 1508.0, 1114.0, 1268.0, 1350.0, 1197.0, - 1481.0, 1063.0, 1452.0, 818.0, 720.0, 450.0, 489.0, 723.0, 689.0, - 650.0, 353.0, 295.0, 315.0, 532.0, 544.0, 1292.0, 1108.0, 1561.0, - 835.0, 1035.0, 708.0, 705.0, 508.0, 348.0, 586.0, 499.0, 522.0, - 355.0, 595.0, 1152.0, 1292.0, 1037.0, 690.0, 390.0, 837.0, 698.0, - 730.0, 654.0, 960.0, 1121.0, 765.0, 743.0, 568.0, 876.0, 539.0, - 603.0, 341.0, 681.0, 777.0, 1087.0, 1342.0, 1210.0, 822.0, 233.0, - 414.0, 441.0, 387.0, 299.0, 315.0, 503.0, 542.0, 786.0, 850.0, - 1128.0, 692.0, 695.0, 573.0, 957.0, 1200.0, 1220.0, 792.0, 1156.0, - 1396.0, 1544.0, 820.0, 554.0, 730.0, 1003.0, 1088.0, 1016.0, 911.0, - 927.0, 1191.0, 1287.0, 1162.0, 615.0, 418.0, 602.0, 527.0, 470.0, - 316.0, 624.0, 655.0, 861.0, 673.0, 1046.0, 1100.0, 1070.0, 984.0, - 626.0, 850.0, 914.0, 1108.0, 777.0, 633.0, 402.0, 813.0, 654.0, - 704.0, 380.0, 619.0, 902.0, 1340.0, 1266.0, 1149.0, 1063.0, 909.0, - 1078.0, 646.0, 1227.0, 1009.0, 1309.0, 624.0, 478.0, 360.0, 419.0, - 619.0, 901.0, 878.0, 715.0, 265.0, 543.0, 574.0, 682.0, 834.0, - 820.0, 813.0, 591.0, 611.0, 636.0, 577.0, 646.0, 525.0, 728.0, - 581.0, 551.0, 371.0, 977.0, 1183.0, 1309.0, 822.0, 920.0, 604.0, - 493.0, 222.0, 226.0, 404.0, 749.0, 991.0, 815.0, 850.0, 668.0, - 890.0, 529.0, 691.0, 567.0, 1239.0, 1107.0, 1109.0, 960.0, 898.0, - 763.0, 
444.0, 651.0, 672.0, 410.0, 186.0, 312.0, 466.0, 547.0, - 862.0, 739.0, 1122.0, 818.0, 996.0, 692.0, 796.0, 1080.0, 1171.0, - 903.0, 1017.0, 1018.0, 1194.0, 554.0, 628.0, 902.0, 996.0, 1046.0, - 820.0, 792.0, 836.0, 846.0, 1122.0, 944.0, 973.0, 1037.0, 1205.0, - 850.0, 446.0, 312.0, 398.0, 423.0, 601.0, 1101.0, 1480.0, 1450.0, - 908.0, 834.0, 710.0, 686.0, 1128.0, 1244.0, 1259.0, 561.0, 264.0, - 655.0, 654.0, 788.0, 451.0, 738.0, 627.0, 1152.0, 890.0, 1269.0, - 1103.0, 1489.0, 1220.0, 852.0, 1013.0, 1101.0, 1163.0, 507.0, 633.0, - 683.0, 724.0, 864.0, 1172.0, 1250.0, 807.0, 505.0, 733.0, 684.0, - 610.0, 710.0, 714.0, 854.0, 642.0, 696.0, 580.0, 407.0, 464.0, - 329.0, 642.0, 591.0, 616.0, 538.0, 1192.0, 1368.0, 1496.0, 947.0, - 1079.0, 923.0, 776.0, 448.0, 236.0, 386.0, 403.0, 597.0, 633.0, - 952.0, 834.0, 978.0, 736.0, 818.0, 742.0, 1087.0, 1365.0, 1519.0, - 1386.0, 998.0, 529.0, 481.0, 572.0, 943.0, 689.0, 517.0, 387.0, - 773.0, 837.0, 1068.0, 1205.0, 1375.0, 1311.0, 1037.0, 959.0, 795.0, - 1139.0, 1247.0, 1087.0, 769.0, 1102.0, 1162.0, 936.0, 590.0, 740.0, - 708.0, 814.0, 672.0, 1136.0, 970.0, 1192.0, 1268.0, 1336.0, 1181.0, - 1073.0, 1485.0, 1218.0, 958.0, 580.0, 560.0, 444.0, 442.0, 876.0, - 994.0, 944.0, 530.0, 626.0, 894.0, 1041.0, 1747.0, 1637.0, 1676.0, - 834.0, 613.0, 755.0, 786.0, 792.0, 489.0, 536.0, 457.0, 562.0, - 376.0, 657.0, 999.0, 1417.0, 1298.0, 832.0, 925.0, 1067.0, 1099.0, - 569.0, 569.0, 923.0, 1338.0, 1392.0, 1508.0, 1172.0, 851.0, 567.0, - 731.0, 740.0, 588.0, 470.0, 506.0, 623.0, 759.0, 769.0, 544.0, - 338.0, 514.0, 504.0, 890.0, 706.0, 760.0, 573.0, 1131.0, 1129.0, - 1094.0, 619.0, 645.0, 797.0, 686.0, 610.0, 284.0, 558.0, 574.0, - 552.0, 412.0, 642.0, 758.0, 766.0, 589.0, 661.0, 779.0, 1418.0, - 1638.0, 1718.0, 1243.0, 899.0, 524.0, 531.0, 613.0, 913.0, 695.0, - 713.0, 780.0, 1134.0, 1111.0, 811.0, 896.0, 940.0, 1444.0, 1395.0, - 1399.0, 795.0, 651.0, 679.0, 1147.0, 897.0, 1254.0, 846.0, 781.0, - 422.0, 658.0, 620.0, 639.0, 327.0, 800.0, 
938.0, 1100.0, 1440.0, - 1276.0, 1373.0, 1013.0, 1519.0, 1328.0, 1068.0, 652.0, 652.0, 692.0, - 534.0, 1030.0, 952.0, 1014.0, 578.0, 646.0, 975.0, 1043.0, 1545.0, - 1258.0, 1540.0, 966.0, 1080.0, 522.0, 596.0, 474.0, 521.0, 363.0, - 271.0, 480.0, 490.0, 968.0, 868.0, 1296.0, 832.0, 786.0, 894.0, - 1139.0, 1111.0, 544.0, 583.0, 965.0, 1368.0, 1452.0, 1164.0, 922.0, - 520.0, 631.0, 450.0, 507.0, 247.0, 344.0, 316.0, 440.0, 436.0, - 829.0, 706.0, 588.0, 416.0, 409.0, 893.0, 693.0, 806.0, 501.0, - 671.0, 595.0, 764.0, 873.0, 851.0, 955.0, 686.0, 738.0, 334.0, - 826.0, 1058.0, 1164.0, 830.0, 882.0, 862.0, 946.0, 699.0, 667.0, - 755.0, 932.0, 1714.0, 1674.0, 1625.0, 879.0, 627.0, 502.0, 624.0, - 842.0, 771.0, 927.0, 1033.0, 1308.0, 987.0, 833.0, 1194.0, 1484.0, - 2044.0, 1771.0, 1483.0, 829.0, 517.0, 510.0, 1380.0, 1296.0, 1608.0, - 805.0, 788.0, 537.0, 676.0, 583.0, 562.0, 306.0, 1000.0, 1190.0, - 1370.0, 1362.0, 1130.0, 1017.0, 477.0, 832.0, 957.0, 1029.0, 732.0, - 584.0, 580.0, 445.0, 549.0, 487.0, 596.0, 696.0, 978.0, 1205.0, - 1219.0, 1097.0, 790.0, 1328.0, 1386.0, 1588.0, 738.0, 574.0, 472.0, - 506.0, 352.0, 206.0, 221.0, 639.0, 971.0, 1008.0, 617.0, 575.0, - 539.0, 930.0, 969.0, 973.0, 567.0, 470.0, 1118.0, 1506.0, 1470.0, - 982.0, 651.0, 449.0, 358.0, 328.0, 453.0, 429.0, 710.0, 640.0, - 570.0, 320.0, 682.0, 775.0, 729.0, 586.0, 489.0, 907.0, 747.0, - 806.0, 340.0, 318.0, 415.0, 685.0, 942.0, 809.0, 707.0, 556.0, - 578.0, 401.0, 909.0, 1119.0, 1270.0, 766.0, 1160.0, 1378.0, 1349.0, - 726.0, 402.0, 547.0, 980.0, 1754.0, 1626.0, 1289.0, 827.0, 770.0, - 777.0, 551.0, 437.0, 319.0, 767.0, 1132.0, 1610.0, 1170.0, 474.0, - 867.0, 1165.0, 1709.0, 1531.0, 1323.0, 765.0, 446.0, 390.0, 1116.0, - 1488.0, 1398.0, 685.0, 384.0, 504.0, 481.0, 324.0, 216.0, 560.0, - 1106.0, 1383.0, 1427.0, 1315.0, 1384.0, 1092.0, 936.0, 813.0, 837.0, - 633.0, 472.0, 556.0, 704.0, 597.0, 589.0, 510.0, 600.0, 638.0, - 810.0, 845.0, 665.0, 418.0, 346.0, 890.0, 1164.0, 1602.0, 1020.0, - 820.0, 
888.0, 848.0, 768.0, 229.0, 276.0, 692.0, 997.0, 926.0, - 503.0, 483.0, 573.0, 1128.0, 987.0, 921.0, 347.0, 606.0, 890.0, - 1006.0, 1114.0, 918.0, 1029.0, 574.0, 475.0, 301.0, 595.0, 941.0, - 1144.0, 1190.0, 1277.0, 1135.0, 1083.0, 801.0, 787.0, 582.0, 696.0, - 690.0, 634.0, 540.0, 415.0, 603.0, 542.0, 1183.0, 1427.0, 1384.0, - 831.0, 543.0, 859.0, 803.0, 909.0, 821.0, 1140.0, 986.0, 1290.0, - 1560.0, 1495.0, 953.0, 389.0, 526.0, 1114.0, 2278.0, 2173.0, 1731.0, - 867.0, 815.0, 763.0, 655.0, 639.0, 465.0, 717.0, 822.0, 1252.0, - 824.0, 581.0, 1065.0, 1381.0, 1695.0, 1074.0, 883.0, 507.0, 404.0, - 328.0, 696.0, 1292.0, 1386.0, 1029.0, 568.0, 563.0, 523.0, 498.0, - 423.0, 896.0, 1310.0, 1351.0, 1439.0, 1059.0, 1304.0, 792.0, 1312.0, - 1151.0, 1183.0, 783.0, 530.0, 642.0, 376.0, 631.0, 500.0, 531.0, - 361.0, 486.0, 639.0, 677.0, 493.0, 789.0, 858.0, 1314.0, 1125.0, - 1267.0, 1017.0, 932.0, 1164.0, 978.0, 860.0, 399.0, 371.0, 749.0, - 753.0, 750.0, 463.0, 792.0, 863.0, 1270.0, 970.0, 904.0, 312.0, - 612.0, 860.0, 1006.0, 1049.0, 1033.0, 1008.0, 615.0, 441.0, 452.0, - 734.0, 1132.0, 1340.0, 1384.0, 1424.0, 1318.0, 937.0, 613.0, 703.0, - 1164.0, 1288.0, 981.0, 941.0, 749.0, 702.0, 477.0, 511.0, 1013.0, - 1194.0, 1368.0, 871.0, 701.0, 713.0, 737.0, 721.0, 413.0, 604.0, - 674.0, 1174.0, 1660.0, 1775.0, 1236.0, 1000.0, 710.0, 1375.0, 2171.0, - 2173.0, 1691.0, 761.0, 1089.0, 1053.0, 895.0, 963.0, 822.0, 926.0, - 455.0, 872.0, 844.0, 1061.0, 1101.0, 1089.0, 1025.0, 414.0, 655.0, - 455.0, 460.0, 376.0, 322.0, 1144.0, 1140.0, 1254.0, 498.0, 397.0, - 305.0, 373.0, 892.0, 1354.0, 1548.0, 1129.0, 1193.0, 995.0, 1240.0, - 742.0, 1244.0, 1306.0, 1235.0, 1013.0, 571.0, 1024.0, 710.0, 959.0, - 552.0, 479.0, 391.0, 492.0, 533.0, 403.0, 250.0, 732.0, 909.0, - 1058.0, 671.0, 827.0, 941.0, 918.0, 1024.0, 884.0, 976.0, 595.0, - 522.0, 616.0, 954.0, 962.0, 870.0, 751.0, 980.0, 1300.0, 1048.0, - 814.0, 278.0, 402.0, 424.0, 550.0, 921.0, 1060.0, 1026.0, 665.0, - 650.0, 587.0, 753.0, 1025.0, 
1084.0, 1316.0, 1840.0, 1976.0, 1444.0, - 832.0, 1240.0, 1704.0, 1631.0, 1032.0, 875.0, 762.0, 840.0, 694.0, - 582.0, 895.0, 1046.0, 1269.0, 1018.0, 764.0, 891.0, 710.0, 704.0, - 394.0, 571.0, 584.0, 634.0, 827.0, 1106.0, 1035.0, 1253.0, 882.0, - 1375.0, 2131.0, 2137.0, 1799.0, 709.0, 1164.0, 1068.0, 915.0, 1013.0, - 879.0, 842.0, 251.0, 275.0, 427.0, 991.0, 864.0, 932.0, 867.0, - 557.0, 644.0, 510.0, 381.0, 263.0, 184.0, 651.0, 1195.0, 1359.0, - 911.0, 377.0, 290.0, 714.0, 1642.0, 1936.0, 1481.0, 749.0, 585.0, - 675.0, 767.0, 661.0, 1031.0, 1125.0, 1084.0, 1299.0, 915.0, 1382.0, - 756.0, 1159.0, 672.0, 922.0, 693.0, 1204.0, 979.0, 805.0, 442.0, - 1127.0, 1279.0, 1246.0, 747.0, 583.0, 637.0, 530.0, 954.0, 932.0, - 1198.0, 876.0, 747.0, 975.0, 1286.0, 1276.0, 880.0, 1041.0, 1262.0, - 1142.0, 722.0, 702.0, 690.0, 498.0, 584.0, 714.0, 914.0, 937.0, - 671.0, 657.0, 954.0, 1011.0, 1269.0, 1209.0, 1468.0, 1172.0, 1376.0, - 1244.0, 1194.0, 782.0, 1202.0, 1606.0, 1289.0, 708.0, 649.0, 670.0, - 805.0, 387.0, 489.0, 439.0, 814.0, 1197.0, 1517.0, 1105.0, 808.0, - 386.0, 516.0, 376.0, 455.0, 470.0, 642.0, 723.0, 986.0, 811.0, - 1269.0, 916.0, 1149.0, 1549.0, 1650.0, 1526.0, 812.0, 1536.0, 1439.0, - 1374.0, 1054.0, 955.0, 866.0, 301.0, 329.0, 429.0, 916.0, 717.0, - 941.0, 747.0, 639.0, 423.0, 522.0, 343.0, 329.0, 512.0, 1123.0, - 1535.0, 1267.0, 705.0, 723.0, 821.0, 1503.0, 2355.0, 2416.0, 1729.0, - 597.0, 365.0, 675.0, 781.0, 811.0, 495.0, 597.0, 572.0, 1009.0, - 681.0, 1160.0, 872.0, 995.0, 533.0, 837.0, 914.0, 1396.0, 1058.0, - 832.0, 913.0, 1267.0, 1237.0, 642.0, 446.0, 440.0, 720.0, 494.0, - 994.0, 846.0, 1184.0, 722.0, 628.0, 817.0, 1223.0, 1221.0, 692.0, - 710.0, 1014.0, 1054.0, 802.0, 704.0, 694.0, 460.0, 512.0, 642.0, - 949.0, 848.0, 662.0, 1021.0, 1546.0, 1593.0, 1673.0, 1453.0, 1592.0, - 1006.0, 1238.0, 1076.0, 988.0, 605.0, 907.0, 914.0, 592.0, 270.0, - 392.0, 521.0, 1056.0, 1103.0, 1086.0, 663.0, 631.0, 869.0, 1167.0, - 971.0, 966.0, 596.0, 622.0, 426.0, 461.0, 
441.0, 897.0, 822.0, - 934.0, 444.0, 642.0, 538.0, 924.0, 1254.0, 1386.0, 1116.0, 858.0, - 1374.0, 1369.0, 1282.0, 784.0, 611.0, 560.0, 437.0, 595.0, 501.0, - 640.0, 632.0, 808.0, 680.0, 955.0, 715.0, 928.0, 673.0, 662.0, - 689.0, 942.0, 1386.0, 1000.0, 793.0, 822.0, 1183.0, 1718.0, 2058.0, - 2014.0, 1433.0, 691.0, 411.0, 827.0, 1027.0, 1127.0, 857.0, 1057.0, - 939.0, 1362.0, 932.0, 1198.0, 588.0, 737.0, 683.0, 1081.0, 1062.0, - 1372.0, 902.0, 736.0, 828.0, 1246.0, 1326.0, 712.0, 688.0, 549.0, - 765.0, 641.0, 1586.0, 1413.0, 1341.0, 581.0, 580.0, 765.0, 687.0, - 641.0, 231.0, 565.0, 733.0, 821.0, 623.0, 707.0, 732.0, 512.0, - 910.0, 1204.0, 1333.0, 851.0, 807.0, 1284.0, 1918.0, 1712.0, 1804.0, - 1412.0, 1476.0, 1242.0, 1072.0, 938.0, 499.0, 298.0, 540.0, 630.0, - 598.0, 316.0, 627.0, 936.0, 1447.0, 1439.0, 1344.0, 916.0, 814.0, - 1035.0, 1210.0, 1066.0, 1002.0, 802.0, 654.0, 638.0, 648.0, 1045.0, - 1256.0, 1578.0, 1271.0, 950.0, 605.0, 498.0, 617.0, 601.0, 948.0, - 885.0, 922.0, 1190.0, 1081.0, 1203.0, 661.0, 532.0, 532.0, 502.0, - 616.0, 394.0, 990.0, 873.0, 1247.0, 720.0, 1059.0, 939.0, 1191.0, - 1029.0, 1028.0, 1095.0, 1403.0, 1229.0, 791.0, 464.0, 777.0, 1163.0, - 1414.0, 1378.0, 1234.0, 1050.0, 848.0, 684.0, 1034.0, 778.0, 838.0, - 534.0, 1014.0, 990.0, 1180.0, 824.0, 771.0, 649.0, 561.0, 896.0, - 1222.0, 1642.0, 1456.0, 811.0, 533.0, 977.0, 1123.0, 1083.0, 493.0, - 556.0, 449.0, 743.0, 787.0, 1288.0, 949.0, 1141.0, 717.0, 760.0, - 441.0, 713.0, 642.0, 720.0, 436.0, 639.0, 455.0, 396.0, 722.0, - 1003.0, 1044.0, 1234.0, 1184.0, 1420.0, 842.0, 1134.0, 1378.0, 1596.0, - 1658.0, 1414.0, 1190.0, 592.0, 1016.0, 906.0, 1122.0, 551.0, 594.0, - 682.0, 650.0, 1070.0, 781.0, 992.0, 1207.0, 1830.0, 1932.0, 1463.0, - 1272.0, 1002.0, 973.0, 606.0, 609.0, 659.0, 687.0, 564.0, 660.0, - 595.0, 964.0, 1039.0, 1458.0, 1173.0, 1002.0, 609.0, 477.0, 398.0, - 670.0, 920.0, 1029.0, 842.0, 838.0, 762.0, 648.0, 429.0, 323.0, - 355.0, 390.0, 481.0, 475.0, 1104.0, 1051.0, 1343.0, 875.0, 
1164.0, - 1200.0, 1269.0, 1225.0, 1644.0, 1471.0, 1355.0, 697.0, 679.0, 535.0, - 442.0, 638.0, 570.0, 568.0, 898.0, 994.0, 1392.0, 882.0, 964.0, - 618.0, 661.0, 847.0, 1273.0, 1238.0, 1240.0, 932.0, 933.0, 729.0, - 815.0, 1132.0, 1454.0, 1520.0, 1648.0, 1115.0, 949.0, 614.0, 528.0, - 474.0, 465.0, 932.0, 835.0, 715.0, 519.0, 944.0, 885.0, 1071.0, - 783.0, 792.0, 480.0, 646.0, 584.0, 753.0, 483.0, 591.0, 315.0, - 244.0, 940.0, 1297.0, 1352.0, 1280.0, 1140.0, 1276.0, 692.0, 1006.0, - 893.0, 1429.0, 1523.0, 1506.0, 939.0, 328.0, 1082.0, 1287.0, 1502.0, - 887.0, 681.0, 807.0, 762.0, 1129.0, 860.0, 957.0, 1118.0, 1437.0, - 1484.0, 1083.0, 1286.0, 1486.0, 1437.0, 915.0, 656.0, 619.0, 560.0, - 397.0, 542.0, 545.0, 853.0, 598.0, 1393.0, 1055.0, 1324.0, 521.0, - 474.0, 271.0, 563.0, 840.0, 933.0, 774.0, 692.0, 546.0, 472.0, - 355.0, 341.0, 319.0, 264.0, 409.0, 415.0, 757.0, 800.0, 1098.0, - 884.0, 890.0, 996.0, 1141.0, 1221.0, 1591.0, 1530.0, 1519.0, 841.0, - 633.0, 592.0, 564.0, 554.0, 383.0, 323.0, 760.0, 1159.0, 1549.0, - 1324.0, 914.0, 608.0, 735.0, 1101.0, 1673.0, 1396.0, 1190.0, 713.0, - 772.0, 842.0, 932.0, 961.0, 1257.0, 1286.0, 1562.0, 1181.0, 1040.0, - 745.0, 519.0, 354.0, 381.0, 868.0, 1066.0, 936.0, 552.0, 392.0, - 384.0, 858.0, 932.0, 834.0, 336.0, 562.0, 630.0, 898.0, 536.0, - 676.0, 398.0, 391.0, 916.0, 1382.0, 1489.0, 1088.0, 608.0, 690.0, - 504.0, 782.0, 707.0, 1017.0, 1295.0, 1286.0, 1059.0, 454.0, 694.0, - 997.0, 1290.0, 1160.0, 760.0, 482.0, 329.0, 911.0, 1020.0, 970.0, - 724.0, 915.0, 1074.0, 653.0, 987.0, 1251.0, 1450.0, 837.0, 510.0, - 349.0, 372.0, 376.0, 289.0, 320.0, 313.0, 342.0, 905.0, 847.0, - 932.0, 318.0, 450.0, 506.0, 830.0, 709.0, 539.0, 303.0, 620.0, - 730.0, 710.0, 637.0, 551.0, 543.0, 272.0, 434.0, 648.0, 989.0, - 834.0, 872.0, 866.0, 792.0, 842.0, 836.0, 1058.0, 1564.0, 1450.0, - 1267.0, 813.0, 709.0, 756.0, 818.0, 784.0, 639.0, 477.0, 714.0, - 1021.0, 1131.0, 1342.0, 1000.0, 926.0, 925.0, 1313.0, 1691.0, 1102.0, - 860.0, 553.0, 985.0, 
1075.0, 1176.0, 952.0, 812.0, 550.0, 762.0, - 798.0, 1021.0, 638.0, 902.0, 647.0, 806.0, 958.0, 1304.0, 1178.0, - 682.0, 552.0, 585.0, 575.0, 681.0, 548.0, 448.0, 214.0, 316.0, - 424.0, 606.0, 774.0, 1094.0, 834.0, 1043.0, 941.0, 1097.0, 684.0, - 708.0, 532.0, 566.0, 676.0, 1221.0, 1643.0, 1385.0, 930.0, 589.0, - 346.0, 334.0, 923.0, 1166.0, 1268.0, 816.0, 600.0, 919.0, 933.0, - 1067.0, 575.0, 353.0, 433.0, 542.0, 453.0, 915.0, 1163.0, 1448.0, - 1105.0, 837.0, 658.0, 549.0, 505.0, 417.0, 275.0, 493.0, 471.0, - 1077.0, 977.0, 1106.0, 516.0, 565.0, 689.0, 605.0, 419.0, 207.0, - 311.0, 616.0, 746.0, 702.0, 676.0, 712.0, 715.0, 651.0, 688.0, - 931.0, 932.0, 748.0, 970.0, 773.0, 735.0, 571.0, 600.0, 896.0, - 1452.0, 1810.0, 1775.0, 1517.0, 921.0, 964.0, 816.0, 814.0, 524.0, - 396.0, 445.0, 893.0, 791.0, 1304.0, 946.0, 1344.0, 1233.0, 1545.0, - 1587.0, 1058.0, 934.0, 552.0, 924.0, 1072.0, 1017.0, 1221.0, 873.0, - 868.0, 250.0, 285.0, 493.0, 571.0, 940.0, 874.0, 845.0, 805.0, - 999.0, 1042.0, 778.0, 650.0, 633.0, 661.0, 669.0, 572.0, 480.0, - 616.0, 841.0, 1139.0, 1047.0, 1118.0, 1128.0, 912.0, 1005.0, 1313.0, - 1421.0, 1092.0, 1200.0, 1170.0, 1210.0, 862.0, 1380.0, 1364.0, 1244.0, - 586.0, 732.0, 341.0, 437.0, 635.0, 938.0, 964.0, 776.0, 584.0, - 1219.0, 1325.0, 1294.0, 543.0, 345.0, 340.0, 568.0, 696.0, 1268.0, - 1164.0, 1152.0, 1202.0, 1080.0, 1152.0, 914.0, 1018.0, 805.0, 416.0, - 574.0, 762.0, 1471.0, 1327.0, 1080.0, 592.0, 706.0, 1034.0, 796.0, - 609.0, 273.0, 341.0, 456.0, 620.0, 565.0, 599.0, 723.0, 731.0, - 843.0, 806.0, 1091.0, 1043.0, 912.0, 1166.0, 999.0, 1211.0, 889.0, - 511.0, 513.0, 1159.0, 1896.0, 1850.0, 1704.0, 1044.0, 860.0, 828.0, - 1262.0, 1227.0, 1367.0, 1345.0, 1770.0, 1185.0, 1157.0, 791.0, 1194.0, - 935.0, 1039.0, 663.0, 438.0, 323.0, 574.0, 837.0, 1344.0, 1068.0, - 1377.0, 861.0, 954.0, 312.0, 427.0, 536.0, 604.0, 885.0, 932.0, - 911.0, 640.0, 584.0, 693.0, 596.0, 611.0, 518.0, 942.0, 874.0, - 911.0, 501.0, 752.0, 1055.0, 1319.0, 1191.0, 937.0, 
1037.0, 814.0, - 1515.0, 1565.0, 1666.0, 1010.0, 1454.0, 1435.0, 1336.0, 968.0, 1421.0, - 1532.0, 1676.0, 1058.0, 942.0, 577.0, 695.0, 819.0, 690.0, 726.0, - 718.0, 592.0, 1432.0, 1344.0, 1249.0, 349.0, 324.0, 377.0, 474.0, - 717.0, 1238.0, 1185.0, 1379.0, 1520.0, 1416.0, 1560.0, 1416.0, 1512.0, - 1001.0, 410.0, 458.0, 607.0, 1218.0, 1242.0, 1008.0, 700.0, 630.0, - 888.0, 591.0, 603.0, 323.0, 698.0, 634.0, 728.0, 425.0, 379.0, - 571.0, 555.0, 885.0, 858.0, 1243.0, 641.0, 824.0, 1058.0, 1003.0, - 1127.0, 833.0, 805.0, 563.0, 1151.0, 1762.0, 1718.0, 1488.0, 776.0, - 641.0, 625.0, 955.0, 959.0, 1359.0, 1459.0, 1866.0, 1129.0, 763.0, - 385.0, 1032.0, 901.0, 1063.0, 529.0, 530.0, 382.0, 513.0, 526.0, - 824.0, 1026.0, 1449.0, 1152.0, 837.0, 417.0, 655.0, 528.0, 636.0, - 533.0, 832.0, 661.0, 630.0, 426.0, 511.0, 389.0, 356.0, 332.0, - 801.0, 761.0, 949.0, 553.0, 924.0, 1064.0, 1332.0, 1232.0, 875.0, - 641.0, 416.0, 1067.0, 1381.0, 1396.0, 882.0, 1182.0, 1237.0, 1466.0, - 914.0, 947.0, 978.0, 1430.0, 1240.0, 1214.0, 1033.0, 1205.0, 807.0, - 400.0, 242.0, 650.0, 742.0, 1480.0, 1280.0, 1133.0, 465.0, 342.0, - 359.0, 532.0, 717.0, 842.0, 729.0, 1023.0, 1280.0, 1180.0, 1230.0, - 1210.0, 1297.0, 905.0, 586.0, 695.0, 760.0, 1175.0, 1175.0, 914.0, - 722.0, 630.0, 820.0, 535.0, 571.0, 777.0, 1070.0, 970.0, 622.0, - 341.0, 286.0, 372.0, 485.0, 706.0, 757.0, 946.0, 467.0, 948.0, - 1020.0, 1552.0, 1568.0, 1432.0, 1023.0, 483.0, 576.0, 1037.0, 985.0, - 1264.0, 628.0, 817.0, 770.0, 1154.0, 947.0, 1304.0, 1241.0, 1663.0, - 1400.0, 1143.0, 661.0, 652.0, 844.0, 926.0, 624.0, 644.0, 856.0, - 1290.0, 933.0, 884.0, 796.0, 877.0, 663.0, 353.0, 369.0, 710.0, - 586.0, 644.0, 646.0, 872.0, 722.0, 515.0, 357.0, 597.0, 449.0, - 604.0, 492.0, 1027.0, 853.0, 1213.0, 737.0, 796.0, 568.0, 900.0, - 1048.0, 782.0, 682.0, 426.0, 905.0, 721.0, 931.0, 806.0, 1002.0, - 775.0, 932.0, 1024.0, 1075.0, 1106.0, 1526.0, 1482.0, 1312.0, 998.0, - 1104.0, 1106.0, 700.0, 595.0, 751.0, 803.0, 1232.0, 948.0, 918.0, - 
590.0, 391.0, 373.0, 312.0, 361.0, 292.0, 253.0, 995.0, 1056.0, - 1050.0, 617.0, 801.0, 854.0, 647.0, 583.0, 794.0, 797.0, 690.0, - 660.0, 775.0, 886.0, 798.0, 724.0, 527.0, 527.0, 861.0, 1216.0, - 1106.0, 535.0, 221.0, 244.0, 294.0, 454.0, 1011.0, 993.0, 1262.0, - 511.0, 903.0, 981.0, 1393.0, 1378.0, 1250.0, 1292.0, 1138.0, 1249.0, - 1039.0, 745.0, 940.0, 636.0, 835.0, 716.0, 1108.0, 819.0, 808.0, - 353.0, 839.0, 1078.0, 1165.0, 735.0, 674.0, 842.0, 1312.0, 1200.0, - 1260.0, 1023.0, 1181.0, 840.0, 585.0, 631.0, 1184.0, 1241.0, 1073.0, - 625.0, 738.0, 436.0, 488.0, 694.0, 882.0, 856.0, 644.0, 496.0, - 564.0, 545.0, 791.0, 785.0, 714.0, 874.0, 1158.0, 1376.0, 1396.0, - 1032.0, 1110.0, 1214.0, 1085.0, 899.0, 380.0, 448.0, 362.0, 777.0, - 918.0, 960.0, 506.0, 825.0, 889.0, 1025.0, 1238.0, 1398.0, 1251.0, - 935.0, 1045.0, 1150.0, 1092.0, 744.0, 631.0, 881.0, 857.0, 1080.0, - 698.0, 654.0, 504.0, 344.0, 312.0, 300.0, 316.0, 274.0, 231.0, - 745.0, 747.0, 738.0, 153.0, 375.0, 549.0, 732.0, 700.0, 832.0, - 844.0, 625.0, 567.0, 628.0, 810.0, 704.0, 1003.0, 841.0, 789.0, - 867.0, 1060.0, 949.0, 352.0, 170.0, 229.0, 251.0, 405.0, 999.0, - 948.0, 915.0, 577.0, 951.0, 875.0, 1305.0, 1277.0, 1549.0, 1407.0, - 1516.0, 1077.0, 1033.0, 793.0, 1008.0, 671.0, 789.0, 720.0, 1179.0, - 1001.0, 790.0, 403.0, 1143.0, 1606.0, 1719.0, 1291.0, 1144.0, 1248.0, - 1284.0, 1172.0, 1390.0, 1166.0, 1570.0, 1066.0, 770.0, 242.0, 745.0, - 1025.0, 1393.0, 952.0, 898.0, 506.0, 726.0, 684.0, 806.0, 637.0, - 503.0, 339.0, 480.0, 708.0, 1042.0, 940.0, 728.0, 1024.0, 1154.0, - 1536.0, 1420.0, 1230.0, 1600.0, 1349.0, 1274.0, 616.0, 392.0, 406.0, - 286.0, 657.0, 996.0, 1120.0, 798.0, 533.0, 603.0, 845.0, 1158.0, - 1328.0, 1247.0, 889.0, 856.0, 673.0, 989.0, 763.0, 752.0, 546.0, - 503.0, 464.0, 306.0, 256.0, 302.0, 310.0, 311.0, 599.0, 639.0, - 574.0, 303.0, 737.0, 959.0, 854.0, 319.0, 339.0, 547.0, 700.0, - 602.0, 860.0, 959.0, 780.0, 377.0, 445.0, 569.0, 611.0, 918.0, - 834.0, 812.0, 520.0, 839.0, 783.0, 
530.0, 322.0, 824.0, 782.0, - 1176.0, 1191.0, 1306.0, 936.0, 580.0, 637.0, 559.0, 1045.0, 1145.0, - 1429.0, 1281.0, 1488.0, 1054.0, 646.0, 418.0, 564.0, 613.0, 502.0, - 493.0, 912.0, 1061.0, 1056.0, 674.0, 1398.0, 1483.0, 1635.0, 1225.0, - 1230.0, 1018.0, 1436.0, 1444.0, 1798.0, 1014.0, 1078.0, 711.0, 602.0, - 480.0, 1024.0, 1285.0, 1497.0, 1122.0, 1078.0, 688.0, 800.0, 764.0, - 842.0, 689.0, 513.0, 459.0, 405.0, 586.0, 670.0, 755.0, 516.0, - 1188.0, 1093.0, 1505.0, 1179.0, 1225.0, 1369.0, 1264.0, 1263.0, 711.0, - 390.0, 720.0, 696.0, 928.0, 730.0, 946.0, 750.0, 529.0, 373.0, - 625.0, 966.0, 1084.0, 961.0, 665.0, 1050.0, 847.0, 813.0, 425.0, - 457.0, 655.0, 616.0, 1004.0, 788.0, 742.0, 608.0, 614.0, 959.0, - 1009.0, 1009.0, 996.0, 739.0, 1125.0, 1031.0, 938.0, 702.0, 534.0, - 750.0, 638.0, 486.0, 750.0, 783.0, 797.0, 376.0, 700.0, 715.0, - 717.0, 736.0, 704.0, 644.0, 464.0, 769.0, 766.0, 1040.0, 1116.0, - 1595.0, 1173.0, 1231.0, 772.0, 1188.0, 734.0, 463.0, 493.0, 415.0, - 907.0, 838.0, 1227.0, 809.0, 848.0, 362.0, 506.0, 594.0, 712.0, - 613.0, 460.0, 307.0, 390.0, 945.0, 1608.0, 1562.0, 1487.0, 1159.0, - 1243.0, 1228.0, 1390.0, 1246.0, 1230.0, 1142.0, 1494.0, 1198.0, 1082.0, - 662.0, 510.0, 386.0, 577.0, 927.0, 1081.0, 1218.0, 1062.0, 1082.0, - 1036.0, 992.0, 784.0, 503.0, 579.0, 557.0, 837.0, 742.0, 850.0, - 719.0, 539.0, 843.0, 780.0, 953.0, 491.0, 432.0, 810.0, 889.0, - 1165.0, 631.0, 646.0, 778.0, 732.0, 896.0, 824.0, 1166.0, 1146.0, - 834.0, 638.0, 590.0, 602.0, 656.0, 740.0, 820.0, 807.0, 522.0, - 432.0, 384.0, 426.0, 656.0, 661.0, 1024.0, 791.0, 768.0, 625.0, - 591.0, 928.0, 1263.0, 1270.0, 1288.0, 976.0, 1412.0, 1508.0, 1620.0, - 1392.0, 1034.0, 754.0, 738.0, 554.0, 886.0, 761.0, 871.0, 399.0, - 836.0, 924.0, 963.0, 843.0, 665.0, 589.0, 697.0, 1267.0, 1357.0, - 1347.0, 1155.0, 1579.0, 1173.0, 1333.0, 827.0, 1275.0, 1083.0, 227.0, - 170.0, 272.0, 861.0, 888.0, 925.0, 423.0, 470.0, 452.0, 658.0, - 882.0, 984.0, 930.0, 553.0, 710.0, 770.0, 1305.0, 1764.0, 
1716.0, - 1284.0, 840.0, 728.0, 906.0, 914.0, 878.0, 1214.0, 1234.0, 1870.0, - 1678.0, 1466.0, 835.0, 577.0, 516.0, 615.0, 705.0, 724.0, 816.0, - 654.0, 808.0, 544.0, 681.0, 477.0, 503.0, 558.0, 810.0, 937.0, - 784.0, 788.0, 871.0, 800.0, 784.0, 633.0, 597.0, 271.0, 246.0, - 398.0, 520.0, 928.0, 1206.0, 1248.0, 1316.0, 1224.0, 1335.0, 1159.0, - 1083.0, 996.0, 1024.0, 974.0, 798.0, 582.0, 934.0, 995.0, 925.0, - 903.0, 923.0, 1147.0, 1023.0, 917.0, 840.0, 498.0, 935.0, 817.0, - 828.0, 679.0, 619.0, 1163.0, 1152.0, 1281.0, 1067.0, 1117.0, 1301.0, - 1280.0, 1416.0, 1262.0, 1062.0, 914.0, 1232.0, 999.0, 808.0, 357.0, - 512.0, 414.0, 1225.0, 1713.0, 1810.0, 1618.0, 1422.0, 1536.0, 1400.0, - 1618.0, 1477.0, 1367.0, 1057.0, 1077.0, 706.0, 564.0, 514.0, 835.0, - 1119.0, 563.0, 616.0, 454.0, 747.0, 778.0, 886.0, 460.0, 406.0, - 445.0, 705.0, 944.0, 876.0, 876.0, 501.0, 772.0, 772.0, 1287.0, - 1566.0, 1734.0, 1200.0, 704.0, 262.0, 632.0, 738.0, 832.0, 646.0, - 721.0, 1209.0, 1313.0, 1222.0, 1182.0, 1140.0, 873.0, 625.0, 939.0, - 1154.0, 1010.0, 532.0, 660.0, 512.0, 501.0, 361.0, 381.0, 573.0, - 805.0, 975.0, 964.0, 1292.0, 1342.0, 1144.0, 560.0, 386.0, 380.0, - 300.0, 295.0, 409.0, 377.0, 1124.0, 1492.0, 1650.0, 1066.0, 1086.0, - 1169.0, 1271.0, 1071.0, 1208.0, 1490.0, 1414.0, 1018.0, 510.0, 762.0, - 871.0, 883.0, 743.0, 981.0, 1225.0, 1311.0, 943.0, 712.0, 630.0, - 983.0, 1013.0, 768.0, 384.0, 312.0, 410.0, 736.0, 853.0, 1029.0, - 1081.0, 1221.0, 1184.0, 1500.0, 1330.0, 1230.0, 806.0, 1238.0, 1139.0, - 917.0, 464.0, 429.0, 602.0, 1017.0, 1517.0, 1406.0, 1690.0, 1484.0, - 1660.0, 1572.0, 1800.0, 1852.0, 1099.0, 507.0, 298.0, 418.0, 966.0, - 944.0, 839.0, 719.0, 1172.0, 1202.0, 1308.0, 1143.0, 1393.0, 1025.0, - 643.0, 437.0, 981.0, 1019.0, 1614.0, 1108.0, 1231.0, 434.0, 941.0, - 838.0, 1043.0, 648.0, 1040.0, 901.0, 780.0, 263.0, 586.0, 607.0, - 628.0, 394.0, 649.0, 1281.0, 1383.0, 1296.0, 1440.0, 1564.0, 1284.0, - 720.0, 732.0, 806.0, 638.0, 302.0, 348.0, 310.0, 323.0, 351.0, 
- 349.0, 305.0, 577.0, 771.0, 874.0, 1098.0, 1124.0, 1047.0, 499.0, - 293.0, 612.0, 644.0, 776.0, 766.0, 658.0, 1142.0, 1678.0, 1744.0, - 1330.0, 990.0, 993.0, 984.0, 665.0, 850.0, 1255.0, 1194.0, 940.0, - 720.0, 1074.0, 1111.0, 723.0, 567.0, 978.0, 1346.0, 1397.0, 913.0, - 602.0, 545.0, 907.0, 1150.0, 1106.0, 755.0, 579.0, 489.0, 529.0, - 633.0, 1192.0, 1325.0, 1269.0, 635.0, 725.0, 811.0, 712.0, 644.0, - 796.0, 911.0, 723.0, 443.0, 566.0, 772.0, 1118.0, 1288.0, 1132.0, - 1200.0, 1236.0, 1470.0, 1382.0, 1110.0, 1260.0, 867.0, 654.0, 592.0, - 800.0, 1229.0, 926.0, 741.0, 431.0, 1552.0, 1499.0, 1843.0, 1299.0, - 1714.0, 996.0, 712.0, 341.0, 883.0, 774.0, 1108.0, 638.0, 759.0, - 275.0, 454.0, 360.0, 437.0, 262.0, 588.0, 774.0, 775.0, 442.0, - 364.0, 395.0, 381.0, 291.0, 934.0, 1139.0, 1077.0, 622.0, 1108.0, - 1676.0, 1430.0, 1414.0, 1224.0, 1364.0, 768.0, 526.0, 572.0, 580.0, - 868.0, 1084.0, 980.0, 535.0, 271.0, 705.0, 900.0, 1364.0, 1174.0, - 1108.0, 536.0, 386.0, 710.0, 846.0, 876.0, 688.0, 540.0, 880.0, - 1056.0, 1066.0, 722.0, 473.0, 611.0, 516.0, 414.0, 562.0, 837.0, - 814.0, 556.0, 984.0, 1064.0, 1236.0, 600.0, 452.0, 586.0, 690.0, - 785.0, 449.0, 346.0, 601.0, 1257.0, 1612.0, 1624.0, 1239.0, 1131.0, - 843.0, 661.0, 418.0, 1238.0, 1191.0, 1319.0, 546.0, 711.0, 799.0, - 718.0, 529.0, 382.0, 566.0, 495.0, 476.0, 566.0, 841.0, 773.0, - 515.0, 381.0, 738.0, 766.0, 824.0, 735.0, 853.0, 1060.0, 1042.0, - 999.0, 960.0, 941.0, 1214.0, 1079.0, 919.0, 516.0, 1535.0, 1415.0, - 1644.0, 1044.0, 1407.0, 696.0, 572.0, 355.0, 1011.0, 990.0, 1390.0, - 746.0, 694.0, 209.0, 477.0, 474.0, 436.0, 390.0, 493.0, 725.0, - 978.0, 932.0, 1072.0, 1141.0, 1053.0, 763.0, 927.0, 1105.0, 1051.0, - 542.0, 703.0, 1079.0, 885.0, 1097.0, 831.0, 1148.0, 692.0, 618.0, - 473.0, 525.0, 937.0, 1228.0, 1100.0, 604.0, 404.0, 896.0, 1126.0, - 1110.0, 918.0, 802.0, 678.0, 548.0, 970.0, 1024.0, 1146.0, 870.0, - 760.0, 500.0, 630.0, 596.0, 594.0, 399.0, 601.0, 586.0, 484.0, - 336.0, 385.0, 394.0, 600.0, 
1290.0, 1247.0, 1563.0, 989.0, 914.0, - 608.0, 622.0, 617.0, 363.0, 550.0, 537.0, 845.0, 818.0, 1076.0, - 1424.0, 1530.0, 1364.0, 745.0, 396.0, 1042.0, 1125.0, 1295.0, 742.0, - 595.0, 503.0, 349.0, 668.0, 667.0, 717.0, 373.0, 205.0, 400.0, - 544.0, 759.0, 661.0, 648.0, 1001.0, 1027.0, 1061.0, 732.0, 1150.0, - 1164.0, 1257.0, 1016.0, 1221.0, 1111.0, 862.0, 803.0, 673.0, 610.0, - 1699.0, 1215.0, 1580.0, 1062.0, 1245.0, 537.0, 373.0, 712.0, 1333.0, - 1482.0, 1054.0, 408.0, 241.0, 214.0, 272.0, 254.0, 188.0, 600.0, - 647.0, 827.0, 997.0, 1020.0, 1226.0, 1080.0, 1018.0, 946.0, 801.0, - 898.0, 878.0, 672.0, 737.0, 957.0, 910.0, 1306.0, 1142.0, 1372.0, - 876.0, 656.0, 419.0, 955.0, 1337.0, 1686.0, 1104.0, 602.0, 522.0, - 779.0, 1009.0, 905.0, 846.0, 734.0, 986.0, 910.0, 980.0, 580.0, - 712.0, 1016.0, 982.0, 756.0, 422.0, 434.0, 384.0, 613.0, 839.0, - 1085.0, 886.0, 738.0, 552.0, 462.0, 810.0, 1232.0, 1363.0, 1555.0, - 1257.0, 1006.0, 942.0, 926.0, 915.0, 503.0, 992.0, 1023.0, 1163.0, - 680.0, 690.0, 1064.0, 1322.0, 1776.0, 1330.0, 961.0, 839.0, 705.0, - 883.0, 703.0, 558.0, 363.0, 202.0, 529.0, 767.0, 830.0, 556.0, - 321.0, 246.0, 356.0, 531.0, 710.0, 1131.0, 1555.0, 1464.0, 1016.0, - 787.0, 1416.0, 1292.0, 1153.0, 800.0, 1241.0, 1275.0, 1031.0, 901.0, - 625.0, 551.0, 1308.0, 897.0, 1080.0, 848.0, 858.0, 360.0, 214.0, - 608.0, 1217.0, 1495.0, 1206.0, 574.0, 441.0, 421.0, 887.0, 780.0, - 698.0, 700.0, 917.0, 1197.0, 1325.0, 1122.0, 1288.0, 1398.0, 1569.0, - 1513.0, 856.0, 784.0, 784.0, 790.0, 873.0, 621.0, 873.0, 745.0, - 861.0, 790.0, 738.0, 476.0, 297.0, 865.0, 949.0, 1286.0, 702.0, - 656.0, 604.0, 673.0, 863.0, 569.0, 612.0, 381.0, 849.0, 1075.0, - 1152.0, 644.0, 714.0, 1109.0, 1297.0, 925.0, 607.0, 471.0, 775.0, - 998.0, 1372.0, 1224.0, 1080.0, 776.0, 606.0, 466.0, 810.0, 818.0, - 1043.0, 1251.0, 1721.0, 1558.0, 1918.0, 1414.0, 1227.0, 399.0, 1063.0, - 1085.0, 893.0, 207.0, 150.0, 648.0, 1046.0, 1493.0, 1271.0, 905.0, - 754.0, 644.0, 854.0, 700.0, 930.0, 711.0, 612.0, 
556.0, 1079.0, - 1074.0, 790.0, 298.0, 267.0, 639.0, 880.0, 1289.0, 1338.0, 1470.0, - 1355.0, 855.0, 825.0, 1416.0, 1807.0, 1500.0, 857.0, 867.0, 1093.0, - 950.0, 650.0, 326.0, 592.0, 1521.0, 663.0, 1241.0, 969.0, 1071.0, - 469.0, 319.0, 606.0, 1282.0, 1460.0, 1101.0, 411.0, 459.0, 438.0, - 894.0, 688.0, 772.0, 640.0, 880.0, 1108.0, 1334.0, 1050.0, 1312.0, - 1038.0, 1301.0, 1229.0, 1009.0, 1052.0, 1204.0, 1334.0, 1334.0, 938.0, - 1160.0, 936.0, 840.0, 502.0, 466.0, 312.0, 312.0, 896.0, 864.0, - 1014.0, 408.0, 448.0, 492.0, 515.0, 655.0, 521.0, 980.0, 795.0, - 1295.0, 1105.0, 1104.0, 538.0, 520.0, 959.0, 1445.0, 1289.0, 1049.0, - 761.0, 1271.0, 1596.0, 1684.0, 1242.0, 914.0, 712.0, 706.0, 590.0, - 686.0, 532.0, 848.0, 872.0, 1224.0, 1029.0, 1595.0, 1191.0, 1119.0, - 797.0, 1171.0, 1048.0, 930.0, 586.0, 536.0, 284.0, 527.0, 986.0, - 1148.0, 979.0, 900.0, 802.0, 1070.0, 984.0, 1382.0, 1063.0, 857.0, - 313.0, 792.0, 747.0, 809.0, 500.0, 472.0, 914.0, 844.0, 1593.0, - 1577.0, 1853.0, 1414.0, 966.0, 792.0, 1163.0, 1623.0, 1560.0, 909.0, - 785.0, 835.0, 826.0, 576.0, 632.0, 960.0, 802.0, 493.0, 900.0, - 878.0, 795.0, 367.0, 253.0, 226.0, 440.0, 556.0, 693.0, 773.0, - 885.0, 766.0, 1136.0, 891.0, 965.0, 557.0, 798.0, 1074.0, 1197.0, - 927.0, 1091.0, 960.0, 1258.0, 1120.0, 945.0, 1185.0, 1237.0, 1332.0, - 1042.0, 882.0, 1050.0, 796.0, 505.0, 215.0, 210.0, 192.0, 306.0, - 445.0, 429.0, 537.0, 517.0, 618.0, 762.0, 690.0, 862.0, 598.0, - 1084.0, 973.0, 1070.0, 836.0, 733.0, 560.0, 546.0, 535.0, 991.0, - 834.0, 826.0, 510.0, 1103.0, 1400.0, 1398.0, 850.0, 850.0, 1016.0, - 1098.0, 926.0, 590.0, 500.0, 672.0, 1160.0, 1452.0, 1383.0, 1257.0, - 1101.0, 1182.0, 1120.0, 842.0, 488.0, 567.0, 1043.0, 1041.0, 922.0, - 645.0, 702.0, 497.0, 359.0, 694.0, 798.0, 1039.0, 1049.0, 1610.0, - 1562.0, 1330.0, 670.0, 820.0, 700.0, 636.0, 413.0, 457.0, 1013.0, - 950.0, 1976.0, 1808.0, 2269.0, 1493.0, 1357.0, 839.0, 1135.0, 1427.0, - 1532.0, 1322.0, 829.0, 875.0, 467.0, 794.0, 892.0, 1610.0, 805.0, 
- 499.0, 842.0, 795.0, 705.0, 351.0, 270.0, 274.0, 582.0, 744.0, - 803.0, 1069.0, 973.0, 946.0, 614.0, 465.0, 515.0, 543.0, 664.0, - 628.0, 1223.0, 1153.0, 1515.0, 834.0, 996.0, 730.0, 615.0, 1247.0, - 1473.0, 1736.0, 1278.0, 1188.0, 1072.0, 878.0, 553.0, 473.0, 267.0, - 269.0, 241.0, 305.0, 265.0, 241.0, 292.0, 361.0, 735.0, 682.0, - 794.0, 750.0, 1226.0, 1290.0, 1107.0, 1001.0, 800.0, 1063.0, 1017.0, - 1049.0, 947.0, 766.0, 1135.0, 1145.0, 1318.0, 1310.0, 926.0, 1062.0, - 890.0, 1223.0, 1099.0, 988.0, 693.0, 539.0, 698.0, 1330.0, 1172.0, - 945.0, 243.0, 775.0, 1142.0, 1439.0, 1045.0, 956.0, 1140.0, 1484.0, - 1361.0, 1170.0, 671.0, 941.0, 804.0, 896.0, 762.0, 824.0, 1097.0, - 1159.0, 1386.0, 1658.0, 1436.0, 1076.0, 650.0, 820.0, 764.0, 863.0, - 577.0, 933.0, 764.0, 1648.0, 1832.0, 2276.0, 1548.0, 1296.0, 728.0, - 798.0, 682.0, 746.0, 839.0, 746.0, 830.0, 444.0, 873.0, 1497.0, - 1952.0, 799.0, 673.0, 1074.0, 762.0, 988.0, 712.0, 1008.0, 676.0, - 710.0, 652.0, 700.0, 1272.0, 1100.0, 1090.0, 588.0, 541.0, 495.0, - 741.0, 1032.0, 1502.0, 1498.0, 1372.0, 814.0, 467.0, 475.0, 558.0, - 477.0, 1015.0, 1056.0, 1164.0, 664.0, 810.0, 790.0, 914.0, 641.0, - 617.0, 303.0, 318.0, 522.0, 694.0, 658.0, 450.0, 341.0, 479.0, - 693.0, 772.0, 766.0, 746.0, 603.0, 717.0, 542.0, 1011.0, 868.0, - 1127.0, 989.0, 983.0, 741.0, 458.0, 801.0, 1179.0, 1494.0, 1366.0, - 860.0, 1056.0, 1364.0, 1657.0, 1481.0, 1006.0, 789.0, 547.0, 918.0, - 1536.0, 1468.0, 1027.0, 319.0, 815.0, 1188.0, 1185.0, 1211.0, 1149.0, - 1617.0, 1757.0, 1679.0, 1462.0, 730.0, 897.0, 754.0, 797.0, 446.0, - 492.0, 665.0, 1111.0, 1186.0, 1548.0, 1226.0, 1052.0, 664.0, 780.0, - 780.0, 743.0, 497.0, 616.0, 627.0, 1041.0, 1604.0, 1812.0, 1668.0, - 1236.0, 896.0, 674.0, 522.0, 454.0, 713.0, 588.0, 802.0, 608.0, - 961.0, 1305.0, 1566.0, 900.0, 895.0, 1066.0, 774.0, 886.0, 832.0, - 981.0, 655.0, 747.0, 664.0, 744.0, 985.0, 1005.0, 1065.0, 625.0, - 431.0, 273.0, 648.0, 1258.0, 1752.0, 1839.0, 1361.0, 791.0, 445.0, - 861.0, 884.0, 
799.0, 629.0, 1160.0, 1364.0, 1418.0, 940.0, 816.0, - 708.0, 601.0, 821.0, 858.0, 907.0, 803.0, 612.0, 685.0, 525.0, - 430.0, 408.0, 391.0, 486.0, 463.0, 872.0, 685.0, 757.0, 434.0, - 1125.0, 1422.0, 1741.0, 1307.0, 1123.0, 835.0, 879.0, 1074.0, 1486.0, - 1608.0, 1483.0, 963.0, 931.0, 1236.0, 1185.0, 1097.0, 942.0, 1031.0, - 1157.0, 1116.0, 1361.0, 1005.0, 776.0, 451.0, 775.0, 1204.0, 1309.0, - 1546.0, 1352.0, 1699.0, 1363.0, 1377.0, 1029.0, 766.0, 755.0, 1061.0, - 1095.0, 710.0, 554.0, 760.0, 1178.0, 954.0, 1128.0, 910.0, 1116.0, - 1038.0, 1310.0, 1162.0, 970.0, 502.0, 677.0, 745.0, 838.0, 1145.0, - 1161.0, 1404.0, 976.0, 714.0, 442.0, 490.0, 476.0, 357.0, 197.0, - 183.0, 413.0, 677.0, 1373.0, 1414.0, 1067.0, 1061.0, 1172.0, 1171.0, - 1312.0, 1214.0, 1262.0, 1089.0, 1091.0, 592.0, 952.0, 1005.0, 1177.0, - 689.0, 623.0, 433.0, 359.0, 728.0, 1259.0, 1751.0, 1450.0, 1017.0, - 507.0, 361.0, 689.0, 718.0, 693.0, 277.0, 840.0, 980.0, 1302.0, - 942.0, 1070.0, 926.0, 643.0, 651.0, 819.0, 1148.0, 1208.0, 991.0, - 768.0, 468.0, 399.0, 384.0, 413.0, 929.0, 903.0, 1052.0, 533.0, - 528.0, 497.0, 714.0, 1145.0, 1046.0, 810.0, 644.0, 760.0, 944.0, - 645.0, 1037.0, 1182.0, 1350.0, 990.0, 687.0, 1492.0, 1282.0, 1414.0, - 884.0, 1164.0, 1700.0, 1706.0, 1441.0, 732.0, 661.0, 778.0, 851.0, - 998.0, 1000.0, 1367.0, 977.0, 1539.0, 1255.0, 1549.0, 925.0, 924.0, - 569.0, 835.0, 723.0, 690.0, 590.0, 744.0, 1131.0, 891.0, 769.0, - 508.0, 746.0, 814.0, 802.0, 673.0, 477.0, 368.0, 588.0, 696.0, - 657.0, 748.0, 866.0, 1104.0, 808.0, 605.0, 443.0, 567.0, 538.0, - 446.0, 222.0, 219.0, 430.0, 806.0, 995.0, 1418.0, 717.0, 987.0, - 976.0, 1392.0, 1101.0, 1033.0, 492.0, 795.0, 810.0, 503.0, 811.0, - 821.0, 989.0, 549.0, 633.0, 845.0, 707.0, 796.0, 949.0, 1025.0, - 1103.0, 540.0, 606.0, 378.0, 834.0, 762.0, 1206.0, 808.0, 1564.0, - 1336.0, 2020.0, 1500.0, 1530.0, 840.0, 527.0, 673.0, 981.0, 1315.0, - 977.0, 628.0, 407.0, 681.0, 757.0, 665.0, 398.0, 853.0, 915.0, - 1112.0, 756.0, 689.0, 746.0, 812.0, 
1301.0, 1042.0, 1006.0, 762.0, - 862.0, 902.0, 617.0, 809.0, 614.0, 650.0, 530.0, 471.0, 1088.0, - 846.0, 886.0, 650.0, 840.0, 1804.0, 1579.0, 1388.0, 427.0, 631.0, - 732.0, 791.0, 892.0, 887.0, 920.0, 580.0, 1069.0, 913.0, 1139.0, - 697.0, 921.0, 675.0, 879.0, 706.0, 608.0, 464.0, 534.0, 559.0, - 443.0, 535.0, 590.0, 750.0, 828.0, 968.0, 989.0, 905.0, 804.0, - 837.0, 677.0, 522.0, 408.0, 690.0, 872.0, 744.0, 849.0, 709.0, - 861.0, 620.0, 581.0, 365.0, 304.0, 323.0, 1027.0, 1136.0, 1744.0, - 531.0, 599.0, 754.0, 1258.0, 1197.0, 950.0, 914.0, 1123.0, 1221.0, - 689.0, 939.0, 848.0, 675.0, 515.0, 705.0, 1210.0, 897.0, 794.0, - 479.0, 540.0, 749.0, 566.0, 643.0, 467.0, 659.0, 590.0, 941.0, - 831.0, 1111.0, 826.0, 1206.0, 1136.0, 1248.0, 828.0, 562.0, 468.0, - 530.0, 826.0, 920.0, 1067.0, 763.0, 807.0, 547.0, 716.0, 522.0, - 1205.0, 1206.0, 1416.0, 1344.0, 1179.0, 1363.0, 897.0, 1052.0, 556.0, - 1070.0, 998.0, 1324.0, 770.0, 705.0, 665.0, 519.0, 496.0, 382.0, - 570.0, 1154.0, 1118.0, 1036.0, 440.0, 445.0, 974.0, 1169.0, 1000.0, - 488.0, 500.0, 594.0, 537.0, 658.0, 538.0, 692.0, 465.0, 1061.0, - 933.0, 1118.0, 762.0, 981.0, 899.0, 867.0, 634.0, 460.0, 338.0, - 378.0, 431.0, 525.0, 615.0, 590.0, 430.0, 430.0, 408.0, 767.0, - 777.0, 1276.0, 1056.0, 857.0, 323.0, 212.0, 452.0, 587.0, 1055.0, - 1251.0, 1096.0, 964.0, 1205.0, 1321.0, 953.0, 406.0, 398.0, 949.0, - 900.0, 1243.0, 440.0, 513.0, 746.0, 946.0, 852.0, 603.0, 611.0, - 941.0, 1026.0, 946.0, 700.0, 580.0, 495.0, 819.0, 930.0, 1213.0, - 772.0, 682.0, 368.0, 471.0, 642.0, 440.0, 715.0, 767.0, 1189.0, - 914.0, 1241.0, 989.0, 1088.0, 691.0, 1055.0, 1088.0, 976.0, 472.0, - 274.0, 426.0, 550.0, 618.0, 596.0, 710.0, 666.0, 830.0, 721.0, - 949.0, 1037.0, 1296.0, 1220.0, 1628.0, 1720.0, 1924.0, 1580.0, 1201.0, - 957.0, 484.0, 1359.0, 1371.0, 1948.0, 1094.0, 987.0, 535.0, 378.0, - 429.0, 307.0, 546.0, 658.0, 799.0, 661.0, 527.0, 643.0, 784.0, - 773.0, 494.0, 453.0, 371.0, 372.0, 534.0, 1061.0, 1032.0, 772.0, - 283.0, 607.0, 
611.0, 639.0, 647.0, 644.0, 813.0, 817.0, 898.0, - 1100.0, 835.0, 650.0, 296.0, 441.0, 636.0, 732.0, 541.0, 595.0, - 519.0, 886.0, 964.0, 1732.0, 1334.0, 1033.0, 285.0, 443.0, 635.0, - 687.0, 1067.0, 1612.0, 1446.0, 1268.0, 1192.0, 1331.0, 1033.0, 439.0, - 469.0, 786.0, 940.0, 857.0, 390.0, 373.0, 540.0, 498.0, 454.0, - 207.0, 589.0, 875.0, 1171.0, 1303.0, 1067.0, 724.0, 438.0, 700.0, - 809.0, 957.0, 670.0, 654.0, 252.0, 353.0, 368.0, 368.0, 432.0, - 666.0, 1146.0, 994.0, 819.0, 503.0, 418.0, 359.0, 543.0, 1076.0, - 1060.0, 782.0, 298.0, 374.0, 660.0, 1102.0, 1172.0, 1490.0, 1021.0, - 777.0, 292.0, 518.0, 928.0, 1087.0, 996.0, 1482.0, 1758.0, 2074.0, - 1634.0, 1131.0, 773.0, 404.0, 1147.0, 1229.0, 1670.0, 1354.0, 1250.0, - 790.0, 423.0, 521.0, 483.0, 590.0, 575.0, 1226.0, 1052.0, 1031.0, - 1103.0, 1068.0, 1011.0, 456.0, 589.0, 512.0, 369.0, 581.0, 1033.0, - 1037.0, 769.0, 430.0, 523.0, 475.0, 427.0, 1023.0, 1071.0, 1228.0, - 856.0, 1134.0, 1352.0, 1083.0, 666.0, 444.0, 511.0, 922.0, 942.0, - 787.0, 532.0, 358.0, 707.0, 718.0, 1812.0, 1468.0, 1543.0, 547.0, - 668.0, 532.0, 372.0, 740.0, 997.0, 994.0, 831.0, 1003.0, 1142.0, - 1052.0, 626.0, 665.0, 418.0, 708.0, 729.0, 403.0, 348.0, 322.0, - 258.0, 524.0, 552.0, 980.0, 1012.0, 1137.0, 1007.0, 1117.0, 990.0, - 875.0, 617.0, 525.0, 700.0, 636.0, 722.0, 340.0, 448.0, 506.0, - 514.0, 528.0, 998.0, 1386.0, 1296.0, 827.0, 911.0, 1124.0, 1029.0, - 897.0, 1086.0, 1071.0, 965.0, 433.0, 554.0, 992.0, 1466.0, 1464.0, - 1200.0, 767.0, 1057.0, 782.0, 805.0, 1001.0, 1044.0, 1044.0, 989.0, - 1009.0, 1693.0, 1330.0, 1527.0, 837.0, 670.0, 669.0, 807.0, 1066.0, - 1127.0, 930.0, 661.0, 688.0, 851.0, 808.0, 496.0, 331.0, 1160.0, - 1292.0, 1410.0, 1327.0, 1059.0, 1181.0, 931.0, 1103.0, 874.0, 607.0, - 979.0, 1187.0, 988.0, 764.0, 688.0, 946.0, 606.0, 531.0, 827.0, - 959.0, 1476.0, 1448.0, 1722.0, 1572.0, 1104.0, 674.0, 384.0, 392.0, - 796.0, 1168.0, 1073.0, 956.0, 680.0, 1347.0, 1304.0, 1952.0, 1251.0, - 1268.0, 604.0, 738.0, 573.0, 
351.0, 307.0, 647.0, 847.0, 894.0, - 690.0, 552.0, 640.0, 652.0, 661.0, 483.0, 827.0, 952.0, 317.0, - 232.0, 585.0, 557.0, 898.0, 586.0, 1012.0, 772.0, 952.0, 671.0, - 945.0, 1131.0, 1035.0, 745.0, 428.0, 821.0, 879.0, 800.0, 332.0, - 614.0, 637.0, 865.0, 483.0, 860.0, 832.0, 896.0, 625.0, 865.0, - 1271.0, 1088.0, 880.0, 786.0, 1043.0, 1049.0, 747.0, 642.0, 1088.0, - 1380.0, 1426.0, 1268.0, 1277.0, 1683.0, 1159.0, 868.0, 662.0, 652.0, - 560.0, 309.0, 689.0, 1189.0, 1248.0, 1286.0, 800.0, 623.0, 194.0, - 208.0, 219.0, 609.0, 627.0, 726.0, 923.0, 1027.0, 1072.0, 585.0, - 447.0, 1109.0, 1241.0, 1250.0, 1198.0, 1465.0, 1617.0, 1327.0, 1151.0, - 1104.0, 1228.0, 1456.0, 1546.0, 1036.0, 936.0, 853.0, 1167.0, 735.0, - 658.0, 894.0, 994.0, 1402.0, 1102.0, 1200.0, 735.0, 508.0, 309.0, - 636.0, 513.0, 826.0, 1366.0, 1344.0, 1208.0, 532.0, 1308.0, 1440.0, - 1778.0, 1050.0, 972.0, 546.0, 525.0, 427.0, 313.0, 419.0, 371.0, - 588.0, 773.0, 957.0, 918.0, 832.0, 976.0, 785.0, 613.0, 829.0, - 1314.0, 231.0, 228.0, 617.0, 677.0, 1052.0, 665.0, 1113.0, 805.0, - 810.0, 264.0, 628.0, 1022.0, 1208.0, 1084.0, 755.0, 899.0, 689.0, - 556.0, 224.0, 461.0, 626.0, 1054.0, 778.0, 969.0, 1033.0, 1164.0, - 954.0, 1274.0, 1598.0, 1602.0, 846.0, 373.0, 435.0, 639.0, 730.0, - 557.0, 813.0, 1037.0, 1188.0, 1038.0, 1286.0, 1701.0, 1350.0, 967.0, - 768.0, 781.0, 624.0, 539.0, 739.0, 1182.0, 857.0, 1175.0, 876.0, - 1055.0, 654.0, 498.0, 329.0, 191.0, 214.0, 625.0, 1020.0, 1269.0, - 1318.0, 972.0, 773.0, 731.0, 1240.0, 1278.0, 1248.0, 1383.0, 1486.0, - 1438.0, 1062.0, 1010.0, 1272.0, 1466.0, 1495.0, 961.0, 763.0, 589.0, - 1037.0, 757.0, 782.0, 458.0, 508.0, 1012.0, 1184.0, 1164.0, 581.0, - 238.0, 207.0, 476.0, 419.0, 620.0, 1372.0, 1582.0, 1521.0, 719.0, - 1097.0, 1426.0, 1430.0, 1122.0, 672.0, 560.0, 586.0, 988.0, 988.0, - 822.0, 526.0, 682.0, 880.0, 939.0, 953.0, 983.0, 1152.0, 878.0, - 714.0, 743.0, 1115.0, 170.0, 348.0, 811.0, 1007.0, 974.0, 455.0, - 357.0, 485.0, 686.0, 606.0, 432.0, 970.0, 1154.0, 
1262.0, 838.0, - 1222.0, 1064.0, 908.0, 416.0, 619.0, 790.0, 1570.0, 1617.0, 1272.0, - 771.0, 613.0, 786.0, 931.0, 956.0, 1057.0, 577.0, 475.0, 494.0, - 550.0, 644.0, 402.0, 566.0, 897.0, 904.0, 1108.0, 1560.0, 1651.0, - 1225.0, 502.0, 675.0, 793.0, 605.0, 651.0, 803.0, 946.0, 682.0, - 568.0, 756.0, 823.0, 830.0, 484.0, 317.0, 158.0, 228.0, 958.0, - 1009.0, 1243.0, 1007.0, 1014.0, 773.0, 367.0, 760.0, 887.0, 1363.0, - 1642.0, 1480.0, 1094.0, 662.0, 902.0, 1220.0, 1203.0, 1168.0, 747.0, - 568.0, 334.0, 597.0, 513.0, 550.0, 411.0, 405.0, 665.0, 482.0, - 500.0, 465.0, 469.0, 599.0, 596.0, 660.0, 782.0, 1206.0, 1348.0, - 1052.0, 718.0, 674.0, 1099.0, 815.0, 903.0, 519.0, 463.0, 454.0, - 997.0, 1535.0, 1392.0, 1080.0, 607.0, 819.0, 586.0, 801.0, 1217.0, - 1686.0, 1632.0, 1056.0, 791.0, 915.0, 325.0, 609.0, 662.0, 808.0, - 542.0, 443.0, 383.0, 525.0, 718.0, 661.0, 495.0, 673.0, 698.0, - 807.0, 584.0, 1013.0, 864.0, 839.0, 415.0, 420.0, 664.0, 1236.0, - 1495.0, 1079.0, 805.0, 733.0, 1184.0, 1365.0, 1148.0, 985.0, 431.0, - 509.0, 284.0, 326.0, 564.0, 630.0, 778.0, 937.0, 932.0, 1136.0, - 1188.0, 1139.0, 752.0, 453.0, 702.0, 856.0, 632.0, 632.0, 1069.0, - 1112.0, 1008.0, 624.0, 1040.0, 1068.0, 1272.0, 778.0, 572.0, 194.0, - 428.0, 1374.0, 1337.0, 1415.0, 827.0, 921.0, 713.0, 402.0, 777.0, - 869.0, 1016.0, 851.0, 755.0, 679.0, 515.0, 787.0, 718.0, 672.0, - 445.0, 447.0, 403.0, 454.0, 575.0, 545.0, 383.0, 218.0, 223.0, - 474.0, 459.0, 825.0, 864.0, 816.0, 632.0, 398.0, 562.0, 1010.0, - 1159.0, 1223.0, 578.0, 737.0, 715.0, 904.0, 580.0, 592.0, 448.0, - 404.0, 433.0, 979.0, 1979.0, 1806.0, 1592.0, 612.0, 772.0, 415.0, - 505.0, 1021.0, 1374.0, 1740.0, 1166.0, 813.0, 367.0, 348.0, 594.0, - 694.0, 714.0, 458.0, 442.0, 410.0, 596.0, 748.0, 740.0, 516.0, - 668.0, 610.0, 713.0, 494.0, 953.0, 948.0, 947.0, 587.0, 477.0, - 630.0, 1066.0, 1290.0, 1037.0, 595.0, 447.0, 867.0, 921.0, 826.0, - 442.0, 327.0, 426.0, 546.0, 572.0, 763.0, 639.0, 855.0, 856.0, - 964.0, 815.0, 1035.0, 921.0, 
785.0, 388.0, 758.0, 1118.0, 940.0, - 528.0, 730.0, 798.0, 926.0, 436.0, 666.0, 530.0, 729.0, 416.0, - 370.0, 188.0, 562.0, 1230.0, 1219.0, 1005.0, 587.0, 610.0, 906.0, - 735.0, 817.0, 637.0, 758.0, 643.0, 805.0, 758.0, 811.0, 947.0, - 877.0, 846.0, 534.0, 580.0, 694.0, 658.0, 841.0, 651.0, 505.0, - 316.0, 333.0, 474.0, 511.0, 853.0, 977.0, 798.0, 750.0, 791.0, - 1186.0, 1390.0, 1045.0, 1029.0, 614.0, 960.0, 775.0, 666.0, 282.0, - 105.0, 373.0, 502.0, 494.0, 401.0, 1242.0, 1494.0, 1671.0, 743.0, - 751.0, 547.0, 922.0, 1158.0, 1291.0, 1374.0, 920.0, 701.0, 234.0, - 384.0, 482.0, 936.0, 946.0, 798.0, 586.0, 738.0, 936.0, 770.0, - 886.0, 996.0, 1254.0, 918.0, 837.0, 480.0, 609.0, 544.0, 549.0, - 489.0, 349.0, 874.0, 834.0, 870.0, 537.0, 676.0, 678.0, 1038.0, - 951.0, 1071.0, 567.0, 490.0, 300.0, 448.0, 588.0, 878.0, 780.0, - 724.0, 622.0, 832.0, 743.0, 723.0, 541.0, 554.0, 494.0, 924.0, - 1087.0, 883.0, 357.0, 599.0, 665.0, 883.0, 453.0, 552.0, 372.0, - 586.0, 577.0, 533.0, 342.0, 530.0, 1072.0, 1288.0, 1212.0, 1198.0, - 928.0, 1129.0, 668.0, 705.0, 467.0, 490.0, 700.0, 899.0, 892.0, - 713.0, 751.0, 745.0, 877.0, 657.0, 713.0, 634.0, 682.0, 897.0, - 905.0, 631.0, 369.0, 234.0, 268.0, 642.0, 1068.0, 1074.0, 580.0, - 374.0, 660.0, 1168.0, 1371.0, 1094.0, 925.0, 648.0, 1014.0, 870.0, - 953.0, 535.0, 352.0, 391.0, 590.0, 597.0, 374.0, 679.0, 910.0, - 1215.0, 752.0, 772.0, 716.0, 1284.0, 1120.0, 885.0, 525.0, 465.0, - 712.0, 682.0, 483.0, 488.0, 779.0, 837.0, 952.0, 686.0, 780.0, - 1052.0, 810.0, 928.0, 1186.0, 1650.0, 1366.0, 1024.0, 582.0, 682.0, - 966.0, 934.0, 866.0, 300.0, 776.0, 852.0, 870.0, 660.0, 633.0, - 591.0, 555.0, 403.0, 715.0, 559.0, 575.0, 255.0, 459.0, 668.0, - 736.0, 562.0, 576.0, 758.0, 844.0, 595.0, 759.0, 677.0, 738.0, - 308.0, 702.0, 854.0, 1248.0, 824.0, 677.0, 280.0, 456.0, 381.0, - 531.0, 335.0, 364.0, 367.0, 363.0, 366.0, 510.0, 688.0, 948.0, - 928.0, 1430.0, 1154.0, 1315.0, 664.0, 699.0, 456.0, 467.0, 1008.0, - 1220.0, 1295.0, 804.0, 794.0, 
751.0, 876.0, 574.0, 559.0, 577.0, - 939.0, 1142.0, 1024.0, 780.0, 554.0, 746.0, 735.0, 1293.0, 1305.0, - 1304.0, 680.0, 544.0, 1060.0, 1480.0, 1661.0, 983.0, 970.0, 717.0, - 1040.0, 682.0, 797.0, 507.0, 512.0, 585.0, 726.0, 672.0, 387.0, - 354.0, 670.0, 1171.0, 1021.0, 858.0, 646.0, 1263.0, 1162.0, 887.0, - 373.0, 639.0, 1106.0, 1194.0, 478.0, 961.0, 1141.0, 1253.0, 938.0, - 658.0, 724.0, 928.0, 796.0, 936.0, 1424.0, 1804.0, 1710.0, 1050.0, - 620.0, 472.0, 876.0, 844.0, 866.0, 308.0, 764.0, 748.0, 914.0, - 676.0, 997.0, 837.0, 788.0, 391.0, 707.0, 605.0, 690.0, 312.0, - 403.0, 668.0, 736.0, 740.0, 694.0, 700.0, 752.0, 406.0, 588.0, - 812.0, 1090.0, 860.0, 761.0, 509.0, 889.0, 596.0, 672.0, 387.0, - 1035.0, 1162.0, 1305.0, 825.0, 675.0, 599.0, 442.0, 437.0, 663.0, - 829.0, 1013.0, 873.0, 1272.0, 1040.0, 820.0, 393.0, 378.0, 336.0, - 543.0, 1106.0, 1132.0, 920.0, 508.0, 640.0, 924.0, 1032.0, 906.0, - 927.0, 771.0, 1211.0, 806.0, 1202.0, 944.0, 855.0, 643.0, 608.0, - 1557.0, 1559.0, 1613.0, 812.0, 626.0, 999.0, 1104.0, 1312.0, 824.0, - 839.0, 731.0, 1107.0, 908.0, 999.0, 641.0, 647.0, 420.0, 372.0, - 547.0, 661.0, 899.0, 978.0, 1312.0, 1098.0, 830.0, 548.0, 730.0, - 716.0, 758.0, 717.0, 1229.0, 1529.0, 1502.0, 575.0, 964.0, 848.0, - 1063.0, 730.0, 592.0, 380.0, 496.0, 530.0, 568.0, 996.0, 1043.0, - 1208.0, 576.0, 483.0, 292.0, 824.0, 820.0, 810.0, 560.0, 676.0, - 764.0, 518.0, 448.0, 721.0, 637.0, 708.0, 355.0, 515.0, 381.0, - 482.0, 311.0, 418.0, 711.0, 816.0, 1114.0, 984.0, 870.0, 552.0, - 264.0, 430.0, 732.0, 916.0, 828.0, 567.0, 515.0, 881.0, 802.0, - 792.0, 533.0, 1063.0, 1200.0, 1267.0, 1323.0, 1102.0, 956.0, 407.0, - 417.0, 787.0, 713.0, 722.0, 486.0, 925.0, 1008.0, 1167.0, 1066.0, - 940.0, 864.0, 825.0, 1275.0, 1000.0, 760.0, 820.0, 1160.0, 1424.0, - 1043.0, 697.0, 1113.0, 1062.0, 1810.0, 1095.0, 1383.0, 839.0, 775.0, - 675.0, 1017.0, 1792.0, 1750.0, 1594.0, 990.0, 836.0, 1208.0, 1094.0, - 1709.0, 1385.0, 1655.0, 1148.0, 876.0, 624.0, 515.0, 483.0, 591.0, - 
584.0, 638.0, 841.0, 987.0, 1137.0, 1059.0, 1189.0, 891.0, 626.0, - 345.0, 327.0, 333.0, 562.0, 878.0, 1358.0, 1542.0, 1262.0, 593.0, - 979.0, 1047.0, 1196.0, 774.0, 674.0, 394.0, 368.0, 398.0, 442.0, - 804.0, 649.0, 798.0, 426.0, 469.0, 560.0, 691.0, 720.0, 670.0, - 1199.0, 1429.0, 1541.0, 985.0, 864.0, 882.0, 938.0, 833.0, 503.0, - 381.0, 351.0, 463.0, 552.0, 557.0, 1061.0, 953.0, 1279.0, 1015.0, - 1336.0, 1084.0, 732.0, 374.0, 772.0, 897.0, 1023.0, 680.0, 657.0, - 519.0, 478.0, 396.0, 830.0, 1252.0, 1442.0, 1314.0, 1362.0, 1524.0, - 1502.0, 1153.0, 803.0, 783.0, 645.0, 616.0, 546.0, 741.0, 886.0, - 1033.0, 1129.0, 967.0, 923.0, 1295.0, 1567.0, 1233.0, 489.0, 873.0, - 1088.0, 1552.0, 1087.0, 953.0, 1329.0, 1174.0, 1516.0, 721.0, 1063.0, - 689.0, 605.0, 395.0, 767.0, 1854.0, 1828.0, 1810.0, 886.0, 802.0, - 1070.0, 1112.0, 1409.0, 1609.0, 1655.0, 1514.0, 1018.0, 930.0, 808.0, - 544.0, 518.0, 450.0, 546.0, 804.0, 966.0, 974.0, 883.0, 730.0, - 615.0, 365.0, 351.0, 264.0, 236.0, 511.0, 825.0, 1032.0, 991.0, - 715.0, 767.0, 679.0, 792.0, 719.0, 692.0, 611.0, 347.0, 836.0, - 1216.0, 1246.0, 966.0, 531.0, 494.0, 256.0, 251.0, 535.0, 629.0, - 1022.0, 1385.0, 2003.0, 1848.0, 1644.0, 1024.0, 1264.0, 881.0, 1000.0, - 499.0, 454.0, 279.0, 376.0, 392.0, 631.0, 537.0, 951.0, 801.0, - 1249.0, 1075.0, 1380.0, 1186.0, 988.0, 618.0, 504.0, 441.0, 945.0, - 893.0, 1006.0, 518.0, 503.0, 527.0, 923.0, 900.0, 802.0, 582.0, - 944.0, 1227.0, 1711.0, 1513.0, 1274.0, 716.0, 580.0, 579.0, 586.0, - 692.0, 859.0, 1060.0, 1210.0, 953.0, 1210.0, 1510.0, 1831.0, 1201.0, - 503.0, 883.0, 1106.0, 1292.0, 770.0, 652.0, 910.0, 930.0, 1250.0, - 881.0, 729.0, 319.0, 442.0, 576.0, 1042.0, 1490.0, 1464.0, 1454.0, - 858.0, 1034.0, 994.0, 1452.0, 1586.0, 1947.0, 1541.0, 1431.0, 910.0, - 1022.0, 691.0, 475.0, 433.0, 488.0, 636.0, 678.0, 664.0, 592.0, - 629.0, 794.0, 699.0, 490.0, 342.0, 312.0, 287.0, 332.0, 569.0, - 574.0, 614.0, 442.0, 672.0, 792.0, 984.0, 698.0, 593.0, 462.0, - 405.0, 766.0, 1370.0, 
1424.0, 1038.0, 774.0, 746.0, 786.0, 350.0, - 909.0, 839.0, 1196.0, 1547.0, 1890.0, 1629.0, 1132.0, 893.0, 1721.0, - 1282.0, 1365.0, 353.0, 565.0, 752.0, 926.0, 716.0, 626.0, 562.0, - 922.0, 665.0, 1229.0, 1125.0, 1534.0, 1312.0, 1125.0, 747.0, 555.0, - 555.0, 987.0, 953.0, 880.0, 406.0, 397.0, 585.0, 991.0, 934.0, - 742.0, 642.0, 499.0, 880.0, 1380.0, 1583.0, 1344.0, 660.0, 696.0, - 638.0, 567.0, 417.0, 495.0, 424.0, 586.0, 885.0, 1198.0, 1830.0, - 1557.0, 1229.0, 751.0, 843.0, 1008.0, 828.0, 743.0, 721.0, 689.0, - 736.0, 688.0, 549.0, 371.0, 355.0, 504.0, 878.0, 1318.0, 1632.0, - 1335.0, 1098.0, 764.0, 1061.0, 1041.0, 1367.0, 1079.0, 1231.0, 937.0, - 1123.0, 928.0, 868.0, 781.0, 1051.0, 977.0, 773.0, 275.0, 279.0, - 274.0, 412.0, 590.0, 857.0, 1016.0, 782.0, 885.0, 681.0, 730.0, - 387.0, 508.0, 683.0, 592.0, 416.0, 666.0, 724.0, 874.0, 428.0, - 491.0, 626.0, 1031.0, 1310.0, 1693.0, 1383.0, 1079.0, 881.0, 933.0, - 1069.0, 624.0, 1213.0, 1012.0, 1394.0, 1437.0, 1407.0, 990.0, 314.0, - 306.0, 1185.0, 1141.0, 1078.0, 174.0, 542.0, 926.0, 1044.0, 788.0, - 550.0, 680.0, 654.0, 586.0, 986.0, 990.0, 928.0, 764.0, 901.0, - 943.0, 523.0, 374.0, 739.0, 883.0, 763.0, 369.0, 296.0, 502.0, - 623.0, 652.0, 476.0, 524.0, 350.0, 341.0, 989.0, 1007.0, 1150.0, - 431.0, 563.0, 579.0, 463.0, 330.0, 274.0, 459.0, 720.0, 1205.0, - 1435.0, 1687.0, 1142.0, 1108.0, 1018.0, 1100.0, 1286.0, 958.0, 871.0, - 447.0, 623.0, 742.0, 658.0, 421.0, 333.0, 379.0, 564.0, 733.0, - 1367.0, 1615.0, 1377.0, 766.0, 528.0, 1177.0, 1075.0, 1609.0, 1118.0, - 1084.0, 508.0, 687.0, 741.0, 620.0, 987.0, 1404.0, 1375.0, 687.0, - 209.0, 231.0, 732.0, 878.0, 992.0, 944.0, 1092.0, 1074.0, 1214.0, - 1025.0, 985.0, 408.0, 467.0, 771.0, 793.0, 667.0, 428.0, 689.0, - 841.0, 715.0, 621.0, 758.0, 1179.0, 1017.0, 950.0, 584.0, 766.0, - 1003.0, 1061.0, 1127.0, 1086.0, 1606.0, 1542.0, 1244.0, 1006.0, 847.0, - 764.0, 300.0, 428.0, 901.0, 1018.0, 1192.0, 773.0, 1144.0, 1145.0, - 1052.0, 780.0, 472.0, 652.0, 520.0, 452.0, 
1064.0, 1110.0, 1066.0, - 470.0, 703.0, 782.0, 666.0, 375.0, 541.0, 757.0, 925.0, 941.0, - 773.0, 961.0, 762.0, 818.0, 414.0, 744.0, 764.0, 918.0, 956.0, - 725.0, 613.0, 237.0, 375.0, 383.0, 361.0, 280.0, 246.0, 742.0, - 856.0, 1396.0, 1044.0, 1280.0, 721.0, 1265.0, 1361.0, 1362.0, 1270.0, - 852.0, 964.0, 468.0, 722.0, 644.0, 628.0, 561.0, 513.0, 532.0, - 307.0, 574.0, 1243.0, 1673.0, 1427.0, 716.0, 962.0, 1267.0, 1557.0, - 1263.0, 983.0, 732.0, 476.0, 554.0, 513.0, 528.0, 1180.0, 1637.0, - 1544.0, 718.0, 530.0, 451.0, 1107.0, 935.0, 1134.0, 784.0, 928.0, - 872.0, 1196.0, 1144.0, 1124.0, 517.0, 379.0, 907.0, 1070.0, 990.0, - 694.0, 677.0, 642.0, 482.0, 416.0, 733.0, 1493.0, 1419.0, 1112.0, - 320.0, 624.0, 673.0, 913.0, 825.0, 1245.0, 1367.0, 1281.0, 1048.0, - 796.0, 635.0, 478.0, 333.0, 447.0, 532.0, 669.0, 873.0, 889.0, - 1027.0, 740.0, 588.0, 484.0, 420.0, 612.0, 488.0, 436.0, 868.0, - 932.0, 912.0, 436.0, 680.0, 775.0, 763.0, 371.0, 507.0, 500.0, - 750.0, 791.0, 758.0, 1172.0, 956.0, 953.0, 305.0, 681.0, 837.0, - 1133.0, 1187.0, 983.0, 690.0, 384.0, 446.0, 803.0, 660.0, 895.0, - 603.0, 1370.0, 1116.0, 1176.0, 720.0, 824.0, 763.0, 1243.0, 1167.0, - 1184.0, 944.0, 928.0, 955.0, 571.0, 1075.0, 956.0, 836.0, 820.0, - 784.0, 803.0, 543.0, 602.0, 881.0, 981.0, 976.0, 1122.0, 1258.0, - 1698.0, 2016.0, 2044.0, 1538.0, 912.0, 352.0, 494.0, 607.0, 754.0, - 1206.0, 1061.0, 1082.0, 541.0, 723.0, 898.0, 1469.0, 1287.0, 1072.0, - 688.0, 616.0, 594.0, 746.0, 844.0, 894.0, 608.0, 512.0, 740.0, - 870.0, 966.0, 1724.0, 1437.0, 1104.0, 676.0, 385.0, 330.0, 736.0, - 955.0, 1291.0, 889.0, 893.0, 540.0, 652.0, 952.0, 1593.0, 1483.0, - 1097.0, 626.0, 705.0, 560.0, 576.0, 555.0, 730.0, 592.0, 613.0, - 1115.0, 1177.0, 1192.0, 630.0, 592.0, 500.0, 442.0, 399.0, 427.0, - 321.0, 872.0, 820.0, 812.0, 333.0, 438.0, 833.0, 960.0, 875.0, - 701.0, 598.0, 742.0, 954.0, 935.0, 1323.0, 1030.0, 945.0, 353.0, - 749.0, 1460.0, 1758.0, 1552.0, 913.0, 592.0, 463.0, 477.0, 912.0, - 764.0, 990.0, 888.0, 
1441.0, 1152.0, 906.0, 724.0, 778.0, 757.0, - 770.0, 731.0, 1166.0, 905.0, 893.0, 483.0, 603.0, 894.0, 882.0, - 676.0, 926.0, 878.0, 843.0, 725.0, 907.0, 998.0, 496.0, 576.0, - 998.0, 1566.0, 1974.0, 2248.0, 1922.0, 1283.0, 713.0, 593.0, 1040.0, - 1096.0, 1186.0, 723.0, 587.0, 675.0, 618.0, 994.0, 1189.0, 1491.0, - 1187.0, 1248.0, 1024.0, 1076.0, 574.0, 632.0, 546.0, 696.0, 748.0, - 606.0, 654.0, 576.0, 752.0, 1879.0, 1347.0, 1286.0, 932.0, 665.0, - 427.0, 605.0, 835.0, 1369.0, 1095.0, 1148.0, 520.0, 988.0, 1278.0, - 1523.0, 1235.0, 664.0, 492.0, 281.0, 471.0, 758.0, 1017.0, 856.0, - 520.0, 281.0, 607.0, 929.0, 977.0, 943.0, 671.0, 570.0, 296.0, - 226.0, 378.0, 378.0, 611.0, 883.0, 826.0, 664.0, 345.0, 888.0, - 1009.0, 1032.0, 527.0, 360.0, 189.0, 399.0, 490.0, 909.0, 854.0, - 1043.0, 677.0, 919.0, 1098.0, 1271.0, 1195.0, 791.0, 1033.0, 849.0, - 843.0, 1118.0, 1347.0, 1685.0, 1316.0, 1183.0, 795.0, 600.0, 952.0, - 911.0, 1055.0, 644.0, 1020.0, 1484.0, 1599.0, 1375.0, 663.0, 791.0, - 956.0, 982.0, 600.0, 696.0, 878.0, 870.0, 911.0, 1099.0, 1072.0, - 975.0, 809.0, 1374.0, 1100.0, 1568.0, 1534.0, 1692.0, 1056.0, 676.0, - 574.0, 1270.0, 1400.0, 1274.0, 499.0, 315.0, 485.0, 479.0, 667.0, - 1097.0, 1540.0, 1384.0, 1158.0, 830.0, 938.0, 585.0, 995.0, 865.0, - 1080.0, 712.0, 822.0, 930.0, 1232.0, 1258.0, 1859.0, 1311.0, 1262.0, - 972.0, 641.0, 379.0, 192.0, 422.0, 1010.0, 1101.0, 1378.0, 720.0, - 1340.0, 1320.0, 1536.0, 1104.0, 665.0, 774.0, 555.0, 1070.0, 1101.0, - 1420.0, 926.0, 826.0, 482.0, 780.0, 906.0, 997.0, 873.0, 867.0, - 884.0, 737.0, 381.0, 369.0, 350.0, 543.0, 1095.0, 1026.0, 838.0, - 256.0, 807.0, 906.0, 1016.0, 490.0, 400.0, 565.0, 775.0, 909.0, - 720.0, 676.0, 920.0, 966.0, 1100.0, 1172.0, 993.0, 767.0, 291.0, - 732.0, 880.0, 844.0, 789.0, 963.0, 1175.0, 1036.0, 618.0, 441.0, - 495.0, 941.0, 925.0, 833.0, 576.0, 1372.0, 1818.0, 1841.0, 1217.0, - 570.0, 1122.0, 945.0, 1416.0, 930.0, 1009.0, 891.0, 659.0, 731.0, - 883.0, 846.0, 1073.0, 851.0, 1002.0, 720.0, 
868.0, 850.0, 764.0, - 427.0, 451.0, 587.0, 1314.0, 1240.0, 1098.0, 443.0, 372.0, 464.0, - 240.0, 611.0, 739.0, 1214.0, 1202.0, 1381.0, 1213.0, 1273.0, 873.0, - 1227.0, 907.0, 1102.0, 546.0, 631.0, 959.0, 1299.0, 1220.0, 793.0, - 599.0, 874.0, 896.0, 698.0, 938.0, 853.0, 844.0, 686.0, 582.0, - 1031.0, 717.0, 1458.0, 1072.0, 1066.0, 526.0, 487.0, 784.0, 606.0, - 1081.0, 947.0, 1167.0, 1117.0, 1364.0, 1569.0, 1107.0, 1067.0, 747.0, - 933.0, 825.0, 771.0, 658.0, 509.0, 692.0, 665.0, 891.0, 1287.0, - 1346.0, 978.0, 363.0, 594.0, 661.0, 844.0, 755.0, 666.0, 787.0, - 776.0, 915.0, 810.0, 626.0, 924.0, 1214.0, 1251.0, 820.0, 282.0, - 189.0, 139.0, 903.0, 1367.0, 1444.0, 916.0, 870.0, 1010.0, 884.0, - 732.0, 635.0, 1233.0, 1200.0, 1198.0, 722.0, 617.0, 1664.0, 1734.0, - 1810.0, 1278.0, 1050.0, 1438.0, 886.0, 1242.0, 888.0, 909.0, 1219.0, - 1079.0, 987.0, 599.0, 490.0, 1049.0, 743.0, 898.0, 420.0, 440.0, - 362.0, 310.0, 301.0, 355.0, 537.0, 984.0, 964.0, 784.0, 500.0, - 621.0, 747.0, 623.0, 562.0, 578.0, 854.0, 1138.0, 1009.0, 905.0, - 653.0, 849.0, 1189.0, 1051.0, 1162.0, 632.0, 728.0, 922.0, 1666.0, - 1554.0, 798.0, 579.0, 707.0, 695.0, 660.0, 972.0, 832.0, 786.0, - 452.0, 309.0, 625.0, 499.0, 878.0, 598.0, 844.0, 528.0, 542.0, - 666.0, 908.0, 1344.0, 969.0, 953.0, 892.0, 1465.0, 1666.0, 1263.0, - 839.0, 726.0, 768.0, 950.0, 761.0, 766.0, 596.0, 689.0, 638.0, - 910.0, 1008.0, 1090.0, 789.0, 526.0, 582.0, 523.0, 638.0, 697.0, - 720.0, 904.0, 1331.0, 1345.0, 1107.0, 587.0, 1010.0, 1368.0, 1401.0, - 880.0, 559.0, 638.0, 597.0, 738.0, 1038.0, 1156.0, 796.0, 317.0, - 359.0, 508.0, 799.0, 728.0, 1407.0, 1080.0, 1101.0, 337.0, 456.0, - 1128.0, 1146.0, 1350.0, 1110.0, 1198.0, 1290.0, 896.0, 1176.0, 958.0, - 831.0, 1329.0, 1186.0, 1198.0, 490.0, 499.0, 620.0, 674.0, 630.0, - 622.0, 382.0, 468.0, 588.0, 653.0, 637.0, 665.0, 770.0, 940.0, - 698.0, 1074.0, 1267.0, 1533.0, 1038.0, 641.0, 377.0, 424.0, 740.0, - 799.0, 987.0, 835.0, 1176.0, 1008.0, 1142.0, 990.0, 1034.0, 838.0, - 758.0, 
984.0, 892.0, 748.0, 857.0, 881.0, 823.0, 772.0, 1144.0, - 1037.0, 1051.0, 633.0, 497.0, 341.0, 683.0, 802.0, 838.0, 1020.0, - 1392.0, 1412.0, 824.0, 620.0, 706.0, 654.0, 752.0, 1325.0, 1627.0, - 1699.0, 1134.0, 906.0, 804.0, 679.0, 675.0, 380.0, 443.0, 521.0, - 882.0, 791.0, 835.0, 693.0, 779.0, 1182.0, 1090.0, 1129.0, 578.0, - 698.0, 788.0, 899.0, 605.0, 1055.0, 999.0, 1223.0, 687.0, 1270.0, - 1564.0, 1554.0, 989.0, 864.0, 980.0, 875.0, 779.0, 833.0, 890.0, - 898.0, 641.0, 602.0, 429.0, 718.0, 758.0, 1194.0, 1069.0, 1093.0, - 863.0, 716.0, 894.0, 622.0, 982.0, 1172.0, 1472.0, 1084.0, 856.0, - 566.0, 1012.0, 842.0, 1732.0, 1565.0, 1550.0, 742.0, 523.0, 584.0, - 1064.0, 1144.0, 1604.0, 998.0, 948.0, 732.0, 761.0, 666.0, 834.0, - 925.0, 1156.0, 611.0, 1173.0, 1399.0, 1788.0, 1161.0, 679.0, 573.0, - 628.0, 842.0, 654.0, 726.0, 610.0, 971.0, 1115.0, 1373.0, 1166.0, - 1010.0, 835.0, 654.0, 1070.0, 979.0, 647.0, 650.0, 647.0, 929.0, - 874.0, 1036.0, 636.0, 748.0, 642.0, 752.0, 684.0, 1074.0, 1056.0, - 839.0, 893.0, 1309.0, 1404.0, 827.0, 855.0, 921.0, 977.0, 825.0, - 1002.0, 1241.0, 843.0, 722.0, 444.0, 782.0, 649.0, 655.0, 531.0, - 518.0, 568.0, 566.0, 779.0, 686.0, 750.0, 465.0, 1058.0, 1052.0, - 1310.0, 690.0, 662.0, 616.0, 798.0, 575.0, 901.0, 1024.0, 1145.0, - 1081.0, 1438.0, 1514.0, 1345.0, 819.0, 1018.0, 1361.0, 1422.0, 1420.0, - 914.0, 740.0, 1029.0, 1288.0, 1571.0, 965.0, 651.0, 387.0, 433.0, - 644.0, 832.0, 988.0, 916.0, 673.0, 382.0, 858.0, 937.0, 1292.0, - 660.0, 992.0, 870.0, 1325.0, 1095.0, 1469.0, 1305.0, 1270.0, 798.0, - 627.0, 532.0, 1080.0, 1424.0, 1952.0, 1272.0, 952.0, 864.0, 877.0, - 694.0, 870.0, 935.0, 1204.0, 683.0, 1309.0, 1431.0, 1664.0, 953.0, - 583.0, 661.0, 784.0, 798.0, 522.0, 674.0, 750.0, 773.0, 909.0, - 1329.0, 1700.0, 1596.0, 1256.0, 743.0, 711.0, 873.0, 795.0, 705.0, - 766.0, 812.0, 722.0, 1020.0, 817.0, 913.0, 903.0, 1378.0, 1430.0, - 1720.0, 1350.0, 1509.0, 1135.0, 1559.0, 1174.0, 790.0, 582.0, 664.0, - 1125.0, 1039.0, 1237.0, 
1008.0, 784.0, 574.0, 346.0, 502.0, 525.0, - 595.0, 663.0, 543.0, 599.0, 675.0, 903.0, 765.0, 582.0, 277.0, - 913.0, 972.0, 1344.0, 708.0, 960.0, 828.0, 982.0, 535.0, 361.0, - 661.0, 896.0, 1175.0, 965.0, 1079.0, 915.0, 1233.0, 1272.0, 1459.0, - 1127.0, 1313.0, 1063.0, 850.0, 1147.0, 1427.0, 1673.0, 899.0, 514.0, - 413.0, 379.0, 832.0, 1068.0, 1316.0, 996.0, 777.0, 562.0, 774.0, - 729.0, 901.0, 661.0, 813.0, 826.0, 1229.0, 1257.0, 1287.0, 1300.0, - 1124.0, 902.0, 530.0, 502.0, 1036.0, 1454.0, 1846.0, 1420.0, 1382.0, - 1112.0, 921.0, 400.0, 794.0, 809.0, 984.0, 486.0, 811.0, 763.0, - 845.0, 614.0, 570.0, 778.0, 764.0, 806.0, 465.0, 611.0, 553.0, - 579.0, 771.0, 836.0, 1221.0, 1229.0, 1554.0, 1241.0, 1041.0, 871.0, - 646.0, 350.0, 676.0, 722.0, 772.0, 864.0, 677.0, 709.0, 1063.0, - 1880.0, 2041.0, 1547.0, 753.0, 983.0, 699.0, 811.0, 400.0, 420.0, - 585.0, 673.0, 1076.0, 1101.0, 1187.0, 974.0, 620.0, 418.0, 280.0, - 510.0, 686.0, 638.0, 666.0, 577.0, 609.0, 457.0, 702.0, 636.0, - 513.0, 137.0, 303.0, 414.0, 778.0, 892.0, 1109.0, 995.0, 839.0, - 525.0, 335.0, 647.0, 812.0, 1275.0, 885.0, 1167.0, 820.0, 1598.0, - 1321.0, 1499.0, 881.0, 1266.0, 1048.0, 837.0, 851.0, 1231.0, 1555.0, - 1079.0, 612.0, 489.0, 555.0, 758.0, 913.0, 754.0, 651.0, 464.0, - 534.0, 672.0, 584.0, 548.0, 513.0, 575.0, 840.0, 673.0, 843.0, - 713.0, 932.0, 695.0, 677.0, 529.0, 558.0, 704.0, 1062.0, 950.0, - 838.0, 834.0, 1016.0, 876.0, 780.0, 922.0, 934.0, 724.0, 549.0, - 518.0, 421.0, 364.0, 443.0, 420.0, 494.0, 800.0, 796.0, 694.0, - 548.0, 898.0, 1008.0, 768.0, 589.0, 945.0, 1358.0, 1605.0, 1421.0, - 1240.0, 1024.0, 745.0, 479.0, 770.0, 536.0, 564.0, 516.0, 610.0, - 706.0, 1072.0, 1688.0, 1601.0, 1579.0, 1297.0, 1706.0, 1116.0, 768.0, - 306.0, 277.0, 454.0, 858.0, 1184.0, 1227.0, 1113.0, 1028.0, 856.0, - 790.0, 690.0, 728.0, 662.0, 531.0, 441.0, 408.0, 617.0, 631.0, - 762.0, 897.0, 1004.0, 887.0, 858.0, 664.0, 645.0, 621.0, 1273.0, - 1603.0, 1403.0, 721.0, 343.0, 563.0, 836.0, 1279.0, 991.0, 
1121.0, - 624.0, 1420.0, 1232.0, 1165.0, 371.0, 391.0, 792.0, 1039.0, 1037.0, - 969.0, 779.0, 827.0, 563.0, 656.0, 518.0, 706.0, 701.0, 710.0, - 567.0, 555.0, 548.0, 474.0, 393.0, 618.0, 850.0, 705.0, 518.0, - 349.0, 848.0, 822.0, 946.0, 513.0, 551.0, 613.0, 658.0, 697.0, - 559.0, 469.0, 399.0, 773.0, 837.0, 748.0, 774.0, 868.0, 838.0, - 564.0, 457.0, 478.0, 245.0, 207.0, 231.0, 328.0, 319.0, 708.0, - 696.0, 870.0, 607.0, 925.0, 1345.0, 1346.0, 1006.0, 478.0, 539.0, - 1169.0, 1471.0, 1482.0, 908.0, 428.0, 376.0, 652.0, 523.0, 481.0, - 237.0, 370.0, 534.0, 872.0, 1134.0, 929.0, 979.0, 1181.0, 1201.0, - 915.0, 487.0, 504.0, 390.0, 475.0, 684.0, 602.0, 653.0, 748.0, - 1011.0, 851.0, 759.0, 664.0, 653.0, 903.0, 765.0, 658.0, 323.0, - 669.0, 711.0, 824.0, 833.0, 991.0, 910.0, 945.0, 724.0, 645.0, - 889.0, 1361.0, 1703.0, 1243.0, 770.0, 424.0, 588.0, 752.0, 1616.0, - 1450.0, 1636.0, 1046.0, 1278.0, 956.0, 853.0, 483.0, 573.0, 1070.0, - 1331.0, 1417.0, 1173.0, 994.0, 1124.0, 838.0, 788.0, 578.0, 572.0, - 569.0, 652.0, 469.0, 511.0, 412.0, 674.0, 837.0, 1111.0, 955.0, - 806.0, 594.0, 486.0, 1089.0, 1419.0, 1550.0, 1049.0, 717.0, 711.0, - 588.0, 497.0, 509.0, 495.0, 505.0, 589.0, 675.0, 754.0, 1374.0, - 1342.0, 1296.0, 596.0, 512.0, 426.0, 315.0, 280.0, 299.0, 246.0, - 221.0, 658.0, 714.0, 900.0, 697.0, 1019.0, 1393.0, 1542.0, 1209.0, - 723.0, 380.0, 855.0, 964.0, 1307.0, 859.0, 645.0, 559.0, 625.0, - 648.0, 439.0, 610.0, 617.0, 856.0, 628.0, 490.0, 182.0, 570.0, - 1232.0, 1277.0, 1173.0, 1001.0, 1007.0, 681.0, 421.0, 751.0, 704.0, - 456.0, 239.0, 649.0, 929.0, 1089.0, 788.0, 611.0, 931.0, 849.0, - 828.0, 281.0, 641.0, 659.0, 854.0, 838.0, 996.0, 1114.0, 1172.0, - 935.0, 745.0, 721.0, 1320.0, 1555.0, 1367.0, 759.0, 404.0, 896.0, - 1140.0, 1784.0, 1204.0, 1414.0, 1072.0, 1110.0, 784.0, 719.0, 619.0, - 711.0, 1134.0, 1393.0, 1338.0, 1114.0, 851.0, 998.0, 724.0, 746.0, - 516.0, 505.0, 755.0, 1313.0, 1430.0, 1246.0, 794.0, 744.0, 900.0, - 1092.0, 782.0, 683.0, 561.0, 618.0, 
1097.0, 1367.0, 1524.0, 1097.0, - 612.0, 618.0, 493.0, 527.0, 519.0, 567.0, 635.0, 705.0, 1131.0, - 1122.0, 1346.0, 881.0, 917.0, 671.0, 590.0, 510.0, 384.0, 423.0, - 370.0, 293.0, 236.0, 407.0, 422.0, 615.0, 732.0, 818.0, 1097.0, - 1270.0, 1147.0, 645.0, 241.0, 1054.0, 1072.0, 1190.0, 880.0, 479.0, - 721.0, 831.0, 940.0, 431.0, 557.0, 728.0, 807.0, 606.0, 780.0, - 754.0, 690.0, 536.0, 553.0, 1073.0, 1253.0, 1191.0, 761.0, 317.0, - 365.0, 286.0, 272.0, 373.0, 507.0, 597.0, 537.0, 432.0, 397.0, - 875.0, 862.0, 853.0, 290.0, 469.0, 495.0, 568.0, 329.0, 273.0, - 587.0, 956.0, 962.0, 684.0, 756.0, 932.0, 1143.0, 927.0, 1168.0, - 1101.0, 1469.0, 1228.0, 1360.0, 1150.0, 1371.0, 1375.0, 1607.0, 1252.0, - 1137.0, 817.0, 943.0, 1126.0, 961.0, 869.0, 1117.0, 1054.0, 986.0, - 479.0, 561.0, 597.0, 557.0, 923.0, 1331.0, 1478.0, 1099.0, 712.0, - 659.0, 888.0, 776.0, 463.0, 359.0, 598.0, 749.0, 1027.0, 1239.0, - 1390.0, 1543.0, 1158.0, 894.0, 349.0, 398.0, 606.0, 658.0, 890.0, - 1286.0, 1888.0, 1674.0, 1456.0, 935.0, 1137.0, 903.0, 682.0, 570.0, - 500.0, 776.0, 578.0, 489.0, 189.0, 441.0, 800.0, 983.0, 1007.0, - 775.0, 754.0, 996.0, 968.0, 798.0, 302.0, 1178.0, 1189.0, 1471.0, - 1129.0, 538.0, 677.0, 618.0, 726.0, 432.0, 718.0, 1154.0, 1261.0, - 1380.0, 1256.0, 1072.0, 760.0, 934.0, 1072.0, 1396.0, 1198.0, 961.0, - 633.0, 355.0, 466.0, 534.0, 522.0, 624.0, 636.0, 632.0, 884.0, - 1016.0, 970.0, 638.0, 369.0, 354.0, 716.0, 803.0, 822.0, 432.0, - 520.0, 462.0, 1010.0, 1088.0, 1152.0, 780.0, 586.0, 790.0, 923.0, - 942.0, 1097.0, 972.0, 1311.0, 1128.0, 820.0, 690.0, 895.0, 1059.0, - 1362.0, 1021.0, 1010.0, 761.0, 812.0, 723.0, 433.0, 330.0, 824.0, - 810.0, 938.0, 657.0, 769.0, 699.0, 431.0, 740.0, 1046.0, 1397.0, - 935.0, 572.0, 232.0, 389.0, 331.0, 693.0, 693.0, 952.0, 710.0, - 804.0, 566.0, 608.0, 889.0, 1028.0, 862.0, 381.0, 342.0, 462.0, - 439.0, 607.0, 915.0, 1688.0, 1458.0, 1062.0, 441.0, 651.0, 995.0, - 862.0, 794.0, 652.0, 906.0, 1072.0, 907.0, 771.0, 729.0, 992.0, - 1028.0, 
898.0, 1000.0, 940.0, 1422.0, 1105.0, 951.0, 235.0, 791.0, - 861.0, 1195.0, 1346.0, 488.0, 775.0, 869.0, 643.0, 472.0, 609.0, - 1140.0, 1272.0, 1391.0, 1424.0, 1194.0, 878.0, 902.0, 1174.0, 1358.0, - 782.0, 878.0, 934.0, 1144.0, 666.0, 508.0, 476.0, 754.0, 576.0, - 470.0, 573.0, 1185.0, 1153.0, 810.0, 253.0, 234.0, 830.0, 919.0, - 971.0, 382.0, 435.0, 436.0, 881.0, 960.0, 1100.0, 828.0, 714.0, - 680.0, 1042.0, 909.0, 1189.0, 907.0, 876.0, 535.0, 475.0, 795.0, - 791.0, 743.0, 938.0, 883.0, 848.0, 687.0, 860.0, 823.0, 566.0, - 434.0, 748.0, 757.0, 910.0, 727.0, 740.0, 612.0, 559.0, 676.0, - 575.0, 461.0, 286.0, 440.0, 422.0, 555.0, 410.0, 738.0, 894.0, - 1112.0, 966.0, 890.0, 1096.0, 1078.0, 1426.0, 1192.0, 974.0, 409.0, - 315.0, 399.0, 791.0, 971.0, 1353.0, 1342.0, 1140.0, 752.0, 509.0, - 687.0, 941.0, 822.0, 698.0, 554.0, 741.0, 989.0, 843.0, 1060.0, - 976.0, 1338.0, 1130.0, 972.0, 878.0, 776.0, 1318.0, 1133.0, 1203.0, - 469.0, 699.0, 972.0, 1418.0, 1477.0, 1062.0, 971.0, 1319.0, 751.0, - 790.0, 663.0, 922.0, 1046.0, 1285.0, 1290.0, 974.0, 746.0, 1034.0, - 1470.0, 1272.0, 692.0, 732.0, 922.0, 1096.0, 565.0, 431.0, 431.0, - 874.0, 730.0, 632.0, 591.0, 1321.0, 1249.0, 916.0, 207.0, 256.0, - 808.0, 857.0, 963.0, 366.0, 554.0, 545.0, 805.0, 757.0, 777.0, - 720.0, 798.0, 864.0, 870.0, 629.0, 513.0, 287.0, 246.0, 529.0, - 741.0, 811.0, 535.0, 659.0, 637.0, 638.0, 746.0, 930.0, 1015.0, - 683.0, 666.0, 603.0, 541.0, 494.0, 630.0, 1184.0, 1067.0, 913.0, - 493.0, 508.0, 429.0, 461.0, 575.0, 728.0, 561.0, 543.0, 414.0, - 754.0, 940.0, 1141.0, 918.0, 786.0, 1026.0, 1002.0, 1150.0, 816.0, - 714.0, 833.0, 803.0, 999.0, 956.0, 1124.0, 950.0, 762.0, 532.0, - 540.0, 308.0, 414.0, 656.0, 697.0, 537.0, 545.0, 557.0, 987.0, - 739.0, 1150.0, 788.0, 807.0, 463.0, 851.0, 1220.0, 1179.0, 1131.0, - 756.0, 897.0, 434.0, 350.0, 626.0, 933.0, 1154.0, 1182.0, 1058.0, - 1552.0, 1000.0, 872.0, 813.0, 1070.0, 1386.0, 1055.0, 979.0, 653.0, - 975.0, 888.0, 1160.0, 1262.0, 1156.0, 1522.0, 1214.0, 
1246.0, 549.0, - 315.0, 275.0, 1068.0, 1038.0, 1012.0, 319.0, 847.0, 815.0, 782.0, - 352.0, 390.0, 474.0, 626.0, 835.0, 652.0, 804.0, 739.0, 813.0, - 533.0, 422.0, 485.0, 745.0, 966.0, 1045.0, 1061.0, 1186.0, 1011.0, - 631.0, 569.0, 857.0, 887.0, 507.0, 523.0, 564.0, 673.0, 678.0, - 727.0, 829.0, 604.0, 706.0, 544.0, 490.0, 400.0, 399.0, 724.0, - 619.0, 618.0, 457.0, 447.0, 446.0, 488.0, 607.0, 756.0, 732.0, - 698.0, 685.0, 446.0, 588.0, 619.0, 792.0, 976.0, 1462.0, 1448.0, - 1281.0, 671.0, 590.0, 908.0, 932.0, 1108.0, 962.0, 1082.0, 1065.0, - 856.0, 1068.0, 897.0, 628.0, 354.0, 387.0, 457.0, 331.0, 367.0, - 399.0, 463.0, 831.0, 1065.0, 964.0, 573.0, 368.0, 735.0, 774.0, - 733.0, 525.0, 615.0, 778.0, 667.0, 461.0, 745.0, 809.0, 1035.0, - 1405.0, 1194.0, 1424.0, 764.0, 692.0, 575.0, 883.0, 1111.0, 995.0, - 784.0, 566.0, 907.0, 1202.0, 1302.0, 1338.0, 948.0, 1004.0, 467.0, - 509.0, 722.0, 777.0, 719.0, 1040.0, 1408.0, 1590.0, 1050.0, 870.0, - 676.0, 476.0, 394.0, 520.0, 524.0, 756.0, 832.0, 778.0, 826.0, - 802.0, 885.0, 599.0, 333.0, 406.0, 856.0, 1188.0, 1117.0, 928.0, - 1286.0, 1275.0, 900.0, 620.0, 688.0, 936.0, 577.0, 741.0, 600.0, - 672.0, 761.0, 754.0, 755.0, 490.0, 650.0, 606.0, 534.0, 368.0, - 483.0, 800.0, 740.0, 675.0, 828.0, 825.0, 841.0, 719.0, 864.0, - 796.0, 948.0, 846.0, 1004.0, 447.0, 409.0, 625.0, 1072.0, 1488.0, - 1638.0, 1278.0, 1029.0, 503.0, 469.0, 1082.0, 1066.0, 1269.0, 429.0, - 1065.0, 1098.0, 1426.0, 1184.0, 928.0, 654.0, 418.0, 506.0, 503.0, - 349.0, 481.0, 666.0, 714.0, 997.0, 1188.0, 1127.0, 669.0, 416.0, - 705.0, 769.0, 720.0, 560.0, 691.0, 622.0, 555.0, 593.0, 617.0, - 563.0, 729.0, 937.0, 966.0, 831.0, 623.0, 436.0, 583.0, 1227.0, - 1476.0, 1251.0, 528.0, 310.0, 633.0, 972.0, 918.0, 1056.0, 758.0, - 958.0, 759.0, 833.0, 1309.0, 1063.0, 1257.0, 1155.0, 1506.0, 1724.0, - 1504.0, 1212.0, 796.0, 572.0, 1040.0, 1021.0, 965.0, 773.0, 772.0, - 720.0, 780.0, 772.0, 1044.0, 712.0, 590.0, 426.0, 737.0, 937.0, - 867.0, 820.0, 1135.0, 1217.0, 
1023.0, 670.0, 1110.0, 1416.0, 1300.0, - 960.0, 652.0, 585.0, 446.0, 281.0, 307.0, 288.0, 445.0, 430.0, - 336.0, 250.0, 543.0, 784.0, 722.0, 495.0, 816.0, 887.0, 1113.0, - 767.0, 729.0, 459.0, 1107.0, 1110.0, 1244.0, 401.0, 489.0, 697.0, - 1524.0, 2252.0, 2408.0, 1633.0, 976.0, 546.0, 519.0, 705.0, 569.0, - 606.0, 322.0, 805.0, 1006.0, 1269.0, 1406.0, 1338.0, 1093.0, 607.0, - 535.0, 512.0, 424.0, 616.0, 701.0, 733.0, 1178.0, 1468.0, 1453.0, - 722.0, 365.0, 327.0, 446.0, 510.0, 653.0, 984.0, 910.0, 1032.0, - 829.0, 785.0, 706.0, 965.0, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/gendata.py deleted file mode 100755 index 35427826..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/gendata.py +++ /dev/null @@ -1,65 +0,0 @@ -#!/usr/bin/env python3 - -import numpy as np - -KH = 3 -KW = 3 -IH = 260 -IW = 260 -OH = IH - KH + 1 -OW = IW - KW + 1 - -info = np.finfo(np.float32) -nmant = 5 # Limit precision to avoid rounding errors -maxmant = 1 << nmant -minexp = 0 -maxexp = 5 - - -# Generate floating-point values with exact mantissa and exponent -def randf(n): - return np.ldexp( - np.random.randint(maxmant, size=n), np.random.randint(minexp, maxexp, size=n) - ) - - -inputs = randf((IH, IW)).astype(np.float32) -weights = np.ones((KH, KW), dtype=np.float32) -weights_1 = np.ones(KW, dtype=np.float32) -weights_2 = np.ones(KH, dtype=np.float32) -outputs = np.full((OH, OW), np.float32(0.0)) - -# Convolution -for kh in range(KH): - for kw in range(KW): - outputs += inputs[kh : (kh + OH), kw : (kw + OW)] * weights[kh][kw] - -print( - """#define KH {} -#define KW {} -#define IH {} -#define IW {} -#define I_SIZE {} -#define OH {} -#define OW {} -#define O_SIZE {} - -""".format( - KH, KW, IH, IW, IH * IW, OH, OW, OH * OW - ) -) - - -def print_array(name, data, data_size, data_type="float", data_fmt="{}", fold=10): - print("{} {}[{}] = {{".format(data_type, name, data_size)) - for i in range(0, 
len(data), fold): - print( - " ", ", ".join(data_fmt.format(x) for x in data[i : i + fold]), ",", sep="" - ) - print("};") - - -print_array("input_k1", weights_1, "IW") -print_array("input_k2", weights_2, "IH") -print_array("input_image", inputs.flatten(), "I_SIZE") -print_array("verify_data", outputs.flatten(), "O_SIZE") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/vec-sep-conv.S b/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/vec-sep-conv.S deleted file mode 100644 index 0d4f874d..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/vec-sep-conv.S +++ /dev/null @@ -1,205 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// Vectorized 2D separable convolution -//-------------------------------------------------------------------------- - - .text - .balign 4 - - .global vec_sep_conv -/* - * Calling convention: - * a0: size_t rows - * a1: size_t cols - * a2: size_t a_stride - * a3: size_t b_stride - * a4: const float *kw - * a5: const float *kh - * a6: const float *a - * a7: float *b - */ - -#define rows a0 -#define cols a1 -#define a_stride a2 -#define b_stride a3 -#define kw a4 -#define kh a5 -#define a a6 -#define b a7 - -#define ap t0 -#define bp t1 -#define vlen t2 -#define row_count t3 -#define VLEN_stride t4 -#define ap_4 t5 -#define ap_8 t6 - -#define row_check s0 -#define rows_odd s1 - -#define kw0 ft0 -#define kw1 ft1 -#define kw2 ft2 -#define kh0 ft3 -#define kh1 ft4 -#define kh2 ft5 - -#define vload0 v0 -#define vload1 v4 -#define vload2 v8 -#define vrow0 v16 -#define vrow1 v20 -#define vtmp v24 - -#define FRAMESIZE 32 - -vec_sep_conv: - addi sp, sp, -FRAMESIZE - sd s0, 0(sp) - sd s1, 8(sp) - - # load the kernel into scalar registers - flw kw0, 0(kw) - flw kw1, 4(kw) - flw kw2, 8(kw) - flw kh0, 0(kh) - flw kh1, 4(kh) - flw kh2, 8(kh) - - slli a_stride, a_stride, 2 - slli b_stride, b_stride, 2 - - mv row_check, rows - addi row_check, row_check, -2 - - 
andi rows_odd, rows, 1 - -# Prolog -loop_prolog: - mv ap, a - addi ap_4, ap, 4 - addi ap_8, ap, 8 - mv bp, b - mv row_count, row_check - - vsetvli vlen, cols, e32, m4, ta, ma - slli VLEN_stride, vlen, 2 - - # Load the first row and compute horizontal - vle32.v vload0, (ap) - vfmul.vf vrow0, vload0, kw0 - vle32.v vload1, (ap_4) - vfmacc.vf vrow0, kw1, vload1 - vle32.v vload2, (ap_8) - vfmacc.vf vrow0, kw2, vload2 - - add ap, ap, a_stride - addi ap_4, ap, 4 - addi ap_8, ap, 8 - - # Load the second row and compute horizontal - vle32.v vload0, (ap) - vfmul.vf vrow1, vload0, kw0 - vle32.v vload1, (ap_4) - vfmacc.vf vrow1, kw1, vload1 - vle32.v vload2, (ap_8) - vfmacc.vf vrow1, kw2, vload2 - - add ap, ap, a_stride - addi ap_4, ap, 4 - addi ap_8, ap, 8 - - # Begin the vertical computation with the first and second rows - vfmul.vf vrow0, vrow0, kh0 - vfmacc.vf vrow0, kh1, vrow1 - vfmul.vf vrow1, vrow1, kh0 - - # Load the third row and compute horizontal - vle32.v vload0, (ap) - vfmul.vf vtmp, vload0, kw0 - vle32.v vload1, (ap_4) - vfmacc.vf vtmp, kw1, vload1 - vle32.v vload2, (ap_8) - vfmacc.vf vtmp, kw2, vload2 - -# Main Loop -conv_loop: - - add ap, ap, a_stride - addi ap_4, ap, 4 - addi ap_8, ap, 8 - - vle32.v vload0, (ap) - vfmacc.vf vrow0, kh2, vtmp - vle32.v vload1, (ap_4) - vse32.v vrow0, (bp) - vle32.v vload2, (ap_8) - - vfmacc.vf vrow1, kh1, vtmp - vfmul.vf vrow0, vtmp, kh0 - - add ap, ap, a_stride - vfmul.vf vtmp, vload0, kw0 - addi ap_4, ap, 4 - vle32.v vload0, (ap) - vfmacc.vf vtmp, kw1, vload1 - addi ap_8, ap, 8 - vle32.v vload1, (ap_4) - vfmacc.vf vtmp, kw2, vload2 - vle32.v vload2, (ap_8) - add bp, bp, b_stride - - vfmacc.vf vrow1, kh2, vtmp - vfmacc.vf vrow0, kh1, vtmp - vse32.v vrow1, (bp) - vfmul.vf vrow1, vtmp, kh0 - - vfmul.vf vtmp, vload0, kw0 - add bp, bp, b_stride - vfmacc.vf vtmp, kw1, vload1 - addi row_count, row_count, -2 - vfmacc.vf vtmp, kw2, vload2 - - - bgtz row_count, conv_loop - -epilog: - vfmacc.vf vrow0, kh2, vtmp - vse32.v vrow0, (bp) - - 
bnez rows_odd, row_loop_complete - - vfmacc.vf vrow1, kh1, vtmp - - add ap, ap, a_stride - addi ap_4, ap, 4 - addi ap_8, ap, 8 - add bp, bp, b_stride - - vle32.v vload0, (ap) - vle32.v vload1, (ap_4) - vle32.v vload2, (ap_8) - - vfmul.vf vtmp, vload0, kw0 - vfmacc.vf vtmp, kw1, vload1 - vfmacc.vf vtmp, kw2, vload2 - - vfmacc.vf vrow1, kh2, vtmp - vse32.v vrow1, (bp) - -row_loop_complete: - add a, a, VLEN_stride - add b, b, VLEN_stride - - sub cols, cols, vlen - bnez cols, loop_prolog - -exit: - ld s0, 0(sp) - ld s1, 8(sp) - addi sp, sp, FRAMESIZE - - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/vec-sep-conv_main.c b/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/vec-sep-conv_main.c deleted file mode 100644 index 8b3ea262..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sep-conv-3/vec-sep-conv_main.c +++ /dev/null @@ -1,42 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// Separable Convolution Benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized 2D separable convolution implementation. 
- -#include "util.h" -#include -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_sep_conv(size_t, size_t, size_t, size_t, const float *, const float *, - const float *, float *); - -int main(int argc, char *argv[]) { - float results_data[O_SIZE] = {0}; - printf("2dsepconv (OH,OW,KH,KW,IH,IW) = (%ld, %ld, %ld, %ld, %ld, %ld)\n", OH, - OW, KH, KW, IH, IW); - printf("operations = %ld\n", (IW - KW + 1) * (IH - KH + 1) * (KW + KH)); -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_sep_conv(OH, OW, IW, OW, input_k1, input_k2, input_image, results_data); - memset(results_data, 0, sizeof(results_data)); -#endif - - // Do the convolution - setStats(1); - vec_sep_conv(OH, OW, IW, OW, input_k1, input_k2, input_image, results_data); - setStats(0); - - // Check the results - return verifyFloat(O_SIZE, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/dataset1.h deleted file mode 100644 index 39867b05..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/dataset1.h +++ /dev/null @@ -1,28 +0,0 @@ -#define M_DIM 6 -#define K_DIM 6 -#define N_DIM 6 - -typedef float data_t; - -static data_t a_matrix[M_DIM * K_DIM] = { - -9.0, -144.0, 8.0, 168.0, 96.0, -144.0, 52.0, -32.0, 160.0, - -6.0, -0.90625, 0.625, 3.25, 13.0, 0.0, 9.5, 104.0, -288.0, - -480.0, 480.0, -24.0, -48.0, -5.75, 40.0, -24.0, 2.875, 0.90625, - 2.0, -0.5625, -1.0, -2.625, -54.0, 176.0, -32.0, -15.5, 208.0, -}; -static data_t b_matrix[K_DIM * N_DIM] = { - -2.625, 0.75, 0.5, -432.0, 400.0, -3.75, 25.0, 29.0, -2.75, - 1.0, 368.0, -2.5, -8.0, 11.0, -2.25, 256.0, -13.0, -0.84375, - -8.0, 2.0, 0.71875, 168.0, 2.0, 64.0, 16.0, -1.375, -1.625, - -3.0, 7.0, 10.0, -0.375, 0.25, -6.0, -3.5, -4.0, 0.4375, -}; 
-static data_t verify_data[M_DIM * N_DIM] = { - -3394.375, -3926.75, 1202.25, 34232.0, -55112.0, - 12036.0, -2183.234375, 860.40234375, -252.58984375, 17456.53125, - 6923.15625, -642.7890625, 2012.46875, 183.4375, 1531.703125, - 901.0, 7983.0, 1477.3125, 13729.0, 13217.90625, - -1771.15625, 193509.25, -15344.25, -2491.75, 103.0, - 79.8671875, -13.59375, 10944.0625, -8549.71875, 203.9853515625, - -2821.109375, 377.34375, -1494.625, 40078.5, -24214.5, - -2115.65625, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/gendata.py deleted file mode 120000 index ecf1d19c..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/gendata.py +++ /dev/null @@ -1 +0,0 @@ -../vec-sgemm/gendata.py \ No newline at end of file diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/vec-sgemm.S b/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/vec-sgemm.S deleted file mode 100644 index c4270edb..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/vec-sgemm.S +++ /dev/null @@ -1,317 +0,0 @@ - .text - .balign 4 - .global vec_sgemm_nn - .type vec_sgemm_nn,@function -# -# void -# vec_sgemm_nn(size_t n, -# size_t m, -# size_t k, -# const float*a, // m * k matrix -# size_t lda, -# const float*b, // k * n matrix -# size_t ldb, -# float*c, // m * n matrix -# size_t ldc) -# -# c += a*b (alpha=1, no transpose on input matrices) -# matrices stored in C row-major order - - - -# With LMUL=4, load 4 rows of C and 4 rows of B -# Load 16 scalars from A into FP registers - -#define n a0 -#define m a1 -#define k a2 -#define ap a3 -#define astride a4 -#define bp a5 -#define bstride a6 -#define cp a7 -#define cstride t0 -#define kt t1 -#define nt t2 -#define bnp t3 -#define cnp t4 -#define akp t5 -#define bkp s0 -#define nvl s1 -#define ccp s2 -#define amp s3 - -#define a00 ft0 -#define a10 ft1 -#define a20 ft2 -#define a30 ft3 -#define a01 ft4 -#define a11 ft5 -#define a21 ft6 -#define a31 ft7 -#define a02 fa0 
-#define a12 fa1 -#define a22 fa2 -#define a32 fa3 -#define a03 fa4 -#define a13 fa5 -#define a23 fa6 -#define a33 fa7 - -#define FRAMESIZE 32 - -vec_sgemm_nn: - ld cstride, 0(sp) # Get arg from stack frame - addi sp, sp, -FRAMESIZE - sd s0, 0(sp) - sd s1, 8(sp) - sd s2, 16(sp) - sd s3, 24(sp) - - # Check for zero size matrices - beqz n, exit - beqz m, exit - beqz k, exit - - # Convert elements strides to byte strides. - slli astride, astride, 2 - slli bstride, bstride, 2 - slli cstride, cstride, 2 - - slti t6, m, 4 - bnez t6, m_remainder_m_loop - -c_row_loop: # Loop across rows of C blocks - mv nt, n # Initialize n counter for next row of C blocks - mv bnp, bp # Initialize B n-loop pointer to start - mv cnp, cp # Initialize C n-loop pointer - -c_col_loop: # Loop across columns of C - vsetvli nvl, nt, e32, m4, ta, ma # 32-bit vectors, LMUL=4 - - # Not enough remaining k elements to unroll by 4 - mv kt, k # Initialize inner loop counter - slti t6, kt, 4 - bnez t6, k_loop_remainder - - mv akp, ap # reset pointer into A to beginning - mv bkp, bnp # step to next column in B matrix - - # Initalize current C submatrix block from memory. 
- vle32.v v0, (cnp); add ccp, cnp, cstride; - flw a00, (akp); add amp, akp, astride; - flw a10, (amp); add amp, amp, astride; - vle32.v v4, (ccp); add ccp, ccp, cstride; - flw a20, (amp); add amp, amp, astride; - flw a30, (amp); add akp, akp, 4 - vle32.v v8, (ccp); add ccp, ccp, cstride; - flw a01, (akp); add amp, akp, astride; - flw a11, (amp); add amp, amp, astride; - vle32.v v12, (ccp); - flw a21, (amp); add amp, amp, astride; - flw a31, (amp); add akp, akp, 4 - - # Get vector from B matrix - vle32.v v16, (bkp); add bkp, bkp, bstride - flw a02, (akp); add amp, akp, astride; - flw a12, (amp); add amp, amp, astride; - vle32.v v20, (bkp); add bkp, bkp, bstride - flw a22, (amp); add amp, amp, astride; - flw a32, (amp); add akp, akp, 4 - vle32.v v24, (bkp); add bkp, bkp, bstride - flw a03, (akp); add amp, akp, astride; - flw a13, (amp); add amp, amp, astride; - vle32.v v28, (bkp); add bkp, bkp, bstride - flw a23, (amp); add amp, amp, astride; - flw a33, (amp); add akp, akp, 4 - - addi kt, kt, -4 - -k_loop: - # Compute current block of FMAs - vfmacc.vf v0, a00, v16 - vfmacc.vf v0, a01, v20 - vfmacc.vf v0, a02, v24 - vfmacc.vf v0, a03, v28 - - vfmacc.vf v4, a10, v16 - vfmacc.vf v4, a11, v20 - vfmacc.vf v4, a12, v24 - vfmacc.vf v4, a13, v28 - - vfmacc.vf v8, a20, v16 - vfmacc.vf v8, a21, v20 - vfmacc.vf v8, a22, v24 - vfmacc.vf v8, a23, v28 - - vfmacc.vf v12, a30, v16 - vfmacc.vf v12, a31, v20 - vfmacc.vf v12, a32, v24 - vfmacc.vf v12, a33, v28 - - addi kt, kt, -4 - blez kt, k_loop_remainder - - # Load values from A for the next iteration - flw a00, (akp); add amp, akp, astride; - flw a10, (amp); add amp, amp, astride; - flw a20, (amp); add amp, amp, astride; - flw a30, (amp); add akp, akp, 4 - flw a01, (akp); add amp, akp, astride; - flw a11, (amp); add amp, amp, astride; - flw a21, (amp); add amp, amp, astride; - flw a31, (amp); add akp, akp, 4 - flw a02, (akp); add amp, akp, astride; - flw a12, (amp); add amp, amp, astride; - flw a22, (amp); add amp, amp, astride; - 
flw a32, (amp); add akp, akp, 4 - flw a03, (akp); add amp, akp, astride; - flw a13, (amp); add amp, amp, astride; - flw a23, (amp); add amp, amp, astride; - flw a33, (amp); add akp, akp, 4 - - vle32.v v16, (bkp); add bkp, bkp, bstride - vle32.v v20, (bkp); add bkp, bkp, bstride - vle32.v v24, (bkp); add bkp, bkp, bstride - vle32.v v28, (bkp); add bkp, bkp, bstride - - j k_loop - -k_loop_remainder: - beqz kt, 1f - addi kt, kt, 4 - -k_loop_remainder_loop: - # Proceed at one k element per loop - addi kt, kt, -1 - vle32.v v16, (bkp) - flw a00, (akp); add amp, akp, astride; - flw a10, (amp); add amp, amp, astride; - flw a20, (amp); add amp, amp, astride; - flw a30, (amp) - - vfmacc.vf v0, a00, v16 - vfmacc.vf v4, a10, v16 - vfmacc.vf v8, a20, v16 - vfmacc.vf v12, a30, v16 - - addi akp, akp, 4 - add bkp, bkp, bstride - - bnez kt, k_loop_remainder_loop - -1: vse32.v v0, (cnp); add ccp, cnp, cstride; - vse32.v v4, (ccp); add ccp, ccp, cstride; - vse32.v v8, (ccp); add ccp, ccp, cstride; - vse32.v v12, (ccp) - - slli t6, nvl, 2 - add cnp, cnp, t6 - add bnp, bnp, t6 - sub nt, nt, nvl # Decrement element count in n dimension - bnez nt, c_col_loop - - # Move to the next set of rows - addi m, m, -4 - - slli t6, astride, 2 # Multiply astride by 4 - add ap, ap, t6 # Move A matrix pointer down 4 rows - slli t6, cstride, 2 # Multiply cstride by 4 - add cp, cp, t6 # Move C matrix pointer down 4 rows - - slti t6, m, 4 - beqz t6, c_row_loop - - beqz m, exit - -m_remainder_m_loop: - mv cnp, cp - mv bnp, bp - mv nt, n - -m_remainder_n_loop: - vsetvli nvl, nt, e32, m4, ta, ma # 32-bit vectors, LMUL=4 - - # Not enough remaining k elements to unroll by 4 - mv kt, k # Initialize inner loop counter - slti t6, kt, 4 - bnez t6, m_remainder_k_loop_remainder - - mv akp, ap # reset pointer into A to beginning - mv bkp, bnp # step to next column in B matrix - - vle32.v v0, (cnp) - - # Get vectors from B matrix - vle32.v v16, (bkp); add bkp, bkp, bstride - vle32.v v20, (bkp); add bkp, bkp, bstride 
- vle32.v v24, (bkp); add bkp, bkp, bstride - vle32.v v28, (bkp); add bkp, bkp, bstride - - # Inner loop scheduled assuming 4-clock occupancy of vfmacc instruction and single-issue pipeline - # Software pipeline loads - flw a00, (akp); addi akp, akp, 4 - flw a01, (akp); addi akp, akp, 4 - flw a02, (akp); addi akp, akp, 4 - flw a03, (akp); addi akp, akp, 4 - - addi kt, kt, -4 - -m_remainder_k_loop: - vfmacc.vf v0, a00, v16 - vfmacc.vf v0, a01, v20 - vfmacc.vf v0, a02, v24 - vfmacc.vf v0, a03, v28 - - addi kt, kt, -4 - blez kt, m_remainder_k_loop_remainder - - flw a00, (akp); addi akp, akp, 4 - flw a01, (akp); addi akp, akp, 4 - flw a02, (akp); addi akp, akp, 4 - flw a03, (akp); addi akp, akp, 4 - - vle32.v v16, (bkp); add bkp, bkp, bstride - vle32.v v20, (bkp); add bkp, bkp, bstride - vle32.v v24, (bkp); add bkp, bkp, bstride - vle32.v v28, (bkp); add bkp, bkp, bstride - - j m_remainder_k_loop - -m_remainder_k_loop_remainder: - addi kt, kt, 4 - beqz kt, 1f - -m_remainder_k_loop_remainder_loop: - addi kt, kt, -1 - vle32.v v16, (bkp) - flw a00, (akp) - - vfmacc.vf v0, a00, v16 - - addi akp, akp, 4 - add bkp, bkp, bstride - - bnez kt, m_remainder_k_loop_remainder_loop - -1: vse32.v v0, (cnp) - - slli t6, nvl, 2 - add cnp, cnp, t6 - add bnp, bnp, t6 - sub nt, nt, nvl - bnez nt, m_remainder_n_loop - - addi m, m, -1 - add ap, ap, astride - add cp, cp, cstride - - bnez m, m_remainder_m_loop - -exit: - ld s0, 0(sp) - ld s1, 8(sp) - ld s2, 16(sp) - ld s3, 24(sp) - addi sp, sp, FRAMESIZE - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/vec-sgemm_main.c b/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/vec-sgemm_main.c deleted file mode 100644 index e9a81ecc..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v2/vec-sgemm_main.c +++ /dev/null @@ -1,43 +0,0 @@ -// See LICENSE for license details. 
- -//************************************************************************** -// SGEMM benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized sgemm implementation. - -#include "util.h" -#include -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_sgemm_nn(size_t, size_t, size_t, const float *, size_t, const float *, - size_t, float *, size_t); - -int main(int argc, char *argv[]) { - float results_data[M_DIM * N_DIM] = {0}; - printf("sgemm M,N,K = %ld,%ld,%ld\n", M_DIM, N_DIM, K_DIM); - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_sgemm_nn(N_DIM, M_DIM, K_DIM, a_matrix, K_DIM, b_matrix, N_DIM, - results_data, N_DIM); - memset(results_data, 0, sizeof(results_data)); -#endif - - // Do the sgemm - setStats(1); - vec_sgemm_nn(N_DIM, M_DIM, K_DIM, a_matrix, K_DIM, b_matrix, N_DIM, - results_data, N_DIM); - setStats(0); - - // Check the results - return verifyFloat(M_DIM * N_DIM, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/dataset1.h deleted file mode 100644 index 37351254..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/dataset1.h +++ /dev/null @@ -1,4429 +0,0 @@ -#define M_DIM 87 -#define K_DIM 87 -#define N_DIM 87 - -typedef float data_t; - -static data_t a_matrix[M_DIM * K_DIM] = { - 48.0, 1.0, 0.75, 1.125, -12.0, 4.0, -0.75, 48.0, - -16.0, 40.0, -4.0, 4.0, 0.0, -18.0, 12.0, 1.75, - -7.0, 0.9375, -1.0, -0.75, 0.25, 60.0, 24.0, 0.4375, - 8.0, 0.625, -2.5, 30.0, 6.5, 9.0, -12.0, -2.0, - -9.0, 80.0, -5.0, -24.0, -28.0, 56.0, 3.5, -0.625, - 4.0, -1.625, -32.0, 56.0, 0.125, 2.0, 0.1875, 16.0, - -36.0, -6.0, 0.0625, -40.0, -10.0, 7.0, -40.0, 12.0, - 5.0, 4.0, 
48.0, -1.25, -64.0, 0.75, -28.0, 1.5, - -0.375, 3.25, 0.6875, 2.75, 120.0, -3.25, 2.0, 120.0, - -5.0, 72.0, 14.0, 15.0, 0.5, 3.75, -0.375, 11.0, - 0.25, 22.0, -0.25, 0.8125, 1.0, 0.75, -0.1875, -11.0, - 72.0, -0.5, -2.0, 0.75, -7.0, -104.0, -1.5, 12.0, - -3.5, 24.0, -3.75, 7.5, -16.0, 2.0, 3.0, 24.0, - 0.5, 40.0, -1.0, 0.0, 8.0, -10.0, 9.0, -8.0, - -56.0, 2.25, 2.0, -28.0, 0.125, 1.75, -0.25, 0.375, - 1.125, 10.0, -7.0, 24.0, -30.0, 1.0, -0.125, 104.0, - 16.0, 2.5, 24.0, -12.0, 1.125, 15.0, -0.625, 0.0, - -5.0, -5.0, -44.0, 48.0, 112.0, -1.125, 3.5, -104.0, - -1.25, 18.0, 3.5, 1.875, -32.0, 96.0, 0.75, 4.0, - 8.0, -1.0, 112.0, -6.0, 6.5, -1.5, 9.0, 12.0, - -8.0, 14.0, -4.5, -104.0, 11.0, 48.0, -22.0, -72.0, - 2.5, -1.25, 24.0, 1.0, -5.5, -56.0, 12.0, 6.5, - 12.0, -1.0, 0.4375, 0.0, -3.75, 48.0, -0.75, -1.25, - -0.1875, 48.0, 1.5, -14.0, -2.0, -32.0, -0.1875, 1.5, - 0.125, -52.0, -8.0, 40.0, -2.0, -16.0, -3.5, -28.0, - -1.25, -5.0, -112.0, 104.0, -0.5, -0.9375, 0.5, 60.0, - 6.0, 18.0, -7.0, -24.0, 1.5, -15.0, -24.0, -6.0, - -0.25, -2.0, 0.0, -22.0, 6.0, 3.5, 14.0, 7.0, - 2.0, 0.375, -80.0, 5.5, -1.75, 24.0, 10.0, -36.0, - -8.0, 104.0, -15.0, 0.9375, 64.0, 6.0, 0.0625, 1.0, - -48.0, 48.0, 30.0, -1.0, -3.5, -64.0, 6.0, -0.875, - 0.875, 8.0, -8.0, 5.0, 12.0, -80.0, 1.0, 2.0, - -0.5, 40.0, 0.0625, 0.625, -64.0, -24.0, -1.0, -4.0, - -0.25, -4.5, 22.0, 0.375, -2.0, 1.75, 0.75, 0.125, - 1.5, -3.0, -30.0, 5.0, -9.0, -3.0, -1.0, 0.25, - -0.0625, -0.3125, -2.0, 16.0, -0.5, -0.875, -0.4375, -1.75, - 1.0, -0.8125, 26.0, 20.0, -4.5, -48.0, 1.375, -1.5, - 24.0, 2.5, 56.0, -2.0, 1.25, 60.0, -2.0, 12.0, - -4.0, -0.625, 24.0, 4.0, -3.0, -1.0, -56.0, -48.0, - -2.0, 48.0, 36.0, 9.0, 15.0, -12.0, -7.0, 0.5, - 56.0, 56.0, 0.5625, 0.5, -1.5, -10.0, 44.0, -2.25, - 1.375, -56.0, -0.8125, 2.75, 72.0, -0.25, 56.0, 52.0, - 6.0, 20.0, 3.0, -0.625, 5.5, 1.25, 1.125, 0.0, - -2.0, -0.375, 20.0, 1.25, -80.0, -28.0, 0.3125, 88.0, - -4.5, -64.0, 4.0, -2.5, -20.0, 0.0, 48.0, 0.375, - 16.0, 1.5, -2.5, 
88.0, 4.0, -72.0, 12.0, 72.0, - -112.0, -4.0, -14.0, -22.0, -22.0, 0.0, -1.125, -1.25, - 0.0, -14.0, 8.0, 80.0, 4.0, 6.0, 0.75, -32.0, - -1.875, 2.75, 24.0, -11.0, 0.75, 0.4375, -3.5, -0.6875, - 5.0, -0.5, -12.0, 12.0, -0.4375, -8.0, 8.0, -0.875, - 112.0, 6.5, -56.0, -32.0, -0.375, -0.4375, 0.0, 1.375, - -3.0, -4.0, -20.0, 0.75, -20.0, 28.0, 6.0, -1.0, - -56.0, 30.0, 104.0, 7.5, -6.0, 5.0, -16.0, -0.5625, - -11.0, -32.0, -2.0, -0.8125, 24.0, 0.4375, -64.0, 6.0, - -0.9375, 14.0, -4.5, 24.0, 4.0, 5.0, 4.0, -11.0, - 0.375, -0.125, 64.0, -0.5625, 2.0, -0.5, -6.5, 0.0, - -10.0, -15.0, -15.0, 28.0, -12.0, -24.0, 64.0, 1.0, - -0.6875, 2.0, -56.0, -32.0, -0.6875, 0.0, 28.0, -6.0, - -20.0, 1.5, -40.0, -1.5, 5.5, -3.25, -2.0, -0.875, - 8.0, 24.0, 88.0, 104.0, 8.0, 16.0, -3.5, 0.375, - 0.0, -72.0, -22.0, -1.75, -8.0, -20.0, 1.0, 7.0, - 15.0, 88.0, 0.5, 12.0, 16.0, -1.5, 6.0, -0.4375, - -3.75, 56.0, 0.0, -1.5, 1.5, 13.0, -0.5625, -0.375, - -3.0, 3.5, 3.0, 20.0, 0.4375, -72.0, 6.5, 24.0, - -1.5, 6.0, -3.5, 6.5, -80.0, 1.375, -32.0, -0.5, - -26.0, -24.0, -8.0, 0.75, -40.0, 2.0, 4.0, 13.0, - -1.5, 0.4375, 0.0, -0.125, -0.5, -2.75, 64.0, 0.0, - 0.0, -20.0, 1.75, -48.0, -8.0, -8.0, -4.0, 24.0, - -6.5, -14.0, 0.0, 1.0, 1.125, -0.8125, -0.75, 0.375, - 6.0, 32.0, 32.0, -0.9375, 64.0, 0.5, 10.0, -24.0, - 6.0, -8.0, -16.0, -28.0, 4.5, -1.0, -0.5, 0.875, - -1.0, 0.75, -4.0, 32.0, -6.0, 7.0, -0.375, -0.1875, - 15.0, 2.5, -16.0, 48.0, -7.0, 80.0, 6.5, -0.75, - -26.0, 0.125, 0.625, 11.0, 0.75, 3.25, 0.0, -7.0, - 2.0, -12.0, -12.0, -128.0, 0.0, -4.5, 0.0, -20.0, - 7.0, -0.0625, 10.0, -0.75, 6.0, 14.0, 3.25, -10.0, - -0.625, -0.125, 44.0, -1.0, 48.0, -4.0, -1.25, -0.75, - 1.0, -32.0, 1.5, 1.375, 0.5, 30.0, -20.0, 1.5, - 1.25, 6.5, -60.0, 1.625, -10.0, -8.0, 88.0, -24.0, - -0.5625, 0.3125, -1.5, -16.0, -4.0, -0.25, 2.5, -4.0, - 0.625, -0.625, 104.0, 0.0, -1.0, 5.0, -3.5, 4.0, - 2.0, -7.5, -4.0, 1.5, -20.0, 1.5, 8.0, 1.5, - -112.0, -0.0625, -0.875, 2.75, 36.0, 1.875, 0.5, 0.0, - 3.25, 10.0, 
-48.0, 0.5625, -3.0, 104.0, -0.875, -48.0, - 0.6875, -4.0, -0.5625, 4.0, 1.75, 1.625, 12.0, 6.0, - 0.25, -10.0, -40.0, 1.5, 32.0, -0.9375, 88.0, 3.25, - 3.5, 3.25, -22.0, 20.0, 12.0, 10.0, 3.0, -1.375, - 3.75, -1.75, 120.0, -0.1875, 1.75, 3.75, -0.75, -16.0, - 24.0, -5.0, -5.0, -2.25, 1.0, -0.5, 7.5, 0.0, - 4.5, 0.3125, -1.625, 3.5, -28.0, -9.0, -9.0, 96.0, - 6.0, 3.0, -0.25, 3.5, -1.25, 44.0, -1.5, -3.5, - -0.125, 8.0, -2.0, -60.0, -8.0, 5.0, -0.3125, 0.625, - -7.0, -0.0625, 0.0, -0.5, 96.0, 56.0, -13.0, -1.375, - -1.375, 24.0, 0.875, 1.25, 72.0, -7.0, -10.0, -11.0, - 5.5, 28.0, 6.5, 0.75, 0.1875, 24.0, -5.0, 20.0, - 3.75, -3.25, -0.8125, -104.0, 6.0, -7.0, -2.25, 0.1875, - 96.0, -4.0, 8.0, -13.0, -28.0, -12.0, 0.375, 4.0, - -1.25, -56.0, -0.5625, -48.0, -32.0, 2.5, 0.5, -2.75, - -60.0, 6.5, 36.0, -112.0, 1.0, 2.25, 24.0, 2.5, - -0.0625, 2.5, 12.0, -1.75, 16.0, 0.0, 0.6875, -8.0, - 8.0, -26.0, -24.0, 3.5, 1.75, -26.0, 88.0, -22.0, - 9.0, 7.5, 1.5, -48.0, -7.0, -1.25, 16.0, -80.0, - 0.5, 1.5, 0.5, -3.0, 26.0, 1.0, 80.0, -3.75, - -64.0, -0.625, -104.0, 8.0, 0.0625, -0.125, 2.0, -0.75, - 2.5, -0.375, 5.0, -1.5, -12.0, -8.0, 52.0, -7.5, - 80.0, -30.0, 0.0, -0.9375, 1.875, -88.0, 13.0, 64.0, - -0.875, -56.0, 3.0, -32.0, 0.0625, 2.0, 8.0, 8.0, - -4.0, 16.0, -0.5625, -7.0, 40.0, 12.0, 1.875, 56.0, - -16.0, -8.0, -0.0625, 1.0, 3.0, 28.0, -4.0, -5.5, - -4.0, -24.0, -5.5, -120.0, -0.9375, 0.0, 20.0, 0.0, - -72.0, 4.5, -2.0, -64.0, -24.0, 0.25, 2.0, 12.0, - 0.0, -3.0, -0.0625, 3.25, 48.0, 0.75, 0.0, 4.0, - 1.0, 80.0, -1.0, -24.0, 0.1875, 0.375, -2.5, 104.0, - -3.5, -0.5, 1.75, 24.0, 2.75, -5.0, -1.0, 0.5, - -8.0, 4.0, -96.0, 2.0, -10.0, 0.3125, -1.0, 24.0, - -24.0, 64.0, -11.0, 6.0, -1.0, 64.0, -12.0, -7.0, - 52.0, -14.0, 120.0, 20.0, -12.0, 1.5, -0.1875, 52.0, - -120.0, 24.0, -0.75, -20.0, -12.0, -0.875, 40.0, -0.4375, - 10.0, 1.25, 7.0, 5.0, 0.0, -1.0, -1.875, -24.0, - 12.0, 0.75, 1.75, 6.0, 0.75, 0.1875, 0.0, 36.0, - 15.0, 0.0, -12.0, 0.875, 1.375, -72.0, -12.0, 3.0, - 
0.75, 0.3125, 0.0, -4.0, -4.0, 3.0, -3.0, -8.0, - -3.5, 2.0, 4.0, 9.0, -20.0, 7.0, 32.0, -6.5, - 1.375, -0.5, -1.25, 0.6875, 20.0, -1.25, -0.5, 1.75, - -1.5, -0.5, -0.125, -24.0, 8.0, 3.5, 1.5, 3.0, - -64.0, 0.375, -40.0, 3.0, 2.5, -1.0, 0.5625, 5.0, - -0.125, 0.75, -0.5, -1.625, -2.5, 3.0, -1.5, -12.0, - 10.0, 1.5, 2.5, 2.0, 20.0, 0.375, 0.25, -11.0, - 0.0, 3.0, -0.5, 28.0, 36.0, -26.0, 72.0, -10.0, - -8.0, 28.0, -1.0, 3.25, -7.0, -0.5625, 2.0, 48.0, - -24.0, 12.0, -0.125, 0.25, 0.25, 6.5, -0.75, 2.0, - -112.0, 0.875, 56.0, 3.25, 40.0, -20.0, -11.0, 1.25, - 16.0, 1.5, 14.0, 1.75, -36.0, -2.0, 4.5, -18.0, - -16.0, -6.5, 0.375, 22.0, -6.0, 0.25, 52.0, -1.625, - -112.0, 24.0, -2.0, 48.0, 3.75, 2.0, -3.5, -8.0, - 0.4375, -0.875, 60.0, -60.0, -1.5, 120.0, 0.1875, 3.5, - 0.875, -2.5, 36.0, -4.0, -30.0, 1.5, 0.125, -8.0, - -80.0, -16.0, 1.375, -15.0, 13.0, 3.25, -5.0, -0.875, - 64.0, 6.0, 2.0, 1.625, 1.0, -20.0, 32.0, -0.25, - 28.0, -5.5, -12.0, 0.5, -24.0, 28.0, 0.0, 1.5, - 0.5, 0.5, -0.5, -2.0, -30.0, 120.0, -2.5, 0.3125, - 24.0, -1.0, -8.0, -4.0, -0.3125, -0.75, -48.0, -1.5, - 1.125, -2.75, -15.0, 1.5, -0.875, -0.5, 0.5, 0.0, - 8.0, 1.375, -0.3125, 9.0, -6.0, -1.0, 2.75, -0.375, - 2.5, 3.0, 44.0, 0.3125, -5.5, -5.5, -8.0, 5.0, - -1.5, 0.75, 2.5, 12.0, -12.0, -1.0, -64.0, -8.0, - -4.0, -3.25, -16.0, -12.0, 112.0, 1.625, 1.75, 8.0, - 1.5, 22.0, 8.0, 13.0, 1.25, 10.0, -0.1875, 20.0, - 0.6875, 0.9375, 0.0, 7.5, 28.0, -0.5, -18.0, 5.0, - 24.0, -3.0, 0.125, 40.0, 3.75, -0.75, -48.0, -11.0, - -2.5, 16.0, 16.0, 3.75, -20.0, -28.0, 0.125, 6.5, - 0.5, 3.25, 0.875, 0.0, 0.125, 12.0, 2.0, 88.0, - -1.75, -8.0, -8.0, 0.0, 88.0, 88.0, -8.0, -24.0, - 3.25, 80.0, -3.0, 16.0, -0.5625, 8.0, -0.8125, 8.0, - -1.25, 4.0, 16.0, 0.0, 56.0, -16.0, -20.0, -26.0, - -8.0, -0.125, 0.375, 2.0, 3.5, 8.0, -16.0, -5.0, - 0.25, -56.0, 15.0, 28.0, 0.0625, -1.0, -32.0, 44.0, - 0.25, -1.0, 0.0, 1.25, 60.0, 0.0, 1.375, 30.0, - -6.0, -6.5, 2.5, -52.0, 0.4375, -16.0, 1.5, -60.0, - 0.375, -48.0, -14.0, 
-1.75, -24.0, -0.875, -32.0, -0.75, - -0.6875, -44.0, 12.0, 120.0, 1.0, 18.0, -4.5, -2.5, - -8.0, -120.0, 2.0, -16.0, 0.1875, -1.25, -1.5, 2.75, - -60.0, 10.0, 1.0, 3.5, -9.0, -0.125, -56.0, 1.25, - 7.5, 52.0, -0.75, -0.5, 6.0, -2.0, -4.0, -5.5, - 0.875, 28.0, -28.0, 3.5, 24.0, 0.0, 0.125, 20.0, - 0.625, -3.0, -16.0, 0.8125, 14.0, 44.0, -2.0, -52.0, - 0.0625, -1.25, -1.25, 2.0, 24.0, -6.0, 0.0, 0.75, - -0.25, 0.0, -1.0, -1.875, -0.25, 5.5, 1.5, 0.0625, - -8.0, 7.5, 80.0, -9.0, 0.625, 4.0, 0.6875, 1.625, - 0.5, 13.0, 0.0625, 4.5, 40.0, 64.0, -6.0, 48.0, - 14.0, 0.0, -0.875, -0.5, -52.0, -2.75, 32.0, 4.5, - -10.0, 0.125, 8.0, -24.0, -3.75, -22.0, 2.25, 3.25, - 0.375, 16.0, -1.0, 6.0, 16.0, 16.0, -20.0, 0.25, - 8.0, -128.0, -26.0, -20.0, 56.0, -32.0, -6.0, -7.0, - 28.0, -1.5, -7.0, -64.0, 8.0, 16.0, 32.0, 1.625, - 2.0, -0.625, -12.0, 0.75, 12.0, -24.0, 3.5, 0.5, - -28.0, -0.875, 0.0, -1.0, -4.0, -8.0, -0.5625, 15.0, - 0.6875, -6.5, -6.0, -1.0, -8.0, -0.4375, -16.0, 1.0, - -0.375, -6.0, 15.0, 4.0, -1.0, -12.0, -56.0, -4.5, - -112.0, 0.5, 7.5, 0.4375, 0.3125, 4.5, -1.0, -44.0, - -52.0, 0.25, 9.0, 12.0, 6.0, 1.625, 48.0, -8.0, - 3.5, -14.0, -30.0, -1.75, 12.0, 1.0, 20.0, 0.25, - 1.0, 1.0, 0.0, 2.0, -6.0, 32.0, 0.625, 3.75, - -0.5625, 8.0, -2.0, -0.875, 14.0, 24.0, -8.0, 0.9375, - 5.5, 0.75, 0.0, 28.0, 56.0, 1.0, -4.0, 56.0, - 0.5, 10.0, 0.75, -4.0, -1.5, -2.75, 14.0, 0.25, - 3.25, 56.0, 5.0, 8.0, -15.0, 12.0, -0.75, 0.0, - -8.0, 0.0, -0.8125, 4.0, 3.5, 24.0, -64.0, 1.0, - -26.0, 22.0, 1.375, 24.0, 44.0, -24.0, 16.0, 2.75, - -4.0, 1.5, 0.5, -8.0, 1.5, -32.0, 0.25, -1.25, - 0.25, 0.375, 18.0, 2.25, -44.0, -6.0, 3.75, 0.5, - -2.0, -6.0, 2.75, -0.6875, 6.5, 1.125, -18.0, 1.75, - 0.9375, -44.0, -3.0, -8.0, 28.0, 1.125, 1.5, 0.0, - 26.0, -104.0, -0.5, 0.25, -3.0, -3.0, 0.375, -4.5, - 3.5, 1.125, 88.0, -0.5, -2.0, 1.0, -14.0, -2.25, - -0.0625, 104.0, -1.875, 1.0, 56.0, -0.75, -13.0, -36.0, - 2.0, -32.0, -0.125, 36.0, 44.0, 5.0, -18.0, 4.0, - -24.0, -20.0, -5.0, 2.0, 10.0, 
-3.25, -1.5, 28.0, - 1.125, -4.0, -6.5, 0.25, 0.5, -3.0, -0.5, 1.0, - 30.0, -32.0, 0.6875, -6.0, 2.0, -18.0, 20.0, 0.875, - -64.0, 32.0, -2.75, 0.0625, 3.5, 1.75, 3.25, -1.25, - -0.375, -128.0, -120.0, -0.25, 0.8125, 1.75, 0.8125, 7.0, - -56.0, -1.125, -4.0, 60.0, 15.0, 0.5625, -28.0, -1.375, - 0.9375, -24.0, 3.5, 16.0, -28.0, 0.25, -28.0, 104.0, - 4.0, -1.0, 32.0, -5.0, -0.875, -3.25, 2.0, 60.0, - -2.0, 3.0, 0.0, -3.25, 2.5, -88.0, -1.375, -56.0, - -0.25, 30.0, -16.0, 0.25, -0.5, -2.0, 0.0, -3.75, - 0.3125, -1.5, 0.875, -1.625, 64.0, 16.0, -0.25, 0.5, - 0.5625, 9.0, 0.25, -120.0, 64.0, -6.5, -3.0, 3.75, - -26.0, -16.0, 2.25, -80.0, 10.0, -96.0, -2.5, 0.4375, - -1.375, 1.5, 0.1875, 80.0, 0.0, -0.625, -10.0, 3.25, - -7.5, 88.0, -4.0, -6.0, -8.0, 24.0, -8.0, -26.0, - -1.0, 0.875, 8.0, -0.375, -0.625, 0.75, -1.5, -4.0, - 9.0, 0.9375, -8.0, 1.5, -32.0, 112.0, 0.75, -16.0, - -5.5, -1.875, -2.25, -13.0, -6.0, -15.0, 4.0, -14.0, - 2.5, 0.125, -6.0, 1.0, -1.625, -6.0, -3.75, 9.0, - -1.5, -24.0, -6.0, 2.0, -44.0, -64.0, 0.375, 0.0, - 7.0, 56.0, -14.0, 0.625, -2.0, -1.0, 16.0, -3.0, - 18.0, -48.0, -0.75, 0.5, 0.125, -8.0, -13.0, -12.0, - 0.125, 1.0, -10.0, 1.375, -20.0, -40.0, 14.0, 40.0, - 1.5, -13.0, 28.0, -4.0, 20.0, -0.375, 0.875, 1.875, - -2.0, 6.5, 4.0, -12.0, -4.0, -0.875, 12.0, 1.0, - 1.75, 6.0, 52.0, 0.25, 4.0, -7.0, -7.5, 60.0, - -0.1875, -4.0, 52.0, -3.5, 2.0, -16.0, 80.0, -64.0, - 13.0, -120.0, 0.375, -0.25, 24.0, 16.0, -0.25, 1.25, - -2.0, 1.25, -0.875, 6.5, 0.0, 6.0, -16.0, -0.75, - -0.8125, -20.0, 56.0, 0.375, 0.9375, 15.0, -0.625, 0.0, - -60.0, 1.25, 0.625, 0.5, 0.25, -6.5, 0.0, -1.0, - -1.875, 30.0, -7.0, 28.0, -16.0, -14.0, 1.25, -20.0, - 24.0, -1.5, 0.0, 60.0, 0.4375, 36.0, 1.375, 0.0, - -3.75, 0.625, -6.0, 0.0, 0.25, 1.0, -2.75, 8.0, - -1.5, -20.0, 52.0, -1.5, 7.0, -8.0, 1.5, -0.3125, - 5.0, -44.0, 4.0, 2.0, -112.0, 2.0, -40.0, -96.0, - -6.0, -6.0, 2.25, 96.0, 0.0, -2.0, 48.0, 16.0, - -3.75, -36.0, 13.0, -2.0, -0.6875, 8.0, -4.0, 48.0, - 12.0, 5.0, -0.75, 
-2.0, -112.0, -64.0, 3.75, -8.0, - -64.0, 1.125, 7.0, 104.0, 5.0, 15.0, -20.0, 1.375, - 2.75, -2.0, 24.0, 5.0, -80.0, -22.0, -0.25, -30.0, - -40.0, -0.5, -0.1875, -1.25, 0.5, -0.8125, -6.0, -0.625, - -6.5, 2.5, -2.0, -1.0, 0.4375, 0.125, -0.5, -0.5, - -1.5, 8.0, -16.0, -8.0, 40.0, -0.4375, 3.25, -0.3125, - 0.5, 2.0, 10.0, -4.0, 4.5, -48.0, -0.6875, -11.0, - -0.75, 5.5, -6.0, -1.5, 7.0, 96.0, -5.0, 1.25, - 0.4375, -5.0, -2.0, 0.75, -80.0, 3.5, 7.5, -60.0, - 6.0, -0.5, -1.5, 40.0, -6.0, 0.1875, 0.0, -12.0, - 7.5, 0.0, 0.0, 0.9375, -4.0, -48.0, 0.0, -0.3125, - 7.5, 12.0, -1.5, 5.0, 1.75, 4.5, 1.5, -4.0, - 0.0, 44.0, 0.8125, -8.0, 80.0, 96.0, -112.0, -0.6875, - 0.875, -13.0, -13.0, -2.75, -3.0, -0.5, 44.0, 6.0, - 48.0, -40.0, 48.0, 1.125, 120.0, 0.1875, -0.875, -4.0, - -0.8125, -0.5, 64.0, 10.0, 1.625, -1.0, 4.5, 40.0, - 5.0, 12.0, 72.0, 4.0, 24.0, -32.0, -3.0, -0.75, - -28.0, 48.0, -128.0, -0.4375, -52.0, -0.875, 0.0, -0.125, - 0.4375, 0.125, 2.0, 60.0, -0.4375, -0.75, -56.0, -30.0, - 8.0, -12.0, -22.0, -2.0, -0.375, -16.0, -24.0, -2.0, - -0.5, -1.0, 15.0, -10.0, 0.0, 0.0, -14.0, 4.0, - 7.5, -0.25, 44.0, -0.5, 1.0, 52.0, 6.0, -12.0, - 0.75, -14.0, 22.0, -8.0, 12.0, -2.25, 0.5, 0.6875, - -3.0, -1.0, -4.0, 28.0, 5.0, -7.0, -128.0, 12.0, - 12.0, -4.0, -1.0, 2.0, 16.0, -8.0, 7.5, -36.0, - -7.5, -8.0, 6.0, 0.375, -1.75, -32.0, 26.0, 16.0, - -80.0, -1.75, -1.25, -56.0, 5.0, 26.0, -24.0, 24.0, - -64.0, 4.0, -20.0, 0.8125, -0.8125, 0.375, 0.625, 20.0, - -0.375, -0.9375, -4.0, 7.0, 11.0, 16.0, 4.0, -13.0, - -10.0, -0.5, 3.0, 8.0, 40.0, 8.0, 6.0, 48.0, - -26.0, -8.0, 1.625, 20.0, 0.8125, -6.0, -5.0, 60.0, - -14.0, 1.0, 28.0, 12.0, -0.25, -6.0, 8.0, 1.25, - 9.0, 12.0, -1.75, 104.0, -1.75, -4.0, -16.0, -5.0, - -2.0, -2.0, -0.75, 1.5, -32.0, 0.25, -0.1875, -8.0, - 1.375, -0.75, -0.6875, 2.25, -10.0, -3.75, -4.0, 20.0, - 7.0, 14.0, 0.25, 0.25, 0.75, 2.0, -32.0, -7.0, - 0.25, -0.6875, -15.0, 3.0, 2.0, -0.375, 36.0, 6.0, - -0.3125, 12.0, -3.25, 112.0, 5.0, -1.5, 4.0, -0.75, - 3.25, 
-104.0, -7.0, 44.0, -0.5, -0.3125, -0.6875, 12.0, - -8.0, -0.75, -2.0, 30.0, 32.0, -3.5, -104.0, -10.0, - 0.4375, -8.0, 4.0, -0.75, -13.0, 11.0, 3.0, -15.0, - -3.5, -128.0, -1.25, -28.0, -44.0, -1.0, 1.0, 1.375, - 48.0, 1.75, 1.5, 0.0, 104.0, 0.0, -15.0, 56.0, - -0.5625, 24.0, 0.625, -4.0, -15.0, -18.0, -1.25, 3.75, - 0.5, 1.5, -2.0, -24.0, 16.0, 24.0, 0.0625, -0.9375, - 60.0, -1.5, 7.5, -0.3125, 22.0, -0.5625, 0.0, 0.5, - 0.375, -24.0, 0.5, -0.875, 1.75, -0.5, 120.0, 1.375, - 1.0, -6.0, -22.0, 8.0, -10.0, 112.0, -2.0, 1.125, - -5.0, -22.0, 3.0, -4.0, 3.0, 7.0, 0.8125, 1.0, - -12.0, -8.0, -56.0, -0.75, 4.0, -0.375, 18.0, 2.25, - -3.5, -0.875, 7.5, -0.875, -44.0, 1.25, 120.0, 8.0, - 0.375, 52.0, -1.375, 18.0, -0.4375, 20.0, -6.5, -18.0, - -8.0, -2.0, 72.0, -6.0, -14.0, -6.0, -24.0, 5.5, - 3.0, 11.0, 8.0, 4.0, 2.25, -24.0, 0.5, 20.0, - 4.5, 0.875, 0.125, -48.0, -9.0, -6.0, -12.0, -3.5, - 4.5, -2.0, -0.625, 0.0625, -20.0, 4.0, 0.25, -1.75, - 112.0, 0.0, -5.0, 48.0, 48.0, 56.0, 6.5, -1.375, - -72.0, 88.0, -8.0, 1.25, -1.875, -1.25, -1.0, 30.0, - 56.0, -2.5, 56.0, 2.75, -0.75, -2.0, 0.0, -3.0, - -0.25, 12.0, 7.0, 1.5, -1.875, -10.0, -3.0, 7.0, - -3.0, 1.75, -0.375, 12.0, 60.0, -0.75, -1.5, 2.0, - 3.0, 6.5, 1.125, -0.625, 1.75, 20.0, 0.1875, 0.5, - 14.0, -4.5, 22.0, -3.0, 24.0, 0.0, -6.0, -0.1875, - 52.0, -32.0, -80.0, -0.5625, 1.5, -0.5, 0.75, 0.25, - 1.0, -112.0, 6.0, 7.5, 4.0, -4.0, -10.0, 12.0, - 3.5, 2.25, -1.75, -1.0, -26.0, -1.25, 7.5, 64.0, - 9.0, 0.5625, -24.0, -6.0, -24.0, 5.0, -7.0, 0.0, - -5.0, 0.0, 2.5, 80.0, -18.0, -8.0, 9.0, -16.0, - 32.0, -0.375, -2.0, 0.75, 1.25, -0.625, -1.0, -1.5, - -6.5, 1.125, -12.0, -20.0, -1.0, -20.0, -0.75, 12.0, - -9.0, -32.0, -64.0, 16.0, 2.5, 4.0, 36.0, 1.0, - 0.375, -4.0, 32.0, -2.25, -5.5, 0.5, 14.0, 48.0, - -5.0, 2.0, -2.0, 28.0, -0.75, -0.75, -1.5, -96.0, - -30.0, -2.0, 0.25, -40.0, -0.25, 6.0, -36.0, 0.75, - -8.0, -6.5, -12.0, -40.0, -0.5, 60.0, -0.5, 32.0, - -1.0, 3.0, -0.8125, -9.0, 40.0, -0.25, 16.0, -10.0, - -24.0, 
112.0, 1.25, 16.0, -3.0, -1.0, -13.0, 40.0, - -24.0, 0.875, -6.0, 96.0, -16.0, 16.0, -2.25, 0.5, - -6.5, 20.0, -16.0, -2.75, -28.0, 6.5, -1.25, -1.5, - -1.25, -10.0, 4.0, -1.75, 1.25, 0.875, 8.0, 1.375, - -0.3125, 8.0, 28.0, -80.0, 20.0, 1.125, -0.25, -0.625, - 32.0, 13.0, -1.25, 1.375, -2.0, 0.5, 1.75, -3.0, - 24.0, -3.25, -1.0, 0.0, 0.4375, -14.0, -2.0, -0.625, - 0.5, -0.5, -0.25, 96.0, -1.125, 0.0, -1.5, 14.0, - 1.0, -0.5, -8.0, -52.0, 0.0, 12.0, -3.75, 3.5, - -7.0, -1.0, -1.75, 16.0, 24.0, -1.0, -5.0, -10.0, - 2.0, -112.0, 5.5, -28.0, -3.25, 4.5, -7.0, 0.75, - 40.0, 8.0, 56.0, 6.0, -12.0, 0.25, -1.0, 0.0625, - -3.0, 8.0, 7.0, -5.0, 3.25, 48.0, -1.25, 16.0, - 28.0, -22.0, -0.5625, -0.75, -1.0, -0.75, -40.0, -4.0, - 16.0, 1.25, 16.0, -24.0, -4.5, 32.0, -32.0, -16.0, - -56.0, -32.0, 52.0, 0.0, -16.0, -0.5625, -14.0, -40.0, - -0.25, -3.0, -5.0, -0.75, -1.75, -4.0, -26.0, -8.0, - -8.0, 0.8125, 8.0, 13.0, 1.0, 0.5, 0.5, 8.0, - -28.0, 112.0, -24.0, 88.0, -1.75, 0.0, 0.6875, 1.625, - -12.0, 36.0, 3.0, -52.0, -2.0, 2.5, 12.0, -80.0, - -0.875, 0.5, 7.5, -2.25, -48.0, -0.25, 0.0, 0.75, - 0.25, -44.0, -0.3125, -32.0, -5.0, -7.0, 0.0625, 7.0, - -88.0, 0.75, 112.0, -9.0, -6.5, 5.0, -16.0, -0.25, - -16.0, 15.0, 7.5, 0.75, 2.0, -0.4375, 0.375, -2.0, - -6.5, -48.0, 14.0, -0.625, -0.25, -6.0, 0.625, 0.875, - 28.0, 4.0, 72.0, 1.875, 3.0, 0.75, 2.0, -4.0, - 6.0, -1.25, -24.0, 1.5, -1.25, 0.5, 6.0, 0.875, - 0.0, 8.0, 1.0, 0.25, -64.0, -3.5, 0.25, 5.0, - 16.0, -3.0, 0.25, 0.6875, -1.75, -128.0, -8.0, 4.0, - 44.0, 32.0, 8.0, 0.375, 0.75, 7.0, -26.0, 2.0, - -3.75, -16.0, 8.0, -0.5625, -8.0, 1.0, 0.5625, 16.0, - -1.125, 30.0, 24.0, 0.75, -13.0, -5.0, 1.25, 8.0, - -1.5, 0.375, -0.75, 0.5, 16.0, 0.0, -2.0, 32.0, - 0.3125, -2.0, -13.0, 7.5, 18.0, 28.0, -12.0, -0.75, - 12.0, 16.0, 0.3125, 80.0, -36.0, -4.0, -3.0, -32.0, - 0.375, 3.25, -48.0, -16.0, 0.125, -32.0, -1.75, -14.0, - -4.0, -1.25, -2.25, -40.0, 6.0, -12.0, 1.0, 36.0, - -4.5, -8.0, -0.5, 0.1875, -12.0, -3.0, 48.0, -32.0, - 64.0, 
4.5, -10.0, -120.0, 1.5, -48.0, 2.75, -40.0, - -13.0, 0.375, -6.0, -64.0, -4.0, -112.0, -2.0, 0.5, - -2.75, -36.0, -2.5, 3.0, -24.0, -8.0, -0.375, -32.0, - -2.0, -0.1875, -1.0, -0.5, 5.0, 0.6875, 13.0, -0.75, - 3.75, 0.3125, -24.0, -3.0, -64.0, 1.5, 1.125, 1.25, - 1.125, -1.5, 20.0, 40.0, -4.5, -16.0, -16.0, -2.5, - 7.5, -112.0, -12.0, 7.0, -24.0, 56.0, 12.0, 1.0, - -14.0, -2.0, -1.5, -6.5, -0.3125, -1.5, -56.0, 16.0, - 8.0, 16.0, -0.3125, -1.25, 8.0, -20.0, -0.625, 14.0, - -0.875, 0.125, -0.75, -0.25, -32.0, -0.5, -2.0, -4.5, - 10.0, 48.0, -32.0, -88.0, -1.875, -3.0, 1.5, 2.5, - -0.5, -88.0, 1.75, 0.3125, 56.0, 3.0, -120.0, 13.0, - 12.0, -2.5, -11.0, -56.0, 3.5, 0.0625, -10.0, -112.0, - -22.0, 64.0, -2.5, -3.5, -112.0, 112.0, -24.0, 4.0, - -10.0, -7.0, -6.5, 8.0, 0.125, -32.0, 30.0, -0.0625, - 0.75, -1.875, -0.9375, 0.75, 24.0, -32.0, 0.125, -2.75, - -3.75, -2.5, -1.375, 0.625, 6.0, 0.9375, 11.0, 0.375, - 96.0, -0.9375, 1.625, -3.25, -8.0, 18.0, 3.0, -0.9375, - 1.25, -16.0, 56.0, 20.0, -1.625, 3.75, -72.0, 9.0, - -0.5, 14.0, -4.0, 2.0, 96.0, 16.0, 12.0, -0.3125, - -0.75, 12.0, -2.0, 52.0, 6.0, -52.0, -5.0, 4.0, - 2.5, -3.5, -4.5, -22.0, -7.0, -4.0, 8.0, 0.5, - 0.375, 0.5, 80.0, -48.0, 40.0, 0.75, -40.0, 0.875, - -0.9375, -40.0, 0.5, 1.25, -128.0, -13.0, 6.0, 36.0, - -1.5, -1.0, 0.5, -0.375, 13.0, -28.0, -1.0, -2.0, - -0.1875, 4.5, 4.0, 40.0, -24.0, -88.0, 14.0, 14.0, - -104.0, -4.0, 26.0, 4.0, -0.625, -2.0, -4.0, 48.0, - -28.0, -0.375, 0.1875, 0.3125, 120.0, 10.0, 1.0, 4.0, - 0.0, -1.125, 56.0, 1.25, 1.25, -16.0, 14.0, -0.9375, - 6.0, -0.4375, -120.0, -0.25, -26.0, -0.875, -0.9375, -1.375, - 1.0, 14.0, -88.0, -7.5, -15.0, 2.25, -22.0, 88.0, - -3.5, -0.5, -0.875, 12.0, 7.0, 112.0, -96.0, -2.0, - -104.0, 8.0, 40.0, 0.875, 0.25, 0.125, 1.75, -44.0, - 0.6875, 28.0, 2.75, 5.0, -3.5, -2.0, 40.0, -3.0, - -1.0, -0.4375, -12.0, -3.25, 48.0, 8.0, -3.0, 0.625, - 96.0, -1.25, 3.0, -7.0, 3.0, 2.5, 8.0, 9.0, - 6.0, -14.0, -0.25, -88.0, -24.0, -2.75, -20.0, 0.625, - -1.75, 0.75, 
-7.5, -1.625, 3.5, -0.625, -2.0, 0.0, - 0.25, 40.0, -1.5, -0.75, 120.0, 112.0, -36.0, -10.0, - -14.0, 0.5, 7.0, 0.8125, 2.5, 1.0, 26.0, -6.0, - -5.0, -0.5625, 2.5, 3.0, 0.6875, -7.0, -1.0, -14.0, - -0.1875, -3.25, -15.0, -0.5, 7.0, -14.0, -0.3125, 1.25, - 1.25, -4.0, 112.0, 16.0, 56.0, 2.75, 0.0, -10.0, - 7.5, -1.5, -0.625, 6.5, 14.0, 1.5, 7.0, -1.875, - 11.0, 80.0, -2.5, 0.0, -2.25, 0.375, 0.0, -40.0, - 112.0, 24.0, -48.0, 0.0, 2.0, -12.0, -3.25, 0.8125, - 0.0, -12.0, -8.0, 2.0, 1.875, -16.0, -0.75, 40.0, - -1.25, -64.0, -1.0, -0.375, -1.0, 1.25, 88.0, -30.0, - 5.0, 0.75, -4.5, 28.0, 0.75, -72.0, 8.0, 0.25, - -30.0, 40.0, 24.0, -20.0, -0.625, 11.0, -32.0, -2.5, - 112.0, -10.0, -7.0, 8.0, -0.25, 0.5, 0.9375, 6.5, - 0.4375, -64.0, 8.0, 0.5, -0.75, 2.0, 14.0, -128.0, - 0.1875, 0.75, 3.0, 16.0, -0.6875, -1.25, -28.0, -22.0, - 1.75, -1.25, -32.0, 0.0, 0.0, 36.0, 3.0, 10.0, - 0.75, 0.5, 0.0625, 6.5, -0.5625, -0.0625, -1.5, 8.0, - 0.875, -1.0, 112.0, 2.0, 1.625, 7.0, -1.0, -0.8125, - -1.5, 13.0, 5.0, -0.75, -1.0, -4.0, -6.0, 48.0, - -4.0, 0.375, -0.6875, 52.0, 0.5, -0.5625, -0.75, 104.0, - -20.0, -6.0, -56.0, -4.0, -1.0, 24.0, 0.875, -1.5, - -11.0, 14.0, 0.9375, 7.5, 0.9375, -13.0, -64.0, 0.9375, - 9.0, -0.0625, 0.25, 2.75, 0.75, -8.0, -88.0, 32.0, - -28.0, -3.25, -1.125, -80.0, 0.75, 1.0, -1.5, -52.0, - 80.0, 120.0, -0.6875, 4.0, -120.0, 0.0, 15.0, -0.3125, - 56.0, -4.0, 52.0, -24.0, 28.0, 52.0, -88.0, 7.0, - 88.0, 32.0, 64.0, 6.0, -56.0, 22.0, 10.0, -26.0, - -0.75, 0.0, -14.0, 22.0, 12.0, -7.0, 2.5, -4.0, - 1.25, 14.0, -7.0, 4.0, 0.25, -0.75, -1.75, 0.0, - 0.25, 1.5, 30.0, -8.0, -3.0, 0.0, -16.0, -4.0, - 28.0, -0.5, -12.0, -72.0, 8.0, 10.0, 3.75, -1.0, - -26.0, 1.5, -1.0, 11.0, 3.25, 1.0, -48.0, 3.5, - -4.5, -1.75, 5.0, -5.5, -28.0, 7.0, 0.1875, -112.0, - 8.0, -88.0, 16.0, 22.0, 4.0, 8.0, -0.4375, 3.0, - 3.5, 1.25, -64.0, 0.0, 15.0, 0.5, 3.0, -24.0, - 1.25, -1.0, 0.5, -10.0, 0.4375, 80.0, 2.0, -96.0, - -1.375, -1.375, 13.0, 13.0, 3.25, -0.125, 8.0, -3.0, - -64.0, 
-32.0, 0.0, -64.0, -26.0, -1.5, 60.0, -4.0, - 6.0, -10.0, 0.5, -0.75, 3.0, 3.25, -1.5, -0.625, - 1.625, 48.0, 0.875, -28.0, -12.0, -0.0625, -72.0, -16.0, - 5.0, -2.0, 120.0, -1.125, 26.0, 44.0, 52.0, 2.0, - -40.0, 20.0, 2.0, -0.75, -7.0, -1.375, 26.0, -4.0, - 0.25, 0.0, 0.6875, -5.0, -13.0, -88.0, 3.25, -10.0, - -0.6875, -0.5, -5.5, 0.125, -2.0, 3.75, 12.0, -0.9375, - 3.0, 3.0, -2.0, -5.0, -32.0, -12.0, 0.75, -1.0, - -0.875, -12.0, -1.0, 11.0, -12.0, -1.0, 0.5, 3.0, - 0.125, 28.0, -7.0, -2.0, 52.0, -26.0, 24.0, 4.0, - 5.0, -64.0, 0.625, -80.0, -10.0, -22.0, -1.0, 20.0, - 8.0, -1.875, -2.5, -3.25, -3.25, 80.0, -30.0, 32.0, - -128.0, -24.0, 1.875, 0.75, 0.25, -16.0, -12.0, 6.5, - 0.75, -3.0, 13.0, -0.75, 0.25, -2.0, -8.0, -0.5, - -12.0, 40.0, 0.625, 0.8125, -8.0, -56.0, -2.0, -128.0, - 5.5, -2.0, -0.625, -14.0, 3.5, 0.625, -112.0, 4.0, - -44.0, 40.0, -10.0, 4.0, 0.6875, -4.5, -0.375, -0.1875, - 1.375, 72.0, -120.0, 1.75, -1.0, -48.0, -0.5, -16.0, - 52.0, -2.0, 104.0, -1.0, 0.125, 4.0, 1.0, -88.0, - 14.0, -24.0, -52.0, -0.1875, -72.0, 26.0, -0.3125, 20.0, - -10.0, 3.25, -0.75, 40.0, 2.25, 3.0, -2.75, 28.0, - 80.0, -96.0, -8.0, -24.0, -1.0, -0.875, 0.0, -12.0, - 0.0, 0.875, -32.0, 60.0, 48.0, -22.0, 88.0, -4.0, - 0.0, -0.5, -120.0, -15.0, -4.0, -1.5, -24.0, 32.0, - -16.0, 3.25, 7.0, -8.0, 112.0, -28.0, 0.0, 2.0, - 13.0, 7.0, 64.0, 0.5, -4.0, 0.0, -6.5, 0.5, - -1.0, -10.0, -28.0, 0.0, 16.0, 0.375, -64.0, 40.0, - -4.0, 1.0, -0.125, -1.5, -28.0, -7.5, 1.0, -16.0, - 1.625, -0.6875, -32.0, -3.0, 72.0, -8.0, -0.4375, 30.0, - -72.0, -13.0, -1.0, -0.25, 48.0, 0.6875, 3.0, -26.0, - 0.5, -3.5, -88.0, -16.0, 8.0, -0.5, -1.75, -3.0, - -32.0, 18.0, -0.375, 7.0, 0.0, -0.8125, -14.0, -9.0, - -14.0, 3.25, -3.0, 40.0, 3.75, -4.5, -7.0, 0.625, - 2.0, 13.0, 2.25, 56.0, 0.75, 0.25, -0.125, -7.0, - 1.625, -18.0, 6.5, 3.0, -15.0, -2.5, 0.375, 1.5, - -4.0, -1.25, -24.0, 0.625, 0.125, -104.0, 14.0, -15.0, - 7.0, -44.0, 3.5, -2.0, 28.0, -6.0, -5.0, 13.0, - -3.0, -56.0, -1.5, -0.6875, 12.0, 
-4.0, -0.125, 40.0, - -4.5, 8.0, 40.0, -1.5, 0.0625, -1.5, 0.5, 0.125, - -20.0, 0.6875, 1.25, 24.0, -4.0, -10.0, 3.25, -2.5, - 3.5, 0.125, -4.0, 6.0, -2.0, -0.75, 1.0, 56.0, - -4.0, 0.6875, -8.0, -6.5, -0.5, 24.0, 2.5, 16.0, - -13.0, -0.375, 3.25, 2.5, -8.0, 12.0, 64.0, 0.0, - 24.0, 0.5, -12.0, -0.1875, 1.0, 6.0, -1.5, -4.5, - 0.875, -0.3125, -7.0, 26.0, -14.0, -88.0, -48.0, 4.0, - -16.0, -2.0, -8.0, -56.0, -24.0, 28.0, -1.25, 2.0, - -16.0, 5.5, -0.4375, 4.0, 0.0, -4.0, 0.4375, -104.0, - 32.0, 20.0, -20.0, 60.0, -1.5, -64.0, 2.0, -16.0, - -1.0, -5.0, -32.0, 0.0, -3.25, -12.0, -6.0, 8.0, - -0.75, 1.875, 0.75, -10.0, 7.0, -0.8125, 0.25, 0.0, - 30.0, 13.0, 5.0, -0.5625, -40.0, -1.375, -20.0, -6.0, - -3.0, 2.0, -112.0, 2.75, 0.0, -0.25, -32.0, -112.0, - 14.0, 60.0, -0.25, 112.0, -3.25, -1.75, 104.0, 20.0, - -0.5, -3.75, 0.8125, 16.0, -52.0, 6.0, -1.875, -80.0, - 0.4375, -12.0, -2.75, 20.0, -4.0, -22.0, 7.0, 2.0, - -3.0, 0.375, 1.0, -0.5, -0.6875, -15.0, 0.0, -2.0, - 0.25, -48.0, 40.0, 2.0, -4.0, -48.0, -56.0, 60.0, - 0.125, 4.5, -6.5, 0.0625, -0.75, 2.0, 8.0, 0.0, - -0.4375, -15.0, 12.0, -24.0, -7.5, 18.0, -3.0, -7.5, - -3.0, -0.4375, -0.5, -3.25, -60.0, 22.0, -1.0, 7.0, - -6.0, 1.125, 7.0, -120.0, -32.0, -120.0, 4.0, -1.5, - 6.0, -0.9375, -5.5, 6.0, 3.25, 12.0, -11.0, -1.5, - 0.75, 8.0, 1.125, 0.5, 4.0, -1.0, -0.5, 1.5, - -60.0, -52.0, -8.0, -4.0, -13.0, 1.0, 7.5, 32.0, - -0.0625, -0.8125, 2.5, 0.5, 28.0, -0.25, -9.0, -3.75, - 1.5, -30.0, -10.0, 0.4375, -1.75, 80.0, 1.125, 3.5, - 8.0, 20.0, 0.875, 6.0, -1.25, -0.0625, -8.0, -0.4375, - -3.0, -32.0, -1.75, -2.0, 0.6875, 1.75, 48.0, -8.0, - -8.0, 0.25, -104.0, 20.0, 52.0, -4.0, -10.0, 6.5, - -0.375, 24.0, 0.5, -48.0, 32.0, 5.0, 0.0, 1.875, - -52.0, 26.0, 1.125, 80.0, 0.625, 1.5, -128.0, -13.0, - 12.0, -7.5, 6.0, -9.0, -0.75, -0.1875, -0.875, 11.0, - 0.0625, -6.5, 0.9375, 40.0, 2.75, -3.5, 0.0, -0.625, - 0.0, 11.0, -4.0, 1.0, 22.0, -6.0, 0.375, -6.0, - 0.0, -2.0, 24.0, -0.5, 12.0, -1.625, -6.0, 1.25, - 24.0, -120.0, 10.0, 
6.0, -80.0, 9.0, -32.0, 28.0, - -48.0, 3.0, -52.0, -1.0, 56.0, 4.5, -2.25, -48.0, - -32.0, -0.5625, 8.0, -3.5, 8.0, 36.0, 0.375, -1.0, - 2.0, -8.0, -20.0, -32.0, -12.0, -26.0, 18.0, -7.5, - -12.0, -0.1875, 48.0, 16.0, -18.0, 2.5, 1.875, 0.875, - -11.0, 2.0, 0.0, 12.0, 9.0, -26.0, 1.875, 2.5, - 14.0, -16.0, 0.375, -24.0, -60.0, 4.0, 1.0, 24.0, - 12.0, 1.0, 2.0, -28.0, -1.125, 120.0, 32.0, -12.0, - 3.75, 1.75, 5.5, 0.5, -14.0, -3.0, -44.0, 0.5, - -20.0, 2.0, 1.5, -1.0, -5.5, -1.0, 16.0, -14.0, - 0.875, -8.0, 72.0, -0.3125, -96.0, 0.5625, -14.0, -5.5, - -20.0, -6.0, 24.0, 20.0, -24.0, 16.0, -1.375, -48.0, - -40.0, 13.0, 0.375, -6.0, -0.5625, -28.0, 0.0, -10.0, - 0.4375, 60.0, -12.0, -1.0, -40.0, 0.1875, 40.0, 0.75, - 18.0, 2.75, -6.0, 24.0, -64.0, 5.5, -4.0, -48.0, - -32.0, 14.0, 1.75, -80.0, -1.5, 24.0, 0.1875, 1.375, - 0.4375, -28.0, -6.5, 6.0, -0.25, -0.5, -1.25, -56.0, - -2.0, 1.25, -2.5, 60.0, -1.5, -4.0, -7.5, 0.0, - 1.0, -0.5, 16.0, 0.6875, 0.0625, -120.0, -0.625, 32.0, - 0.0, 6.5, -64.0, -18.0, 0.5, 0.9375, -3.0, 1.5, - 120.0, 0.0, 0.0, -2.75, 1.625, -22.0, 2.0, 10.0, - 3.0, -0.5, 0.375, 1.75, 0.3125, 52.0, 0.4375, 2.25, - -4.0, -2.0, -4.0, -28.0, 72.0, -10.0, 0.0, -3.25, - 13.0, -2.0, -12.0, 3.75, -0.25, -56.0, -5.0, -0.25, - -60.0, -2.75, 0.5, -18.0, -40.0, -8.0, 18.0, -104.0, - 26.0, -1.625, 96.0, -5.5, -1.0, -15.0, -0.375, 3.75, - -3.0, -32.0, -112.0, -0.125, 3.0, 20.0, -5.0, -0.875, - 80.0, -26.0, 8.0, -8.0, 10.0, -0.5, 7.5, 20.0, - -1.0, 0.75, -0.125, 0.5625, -0.0625, -2.5, 9.0, -0.6875, - -36.0, -2.0, 1.5, 1.5, 7.0, -48.0, -0.75, -0.625, - -12.0, -1.25, -0.25, 5.5, 0.3125, -0.75, 4.0, -16.0, - 88.0, 3.0, 0.5, 0.0, -104.0, -32.0, 0.0, 0.75, - 7.0, 1.25, 0.5, 20.0, 4.0, 0.6875, -7.5, 48.0, - 20.0, -24.0, 44.0, -24.0, 0.1875, 5.0, 4.0, 6.0, - 0.125, -0.75, -0.5, 3.5, -3.5, 32.0, 6.0, 3.0, - -2.0, 0.25, 56.0, 4.0, -13.0, 24.0, 24.0, -2.0, - 48.0, 22.0, -32.0, 22.0, 24.0, -1.5, 0.375, 1.0, - -4.0, 0.4375, -2.5, -1.0, 6.5, 1.125, -0.25, -8.0, - -22.0, 0.0, 
56.0, -15.0, 1.75, -0.5, -1.5, -0.75, - -4.0, -96.0, -80.0, 26.0, -1.625, 44.0, 64.0, 40.0, - 0.0, -24.0, -14.0, 0.3125, -104.0, -60.0, 0.3125, 56.0, - -1.0, 0.5625, -4.0, -8.0, -2.25, -2.0, -2.0, -40.0, - -28.0, 1.375, 0.5625, -0.25, 0.0, -1.375, -9.0, -0.3125, - 2.5, 6.0, -40.0, -11.0, 0.0, 16.0, 9.0, 32.0, - 104.0, 16.0, -96.0, 80.0, -28.0, -18.0, 9.0, -1.0, - 1.0, 0.8125, 24.0, -8.0, -2.0, -28.0, -0.9375, -0.625, - -0.875, -2.0, -10.0, -3.0, 0.0, 56.0, 5.0, 0.875, - 7.5, 64.0, -3.5, -0.5, -1.0, -1.75, 20.0, 4.5, - 0.5, 12.0, 8.0, 8.0, -120.0, 2.0, -26.0, 0.5, - -0.375, -60.0, -24.0, -6.0, -40.0, 16.0, 3.0, 8.0, - -15.0, -6.0, -4.0, -1.25, -6.0, 0.75, 7.0, 0.6875, - -3.0, -4.0, 1.375, -20.0, 112.0, -4.0, -5.5, 48.0, - -14.0, -1.125, 0.5625, 12.0, 0.0, -0.1875, 56.0, -0.25, - -0.6875, -1.25, -44.0, 80.0, 3.5, 48.0, -8.0, -0.9375, - -0.625, 96.0, -1.5, 1.25, 1.0, -0.75, 88.0, 16.0, - 32.0, -1.75, 0.8125, 0.5, -1.0, -1.25, 3.25, 1.25, - -80.0, -12.0, 14.0, -14.0, 18.0, -3.5, 6.5, -112.0, - 0.5, -28.0, -0.25, 0.0, -4.0, -10.0, -0.5, 28.0, - 1.5, 1.75, -2.0, -0.5, 60.0, 14.0, 0.0625, -32.0, - 8.0, 1.0, 0.125, -3.0, 0.875, 2.5, 112.0, -30.0, - 22.0, -0.9375, 0.625, -4.0, -36.0, -30.0, -2.0, 1.25, - 1.375, 4.0, 0.0, -0.75, -3.5, -5.5, -16.0, -28.0, - 0.0, -1.125, 14.0, 40.0, -7.0, 56.0, -0.375, -8.0, - 0.75, 56.0, 1.875, 32.0, 0.3125, 6.0, 24.0, -3.0, - -1.375, 14.0, 0.6875, 1.5, -3.25, -32.0, -32.0, -10.0, - -11.0, 6.5, 5.0, 16.0, 0.5625, 60.0, 26.0, -4.0, - -26.0, 48.0, -24.0, 0.75, 26.0, -0.5, 13.0, -120.0, - 1.5, 0.875, 3.5, -120.0, 2.5, 96.0, -0.125, 8.0, - 80.0, 3.0, -1.25, -5.0, -56.0, 10.0, -1.0, 5.0, - 1.0, 0.125, -28.0, -1.5, 9.0, -1.125, 48.0, -1.375, - 18.0, -14.0, -4.0, -48.0, 14.0, 120.0, -44.0, 0.5625, - -1.375, -12.0, 3.0, 3.5, 3.0, -0.625, -4.0, -16.0, - -20.0, 0.0, -4.0, 48.0, 1.75, -16.0, -3.25, -2.0, - 8.0, -64.0, 0.25, -0.75, 10.0, -1.0, 0.6875, 112.0, - -2.5, -88.0, -0.875, 20.0, -13.0, 0.0, -6.0, 28.0, - 96.0, 26.0, 0.25, -1.0, -48.0, -16.0, 
96.0, 0.25, - 0.0625, 48.0, 5.5, 30.0, 1.75, 0.0, 0.375, 0.125, - 8.0, -112.0, 36.0, 2.5, 1.0, 1.375, -0.4375, 104.0, - -5.0, -1.75, 8.0, -0.5, -0.625, -0.5, 2.0, -1.5, - -1.375, -26.0, -13.0, 2.0, -5.0, -5.0, 2.0, 0.25, - -9.0, -1.0, 10.0, -6.0, 0.125, 2.5, 3.0, -2.25, - 0.5, -20.0, -2.0, 1.625, -2.25, 0.5, 28.0, -11.0, - 72.0, -18.0, -0.75, -13.0, 88.0, 0.1875, 1.5, 6.0, - 12.0, -40.0, -5.5, -2.75, 0.5625, -3.5, 4.5, 0.0, - -2.0, 120.0, 1.5, 3.0, -40.0, 13.0, -40.0, 0.375, - 60.0, 0.75, -10.0, -0.5, 0.0, -3.0, 0.875, 7.5, - -72.0, -1.125, 32.0, 16.0, -1.75, 36.0, -72.0, -1.0, - -1.0, 16.0, 1.125, 64.0, 2.75, -9.0, -16.0, -1.0, - 32.0, -2.0, -16.0, 2.0, 40.0, 7.0, 0.9375, -14.0, - 7.5, -0.5625, -0.6875, -1.25, 4.0, -1.25, -6.5, 0.25, - -0.125, -4.0, -24.0, -1.5, -16.0, -56.0, 5.0, -12.0, - 2.0, 3.0, -8.0, -2.25, -36.0, 0.25, -15.0, 16.0, - 0.125, 0.0, 10.0, 4.0, 120.0, 16.0, -6.0, -1.25, - -56.0, -9.0, -1.375, 6.0, 12.0, 60.0, -22.0, -0.5, - -8.0, -1.25, -4.0, -1.5, -3.0, -1.5, 2.5, -0.4375, - 48.0, -40.0, 48.0, -64.0, 2.0, -24.0, -0.75, 0.0, - -0.8125, 52.0, 28.0, -15.0, -1.75, -0.3125, -48.0, -18.0, - -0.5, -1.0, 2.0, -1.75, -3.5, -1.25, 24.0, -6.0, - -30.0, -2.75, 16.0, 1.5, 1.125, -7.0, -1.25, -0.8125, - 0.5, -0.4375, 24.0, -52.0, 0.5, 0.0, 1.75, -1.0, - -7.0, 3.5, 2.25, 1.5, 0.5, -3.5, -1.25, 16.0, - 7.0, -3.0, 0.0625, -0.25, -8.0, -0.5625, -104.0, -64.0, - 16.0, 7.5, -10.0, -32.0, -0.5, 40.0, 0.25, -4.5, - 0.4375, 11.0, 8.0, 8.0, -15.0, -0.75, 16.0, -13.0, - 0.0, -60.0, 0.125, -2.25, 0.6875, -10.0, -112.0, 40.0, - -12.0, 9.0, 0.0, -1.25, 24.0, -0.75, 104.0, 1.5, - -1.5, -13.0, -0.25, -3.0, 0.0, 18.0, -4.0, -0.1875, - 20.0, 28.0, 3.0, 52.0, 0.5, 4.0, -128.0, 0.8125, - 12.0, 3.5, -0.5, 2.75, -52.0, 3.75, 8.0, -1.5, - 0.0, 0.125, 6.0, -112.0, -0.5, 120.0, -72.0, 0.75, - -1.25, 96.0, -16.0, 56.0, 0.0, 3.0, -1.25, 2.0, - 0.0, 0.875, 1.0, 14.0, -0.875, -40.0, 3.0, -48.0, - 40.0, 0.25, -13.0, 7.0, -0.5625, 1.0, 1.25, 12.0, - 0.0, -0.875, -8.0, -20.0, -0.75, -9.0, 
-56.0, -88.0, - -0.625, -1.25, -48.0, -16.0, 1.75, 0.5625, -4.5, -30.0, - -16.0, -2.0, -0.6875, -1.5, -16.0, 0.6875, 0.5, -28.0, - 120.0, -2.0, -1.875, -4.0, 0.625, -36.0, 15.0, 0.5, - -4.0, -8.0, 0.0, -13.0, -3.75, -0.375, -5.5, 28.0, - 12.0, 20.0, 52.0, -32.0, 80.0, 10.0, 8.0, 13.0, - 3.75, 2.5, -20.0, -2.0, 40.0, 18.0, -1.0, -10.0, - -1.0, -1.5, -96.0, -3.5, 11.0, 32.0, -12.0, -24.0, - 120.0, -20.0, -9.0, -12.0, -5.5, 28.0, -12.0, 7.5, - -72.0, 4.5, -0.6875, 15.0, 7.5, 0.0, -8.0, 10.0, - -1.0, -14.0, 12.0, 10.0, -8.0, 1.5, 0.0, -12.0, - -64.0, -22.0, 3.5, -4.0, 2.0, -24.0, -1.0, -2.0, - -88.0, 3.0, -0.4375, -24.0, 0.5, -0.5, 16.0, 4.0, - -16.0, -4.0, -24.0, 0.5, 8.0, -22.0, -24.0, -8.0, - -14.0, 0.5, -104.0, -0.5, 0.125, 0.0625, -28.0, 11.0, - -8.0, 1.75, 3.0, 10.0, 40.0, 22.0, -5.5, -0.75, - 2.0, 0.0, 1.75, 1.0, 60.0, -24.0, 16.0, -22.0, - 0.4375, 15.0, -11.0, -64.0, -0.0625, 0.0, 1.5, 1.0, - 36.0, -20.0, 36.0, 2.25, -104.0, -0.5625, 20.0, 11.0, - -4.0, 26.0, 96.0, 0.0, -5.0, 8.0, -0.75, 1.5, - -0.125, 2.0, -10.0, 7.5, -0.9375, -3.75, 10.0, -3.0, - 112.0, 0.0, -6.0, 3.5, -8.0, -3.0, -6.0, -1.875, - 3.25, 30.0, -8.0, 5.0, 40.0, 0.125, 1.25, -128.0, - -8.0, 4.0, 40.0, 24.0, -0.25, -60.0, -96.0, 5.0, - -4.0, 0.75, 0.875, 0.75, 12.0, -0.5, -120.0, 60.0, - 5.5, -6.0, 11.0, -0.25, -12.0, 0.5625, 0.125, -88.0, - 80.0, -52.0, 6.0, -32.0, -8.0, 4.5, -0.75, -5.5, - -1.375, -3.5, -5.0, 2.25, 7.5, -1.0, 0.5625, 7.0, - -40.0, 3.0, -1.0, -1.0, 1.125, -0.125, 7.0, 80.0, - -2.0, -36.0, -1.0, 0.0, -0.125, 0.0, -104.0, 56.0, - -3.0, -4.0, -64.0, 0.3125, 20.0, 2.0, -32.0, 0.75, - 7.0, -3.25, -0.5, 0.0, 1.5, -0.75, -7.0, 0.6875, - 3.25, 4.0, -5.0, 0.0, -12.0, 1.0, -112.0, -1.375, - 1.0, -6.0, -36.0, -1.25, -3.5, 20.0, -80.0, 0.4375, - -4.0, -0.5, -24.0, -7.0, 2.75, 1.5, 7.0, -40.0, - -0.125, 0.8125, 2.0, 0.625, 16.0, -1.0, -80.0, -4.0, - -0.75, -2.0, -1.125, 0.6875, -0.25, -1.25, -12.0, 104.0, - -6.0, 5.5, -48.0, -1.0, -72.0, -1.75, -88.0, 0.5, - 56.0, 20.0, 0.0, 6.0, 2.5, -112.0, 
-8.0, -4.5, - -15.0, 88.0, -12.0, 18.0, 22.0, -0.25, 24.0, 2.5, - -7.0, -0.8125, -0.75, -56.0, 8.0, -16.0, 16.0, -16.0, - -128.0, 0.375, -2.0, -64.0, 2.25, -8.0, 3.0, -2.75, - -3.75, 1.125, -2.25, 0.25, -24.0, -3.5, 88.0, -7.0, - -3.25, 1.0, 0.625, 9.0, -0.625, 0.0, 72.0, -0.25, - -1.5, 0.0, 1.625, -52.0, 0.5625, -3.25, -48.0, -4.0, - 4.0, 15.0, -10.0, 24.0, 6.5, 14.0, -128.0, 18.0, - 2.5, 3.75, 0.0, 1.5, -4.0, -30.0, 1.25, 30.0, - -28.0, 0.0, -3.5, 2.0, 0.0, -1.5, 16.0, -20.0, - -8.0, -1.0, -0.75, 16.0, 60.0, -10.0, 2.75, -24.0, - -1.0, 1.5, 60.0, -0.1875, 7.5, -36.0, -24.0, 2.0, - -2.5, -16.0, -11.0, 0.375, -1.25, 0.0, -0.375, -36.0, - -1.5, -44.0, 1.75, -20.0, 0.5, 8.0, 0.0, 32.0, - 0.5, 16.0, 4.0, 8.0, 26.0, -5.0, 0.375, -36.0, - 96.0, 0.8125, 1.5, 12.0, 4.0, -1.25, -36.0, 3.5, - 1.125, 40.0, 6.5, -0.25, 0.5, -0.625, -8.0, -4.5, - -3.5, -28.0, 0.5, -24.0, 4.0, 0.0, 14.0, -2.5, - 3.0, -10.0, 0.0, 4.0, 28.0, -0.75, 88.0, 2.0, - 0.625, -0.125, -3.25, -56.0, -1.75, 1.5, -48.0, 0.5, - -44.0, 6.0, 6.0, 7.5, -0.5, 15.0, 6.5, -13.0, - 15.0, 0.0, 52.0, -0.1875, 16.0, -14.0, -1.0, -1.0, - 4.0, 1.0, -6.0, -4.0, -2.5, 5.0, 18.0, 4.0, - -8.0, -1.0, -1.25, -0.25, -8.0, -3.5, 40.0, -0.875, - 4.0, -20.0, -16.0, 1.875, 18.0, 12.0, -2.75, 2.75, - -3.75, 15.0, 12.0, 2.0, -12.0, -8.0, -64.0, 24.0, - 72.0, -3.5, 56.0, 2.0, 1.5, 5.5, -0.8125, -60.0, - -30.0, -1.625, 1.75, 0.125, 12.0, 32.0, 72.0, -2.75, - 5.0, -96.0, -3.5, 1.5, 6.5, 6.5, -9.0, 1.875, - -44.0, 0.4375, -1.375, -0.375, -9.0, 2.5, -128.0, -0.6875, - 26.0, 2.5, -120.0, -0.4375, 2.0, -2.0, -3.75, -0.5, - 0.6875, 1.375, -0.375, -28.0, 8.0, 1.5, -60.0, -64.0, - -15.0, -5.0, 2.0, 1.375, -1.25, -60.0, -56.0, 0.0, - 0.0, -0.75, -40.0, -4.5, -3.0, -24.0, -5.0, -16.0, - 7.0, 60.0, -3.0, 0.75, -12.0, 16.0, 14.0, -3.25, - -0.6875, 0.0, 0.0, -7.5, -2.0, -60.0, 8.0, -32.0, - 8.0, -64.0, 6.0, 28.0, 5.5, -40.0, -4.0, -1.0, - 4.0, -40.0, 10.0, 12.0, -4.0, 4.0, 0.625, 6.5, - 2.25, -0.5, -12.0, -1.75, -16.0, 2.5, 0.5, 1.375, - 0.0, 8.0, 
56.0, 6.5, 24.0, 2.0, -11.0, -8.0, - 24.0, 36.0, -2.0, 1.5, 4.0, 8.0, 1.0, 1.75, - -3.0, -0.125, 4.0, -2.0, -2.25, -4.0, 0.0, 88.0, - 0.1875, 15.0, -11.0, -1.0, -2.0, 1.25, -20.0, -4.5, - 1.5, 0.4375, -0.4375, -0.25, 3.5, 6.0, 0.9375, 60.0, - -5.0, -8.0, 112.0, -2.0, -1.0, -64.0, 0.375, 28.0, - -14.0, -2.0, 6.0, 0.75, -32.0, 1.0, 0.875, 4.0, - 10.0, 1.875, 0.0625, 112.0, 1.25, -3.75, 0.875, 2.0, - -6.0, 2.0, -5.5, -8.0, -1.0, -0.875, 7.0, 3.0, - 3.0, 9.0, -1.5, 20.0, -4.5, 20.0, 0.5, 120.0, - -30.0, -8.0, -14.0, -26.0, -0.75, 26.0, -16.0, -26.0, - 24.0, 3.0, -3.25, -1.75, 4.0, 4.0, 5.0, -0.625, - 0.0, 3.0, -11.0, -4.5, -7.0, 0.0, -48.0, -5.0, - 10.0, -1.0, 10.0, -0.5, 0.0, 2.75, 1.0, 0.875, - -40.0, -1.0, -2.0, -0.5, -40.0, -22.0, -2.5, 2.25, - 8.0, 0.0, -0.75, -1.125, -0.3125, -0.1875, 0.75, -16.0, - -0.5, -0.75, -1.25, -2.0, -5.5, 0.875, -104.0, -0.875, - 1.125, 1.25, -64.0, -2.0, -128.0, -16.0, 2.75, -0.5, - 80.0, 112.0, 14.0, 0.0, 2.0, 12.0, 40.0, -56.0, - 0.75, -1.0, 22.0, -6.5, 0.5, -3.5, -1.5, -48.0, - 0.0, 1.0, -1.25, -8.0, 28.0, 4.0, 1.25, 6.5, - 0.75, 44.0, 24.0, 2.0, -6.0, -56.0, -14.0, 40.0, - -0.875, 48.0, 2.5, -0.125, 0.5625, -4.0, 5.0, -3.25, - -4.5, 1.625, 4.0, -24.0, -0.375, -26.0, -0.375, 22.0, - 8.0, -24.0, -6.0, 0.625, 56.0, -24.0, 7.5, 2.5, - -16.0, 0.25, -96.0, -0.5, -112.0, 30.0, -0.875, 3.25, - -0.0625, 0.1875, 0.875, -2.0, 0.0, -24.0, 72.0, 36.0, - 6.0, 8.0, -5.0, -44.0, 4.0, -1.25, -0.5, -12.0, - -0.5, -0.9375, -0.8125, 1.0, 0.8125, -16.0, 48.0, 22.0, - -10.0, 80.0, -12.0, -7.0, -80.0, 0.4375, 30.0, 11.0, - -12.0, -0.75, 1.25, -40.0, 56.0, -20.0, -0.5, -4.0, - 80.0, 0.125, -104.0, 16.0, -16.0, -3.0, -6.5, 0.0, - -4.0, 48.0, -16.0, -0.125, -30.0, -24.0, 5.0, 1.75, - -10.0, 120.0, -80.0, -0.25, 0.5, -1.5, -12.0, 0.0, - -1.875, 48.0, -0.1875, 1.875, -48.0, -0.4375, 32.0, -32.0, - -0.125, 7.0, -3.5, 8.0, 8.0, 56.0, -12.0, 32.0, - 0.75, 1.125, -16.0, -2.5, -80.0, -2.5, 0.0, 4.5, - 7.5, 56.0, 60.0, 0.5, 56.0, 6.0, 26.0, 4.0, - 0.5, 0.8125, 5.0, 
0.125, 0.125, 0.125, -11.0, 12.0, - 0.6875, 0.0, -0.8125, -2.25, -5.0, -112.0, -0.5625, -56.0, - 0.5, 0.875, -8.0, 0.125, -2.0, 56.0, 1.0, 3.25, - -6.5, 0.75, 4.0, -0.375, -24.0, 0.125, -7.5, -1.0, - 32.0, 0.375, -1.5, 40.0, 4.0, -6.5, -0.6875, -6.0, - -0.3125, 0.6875, -18.0, -1.25, -6.0, 0.0, 5.0, -8.0, - 2.5, 88.0, 0.75, -0.625, 0.0, -0.75, -16.0, 2.0, - 40.0, -0.125, 3.5, -56.0, -15.0, -24.0, -52.0, 0.625, - -12.0, 0.625, 14.0, 1.125, 3.0, 1.0, 0.5, -1.0, - -12.0, 1.25, 0.0, 1.125, -0.25, 0.0625, 5.5, -0.5, - -20.0, 112.0, 1.5, 14.0, -8.0, 8.0, -48.0, 8.0, - -128.0, -56.0, -0.6875, 1.0, -3.5, 2.0, 0.75, 72.0, - -0.375, 8.0, 48.0, 3.5, -13.0, -88.0, -64.0, 0.0, - 3.5, 2.5, -0.25, 13.0, 22.0, 1.75, 4.0, -0.5, - -1.0, 0.625, -22.0, 2.5, -3.0, 0.9375, -36.0, -128.0, - 7.0, 4.5, 8.0, 8.0, -2.5, -16.0, -2.75, 0.75, - -0.6875, 3.5, 0.625, -8.0, -0.75, -2.0, 3.75, -36.0, - 0.1875, 48.0, 10.0, 28.0, 0.5, -48.0, 1.5, 3.5, - -0.125, 0.875, -64.0, 2.0, 10.0, 3.0, -48.0, 15.0, - 52.0, 6.0, -0.125, -12.0, 6.0, -1.5, -32.0, 11.0, - -5.5, -0.5, -88.0, -13.0, -8.0, -8.0, -9.0, -0.5, - -14.0, 3.25, 1.0, 40.0, 5.5, -3.5, 0.75, 96.0, - -0.25, 48.0, 0.0, -22.0, -16.0, 2.0, -14.0, 20.0, - -0.875, -1.875, 10.0, -10.0, 1.0, 7.0, -20.0, 32.0, - -12.0, 0.9375, 8.0, 0.875, -40.0, 24.0, -8.0, -2.5, - 112.0, 0.375, -13.0, -48.0, -0.5, -28.0, -1.0, 0.5, - -3.75, 56.0, -48.0, 4.0, -4.5, 0.75, -22.0, 7.0, - -1.0, -72.0, -28.0, -0.5, -0.25, 2.75, -48.0, -1.75, - 0.125, 10.0, 1.875, -0.75, -2.0, 0.0, -24.0, -0.5, - -20.0, -14.0, 9.0, -72.0, 0.6875, -0.75, 5.5, -5.5, - 2.0, 112.0, -0.8125, -4.0, 40.0, 56.0, 48.0, -128.0, - -7.0, 36.0, 6.0, -80.0, -6.5, 1.75, 56.0, -0.8125, - 0.0, 56.0, -3.0, 0.5, -11.0, -72.0, -0.5, -14.0, - 4.5, 14.0, -18.0, 12.0, -18.0, 120.0, 14.0, 48.0, - 0.0, -24.0, 20.0, 0.3125, 28.0, 6.5, -12.0, 4.5, - 3.25, -1.75, 1.0, -1.0, -0.875, -1.0, 8.0, 0.0625, - -1.5, -28.0, -80.0, -16.0, -3.5, -0.125, 1.875, 56.0, - -2.0, 1.5, -22.0, 0.75, -3.0, -0.8125, -12.0, -96.0, - 40.0, 
-6.0, 0.8125, 3.5, -5.0, -11.0, 1.0, -28.0, - -0.875, 56.0, -4.0, -3.0, -0.375, -64.0, 2.5, 3.5, - -2.0, 24.0, -6.5, 3.0, -10.0, 4.0, 44.0, 7.5, - 56.0, 60.0, -40.0, -36.0, 1.0, -12.0, -32.0, 2.5, - 1.75, 3.0, 5.5, 1.75, -1.375, 7.0, 30.0, -0.875, - -5.0, 0.25, 64.0, -4.0, -16.0, 2.0, 2.0, 1.75, - 10.0, -0.25, -104.0, 0.0, 120.0, 0.0, 10.0, -2.0, - -7.0, 0.0, 36.0, -26.0, -0.25, 0.9375, 28.0, -5.0, - 0.5, -3.0, 0.375, 2.5, 56.0, 0.25, -2.5, -1.5, - 3.75, 40.0, 15.0, -72.0, -0.6875, 0.875, -1.75, 3.0, - 48.0, 64.0, -7.0, -0.125, 44.0, -3.5, 8.0, -28.0, - 0.5, 3.5, -0.9375, 96.0, -2.25, -10.0, 0.625, -5.0, - 3.5, -88.0, 20.0, 7.0, 20.0, -1.875, -5.5, -96.0, - -112.0, 40.0, 3.5, 72.0, 12.0, -0.375, 0.75, 0.5, - 88.0, -88.0, -48.0, -0.75, 1.0, 2.5, 10.0, -15.0, - 2.5, -6.0, 28.0, 0.0625, 6.0, 24.0, 0.0625, -5.0, - -0.375, 72.0, 0.4375, 8.0, 1.125, 5.0, -28.0, -0.0625, - 0.25, 1.0, 1.125, -1.75, -8.0, 7.0, 32.0, -8.0, - 0.0, 1.375, 44.0, 2.5, -40.0, 3.0, -2.0, -24.0, - 0.5, -1.125, -48.0, -96.0, -0.625, 22.0, 8.0, -1.25, - -1.25, 36.0, -16.0, -0.5625, 1.0, -7.0, -9.0, -2.5, - 72.0, -15.0, 1.375, 0.25, -7.0, -3.75, -4.0, -0.3125, - 1.25, -5.5, 0.0, -32.0, -3.0, -0.6875, -64.0, 1.25, - -3.5, -8.0, -48.0, 1.75, 0.9375, -20.0, 14.0, 0.125, - -1.5, 8.0, -28.0, -60.0, 28.0, -120.0, 6.0, -0.5, - -1.0, -0.875, -14.0, 5.0, 0.625, 0.0, 0.9375, 0.125, - 104.0, -2.75, -2.0, -28.0, 0.375, -20.0, 24.0, 3.0, - 1.0, 96.0, 7.0, 1.125, 4.0, -3.0, -1.25, -2.0, - 1.5, 0.25, -13.0, -12.0, -1.375, 18.0, -15.0, 1.5, - 3.5, -10.0, 20.0, -3.25, 0.75, -3.5, 0.0625, 22.0, - -2.0, 2.25, -1.125, 1.125, -0.125, -28.0, 0.875, 10.0, - 0.0625, -44.0, -0.9375, 13.0, 48.0, -12.0, 0.25, 12.0, - 1.5, -2.0, 15.0, 3.0, -16.0, -0.1875, 3.0, -0.375, - 4.5, -7.0, -12.0, 7.0, 2.75, -3.25, 48.0, -0.25, - 88.0, -64.0, -3.0, 2.0, 0.625, 1.0, 104.0, 6.0, - 0.0, 1.25, 60.0, 1.5, -1.0, 3.25, 40.0, 4.0, - 22.0, -24.0, -15.0, -88.0, -40.0, -120.0, 0.625, -6.0, - 0.75, -3.0, -0.6875, -4.0, 0.3125, 52.0, -0.5, 1.5, - 
-1.875, 1.5, -4.0, -0.75, -4.0, 0.0, -2.0, -2.0, - 16.0, -4.0, -8.0, 0.5, -11.0, 0.6875, -96.0, -72.0, - 0.25, 1.375, 56.0, -22.0, -2.0, 56.0, -1.75, -1.25, - -2.0, 72.0, -14.0, 4.0, 5.5, -14.0, 8.0, 120.0, - -1.0, 56.0, 2.25, -10.0, 1.375, -20.0, -40.0, 0.0625, - 5.5, -0.0625, 0.0, -0.3125, 0.0625, -120.0, 24.0, 88.0, - -0.5, -1.25, 24.0, -6.0, 3.0, -1.75, 24.0, 1.875, - -3.0, -0.1875, -3.25, -0.5, 2.0, 0.25, 12.0, 0.0, - -48.0, 2.5, -56.0, -12.0, -28.0, 1.25, 64.0, 0.875, - -4.0, 4.5, 1.25, 0.0, 7.0, 12.0, -64.0, -36.0, - -0.875, -13.0, -4.0, -32.0, 0.0, -52.0, 88.0, 26.0, - -0.875, -4.5, 104.0, -28.0, -16.0, -72.0, 56.0, 56.0, - -13.0, -3.75, 14.0, 0.375, 28.0, -8.0, 0.0, 16.0, - -1.75, -1.0, 60.0, 12.0, 5.0, -1.5, 0.1875, -18.0, - -13.0, -0.0625, 64.0, 0.75, 10.0, 0.5, -72.0, -2.75, - 3.0, -2.0, 6.5, -1.0, -10.0, -2.0, -2.0, -96.0, - -4.0, 10.0, -4.0, 0.375, -1.125, 8.0, -2.5, 48.0, - 0.75, 0.25, -16.0, -1.5, -128.0, -120.0, -18.0, 1.125, - -60.0, -13.0, -1.0, 12.0, 28.0, -0.75, -32.0, 2.0, - -1.125, 4.0, -4.0, -14.0, -40.0, -64.0, 0.0, 16.0, - -88.0, -1.375, -88.0, -40.0, -1.25, 0.125, 10.0, -13.0, - -40.0, -9.0, -4.0, 112.0, 0.0, -0.625, 1.0, -5.5, - 2.5, 7.0, 0.625, -12.0, 1.25, 1.5, 4.0, -14.0, - -7.0, -104.0, 1.0, -0.5, -7.0, 0.9375, 32.0, 5.5, - 0.125, 6.0, -4.0, -26.0, -0.375, -2.0, 3.75, -48.0, - 0.0, -5.0, -104.0, -1.0, -48.0, -2.75, -0.75, -3.5, - -8.0, 112.0, -0.3125, -4.0, 0.125, 0.0, -1.0, -112.0, - -1.0, -1.0, 22.0, -40.0, 0.5, 12.0, -12.0, 0.75, - -1.375, -32.0, 12.0, 4.5, 0.375, 6.0, 1.875, 0.75, - 18.0, -48.0, -10.0, -2.5, 2.0, -5.5, -32.0, -0.875, - 0.0, -0.8125, 3.75, 0.0, -28.0, 2.5, 11.0, 0.1875, - -8.0, 8.0, 36.0, 24.0, -6.0, 15.0, 4.0, 4.0, - 1.0, -0.8125, -7.0, -0.875, 0.125, 2.0, 4.0, 48.0, - -1.0, -1.25, 0.875, -6.5, 4.0, 1.625, -2.0, -0.875, - 6.5, -40.0, 96.0, 7.5, -5.0, -60.0, -0.5625, 7.0, - -6.0, 15.0, 2.0, -8.0, 26.0, 28.0, 18.0, 3.75, - -1.5, -16.0, -96.0, 40.0, 24.0, -6.0, -60.0, -128.0, - -10.0, -112.0, 3.75, 16.0, -2.5, 
-56.0, 52.0, -14.0, - 1.5, -12.0, -1.5, 8.0, -8.0, -32.0, -11.0, -1.0, - 80.0, -0.5625, 0.5, -40.0, -2.0, -32.0, 0.625, -0.375, - 24.0, 1.375, 14.0, -80.0, 0.75, 40.0, 1.75, 24.0, - 1.625, -0.875, -3.5, 14.0, 120.0, 48.0, 0.5, 14.0, - -15.0, -40.0, 11.0, -10.0, -10.0, -14.0, -11.0, 0.125, - 1.5, -28.0, -0.6875, 120.0, 96.0, 88.0, 16.0, -2.0, - -12.0, -0.625, -1.5, -0.25, -8.0, 4.0, -2.0, 0.75, - 0.5, 5.0, -0.5, -7.0, -16.0, 44.0, 12.0, 64.0, - -24.0, 16.0, -56.0, -4.0, 1.125, -0.125, -4.0, 112.0, - 112.0, 0.5, -0.5, 112.0, 60.0, -18.0, -0.75, -10.0, - 6.0, 2.5, 6.0, 6.0, -2.0, 1.25, -28.0, 0.75, - -7.0, -0.5, -1.25, -26.0, -1.25, -8.0, -4.0, 8.0, - 88.0, -1.625, -2.0, 40.0, -13.0, -32.0, 80.0, 1.875, - 0.0, 72.0, -11.0, -40.0, 12.0, -6.0, -8.0, -18.0, - -4.5, 14.0, 1.375, -48.0, 2.0, 0.0, 24.0, -26.0, - -0.5, 0.5, -28.0, -2.5, -4.0, 1.25, 0.1875, -16.0, - 1.75, -64.0, -0.375, 0.75, -8.0, -1.75, 0.0, 0.5, - 2.75, 112.0, 15.0, 24.0, 13.0, 0.875, 0.8125, 44.0, - -1.0, 2.5, 1.25, 56.0, -0.875, 0.625, -0.25, 28.0, - 11.0, 3.0, 0.0, 8.0, 6.0, 0.0, 0.6875, -88.0, - -1.5, -12.0, 44.0, -30.0, 1.25, 1.125, 12.0, -12.0, - -48.0, -0.5, 8.0, -7.0, -26.0, -3.5, 0.0, -1.5, - -24.0, 32.0, 0.0, -11.0, 1.5, 0.0, -28.0, -32.0, - -2.5, -7.0, 0.8125, 1.5, 3.5, 4.0, -3.0, 24.0, - -56.0, -3.0, 8.0, 52.0, -4.5, 48.0, 1.375, 14.0, - 0.0, 88.0, 0.5, 13.0, 10.0, 3.0, 18.0, -10.0, - 12.0, -56.0, -1.5, 0.5, 20.0, 0.875, -12.0, 8.0, - 0.0, 1.0, -28.0, 3.25, 24.0, 0.875, -3.0, 56.0, - -8.0, 7.0, 0.875, -0.125, -18.0, -32.0, -44.0, 1.0, - -32.0, 0.5, -12.0, 80.0, 48.0, -32.0, -1.625, 5.5, - -18.0, -0.6875, 2.0, -28.0, -2.0, 0.0625, -32.0, 24.0, - -32.0, -1.0, 30.0, 5.0, 10.0, 1.5, 1.375, -1.5, - -0.75, -11.0, 3.0, 7.0, 7.0, -0.6875, -1.0, -16.0, - 2.5, -8.0, 0.0, -11.0, 0.0625, -3.0, 0.0, 1.75, - 3.75, 0.9375, -0.375, -56.0, -120.0, -5.0, -6.5, -0.9375, - 6.0, -3.0, 0.5, 11.0, 2.0, 24.0, -2.0, 22.0, - -24.0, 0.125, -15.0, 13.0, -15.0, 16.0, 20.0, -10.0, - -0.375, -2.5, -1.0, -2.0, -28.0, -112.0, 
-5.0, 1.25, - 0.625, -2.0, 20.0, -20.0, -12.0, -0.125, -10.0, -1.5, - -0.625, -0.0625, -6.0, -0.9375, 4.0, 0.125, 0.9375, -128.0, - 80.0, -3.75, -1.0, -2.5, 5.0, 0.5, 2.0, 2.5, - 1.25, -56.0, -14.0, -0.25, 0.0, 15.0, -8.0, 0.1875, - 20.0, -2.0, 2.0, 0.625, -36.0, 3.5, 12.0, -0.125, - -3.25, 16.0, 0.875, 104.0, 7.0, -0.5625, 12.0, 1.875, - -0.25, 22.0, 24.0, 20.0, -8.0, 4.0, -48.0, 1.75, - -0.1875, -1.5, 64.0, -1.0, -5.0, 1.0, 0.5, 1.0, - -6.0, 0.0, 0.0, -28.0, -0.5, -3.5, -112.0, -52.0, - -1.5, -112.0, 1.25, 8.0, 0.375, 10.0, 0.0, 0.625, - -3.0, -10.0, 104.0, -0.0625, -5.0, -0.875, 0.25, 0.5625, - 0.125, 120.0, -2.25, -104.0, 72.0, -2.0, -8.0, -1.5, - 9.0, -60.0, -10.0, 0.625, -104.0, -2.5, 0.375, 0.0, - 4.0, -12.0, -8.0, 5.0, 7.0, -12.0, 4.5, -8.0, - 56.0, 24.0, -2.25, -44.0, -1.0, -15.0, -1.0, 2.0, - 2.0, 0.0, -1.0, 6.0, -9.0, 5.5, -8.0, -1.5, - -14.0, 5.0, 4.0, -1.0, 32.0, 0.6875, -1.0, -0.5, - 120.0, 4.0, 28.0, 0.6875, -20.0, 0.8125, 24.0, 44.0, - -18.0, 9.0, -2.5, 0.0, -20.0, 0.625, 0.5, 52.0, - 6.5, -52.0, -10.0, 20.0, 8.0, -0.8125, -1.25, 1.0, - -48.0, 5.0, -0.5, -1.625, -3.0, 72.0, -40.0, -32.0, - 0.4375, 2.0, 2.25, 1.25, 6.0, 32.0, -2.5, 2.25, - -24.0, 1.0, 24.0, 8.0, 2.0, -13.0, 1.0, -20.0, - -1.125, -8.0, -2.0, -7.0, 22.0, -0.1875, -48.0, 26.0, - -9.0, -1.75, -8.0, 0.0, 0.0, -1.0, -10.0, -7.0, - 0.3125, -0.75, -1.75, 4.0, 3.5, -3.25, -5.0, -0.75, - 7.5, -6.0, 56.0, -88.0, -5.0, -36.0, -0.625, -3.0, - 3.25, -64.0, -28.0, 5.0, 1.75, -4.5, -0.25, 0.3125, - -13.0, 0.5, -1.75, -16.0, 6.5, -2.0, 0.8125, -2.0, - 3.0, -1.0, 1.75, 12.0, 12.0, 48.0, -1.0, -1.0, - 104.0, 60.0, -8.0, 52.0, -3.5, -120.0, 48.0, 32.0, - 2.5, 20.0, 4.0, 0.9375, 40.0, 16.0, 14.0, 6.0, - -7.0, -12.0, 1.75, 104.0, 0.25, -40.0, -0.375, 16.0, - 120.0, -14.0, -24.0, 12.0, -1.5, -32.0, 1.375, 8.0, - -40.0, 14.0, 0.25, 10.0, -112.0, -5.0, 0.75, -80.0, - 1.75, -2.0, -1.625, 2.25, 4.0, 0.0, -3.0, -104.0, - 0.0, -8.0, -0.75, -32.0, 60.0, -6.0, 28.0, 24.0, - -0.75, -72.0, 56.0, -2.0, 1.5, 14.0, 
-0.25, -64.0, - -104.0, -32.0, -2.0, 1.875, 12.0, 28.0, 24.0, -2.0, - -16.0, 15.0, -0.75, 4.0, 0.5, -2.5, -16.0, 0.0, - -5.5, -0.6875, 0.25, -1.0, -56.0, 24.0, 0.75, -1.0, - -0.125, -4.0, 4.0, 28.0, 0.5, -36.0, 2.75, -26.0, - -0.25, -64.0, -56.0, -40.0, -0.75, -2.0, -64.0, 0.5, - 20.0, -6.5, -64.0, -4.0, 4.0, 5.5, -0.25, -3.0, - 1.5, 24.0, 1.0, -32.0, 32.0, -2.0, -1.375, 9.0, - 0.375, -40.0, -8.0, -5.0, -0.1875, -16.0, -0.75, 1.125, - -1.0, 60.0, 0.0625, -1.0, 5.0, 0.625, 1.5, 48.0, - 0.5, 4.0, 32.0, -4.5, 6.0, 2.5, 8.0, -6.0, - -1.5, -3.5, -0.75, 0.0, 0.625, 11.0, 1.5, -12.0, - -6.0, 6.5, -10.0, 1.125, 16.0, 2.0, -5.0, -5.0, - -14.0, -18.0, -1.125, 6.5, 8.0, -1.25, -80.0, 1.625, - -128.0, 1.5, 6.0, 12.0, -2.25, -7.0, 56.0, 11.0, - -1.375, -1.5, -1.0, 0.0, 4.0, -13.0, 0.0, 44.0, - -0.25, -8.0, 0.0, -0.6875, -16.0, 0.625, 0.75, -0.5, - 11.0, 0.5, -0.0625, 0.0, -1.5, 56.0, -1.375, -10.0, - 16.0, -1.5, 0.625, -1.0, 26.0, 1.25, 0.5625, 26.0, - -120.0, 20.0, -2.5, -1.75, 16.0, -8.0, 0.5, 0.5, - -2.0, -1.875, -1.75, 8.0, -0.125, -32.0, -40.0, -16.0, - 10.0, -8.0, -0.3125, -1.0, -0.5, 1.375, -3.25, -10.0, - 8.0, 3.75, -56.0, 2.0, -2.0, -0.1875, 1.25, 6.0, - 1.5, -88.0, -48.0, -10.0, 1.25, 80.0, -14.0, -11.0, - 24.0, 2.0, 1.75, 0.5, 6.0, 3.0, 0.3125, -104.0, - 96.0, -18.0, 0.5, -7.0, 3.0, -7.0, -9.0, 2.0, - 3.0, 60.0, -12.0, 6.0, -6.0, -60.0, -14.0, 13.0, - 10.0, -3.75, 2.0, 28.0, 0.0625, 24.0, -9.0, 12.0, - -3.25, 14.0, 48.0, -0.625, 56.0, 2.5, -4.0, -12.0, - -7.0, 1.25, -1.75, 0.0, 28.0, 104.0, 2.0, -8.0, - 9.0, -}; -static data_t b_matrix[K_DIM * N_DIM] = { - -8.0, -7.0, -18.0, -12.0, -1.0, 1.0, -16.0, 3.0, - -28.0, 40.0, 8.0, -0.625, -2.0, 80.0, -5.5, 56.0, - -4.0, -0.875, -14.0, -1.25, 5.5, 1.75, -0.625, -7.0, - -8.0, -0.9375, -3.5, -1.0, 2.5, 4.0, 1.125, -1.0, - -0.1875, -1.25, 0.25, -48.0, 9.0, 20.0, 1.0, 18.0, - 8.0, -0.8125, 0.3125, -0.4375, -8.0, 3.5, -0.125, 1.5, - 1.0, 12.0, 6.0, 28.0, 3.25, 0.5, 2.75, -1.625, - 10.0, 0.25, 0.0625, -3.75, -1.0, -18.0, 0.1875, 
5.0, - 1.0, -7.5, -0.5625, 0.1875, 0.5625, -1.0, -2.25, -28.0, - -4.0, 0.0, -16.0, 0.0, -28.0, 14.0, -88.0, -1.5, - 0.9375, -10.0, 0.75, 0.625, -1.25, 2.25, -24.0, -0.5, - -0.5, -1.125, -1.0, -1.0, -7.0, 0.625, 2.0, -40.0, - 0.75, -48.0, 0.25, 26.0, 11.0, -1.0, 0.75, 1.625, - 5.0, -32.0, 9.0, -1.375, -24.0, -1.25, 1.0, 12.0, - 0.875, 1.125, -0.6875, -7.0, -1.625, 0.375, -32.0, -0.625, - 0.75, 40.0, -0.5, 18.0, -0.8125, 12.0, -14.0, 3.0, - 40.0, -26.0, 1.5, 64.0, 8.0, -11.0, -56.0, 1.0, - -12.0, 1.5, -3.25, -7.0, -56.0, -0.375, -72.0, -0.625, - 0.0, 1.375, 120.0, -0.875, -2.0, 56.0, 12.0, 0.875, - -11.0, -4.0, -5.0, -20.0, -0.875, -1.0, 32.0, 0.25, - -0.6875, -8.0, 5.0, 1.0, 0.875, 0.5, 1.0, -0.375, - 9.0, 2.0, -96.0, -4.0, -8.0, 0.25, -0.5, 0.0, - -96.0, -96.0, -18.0, -16.0, 24.0, -5.0, -40.0, -0.4375, - 32.0, -2.0, 5.5, 2.5, -52.0, 0.75, -56.0, -0.5, - 4.0, -2.5, -64.0, -0.3125, -30.0, -1.125, -0.0625, 0.4375, - 48.0, -2.0, 0.75, -0.3125, 15.0, -16.0, 0.0, 1.375, - 4.0, 1.25, -11.0, 1.25, -2.0, -11.0, 4.0, 10.0, - -0.5, 11.0, 4.0, -3.5, 1.0, 1.625, -10.0, 3.0, - -3.0, 0.5, 11.0, -0.625, 10.0, -0.6875, -0.4375, 14.0, - 4.5, 16.0, -24.0, -16.0, -7.5, 88.0, 104.0, 0.4375, - -1.0, -1.5, -1.75, -0.1875, -5.0, 0.6875, -0.875, 14.0, - -9.0, -0.125, 0.25, 11.0, 0.625, -0.625, -36.0, -48.0, - 1.5, 8.0, -0.625, 4.5, 0.875, 8.0, -2.25, 24.0, - 0.125, -1.0, 24.0, -4.0, 0.75, -64.0, 5.0, -1.0, - 1.0, -3.0, -1.375, -1.0, 3.75, 56.0, 28.0, 16.0, - -16.0, 3.0, -2.5, 5.5, -7.5, 0.5625, -14.0, 24.0, - -72.0, -16.0, -8.0, 1.0, -14.0, -0.75, -0.75, 0.25, - 24.0, 16.0, 26.0, 36.0, 64.0, 1.625, -7.5, 40.0, - -0.125, 0.5, 2.75, -0.625, 40.0, 3.0, 0.0, 4.0, - -1.75, -0.75, 28.0, -3.75, -0.875, -2.5, 44.0, 32.0, - 2.75, -0.625, -8.0, -3.5, 3.75, -28.0, 0.625, -120.0, - -1.125, -36.0, 1.0, 0.625, 48.0, 0.0, 48.0, 0.375, - -5.5, -88.0, -5.0, -0.625, 96.0, 1.875, -64.0, 8.0, - 0.0, -64.0, 40.0, 72.0, 8.0, 7.0, -10.0, 1.625, - 3.25, 0.875, -5.0, 0.625, 0.75, -64.0, 11.0, -52.0, - -32.0, -4.0, 
-1.5, 8.0, 20.0, 0.75, -0.5, -18.0, - -20.0, -15.0, -7.5, -4.0, -0.9375, -40.0, 0.0, -4.0, - 72.0, -8.0, -14.0, 3.0, 104.0, 1.125, 72.0, 8.0, - -0.625, -24.0, 1.875, 1.875, 8.0, -32.0, 0.125, -2.25, - 96.0, -1.25, -7.5, -4.0, 3.75, -0.3125, 5.5, -2.0, - -5.0, 3.75, 4.0, -1.5, -60.0, 5.0, -16.0, -3.25, - 18.0, 0.0, 1.0, 10.0, -30.0, -5.5, 0.5, -5.0, - -0.75, 1.75, -28.0, 0.5625, 56.0, -26.0, -0.25, 104.0, - -0.1875, 1.625, -22.0, 0.875, 10.0, -6.0, 1.0, -2.75, - 1.875, -3.75, -1.5, -22.0, -8.0, 6.5, -1.25, -1.25, - 0.1875, -0.6875, -4.0, 18.0, -1.75, -44.0, 8.0, -3.25, - -3.0, 0.125, -52.0, 30.0, 0.875, 0.1875, -20.0, -28.0, - -16.0, 5.0, 16.0, -0.875, 120.0, 1.25, 2.75, -7.0, - 1.875, 5.0, 0.5, 4.5, 8.0, -128.0, 4.0, 14.0, - -8.0, 0.0, 60.0, -12.0, -1.0, -8.0, -16.0, 24.0, - -0.5625, -4.0, -3.5, 28.0, 2.0, -48.0, -32.0, 44.0, - 4.5, 32.0, 0.5625, -0.125, 2.0, 1.875, 3.25, 36.0, - -4.0, -2.25, -10.0, 0.375, 72.0, -1.75, -5.0, 0.9375, - 20.0, -36.0, 0.125, -3.25, -2.0, 12.0, -4.0, 44.0, - 0.1875, 2.5, -16.0, 16.0, 0.5625, -2.0, 0.75, 1.5, - 2.0, 0.0, 10.0, 28.0, 5.0, 14.0, -88.0, 40.0, - 1.5, 0.375, -24.0, 1.0, 0.375, 24.0, -120.0, 0.75, - -8.0, -28.0, -10.0, -0.25, -5.0, 0.0, -3.0, 18.0, - -1.625, -30.0, 11.0, -12.0, 0.5, -8.0, 52.0, 1.0, - 16.0, -8.0, -4.0, 2.5, 24.0, 1.125, 0.5, 2.0, - -0.3125, 96.0, 2.0, -64.0, -16.0, -0.0625, -1.25, -0.3125, - 0.0, -6.5, 1.0, 0.875, -2.75, 0.625, -0.1875, -56.0, - -30.0, 0.0, 0.375, -3.5, 28.0, 8.0, -1.875, -1.125, - 6.0, -0.375, 56.0, 56.0, 7.0, -1.625, -5.0, 0.0, - -3.0, -0.75, -0.25, 7.5, 0.25, 3.25, -7.0, -4.0, - -56.0, 22.0, -0.25, -0.5, 0.0, -2.25, -8.0, 0.25, - 0.25, -7.0, 40.0, -28.0, -128.0, -2.75, 8.0, 5.5, - -0.25, -56.0, -11.0, 7.5, -1.875, 8.0, -112.0, 5.0, - -0.25, 0.0, -4.0, -0.125, 0.1875, 2.5, 16.0, -0.4375, - -0.625, 0.0, 0.3125, 3.75, -9.0, -128.0, 20.0, 8.0, - 12.0, 0.0, 1.375, -8.0, -2.0, -0.75, -8.0, -96.0, - -36.0, -56.0, 6.0, -4.0, -3.25, 6.5, -30.0, -12.0, - -3.25, 0.375, -22.0, 1.75, 0.6875, 5.5, -2.0, 
0.75, - 4.0, 8.0, -0.8125, 0.9375, -56.0, 24.0, -30.0, 3.25, - 20.0, 36.0, -32.0, 1.0, 16.0, -0.5, 0.5, 30.0, - 7.5, 8.0, 0.75, 4.0, 28.0, -4.0, -1.5, 2.0, - 0.0, -3.25, -5.5, 18.0, -120.0, 1.0, 7.0, -0.125, - -6.0, 0.0, -0.25, 0.25, 1.25, -72.0, -0.5, -14.0, - 36.0, 48.0, -22.0, 1.125, -0.375, 0.1875, -0.75, -6.5, - -3.25, -20.0, -18.0, 48.0, 88.0, -1.125, -0.375, 32.0, - -0.75, -3.75, -3.25, -12.0, 12.0, -20.0, 1.0, -0.0625, - 0.625, 0.25, 56.0, -3.25, -0.75, 0.0, -96.0, -32.0, - -6.0, 2.0, -8.0, 0.0, 1.25, 1.75, -2.5, 0.5, - -3.5, 16.0, 0.3125, -48.0, -18.0, -7.5, 1.0, 96.0, - 10.0, 1.25, 22.0, -0.25, -7.0, -7.0, 24.0, -11.0, - 0.625, -3.5, 0.75, 0.375, 3.0, -10.0, -4.5, -0.75, - -3.5, 28.0, -7.5, -6.5, -36.0, -1.0, 1.5, -1.0, - -0.5, 7.5, 0.9375, 0.75, 14.0, -1.75, 32.0, -2.0, - -40.0, 20.0, 48.0, 7.5, 0.0, 0.9375, 0.375, -5.5, - 2.0, 96.0, -1.75, 1.875, -10.0, -6.0, 22.0, -1.375, - -56.0, -20.0, 1.0, -0.75, -3.0, -4.0, -0.875, 0.5, - -3.0, -1.875, -44.0, 0.0, -1.0, -0.25, -28.0, 16.0, - 0.8125, -1.0, -2.0, -1.5, 20.0, -88.0, 2.0, -4.5, - 2.5, 0.5, -1.25, 0.5, -0.125, -4.0, 8.0, -48.0, - 0.0, -60.0, 1.375, -1.125, 0.5, 0.875, 6.0, 2.0, - 6.5, 18.0, -3.75, -1.5, 26.0, 1.0, 3.0, -28.0, - 2.5, 3.5, 24.0, -0.1875, -36.0, -26.0, -112.0, 16.0, - -4.0, 2.75, -13.0, -16.0, 16.0, -1.0, -52.0, -60.0, - 48.0, 0.375, 20.0, -0.125, 0.0, 0.875, 96.0, -2.5, - -16.0, -4.0, 0.0, 1.125, -0.75, -13.0, 1.125, 4.0, - 0.625, -2.0, -22.0, -26.0, -1.75, -0.625, 1.25, -0.25, - 0.125, -26.0, -12.0, 3.5, 120.0, 2.5, -18.0, 3.75, - 2.5, 64.0, 1.5, 0.25, -40.0, 2.5, -30.0, 7.0, - 22.0, -22.0, 24.0, -1.75, -3.5, -18.0, -12.0, 6.0, - 72.0, -3.0, 0.5, -13.0, 0.0, 6.0, 1.25, -20.0, - 7.0, -1.0, 48.0, -30.0, 10.0, 88.0, 56.0, 44.0, - -6.5, -3.0, 0.25, 20.0, -0.125, -3.0, -128.0, 0.9375, - 88.0, -24.0, -22.0, -3.5, -0.9375, 3.5, -0.25, -3.0, - 104.0, -0.375, 0.0, -0.125, -0.1875, -2.0, 24.0, -3.5, - 0.5, 0.125, -112.0, 1.5, -14.0, 22.0, 1.5, 8.0, - -0.75, -0.25, 0.6875, -56.0, 9.0, -6.0, -6.0, 
4.0, - -8.0, -12.0, 16.0, -2.0, 2.25, 88.0, 0.0, 0.125, - -0.5, -16.0, 48.0, 52.0, 1.625, -4.0, 15.0, -2.75, - 80.0, -120.0, 7.0, -0.4375, 9.0, 56.0, 10.0, -80.0, - -1.625, -32.0, -3.0, 7.5, -1.75, 2.0, 6.0, -4.0, - 0.8125, -7.5, -15.0, -13.0, 0.75, 12.0, 24.0, -40.0, - 16.0, -0.625, 0.125, 3.5, -0.5625, -2.0, 22.0, 0.5, - 1.25, -7.5, 104.0, -3.5, 44.0, -36.0, 48.0, -0.0625, - 0.125, -8.0, 4.5, -16.0, -2.0, 10.0, -3.75, -10.0, - 16.0, 12.0, -1.5, -1.75, -48.0, 0.75, -2.5, 48.0, - -1.0, 60.0, -0.5, -4.0, 12.0, 20.0, 6.0, 0.625, - -0.75, 1.0, 5.0, -1.25, 0.5, -0.375, -120.0, -48.0, - 104.0, -3.5, -72.0, 13.0, -1.0, -0.75, 1.5, -3.5, - 22.0, 0.6875, -0.8125, -5.0, 24.0, -32.0, 112.0, -32.0, - 30.0, -48.0, -32.0, -4.0, -32.0, 3.0, 4.0, 0.0, - -2.0, -7.5, -48.0, 0.25, -14.0, 7.5, -20.0, 14.0, - 3.5, 7.0, 12.0, 6.0, -128.0, -8.0, 0.3125, 112.0, - -56.0, 0.3125, -30.0, -1.25, -24.0, -56.0, -0.6875, -4.5, - 0.75, -0.75, 16.0, 3.5, 0.625, -0.125, -3.25, -2.25, - -24.0, -8.0, 0.0, 56.0, -40.0, -12.0, 0.375, 40.0, - -28.0, -2.25, -8.0, -3.75, 0.0, -0.3125, -5.0, -44.0, - -1.0, -3.75, 3.0, -15.0, -72.0, -16.0, -0.8125, -6.5, - -11.0, -0.5, -0.5, 6.5, 80.0, -104.0, -0.625, -24.0, - 0.375, 16.0, -1.125, 18.0, 0.0, -3.5, -0.75, -32.0, - -6.0, -104.0, 0.25, 4.0, 4.0, -26.0, -8.0, -2.0, - 1.5, 12.0, 4.0, 1.375, -1.0, 0.3125, 5.0, 0.0, - 3.0, 24.0, -96.0, -1.625, -1.75, 4.0, 0.25, -0.25, - 3.0, -22.0, -20.0, 16.0, 12.0, 14.0, -32.0, 10.0, - 22.0, -9.0, 0.4375, 28.0, 0.875, -24.0, 20.0, 64.0, - 0.5, -3.25, 20.0, 0.125, -8.0, 72.0, -11.0, 12.0, - -40.0, 1.0, -96.0, 4.0, 1.75, -6.0, 2.0, 60.0, - 0.25, -0.25, -24.0, 3.0, -64.0, -4.0, -10.0, 24.0, - -40.0, 0.9375, 0.625, 3.0, 0.0, 7.5, -1.875, 0.125, - 8.0, -0.5, -6.5, 5.0, 32.0, 0.25, -2.25, -60.0, - 88.0, -1.0, -1.75, -0.5625, -14.0, 40.0, -3.0, 6.0, - -1.875, 3.5, -16.0, -32.0, 1.375, 1.0, -0.5, 0.75, - -24.0, 7.0, 0.6875, -15.0, 1.125, -5.0, -0.625, -0.3125, - 10.0, -2.5, 4.0, -0.25, -20.0, -56.0, -16.0, 0.0, - -40.0, -4.0, -2.0, 
-0.625, 0.5, 8.0, 0.875, -72.0, - 20.0, -0.25, -6.0, 0.125, -2.5, -3.5, -28.0, -26.0, - -0.8125, 3.5, 4.0, -3.75, 12.0, -6.0, -0.125, -2.0, - 0.125, 2.0, 0.0625, 0.0, -12.0, 11.0, -2.25, 1.25, - -0.5, -0.375, 7.0, 0.0, -0.375, 0.625, 7.5, -32.0, - -0.9375, -0.5, -1.875, 0.625, -52.0, -0.6875, -8.0, -30.0, - -9.0, -2.0, 16.0, -24.0, 0.3125, 18.0, 18.0, 1.875, - 5.0, 1.375, -104.0, 7.0, -14.0, -6.5, -1.0, 0.0625, - -1.375, 1.0, 8.0, -44.0, -0.5625, -0.625, -1.75, -4.0, - -12.0, -3.5, -0.875, 16.0, 1.0, 4.5, 20.0, -64.0, - -32.0, 112.0, 0.875, -1.5, 1.5, 14.0, -7.0, -2.5, - -44.0, -96.0, -7.0, 0.25, -6.0, 36.0, 0.25, 9.0, - -1.0, -8.0, 5.5, 32.0, 0.1875, 4.0, 0.75, 0.5, - -10.0, -1.0, -0.25, -1.0, 14.0, 0.25, 32.0, 28.0, - -8.0, 0.375, 0.9375, 8.0, -3.5, 5.5, 10.0, 0.25, - -1.0, 6.0, 0.625, -32.0, 2.75, 3.5, -12.0, 40.0, - -1.25, 13.0, 8.0, -8.0, 1.125, -4.0, -26.0, -0.4375, - 0.75, -12.0, -26.0, -6.0, -1.375, -5.0, -6.0, -3.0, - -0.8125, -22.0, 0.0, -2.0, 18.0, 4.0, 6.5, -128.0, - -2.75, 0.375, -2.0, -10.0, 11.0, -112.0, -0.75, -0.5625, - -5.5, -1.5, -1.0, -2.25, 2.5, 2.5, 0.0, -40.0, - 2.0, 0.0, -20.0, 8.0, -7.0, -1.0, 20.0, 26.0, - -1.0, -32.0, 26.0, -14.0, -4.0, 2.5, -16.0, -112.0, - 48.0, -5.5, -0.1875, -22.0, -16.0, 4.0, -12.0, -0.375, - -0.875, -5.0, 1.25, 4.0, -12.0, -1.125, -1.0, -5.0, - -4.5, 44.0, -40.0, -8.0, -1.25, 60.0, 3.0, -0.5, - -0.125, -8.0, 4.0, -88.0, -5.0, -4.0, -0.3125, 7.0, - 2.0, -3.0, 0.25, -48.0, 1.0, -0.125, -2.0, -48.0, - 1.75, -0.3125, 40.0, -26.0, -0.25, -0.6875, 0.25, -9.0, - 96.0, 6.0, 4.5, -0.875, -0.75, 6.0, 104.0, 3.75, - 52.0, -8.0, 3.0, 15.0, -2.25, 0.125, -0.5, 0.25, - 9.0, -2.0, 3.75, -1.0, -5.0, 48.0, -0.25, -128.0, - 13.0, 0.125, -32.0, 0.75, -14.0, 0.375, -1.75, -4.5, - -7.0, 60.0, 12.0, 12.0, -4.0, 3.0, -0.4375, -112.0, - 0.0, 22.0, 60.0, -0.75, 1.5, 1.875, -1.375, -2.0, - -1.25, -0.0625, -40.0, -40.0, -0.6875, 0.8125, -16.0, 6.0, - 1.5, 0.75, 24.0, 56.0, 88.0, 10.0, -12.0, -0.5, - -0.5, -0.3125, 64.0, 1.75, -1.5, 44.0, 2.0, 
-40.0, - -64.0, 24.0, -18.0, 20.0, -8.0, -40.0, 72.0, -4.0, - 20.0, 0.625, 12.0, -2.25, 0.75, -0.5, -0.6875, -12.0, - 0.75, -1.25, -4.5, 64.0, 14.0, 3.0, 0.125, -7.0, - 4.5, -0.5625, -64.0, -64.0, 1.25, 1.75, -12.0, 16.0, - -5.0, 96.0, 104.0, 2.5, -96.0, 0.3125, 8.0, -56.0, - -7.0, 56.0, -44.0, 20.0, 0.6875, -0.75, 8.0, -64.0, - -12.0, -6.0, -0.3125, -20.0, -96.0, -24.0, -1.25, 7.5, - 0.5, 0.375, 4.5, -60.0, 9.0, 0.0, -64.0, -9.0, - -0.125, 8.0, 5.0, 2.0, -40.0, 8.0, 20.0, -1.375, - 32.0, 12.0, 0.0, 12.0, -0.75, -5.5, -2.75, -2.0, - 24.0, 32.0, 12.0, -0.25, -26.0, -20.0, -0.125, 8.0, - 80.0, 13.0, -0.75, -1.5, 1.375, -6.5, -0.125, 0.0, - -8.0, -0.25, -5.0, 15.0, -0.25, 0.0, 0.0, 40.0, - 0.6875, 3.0, 40.0, 32.0, 0.875, 56.0, -16.0, 7.0, - -22.0, -112.0, 3.0, 48.0, 10.0, 0.0, 0.0, 1.5, - 4.0, 0.75, -3.25, -2.0, 4.5, -7.0, 4.0, -4.5, - -6.0, 0.25, -1.875, 18.0, 32.0, -0.8125, 10.0, -1.375, - -1.0, 2.5, -0.75, 1.5, 32.0, 3.25, 1.125, -14.0, - 0.0, -28.0, -2.75, -2.0, 2.0, 16.0, -0.3125, 1.0, - 4.0, -1.375, 0.9375, -1.5, -4.0, 7.0, 104.0, -11.0, - -30.0, -12.0, -3.0, 40.0, -1.375, -0.5, -22.0, -40.0, - 0.1875, 0.5, -24.0, 1.0, -1.0, -3.0, -48.0, -8.0, - 8.0, 112.0, -88.0, -3.0, -1.5, -1.0, -80.0, 28.0, - 0.125, -3.5, 0.375, 0.5625, 16.0, 9.0, 3.25, 22.0, - 9.0, 28.0, -18.0, 0.5625, -64.0, 0.0625, -0.625, 16.0, - -12.0, -0.1875, 0.375, 0.9375, -0.4375, 120.0, -6.0, -0.125, - -1.25, -8.0, -2.0, -28.0, -6.0, 60.0, -16.0, -6.0, - -3.0, 0.0, 0.625, -0.125, 0.375, -1.0, -0.25, -128.0, - 1.125, -1.75, 10.0, -7.5, 2.75, -26.0, -44.0, -48.0, - -24.0, 0.125, 3.5, 8.0, -3.5, 3.5, -2.0, 44.0, - -1.0, 60.0, 4.0, 24.0, 64.0, -0.25, 0.25, 1.75, - 0.875, 32.0, -13.0, 2.25, -8.0, -1.5, -40.0, 4.0, - 2.25, 4.0, 3.0, 2.25, -10.0, -10.0, -20.0, -128.0, - -1.25, -3.0, 2.25, -3.5, 52.0, -40.0, 0.375, 1.5, - -0.125, 8.0, -8.0, -9.0, -0.5625, 120.0, -2.0, -48.0, - 3.0, -128.0, 1.0, 7.0, 1.5, 2.0, 4.0, 4.0, - -2.75, -1.625, 1.875, 0.0, 0.25, 1.875, 4.0, 0.5, - -0.5, 0.6875, -2.0, -2.0, 14.0, 
-6.5, 24.0, -40.0, - 8.0, -1.625, 2.75, -12.0, -24.0, 5.0, -0.25, -0.3125, - -40.0, -0.75, -1.0, -3.0, -7.5, 12.0, 1.0, 8.0, - 2.75, 120.0, -48.0, 1.0, -8.0, 0.5, 7.0, -0.5, - 1.75, -3.25, -6.0, -0.1875, 1.625, -64.0, 0.0, 8.0, - -30.0, 30.0, -1.25, 16.0, -0.8125, -1.75, -52.0, -0.75, - 12.0, -44.0, -11.0, -28.0, -1.25, -36.0, -4.5, 32.0, - -24.0, 12.0, -64.0, -18.0, -24.0, -6.0, -3.0, -0.25, - 2.0, -16.0, -48.0, -0.75, -2.0, 4.0, -0.5, -4.0, - 7.0, -48.0, 0.625, -52.0, 6.0, 80.0, -4.0, -1.0, - 14.0, 3.5, 1.5, -1.75, -10.0, 0.0, -16.0, 3.5, - 6.0, -14.0, 28.0, 1.0, -3.0, -8.0, 0.625, -0.5625, - -0.5, -1.5, 4.0, 0.875, 12.0, -3.0, -10.0, -3.0, - 1.0, -60.0, -4.0, -56.0, 0.75, 24.0, -15.0, -40.0, - 8.0, 4.0, 40.0, -16.0, 56.0, -6.0, 96.0, 112.0, - 0.25, -0.0625, 2.0, -1.375, 10.0, 72.0, -16.0, -120.0, - 2.5, 0.0, 0.5, 0.0625, 48.0, -0.125, 0.25, -6.5, - 0.875, 16.0, -28.0, 0.5, 0.5, -12.0, -6.5, 1.25, - -0.4375, 0.9375, 0.875, -0.375, 32.0, -3.25, 0.5625, 88.0, - -112.0, -20.0, -0.3125, -24.0, 0.4375, 12.0, 1.25, -0.75, - 0.375, -0.875, 0.9375, -13.0, 0.125, -8.0, -0.75, -12.0, - -28.0, 13.0, 28.0, -3.0, 0.6875, -120.0, -28.0, -4.0, - -5.5, 0.3125, 112.0, -4.0, 28.0, 4.0, 5.5, 2.0, - 10.0, 0.25, -0.25, -3.0, -32.0, 0.8125, -24.0, -14.0, - 120.0, 0.75, 0.1875, 1.0, 0.0, 7.5, 104.0, 112.0, - 1.375, -48.0, 32.0, 0.3125, -0.5, 4.0, 96.0, 20.0, - -28.0, 0.375, -0.3125, 3.75, -0.875, -15.0, 24.0, 1.125, - 0.6875, -2.5, 3.0, -16.0, 2.5, -80.0, 2.0, -0.6875, - -0.25, -1.875, 10.0, -1.25, -16.0, -14.0, -1.5, 96.0, - 3.5, 8.0, -0.3125, -5.5, 3.0, 0.5625, 0.75, 10.0, - -5.0, -0.25, -60.0, 1.125, 1.125, 20.0, 7.5, -12.0, - -9.0, 0.5, 48.0, -48.0, 0.25, -20.0, -7.0, -56.0, - -12.0, -0.125, 1.5, -18.0, 0.0, 0.25, 2.0, 24.0, - 3.0, 20.0, -2.75, -0.5625, -40.0, -1.25, -8.0, 0.25, - 3.25, -0.875, 0.875, -12.0, -6.0, 0.5, 60.0, -16.0, - -26.0, -8.0, -2.0, 0.5, 3.75, -24.0, 6.0, 6.0, - 4.0, 56.0, -0.5, 1.0, -0.1875, 64.0, 0.0, 8.0, - -3.75, -16.0, -24.0, -6.5, -1.25, 0.0, -4.0, -7.0, - 
-28.0, -24.0, -112.0, -6.0, -1.0, -1.625, -28.0, 6.0, - 0.0, -18.0, 0.625, -2.25, 32.0, -5.0, 0.875, -8.0, - -0.5, 0.5, -3.0, -0.5625, -8.0, 3.5, 1.0, 5.5, - 120.0, -2.0, -3.0, -0.125, -0.0625, -1.875, -28.0, -26.0, - 3.25, -4.0, 24.0, -1.5, 7.0, 1.625, 1.5, -0.875, - 0.375, -1.5, -26.0, 28.0, -1.0, 0.0, -0.5, -52.0, - -14.0, 112.0, 0.0, -32.0, -1.375, -1.75, -1.25, -30.0, - 24.0, 36.0, 8.0, -0.75, -15.0, 5.0, 8.0, -8.0, - -3.5, -14.0, -1.75, 2.0, 0.625, 80.0, 1.125, 1.25, - 0.75, -16.0, -6.0, 8.0, -3.0, -7.5, 0.9375, 72.0, - -2.0, 0.625, 0.5, 24.0, -1.875, 0.0, -88.0, -8.0, - -7.0, 30.0, 0.0, -10.0, -5.5, -0.8125, 0.5, -3.5, - 2.25, 1.75, -0.3125, 0.0, -24.0, -7.5, -36.0, -0.1875, - 9.0, -4.5, -18.0, -2.0, 0.875, -3.0, 0.25, 3.0, - -3.0, -26.0, -36.0, 32.0, 48.0, -4.5, -40.0, -0.9375, - -0.75, -20.0, -8.0, -24.0, -0.375, -0.25, 12.0, 32.0, - -14.0, -12.0, -1.375, -28.0, 88.0, -4.0, 88.0, -2.0, - 11.0, 6.5, 32.0, -1.0, -0.5, 112.0, 28.0, 1.75, - 20.0, 4.5, -0.25, -0.8125, -0.625, 0.0, 8.0, -12.0, - 15.0, -1.125, -0.8125, 5.0, -0.125, -1.125, 3.0, -11.0, - -3.0, 0.5, -16.0, -0.8125, 8.0, -0.25, -2.0, -2.0, - 1.5, 12.0, -1.875, 16.0, -4.0, 0.125, -10.0, -3.0, - -1.25, 0.375, -112.0, 96.0, 2.5, 2.5, 52.0, 20.0, - 0.25, -40.0, -120.0, 6.0, -0.25, -11.0, -0.375, -2.25, - -64.0, 0.75, 0.0, -2.0, -24.0, -6.0, 18.0, -0.25, - 0.5, -3.0, 7.0, -30.0, 7.5, -0.875, -3.0, 1.625, - 0.0625, -0.5, 5.0, -8.0, 8.0, -0.4375, 22.0, 1.875, - 64.0, -1.5, 1.5, 0.5, 0.0625, -0.375, 1.5, -0.9375, - 28.0, -48.0, 5.0, -3.25, 0.875, 120.0, 11.0, 0.375, - -120.0, -22.0, -2.5, 3.25, 112.0, -8.0, -0.5, -1.5, - 0.25, 2.0, 3.5, -6.0, 0.8125, 0.0, -4.0, -4.0, - 2.0, 4.0, 30.0, -5.0, -12.0, -16.0, 0.0, 3.5, - 2.0, -0.125, 0.4375, -1.125, 10.0, -5.0, 1.125, 2.75, - 0.875, 24.0, -0.125, -4.0, -0.5, -12.0, -3.25, -14.0, - 22.0, 1.0, 1.875, 56.0, 11.0, 7.5, 0.0, 20.0, - -1.0, 0.875, 0.0, -2.5, -48.0, -1.25, 56.0, 11.0, - -0.5, 1.0, 28.0, -16.0, -3.5, 3.5, 5.0, -0.1875, - 4.0, 0.875, 0.0, 52.0, -1.25, 
8.0, 7.0, -3.0, - 0.625, 2.25, -1.625, 5.5, -6.0, 3.75, -15.0, -2.0, - 32.0, 15.0, -3.0, 15.0, -4.5, -6.0, 2.25, -1.875, - -80.0, 8.0, 6.0, -9.0, -48.0, 24.0, 8.0, -5.0, - -1.25, 40.0, -64.0, -2.0, -112.0, 28.0, -0.25, 5.0, - 10.0, -22.0, 3.75, 0.5, 48.0, -56.0, -4.0, 10.0, - -30.0, -0.5625, -24.0, -0.25, 0.5, 3.0, -60.0, 56.0, - -0.625, -1.75, -0.5, -2.0, 0.5, -56.0, -2.0, 3.25, - -18.0, 11.0, 18.0, -9.0, -2.5, -0.625, -8.0, -2.0, - 18.0, -8.0, 6.0, -20.0, -32.0, -13.0, 44.0, -120.0, - -36.0, 1.5, 3.0, -52.0, 1.0, 0.6875, -7.0, -16.0, - 1.0, -96.0, -2.0, -3.5, -0.5, 16.0, 0.5, 7.0, - -120.0, -32.0, 1.125, -80.0, -0.1875, 2.0, -3.0, -3.5, - -0.125, -20.0, -32.0, -7.5, 5.0, -0.0625, 14.0, -20.0, - 11.0, 0.5, -8.0, 5.5, 16.0, -6.0, -2.0, -3.0, - 13.0, -12.0, 1.0, -0.9375, 1.5, -2.0, -16.0, 0.25, - 28.0, -48.0, -5.0, 0.6875, 0.0, -44.0, 18.0, 0.5625, - 1.5, 3.5, -6.0, -2.0, -20.0, -8.0, 0.5, -96.0, - -16.0, -1.375, 30.0, -56.0, -2.75, 3.5, 2.0, -30.0, - 6.0, 5.0, 1.375, 48.0, 10.0, -7.0, -0.8125, -0.75, - -16.0, -6.0, 0.0, 18.0, 0.3125, -0.8125, -2.0, -7.0, - 1.0, -1.0, -6.0, 0.875, 36.0, -64.0, -0.8125, -1.0, - -0.5, -16.0, 20.0, -8.0, 52.0, -2.25, 1.875, 4.0, - -6.0, 1.875, -44.0, -6.0, 0.75, 7.0, 0.0, 26.0, - 14.0, 3.0, -4.5, -9.0, 40.0, 44.0, 5.0, 20.0, - 1.0, -2.75, 56.0, -28.0, -14.0, 6.5, -0.5625, -64.0, - -64.0, 3.0, 0.5, 44.0, -32.0, -0.125, -14.0, -0.625, - 6.0, 0.375, -0.5, 1.5, 5.5, 0.75, -56.0, 112.0, - -0.25, 0.125, 5.0, 0.0, -2.0, -0.5, -0.125, -96.0, - -3.0, 2.0, -1.25, 12.0, -6.0, 4.0, -7.5, -0.5, - -4.0, 0.6875, -80.0, 1.5, 8.0, 12.0, 14.0, -80.0, - -1.0, 0.5625, -0.3125, 11.0, -104.0, 72.0, 8.0, 10.0, - 8.0, -1.5, -1.5, -44.0, -0.625, -1.75, -0.625, 16.0, - -8.0, -5.5, -1.125, 0.375, -0.25, 40.0, -14.0, 2.75, - 0.125, -36.0, -0.1875, -5.0, 12.0, -0.75, 44.0, -0.75, - -32.0, 0.25, 4.0, -64.0, 60.0, -16.0, -2.0, 4.0, - 0.8125, 0.0625, 2.0, -0.25, -0.625, 0.25, 0.25, -0.625, - 0.0, 1.875, 7.5, -16.0, -1.125, 1.625, 3.25, 2.5, - 104.0, -120.0, -1.0, 
0.5, -0.5, -28.0, -6.5, 5.0, - 1.5, 0.5, -3.5, 0.0, -1.5, -2.5, 40.0, 4.0, - -0.25, -1.0, 0.1875, 0.625, -0.75, 0.75, 0.0, 0.875, - 4.0, 5.5, -104.0, 14.0, 0.5625, 0.125, -5.0, -24.0, - 2.0, -64.0, -26.0, 1.25, -3.0, -6.0, 2.5, 26.0, - -12.0, 6.5, 120.0, -64.0, -1.0, 52.0, 36.0, -1.75, - 7.5, -1.0, 2.0, -0.3125, -1.75, 10.0, 6.0, 1.5, - 48.0, -2.0, 0.75, 0.75, -1.0, -9.0, -0.625, -24.0, - -28.0, 80.0, 0.0, -56.0, -1.625, 0.75, 1.375, 88.0, - 2.25, -13.0, 7.0, 0.9375, -16.0, 120.0, 6.5, -2.5, - 26.0, 24.0, 0.875, -0.875, -16.0, -10.0, 56.0, -0.9375, - -56.0, -8.0, -56.0, -1.625, 0.0, -2.25, 60.0, 1.5, - 0.125, 0.0, 64.0, 52.0, -0.625, 0.75, -2.0, 120.0, - 0.5, 1.25, 0.5, -0.625, 0.375, 11.0, 5.0, 24.0, - 8.0, 1.375, 6.5, 1.25, 14.0, 0.625, -64.0, 2.0, - 32.0, 0.1875, 1.25, -48.0, 3.25, -0.625, -112.0, 1.375, - -1.5, -72.0, 8.0, 3.0, -1.5, 5.0, 10.0, 4.0, - -4.0, -0.1875, 12.0, 28.0, -0.75, -0.125, 0.375, -16.0, - 1.0, 16.0, -3.0, 12.0, 6.0, 2.5, 0.375, 1.0, - -44.0, 24.0, -0.5, 120.0, -0.0625, -8.0, -20.0, -22.0, - -4.0, -14.0, -14.0, 1.75, -3.0, 2.5, 80.0, -24.0, - 24.0, -4.0, 0.1875, -40.0, 0.0, -3.75, 120.0, 40.0, - -0.4375, 15.0, 1.0, 52.0, 1.25, 1.0, -32.0, -1.5, - 112.0, 0.375, 6.0, -48.0, -12.0, -0.25, 1.25, -1.0, - 8.0, -5.5, -56.0, 3.0, 3.5, 7.0, -0.375, 0.375, - -0.9375, -96.0, -3.75, -4.5, 0.25, -3.0, 8.0, 5.0, - 4.0, -0.5, 0.0, 16.0, 0.0, -6.5, 0.0, 0.25, - 0.0, -0.0625, -72.0, -40.0, 24.0, 0.5625, 1.5, -28.0, - 56.0, -1.75, -16.0, 24.0, -6.0, 0.5, -26.0, -8.0, - 1.5, 0.6875, -0.625, 12.0, -0.25, 0.5, -12.0, -0.5, - -0.25, 60.0, 20.0, 1.0, -1.375, -2.5, -0.75, 80.0, - -4.0, -15.0, -80.0, -4.0, 1.5, 88.0, -8.0, -0.25, - -3.5, -3.5, 9.0, 4.5, 20.0, -2.0, 0.0, 16.0, - -2.5, 0.8125, 16.0, -56.0, 0.3125, -60.0, 1.625, -28.0, - -1.0, -26.0, -8.0, -1.375, -1.25, -0.875, -32.0, 0.625, - -1.0, -36.0, -1.5, 22.0, 3.0, -8.0, 44.0, 14.0, - -1.75, 0.0, -0.6875, -56.0, 13.0, -20.0, -0.875, -30.0, - 52.0, 0.625, -52.0, -0.125, 7.0, 20.0, -1.75, 30.0, - -4.0, 52.0, 
1.125, 24.0, 40.0, 1.5, -56.0, -64.0, - 1.125, -7.0, 1.375, -22.0, -0.1875, -0.875, -0.25, -26.0, - -0.5625, -96.0, 32.0, 120.0, 0.8125, 32.0, 2.75, -2.0, - 0.75, -112.0, 6.0, 5.0, -0.8125, 1.25, 6.0, -14.0, - -3.25, 20.0, 24.0, -1.0, 80.0, -0.625, 0.8125, -4.5, - 20.0, 2.0, 1.375, 2.75, -7.0, 18.0, -1.5, -0.25, - -0.875, 7.0, -16.0, 4.0, -12.0, 96.0, -16.0, -24.0, - 1.0, 0.5, 96.0, 3.5, 1.75, 0.25, 0.625, -88.0, - 0.375, 10.0, -0.6875, -1.75, 0.5, 2.5, 44.0, 14.0, - 40.0, 1.25, -56.0, -1.0, -7.0, 1.75, 1.0, -1.25, - -2.0, -12.0, 7.5, -10.0, 0.375, 0.0, -72.0, 0.6875, - 2.75, -44.0, 6.5, -0.25, -0.5625, -3.0, 6.5, -13.0, - -0.4375, 6.0, 8.0, -3.5, -48.0, 0.75, -4.0, 4.0, - 4.5, 0.375, -8.0, 2.0, 10.0, -3.5, -16.0, -3.0, - -48.0, 6.0, -7.0, 0.125, -12.0, 24.0, -1.0, 18.0, - 5.0, 0.375, 3.5, 0.3125, -0.375, 1.0, 48.0, 2.75, - 3.0, -0.5, -2.25, -0.6875, -128.0, -0.6875, 11.0, 1.125, - 0.75, 4.0, -16.0, 1.625, 12.0, 16.0, -8.0, -8.0, - -40.0, -0.1875, -14.0, 2.5, 0.125, 88.0, -3.0, -0.5, - -1.25, -1.875, -32.0, 6.0, -7.0, 16.0, 0.1875, 2.0, - -0.5, -11.0, 10.0, 88.0, -10.0, -1.0, -30.0, 6.0, - 0.375, -13.0, 2.5, -16.0, -36.0, -0.75, -1.0, -12.0, - -2.0, 0.5625, -0.625, -20.0, -0.25, 5.5, -0.125, -64.0, - -1.125, -5.5, 4.0, -12.0, 1.625, 36.0, 2.5, 72.0, - 0.0, -48.0, 2.75, -1.625, -7.0, 52.0, 48.0, 6.0, - -0.875, 3.25, 32.0, -1.5, -8.0, -32.0, 8.0, 2.0, - 1.5, -1.25, 5.5, 3.0, -1.5, 0.75, 1.25, -12.0, - 0.0, -3.0, -16.0, -3.75, -1.0, 0.25, -32.0, 48.0, - 28.0, 0.8125, 8.0, 14.0, -0.75, 56.0, -1.0, 16.0, - 64.0, -2.75, 4.0, 0.5, 0.0, -0.75, -1.75, -0.9375, - -2.5, -0.5625, -24.0, -56.0, 3.5, -56.0, 0.25, 0.5, - -1.75, -32.0, 64.0, 3.0, 4.0, 0.0, -0.875, -22.0, - -56.0, -2.0, 64.0, 3.0, -1.25, -8.0, -1.0, -32.0, - 7.0, 5.5, 6.0, -16.0, -7.5, 1.0, 96.0, 2.5, - 2.0, 2.0, -1.5, 6.0, 48.0, -24.0, 2.5, -1.625, - 3.0, -6.5, -5.0, 2.5, 0.9375, 5.0, -28.0, 88.0, - -48.0, -7.0, -5.5, 14.0, -88.0, 28.0, -16.0, 0.5625, - 1.125, 20.0, 0.0, -16.0, -3.5, 16.0, 0.3125, 0.8125, - -1.0, 
-3.0, 96.0, -1.375, -20.0, 0.0, -3.0, 2.0, - -16.0, -7.0, -3.0, -6.5, 0.0625, -10.0, -0.5, -1.25, - -7.5, 2.0, 0.375, -2.0, 0.625, 0.75, -112.0, -1.5, - 2.0, -1.75, 8.0, -28.0, 2.0, 1.75, -6.0, 24.0, - -1.0, 4.0, 5.5, 1.0, 5.0, 16.0, 15.0, 2.0, - -0.3125, -20.0, -32.0, 15.0, -4.0, -2.0, 1.0, -1.0, - -0.25, 0.75, -2.25, 1.375, -2.0, 3.0, -12.0, -0.6875, - -0.75, -26.0, -1.75, 3.0, -4.0, -0.1875, 2.0, -6.5, - 7.0, -0.1875, -0.0625, 1.375, -104.0, 0.75, 1.875, 20.0, - 3.5, -3.0, 20.0, 1.75, 0.0625, 1.0, 20.0, 1.25, - -0.5625, 13.0, -1.0, -4.0, -2.0, 0.5, 9.0, 2.0, - -2.0, 10.0, -1.0, -1.75, 2.0, -40.0, 0.375, -0.625, - 18.0, -56.0, -26.0, -3.5, -2.0, -104.0, 0.25, 2.0, - 28.0, -48.0, 4.0, -0.8125, -28.0, -2.0, -128.0, -60.0, - 2.0, 32.0, 3.0, -28.0, -0.5625, 16.0, 2.5, -88.0, - -0.5625, -120.0, -2.0, 16.0, 1.0, 1.125, -2.0, -16.0, - -0.125, 0.0, 7.0, 0.0625, -10.0, -6.0, 22.0, 9.0, - 8.0, -1.0, -32.0, -28.0, 72.0, -4.0, 0.0, 2.25, - -1.75, -0.5, -3.5, 5.0, -7.0, 0.75, 48.0, 0.5, - -3.5, -128.0, -2.5, -1.25, 56.0, -0.375, -2.0, 0.125, - -3.5, -80.0, 12.0, 1.875, 1.375, -0.5625, 2.5, 56.0, - -1.125, 2.0, 24.0, 120.0, -16.0, -24.0, -1.0, 24.0, - 0.0625, -1.75, 14.0, -2.75, 5.0, 22.0, 9.0, -0.125, - 7.0, -7.5, -7.0, 1.0, -16.0, -24.0, -32.0, 20.0, - -24.0, -10.0, 56.0, 24.0, -12.0, -0.625, 0.0625, -52.0, - -1.0, -3.5, -3.25, 4.0, -11.0, -8.0, 0.0, 3.75, - 8.0, 1.75, -72.0, -0.75, -5.0, 20.0, -12.0, 56.0, - 2.0, -24.0, 16.0, 1.5, 0.5, -0.125, -1.5, 3.75, - 15.0, -16.0, -24.0, 2.25, -104.0, -96.0, 30.0, 32.0, - 0.5, 26.0, 44.0, 12.0, 0.25, 0.75, 1.75, -10.0, - 0.625, 0.5625, 0.75, -64.0, 0.25, 0.125, -120.0, -15.0, - 7.0, -0.625, -112.0, -7.0, -64.0, -1.0, 8.0, -10.0, - -0.5, -6.5, 0.3125, -104.0, 0.0, 2.0, -32.0, -0.8125, - 8.0, 1.5, -3.25, -32.0, 30.0, -9.0, 88.0, -2.25, - 6.5, 0.5, -3.5, 0.5625, 56.0, -9.0, -32.0, -13.0, - 3.0, -5.0, -16.0, -0.0625, 2.0, -3.0, 1.0, -4.0, - -1.0, -22.0, -28.0, 3.0, 0.375, -64.0, 3.0, -1.5, - -48.0, -4.0, -88.0, 8.0, 88.0, 0.0, 2.0, 
-0.375, - -4.0, -44.0, 8.0, -1.0, 4.5, 64.0, -20.0, 0.0625, - 2.0, -4.0, 0.0, -8.0, 56.0, 0.5, -24.0, 10.0, - -1.5, 26.0, -0.375, -24.0, 36.0, -0.75, -32.0, -0.625, - -16.0, 0.375, -8.0, -2.5, -2.0, -80.0, 36.0, -0.5, - 2.75, -1.5, -12.0, 40.0, -7.0, 0.1875, -96.0, 112.0, - -8.0, -7.0, 8.0, -0.25, -0.875, 5.0, 7.0, 40.0, - -0.875, 0.5, 1.0, -2.0, 20.0, 0.0, 6.5, 18.0, - 3.5, -4.0, 28.0, -14.0, -0.125, 2.75, 0.0, 1.5, - -2.5, -1.875, 3.75, -2.0, -0.125, 1.25, 0.5, 32.0, - 30.0, -6.0, 3.0, 36.0, 0.0, 0.875, 0.375, 48.0, - 2.75, 0.75, 3.25, 4.5, 40.0, -11.0, 80.0, -2.0, - 1.875, -0.375, 56.0, -3.25, -14.0, -3.5, -2.0, -0.8125, - -0.3125, -0.8125, 10.0, -2.25, -3.5, 11.0, -0.25, 56.0, - 3.5, 5.5, 0.75, -0.875, -1.0, -0.1875, 0.0625, -0.625, - 0.875, 14.0, -0.5, 56.0, 16.0, -2.0, 4.5, 30.0, - 3.25, 2.0, 56.0, 14.0, 120.0, 4.0, 0.875, -1.125, - 3.0, -40.0, -5.5, -2.0, -12.0, -12.0, 4.0, 2.75, - -9.0, 1.625, 1.5, -12.0, -28.0, 2.0, -1.125, 0.1875, - -0.375, -0.3125, 7.0, -3.25, 14.0, 7.5, 6.5, -36.0, - -0.25, 6.0, 0.75, 0.5, -7.0, -24.0, 0.6875, -1.25, - -1.125, -7.0, 2.0, 56.0, 0.5, -1.375, -3.0, 32.0, - 6.0, -2.5, 10.0, -10.0, 72.0, 40.0, -2.0, -1.0, - -11.0, -16.0, -12.0, 1.125, 0.75, 0.5625, -7.0, -0.0625, - -3.0, -14.0, 0.25, -5.5, 40.0, 0.8125, 8.0, 7.0, - -2.0, -96.0, 28.0, 3.0, 1.875, -1.25, -26.0, 0.6875, - 0.25, 6.5, 0.875, 0.0, 20.0, -0.6875, 0.6875, 1.625, - 1.5, 0.0, 15.0, -9.0, -72.0, -14.0, -0.25, 72.0, - -9.0, 6.0, -0.25, 0.3125, -4.0, -2.5, 3.0, -20.0, - 24.0, -4.0, 1.0, 44.0, -4.0, -72.0, -96.0, 6.0, - 0.875, 0.6875, -32.0, 40.0, 44.0, 1.75, 18.0, 24.0, - -2.25, -0.5625, 36.0, 5.5, -16.0, 0.25, 120.0, -32.0, - 2.0, -3.75, 4.0, -7.0, 30.0, 1.5, -1.375, 0.0, - -0.875, 0.0, -0.75, -4.0, -2.0, 8.0, 2.0, 4.0, - -40.0, -52.0, -40.0, -96.0, 10.0, -2.75, 20.0, -20.0, - 24.0, -0.375, -2.25, 0.75, -11.0, 12.0, 1.0, -128.0, - -52.0, 0.5625, -2.5, 2.25, -22.0, 104.0, 36.0, -6.5, - 28.0, 0.1875, -14.0, -0.75, 1.5, 0.25, 0.375, 8.0, - 24.0, 1.75, 96.0, -0.9375, 5.5, 
-2.75, -0.3125, 28.0, - -32.0, -28.0, 0.625, -56.0, 16.0, -16.0, 1.0, 3.0, - 28.0, -64.0, -24.0, 2.0, 1.0, -1.5, 24.0, 16.0, - 0.25, 56.0, -0.5, 5.0, 0.5, 24.0, -1.125, 1.0, - 112.0, 1.5, 1.75, -1.375, 0.6875, -88.0, 1.5, -0.1875, - 1.75, -0.5, 0.3125, 6.0, -8.0, 6.0, 0.25, -96.0, - -0.25, -10.0, -28.0, -64.0, 1.0, 1.5, -0.625, -1.25, - -40.0, 3.5, -12.0, 1.375, -0.25, -1.25, -32.0, 24.0, - 3.25, -10.0, -36.0, -1.5, 32.0, -0.625, -0.5, 2.0, - -0.375, 0.0, -13.0, -4.0, 40.0, 0.75, 2.0, -4.0, - -1.0, -10.0, -0.1875, 0.5, 12.0, 20.0, 9.0, -10.0, - -0.9375, -0.875, 36.0, -28.0, 4.0, 1.875, -72.0, -0.5, - -1.0, 8.0, -0.75, 12.0, -3.5, -1.0, -0.6875, -0.125, - 28.0, 2.0, -1.0, -20.0, 2.0, -6.0, -5.0, 96.0, - -48.0, 0.25, -64.0, 24.0, -0.1875, -52.0, 7.0, -2.25, - -13.0, 6.5, -2.25, 5.0, 3.75, -96.0, -0.875, 0.75, - -52.0, -14.0, -2.5, -5.0, 64.0, 0.5, -0.75, -13.0, - -4.0, 0.0, 7.0, 60.0, 0.0, -24.0, 0.0, -32.0, - -0.9375, -1.5, 15.0, -44.0, 9.0, 26.0, 15.0, 0.0, - 0.625, -64.0, 0.0, -2.5, -8.0, -36.0, 1.0, 6.0, - 0.25, 0.5, 1.75, -4.0, -32.0, -4.0, 9.0, 3.0, - 88.0, -40.0, -3.0, -4.0, -1.25, 24.0, 56.0, 28.0, - 15.0, 0.3125, -0.8125, 22.0, 0.25, 1.5, 4.0, 14.0, - -0.375, -112.0, -28.0, -0.875, 0.375, 0.875, -4.5, 2.0, - -28.0, -10.0, 0.5, 5.0, -32.0, 0.0, 0.75, 0.3125, - -6.0, 0.0, -3.0, 3.25, 0.0, -3.75, 26.0, -30.0, - -1.0, 40.0, -1.5, -2.5, -2.0, -8.0, -0.875, -7.0, - 4.5, 0.875, 0.5625, -0.8125, -5.5, -0.75, 0.4375, -6.5, - -0.4375, 1.875, 0.125, -0.75, 1.0, -3.75, -1.0, -2.25, - 1.25, -0.5, -12.0, 5.5, -3.0, 32.0, -2.0, 24.0, - 0.6875, -0.125, -6.0, -0.375, 12.0, -128.0, 2.0, -0.875, - -6.0, 2.75, -60.0, 0.0, -1.0, -48.0, -8.0, -88.0, - -32.0, -128.0, -4.0, 8.0, -36.0, -2.5, -3.25, -1.0, - 2.0, 2.0, -4.5, -0.1875, 2.5, 4.0, 32.0, 20.0, - -1.25, 2.0, -2.75, -0.375, -104.0, 5.5, 36.0, -16.0, - 8.0, -52.0, -1.0, -128.0, -0.5, 24.0, -3.0, -0.375, - 30.0, 32.0, -24.0, 10.0, -1.0, -5.5, -3.75, -15.0, - -24.0, 2.25, 12.0, 56.0, -1.0, 2.5, -0.875, -96.0, - 4.0, 12.0, 
-32.0, -1.75, 16.0, -0.375, 0.75, -1.75, - 3.25, -1.0, -20.0, -56.0, -24.0, 0.3125, 0.75, 24.0, - 15.0, 2.25, -16.0, -0.625, 1.5, 9.0, 0.5, -16.0, - 2.0, 0.625, -11.0, -6.0, -3.5, -44.0, -6.5, 1.0, - 4.0, 0.5, 1.75, 18.0, -3.0, -0.375, -3.25, 16.0, - -30.0, -4.0, 0.5, 4.0, -44.0, 56.0, -64.0, -24.0, - -0.3125, 40.0, 30.0, 0.0, 1.5, 0.75, 3.75, -0.8125, - 1.75, -9.0, 6.0, 14.0, 0.0625, -104.0, 28.0, 4.5, - -18.0, 2.5, 1.875, -16.0, -28.0, -4.0, 40.0, 0.375, - 0.25, 24.0, 0.375, -6.0, -22.0, -9.0, 0.125, 30.0, - -5.0, 6.0, -16.0, -44.0, -1.625, -16.0, -2.25, 3.75, - 3.25, -20.0, -13.0, 0.0, -1.0, -52.0, -20.0, -8.0, - -52.0, -48.0, 5.0, -128.0, 72.0, -0.375, -7.0, -7.5, - -30.0, 0.625, -8.0, -30.0, -1.75, -8.0, 0.6875, -1.5, - 3.5, -8.0, 4.0, 32.0, -0.4375, 10.0, 0.5625, 0.625, - -0.625, -32.0, -26.0, -0.875, 64.0, 2.75, 10.0, 0.25, - 7.0, 4.0, 1.5, -6.0, -2.0, -14.0, 0.25, -0.75, - 0.0, -7.5, 0.375, -40.0, -112.0, 0.75, -12.0, -3.0, - -28.0, 0.25, -120.0, -3.0, 15.0, -48.0, 26.0, 1.625, - -4.0, 6.5, 0.625, 1.5, 104.0, 4.0, -12.0, 12.0, - 1.5, 24.0, -7.5, -0.625, 7.0, 32.0, 12.0, -9.0, - 2.25, 40.0, -0.375, -4.5, 0.8125, 22.0, 0.5, 3.0, - 1.0, 6.0, -36.0, 112.0, 4.0, 0.5625, 10.0, 120.0, - -6.0, 36.0, 0.0625, 26.0, -12.0, -4.0, -24.0, -4.0, - -0.375, -2.0, -0.5, -22.0, -0.25, 0.8125, -2.0, -0.9375, - 6.0, -0.875, 0.125, 0.5, -1.0, -96.0, -0.1875, -64.0, - 1.25, -60.0, -2.0, 30.0, 104.0, 5.0, -32.0, 0.0, - 0.1875, 0.375, -0.75, 0.75, -64.0, -52.0, 4.0, -3.5, - 0.25, 24.0, -2.25, 0.0, -0.125, -1.5, 26.0, 0.5625, - -40.0, -32.0, -7.0, 44.0, -12.0, 2.75, -26.0, -1.75, - -2.25, 9.0, -0.375, 22.0, -1.75, -48.0, 16.0, -0.0625, - 7.5, 1.0, -48.0, 96.0, -1.625, -28.0, 0.0, 15.0, - -2.0, -1.0, -1.0, 0.9375, 8.0, -12.0, -2.0, -6.0, - -11.0, 11.0, -1.375, 6.0, 1.625, -1.5, 40.0, -8.0, - 5.0, 1.75, -0.625, -16.0, -16.0, -2.0, 28.0, 8.0, - -20.0, 2.5, 0.25, 1.875, 1.25, 0.25, 120.0, 0.0, - -40.0, -28.0, 0.4375, -2.0, 28.0, -18.0, 0.0, -10.0, - 28.0, -8.0, -0.375, 1.0, -8.0, -1.75, 
1.625, 16.0, - 0.125, -30.0, 10.0, -15.0, -16.0, 0.5, 0.625, 0.125, - 1.75, -0.125, 24.0, -1.875, 8.0, 0.0, -72.0, 7.0, - -2.0, 0.5, 9.0, 4.0, 1.0, -64.0, 8.0, 0.5, - 2.0, 120.0, 28.0, -10.0, 0.8125, -52.0, 40.0, 0.0, - -26.0, 0.25, 15.0, 0.4375, 40.0, 0.25, 14.0, 15.0, - 3.0, -8.0, 0.0, 1.25, 24.0, -72.0, 6.0, 12.0, - -4.0, -8.0, 0.5625, 80.0, 0.5, 0.5, -20.0, 0.9375, - -1.0, 24.0, 28.0, 2.0, 4.0, 0.5, -16.0, -48.0, - 3.0, -2.25, 8.0, -56.0, 0.6875, 0.25, -120.0, -0.625, - 7.0, 16.0, -32.0, -11.0, 0.0, 0.0625, -1.375, 1.0, - 2.0, 104.0, -0.5, -3.5, 8.0, -0.625, 2.0, -0.375, - -1.0, 10.0, 5.0, 2.0, 5.5, 1.0, -0.625, -3.5, - 0.25, 1.5, 0.25, 4.0, 12.0, -12.0, 0.3125, -112.0, - -0.75, -0.8125, 0.125, -8.0, 44.0, 4.0, -0.125, -1.875, - 1.125, 1.875, -1.125, 14.0, 6.0, -5.0, -0.25, -44.0, - -2.0, 0.0, 48.0, 2.0, 24.0, -0.625, 0.1875, 1.5, - 4.5, 0.0625, -2.5, -1.75, -0.875, 0.4375, 1.0, 2.75, - 56.0, -22.0, -4.5, 13.0, 16.0, -0.375, 9.0, 1.5, - 0.75, -1.75, 1.625, 20.0, -0.75, -2.75, 5.0, 52.0, - 0.0, -60.0, -32.0, 0.375, 16.0, -4.0, 16.0, -0.125, - -5.0, -28.0, -0.4375, -5.0, -6.0, -14.0, 0.25, 0.875, - -20.0, 18.0, -0.875, -30.0, -64.0, -64.0, 9.0, 0.625, - 30.0, -24.0, 16.0, -2.0, 14.0, 5.5, 24.0, -0.5, - -8.0, -20.0, -0.75, -24.0, -10.0, 6.0, 0.4375, -3.25, - -1.25, 8.0, 0.375, 3.0, -0.9375, -2.0, 0.25, -30.0, - 26.0, 80.0, -32.0, -1.0, -3.75, 8.0, 48.0, 4.0, - 26.0, 60.0, 2.75, 2.0, 5.0, 0.125, 0.3125, -36.0, - 12.0, 12.0, 2.0, 0.9375, 0.75, -0.0625, -2.75, -48.0, - -0.3125, 48.0, -32.0, 2.0, -2.0, 1.0, 2.5, 28.0, - -3.5, 0.0, -4.0, 0.5, -9.0, -60.0, 32.0, 0.0, - 7.5, -0.75, -0.25, 0.0, 52.0, -6.0, -96.0, -6.0, - -0.6875, -40.0, 28.0, 2.0, -0.75, -88.0, 1.0, -3.25, - -56.0, 2.0, -0.75, 3.5, 2.5, 0.0, -96.0, 0.8125, - -7.0, 10.0, 0.25, 4.0, -20.0, 16.0, -12.0, 1.5, - 1.0, 5.0, 1.625, 0.875, -40.0, 4.0, 2.0, -12.0, - -0.8125, -18.0, -6.0, 0.625, -88.0, -0.25, -64.0, -24.0, - -12.0, -1.375, -3.25, -24.0, -4.0, -8.0, 48.0, -28.0, - -0.0625, -112.0, 0.0, 1.0, 2.0, 4.0, 
0.5, -1.5, - -3.0, 6.0, 8.0, 3.5, -8.0, -0.625, -0.875, -1.75, - -1.75, 28.0, 0.0625, 3.0, 0.0, -13.0, 15.0, 0.25, - 2.0, 20.0, 0.625, -0.375, 22.0, -48.0, 4.0, 0.0625, - -3.5, -13.0, -96.0, -16.0, -8.0, -0.5625, -4.0, -44.0, - -4.0, 0.375, -1.0, 4.5, -1.375, 26.0, 0.125, -8.0, - 2.5, -24.0, 1.75, -88.0, -1.0, -8.0, -3.0, 0.5, - -120.0, -20.0, 7.0, -4.5, 48.0, 22.0, -5.5, -2.0, - -1.5, -0.5, 0.625, -3.75, -28.0, -2.5, 0.4375, -16.0, - 2.25, 104.0, -64.0, 0.375, -56.0, -14.0, 0.625, 0.0, - 16.0, 0.75, 3.25, 0.9375, 0.75, -0.4375, -1.375, 16.0, - 2.75, -8.0, -6.0, 8.0, -9.0, -0.25, -15.0, -52.0, - -2.75, -1.0, -5.5, -1.0, 1.0, 8.0, 0.75, -1.0, - 40.0, 3.5, -0.5, -0.9375, -112.0, 0.875, 32.0, 0.0, - 104.0, 2.0, -5.0, -0.1875, -5.0, -32.0, -26.0, 56.0, - 2.0, -4.0, 10.0, -0.375, -4.5, -104.0, 1.25, -2.25, - 16.0, 0.5, -16.0, 1.5, 16.0, -4.0, 10.0, 0.75, - -112.0, -0.5, -2.5, -6.0, 16.0, -16.0, -0.875, -56.0, - 0.0, -0.375, 13.0, 1.0, 4.0, -0.5, 8.0, 0.625, - -4.0, -0.625, -0.75, -64.0, -8.0, 1.75, 2.0, 0.0, - 0.5, -4.0, 8.0, 0.25, 1.0, 3.0, -0.75, 5.0, - 18.0, -44.0, 0.25, 8.0, 0.8125, 2.75, -0.0625, -4.0, - 24.0, 0.625, -2.0, 36.0, 0.75, -6.0, 3.25, 2.0, - 0.5625, 16.0, 10.0, 0.0, -1.75, -4.0, 0.0, -16.0, - 0.0625, 3.5, -10.0, -2.0, -0.5, -2.5, 3.75, 12.0, - -3.5, -0.625, -80.0, 80.0, -28.0, -64.0, -0.75, 0.5, - 0.0, 2.75, 6.0, 80.0, 0.0, -1.75, 0.5, -1.0, - -24.0, -0.875, 26.0, 64.0, 7.0, 0.3125, 0.0, -120.0, - 32.0, 2.0, -16.0, 0.0, 52.0, -20.0, 1.5, -1.875, - 12.0, 0.5, -4.0, 6.0, -0.875, 0.8125, 5.0, 0.375, - -0.6875, 10.0, 0.5, 0.8125, -1.625, -13.0, 1.75, -12.0, - 16.0, 4.0, 4.0, 2.0, -0.75, 0.5, -30.0, 0.0, - 0.0, -24.0, 8.0, -0.625, 0.0, -4.0, -104.0, -0.4375, - -1.0, 1.5, 3.25, -16.0, -24.0, -1.75, -1.0, 1.5, - -24.0, -0.625, -1.5, -0.25, 1.0, -44.0, 8.0, 16.0, - 12.0, 0.875, -10.0, -22.0, 0.6875, 10.0, -2.0, -11.0, - -0.5, -40.0, 52.0, -0.625, -0.6875, -1.875, 60.0, -12.0, - -16.0, -56.0, 3.5, 120.0, 1.75, -0.5, -0.125, -1.5, - 32.0, -40.0, -6.0, 1.75, 
1.625, -0.625, -12.0, -64.0, - 1.25, 10.0, 5.0, 2.0, -22.0, 0.0, -2.0, -40.0, - 96.0, 0.125, 12.0, -4.0, 13.0, -0.0625, -40.0, 3.5, - 20.0, -88.0, -0.875, -1.75, 0.375, 3.75, -10.0, -8.0, - -0.25, 2.0, -8.0, -12.0, -48.0, -2.75, -16.0, -11.0, - -1.0, -8.0, 2.0, -26.0, -16.0, 5.0, 0.75, -56.0, - -0.5625, 14.0, -4.5, 16.0, -0.375, 7.5, -1.0, 0.75, - -56.0, -0.75, -8.0, -0.375, -3.5, 0.625, 2.25, -28.0, - -1.0, -1.5, -8.0, 2.0, 64.0, -112.0, 14.0, 0.0, - 5.5, 48.0, 48.0, 3.75, 1.75, 0.375, -2.0, -48.0, - 12.0, -0.9375, 56.0, 0.75, -2.5, -1.375, -6.5, -7.0, - -3.5, -13.0, 120.0, -0.3125, -0.6875, 4.0, 0.0, 1.5, - 0.1875, -14.0, 0.625, -32.0, 14.0, 2.0, -2.0, -0.75, - 0.75, -8.0, -0.5, 0.25, 0.75, -48.0, 44.0, 3.5, - -0.875, -4.0, 8.0, -13.0, 1.25, -4.5, 10.0, -0.5, - 1.375, 9.0, 10.0, 0.0, 1.75, -1.625, 0.25, 8.0, - 32.0, 88.0, 112.0, -0.625, 1.0, -72.0, -3.0, 9.0, - 0.125, 1.25, -2.0, -48.0, 1.75, -7.0, 16.0, -4.0, - 0.375, -4.0, 24.0, 28.0, -0.5, 4.0, -9.0, 0.0, - -0.25, -0.125, -3.5, -20.0, 5.0, -1.25, 16.0, -2.0, - -32.0, -8.0, -2.5, -64.0, 5.0, 0.0, -8.0, -0.75, - -13.0, -2.5, -2.0, -30.0, 3.25, -0.25, -52.0, -56.0, - -20.0, -88.0, 1.0, -2.75, -2.0, 0.8125, -0.125, -28.0, - 0.0625, -0.25, -96.0, 20.0, 0.0, 120.0, 36.0, -1.0, - 5.0, 24.0, 0.0625, 0.25, -0.25, -128.0, -9.0, 2.0, - 8.0, 5.0, -0.375, 0.8125, 14.0, -16.0, 1.5, 13.0, - -0.3125, 1.0, -8.0, -0.375, 1.375, 11.0, 0.125, 3.5, - 14.0, -4.0, 56.0, 10.0, -4.0, -4.0, -60.0, -12.0, - -60.0, -24.0, -3.5, -1.75, -32.0, -1.5, -7.5, 0.0, - 28.0, -10.0, 0.5, 28.0, -0.875, 2.5, -0.375, 4.0, - 2.5, -1.75, 0.4375, 0.0, 2.0, -0.375, -4.0, 96.0, - 0.0, -0.375, 24.0, -2.75, -24.0, -0.5, -1.25, 0.0, - -40.0, 12.0, -32.0, 4.0, 44.0, 0.5, 1.875, -8.0, - -32.0, -80.0, 72.0, 3.75, -48.0, -6.0, 0.75, -28.0, - 120.0, 12.0, -3.0, 2.5, -56.0, 8.0, -0.125, 0.5, - 72.0, -12.0, 0.0625, 0.5, -56.0, -1.25, 7.5, 22.0, - -8.0, -12.0, 0.875, 0.125, -40.0, 5.0, -14.0, -20.0, - -10.0, 10.0, 0.5, -24.0, 7.0, -0.4375, 0.6875, 20.0, - -1.5, 
-4.0, 0.0, -14.0, -2.0, 2.0, -32.0, -6.0, - 14.0, -1.5, 28.0, 60.0, -112.0, -12.0, -0.0625, 4.5, - -2.5, 80.0, -0.625, -0.25, 2.5, -15.0, -28.0, -24.0, - -0.5, 30.0, -0.5, 1.125, -26.0, 1.0, 5.5, -0.25, - 0.625, 1.0, -6.5, -0.875, 1.0, -1.125, 26.0, -4.0, - 26.0, -0.125, 56.0, 32.0, 1.875, -32.0, 11.0, -1.75, - 0.125, 16.0, -0.625, 0.75, 48.0, 20.0, -1.375, 16.0, - 2.75, -1.5, -5.5, 28.0, 1.0, -32.0, -0.625, -2.0, - 1.625, 1.25, -20.0, -5.0, 1.5, 7.0, 22.0, -0.6875, - -36.0, 9.0, -9.0, 52.0, 0.0, 2.5, 6.0, -0.375, - 0.9375, -8.0, 0.5, -3.25, 0.0, -0.5625, 0.75, -2.75, - -0.125, 7.0, -2.0, -5.0, -28.0, 20.0, -0.25, -64.0, - -26.0, -10.0, 0.0, 11.0, 1.75, 3.0, -104.0, 0.5, - -14.0, 0.625, -1.0, -0.9375, 5.0, -48.0, -32.0, -4.0, - -16.0, 3.75, -48.0, 0.875, -0.0625, -40.0, -8.0, -4.0, - 104.0, 0.0, -40.0, -4.0, 30.0, -48.0, 1.5, 0.875, - -2.5, 1.5, -13.0, 16.0, -20.0, -40.0, 32.0, 0.4375, - -56.0, -0.25, 1.875, -0.5, -6.0, 0.25, 3.0, 32.0, - 48.0, -0.75, 0.375, 0.125, 120.0, 40.0, 0.5, 0.625, - 0.625, 12.0, 5.0, -16.0, -30.0, 9.0, -0.4375, 20.0, - 0.5, -13.0, 26.0, 28.0, 0.0, 30.0, 0.0, 1.875, - -0.125, -0.375, 64.0, 1.375, -0.25, -2.75, 0.0, 6.0, - 0.375, 0.125, 1.5, 1.75, 1.5, -2.0, -72.0, 0.875, - -1.0, 1.25, 20.0, 2.0, -0.5, 56.0, 2.0, -2.5, - -0.75, -24.0, 10.0, -5.5, -1.5, 12.0, 30.0, -16.0, - 1.375, -5.0, -18.0, -14.0, 1.5, -4.0, -22.0, -2.25, - 72.0, -104.0, 3.0, 48.0, -56.0, 104.0, -7.0, 0.6875, - 1.375, 1.0, 6.0, -24.0, 30.0, -30.0, -2.25, -0.5625, - -104.0, 0.5, 2.0, 0.8125, 0.6875, 0.875, 0.875, 0.9375, - 0.875, -48.0, -0.25, -2.5, 15.0, -0.875, 56.0, -1.625, - 0.0, 1.625, -48.0, -3.5, 0.375, -3.5, -1.875, 0.0, - 0.625, -2.25, -16.0, -12.0, -0.0625, -0.6875, 1.0, -0.625, - -6.0, -2.0, -0.3125, -24.0, -6.0, -6.5, 2.0, 20.0, - -8.0, 7.0, 112.0, 5.0, -0.3125, -3.5, 24.0, 0.75, - -64.0, -40.0, 6.0, 2.0, -1.0, 6.0, 1.5, -1.0, - -56.0, 0.0, -28.0, 0.4375, -4.0, 18.0, 16.0, -0.25, - -24.0, 10.0, -0.5, -1.75, -48.0, 0.9375, -3.0, 1.0, - 16.0, 26.0, 22.0, 0.75, 
-3.5, -0.25, 0.25, -3.0, - 2.0, -112.0, -4.0, 56.0, -1.5, 1.0, -0.3125, -4.0, - 4.0, -0.125, -0.125, -8.0, 56.0, -0.25, -128.0, -2.0, - -2.0, -1.0, 2.0, -13.0, 0.4375, -0.5625, -4.0, 2.5, - -5.5, -7.5, 8.0, -72.0, 6.0, 13.0, 0.0, -96.0, - 0.75, 0.875, 0.6875, -88.0, -56.0, -40.0, 1.0, -3.25, - 1.5, 0.4375, -1.0, 1.0, -12.0, 2.0, 1.625, 6.0, - 0.3125, 7.5, -1.125, 3.25, 5.0, -3.0, -0.875, 0.125, - -14.0, 15.0, -0.4375, -112.0, -4.0, 88.0, 20.0, 1.0, - -16.0, 0.25, 2.0, -32.0, 13.0, -7.5, 0.0, 1.0, - -2.0, 4.5, 5.0, 36.0, -1.0, -1.25, -1.0, -0.9375, - -26.0, -0.375, 0.375, 1.0, 14.0, 14.0, 0.875, 1.375, - -5.5, -2.5, 7.5, -1.0, -56.0, 2.0, -1.5, -40.0, - 6.0, -14.0, 7.0, -6.5, -72.0, -48.0, 0.5, 120.0, - 0.0, 6.0, 0.75, -0.25, 3.25, 7.0, -56.0, 3.0, - 56.0, 24.0, -3.75, -0.5, 0.75, -0.3125, 8.0, -0.75, - 2.0, -5.0, 11.0, 1.5, 0.5625, -0.3125, -1.5, -9.0, - 2.75, 2.0, 0.75, 3.0, 0.0, -128.0, -44.0, 0.5, - 80.0, -14.0, -3.0, 2.0, -2.0, 26.0, -0.75, -2.0, - -5.0, 0.0, -0.5, 30.0, 3.0, 3.5, 16.0, -0.875, - -0.0625, 7.5, -8.0, 2.5, 6.0, -64.0, 1.0, -64.0, - 28.0, -6.0, 40.0, -6.0, 3.5, 8.0, -3.5, 12.0, - -12.0, 7.0, 9.0, -0.375, -5.0, -2.5, 3.75, 3.0, - 56.0, -2.0, 2.0, -3.25, -1.75, -72.0, 18.0, 0.0, - -4.0, -16.0, -11.0, -2.0, 12.0, 3.5, -1.25, 40.0, - -4.0, 1.5, -7.0, -1.625, 16.0, -20.0, -5.0, -10.0, - -1.25, 120.0, 80.0, 9.0, -52.0, 2.0, -11.0, 48.0, - -0.9375, 1.5, 3.25, 1.25, 9.0, 0.25, 1.75, -1.125, - 72.0, -1.5, 0.375, 2.0, -2.75, -0.9375, 8.0, 0.0, - -0.75, -24.0, 10.0, -0.75, 0.625, 0.625, 2.5, 32.0, - 16.0, -0.875, -1.875, 96.0, 8.0, 120.0, -16.0, -13.0, - -8.0, -1.0, -3.0, -72.0, 1.625, 20.0, -4.0, -1.75, - -6.0, 0.6875, -6.0, 26.0, 4.0, -0.5, 104.0, -9.0, - 0.75, 0.375, -10.0, 3.5, -4.0, -32.0, 48.0, -1.375, - -4.5, -24.0, -0.25, -104.0, 0.25, -0.125, 12.0, 3.75, - 20.0, -10.0, -16.0, -6.0, -22.0, 0.75, -9.0, -0.625, - 96.0, 0.6875, -14.0, 24.0, 0.75, 0.5, 0.125, 0.0, - 1.875, -44.0, -4.0, -9.0, 0.0, -6.0, 3.0, 0.8125, - -12.0, -7.0, 0.0, -1.0, 1.5, -3.5, 
2.0, -2.5, - -1.25, 3.0, -2.0, -1.75, -16.0, 40.0, 16.0, 0.875, - 0.4375, -16.0, 20.0, 15.0, 0.0625, -40.0, 5.0, 0.8125, - 0.25, 20.0, 14.0, 16.0, 4.0, 0.5, -0.25, 0.1875, - 0.0, -40.0, -0.75, 7.0, 12.0, -64.0, 1.0, 0.0625, - -3.0, 9.0, -20.0, 11.0, -20.0, -2.0, 40.0, -10.0, - 0.4375, 3.0, 3.5, 0.25, -0.8125, -6.0, 14.0, 0.5, - 0.25, -2.0, -0.5, 16.0, 10.0, 40.0, 0.9375, -1.75, - -5.0, 1.5, 0.375, -0.4375, 80.0, 0.75, 56.0, 64.0, - 10.0, 2.0, -56.0, 6.0, -48.0, 12.0, 12.0, -1.5, - -32.0, 1.5, -5.0, 5.0, 44.0, 6.0, 1.5, 0.0, - 7.5, 0.5, -8.0, 12.0, -2.0, 8.0, -32.0, 56.0, - 16.0, 7.0, 2.0, 0.1875, -32.0, 1.0, -0.5, 0.375, - -80.0, -88.0, 0.375, -1.0, -56.0, 28.0, -4.0, -64.0, - 14.0, -64.0, -5.0, -4.0, 18.0, 3.25, -10.0, 14.0, - 6.0, -0.125, -4.0, -28.0, 1.0, 0.0, -0.3125, 0.5, - 11.0, -24.0, -40.0, -11.0, -1.0, -7.0, 6.0, -4.0, - -0.875, -1.25, 0.5, 0.9375, 7.0, -88.0, 0.0625, 22.0, - -4.0, -0.25, -24.0, -4.0, -10.0, 40.0, -1.0, 0.5, - 0.875, 32.0, -8.0, -4.0, -15.0, 1.625, -14.0, -8.0, - -0.125, 5.5, 0.8125, -12.0, -104.0, 0.3125, 0.0, 24.0, - -6.0, -0.25, -7.0, -0.25, 14.0, 1.5, -64.0, 36.0, - 1.375, 16.0, -72.0, 2.0, 1.0, 9.0, 0.125, -2.0, - -8.0, 56.0, -0.6875, 10.0, -120.0, 0.75, -0.875, 64.0, - 104.0, 0.125, -0.8125, -0.6875, -1.75, 0.0, -0.8125, 32.0, - 1.5, -112.0, -40.0, 0.0, 0.0, 6.5, 4.0, 1.0, - -10.0, 14.0, 0.625, 3.5, 20.0, -48.0, -4.0, -1.25, - 12.0, -16.0, -3.0, -0.6875, 2.0, 30.0, 6.0, -0.25, - 5.0, 0.8125, -40.0, -3.0, 0.25, 0.5, 0.8125, 5.0, - -28.0, -8.0, -8.0, 104.0, -8.0, 0.75, 3.5, -120.0, - -1.0, 3.0, 72.0, 15.0, -120.0, 3.5, 2.75, 0.625, - 2.0, 52.0, -16.0, -2.5, 2.0, 0.0, -10.0, -1.5, - 6.5, 6.0, 4.5, -0.1875, -1.0, 3.0, 0.375, 4.0, - 1.625, 1.5, -1.125, 0.3125, -0.75, -56.0, 1.0, 14.0, - -40.0, -0.875, -7.0, -16.0, -6.0, 5.0, 6.0, 24.0, - 20.0, 1.5, -13.0, -0.25, 24.0, -112.0, -48.0, -7.5, - 6.0, -96.0, -64.0, 3.5, -1.5, -32.0, 56.0, 60.0, - 44.0, 32.0, -6.0, -6.0, 0.375, 6.0, 4.0, -10.0, - 24.0, -0.875, 0.5, -104.0, 20.0, -120.0, 14.0, 
0.0, - 0.25, -1.125, 0.5, 0.625, -0.75, -0.3125, 24.0, 0.0, - 7.5, 52.0, -0.5, -0.6875, 112.0, -20.0, 24.0, 0.25, - 0.0, -112.0, 0.6875, -56.0, 72.0, -11.0, 13.0, -16.0, - -8.0, -7.0, 6.5, 26.0, 20.0, -0.25, -28.0, 10.0, - -0.75, 80.0, -7.0, -96.0, 52.0, -0.5, 10.0, 7.5, - -0.6875, -0.9375, -6.0, 30.0, -6.0, 56.0, -0.5, 64.0, - -28.0, -0.75, -1.75, -36.0, -24.0, 0.0, 0.0, 4.0, - 4.0, -0.5, 32.0, 0.375, 0.6875, 2.0, -0.5, -7.0, - -0.8125, -3.25, -48.0, -56.0, -30.0, -32.0, -0.75, 5.0, - -3.25, -0.75, 48.0, 0.125, 48.0, -5.0, -0.75, -8.0, - -0.9375, -0.5, -1.75, -0.0625, 28.0, -1.0, 0.375, 0.875, - -1.0, -1.0, 6.5, 2.0, -1.25, -32.0, -2.0, -6.0, - -3.25, -4.0, -12.0, -2.5, -16.0, -96.0, -0.75, -30.0, - 120.0, 8.0, 20.0, -0.0625, -32.0, -1.875, 0.375, -2.5, - -1.625, 5.0, -22.0, -15.0, 120.0, -44.0, 20.0, 11.0, - 56.0, 0.9375, -10.0, -5.0, 40.0, -0.5, -1.5, -0.4375, - 0.4375, 0.0, 0.4375, -1.5, -104.0, 0.0, 22.0, 0.25, - -8.0, 120.0, -112.0, -0.25, -12.0, -120.0, -8.0, -3.75, - -1.75, 32.0, 1.0, 40.0, -1.5, -64.0, -7.0, 20.0, - -24.0, -0.25, 0.0, -22.0, -0.5, -44.0, -40.0, -8.0, - 0.25, -4.0, -0.625, -4.0, -60.0, -5.0, -1.125, 4.0, - 0.5, 0.375, 0.625, 8.0, -0.125, -0.875, -4.5, -0.9375, - -0.25, 3.5, 6.0, 4.0, -1.0, -28.0, 0.875, -4.0, - -0.25, 0.6875, 8.0, -6.5, 0.375, -26.0, 72.0, -32.0, - -0.625, -20.0, -40.0, 10.0, -44.0, 48.0, -0.75, 1.0, - -120.0, -14.0, 26.0, -18.0, -56.0, -4.0, -120.0, 0.625, - -32.0, -3.0, 10.0, -128.0, -0.5, 8.0, 52.0, 1.0, - -0.5, 120.0, -30.0, 56.0, -4.0, 48.0, 0.375, 0.375, - 16.0, 16.0, -64.0, 4.0, -3.5, 4.0, -0.5, 12.0, - -28.0, -3.5, -4.0, -28.0, -0.875, 48.0, -16.0, -12.0, - 18.0, 2.0, -0.5625, -1.375, 2.0, 1.125, 1.5, 6.0, - 32.0, 52.0, -1.625, -0.3125, -88.0, 30.0, -8.0, -0.5, - -12.0, 16.0, -0.75, 6.5, 16.0, -1.75, 2.0, -44.0, - 96.0, -2.25, -6.0, 3.0, 4.0, -24.0, 48.0, 1.375, - 24.0, -14.0, 0.5625, 120.0, 0.25, -32.0, 10.0, -6.0, - 1.25, 0.0, -13.0, -48.0, -6.0, -104.0, 12.0, -96.0, - -2.5, -2.0, -2.0, 18.0, -6.0, -0.375, 3.75, 
3.0, - 1.75, -0.25, -0.6875, 6.0, 11.0, 0.8125, 12.0, -120.0, - 88.0, 4.0, 2.25, 1.625, -0.4375, -56.0, -12.0, -0.5, - -0.5, 40.0, -12.0, -0.0625, -2.0, -6.0, -0.25, 1.375, - -0.8125, -20.0, 0.5, -5.0, -0.5625, 0.5625, -2.25, 0.125, - 52.0, 0.0, -88.0, 1.75, -6.0, -3.25, -2.0, -1.5, - -8.0, 8.0, -5.0, -8.0, -12.0, -6.0, 56.0, -0.25, - 5.5, -16.0, -16.0, 0.0, 4.0, -4.5, -120.0, -0.5, - -7.0, -15.0, 0.0, -6.0, 20.0, -72.0, 6.5, -18.0, - -0.5, 1.75, -6.0, 96.0, -1.0, -0.6875, 0.0, 15.0, - 5.5, -4.5, 0.375, 1.25, -1.5, -2.25, 14.0, -8.0, - 112.0, -18.0, 2.0, -44.0, 1.125, 7.0, -24.0, -4.0, - -0.6875, -26.0, -1.5, -8.0, 72.0, -40.0, 4.0, 28.0, - 1.5, -12.0, 1.5, -6.5, 4.0, -0.3125, 0.4375, 0.8125, - 0.5, -15.0, -1.625, 8.0, -4.5, 4.0, 0.0, 3.0, - -2.0, -3.0, 3.0, 15.0, -1.25, -5.5, 1.0, -40.0, - -5.0, 12.0, 2.0, -120.0, -0.125, -0.25, 0.0625, 0.5625, - -2.75, -0.25, 120.0, -22.0, 0.0, 16.0, -7.0, 10.0, - 1.0, -28.0, 1.75, 0.5, 72.0, 112.0, 6.0, 0.25, - 0.5, 0.1875, 30.0, 18.0, 3.0, -0.125, -1.125, 14.0, - -3.5, -24.0, -14.0, 30.0, -0.125, 5.0, 12.0, 13.0, - -1.0, -0.625, 28.0, 18.0, 32.0, 1.5, 7.5, -2.0, - 2.75, -0.4375, -11.0, -0.25, 36.0, -0.5, -0.1875, 0.6875, - -2.0, -12.0, 2.0, -13.0, -32.0, -1.875, -14.0, -40.0, - -0.5, -0.75, 1.75, -26.0, -2.0, 40.0, 1.5, -3.25, - -64.0, -40.0, 16.0, -0.6875, 5.5, 12.0, -3.5, 20.0, - 0.1875, 24.0, -3.0, -1.875, 2.0, 14.0, 0.625, 4.5, - 0.25, -80.0, 12.0, 8.0, 24.0, -64.0, -52.0, 4.0, - 0.0, 0.5, -12.0, 0.75, 0.9375, 14.0, 0.0, 0.375, - -96.0, 6.0, 5.0, -7.0, 20.0, -22.0, 96.0, -1.0, - -88.0, 0.875, 1.0, 4.0, 36.0, 0.0, -1.5, -20.0, - -1.75, 3.25, 26.0, -0.4375, 15.0, -0.5625, 4.0, 1.625, - -12.0, -56.0, -0.25, 1.25, 96.0, -7.0, 1.375, 26.0, - 0.75, -112.0, -20.0, -16.0, 0.375, 0.6875, 0.0, 7.0, - 0.75, -6.5, -8.0, -0.25, -64.0, 2.0, -1.0, 7.0, - 0.875, 1.125, 0.6875, 0.25, 0.8125, 5.0, -0.1875, -48.0, - 52.0, 3.75, 40.0, 20.0, -1.375, -104.0, -0.75, 10.0, - 0.0625, 80.0, 24.0, 8.0, 8.0, 6.0, 1.25, 11.0, - 0.0, -64.0, -0.625, 
2.25, 36.0, 96.0, 11.0, -9.0, - 7.0, 0.25, 2.0, 40.0, 28.0, -2.5, 1.625, 0.25, - 5.0, -7.5, 0.8125, 20.0, -30.0, -26.0, 0.75, 20.0, - 0.0, -2.0, 2.0, 1.5, -0.875, -10.0, -8.0, -3.0, - -0.5, -8.0, 3.5, 28.0, 1.5, 2.0, -12.0, 0.125, - 1.25, 15.0, 0.375, 0.75, 0.9375, 0.625, 0.75, -1.75, - -4.0, 6.0, -0.3125, 0.375, -2.0, 4.0, -0.1875, -5.5, - -1.0, 0.375, -2.0, -28.0, -44.0, -0.5, 0.625, -30.0, - -1.25, 7.0, 16.0, 0.0625, 16.0, -3.0, 24.0, -32.0, - 5.0, 0.4375, 0.0, -1.0, -0.625, -52.0, -1.25, 1.125, - 1.75, -56.0, 1.625, -0.125, 14.0, 11.0, 1.375, 0.125, - -0.75, 0.0, 1.25, -0.25, -120.0, -1.5, 18.0, -10.0, - -2.0, 12.0, 0.75, 88.0, 1.375, 0.125, 4.0, -0.875, - 2.75, -6.5, 6.0, -0.375, -32.0, -32.0, 0.75, 13.0, - 20.0, -0.875, -4.0, 16.0, 0.8125, -20.0, -32.0, 3.25, - 96.0, -2.5, -32.0, -1.375, -1.75, -0.5, -20.0, -1.75, - 56.0, -88.0, -8.0, 96.0, -36.0, 1.25, 7.0, 22.0, - 3.5, 20.0, 40.0, 0.25, -12.0, 3.0, -40.0, 1.75, - -56.0, 8.0, -0.0625, -1.5, -0.5, 4.0, 3.5, -4.0, - -9.0, 3.5, 32.0, 2.5, -2.0, -0.5, -32.0, 3.0, - 14.0, 3.5, 10.0, -8.0, -0.25, 48.0, -5.0, -0.8125, - 0.8125, -1.5, 1.25, 4.5, -0.875, 1.0, 0.9375, 0.0, - 1.5, -32.0, 7.5, 6.0, 3.5, -22.0, -8.0, 5.5, - 0.0, -12.0, -0.75, 7.0, -8.0, 12.0, -20.0, -1.375, - -3.75, -0.1875, 15.0, 1.375, 4.0, 20.0, 0.75, -72.0, - 5.0, 1.5, 0.375, -4.0, -0.875, 48.0, 9.0, 0.625, - -1.625, -16.0, -0.25, -2.0, -12.0, 0.5, 3.25, -6.5, - -0.75, 3.5, 0.0, -11.0, -4.0, -16.0, 18.0, 2.25, - 1.5, 0.0, -5.0, -0.625, 14.0, 24.0, -1.125, 6.0, - 32.0, -16.0, -2.75, 2.0, 0.625, 0.75, 36.0, 1.25, - -32.0, -72.0, -5.0, 16.0, -36.0, -2.5, 0.3125, 1.25, - -24.0, 3.25, 0.5, -2.0, -1.0, -1.25, 28.0, -0.5, - -0.75, -56.0, 2.0, 4.0, 0.0625, 0.4375, -40.0, 40.0, - 88.0, 80.0, 20.0, -3.75, -13.0, 1.375, 2.5, -40.0, - 0.625, -0.125, 11.0, 3.0, -0.1875, 1.875, 0.6875, 2.25, - -0.0625, -60.0, -0.5, -20.0, -48.0, -6.5, 0.0, -36.0, - -4.0, 15.0, 0.5625, 1.625, 16.0, 36.0, -1.625, 28.0, - 13.0, -0.125, 0.5625, 1.5, 0.875, -10.0, -1.75, -36.0, - -2.5, 
3.25, -40.0, -16.0, -2.0, 3.75, -0.75, -0.0625, - -0.75, -16.0, -0.4375, -11.0, 0.3125, -1.25, 1.625, -44.0, - -5.5, 0.0, 16.0, 7.0, -0.3125, 18.0, -4.5, -14.0, - 20.0, 0.625, -1.0, -28.0, 0.0, 24.0, 2.0, 0.75, - 12.0, 0.75, 44.0, -4.0, 120.0, 112.0, 20.0, -0.125, - 48.0, 1.625, 0.5, 1.375, -0.9375, -24.0, -0.875, -6.0, - -0.1875, -26.0, -16.0, 16.0, 28.0, 56.0, -0.625, 0.75, - -0.5, 40.0, -60.0, 9.0, -7.5, -15.0, -1.125, 28.0, - -0.5625, -4.0, 64.0, 28.0, 4.0, 28.0, 64.0, 44.0, - -112.0, 4.0, 48.0, -12.0, 0.6875, 5.0, 0.0, 6.0, - -56.0, 0.75, 12.0, 4.0, 3.75, 1.625, -120.0, 7.5, - -8.0, 5.0, -0.375, -12.0, 6.5, -15.0, 0.5, 0.5, - 4.5, 60.0, 0.0, -0.3125, -0.625, -36.0, 8.0, 36.0, - 28.0, 1.5, 2.75, 10.0, 4.0, 32.0, 3.75, 24.0, - 0.875, -28.0, -56.0, 11.0, -18.0, 0.5, 0.5, -20.0, - -1.0, 0.75, -16.0, -0.25, -0.6875, 2.5, 3.5, 88.0, - -3.5, 40.0, 12.0, -2.0, -8.0, 16.0, 0.0, 2.0, - 1.25, 2.5, 0.4375, 3.75, -0.0625, 1.0, 3.0, -30.0, - 2.5, -10.0, -12.0, 12.0, -1.0, -14.0, 4.5, 28.0, - 3.75, 1.5, 96.0, 4.0, 120.0, 36.0, 44.0, -16.0, - -5.5, -0.9375, 2.75, -2.25, -28.0, -22.0, -3.5, 1.125, - 3.0, 20.0, -1.0, -22.0, -0.875, -44.0, -3.75, -4.0, - 7.0, -0.625, -72.0, 0.5625, -28.0, 3.25, -1.0, 0.375, - -30.0, -20.0, 28.0, 0.0, 72.0, -88.0, 0.5, 20.0, - -0.25, 4.0, -0.5, -2.5, 60.0, -104.0, -12.0, 0.25, - -32.0, 13.0, 1.75, -1.875, 52.0, 1.0, 8.0, -1.5, - -32.0, 1.0, -12.0, 30.0, -1.125, -32.0, -96.0, -120.0, - 8.0, -0.5, 1.875, 7.0, -30.0, 14.0, -12.0, -4.0, - 2.0, -4.0, -1.0, 80.0, 11.0, 0.0, 10.0, -1.625, - -8.0, -36.0, 7.0, -0.9375, -1.0, 1.5, 0.0, -1.0, - -96.0, 32.0, 40.0, -0.0625, 12.0, 20.0, 0.0625, 4.0, - 2.25, 6.0, 0.375, 1.0, 2.5, -16.0, 7.5, -0.0625, - 10.0, 20.0, 0.0625, 5.5, -2.0, -0.75, -3.75, -32.0, - 88.0, -0.75, 5.0, 24.0, 0.0625, -28.0, -6.5, 0.625, - -1.25, -}; -static data_t verify_data[M_DIM * N_DIM] = { - 14208.9140625, 1665.9140625, 5949.21875, - -7746.1875, -9054.88671875, -1813.1796875, - 473.9609375, 4126.01171875, -6241.81640625, - 
-17831.265625, 627.4140625, 1007.3984375, - 10867.03125, -3601.015625, 7084.4765625, - -12165.71875, -9776.97265625, 5007.484375, - 10884.125, -7689.79296875, 2825.5703125, - 5124.9453125, -34844.375, -4946.9375, - -9077.0, -54.96484375, -16357.9765625, - 2836.625, -11345.0859375, 4849.22265625, - -4440.6015625, 2040.79296875, -1337.96484375, - 4781.453125, -4875.671875, -7232.68359375, - -2494.1875, -7328.8515625, -6436.1484375, - 307.84765625, -6310.34375, -99.6015625, - 3095.8828125, 3975.734375, -7152.59375, - 9842.4765625, -17587.07421875, 501.64453125, - 4531.50390625, -6065.0, 6911.8125, - 2872.16015625, 12437.49609375, 3011.7109375, - 352.59375, 4508.43359375, -5803.171875, - -13022.9453125, 11446.8828125, -2441.453125, - 5250.74609375, 7477.72265625, 2997.08984375, - 15078.625, 8579.3359375, -1946.8984375, - -6284.25, 4399.25, -9375.078125, - 7213.69921875, -7236.125, 3800.74609375, - 6132.89453125, -1748.91796875, -8052.125, - -13395.1328125, 12002.98046875, 12804.2890625, - 7509.09375, -7686.61328125, -11248.515625, - 11692.9609375, 10826.73828125, 4016.6328125, - -9722.94140625, -2186.97265625, -3768.41796875, - -987.078125, -9888.640625, -15104.2734375, - 11523.5859375, 6376.921875, -19955.421875, - -15367.2109375, -11777.296875, 7964.890625, - 15499.265625, -1668.25, -831.578125, - 4029.4921875, -1240.5, 3631.921875, - -8857.7578125, -1754.03125, -18926.265625, - -18747.3203125, -3985.7109375, 7128.09375, - 7597.8125, -10134.453125, 2902.9921875, - 19760.1875, -9432.015625, -3348.1484375, - -503.046875, -9525.25, -599.46875, - -6098.796875, 222.7109375, -148.59375, - -2773.03125, 3748.1171875, -58.1796875, - 1000.171875, 4850.5625, -2008.9140625, - -11314.546875, 15047.3125, 1409.1171875, - 6676.34375, -10931.703125, 161.5859375, - 2000.1796875, 7447.3046875, 2933.640625, - 1403.765625, -19347.6171875, 7527.7265625, - 5484.921875, 2753.609375, -1205.3671875, - 13705.8828125, -28558.5234375, 5929.5234375, - -87.9375, -15749.6875, 14944.546875, - 
-4967.7734375, -2605.953125, 17562.265625, - -8857.2734375, 1686.1015625, -1141.4609375, - 10112.875, 11311.625, -14177.375, - -3333.453125, 9112.34375, 11096.8828125, - -3111.9140625, 7592.8828125, -2618.890625, - 16596.21875, -6664.21875, 8375.640625, - 1223.34375, 2154.6875, 2722.9375, - -5632.21875, 6552.7578125, -8437.90625, - -2067.53125, 3248.34375, -5004.6171875, - 7182.5859375, 3702.984375, -15126.0859375, - 811.5625, -13129.984375, 3938.62109375, - 11401.0625, -12485.25390625, -400.25, - -11892.890625, -1385.6328125, 330.546875, - -10759.6640625, -4098.015625, -7074.703125, - 6016.875, 11467.0234375, 1500.37109375, - -7064.8671875, 6124.15625, -20229.671875, - -5228.8984375, -6136.2734375, -4503.5625, - -1780.23046875, 2824.0703125, 237.1953125, - 5702.41796875, -3576.265625, 12725.8359375, - -15162.3203125, -3461.88671875, 11665.00390625, - -12011.46875, -1762.796875, -1421.5859375, - 8102.3984375, 2889.796875, 55.8828125, - -615.609375, -10169.2578125, 3168.07421875, - -17995.703125, -2714.109375, -9983.453125, - -2443.6953125, 12997.7890625, 5100.9609375, - -2555.546875, 2398.54296875, -899.8984375, - 14377.03125, 7672.17578125, -13123.6875, - -637.0703125, -12668.171875, 7085.76953125, - 3625.765625, -15711.7578125, -6327.4296875, - -1080.8671875, -3004.16015625, 12930.1640625, - -233.0078125, 17214.91796875, -2735.03515625, - -1935.9140625, 7965.4296875, -5054.2265625, - 5851.1875, -2969.94921875, 9291.71875, - 2288.6015625, -5033.2578125, -4858.1328125, - 7163.6953125, -806.51953125, 5853.578125, - -6707.75, -5286.5, 9093.19921875, - 14408.875, 7552.71875, -16901.4375, - -9606.1015625, -2645.63671875, -10475.31640625, - -31.8515625, -7998.4609375, 2718.0859375, - 5060.3515625, -2477.859375, 1320.828125, - -7834.0390625, 5807.5078125, 2939.84375, - 4966.1171875, 10742.74609375, -5386.34375, - -5913.60546875, 622.796875, -6289.1953125, - -15435.14453125, 4199.765625, -3705.9453125, - -3927.5859375, 116.26953125, -2411.1796875, - -2278.33203125, 
-4683.515625, -3422.25390625, - -5275.1875, -1318.76953125, -5518.8671875, - 1967.51171875, 2541.703125, 539.22265625, - -8165.7109375, 4342.39453125, 748.05859375, - -3043.5859375, -2264.3984375, 2835.59375, - -5480.2421875, 6587.9765625, 1775.98828125, - 4663.953125, -14588.78125, 4723.6015625, - 2054.5390625, -6706.109375, -10324.34375, - -4108.4765625, 15455.0, -23.8046875, - 3955.44921875, -13585.921875, 3816.8828125, - 7938.984375, -8505.015625, 1045.8984375, - 7606.78125, -4022.9140625, -638.671875, - -3221.359375, -2761.35546875, 2287.8671875, - 3040.984375, 390.88671875, -5357.28125, - 5429.9140625, 3257.8515625, 791.54296875, - -7403.984375, 7901.5859375, -1957.3984375, - 6841.50390625, -1731.26953125, 5641.7734375, - 954.03125, -5175.3515625, 1985.4296875, - -10283.2890625, -863.1953125, -2781.41796875, - 7842.0546875, -6151.9453125, 5808.21484375, - -2064.0625, 9388.59375, -5701.41796875, - -6842.95703125, 6358.24609375, -824.69140625, - 7319.25390625, -4802.75, 13649.16796875, - 6242.36328125, 8272.22265625, -10622.39453125, - -44.98046875, 4216.48828125, -4903.49609375, - -8351.69140625, 11643.1171875, -876.03125, - 271.94921875, -4363.3203125, -5631.7734375, - -4725.5078125, 13027.8359375, -15436.84375, - 9764.42578125, -21584.78515625, 5406.453125, - 4546.80078125, 10233.79296875, -10368.20703125, - 16960.71875, -7973.61328125, 2738.6953125, - -8463.28125, -9250.27734375, -3706.19921875, - 5071.9140625, -1727.7265625, -7644.74609375, - -11505.7109375, 16675.86328125, 6587.12109375, - 8438.18359375, 2441.3359375, -9075.15625, - 6771.328125, 15874.91015625, -9889.828125, - 19896.40625, 4914.91015625, 2067.03125, - -501.0078125, 2919.2109375, 2237.0234375, - -5499.99609375, -4866.16796875, -7565.2109375, - 26242.7421875, 4513.48828125, 6055.375, - -5263.7578125, 9913.74609375, 4854.52734375, - 19320.6953125, 11740.09375, 7653.1328125, - -954.86328125, 2015.04296875, -8957.4375, - -4582.7109375, -16081.03515625, -5375.43359375, - -4433.26171875, 
-1988.1953125, -781.17578125, - -1021.05078125, 8211.2734375, 16659.89453125, - -22184.73828125, 15517.703125, -5177.0, - -3766.1875, -13615.9296875, -18613.1171875, - 1818.2421875, 9017.3046875, 2546.265625, - -12712.9140625, 8061.08203125, -1807.828125, - -6695.10546875, 14554.8359375, 10724.1171875, - -2764.8515625, 3631.65625, -82.609375, - -3471.59375, 1836.5390625, -13414.4921875, - -20696.28125, -12069.9140625, 1892.59375, - 839.7734375, -663.265625, -86.16015625, - 1577.671875, -3343.76171875, -2154.265625, - -3999.15625, 7537.4921875, -11238.93359375, - -580.484375, 2931.90625, -2367.6171875, - 9222.40625, 12867.21875, -7543.0234375, - 7282.125, 2165.140625, 7999.4609375, - -974.3203125, -6858.2109375, 6354.6875, - 11068.484375, 9338.52734375, 661.625, - 1730.890625, -3691.0703125, -8049.1484375, - 9892.3671875, 20297.4140625, -7949.15234375, - -6796.296875, -1990.015625, -5057.046875, - -1480.69921875, -2919.6640625, -4446.1484375, - -4602.15234375, 2293.0703125, 3803.0, - 2257.765625, -13792.03125, 3299.33984375, - -14558.90625, 9485.21484375, 7509.96875, - 3055.55078125, -8456.01171875, 7346.92578125, - -16491.5234375, -10262.5234375, -11790.765625, - -2122.8203125, -7446.0859375, 5434.7421875, - 15748.0625, 1277.0546875, -7934.71875, - -9669.125, 18313.19921875, 964.4296875, - -7959.3125, 8565.75, -2738.17578125, - 11406.17578125, 1794.015625, 6068.8203125, - -5527.625, -10647.5234375, 7826.9375, - -11240.0625, 9728.390625, -7526.32421875, - -2290.15234375, -5430.21484375, -13685.703125, - -4357.46875, -12431.8125, -3664.62890625, - -1309.0859375, 1359.22265625, 5977.546875, - 1095.0546875, -3767.53125, -659.1015625, - 7918.6875, 3075.6875, 3058.76171875, - -5133.734375, -11912.7109375, -9819.9296875, - 13890.875, 389.46875, -7343.1328125, - -2757.3984375, -5923.390625, -4330.109375, - 3546.96875, -2350.79296875, 3498.84375, - -4245.0234375, 14444.47265625, 2873.28125, - -1196.40625, 2595.89453125, 23919.5078125, - 2650.86328125, -4170.42578125, 
-11159.1171875, - 1373.140625, -2764.234375, -6426.42578125, - -6714.21875, -309.30078125, 948.546875, - 1656.6015625, 8842.875, -1902.5703125, - -1591.93359375, 3316.2109375, 1659.20703125, - -2748.51953125, -1771.62109375, 7938.2265625, - -665.984375, 22369.0078125, -2298.5625, - -7302.2265625, 4920.7734375, -5985.73046875, - 15727.33984375, -7922.40625, 718.26953125, - -856.1328125, 2555.2421875, 4629.3359375, - -7843.39453125, -12497.34765625, -2296.484375, - -2582.3203125, -9427.296875, 3371.59375, - -6351.16015625, -6471.265625, 2334.640625, - 12284.91796875, 8880.41796875, 11721.71484375, - -11173.109375, 1685.3046875, 9733.609375, - -7690.734375, -4766.29296875, -774.171875, - 4341.94140625, -15146.484375, -17216.42578125, - -743.54296875, 243.6015625, 2246.796875, - -1686.7890625, -7719.640625, -6588.2578125, - 13190.62109375, -8470.453125, -2462.3125, - 9453.09375, 8931.7890625, -3595.328125, - -3001.7890625, -11708.40234375, 9210.47265625, - 6120.16015625, -1840.37890625, -2663.8359375, - 4163.046875, -11463.69921875, -9409.421875, - 28203.7109375, 139.140625, 2592.0078125, - 225.4140625, 2942.6328125, 588.03515625, - 10542.71875, -8696.0, -13572.6328125, - -4109.91796875, -1650.24609375, -16097.9140625, - 2870.82421875, -5954.69140625, 2947.58984375, - -6337.3125, 4013.2265625, 5278.265625, - 1999.86328125, -12973.2890625, -136.3984375, - -26383.9140625, 6107.0703125, 2047.11328125, - 3792.4375, 2912.890625, 5259.8671875, - 3418.54296875, 6951.671875, 17.8984375, - 77.75, 5855.43359375, 6087.2578125, - 2082.3125, 14587.84375, -4915.4140625, - -8280.33203125, -8909.4609375, -5540.8515625, - -486.078125, 6872.41015625, -5628.40625, - 11298.28125, -12885.88671875, 1619.33984375, - 6925.26953125, -5871.9375, 652.90625, - -4758.44921875, -11787.13671875, 5831.42578125, - -4493.34375, 3219.95703125, 1975.921875, - -4938.62109375, 852.42578125, -1296.67578125, - -5994.203125, -6109.2265625, 18865.60546875, - 15939.09375, -1819.25390625, 2753.69921875, - 
-4877.796875, -4975.921875, 3283.39453125, - 1171.66015625, 2405.08984375, -2600.34375, - 5293.234375, -1324.3515625, -14805.703125, - -4456.7578125, 944.41015625, -7599.21484375, - 7860.984375, 2387.7421875, -17488.7421875, - -12351.359375, 6575.8359375, -8766.09765625, - 3569.796875, 12509.984375, -3365.671875, - -7141.4375, 1284.015625, 12551.4140625, - -10886.35546875, 1885.390625, 7056.28125, - 1125.890625, 705.734375, -237.3671875, - -842.3515625, 1466.3828125, -12606.921875, - 2922.140625, 11245.3203125, 4610.359375, - -6254.20703125, -13144.50390625, 4608.55859375, - -608.6953125, -3155.5625, -10944.87109375, - -6777.7265625, -10812.8984375, -4404.73828125, - 10707.125, 5182.859375, -2149.8046875, - -17604.3515625, -10136.33203125, -4747.2890625, - -1898.76953125, 6657.5078125, 5422.0078125, - 12188.6953125, 5367.8984375, -7868.5546875, - -13516.43359375, -6625.4375, -1787.8984375, - -1922.765625, 1340.796875, 129.98828125, - 1047.44921875, 9620.8359375, 1510.23828125, - -815.8828125, 21442.6796875, -1085.671875, - -3546.625, 5956.6171875, 13939.2734375, - -1406.59765625, 5588.02734375, 12094.796875, - -1503.8046875, -718.4140625, -2069.9453125, - -2235.12109375, 7459.30859375, -9375.27734375, - 7403.765625, -1584.78125, 434.26953125, - -4926.12109375, 13624.17578125, -3829.6015625, - -2872.9375, -7831.27734375, 9948.765625, - -2982.7578125, 3055.5234375, -10684.3984375, - -14334.171875, -9211.9453125, 5277.6640625, - -8728.703125, -343.296875, 6494.2890625, - 7308.5, -187.796875, 7075.05078125, - -10462.74609375, -11484.76171875, 85.73046875, - 1333.3125, -2280.1953125, -261.578125, - 11588.921875, 3084.02734375, 172.01171875, - -5202.24609375, -3380.0703125, 9614.6640625, - -13824.0, 2265.203125, 6882.00390625, - 6955.71875, 3646.2890625, 6953.40625, - -2612.8828125, -16620.57421875, -420.9140625, - -3848.65625, -2046.0859375, -3751.8515625, - 1629.73828125, -10569.1328125, -21392.25, - -3227.4375, -3733.48828125, -12445.73046875, - 6227.03515625, 
-13859.921875, 696.0, - -13103.421875, -1131.6484375, 8138.69140625, - -2351.78125, -644.328125, 12531.2109375, - -2488.578125, 5736.03125, -11911.40625, - -11696.02734375, -6289.83984375, 5126.31640625, - 3825.453125, -62.95703125, -3594.34765625, - -854.05078125, -3301.078125, -4629.984375, - -4457.41796875, 25949.27734375, 1268.50390625, - 1903.4609375, -1490.03125, 3449.35546875, - 1163.4140625, 250.13671875, -7846.3359375, - -15975.078125, 13493.93359375, -2364.3359375, - 4484.1953125, 357.98828125, 17570.26953125, - 9666.77734375, -4286.81640625, 3160.7109375, - -2832.4296875, 6845.53515625, 12035.171875, - -14321.4296875, -9264.890625, 5905.1328125, - 1122.015625, 1423.15625, -6645.7109375, - -1145.7734375, 6252.28515625, -9403.546875, - -15233.98828125, 6998.63671875, 7912.9375, - 5555.55859375, 7489.4921875, 2881.4375, - 1250.52734375, -298.68359375, -1634.66015625, - 1653.109375, 3229.7890625, 2469.4296875, - 2086.46875, 11029.078125, -10390.3125, - -8877.08984375, 8491.1796875, -5203.4375, - -6196.53515625, -4925.1953125, 7055.1796875, - 9501.82421875, 7502.80078125, -135.953125, - 1993.609375, -2768.7421875, -10121.44140625, - 459.0703125, -26300.62890625, 4111.8515625, - -128.02734375, 6080.08203125, 4002.21484375, - 7201.1015625, -345.0859375, 7499.52734375, - -13851.625, 17703.0625, 4402.4765625, - -8819.87109375, 16910.421875, 6786.80859375, - -6141.375, -3979.984375, 2991.7109375, - -10614.4765625, -4165.84765625, -4797.859375, - 6658.41796875, 10063.9453125, 1208.765625, - -3207.2421875, -848.3203125, 4264.9609375, - 6445.3359375, -6950.49609375, -2669.2421875, - 9162.66796875, 4362.26953125, -11433.21875, - -21590.890625, 920.3046875, 3886.171875, - -10671.15625, -9822.90625, -2413.390625, - -7945.5546875, 31731.01953125, -7336.51953125, - 2654.6328125, -3627.04296875, -628.890625, - 5647.6484375, 6054.59375, 18906.0859375, - 4933.5234375, -5695.5234375, 1668.3046875, - -1214.1796875, -3529.65625, 5712.984375, - 582.30078125, -9725.6015625, 
-4622.140625, - 33763.25, 16820.9453125, -6264.125, - 8507.5703125, 8577.5, 8861.16796875, - -3050.1328125, -1599.21875, -2568.34375, - -5865.0234375, -1618.796875, 6874.8359375, - -1277.28125, 3303.515625, -4937.84375, - -9493.546875, -2292.42578125, -7100.515625, - 5799.6328125, 93.59375, -9206.55078125, - -326.8359375, -6841.53125, 7962.5703125, - 1778.18359375, -11064.37109375, -7285.4921875, - 1336.21484375, -10450.12109375, -9098.09375, - -1640.91796875, -1077.52734375, -3705.0, - -2618.546875, 2704.22265625, -808.1953125, - 1022.50390625, 5492.78125, 9145.87890625, - 2247.2109375, -1120.1640625, -8727.7578125, - 911.2109375, 650.75, 10020.6015625, - 4262.6171875, -7956.4609375, 1717.87890625, - -940.43359375, -786.94921875, -4941.796875, - 4032.1015625, -8774.734375, 2687.953125, - -1421.3203125, 1109.3984375, -790.046875, - 4021.140625, 1954.8828125, -4317.05859375, - 1200.2734375, -5390.3125, 1668.515625, - -2332.6875, 4814.109375, -1303.9140625, - 418.62109375, 626.4375, -3689.00390625, - 2998.1640625, 6815.5078125, -1779.02734375, - -10730.77734375, -1738.09765625, 919.05859375, - 8701.5234375, -121.40625, 5987.4921875, - -1791.05078125, -7491.21875, -2028.6015625, - -3215.234375, 1925.32421875, 96.30078125, - 3024.40625, 29.5703125, 328.77734375, - -2536.30078125, 8289.41796875, -1111.1484375, - -8333.8984375, 3912.4453125, 278.07421875, - -2629.09375, 3724.3515625, -349.8671875, - 3671.96875, -6579.515625, -5145.91796875, - 8851.30078125, -16770.6328125, -15611.96875, - -5662.125, -3383.1875, -5333.2421875, - 7767.1171875, 2283.140625, -3474.6171875, - -2245.4140625, -2782.90625, 4231.20703125, - -10721.9296875, -13444.0859375, 227.90625, - -33.8828125, -10962.625, 2279.875, - 8913.4609375, 6985.9296875, -20051.265625, - -2212.56640625, -4685.140625, 2387.625, - -3039.2890625, -3921.45703125, -2114.8046875, - -5298.43359375, -8516.53125, -4439.7578125, - 8320.01171875, -30788.296875, -8174.28125, - 10387.265625, 5230.765625, 1279.5546875, - 
883.9453125, 2859.1328125, 339.515625, - 9909.7265625, -5415.7109375, 1246.41015625, - -4011.3359375, 4203.80078125, 554.234375, - 8953.02734375, -510.875, -2214.296875, - -2974.546875, -3236.296875, 12816.28125, - 3637.8046875, 2889.8203125, -13243.046875, - -2200.59375, -6566.34765625, 9837.3046875, - -1003.8046875, 16513.9765625, 10344.4765625, - -6984.53125, -3922.0, -17804.703125, - 3100.59765625, -111.0234375, 83.08203125, - -14447.5703125, 9187.77734375, -6003.6328125, - -10673.9453125, 14981.1171875, -1142.5625, - 1705.125, 11543.3828125, 3576.40625, - 5022.625, 11516.47265625, 6110.453125, - -5621.0625, 6007.1484375, -4071.9140625, - 1555.3046875, -3850.28515625, -1975.2265625, - 6240.9765625, 5575.37890625, -923.65625, - 7358.6875, 4753.859375, -843.07421875, - 1028.0, -6338.31640625, -148.85546875, - -35.16015625, 1454.04296875, 7569.2109375, - -11848.8828125, -7682.71875, -225.4765625, - 157.875, -9442.81640625, 37.6875, - 44.0390625, 8142.33984375, 187.94140625, - 11661.47265625, 10523.890625, -486.9453125, - 6175.3046875, 5111.85546875, -3727.63671875, - 111.3671875, -4211.6953125, -1984.6875, - 6285.109375, -227.9375, -4106.3203125, - -8268.609375, -18816.50390625, -16283.84375, - -155.625, 11372.41015625, 8414.55078125, - 1135.66015625, 6736.01953125, 11349.09375, - -3514.1640625, -15377.703125, -2209.078125, - -6479.00390625, 1710.98046875, -1598.55078125, - 274.4453125, -4490.0625, -5406.3203125, - 6380.0546875, -4002.9765625, -2812.171875, - -4731.984375, 7747.578125, -7605.53125, - -11247.046875, -4218.09375, 9579.5390625, - -2012.3984375, 409.2890625, 2381.15625, - 4196.6484375, -3048.8046875, -9355.73828125, - 4411.3984375, 3501.40625, -8383.9296875, - -4043.25, 2856.765625, -17063.24609375, - -6324.59765625, -6384.65234375, 4427.125, - -3433.26953125, 7213.3984375, -4960.53125, - -629.59375, -1764.41015625, 497.8515625, - 5843.703125, 6574.6171875, -3969.984375, - 1691.171875, 3594.984375, -9656.71484375, - -1393.85546875, -3286.390625, 
803.8671875, - -6507.703125, -6275.3203125, -11158.90625, - 2987.296875, 6768.75390625, 4916.09375, - -8907.1796875, -9074.390625, 3150.6328125, - -2638.46875, 19578.453125, -5454.15625, - -5312.9609375, 16502.546875, -16029.84375, - 5492.90234375, 16101.3203125, -6240.734375, - -8425.515625, -9957.9296875, 5445.99609375, - -8594.95703125, -15207.7578125, 1304.578125, - -1623.6875, 248.85546875, -13940.87109375, - 10332.7578125, 2447.390625, 11982.234375, - 5695.703125, -854.1953125, -8011.046875, - -4083.515625, 47.99609375, 8005.2265625, - -3368.76953125, 11140.36328125, 6884.9453125, - -17632.04296875, 17753.26171875, 5639.1484375, - 2422.94921875, -8919.3359375, 4376.23828125, - -7499.921875, 18517.5625, 4856.72265625, - -27149.4921875, -7974.546875, 725.171875, - -1946.90625, -2926.96484375, -4802.984375, - -361.359375, 2840.5859375, -10788.8203125, - 22243.203125, 290.4296875, 2858.74609375, - -17491.3203125, 1091.15625, -920.8359375, - -2310.69140625, 756.55859375, 5772.90625, - 6614.01953125, 8112.5703125, 7863.484375, - 9703.40234375, 3072.1953125, 3957.4453125, - 2959.328125, 9267.4765625, -4745.703125, - -5932.71484375, 3624.859375, -5631.89453125, - -162.9140625, 11477.1328125, 4569.3671875, - 5891.125, 413.265625, 885.40625, - 8435.1640625, 16222.1875, -1111.33203125, - -6652.67578125, 6552.640625, 3702.91796875, - 15185.4609375, -3910.640625, -942.421875, - -6677.6875, -699.03515625, 252.1875, - 8651.3671875, -568.41015625, -2856.12890625, - -4824.375, -3262.03515625, 3243.5234375, - 15730.5859375, 9908.8984375, -1538.2890625, - -621.84375, -12344.91796875, 2923.859375, - -2498.70703125, -7773.5859375, -1159.125, - 1665.83203125, -2170.3359375, -4118.75, - 15089.984375, 3461.46875, 4621.11328125, - 1072.09375, -15626.8515625, -11191.4921875, - -2122.30859375, -1608.01171875, -1801.1875, - 6318.6171875, -805.8046875, -5883.84375, - -6288.5234375, -3919.1328125, -1612.3125, - 550.9921875, 1187.2734375, -4233.37109375, - -10678.8359375, -6048.484375, 
1410.5546875, - -4673.75390625, -4894.375, -1040.6640625, - -2072.43359375, -5300.5546875, -15940.0703125, - 1449.921875, -5695.65234375, 1086.4375, - -3176.3046875, -1468.91015625, -1176.75390625, - 5095.07421875, -2722.55859375, -11164.23046875, - -2921.73046875, 4222.0625, -2072.9453125, - -172.85546875, 1684.69921875, -6516.5, - -10569.984375, -5196.578125, 5096.70703125, - -5416.0234375, 8200.62109375, 2340.234375, - -7273.03125, -3922.22265625, -6537.3046875, - 2115.8359375, 10684.390625, 1914.41796875, - -5815.3046875, 11942.53125, -2702.765625, - 1944.9609375, -1068.8125, 3198.1953125, - 14335.84375, 10528.7265625, 6991.2421875, - -1431.671875, -248.37890625, 3659.40625, - -52.3125, -100.95703125, -589.9453125, - 9303.59375, -6899.0703125, 12310.1015625, - -4185.0234375, -4293.36328125, 1406.6328125, - 1444.6328125, 11315.05859375, 3725.43359375, - 7080.6484375, 11297.0078125, 2228.23828125, - 2748.4765625, 7414.90625, 7730.890625, - -367.65625, -2978.0859375, -2214.828125, - -3863.69921875, 7311.79296875, 2374.1171875, - -11613.13671875, -1715.0078125, -3292.875, - 2081.3828125, 1665.4375, 2344.390625, - -7846.859375, 3992.80078125, 456.8359375, - 7420.24609375, -8546.5703125, -4409.390625, - -2062.26171875, 13989.2421875, -1441.74609375, - 2474.55859375, -7774.3125, -9904.390625, - 5891.22265625, 5377.8359375, -1122.4609375, - 18174.13671875, 904.26953125, -3209.890625, - -4462.796875, 808.40625, 6228.640625, - -7633.421875, 7248.59765625, 2829.53125, - 1610.35546875, -13868.078125, 3244.5703125, - 7927.6640625, -2003.328125, -7339.1171875, - -2196.34765625, -5002.7578125, -10095.4375, - 1559.4296875, 7856.18359375, -4738.6015625, - -9392.9609375, -4133.96875, -219.328125, - 8313.16015625, -1511.67578125, 10089.59765625, - 5607.65625, 2902.55078125, 1491.0234375, - -4841.828125, 1449.2734375, -12512.25, - 395.16015625, 2729.94140625, 3898.5703125, - 1296.23046875, -5055.0390625, -4198.7421875, - 793.6484375, 32.1640625, -5797.125, - 5204.36328125, 
-9824.984375, -7408.1875, - 6489.11328125, -21418.734375, -9111.4296875, - 9460.26953125, 4297.08984375, -3846.3125, - 973.1484375, 17417.015625, 3941.7109375, - -12216.53125, 803.78125, 9935.1640625, - 3558.34375, 5291.59765625, 4511.03125, - -3300.4375, 4863.015625, -4894.625, - -3279.7421875, 2938.890625, -9160.90625, - 11988.98046875, -8059.0234375, 3721.984375, - -4993.515625, -1350.3125, -2821.828125, - -4643.421875, -7232.984375, 2651.0859375, - -1637.5546875, -568.55859375, 2774.9296875, - -2250.15234375, -157.6640625, -7083.46875, - 13553.6796875, 2540.4296875, -13527.6328125, - 1858.3203125, 6681.890625, 672.078125, - 4198.6328125, 4700.984375, -1678.59375, - -3428.4140625, -954.18359375, -17209.453125, - 2586.7109375, 6855.28125, 1701.22265625, - 2397.46875, -2487.1953125, -2613.5, - -2560.61328125, 649.609375, 770.2421875, - 746.015625, -3014.2109375, -2059.0078125, - -9455.734375, -5567.296875, 1259.6953125, - -4075.8125, -8729.7890625, 4023.34375, - -1864.25, -376.390625, -3884.23828125, - -9161.03125, 14083.3828125, -9537.421875, - -4938.09375, -1407.8359375, -3597.21875, - 17523.265625, 1182.34375, 5874.5390625, - 2284.45703125, 2395.03125, 3331.03515625, - 4687.8359375, -2822.73046875, 5670.94140625, - -5619.6875, -1752.25, -1546.1796875, - -9086.6328125, 2474.74609375, 1424.4609375, - -10885.59765625, 120.79296875, -4764.34765625, - -586.1875, -1584.67578125, 173.93359375, - -4410.95703125, -11728.1328125, 710.01953125, - -5380.23046875, 7776.6875, 6583.7421875, - 452.64453125, 12654.4609375, 11184.328125, - -1680.6875, -5399.7578125, -5931.8125, - 5928.75, 11268.375, 4915.2734375, - 2604.4609375, -13248.23046875, 4296.734375, - -19538.9296875, -10638.078125, -9361.9375, - 5176.984375, 5077.3046875, 9395.29296875, - -1206.7109375, 467.984375, 5668.640625, - -4921.765625, -1717.8671875, -3995.25, - -8805.703125, 4430.26171875, -8037.46484375, - -22621.265625, 120.0, -3533.3515625, - 7317.2578125, 666.265625, -6590.9453125, - -16337.171875, 
19615.296875, -231.96875, - -3368.828125, -7545.54296875, -1681.0625, - 3501.59375, 3472.7734375, 4814.08203125, - 9625.546875, -6721.7109375, 2942.38671875, - -8889.0546875, 3469.53515625, 10681.0546875, - -468.11328125, 168.76171875, 8250.6875, - 13615.21875, 18438.640625, 2649.546875, - -5477.1953125, 4958.21875, -3424.7109375, - 840.6484375, 407.7265625, -1124.7421875, - -21890.27734375, 5024.84765625, -10318.19921875, - 6450.3984375, -16542.65625, 1566.8828125, - -6474.94140625, 2160.640625, 1152.71875, - 5791.9609375, 573.796875, -12189.3359375, - 4484.87890625, 657.6484375, -8182.296875, - 8382.6328125, -5620.484375, 4188.78125, - 2506.0, 3046.2109375, 5381.7734375, - 4557.96484375, 14191.68359375, -8179.74609375, - -5995.3359375, -5853.4296875, -3103.765625, - 1674.94140625, -561.1328125, 5390.1328125, - 3815.2578125, 7108.515625, 71.4296875, - 8640.3203125, -12120.8046875, -14702.7421875, - -3610.4765625, -4201.6015625, -731.66796875, - -9035.1796875, 2287.76953125, 4148.9375, - 3668.203125, 1388.90234375, -3079.625, - -7498.96875, -7622.2890625, 987.57421875, - 9509.6328125, 3554.203125, 9387.30078125, - -6261.4375, 16434.26953125, 7219.3125, - 2350.0625, 5706.921875, -12350.8125, - 1381.06640625, 5323.6796875, -3505.25, - -4218.578125, -14720.18359375, -6336.34375, - -965.52734375, 3096.1328125, -5954.2578125, - -12597.99609375, 3799.6328125, 725.4921875, - -11505.91015625, -387.7421875, 438.4453125, - -4436.421875, -5299.69921875, 17626.421875, - -1089.1171875, -1596.4765625, 10307.6953125, - 16110.6484375, -541.5703125, -5242.3515625, - -10158.8203125, 607.3359375, 6040.71875, - 2499.82421875, -1317.65234375, -1848.5390625, - -602.7890625, 1754.453125, -3689.515625, - 5829.5546875, -3663.265625, -1841.4921875, - -13149.9609375, 4352.78125, -9735.703125, - -6086.859375, -4439.328125, 6590.2890625, - -3581.1328125, 789.828125, -5747.171875, - 11623.98046875, -2293.07421875, -2905.671875, - -658.265625, -10335.28515625, -5157.890625, - -9550.734375, 
-11945.9921875, -10386.2421875, - -1331.6171875, -2653.26171875, -9451.234375, - 2817.4375, -4725.875, 1620.0859375, - 1352.11328125, 8341.328125, -840.94140625, - 6658.640625, -3872.25, -1101.8125, - -4408.1328125, 16262.765625, -3228.62109375, - 5079.609375, 11940.02734375, 992.640625, - 774.3984375, -3703.71875, 4357.1953125, - 6033.453125, 429.65234375, 7488.5859375, - 1648.90234375, -5659.2890625, 6199.578125, - -8856.63671875, -2893.7265625, 3312.9375, - 3695.47265625, -4663.88671875, 3661.59375, - -5486.2265625, -523.25, -4690.21875, - 8495.8515625, 9321.8359375, -7228.921875, - 2064.71875, 3515.3203125, 1454.0625, - 4305.609375, -6833.25, 3879.21484375, - 2094.9921875, 6009.4453125, -7166.625, - 10055.46875, -200.0625, -5109.3671875, - -1014.84375, -9098.5546875, 1352.75, - -5314.8046875, -968.671875, -1025.51171875, - -2761.83203125, 6655.51953125, 2597.46875, - -7818.3359375, -9582.5546875, -2222.53125, - 10575.09375, 2408.74609375, -15529.4921875, - -20820.34765625, -13266.4140625, -291.3828125, - 3363.140625, 874.765625, 5109.859375, - -8508.5703125, 2032.4140625, 14934.265625, - -12869.3984375, 4350.1171875, -19084.3046875, - 31.12890625, 614.171875, 13082.609375, - 6523.53125, 3591.3203125, 20880.95703125, - -5634.671875, 5303.11328125, 19705.52734375, - 8701.18359375, -4632.0546875, 11284.53125, - -650.52734375, -6580.4375, -8695.23828125, - -8585.80078125, -12881.625, -5145.84375, - 2514.4296875, -11385.4921875, -5432.4921875, - 7219.28125, 17129.65625, 3371.328125, - 5823.9140625, -2186.66796875, 8170.46875, - -15803.4296875, 22662.3828125, -602.65625, - 1465.3125, -7857.4140625, -6351.984375, - 3025.94921875, -1022.9375, -11568.3828125, - 14975.875, 20665.9921875, -1522.7109375, - 12303.828125, 2133.53125, -3325.5, - 4159.6953125, 555.0, 12972.96875, - 4235.46875, -3067.8046875, 1522.91015625, - -3448.87109375, 7178.65625, 16918.8203125, - -2841.46875, 1187.0234375, -13684.9375, - 5305.26171875, 7815.359375, 6997.984375, - -8315.84375, 
-5594.109375, -4187.6875, - -2009.359375, 7387.40625, -2578.796875, - -19545.28125, -6247.4140625, 5690.296875, - 7594.36328125, -2423.9140625, 2015.890625, - -1760.61328125, 4295.09765625, -8125.34375, - -1973.37109375, -3157.734375, -4860.1015625, - 1116.84375, -556.8671875, -3934.44140625, - 14741.60546875, 4106.44140625, 255.1640625, - 4340.16015625, 119.3828125, -2931.265625, - -1906.63671875, -5269.5703125, 667.19140625, - -2058.17578125, -9273.79296875, -1482.78125, - -7302.53515625, -13991.296875, -734.6953125, - 11032.83984375, 11126.7421875, -11648.203125, - 3234.98046875, 6492.5625, 145.203125, - -12270.8515625, 4084.5390625, 6373.046875, - 4319.65625, -7161.453125, 6603.85546875, - 8797.5703125, 6555.0703125, -5654.64453125, - -3574.4765625, 5429.875, 1474.45703125, - 4395.6015625, 4539.63671875, -3663.625, - 660.4140625, -780.96875, 6387.17578125, - 17196.85546875, 1874.21484375, 5394.07421875, - -25625.58984375, -4234.328125, 112.0625, - -2638.5234375, 889.6484375, 4614.16796875, - -3854.22265625, -2658.0390625, -4606.96875, - 8520.34375, -2300.95703125, 4647.8359375, - 805.09375, -17802.1640625, -3932.8125, - 15893.98046875, -6150.078125, -4180.375, - 2569.31640625, -2647.87890625, -1978.36328125, - 1356.859375, -15840.625, 9493.03125, - 5680.90234375, 1041.515625, -14535.62109375, - -15050.4375, 8390.7734375, 11156.671875, - -2235.0703125, 5844.53515625, -6268.6015625, - -5597.03125, -4802.91015625, -7865.62109375, - -2399.140625, -3801.5234375, 1579.8046875, - 2835.5, 3423.71875, -8275.5625, - -5882.453125, 2709.98046875, -1906.21484375, - -3139.546875, 8930.53125, -4894.6796875, - 2112.0, -4317.69140625, 3661.53125, - -1144.6796875, -5840.23046875, -15053.35546875, - -7268.6875, -9035.41796875, 3174.921875, - 2039.828125, -1811.328125, -17150.5, - -19368.015625, -8787.79296875, -16553.9609375, - 2428.9453125, -236.4140625, -6968.6796875, - 5096.515625, -4701.2421875, 14525.0078125, - 9199.359375, -5192.8828125, -2252.20703125, - -566.375, 5219.75, 
-5629.78125, - 2358.4921875, 17259.41015625, -1632.921875, - 5839.84765625, -8929.34375, 3697.15625, - -1958.6953125, -1871.91015625, 8302.3125, - 7977.59375, 406.703125, -13535.2421875, - -23594.5859375, -2447.078125, -6664.296875, - -990.390625, -8192.8203125, 29062.515625, - 2208.62890625, 223.78125, -2059.6171875, - 3577.0, -3395.53125, 9119.046875, - -1775.50390625, -2867.484375, 1713.2890625, - 1867.67578125, 9009.59375, 3904.828125, - -17436.78125, 1735.296875, -4849.4921875, - -2138.96484375, -3222.01953125, -3204.46875, - -5236.9765625, -10530.6796875, -13142.48828125, - -3240.078125, 232.78515625, -3476.8671875, - -7095.2265625, 4774.46875, -18713.8671875, - -764.34375, 12358.1796875, -8817.86328125, - -1545.84375, -931.03125, 2502.7421875, - -5707.4375, 2763.76171875, -5123.18359375, - 3550.60546875, 4684.453125, 5033.734375, - -5969.3828125, 1393.50390625, -4517.09375, - 5054.73828125, -8526.875, -3880.90625, - -3650.375, -2911.2578125, -264.9765625, - 8850.34765625, -6081.109375, -3323.9296875, - -3907.86328125, 6686.66015625, 282.2734375, - -4198.140625, 3704.0390625, 4912.1953125, - 2705.59765625, -4229.234375, -812.73828125, - 3016.29296875, 5268.6640625, 28.7421875, - 6093.578125, -9933.1796875, 5504.59375, - -21.00390625, -7667.0703125, -2289.609375, - -3814.11328125, -9453.28125, 4376.1328125, - -4453.2734375, 2070.76171875, -616.359375, - 1603.265625, -1828.375, -2408.4375, - -9561.1953125, -2402.3125, -15588.41015625, - -7063.359375, -14216.7734375, -692.30859375, - 370.609375, -987.5859375, -2082.00390625, - -3091.5625, -1307.22265625, -642.8984375, - 2268.0390625, 3659.98046875, -4129.0546875, - -5693.70703125, -7131.546875, 2436.5234375, - 1952.6328125, -353.3515625, 3548.09375, - 1979.45703125, 243.328125, 4394.125, - -1361.78515625, -176.91796875, -2589.2734375, - -4218.1875, 1526.71875, -4469.125, - 13925.328125, -6056.2734375, 854.921875, - -1051.45703125, -5738.84765625, -5458.5078125, - -438.44140625, 1426.14453125, -1602.1796875, - 
-3179.2890625, 9134.52734375, 645.6875, - -4400.40625, 9453.0859375, 4263.328125, - -4959.0234375, 3097.4140625, -1352.0703125, - -7276.3125, -5772.3515625, 3110.20703125, - 4657.48828125, 852.4453125, -16923.05859375, - 3476.703125, -6610.7421875, 5415.42578125, - 13360.55859375, -798.03125, 1736.40625, - -3936.39453125, -15813.53515625, -3572.3984375, - 9110.78125, -461.0390625, 5040.921875, - -10307.41015625, 13372.02734375, -17580.15234375, - -1186.37109375, 536.296875, -11296.41796875, - 10431.3125, -6435.5625, 6923.1640625, - -135.98828125, -11271.54296875, -8301.40625, - 2611.734375, 10185.0703125, -1348.6484375, - -14031.7890625, 6204.984375, 86.0390625, - -15020.3515625, 4131.45703125, -16839.46875, - -4104.0390625, -1824.9375, 5421.94921875, - 2329.8671875, -17689.1640625, 17535.375, - -17696.5390625, -47099.41796875, -8251.3203125, - -31298.84375, 2047.765625, 1455.59765625, - 7677.84765625, -16335.7578125, -8127.19140625, - -53.2578125, 15026.796875, -17.75, - -16241.7578125, 4337.96875, -2428.8984375, - -10007.515625, -7624.9921875, 9603.48046875, - -196.2265625, 207.7109375, -18044.390625, - 22640.375, -13428.5859375, 8887.82421875, - 971.1171875, 8851.9296875, -14569.4140625, - 508.1171875, -7491.578125, -11047.34375, - 24198.16796875, -232.75, 6947.1953125, - -6361.15625, 6240.84375, -1225.2734375, - -10043.5078125, -990.8984375, -12049.796875, - 6448.390625, 0.375, 211.546875, - -1058.9453125, 10745.359375, 3011.0234375, - 27113.08203125, -7046.11328125, 1783.234375, - -1038.640625, -17986.640625, 3918.0546875, - -9685.87109375, -12910.29296875, 5015.421875, - 2368.1875, 106.09375, 6135.375, - -5032.7890625, 9320.796875, -8613.546875, - -8127.6015625, 2484.37109375, 240.1640625, - -1843.5078125, -5785.12890625, -5741.1015625, - -3110.78125, 301.34375, -3932.84375, - -4648.1875, -11346.609375, -8727.0546875, - -2952.4140625, -2559.2421875, 4467.703125, - 1850.2109375, 14590.390625, -615.828125, - 2733.2265625, -9904.8671875, 1424.84375, - 
13849.5703125, 1967.0078125, -5137.10546875, - -21018.421875, -10549.9453125, 3632.4765625, - 9284.6640625, 7485.6875, -9148.0703125, - 3782.5, 6028.51953125, 6116.3203125, - -1293.51171875, -1911.10546875, 6119.828125, - 1204.703125, -12919.9609375, 16415.1171875, - -4753.890625, -4681.0703125, -6518.84375, - 7333.8671875, 15.046875, -8616.87890625, - -6561.0703125, 2500.484375, -16632.3203125, - -20798.140625, 12230.609375, 572.6953125, - -16320.5, -2223.2421875, -2324.98046875, - -29109.6171875, -17363.48828125, -698.8515625, - 2096.203125, -8678.52734375, -22307.703125, - -4212.4453125, -1409.0, -7154.953125, - 2969.234375, -3353.5, -4797.89453125, - -4722.125, -3006.53515625, -2190.40625, - 746.0390625, -3175.91015625, 3779.078125, - 4588.1484375, 1176.921875, 1401.9921875, - -7693.125, -7741.7265625, -4912.8671875, - -4229.75, -127.5703125, -1848.046875, - -2436.91796875, 3437.203125, 1525.3046875, - -12087.6171875, -556.0625, 3141.171875, - -1500.03515625, 7594.921875, 4080.421875, - -2418.03125, -2640.76171875, -346.796875, - -886.0390625, -1441.6640625, -3847.71875, - 996.0546875, 3810.703125, -7609.1484375, - -5954.8828125, 3421.046875, 669.9453125, - -4999.078125, -1811.55078125, -9174.4296875, - 19985.40234375, 5508.078125, -4011.6875, - -1177.765625, -1722.4140625, -4095.39453125, - -4164.6328125, 4131.890625, 2947.109375, - 8360.5078125, 710.33203125, -5789.83984375, - -5567.3828125, 6034.1796875, -946.03125, - 82.87109375, 5465.01953125, -205.5546875, - 8240.4140625, 5059.1171875, 6020.625, - -3078.03125, -6255.25, 6074.63671875, - 9665.16796875, -761.09375, 1022.08984375, - -498.3359375, 290.4921875, -601.0546875, - 7078.98828125, -7900.984375, -1514.8984375, - -5215.29296875, -6942.65625, 3810.8671875, - 114.015625, 4387.53515625, 2544.59375, - -2713.0859375, 5138.9140625, 3248.26171875, - 7925.68359375, -8048.625, -10085.6640625, - 11901.875, 12342.1640625, 12088.15625, - 3204.875, -3178.75, -264.02734375, - 5855.6484375, 7957.95703125, 
8427.609375, - 11085.7578125, -10242.12109375, 6051.734375, - 1929.828125, -2165.78515625, 4075.0546875, - 1459.4765625, -10574.109375, 43.140625, - 1037.2109375, 8794.1796875, -467.734375, - 21552.296875, 3404.078125, -277.703125, - -1034.3984375, -13049.1171875, -5360.9609375, - 137.046875, 4337.421875, 272.28515625, - 2030.1640625, -2859.9609375, -9764.15625, - -11358.09765625, -2572.828125, -9965.828125, - -4135.171875, -1939.328125, 12024.21875, - 3655.0546875, -10717.6640625, -2150.953125, - -7611.15625, 6517.1328125, 2662.9765625, - -14060.109375, 3693.2578125, 6174.8125, - 16407.5546875, -4121.30859375, -1364.5234375, - 6001.03125, -10962.9375, 4699.02734375, - -5106.90625, -1068.21875, -7753.640625, - 965.734375, -2161.3359375, 5107.38671875, - 6403.8046875, -7110.29296875, -6237.921875, - -1659.765625, 1467.9140625, -12439.59375, - -1088.5546875, -262.796875, -2077.3359375, - 6307.703125, 4612.7734375, 3209.015625, - -2586.125, -22744.73828125, 4072.7734375, - 7808.0234375, -823.484375, 4237.4921875, - 5714.6640625, 4223.2578125, -7279.796875, - -4272.5625, -2652.875, -201.953125, - 2092.9375, -4063.109375, -2909.890625, - 2371.109375, -4349.265625, -859.5625, - -14779.2890625, 14888.484375, 3732.8125, - -2933.8203125, -6009.9296875, 13417.265625, - 1436.48828125, -1511.9296875, 388.28515625, - -556.3515625, 3467.8203125, 4093.2578125, - -10432.625, -2380.84375, -1412.3359375, - 13785.6875, -4485.578125, 622.625, - 14520.328125, 3319.9921875, -3849.21875, - -434.78125, -2399.390625, 1087.96875, - 3563.1171875, -5434.609375, 63.5625, - 3250.765625, -4128.25, -4604.921875, - -5236.84375, -8034.88671875, -1139.59375, - 4767.90625, -2220.9375, -948.9765625, - -3541.5859375, 1464.359375, -568.59375, - -955.27734375, 1143.0078125, -1899.7109375, - 654.90625, -3735.109375, -2631.16015625, - -8996.828125, -4067.3125, 6101.26171875, - 12692.390625, -17328.453125, -5646.44140625, - -5681.46875, -5042.078125, -5926.328125, - 5018.76171875, -12123.484375, 3883.75, - 
3033.25, -8108.8125, 9480.6640625, - 901.69140625, -1494.625, 7182.7890625, - 6641.00390625, -6251.64453125, -627.9375, - 4215.21875, -3796.62890625, 7982.484375, - 4304.421875, 3477.12109375, 420.9375, - -3172.765625, 5746.375, -6782.265625, - 4448.90625, -16892.05078125, -3951.65625, - 519.3046875, -21139.21875, 7684.1640625, - -2694.0859375, -1508.515625, -11634.78125, - -1473.74609375, -5520.0546875, 7546.6484375, - 697.6953125, -2187.0546875, -8036.2734375, - -11773.640625, 1910.6875, 8605.8203125, - 462.7734375, 5670.8046875, 9330.78125, - 294.73046875, 4100.609375, 11282.765625, - 6451.10546875, 1016.47265625, 10234.484375, - -7894.05078125, -7155.59375, 7189.7109375, - 3347.640625, 8171.6875, -9086.5703125, - 5099.4296875, 601.984375, 762.8359375, - 1042.640625, -3918.3125, 1240.109375, - -3986.7578125, 4780.65625, -4459.703125, - -1113.2109375, -8790.46875, 150.8359375, - 2994.6875, 6173.71484375, 10862.1328125, - -5102.58984375, 5157.671875, -9732.9609375, - 2593.6484375, 6097.44140625, -3764.9921875, - 2088.640625, 4819.0703125, 688.1953125, - 2855.9609375, -6189.75, -3357.28125, - -3953.609375, 12968.2734375, 645.3125, - 6621.265625, 9544.4140625, -1543.09765625, - -3700.5390625, 10361.84375, 12675.5703125, - -3588.640625, -16765.3671875, -8641.4453125, - -2870.765625, 2088.64453125, 3360.9765625, - 9220.0625, 2024.5703125, -4939.1796875, - 3907.421875, 4881.18359375, 874.7421875, - 3226.6484375, -11177.1953125, 4219.56640625, - -1623.4296875, -6650.328125, -1793.77734375, - 8156.1015625, -7843.4765625, 2709.28125, - -16952.5, 4885.3359375, -4618.3046875, - 4549.8515625, -4153.890625, 8372.75, - -1135.8046875, -891.15625, -22972.6484375, - 13404.1953125, -4127.609375, 1508.7890625, - 1737.7109375, 17257.98046875, -10142.05859375, - 5337.1953125, 275.98828125, 5593.984375, - -8897.5, -4021.41796875, -5317.015625, - -6064.93359375, -8407.4609375, 1724.66796875, - -6457.4765625, 9236.0078125, -2845.02734375, - 4619.234375, -5867.1796875, -519.015625, - 
-4067.0234375, 3213.125, -6058.46875, - 3736.23828125, -3060.71484375, -1470.9375, - -18600.7265625, -1956.40625, 3226.9609375, - -2555.609375, 3572.546875, -8596.8125, - 3133.73046875, 6283.0078125, -2610.4765625, - 6660.75, 3485.85546875, 6339.984375, - -533.9453125, 12142.19140625, -743.91796875, - -3165.76953125, -6207.25, 1204.625, - 5168.625, -4552.25390625, 4786.2109375, - -804.7421875, 7322.15625, -2195.36328125, - 770.15625, -1410.140625, -1045.046875, - 6169.73828125, -14029.62890625, 10458.48046875, - -6715.15625, -7223.16015625, 677.1328125, - -3481.8046875, -597.1640625, 2383.125, - -8743.4296875, 3018.3515625, -3117.203125, - -3072.2421875, 3266.1484375, 7641.046875, - -2807.59765625, -5073.9765625, -9034.3671875, - 3743.578125, 6285.64453125, -468.2265625, - -7455.4765625, -8033.6171875, 1039.578125, - -664.078125, -11005.87109375, -6091.71484375, - 281.94921875, 2430.41796875, 3597.57421875, - 8535.23828125, 13209.3203125, -2950.6015625, - 12777.609375, 3176.05078125, 4352.21875, - -3501.71875, 7190.78125, 2289.73046875, - -5417.46875, 190.3828125, -2644.7578125, - -2144.671875, -7097.4453125, -1787.390625, - -16577.46484375, 7266.625, 15224.171875, - -10146.953125, 6796.20703125, 2667.984375, - -11324.93359375, -14105.109375, 3604.11328125, - -10508.8671875, 29549.44921875, 8846.625, - 1015.73828125, -4554.3515625, -9615.234375, - -10684.109375, 11400.47265625, 6060.1171875, - 2316.9609375, -106.7265625, -1783.9609375, - 14253.7578125, -11978.59375, -7868.5234375, - -3144.59765625, -4202.9140625, -1645.3125, - 7463.22265625, 10016.5234375, 272.015625, - -854.859375, -658.703125, 4058.49609375, - -3936.1875, -9596.90625, 580.42578125, - -11043.6640625, 8256.6015625, 41.1484375, - 6957.71875, 1633.0234375, -1100.359375, - 1385.875, -14088.7109375, -13441.1953125, - -7284.8125, 18287.46875, -4235.6171875, - -3494.6640625, -4082.125, -3766.78125, - -5726.76953125, -506.03125, 301.734375, - -1317.01171875, -2223.85546875, 5561.296875, - -5828.125, 
-3152.921875, 2584.890625, - -3498.62109375, 6436.6953125, -1867.0390625, - 10989.828125, 474.3828125, 9367.81640625, - -7267.9921875, -1473.5390625, 10736.0859375, - -9623.63671875, -5848.75, -1680.828125, - 4510.21484375, -11657.7421875, 63.01171875, - 687.02734375, -10539.3046875, 1974.72265625, - 15390.375, -8358.73046875, 17530.1484375, - 889.80078125, -3897.828125, 2337.04296875, - -910.75390625, -3316.51953125, 16954.8671875, - 3516.08203125, 3368.12890625, 7817.2265625, - 5887.53515625, -11841.59765625, 2423.796875, - 24454.14453125, 13227.3046875, 8312.68359375, - -7418.734375, 5356.0546875, 1029.1015625, - 13615.2890625, -1736.84375, -4535.5234375, - 577.6953125, -1701.73046875, -2026.50390625, - -6994.2421875, 10757.65625, 17813.1015625, - -8020.375, 7886.8828125, -3676.6484375, - -12162.2109375, 7321.46875, -5316.1796875, - 3877.03515625, -10603.6015625, 14497.4296875, - 12990.921875, 2860.92578125, 9392.5546875, - -1094.171875, 4136.578125, 3202.01953125, - -2228.671875, -12285.671875, 11217.8984375, - -15980.78125, -4306.58203125, 10633.359375, - -9598.46875, -6759.81640625, -7783.03125, - -1955.6328125, 21371.8671875, -1617.84375, - 5117.75, 2695.9453125, 1241.7421875, - -18694.640625, 5849.171875, -11359.4375, - -12844.72265625, -815.546875, -8824.50390625, - 5207.96875, -7575.546875, 29091.92578125, - -3712.20703125, -1067.1015625, -9872.2265625, - 2808.15625, 19927.85546875, -1093.1328125, - 14220.20703125, -276.53125, -2307.39453125, - 5275.8671875, 10365.73046875, -347.8359375, - -51.12109375, 2552.99609375, -13918.53125, - 5442.7734375, -10266.46875, 10578.84765625, - 13512.5390625, 15304.25, -3033.14453125, - 7683.78125, 177.953125, -924.0, - 16965.91796875, -4881.10546875, -1318.859375, - 931.05078125, 4251.31640625, 10322.359375, - 781.19140625, -5247.60546875, 6143.72265625, - -2592.2734375, -1713.15625, 7323.4296875, - -6279.9921875, 2944.27734375, 1386.5703125, - -8597.9609375, -6373.87890625, 661.5390625, - 7413.3203125, 4468.5859375, 
6513.59375, - -8371.8671875, -7510.4921875, -15648.359375, - 456.30078125, 2331.1953125, -4582.26171875, - 7506.10546875, -6886.890625, 11007.1328125, - -5255.3515625, -5128.609375, -5359.265625, - -11624.58203125, -2230.41015625, 904.82421875, - 24025.7890625, -17469.16796875, -7726.7421875, - -1222.67578125, 19629.4921875, -7538.71484375, - 7563.046875, -12759.90625, -14139.09375, - 851.13671875, -5430.39453125, 9255.796875, - -20180.6796875, 1040.58984375, 610.390625, - -9323.6484375, -1166.2421875, -13224.0, - -17655.28125, 4673.5390625, 5535.640625, - -2548.5703125, 192.703125, -5154.921875, - 795.7421875, 15269.0703125, -2954.1796875, - -17061.9140625, -923.546875, 12941.95703125, - -490.3828125, -4020.0234375, -1809.56640625, - 6478.0390625, -2851.34765625, -3484.52734375, - 2940.74609375, -2639.03515625, -3328.66796875, - -2285.97265625, 1737.0625, -4372.8671875, - 4898.28125, 5138.0390625, 16389.65625, - -5625.1171875, 5790.93359375, -882.9375, - 1406.37890625, 9849.171875, -5871.21875, - -6659.07421875, -9883.9296875, -79.9140625, - -2108.01171875, 3937.6171875, -3555.6875, - 5628.4296875, -6985.3203125, 3587.69921875, - -1167.09375, 4376.18359375, 2327.765625, - 3756.328125, 12814.75390625, 4635.03515625, - 9960.86328125, 4561.43359375, 80.1875, - 2296.6015625, -5842.0625, -781.9765625, - -23758.03125, -4960.84765625, -917.8046875, - 1419.6953125, -7063.1953125, 2949.5703125, - -19466.390625, 6528.67578125, -14229.8828125, - -14332.4609375, 3717.921875, -5905.953125, - 6684.3203125, 7134.5625, -2057.640625, - -2106.328125, -5483.2890625, -1976.61328125, - 1793.859375, 25947.47265625, -5807.6328125, - 4383.1015625, 11211.2109375, -14574.8125, - 2292.92578125, -9649.19921875, -1178.7265625, - -35.8359375, -5704.28125, 1544.59375, - 4883.84765625, -7600.5703125, 14860.7109375, - 9838.2421875, 3024.390625, -10097.046875, - -12188.82421875, -4929.61328125, -10733.203125, - -3774.734375, 11774.15234375, 4230.125, - -8791.046875, -22042.21875, -1615.3203125, - 
12348.7421875, -9871.0, 783.734375, - 6523.57421875, -10964.75, -10817.1875, - -12394.62890625, -8858.62890625, -20453.7265625, - -19.5, -8115.29296875, -410.5859375, - -8455.65625, 7411.578125, 996.078125, - 14771.453125, -22297.20703125, -1877.0390625, - -22237.66015625, 17233.921875, -787.203125, - -66.328125, 19859.625, 2534.3125, - -11455.71875, 8398.46875, -18493.1328125, - -1838.0, -9672.921875, 1504.3828125, - 619.01953125, 4018.796875, -9357.828125, - -3132.11328125, 15923.31640625, 625.55078125, - 1318.43359375, 9282.2421875, 14745.42578125, - 5345.3515625, -2352.546875, -4039.7109375, - 6860.296875, 2567.71875, 9583.3984375, - -982.52734375, -4430.6796875, 4570.0, - -11918.82421875, -14208.390625, -6607.8046875, - -433.03125, 17387.8671875, -4465.578125, - -1570.3203125, 2589.3203125, -3203.08203125, - -22839.859375, -8074.3203125, -751.0, - 21741.9375, -4292.765625, 12787.58203125, - 6135.5625, -6004.765625, -8089.4375, - 4577.171875, 8378.53125, -1838.20703125, - -344.09765625, -6791.234375, 1423.1875, - 7178.046875, 4180.5625, 2959.640625, - -3624.7109375, -2571.59765625, -1412.44140625, - -5232.0625, -8063.4375, 6510.2109375, - 19159.171875, -4157.69921875, 3659.06640625, - 7796.4375, 2161.015625, 1815.625, - 14200.40625, -7991.76171875, 2306.3046875, - 7795.65625, -2605.5234375, 18305.796875, - -8035.109375, 3063.640625, -12457.25, - -24605.5078125, -1428.1640625, 16767.453125, - 2173.23828125, -3741.3203125, -4729.51171875, - 18634.2421875, 411.640625, 4619.4140625, - 19183.04296875, 2287.2734375, -12523.6640625, - -363.70703125, -11674.39453125, -16345.52734375, - 854.921875, -612.0390625, -3903.375, - -1095.1796875, 17014.17578125, 5818.02734375, - 15815.03125, -1165.328125, 6469.375, - 681.87890625, 8013.0078125, 2080.58203125, - 12547.09375, -12245.77734375, -13651.19140625, - 1585.58203125, 6252.8515625, -13274.34375, - -7684.421875, -5698.20703125, -4304.7734375, - -3031.51953125, 2970.21875, -4397.16015625, - 3395.8046875, 10316.26953125, 
25086.1328125, - -2212.109375, 3177.69921875, -15745.671875, - -6025.80859375, 2873.9609375, -4413.45703125, - -4168.65234375, 12218.046875, 7920.77734375, - -3476.0625, 551.95703125, -7073.76953125, - 18938.08984375, 8532.37890625, 15967.875, - -4031.984375, -5892.125, -2772.69140625, - 1944.3828125, -3101.60546875, 5991.80859375, - 23908.0, -14634.546875, 2641.99609375, - 5907.75, 11514.26953125, -4439.8828125, - -20559.99609375, 2008.55078125, -4955.1328125, - 10220.90234375, 10562.8828125, -1391.65234375, - -8300.671875, -5958.734375, -3663.90625, - 1266.6796875, -8205.93359375, -852.671875, - 13716.078125, -12950.0234375, -1987.1875, - -16788.859375, -1425.96875, 5624.79296875, - 6323.28125, 9259.578125, -13379.90625, - -1404.671875, -9886.921875, -8587.3359375, - -96.625, 4801.359375, 5017.7890625, - 13285.72265625, -5264.7421875, 3597.734375, - 10530.25390625, -7961.5859375, -5205.0234375, - 774.5234375, -4674.1875, 6795.65625, - -823.0078125, 25400.35546875, 10121.734375, - -21442.8828125, -1600.4609375, 4443.97265625, - 357.859375, 7960.375, 7853.3125, - -8432.2109375, 2330.2109375, -1609.6015625, - -2966.296875, 10045.9375, -2586.328125, - 8937.0859375, 4193.0625, -4098.25, - -2853.6171875, 119.625, -3887.6484375, - -212.75390625, 3136.3671875, -2553.75, - 14469.4765625, 498.97265625, 6934.4140625, - 1031.3203125, -1282.2578125, 2334.671875, - 14132.3671875, 106.96875, 2119.46875, - -4532.1796875, 1724.16015625, 3829.62890625, - -7151.015625, 7824.2265625, -5483.421875, - 1600.47265625, 2859.1484375, -2135.8046875, - -10308.125, -22996.3125, 5948.15625, - -4078.265625, 791.1640625, -9866.1484375, - -5379.48046875, 4436.171875, -16500.625, - -2846.8984375, -1829.390625, 85.4453125, - -5507.40625, 3905.8203125, -5233.265625, - -8742.828125, -1467.6171875, 7465.453125, - -18193.921875, -4006.5859375, 5506.62890625, - 1450.3359375, 2739.125, -7567.28125, - 3609.5859375, 4661.0625, 8207.359375, - 7404.80078125, 13523.7421875, -1678.5859375, - -1912.71875, 
-3240.03515625, 4502.140625, - 4058.5703125, -5611.640625, -753.5, - 1104.32421875, -10559.0546875, 7541.234375, - -3685.015625, 2079.75, 9345.0078125, - -13146.25, 325.37109375, -1226.375, - 3645.6171875, 12447.3203125, -9711.75390625, - -17857.96875, -7814.49609375, -2144.8828125, - 582.4921875, -2133.0, -724.26953125, - -4162.1875, 166.0234375, -4523.01171875, - 5284.765625, -7666.359375, -1450.0234375, - -5835.1796875, -18589.34375, -2451.796875, - 609.9609375, -13478.3828125, -4704.7109375, - 12646.46875, -4494.296875, 4713.375, - 4451.5703125, -6877.203125, -5124.6640625, - 1620.66015625, 6225.0234375, 8378.39453125, - -11641.375, -1470.3046875, 5095.2265625, - -832.4453125, -1236.5078125, -3880.8828125, - 7386.75, -7026.7890625, -1235.1171875, - 3692.2265625, 9267.1796875, -4843.8984375, - -11678.4921875, -5067.49609375, 13236.85546875, - 780.9453125, -12199.09765625, -2096.75, - -3268.046875, 208.1484375, -6440.28125, - 15240.12109375, 1825.6484375, 1103.5546875, - -3638.1953125, 8385.30078125, 10980.453125, - 1157.3359375, -18609.1875, -2295.0390625, - 4745.484375, 10457.12109375, -2725.96875, - -3392.56640625, -7469.3359375, -3112.89453125, - -9254.9609375, 5962.625, 7942.046875, - -17581.49609375, 2549.17578125, 19492.75, - 7737.296875, -2991.5, -632.3125, - 1591.8359375, 3665.92578125, 3831.16796875, - 5200.6875, 706.078125, 12189.64453125, - -3494.046875, -10952.4453125, 8687.203125, - 8274.91796875, 12702.3671875, -8694.796875, - -3047.09375, 11020.3046875, 618.75, - -14191.078125, 19223.203125, 24148.4296875, - 11111.87890625, -8517.328125, 20585.67578125, - -7257.8984375, -2145.07421875, -6289.35546875, - 7831.30859375, -14481.9765625, 5908.09765625, - -21264.46875, 28869.78515625, 5070.734375, - -2924.12109375, 11140.984375, -3382.58984375, - -1784.4921875, 10962.3671875, -4938.0703125, - -4502.078125, -9341.32421875, -1878.0703125, - 4182.6328125, -3252.8203125, 46.84765625, - 6267.06640625, 13707.734375, -1434.1015625, - -914.7421875, 
-13439.9140625, -7494.375, - -9065.66015625, 7724.98828125, -18612.6953125, - 1404.09765625, -3688.640625, 6511.796875, - -7041.58984375, -2715.06640625, -5959.875, - 320.625, -1171.9140625, 2553.609375, - 10980.5859375, 14972.2734375, -12929.90234375, - 8161.59375, -4979.3203125, 20318.2734375, - -181.140625, 19230.5078125, 1330.63671875, - -13672.96875, 5534.265625, -4194.40234375, - 2353.4296875, -2662.71875, -9280.7265625, - 16.5234375, 2556.76171875, -17746.77734375, - -1805.65625, 3595.765625, -6451.40625, - -10060.7421875, -14422.890625, 360.046875, - 9578.21875, 3042.7734375, -1352.4296875, - 3928.3203125, -4176.703125, -8381.58984375, - 1978.35546875, -12951.109375, 1157.9296875, - -6846.578125, -6815.2265625, -2185.859375, - 8365.44921875, -186.234375, -7362.58203125, - -2447.0703125, -9081.421875, -1281.66015625, - 6957.0390625, 5052.0390625, -139.7890625, - -7981.03125, 4854.4140625, 6877.5078125, - 6988.3515625, -1293.1328125, 1849.16015625, - -5019.984375, 6862.015625, 3378.1796875, - 6590.78515625, -5929.45703125, -7730.96875, - 61.5390625, -17264.95703125, 1813.875, - -2338.953125, -3379.21875, 7198.375, - 4008.609375, 8348.17578125, -2028.76171875, - 8871.07421875, -13559.25, 4830.9765625, - -7202.453125, 6916.0859375, -6814.4140625, - 5412.09375, 9991.93359375, 681.7265625, - 6445.96875, 2786.8125, -7697.8671875, - 2961.71875, -20453.1328125, 18094.64453125, - -9751.359375, 9540.671875, -10116.515625, - 4085.5546875, 6610.953125, -3890.1328125, - -4251.984375, 1239.8125, -3096.6875, - 4446.14453125, 2540.6953125, -15389.4140625, - -2441.296875, 2233.09375, 1381.9453125, - 4709.75, -35.8359375, 5009.5859375, - 11989.3125, -7058.9765625, -19866.71875, - 3925.35546875, -3923.65234375, -12007.453125, - -1790.30078125, -119.34375, -2498.1484375, - 4976.953125, 1536.54296875, 523.4453125, - 2270.484375, 2903.5078125, -1391.546875, - -2736.0, -2603.15234375, -1818.6640625, - 1153.2890625, 1493.41015625, -7053.71875, - 982.91796875, 4951.30078125, 
730.671875, - 6248.8203125, -4752.0, -2296.515625, - -7553.07421875, 908.390625, 528.5625, - 5113.16796875, 12960.36328125, -1137.078125, - -2848.21875, 1019.4921875, 2277.32421875, - 100.73046875, 2922.265625, -2002.5625, - -3735.2421875, -1146.28125, -4026.28125, - 14501.828125, 3886.734375, -817.9609375, - 4792.22265625, -1708.80078125, -9592.02734375, - -2664.1484375, -3981.8359375, 13156.234375, - -8057.6484375, -194.3359375, -1340.5625, - -5396.8203125, -2877.63671875, -1370.5859375, - 11986.703125, 3630.2265625, 7249.5703125, - 1499.515625, -21263.1875, 702.76953125, - -8584.875, 1375.3671875, -2126.94921875, - 3459.03125, -17081.6953125, -15100.0390625, - -955.5390625, 3706.203125, 741.3125, - -6915.515625, 1735.9609375, -8077.8515625, - -8502.984375, -2352.6640625, 7861.65625, - -1727.171875, 1353.484375, 1263.68359375, - -4817.25, 1301.37890625, 1812.375, - -1851.37109375, 8062.859375, -5213.7421875, - -190.296875, -3397.5234375, -1433.546875, - 1914.1171875, -4434.5703125, -2017.1015625, - 3836.2109375, 1035.7734375, 6918.36328125, - -3986.640625, -2862.234375, 2089.6015625, - -11164.203125, -6197.8359375, 506.53125, - -15995.26171875, 8558.0859375, 2474.59375, - -15682.890625, 1730.1328125, -3865.7734375, - 5181.80859375, 8368.01171875, 17898.75, - 13332.87109375, -6768.5546875, -1690.453125, - -585.5, -15355.45703125, -16924.421875, - 1289.421875, 6388.80078125, -1804.60546875, - -2517.8984375, 8876.5546875, -3426.34375, - -5744.21875, -690.57421875, 9203.76953125, - -5464.5859375, 7769.5234375, -7677.8828125, - 10912.9453125, 9463.515625, 17109.56640625, - 9865.796875, -7684.22265625, -294.2890625, - -765.6953125, 675.28125, -18306.8984375, - 9368.7421875, -4642.5390625, -2744.62109375, - 11584.42578125, -8325.2421875, 2832.96875, - 20512.18359375, 12882.73046875, -6841.5546875, - 6711.703125, -6077.328125, 3982.5234375, - 8798.0234375, 385.2734375, 3052.46875, - 7187.296875, 4321.62890625, -16574.6640625, - 2326.4140625, 6088.25, 3481.1171875, - 
-13479.23046875, -2425.94140625, 5634.5078125, - 18952.91796875, -4889.46875, 6953.2890625, - 429.06640625, -3339.2109375, 14892.0390625, - 1943.3671875, 761.3125, 21065.296875, - -1832.0546875, 1452.3359375, 5844.8359375, - 10335.16015625, 10913.203125, 2241.09375, - 318.1171875, 3144.1796875, 5184.48828125, - 1158.3125, 11291.76171875, -965.30078125, - 8763.984375, 8271.46875, -4825.5078125, - 9628.109375, 6857.61328125, -1498.96875, - 7149.25390625, 5009.078125, -2548.75390625, - 5305.7265625, -7427.7109375, 7484.671875, - 8407.0703125, 8364.73828125, 4652.43359375, - 3788.8046875, -3164.16015625, 186.6953125, - -5195.93359375, 7645.8515625, -7119.38671875, - -2871.34375, -2355.2109375, -12300.66015625, - 1690.10546875, 6029.50390625, 6496.5, - 2612.3984375, 1161.69140625, -1021.6953125, - -2867.1015625, 5446.703125, 971.20703125, - -14343.8828125, -1188.4609375, -765.328125, - 4262.80078125, 17060.30078125, 297.25390625, - 3334.609375, -5908.109375, 8512.1015625, - 520.6484375, 426.6953125, 8157.453125, - -1015.8046875, 9750.40625, 6547.6640625, - 2384.9375, -8453.60546875, -1941.2421875, - 8549.0078125, -8663.9296875, 12302.828125, - 13532.65234375, -2354.9140625, -12746.7578125, - -8210.1484375, -3263.703125, -12017.53515625, - 4137.0078125, 6228.203125, 4948.1875, - -12599.0, -10006.6640625, -1184.2578125, - -5036.5, -468.453125, 1877.5078125, - 16568.09375, 10775.54296875, -1.515625, - -7714.69140625, -6259.859375, -9359.26171875, - 9646.4296875, 4225.66015625, 3527.796875, - 4760.2421875, -3349.45703125, 941.4765625, - 1033.015625, 6312.53515625, -1401.4296875, - 3405.3125, -1692.19140625, -19289.9765625, - -125.75, -5269.9375, 521.2734375, - -6100.90625, 2101.9296875, -986.06640625, - -2988.7578125, -812.06640625, -11684.625, - -2199.53125, 2298.51171875, -18492.27734375, - -1253.5625, -5909.921875, 2422.1796875, - -697.98828125, 949.4453125, -5645.94140625, - 895.41015625, 2017.53515625, -12108.140625, - 7572.921875, -5491.57421875, 4154.75, - 
-115.0703125, -3632.11328125, -7374.9296875, - 5347.875, 553.6328125, -1162.93359375, - 13172.703125, -2344.71875, -6254.1171875, - 23699.0703125, 7820.671875, -446.2109375, - 3651.328125, -22687.24609375, -8705.2109375, - -6799.45703125, -3266.01953125, -12921.515625, - -183.359375, 6422.19140625, 2368.46484375, - -3306.61328125, -14356.7109375, -4915.109375, - -8401.0625, 3109.46875, 10642.19921875, - -2589.109375, -23007.50390625, -4745.0703125, - -15467.0390625, -5208.1875, -2389.53125, - -5993.15625, -27749.65625, -8856.921875, - 6531.8828125, 11613.453125, -10351.17578125, - -3127.4296875, 9412.33203125, 3322.69140625, - -10094.23046875, -6266.6640625, 14487.0078125, - 8039.40625, -983.953125, 3980.0234375, - 774.07421875, -9625.4765625, 7509.8203125, - -1834.5078125, -1523.12890625, -6097.34765625, - 3822.09765625, -14914.76953125, -3904.609375, - 255.5859375, -6143.015625, 2168.2890625, - 9021.0078125, 968.5390625, -9414.96484375, - -1891.41015625, 4937.48046875, -8077.17578125, - 90.5234375, 4523.12890625, 5153.64453125, - -8338.5546875, 558.34375, 7239.9921875, - -852.6796875, -861.16796875, 5480.3515625, - 10585.1875, 16034.4765625, 1134.96484375, - 302.625, 3069.59375, 2538.9375, - -6279.71875, 7240.98828125, 1928.453125, - -5024.3203125, -11745.1953125, 15465.9296875, - 7817.078125, 4320.984375, 13310.6484375, - -48.21875, -7773.96875, -4487.890625, - 3602.6640625, 6208.515625, 2859.30859375, - -2205.0, -5218.5078125, -10761.203125, - -5233.609375, 1739.76171875, 240.921875, - 8129.8203125, 2178.484375, -1498.3828125, - -2258.6171875, -8776.0, 341.7421875, - -10217.8984375, 517.03125, -8721.27734375, - -851.23046875, -7406.91796875, 322.2890625, - 4248.89453125, 21741.9375, -6023.203125, - -4185.7109375, -6058.19140625, 5499.71875, - -1300.46875, 4320.9921875, -1941.1875, - 2694.0078125, 1109.3125, -453.9609375, - -5068.27734375, 3140.68359375, -6204.328125, - -210.5625, 2079.03125, 5861.0703125, - 1504.3046875, 5835.62890625, -5095.78125, - 
-5520.1953125, 2331.4609375, -1521.1796875, - -4270.2734375, 863.73046875, -6903.640625, - -6537.1484375, -8708.1953125, 2340.78125, - 4994.5625, 5893.4609375, 199.83984375, - -344.71875, 2800.0625, -2545.265625, - -1213.45703125, -2183.7890625, -500.86328125, - 10680.7890625, -9610.296875, 12656.90234375, - 22618.06640625, -3960.4453125, -1264.265625, - -4159.41015625, -1845.546875, -5033.9296875, - 3629.45703125, -3632.3203125, 2022.70703125, - 2424.515625, -9733.76953125, 1570.2734375, - -3685.4375, -1228.453125, -1823.1640625, - 3634.09375, 5266.0390625, -9951.96875, - -5292.5, 3481.9140625, 2072.76171875, - -5737.78125, 3570.7578125, -2070.03125, - -9210.71875, 6964.0390625, 5282.53125, - 14689.359375, 9697.390625, -9911.71875, - -5677.609375, -18071.21875, -3309.234375, - -22150.73828125, 2619.9765625, -696.84375, - 7673.75390625, 6515.84765625, -9001.9609375, - -6143.58203125, 3907.28515625, 3113.51953125, - -2994.82421875, -9179.0, -3712.16796875, - -1098.6015625, -9625.9765625, 7146.4453125, - 12977.01171875, 962.625, -1820.38671875, - 11585.4375, -7182.90625, -2171.5, - 2650.1015625, -3282.55078125, 2476.96875, - 7399.77734375, -944.03125, -1943.28125, - -8157.875, -4798.1484375, -4465.078125, - -6608.07421875, -10966.6953125, 9912.56640625, - -11816.1796875, 5990.84765625, 1623.4375, - -8555.90625, -2277.78125, 3648.90625, - 7289.30078125, 228.3515625, -2124.953125, - 11673.875, -24856.46484375, -2436.375, - -4465.28515625, -3629.0, -7264.79296875, - 4601.87890625, 502.6796875, 3702.9296875, - 1660.21875, -1631.9296875, 8837.56640625, - 5233.48046875, 7417.74609375, -447.46875, - -6360.51171875, -339.3671875, 4772.5859375, - -7801.671875, -7365.97265625, -5029.7265625, - 438.5546875, 7819.828125, 1960.609375, - -1561.55859375, -7599.140625, 95.515625, - -9236.09375, -8983.14453125, 7394.05859375, - -2482.5390625, -1878.046875, -4222.8984375, - -2085.375, -284.44921875, -6924.18359375, - 3735.11328125, 343.20703125, 15570.125, - 3342.4921875, -76.046875, 
-8660.71875, - -1030.75, -6112.984375, -5386.91015625, - -4039.359375, 4691.203125, -6576.953125, - 590.27734375, -2175.7578125, -11566.5859375, - 10136.32421875, -1063.75390625, -1352.359375, - -6763.67578125, -12907.26171875, 2229.828125, - -11013.2890625, -1035.55078125, 1018.91796875, - 12846.57421875, -5926.44921875, 2253.1640625, - -4635.17578125, 3467.765625, -4188.359375, - 4067.203125, -1288.1328125, -3875.88671875, - -950.59765625, -3198.140625, 11328.359375, - -3544.1484375, -1380.9765625, -4195.16796875, - -8407.96875, 116.5234375, -502.5390625, - 969.5859375, -704.5546875, -10760.9765625, - -2639.9609375, 5688.6796875, -1133.625, - -9976.5234375, 11711.61328125, 7040.9140625, - 8317.18359375, -5586.875, -10550.43359375, - -1316.5078125, -3777.90625, 1880.671875, - -15901.91015625, -17990.2421875, 3349.5625, - 4640.03515625, -928.1171875, -9915.44140625, - -10891.83984375, -5153.48828125, 3172.04296875, - -4440.078125, 4970.86328125, -3111.03125, - 14921.23046875, 5763.421875, 3155.9609375, - 8387.19921875, 12921.55859375, 6152.54296875, - -6726.19140625, -13117.94140625, 6562.29296875, - 5245.9140625, 3470.55078125, 1818.87890625, - 14334.1640625, 4552.1484375, 1673.609375, - -8740.328125, -5948.015625, 6240.5, - 2225.5, -10216.6953125, -4084.6875, - -840.8359375, -16353.34375, -7649.28125, - 12631.4375, 1587.4921875, -2406.109375, - -4063.33984375, 9568.23046875, -3013.3671875, - -395.6328125, -8160.22265625, 1441.59765625, - 2349.0625, -1870.87109375, 2116.765625, - 5471.3671875, -845.9140625, -7806.69921875, - -13315.26171875, 7654.89453125, -6849.8203125, - -1447.2421875, 11006.9765625, 3549.0078125, - -7940.64453125, -651.390625, 2449.23046875, - -11612.5078125, -4825.2578125, -1821.53125, - 959.03125, -545.25, -9400.484375, - 6108.66796875, 2663.03125, -706.34765625, - -2629.1875, -1022.484375, -12984.86328125, - 14072.359375, 23098.7109375, 3052.45703125, - -41.0859375, 1573.43359375, 6562.58984375, - -2627.125, -3426.9296875, -185.88671875, - 
612.8671875, 16329.796875, 1370.453125, - -5655.06640625, 9130.296875, 101.5625, - 7884.375, -3210.58984375, -1926.28515625, - -1710.8203125, -6745.59375, 7764.96484375, - 1580.03125, 2389.96875, -1381.0078125, - 7776.26953125, -4549.83203125, 7112.97265625, - -9298.71875, -20785.9765625, -8823.078125, - 9566.84375, -4722.140625, 7454.09375, - 850.64453125, 3812.0546875, -2700.7265625, - -4336.33984375, 8112.65625, 4531.6171875, - 6650.796875, -4216.984375, -8506.85546875, - -4317.37890625, -1145.66796875, -11296.87890625, - 5204.8203125, -1598.296875, 2750.9375, - 3032.93359375, 6778.8359375, -3821.34375, - -13949.95703125, 18385.8984375, -2378.4140625, - -220.1015625, 13299.015625, 13356.3984375, - -8416.17578125, -14038.96875, -2288.703125, - -3039.8984375, -1997.8671875, -7720.55859375, - -4063.6015625, -858.71484375, -11794.953125, - -1913.80859375, 4524.39453125, -23694.171875, - -3945.5703125, -2117.75, -8834.9453125, - 285.359375, 4930.2265625, 6495.5703125, - -8268.1875, 369.66015625, -429.48046875, - 1957.0859375, -1575.23828125, 10500.0078125, - 9248.8203125, -3071.93359375, -7982.49609375, - 7916.66796875, 5407.3828125, -3904.2421875, - -16488.4453125, 1982.6640625, -14667.61328125, - -7777.77734375, -2872.34375, -5869.4609375, - -14548.390625, -5516.98046875, 1014.4765625, - -1433.54296875, -1040.671875, -5614.8671875, - -5135.34375, -2409.0, -275.4609375, - -4551.515625, -4715.92578125, -2210.42578125, - 12742.51171875, 3009.234375, 11716.2734375, - -8840.125, 4398.01171875, 5164.125, - 7233.58984375, 6825.515625, 3777.953125, - -16350.0390625, -5877.0859375, 6890.828125, - -3945.31640625, 10090.703125, -7610.3515625, - -7862.09375, -6772.046875, -7574.16796875, - 11285.2109375, 22163.0859375, -6031.046875, - 3070.80859375, 1518.4765625, 7759.8359375, - 3908.65625, 266.515625, 5320.06640625, - -12331.171875, -8837.359375, 2575.3359375, - -5747.27734375, 4725.671875, -5721.73046875, - -4180.5859375, -12.625, 10775.859375, - -17559.984375, 292.13671875, 
5772.7421875, - 416.203125, 6908.7578125, -11582.796875, - 3431.34375, -1614.9453125, 2586.6015625, - -378.1796875, 6085.4765625, -2742.25, - 3609.5625, -4627.3203125, -3883.984375, - -11184.9765625, -6543.546875, -11115.953125, - 85.953125, -2148.59375, 12473.984375, - 8603.77734375, 4227.1875, 7105.421875, - -3521.66796875, 1922.21875, 10418.81640625, - 8846.9140625, 3344.62109375, 4598.546875, - 7407.3984375, -6894.26171875, -12680.6484375, - -1518.09375, 16812.5703125, 3933.6328125, - 3729.16796875, -7926.2734375, -1955.8125, - 28461.265625, -9026.7265625, 5891.8828125, - 16458.296875, -3486.03515625, -4401.69921875, - 16338.0859375, -13858.65625, 13291.9921875, - 13496.53125, -13820.0078125, 1842.1328125, - -6955.98828125, -3889.6171875, 14457.56640625, - -15553.796875, -5840.2890625, -3868.765625, - 10801.9453125, 10363.20703125, -4606.75, - -6975.46875, -10153.484375, 977.1640625, - 3802.61328125, 2088.0, -3537.6171875, - 108.7421875, -1084.5234375, -2591.234375, - -4303.9140625, 8551.8984375, 14597.66015625, - 2970.9375, 4147.703125, -5381.9921875, - -16397.47265625, -11472.1875, 280.0078125, - -3768.5078125, -7704.59375, -8956.03125, - 5519.7578125, 8167.0546875, 11037.453125, - -1570.7578125, 14250.3671875, 10272.546875, - -5425.3359375, 11643.24609375, -2426.265625, - 4892.2578125, 19824.65625, 1666.54296875, - -10547.578125, 464.9296875, 2108.1171875, - 7262.3671875, 10611.1953125, 1143.96875, - -6395.890625, -6897.984375, -11518.875, - -32710.4375, -16873.03125, -1474.1796875, - 5132.671875, 11077.7890625, -2112.88671875, - 3476.0625, -3752.3359375, 8120.44140625, - 15073.046875, 10518.1953125, 2601.7734375, - -6085.25, -13269.46875, -2706.51953125, - 7057.1328125, -24000.5, 18492.71484375, - 8039.1015625, 8084.265625, 19369.90234375, - 19389.703125, -16648.5625, -16017.28125, - 29001.1796875, -1644.8046875, 22622.71875, - -2700.4296875, -1681.21875, -4275.73828125, - 1988.6328125, 2224.328125, 4453.7109375, - 1646.6875, 2899.6328125, -8171.85546875, - 
4022.9453125, 15743.421875, 1008.40625, - -3384.1484375, 935.109375, 332.1015625, - -11478.4609375, 7524.2890625, -2619.296875, - -764.8203125, 17454.71875, 2443.671875, - 438.84375, -6559.484375, -4315.625, - -6879.984375, 4085.1328125, -652.7265625, - 2876.29296875, -1841.140625, -8010.09375, - 5486.828125, 11980.9375, 4271.015625, - 3968.28515625, -510.21875, 15238.515625, - -3403.79296875, 2742.7109375, -791.4140625, - -3753.2890625, -4703.6015625, -5472.0, - 548.78125, -3939.5546875, -6742.453125, - 5339.34375, 627.28125, 18527.0546875, - -845.4609375, -1262.1875, -3234.9375, - 1246.078125, 716.4921875, 4313.5546875, - -2635.5390625, -13462.1875, -1246.6875, - -4036.7421875, -948.265625, 2573.28125, - 4614.828125, -6658.640625, -18560.0703125, - 782.9296875, -4832.90625, 5693.859375, - -8850.66796875, 2135.09375, -6380.01171875, - 15771.7734375, 46.0234375, -2424.484375, - -3911.0234375, -4035.8125, -4634.6640625, - 698.07421875, -475.65625, -9572.96875, - 12654.6875, 8783.015625, 1825.23828125, - 5721.4140625, 9871.734375, 4448.078125, - 10065.875, 5972.703125, 2035.4765625, - 2248.7109375, 5161.15625, 17098.8984375, - -1838.90234375, 1260.26953125, -5686.7890625, - -7981.703125, -8197.43359375, -3073.5703125, - 5291.375, -164.1953125, -2377.23828125, - 7115.6484375, 6566.546875, -5146.47265625, - -7045.0703125, 687.38671875, -334.734375, - 292.265625, -1529.2265625, 523.96875, - 2541.1796875, -2447.08203125, -6492.78125, - -3517.08984375, 1121.97265625, 2085.55859375, - -6812.26953125, 5341.15625, -232.609375, - 2428.4609375, 953.80859375, 11662.859375, - -1529.984375, -3858.80078125, 1984.02734375, - -2653.875, 7585.03125, 1705.6953125, - -5347.2578125, 2012.4609375, -2197.8203125, - -791.99609375, -14345.40625, -2888.140625, - -968.27734375, -4529.9375, 2137.890625, - 8151.6796875, 1663.5, -7382.890625, - 509.3984375, 2692.24609375, -1343.515625, - 2579.34765625, 6477.6328125, -5643.5, - 7490.71875, 5339.90625, 398.59375, - 19148.84765625, 5385.15234375, 
-7888.328125, - -6231.3125, 6370.3046875, 3748.30078125, - 27366.59375, 3737.0078125, -141.09375, - 2983.1875, -7537.578125, 3329.0, - 321.375, 2421.546875, -3261.84765625, - -7996.20703125, -4521.7421875, 1027.109375, - 1135.11328125, 7555.60546875, -13127.4609375, - -3309.62109375, -9007.546875, 5910.4765625, - 3617.9921875, 5985.68359375, 2087.5078125, - -483.2734375, 1485.7734375, 5268.1328125, - -1628.015625, -513.6953125, 9899.2578125, - 5797.90625, 11219.015625, -16650.765625, - 15685.1484375, 8188.046875, 6794.80078125, - 5627.921875, -882.71484375, -15535.6328125, - 3973.21875, 5469.0703125, 9667.453125, - 2501.99609375, 16297.86328125, 8707.20703125, - -23282.19921875, -5332.09375, 6326.0390625, - -6692.07421875, 15817.2734375, -201.81640625, - 11852.4921875, 12707.3125, -19904.828125, - -4578.953125, 740.078125, 6349.5, - 789.4140625, 6439.14453125, 22869.10546875, - -3846.4765625, -243.22265625, 458.98828125, - -10963.66796875, -3741.57421875, -7264.75, - 11225.671875, -6445.765625, 7794.1640625, - -1251.33203125, -26905.40234375, 29771.234375, - 9311.359375, -10250.37890625, -5514.6328125, - -4214.49609375, 12143.02734375, -7657.19921875, - -13894.3046875, -8814.421875, -2507.203125, - 25584.734375, 1897.1640625, 11131.98046875, - 11588.453125, 3831.62890625, 3881.5390625, - -13001.59765625, 931.9609375, -1575.44140625, - -19699.2734375, -13438.390625, 5342.94921875, - -577.65625, -5000.17578125, -14509.10546875, - -11386.90234375, 5309.31640625, 4983.4375, - 3348.94921875, -2818.0, -5512.53125, - -1063.9375, -13457.5546875, 2918.21875, - 13655.6875, 12356.828125, -2098.19921875, - 7342.3125, 716.2890625, 5595.09375, - 2957.65234375, 7131.09375, 4267.1796875, - 3478.4375, 9214.7734375, 8786.3203125, - 4638.0078125, 5775.765625, 14941.76171875, - 3888.01953125, -125.953125, -7099.640625, - -3408.296875, -4564.515625, 6957.16796875, - 12938.9140625, -2564.90625, 7464.484375, - -3021.10546875, 9892.390625, 8173.71875, - -3933.796875, 1124.8828125, 
-6487.359375, - -1315.640625, 8530.4140625, -9438.81640625, - 276.984375, 1095.26953125, 500.6015625, - 2646.21875, 5253.25, 1290.38671875, - -1255.22265625, -3900.86328125, 1695.44921875, - -3656.17578125, -3889.53125, -3312.0390625, - -1928.90625, 5330.6171875, 4887.4140625, - 9631.03125, 3339.3203125, 7530.9921875, - -3464.234375, 8662.953125, 11145.8203125, - -19825.234375, -3049.76171875, 3610.4140625, - 4625.3984375, 1536.2421875, -2536.0234375, - -12651.2734375, 4811.9453125, -10887.1875, - -5208.54296875, -19567.28125, -7093.890625, - -12356.25, 7392.015625, 13929.5, - -4504.59375, 1981.515625, 11659.140625, - -2804.28515625, -3196.12890625, -4872.984375, - -12756.90625, -1044.82421875, -4286.1328125, - 2280.484375, 2228.2421875, 931.03125, - 12066.265625, 1431.734375, -6120.0546875, - -10291.25, -8928.8046875, 9849.4375, - 6866.33203125, 7747.90625, -4380.9609375, - -10637.3359375, 5592.5, -22491.078125, - 505.25, 749.9296875, -2243.421875, - -12557.5625, 8611.46875, 3868.05078125, - 2429.2734375, -8287.00390625, -10696.69140625, - 2818.125, 283.328125, -8884.296875, - -6902.28515625, -24780.4453125, -1425.4765625, - -9133.89453125, 629.81640625, 7240.03125, - 5930.48828125, -6708.60546875, -4615.890625, - -4900.15625, 2585.32421875, 12982.59375, - 4165.89453125, -13469.0078125, 11625.578125, - 9218.0, -907.66796875, 967.66015625, - 3833.9453125, -14879.796875, -1226.2109375, - -6183.30859375, -1600.328125, -8160.546875, - 501.78125, 7020.59375, 7278.171875, - -1101.28125, -7820.6875, 11504.8125, - -1239.18359375, 9915.4140625, 3238.375, - -17057.1171875, 1879.75, 12614.0625, - 1629.71875, 2250.5, -9607.27734375, - 11870.12890625, 353.52734375, 7706.046875, - 447.625, -17466.140625, -9475.421875, - -10561.16796875, -30265.6015625, -1005.3515625, - -5807.08984375, -9685.671875, 7652.25, - -1092.4140625, 5419.9140625, 2131.84375, - -15435.03125, 11681.15625, 1771.58203125, - 6564.2109375, -440.98828125, 4417.4375, - 6894.71484375, 2364.265625, 3704.6484375, 
- 144.62109375, 6733.171875, -7753.515625, - 10769.6875, 4734.07421875, -6474.90625, - -1894.2890625, 18940.875, 5411.3359375, - -1701.0546875, -4464.46875, -7141.890625, - 4141.0625, -1588.79296875, 4800.46875, - -16577.453125, 956.609375, -2264.203125, - -7652.46875, 207.171875, -400.8828125, - 4161.8359375, -2418.875, -3468.390625, - -13110.53125, 14203.77734375, -744.8515625, - 4231.9140625, 4002.12109375, -3077.46875, - 25227.140625, 3765.5703125, 8038.93359375, - 1776.015625, -4532.68359375, 1726.49609375, - 2465.0234375, -7980.08984375, 13761.78515625, - 1819.625, -9701.140625, -733.625, - -7675.671875, -1826.796875, 2496.1640625, - -1594.390625, -3969.40625, 10191.01171875, - -4819.7265625, 12282.296875, 9017.0859375, - -8573.3125, -9778.10546875, -10111.484375, - -2401.0625, -3112.953125, -13274.546875, - 5408.8125, 7896.12109375, 11667.4375, - 24403.4609375, -7912.09375, 314.37890625, - 10327.75, 8776.34765625, 4235.734375, - 6711.4921875, -14406.8828125, 6784.296875, - -6519.76171875, 14212.375, 6220.109375, - 27608.9296875, 7593.4453125, 3904.3125, - 9248.1640625, -13981.890625, 5058.90625, - -14543.671875, -5513.859375, 3505.1015625, - -7756.8984375, -19651.0625, 1024.3125, - 504.7890625, -11177.7734375, -22972.171875, - -4083.62890625, 7045.0859375, -2446.6953125, - 9862.9375, 23714.640625, 4872.96875, - -20384.234375, 10182.37109375, 5519.9375, - -11188.3125, 10013.1796875, 17114.30859375, - 12193.1953125, 1701.234375, -817.21484375, - -13768.98046875, -680.1171875, 5376.4609375, - -14370.09375, -173.3203125, 2564.921875, - -8008.109375, -441.2265625, 21.26953125, - -3007.21875, 7672.0859375, 5494.390625, - 9433.76171875, 5921.796875, -6428.625, - 7255.62109375, -8446.83984375, -16192.0390625, - -2211.3046875, 11083.51953125, 9545.765625, - 3708.828125, -13957.5546875, 4059.3984375, - 3650.3046875, 7100.3515625, 16499.578125, - 1669.2421875, 9555.1796875, 12884.6640625, - 2093.140625, -1835.94140625, -1374.41796875, - -5950.40234375, 4017.515625, 
3390.43359375, - -13121.52734375, 20519.6875, -2062.87890625, - -6688.19140625, 8684.640625, -10973.37109375, - -2977.2578125, 294.34375, -9164.4765625, - 8786.67578125, -2113.91796875, 6484.0, - 6456.37109375, -1429.375, -1264.921875, - -1183.93359375, 8894.49609375, 12571.546875, - 11594.8125, 5646.671875, 2339.1796875, - -1230.77734375, 10665.32421875, -27963.71875, - -2481.25, -569.328125, -17389.1640625, - 24555.8359375, 13097.75390625, -10676.625, - -1067.8359375, -3614.1953125, 19755.375, - -2376.4453125, 4768.875, -20084.48046875, - -16052.171875, -15753.921875, 8380.34375, - 3085.58203125, -6037.01953125, -1713.3125, - 6431.99609375, 17545.484375, -6780.859375, - -13343.5, -3884.55859375, 13699.859375, - 12034.65625, 3011.34765625, 2439.921875, - 4916.796875, -6521.671875, -5811.26953125, - 7270.28125, 4507.9921875, -6270.2734375, - 1262.9765625, -8522.0703125, 8030.046875, - 17519.6875, 1205.26953125, 18521.15625, - 6248.19921875, 9219.28125, 5665.3828125, - 7272.5625, -2173.0625, -582.8203125, - 6687.4609375, 4551.9296875, 7422.828125, - 4828.171875, -2750.3125, 7898.7109375, - -3719.203125, 2902.0546875, 8335.4296875, - 10485.875, -2975.6484375, -3187.375, - -1895.53125, -12895.828125, -14948.875, - 7400.359375, -19651.6953125, -569.80078125, - 814.09375, -4042.76953125, -9779.75, - 1722.109375, 20967.5625, -3856.78125, - -1039.8125, 3860.328125, -2591.921875, - 2097.0, -2125.9609375, 11276.7890625, - 2732.03125, -2738.09375, 2830.51171875, - -6867.8359375, -234.5703125, -2953.26171875, - -3184.3671875, 14738.5, -1080.734375, - -9657.9375, 6230.59375, 3911.54296875, - 3295.9609375, -9257.625, 2254.25, - 962.2578125, 11050.203125, -11396.296875, - 8148.921875, -1507.6015625, 7827.640625, - 3185.71875, -1881.625, -9133.58984375, - -5099.703125, 284.51953125, 6888.25, - -17092.2578125, 133.14453125, 4632.8046875, - -1717.15625, 7096.234375, -4785.9140625, - 514.203125, 1285.96875, 19110.375, - -7300.78125, 6244.265625, -2882.3671875, - 1856.99609375, 
-8376.5625, -1738.515625, - -875.90625, -3114.75, 5360.04296875, - -980.0859375, 852.609375, -3966.328125, - 235.8203125, -862.0546875, 4999.75, - -13071.625, -5480.2421875, 6966.640625, - -3902.8125, -4302.9765625, -653.08984375, - 7876.875, -6693.515625, -3733.234375, - -289.421875, -11971.6796875, -1360.875, - -675.984375, 5671.59375, -14610.8203125, - 8570.30078125, 3665.328125, 4275.75, - -2513.9296875, -1472.265625, 3156.72265625, - 534.390625, 2488.9453125, 2470.9140625, - 1295.12109375, -777.09765625, -1495.5546875, - -14338.453125, 4750.09375, -1081.078125, - 1251.953125, 6225.8828125, 1676.90625, - 7612.625, -262.3203125, 352.2734375, - 9813.40625, 1149.7421875, 7721.4375, - -5659.8125, -154.4296875, 3827.5546875, - -2559.2109375, -11222.93359375, -79.8203125, - 5903.1796875, 1338.8984375, -385.53125, - 10476.8828125, 1638.546875, 4324.50390625, - 17729.609375, 4747.640625, -3173.46875, - 13513.421875, 2492.375, -2551.2578125, - 498.1953125, -4478.2734375, -16443.8828125, - 4882.70703125, -6011.15625, -1403.09375, - 3639.84375, 18579.0078125, -4174.703125, - -9850.80859375, -5583.8828125, 594.5625, - -3124.2578125, 3336.8359375, 4643.3671875, - 1671.703125, 2694.15625, -2314.69921875, - 2100.859375, -1732.3203125, 2856.265625, - 2745.546875, 7748.875, -10709.59375, - -6553.37890625, 4115.96875, 2338.328125, - 12768.125, 16234.6875, 4525.515625, - -9922.31640625, -3237.796875, 18431.2890625, - -13625.453125, -4400.3359375, 58.4375, - -3917.171875, 11517.00390625, -1319.6640625, - -761.9375, 12217.7265625, -8962.0078125, - -18322.59375, -2329.05859375, -1180.4296875, - 895.2890625, -1766.9375, 2164.08984375, - 3918.17578125, 4184.86328125, -4776.48828125, - 7033.203125, 3313.7578125, 7890.40625, - -13953.0546875, -2433.1171875, -4075.83984375, - -4460.5234375, 838.0078125, -1421.3671875, - -3463.3671875, -2584.3125, 1757.41796875, - 24595.6015625, -6924.19140625, -271.8125, - 15903.9765625, -1453.8828125, 20838.84375, - -9244.703125, -2103.765625, 
7765.484375, - 5413.42578125, 7506.359375, 2880.28125, - 29279.30078125, 7979.65625, -6224.40234375, - 5157.85546875, 630.140625, -16211.8359375, - 22367.86328125, 7064.62890625, 1204.62109375, - -15228.2109375, -13565.18359375, 2028.84375, - -5838.41015625, -5411.640625, -16649.984375, - 8547.8125, 7393.8125, 5424.0, - 5512.734375, 11909.40234375, -82.25390625, - -7977.4453125, -682.234375, -5321.2421875, - 12276.09375, 7001.203125, 15958.7265625, - 1285.1171875, -7062.484375, 9912.7578125, - 8967.45703125, -5502.7734375, -1598.484375, - -5072.390625, 950.578125, -8200.66796875, - -8424.34375, -7802.0546875, 5430.5546875, - -7702.03125, 8835.23828125, 7852.140625, - 2808.125, -647.58203125, 8129.296875, - -4213.203125, 3132.59375, 740.546875, - 1098.32421875, -3845.265625, -6708.79296875, - 4068.3984375, -2741.78125, -5762.47265625, - -1725.3984375, 5974.4609375, -14961.0390625, - -1458.94140625, -8496.7578125, 7129.46875, - -1626.984375, 6124.140625, 4134.234375, - 2241.34375, -9097.765625, -9889.6328125, - 1546.12890625, 1638.8125, 6764.51171875, - -3319.1171875, 2561.7265625, 4935.7890625, - 2978.765625, -1150.890625, -14389.4375, - 150.71875, -8544.8515625, -2437.41796875, - 3513.13671875, -8318.9921875, 54.96875, - 2971.5625, 11480.28125, -233.109375, - 2987.078125, 4392.20703125, 2416.79296875, - 3324.03125, -975.484375, -5620.296875, - -389.55078125, 16311.58984375, -2282.97265625, - 3070.4609375, -8744.72265625, 4059.7109375, - -6936.25, 4303.296875, -757.1015625, - 13622.4375, 828.51171875, 8379.6015625, - -2383.0390625, 1603.0546875, 5994.40625, - 1890.5390625, 11634.609375, 1595.78125, - 2299.04296875, -5873.05078125, 3446.4296875, - 8097.484375, 10547.2890625, 6057.0, - -20792.11328125, 3600.39453125, -1096.12109375, - 7444.12109375, -2824.27734375, -2802.2109375, - 10698.6015625, -2673.015625, 1608.5625, - -11471.234375, -3147.2109375, -8355.859375, - 6088.41796875, -762.39453125, 14578.9765625, - -5991.8359375, -995.3828125, 14173.8828125, - 
-5773.375, 8006.4140625, -5061.6328125, - 2783.2578125, 656.3125, -2859.8828125, - 2315.07421875, -8777.4609375, -2673.6640625, - -12291.44921875, -16684.79296875, -991.6484375, - -7054.515625, -11614.05859375, 3319.88671875, - 151.890625, -5832.453125, -79.3125, - 20842.25, -781.0546875, 3972.15625, - -13818.140625, -1239.3046875, 1381.71484375, - 6287.25, -11609.984375, 8164.265625, - 5331.2265625, 2893.375, 4802.2265625, - -8094.0234375, 5477.0078125, 6833.4140625, - -933.2265625, 3232.6953125, -6951.3515625, - -16735.640625, 380.46875, -1088.6796875, - -15351.96875, 6958.60546875, -6058.69921875, - 3093.390625, -6520.9296875, -2702.8515625, - 13593.109375, 13628.6875, 9461.8671875, - 2682.1171875, -2320.25, -932.31640625, - -15025.40625, -2044.375, 4244.140625, - 7379.75390625, 641.87109375, -1650.66796875, - -1114.3515625, 7999.1796875, 9997.46875, - 548.90625, -11197.2734375, 1662.6171875, - 489.6015625, 16371.84375, 452.7734375, - -7083.34375, 1966.3359375, -284.578125, - -16579.421875, -6700.8203125, 10412.7265625, - -3228.01171875, 6608.7890625, -18354.6484375, - -13151.015625, -8885.8828125, -10469.375, - -3209.2109375, -11451.1328125, -13359.8125, - 10884.8046875, 5010.8671875, -2712.08984375, - -8102.4375, -2935.1640625, -960.90234375, - -3290.7265625, -859.0703125, 1282.35546875, - -4430.3203125, -5815.296875, -12855.765625, - 6333.2265625, -11572.9375, 6082.40625, - 629.703125, -7342.703125, 5653.55078125, - 3114.2734375, -12524.203125, 13826.95703125, - 9022.40234375, -576.70703125, 3419.6171875, - 165.59375, -2129.5390625, 663.5390625, - -1705.1015625, -2009.3671875, 9680.078125, - 6779.5, 14284.5546875, -8993.85546875, - -1750.5234375, -2326.2890625, 7837.1640625, - -2182.640625, -14547.03125, -1572.89453125, - 4515.53515625, 3687.17578125, -4047.015625, - 3275.4765625, 2492.234375, -5352.5859375, - 18940.0234375, -6248.40625, -6669.7265625, - 7952.03125, 5315.296875, 12697.6796875, - -2371.28515625, 7243.421875, 14036.54296875, - 20280.5859375, 
15155.67578125, 344.84375, - -11353.3671875, -16006.5, -5063.203125, - -892.84765625, 6665.74609375, 2098.36328125, - 1800.8984375, 9632.8515625, 7932.9765625, - -575.328125, 8849.96875, -10631.9140625, - 1844.75390625, -4625.1171875, -5171.26953125, - -4704.4921875, 8170.98828125, -10083.2734375, - -8095.23828125, 4714.546875, 488.91015625, - 1666.37109375, -3090.90625, -136.51171875, - 11007.96875, -1459.76171875, -17037.48046875, - 6837.6328125, -2228.8125, 937.05078125, - 4339.2890625, 10913.03515625, 2359.75, - 16119.359375, -282.3359375, 8906.34375, - 10394.359375, 6756.1640625, 4426.3203125, - -1141.0, 6795.30859375, 4974.21875, - 609.4921875, -9457.91015625, 11616.234375, - 17112.96875, -766.4296875, 12367.30859375, - -8268.0234375, -473.53125, -150.5546875, - -71.5078125, 3449.27734375, -3411.359375, - -15894.49609375, -9523.0390625, 4127.32421875, - 8066.58203125, -4903.9375, -10007.765625, - 4829.234375, 6373.2109375, -4196.08984375, - 1996.2890625, 5675.1640625, -16962.8515625, - -4291.1171875, 13516.4375, 4272.7734375, - 10452.2734375, 221.4765625, -8376.94921875, - 3884.04296875, -13680.49609375, -995.9453125, - -988.515625, 4983.1640625, -8876.140625, - -5271.40625, 8450.859375, 8768.3828125, - -842.94921875, -393.25, -6584.8515625, - 5540.890625, -12413.8984375, -6095.22265625, - -2593.3984375, -2352.09375, -1204.1328125, - -3392.4921875, -8748.140625, 938.01171875, - 5449.1875, 2456.3515625, -2170.9765625, - -3554.8828125, -12068.328125, -1591.453125, - 9366.2578125, 4583.2890625, 6830.65625, - -5076.9375, 12753.3203125, -4840.3984375, - 4507.0078125, 10894.33203125, 5906.58984375, - -1326.16015625, -694.453125, -3275.14453125, - 795.71875, 4231.4375, -5017.18359375, - -1668.515625, 11750.8203125, 7094.09375, - -16883.97265625, -3552.4609375, 10126.734375, - -3304.18359375, 2887.9765625, 2260.609375, - -7710.6171875, 19725.13671875, 3833.78125, - 3956.875, 6038.6484375, 6618.3828125, - 2991.0078125, 5415.1484375, 11527.0546875, - 14446.34375, 
105.76953125, -4970.16796875, - 9447.2578125, 4643.1953125, -1209.8359375, - 11810.1875, -499.3359375, 2133.76171875, - -6175.21875, 5424.0078125, -5595.375, - 2610.83203125, -4111.75390625, 2787.265625, - 600.18359375, 4306.8125, -544.3515625, - -1554.75390625, -424.5546875, 5490.6875, - 2123.578125, 6472.546875, 1113.0703125, - 2698.65625, -664.609375, -1328.59375, - -3784.7890625, -1841.0234375, 277.015625, - 1635.3828125, 13664.29296875, 3390.8359375, - -2956.4453125, 1854.234375, 6005.421875, - -3112.7109375, -2076.1484375, 1046.9140625, - -1657.68359375, 1311.4453125, -7501.71875, - 13019.12109375, 1382.6953125, 12519.5859375, - 4681.4921875, -1320.609375, 9237.27734375, - -13302.4609375, 9246.96484375, -389.5390625, - 19451.375, 1672.8203125, 1004.7265625, - 23019.87109375, 1393.890625, -635.6484375, - 24975.234375, -3164.7890625, 11959.3671875, - 4003.8359375, -11149.70703125, 22408.48046875, - 1368.0625, 814.4375, 14261.4609375, - -11364.2265625, -5010.9765625, 4.75, - -8875.35546875, -4904.296875, 2019.8203125, - -996.90625, -1555.47265625, 3566.03125, - 9232.53125, -9988.1484375, 11116.04296875, - -1790.3203125, -4879.9375, -6237.984375, - 594.9609375, -1225.875, 10459.1484375, - 4758.34375, -5040.359375, 5781.0, - -20138.6328125, 1537.0078125, 5198.546875, - 1862.9765625, 11493.6015625, 8973.9921875, - -9875.0546875, -4703.9375, -3587.64453125, - -1671.3671875, 2100.5546875, 1733.0078125, - -2520.45703125, -7540.859375, 3640.8203125, - -3340.4765625, -5916.421875, -2089.46875, - 9250.9453125, -1944.5, -672.828125, - -7404.2890625, -3259.53125, -953.609375, - 3904.0390625, -1399.34375, -4912.125, - -7254.609375, 4433.2265625, 3744.46875, - 1356.7421875, 22952.26171875, -9881.234375, - 2680.0703125, -3900.8671875, -7619.921875, - 5824.6953125, 11567.8671875, -15396.75, - -12459.62109375, -5986.4140625, 8848.953125, - 3803.23828125, -1053.4921875, 10295.15625, - -197.91015625, 3634.27734375, -6461.18359375, - 5666.1875, -13615.81640625, 3195.046875, - 
-13717.6171875, 3723.0859375, 2102.40234375, - -7264.140625, -3175.8515625, -2076.1015625, - -7174.03125, 10119.48046875, -240.578125, - -1976.6171875, 1509.55078125, -9427.3671875, - -5357.98046875, -943.77734375, -7859.5625, - -5814.515625, -96.75, -887.296875, - -1537.703125, 847.3984375, 7351.59375, - -9543.5625, -13519.3046875, 7203.2890625, - 278.1796875, -1370.35546875, 5458.328125, - 1716.4296875, -1165.546875, 2868.2890625, - 553.64453125, -3991.765625, 1061.16015625, - -1636.265625, 5827.8125, 6437.46484375, - 11378.078125, 5584.65625, 3267.453125, - 17546.2265625, 8222.42578125, 13030.984375, - 4406.59765625, 5423.1171875, -2526.0859375, - 16651.84375, 7852.59375, -12349.5859375, - -3112.7421875, 13482.9609375, -53.19140625, - -26403.671875, -2723.59375, 404.6875, - 4252.4296875, -1958.76171875, 8701.7265625, - 15984.05078125, -23018.796875, -1537.38671875, - -1645.0078125, 3921.40625, 7581.9765625, - -3925.0078125, 7903.79296875, 12537.34375, - -11920.4921875, 6919.65625, 4673.6953125, - -12304.8125, -15338.625, 668.3671875, - 3293.0078125, -3343.9453125, 12289.69140625, - -1119.046875, -3335.671875, -6915.98046875, - 21172.390625, 594.9765625, 2055.171875, - -635.453125, 2089.515625, -9742.9609375, - 5485.66796875, -6924.4921875, -11584.7578125, - 16144.0625, 17627.953125, -19469.03515625, - -17396.2109375, 1023.42578125, -545.859375, - -3647.515625, -3123.546875, 7817.2265625, - 4794.07421875, -12413.28125, -1835.3515625, - 2426.578125, -5942.375, -9636.05859375, - 186.59765625, 300.1484375, -3882.25, - -545.6796875, 4196.46875, 5628.265625, - 10212.3828125, -4758.4453125, 4626.0703125, - -2015.15234375, -404.75, -2299.4140625, - 7549.796875, -2400.9765625, 8514.671875, - 1694.34375, -2671.0390625, 9203.1484375, - 13855.84375, -9206.484375, 12808.5859375, - 11131.1484375, -23793.21875, 5785.3125, - -1384.578125, 2014.0, 13691.6953125, - 9469.9140625, -2189.6015625, -2556.18359375, - 4671.5078125, -59.66015625, -6176.64453125, - -5510.578125, 
3482.07421875, 3525.0390625, - 347.00390625, -1809.421875, 118.484375, - 6539.43359375, -129.71875, -8427.59375, - -1823.296875, 249.1328125, 4173.2734375, - -3058.3359375, 2530.484375, -4019.0625, - -7459.671875, 5810.4765625, -724.9375, - -5690.046875, 9965.68359375, -1428.984375, - -9274.4765625, -3265.8515625, 9360.10546875, - 1074.1015625, -12638.5546875, 11669.265625, - -10909.10546875, 13658.765625, 3489.8125, - 2948.3046875, -3693.1484375, -4230.1328125, - -5477.1953125, 6838.3828125, 321.08984375, - 57.90625, -1314.13671875, -3767.546875, - -3494.7265625, -1059.2421875, -85.30078125, - -2433.28125, 4821.796875, -4755.140625, - -6560.4765625, 4720.3046875, -1391.953125, - 5177.4453125, -3985.3203125, -1767.59765625, - 8501.171875, 13232.9453125, 165.8046875, - 4017.125, 7551.34375, -4027.6015625, - -7245.7734375, 21011.7265625, -4847.03125, - 11301.78125, -7472.80859375, 2612.96484375, - 4441.24609375, -4372.42578125, 1017.09765625, - 8609.1875, 19020.4453125, 11483.1796875, - -4087.76953125, -11633.01953125, -1329.984375, - 8745.5, 4753.046875, -1634.7421875, - 7382.97265625, -3149.375, -12555.734375, - -813.28125, -1366.0, -4602.04296875, - 13105.2109375, -3672.1328125, 2602.3515625, - -10363.953125, 2952.08984375, -6622.8984375, - -27970.38671875, -4711.08203125, 7575.10546875, - 3639.85546875, 5407.53125, 14725.9609375, - 3020.359375, 2728.390625, 7804.0625, - -11422.9296875, 2591.953125, 3398.7265625, - 2582.05859375, 18131.94140625, -12362.140625, - 896.1171875, -11802.09375, -6599.05078125, - 17194.86328125, -9837.55859375, -1632.97265625, - 15363.109375, 4149.5, 15612.09375, - -13620.50390625, 1410.0390625, 3769.359375, - -9093.25, 3373.921875, 12653.10546875, - 773.3359375, -12635.5625, -8667.0078125, - 7956.39453125, -4298.0390625, -3104.6640625, - 10496.421875, 6905.51171875, -11827.9375, - -5689.7890625, -3637.79296875, -5067.859375, - 963.7734375, -9421.47265625, 703.8125, - -12667.21875, -163.7421875, -18769.23046875, - 4787.28515625, 
-7777.59765625, 13248.375, - -6445.9375, -1964.40625, 3864.0625, - -5636.265625, -1294.6875, -559.359375, - 1989.3046875, 7353.7421875, -591.41015625, - -2702.046875, -3212.98046875, 5060.41796875, - -2750.921875, 3727.671875, 2430.921875, - -1117.953125, -15848.55078125, -705.39453125, - 5445.890625, -117.13671875, 1448.828125, - 2594.7734375, -37236.5078125, 998.2421875, - -20178.96875, 1129.4453125, -11310.7265625, - -13623.8125, 13038.3671875, 1652.6640625, - -10020.33984375, -9512.9296875, -799.44140625, - 7278.04296875, 6294.8046875, -1965.49609375, - -780.015625, -1053.109375, 6074.140625, - 1364.5, 13749.5, 4193.203125, - -7430.953125, 10105.0, -7433.140625, - -3968.75, -18321.796875, 5268.859375, - 7710.80078125, 883.46875, 7134.6171875, - -2393.48046875, 9415.9375, -10298.7578125, - -2043.3671875, 288.83203125, -10994.80859375, - 1809.05078125, 13861.65234375, 4975.125, - 733.2421875, -5322.796875, -3669.9765625, - 3372.390625, 29915.8515625, -6281.87890625, - -14195.234375, -914.7109375, 5780.296875, - 14674.77734375, 6092.109375, -1578.98828125, - -14497.14453125, -1944.3203125, -6590.1875, - -7492.28125, 19479.90625, 8378.4375, - -862.41015625, 6383.8984375, -6955.8515625, - -13323.203125, -1522.109375, 7857.56640625, - -10239.40625, 8428.92578125, -2914.0390625, - 13414.203125, -6338.2890625, 8217.5234375, - 3361.5078125, -4977.22265625, -9653.609375, - -7017.296875, 1311.203125, -7368.1796875, - 4627.546875, -5501.234375, 10513.84765625, - 2174.80078125, -344.8671875, 4835.45703125, - 1395.484375, 5259.46875, 2980.8671875, - 13225.60546875, -3434.08984375, 15623.8515625, - -3588.05859375, -8844.03515625, -1130.3203125, - 6869.15625, -1615.20703125, 10458.6796875, - 3595.8359375, 1857.93359375, -1867.265625, - 900.0625, -7620.0, -4087.9296875, - 16672.296875, -5351.1171875, -8729.7265625, - -1368.3984375, -23472.4375, -14181.39453125, - 4804.7265625, -4828.40625, 3472.890625, - 5881.41796875, 9056.7890625, 1134.6171875, - -5504.6171875, -6101.4140625, 
792.15625, - -175.77734375, 2701.76953125, -6273.1015625, - -11083.97265625, -8769.9296875, 3002.8828125, - 3069.140625, 20557.30078125, -15284.96875, - -1922.9921875, 7107.80859375, -8653.578125, - 5351.234375, 2295.609375, -7089.6484375, - -2351.1171875, -753.46875, -5275.47265625, - 6186.48828125, -3666.2421875, -7281.75, - 3688.41796875, -5040.984375, 56.28125, - -16644.26953125, 4361.8671875, 795.75, - -6224.8828125, -7730.328125, -1266.3515625, - 290.9921875, 11581.90625, -4790.37890625, - -1630.8203125, 3265.6015625, -12572.39453125, - -5494.62890625, 6875.5078125, 2629.859375, - 12745.3046875, 7246.125, -2454.9140625, - -6938.4375, -6402.2421875, 13709.234375, - -1558.5078125, -4869.6796875, -6969.7734375, - 3311.0078125, 10183.9453125, -443.5625, - -20155.8359375, -11022.875, 1052.8046875, - 10692.265625, 26694.8359375, 13114.1328125, - 12053.73828125, 15734.33984375, -21646.8515625, - -1006.55078125, 5931.125, 1398.265625, - 3631.421875, 6026.0625, -14572.91796875, - 5372.83203125, -16930.42578125, 7545.7421875, - 1732.9296875, 13014.4375, 24789.00390625, - 2638.7734375, -257.58984375, 6189.87109375, - 11856.29296875, 1373.234375, 2231.78125, - -580.03515625, -31441.3359375, 469.69140625, - -3237.23046875, -8528.6875, 3610.484375, - -2421.453125, -5833.1328125, 16358.1640625, - 2496.56640625, -4472.55859375, 12248.55859375, - -11697.5078125, 28843.296875, -2631.96875, - -12104.40625, 12733.265625, -10754.6328125, - 2199.078125, 9333.3125, -2064.421875, - -1742.11328125, 10275.6171875, -7570.8046875, - 14353.08203125, -15915.30078125, -1322.2421875, - 8566.6953125, -381.78125, -27188.640625, - -6066.671875, 3782.7890625, -8275.02734375, - 21692.16796875, 3095.78125, 315.6953125, - -895.203125, 28723.93359375, 4441.26171875, - 10762.0546875, -6240.4140625, 11276.3828125, - 22545.8515625, 7421.76171875, 8070.2578125, - -4739.66796875, -10378.7421875, -1653.32421875, - -9721.3671875, 2527.25, -1925.6484375, - 3242.7734375, 1783.84375, 10428.34375, - 
6891.41015625, 15.5859375, 3681.6796875, - 15623.5, -4042.2578125, 439.546875, - -4168.8359375, -1412.0, -917.3125, - 8772.0859375, 8521.1953125, 3038.6796875, - 138.96875, 486.078125, -3938.625, - -7729.94140625, -15171.36328125, -6799.8515625, - -9758.1640625, -11001.2265625, -3995.78125, - -353.6640625, -3988.3828125, 1415.2578125, - 2303.578125, -4134.6875, -2032.90625, - -1166.10546875, 2532.4375, -621.9375, - -4538.0390625, 7959.828125, -9613.625, - -2690.765625, -756.4296875, -11114.9609375, - -2068.5703125, 1204.3828125, 8087.71875, - 2840.625, -5494.671875, -11629.21875, - 5267.75, 16955.890625, -10661.76953125, - 7241.8671875, 9222.4296875, 3488.73046875, - -2405.71875, 4917.7890625, 3815.15625, - 12742.09375, -1140.37890625, 4615.4765625, - 16253.78125, -181.734375, -14302.765625, - 9188.4453125, 13764.4140625, 5666.8359375, - -2192.8515625, -25758.671875, 19200.58984375, - 3967.890625, -4610.484375, -6866.59375, - -11664.9375, 6376.765625, 15622.375, - -3873.375, 17474.796875, -16159.203125, - -156.1953125, -5294.90625, 6328.84375, - -5041.74609375, 13033.8828125, -3940.6328125, - 32573.8984375, -8822.6953125, 4121.4921875, - 1923.0625, -11726.59765625, -1101.5390625, - -11214.2421875, -11658.7578125, 12624.4296875, - 3820.7421875, -3376.5078125, 3114.42578125, - 13540.890625, -1458.828125, -8341.8671875, - 4832.8046875, 2144.1328125, -10595.71875, - 3357.5703125, -12589.875, -723.703125, - 19271.2578125, -1130.0234375, -6856.453125, - -4909.125, -12612.5625, 4925.09765625, - 5943.7109375, 10239.6015625, 4826.3203125, - -6866.91015625, -2798.48828125, -2108.234375, - 176.0390625, -37.07421875, 2436.6015625, - -3339.890625, 491.453125, -5757.9375, - 3033.640625, 4802.84765625, -89.14453125, - -550.64453125, 3181.921875, 10496.50390625, - 4487.6015625, 6210.875, 10133.109375, - 857.48828125, 6134.7265625, 443.4296875, - -16234.53125, -1467.34375, -409.06640625, - 8004.921875, 3277.4921875, 3907.4296875, - -16368.671875, -12452.859375, 5704.453125, - 
-9024.796875, -226.9140625, 6053.1328125, - 3499.671875, -4378.4921875, 18613.171875, - -10616.84375, 2294.9296875, 4799.828125, - -3422.625, 1280.9296875, -5777.2578125, - -3836.0625, 7998.75390625, -247.77734375, - 14719.27734375, -1943.30859375, -4380.7265625, - -8382.68359375, 3321.28125, -6260.0546875, - 14700.12890625, 4438.6015625, 666.6328125, - -6225.83203125, -4638.3125, -4006.8984375, - -8421.62890625, 375.15625, -16195.78125, - 8980.9140625, 14509.765625, 12727.71875, - 9479.0390625, 1337.421875, -18508.19140625, - 8584.15625, 6845.42578125, 9637.9296875, - -102.109375, 14650.60546875, 750.7734375, - -17884.640625, -7254.2265625, -581.6328125, - 16266.7265625, 21324.1796875, 15060.2578125, - -1357.640625, -6382.5859375, 10726.1171875, - -3273.046875, -9168.828125, 2208.1953125, - 1425.80859375, -7421.296875, -9705.609375, - -15272.671875, 5517.6875, -25816.703125, - -12326.015625, 3163.828125, 1401.3203125, - 4004.5390625, 3954.3671875, 3725.90625, - -4757.09375, -7127.10546875, -444.875, - -804.09375, -13460.5390625, -9146.828125, - 10989.171875, 474.0859375, -16668.140625, - 19883.8515625, -27690.6171875, -1553.6328125, - 16344.421875, 6119.5234375, 2299.3515625, - 12944.1953125, -1595.140625, 10325.76953125, - 5619.5, 30182.03125, -4793.5078125, - -209.6640625, 9220.8671875, 14029.5, - 10455.0078125, 23949.07421875, -7767.890625, - -29263.1640625, -18788.796875, -3105.9453125, - 3268.578125, -8766.3359375, -32841.29296875, - -17405.55078125, -18640.4765625, -9459.34375, - -12586.671875, -3319.86328125, 537.1796875, - -8883.1171875, 653.36328125, 13392.59375, - 4634.59375, -9316.8515625, 5468.2265625, - 8952.875, 24231.21484375, 8238.5, - -4299.2890625, 12514.9375, 6283.171875, - -3442.78125, -15586.52734375, -6692.3359375, - -3466.6875, -8778.546875, 5155.1953125, - -10870.4296875, -9065.859375, -10359.60546875, - -3431.1875, -7679.3984375, -1026.84375, - -7406.21875, -9559.9140625, 10576.0625, - -3709.875, 22871.96875, 14247.7265625, - 
-1858.9140625, 8325.2890625, -10348.0625, - -3228.453125, -9278.6484375, 9600.5078125, - 15534.59375, 9297.984375, -2929.5859375, - -1052.046875, -126.88671875, -498.1484375, - -2434.8828125, -1918.109375, -4703.87890625, - -7352.59375, -12536.0390625, 9950.953125, - 9843.89453125, -8525.25, 8518.4453125, - -3388.6796875, -14957.859375, -715.390625, - 6892.8046875, -7721.015625, -1640.2421875, - -6484.9921875, 310.1953125, -12299.203125, - 6200.7890625, -2565.3984375, 3365.9765625, - 328.5078125, 4730.60546875, 16191.125, - -5153.953125, -9818.1171875, -8876.734375, - 17904.625, -7021.53125, 11971.5859375, - 6978.015625, -9729.7890625, -1756.125, - 2227.0859375, -7640.77734375, 386.0234375, - -565.8515625, -813.8671875, -9711.265625, - -1822.3046875, -4064.0234375, -16284.6484375, - -6472.3125, -3376.3046875, 2337.765625, - 22863.65625, 21583.4375, 10071.046875, - 4668.4921875, -539.796875, -11531.984375, - -20293.1875, -4578.0859375, 1966.7578125, - 2647.18359375, -1496.0859375, 460.015625, - -444.5703125, 3973.25, -3042.125, - -9858.76171875, -2433.9921875, 1940.84375, - 8420.5078125, 2951.45703125, -6973.5859375, - -6686.75, -7184.890625, 5420.3515625, - 5734.4140625, -1363.6015625, -11420.015625, - 15122.984375, -8592.078125, 5397.4375, - 4325.4453125, -6014.5, -7417.078125, - -4871.703125, 6710.609375, 5824.109375, - -3977.671875, 4583.390625, 4373.3359375, - -3663.703125, -208.0625, -1234.2109375, - 8587.03125, -2135.6015625, -228.7890625, - -6609.21875, 2772.83984375, -6485.890625, - -3475.0, 106.953125, 4034.8125, - -1641.8515625, -4875.0703125, -8435.5703125, - 2792.9375, -1760.0625, 3325.15625, - -1028.875, 10698.8125, -5616.765625, - -3780.890625, 5210.546875, 8883.85546875, - 2528.3359375, 3356.21875, 6199.25, - 2295.4140625, -12970.85546875, 3699.46875, - -8697.5546875, 1488.7734375, -7098.578125, - 4747.140625, -5283.3359375, -5101.7265625, - -8264.59375, -5109.1328125, 3808.171875, - 8932.796875, -1308.27734375, -4469.5625, - -12329.28125, 
-5747.515625, -3069.7578125, - 8935.18359375, 658.96875, -527.203125, - -1130.6171875, -3990.9140625, 1682.109375, - 3122.5390625, -673.21875, 10.96875, - 3311.46875, -5348.90625, 3636.859375, - 416.88671875, 3595.40625, 6418.68359375, - 1268.03125, -3902.5390625, -2990.8046875, - 7463.328125, 495.0625, -4980.71875, - -339.15625, 5011.75390625, 402.46875, - 7072.9765625, -2437.6171875, -4078.2734375, - -5606.19140625, -6.75, -1357.2421875, - -7362.8828125, 5240.875, -2247.42578125, - -1731.25390625, -5886.9921875, -4816.796875, - -641.8125, -1992.7109375, 2101.6484375, - -14473.5859375, -16996.46875, -10091.08984375, - -5453.33203125, 3825.828125, 191.875, - 3466.1640625, -490.46875, 10859.21875, - 6857.21484375, 7617.51953125, 9906.015625, - 2412.44140625, 4143.171875, -1292.9921875, - 7989.5703125, -4816.390625, -3160.62890625, - 1623.9375, -10592.8828125, 6031.6328125, - 6887.0546875, -3357.7265625, 3635.16015625, - -5274.1875, -1806.5234375, 7370.6875, - -10311.96484375, -7845.4296875, 1168.71875, - -5584.4765625, -1253.95703125, 5689.359375, - 6993.62109375, -2606.078125, -232.015625, - -740.06640625, -11132.3359375, -448.203125, - -11850.53515625, 5439.078125, -1212.58984375, - -6881.3515625, -5796.17578125, 1689.62890625, - -10373.4140625, 2470.5, 6321.15234375, - 2739.51171875, 3521.87890625, -1803.9609375, - 9874.0, -15114.76171875, -5528.53125, - 8575.6796875, 8263.0546875, 2226.5390625, - 4421.4609375, 10734.4921875, -1140.1640625, - -5327.41796875, 1078.8359375, 13313.1953125, - -13771.77734375, 3735.11328125, 1596.5078125, - 5727.875, -10507.953125, -6797.6484375, - 5153.0625, 1140.109375, -7495.796875, - 2658.75, -11314.01953125, 96.890625, - -4383.13671875, -18867.6015625, 14182.234375, - -3272.7421875, -7186.68359375, 2976.1640625, - 1503.421875, 336.5859375, 9593.640625, - -3570.3046875, -6545.41015625, 5660.6328125, - 5682.046875, 7867.8359375, -1649.8125, - -5951.9453125, -14353.1015625, -357.46875, - -12037.6171875, -6475.45703125, 1819.1796875, 
- -2266.9375, -825.6015625, 7087.8203125, - 15496.1796875, -18240.7421875, 16090.03125, - 5526.09375, 5665.734375, -520.53125, - -3502.1484375, -6350.53515625, 15351.75, - 4666.140625, 6758.7265625, 1567.6953125, - 4871.109375, -2920.33203125, 2542.73046875, - -2582.734375, -20996.8515625, -11688.5625, - 2019.00390625, -23224.6484375, -11844.1875, - 11832.875, -20272.95703125, 10194.6171875, - -5143.79296875, 22375.5703125, -12505.875, - -19433.234375, 3475.09375, 7385.0234375, - 21075.21875, 5912.42578125, -2512.0234375, - -1821.40625, -2570.328125, 10886.1171875, - 3523.35546875, -72.453125, 2139.59375, - 22295.078125, 4477.046875, 6058.0390625, - 22970.7421875, 79.4765625, 4722.84375, - 7898.96875, -221.3125, 8425.72265625, - 6444.7578125, 11128.2890625, 12479.58984375, - 10751.140625, 1666.4296875, 1382.078125, - -736.03125, -8608.25390625, 630.546875, - -1941.5703125, -2691.09375, -1652.38671875, - -968.734375, 6044.515625, -10528.82421875, - 3032.1953125, -4722.59765625, -4056.01171875, - 1967.09375, -7328.42578125, 3333.09765625, - 7503.11328125, -16748.984375, 546.92578125, - 6709.8984375, 5976.625, -986.5625, - 5632.90625, 6256.9375, -5936.5078125, - 4566.7890625, -16044.8125, -8164.17578125, - 902.24609375, 126.65625, 6326.015625, - -1129.6640625, 5857.75, -6834.8515625, - 1724.21875, 540.2421875, -4199.52734375, - 9300.10546875, -15607.171875, -1335.94140625, - 6467.7734375, -11.2890625, -6573.73046875, - 5096.75, -4297.9375, 1955.77734375, - 6332.23046875, -189.56640625, 6463.3515625, - 6438.3515625, 3456.640625, -6752.546875, - 592.421875, 435.26171875, 6599.84375, - 17442.265625, -519.46875, -4667.94921875, - -7761.1953125, -2343.5234375, 1669.46484375, - 18788.21484375, -7351.734375, 365.1171875, - 5687.6015625, 2135.734375, -12857.765625, - -1809.015625, 373.14453125, 1138.63671875, - -5689.890625, -6141.1953125, 8421.0859375, - -3876.109375, -2845.578125, 2628.7421875, - -9606.93359375, -3465.85546875, 3569.25, - 3124.3125, 2611.8515625, 
-2704.7890625, - -7531.71484375, -7212.3046875, -3744.1015625, - 362.875, -222.1171875, 4677.2109375, - -427.375, 1006.765625, -9275.72265625, - -2074.9609375, -12593.08984375, 12831.6484375, - 19600.6484375, 3489.59765625, 1529.1171875, - 25613.578125, -15604.75, -1924.515625, - -3804.015625, -13705.40625, 993.5390625, - 9263.234375, -11154.171875, 4625.40625, - 2852.7109375, 10195.9296875, 5549.4453125, - -1963.9375, 5084.55078125, 17889.6328125, - -8841.09765625, 3454.9140625, -8434.25, - -3152.91796875, -4356.46484375, -4443.7109375, - -2861.0, -1917.921875, -9182.5, - -12893.578125, 13268.9375, -6077.453125, - -11199.1640625, 2388.8125, 2039.35546875, - 18686.78515625, 3118.45703125, -823.9296875, - -8832.265625, -1684.1796875, 7269.6875, - -9829.234375, 618.0546875, -7445.85546875, - 1927.953125, 6466.875, -3175.9296875, - 2995.2578125, 21204.78515625, 432.5, - -13485.75, 3457.77734375, 3823.9375, - -1688.0078125, 1352.7890625, 10358.91796875, - -10367.25, 8765.640625, 8514.859375, - -2312.3515625, -149.84375, -2122.58203125, - -11496.625, 7071.375, -3931.1953125, - -5898.2265625, 5073.609375, -11397.296875, - -23342.65234375, -5170.28125, 8396.375, - 8220.06640625, -8682.46875, 1831.23046875, - -37827.5234375, 20096.546875, -11133.9609375, - 12698.61328125, 11664.3671875, -10309.6953125, - 3454.2265625, -1164.4375, 4313.890625, - 6910.9296875, 2661.6953125, 10129.546875, - -2139.69140625, -11888.875, 6534.390625, - -16050.5625, 4048.421875, -1668.34375, - -4949.546875, -9920.875, 6967.484375, - 35.2109375, -3729.7421875, -539.0078125, - -7255.515625, 8104.0859375, 11033.84375, - -3872.1015625, -1376.09375, -12712.625, - -2128.015625, -8410.5390625, 2891.2890625, - 637.953125, 4776.53125, -1190.875, - 7076.6875, -6917.78125, -2427.578125, - -11357.453125, 8747.046875, 3872.484375, - 7150.5078125, -172.734375, 1278.578125, - -10396.421875, 2083.59375, -6347.3828125, - 2092.875, 504.2578125, 525.46875, - 6029.5703125, 4421.44921875, 1075.6875, - -2823.125, 
-8124.953125, 6942.25, - 10343.2265625, 16001.0625, -5241.875, - 3203.91796875, -6437.1328125, 4452.1015625, - -6619.890625, -5674.609375, -8518.5234375, - -1515.015625, -18426.5, -7736.109375, - 6546.296875, 31838.54296875, -8876.28125, - 4352.015625, -4537.33203125, 3833.65625, - -399.140625, -7052.46875, 4162.2890625, - 5714.015625, 8049.109375, 22979.578125, - -3053.5625, 21148.34375, 3033.296875, - 9053.40625, -819.0625, -1806.4375, - 18199.75, 6584.765625, -2146.328125, - -2207.4375, 1406.3125, 2560.078125, - -2901.48828125, -4805.5, 6231.5859375, - -3633.8046875, -14640.6015625, -670.12109375, - -706.9453125, -3026.4296875, -3058.90625, - 8460.328125, -14541.01953125, -23432.8046875, - 1366.609375, -2202.15625, -1230.265625, - 6089.203125, -984.0859375, -3484.19140625, - -3131.65625, -638.12890625, 3205.0859375, - 736.9765625, 5539.50390625, -2781.03125, - 7439.1640625, -4015.32421875, 10636.1484375, - -5710.3984375, -9404.0859375, -4306.03125, - 701.6484375, -1162.51171875, 13591.83984375, - -5799.859375, 765.609375, -1596.1953125, - 6092.32421875, -7803.28125, -765.2890625, - 7828.4765625, 3897.4453125, -2050.078125, - -4962.71875, 512.265625, 21262.890625, - 4547.46875, 2839.125, 1198.69140625, - -15793.1640625, 465.48046875, -16456.93359375, - -985.921875, -4595.0, -12569.92578125, - -2043.3515625, -5904.42578125, -3156.734375, - -17789.71484375, -2015.02734375, -4766.734375, - -2671.46875, -8070.94921875, -1416.7734375, - -35127.84375, 1078.8984375, 7456.203125, - 10872.125, -45.4765625, -2889.3984375, - 4324.88671875, 2036.921875, 3489.21875, - 8502.7578125, 323.5, 4878.984375, - 3560.0703125, -2352.40234375, -4720.1640625, - 18236.47265625, 11045.140625, 567.1328125, - 4078.421875, 2496.93359375, -5989.203125, - -3318.890625, 4008.87109375, 266.65625, - 86.921875, -4382.0703125, -9407.78125, - 7281.828125, -1637.25390625, 3919.59375, - 16931.921875, -2453.87890625, 1889.328125, - 27516.9921875, 12797.69921875, -25322.90625, - -14290.53125, 
-6115.96875, 5234.26171875, - -994.28125, -2473.66796875, 470.0390625, - -963.640625, 13280.203125, -2672.03125, - -16842.1875, -6421.8359375, 2308.46875, - 8808.6328125, -10326.828125, -8662.8671875, - 3356.80078125, -3051.0, 563.40234375, - 3632.046875, 12192.66796875, 5544.42578125, - -6474.01171875, 1054.8984375, 7009.91796875, - 7753.5, -1957.5, 818.640625, - -3513.046875, -1782.6640625, -1848.30859375, - -4723.77734375, 1239.03125, 12823.21875, - -709.125, 6687.46875, 4565.1953125, - 3920.875, 71.296875, -3051.2421875, - -10711.01171875, -17849.015625, -580.5625, - -9447.265625, -18077.71875, 709.38671875, - -2629.16015625, -3353.203125, 7041.984375, - -698.96875, -14862.421875, -3313.109375, - -12795.76953125, 8705.4296875, -10550.0625, - 18604.1875, -2924.8671875, -10940.40625, - 818.140625, -1244.46484375, -11507.28125, - 27750.484375, 6003.0, -1876.96875, - -5963.0, 6216.0234375, -11614.76953125, - 11157.578125, 92.375, 9256.0390625, - -742.9609375, 3178.1875, -7387.3046875, - -5648.19921875, 7296.70703125, -8240.44921875, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/gendata.py deleted file mode 120000 index ecf1d19c..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/gendata.py +++ /dev/null @@ -1 +0,0 @@ -../vec-sgemm/gendata.py \ No newline at end of file diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/vec-sgemm.S b/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/vec-sgemm.S deleted file mode 100644 index 62d388a4..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/vec-sgemm.S +++ /dev/null @@ -1,335 +0,0 @@ - .text - .balign 4 - .global vec_sgemm_nn - .type vec_sgemm_nn,@function -# -# void -# vec_sgemm_nn(size_t n, -# size_t m, -# size_t k, -# const float*a, // m * k matrix -# size_t lda, -# const float*b, // k * n matrix -# size_t ldb, -# float*c, // m * n matrix -# size_t ldc) -# -# c += a*b (alpha=1, no transpose on input matrices) -# matrices 
stored in C row-major order - -# With LMUL=4, load 4 rows of C and 4 rows of B -# Load 16 scalars from A into FP registers - -#define n a0 -#define m a1 -#define k a2 -#define ap a3 -#define astride a4 -#define bp a5 -#define bstride a6 -#define cp a7 -#define cstride t0 -#define kt t1 -#define nt t2 -#define bnp t3 -#define cnp t4 -#define akp t5 -#define bkp s0 -#define nvl s1 -#define ccp s2 -#define amp s3 - -#define a00 ft0 -#define a10 ft1 -#define a20 ft2 -#define a30 ft3 -#define a01 ft4 -#define a11 ft5 -#define a21 ft6 -#define a31 ft7 -#define a02 fa0 -#define a12 fa1 -#define a22 fa2 -#define a32 fa3 -#define a03 fa4 -#define a13 fa5 -#define a23 fa6 -#define a33 fa7 - -#define FRAMESIZE 32 - -vec_sgemm_nn: - ld cstride, 0(sp) # Get arg from stack frame - addi sp, sp, -FRAMESIZE - sd s0, 0(sp) - sd s1, 8(sp) - sd s2, 16(sp) - sd s3, 24(sp) - - # Check for zero size matrices - beqz n, exit - beqz m, exit - beqz k, exit - - # Convert elements strides to byte strides. - slli astride, astride, 2 - slli bstride, bstride, 2 - slli cstride, cstride, 2 - - slti t6, m, 4 - bnez t6, m_remainder_m_loop - -c_row_loop: # Loop across rows of C blocks - mv nt, n # Initialize n counter for next row of C blocks - mv bnp, bp # Initialize B n-loop pointer to start - mv cnp, cp # Initialize C n-loop pointer - -c_col_loop: # Loop across columns of C - vsetvli nvl, nt, e32, m4, ta, ma # 32-bit vectors, LMUL=4 - - # Not enough remaining k elements to unroll by 4 - mv kt, k # Initialize inner loop counter - slti t6, kt, 4 - bnez t6, k_loop_remainder - - mv akp, ap # reset pointer into A to beginning - mv bkp, bnp # step to next column in B matrix - - # Initalize current C submatrix block from memory. 
- vle32.v v0, (cnp); add ccp, cnp, cstride; - flw a00, (akp); add amp, akp, astride; - flw a10, (amp); add amp, amp, astride; - vle32.v v4, (ccp); add ccp, ccp, cstride; - flw a20, (amp); add amp, amp, astride; - flw a30, (amp); addi akp, akp, 4 - vle32.v v8, (ccp); add ccp, ccp, cstride; - flw a01, (akp); add amp, akp, astride; - flw a11, (amp); add amp, amp, astride; - vle32.v v12, (ccp); - flw a21, (amp); add amp, amp, astride; - flw a31, (amp); addi akp, akp, 4 - - # Get vector from B matrix - vle32.v v16, (bkp); add bkp, bkp, bstride - flw a02, (akp); add amp, akp, astride; - flw a12, (amp); add amp, amp, astride; - vle32.v v20, (bkp); add bkp, bkp, bstride - flw a22, (amp); add amp, amp, astride; - flw a32, (amp); addi akp, akp, 4 - vle32.v v24, (bkp); add bkp, bkp, bstride - flw a03, (akp); add amp, akp, astride; - flw a13, (amp); add amp, amp, astride; - vle32.v v28, (bkp); add bkp, bkp, bstride - flw a23, (amp); add amp, amp, astride; - flw a33, (amp); addi akp, akp, 4 - - # Inner loop scheduled assuming 4-clock occupancy of vfmacc instruction and single-issue pipeline - # Software pipeline loads - - addi kt, kt, -4 - -k_loop: - - slti t6, kt, 4 - bnez t6, k_loop_remainder - - # Compute current block of FMAs - vfmacc.vf v0, a00, v16 - flw a00, (akp); add amp, akp, astride; - vfmacc.vf v4, a10, v16 - flw a10, (amp); add amp, amp, astride; - vfmacc.vf v8, a20, v16 - flw a20, (amp); add amp, amp, astride; - vfmacc.vf v12, a30, v16 - flw a30, (amp); add akp, akp, 4 - vle32.v v16, (bkp); add bkp, bkp, bstride - - vfmacc.vf v0, a01, v20 - flw a01, (akp); add amp, akp, astride; - vfmacc.vf v4, a11, v20 - flw a11, (amp); add amp, amp, astride; - vfmacc.vf v8, a21, v20 - flw a21, (amp); add amp, amp, astride; - vfmacc.vf v12, a31, v20 - flw a31, (amp); add akp, akp, 4 - vle32.v v20, (bkp); add bkp, bkp, bstride - - vfmacc.vf v0, a02, v24 - flw a02, (akp); add amp, akp, astride; - vfmacc.vf v4, a12, v24 - flw a12, (amp); add amp, amp, astride; - vfmacc.vf v8, a22, 
v24 - flw a22, (amp); add amp, amp, astride; - vfmacc.vf v12, a32, v24 - flw a32, (amp); add akp, akp, 4 - vle32.v v24, (bkp); add bkp, bkp, bstride - - vfmacc.vf v0, a03, v28 - flw a03, (akp); add amp, akp, astride; - vfmacc.vf v4, a13, v28 - flw a13, (amp); add amp, amp, astride; - vfmacc.vf v8, a23, v28 - flw a23, (amp); add amp, amp, astride; - vfmacc.vf v12, a33, v28 - flw a33, (amp); add akp, akp, 4 - vle32.v v28, (bkp); add bkp, bkp, bstride - - addi kt, kt, -4 - - j k_loop - -k_loop_remainder: - vfmacc.vf v0, a00, v16 - vfmacc.vf v4, a10, v16 - vfmacc.vf v8, a20, v16 - vfmacc.vf v12, a30, v16 - vfmacc.vf v0, a01, v20 - vfmacc.vf v4, a11, v20 - vfmacc.vf v8, a21, v20 - vfmacc.vf v12, a31, v20 - vfmacc.vf v0, a02, v24 - vfmacc.vf v4, a12, v24 - vfmacc.vf v8, a22, v24 - vfmacc.vf v12, a32, v24 - vfmacc.vf v0, a03, v28 - vfmacc.vf v4, a13, v28 - vfmacc.vf v8, a23, v28 - vfmacc.vf v12, a33, v28 - - beqz kt, 1f - -k_loop_remainder_loop: - # Proceed at one k element per loop - addi kt, kt, -1 - vle32.v v16, (bkp) - flw a00, (akp); add amp, akp, astride; - flw a10, (amp); add amp, amp, astride; - flw a20, (amp); add amp, amp, astride; - flw a30, (amp) - - vfmacc.vf v0, a00, v16 - vfmacc.vf v4, a10, v16 - vfmacc.vf v8, a20, v16 - vfmacc.vf v12, a30, v16 - - addi akp, akp, 4 - add bkp, bkp, bstride - - bnez kt, k_loop_remainder_loop - -1: vse32.v v0, (cnp); add ccp, cnp, cstride; - vse32.v v4, (ccp); add ccp, ccp, cstride; - vse32.v v8, (ccp); add ccp, ccp, cstride; - vse32.v v12, (ccp) - - slli t6, nvl, 2 - add cnp, cnp, t6 - add bnp, bnp, t6 - sub nt, nt, nvl # Decrement element count in n dimension - bnez nt, c_col_loop - - # Move to the next set of rows - addi m, m, -4 - - slli t6, astride, 2 # Multiply astride by 4 - add ap, ap, t6 # Move A matrix pointer down 4 rows - slli t6, cstride, 2 # Multiply cstride by 4 - add cp, cp, t6 # Move C matrix pointer down 4 rows - - slti t6, m, 4 - beqz t6, c_row_loop - - beqz m, exit - -m_remainder_m_loop: - mv cnp, cp - mv 
bnp, bp - mv nt, n - -m_remainder_n_loop: - vsetvli nvl, nt, e32, m4, ta, ma # 32-bit vectors, LMUL=4 - - # Not enough remaining k elements to unroll by 4 - mv kt, k # Initialize inner loop counter - slti t6, kt, 4 - bnez t6, m_remainder_k_loop_remainder - - mv akp, ap # reset pointer into A to beginning - mv bkp, bnp # step to next column in B matrix - - vle32.v v0, (cnp) - - # Get vectors from B matrix - vle32.v v16, (bkp); add bkp, bkp, bstride - vle32.v v20, (bkp); add bkp, bkp, bstride - vle32.v v24, (bkp); add bkp, bkp, bstride - vle32.v v28, (bkp); add bkp, bkp, bstride - - # Inner loop scheduled assuming 4-clock occupancy of vfmacc instruction and single-issue pipeline - # Software pipeline loads - flw a00, (akp); addi akp, akp, 4 - flw a01, (akp); addi akp, akp, 4 - flw a02, (akp); addi akp, akp, 4 - flw a03, (akp); addi akp, akp, 4 - - addi kt, kt, -4 - -m_remainder_k_loop: - vfmacc.vf v0, a00, v16 - vfmacc.vf v0, a01, v20 - vfmacc.vf v0, a02, v24 - vfmacc.vf v0, a03, v28 - - addi kt, kt, -4 - blez kt, m_remainder_k_loop_remainder - - flw a00, (akp); addi akp, akp, 4 - flw a01, (akp); addi akp, akp, 4 - flw a02, (akp); addi akp, akp, 4 - flw a03, (akp); addi akp, akp, 4 - - vle32.v v16, (bkp); add bkp, bkp, bstride - vle32.v v20, (bkp); add bkp, bkp, bstride - vle32.v v24, (bkp); add bkp, bkp, bstride - vle32.v v28, (bkp); add bkp, bkp, bstride - - j m_remainder_k_loop - -m_remainder_k_loop_remainder: - addi kt, kt, 4 - beqz kt, 1f - -m_remainder_k_loop_remainder_loop: - - addi kt, kt, -1 - vle32.v v16, (bkp) - flw a00, (akp); add amp, akp, astride; - - vfmacc.vf v0, a00, v16 - - addi akp, akp, 4 - add bkp, bkp, bstride - - bnez kt, m_remainder_k_loop_remainder_loop - -1: vse32.v v0, (cnp) - - slli t6, nvl, 2 - add cnp, cnp, t6 - add bnp, bnp, t6 - sub nt, nt, nvl - bnez nt, m_remainder_n_loop - - addi m, m, -1 - add ap, ap, astride - add cp, cp, cstride - - bnez m, m_remainder_m_loop - -exit: - ld s0, 0(sp) - ld s1, 8(sp) - ld s2, 16(sp) - ld s3, 24(sp) 
- addi sp, sp, FRAMESIZE - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/vec-sgemm_main.c b/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/vec-sgemm_main.c deleted file mode 100644 index 252319db..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm-v3/vec-sgemm_main.c +++ /dev/null @@ -1,56 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// SGEMM benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized sgemm implementation. - -#include "util.h" -#include -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_sgemm_nn(size_t, size_t, size_t, const float *, size_t, const float *, - size_t, float *, size_t); - -int main(int argc, char *argv[]) { - float results_data[M_DIM * N_DIM] = {0}; - printf("sgemm M,N,K = %ld,%ld,%ld\n", M_DIM, N_DIM, K_DIM); - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_sgemm_nn(N_DIM, M_DIM, K_DIM, a_matrix, K_DIM, b_matrix, N_DIM, - results_data, N_DIM); -#endif - - // Do the size sweeps -#define MAXSZ 85 - if (M_DIM >= MAXSZ && N_DIM >= MAXSZ && K_DIM >= MAXSZ) { - for (size_t t = 8; t <= MAXSZ; t += 7) { - size_t start, end; - start = read_csr(mcycle); - vec_sgemm_nn(t, t, t, a_matrix, t, b_matrix, t, results_data, t); - asm volatile("fence"); - end = read_csr(mcycle); - printf("size %ld cycles = %ld\n", t, end - start); - } - } - - // Do the sgemm - memset(results_data, 0, sizeof(results_data)); - setStats(1); - vec_sgemm_nn(N_DIM, M_DIM, K_DIM, a_matrix, K_DIM, b_matrix, N_DIM, - results_data, N_DIM); - setStats(0); - - // Check the results - return verifyFloat(M_DIM * N_DIM, results_data, verify_data); -} diff --git 
a/bb-tests/workloads/src/CTest/rvv/vec-sgemm/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-sgemm/dataset1.h deleted file mode 100644 index 29a30cd7..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm/dataset1.h +++ /dev/null @@ -1,2535 +0,0 @@ -#define M_DIM 71 -#define K_DIM 71 -#define N_DIM 71 - -typedef float data_t; - -static data_t a_matrix[M_DIM * K_DIM] = { - -112.0, -1.875, -20.0, 28.0, -0.125, -40.0, 60.0, -1.625, - -32.0, -14.0, -48.0, -0.375, -2.5, -0.25, -2.0, 2.0, - -44.0, 1.375, 8.0, -0.375, 8.0, 1.0, -48.0, -16.0, - -1.75, -7.0, -1.25, -3.25, 8.0, 32.0, -0.875, 12.0, - 0.9375, 15.0, 6.0, -4.0, 30.0, 0.25, 64.0, -1.5, - 7.0, -2.0, -1.375, 6.0, -0.5, 1.625, -1.125, 0.75, - 6.0, -0.5, 36.0, 64.0, -24.0, 7.0, -1.375, -3.0, - 8.0, -20.0, -128.0, 13.0, 0.5625, -5.0, -30.0, 24.0, - -60.0, -0.125, 80.0, 0.0625, -8.0, 12.0, 4.0, -0.4375, - 0.0, 112.0, 0.5, 0.375, -3.75, 1.875, -64.0, -1.5, - 2.5, 8.0, 64.0, 80.0, 12.0, 1.0, 32.0, 6.0, - 6.0, -20.0, 0.0, -36.0, -44.0, 2.75, 1.375, 0.375, - -0.375, -6.0, 1.75, -1.75, 80.0, -4.0, -0.875, 6.5, - 96.0, 40.0, 0.0, -0.25, 28.0, -9.0, -8.0, 2.0, - -4.0, 18.0, 0.5, -5.0, -2.0, -2.0, 8.0, -0.5, - 3.0, -4.0, -4.0, -5.5, 18.0, 36.0, 0.75, -0.75, - 2.5, 10.0, 88.0, -10.0, -26.0, -4.0, 0.75, 2.0, - -30.0, -3.0, -4.0, -1.625, -12.0, -48.0, -32.0, 0.0, - 1.75, 0.625, 11.0, -1.5, -0.125, 26.0, -15.0, 48.0, - -18.0, 36.0, 1.5, 72.0, 120.0, -5.0, 16.0, -24.0, - -72.0, -0.5, -1.25, -3.75, 3.0, 0.5, 4.0, 0.0, - -1.875, 2.5, -1.0, -104.0, 0.375, 26.0, -0.5, -60.0, - 20.0, -8.0, 0.0, 0.0, -3.5, -8.0, 1.125, -0.3125, - -3.0, 18.0, 6.0, 0.875, -88.0, -0.75, 1.75, 5.0, - 0.0, 11.0, 1.125, -0.25, 12.0, 6.0, -1.75, 0.9375, - -36.0, 48.0, -8.0, -2.0, -10.0, -5.5, 0.0, -16.0, - 3.5, -0.5625, 2.5, -1.25, 28.0, -8.0, -2.5, 2.0, - -40.0, -0.875, 0.75, -3.0, 4.5, -112.0, -8.0, 5.0, - -2.0, -0.25, -6.5, -1.0, -7.0, 0.125, -6.0, 6.5, - 0.25, -52.0, -1.625, 0.5, 0.75, -88.0, 14.0, -0.625, - 24.0, -112.0, 1.0, -8.0, 16.0, 0.0625, 0.0, 
-128.0, - 5.0, 40.0, -72.0, -32.0, -20.0, 4.0, -1.25, 96.0, - -10.0, -0.625, -16.0, -1.625, 56.0, 14.0, 6.0, 3.0, - -1.125, 64.0, 1.875, -0.4375, 6.0, -1.0, -1.5, -0.625, - 4.5, -112.0, 5.0, -12.0, 11.0, 8.0, 1.125, -1.0, - -32.0, -4.0, -1.75, 24.0, 0.0, -0.5, -1.5, 0.0625, - -13.0, -5.5, 48.0, 1.875, -5.5, 24.0, -0.25, 28.0, - -16.0, -0.5, 1.5, 2.75, -2.75, -14.0, 6.0, 0.25, - 1.0, -1.5, -48.0, -4.0, 0.5, 112.0, 3.0, 30.0, - 1.75, -9.0, 15.0, 2.0, 60.0, -4.0, 4.0, 6.0, - -48.0, 22.0, -4.5, -1.5, 40.0, 8.0, -0.625, 0.0, - -16.0, -9.0, 13.0, -0.5, -32.0, -14.0, -0.375, -3.5, - 56.0, 0.875, 1.5, -2.0, -2.0, -0.75, 24.0, 120.0, - -24.0, 1.5, -0.625, -8.0, 10.0, -2.0, -52.0, -40.0, - -0.875, 0.5, 22.0, 1.0, 88.0, 5.0, 0.0625, 0.25, - 4.0, -4.0, -64.0, 2.5, 2.0, -4.0, 4.0, -20.0, - -5.0, 1.625, -48.0, 4.0, 1.0, 18.0, 4.5, 0.25, - -0.9375, 104.0, -24.0, 16.0, 2.0, 0.1875, 16.0, 12.0, - -0.75, 18.0, -0.25, -40.0, -56.0, -8.0, -4.5, -28.0, - 1.75, -104.0, 2.5, 2.75, -64.0, -6.0, 4.5, -26.0, - -3.75, 0.1875, 112.0, 24.0, 1.0, 5.0, 2.0, 7.5, - -7.5, -15.0, 4.0, -1.0, 12.0, -6.5, 0.5, -7.0, - 72.0, 26.0, -28.0, -64.0, -112.0, 0.25, -0.125, -22.0, - 1.625, 1.75, -7.5, -16.0, 0.0, -6.0, -1.5, 64.0, - 24.0, 0.25, -128.0, 4.0, -1.875, 1.875, 8.0, -1.0, - 0.4375, -20.0, -0.625, 0.5, -1.5, -16.0, -15.0, 0.75, - 0.375, -30.0, -6.0, 0.875, -48.0, 28.0, -1.75, -0.375, - 0.0, 0.5, -20.0, -7.5, -15.0, -56.0, 3.0, 96.0, - 0.3125, 32.0, 0.5625, -1.5, -0.375, -1.875, 1.25, -16.0, - -32.0, -20.0, 0.4375, 0.375, -1.0, -52.0, -4.0, 28.0, - -8.0, -1.75, -4.0, -16.0, 7.0, -5.0, 0.875, 52.0, - 0.0, 15.0, -26.0, -4.5, 2.0, -0.3125, 13.0, -0.25, - -0.5, -4.0, 4.0, 44.0, -0.5625, -26.0, -0.5, -6.5, - -1.75, 2.25, 0.0, 2.5, -3.5, -3.5, -0.875, 1.0, - 7.0, 16.0, -1.25, -1.5, 0.0, 0.0, -7.0, -40.0, - -104.0, 4.0, -3.25, -32.0, 0.0, 0.0, -4.0, 20.0, - -3.25, 10.0, 7.0, 20.0, 1.0, -3.75, 0.5, 0.5, - 0.0, 52.0, -5.5, 4.0, -52.0, 20.0, 12.0, -56.0, - 6.5, -2.0, -2.5, 12.0, -0.625, -0.8125, 1.5, -4.0, - 
-72.0, 4.0, -1.0, -120.0, -3.0, 5.5, 14.0, -20.0, - -8.0, 6.0, 0.375, -16.0, 2.0, -32.0, -0.5, 10.0, - -24.0, 1.5, -3.75, -0.3125, 8.0, 11.0, -0.375, 12.0, - 1.0, -7.5, 7.0, 0.625, 0.5, 1.0, 0.75, 24.0, - -4.0, -128.0, 3.5, 0.875, 1.25, 1.25, -15.0, -16.0, - -0.625, 0.5, 88.0, -32.0, 24.0, 4.0, -6.0, 8.0, - -3.25, -2.0, -0.9375, 16.0, -8.0, -11.0, -2.0, 0.6875, - 32.0, -5.0, -2.5, 0.25, -2.0, 0.5, -1.5, 3.5, - -3.0, 1.5, -72.0, 20.0, -1.25, -0.25, -0.9375, -0.5, - -1.0, -0.8125, -2.25, -0.25, 20.0, -112.0, 1.75, 0.75, - 88.0, 96.0, 1.125, -0.75, 16.0, -28.0, -28.0, 0.5, - -1.875, -104.0, 4.5, 6.5, 48.0, -12.0, -6.5, 6.5, - 12.0, 0.25, 0.5, -1.0, -0.625, 64.0, 0.5, -1.5, - -48.0, -1.375, 56.0, 0.125, -7.5, -0.1875, -56.0, -0.8125, - -48.0, -0.5, -3.0, 26.0, 24.0, 4.0, 20.0, 0.125, - -56.0, 0.0625, -28.0, 1.125, -8.0, -60.0, 2.0, -16.0, - -96.0, 1.0, -7.0, -32.0, 2.0, 60.0, 3.5, 7.0, - -32.0, -18.0, -1.0, 1.125, -6.0, -10.0, -36.0, 6.0, - -0.5, 0.0, -7.0, -56.0, 20.0, 72.0, -3.0, 3.5, - -2.5, -13.0, 1.375, 8.0, -26.0, 18.0, -2.0, 0.75, - -120.0, -10.0, -30.0, 0.0, -1.25, -0.625, -22.0, -8.0, - 6.0, 88.0, -18.0, -15.0, -1.75, -48.0, 11.0, -60.0, - 8.0, -8.0, 0.0625, 4.0, -8.0, -112.0, 30.0, 1.0, - -15.0, 0.375, -52.0, 0.875, 0.0, -8.0, -16.0, 0.0, - 24.0, 0.5625, -1.0, -40.0, -36.0, 0.75, 2.0, -2.0, - 88.0, -11.0, 6.0, -2.25, 4.0, 52.0, -8.0, -12.0, - -1.125, 2.0, 2.0, 32.0, 2.5, 12.0, -1.75, -80.0, - -1.0, -6.0, -8.0, 120.0, -40.0, 28.0, -112.0, -44.0, - 0.5625, -6.5, 4.0, 28.0, -32.0, -0.25, 10.0, -112.0, - 6.0, 32.0, -7.0, -20.0, -0.5, 72.0, -30.0, -12.0, - 3.5, 14.0, -3.0, -1.125, -4.0, 16.0, 6.0, -6.5, - 3.0, -0.75, -2.0, -1.375, 0.25, 48.0, 1.75, -2.75, - -5.0, 4.5, -4.0, 1.25, 0.875, 14.0, 0.0, -3.75, - -72.0, 0.0, -0.5, -8.0, 16.0, -5.0, -2.0, -0.75, - 48.0, 0.5, -32.0, -10.0, -0.3125, -4.0, 0.0, -5.5, - 9.0, 2.0, -36.0, 40.0, 1.375, -4.0, 12.0, -96.0, - -12.0, -88.0, 2.0, 96.0, -1.5, -0.1875, 0.375, 52.0, - -0.5625, 2.25, 1.875, 1.25, -5.5, -10.0, 1.5, 0.5, - 
2.0, 2.0, 12.0, 20.0, 16.0, 0.0, -8.0, -1.25, - 0.125, -128.0, -64.0, 16.0, 4.0, -20.0, -6.0, -2.0, - -3.5, -128.0, -0.9375, 5.0, 48.0, 1.125, 40.0, 0.0, - -6.5, 6.5, -0.5, 0.3125, 0.4375, 1.0, -1.25, 4.0, - -56.0, -5.0, 1.125, -7.0, 12.0, 1.25, 40.0, 3.5, - 0.0, -24.0, 24.0, 0.25, -15.0, 80.0, -16.0, 0.6875, - 3.0, -10.0, 0.125, -8.0, 1.75, 2.0, 0.3125, -10.0, - -40.0, -0.375, -0.125, 0.5, -0.5, 8.0, -28.0, 96.0, - -1.5, 1.125, -2.0, -3.0, 3.75, 0.625, -7.0, 18.0, - -11.0, 0.75, -1.0, -8.0, 15.0, -1.5, -112.0, -1.75, - -112.0, -28.0, 0.6875, 1.0, -2.0, 48.0, 12.0, -0.875, - -1.125, 2.5, -1.625, 1.0, 5.0, 2.0, -6.0, 1.0, - -40.0, 3.0, -3.5, -4.0, 2.0, 0.75, 0.75, -0.625, - 0.125, -14.0, -1.125, -10.0, 1.75, -2.75, -4.0, -22.0, - -9.0, 56.0, 26.0, -0.125, -112.0, -0.0625, -3.75, -120.0, - 8.0, -0.0625, -0.0625, 24.0, 0.5, 0.5, -2.0, 32.0, - 2.0, 112.0, -4.0, -72.0, -9.0, 8.0, 1.75, -0.5625, - 0.75, -1.5, 0.5, -15.0, 24.0, -3.0, -28.0, 0.4375, - 18.0, 0.1875, -48.0, -11.0, -0.5625, -7.0, -120.0, -40.0, - 1.875, 10.0, 0.0, 1.625, -16.0, -8.0, -2.5, -0.5, - -56.0, 20.0, -7.5, 0.125, 80.0, -1.875, -10.0, 0.75, - 0.1875, 0.75, -1.0, -4.0, -1.625, 6.0, -4.0, -1.5, - 1.75, -15.0, -48.0, -24.0, -1.625, 0.5625, -14.0, 6.0, - 5.0, -80.0, -16.0, -32.0, 112.0, -1.75, -0.375, -0.0625, - 48.0, -7.0, -28.0, -1.0, -0.5, 0.75, 0.1875, -32.0, - 0.0, 1.75, 0.5, -0.75, -0.25, 0.375, 0.875, 18.0, - -1.5, 6.0, 0.0, -1.375, 1.75, -13.0, -72.0, 0.625, - 14.0, 22.0, 0.0625, -16.0, -6.0, -5.0, -48.0, -96.0, - 1.25, 0.5, -104.0, -88.0, -28.0, 6.5, 104.0, -0.5, - -32.0, -1.0, 3.25, 16.0, -120.0, -48.0, -1.0, 0.0, - 0.5, 20.0, -0.625, 1.0, -6.0, -22.0, -0.25, 16.0, - -1.25, 6.0, 0.5, 6.0, -8.0, -1.0, 80.0, 96.0, - -0.75, -32.0, -96.0, 1.5, -1.5, -8.0, 7.0, 64.0, - -0.1875, -2.0, 1.625, -1.375, -3.0, 4.5, -8.0, -0.625, - -20.0, -0.0625, 15.0, 2.0, 0.5, 1.0, 6.5, -22.0, - -0.125, -12.0, -24.0, -2.75, 88.0, 1.0, 8.0, -16.0, - 40.0, 5.0, -0.25, 10.0, 32.0, -14.0, -64.0, 60.0, - -20.0, -0.25, -2.0, 
-16.0, -0.625, 20.0, 0.75, 0.375, - -80.0, 3.5, -0.6875, -48.0, -1.0, 3.5, -0.875, 15.0, - 0.0, -1.5, 0.5625, 3.5, 1.25, 1.75, -1.375, 0.6875, - 64.0, 4.0, -5.5, 7.0, 5.0, 8.0, -20.0, -6.0, - -36.0, 0.625, 2.25, -1.0, 0.5625, 0.875, -64.0, -96.0, - -20.0, -1.125, 24.0, 4.5, -1.75, -24.0, -6.0, -96.0, - 12.0, 0.3125, -5.5, -32.0, 18.0, -0.625, 0.4375, 6.5, - -32.0, -36.0, 44.0, 8.0, -112.0, 7.0, 40.0, -0.6875, - 2.0, -12.0, -3.0, 5.0, -112.0, 104.0, -0.1875, 0.5, - 7.5, 0.25, -4.0, -0.3125, 0.25, -128.0, -2.5, -7.0, - 3.0, 2.0, -2.0, 0.75, 12.0, -1.5, -2.75, 0.75, - -3.75, 14.0, -7.5, -2.5, 1.125, 12.0, 0.625, -6.0, - -14.0, -0.5, -0.9375, 120.0, 52.0, -0.75, 1.5, -18.0, - 5.5, -40.0, -3.0, 48.0, 0.75, 0.75, 0.3125, -80.0, - -10.0, -13.0, 0.0, 1.5, -60.0, 1.0, -36.0, -0.75, - 2.25, -6.0, -8.0, 0.5625, -48.0, -8.0, 26.0, 48.0, - -64.0, 0.3125, -0.5, 8.0, -9.0, 0.0, -0.125, 20.0, - -0.3125, -0.75, -56.0, -1.875, -0.5, 0.5, -0.9375, -2.5, - 60.0, -26.0, 1.0, 0.9375, -3.0, -8.0, -10.0, -20.0, - 28.0, -4.0, 6.0, -0.1875, -2.25, -64.0, 0.5, 9.0, - 1.0, -0.875, 7.0, -4.0, 0.3125, 5.5, -0.25, 0.0, - -1.375, 48.0, -32.0, 48.0, -0.6875, 15.0, -18.0, -3.0, - 0.0, -30.0, -28.0, 6.0, 26.0, 5.0, -5.0, 4.0, - 12.0, -6.0, 0.75, 0.5, 0.125, 0.0625, 1.625, 7.5, - 112.0, 3.75, 0.75, -0.75, 88.0, 0.25, -80.0, 2.0, - 0.0, 40.0, -3.75, 0.3125, 48.0, 1.0, -0.3125, -3.0, - -16.0, 3.0, -1.5, -1.125, 0.3125, 28.0, -20.0, 1.25, - -1.375, 0.8125, -120.0, -36.0, 40.0, -1.5, -1.0, -0.8125, - -8.0, -3.0, -12.0, -10.0, -2.0, 12.0, 12.0, -4.0, - 0.75, 5.0, 52.0, -0.5, -0.125, -2.5, -52.0, -8.0, - -0.5, -120.0, 64.0, 72.0, -0.1875, 4.5, -1.0, 0.0, - 0.0, -9.0, 52.0, -20.0, 0.875, 1.875, -15.0, 2.0, - 1.25, -7.0, -11.0, 2.5, 0.625, -112.0, 4.0, -5.0, - 4.5, 0.0, 28.0, -0.875, -1.0, 12.0, -0.875, -1.0, - -0.4375, -1.375, -0.25, 60.0, -7.0, -2.5, 80.0, -0.5, - -12.0, 44.0, 20.0, -112.0, -64.0, -44.0, 0.0, -44.0, - -24.0, -36.0, 40.0, -1.5, 2.0, 1.25, -6.5, 9.0, - -3.25, 16.0, 10.0, 0.125, 52.0, -1.875, 
40.0, -1.75, - 2.75, -1.0, -0.125, -0.25, -2.5, 0.125, -0.375, 40.0, - 44.0, 72.0, -1.25, 1.25, -3.75, -26.0, -1.0, -10.0, - 11.0, 3.75, -5.5, -0.75, -48.0, -1.25, -40.0, 56.0, - 7.0, 1.5, 0.0625, -0.1875, -44.0, -12.0, -88.0, 6.0, - 3.5, 0.0, 1.75, -2.0, -2.0, 0.9375, -7.0, -2.0, - 0.625, 0.5625, -0.0625, 0.125, 5.0, -3.25, -0.5, -20.0, - 104.0, 13.0, 0.5625, 0.1875, 16.0, 1.125, 0.0625, -1.25, - -1.0, -6.0, 2.75, 0.5, -4.0, 0.0625, -56.0, -12.0, - -12.0, -1.875, -12.0, 1.25, -0.125, -32.0, 14.0, -0.25, - 6.0, -1.0, 3.5, 4.0, -0.5, 0.75, -0.75, 3.0, - 14.0, -0.5, -8.0, 1.875, -1.25, -24.0, -16.0, -0.625, - -0.3125, 56.0, -4.5, 0.5, 1.0, -13.0, -0.0625, 40.0, - -13.0, -112.0, -2.0, -1.875, -2.5, 7.5, -6.0, -26.0, - 1.25, 1.5, 1.5, -28.0, -20.0, -8.0, 5.0, 40.0, - 32.0, 0.0, -56.0, -4.0, -13.0, -14.0, 88.0, 1.375, - -0.5, -0.75, -2.75, 10.0, 64.0, 52.0, 1.0, 112.0, - 11.0, -3.0, -44.0, 0.625, -14.0, -0.375, 28.0, 28.0, - -1.5, -0.75, 0.875, -1.5, 104.0, 11.0, -96.0, -0.125, - 1.25, -0.5, -6.5, -24.0, 3.75, 8.0, -112.0, 0.3125, - 3.0, -0.6875, -88.0, -120.0, 24.0, 4.5, 0.75, 0.5625, - 4.5, -11.0, -20.0, 112.0, -20.0, 4.0, 1.5, 36.0, - 26.0, 52.0, 48.0, 7.5, -1.25, 2.0, 2.0, 1.75, - 32.0, -2.75, 0.625, 3.5, -16.0, 1.0, 6.0, -32.0, - -1.0, 5.0, -16.0, 5.0, 0.875, 8.0, -14.0, 48.0, - -8.0, -24.0, -8.0, 1.5, -20.0, -1.5, -0.625, 6.0, - 1.25, 1.125, -7.0, 0.625, 2.0, -1.0, -5.0, 1.0, - -0.6875, -0.5625, -8.0, -16.0, 2.25, 2.0, 2.0, 4.0, - -3.5, -12.0, -10.0, -0.75, -22.0, -6.0, -28.0, 0.0, - -2.0, -0.875, 48.0, 28.0, -8.0, 1.25, -1.0, 0.0625, - 104.0, -4.0, 2.5, -14.0, -8.0, 0.75, 2.25, 7.0, - 7.0, -8.0, -0.3125, -5.5, 3.25, -12.0, 0.0, 1.5, - -32.0, 20.0, 16.0, 12.0, 12.0, -16.0, 8.0, -12.0, - -0.75, 24.0, 0.1875, -20.0, 28.0, 6.0, 0.5, -0.375, - -16.0, 7.5, -0.375, 0.125, -3.5, 44.0, 0.125, -0.25, - -56.0, 0.5, -16.0, -1.0, 20.0, 15.0, 4.0, 6.0, - 120.0, 0.375, -0.5625, 48.0, 4.0, 0.5, 2.0, 10.0, - -6.0, -2.0, -1.0, 1.0, 4.0, 9.0, 9.0, -2.0, - -3.25, 1.25, 0.75, -3.75, 
-0.875, 8.0, -0.875, -1.75, - 10.0, 1.0, -3.5, 0.0, -1.0, -22.0, 3.5, -20.0, - -10.0, 2.25, -15.0, 26.0, 8.0, 10.0, -4.0, 1.5, - -0.5, 24.0, -24.0, -7.0, 1.375, -2.25, 14.0, -0.75, - -64.0, -0.125, -0.5, 32.0, 32.0, 48.0, 2.0, 0.0, - 12.0, 26.0, -10.0, -3.5, -0.5625, 0.875, -15.0, 0.5625, - -0.1875, -64.0, -0.875, 2.5, -0.25, 1.875, -0.8125, 2.5, - -5.5, 40.0, -6.0, -0.5, -5.0, 10.0, -30.0, -1.625, - 1.0, 20.0, 6.5, -15.0, 4.0, -1.125, -24.0, 12.0, - -1.0, -0.8125, -8.0, -40.0, -4.0, 0.5, -6.0, -104.0, - 1.0, -4.0, 24.0, -0.8125, -10.0, 8.0, 0.5, -0.25, - -120.0, -0.25, 14.0, 48.0, 32.0, -1.0, -18.0, 0.0625, - -6.0, -8.0, -10.0, -0.25, 0.0, 1.75, -1.125, 64.0, - 40.0, -0.5, 72.0, 14.0, 2.0, -7.0, 52.0, 3.0, - -3.25, -3.0, -0.5625, -1.625, 20.0, 1.125, -64.0, -1.0, - -1.875, -20.0, -2.0, -1.625, -0.25, 104.0, -1.375, 28.0, - -8.0, 24.0, -4.0, -80.0, 14.0, 5.5, 0.0, -0.5, - 4.5, -16.0, -13.0, -120.0, -0.5625, -14.0, -2.5, 0.8125, - -0.375, -0.0625, 0.75, 9.0, -2.5, 28.0, -1.5, -1.0, - 6.5, -16.0, -3.5, -32.0, -16.0, 0.0, 0.8125, -1.5, - -9.0, -0.9375, -24.0, 2.25, 3.75, -2.0, 4.0, -26.0, - -20.0, -22.0, 5.0, 1.5, 4.0, -0.375, -0.125, 0.25, - 1.25, -0.125, -2.75, -0.75, 4.0, -1.5, 8.0, 16.0, - -1.625, -0.375, 3.0, -18.0, 0.4375, -40.0, -5.0, -1.0, - -0.125, -4.0, -8.0, 0.8125, 24.0, 7.5, 0.0, -64.0, - 0.8125, 120.0, 0.25, 40.0, -12.0, -6.5, 1.0, 12.0, - -0.5, -40.0, -8.0, 1.0, -44.0, -10.0, -1.125, 0.0, - -2.5, 2.75, -10.0, 56.0, -1.5, -40.0, 2.0, -30.0, - -3.0, -1.0, -1.5, -0.8125, 4.0, 0.25, -1.75, 1.375, - -0.25, -0.9375, 0.0, -80.0, 5.0, 14.0, 0.5, -120.0, - 10.0, 7.5, -16.0, -1.5, -24.0, 48.0, 4.0, 1.25, - -48.0, -3.0, 0.0, 16.0, -0.75, 32.0, -1.5, -0.5, - -16.0, 28.0, -88.0, -3.0, 14.0, 0.75, 7.0, 0.5, - 3.0, -0.375, -24.0, -36.0, 0.4375, 88.0, -0.1875, 12.0, - 80.0, 2.0, 88.0, -0.625, 0.625, 2.0, 24.0, 0.0, - -22.0, 6.0, -14.0, -9.0, 0.375, 0.0, 2.75, -2.75, - 0.625, -8.0, -28.0, -4.0, -72.0, 1.0, 4.5, -2.0, - -26.0, -44.0, -80.0, -18.0, -88.0, -1.0, -4.0, 
-6.0, - -0.75, 15.0, 4.0, -88.0, 6.0, -44.0, 0.0625, 8.0, - -5.5, 7.0, 0.75, -0.25, 1.0, -8.0, -26.0, -0.75, - 72.0, 1.0, 0.6875, -96.0, -4.0, 20.0, -0.5, -0.25, - -22.0, 52.0, -60.0, -0.5, -1.5, -6.5, 0.25, 0.9375, - -16.0, -3.5, 40.0, -4.0, -32.0, 28.0, 1.75, 8.0, - -12.0, -4.0, 0.0, -1.5, 1.125, -0.5625, 0.0, 120.0, - -0.5, -64.0, 40.0, -32.0, -40.0, 56.0, -3.5, -5.0, - 72.0, 3.75, 0.625, -13.0, -9.0, -4.0, -9.0, -4.0, - 4.0, 1.75, -0.75, -48.0, 0.5, -0.25, 6.0, 14.0, - -40.0, -0.625, 0.9375, -112.0, -60.0, -1.0, -0.5, 40.0, - -1.75, 5.0, -6.0, -40.0, -16.0, -48.0, -28.0, 8.0, - -7.0, 80.0, 28.0, -2.0, -60.0, -64.0, 40.0, 32.0, - 2.0, -52.0, -0.625, -10.0, -40.0, 56.0, -14.0, 5.0, - -2.0, -0.375, 0.0, -0.375, 0.375, -14.0, -0.875, -72.0, - 5.5, 28.0, 1.25, 16.0, -8.0, 10.0, 26.0, 32.0, - -104.0, -0.6875, -3.25, 7.0, 56.0, -10.0, -96.0, 1.5, - 28.0, -2.25, 0.4375, -10.0, 52.0, 2.25, 5.5, -1.75, - -3.0, 2.0, 3.0, -12.0, -12.0, -16.0, -1.0, -4.0, - 2.75, 0.3125, -1.875, 24.0, 1.25, -0.75, -40.0, 72.0, - -64.0, 1.75, -40.0, 3.0, 0.75, -1.75, 0.0, -2.0, - 18.0, 24.0, -6.0, -24.0, 40.0, -64.0, 0.5, -0.875, - 0.5, 3.0, 5.0, 0.5, -1.0, 1.75, -24.0, 5.5, - 1.0, 24.0, -14.0, 18.0, -8.0, -4.0, -0.125, 8.0, - -36.0, 56.0, 40.0, -56.0, -1.75, 2.0, -0.0625, -10.0, - -0.3125, -1.5, 0.25, 0.5, -1.0, 7.0, 3.5, 64.0, - -12.0, 3.75, 26.0, -1.375, -40.0, -128.0, -18.0, -24.0, - -0.625, -1.125, 4.0, -0.75, 48.0, -4.0, 1.375, 0.6875, - 44.0, -0.8125, -8.0, -1.25, 0.0, -7.0, 0.375, -5.0, - 24.0, 120.0, -0.5, -2.5, 0.5, -2.25, 40.0, -96.0, - -4.0, -30.0, -6.0, -1.0, 4.5, 7.0, -0.5, 0.0, - 5.0, -2.0, -0.5, -120.0, -14.0, 0.0, 26.0, -0.75, - 0.875, -120.0, 0.0, -22.0, -0.25, -4.5, 1.0, 0.5, - -32.0, -0.6875, -56.0, 7.5, 12.0, 1.125, -0.25, -14.0, - -0.375, 40.0, 0.1875, 0.25, 0.0625, -8.0, 2.0, 0.5, - 0.0, 88.0, 16.0, 0.0, 0.5, -20.0, -7.5, -64.0, - 14.0, 0.75, 0.0, 1.5, 13.0, 32.0, 1.5, 0.875, - -0.5, 18.0, -4.0, 20.0, -1.625, -64.0, 0.3125, 18.0, - -0.0625, 0.375, -28.0, -0.375, 0.0, 
-24.0, 40.0, -30.0, - 0.5, 120.0, 7.0, 2.5, -104.0, -0.875, 9.0, -1.875, - -0.625, -8.0, -30.0, 1.25, 1.125, -4.0, -0.8125, 1.0, - -8.0, 4.0, -0.5, -13.0, -3.25, 7.5, 1.0, -3.5, - 8.0, 6.0, -14.0, -1.0, -16.0, 20.0, 1.125, -1.625, - 2.0, -6.5, -64.0, 120.0, -1.75, -13.0, 5.0, 1.0, - 0.125, 22.0, -0.1875, -104.0, -5.5, 0.5, 1.0, -1.5, - 0.75, -0.8125, 6.5, 15.0, 36.0, 18.0, 1.0, -1.0, - 18.0, 48.0, -11.0, -60.0, 22.0, -8.0, -20.0, -4.0, - -80.0, -0.125, -4.0, 80.0, 0.5, 0.75, -4.5, 0.875, - 26.0, 12.0, -0.75, 44.0, -12.0, 8.0, 4.5, 24.0, - 1.25, 15.0, 104.0, 1.0, -0.5, -1.0, 0.0, -16.0, - -8.0, -4.0, -28.0, 0.0, -3.75, -26.0, -8.0, -30.0, - 1.75, 2.0, 1.375, 2.5, -2.75, -1.0, -3.5, -0.5, - -8.0, 0.875, -14.0, -4.5, 3.0, 0.6875, 40.0, -12.0, - 3.5, 1.75, -11.0, 14.0, 2.0, -15.0, 48.0, 0.1875, - -2.0, -7.0, 24.0, 2.0, -4.0, 2.0, -1.75, 24.0, - 11.0, 0.3125, 52.0, 8.0, -1.0, -2.0, -12.0, 1.25, - -44.0, 18.0, 0.875, 3.0, -7.0, 28.0, 0.5, 1.75, - 0.75, 0.0, -22.0, -18.0, 2.75, -0.375, 3.75, -28.0, - -32.0, -48.0, -0.375, 104.0, -12.0, 0.25, 7.0, -32.0, - -1.125, 3.0, 1.875, -0.75, -0.5625, 6.0, 0.8125, 2.0, - -14.0, -3.75, 16.0, 0.875, 56.0, -16.0, -6.0, -1.125, - 1.0, 0.875, 1.75, 32.0, 20.0, 0.625, -1.5, 0.625, - 5.0, -10.0, 3.5, 6.5, 0.0, 72.0, 56.0, 8.0, - 3.5, -0.25, -32.0, 1.75, 3.0, 8.0, -0.75, 13.0, - 10.0, -80.0, 0.1875, -16.0, -5.5, 16.0, 26.0, 6.0, - 0.5, 0.5625, 0.0, 3.75, -4.0, 1.75, -4.0, 15.0, - 0.5, -2.0, -7.0, -24.0, 0.8125, 16.0, -32.0, -10.0, - -18.0, 0.3125, -3.0, -0.6875, -5.0, -2.5, -16.0, -6.5, - 2.0, -0.5625, 0.0, 56.0, 9.0, -0.375, 2.0, -7.0, - 1.5, 1.5, 1.0, 16.0, 0.75, -40.0, -40.0, -4.0, - 0.375, -6.0, -3.0, 0.25, 4.0, 14.0, 14.0, 1.875, - -2.75, -1.75, -0.25, 0.8125, 24.0, -1.625, 1.75, -2.75, - -48.0, -0.625, 6.0, -0.0625, 1.5, -3.5, 0.5, -5.5, - -3.5, -40.0, -2.0, 0.125, -40.0, 30.0, -80.0, 4.0, - -16.0, -0.25, -1.5, 0.0, 48.0, 12.0, 0.75, -88.0, - -1.0, -40.0, -18.0, -2.0, 0.75, 112.0, 20.0, -18.0, - -0.3125, 1.5, -5.0, -22.0, 16.0, 2.5, 
12.0, -64.0, - -0.375, -4.0, -6.0, 40.0, 0.0, 7.0, 1.25, 13.0, - -24.0, -13.0, 6.5, 1.0, 30.0, 1.25, -52.0, -28.0, - -16.0, -2.0, -8.0, 16.0, 4.0, 0.75, 0.0, -0.5, - -0.5, -0.5, 14.0, -0.25, 3.0, -5.0, 6.0, 2.0, - -48.0, -0.3125, -12.0, -2.0, -14.0, -2.0, -1.25, 56.0, - 104.0, -8.0, -3.5, 15.0, 1.875, -12.0, 8.0, -14.0, - -44.0, 0.8125, -40.0, 60.0, 8.0, 36.0, 14.0, -26.0, - -0.25, -0.75, 4.0, -44.0, -32.0, -5.5, 1.625, -8.0, - -44.0, -48.0, -0.75, 2.25, 6.5, -3.5, -24.0, 64.0, - -6.5, 4.5, -36.0, -24.0, -0.5, -10.0, 0.375, -0.5, - -0.625, -1.5, -4.0, 1.25, -22.0, 0.125, -0.625, -1.0, - -4.0, 24.0, 1.125, -6.0, -3.0, 0.0, -2.25, -12.0, - 6.5, 24.0, -1.0, -3.75, 0.25, -7.0, -48.0, -3.25, - 18.0, 16.0, -1.0, 14.0, 32.0, 28.0, 0.0, -16.0, - 1.5, -80.0, -32.0, 0.0, -0.4375, -96.0, -16.0, 6.0, - -0.6875, -12.0, 36.0, 1.25, -0.875, 16.0, 9.0, -48.0, - -1.5, 0.25, -24.0, 0.125, -12.0, -8.0, 20.0, -20.0, - 8.0, 1.125, -6.0, 112.0, 0.0625, -3.0, 7.0, -8.0, - -2.0, -8.0, 0.0, -0.5, -11.0, -0.8125, 24.0, 2.0, - -7.0, 6.5, 56.0, 1.75, 0.1875, -96.0, -56.0, 10.0, - 0.0, -13.0, 0.75, 1.375, -1.5, 2.25, 1.75, 0.0, - -7.0, -0.25, 28.0, -60.0, 6.0, 0.0, 0.125, 24.0, - -8.0, -16.0, -4.0, -112.0, -32.0, 0.875, 1.75, 10.0, - -3.5, 20.0, -0.375, -28.0, -24.0, 9.0, -2.5, -8.0, - -7.5, 0.25, -15.0, -15.0, 26.0, 16.0, 28.0, 1.0, - -0.5, 0.8125, -14.0, 0.75, -4.0, 18.0, 2.0, -0.125, - -48.0, 2.5, 44.0, -3.75, -3.25, 1.0, 52.0, 0.75, - -120.0, -0.1875, -0.75, -2.5, 112.0, 0.375, 0.1875, -2.0, - 5.0, -7.0, -1.875, 0.0, 60.0, 7.5, 0.125, -0.1875, - -1.25, 12.0, -2.0, -32.0, -20.0, 6.0, 2.0, -1.0, - -12.0, 12.0, 80.0, -1.75, -0.25, 7.0, -64.0, -0.5, - -3.0, -0.875, -2.5, -2.0, 0.9375, 26.0, 1.625, 4.0, - -20.0, 80.0, 24.0, 7.5, 7.0, -2.0, -48.0, 6.0, - -0.75, -12.0, 0.9375, -1.625, -32.0, -24.0, -2.75, 52.0, - -16.0, 12.0, -6.0, 10.0, -0.375, 24.0, 3.0, 1.0, - -9.0, -1.125, -7.0, 2.0, -8.0, 0.875, -8.0, 1.75, - 1.25, -48.0, -16.0, 1.0, -11.0, -0.5, -128.0, -1.75, - -4.0, 60.0, 2.0, -11.0, 
-4.0, -4.0, -0.1875, -0.4375, - -10.0, -14.0, 5.0, 88.0, -0.125, 8.0, -20.0, -36.0, - 56.0, 1.0, -1.5, 2.0, 14.0, -16.0, -60.0, -16.0, - -2.0, 0.4375, 40.0, 0.25, -4.0, -1.0, 36.0, 0.3125, - -44.0, 20.0, 64.0, -40.0, 80.0, 104.0, 0.25, 44.0, - 12.0, -32.0, 15.0, -5.0, -0.625, 3.0, 4.5, 26.0, - -28.0, 1.375, 1.0, -16.0, 8.0, 7.0, 0.5, 12.0, - 120.0, -96.0, 0.875, 0.4375, 0.25, -56.0, 1.25, 0.3125, - 13.0, -104.0, 6.0, 3.75, -10.0, 2.5, 56.0, -3.0, - -60.0, 0.875, 40.0, -0.9375, -15.0, -0.75, -28.0, 9.0, - -6.0, -32.0, 7.0, -2.0, 52.0, 0.0, -1.625, 30.0, - -2.0, -64.0, 0.375, -13.0, 0.0, -1.75, -1.25, 0.5, - 16.0, -1.5, 3.0, -4.0, -1.875, -22.0, -2.0, -0.125, - 3.0, -104.0, -40.0, -7.5, 0.125, -20.0, -0.9375, 2.0, - 112.0, 2.5, -20.0, -1.75, -9.0, -48.0, 6.0, 24.0, - -8.0, -0.25, 14.0, 3.0, -30.0, -3.0, -0.5, -2.25, - 1.125, -12.0, 56.0, 12.0, 1.25, -0.8125, 0.75, 28.0, - -30.0, 0.0, 56.0, 1.875, -2.25, -16.0, -0.125, 1.75, - -0.9375, -20.0, -6.5, 0.9375, 0.9375, -2.0, -0.75, -6.0, - -0.625, -104.0, 0.4375, 5.0, -28.0, -0.0625, -0.375, 0.9375, - 1.0, -0.0625, 0.3125, 8.0, -12.0, -0.875, -0.25, 96.0, - -3.5, -4.0, 0.375, 88.0, 56.0, 32.0, 3.0, -22.0, - -7.0, 4.0, 0.0, 8.0, -24.0, 0.5, -8.0, -14.0, - 5.5, -104.0, -4.0, -0.0625, 0.875, -0.25, 40.0, -3.0, - -5.0, 24.0, 64.0, -56.0, 9.0, -0.0625, -0.5, -36.0, - 1.375, -16.0, -16.0, -7.5, -6.0, 12.0, 1.125, -10.0, - 8.0, -30.0, -80.0, -4.0, 0.25, -15.0, 0.625, -8.0, - -0.25, -4.0, 9.0, -0.125, 44.0, 2.5, 0.8125, 30.0, - -1.0, -8.0, -1.25, 5.5, 0.0625, -120.0, -8.0, 8.0, - 48.0, -0.6875, 2.5, 48.0, 1.5, -6.5, 0.0, 24.0, - 1.5, 40.0, -2.0, -1.0, 0.0, -6.0, 0.0, 1.25, - 104.0, -88.0, -0.875, -2.0, 0.3125, -0.5625, 10.0, 7.5, - -6.5, -112.0, -1.75, -0.75, -0.0625, 4.0, 52.0, -0.0625, - 0.0625, -9.0, -3.0, -3.75, 48.0, 5.0, 2.0, 3.75, - 0.4375, 18.0, -60.0, -0.875, -16.0, -0.3125, -104.0, 0.75, - -10.0, -16.0, -1.5, -1.25, 1.5, 2.0, 22.0, 0.1875, - 0.8125, -44.0, 8.0, -3.5, -0.125, 4.0, 16.0, -1.0, - -12.0, 0.3125, -60.0, -6.0, 
-0.75, 3.25, -0.5, 4.0, - -0.9375, 48.0, -0.875, 4.0, -0.0625, -3.0, 3.5, 0.125, - -16.0, 1.0, -3.5, 0.875, 2.5, -1.75, -0.25, 64.0, - -56.0, -16.0, 1.25, -1.0, -2.0, -0.75, -3.5, -112.0, - 2.0, 1.25, 0.5, 2.5, 3.75, -0.25, -13.0, 10.0, - 0.0, -1.25, -8.0, -0.75, -128.0, -1.5, 5.0, 104.0, - -3.0, 0.75, -12.0, 88.0, -128.0, 2.5, 2.5, 0.25, - 1.25, 7.0, 0.25, 5.0, 14.0, -5.0, -4.0, -4.0, - -1.875, 88.0, -0.625, 0.5625, 0.625, 0.375, -2.0, 3.5, - 3.5, 60.0, 120.0, 0.5, -0.5, 10.0, 16.0, -0.1875, - -1.5, -56.0, 12.0, 72.0, -1.75, 0.0625, -128.0, -10.0, - 0.625, -1.5, 0.9375, -1.625, -8.0, -13.0, 44.0, 104.0, - 16.0, 1.5, -128.0, 0.25, -0.75, -3.0, -16.0, 0.625, - -28.0, -2.5, 8.0, -128.0, -0.6875, -6.5, 2.25, 16.0, - 0.75, -11.0, 9.0, -13.0, 8.0, -24.0, 3.0, -0.375, - 6.0, 5.0, 36.0, -8.0, -30.0, -26.0, -52.0, 0.25, - 20.0, 14.0, -10.0, 52.0, -0.1875, 1.75, -0.625, -0.875, - -3.0, 0.0, -7.0, 1.125, -0.3125, -2.0, 12.0, 14.0, - -12.0, -6.0, -1.5, -40.0, -3.25, 0.375, 28.0, 0.8125, - 0.625, -2.5, 40.0, 60.0, -1.0, -2.0, 24.0, 4.0, - 1.375, 28.0, -1.75, -6.5, -0.25, -24.0, -1.0, -44.0, - -16.0, -120.0, 7.5, 1.375, 24.0, -2.0, -13.0, 26.0, - -0.9375, 20.0, -9.0, -0.375, -12.0, 1.125, -0.1875, -2.25, - 0.625, 0.375, 8.0, 14.0, 64.0, -60.0, -1.5, 0.25, - -24.0, 64.0, -12.0, -40.0, 0.5, -0.75, 48.0, 0.875, - 0.0, 9.0, 10.0, -12.0, 0.75, -12.0, 0.0, -9.0, - -120.0, -1.5, -6.5, 26.0, 0.125, -3.5, 0.875, 0.375, - -1.0, -4.0, 1.375, -8.0, -0.6875, -1.0, 15.0, -2.0, - 30.0, 2.75, 3.0, 40.0, -3.5, 0.75, -0.75, -16.0, - 14.0, -1.5, 3.0, 6.0, -1.0, -2.75, 20.0, -32.0, - -1.75, 1.625, -4.0, -64.0, 2.75, -1.0, -64.0, -13.0, - -0.3125, 13.0, 22.0, 0.0, -1.75, 4.0, 60.0, -0.25, - -1.0, -4.0, 24.0, 5.5, 0.0, -1.75, -1.875, -0.5625, - 1.875, -9.0, 0.75, -3.0, 0.0, -6.0, -2.0, -8.0, - -4.5, 0.0, 2.0, 80.0, 6.0, 0.375, -4.0, 1.375, - 1.25, 1.75, 56.0, 6.5, 0.3125, 32.0, -72.0, 3.5, - -20.0, -26.0, -13.0, -0.3125, -7.0, 5.0, -32.0, -0.375, - -44.0, -1.5, -32.0, -24.0, 0.3125, 0.625, 3.0, 
1.125, - 0.875, -20.0, 14.0, -0.4375, -36.0, -4.0, 6.0, 0.375, - -0.5, -56.0, -1.5, 0.9375, 6.0, 1.875, 2.0, 1.5, - -96.0, 15.0, 28.0, -1.375, 48.0, 2.5, 0.25, 48.0, - 72.0, 9.0, -0.125, 48.0, -32.0, 26.0, -4.5, 6.0, - -0.875, -3.5, 14.0, -24.0, -128.0, -1.25, 1.875, -0.5, - 6.0, 0.1875, -48.0, 0.4375, -14.0, -56.0, 24.0, -0.875, - 36.0, 0.0, -10.0, 120.0, -2.0, 1.0, 3.5, -40.0, - 12.0, 48.0, -128.0, -1.875, 32.0, -120.0, 1.5, 1.125, - -1.75, -3.5, -2.25, 1.0, 4.0, -28.0, -1.125, 4.0, - -28.0, 44.0, 2.75, -0.6875, 0.5, 0.5, 72.0, 8.0, - -0.75, 15.0, 88.0, 4.0, 0.0625, 88.0, 96.0, -1.625, - -3.5, -8.0, 8.0, -0.375, 4.5, 0.125, 28.0, -1.0, - 80.0, -16.0, 16.0, -2.0, 6.0, 0.25, 3.0, 0.625, - -3.0, 12.0, -2.0, 48.0, 16.0, 56.0, 0.0625, -4.0, - 0.8125, 4.0, -3.75, -16.0, 11.0, 8.0, 16.0, 1.125, - 0.625, 15.0, -104.0, -14.0, -0.625, 48.0, 0.5, -0.75, - -6.0, -7.5, -7.0, -24.0, 2.0, -6.0, 1.5, -0.75, - -72.0, -0.25, 0.6875, 1.75, 2.5, 0.125, -4.0, 8.0, - -0.25, 0.125, -2.5, 7.0, 4.0, -7.0, 12.0, -96.0, - -24.0, -1.75, 6.0, 2.75, 24.0, -16.0, -5.0, -1.75, - 0.9375, 0.75, -0.75, 2.0, 0.0, 24.0, -48.0, 1.5, - -16.0, -1.25, 2.25, -20.0, 0.125, -48.0, -0.625, -32.0, - -1.0, 0.3125, -24.0, 14.0, -1.5, 22.0, 1.25, -52.0, - -18.0, 24.0, -22.0, 7.0, -1.25, 3.0, -88.0, -3.75, - 5.0, 5.5, 0.5, -80.0, 0.5, 10.0, 0.0625, 0.1875, - -4.0, -56.0, -3.5, 1.375, -0.625, -6.0, -2.0, -0.125, - 0.25, -5.5, -104.0, 1.25, -0.25, 3.5, 7.0, -40.0, - -2.75, -56.0, -40.0, -0.0625, -0.75, 32.0, -10.0, 1.875, - 2.5, 48.0, 0.9375, 4.0, -1.5, 1.625, -4.0, 8.0, - 40.0, -1.875, 4.0, 104.0, -4.0, -1.0, 48.0, -120.0, - -7.0, -22.0, 24.0, -6.0, 14.0, -104.0, 5.0, -24.0, - 3.5, 36.0, -36.0, 40.0, 120.0, -12.0, -0.5, 60.0, - -2.0, -12.0, -24.0, -5.0, 0.0, -2.5, 11.0, -3.0, - -5.0, 0.25, 6.5, 0.75, -0.25, 0.75, -12.0, 0.0, - 4.5, 40.0, 0.125, 0.125, 0.875, 36.0, -13.0, 6.0, - 7.0, 24.0, -1.5, 4.0, 2.0, -3.0, 0.375, 22.0, - -80.0, 0.375, 26.0, 80.0, 13.0, 10.0, 3.0, -0.1875, - -0.125, -3.5, 2.0, 1.5, 4.5, -0.25, 
-40.0, 3.0, - 1.5, -3.5, -16.0, 104.0, 1.5, 5.0, 20.0, -36.0, - 3.75, 8.0, -16.0, 0.5, -2.75, -36.0, 0.9375, -0.25, - 2.0, -1.0, -0.1875, -40.0, 1.0, -30.0, -0.1875, -24.0, - -4.5, -56.0, -0.375, -15.0, 2.0, -3.0, -96.0, 5.0, - -0.3125, 0.25, 0.75, 1.0, -15.0, -9.0, -44.0, -0.75, - -56.0, 10.0, 18.0, -1.25, -6.0, 10.0, 8.0, -3.0, - -1.75, -7.5, -16.0, 16.0, 22.0, -104.0, 0.0, 10.0, - -0.3125, -0.75, -2.75, -2.0, -64.0, 9.0, -1.0, -1.75, - 4.0, 52.0, 2.5, -4.5, 22.0, -30.0, 80.0, 24.0, - -48.0, 0.0, 16.0, -60.0, 0.0, -5.5, -88.0, 14.0, - -0.625, 1.0, 3.0, 64.0, -14.0, 12.0, -3.0, -1.125, - 0.4375, -6.0, -104.0, -1.75, 20.0, -4.0, 0.25, -64.0, - 32.0, 60.0, -2.5, -88.0, -0.5, 4.0, -6.5, 0.25, - -1.0, -48.0, -0.75, -112.0, 1.25, 40.0, -8.0, -0.25, - 0.0, -36.0, -2.5, -8.0, -12.0, -1.5, 9.0, -104.0, - -5.5, 64.0, 1.75, -6.0, 8.0, 3.75, 48.0, 2.0, - -2.0, -112.0, 48.0, -4.0, -1.0, -20.0, -32.0, 8.0, - -1.375, -0.75, 0.75, -8.0, 32.0, -13.0, 1.125, 48.0, - -12.0, -0.5, 4.0, -0.75, -1.375, -20.0, 0.5, 3.75, - -20.0, 32.0, 96.0, 0.5, 32.0, 40.0, 5.0, 0.5, - -1.0, 3.75, 16.0, 40.0, 1.75, -5.5, -1.0, -104.0, - 1.0, -56.0, -4.0, -1.375, -8.0, 2.0, -8.0, 0.75, - 3.75, 2.0, 60.0, 0.8125, -2.5, -0.1875, 112.0, 1.375, - 6.0, 26.0, -128.0, -4.5, 1.375, -80.0, 14.0, -22.0, - 1.375, -1.875, -0.5625, 9.0, -2.0, 2.5, 0.375, -60.0, - 28.0, -2.0, -0.75, -1.875, 64.0, 4.0, 32.0, 5.0, - 4.0, -32.0, -4.0, -40.0, 20.0, 0.75, -32.0, 44.0, - 48.0, -16.0, -32.0, 56.0, -2.5, -64.0, -120.0, 24.0, - 2.5, 40.0, 5.5, -0.125, -7.0, 14.0, -56.0, 4.0, - -56.0, 0.8125, -12.0, -2.25, 0.0625, -0.375, -1.0, -3.5, - -88.0, -18.0, 6.5, -0.4375, 32.0, 0.0, -0.375, 0.0, - 16.0, 0.25, 12.0, 3.0, -24.0, -22.0, -1.25, 2.75, - 1.5, 1.375, 5.5, -6.5, 80.0, 1.125, -0.125, 96.0, - -120.0, -0.5, -6.5, -14.0, 0.625, 4.0, -16.0, -1.0, - -6.0, 80.0, -12.0, -1.0, 72.0, 40.0, -120.0, -56.0, - 0.1875, -8.0, 24.0, 6.0, -1.25, -16.0, 5.5, -6.0, - 3.0, -13.0, 0.25, -1.0, -2.25, -6.0, -20.0, 0.0, - -2.0, -4.0, -6.0, -1.5, 
112.0, -13.0, 0.75, 0.5, - 6.0, 0.125, 3.0, 7.0, -80.0, 0.375, -1.5, -3.5, - 64.0, -2.5, -8.0, -128.0, 8.0, 64.0, -0.5, -8.0, - -26.0, -1.0, 24.0, -2.0, 2.25, -0.75, 2.0, -12.0, - 1.0, -12.0, -2.5, -2.5, 96.0, -0.3125, -20.0, -72.0, - 0.0625, 2.5, 24.0, 56.0, -88.0, 1.25, 40.0, 0.875, - -9.0, 9.0, -24.0, -48.0, -20.0, 0.8125, -0.5, -88.0, - 0.625, -40.0, 14.0, -5.0, -120.0, -6.5, 0.0, -5.0, - -6.0, -1.0, 0.0, 2.25, 1.125, -0.125, 112.0, 120.0, - 12.0, -1.5, 96.0, -0.0625, 64.0, 0.1875, -15.0, 12.0, - 4.0, 3.25, -30.0, -4.0, 14.0, -32.0, 3.5, 22.0, - 3.0, -4.5, -24.0, 0.375, -3.75, -5.5, -3.5, -9.0, - -16.0, -0.375, -20.0, -80.0, 0.75, -8.0, 1.25, -1.625, - -0.4375, -0.6875, -0.9375, 13.0, 3.5, 4.0, -0.0625, 0.25, - 16.0, 1.625, 6.0, 3.5, 104.0, -16.0, 2.25, -3.0, - 16.0, -26.0, -8.0, 0.8125, 0.0, -13.0, -5.5, -36.0, - -0.25, -6.0, 8.0, 4.0, 0.4375, -2.25, -1.125, 2.0, - -12.0, 1.5, 24.0, -1.0, 1.5, -7.5, 64.0, -1.125, - -0.1875, -56.0, 12.0, 4.0, -8.0, 16.0, 1.375, 0.5, - -14.0, -2.25, -11.0, -120.0, 40.0, -0.625, -1.625, 88.0, - -0.5, -3.0, 28.0, -32.0, -26.0, -112.0, -2.0, -14.0, - 24.0, 3.75, 0.625, -0.25, 96.0, 40.0, -56.0, -56.0, - -88.0, 6.0, -1.5, 6.5, 96.0, 1.0, 0.0, 3.5, - -0.5, 0.0, -16.0, 112.0, 4.0, -8.0, 3.5, -7.0, - -14.0, 2.0, -5.0, 16.0, -112.0, -0.375, -18.0, 2.5, - 0.375, 8.0, -0.25, -16.0, 0.5, 0.25, -0.5, 88.0, - -56.0, 10.0, -7.0, 0.5625, -1.75, 1.0, 30.0, -5.5, - 24.0, -0.75, 0.4375, 0.875, -3.0, 0.6875, 7.0, 3.0, - 1.25, -52.0, 7.0, -80.0, -1.5, 20.0, 112.0, -6.5, - 2.5, 48.0, 0.625, 0.0, -12.0, 2.0, -3.0, -56.0, - 40.0, -1.25, 20.0, -104.0, -20.0, -0.125, 10.0, -5.0, - -13.0, -24.0, 0.125, 8.0, 12.0, 2.5, 4.0, 3.0, - -0.375, 20.0, -0.75, 112.0, 40.0, 4.0, -26.0, 40.0, - 32.0, 4.5, 0.0, -0.1875, -2.5, 0.5, -44.0, 11.0, - 2.25, 0.5, 1.125, 3.5, 0.25, -0.25, 1.625, -80.0, - -18.0, 1.625, -4.0, 20.0, 1.75, -0.25, 96.0, 52.0, - 0.1875, 0.125, 6.0, -8.0, -7.0, 104.0, -8.0, -4.0, - 1.375, -3.25, -56.0, 0.875, -0.75, -0.25, -5.0, -2.0, - -48.0, 
104.0, 120.0, -0.125, -48.0, 32.0, -0.5625, -14.0, - 2.0, -10.0, -0.8125, -60.0, 0.0, -1.5, 0.375, 0.75, - -6.0, 6.5, -1.0, -1.5, 1.625, 1.625, -2.75, -2.75, - 5.0, -0.5625, 0.75, -1.0, 0.0, 3.0, 0.125, -4.0, - -8.0, -0.6875, 8.0, 6.0, 0.1875, 0.875, -14.0, -1.5, - -1.875, 22.0, -1.5, -3.0, 0.25, 1.5, 10.0, 2.75, - -0.125, 1.875, -64.0, -44.0, 64.0, 0.0625, -1.125, 7.0, - -6.0, 3.0, -1.25, -4.0, -1.0, -104.0, 16.0, -2.0, - 30.0, 0.625, 1.75, -0.6875, 0.5, 0.125, -0.3125, 40.0, - 0.75, 1.75, -0.8125, 1.625, -3.0, -1.25, -4.0, -0.5, - 7.0, -0.5, 0.625, -28.0, 1.875, -0.25, -64.0, 0.25, - 12.0, 0.0625, -0.125, -36.0, -15.0, 8.0, -3.5, -12.0, - 4.0, 0.0, 4.5, 32.0, -44.0, -4.0, -36.0, -0.75, - -44.0, 10.0, 8.0, 0.4375, -11.0, -4.0, 24.0, -0.25, - 0.625, 0.0, -3.5, 48.0, 104.0, 3.0, -1.375, 88.0, - 0.125, -56.0, 0.6875, 9.0, -30.0, -60.0, 8.0, 3.0, - -64.0, -13.0, 0.4375, -0.875, -28.0, -20.0, 3.5, 4.5, - 16.0, 28.0, 10.0, 56.0, -0.25, 0.5, 24.0, -1.375, - 3.75, 0.625, -10.0, -112.0, 0.0, 20.0, 6.0, -16.0, - -0.75, 22.0, -0.375, -0.625, -8.0, -32.0, 0.0, -5.0, - 0.25, 11.0, -72.0, 96.0, 32.0, 1.75, -64.0, 1.625, - 56.0, -56.0, -6.0, 0.0, 72.0, -3.5, -3.25, -1.5, - 14.0, -1.75, -7.0, -0.875, 0.875, 0.5, -0.75, -10.0, - -3.25, -10.0, -0.9375, 30.0, -26.0, 0.875, -0.5, -28.0, - -24.0, -20.0, -24.0, 0.0625, 56.0, 104.0, -0.75, 28.0, - -1.375, -3.0, -0.25, -32.0, 0.0, -0.25, -0.4375, 0.0, - 0.0, -56.0, 1.5, 28.0, -24.0, 0.0, 2.0, -2.0, - -120.0, 32.0, 18.0, -6.0, -1.375, 3.0, 3.0, -2.0, - -0.625, -112.0, 64.0, -24.0, -4.5, 12.0, -10.0, 0.3125, - -16.0, -3.0, -112.0, 0.75, -8.0, 24.0, -1.75, -1.0, - 0.875, -2.5, -1.25, 14.0, 11.0, 88.0, -96.0, -0.375, - 14.0, 16.0, -6.5, 120.0, 22.0, -1.625, -40.0, -2.0, - 0.0, -16.0, -1.0, -3.25, 16.0, 28.0, 72.0, -32.0, - -2.0, 32.0, -4.0, 88.0, -44.0, 11.0, 3.75, -60.0, - -52.0, 2.0, -0.25, -4.0, -1.25, 88.0, 2.0, 2.0, - 0.0, 1.0, -0.8125, 14.0, -32.0, -3.5, -1.25, 20.0, - 14.0, 0.5, -16.0, 1.25, 0.0, 0.1875, -2.25, -6.0, - 28.0, 12.0, 
11.0, 14.0, -16.0, -1.0, -16.0, 0.8125, - -72.0, -3.25, -1.5, -48.0, 0.125, 2.25, -24.0, 2.25, - -16.0, 9.0, -20.0, 22.0, 7.0, 14.0, -112.0, 0.875, - 28.0, 1.0, 0.375, 0.5, -1.5, -32.0, -40.0, -16.0, - -24.0, 3.5, -24.0, -8.0, 16.0, -10.0, -30.0, -16.0, - 4.0, 24.0, 0.4375, 6.5, -14.0, -18.0, -2.0, -3.0, - -11.0, -5.0, -2.0, -64.0, -1.5, 0.0, 3.5, -1.75, - -4.0, 18.0, -18.0, -1.125, -8.0, 0.625, 1.625, -15.0, - 36.0, 28.0, 4.5, 0.9375, 96.0, 1.0, -1.0, -0.1875, - -4.0, 24.0, 10.0, -7.5, 0.5, 56.0, 15.0, 0.375, - 24.0, 0.0625, -0.25, -14.0, -2.75, 2.0, 16.0, -0.625, - -7.0, -12.0, -3.5, -52.0, -0.5, -5.5, -0.5, 88.0, - 24.0, -0.75, -14.0, 0.0, -24.0, 5.0, 0.625, 0.75, - -0.25, -8.0, 3.25, 8.0, -8.0, -5.5, -0.125, -32.0, - 3.75, 28.0, -56.0, 72.0, 20.0, -2.0, -9.0, 2.75, - 0.375, -10.0, 6.0, 1.75, -8.0, -2.0, 28.0, 0.125, - 7.0, -14.0, 1.0, 15.0, -24.0, -28.0, -0.5, -1.0, - 1.0, -8.0, -0.5, -0.0625, -2.5, 48.0, -0.125, -7.0, - 1.5, 56.0, 14.0, 72.0, 1.0, -48.0, -0.125, -1.625, - 2.5, 0.4375, 6.0, -0.375, -64.0, -16.0, -64.0, 20.0, - -36.0, 16.0, -3.0, 1.5, -128.0, 3.5, 0.375, 0.125, - 48.0, -4.0, 0.75, -120.0, -14.0, 2.0, 0.375, -8.0, - 1.75, -5.5, -16.0, -28.0, 0.0, -24.0, 88.0, 72.0, - -8.0, -28.0, -80.0, -6.0, -4.0, 60.0, -16.0, 0.125, - 4.0, 1.0, 104.0, 3.0, -12.0, 6.0, -26.0, 4.0, - 8.0, -20.0, 48.0, 0.25, -4.0, -5.0, -28.0, -22.0, - -0.875, 24.0, -3.25, 1.5, -1.875, -20.0, 56.0, 15.0, - 0.5, -96.0, -2.5, 24.0, -8.0, -3.5, 8.0, 1.875, - -1.875, -1.5, 2.5, -0.1875, 2.5, 32.0, 56.0, -4.0, - 24.0, 1.0, 32.0, 3.75, -20.0, 1.0, 0.5, 40.0, - 2.75, 1.0, -7.0, 0.0, 5.0, -12.0, -0.875, 0.5, - -0.4375, 1.5, 14.0, 44.0, 22.0, -2.0, -1.0, 80.0, - -9.0, 2.25, -1.0, -3.0, 0.75, -0.3125, 52.0, 14.0, - -14.0, -8.0, -9.0, -0.875, -6.0, 0.125, 0.0, 0.25, - -8.0, 0.75, 1.125, 52.0, 3.5, -36.0, 24.0, 24.0, - -56.0, -}; -static data_t b_matrix[K_DIM * N_DIM] = { - -3.5, -1.0, -0.5, -3.5, -44.0, -40.0, -0.3125, 10.0, - 1.5, 28.0, 0.625, 7.0, 10.0, -120.0, 0.5, 16.0, - 2.0, 0.1875, 
12.0, 40.0, 96.0, -10.0, 1.5, 15.0, - -48.0, 56.0, -0.5, -1.5, 56.0, -0.125, 0.25, -0.9375, - -6.0, 7.0, 0.5, 3.0, 0.75, -6.0, 1.0, 96.0, - -32.0, -14.0, 4.0, -2.0, -5.0, -16.0, 10.0, -6.0, - 0.4375, 2.0, 28.0, 2.0, 2.0, 14.0, -10.0, 44.0, - -1.0, 0.5, -104.0, 1.75, -72.0, -0.75, 1.625, 10.0, - 0.1875, -14.0, 2.75, -2.0, 9.0, -80.0, 0.375, -0.125, - 2.5, -5.0, 1.0, 88.0, -9.0, 2.5, 30.0, 7.0, - 3.0, 24.0, 2.25, -64.0, -0.6875, 0.875, -112.0, -6.0, - -5.0, -64.0, -32.0, -14.0, 2.25, -0.3125, -32.0, 24.0, - -0.5625, -0.125, -0.9375, -1.75, 30.0, -6.5, 2.75, -3.25, - 112.0, 3.0, 0.5, 56.0, -1.125, 7.0, 1.25, 3.25, - -48.0, -96.0, 96.0, 13.0, -8.0, 16.0, -14.0, -10.0, - 3.0, 14.0, 52.0, 1.5, 1.375, 1.25, -11.0, -7.0, - 1.125, 112.0, 16.0, 0.0, -0.5, -28.0, 0.0, -72.0, - 32.0, 104.0, -15.0, 14.0, -16.0, -1.0, 0.0, 40.0, - 9.0, 0.5625, 0.625, -0.25, 88.0, 0.75, -36.0, -0.1875, - 0.4375, -104.0, -26.0, -1.0, 1.0, 11.0, 28.0, 0.0, - -28.0, -14.0, 0.0, -2.5, -128.0, 0.75, 0.5, -9.0, - 4.0, -72.0, -3.0, 0.5625, -22.0, 2.0, 26.0, 20.0, - 8.0, -56.0, 112.0, -3.5, -7.0, -22.0, -0.625, 1.75, - 3.5, -0.9375, 1.75, -0.75, -2.0, -2.0, -11.0, -1.75, - -8.0, -4.0, -1.5, -12.0, -48.0, -9.0, -22.0, 0.0, - -0.1875, 11.0, -0.5, -4.0, -1.75, 20.0, 0.5, -80.0, - -28.0, -2.25, -0.25, 28.0, 1.5, 40.0, -104.0, 1.25, - 44.0, 0.5, 28.0, -52.0, -1.5, -0.6875, 6.5, 0.5625, - 0.0, -1.5, -22.0, 16.0, -2.0, -6.0, -0.5, -1.5, - -16.0, -0.5, 10.0, 4.0, -24.0, -0.9375, 0.9375, 14.0, - 0.1875, 4.0, -16.0, 16.0, 1.75, -8.0, -16.0, 2.0, - -15.0, -80.0, 28.0, 0.5, -16.0, 80.0, -0.4375, 112.0, - -60.0, 0.0, -120.0, -2.5, -5.5, -16.0, 14.0, -0.75, - -72.0, 0.0, -3.25, -16.0, 48.0, 60.0, -2.75, -44.0, - -4.0, -1.0, 0.375, -128.0, 1.0, 14.0, -44.0, -112.0, - 5.0, -32.0, 0.25, 8.0, 8.0, -1.0, 120.0, -0.25, - -12.0, 0.0, 32.0, -4.0, 24.0, 0.9375, 96.0, -2.75, - 0.0, -0.375, 7.0, -2.25, 15.0, -0.4375, 1.0, 26.0, - -2.5, -1.0, 30.0, -52.0, -1.25, 1.625, -0.8125, 40.0, - 6.0, -6.0, -0.875, 24.0, 2.0, 0.0, 8.0, 
40.0, - -0.25, 12.0, 0.0, -16.0, 3.0, -1.75, 0.625, 2.0, - 3.75, 20.0, -64.0, -6.5, -1.5, -120.0, -0.5, -24.0, - -22.0, 3.25, 0.0, 112.0, -1.25, 4.0, -12.0, -7.0, - 120.0, 2.75, 5.0, -1.25, -2.0, 3.25, 56.0, 14.0, - -14.0, -1.0, -2.5, -1.0, 80.0, -3.0, 0.8125, -10.0, - -1.0, 0.4375, -5.0, -12.0, -7.5, -0.25, -10.0, -28.0, - -20.0, 6.5, -0.625, -26.0, -2.0, -3.75, 12.0, -16.0, - 0.0, -1.875, -16.0, -10.0, 16.0, 0.5625, -20.0, -10.0, - 5.5, -30.0, -12.0, 24.0, -7.0, 2.0, 4.0, -0.75, - 112.0, 1.5, -40.0, -0.125, 72.0, 40.0, 2.5, -4.0, - -22.0, 0.875, -4.0, 3.5, -4.5, 12.0, 0.9375, -26.0, - 28.0, 0.5625, -0.8125, -72.0, 0.5, -9.0, -2.5, -0.3125, - 2.25, -16.0, -8.0, -88.0, -6.5, 1.5, 104.0, -3.0, - -0.5625, -1.0, 15.0, -0.625, 2.5, -1.375, -72.0, 20.0, - -18.0, -0.5625, 0.0625, -0.875, 0.75, -40.0, -1.0, -12.0, - -1.5, -4.5, 2.75, 0.9375, 0.3125, 2.5, -0.375, -1.875, - -0.25, 5.5, 80.0, -3.5, -0.5, 2.25, 112.0, 64.0, - 7.0, -1.75, 0.0, -32.0, -16.0, -0.75, -2.0, 3.5, - 1.75, 0.8125, 2.25, 3.5, 15.0, 0.5, 26.0, -15.0, - 2.5, -8.0, -1.0, 1.625, 14.0, 1.0, 24.0, -5.0, - 0.0, -3.0, -1.5, 8.0, 0.25, 12.0, 0.125, 2.5, - -10.0, 0.625, -1.0, -4.0, 36.0, 1.25, -0.125, -1.25, - -4.0, -2.0, 2.0, -9.0, -60.0, -10.0, 0.25, -5.0, - 0.25, 7.0, -16.0, -26.0, -80.0, -8.0, 1.75, -26.0, - 6.0, -8.0, -128.0, -3.0, 0.0, -10.0, -28.0, -0.75, - -12.0, 10.0, -3.25, 3.5, -6.0, 56.0, 32.0, 0.8125, - -9.0, 20.0, -1.875, -14.0, 40.0, 0.6875, -4.0, -2.0, - -0.3125, 18.0, 1.0, 3.5, 3.5, 0.0, 80.0, 13.0, - -15.0, 6.5, 3.5, -0.1875, 72.0, 0.8125, 72.0, 0.0, - 0.6875, 0.375, -0.5, 10.0, 0.0, 104.0, 0.0, -128.0, - -0.75, 7.0, -3.0, -48.0, -112.0, 32.0, 24.0, 0.25, - -0.5, -2.5, 0.875, -0.75, 7.0, 24.0, -0.3125, -16.0, - 0.0, -60.0, -0.0625, 0.25, 0.125, -2.0, -0.125, -56.0, - -52.0, -56.0, 104.0, 40.0, -0.25, 1.75, 1.25, 4.0, - 0.625, 24.0, 32.0, -0.5625, -16.0, 3.75, -48.0, -30.0, - 104.0, -12.0, -18.0, 0.6875, -56.0, -2.5, -56.0, 4.0, - 14.0, -4.0, -16.0, 72.0, 13.0, -3.5, -5.5, -3.0, - -8.0, -2.0, 
48.0, 1.0, 4.0, 24.0, 0.625, 56.0, - 48.0, -0.75, 4.5, 13.0, 32.0, -32.0, 28.0, -1.625, - 0.125, 4.0, 1.25, -0.25, 2.5, 0.0, -1.875, 0.625, - -12.0, 56.0, 2.5, -8.0, -80.0, 12.0, -4.0, 3.0, - 0.75, 0.0, 1.875, 64.0, -30.0, 24.0, 11.0, 1.375, - -3.5, 0.375, -20.0, 0.375, 16.0, 6.0, 112.0, -12.0, - 7.5, -8.0, -1.875, -1.25, 0.0, -2.0, 8.0, 0.0, - 0.375, -1.25, -60.0, -16.0, 60.0, 1.0, -4.5, 1.5, - 0.75, -11.0, -1.0, 0.6875, -14.0, 0.125, -0.6875, 5.0, - -4.0, 1.0, 16.0, 2.0, 0.75, 2.0, 120.0, 60.0, - -32.0, 104.0, -0.3125, 120.0, 3.5, 24.0, -44.0, 0.875, - 10.0, 7.0, 10.0, 1.375, 1.375, -6.0, 36.0, 2.0, - -0.0625, 6.0, -3.0, -1.625, 0.4375, -28.0, 1.875, -2.5, - 4.0, 2.75, -4.0, -48.0, 0.0, -24.0, 0.375, 8.0, - 2.0, 0.5, 1.25, -88.0, 0.0, -6.5, 15.0, 5.5, - -7.0, 0.0, -36.0, -10.0, 28.0, -1.25, 24.0, 0.0, - -26.0, 1.625, -0.625, 11.0, -14.0, -0.125, -40.0, 88.0, - 32.0, -0.375, -3.5, -7.0, -3.25, -60.0, 0.375, -0.625, - 2.0, 14.0, 44.0, -32.0, 0.5, -14.0, -0.125, 0.5, - 1.125, 1.5, -12.0, -9.0, -28.0, -2.0, -1.75, 1.0, - -1.375, -0.625, -56.0, 1.25, -14.0, 0.75, 28.0, -4.0, - -36.0, -7.5, 0.75, -40.0, -0.25, 7.0, -1.5, 5.0, - -120.0, -7.0, -32.0, 2.25, -0.375, -1.75, 12.0, -2.75, - 1.875, -6.5, 0.5, -28.0, -2.25, -8.0, 0.875, -3.0, - -88.0, -24.0, -4.0, 1.875, 28.0, 104.0, -20.0, -7.0, - -32.0, 14.0, 56.0, -2.0, -7.0, 3.5, -10.0, -6.0, - 0.3125, 7.5, -0.25, -44.0, -5.0, -5.0, -2.5, 1.125, - -0.125, 3.75, 2.0, -36.0, 80.0, 0.25, -4.0, -6.0, - -52.0, -4.5, -28.0, -36.0, -64.0, -96.0, -2.0, 1.25, - -48.0, 20.0, 13.0, -7.0, -0.625, -2.0, 30.0, -0.625, - -1.25, 0.5, 44.0, -20.0, -52.0, -16.0, 0.1875, 12.0, - 0.5, 7.5, 2.5, 44.0, -0.0625, -0.625, 3.5, -2.0, - 0.375, 5.0, 10.0, 22.0, 2.0, -4.5, 9.0, -12.0, - 28.0, 1.0, 80.0, 0.75, -7.0, -1.25, 1.125, -0.75, - 2.0, -56.0, 40.0, -0.125, -8.0, -60.0, -26.0, -1.25, - -32.0, 24.0, 5.0, -4.5, -1.625, 1.875, 80.0, 6.5, - -0.5625, -0.25, 14.0, -0.75, 13.0, 24.0, 32.0, 24.0, - 28.0, 52.0, -3.75, -48.0, 7.5, 48.0, 1.5, -2.0, - -24.0, 
-6.0, -0.75, -0.4375, 11.0, -16.0, -1.25, 0.75, - 0.0, -32.0, 0.625, 16.0, 0.0, -3.0, 22.0, 36.0, - 7.0, -112.0, -88.0, -40.0, 26.0, 0.0, -8.0, -3.0, - 16.0, -48.0, 1.25, -11.0, 22.0, 11.0, -0.5, -6.0, - -112.0, -14.0, 1.0, -0.5, 2.0, 3.0, -1.0, -1.875, - 52.0, -10.0, -5.0, -4.5, -1.375, 0.125, -2.0, -44.0, - -24.0, 30.0, -12.0, 13.0, -32.0, 7.5, 0.25, -6.5, - -112.0, 15.0, -1.5, 3.0, 0.9375, 3.0, 56.0, 28.0, - -1.0, 0.375, 12.0, -28.0, 0.875, 0.625, 0.0, -14.0, - -2.0, -24.0, 1.0, 24.0, -8.0, -11.0, 9.0, 20.0, - -0.125, -0.9375, -6.0, 104.0, -0.625, 28.0, 40.0, 1.5, - 1.0, 0.0, 28.0, 0.0, 120.0, 40.0, 0.0, -1.0, - -4.0, -6.0, -30.0, 32.0, 0.375, 20.0, -30.0, 0.0, - 6.5, 0.875, 4.0, 4.0, -0.8125, -56.0, -8.0, 4.0, - -36.0, 2.5, 0.5, -12.0, -3.25, -12.0, 0.375, 80.0, - 14.0, -0.5, -0.625, -2.5, -14.0, 4.5, 120.0, -24.0, - 0.0, 1.875, 0.25, -0.9375, -0.5, 28.0, -1.0, 0.5, - 0.625, 1.0, -2.0, 24.0, -1.25, -0.875, 56.0, -2.25, - 12.0, 20.0, 12.0, 32.0, 8.0, -1.0, 18.0, -16.0, - 8.0, -72.0, 13.0, 24.0, 0.5625, -0.875, 0.9375, 0.25, - -3.0, 0.0, -2.0, 52.0, -80.0, 0.875, 6.5, -0.75, - -0.75, 2.5, 1.875, -1.0, 32.0, -20.0, 3.0, 60.0, - 24.0, -7.0, 0.0, -9.0, -5.0, 0.0, 11.0, 0.8125, - -1.0, -4.0, 18.0, 20.0, -3.5, 20.0, 0.0625, 0.375, - -2.0, 0.4375, -28.0, 3.0, -16.0, -0.875, 72.0, -4.0, - 1.75, 1.75, 18.0, 7.0, -12.0, -2.0, 52.0, -5.0, - -16.0, -0.875, -1.875, -32.0, -0.9375, 1.0, -5.0, 0.9375, - 48.0, 4.5, -0.9375, 0.75, -1.75, -2.5, 16.0, -0.5, - -0.5, -13.0, -28.0, -48.0, -2.0, 2.0, 16.0, -56.0, - -0.3125, 3.5, -8.0, -1.25, 1.25, 0.0, 24.0, 0.875, - 56.0, -56.0, -4.0, 8.0, 20.0, 0.4375, 3.5, -16.0, - 0.6875, 40.0, 26.0, 12.0, 0.4375, 0.0, -4.0, -2.5, - -20.0, -8.0, -0.5, 0.875, 0.75, 20.0, 16.0, 1.875, - -4.0, -4.0, -16.0, -2.0, 8.0, -1.125, 0.3125, 2.5, - 1.75, -1.125, -3.5, 0.5, 0.75, 0.875, -28.0, -72.0, - -1.625, 0.75, 2.0, 48.0, 24.0, 3.0, -0.1875, 0.1875, - -6.0, 12.0, 0.25, 2.0, 1.25, 0.0, 0.0, 22.0, - -2.0, 0.4375, 64.0, 1.0, -0.25, 0.0, 52.0, -26.0, - 0.125, 
-1.0, -12.0, 36.0, 6.0, -11.0, 8.0, -16.0, - 16.0, 0.5, 60.0, -32.0, -20.0, 0.3125, -0.75, -0.25, - -80.0, -4.0, -1.5, 2.0, -1.25, 0.375, 13.0, -2.5, - -1.375, 0.25, -1.0, -0.125, 13.0, 26.0, 26.0, -4.0, - 0.0, 96.0, 0.125, 0.1875, -40.0, 3.0, -9.0, 5.0, - 18.0, -1.75, 1.0, -0.375, -2.0, 2.0, -0.75, 0.375, - 0.125, -0.5, 6.0, 8.0, -6.0, -52.0, -16.0, 12.0, - 0.5, -20.0, -0.6875, -3.0, 1.125, -0.25, 0.25, 0.0, - -1.0, 18.0, -2.0, 6.5, -8.0, -3.75, 0.8125, 56.0, - 0.25, 24.0, -1.0, 0.75, -60.0, -0.8125, 3.0, -0.1875, - -0.5625, 1.875, 8.0, -0.375, 0.0, -1.25, 26.0, 0.75, - 18.0, -3.0, -72.0, -0.375, 1.0, 6.0, -1.625, -8.0, - -20.0, 48.0, 0.5, 0.0, 64.0, -52.0, -0.8125, -40.0, - 4.0, -3.5, 2.5, 3.75, 80.0, -0.6875, -0.0625, -1.0, - -3.0, -26.0, 3.0, -8.0, -0.75, 1.5, 1.25, -64.0, - -16.0, -4.0, -48.0, 1.0, -0.625, -64.0, 10.0, 0.5625, - 2.0, -96.0, -11.0, 1.875, -1.875, 15.0, -0.75, -0.5, - -0.5, -0.6875, 7.0, 3.75, 0.0, -26.0, 2.75, -104.0, - -3.25, 2.5, 22.0, 0.25, -2.0, 72.0, 3.5, 1.0, - 24.0, -24.0, -32.0, 10.0, -1.0, -0.4375, -0.1875, -0.5, - -14.0, 3.0, -4.0, -64.0, 24.0, -7.0, -11.0, 4.0, - -2.0, 1.375, -6.5, -24.0, 9.0, 28.0, -88.0, -2.0, - -3.0, -1.875, 48.0, -28.0, 28.0, 0.25, 24.0, -10.0, - 1.5, 6.0, -7.0, -48.0, -0.8125, -11.0, -8.0, -1.375, - 1.375, 24.0, -4.0, -0.0625, 40.0, -30.0, 12.0, -7.0, - -88.0, 3.75, 56.0, 0.0, 0.375, 0.0, -120.0, 4.0, - 120.0, 0.1875, 2.0, -20.0, -0.5, 2.25, 12.0, 0.0, - -0.5, -7.0, 6.0, -2.0, 2.0, 16.0, 0.75, -104.0, - -4.0, -1.125, 0.625, -0.375, -0.875, 7.0, -3.5, -0.75, - 6.5, -7.5, 40.0, -3.0, -0.9375, 120.0, 0.0, 2.0, - 20.0, -40.0, 12.0, 11.0, 32.0, -1.5, -0.25, 36.0, - -16.0, -0.5625, 0.6875, 72.0, -0.75, -2.0, -6.0, 5.0, - -96.0, -60.0, 4.0, -18.0, 24.0, 0.0, 48.0, -0.6875, - -1.75, 0.6875, -1.5, 56.0, -0.3125, -1.625, -6.0, 36.0, - 80.0, -3.0, 4.0, -2.0, 1.5, 120.0, 20.0, -8.0, - 0.25, 0.8125, -4.0, 44.0, -0.625, 6.0, 1.5, -2.25, - -30.0, 9.0, -0.75, -0.75, 15.0, -32.0, 30.0, 0.0, - 8.0, 5.0, 0.0, -3.0, 7.0, -2.5, 
-8.0, -0.25, - 2.5, -56.0, -0.125, -40.0, -36.0, -2.75, 18.0, -0.25, - -6.5, 0.75, -48.0, -5.0, -2.0, -1.0, -0.25, 4.0, - -3.5, 112.0, 48.0, -4.5, 3.5, 24.0, 1.25, 5.0, - -1.5, 24.0, -6.0, -2.0, -128.0, 5.0, 48.0, -3.5, - -0.6875, 10.0, -0.875, 1.25, 1.0, -2.0, -0.125, 0.0, - 6.0, 20.0, -4.0, 20.0, 0.4375, 32.0, 0.0, 0.5, - 20.0, -4.0, -3.5, -0.75, -6.0, 0.75, 1.0, 24.0, - -0.0625, 0.8125, 16.0, -0.75, 0.3125, -28.0, -0.875, -2.0, - -0.5, 0.25, -4.0, -0.5, -16.0, -0.5, 24.0, 10.0, - -44.0, -96.0, 0.875, 0.5625, 0.0, -72.0, -0.5625, -0.5, - 14.0, -4.0, 0.125, -5.0, -0.1875, -7.0, -3.75, -5.0, - -2.0, -30.0, -6.5, 0.5, -1.375, 0.5, -1.25, -3.0, - 16.0, -0.75, 1.25, -2.25, -2.75, 12.0, -6.0, -2.0, - -1.375, 30.0, -2.5, 0.5, -52.0, -0.5, -2.0, -8.0, - 40.0, 4.0, -2.0, 5.5, -24.0, 9.0, -12.0, 32.0, - -1.0, -16.0, -4.0, -48.0, 0.9375, -24.0, 1.75, -24.0, - 24.0, 60.0, 1.5, -0.0625, 16.0, 3.0, 0.25, -1.875, - -30.0, 0.0, -9.0, 1.5, -0.75, 16.0, 0.0625, 0.0, - -40.0, 36.0, 3.0, 0.0, 18.0, -72.0, -2.5, 7.0, - -16.0, -6.0, -7.0, -112.0, 0.0, -16.0, 40.0, 0.125, - -1.0, -24.0, -80.0, -30.0, 2.25, 2.0, -32.0, -24.0, - -4.0, -0.75, 0.8125, -8.0, -48.0, 3.75, 5.0, -6.5, - -6.0, 5.0, -40.0, 0.6875, -1.0, 0.125, 2.75, 5.0, - -0.75, 14.0, 0.3125, -3.5, 0.6875, 96.0, -0.8125, -20.0, - 0.6875, -1.0, 14.0, -24.0, 64.0, 3.5, 0.25, 24.0, - -2.0, -2.0, -18.0, 80.0, 3.25, -24.0, 0.4375, -60.0, - 0.0, -32.0, -1.0, 40.0, -0.875, -5.5, -12.0, 0.25, - 6.0, 0.0, 56.0, 11.0, -5.0, -20.0, 4.0, 15.0, - -4.0, -8.0, 0.5, 0.75, 14.0, -0.125, -6.5, -18.0, - 64.0, 2.0, -72.0, 13.0, -32.0, 6.0, 1.0, 1.0, - 0.0, -2.25, -18.0, -20.0, 0.875, 22.0, 2.0, -8.0, - 1.625, 104.0, -2.25, 1.75, 40.0, -1.375, 32.0, 9.0, - 1.0, -0.5, 24.0, 12.0, 0.75, 1.0, 1.0, -5.0, - 0.25, 8.0, 1.5, -0.125, 1.75, -3.5, 1.0, -28.0, - -120.0, -20.0, 0.375, -8.0, 72.0, -4.0, 0.0, -28.0, - 0.0, 3.75, -5.0, 6.0, 96.0, 1.625, 7.5, 1.5, - -3.75, 10.0, 32.0, -16.0, -0.25, -0.5, -0.25, -13.0, - -8.0, -1.125, -0.625, 24.0, 15.0, 7.5, 5.5, 
-22.0, - 7.0, -13.0, 5.5, -16.0, 0.875, -8.0, 40.0, -12.0, - -40.0, -1.5, 3.0, 60.0, 1.0, -64.0, 18.0, -3.5, - 3.75, 104.0, 4.0, -48.0, 0.4375, -96.0, 1.0, 0.75, - 3.0, -30.0, 6.0, 24.0, -1.25, -16.0, 24.0, -14.0, - 52.0, -16.0, 10.0, -13.0, 16.0, 0.25, 6.0, 8.0, - -13.0, 20.0, -10.0, 0.5, -1.0, -0.625, 1.75, -6.0, - -0.125, 24.0, -11.0, 0.0, -2.0, 56.0, 0.0, -0.875, - 52.0, -6.5, -0.3125, 1.875, -3.5, 1.25, 0.125, -1.0, - 36.0, -7.0, 24.0, 2.0, 0.625, -56.0, 56.0, -4.0, - 64.0, -6.0, 0.0, 8.0, 2.5, 0.375, -8.0, 16.0, - 0.1875, 8.0, 0.25, -0.3125, 5.0, 1.5, 4.0, 13.0, - 0.0, -8.0, 24.0, -22.0, 48.0, 1.75, -1.25, -2.0, - 1.75, 7.5, -1.0, -4.5, 1.875, -1.0, 1.5, 5.0, - -32.0, 0.0625, 4.0, -0.0625, 28.0, -20.0, -36.0, 1.0, - 0.4375, 24.0, -0.5, -2.25, -64.0, 3.25, -4.5, 7.5, - -64.0, 2.5, 20.0, -0.3125, 0.25, -4.0, -1.5, 3.0, - 112.0, 10.0, -2.5, -1.125, 0.0, 3.25, 1.0, -0.875, - 28.0, 0.75, 10.0, -0.3125, 16.0, -1.75, -16.0, -48.0, - -1.0, -0.4375, 0.25, -0.6875, 2.5, -0.3125, 3.75, -18.0, - 1.0, 0.8125, 0.125, 40.0, -56.0, 44.0, 13.0, 96.0, - 18.0, -2.25, 0.875, -24.0, -8.0, -16.0, -0.375, 1.25, - -0.5625, 112.0, 1.75, -128.0, -80.0, 0.5625, 36.0, 96.0, - 0.25, 3.75, 1.25, 0.5, -0.375, 10.0, 56.0, 8.0, - 15.0, -88.0, -14.0, -0.25, 0.8125, 7.0, 2.0, 0.5, - -0.5, -24.0, 0.3125, 112.0, -0.75, 0.0, 2.0, -2.0, - -0.8125, -32.0, -6.0, -0.5, -0.5, -40.0, -1.5, 104.0, - -56.0, 2.75, 36.0, -1.625, 15.0, -2.0, -1.125, -0.5, - 12.0, -0.5625, -3.75, -1.0, -4.0, 44.0, -0.375, -0.625, - -4.0, -0.3125, -4.5, 0.0, 0.0, 28.0, -88.0, -48.0, - -0.5, 1.25, -0.75, 8.0, -40.0, -2.0, 1.375, -16.0, - 15.0, 48.0, 0.5, -0.0625, 1.5, -7.0, 0.3125, 7.5, - -64.0, -2.0, -112.0, -4.0, -1.5, 0.875, 2.5, 26.0, - 1.0, 0.5625, -1.75, -10.0, -32.0, 8.0, -13.0, -1.125, - -120.0, 0.4375, -56.0, 10.0, -0.4375, -4.0, 8.0, -7.5, - -0.5, -10.0, -0.375, 1.75, 3.75, -0.25, 0.0, -0.5, - -8.0, 0.6875, -1.25, -56.0, -1.0, 0.0, 64.0, 48.0, - 8.0, -7.0, -104.0, -36.0, -44.0, -40.0, -8.0, 5.0, - -4.5, 32.0, 0.875, 
-24.0, 1.0, -16.0, 15.0, -6.0, - 4.0, -8.0, -8.0, -0.3125, -16.0, 2.0, -4.5, 72.0, - 0.875, -6.0, -0.4375, 0.0, 1.0, 32.0, 8.0, -0.875, - -0.625, -4.0, 7.0, -0.125, 40.0, 0.6875, 2.0, 3.5, - 0.375, -0.4375, -0.25, -40.0, -36.0, 48.0, 16.0, -1.0, - -1.75, -72.0, -0.25, -36.0, -40.0, -1.75, -3.5, -18.0, - -0.125, -5.0, -6.5, 48.0, -16.0, -0.5625, -1.0, 0.25, - -2.5, -4.0, 28.0, 3.5, -14.0, 15.0, 12.0, -0.25, - 2.0, -80.0, 12.0, 0.1875, 32.0, -0.25, -7.0, 0.125, - -56.0, -15.0, 1.25, -0.25, -24.0, -1.125, -16.0, -5.0, - 0.1875, 5.0, -88.0, 20.0, -1.75, -5.0, -128.0, 8.0, - 18.0, 48.0, -1.625, -8.0, -8.0, -28.0, 32.0, -0.0625, - 56.0, -8.0, 1.0, 0.25, 3.5, 28.0, 3.5, 48.0, - 40.0, 10.0, 11.0, 0.5, 1.625, 0.4375, -8.0, 2.25, - -24.0, -6.0, 0.625, -1.0, -4.0, 80.0, 2.75, -56.0, - -1.875, -6.0, 60.0, -9.0, -6.5, -1.5, 0.5, 0.5, - 40.0, 6.0, 4.0, -2.5, 1.5, 2.5, 0.9375, -0.6875, - 2.0, -1.75, -1.0, 0.0, 96.0, -5.0, -16.0, -1.75, - -40.0, 1.875, -1.0, 0.0, 1.0, -0.875, -80.0, 3.25, - -16.0, 12.0, -72.0, -14.0, 5.5, -1.0, -0.4375, -10.0, - -64.0, 8.0, 1.5, 2.25, 12.0, 52.0, 88.0, -0.75, - -32.0, -3.25, -22.0, -48.0, 22.0, 16.0, -40.0, 6.5, - 2.0, -0.25, 15.0, -36.0, -16.0, -0.375, 0.0, 0.125, - 0.0, 15.0, -1.75, -0.125, 2.25, -3.75, 120.0, -0.125, - -0.625, 3.0, -28.0, -5.5, -3.5, 6.0, -5.0, 4.0, - 56.0, 14.0, 5.0, 0.0, 14.0, -0.25, 0.5, 8.0, - 8.0, -0.75, -16.0, 0.5, 0.25, 0.5, -24.0, 30.0, - 2.0, 10.0, 6.5, -0.25, 0.8125, 12.0, -3.25, 1.25, - -1.0, -0.625, 0.6875, -0.375, -12.0, 0.25, 30.0, -8.0, - 0.25, -0.875, 0.25, -12.0, -3.0, -7.5, -1.5, -0.5, - -0.25, 0.0, -1.25, -16.0, 0.875, 2.75, 0.0, 104.0, - -24.0, -0.5, 72.0, 1.375, -16.0, -104.0, -2.25, -26.0, - 7.5, -6.0, 1.75, 0.875, -0.625, 0.875, 20.0, 9.0, - 20.0, 0.0, 16.0, -5.5, 12.0, 0.0, -56.0, 1.0, - -8.0, -6.0, -96.0, -28.0, 1.125, -0.25, 1.125, -12.0, - -5.0, -0.5, 2.0, -10.0, -28.0, -112.0, -0.625, -0.375, - 2.25, -16.0, -16.0, 6.5, -40.0, 1.0, 2.75, 0.75, - 4.0, 112.0, 2.0, 64.0, 9.0, 20.0, 6.0, -32.0, - -1.5, 
0.0, 60.0, -0.75, -7.0, 0.375, -5.0, 0.5, - 0.25, 0.625, 48.0, 60.0, 7.0, -2.0, -8.0, -0.0625, - 2.0, 4.0, 0.5, -64.0, 13.0, -0.875, 44.0, -24.0, - 7.0, 0.125, -15.0, 1.375, -0.875, 28.0, -22.0, 8.0, - -1.25, -60.0, -0.5625, -0.375, 1.875, -20.0, 0.0, -14.0, - 0.1875, -0.25, -1.875, -1.0, -1.0, 1.0, 3.75, -3.0, - -10.0, -20.0, -1.625, 0.0, 1.75, -5.0, -128.0, -3.0, - 6.5, 88.0, 0.4375, -4.5, 0.0, 10.0, 4.5, -5.5, - 4.0, -15.0, 28.0, 10.0, 0.875, 88.0, 0.875, -7.0, - -14.0, -40.0, -48.0, 56.0, 24.0, 3.0, 16.0, 104.0, - -1.5, 104.0, 0.625, -8.0, 1.25, -2.0, 0.375, 1.5, - -1.25, -44.0, -40.0, 60.0, 32.0, -1.0, -1.5, -128.0, - -112.0, -1.0, -2.5, 0.75, -1.25, 36.0, -48.0, 24.0, - -80.0, -6.0, 3.0, -80.0, 0.5, 60.0, 72.0, 0.75, - 24.0, 2.5, -0.1875, 88.0, 0.0, 0.0, 6.0, -0.1875, - -52.0, 120.0, 0.9375, 0.875, 16.0, -8.0, 20.0, -1.5, - 3.0, 0.5, 1.5, 4.0, 32.0, 0.9375, -0.625, 1.25, - -72.0, 32.0, 2.0, -0.5625, 0.0, 64.0, -120.0, -3.25, - -4.0, -104.0, -4.5, 3.5, 2.75, 0.0, -9.0, 24.0, - 1.5, -88.0, 32.0, 6.0, 0.875, -0.5, 48.0, 96.0, - -0.6875, -15.0, -0.1875, 22.0, 0.625, -1.0, -28.0, 0.375, - 112.0, -1.25, 24.0, -1.0, 12.0, -30.0, 72.0, 30.0, - 2.25, 1.5, -36.0, 0.875, -64.0, 4.0, -16.0, -20.0, - -4.0, -0.3125, 0.3125, 0.625, -88.0, -18.0, 13.0, 20.0, - 112.0, -32.0, -1.5, -7.0, 14.0, 2.5, 0.4375, -0.875, - -64.0, -1.375, 4.0, 3.75, -3.5, 88.0, -5.5, 7.5, - 36.0, 6.0, -3.75, 0.125, 6.0, 4.0, 1.125, 2.0, - 12.0, -112.0, 1.0, 0.0, 26.0, -3.0, 0.3125, 15.0, - 0.75, -28.0, 0.0, 8.0, -56.0, -14.0, -5.0, 0.1875, - 12.0, -0.375, 0.0, -12.0, 36.0, 0.75, 3.25, 0.0, - -0.25, 104.0, 2.25, 44.0, 112.0, 16.0, -2.5, -6.0, - 2.25, -28.0, 7.0, 0.3125, -16.0, -24.0, -3.0, -2.75, - -1.5, -96.0, -5.0, -15.0, 2.0, -11.0, 40.0, 3.75, - -1.0, -10.0, -3.5, -60.0, -2.0, 0.5, -0.1875, -11.0, - 0.9375, -5.5, -1.25, 0.25, -16.0, -0.1875, 8.0, 32.0, - 36.0, 8.0, -2.75, 8.0, 10.0, -1.125, -5.0, -1.5, - -14.0, -8.0, 3.25, -0.3125, 8.0, 120.0, 2.0, 8.0, - 16.0, 0.5, 0.625, -0.9375, -4.0, 13.0, 
-1.75, -0.3125, - -60.0, 20.0, -0.5, 44.0, 6.5, 1.75, 8.0, -0.125, - 5.0, -4.0, 12.0, -0.875, -72.0, 48.0, 0.6875, 4.0, - 120.0, 0.0, 16.0, 0.625, 24.0, 4.5, -52.0, 3.25, - -48.0, 12.0, 0.9375, 6.0, -0.4375, -20.0, 0.75, 80.0, - -12.0, -18.0, -32.0, -10.0, -6.0, -64.0, -2.0, 0.75, - 4.0, 2.0, 0.9375, -1.25, -3.25, -2.75, 40.0, -4.0, - 0.0, -1.125, -128.0, -13.0, 4.0, -1.875, -32.0, -3.25, - 12.0, -28.0, 5.0, 9.0, 10.0, 16.0, 12.0, 4.0, - -22.0, 48.0, 8.0, 48.0, -24.0, -16.0, -5.5, -2.25, - -18.0, -10.0, 22.0, 7.0, -1.5, -5.0, -0.625, -9.0, - 56.0, -6.0, -6.0, 0.5, -0.375, -4.0, -1.625, 0.875, - 0.625, 24.0, 0.0, 96.0, 24.0, -0.125, 9.0, -16.0, - 40.0, 2.0, 8.0, -44.0, 40.0, 0.4375, -11.0, 2.0, - -5.5, 16.0, 4.0, -0.625, 16.0, -10.0, 11.0, -2.0, - -0.25, -2.5, -1.75, 52.0, -0.8125, -48.0, -32.0, -14.0, - 16.0, -24.0, 1.5, -10.0, -0.625, 0.25, 0.0, -2.5, - -1.0, -80.0, -8.0, -96.0, -0.75, -64.0, 12.0, 1.375, - 0.5, -1.125, 104.0, -60.0, -0.1875, -56.0, -16.0, 48.0, - 0.0, 3.75, -3.0, -11.0, -0.25, -0.1875, -1.0, -4.0, - -0.5625, -2.5, -40.0, -64.0, -104.0, 15.0, -5.0, -7.0, - -1.5, -72.0, 18.0, 0.5, 32.0, 6.0, 0.75, -0.1875, - -24.0, 4.0, 1.0, 0.0, -0.5, -104.0, 1.5, 64.0, - 24.0, 16.0, 32.0, 0.125, -0.75, -0.25, 0.0, 0.0, - 40.0, 0.0, 0.0, -1.875, -3.25, 0.0625, -2.25, 1.75, - 0.6875, -1.0, -7.0, 40.0, -1.5, -14.0, 6.0, 88.0, - 48.0, 8.0, -120.0, -12.0, 120.0, -0.375, 16.0, 0.25, - 0.5, -1.25, -1.125, -52.0, 16.0, 1.375, -8.0, 8.0, - 0.75, 8.0, -7.5, 1.75, -0.875, 7.0, -7.0, 8.0, - -1.375, -13.0, 0.125, 0.0, 1.5, 3.5, -14.0, -1.0, - -2.0, 0.0625, -30.0, 1.875, -1.375, -56.0, -26.0, -5.0, - 8.0, -56.0, 10.0, -8.0, 1.0, 0.25, -14.0, -1.375, - -0.9375, 0.75, 0.375, -2.5, 0.75, 22.0, 1.375, 0.125, - 1.625, 2.5, 88.0, 1.0, -5.5, 12.0, 8.0, 112.0, - 32.0, -80.0, -2.0, -6.0, -24.0, -8.0, 0.5, 3.5, - 112.0, -22.0, 8.0, -0.125, 12.0, 52.0, 4.0, 0.25, - 0.125, 7.0, 24.0, 13.0, 0.3125, 8.0, -5.0, 40.0, - -1.875, 8.0, 14.0, 20.0, -0.125, -6.5, -6.0, -40.0, - 0.0, -0.75, -0.125, 
0.875, 0.0, -1.0, -3.0, -96.0, - -0.8125, -0.875, -28.0, -0.5, 36.0, -16.0, -0.9375, 5.5, - 12.0, -8.0, 0.25, -72.0, 28.0, 24.0, -0.6875, -1.0, - 0.0, -0.25, 40.0, -8.0, -0.375, 0.125, 72.0, -1.0, - 1.0, 0.0, -1.5, -120.0, 3.5, -0.5, 0.9375, 96.0, - 3.0, 11.0, 12.0, 5.0, 8.0, 12.0, 1.5, 0.0, - -1.125, 28.0, -7.0, 0.0, -7.0, -30.0, -80.0, 2.75, - -2.0, -30.0, -64.0, 0.5, -14.0, 4.0, 0.3125, 0.0, - 0.75, 0.0, 3.25, 5.0, 0.125, 0.5625, -3.0, -2.0, - 3.75, 60.0, 0.0, 13.0, 12.0, -16.0, 2.0, -0.5, - 6.0, 2.75, -24.0, -56.0, 5.5, 0.5, 40.0, -60.0, - -24.0, -40.0, -1.25, -1.625, -0.5, -56.0, -22.0, -96.0, - -28.0, -2.0, 8.0, 12.0, -96.0, -0.125, -0.125, -6.0, - -2.0, -2.0, 0.375, 20.0, 16.0, -4.0, 5.5, -2.0, - 16.0, 0.0, 13.0, -104.0, 0.0, -0.125, -120.0, -9.0, - 1.25, 0.75, -0.25, 5.0, -6.0, 24.0, -48.0, 12.0, - -0.6875, 112.0, 1.25, -56.0, -18.0, -24.0, -1.125, 16.0, - 0.3125, 0.0, -48.0, -18.0, 30.0, 64.0, 3.0, -120.0, - 24.0, 20.0, -1.75, -20.0, -1.875, 10.0, 1.5, 14.0, - 0.5625, -4.0, 4.0, 14.0, -48.0, 2.0, 0.0, 44.0, - 14.0, -6.0, 0.0, 0.4375, -1.0, 24.0, -128.0, -88.0, - 16.0, -56.0, -2.25, 3.0, 1.0, -120.0, -6.0, 88.0, - -48.0, 9.0, -128.0, -14.0, 1.25, 7.0, -8.0, 0.6875, - 44.0, -1.5, -32.0, 0.5, 2.0, -8.0, -2.5, -0.9375, - -0.6875, -18.0, -5.0, -0.5, -5.5, -0.6875, -1.0, 28.0, - 72.0, -2.0, 64.0, 0.375, -2.5, 3.5, -13.0, 1.5, - -1.25, 5.0, -0.625, -2.0, -1.75, -0.125, -3.75, -0.6875, - -0.75, 0.5, -8.0, -0.5, -3.0, -52.0, -36.0, 1.25, - 0.0, -1.5, -15.0, -64.0, -1.25, -0.9375, 56.0, 72.0, - 0.125, 0.5, -2.0, -4.5, 2.5, 1.625, 1.0, 2.75, - 32.0, -1.125, 10.0, -0.5, 30.0, -0.75, 0.625, -8.0, - -6.0, 0.25, -15.0, 44.0, -0.625, 1.0, 56.0, -3.0, - 26.0, -2.75, 6.0, -2.0, 0.6875, 16.0, 6.5, 1.75, - 4.0, -2.0, -120.0, -0.75, -0.375, 0.0, -3.0, 16.0, - 1.75, -12.0, -20.0, -11.0, 56.0, -7.5, -2.0, 24.0, - -28.0, -28.0, -0.25, 0.625, -1.75, -4.0, -0.5, -44.0, - 15.0, 0.75, -3.0, -4.5, -1.25, 80.0, 24.0, 4.0, - 6.0, 0.4375, 32.0, -24.0, -5.0, 80.0, -32.0, 1.5, - 14.0, 
-2.5, -2.5, -13.0, -4.5, 16.0, -1.0, 0.5, - 3.5, 0.25, -2.0, -0.75, -1.375, 0.75, 3.0, -0.5, - 0.75, -9.0, 0.25, 96.0, -56.0, -6.5, -16.0, 1.125, - 0.625, 8.0, -3.25, -64.0, 64.0, 56.0, -0.25, 16.0, - -1.625, 6.0, 112.0, 14.0, -72.0, -1.375, 0.75, -1.625, - 48.0, -6.0, -56.0, 5.0, 112.0, 6.5, 4.0, 4.0, - -4.0, 0.75, -26.0, -4.5, 0.75, -0.625, -2.25, -0.375, - 0.6875, -24.0, -0.0625, 44.0, 28.0, 0.0, -1.75, -6.0, - -18.0, -3.75, 3.0, -7.0, 0.0, 0.4375, 0.5, 26.0, - 0.0625, -1.625, 32.0, 0.9375, 48.0, 48.0, -0.3125, 2.5, - 6.0, 1.5, 56.0, 0.625, 6.0, 2.0, -2.5, -2.75, - 12.0, -104.0, -3.25, -60.0, 88.0, -80.0, 1.25, 0.6875, - -56.0, -0.375, -6.5, -0.125, -8.0, -0.5, -28.0, -0.8125, - -22.0, 0.625, 3.5, 12.0, 52.0, 120.0, 9.0, 1.0, - -36.0, 10.0, 52.0, -0.25, 4.0, -14.0, 0.375, 1.125, - 36.0, 0.5, 0.625, -1.25, 0.0, -60.0, -0.0625, -14.0, - -0.0625, -1.75, 1.75, -64.0, -40.0, -0.25, -30.0, 24.0, - -16.0, -1.5, -0.625, 1.75, -96.0, -1.375, -32.0, 28.0, - -2.0, 0.75, 4.0, 10.0, -0.625, -0.125, -1.0, 4.0, - -2.0, 0.75, -72.0, -2.0, -0.5, -0.5625, -0.6875, -0.25, - 14.0, -18.0, 80.0, -0.75, 2.0, -60.0, 48.0, -60.0, - 80.0, 30.0, 30.0, 4.5, 4.0, 2.0, 16.0, 3.75, - 8.0, 8.0, 28.0, -11.0, -0.5, 0.25, 15.0, -22.0, - -4.0, 5.0, 0.5, -9.0, 0.0, 0.375, -2.0, 40.0, - 0.9375, -0.75, 0.5, 64.0, -2.25, 2.0, -0.125, -0.875, - -14.0, -4.0, -44.0, -56.0, -5.0, -14.0, 1.25, -32.0, - -0.9375, -72.0, 4.0, -4.0, 0.9375, -1.375, 0.125, -3.5, - 3.25, 3.0, 1.5, 0.75, -0.5, 1.125, 6.0, -1.0, - -9.0, 5.0, -2.0, -1.75, 1.75, 5.5, 0.0, 0.0, - -7.5, -3.25, -64.0, -2.0, -0.375, 0.875, -16.0, 1.0, - -30.0, -30.0, 2.0, -1.75, 0.875, -2.5, 0.25, 0.0, - 0.0, 0.5, 1.0, -2.0, -0.75, -28.0, 2.5, 56.0, - 2.75, 0.75, 3.0, 8.0, 1.125, 2.0, 3.0, -6.0, - -4.0, 28.0, 1.25, 12.0, -0.5, 0.0, 6.0, 1.0, - -2.0, 0.125, 4.0, 0.0, -30.0, 0.6875, 60.0, 14.0, - -4.0, 8.0, -12.0, -0.5625, 0.375, 0.9375, 72.0, -28.0, - 2.25, -0.3125, -7.0, 0.75, -0.25, -15.0, -20.0, -0.5, - 0.75, -0.625, -26.0, -3.75, 112.0, -1.375, -2.5, 
5.0, - 11.0, -52.0, 6.5, 0.75, 12.0, -0.125, -80.0, 14.0, - 4.0, 0.0625, -26.0, 1.25, 12.0, 1.625, -1.25, 0.125, - -16.0, -8.0, -1.25, 88.0, 22.0, -7.0, -7.0, -3.75, - -2.0, -56.0, -0.0625, -2.5, -56.0, 3.0, 0.625, 6.5, - -1.5, 4.0, 80.0, 0.6875, 0.0, 0.8125, 2.25, 0.375, - 0.5, 8.0, -8.0, 0.5, 0.0, 32.0, 2.5, -1.125, - 0.125, 4.0, -6.0, -0.25, 1.625, -3.0, 56.0, 56.0, - -20.0, -32.0, -60.0, -8.0, -10.0, 48.0, -24.0, -60.0, - -2.0, -3.0, -88.0, -16.0, -7.5, 8.0, -0.375, 22.0, - 16.0, -0.25, -14.0, 30.0, 6.0, 2.5, -0.75, 13.0, - -1.5, 0.875, 0.75, 22.0, 0.875, 6.0, -6.5, -96.0, - -0.75, -0.625, -104.0, -6.0, 40.0, -1.5, -0.5, 5.0, - 1.0, -0.875, 1.125, 0.8125, -4.0, 4.0, 16.0, 13.0, - 64.0, 1.0, 56.0, 112.0, 40.0, 3.0, 14.0, 0.5, - -2.5, -4.0, -26.0, -0.4375, 10.0, 0.875, -0.4375, 16.0, - -32.0, 7.5, -10.0, 8.0, 18.0, 6.5, -0.375, 0.125, - 1.375, -0.25, 56.0, -48.0, 0.75, -1.875, 1.5, 14.0, - 64.0, -0.5, -40.0, -7.0, 4.0, 8.0, 0.0, -0.5, - 8.0, 10.0, -5.0, 0.625, -52.0, -120.0, -1.625, 14.0, - -0.25, 0.0, -2.0, -13.0, 88.0, 13.0, -44.0, 5.0, - -8.0, -0.625, 0.5, 32.0, -1.625, -112.0, 14.0, -2.0, - -10.0, 30.0, -112.0, 0.25, -10.0, 0.4375, 0.125, 0.0, - -2.75, 60.0, -36.0, -4.0, -4.0, -5.0, -6.0, -0.25, - 52.0, -7.5, 1.0, -0.25, 2.0, -11.0, 1.0, 2.0, - 0.0, 0.625, 1.75, 0.75, 13.0, 80.0, -0.75, -1.5, - -14.0, 1.0, -60.0, -0.625, -0.25, 0.0, -8.0, 104.0, - 24.0, 6.0, 1.5, 2.0, 2.0, 15.0, -11.0, 26.0, - 0.625, -96.0, -0.25, -72.0, 0.25, -40.0, -13.0, -0.75, - 0.5, -24.0, 0.625, -12.0, 22.0, -1.125, -8.0, 0.625, - -1.0, -0.3125, 0.3125, -1.25, -26.0, 2.5, -0.4375, 0.0, - 2.5, -1.0, -28.0, 4.5, 24.0, 0.0, -24.0, 9.0, - -80.0, 1.75, -3.75, -0.4375, 44.0, 0.375, 0.75, -14.0, - -9.0, -0.625, 7.5, 2.25, -2.0, 1.75, 2.0, -0.5, - -2.0, -0.5625, 112.0, -6.5, -26.0, 9.0, 30.0, 8.0, - 16.0, -2.0, 2.0, -0.875, 0.9375, -3.5, 12.0, -0.125, - -0.75, -48.0, 0.25, -24.0, 1.0, -1.375, 0.0, 0.75, - 8.0, 120.0, 1.875, -14.0, -0.875, -3.75, -10.0, -14.0, - 0.75, -96.0, 3.5, 2.5, 0.0, 72.0, 
13.0, -8.0, - -2.0, 3.5, 40.0, -1.875, -88.0, -1.5, 4.0, -9.0, - -12.0, -6.0, 26.0, 48.0, 1.5, 0.375, 2.25, 8.0, - 1.875, -3.0, -6.0, -1.125, -0.125, 1.0, -0.75, 0.5, - -3.5, 28.0, 1.25, 40.0, 64.0, -7.0, 2.0, -8.0, - -0.75, -0.875, 52.0, -10.0, -28.0, -0.8125, -2.0, 24.0, - -4.0, -0.8125, -0.125, 24.0, -3.0, 2.0, -0.4375, -1.625, - 48.0, 14.0, -8.0, -64.0, 0.75, -10.0, 104.0, -11.0, - 3.0, -128.0, -4.0, 0.0, -2.0, -8.0, 0.0, 2.0, - 48.0, -24.0, -6.5, -40.0, 30.0, 1.0, 14.0, -5.0, - -0.125, -16.0, -2.5, -3.0, -0.25, -80.0, -20.0, 1.0, - -20.0, -2.0, 1.5, -0.625, -1.75, -2.75, 0.25, 24.0, - -1.375, 0.625, -60.0, 56.0, -2.75, 15.0, -2.0, -15.0, - -10.0, 0.25, 0.75, 0.75, 20.0, 32.0, -20.0, -0.625, - -8.0, -8.0, 13.0, -24.0, 1.875, 0.0, -96.0, 36.0, - 4.0, -1.0, 3.0, 72.0, -8.0, 1.0, 1.0, 5.0, - 0.125, 3.5, -5.0, -32.0, 5.0, -12.0, -20.0, 0.75, - 0.875, 40.0, -1.625, 0.0625, -3.75, 52.0, -24.0, -7.0, - 11.0, -1.5, -2.0, 20.0, -16.0, -0.5, -40.0, 12.0, - 2.0, -40.0, 60.0, -0.625, -18.0, -0.375, 18.0, -0.25, - -0.6875, 1.0, 32.0, 48.0, -10.0, -20.0, 15.0, 0.4375, - 20.0, -8.0, -40.0, 1.0, 20.0, 28.0, -3.0, -1.125, - 56.0, 13.0, 36.0, 26.0, -7.5, -10.0, -0.5625, -12.0, - -10.0, -15.0, -0.3125, 0.125, -2.0, 2.0, 2.25, 28.0, - 0.875, -0.375, -2.0, -104.0, -8.0, 18.0, 14.0, 26.0, - -30.0, 1.5, -1.875, -0.875, 36.0, -3.25, -26.0, -12.0, - -18.0, -1.0, 0.0, 2.5, 0.1875, 28.0, -0.6875, -3.0, - 0.25, 5.0, -6.5, 0.0, 0.25, -12.0, -3.5, -0.625, - 5.0, -40.0, -22.0, 0.5, -18.0, -2.5, -22.0, -56.0, - -1.0, -2.0, 2.0, 1.25, -60.0, -18.0, 10.0, 56.0, - -8.0, 0.875, -28.0, -26.0, 7.0, -112.0, -16.0, 11.0, - -16.0, 4.0, -0.4375, -0.5, 4.0, 72.0, -28.0, -8.0, - 0.9375, -2.5, 64.0, -32.0, -0.875, -16.0, -64.0, 1.625, - -36.0, -16.0, 0.3125, 14.0, -0.5, 8.0, 0.5, 14.0, - -28.0, -1.0, 24.0, -0.8125, -0.5, -7.0, 0.8125, -0.6875, - 0.5, 20.0, -0.125, 6.0, 11.0, -0.4375, 8.0, -4.0, - 0.3125, 0.0, 0.6875, -60.0, -16.0, -112.0, -1.0, 5.0, - -11.0, 1.625, -1.75, 0.75, -6.0, -6.5, -0.5, 16.0, - 
0.0, 5.0, 0.5, 0.25, 3.0, -0.4375, -15.0, 60.0, - 16.0, 28.0, -3.5, 5.0, -22.0, 44.0, -52.0, -112.0, - -15.0, 26.0, -4.0, -48.0, -128.0, 2.0, 112.0, -2.25, - -30.0, 6.0, -5.0, -1.25, 0.4375, 44.0, -1.25, -88.0, - -96.0, 6.0, 1.25, 3.25, -60.0, -0.875, 9.0, 3.25, - -1.5, 6.0, -16.0, 8.0, -1.5, -1.0, -32.0, -0.5, - -64.0, -7.0, -32.0, -2.0, 1.375, 24.0, 2.0, 0.4375, - 104.0, 24.0, -12.0, -3.5, -28.0, 4.0, 0.0, 5.5, - 104.0, -5.5, -1.875, 12.0, -8.0, -2.0, 0.5, -32.0, - -0.5, -22.0, 44.0, 2.0, -6.0, -6.5, -0.75, 0.0, - 3.0, -18.0, -20.0, 88.0, 1.75, -0.6875, 12.0, -1.0, - 20.0, -0.25, 20.0, -0.25, 1.0, -26.0, -32.0, 3.5, - 14.0, -14.0, 2.75, -12.0, 5.0, -12.0, -5.5, 0.5625, - -8.0, -36.0, -3.5, -4.5, -3.5, 5.0, 1.0, 1.125, - -1.5, -1.875, 0.0, 3.0, -1.25, -16.0, -3.5, -14.0, - 0.0, 52.0, -0.625, 10.0, -0.375, 0.0, -44.0, -3.5, - 1.0, 0.0, 0.5, 15.0, 96.0, 0.9375, 9.0, 0.0, - -3.5, 8.0, -96.0, 40.0, -2.0, 18.0, 1.0, -0.8125, - -104.0, 0.375, 7.5, 40.0, 2.75, 0.375, 56.0, -1.0, - 48.0, -0.9375, -14.0, 1.5, 3.5, 40.0, -1.0, -5.0, - -1.0, 48.0, 18.0, -0.6875, 3.0, -3.5, -1.5, -5.0, - -16.0, -0.4375, 0.875, -16.0, -7.5, 4.5, -0.625, 2.0, - 22.0, -4.0, -20.0, 2.0, 2.25, 64.0, 0.1875, -2.0, - 1.375, -44.0, -30.0, -2.0, -0.625, -64.0, -5.0, 16.0, - 48.0, -2.0, 1.0, -1.75, -3.0, -4.0, 48.0, -48.0, - -16.0, 20.0, -1.75, -2.0, -20.0, 2.0, -1.0, -4.0, - 6.0, 0.25, -3.0, -32.0, -16.0, -1.75, 0.5625, 8.0, - 3.25, 0.0, -14.0, 14.0, -0.375, -1.75, 12.0, 1.0, - 32.0, -3.5, -0.375, -3.75, 1.0, 3.5, 7.5, -1.5, - -88.0, 0.5, 0.0625, 1.875, -0.375, -3.75, -24.0, 10.0, - -0.5625, -4.0, -1.875, -0.625, 0.8125, -0.125, 0.625, -1.0, - 3.5, 0.5, -10.0, -3.0, -5.0, 10.0, -15.0, 0.5, - 16.0, -26.0, -1.0, 0.875, 12.0, -16.0, 0.5, 1.75, - 0.0625, 0.0, -5.0, 0.25, 13.0, 80.0, 0.375, 0.0, - 2.5, 20.0, 1.0, 72.0, -0.5, 20.0, -0.6875, 0.0, - 52.0, 12.0, -1.625, 0.0, 0.0, -6.0, 64.0, -0.375, - 0.5, 11.0, 16.0, -8.0, 0.375, -0.75, 14.0, -14.0, - -4.0, 104.0, -5.5, -12.0, -0.875, -22.0, 112.0, -3.0, - 
0.875, -8.0, -3.5, -24.0, -30.0, 48.0, 120.0, 16.0, - -128.0, -2.25, -0.75, -44.0, -0.5, 6.0, -104.0, 0.3125, - 20.0, -120.0, -3.0, 4.0, -64.0, -16.0, -3.5, -16.0, - -0.4375, 60.0, 1.0, -2.25, -26.0, 20.0, -1.0, -24.0, - 10.0, -16.0, 40.0, -2.75, 16.0, -1.75, 0.75, 40.0, - 22.0, -3.5, 12.0, -11.0, 1.5, 120.0, 1.5, 0.3125, - -24.0, -16.0, -72.0, 2.75, 1.5, 2.0, -6.0, 36.0, - -8.0, -26.0, -12.0, 2.5, 60.0, -1.75, -0.625, -0.25, - 4.0, -0.75, -0.5, 56.0, 0.0, 0.5, -11.0, -0.75, - -22.0, -0.75, -0.875, 1.25, 0.375, 8.0, -1.0, 80.0, - 56.0, -8.0, 28.0, -72.0, 0.25, 32.0, 112.0, -1.5, - -0.75, 1.875, 4.5, 0.0, 12.0, -20.0, 20.0, 7.0, - -1.25, 7.0, 0.8125, -4.0, -40.0, 12.0, 30.0, -24.0, - -0.75, 0.0625, 0.5625, -3.75, -5.0, 2.0, -9.0, 0.875, - -3.5, 1.0, 12.0, 0.0, 6.0, -1.0, 1.5, 40.0, - 60.0, -0.125, 3.0, 5.5, -72.0, 12.0, 0.0, 10.0, - -12.0, -3.75, 2.0, -8.0, -3.0, -18.0, 15.0, -48.0, - -4.0, 0.25, -0.5, -1.75, 11.0, 1.75, -0.375, -8.0, - 0.0, 0.5625, 1.5, 0.0, 0.0625, -6.0, 11.0, -120.0, - 8.0, -0.375, -3.5, -4.0, 0.375, 26.0, 28.0, -8.0, - -0.625, 1.75, -10.0, -0.1875, 3.0, -0.5, -60.0, 0.75, - -1.25, 0.0, 5.5, -1.5, -1.5, -80.0, -1.0, 0.8125, - -2.0, -2.0, -4.0, 0.1875, -32.0, 72.0, 7.0, 32.0, - 56.0, -8.0, 26.0, -9.0, -6.0, 10.0, -6.0, -36.0, - -1.0, -0.375, 4.5, 8.0, 0.75, 0.3125, -3.0, -2.0, - -0.375, -0.625, 0.25, 0.3125, -1.625, 26.0, -16.0, 24.0, - -0.125, 2.75, 1.0, 16.0, -104.0, -20.0, -56.0, -7.0, - -0.875, -4.0, 0.25, 2.5, 6.0, 0.5, 24.0, -16.0, - 0.875, -0.3125, 1.25, -36.0, -32.0, -0.5625, 120.0, 48.0, - 2.0, 1.5, 0.25, -3.5, -40.0, 3.75, 22.0, 24.0, - -64.0, -1.0, 10.0, -128.0, -56.0, 3.5, 104.0, 2.0, - -1.75, -15.0, -14.0, 40.0, 0.9375, 3.0, 24.0, 5.0, - 1.25, -10.0, 15.0, 1.5, 24.0, 3.0, 0.1875, 3.0, - -1.0, -0.9375, 1.5, 8.0, 4.0, -6.5, -0.5, -16.0, - -32.0, 0.375, 0.25, 0.0, -2.0, 3.25, -64.0, 18.0, - 24.0, 1.5, 0.625, 16.0, -1.5, -64.0, 1.5, -1.0, - 22.0, 1.5, -4.0, 2.75, 1.75, 7.5, 2.25, 0.375, - -11.0, -1.75, 0.5625, 1.5, 32.0, -12.0, -1.25, 
-4.0, - 48.0, 6.0, 2.0, -28.0, 0.0, 0.0, 0.5, -1.0, - -3.5, -16.0, 1.75, -0.375, -40.0, -88.0, 22.0, 28.0, - -5.0, -1.0, 9.0, -120.0, 3.5, 0.25, -56.0, 0.0, - 0.0, -6.5, -44.0, 6.0, 8.0, -3.25, 52.0, -1.375, - -32.0, 16.0, 0.625, -0.25, 10.0, -8.0, -6.5, -4.0, - 88.0, -8.0, -30.0, -64.0, 0.8125, -1.75, 88.0, 7.0, - -16.0, 1.75, 1.75, 0.0, -40.0, -1.25, 0.625, 1.125, - -1.625, 0.375, -1.25, 1.875, 3.0, -8.0, 36.0, 4.0, - -12.0, 56.0, -1.0, -1.5, 2.0, -12.0, 24.0, -22.0, - -4.0, -1.0, -4.5, 2.75, 14.0, 3.5, 24.0, 32.0, - -2.75, -10.0, 3.0, -4.0, 0.1875, -7.0, 0.0, -0.25, - -0.75, 8.0, 36.0, 6.0, -1.0, 0.0, -20.0, -20.0, - 44.0, 14.0, -3.5, 1.75, 8.0, 5.5, 2.0, 3.5, - -2.0, -16.0, 1.0, 0.5625, -22.0, 44.0, 3.75, -1.0, - 3.5, -6.0, -32.0, 40.0, 1.875, -0.125, 6.5, -3.25, - -1.25, -16.0, 3.0, 2.0, -2.0, 6.5, -10.0, -72.0, - -1.0, 2.0, -2.0, 1.625, -0.875, -32.0, -32.0, 1.125, - 12.0, 56.0, 60.0, -11.0, 0.75, -1.0, 96.0, -80.0, - 2.75, 0.375, 12.0, -18.0, 0.125, 4.0, -1.875, 7.5, - 56.0, 4.0, 2.5, 28.0, 40.0, 0.875, -1.125, 88.0, - 0.1875, -12.0, -128.0, -1.75, -24.0, -32.0, 3.25, -64.0, - 2.0, -13.0, 15.0, 1.75, 20.0, 0.75, -1.0, 1.875, - 32.0, -0.25, -32.0, 3.0, -2.0, 0.9375, -4.0, -0.875, - -24.0, -1.5, 60.0, -5.0, -0.125, 0.4375, 30.0, -6.0, - 0.875, -}; -static data_t verify_data[M_DIM * N_DIM] = { - -962.921875, -4579.5703125, -232.078125, 1570.3984375, - 1315.828125, 3846.3671875, -10134.34375, -530.9375, - 8636.51171875, -19297.46875, 314.3046875, 5595.1796875, - -3201.51171875, 29362.3203125, 10196.08984375, 10705.84375, - 2120.17578125, 6486.703125, -1152.78125, -8443.953125, - -11839.1484375, -41.375, -2676.8359375, -3116.296875, - 3359.390625, -13558.7265625, 458.03125, 12407.6171875, - 6181.8203125, -3769.8828125, 2464.609375, -4219.6484375, - -3468.609375, -4070.8359375, 76.46875, 5017.05078125, - -5057.8828125, -8253.9453125, 4844.2265625, 4012.86328125, - 4411.6328125, 2736.421875, 6881.296875, 3218.09375, - 9695.9765625, -10341.703125, 611.7265625, 
1372.9296875, - 5624.7890625, 6818.43359375, -396.5234375, 1898.8046875, - -3272.58203125, 5205.125, 17108.609375, -7816.1015625, - -1110.5, -5132.0390625, 12762.8515625, 99.5, - 2846.1484375, 5288.78125, -9673.765625, 16681.484375, - 8192.3828125, -612.30078125, -2964.5859375, -10266.234375, - 1437.953125, 14647.0390625, -6906.6640625, 7689.984375, - 7519.421875, 6888.96875, -1388.3984375, -1693.203125, - 4694.90625, 5007.13671875, 49.640625, -15205.0078125, - -10791.4453125, 5086.265625, -11287.046875, -2304.6640625, - 4761.515625, 5984.578125, 1535.015625, -1097.6484375, - 10582.76171875, -833.1953125, -9493.9609375, -2631.328125, - 5318.046875, -18405.7890625, -9738.109375, -2927.40625, - -4257.515625, 1960.890625, -12663.03125, 1818.59375, - -813.6015625, 2743.3046875, 3622.62890625, 16237.984375, - -5423.5625, 603.375, -5159.078125, 14513.5390625, - -18203.1328125, -10983.15625, -8462.0703125, -4135.90625, - -8955.40625, -3351.421875, -400.46875, -4388.125, - 1132.296875, -1082.75, 5492.546875, -6468.12890625, - 1195.203125, 7114.6875, -7146.109375, -819.59375, - 96.671875, -19853.3125, -7435.890625, -5847.125, - -2013.75, 5447.5625, -4192.390625, 2301.328125, - 5372.6484375, 3944.671875, -17954.421875, -9159.51171875, - -6146.6640625, -17800.078125, -753.53125, -1338.953125, - -3756.625, -13881.6328125, 4504.5, 5653.96875, - -3874.75, 5817.0859375, 5659.2734375, 1001.6484375, - -8899.9453125, 751.4453125, -4328.5, 9510.4921875, - -2040.2890625, -6154.1796875, 11422.62890625, 4455.140625, - -12140.12890625, -7105.640625, 6353.91796875, -9981.90234375, - 15573.5078125, -1010.78125, -6003.7421875, 10886.390625, - -2967.859375, 15496.453125, -1345.2109375, 1009.609375, - -906.90625, -3682.2734375, -3301.16796875, -5978.4296875, - 1938.7578125, 1567.8515625, 5264.4375, 6657.40625, - -430.2578125, -11876.5703125, 3865.2890625, 22540.515625, - -40.4296875, -52.78515625, -3751.375, -3152.6796875, - -4191.0, -6110.53125, -829.578125, 6409.4375, - -902.6328125, 
2972.4296875, -41.3515625, 1390.6953125, - -3414.46875, 893.0, 14839.26953125, -2716.140625, - -5438.140625, 13024.359375, 6376.16015625, -4413.921875, - 11148.93359375, -791.75, 10513.78125, -997.7265625, - -5867.875, 7855.234375, 749.2109375, 13426.2734375, - -562.9453125, -8078.265625, 14423.65625, 3820.578125, - 518.859375, 17956.6875, 3729.796875, -6848.0234375, - -5380.21875, 10608.890625, -2064.7890625, 2695.3046875, - 7688.21875, 424.5625, 1827.59375, 1501.484375, - -9641.421875, 6140.890625, -32397.859375, -8187.28515625, - 25620.40625, 24028.5859375, 19066.5390625, -18387.46875, - -24298.3125, -11866.546875, -11776.265625, 6682.0078125, - 2260.875, 1448.421875, -9810.296875, 187.703125, - -1034.4140625, 3826.4921875, 2511.2265625, 19030.6484375, - 16896.671875, -11282.5703125, 10867.703125, 6316.3046875, - -10131.6875, 1497.484375, 12707.6875, -918.609375, - -9045.46484375, -8501.93359375, -5893.0625, -9092.578125, - 1352.4453125, -3581.69921875, 14647.1328125, -6187.84375, - -9997.8046875, 1416.5078125, -8998.71875, -1202.2890625, - 1460.78125, 9894.796875, -225.171875, -7259.765625, - -1766.40625, -17927.4921875, -5418.19921875, 9556.4375, - -1893.8671875, 370.8671875, 8006.15625, 13180.46875, - -9807.09375, -597.578125, -3119.828125, 3995.1796875, - 2274.84375, 744.1015625, -10559.046875, 3406.3828125, - 8567.921875, 656.8203125, 8865.3046875, 5683.296875, - -3656.21875, 5204.078125, -14087.953125, 3456.7578125, - -4111.01171875, 769.296875, -5225.77734375, -8944.3125, - 5911.421875, -5497.921875, -5473.3359375, -1310.9375, - -595.890625, -3526.578125, 6494.90625, -2401.03125, - 1272.5703125, 11096.9375, 2989.46875, 8719.515625, - 2296.70703125, -18440.28515625, -1883.453125, -8101.31640625, - 10425.609375, 15643.09375, 4110.8203125, 8556.125, - -1175.734375, -6013.0, 3191.953125, -20063.375, - -6720.6953125, 9516.28125, -10731.6484375, 213.2265625, - -12051.6875, -1714.54296875, -2629.578125, 6128.375, - -8030.828125, -1659.796875, 7753.4375, 
-4273.21875, - -4241.359375, 4440.578125, -5769.765625, 30.59375, - 2902.46875, 4057.25, 740.75, 12981.2265625, - 9137.03125, -2891.515625, 9225.2109375, 7000.03125, - 6542.5625, -1446.0546875, -3837.90625, -7054.6796875, - 9578.078125, -8510.96875, -14916.5, -4111.734375, - 6451.46875, -3918.7265625, -8379.015625, -5997.765625, - 12018.63671875, -6186.3515625, 3906.1484375, 13509.890625, - -1310.75, -5911.12109375, 2408.65625, 6205.47265625, - -6269.28125, 8040.14453125, 7478.8125, 4957.046875, - -1659.25, -940.08984375, -28018.453125, -1813.7265625, - 10409.5703125, 6684.6328125, -2771.921875, 2736.59375, - 1549.671875, -404.328125, 760.703125, -3870.41796875, - -6506.62890625, -5211.55859375, -1194.84765625, 4695.890625, - -4417.9375, -1142.578125, -5669.0234375, -2647.296875, - 24835.4296875, -1811.4921875, -21943.21875, 9542.34375, - 2112.890625, 4843.203125, -3169.75, -2589.890625, - -7981.11328125, -8307.609375, 16737.21875, -2096.46484375, - -15382.7578125, -479.390625, -8601.296875, -3595.1484375, - -10111.40625, 13030.5625, -782.484375, 886.5390625, - -13134.46875, -10075.2265625, 514.703125, -2841.2578125, - 3466.3671875, 11119.796875, 12955.96875, -6090.421875, - 161.1171875, 18055.8125, -2782.17578125, -595.71875, - 2529.859375, 17484.31640625, 5440.296875, -6215.953125, - -12698.2734375, 11181.375, -2181.06640625, 10092.125, - -316.7578125, -4571.54296875, -5888.83984375, -7171.984375, - -1955.90625, 8976.23046875, 1334.2734375, 6768.09375, - 11218.984375, -796.58984375, -3526.875, 2364.859375, - 291.5859375, 4765.0546875, 1724.86328125, 6096.7421875, - -15380.859375, -943.171875, 3634.9453125, 1121.6875, - -16314.3515625, -45.66015625, -4472.1875, -534.87109375, - -13407.078125, -2798.3046875, -482.9375, 1567.8671875, - 4008.8828125, 1659.51953125, -6466.359375, -6301.26171875, - 1536.84765625, -3764.5234375, 9471.44921875, 10986.52734375, - 1233.34375, -12361.63671875, -2207.671875, 14478.6171875, - 5855.25, -2477.0546875, 3554.15625, 
-3163.86328125, - 9238.3984375, -1684.9609375, 6054.35546875, 15124.25390625, - 621.28125, 5045.578125, -3923.1171875, -11418.5, - 3966.83203125, -6620.8046875, -14072.37109375, -2644.49609375, - -3961.3984375, -1238.984375, -4068.3828125, 9448.21875, - -17758.8359375, -969.98046875, -6689.54296875, -1038.203125, - 1090.93359375, 11023.9921875, -8386.109375, 2099.0078125, - 3561.2265625, -6595.8515625, -3021.40625, -1717.234375, - 1472.796875, -1272.7421875, -3792.734375, 1945.765625, - -2524.625, -6801.36328125, -5219.859375, -8813.56640625, - 1207.6875, -847.671875, 13103.09375, 544.0625, - -4209.296875, 3865.265625, -6289.0, 4622.4453125, - -4331.9375, -1561.0, -652.421875, -8730.453125, - -2489.5625, -11380.61328125, -4839.83984375, 695.046875, - -9254.62890625, -1765.7578125, -1573.4609375, 18.046875, - 403.390625, 18699.1796875, 11189.34375, 4405.5390625, - -8463.3671875, 273.140625, 1250.6953125, 1383.421875, - 11594.390625, -3580.328125, -1849.20703125, -5618.171875, - 13040.25, 9832.46875, -1894.75, 6632.4921875, - 2777.90234375, 3453.53125, 11907.515625, 1985.578125, - -5728.625, -5909.21875, -3474.234375, 2744.7578125, - -1080.328125, 1954.6796875, -2094.7109375, 2886.265625, - -3167.46875, -2639.03125, 8422.4453125, 3428.796875, - 14677.60546875, -3029.40625, -4722.4453125, -171.48828125, - -3147.390625, 6384.5, -5844.421875, 2799.34375, - -2869.578125, 113.9453125, 3702.3984375, 12179.84375, - 124.28125, 1797.8359375, 13486.6640625, -10983.6953125, - -7131.13671875, 2913.46484375, 1921.58203125, -12643.265625, - -9271.65625, 5979.34375, -6034.7265625, -697.015625, - 5517.125, -14464.75, 8829.2109375, 1389.421875, - -11083.7265625, -299.8984375, 2902.5390625, -3198.7890625, - 3685.69921875, 400.77734375, 5528.578125, -5401.44921875, - -315.80078125, -4144.296875, -4663.9609375, 4130.21875, - -2109.5625, 7786.30859375, 6453.953125, 7185.0078125, - 169.078125, -6052.796875, 2790.359375, 10089.19921875, - 3582.53515625, -13625.48828125, 4091.8125, 
-3882.0625, - 2445.15234375, 10571.515625, -8727.40625, 6677.859375, - -2816.6875, 69.52734375, -16861.65625, -4336.1953125, - -2397.33203125, 12973.9375, 1434.265625, -2170.203125, - 1059.05078125, -4593.52734375, 8530.76171875, -6000.6015625, - 10848.484375, -3663.765625, 1175.078125, 4848.84375, - -12867.1328125, 905.609375, -2618.484375, 3544.015625, - 492.83984375, 908.390625, 1171.1171875, -7626.4140625, - 4935.6953125, -11681.28125, 1298.45703125, -6775.75, - -4508.3359375, -8150.265625, -5543.453125, 5321.046875, - -2812.9296875, 1700.5859375, 15196.703125, 8709.44921875, - -2590.53125, -2265.00390625, -9322.6953125, -1862.29296875, - 1582.921875, 8805.640625, 9123.671875, 4463.25, - -966.140625, 21800.26953125, -6191.38671875, -13066.625, - 2581.671875, 4497.99609375, 7750.9375, -963.7890625, - -16283.734375, 9328.37890625, 719.625, -8319.515625, - 990.83984375, 115.71875, -4757.4375, -12132.609375, - -5048.46875, 3358.6953125, 2169.640625, -1206.1875, - -2373.52734375, 5250.265625, 2624.5703125, -241.58203125, - 545.2890625, -6011.0390625, -2217.8828125, 7.296875, - -10272.9296875, 4795.05078125, 6389.609375, 2983.19140625, - -3210.4765625, -2418.203125, 11548.2421875, -1436.4765625, - 5976.4921875, -1426.01171875, 15339.390625, 12235.8828125, - 4280.3359375, 9860.8984375, -11265.7109375, -2062.84375, - 8445.09375, 10509.0703125, 12674.9765625, -3384.84375, - -6051.4296875, 10267.4765625, -1943.2421875, 1454.3125, - -13887.65625, -7876.68359375, 6072.46875, 380.71484375, - -26573.62109375, 8744.703125, 11349.5390625, 98.28125, - -6700.40625, 9768.4609375, 7103.625, -3205.00390625, - 2922.3515625, -3140.1484375, 1935.2265625, 24646.5859375, - 8215.8828125, -12553.5859375, 8620.6328125, -1491.51171875, - 19683.8828125, -98.140625, -3102.4375, -12308.28125, - -8769.953125, -3879.8828125, 6258.765625, 5844.328125, - 2418.2890625, 9644.3046875, 4142.9453125, 11906.21484375, - -4808.96875, -4294.4375, -13245.0703125, 5274.73046875, - 4608.546875, 1911.5234375, 
1793.78515625, 8191.7734375, - -186.34375, 2326.21875, 7317.25, -5874.875, - -7759.140625, -15317.1875, 3586.46875, -501.8125, - 17721.8984375, -9203.7109375, 432.7890625, -10494.66015625, - 51.73046875, -3426.62890625, 7253.71875, -696.828125, - 4823.2734375, 5774.0078125, -14574.0546875, 5762.4765625, - 7137.296875, -12802.4609375, -2966.1171875, 2155.53125, - 11809.24609375, -10124.58984375, -2451.9453125, -7041.84375, - 19621.2578125, 12130.015625, 4714.96875, -10061.171875, - 8764.1484375, 315.515625, 2899.328125, -8900.5625, - -4164.0, 3491.09375, -14471.78125, 5006.4296875, - 18855.3203125, 7103.12890625, 17450.53125, -3461.15625, - -1826.63671875, -10226.07421875, -9168.796875, 16490.015625, - 13874.140625, 4542.265625, 4831.953125, 10844.4921875, - -11796.28125, -3721.80859375, 402.46875, 3713.0390625, - 5172.8515625, -3439.046875, -966.921875, 8444.9375, - -12984.2890625, 5040.53125, -11961.3828125, -610.875, - 11268.5390625, -19759.6875, -11854.421875, -7894.6015625, - 4512.34765625, 5478.640625, -2123.5390625, 10750.046875, - 17720.171875, -6471.3515625, 310.5625, 636.65625, - 6300.98828125, -1519.59375, 720.01953125, 3521.4296875, - 8811.234375, 2838.97265625, 7522.34375, 6373.1015625, - 9152.62890625, 2564.3671875, -5876.26953125, -226.96875, - -1256.09375, 5240.0078125, -223.5703125, -3807.96875, - 2796.203125, 2170.265625, 13677.40625, 3312.90625, - 992.734375, -12121.19921875, -2314.7265625, -16617.984375, - -2377.3828125, -8766.515625, -5280.859375, 7426.609375, - -3126.4921875, -14146.59375, 6781.5390625, -3656.54296875, - -3220.046875, -2467.03125, -13994.76953125, -13406.109375, - 2750.40625, 2009.71875, 9084.88671875, 10870.78125, - 1237.984375, 9400.8828125, 8103.8125, 98.234375, - 3879.5, 6793.9140625, 2350.3203125, 867.89453125, - 3153.5703125, 405.8984375, 15001.45703125, -6922.140625, - 8302.36328125, 20162.46875, -10005.33203125, -3134.1875, - 442.52734375, 5335.8125, -1780.9453125, 4755.2734375, - -8330.05078125, -13095.078125, 
14796.609375, -1250.78125, - 7710.90625, 6208.69140625, 8361.9921875, 136.7421875, - -6781.26171875, 3155.72265625, -8027.9375, 9659.796875, - 1655.4375, 5113.3359375, -5267.52734375, -2546.828125, - -1375.9375, 20302.515625, -16389.91796875, -7039.765625, - 8786.59765625, -6204.8515625, -5223.140625, -10122.3984375, - -6418.828125, 1499.51171875, -2285.33203125, -6749.5, - 7445.0703125, 8961.203125, -7813.328125, -8152.8828125, - -8211.51171875, 459.84375, 8502.3515625, -7247.57421875, - -1954.6796875, 3581.984375, 4361.7265625, 5029.78125, - -2133.625, 8186.9296875, 402.3984375, 9831.890625, - -7555.9375, 6810.109375, 27317.3828125, -5793.1484375, - 4902.65234375, 30430.91796875, 10101.3671875, -131.21875, - 8226.75, -363.3125, -6525.0859375, 6029.015625, - 911.65625, 18579.875, -5398.15625, 7397.703125, - -20507.640625, -6072.3515625, 466.5625, 3911.875, - 13595.8046875, 5814.9296875, -6476.125, -17271.7109375, - 7055.703125, -1991.4375, 11119.1328125, 5513.203125, - -13268.296875, 12959.84765625, 3261.84765625, -6041.38671875, - -1451.1875, -2741.8359375, 4679.8515625, 8325.15234375, - -8212.83203125, 2550.140625, -1962.984375, -5097.16796875, - -7526.9375, 1690.421875, -791.4921875, -17633.90625, - 10536.3671875, 588.5390625, -21349.9453125, -4193.828125, - 5434.65234375, 1632.40625, 5551.5390625, 4738.1875, - 4698.0078125, 13682.2734375, 10561.66796875, 20842.83203125, - -3049.9375, 11724.625, 2013.28515625, -359.3359375, - -1228.08984375, -5700.01953125, 1846.734375, -2542.38671875, - -5062.59765625, 12759.22265625, -2223.6953125, -3958.78125, - -4606.5703125, -6022.5625, 2801.45703125, 3016.66015625, - -4336.703125, -8962.046875, -1214.3046875, 4881.96875, - 8409.734375, 3518.7578125, 6378.54296875, 3617.03515625, - -7726.96875, 1247.09375, -9854.7734375, 578.4453125, - -1698.6796875, -6564.078125, -600.6015625, -48.0703125, - -482.28515625, -499.36328125, -4598.484375, 6875.296875, - -9999.62890625, -1654.76953125, -7334.0390625, -11130.125, - 
-4724.9765625, 10080.41796875, 3497.32421875, 1904.71875, - 1735.8359375, 329.31640625, 2756.2109375, 15843.62109375, - 7094.390625, -10796.0234375, 3323.09375, 3165.359375, - -674.9140625, 5434.515625, -4364.23828125, 6721.921875, - -9466.28125, 4881.953125, 5278.36328125, 9374.875, - -8647.60546875, -10378.2578125, 52.921875, -330.0078125, - -2632.5078125, -7189.140625, -8521.17578125, -6192.640625, - -1737.0546875, 7739.5859375, 12849.8984375, -5638.4375, - -599.953125, -1148.0859375, 5105.1171875, 6434.32421875, - -842.46875, 5417.06640625, -13166.9296875, -1439.90625, - -5363.7734375, -3793.2734375, 5430.0234375, 364.125, - -6158.296875, -5100.8046875, -8943.3359375, 297.21875, - -1388.30078125, 4074.75, -2279.484375, 8999.8984375, - -6213.50390625, 6290.2421875, -3320.3515625, -345.515625, - -10676.9375, -1554.2109375, 8049.53125, 19862.28125, - -16832.375, 11367.66796875, -13532.0078125, -10037.515625, - -10439.859375, 4160.4453125, 15198.3046875, -1461.4765625, - 5066.234375, -15398.7109375, 7979.6171875, -3479.953125, - 7973.921875, 4398.4375, -9525.453125, -2102.625, - 992.4765625, 1012.203125, -7444.265625, 298.4140625, - -13625.4921875, 2380.35546875, -39.125, -9062.6484375, - -4406.90625, -9244.546875, 13126.5234375, -8079.125, - -8129.921875, -15156.859375, 4273.15625, 1531.9375, - 1629.5390625, 8234.1953125, 880.046875, 10986.953125, - 8231.42578125, 6572.234375, 870.109375, 7118.1640625, - -1216.296875, 8819.3671875, -13598.71875, 9870.03125, - 8961.359375, 3042.515625, -464.2890625, 4091.328125, - 4383.78125, -12566.4296875, 15710.9375, -3111.3203125, - -1641.5078125, -850.8828125, 1870.88671875, -5924.1875, - -789.234375, -9836.54296875, 11018.5234375, 4021.9296875, - -1591.703125, -23661.515625, -606.7578125, 2916.5546875, - 5694.578125, -4335.7421875, 1867.984375, 2864.0546875, - -9687.6328125, 2977.265625, 4829.15625, -793.90625, - -6403.09375, 4009.3671875, 1235.859375, -5111.37109375, - 3743.234375, 1044.5234375, 10311.953125, 15475.8515625, 
- 13396.5390625, -5779.75390625, 4099.01171875, 3971.85546875, - -21642.8359375, -9016.8984375, 991.06640625, -1499.5859375, - -1931.546875, -6850.0234375, -13742.65625, 1921.8203125, - 385.63671875, 110.171875, -8725.6875, -4821.87109375, - -1760.546875, -12105.640625, -3072.40625, -11057.671875, - -14803.83984375, 6457.1875, 5510.5625, 14268.6796875, - -3187.0, -1468.171875, 8290.15234375, -5118.88671875, - 18904.203125, 777.4140625, 6964.34375, -6751.36328125, - -15351.0625, 2460.734375, 2172.546875, -1478.22265625, - 583.73046875, -21155.59765625, -1222.0078125, 1149.51171875, - -18621.93359375, 1150.1796875, -15502.8984375, -10512.796875, - 2333.875, -7427.70703125, -3532.2109375, -10533.375, - 10234.0234375, -7763.68359375, -7820.3828125, 9837.21875, - 1502.9375, 6624.09765625, 9465.5, 4685.390625, - -7997.30078125, 19813.765625, 11448.921875, 7317.3125, - 1144.953125, 1661.0, -13022.75, -2488.5078125, - 9700.03125, -14273.015625, 14273.50390625, 3497.796875, - 13277.9609375, 4424.6953125, 3288.953125, -7999.5546875, - -629.15625, -5302.28125, -1165.75390625, -16512.7421875, - -5352.9375, -3966.99609375, 13900.078125, 4646.27734375, - -19360.71484375, 200.4453125, 5776.078125, 5146.3125, - 17052.80078125, -8451.3828125, 4551.0546875, -637.1875, - 14064.765625, -3204.1875, -15755.6796875, 22656.9375, - 17188.08203125, 5993.66015625, 8629.0234375, 10228.3671875, - 97.03125, -2071.5078125, -9269.8828125, 3044.5, - -4104.765625, -8425.078125, 4096.3515625, -54.18359375, - 1929.0078125, -10502.171875, -9286.90625, 12824.2890625, - -4961.21484375, 4973.49609375, 2677.6796875, -2512.734375, - -2678.40625, 1963.25, 1656.67578125, -7535.734375, - -8619.6640625, -2152.78515625, 8436.23046875, -2169.5390625, - -6024.71484375, 5750.828125, -15.1875, 4911.5546875, - -2139.61328125, -10656.4609375, 2590.296875, 2810.4921875, - 8778.6171875, 2595.41796875, 9313.4765625, -6609.96875, - 5969.453125, -6192.3203125, -5862.64453125, 4340.4453125, - 445.05859375, 3597.30078125, 
135.46875, 82.44921875, - -2323.24609375, -12539.93359375, -458.98046875, 4365.09765625, - 7465.9453125, 5457.265625, -11300.28515625, 5555.80078125, - -3844.24609375, 18723.4375, -9272.625, 4792.1484375, - -1704.078125, 3085.921875, 3256.75, -8131.69140625, - 10035.328125, -1643.8984375, -1354.7890625, -10991.234375, - 1819.796875, 7491.03125, -93.05859375, -1000.7578125, - 7519.3203125, -4046.84375, -3477.4296875, -2203.5859375, - 5078.953125, -5384.2265625, -738.78125, -3612.9609375, - -3010.1875, 613.91796875, -4013.80078125, 5952.78125, - 3788.59765625, -4194.359375, 4637.078125, -2424.90234375, - -9521.88671875, -6227.0625, 5597.078125, 7952.6953125, - -214.8515625, -9673.171875, -15807.3984375, -13185.98046875, - -1236.92578125, 26075.15234375, -3439.8203125, -15925.109375, - -2917.13671875, 12099.828125, -1994.0078125, 13824.0390625, - -4691.5, -11478.6953125, 11568.27734375, 1852.19921875, - 3608.1875, 6451.8828125, 6615.9453125, -1841.8203125, - 15046.9140625, -1700.84375, -12258.2890625, 9275.109375, - 8739.8046875, 11379.9375, -9922.8984375, -4537.5546875, - 245.59375, -8121.578125, -6202.109375, 2301.46875, - 4053.53125, -2875.2578125, -8367.796875, 7868.91015625, - 9182.87890625, 4327.67578125, 10952.1328125, 5841.140625, - -9745.21875, -5236.7734375, -4253.26171875, 9739.859375, - 788.57421875, 3051.31640625, -4980.6015625, 1146.328125, - 11727.0625, -5906.0625, 1457.66796875, -9275.08203125, - -5122.91015625, 1343.296875, -3186.23046875, -15993.40625, - -608.16015625, 3284.4140625, 3019.984375, -10872.453125, - -1350.9609375, 11773.296875, 9503.2890625, -8274.01171875, - -10936.390625, 1960.64453125, 3924.5546875, -1074.76171875, - -1491.25, -3327.2890625, -5922.09375, -18597.6015625, - -1829.4296875, 288.8984375, -20141.390625, -1170.6015625, - 6588.7734375, 11394.37109375, -17187.796875, -49.48046875, - 4240.875, 37.296875, -6106.9609375, -18200.734375, - -10295.3125, -1257.01953125, 6767.703125, 8181.953125, - 6473.10546875, -253.84375, 
-3913.15625, -10164.296875, - 1205.078125, 5828.765625, -6385.01171875, -10600.9375, - 8981.015625, 1485.9765625, -9876.6484375, 3877.25390625, - -1260.03125, 1887.4140625, -6755.59375, 1040.96875, - 886.1640625, 6281.9375, -2731.0234375, -15290.609375, - -4133.390625, 901.0625, -6637.0078125, 1097.2265625, - -4536.953125, 3028.89453125, 8126.265625, -7111.5, - 1253.9375, 14892.1953125, -1489.015625, -907.9453125, - 7757.46484375, -11867.8125, 2335.109375, -9668.1171875, - 346.859375, -4725.25, 3315.953125, 2919.3515625, - 11306.4296875, -5802.5703125, -8157.9453125, 7698.328125, - -2235.5078125, -95.21875, -7022.1171875, -11124.4765625, - 317.59375, -1189.7578125, 7274.375, -6828.6484375, - 321.8671875, 2247.078125, 3546.46875, -523.9921875, - -1589.34375, -1405.37890625, -2430.94140625, -563.96484375, - 5233.57421875, -666.48828125, -4541.03515625, 1780.62109375, - 6377.2890625, -12267.453125, -11111.7265625, -3289.40234375, - 2376.93359375, 671.36328125, -6299.796875, -1426.015625, - 3598.0859375, -6307.57421875, -139.2890625, 2542.265625, - 2566.49609375, 1318.71484375, 6360.8203125, -6905.25, - -14513.4453125, -10880.94140625, 1056.2109375, 1832.03515625, - 437.88671875, 1092.2734375, 2960.05078125, -2056.4921875, - 16470.87890625, -2351.92578125, 2397.5078125, -3331.0546875, - 19.640625, 105.4375, -7794.05859375, 479.26171875, - -804.28515625, 6000.4609375, 1077.8828125, 632.77734375, - 563.5703125, 2263.859375, -3954.5859375, 107.21484375, - -14611.3359375, -2997.62109375, -952.02734375, 1701.81640625, - -1004.6171875, 5277.35546875, 4345.125, -849.4375, - -2672.4921875, -1317.609375, 10500.45703125, -5293.65625, - 521.96875, -346.56640625, -1168.33203125, 3645.9296875, - -259.04296875, 3237.43359375, -5938.02734375, 7710.34375, - 772.78125, 5500.84375, -15039.8046875, 4785.1328125, - 2206.671875, -12801.890625, 966.421875, -563.703125, - 7068.1171875, 25506.1015625, 13262.359375, 6491.765625, - -3037.1640625, 10683.625, 2948.3125, -1121.9375, - 6866.375, 
3438.796875, -11149.91796875, 10952.328125, - 2517.390625, -462.0859375, 11578.8203125, -4252.03125, - 14609.625, -1116.9140625, 11423.48828125, 11685.95703125, - 19977.09375, -4296.3828125, -6195.265625, -5787.984375, - 4106.28515625, -17155.71875, 397.53125, -34555.75, - -1653.8046875, 42.703125, 4895.1484375, -2072.8984375, - -7203.578125, 2196.703125, -13313.125, 6195.03125, - 7808.953125, -2046.6328125, -9960.90625, -13542.203125, - -3517.88671875, 10327.3984375, -7882.89453125, 3517.984375, - 22811.515625, 9420.94921875, -2354.7734375, 5482.26953125, - 4549.546875, 10771.0625, 19142.8203125, 11362.3125, - 6916.0703125, -8832.359375, 27648.0546875, -7295.65625, - 8321.19140625, 6793.578125, -4605.88671875, 5944.9609375, - 2861.5703125, -1431.6015625, 14618.65625, -3039.59375, - 1171.6328125, 9689.7109375, 1620.8125, 4129.40625, - 726.88671875, 754.953125, 4016.6484375, 3601.9921875, - -2510.65625, -2759.6953125, 5069.8671875, 4082.99609375, - 3862.484375, -3591.3671875, 2620.4765625, 2136.3515625, - 8529.890625, -3794.3125, -2936.625, -4448.76953125, - 3267.8203125, 2869.703125, 8520.9921875, 949.40625, - 1946.73828125, -6256.15625, -1845.984375, -1322.9296875, - 509.6796875, -8137.6484375, 5054.484375, -1415.97265625, - 171.83984375, 9561.79296875, -429.8125, -1113.0625, - -1952.40234375, -1101.00390625, 6330.4140625, -2374.28125, - -7678.4375, 63.48046875, 6324.015625, 491.078125, - -1461.921875, 1890.0078125, 5150.359375, 4403.1015625, - 4691.609375, 8265.21875, -1084.0625, 2198.2265625, - -1849.140625, -1602.6484375, -479.79296875, 102.28125, - -582.7890625, -6195.953125, 653.01171875, 4358.05078125, - 4631.3359375, 5855.109375, 2116.8125, 3863.1484375, - 4524.140625, -662.984375, 3031.171875, 2051.09375, - -4288.7265625, -11481.7890625, 7417.57421875, 20.078125, - 6973.8828125, 3728.09375, -8512.94921875, 1559.484375, - -1579.1328125, 1356.5625, -2316.1015625, 4167.765625, - -184.0859375, 2608.2421875, 3313.390625, -5762.328125, - -3094.578125, 
7074.84375, -2194.84375, 721.453125, - -6301.96875, 4063.421875, 12946.7265625, -2150.5078125, - -1317.53125, -385.8203125, 3584.5625, -7832.3046875, - 9116.65625, 1413.296875, -7150.6875, 4720.59375, - -3282.2421875, 5303.19921875, 1296.265625, -8538.5234375, - -4207.5703125, -5933.8125, 3628.3828125, -3718.6015625, - -548.6796875, -5409.7109375, -15992.96875, -3967.859375, - -542.96875, -2518.34375, 13320.9921875, -1919.1171875, - 423.1015625, -2007.9140625, 452.3671875, -2946.875, - -2018.46484375, -2081.4765625, 3083.3828125, 5138.3828125, - 6665.5, 5431.2578125, 12988.62109375, 3206.84375, - 11734.3046875, 1257.296875, 6290.375, -1234.984375, - -249.765625, -6449.625, 518.109375, 3140.0, - 4949.5078125, -910.703125, -2505.3125, -794.1015625, - 4031.44921875, -10340.421875, -8742.48046875, 7447.5703125, - 12446.2890625, 1030.1953125, 2864.76171875, -9884.65234375, - -4651.8359375, 230.58984375, 4127.296875, 8262.19140625, - 3276.203125, -4468.0546875, -14087.625, -3899.52734375, - 4774.2265625, 4462.53125, 8976.953125, 5059.21875, - 919.296875, -16054.56640625, -10557.796875, -5474.40625, - 15932.578125, 7447.6640625, -4282.2109375, 2331.28125, - 4583.234375, -1915.80859375, -4046.9609375, -4819.4140625, - -3506.65625, 943.71875, 11454.13671875, 2537.45703125, - -6997.421875, 1505.640625, -5194.7265625, -4586.875, - -445.203125, -7304.0, -1690.78125, -2076.5703125, - 17862.4609375, 6083.328125, 46.0234375, -3965.2109375, - -8630.65625, 3190.28125, 2180.2890625, 121.609375, - 3768.5546875, 154.0625, -14078.6328125, -11482.296875, - 1224.78125, -3421.703125, -4983.6484375, -201.203125, - -757.25390625, 2443.34765625, -3070.08984375, 8629.6484375, - -3206.79296875, 2054.328125, -5773.609375, -1641.84375, - -4431.3203125, -1179.4921875, 2274.16015625, 725.6796875, - 3669.234375, 14302.96875, 12822.94140625, -5647.2890625, - 12962.8984375, -2994.1484375, -1740.1171875, -10694.53125, - 6719.1953125, 1375.2421875, -6233.87890625, -17028.3828125, - 3497.47265625, 
17943.1484375, 6107.13671875, 87.25, - -7117.4296875, 122.546875, -11987.484375, -6167.53125, - 12006.54296875, 20199.328125, 7379.58984375, -4064.73828125, - 6723.97265625, -3439.48828125, 2738.35546875, 492.6796875, - 13098.171875, 10266.1015625, 2286.140625, 11800.671875, - 1322.3125, -761.0390625, -2958.0, -9746.703125, - 1224.6328125, -10919.53515625, -2756.11328125, -3415.1171875, - -9949.859375, -10719.125, 3968.0234375, 5021.61328125, - -11634.5234375, -1413.44921875, -3988.6875, -1193.0546875, - 1397.875, 3243.359375, 1288.42578125, -745.98828125, - 4365.25, 7560.5234375, 2189.09765625, 2910.2421875, - 23789.8046875, 2240.65625, 7779.6640625, 2731.375, - 10291.7734375, -6412.20703125, 9949.1875, 5914.77734375, - 3831.828125, 4511.67578125, -9371.765625, -1758.0, - -4061.6015625, 692.5546875, 399.05859375, 492.6171875, - 9279.9296875, 5602.84375, 3278.12109375, -699.625, - -2414.91015625, 4987.2890625, -3616.22265625, -2077.640625, - -2961.5703125, -5533.5, 2514.2578125, 712.12890625, - -3496.671875, -7541.56640625, 14231.6640625, -16883.65625, - 5537.34765625, -17157.25390625, -2265.171875, -4650.6171875, - 2080.5, 521.78125, 5140.1796875, 10296.60546875, - 1359.01171875, -5526.96875, -11717.9609375, 656.484375, - 1753.08984375, -886.3515625, -7534.5546875, 3619.578125, - 15998.8203125, -4366.203125, -10159.93359375, -2581.5078125, - 6722.6875, 7168.6328125, 7480.328125, -7648.3125, - 1777.84375, 17579.6171875, 11600.0, 7981.6171875, - -441.5703125, 907.453125, -179.5703125, 6332.921875, - 16592.26953125, -1038.48046875, 2046.8203125, -2163.1875, - 1765.9609375, -6352.46875, -1116.1875, 5986.2734375, - -7417.4765625, -2349.9765625, -857.76953125, 2223.1953125, - 4015.5078125, -11670.40234375, 482.53125, 6396.11328125, - 2685.8515625, 10123.1171875, -2535.08984375, -1626.359375, - 2011.078125, 7618.25, -420.875, 2226.7734375, - 10368.484375, -11065.4765625, -4162.4921875, -4497.953125, - 21594.015625, 4463.3046875, 5288.40625, 4649.96875, - 703.6171875, 
4594.0, 10201.07421875, 2553.3125, - -365.4765625, 18346.75, -10335.328125, -5139.6015625, - 5478.03125, -12510.609375, -10059.375, 14303.078125, - -1076.96875, 8219.96875, 4167.46875, 26960.2734375, - 228.359375, -7783.8125, -5941.6796875, 4625.3984375, - 5541.0, 19.46484375, 3135.96875, 19647.9375, - -2883.4921875, 1493.5703125, -12407.3828125, 14624.0625, - -15147.9375, 4931.6953125, -14869.515625, 11807.34375, - 5225.078125, 9696.9375, 5583.7265625, 6328.171875, - 7124.25, -5654.0625, 5312.90625, 20389.296875, - 475.35546875, 5178.8046875, 11373.3515625, -9045.015625, - -7986.0390625, -2520.75, 14771.43359375, -19525.921875, - -9486.8828125, -2064.578125, 10894.78125, 1170.84375, - -13448.0625, 563.15625, 12587.5234375, -10220.90625, - 14135.078125, -3495.9375, -4613.125, -9943.6640625, - -6431.6015625, -5718.17578125, -7754.3125, 1863.0546875, - -1609.09375, -13486.0703125, 4562.3125, -7134.50390625, - -2275.8359375, -3527.60546875, 22844.97265625, 2062.60546875, - 15233.33203125, -7658.7578125, -3816.25, -918.484375, - 7518.92578125, 8075.53125, -7333.4375, -10078.3984375, - 9280.84375, 3826.203125, -13381.28125, -618.87890625, - -6620.07421875, -26511.984375, -2220.72265625, 331.953125, - -7631.7109375, 2916.625, 4794.421875, -1067.61328125, - 12636.875, -8169.6953125, -3836.765625, -9041.28125, - 10710.453125, 3390.49609375, -20478.625, 8417.328125, - -823.71484375, -16296.7421875, 4722.59375, 6545.00390625, - -8938.36328125, -4033.171875, -7760.59375, 3523.14453125, - 2809.546875, 10649.125, -15355.4375, -1839.4375, - -3964.5546875, 7211.5, -11530.2734375, 2107.078125, - 9038.4375, 12273.26953125, 6363.015625, 10273.34375, - 5354.15234375, 5770.90625, 11077.4375, -3691.96484375, - 9322.1796875, 6858.9296875, -883.67578125, 1089.625, - 3364.78515625, 5658.56640625, 10761.75, -2854.8984375, - -5349.34375, -11109.3671875, -11839.953125, 19775.01171875, - -4645.49609375, 5063.0703125, -10760.6875, -600.98828125, - -14829.1484375, 15517.703125, 10474.0078125, 
-4458.453125, - 4897.56640625, 7653.140625, 3145.1328125, -13554.27734375, - 16480.46875, -6257.875, -7110.265625, -121.625, - 21287.078125, -10251.2421875, -12330.21875, -4027.1875, - -9991.390625, -1688.953125, -428.375, 1966.7578125, - 5456.7421875, -3309.30859375, 5482.375, -9362.8359375, - 1959.5625, 13736.90625, -8492.5859375, 5351.0625, - 10320.453125, -5657.9375, -5847.2890625, 15590.140625, - 9257.234375, -9090.1875, 5039.1875, -2705.921875, - -7482.375, 8461.671875, -2538.68359375, -3357.15625, - -2751.09375, -6922.2578125, -8225.40625, 8279.375, - -4157.6328125, -14274.33203125, -8538.5703125, 7904.41015625, - -16698.9921875, -7665.15625, -12356.5625, 2539.15625, - 10639.0546875, -4315.234375, 7225.53515625, 2768.296875, - -14700.0625, 4261.40625, 5880.07421875, -1168.75, - 5933.1953125, 12518.66796875, 651.7890625, -13439.953125, - 14881.9609375, 3754.7890625, -225.609375, -4355.53515625, - -3083.6015625, -4830.734375, 2318.1015625, 6319.5546875, - 14883.75, 839.515625, 4335.078125, -1916.140625, - -4579.1640625, 8575.75, 9459.32421875, -3968.765625, - 2176.015625, 3008.97265625, 7393.1875, 2500.1171875, - -8430.5859375, 13867.8828125, -11062.9765625, -1624.203125, - -5836.75, 9705.25390625, 4882.30078125, -1114.2109375, - -5315.82421875, -3042.4765625, 1657.4921875, 5051.35546875, - -6589.9140625, -14369.0390625, -9222.78125, -5.20703125, - 5781.375, 8870.0234375, -14353.3984375, -432.96875, - 331.453125, -13642.2265625, -3324.7109375, 351.53125, - 1138.75, -3027.31640625, 1554.28125, -4943.046875, - -7234.265625, -1906.12109375, 3044.4609375, -13413.8359375, - -7980.87109375, 2310.671875, 5607.22265625, -7370.609375, - -5497.05859375, -5157.046875, 5636.7421875, 1052.8984375, - -7180.40625, -1169.45703125, 547.859375, 2791.41015625, - 4814.62890625, -3265.09765625, -1056.7265625, -757.578125, - -2515.4296875, -6387.359375, -893.859375, -3126.3984375, - -6060.8515625, -11308.890625, 4928.3515625, -16209.359375, - 7160.7890625, 5303.28125, -2424.515625, 
-2929.78125, - 11143.6328125, -8607.50390625, 4876.00390625, -1961.9375, - 2437.390625, -7149.33984375, -941.15625, 22434.046875, - 2135.9140625, 3251.359375, 1319.01171875, 8056.84375, - -2976.109375, 15962.86328125, -3349.6484375, 1526.24609375, - 1006.1640625, 3224.015625, -4634.4375, 1097.0703125, - -12380.1171875, 2602.453125, -4867.6484375, -3622.3984375, - -8691.140625, 6341.359375, 9376.0234375, -6510.6328125, - 1773.1875, -8482.0625, -8907.515625, -7915.90625, - -6481.796875, 759.46875, -1929.796875, -1953.02734375, - -1798.953125, 5370.7421875, -1081.6875, 2256.9140625, - 17857.5234375, -11525.0234375, -1203.7265625, 24716.90625, - 7464.5078125, 2954.5078125, 13075.48828125, 7866.1484375, - 14447.9453125, -7084.4921875, 3929.09375, -7539.9296875, - -3519.9375, 6672.7421875, 6093.62109375, 6955.96875, - 4025.828125, -126.94921875, 5126.5859375, 3201.5546875, - -6431.578125, -188.0703125, 6426.171875, 1929.5546875, - -15846.875, -695.66015625, -1451.84765625, -1942.734375, - 207.140625, -10189.765625, 7762.03125, 8906.9140625, - -660.60546875, -7752.40625, -5255.984375, 882.09375, - 8423.71875, -3172.421875, 2646.7421875, 9152.2265625, - 1085.55859375, -8692.8203125, 973.875, -1869.5, - 8178.34765625, 7867.89453125, 2181.4765625, 588.0390625, - 24.9140625, 2975.97265625, 5771.296875, -1679.90234375, - 925.08984375, 1006.890625, -7803.75390625, -17944.4375, - -3548.34375, -10146.4140625, -3722.81640625, 10419.03125, - 3722.5859375, -1935.8125, 59.203125, -11391.94921875, - 1219.65625, 5151.9921875, 1346.69140625, 595.3359375, - -9482.3359375, 1295.83984375, 2772.953125, 1346.38671875, - 2181.46875, 1360.9453125, 9254.3984375, 6072.16796875, - 726.2421875, 2013.1484375, -694.09765625, 2080.8046875, - 2262.453125, -911.01171875, -9442.703125, 7123.328125, - 2869.1875, -4724.6796875, -2982.75, -16466.546875, - 2652.921875, -4498.03125, -6018.85546875, -3825.53125, - -9.71875, 6784.6328125, 1360.5390625, 9806.4453125, - 711.40625, 4829.46875, -7308.5, 
-1421.546875, - 3980.078125, 1118.55078125, 6501.453125, 14434.125, - -12590.9453125, -17783.1875, -3761.734375, 1385.390625, - 3462.2265625, 4659.578125, -2503.8828125, 11852.125, - -2482.5625, -2120.75, -1140.296875, -5098.1640625, - 4901.46875, 5874.95703125, 5603.6015625, -16923.9375, - -1649.1796875, -612.765625, -6498.8125, -5476.25, - 1940.93359375, -7307.91015625, -1280.0546875, -1428.9375, - 2240.828125, 13458.6875, 5682.6484375, -13847.640625, - 1063.7578125, -319.9140625, -8096.9453125, -7829.0390625, - 1371.5, -3219.21875, -2539.828125, -6963.734375, - -5473.5703125, 5542.6875, -7898.125, -9674.625, - 8558.0859375, 5664.2265625, 7490.0546875, -1753.53125, - -7953.9765625, 10069.0078125, 6275.3828125, 523.234375, - -1178.53125, -844.859375, 3125.453125, -7463.5625, - -1403.0625, -3630.984375, -165.859375, 1731.0, - -147.59375, 4273.390625, 213.109375, -2529.8125, - -8061.21875, -3309.484375, 2422.453125, 14572.296875, - 2388.203125, -9348.625, 980.0078125, -8830.7578125, - -4131.828125, -5384.453125, 4592.6328125, -5177.69921875, - 9471.78125, 1171.7890625, -5009.875, -262.0625, - -1244.3046875, -7430.7421875, 711.765625, -2984.734375, - -7533.65625, 6934.953125, -1074.9453125, -230.79296875, - -131.8671875, -596.65625, 2107.8125, 1501.8984375, - -2482.46875, -3647.3515625, 2726.203125, 2980.125, - 2719.4921875, 4163.8046875, 3455.421875, 190.75, - -8463.9375, 974.515625, -1396.7734375, -2681.6328125, - 1464.375, -1929.703125, 6373.5625, 3737.25, - -1427.3984375, -1370.328125, 6595.3359375, -5094.640625, - -3787.984375, -1242.546875, -2496.8984375, -6000.3125, - -1932.7421875, -5074.6796875, -553.9765625, -3536.10546875, - 6607.984375, 4595.796875, 1487.390625, -2378.015625, - -8420.7734375, -2305.15625, -2969.28125, 3722.953125, - 13136.421875, 2029.99609375, -14.171875, -7908.75, - -1538.08984375, -6611.66015625, 2443.5078125, -3765.9296875, - -7126.234375, 861.9453125, 4108.4765625, 1330.92578125, - -1151.36328125, 688.171875, 1843.58984375, 
3522.57421875, - 4382.81640625, -4604.859375, -2774.58984375, 1481.546875, - 2628.0, -6447.58984375, -2573.28515625, -7236.56640625, - 184.4765625, -4527.6171875, 2820.39453125, 337.875, - 904.03125, -5031.2890625, -1555.9453125, 2415.86328125, - -2276.203125, 1030.0625, 6113.796875, 1191.03515625, - 2499.4140625, 7111.7734375, 2831.5703125, -7500.7109375, - 632.6640625, -3403.34375, -2083.80859375, 8113.24609375, - -2290.96875, -5565.80859375, 1127.7265625, 699.3125, - 2097.359375, 1067.48828125, -1349.671875, 2933.4921875, - -3408.84375, -3061.53125, -3349.28125, -3074.484375, - 2263.03125, -5592.7421875, -2938.16796875, 7319.375, - 4691.07421875, 2557.40234375, 6860.3515625, 3755.1796875, - -665.5546875, 1312.01171875, 3079.79296875, 3601.578125, - 4347.671875, -373.6953125, -2485.328125, -1756.6484375, - 1228.515625, -2431.359375, -1550.84375, -177.515625, - -5181.296875, -3509.625, 14990.4296875, 15603.28125, - -2783.3125, 3154.66796875, -12306.09375, -305.15625, - -54.4140625, 14821.1015625, 11276.9140625, -9920.09375, - 2091.4609375, -3140.234375, 4269.109375, 8895.06640625, - -5238.52734375, 13865.703125, -6129.95703125, 6361.28125, - -1705.875, 2244.953125, -4493.0859375, -6121.65625, - 613.53125, 1030.1875, 16007.90625, -6636.5390625, - 4345.5078125, 750.79296875, -4060.83203125, -2591.625, - -2854.359375, 1805.171875, 4265.96875, -3956.7265625, - -9527.625, -4510.5546875, 6907.9375, -4604.625, - -9026.55078125, 8320.0625, 5856.0, -5373.359375, - 2939.78125, -2488.21875, 5363.1015625, -1415.43359375, - 3488.03125, 3998.109375, -12185.390625, 1702.96875, - -3503.1171875, 2064.26953125, -27361.0, 7070.3828125, - 753.890625, 2550.375, 601.53515625, 832.4375, - 8466.6640625, 3403.2421875, 8911.890625, 6375.94140625, - 2002.0625, -8505.3203125, -632.63671875, -9265.140625, - -2716.328125, 424.578125, -5060.2109375, 10112.390625, - -3605.3359375, -526.46875, -16408.8828125, -6998.0390625, - -947.2734375, 1816.078125, -146.75, -3324.0390625, - 2203.90625, 
470.62890625, 1453.875, 5785.91796875, - 4051.2578125, 7159.40625, -10929.71875, 203.8671875, - -9140.46875, -955.765625, 1994.5859375, 2536.734375, - -6436.75, 2710.59375, 10599.65625, 9739.6640625, - 292.4609375, 7849.890625, 7964.2890625, 4877.515625, - -3839.65625, -2208.5, 1137.7578125, -8188.84375, - -7361.765625, -1264.390625, 2944.0703125, 6649.171875, - 2033.23046875, 5752.921875, -1250.171875, 9403.359375, - 8651.578125, -3471.765625, -2449.875, -7566.44140625, - 10851.6640625, 4833.5859375, 4242.02734375, 1504.8125, - 9558.734375, 1157.828125, -2613.6171875, 3727.0625, - -2434.1640625, -101.984375, -4429.4453125, 3920.265625, - -1042.203125, -780.15625, -5463.6640625, -120.3984375, - -3493.78125, 7424.40625, 2479.6640625, 3601.7578125, - 353.0078125, 2827.59765625, -6464.34375, 6887.9140625, - -1698.26953125, 2404.29296875, -3530.640625, -7846.9609375, - -9994.484375, -3575.42578125, 2306.05859375, -2894.4921875, - -948.265625, 11356.625, 9154.8515625, 2904.1953125, - 11919.17578125, -2282.4453125, -1121.11328125, 13480.078125, - 10908.390625, 6747.8828125, 800.3671875, -10509.640625, - 533.5234375, 11027.0703125, -4219.234375, -7729.84375, - -11163.734375, -4349.6640625, 12480.4765625, 5203.109375, - 2441.71875, 853.85546875, 5038.3046875, -2251.828125, - 7780.875, -8832.6640625, -5423.0234375, 3671.8203125, - 15434.1875, 1959.40234375, -2531.34375, 17689.17578125, - 1567.234375, -3896.0, 4207.6875, 11394.1328125, - -8066.49609375, -456.33984375, -2744.2734375, -2817.8984375, - 9222.984375, 2738.46484375, 4012.1953125, -7719.9296875, - -12361.6640625, -3519.6015625, -5856.375, -11304.05859375, - -11491.1875, -9794.16015625, -2768.078125, -11432.0390625, - -7268.390625, -10967.83203125, -6549.73046875, -5389.828125, - 4412.8984375, -2906.8984375, -12168.6171875, -7328.99609375, - -1229.16015625, 4599.5234375, -2829.68359375, 10485.98046875, - 201.86328125, -2020.4609375, -4518.4921875, -16199.96875, - -7845.7734375, -1346.921875, 3753.4296875, 
1471.234375, - -5581.5, -6293.375, -2851.8515625, 6224.46484375, - 10153.203125, 12598.6875, 402.66796875, 5316.359375, - 2073.984375, 56.99609375, 1132.6484375, 3657.6875, - -1813.6328125, -204.453125, -4555.3203125, -6068.96875, - -4076.46875, 9983.5234375, -13637.44140625, 17032.640625, - 2791.5703125, 11097.4140625, -4654.7734375, -7166.515625, - -4559.1875, 1125.203125, -23760.140625, 4752.5390625, - 7987.3515625, 7073.265625, -8569.125, -9292.9296875, - 6584.5078125, 6068.828125, -5207.5234375, 9421.640625, - -970.8203125, 13756.9921875, -7502.84375, 1054.546875, - 5556.46875, 10590.59375, -414.1015625, 5828.6640625, - 19121.390625, -2501.421875, 11971.7265625, -3551.046875, - -5037.640625, 7503.8515625, 3815.33984375, 4156.0234375, - -10529.328125, -1909.9453125, 14220.2109375, 3243.30859375, - -6585.703125, 6086.234375, 14310.0859375, 4950.453125, - -392.21875, 18875.03125, -8386.78125, -10255.14453125, - 689.6484375, 4532.77734375, 3118.59765625, -4785.4296875, - -2039.86328125, -3946.77734375, -7149.9453125, 12456.921875, - 2814.28515625, -5509.5703125, -8851.35546875, -19781.35546875, - 3746.8515625, 7935.33984375, 1626.765625, -9028.0703125, - -264.890625, -15158.828125, 16204.4609375, 931.2890625, - -1809.53125, -1022.3984375, 953.90625, 2694.1875, - -171.03125, -493.91015625, -8933.1015625, -11803.421875, - -7485.1953125, -5281.203125, -3619.1875, 2157.24609375, - -15016.64453125, -6877.53125, 14902.8125, 7209.2265625, - -2042.5078125, 1408.5546875, -7527.7421875, -2854.0234375, - 14371.3203125, 4124.328125, -12106.0078125, 6915.3671875, - -5424.265625, -3368.4140625, -1484.390625, 1838.796875, - 11136.140625, 3617.8515625, 1566.80859375, 2294.078125, - -12725.1328125, 91.15234375, -6349.77734375, 7540.0234375, - 2733.8359375, -10235.1171875, -581.0078125, -3233.48046875, - -4170.0625, -7954.2578125, 9444.06640625, -2047.2734375, - 4835.671875, 4909.4375, -5823.890625, -3981.140625, - 3692.07421875, -4590.18359375, 504.390625, 20112.46875, - 
17923.375, -4953.19140625, -491.546875, 4872.7265625, - -2552.0078125, 515.15625, -7827.59765625, 817.8515625, - 7113.12109375, 15145.3828125, 6575.7578125, 10015.203125, - 16561.82421875, 14229.453125, -8912.09375, -11645.81640625, - 10893.7421875, -1340.2890625, 14550.59375, 6277.40625, - -9793.296875, 553.3359375, -7538.17578125, -2921.26171875, - -7752.9921875, 6873.0703125, -16358.5546875, 12424.53125, - 20554.5078125, 10303.4375, 2862.0625, -3766.5859375, - -3983.75, -3120.3828125, 11599.23046875, 10377.546875, - -9287.1171875, -6044.09375, 11645.28125, -11572.89453125, - 11123.390625, 14959.6875, 3593.30078125, 9979.328125, - -846.6953125, 3183.48046875, -3967.171875, 3256.1171875, - 3214.33203125, -4160.796875, 4222.59375, 3537.5, - 11854.21875, 6423.28125, -2027.49609375, -5821.015625, - -1476.13671875, -12135.12890625, 3820.03125, 7955.4921875, - 1692.8125, -3746.8125, -6269.2578125, 4416.1875, - 9166.109375, 2023.671875, -8005.09375, 2959.1328125, - -8966.8203125, 4020.79296875, -1371.328125, -11798.4140625, - -4201.66796875, -14029.77734375, -8324.484375, -5026.23828125, - -561.40625, 6900.2578125, -4260.3984375, -4829.32421875, - 7013.27734375, 16687.0078125, -1037.53125, 11360.58203125, - -5288.93359375, 3934.4609375, 5246.1171875, -14641.5390625, - -1695.796875, -2946.5234375, 3663.37109375, -5420.6953125, - -12489.984375, 8586.80078125, 2190.5, 6811.578125, - -1677.21875, 172.26953125, 3728.15625, -5256.53515625, - 3975.3359375, -521.1953125, -4278.7734375, -1740.34375, - -1648.5859375, -10813.98046875, 6757.08203125, -5296.5, - 8466.80859375, 2116.62109375, 9801.4140625, 2160.05859375, - -5650.734375, -8918.32421875, -3301.09375, 1912.65625, - 76.85546875, 5785.43359375, -3678.203125, -8450.1796875, - 391.5078125, -4309.796875, 10641.140625, -12910.97265625, - -2342.26953125, -3833.234375, -3109.11328125, -4014.1875, - -6257.9921875, -3807.9921875, -3364.80859375, -7310.78515625, - -5972.72265625, -7337.65234375, -5605.2109375, -1185.828125, - 
2887.03125, 9783.078125, -4199.96875, -789.6640625, - 2593.18359375, -10152.171875, 3671.8359375, 6207.3125, - -4058.71484375, -5752.25390625, -7506.3671875, -4762.3046875, - -12752.46875, -7281.8046875, -16373.4609375, -9003.58984375, - 8183.5, 12239.4140625, -2972.0078125, -13531.28515625, - -7572.0859375, -17158.4140625, 8011.765625, -5040.41796875, - 1570.8046875, -2291.1171875, 6919.9765625, -12279.046875, - 510.08203125, -5314.2890625, 3371.51953125, -7799.9140625, - -2271.6875, 6626.4453125, -6619.00390625, -7024.953125, - 16345.05078125, -11029.59765625, -4943.28125, 5850.90625, - -8048.53125, 1177.35546875, 1111.40625, -14836.8046875, - 14040.265625, 2033.5546875, 7317.09375, -1060.890625, - -3371.7109375, -12625.4296875, 2082.7890625, 6905.03515625, - -8944.015625, -3204.109375, 6517.40625, 1347.953125, - -1594.109375, 20008.1875, -15193.2734375, 143.90625, - 10596.44140625, -2490.234375, 1265.5546875, -746.0234375, - 20993.859375, -762.265625, -21868.46875, -12004.640625, - 11621.1484375, 8729.95703125, -11762.8984375, -25792.125, - -3066.94921875, 3412.9140625, -6038.296875, 269.48828125, - 1147.76953125, 1774.30859375, 115.00390625, -3346.20703125, - -8087.625, -4701.3515625, 145.75390625, 6729.3046875, - -19666.2265625, 1479.62890625, -5465.859375, -164.57421875, - -12049.1015625, 89.765625, -631.828125, -3386.03125, - 10949.03515625, -19428.3828125, -4300.53515625, 4744.55078125, - -4306.890625, -4472.4296875, 3664.56640625, 1580.515625, - 154.60546875, 819.92578125, 6881.30078125, 8297.5546875, - -11585.5703125, 756.6015625, 716.25, 5503.94140625, - 7701.68359375, 84.3671875, 8214.640625, -2575.51171875, - 13807.0625, 2423.33984375, 16971.53125, -11991.8046875, - 2.08203125, -5043.3828125, 11978.828125, -4652.14453125, - -14990.8671875, -1060.71875, -6386.01171875, 3906.98828125, - -5114.7109375, 5230.16015625, 2625.4921875, -15182.6015625, - -6823.0546875, 5756.82421875, 4508.95703125, -1303.89453125, - -990.09375, -12502.71484375, -4192.7890625, 
-1111.5390625, - 847.16796875, 4728.234375, -374.43359375, -6225.95703125, - -6084.8828125, 11852.12109375, -3346.046875, 2343.578125, - -10360.43359375, -8016.4296875, 1469.875, 9533.984375, - -140.203125, -1890.52734375, -4219.8828125, -10638.8046875, - 20232.0078125, 443.59375, -12256.8203125, -2933.41796875, - -4542.10546875, -22589.890625, -11386.234375, -3307.1015625, - -19243.1328125, -8766.6875, -10584.890625, -21397.40625, - -14948.421875, 6914.828125, -10830.4375, 1655.828125, - -13818.1796875, -513.28125, 16966.09375, 13417.484375, - -12038.25, -11313.9453125, -11004.60546875, -2917.69140625, - -9740.0703125, -8780.4765625, -7191.390625, -7960.81640625, - 3362.671875, 4705.8203125, 6335.921875, 12470.6171875, - 6172.671875, -2139.6953125, 2560.203125, 9491.84375, - 5177.125, -10475.26953125, -1871.4765625, 8260.109375, - 10105.8203125, 4305.0546875, 8176.1796875, 14430.28515625, - -1030.65625, 7994.3203125, -6586.40625, -2639.6875, - -15531.9765625, 283.4921875, -1403.2109375, 9352.2734375, - -6525.73828125, -38.9453125, 14282.6015625, -109.015625, - -23493.0625, -2206.484375, 4852.234375, 12229.6015625, - -2510.6640625, 16286.640625, -5536.0703125, 17186.6953125, - -9520.359375, -1242.48828125, -2995.25, -2009.2265625, - -3255.02734375, 12811.79296875, 7390.453125, 7872.109375, - 12615.9609375, -2345.578125, 14482.98828125, 4769.4453125, - -3797.375, 1101.7109375, 8028.10546875, -1339.0234375, - -4475.8984375, 7231.484375, -574.24609375, 5084.4921875, - 161.78125, -11135.36328125, -519.265625, -12472.1484375, - 1723.421875, 3613.6953125, -5380.00390625, 3567.5546875, - -15237.87109375, -1104.12890625, 3431.3828125, 3784.2421875, - 6972.984375, 550.93359375, 18355.625, 947.3671875, - -16801.19921875, 6178.96875, -13105.97265625, -3757.6953125, - -15707.96875, 565.2109375, -6904.4453125, -16116.5703125, - 712.828125, -2828.14453125, 10232.23046875, -272.3125, - -4127.19140625, 754.875, -633.6796875, -21484.09375, - 4282.046875, 8931.84375, -10209.2890625, 
-5143.5546875, - 3306.3046875, 2691.3125, 645.7890625, 6578.9921875, - 8765.71875, 5326.92578125, 1496.7734375, -170.01171875, - -468.953125, 20395.39453125, -3119.90625, -1231.9453125, - -17771.96484375, -2116.85546875, 5227.640625, 7571.2734375, - 9872.8046875, -10816.28125, -1909.1796875, 2398.39453125, - -588.25, 3172.890625, 3266.9453125, -1874.546875, - 2273.2578125, -1985.00390625, -7490.98828125, -17269.6796875, - -3753.80859375, -2737.96875, 10793.3125, -783.109375, - 867.1015625, 1218.5390625, -16178.140625, 10126.0703125, - 881.65625, -9156.34375, -9181.046875, 3826.046875, - 7133.609375, 16692.1171875, 1797.859375, -15600.31640625, - 7694.40625, 14056.328125, 2971.73046875, -291.23828125, - -9726.2890625, 16140.953125, 3121.578125, 8939.3125, - -8716.0859375, 6947.0390625, -1543.109375, -4687.21875, - 2043.015625, 535.8046875, 7026.8125, -2974.21875, - 6930.1875, -530.0546875, -6411.8046875, -3458.921875, - 1119.6640625, -5939.9375, -10655.328125, -1522.82421875, - 1987.0546875, 3149.3828125, -4371.8046875, 4911.046875, - -1162.03125, -1900.8046875, 7159.3125, -5554.21875, - 1738.2890625, -5734.7109375, -2750.5390625, 2726.0546875, - 3216.11328125, -12055.87109375, -9456.0625, -715.359375, - 4819.4296875, -203.4609375, 7273.0546875, -989.359375, - -1536.3046875, 11036.40625, 9034.7890625, 8026.8203125, - -152.6484375, -10797.7265625, -2350.53125, -4118.328125, - -1820.625, 12179.515625, -6721.5390625, -11243.2890625, - -7186.40625, -7793.66015625, -2010.390625, 6595.3515625, - 533.5546875, -271.5, -831.671875, -2737.7109375, - 3444.8125, 2452.4609375, 1615.8359375, -3139.25, - -990.3671875, 43.9296875, 4780.2734375, 5218.96875, - 2264.59375, 8520.9375, 10525.3984375, 779.8125, - 3522.84375, -238.34375, -9830.125, -4218.75390625, - -1380.109375, -10862.2265625, -978.8359375, -6911.1015625, - -11409.203125, -14533.83203125, -315.171875, 8415.7578125, - 1377.30859375, -6239.9375, -6262.421875, -4127.33203125, - -2755.796875, 9889.1171875, -2309.546875, 
6002.19921875, - -3387.265625, 2919.609375, 12211.7421875, 5011.8359375, - 7181.1640625, 12193.1875, 716.421875, 15581.65625, - 6559.3046875, 3664.69140625, 690.08984375, -6804.08203125, - -1148.828125, -3649.109375, 6169.65625, 3679.7265625, - -9953.0, -2349.53125, 2022.7265625, -4498.64453125, - -10101.3515625, -13052.1640625, 4657.57421875, 1880.9921875, - -7047.3125, 10128.37890625, -6736.77734375, -10288.171875, - -1017.078125, 18140.8125, 1629.4140625, 10726.2734375, - -3315.78515625, -361.5546875, 1091.03125, -3859.98046875, - 396.30078125, 168.609375, -10235.7421875, -5509.3046875, - 455.8046875, 3412.6953125, -10598.609375, -4554.9453125, - 2911.7890625, -9053.890625, 7177.01953125, 9911.90625, - 2131.6953125, -7070.5390625, 31697.2265625, 11208.6171875, - -3894.2734375, 1488.71875, -8995.4609375, -9611.5390625, - 9759.40625, 5818.98828125, -157.71875, 8729.1015625, - -8412.5390625, 5671.09375, -3655.8984375, -9348.91796875, - 17153.390625, -3371.3359375, 2297.65625, 11417.5546875, - -7736.03515625, -1883.6484375, -5866.03125, -7699.890625, - -5550.04296875, -10843.9453125, 1187.44140625, 2740.41796875, - -3856.90625, -5226.9453125, -15328.8359375, 7097.1640625, - 4907.765625, -2376.3515625, 3247.9453125, 4936.171875, - -9527.0, -15037.6953125, 3643.9609375, -2882.06640625, - -4769.1796875, 2846.17578125, -10344.4453125, 13935.6484375, - -777.25, 13651.42578125, 19802.9921875, 12977.6953125, - 1668.58984375, -3588.28125, -1103.4375, 15397.484375, - 8939.3203125, 1773.5078125, 23913.0546875, -1999.87109375, - 4213.86328125, 9440.890625, -9809.6875, -787.765625, - -16373.41015625, -4932.890625, -7492.8125, 4156.1484375, - 3535.125, 1034.3203125, -11444.8046875, 776.33984375, - 16857.3984375, 1265.86328125, 6983.6484375, -3705.65625, - -5589.5546875, -12884.8203125, -8074.125, 19.22265625, - 6337.5546875, 4416.046875, 8930.46484375, -159.2578125, - -8283.703125, -2381.609375, -12708.140625, 1655.703125, - -4148.8203125, -2889.1015625, 8338.56640625, 
1855.3203125, - 9154.4453125, -1253.9609375, -1666.91796875, 801.65625, - -8052.5234375, 4874.578125, 759.25, 8142.67578125, - -6646.953125, 4058.91015625, 7254.30078125, 7746.4609375, - -785.6796875, 4627.953125, -1246.453125, 5961.0078125, - 5426.09375, 952.7578125, -12446.234375, -6563.265625, - 9014.625, 4251.1015625, 10347.28515625, -16890.0546875, - 1170.94921875, 898.4765625, -10999.4140625, -3605.63671875, - 14255.25, 2157.046875, -1359.70703125, 1604.05078125, - -2724.42578125, -5751.8046875, -4726.1015625, -9718.53125, - -1978.5, 7374.875, 1789.9765625, -7673.65625, - -1145.71484375, 951.6875, 13529.03515625, -2687.609375, - -3232.35546875, 1753.01171875, -7501.5, 5562.5, - -6176.1328125, 5084.65625, -2890.3125, 2626.3125, - -67.32421875, -4593.04296875, -767.03125, -5540.21875, - 6695.93359375, 6038.12890625, -1635.44921875, -724.515625, - -1898.9765625, 1594.30859375, -1726.69921875, -1548.0625, - -2624.3515625, -67.234375, -9180.375, 6267.296875, - 3098.60546875, -3196.9453125, 8700.96875, -5866.8125, - 306.7421875, 21482.85546875, -2711.6171875, 3666.8359375, - 466.828125, -13546.265625, 11090.9375, -13352.16015625, - 398.0, -7701.0078125, -4624.8046875, -4380.3125, - -8021.40625, 5669.3828125, 9764.5390625, 11979.265625, - -11539.5390625, -1222.1875, 531.41015625, -4274.0859375, - 498.7265625, -7231.77734375, 279.96875, -55.8125, - -6929.484375, 2481.49609375, 9563.390625, 3560.890625, - -4082.84375, -12567.359375, -3022.984375, 2853.234375, - -1353.4375, -4305.390625, -7408.40625, -3579.0625, - -4166.171875, 426.5625, -5552.578125, -1367.19140625, - -8917.625, -2335.375, 21657.89453125, 13092.40625, - 10337.421875, 6580.4296875, 9982.7890625, -14562.328125, - 10851.1015625, 1190.01171875, -6974.515625, 1818.28125, - 11247.83203125, -13036.8203125, -1493.6875, -3207.93359375, - -4367.015625, 966.03125, -16569.828125, -9528.87109375, - 2244.33984375, -3105.625, -8069.71875, -8994.1953125, - -1041.390625, 1394.734375, -11563.4609375, -15632.4140625, - 
-18348.359375, 8792.046875, 1471.3828125, 5665.5546875, - 4322.625, -10991.25, -12158.7109375, 3989.78125, - -13146.53125, 6573.32421875, 5805.9375, 15909.453125, - 10993.1015625, 994.65625, -3116.96875, -11683.84375, - -5161.1015625, 10795.640625, 1865.796875, 9630.96875, - -5362.04296875, -13533.1953125, 10627.7421875, -21505.984375, - 8826.78125, -7207.72265625, 18429.21484375, -14311.953125, - -9024.6484375, -1557.88671875, -1648.265625, 1942.2421875, - 840.25390625, -865.03515625, 4699.0390625, -12978.2578125, - -4966.46875, 13152.90625, 6042.7734375, -7304.0625, - 1796.19140625, 3501.390625, 837.9140625, -2006.03125, - -18303.4921875, -12422.3125, -4122.82421875, -1478.046875, - 3508.6328125, -840.08984375, -436.875, -673.2734375, - 3210.3125, 5192.484375, 4595.953125, 1659.4375, - 3981.34375, -107.38671875, 18545.2109375, -8220.0390625, - -4340.84375, 14516.3125, -15250.76953125, -3820.6640625, - -9490.4921875, 590.9765625, 9679.40625, -1590.1953125, - 9569.7734375, 3985.6953125, -7002.90234375, -2670.578125, - 756.21875, 6613.94921875, -1645.90625, -1547.734375, - 4705.78125, 506.15625, -9361.15234375, -2706.890625, - -8115.97265625, -6504.1875, 3712.41015625, 4783.890625, - 244.0, -7712.125, -4502.95703125, 2157.765625, - 1644.5625, -7629.16796875, -6938.609375, -8972.3828125, - -7026.61328125, -4831.6796875, -4054.546875, -832.83203125, - -15792.8359375, 3272.515625, 2651.671875, -648.484375, - -15101.5, 2880.03125, 5349.03125, -5876.25, - 7985.90625, 11059.140625, 16087.6484375, -9912.09375, - 1482.25390625, 3208.5390625, 3152.83203125, 7562.703125, - -6562.046875, -745.953125, 13119.578125, 9882.484375, - -21119.0625, 7162.4765625, -12082.84375, 16159.5859375, - -16112.1640625, 14644.375, -17490.4140625, 1315.3984375, - 52.078125, -5559.90625, -2324.921875, -8314.3125, - 5578.8515625, 1196.078125, -1543.4375, -3584.0625, - -572.578125, 9751.4296875, 4439.796875, 2980.26953125, - -24902.53125, -494.46875, 5620.828125, -6995.65625, - 7991.421875, 
-777.984375, -10751.25390625, -9159.7734375, - -4143.21875, -3886.93359375, 1789.125, -5803.578125, - 1389.9375, -3939.1875, -1760.71875, 12747.8671875, - -3184.93359375, -4926.015625, -10606.49609375, -11832.234375, - 9423.984375, 2142.265625, -1061.359375, 13054.25, - -2470.1328125, 684.6640625, -12289.0078125, 16340.0625, - -11126.69140625, 3188.60546875, -2249.359375, 13695.09765625, - -1289.21875, -7329.390625, 1189.03125, 3282.71875, - -7625.3046875, -16688.2734375, -494.0703125, 2585.859375, - -8843.3828125, -4881.5625, -5706.8359375, -1437.7734375, - 14060.33203125, -7777.5703125, 3348.2734375, 5334.6484375, - 19191.94921875, 24535.375, 15069.09375, -1057.40625, - 6501.65625, -3592.34375, -2071.0859375, 9815.46875, - -13567.890625, -466.0390625, 9002.2265625, -6892.0390625, - 317.40625, 1737.84375, 9035.3828125, 6043.0859375, - 9428.9296875, 9822.77734375, 3538.21875, -7869.1640625, - -15818.3046875, 21774.6015625, 4791.9453125, 11292.94921875, - -17180.578125, 1431.34375, 5122.65625, 1264.71875, - 2974.21875, -10943.734375, -7043.453125, -6390.25, - 2252.140625, -7582.14453125, -15002.28125, -8446.9453125, - 5452.64453125, 282.078125, 8035.53125, 6049.296875, - 622.65234375, -12556.359375, 657.875, 404.953125, - -7945.6953125, -816.2578125, 14000.0078125, -1486.78125, - 4373.828125, 1908.1875, -3748.9921875, 9485.4375, - 408.671875, -6540.8203125, 3891.1875, -1159.0234375, - 23961.0078125, 3584.3515625, -16737.890625, 14594.5078125, - 5136.9453125, 7937.3515625, 15521.625, -2033.97265625, - 5030.140625, 7234.1015625, 9122.453125, -8879.078125, - -2938.3125, -7339.0, 11260.6484375, -1896.46875, - 22847.515625, -13288.75, -7961.7421875, 5254.78125, - -1637.984375, -771.3359375, -3305.0546875, -3090.5234375, - -2325.7109375, -9667.4296875, -18932.1015625, -20762.4375, - -7296.71875, -8840.578125, 4688.3125, -3881.6015625, - -9253.390625, -1808.40625, 8678.71484375, 17919.9140625, - -5530.015625, -6528.5, -5780.5234375, -8453.53125, - 1163.7421875, 
-12433.9375, 2800.66796875, 1157.5703125, - 20866.359375, 9474.03125, -11325.1484375, 1518.421875, - 1232.6875, 3569.51953125, 4999.1015625, -10654.546875, - -3048.0234375, 1870.4375, -8439.2890625, -17630.078125, - -10741.7734375, -10101.21875, -1371.609375, 7337.703125, - -973.796875, 19439.546875, -435.30078125, -8275.4921875, - -15801.19921875, 19533.203125, -10677.6875, 1619.046875, - -10340.01953125, -4761.3046875, -7395.328125, -990.375, - 7443.25, -2431.703125, -13043.3671875, -12194.875, - 5225.15234375, 3050.5390625, 6939.6796875, 4578.015625, - 6549.78125, -2731.5703125, 5464.09375, -18469.296875, - 8805.484375, 3718.203125, -6007.09375, 5015.953125, - -16821.6875, 3399.40625, 1803.734375, 11701.5625, - 5794.265625, 15152.03125, 12209.4609375, 13709.2734375, - 4615.140625, -1269.90625, 11701.7578125, 4005.96875, - -9214.6484375, 12736.7890625, -18733.0625, -1960.4375, - -3156.609375, -2741.3984375, 6899.8203125, 561.734375, - -4119.375, -12688.5390625, 3883.53125, -25479.328125, - 614.921875, -13440.125, -1309.69921875, 2778.125, - -6634.34375, 16022.234375, -1044.2265625, -11476.703125, - -6846.8984375, 3209.859375, -5657.1796875, 21886.5625, - 3312.859375, 11954.9609375, -5757.8671875, 231.0703125, - 11480.46875, -20.9375, 22835.40625, -5692.3125, - 1356.71875, -14085.6640625, 10855.953125, -762.296875, - 114.28515625, 8700.484375, -1597.515625, 6540.12890625, - 1866.234375, 3076.49609375, 3608.390625, -5642.8203125, - -26417.4921875, -7897.7421875, 13085.7265625, -5569.6484375, - 8776.6875, 10785.34375, 18619.5859375, -23138.1640625, - 1069.0078125, 24686.91015625, 3973.125, -9526.62890625, - -6936.7578125, -14106.33984375, 2786.828125, 9223.6640625, - -7347.9140625, -835.875, 8007.078125, -9283.6328125, - -13327.39453125, 7049.9609375, 10873.9921875, 12574.44921875, - -14661.90234375, 6308.484375, 15493.84375, -14216.40234375, - 1556.87890625, -3021.359375, -4272.3046875, -968.3515625, - -1078.46484375, -2637.859375, 572.21875, 1111.1015625, - 
6992.578125, 14468.765625, -6641.125, -1783.3125, - 11311.41015625, -9276.65625, 22454.4765625, 10975.8046875, - 9434.8203125, -975.484375, -8339.0390625, 144.34375, - 6799.2109375, 4708.5859375, -1095.1171875, -6271.32421875, - -4386.1328125, -7273.67578125, -9819.4375, -7072.765625, - -2180.8125, 3987.3046875, -8439.109375, -12031.2109375, - -7977.04296875, -9884.7109375, 5282.734375, -8396.9609375, - -18316.9375, 12759.0546875, -5556.3984375, 5122.9296875, - 10017.96875, -15912.5546875, -1954.46484375, 7856.98046875, - 1843.0546875, -3683.328125, -20.15625, -2530.12890625, - -7920.26171875, -15270.37890625, -405.15625, -11620.875, - 10926.609375, 402.4921875, 3854.30078125, -9379.8359375, - 8840.1640625, 9004.4609375, 7173.28515625, -5673.31640625, - 1143.0390625, 419.8828125, 6527.9765625, -1925.328125, - -10096.9609375, -8330.7265625, 10222.35546875, -569.6484375, - 7856.0234375, 4410.265625, 7794.5859375, 3433.390625, - 14852.5703125, 1742.9765625, -66.90625, 3712.4765625, - -9510.86328125, -8079.98828125, -1898.2421875, -6584.15234375, - -14063.2109375, -1161.5625, -4230.09375, 5113.671875, - -985.37109375, -2648.25, 558.6796875, -704.1015625, - -3880.8359375, -12148.4296875, -6997.546875, 90.16796875, - 19.5078125, 7495.671875, -1437.75390625, 172.13671875, - 754.8203125, -5104.0703125, 7777.90625, 7456.6328125, - 1642.0078125, 186.5625, 735.171875, -14439.59375, - 8286.375, 1456.55078125, -11917.8515625, -11773.390625, - 2097.6875, -5921.71484375, -4891.828125, 14908.078125, - -1576.9765625, -18608.78125, -7703.921875, 8292.84375, - -4171.765625, 4803.078125, -3866.46875, 3073.0625, - -8014.515625, -14829.234375, -14335.3125, -4531.23828125, - -5798.28125, 1970.34375, -19271.796875, -3255.6171875, - 5170.859375, -15515.9375, -843.9296875, -2277.2578125, - -6240.921875, -5705.09375, 18334.1640625, 11407.84375, - 18327.625, 1091.078125, 2738.484375, 7692.390625, - -10882.96875, 7517.90625, 1660.921875, -988.703125, - 24371.6171875, 3588.046875, 4901.0625, 
13083.41796875, - -23344.25, -9870.640625, 14470.4375, -17102.21875, - -12510.2265625, -18719.140625, 2573.390625, 11104.5078125, - 15623.76171875, 5916.5703125, -1593.46875, -119.48828125, - -5041.734375, 2340.6875, 15526.484375, 8456.9140625, - -5549.984375, -1395.6171875, -2431.01171875, 4059.2578125, - -2646.46484375, 34677.96875, -1436.6953125, 4892.125, - -1536.78125, 13530.71875, -17949.1328125, 679.3125, - 4712.59375, -2922.9296875, -8261.125, 12543.59765625, - -4924.56640625, -3505.9609375, 4623.296875, -2252.5859375, - -4235.953125, -22676.1171875, -16403.4375, 11887.796875, - 11829.890625, 612.71875, -25410.6796875, 6777.453125, - -5512.4375, 7517.3984375, 9033.6328125, -15122.328125, - -14273.32421875, 28852.6640625, 3172.859375, -8073.640625, - 8124.7890625, -2460.625, -392.8359375, -2297.296875, - 3440.515625, -5661.78125, -2393.1484375, 8387.4765625, - -11954.55859375, 8758.046875, -12108.5078125, -1117.53125, - -1124.59375, 1146.6015625, -1617.421875, -12966.0625, - 1223.6171875, 3614.265625, -4869.4921875, 17049.078125, - 8325.375, -4903.4140625, -1148.9140625, -7142.3984375, - 16645.4375, -9805.28125, -7518.4453125, 21499.34375, - -8389.390625, -1916.46875, -818.78125, -4138.40625, - 2594.1796875, -25016.140625, 4696.53125, 11107.078125, - 5423.5234375, 1686.3203125, 745.1171875, 2753.1171875, - -13483.453125, -7521.984375, 9776.078125, 971.390625, - 12825.2734375, -2429.15234375, 14913.52734375, -15369.875, - -12652.984375, 17550.640625, -2681.03125, 4838.25, - 4585.515625, -2235.41015625, 307.3359375, -6510.84765625, - -6023.21484375, 3194.33984375, 7169.625, -972.40625, - 191.3046875, 1424.0859375, -3792.4609375, 1636.99609375, - 12105.5859375, -1301.5703125, 3435.01171875, 1586.68359375, - -668.8515625, -5305.3828125, -5562.2890625, 6825.0390625, - 4532.71875, -3284.91796875, 3240.125, 13165.9140625, - -2819.4453125, -4120.125, -4968.42578125, -8099.375, - 7725.30859375, 1068.5859375, -7190.58203125, 1442.9140625, - -5960.8828125, -7069.71875, 
-6779.734375, 840.64453125, - 11610.98828125, -2653.90625, -4195.734375, 10076.83984375, - 3114.75, -4034.2109375, 681.6015625, -2177.76953125, - -2545.265625, 231.69921875, -1326.3203125, 3586.3515625, - -1048.4765625, 986.5078125, 1664.109375, 888.0, - -1290.7265625, -2889.21484375, 785.5078125, -832.59375, - -9755.890625, -1764.3046875, 5436.03125, -598.1953125, - 731.7578125, 1338.6484375, 347.671875, 2675.65625, - 3546.8984375, 279.76171875, 7108.2890625, 2223.5390625, - 629.140625, 1358.125, 1418.4375, -5836.31640625, - -2756.4765625, -3265.703125, -2217.63671875, 7164.5234375, - 7194.7109375, -2941.21875, 3873.5, -4540.71875, - 9077.8671875, -9588.3515625, 9636.40625, 6979.6796875, - -17726.4453125, 7012.234375, 5185.7734375, 13603.7109375, - 3981.5546875, -11808.2265625, -21980.3359375, 3822.140625, - -13725.24609375, -12745.46875, 5347.16796875, 7561.5390625, - -6374.6875, -7054.8984375, 9939.5234375, 21318.23828125, - 2077.5234375, -18526.53125, 20275.8671875, 5398.0859375, - 2499.46875, -10253.96875, 11386.46875, 10337.4765625, - -1595.2578125, -19869.5234375, -5400.0, 2007.625, - -10151.21875, -6897.734375, -19205.5703125, -384.0078125, - 8508.88671875, 4486.9921875, 1818.453125, 4604.05078125, - -2522.1171875, -9759.3125, 8014.4296875, 3242.9375, - -2396.328125, 3487.2421875, -12284.86328125, -2679.15625, - -5258.9296875, 4265.546875, 17669.41796875, 15225.66796875, - 10652.79296875, 1599.59375, 13372.24609375, 3147.08984375, - 2707.890625, 4810.609375, -5790.8984375, -1744.484375, - 23885.73828125, -6615.24609375, -5964.0234375, -892.83203125, - -2926.1875, 7654.9140625, -5129.578125, 2529.5546875, - 9321.69921875, 11874.7265625, -2874.46875, -6221.4765625, - -17045.625, 4904.3046875, 7283.58203125, 8041.8046875, - 8165.40625, 301.8828125, -15808.765625, -1516.8125, - 21.3828125, -6668.84375, 154.0859375, 6834.6015625, - -7939.65625, 7895.484375, -3528.953125, -7648.3203125, - 505.421875, -3843.9609375, 14161.984375, -3700.796875, - -4359.390625, 
-2098.7890625, -13579.28125, -19602.296875, - 5169.8203125, -812.546875, 9614.765625, 13910.76171875, - 4750.5234375, 3026.5625, 5976.765625, -9897.828125, - -11070.375, 5648.359375, -2958.296875, 748.18359375, - 3244.9375, -2721.59375, 677.6875, -5049.9375, - -4586.5546875, 13902.625, 2647.0, -770.46875, - -8598.78125, 10639.046875, -7279.3359375, -6227.859375, - 4642.234375, 3703.5234375, 12864.5390625, 7881.41015625, - 5195.41796875, 10869.671875, 10681.125, 11148.1484375, - -8002.6328125, 4435.05859375, 17962.296875, 3331.359375, - 3182.37109375, -6666.671875, -2052.3984375, 19783.046875, - 12467.953125, -35550.3984375, -14904.2578125, 2281.4375, - 19.765625, 6142.2421875, 9214.83203125, -3947.2421875, - -1183.3203125, 5204.6171875, 5805.40625, -10033.578125, - 1323.609375, -10372.8125, 207.125, -9880.0859375, - -2053.9765625, -5359.171875, -7515.2421875, -6795.6875, - -3231.3046875, -18476.8828125, 17553.890625, -12257.7265625, - -4446.7734375, 2186.8359375, 33.984375, -6545.0703125, - -9458.15625, 860.171875, -4632.6015625, -941.1796875, - 8570.515625, -7168.77734375, 20211.8203125, -9912.890625, - -7248.765625, 966.203125, -2136.21875, 6113.375, - -14287.96875, 2707.4609375, -2365.05078125, 13351.0, - -8868.125, -1675.66796875, -1109.2734375, -4445.3125, - -17930.578125, 2291.796875, -13276.25, 4131.9375, - 5726.921875, 2569.2734375, -23250.4765625, -16244.25, - 5475.6640625, 2373.30859375, 16117.125, -5620.9453125, - 10639.6796875, -7814.390625, -12852.6171875, -4790.47265625, - 17444.703125, 690.890625, 11280.2734375, -7886.6015625, - 3976.34375, -2202.58984375, -9942.7890625, -11087.12890625, - -3137.578125, -6460.53125, -2952.6171875, 5634.7421875, - -1941.7421875, 730.6953125, -1153.4453125, -3112.0625, - 1452.3828125, -5410.09375, -4261.3125, -1456.875, - -5643.1484375, -1463.296875, 3925.58203125, -4900.59375, - 40.6484375, 1252.46484375, -344.40625, 3529.609375, - -900.7890625, 5922.8203125, 15063.515625, -1324.6484375, - 8625.3359375, 2836.80078125, 
-12040.46484375, -4216.2578125, - -6213.86328125, -10288.53125, 4614.4375, -1944.99609375, - 7520.5234375, -992.953125, -10501.46875, 7919.19140625, - -12348.5078125, 2628.2109375, 1514.1796875, 5828.78125, - -1572.5234375, 2965.0859375, -8385.7109375, -7463.65625, - 3317.1875, 494.47265625, 1404.5859375, -1180.9765625, - -3527.46875, -6418.40234375, -802.8203125, 7201.2265625, - -4751.09375, -838.4921875, -1073.96875, -7186.6953125, - 5347.86328125, 2872.046875, 790.2265625, 3402.5078125, - 1907.296875, 6744.76953125, 2084.5625, 3341.69140625, - 8442.84765625, -13053.484375, 286.046875, 16889.6796875, - -5099.55859375, 3100.8125, 4257.390625, 1814.0078125, - -6161.875, -8540.140625, -4483.34375, 439.2734375, - 227.67578125, -1799.125, -3553.13671875, 1656.9375, - 6641.8359375, -15043.8359375, 5641.4375, -4594.21875, - -1376.0625, -331.0, 7282.484375, -2041.40625, - 9620.3125, -3654.15625, -1126.9765625, 7806.84375, - -11721.73046875, 2696.73828125, -3096.0546875, 5105.82421875, - 10436.234375, -1913.53125, 153.1875, -4162.796875, - 9412.65625, 11157.125, 369.90625, -11163.9921875, - 1634.1796875, 11452.6796875, 740.8046875, 19047.921875, - -7026.5, -5453.51953125, 920.1796875, 16123.0703125, - 4222.296875, -7529.296875, -1535.359375, 3761.2109375, - -367.03125, -6131.1875, 13643.1875, 5634.125, - -1419.5546875, 4258.546875, -2644.84375, 8160.5, - 756.7109375, 4635.49609375, -3157.5, -6028.796875, - -5035.375, 989.3046875, 4301.7265625, -4428.3125, - -4750.1484375, 2756.0078125, 914.015625, 1900.859375, - 1138.453125, -6232.4375, 4567.5546875, 8333.65625, - 11547.3125, -13053.6171875, -12727.390625, -5376.078125, - -2138.2890625, 14686.62109375, -1218.58203125, -11680.9921875, - 1386.78125, 675.359375, -15428.65625, -5115.9765625, - 3637.8125, -7797.75, -2382.59375, 5126.0625, - -13190.69140625, 8505.671875, 7270.078125, -3543.0234375, - -5799.0, -12729.6640625, -6931.734375, 20180.1796875, - 17068.0234375, -2588.7421875, -9055.625, 6298.5625, - -2445.78125, 
-891.5625, -2216.37109375, -828.140625, - -12619.6484375, -10334.46484375, 9141.03125, 11402.8203125, - -10834.828125, 3124.640625, -18391.984375, 7821.4375, - 8543.5859375, -12.421875, -18913.5234375, 5369.1875, - -2350.953125, -17559.25, 1404.2578125, 2315.15234375, - 17695.484375, 6971.875, -4016.984375, 6146.0078125, - 4189.140625, -4621.578125, 486.5546875, -3347.2421875, - 1304.48046875, -5327.890625, 5445.984375, 1236.8984375, - -3343.890625, -3943.3203125, -6349.9296875, 9726.19921875, - 12270.40625, 355.75, -65.1484375, -10270.2890625, - 11838.4375, 17297.46875, 2137.1640625, 1681.453125, - 15321.875, -8187.03125, -17540.8046875, 1278.78125, - 796.32421875, 2355.9140625, 5170.328125, 6044.40625, - -10024.01171875, 11287.48828125, 15092.3984375, -1404.3203125, - 4568.390625, -9599.65625, -4476.6953125, 10327.2265625, - -4055.96875, -460.6015625, 375.5, 11058.625, - -3602.8046875, -331.03125, 4607.375, 3562.81640625, - -1857.3828125, -3505.53125, -4849.01171875, -3349.9765625, - 887.36328125, 5757.046875, -3885.2421875, -6878.19921875, - 777.421875, -7941.1796875, 2654.5078125, -648.74609375, - -7497.625, -3190.84375, 8338.125, 5672.5859375, - -1459.53125, 3862.8203125, 840.5078125, -2808.44140625, - -4378.484375, 1430.5390625, 3635.96875, -7734.8984375, - -1269.0, -3961.3828125, -1743.4140625, -8289.5625, - 3716.69140625, 15817.203125, 1354.08984375, -5814.9140625, - -3868.46875, -768.5, -2380.3671875, -1916.4609375, - 9306.6015625, -9984.109375, -12726.48046875, 2759.6953125, - 10010.52734375, 16845.6953125, -2136.59375, -3730.140625, - 335.890625, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-sgemm/gendata.py deleted file mode 100755 index 4b6e0c92..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm/gendata.py +++ /dev/null @@ -1,79 +0,0 @@ -#!/usr/bin/env python3 - -import numpy as np -import argparse - - -m_dim = 71 -k_dim = 71 -n_dim = 71 - -parser = argparse.ArgumentParser( - 
description="A script to generate input data for an SGEMM kernel." -) - -parser.add_argument("--mdim", type=int, help="M dimension of inputs") -parser.add_argument("--kdim", type=int, help="K dimension of inputs") -parser.add_argument("--ndim", type=int, help="N dimension of inputs") -parser.add_argument("--size", type=int, help="Dimensions of NxN inputs") - -args = parser.parse_args() - -if args.size: - m_dim = args.size - k_dim = args.size - n_dim = args.size -else: - if args.mdim: - m_dim = args.mdim - if args.kdim: - k_dim = args.kdim - if args.ndim: - n_dim = args.ndim - -a_array_size = m_dim * k_dim -b_array_size = k_dim * n_dim -c_array_size = m_dim * n_dim - -info = np.finfo(np.float32) -maxmant = 1 << 4 -minexp = -4 -maxexp = 4 - - -# Generate floating-point values with exact mantissa and exponent -def randf(n): - return np.ldexp( - np.random.randint(-1 * maxmant, maxmant, size=n), - np.random.randint(minexp, maxexp, size=n), - ) - - -a_matrix = randf((m_dim, k_dim)).astype(np.float32) -b_matrix = randf((k_dim, n_dim)).astype(np.float32) - -c_matrix = np.dot(a_matrix, b_matrix) - -print( - f"""#define M_DIM {m_dim} -#define K_DIM {k_dim} -#define N_DIM {n_dim} - -typedef float data_t; - -""" -) - - -def print_array(name, data, data_size, data_type="float", data_fmt="{}", fold=10): - print(f"{name} [{data_size}] = {{") - for i in range(0, len(data), fold): - print( - " ", ", ".join(data_fmt.format(x) for x in data[i : i + fold]), ",", sep="" - ) - print("};") - - -print_array("static data_t a_matrix", a_matrix.flatten(), "M_DIM*K_DIM") -print_array("static data_t b_matrix", b_matrix.flatten(), "K_DIM*N_DIM") -print_array("static data_t verify_data", c_matrix.flatten(), "M_DIM*N_DIM") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm/vec-sgemm.S b/bb-tests/workloads/src/CTest/rvv/vec-sgemm/vec-sgemm.S deleted file mode 100644 index 1c43bede..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm/vec-sgemm.S +++ /dev/null @@ -1,200 +0,0 @@ - .text - 
.balign 4 - .global vec_sgemm_nn -# RV64IDV system -# -# void -# vec_sgemm_nn(size_t n, -# size_t m, -# size_t k, -# const float*a, // m * k matrix -# size_t lda, -# const float*b, // k * n matrix -# size_t ldb, -# float*c, // m * n matrix -# size_t ldc) -# -# c += a*b (alpha=1, no transpose on input matrices) -# matrices stored in C row-major order - -#define n a0 -#define m a1 -#define k a2 -#define ap a3 -#define astride a4 -#define bp a5 -#define bstride a6 -#define cp a7 -#define cstride t0 -#define kt t1 -#define nt t2 -#define bnp t3 -#define cnp t4 -#define akp t5 -#define bkp s0 -#define nvl s1 -#define ccp s2 -#define amp s3 - -# Use args as additional temporaries -#define ft12 fa0 -#define ft13 fa1 -#define ft14 fa2 -#define ft15 fa3 - -#define FRAMESIZE 32 - -# This version holds a 4*VLMAX*4 block of C matrix in vector registers -# in inner loop, but otherwise does not cache or TLB tiling. - -vec_sgemm_nn: - ld cstride, 0(sp) # Get arg from stack frame - addi sp, sp, -FRAMESIZE - sd s0, 0(sp) - sd s1, 8(sp) - sd s2, 16(sp) - sd s3, 24(sp) - - # Check for zero size matrices - beqz n, exit - beqz m, exit - beqz k, exit - - # Convert elements strides to byte strides. - slli astride, astride, 2 - slli bstride, bstride, 2 - slli cstride, cstride, 2 - - slti t6, m, 4 - bnez t6, end_rows - -c_row_loop: # Loop across rows of C blocks - - mv nt, n # Initialize n counter for next row of C blocks - - mv bnp, bp # Initialize B n-loop pointer to start - mv cnp, cp # Initialize C n-loop pointer - -c_col_loop: # Loop across one row of C blocks - vsetvli nvl, nt, e32, m4, ta, ma # 32-bit vectors, LMUL=4 - - # Initalize current C submatrix block from memory. 
- vle32.v v0, (cnp); add ccp, cnp, cstride; - mv akp, ap # reset pointer into A to beginning - vle32.v v4, (ccp); add ccp, ccp, cstride; - mv bkp, bnp # step to next column in B matrix - vle32.v v8, (ccp); add ccp, ccp, cstride; - vle32.v v12, (ccp); - - # Get vector from B matrix - vle32.v v16, (bkp) - mv kt, k # Initialize inner loop counter - - # Inner loop scheduled assuming 4-clock occupancy of vfmacc instruction and single-issue pipeline - # Software pipeline loads - flw ft0, (akp); add amp, akp, astride; - flw ft1, (amp); add amp, amp, astride; - flw ft2, (amp); add amp, amp, astride; - flw ft3, (amp); add amp, amp, astride; - - # Loop on inner dimension for current C block -k_loop: - vfmacc.vf v0, ft0, v16 - add bkp, bkp, bstride - addi kt, kt, -1 # Decrement k counter - addi akp, akp, 4 - beqz kt, 1f - flw ft0, (akp) - add amp, akp, astride -1: vfmacc.vf v4, ft1, v16 - beqz kt, 1f - flw ft1, (amp) - add amp, amp, astride -1: vfmacc.vf v8, ft2, v16 - beqz kt, 1f - flw ft2, (amp) - add amp, amp, astride -1: vfmacc.vf v12, ft3, v16 - beqz kt, 1f # Exit the loop - flw ft3, (amp) - add amp, amp, astride - vle32.v v16, (bkp) # Get next vector from B matrix, overlap loads with jump stalls - j k_loop - -1: vse32.v v0, (cnp); add ccp, cnp, cstride; - slli t6, nvl, 2 - vse32.v v4, (ccp); add ccp, ccp, cstride; - add cnp, cnp, t6 # Move C block pointer over - vse32.v v8, (ccp); add ccp, ccp, cstride; - add bnp, bnp, t6 # Move B block pointer over - sub nt, nt, nvl # Decrement element count in n dimension - vse32.v v12, (ccp); - - # Following tail instructions should be scheduled earlier in free slots during C block save. - # Leaving here for clarity. - - # Bump pointers for loop across blocks in one row - bnez nt, c_col_loop # Any more to do? 
- - # Move to next set of rows - addi m, m, -4 # Did 4 rows above - slli t6, astride, 2 # Multiply astride by 4 - add ap, ap, t6 # Move A matrix pointer down 4 rows - slli t6, cstride, 2 # Multiply cstride by 4 - add cp, cp, t6 # Move C matrix pointer down 4 rows - - slti t6, m, 4 - beqz t6, c_row_loop - - # Handle end of matrix with fewer than 4 rows. - # Can use smaller versions of above decreasing in powers-of-2 depending on code-size concerns. -end_rows: - beqz m, exit - -end_rows_row_loop: - mv nt, n # Initialize n counter for next row of C blocks - mv bnp, bp # Initialize B n-loop pointer to start - mv cnp, cp # Initialize C n-loop pointer - -end_rows_col_loop: - vsetvli nvl, nt, e32, m4, ta, ma # 32-bit vectors, LMUL=4 - vle32.v v0, (cnp) - mv akp, ap # reset pointer into A to beginning - mv bkp, bnp # step to next column in B matrix - vle32.v v16, (bkp) - flw ft0, (akp) - mv kt, k # Initialize inner loop counter - -end_rows_k_loop: - vfmacc.vf v0, ft0, v16 - addi akp, akp, 4 - addi kt, kt, -1 - add bkp, bkp, bstride - - beqz kt, 1f - - flw ft0, (akp) - vle32.v v16, (bkp) - j end_rows_k_loop - -1: vse32.v v0, (cnp) - slli t6, nvl, 2 - add cnp, cnp, t6 - add bnp, bnp, t6 - sub nt, nt, nvl - - bnez nt, end_rows_col_loop - - addi m, m, -1 - add ap, ap, astride - add cp, cp, cstride - - bnez m, end_rows_row_loop - -exit: - ld s0, 0(sp) - ld s1, 8(sp) - ld s2, 16(sp) - ld s3, 24(sp) - addi sp, sp, FRAMESIZE - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemm/vec-sgemm_main.c b/bb-tests/workloads/src/CTest/rvv/vec-sgemm/vec-sgemm_main.c deleted file mode 100644 index e9a81ecc..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemm/vec-sgemm_main.c +++ /dev/null @@ -1,43 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// SGEMM benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized sgemm implementation. 
- -#include "util.h" -#include -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_sgemm_nn(size_t, size_t, size_t, const float *, size_t, const float *, - size_t, float *, size_t); - -int main(int argc, char *argv[]) { - float results_data[M_DIM * N_DIM] = {0}; - printf("sgemm M,N,K = %ld,%ld,%ld\n", M_DIM, N_DIM, K_DIM); - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_sgemm_nn(N_DIM, M_DIM, K_DIM, a_matrix, K_DIM, b_matrix, N_DIM, - results_data, N_DIM); - memset(results_data, 0, sizeof(results_data)); -#endif - - // Do the sgemm - setStats(1); - vec_sgemm_nn(N_DIM, M_DIM, K_DIM, a_matrix, K_DIM, b_matrix, N_DIM, - results_data, N_DIM); - setStats(0); - - // Check the results - return verifyFloat(M_DIM * N_DIM, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemv/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-sgemv/dataset1.h deleted file mode 100644 index 0a7ef5f9..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemv/dataset1.h +++ /dev/null @@ -1,1123 +0,0 @@ -#define M_DIM 128 -#define N_DIM 128 -#define DIM_SIZE 16384 -float input_data_A[M_DIM * N_DIM] = { - 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 1.0, 2.0, 4.0, 0.0, 4.0, 4.0, 2.0, 2.0, - 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 1.0, 1.0, 0.0, 4.0, 2.0, 4.0, 2.0, 0.0, 4.0, - 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 2.0, 1.0, - 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 4.0, 2.0, 1.0, 0.0, 1.0, 0.0, 4.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, - 1.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 2.0, - 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 1.0, 0.0, - 1.0, 4.0, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 1.0, 1.0, 4.0, 0.0, 2.0, 4.0, 4.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, - 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, - 1.0, 0.0, 1.0, 2.0, 1.0, 0.0, 4.0, 1.0, 1.0, 4.0, 4.0, 0.0, 2.0, 0.0, 2.0, - 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 4.0, 2.0, 4.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, - 4.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, - 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 1.0, - 0.0, 1.0, 0.0, 4.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 4.0, 1.0, 1.0, 0.0, 0.0, - 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, 0.0, 2.0, 2.0, 0.0, 2.0, 1.0, 4.0, - 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, - 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 0.0, - 2.0, 0.0, 4.0, 2.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 2.0, - 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 4.0, 4.0, 1.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, - 1.0, 4.0, 4.0, 0.0, 0.0, 1.0, 2.0, 1.0, 4.0, 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, - 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 0.0, 0.0, 2.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 2.0, 2.0, - 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, - 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 
0.0, 2.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 1.0, - 4.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 2.0, 4.0, 4.0, 1.0, 4.0, 4.0, 1.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 4.0, 1.0, 0.0, 1.0, - 2.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 4.0, 2.0, 0.0, 4.0, 0.0, 2.0, - 0.0, 2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, - 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, - 4.0, 2.0, 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 1.0, 0.0, 4.0, 0.0, 1.0, 0.0, - 2.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, - 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 4.0, 4.0, - 1.0, 2.0, 4.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, - 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 2.0, - 0.0, 4.0, 1.0, 4.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, - 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 2.0, 1.0, 2.0, 0.0, 2.0, - 0.0, 4.0, 1.0, 0.0, 4.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, - 4.0, 2.0, 2.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 1.0, - 0.0, 4.0, 0.0, 4.0, 1.0, 4.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 4.0, 2.0, 1.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 4.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, - 4.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, - 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 
2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, - 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 4.0, 0.0, 4.0, 1.0, - 1.0, 0.0, 4.0, 2.0, 4.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, - 0.0, 4.0, 2.0, 1.0, 2.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 1.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 4.0, 4.0, 2.0, - 0.0, 4.0, 0.0, 1.0, 4.0, 2.0, 1.0, 1.0, 4.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, - 1.0, 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 2.0, 4.0, 2.0, - 2.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, - 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0, 1.0, - 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, 4.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, - 4.0, 1.0, 2.0, 0.0, 0.0, 2.0, 1.0, 4.0, 1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 4.0, 2.0, 2.0, 0.0, 4.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, - 1.0, 4.0, 2.0, 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, 1.0, 0.0, 1.0, 0.0, - 2.0, 1.0, 0.0, 2.0, 1.0, 2.0, 4.0, 0.0, 0.0, 4.0, 2.0, 0.0, 2.0, 0.0, 0.0, - 2.0, 0.0, 1.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 4.0, 1.0, 2.0, - 1.0, 2.0, 1.0, 4.0, 1.0, 2.0, 2.0, 4.0, 4.0, 4.0, 2.0, 0.0, 2.0, 0.0, 1.0, - 4.0, 0.0, 4.0, 0.0, 
0.0, 4.0, 4.0, 0.0, 0.0, 2.0, 4.0, 4.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 0.0, 1.0, 1.0, 0.0, 4.0, 4.0, - 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 1.0, 2.0, 4.0, 0.0, - 4.0, 2.0, 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 1.0, 4.0, 1.0, 0.0, 0.0, 1.0, - 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, - 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 2.0, 2.0, 2.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 4.0, 1.0, 4.0, 1.0, 1.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, - 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 4.0, 4.0, 0.0, 0.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 4.0, 2.0, 2.0, 0.0, 1.0, 2.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, - 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 0.0, 4.0, 2.0, 1.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 2.0, 4.0, 0.0, 2.0, - 4.0, 4.0, 1.0, 4.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 4.0, 0.0, 4.0, - 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, - 1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 4.0, 2.0, 4.0, - 2.0, 1.0, 1.0, 1.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 1.0, 0.0, - 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 4.0, 2.0, 0.0, 2.0, 0.0, 4.0, 2.0, 2.0, 2.0, 1.0, 4.0, 0.0, 4.0, 1.0, - 0.0, 0.0, 1.0, 
1.0, 2.0, 4.0, 0.0, 0.0, 2.0, 2.0, 4.0, 0.0, 1.0, 2.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 4.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 2.0, 1.0, 0.0, 4.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, - 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 4.0, 4.0, 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 4.0, 1.0, 0.0, 1.0, - 4.0, 0.0, 4.0, 4.0, 0.0, 1.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, - 0.0, 2.0, 0.0, 2.0, 4.0, 2.0, 2.0, 1.0, 4.0, 2.0, 1.0, 1.0, 0.0, 4.0, 2.0, - 0.0, 0.0, 1.0, 1.0, 0.0, 4.0, 1.0, 0.0, 4.0, 0.0, 1.0, 1.0, 0.0, 4.0, 0.0, - 4.0, 4.0, 1.0, 4.0, 1.0, 1.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, - 4.0, 4.0, 2.0, 4.0, 4.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, - 4.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 2.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, - 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 4.0, 1.0, 0.0, 0.0, 0.0, - 4.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 4.0, 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 2.0, 1.0, 4.0, 0.0, 2.0, 1.0, - 1.0, 4.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 4.0, 1.0, 0.0, 4.0, - 2.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, - 4.0, 2.0, 0.0, 4.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, - 1.0, 2.0, 
4.0, 1.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, 4.0, 1.0, 4.0, 1.0, 4.0, - 4.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, - 4.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 4.0, 0.0, 1.0, 4.0, 2.0, 0.0, 4.0, 4.0, 2.0, 4.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 1.0, 4.0, 1.0, 0.0, 0.0, 1.0, - 2.0, 1.0, 4.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 4.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, 0.0, 2.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, - 4.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 4.0, 0.0, 4.0, 0.0, 1.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 1.0, 1.0, 0.0, 2.0, 2.0, 1.0, 4.0, 4.0, 4.0, 0.0, 4.0, 4.0, 1.0, 0.0, 4.0, - 0.0, 2.0, 1.0, 2.0, 0.0, 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 1.0, - 4.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 4.0, 0.0, 1.0, 4.0, 0.0, - 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 2.0, 4.0, - 0.0, 0.0, 0.0, 4.0, 2.0, 4.0, 4.0, 0.0, 2.0, 4.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 1.0, 0.0, 4.0, 2.0, 1.0, 4.0, 2.0, 4.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 2.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, - 2.0, 1.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 4.0, 4.0, 1.0, - 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 4.0, 2.0, 2.0, 0.0, 4.0, 4.0, 4.0, 1.0, - 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 4.0, 2.0, 4.0, 2.0, 4.0, 0.0, 1.0, 4.0, 1.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 2.0, - 0.0, 
2.0, 4.0, 0.0, 2.0, 4.0, 4.0, 0.0, 4.0, 1.0, 2.0, 2.0, 2.0, 1.0, 4.0, - 1.0, 0.0, 2.0, 0.0, 4.0, 2.0, 4.0, 4.0, 2.0, 0.0, 1.0, 0.0, 2.0, 4.0, 2.0, - 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, - 4.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 4.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 4.0, 2.0, 0.0, 4.0, 2.0, 2.0, 1.0, 4.0, - 0.0, 1.0, 2.0, 2.0, 2.0, 4.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, - 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, 2.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, 1.0, - 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 4.0, - 2.0, 1.0, 4.0, 0.0, 2.0, 0.0, 1.0, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0, - 0.0, 4.0, 1.0, 1.0, 4.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 4.0, 4.0, 2.0, - 0.0, 4.0, 0.0, 1.0, 2.0, 1.0, 1.0, 0.0, 4.0, 2.0, 0.0, 4.0, 0.0, 0.0, 1.0, - 2.0, 4.0, 0.0, 2.0, 0.0, 4.0, 1.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 1.0, 4.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 4.0, 1.0, 4.0, 0.0, 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 4.0, - 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, 4.0, 2.0, 4.0, 2.0, 0.0, 1.0, 4.0, 2.0, 4.0, - 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, - 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, - 2.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 2.0, 0.0, 2.0, 0.0, - 4.0, 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 1.0, 4.0, 0.0, 2.0, 1.0, - 0.0, 2.0, 1.0, 1.0, 0.0, 2.0, 0.0, 4.0, 4.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 2.0, 1.0, 0.0, 1.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 1.0, 4.0, - 2.0, 2.0, 0.0, 2.0, 1.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 4.0, - 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 4.0, - 
0.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 0.0, - 0.0, 2.0, 2.0, 1.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, - 1.0, 2.0, 0.0, 1.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, - 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 2.0, 1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 0.0, 1.0, - 2.0, 4.0, 0.0, 0.0, 2.0, 2.0, 4.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 2.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 4.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, - 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 4.0, 1.0, 0.0, 4.0, 1.0, 1.0, 4.0, 2.0, 0.0, - 1.0, 2.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 4.0, - 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 1.0, 2.0, 0.0, 4.0, 2.0, 1.0, 0.0, 2.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, - 2.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 0.0, 0.0, - 1.0, 0.0, 4.0, 4.0, 4.0, 1.0, 0.0, 2.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 4.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 2.0, 4.0, 0.0, 2.0, 0.0, 4.0, 4.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 2.0, 1.0, 0.0, 2.0, 4.0, 1.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, - 2.0, 0.0, 1.0, 2.0, 1.0, 0.0, 4.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 
- 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 4.0, 0.0, 2.0, 0.0, 2.0, 0.0, - 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 0.0, 1.0, 4.0, - 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, - 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 4.0, 1.0, 2.0, 4.0, 0.0, 0.0, 1.0, 4.0, - 1.0, 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 1.0, 0.0, 2.0, 4.0, - 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 2.0, - 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 2.0, 0.0, 0.0, - 2.0, 0.0, 4.0, 1.0, 1.0, 0.0, 4.0, 4.0, 1.0, 0.0, 4.0, 0.0, 2.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 2.0, 0.0, 4.0, 4.0, 1.0, 1.0, 1.0, 2.0, - 0.0, 1.0, 4.0, 4.0, 2.0, 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, - 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, - 4.0, 2.0, 1.0, 2.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, - 1.0, 0.0, 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 0.0, 0.0, 4.0, - 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, 2.0, 0.0, 1.0, 4.0, 0.0, 1.0, 1.0, - 1.0, 0.0, 2.0, 1.0, 0.0, 1.0, 1.0, 4.0, 2.0, 2.0, 4.0, 0.0, 4.0, 0.0, 1.0, - 4.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 4.0, 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 1.0, - 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 2.0, 1.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 1.0, 4.0, 0.0, 4.0, 4.0, 1.0, - 0.0, 2.0, 0.0, 2.0, 4.0, 4.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 2.0, 
2.0, - 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 4.0, 1.0, - 1.0, 1.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 0.0, 1.0, - 0.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 4.0, 0.0, 4.0, 2.0, 1.0, 2.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 2.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 1.0, 1.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 2.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 4.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 1.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 2.0, 1.0, 0.0, - 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 4.0, 0.0, 4.0, 2.0, 2.0, 4.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, - 0.0, 4.0, 2.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 1.0, - 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 1.0, 1.0, 2.0, 0.0, 0.0, 1.0, - 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 1.0, 4.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, - 2.0, 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 4.0, 1.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 1.0, 1.0, - 2.0, 1.0, 4.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, - 0.0, 4.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 2.0, 1.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, - 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 1.0, - 1.0, 1.0, 4.0, 1.0, 1.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 4.0, 1.0, 0.0, 1.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 2.0, 1.0, - 2.0, 4.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 1.0, 4.0, - 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 2.0, 4.0, 0.0, 2.0, 1.0, 0.0, 0.0, 
0.0, 4.0, - 0.0, 2.0, 2.0, 2.0, 0.0, 4.0, 1.0, 4.0, 0.0, 1.0, 0.0, 4.0, 4.0, 4.0, 4.0, - 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, - 4.0, 2.0, 0.0, 1.0, 2.0, 0.0, 2.0, 2.0, 1.0, 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, - 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 4.0, 1.0, 1.0, 1.0, 0.0, 1.0, 4.0, 1.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, - 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 0.0, 4.0, 2.0, 1.0, 1.0, 0.0, 0.0, - 4.0, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, - 0.0, 1.0, 2.0, 2.0, 4.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 4.0, 2.0, 4.0, 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 2.0, 0.0, 1.0, 4.0, - 1.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 4.0, 2.0, 4.0, 1.0, 2.0, 0.0, 2.0, 0.0, 0.0, 1.0, 2.0, 4.0, - 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, - 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, 1.0, 1.0, 2.0, 0.0, 4.0, 4.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 1.0, 2.0, 4.0, 4.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 1.0, 1.0, 0.0, - 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 4.0, 2.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, - 1.0, 0.0, 2.0, 4.0, 1.0, 0.0, 4.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, - 4.0, 4.0, 2.0, 0.0, 1.0, 0.0, 4.0, 2.0, 0.0, 1.0, 1.0, 2.0, 2.0, 0.0, 1.0, - 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 1.0, 4.0, 4.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 1.0, 0.0, 0.0, 
1.0, 1.0, 2.0, - 4.0, 4.0, 1.0, 4.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 4.0, 2.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, - 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, - 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 4.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 4.0, 1.0, 4.0, - 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, - 2.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 4.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 1.0, - 1.0, 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 1.0, 1.0, 0.0, 1.0, - 1.0, 1.0, 4.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, - 0.0, 1.0, 2.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 1.0, 2.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, - 4.0, 4.0, 4.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 2.0, 0.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 4.0, 4.0, - 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 1.0, 4.0, 2.0, 1.0, - 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 4.0, 0.0, 0.0, 4.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 2.0, 2.0, - 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 4.0, 2.0, 0.0, 4.0, - 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 2.0, - 0.0, 2.0, 2.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 4.0, 4.0, 2.0, 2.0, 4.0, 2.0, 4.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 4.0, 0.0, 0.0, 4.0, - 0.0, 1.0, 4.0, 4.0, 1.0, 0.0, 2.0, 2.0, 4.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, - 4.0, 4.0, 2.0, 0.0, 1.0, 1.0, 4.0, 4.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, - 2.0, 1.0, 4.0, 2.0, 1.0, 1.0, 4.0, 1.0, 0.0, 2.0, 2.0, 
4.0, 1.0, 0.0, 0.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 2.0, - 1.0, 1.0, 2.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 4.0, 1.0, 2.0, 0.0, 4.0, 2.0, 0.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 4.0, 1.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, - 0.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, - 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 2.0, 4.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 4.0, 2.0, 1.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, - 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 1.0, 1.0, 2.0, 4.0, - 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 4.0, 4.0, 0.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 2.0, 1.0, 4.0, 0.0, 4.0, 4.0, 0.0, 2.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0, - 0.0, 1.0, 4.0, 2.0, 0.0, 4.0, 1.0, 1.0, 0.0, 4.0, 2.0, 0.0, 0.0, 4.0, 0.0, - 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 1.0, 2.0, 4.0, 4.0, 4.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 2.0, - 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 4.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, - 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 4.0, 1.0, 4.0, 0.0, 2.0, 0.0, - 2.0, 2.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 
2.0, 0.0, 4.0, 4.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 4.0, 1.0, 1.0, 4.0, 1.0, 4.0, 1.0, 0.0, 4.0, 0.0, 2.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 4.0, 4.0, 2.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 4.0, 2.0, 4.0, 0.0, 0.0, 0.0, - 1.0, 2.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, - 4.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, 1.0, 0.0, 2.0, 0.0, - 1.0, 0.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, - 2.0, 4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 4.0, - 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 1.0, 4.0, 4.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 4.0, 0.0, 4.0, - 0.0, 4.0, 2.0, 0.0, 0.0, 4.0, 2.0, 1.0, 4.0, 4.0, 1.0, 1.0, 2.0, 4.0, 4.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 2.0, 4.0, 1.0, - 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 1.0, - 0.0, 2.0, 2.0, 1.0, 4.0, 1.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0, 1.0, - 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 4.0, 1.0, 1.0, 2.0, 2.0, 4.0, 0.0, - 4.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 2.0, 1.0, 1.0, - 0.0, 4.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 
4.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 1.0, 2.0, 0.0, 4.0, 0.0, - 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 4.0, 2.0, - 0.0, 2.0, 4.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, - 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 4.0, 2.0, - 1.0, 0.0, 4.0, 4.0, 0.0, 1.0, 4.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 1.0, 1.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 2.0, 2.0, 0.0, - 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, - 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 1.0, 0.0, 2.0, - 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 4.0, 4.0, 2.0, 1.0, 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 4.0, - 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 2.0, 2.0, 4.0, - 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, - 1.0, 0.0, 4.0, 2.0, 0.0, 4.0, 0.0, 4.0, 4.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, - 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 4.0, - 4.0, 2.0, 0.0, 2.0, 0.0, 4.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 4.0, 2.0, 4.0, 1.0, 0.0, 1.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, - 1.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 4.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 4.0, 0.0, - 4.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 2.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 4.0, 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 0.0, 2.0, 
0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 4.0, 4.0, 0.0, 4.0, 2.0, 1.0, 4.0, 4.0, 4.0, - 1.0, 1.0, 0.0, 2.0, 1.0, 2.0, 2.0, 4.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, - 0.0, 1.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, - 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 4.0, 2.0, 1.0, 4.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 2.0, 4.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0, - 0.0, 1.0, 4.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 2.0, 0.0, 4.0, 1.0, - 2.0, 0.0, 2.0, 4.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 4.0, 1.0, 0.0, - 1.0, 0.0, 2.0, 4.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 4.0, 0.0, 0.0, 4.0, 0.0, - 1.0, 4.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, - 0.0, 0.0, 4.0, 2.0, 0.0, 2.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 2.0, 1.0, 4.0, 1.0, 0.0, 4.0, 4.0, 4.0, 2.0, 1.0, 4.0, 4.0, 2.0, 1.0, - 0.0, 0.0, 1.0, 2.0, 0.0, 4.0, 4.0, 2.0, 4.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, - 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, - 2.0, 1.0, 4.0, 1.0, 1.0, 2.0, 0.0, 1.0, 0.0, 4.0, 4.0, 1.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, - 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, - 1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, - 2.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 1.0, 4.0, 4.0, - 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 4.0, 2.0, - 0.0, 0.0, 4.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, - 0.0, 1.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 4.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 4.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, - 4.0, 1.0, 4.0, 0.0, 0.0, 0.0, 4.0, 
0.0, 2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 1.0, 0.0, 1.0, 0.0, - 0.0, 2.0, 2.0, 2.0, 4.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 1.0, 0.0, 1.0, 1.0, 4.0, 0.0, 4.0, 2.0, 0.0, 4.0, 1.0, 0.0, 4.0, 4.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 4.0, 4.0, 2.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 4.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 2.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 2.0, 1.0, 4.0, 2.0, 1.0, 0.0, 0.0, 1.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 2.0, 2.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 4.0, 2.0, 1.0, 2.0, 0.0, 1.0, - 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 4.0, 0.0, 1.0, 1.0, 1.0, 0.0, 4.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 4.0, 1.0, 2.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, - 1.0, 2.0, 4.0, 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, - 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 2.0, - 0.0, 4.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 1.0, 0.0, 2.0, - 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 2.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, 4.0, 0.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0, 0.0, - 0.0, 0.0, 4.0, 2.0, 2.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, - 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 4.0, 2.0, 1.0, - 4.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 4.0, 0.0, 0.0, 4.0, 2.0, 2.0, - 0.0, 1.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 2.0, 1.0, 0.0, 1.0, 
1.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, - 0.0, 4.0, 0.0, 4.0, 4.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 4.0, 4.0, 1.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, - 0.0, 1.0, 0.0, 1.0, 2.0, 4.0, 4.0, 0.0, 4.0, 2.0, 0.0, 2.0, 1.0, 2.0, 2.0, - 0.0, 2.0, 1.0, 0.0, 2.0, 1.0, 4.0, 1.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, - 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, - 0.0, 2.0, 0.0, 2.0, 4.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 4.0, 2.0, 0.0, - 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 4.0, 4.0, 2.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 2.0, - 2.0, 4.0, 0.0, 1.0, 1.0, 2.0, 2.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 1.0, 2.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, - 2.0, 1.0, 4.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 4.0, - 0.0, 1.0, 4.0, 0.0, 1.0, 4.0, 4.0, 1.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 4.0, - 4.0, 4.0, 0.0, 0.0, 0.0, 2.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 1.0, 0.0, 1.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 4.0, 1.0, - 0.0, 1.0, 4.0, 0.0, 2.0, 0.0, 4.0, 1.0, 0.0, 1.0, 4.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 4.0, 1.0, 1.0, 2.0, 1.0, 2.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, 4.0, 1.0, 2.0, 4.0, 2.0, 4.0, 0.0, 0.0, 0.0, - 2.0, 2.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, - 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0, 1.0, 2.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 
2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 1.0, 4.0, - 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, - 2.0, 2.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 4.0, 4.0, 4.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, - 2.0, 1.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 2.0, 4.0, 4.0, 0.0, 0.0, - 2.0, 4.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 4.0, 1.0, - 4.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, - 4.0, 1.0, 4.0, 0.0, 0.0, 4.0, 1.0, 1.0, 1.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 2.0, 1.0, 2.0, 1.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, 1.0, 1.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, - 0.0, 4.0, 4.0, 1.0, 4.0, 1.0, 4.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 4.0, - 2.0, 4.0, 1.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 1.0, 4.0, 2.0, 4.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, - 0.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 2.0, 4.0, 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, - 4.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 1.0, 0.0, 0.0, - 2.0, 0.0, 2.0, 0.0, 1.0, 1.0, 4.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, - 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, - 0.0, 2.0, 1.0, 0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 4.0, 1.0, 0.0, 4.0, - 0.0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 2.0, 0.0, 
0.0, 2.0, 0.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0, - 2.0, 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, - 4.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, 1.0, 2.0, 4.0, 1.0, 0.0, 2.0, 1.0, 0.0, - 4.0, 4.0, 1.0, 0.0, 0.0, 1.0, 2.0, 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 2.0, 4.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, - 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 1.0, 4.0, 1.0, 2.0, 1.0, 4.0, 2.0, 0.0, 0.0, - 4.0, 1.0, 0.0, 1.0, 0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, - 4.0, 2.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, - 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 2.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 4.0, 4.0, 0.0, 2.0, 0.0, 4.0, 1.0, - 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0, 2.0, 0.0, 4.0, 4.0, 1.0, - 2.0, 0.0, 0.0, 0.0, 2.0, 4.0, 4.0, 2.0, 4.0, 0.0, 1.0, 2.0, 2.0, 0.0, 2.0, - 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, - 0.0, 4.0, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0, 4.0, 0.0, 2.0, 2.0, 0.0, - 0.0, 4.0, 1.0, 0.0, 1.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 2.0, 1.0, - 4.0, 2.0, 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 1.0, - 4.0, 2.0, 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, 2.0, 0.0, 1.0, 4.0, 0.0, 2.0, 1.0, - 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 1.0, 0.0, 4.0, 1.0, 4.0, 2.0, 2.0, 0.0, 4.0, 0.0, 4.0, 4.0, 1.0, 2.0, - 1.0, 2.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, - 1.0, 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 4.0, 0.0, 4.0, 2.0, - 0.0, 2.0, 4.0, 
0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, - 0.0, 0.0, 2.0, 4.0, 4.0, 4.0, 0.0, 2.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 2.0, 2.0, 4.0, 0.0, 4.0, 1.0, 0.0, 4.0, 4.0, 4.0, 1.0, 0.0, 2.0, 2.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, 1.0, - 0.0, 1.0, 0.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, - 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, - 1.0, 0.0, 2.0, 0.0, 4.0, 1.0, 0.0, 4.0, 0.0, 4.0, 4.0, 1.0, 4.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, - 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, 1.0, 0.0, 1.0, 2.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 1.0, 0.0, 1.0, 4.0, 0.0, 1.0, 0.0, 2.0, 4.0, 0.0, 1.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 2.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, - 0.0, 4.0, 0.0, 0.0, 2.0, 4.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, - 0.0, 4.0, 1.0, 2.0, 1.0, 0.0, 4.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 4.0, - 0.0, 0.0, 4.0, 4.0, 0.0, 1.0, 1.0, 2.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 4.0, 2.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 
0.0, 2.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0, - 1.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, 4.0, - 2.0, 2.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 2.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, - 4.0, 0.0, 1.0, 4.0, 4.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 1.0, 1.0, - 0.0, 1.0, 0.0, 2.0, 4.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, - 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 4.0, 0.0, 1.0, 0.0, 4.0, - 0.0, 0.0, 2.0, 2.0, 1.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 1.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 1.0, - 1.0, 0.0, 2.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 2.0, - 2.0, 4.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, - 4.0, 0.0, 2.0, 0.0, 4.0, 2.0, 2.0, 1.0, 1.0, 4.0, 4.0, 0.0, 0.0, 1.0, 4.0, - 0.0, 4.0, 0.0, 4.0, 2.0, 4.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, - 4.0, 4.0, 4.0, 1.0, 1.0, 0.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, - 2.0, 4.0, 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 4.0, 1.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 4.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 2.0, 4.0, 2.0, 2.0, 1.0, 4.0, 1.0, 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, - 4.0, 4.0, 1.0, 0.0, 2.0, 2.0, 4.0, 0.0, 4.0, 2.0, 0.0, 1.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 1.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 4.0, 4.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, - 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 4.0, - 4.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, 2.0, - 0.0, 
0.0, 0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 4.0, - 2.0, 4.0, 4.0, 2.0, 2.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 2.0, - 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 1.0, 1.0, 0.0, 4.0, 1.0, 4.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0, 1.0, 2.0, - 2.0, 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 1.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 1.0, 4.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 1.0, 0.0, 1.0, - 4.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 1.0, 1.0, 0.0, - 0.0, 4.0, 2.0, 4.0, 4.0, 2.0, 1.0, 2.0, 4.0, 1.0, 2.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, - 0.0, 4.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, - 1.0, 1.0, 2.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, 0.0, - 2.0, 4.0, 0.0, 4.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 4.0, 1.0, 2.0, 4.0, - 0.0, 1.0, 1.0, 2.0, 0.0, 0.0, 2.0, 4.0, 2.0, 2.0, 1.0, 4.0, 0.0, 0.0, 0.0, - 1.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 4.0, 1.0, 1.0, 4.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 4.0, 4.0, - 1.0, 1.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 2.0, - 2.0, 2.0, 4.0, 0.0, 4.0, 4.0, 4.0, 1.0, 2.0, 0.0, 1.0, 2.0, 1.0, 2.0, 0.0, - 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 2.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 1.0, 1.0, 0.0, 1.0, 0.0, 2.0, 4.0, - 
0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 4.0, 4.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 4.0, 1.0, 4.0, 4.0, 0.0, 1.0, 2.0, - 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, 2.0, 2.0, - 2.0, 2.0, 1.0, 2.0, 0.0, 0.0, 0.0, 4.0, 2.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, - 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 2.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, - 4.0, 0.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 1.0, 1.0, 1.0, - 0.0, 1.0, 0.0, 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 2.0, - 0.0, 4.0, 1.0, 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, - 1.0, 1.0, 0.0, 2.0, 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 4.0, 2.0, 1.0, 1.0, 0.0, 1.0, 4.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 2.0, 4.0, 4.0, - 4.0, 0.0, 0.0, 4.0, 2.0, 4.0, 0.0, 2.0, 1.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, - 4.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, - 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 2.0, 1.0, 1.0, 4.0, 0.0, 2.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, 4.0, 0.0, 1.0, 4.0, 2.0, 0.0, 4.0, 0.0, - 2.0, 2.0, 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 2.0, 2.0, 2.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, 1.0, 2.0, 0.0, 4.0, 0.0, 4.0, 0.0, 2.0, - 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, - 1.0, 1.0, 0.0, 4.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 
- 2.0, 1.0, 2.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 4.0, 1.0, - 1.0, 4.0, 0.0, 0.0, 2.0, 2.0, 2.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0, - 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, - 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 4.0, 4.0, 1.0, - 2.0, 0.0, 1.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 2.0, 4.0, 1.0, 4.0, 4.0, 4.0, 2.0, 0.0, 4.0, 4.0, 4.0, 2.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 1.0, 0.0, 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 4.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 1.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, 4.0, 1.0, 4.0, 4.0, 0.0, - 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 4.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 4.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 2.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 4.0, 1.0, 4.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, - 4.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, - 0.0, 4.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 4.0, 4.0, - 4.0, 0.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 4.0, 0.0, 4.0, 1.0, 1.0, 4.0, 0.0, - 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 2.0, 1.0, 4.0, 4.0, 4.0, 0.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 4.0, 1.0, 4.0, 1.0, 4.0, 0.0, 4.0, - 0.0, 1.0, 1.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 2.0, - 1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 0.0, 2.0, 2.0, 0.0, 2.0, 1.0, 1.0, 1.0, 0.0, - 4.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 4.0, - 2.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 1.0, 
1.0, - 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, - 0.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 4.0, 1.0, 4.0, 0.0, 2.0, - 0.0, 1.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 2.0, - 2.0, 0.0, 1.0, 4.0, 2.0, 1.0, 4.0, 2.0, 4.0, 2.0, 1.0, 0.0, 2.0, 0.0, 2.0, - 2.0, 1.0, 2.0, 2.0, 0.0, 4.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 1.0, 1.0, 4.0, 2.0, 2.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 4.0, 4.0, 0.0, 0.0, 2.0, - 1.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 4.0, 2.0, - 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, - 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, 1.0, 0.0, 4.0, 0.0, 2.0, - 4.0, 4.0, 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 4.0, - 4.0, 1.0, 4.0, 2.0, 1.0, 2.0, 2.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 2.0, 1.0, 1.0, - 0.0, 2.0, 4.0, 2.0, 0.0, 4.0, 0.0, 2.0, 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 4.0, 0.0, 4.0, 2.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 4.0, 0.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, 4.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, - 1.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 2.0, 1.0, - 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0, 1.0, 1.0, - 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 1.0, 4.0, 4.0, - 1.0, 2.0, 4.0, 2.0, 0.0, 0.0, 4.0, 2.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 2.0, 1.0, 4.0, 0.0, 
1.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 4.0, 2.0, 0.0, 4.0, 0.0, 0.0, 2.0, - 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 1.0, 1.0, - 0.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, - 0.0, 2.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, - 2.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, - 1.0, 4.0, 2.0, 2.0, 1.0, 4.0, 0.0, 4.0, 0.0, 4.0, 4.0, 1.0, 2.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 4.0, 4.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 1.0, 2.0, 4.0, 4.0, 0.0, 4.0, 4.0, 1.0, 1.0, 0.0, 2.0, 1.0, - 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 1.0, 0.0, 1.0, - 2.0, 0.0, 4.0, 2.0, 1.0, 4.0, 0.0, 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 1.0, - 2.0, 0.0, 2.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, 0.0, 1.0, - 0.0, 0.0, 2.0, 1.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 1.0, 4.0, 4.0, 0.0, 1.0, - 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 2.0, - 4.0, 1.0, 4.0, 2.0, 0.0, 4.0, 4.0, 0.0, 2.0, 1.0, 0.0, 1.0, 4.0, 4.0, 1.0, - 2.0, 1.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 1.0, 4.0, 2.0, 1.0, 1.0, 4.0, 4.0, 4.0, 2.0, 2.0, 4.0, 4.0, - 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 2.0, - 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 1.0, 4.0, 1.0, 0.0, 1.0, 1.0, 0.0, 2.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 1.0, 4.0, 2.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 1.0, 1.0, 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, - 2.0, 2.0, 0.0, 0.0, 2.0, 4.0, 4.0, 2.0, 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 
0.0, 0.0, 4.0, - 4.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 4.0, 4.0, 1.0, 4.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 1.0, 0.0, 4.0, 0.0, 4.0, 2.0, 2.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 2.0, 1.0, 0.0, 2.0, 4.0, 4.0, 0.0, 0.0, - 1.0, 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, 2.0, 2.0, - 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 4.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, - 2.0, 0.0, 0.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 2.0, 2.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 4.0, 4.0, 2.0, 0.0, 1.0, 4.0, 0.0, 4.0, 4.0, 1.0, 2.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 4.0, 1.0, 4.0, - 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 2.0, 2.0, 4.0, 4.0, 2.0, 4.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 4.0, 4.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 2.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 2.0, - 0.0, 0.0, 1.0, 2.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, - 2.0, 1.0, 1.0, 2.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 4.0, 2.0, 0.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 4.0, 2.0, 2.0, 1.0, 4.0, 0.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 0.0, - 0.0, 4.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 
0.0, 2.0, 1.0, 0.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 2.0, 4.0, 4.0, 1.0, 4.0, 4.0, 4.0, 4.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 1.0, 4.0, 1.0, - 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, - 1.0, 0.0, 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 4.0, 4.0, 4.0, 0.0, - 1.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 0.0, 4.0, 2.0, 1.0, - 2.0, 4.0, 4.0, 4.0, 4.0, 2.0, 2.0, 0.0, 1.0, 2.0, 0.0, 4.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 2.0, 2.0, - 2.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 2.0, - 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 2.0, 2.0, 0.0, 1.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 2.0, 4.0, 4.0, 0.0, 2.0, 0.0, 4.0, - 2.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 4.0, 2.0, 4.0, 2.0, 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, - 0.0, 4.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, - 4.0, 0.0, 4.0, 0.0, 4.0, 4.0, 0.0, 1.0, 4.0, 0.0, 0.0, 4.0, 1.0, 0.0, 2.0, - 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 2.0, 1.0, 0.0, 1.0, - 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0, 4.0, - 0.0, 2.0, 4.0, 4.0, 1.0, 4.0, 0.0, 2.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0, 0.0, - 4.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 1.0, - 4.0, 4.0, 0.0, 2.0, 1.0, 0.0, 2.0, 1.0, 1.0, 0.0, 
1.0, 4.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, - 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 2.0, 2.0, 0.0, 1.0, 1.0, - 4.0, 4.0, 2.0, 2.0, 2.0, 0.0, 0.0, 4.0, 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, - 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 4.0, 1.0, 4.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, - 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, - 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 2.0, 2.0, 0.0, 1.0, 4.0, - 0.0, 1.0, 0.0, 4.0, 2.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, - 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 1.0, - 1.0, 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 1.0, 1.0, 0.0, 2.0, - 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 4.0, 2.0, 1.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, - 4.0, 4.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 2.0, - 0.0, 2.0, 4.0, 2.0, 2.0, 0.0, 1.0, 4.0, 4.0, 1.0, 0.0, 4.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 4.0, 4.0, 4.0, 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, 1.0, 4.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 1.0, 4.0, 4.0, 2.0, 4.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, - 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 1.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 4.0, 4.0, 
4.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 1.0, 4.0, 0.0, 4.0, 1.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 1.0, 2.0, 1.0, 4.0, 0.0, 1.0, 2.0, 4.0, 0.0, 2.0, 2.0, 0.0, 4.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, - 4.0, 4.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 1.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 2.0, 0.0, 2.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 4.0, 0.0, 2.0, 1.0, - 4.0, 0.0, 4.0, 1.0, 0.0, 1.0, 2.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, - 2.0, 2.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, - 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 4.0, 2.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 2.0, 4.0, 2.0, 0.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, - 0.0, 4.0, 0.0, 4.0, 0.0, 2.0, 2.0, 1.0, 0.0, 2.0, 2.0, 1.0, 2.0, 2.0, 2.0, - 4.0, 0.0, 1.0, 0.0, 2.0, 4.0, 2.0, 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 1.0, - 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, 1.0, 0.0, 2.0, 2.0, 0.0, 1.0, 4.0, 2.0, 2.0, - 2.0, 4.0, 0.0, 2.0, 0.0, 1.0, 1.0, 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 2.0, 2.0, - 1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, - 0.0, 1.0, 0.0, 4.0, 2.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 1.0, 4.0, 4.0, 2.0, 0.0, 2.0, 4.0, 0.0, 4.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 1.0, 2.0, 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 2.0, - 1.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 4.0, 4.0, - 2.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 2.0, - 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 4.0, 
1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, - 2.0, 1.0, 2.0, 1.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 2.0, 4.0, - 4.0, 1.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 1.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 2.0, 4.0, 4.0, 2.0, 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, 4.0, 1.0, 0.0, - 1.0, 0.0, 2.0, 0.0, 1.0, 1.0, 1.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 1.0, - 2.0, 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, - 2.0, 1.0, 0.0, 2.0, 2.0, 2.0, 0.0, 2.0, 0.0, 4.0, 4.0, 1.0, 0.0, 0.0, 2.0, - 0.0, 4.0, 1.0, 1.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 1.0, - 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 1.0, 4.0, 2.0, 4.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 1.0, 2.0, 0.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 2.0, 1.0, 4.0, - 4.0, 4.0, 2.0, 4.0, 0.0, 0.0, 1.0, 4.0, 4.0, 4.0, 1.0, 2.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 2.0, 1.0, 1.0, - 4.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 4.0, 2.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, - 2.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, 1.0, 1.0, 2.0, 4.0, 0.0, 2.0, 1.0, - 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 2.0, 0.0, 4.0, - 4.0, 4.0, 0.0, 4.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 1.0, 
0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, - 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 0.0, - 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 2.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 1.0, 0.0, - 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 2.0, 1.0, 1.0, 4.0, 2.0, 2.0, - 0.0, 4.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 4.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, - 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 1.0, 0.0, 1.0, 4.0, 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 4.0, - 4.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 4.0, 2.0, 1.0, 1.0, 2.0, 0.0, 2.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 4.0, 2.0, 1.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 2.0, 2.0, - 0.0, 0.0, 4.0, 0.0, 1.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, 1.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 4.0, 0.0, 4.0, 1.0, 0.0, - 2.0, 4.0, 1.0, 0.0, 1.0, 4.0, 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 1.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, 0.0, 1.0, 0.0, 1.0, 2.0, - 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, - 1.0, 2.0, 4.0, 2.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, - 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 4.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 2.0, 2.0, 4.0, 
0.0, 0.0, 2.0, 0.0, 4.0, 1.0, 4.0, 0.0, 0.0, - 4.0, 2.0, 1.0, 0.0, 4.0, 1.0, 0.0, 1.0, 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 1.0, 1.0, 4.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 0.0, 4.0, - 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 1.0, 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, - 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 0.0, 2.0, 1.0, 1.0, - 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, - 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, - 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 4.0, 2.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, - 2.0, 1.0, 0.0, 4.0, 0.0, 4.0, 2.0, 4.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 0.0, 2.0, - 1.0, 0.0, 4.0, 1.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 4.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, - 0.0, 4.0, 0.0, 4.0, 2.0, 4.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 4.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 4.0, 2.0, 2.0, - 2.0, 4.0, 1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 4.0, 1.0, 4.0, 1.0, 1.0, 2.0, 0.0, - 1.0, 1.0, 0.0, 0.0, 4.0, 1.0, 4.0, 1.0, 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 4.0, 2.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 4.0, 1.0, 0.0, 2.0, 0.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 4.0, - 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 4.0, 0.0, 4.0, - 0.0, 4.0, 4.0, 1.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 4.0, 4.0, 1.0, 0.0, 4.0, 2.0, - 4.0, 1.0, 0.0, 1.0, 4.0, 
0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, - 1.0, 1.0, 4.0, 1.0, 0.0, 1.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 4.0, - 1.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 2.0, 1.0, - 0.0, 0.0, 4.0, 1.0, 1.0, 2.0, 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0, - 0.0, 4.0, 2.0, 2.0, 4.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 4.0, 2.0, 1.0, 1.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 4.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 1.0, 4.0, 4.0, 4.0, - 2.0, 0.0, 0.0, 1.0, 0.0, 2.0, 4.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0, 2.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 4.0, 2.0, 2.0, - 4.0, 2.0, 1.0, 2.0, 4.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, - 4.0, 2.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 4.0, 1.0, 0.0, 1.0, 4.0, 1.0, - 4.0, 2.0, 1.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 2.0, 2.0, 2.0, 4.0, 2.0, 1.0, - 2.0, 4.0, 1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 1.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 1.0, - 0.0, 0.0, 2.0, 1.0, 4.0, 0.0, 2.0, 0.0, 1.0, 1.0, 4.0, 0.0, 0.0, 2.0, 2.0, - 0.0, 4.0, 4.0, 4.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 1.0, 4.0, 4.0, 1.0, 4.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 1.0, 4.0, 0.0, 0.0, 0.0, 2.0, 2.0, 4.0, 0.0, 4.0, 4.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 2.0, 2.0, 1.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 2.0, 1.0, 
0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, - 2.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 2.0, 2.0, 4.0, 2.0, 0.0, 2.0, 4.0, - 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 4.0, 4.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 2.0, 4.0, 2.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 1.0, 4.0, 4.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 1.0, 2.0, 4.0, - 0.0, 1.0, 4.0, 4.0, 0.0, 4.0, 2.0, 0.0, 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, - 1.0, 4.0, 2.0, 2.0, 0.0, 2.0, 1.0, 1.0, 1.0, 0.0, 2.0, 0.0, 1.0, 2.0, 0.0, - 1.0, 1.0, 4.0, 0.0, 2.0, 0.0, 2.0, 1.0, 1.0, 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 0.0, 4.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 4.0, - 0.0, 1.0, 0.0, 4.0, 2.0, 4.0, 4.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, 1.0, 0.0, 2.0, - 1.0, 4.0, 0.0, 4.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 4.0, 4.0, 2.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, - 4.0, 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 4.0, 4.0, 2.0, 0.0, - 1.0, 1.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, - 2.0, 0.0, 2.0, 4.0, 2.0, 1.0, 4.0, 1.0, 4.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 2.0, 0.0, 2.0, 0.0, - 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 0.0, 4.0, 1.0, 2.0, - 0.0, 4.0, 1.0, 2.0, 0.0, 2.0, 0.0, 4.0, 4.0, 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 4.0, 2.0, 2.0, 4.0, 2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 2.0, - 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, - 0.0, 2.0, 4.0, 
0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, 1.0, - 4.0, 4.0, 2.0, 1.0, 0.0, 4.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 4.0, 0.0, - 1.0, 2.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 2.0, - 2.0, 1.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 4.0, 2.0, - 0.0, 1.0, 2.0, 0.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 4.0, 1.0, 1.0, 0.0, - 2.0, 2.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 4.0, 1.0, 0.0, 4.0, 1.0, 0.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 2.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, - 0.0, 4.0, 1.0, 4.0, 1.0, 2.0, 2.0, 1.0, 0.0, 4.0, 4.0, 4.0, 2.0, 1.0, 0.0, - 4.0, 0.0, 1.0, 4.0, 4.0, 2.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 4.0, 1.0, - 0.0, 0.0, 4.0, 2.0, 2.0, 4.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 4.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, - 4.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 0.0, 2.0, 4.0, 1.0, 0.0, - 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, - 4.0, 4.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 4.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 2.0, 4.0, 0.0, 1.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 2.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 4.0, 4.0, 4.0, - 0.0, 2.0, 2.0, 0.0, 1.0, 2.0, 2.0, 0.0, 2.0, 0.0, 4.0, 1.0, 1.0, 0.0, 1.0, - 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, - 1.0, 0.0, 2.0, 2.0, 4.0, 4.0, 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 2.0, 1.0, 4.0, 4.0, 2.0, - 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, - 0.0, 1.0, 
1.0, 4.0, 2.0, 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, - 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, - 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 1.0, 1.0, 1.0, 4.0, 4.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 2.0, - 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 4.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 4.0, 0.0, 1.0, 4.0, 0.0, 4.0, 0.0, - 0.0, 0.0, 1.0, 1.0, 0.0, 4.0, 4.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 4.0, 0.0, 4.0, 4.0, - 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 4.0, 4.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 4.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, 1.0, 2.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 4.0, 2.0, 0.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, - 2.0, 4.0, 0.0, 2.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0, - 0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 4.0, 4.0, - 1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 4.0, 0.0, 2.0, 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 2.0, 2.0, 2.0, 4.0, 0.0, 1.0, 4.0, 0.0, 2.0, 4.0, 1.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, - 1.0, 4.0, 1.0, 0.0, 0.0, 4.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 1.0, 1.0, 2.0, - 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 4.0, 0.0, 1.0, 0.0, 1.0, 4.0, 2.0, 2.0, - 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 4.0, 1.0, 0.0, 4.0, 4.0, 0.0, 1.0, 4.0, 4.0, 1.0, 1.0, 0.0, 0.0, 0.0, - 0.0, 
0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 1.0, 0.0, 2.0, 1.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 0.0, 0.0, - 0.0, 4.0, 1.0, 2.0, 4.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, - 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 2.0, 4.0, - 4.0, 2.0, 0.0, 0.0, 1.0, 2.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, 4.0, 0.0, 2.0, 2.0, 1.0, 0.0, - 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 2.0, - 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 1.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 1.0, - 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, - 0.0, 1.0, 2.0, 0.0, 4.0, 0.0, 1.0, 2.0, 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 2.0, 2.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, - 4.0, 4.0, 1.0, 0.0, 4.0, 4.0, 1.0, 2.0, 2.0, 0.0, 4.0, 2.0, 0.0, 2.0, 4.0, - 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 1.0, 1.0, 0.0, - 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, - 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, - 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 2.0, - 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 4.0, 0.0, 1.0, 0.0, 4.0, 1.0, 1.0, 0.0, - 
2.0, 0.0, 4.0, 2.0, 1.0, 4.0, 0.0, 4.0, 4.0, 2.0, 2.0, 4.0, 0.0, 1.0, 4.0, - 2.0, 0.0, 2.0, 2.0, 0.0, 1.0, 2.0, 0.0, 2.0, 1.0, 0.0, 4.0, 0.0, 1.0, 2.0, - 0.0, 2.0, 0.0, 4.0, 2.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 1.0, 2.0, - 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 2.0, 4.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 1.0, 4.0, 1.0, 0.0, 4.0, 1.0, 4.0, 1.0, 0.0, - 1.0, 1.0, 2.0, 0.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 1.0, - 0.0, 2.0, 0.0, 1.0, 2.0, 4.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, - 4.0, 4.0, 4.0, 0.0, 2.0, 2.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 4.0, 0.0, 0.0, 2.0, 2.0, 1.0, 4.0, 0.0, 4.0, 2.0, 1.0, 2.0, 4.0, - 0.0, 1.0, 1.0, 4.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, - 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, - 0.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 0.0, 4.0, 0.0, 2.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, - 1.0, 0.0, 1.0, 0.0, 4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 4.0, 1.0, - 2.0, 2.0, 1.0, 2.0, 0.0, 2.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 2.0, 4.0, 2.0, - 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 4.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, - 0.0, 1.0, 4.0, 4.0, 2.0, 4.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, - 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 2.0, - 2.0, 0.0, 1.0, 4.0, 1.0, 2.0, 4.0, 4.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, - 2.0, 4.0, 2.0, 2.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, 4.0, 1.0, - 0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 1.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
- 4.0, 0.0, 4.0, 4.0, 0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 4.0, 1.0, 4.0, 4.0, 0.0, - 2.0, 1.0, 1.0, 2.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, - 1.0, 1.0, 2.0, 0.0, 1.0, 2.0, 4.0, 0.0, 4.0, 1.0, 0.0, 0.0, 1.0, 2.0, 0.0, - 2.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, 2.0, 2.0, 0.0, 1.0, 2.0, 1.0, 0.0, 2.0, - 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 4.0, - 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, - 2.0, 0.0, 2.0, 4.0, 0.0, 1.0, 0.0, 4.0, 2.0, 2.0, 1.0, 0.0, 1.0, 4.0, 2.0, - 4.0, 1.0, 1.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, - 4.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, - 2.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 0.0, 1.0, 4.0, 4.0, 1.0, 0.0, - 4.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 2.0, 0.0, 4.0, - 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, - 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 1.0, 4.0, 0.0, - 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 1.0, 4.0, 2.0, - 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0, - 2.0, 0.0, 1.0, 4.0, 2.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, - 1.0, 0.0, 2.0, 0.0, 2.0, 1.0, 4.0, 2.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 4.0, 4.0, 0.0, 2.0, 2.0, 4.0, 2.0, 4.0, 4.0, 4.0, 4.0, 1.0, 0.0, 0.0, - 0.0, 1.0, 2.0, 4.0, 0.0, 4.0, 1.0, 2.0, 2.0, 2.0, 1.0, 4.0, 2.0, 0.0, 0.0, - 0.0, 2.0, 1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 1.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, - 1.0, 0.0, 2.0, 4.0, 4.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 1.0, 4.0, 1.0, 4.0, 2.0, 2.0, 2.0, 2.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 
0.0, - 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 4.0, - 1.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 4.0, 4.0, 0.0, 4.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, - 4.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, - 4.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 1.0, - 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, - 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 4.0, 2.0, 0.0, 0.0, 1.0, 1.0, - 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 4.0, 1.0, 1.0, 0.0, 2.0, 2.0, 1.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 2.0, - 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 4.0, 1.0, 0.0, 0.0, - 2.0, 4.0, 4.0, 4.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0, 2.0, 4.0, 0.0, 0.0, 2.0, - 0.0, 0.0, 4.0, 1.0, 0.0, 4.0, 0.0, 4.0, 4.0, 4.0, 0.0, 4.0, 4.0, 0.0, 4.0, - 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 2.0, 1.0, 0.0, 4.0, 4.0, 0.0, - 0.0, 2.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0, 0.0, 4.0, - 1.0, 0.0, 2.0, 1.0, 0.0, 4.0, 4.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, - 0.0, 2.0, 4.0, 4.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, - 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 4.0, 0.0, 0.0, - 0.0, 2.0, 2.0, 0.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 4.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 4.0, 1.0, - 2.0, 0.0, 0.0, 4.0, 1.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 4.0, 4.0, 1.0, 0.0, - 0.0, 4.0, 0.0, 0.0, -}; -float input_data_x[M_DIM] = { - 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 1.0, 0.0, 4.0, 4.0, 4.0, 0.0, - 2.0, 4.0, 2.0, 0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, - 1.0, 1.0, 4.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, - 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 2.0, 4.0, - 0.0, 2.0, 
1.0, 0.0, 2.0, 4.0, 4.0, 1.0, 4.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, - 2.0, 2.0, 4.0, 2.0, 4.0, 4.0, 0.0, 2.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 1.0, - 0.0, 0.0, 2.0, 0.0, 4.0, 4.0, 1.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0, 4.0, 0.0, - 2.0, 1.0, 0.0, 1.0, 0.0, 4.0, 1.0, 0.0, -}; -float verify_data[N_DIM] = { - 147.0, 219.0, 230.0, 268.0, 177.0, 160.0, 245.0, 180.0, 194.0, 235.0, 232.0, - 210.0, 237.0, 228.0, 192.0, 198.0, 179.0, 107.0, 160.0, 151.0, 171.0, 157.0, - 209.0, 132.0, 231.0, 213.0, 202.0, 229.0, 168.0, 207.0, 150.0, 239.0, 197.0, - 189.0, 153.0, 268.0, 199.0, 213.0, 149.0, 192.0, 205.0, 172.0, 156.0, 211.0, - 173.0, 167.0, 161.0, 256.0, 203.0, 212.0, 137.0, 226.0, 186.0, 236.0, 186.0, - 185.0, 202.0, 174.0, 185.0, 235.0, 184.0, 227.0, 268.0, 172.0, 190.0, 236.0, - 203.0, 183.0, 186.0, 210.0, 219.0, 165.0, 219.0, 212.0, 205.0, 178.0, 216.0, - 206.0, 188.0, 174.0, 155.0, 266.0, 192.0, 218.0, 166.0, 173.0, 142.0, 202.0, - 188.0, 147.0, 148.0, 230.0, 249.0, 202.0, 219.0, 191.0, 182.0, 182.0, 179.0, - 236.0, 220.0, 213.0, 159.0, 160.0, 201.0, 238.0, 175.0, 186.0, 160.0, 190.0, - 136.0, 144.0, 119.0, 169.0, 199.0, 175.0, 174.0, 186.0, 168.0, 184.0, 227.0, - 221.0, 246.0, 165.0, 188.0, 178.0, 189.0, 178.0, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemv/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-sgemv/gendata.py deleted file mode 100755 index 3c21ca42..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-sgemv/gendata.py +++ /dev/null @@ -1,43 +0,0 @@ -#!/usr/bin/env python3 - -import numpy - -M_DIM = 128 -N_DIM = 128 - -info = numpy.finfo(numpy.float64) -nmant = 1 # Limit precision to avoid rounding errors -maxmant = 1 << nmant -minexp = 0 -maxexp = 3 - - -# Generate floating-point values with exact mantissa and exponent -def randf(n): - return numpy.ldexp( - numpy.random.randint(maxmant, size=n), - numpy.random.randint(minexp, maxexp, size=n), - ) - - -A = randf((M_DIM, N_DIM)).astype(numpy.float64) -x = randf(M_DIM).astype(numpy.float64) -result = 
numpy.dot(numpy.transpose(x), A)
-
-
-def print_array(name, data, data_size, data_type="float", data_fmt="{}", fold=8):
-    print("{} {}[{}] = {{".format(data_type, name, data_size))
-    for i in range(0, len(data), fold):
-        print(
-            " ", ", ".join(data_fmt.format(x) for x in data[i : i + fold]), ",", sep=""
-        )
-    print("};")
-
-
-print("#define M_DIM {}".format(M_DIM))
-print("#define N_DIM {}".format(N_DIM))
-print("#define DIM_SIZE {}".format(M_DIM * N_DIM))
-
-print_array("input_data_A", A.flatten(), "M_DIM * N_DIM")
-print_array("input_data_x", x, "M_DIM")
-print_array("verify_data", result, "N_DIM")
diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemv/vec-sgemv.S b/bb-tests/workloads/src/CTest/rvv/vec-sgemv/vec-sgemv.S
deleted file mode 100644
index 9b590563..00000000
--- a/bb-tests/workloads/src/CTest/rvv/vec-sgemv/vec-sgemv.S
+++ /dev/null
@@ -1,94 +0,0 @@
-    .text
-    .balign 4
-    .global vec_sgemv
-# RV64IDV system
-#
-# void
-# vec_sgemv(size_t m,
-# size_t n,
-# const float* v, // m-length vector
-# const float* m, // m * n matrix
-# float*c) // m-length vector
-#
-# c += a*b (V^T * M)
-# matrix stored in row-major order
-
-#define m a0
-#define n a1
-#define vp a2
-#define mp a3
-#define cp a4
-
-#define vt t1
-#define mpm t2
-#define nvl t3
-#define nstride t4
-#define mt t5
-#define nvlb t6
-
-
-vec_sgemv:
-    # Check for zero size matrices
-    beqz m, exit
-    beqz n, exit
-
-    # Convert elements strides to byte strides.
-    slli nstride, n, 2
-
-    slti mt, mp, 2
-    bnez mt, exit
-
-c_col_loop:
-    vsetvli nvl, n, e32, m8, ta, ma  # 32-bit vectors, LMUL=8
-
-    mv mt, m  # reset the row pointer
-    mv vt, vp  # reset the vector pointer
-
-    # Load vector from the C matrix
-    vle32.v v16, (cp)
-
-    vle32.v v0, (mp)
-    add mpm, mp, nstride
-    vle32.v v8, (mpm)
-    add mpm, mpm, nstride
-
-    flw ft0, (vp)
-    flw ft1, 4(vp)
-    addi vt, vp, 8
-
-m_loop:
-    vfmacc.vf v16, ft0, v0  # Compute the first FMA of the loop against V scalar
-    vle32.v v0, (mpm)  # Load the next row of the matrix
-    flw ft0, (vt)  # Load the next scalar from V
-    add mpm, mpm, nstride  # Bump the M pointer for the next M vector load
-    addi mt, mt, -2  # Completing two rows of M in this loop
-    vfmacc.vf v16, ft1, v8  # Compute the second FMA of the loop
-    vle32.v v8, (mpm)  # Load the next row of the matrix
-    add mpm, mpm, nstride  # Bump the M pointer
-    flw ft1, 4(vt)  # Load the next V scalar
-    addi vt, vt, 8  # Bump the pointer for the next scalar load
-    slti t0, mt, 4
-    bnez t0, 1f
-    j m_loop
-
-1: vfmacc.vf v16, ft0, v0
-    addi mt, mt, -2
-    vfmacc.vf v16, ft1, v8
-
-    beqz mt, 1f
-
-    vle32.v v0, (mpm)
-    flw ft0, (vt)
-    vfmacc.vf v16, ft0, v0
-
-1: vse32.v v16, (cp)  # Store the vector of results
-
-    slli nvlb, nvl, 2  # Current vl in bytes
-    add cp, cp, nvlb  # Bump the output pointer
-    add mp, mp, nvlb  # Bump the matrix pointer to the next set of columns
-
-    sub n, n, nvl  # Track how much of output we've computed
-    bnez n, c_col_loop  # Done with entire computation?
-
-exit:
-    ret
diff --git a/bb-tests/workloads/src/CTest/rvv/vec-sgemv/vec-sgemv_main.c b/bb-tests/workloads/src/CTest/rvv/vec-sgemv/vec-sgemv_main.c
deleted file mode 100644
index 4dc55c68..00000000
--- a/bb-tests/workloads/src/CTest/rvv/vec-sgemv/vec-sgemv_main.c
+++ /dev/null
@@ -1,40 +0,0 @@
-// See LICENSE for license details.
- -//************************************************************************** -// SGEMV benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized sgemm implementation. - -#include "util.h" -#include -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_sgemv(size_t, size_t, const float *, const float *, float *); - -int main(int argc, char *argv[]) { - float results_data[N_DIM] = {0}; - - printf("sgemv M,N = %ld,%ld\n", M_DIM, N_DIM); -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_sgemv(M_DIM, N_DIM, input_data_x, input_data_A, results_data); - memset(results_data, 0, sizeof(results_data)); -#endif - - // Do the sgemv - setStats(1); - vec_sgemv(M_DIM, N_DIM, input_data_x, input_data_A, results_data); - setStats(0); - - // Check the results - return verifyFloat(N_DIM, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/dataset1.h deleted file mode 100644 index 67db4dcf..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/dataset1.h +++ /dev/null @@ -1,1040 +0,0 @@ -#define KH 3 -#define KW 3 -#define IH 72 -#define IW 72 -#define I_SIZE 5184 -#define OH 70 -#define OW 70 -#define O_SIZE 4900 - -float input_k1[IW] = { - 1.0, - 1.0, - 1.0, -}; -float input_k2[IH] = { - 1.0, - 1.0, - 1.0, -}; -float input_image[I_SIZE] = { - 2.0, 0.0, 0.0, 6.0, 80.0, 64.0, 184.0, 26.0, 4.0, 128.0, 26.0, - 30.0, 208.0, 16.0, 384.0, 32.0, 52.0, 40.0, 42.0, 19.0, 36.0, 18.0, - 12.0, 80.0, 32.0, 30.0, 400.0, 22.0, 104.0, 432.0, 80.0, 400.0, 56.0, - 184.0, 56.0, 4.0, 224.0, 8.0, 28.0, 40.0, 2.0, 288.0, 208.0, 108.0, - 32.0, 5.0, 18.0, 36.0, 60.0, 10.0, 88.0, 108.0, 368.0, 20.0, 28.0, - 108.0, 
26.0, 23.0, 28.0, 32.0, 28.0, 176.0, 34.0, 96.0, 168.0, 4.0, - 16.0, 248.0, 416.0, 108.0, 4.0, 6.0, 0.0, 192.0, 248.0, 22.0, 72.0, - 32.0, 176.0, 144.0, 46.0, 12.0, 88.0, 128.0, 26.0, 22.0, 12.0, 124.0, - 36.0, 48.0, 18.0, 50.0, 22.0, 112.0, 10.0, 4.0, 288.0, 36.0, 96.0, - 120.0, 16.0, 92.0, 46.0, 40.0, 100.0, 31.0, 24.0, 240.0, 224.0, 496.0, - 124.0, 56.0, 248.0, 18.0, 26.0, 416.0, 232.0, 20.0, 31.0, 20.0, 96.0, - 26.0, 36.0, 88.0, 88.0, 48.0, 28.0, 16.0, 96.0, 80.0, 48.0, 6.0, - 64.0, 160.0, 10.0, 32.0, 56.0, 224.0, 224.0, 248.0, 144.0, 24.0, 52.0, - 64.0, 100.0, 21.0, 416.0, 64.0, 62.0, 18.0, 46.0, 16.0, 20.0, 12.0, - 224.0, 30.0, 50.0, 27.0, 76.0, 23.0, 16.0, 16.0, 208.0, 44.0, 31.0, - 256.0, 50.0, 19.0, 42.0, 80.0, 2.0, 48.0, 8.0, 336.0, 56.0, 32.0, - 24.0, 60.0, 8.0, 48.0, 9.0, 56.0, 12.0, 80.0, 64.0, 104.0, 10.0, - 32.0, 58.0, 200.0, 76.0, 100.0, 56.0, 68.0, 152.0, 0.0, 18.0, 208.0, - 60.0, 88.0, 116.0, 96.0, 432.0, 6.0, 68.0, 72.0, 52.0, 104.0, 128.0, - 54.0, 2.0, 160.0, 96.0, 24.0, 15.0, 56.0, 192.0, 480.0, 12.0, 88.0, - 48.0, 9.0, 12.0, 18.0, 20.0, 2.0, 320.0, 192.0, 54.0, 16.0, 13.0, - 208.0, 160.0, 40.0, 272.0, 80.0, 480.0, 25.0, 448.0, 40.0, 13.0, 48.0, - 23.0, 480.0, 92.0, 6.0, 72.0, 8.0, 248.0, 288.0, 60.0, 44.0, 80.0, - 56.0, 124.0, 256.0, 208.0, 58.0, 22.0, 112.0, 208.0, 48.0, 60.0, 64.0, - 232.0, 14.0, 224.0, 20.0, 248.0, 12.0, 184.0, 88.0, 168.0, 288.0, 38.0, - 400.0, 50.0, 192.0, 34.0, 0.0, 400.0, 464.0, 10.0, 30.0, 1.0, 144.0, - 24.0, 44.0, 8.0, 7.0, 0.0, 272.0, 240.0, 18.0, 160.0, 26.0, 104.0, - 2.0, 80.0, 496.0, 56.0, 208.0, 176.0, 18.0, 88.0, 32.0, 48.0, 10.0, - 24.0, 7.0, 4.0, 72.0, 34.0, 352.0, 88.0, 120.0, 84.0, 128.0, 24.0, - 30.0, 416.0, 288.0, 16.0, 7.0, 72.0, 100.0, 56.0, 208.0, 48.0, 72.0, - 28.0, 52.0, 32.0, 88.0, 7.0, 5.0, 18.0, 10.0, 25.0, 40.0, 14.0, - 24.0, 38.0, 30.0, 48.0, 30.0, 6.0, 28.0, 48.0, 42.0, 72.0, 2.0, - 100.0, 56.0, 160.0, 24.0, 56.0, 1.0, 4.0, 128.0, 31.0, 116.0, 92.0, - 88.0, 11.0, 26.0, 40.0, 80.0, 16.0, 80.0, 
80.0, 28.0, 26.0, 112.0, - 216.0, 76.0, 368.0, 2.0, 50.0, 136.0, 3.0, 176.0, 480.0, 168.0, 64.0, - 2.0, 56.0, 96.0, 112.0, 320.0, 24.0, 8.0, 56.0, 96.0, 224.0, 26.0, - 432.0, 42.0, 32.0, 48.0, 13.0, 116.0, 160.0, 60.0, 17.0, 112.0, 116.0, - 18.0, 352.0, 112.0, 108.0, 6.0, 50.0, 144.0, 368.0, 28.0, 448.0, 21.0, - 4.0, 112.0, 25.0, 38.0, 30.0, 23.0, 12.0, 0.0, 16.0, 10.0, 60.0, - 36.0, 40.0, 116.0, 14.0, 30.0, 368.0, 18.0, 80.0, 12.0, 4.0, 64.0, - 136.0, 21.0, 248.0, 192.0, 128.0, 8.0, 352.0, 29.0, 12.0, 56.0, 84.0, - 8.0, 0.0, 352.0, 352.0, 34.0, 48.0, 96.0, 112.0, 4.0, 64.0, 50.0, - 10.0, 6.0, 100.0, 68.0, 192.0, 240.0, 5.0, 128.0, 40.0, 80.0, 24.0, - 84.0, 224.0, 128.0, 0.0, 34.0, 20.0, 13.0, 448.0, 2.0, 21.0, 80.0, - 48.0, 29.0, 104.0, 42.0, 144.0, 4.0, 15.0, 24.0, 26.0, 22.0, 4.0, - 48.0, 27.0, 12.0, 192.0, 32.0, 368.0, 64.0, 32.0, 9.0, 128.0, 58.0, - 352.0, 100.0, 144.0, 120.0, 26.0, 56.0, 48.0, 12.0, 60.0, 28.0, 16.0, - 432.0, 17.0, 29.0, 96.0, 28.0, 88.0, 56.0, 56.0, 6.0, 30.0, 0.0, - 2.0, 232.0, 58.0, 0.0, 0.0, 26.0, 27.0, 32.0, 144.0, 100.0, 80.0, - 62.0, 22.0, 62.0, 4.0, 232.0, 22.0, 152.0, 160.0, 54.0, 176.0, 12.0, - 144.0, 216.0, 352.0, 16.0, 320.0, 16.0, 48.0, 128.0, 52.0, 8.0, 6.0, - 42.0, 104.0, 6.0, 24.0, 7.0, 28.0, 232.0, 60.0, 128.0, 56.0, 26.0, - 54.0, 176.0, 22.0, 416.0, 12.0, 384.0, 32.0, 30.0, 28.0, 352.0, 44.0, - 72.0, 160.0, 448.0, 120.0, 32.0, 88.0, 52.0, 0.0, 448.0, 16.0, 22.0, - 30.0, 32.0, 14.0, 108.0, 112.0, 256.0, 5.0, 120.0, 31.0, 400.0, 128.0, - 248.0, 64.0, 9.0, 116.0, 23.0, 58.0, 32.0, 4.0, 25.0, 112.0, 92.0, - 464.0, 54.0, 92.0, 18.0, 16.0, 84.0, 432.0, 72.0, 60.0, 40.0, 64.0, - 56.0, 96.0, 32.0, 116.0, 0.0, 44.0, 1.0, 4.0, 6.0, 32.0, 28.0, - 104.0, 112.0, 58.0, 336.0, 28.0, 192.0, 32.0, 14.0, 224.0, 44.0, 1.0, - 200.0, 0.0, 50.0, 29.0, 320.0, 144.0, 60.0, 72.0, 60.0, 144.0, 27.0, - 30.0, 208.0, 84.0, 16.0, 432.0, 352.0, 14.0, 192.0, 18.0, 92.0, 14.0, - 232.0, 28.0, 16.0, 116.0, 128.0, 30.0, 4.0, 192.0, 8.0, 160.0, 20.0, - 464.0, 
208.0, 240.0, 152.0, 352.0, 16.0, 17.0, 10.0, 144.0, 21.0, 108.0, - 480.0, 80.0, 52.0, 7.0, 24.0, 52.0, 52.0, 108.0, 200.0, 36.0, 128.0, - 400.0, 448.0, 48.0, 256.0, 320.0, 368.0, 30.0, 64.0, 54.0, 272.0, 176.0, - 0.0, 32.0, 52.0, 120.0, 144.0, 88.0, 64.0, 16.0, 160.0, 248.0, 120.0, - 176.0, 31.0, 23.0, 54.0, 176.0, 24.0, 208.0, 0.0, 304.0, 28.0, 168.0, - 208.0, 72.0, 6.0, 0.0, 4.0, 352.0, 26.0, 64.0, 176.0, 28.0, 16.0, - 192.0, 368.0, 29.0, 30.0, 320.0, 224.0, 0.0, 42.0, 36.0, 112.0, 24.0, - 48.0, 136.0, 22.0, 11.0, 208.0, 184.0, 240.0, 64.0, 464.0, 10.0, 216.0, - 32.0, 19.0, 16.0, 62.0, 38.0, 52.0, 10.0, 10.0, 120.0, 80.0, 208.0, - 15.0, 416.0, 9.0, 4.0, 368.0, 112.0, 15.0, 56.0, 176.0, 64.0, 400.0, - 12.0, 320.0, 12.0, 64.0, 288.0, 248.0, 4.0, 80.0, 496.0, 20.0, 29.0, - 30.0, 64.0, 28.0, 136.0, 25.0, 112.0, 16.0, 56.0, 30.0, 32.0, 124.0, - 14.0, 22.0, 2.0, 2.0, 48.0, 120.0, 27.0, 240.0, 18.0, 8.0, 184.0, - 54.0, 192.0, 112.0, 48.0, 84.0, 0.0, 22.0, 96.0, 10.0, 36.0, 1.0, - 46.0, 248.0, 7.0, 24.0, 2.0, 16.0, 124.0, 44.0, 4.0, 10.0, 224.0, - 160.0, 0.0, 25.0, 24.0, 64.0, 13.0, 160.0, 28.0, 18.0, 32.0, 19.0, - 124.0, 176.0, 15.0, 28.0, 432.0, 112.0, 52.0, 80.0, 32.0, 6.0, 0.0, - 32.0, 72.0, 464.0, 16.0, 224.0, 128.0, 60.0, 29.0, 64.0, 144.0, 44.0, - 20.0, 20.0, 36.0, 176.0, 92.0, 304.0, 56.0, 32.0, 64.0, 0.0, 26.0, - 216.0, 0.0, 464.0, 240.0, 72.0, 58.0, 10.0, 240.0, 31.0, 4.0, 56.0, - 84.0, 120.0, 72.0, 44.0, 248.0, 7.0, 136.0, 60.0, 8.0, 29.0, 30.0, - 60.0, 0.0, 6.0, 32.0, 6.0, 184.0, 288.0, 76.0, 192.0, 27.0, 34.0, - 112.0, 20.0, 96.0, 58.0, 176.0, 30.0, 42.0, 1.0, 80.0, 19.0, 96.0, - 56.0, 25.0, 36.0, 224.0, 26.0, 256.0, 8.0, 50.0, 24.0, 36.0, 4.0, - 128.0, 2.0, 64.0, 26.0, 28.0, 22.0, 104.0, 100.0, 152.0, 29.0, 112.0, - 8.0, 12.0, 112.0, 34.0, 4.0, 0.0, 6.0, 46.0, 224.0, 2.0, 192.0, - 368.0, 248.0, 16.0, 32.0, 9.0, 29.0, 120.0, 14.0, 0.0, 22.0, 432.0, - 96.0, 240.0, 96.0, 160.0, 1.0, 32.0, 176.0, 9.0, 26.0, 84.0, 160.0, - 160.0, 160.0, 15.0, 12.0, 60.0, 13.0, 
52.0, 18.0, 46.0, 128.0, 336.0, - 32.0, 0.0, 88.0, 24.0, 38.0, 16.0, 44.0, 1.0, 368.0, 40.0, 50.0, - 248.0, 62.0, 64.0, 168.0, 112.0, 48.0, 15.0, 16.0, 30.0, 120.0, 0.0, - 10.0, 88.0, 2.0, 80.0, 30.0, 176.0, 8.0, 20.0, 144.0, 0.0, 116.0, - 8.0, 40.0, 48.0, 120.0, 168.0, 48.0, 22.0, 96.0, 88.0, 25.0, 92.0, - 60.0, 16.0, 2.0, 12.0, 112.0, 12.0, 54.0, 44.0, 16.0, 232.0, 10.0, - 11.0, 60.0, 60.0, 92.0, 200.0, 480.0, 128.0, 58.0, 28.0, 256.0, 168.0, - 104.0, 56.0, 92.0, 28.0, 80.0, 80.0, 28.0, 108.0, 16.0, 84.0, 336.0, - 50.0, 224.0, 16.0, 14.0, 80.0, 152.0, 18.0, 272.0, 0.0, 29.0, 56.0, - 27.0, 44.0, 16.0, 152.0, 128.0, 29.0, 11.0, 24.0, 112.0, 88.0, 30.0, - 0.0, 80.0, 52.0, 14.0, 1.0, 3.0, 160.0, 72.0, 36.0, 176.0, 12.0, - 6.0, 46.0, 6.0, 88.0, 26.0, 352.0, 48.0, 29.0, 19.0, 6.0, 17.0, - 72.0, 108.0, 352.0, 14.0, 288.0, 44.0, 192.0, 24.0, 112.0, 20.0, 112.0, - 12.0, 152.0, 88.0, 4.0, 496.0, 232.0, 30.0, 304.0, 336.0, 176.0, 104.0, - 0.0, 21.0, 29.0, 21.0, 16.0, 84.0, 68.0, 12.0, 64.0, 23.0, 224.0, - 80.0, 176.0, 23.0, 10.0, 16.0, 6.0, 44.0, 16.0, 3.0, 304.0, 8.0, - 464.0, 0.0, 104.0, 120.0, 116.0, 6.0, 28.0, 400.0, 2.0, 0.0, 64.0, - 64.0, 44.0, 50.0, 28.0, 48.0, 192.0, 21.0, 52.0, 26.0, 6.0, 304.0, - 28.0, 52.0, 24.0, 17.0, 12.0, 42.0, 144.0, 248.0, 29.0, 42.0, 4.0, - 152.0, 22.0, 30.0, 52.0, 144.0, 6.0, 12.0, 320.0, 48.0, 24.0, 304.0, - 9.0, 88.0, 208.0, 224.0, 48.0, 8.0, 24.0, 152.0, 32.0, 144.0, 17.0, - 28.0, 40.0, 7.0, 16.0, 48.0, 21.0, 32.0, 60.0, 40.0, 22.0, 26.0, - 14.0, 40.0, 34.0, 22.0, 52.0, 8.0, 36.0, 19.0, 14.0, 58.0, 36.0, - 128.0, 14.0, 208.0, 224.0, 60.0, 8.0, 464.0, 128.0, 28.0, 448.0, 152.0, - 56.0, 192.0, 464.0, 432.0, 17.0, 1.0, 160.0, 20.0, 0.0, 40.0, 32.0, - 20.0, 36.0, 68.0, 32.0, 16.0, 12.0, 116.0, 4.0, 44.0, 96.0, 52.0, - 5.0, 160.0, 52.0, 31.0, 36.0, 216.0, 208.0, 168.0, 14.0, 44.0, 16.0, - 26.0, 40.0, 13.0, 224.0, 6.0, 224.0, 168.0, 80.0, 320.0, 46.0, 208.0, - 25.0, 44.0, 112.0, 34.0, 224.0, 48.0, 11.0, 16.0, 12.0, 72.0, 108.0, - 44.0, 11.0, 
32.0, 16.0, 200.0, 112.0, 56.0, 232.0, 400.0, 160.0, 60.0, - 16.0, 176.0, 14.0, 26.0, 44.0, 10.0, 15.0, 6.0, 54.0, 24.0, 76.0, - 26.0, 64.0, 25.0, 4.0, 48.0, 24.0, 100.0, 108.0, 72.0, 72.0, 64.0, - 240.0, 2.0, 192.0, 12.0, 23.0, 50.0, 48.0, 60.0, 320.0, 32.0, 18.0, - 84.0, 29.0, 192.0, 13.0, 88.0, 48.0, 17.0, 84.0, 368.0, 400.0, 72.0, - 232.0, 368.0, 10.0, 144.0, 64.0, 22.0, 184.0, 232.0, 16.0, 8.0, 48.0, - 92.0, 16.0, 416.0, 4.0, 14.0, 120.0, 32.0, 26.0, 48.0, 24.0, 8.0, - 0.0, 13.0, 92.0, 44.0, 16.0, 32.0, 24.0, 52.0, 200.0, 64.0, 224.0, - 16.0, 32.0, 6.0, 13.0, 46.0, 112.0, 108.0, 26.0, 50.0, 27.0, 31.0, - 248.0, 352.0, 80.0, 68.0, 32.0, 248.0, 0.0, 136.0, 232.0, 240.0, 44.0, - 24.0, 42.0, 54.0, 80.0, 14.0, 232.0, 104.0, 48.0, 32.0, 64.0, 112.0, - 16.0, 13.0, 248.0, 128.0, 120.0, 22.0, 200.0, 480.0, 448.0, 4.0, 152.0, - 84.0, 4.0, 104.0, 0.0, 42.0, 80.0, 256.0, 25.0, 8.0, 24.0, 64.0, - 8.0, 30.0, 92.0, 152.0, 20.0, 60.0, 8.0, 16.0, 8.0, 160.0, 16.0, - 14.0, 40.0, 19.0, 144.0, 36.0, 256.0, 16.0, 26.0, 26.0, 48.0, 112.0, - 9.0, 56.0, 32.0, 184.0, 224.0, 21.0, 480.0, 4.0, 62.0, 14.0, 116.0, - 192.0, 60.0, 36.0, 192.0, 28.0, 36.0, 16.0, 12.0, 36.0, 44.0, 400.0, - 36.0, 64.0, 120.0, 54.0, 52.0, 21.0, 36.0, 48.0, 0.0, 27.0, 27.0, - 24.0, 20.0, 96.0, 448.0, 20.0, 400.0, 20.0, 36.0, 120.0, 464.0, 2.0, - 60.0, 3.0, 11.0, 104.0, 60.0, 88.0, 40.0, 40.0, 192.0, 8.0, 76.0, - 248.0, 26.0, 32.0, 48.0, 80.0, 208.0, 62.0, 10.0, 58.0, 480.0, 8.0, - 16.0, 84.0, 46.0, 116.0, 64.0, 56.0, 6.0, 416.0, 4.0, 44.0, 176.0, - 224.0, 32.0, 8.0, 128.0, 216.0, 32.0, 144.0, 44.0, 352.0, 232.0, 0.0, - 480.0, 40.0, 0.0, 20.0, 12.0, 120.0, 20.0, 26.0, 50.0, 44.0, 0.0, - 0.0, 152.0, 8.0, 496.0, 16.0, 288.0, 224.0, 0.0, 68.0, 32.0, 16.0, - 216.0, 112.0, 32.0, 6.0, 48.0, 5.0, 36.0, 60.0, 240.0, 1.0, 18.0, - 72.0, 18.0, 6.0, 144.0, 16.0, 52.0, 56.0, 352.0, 400.0, 224.0, 52.0, - 40.0, 8.0, 29.0, 0.0, 112.0, 28.0, 208.0, 18.0, 208.0, 38.0, 56.0, - 54.0, 44.0, 36.0, 8.0, 36.0, 192.0, 12.0, 96.0, 26.0, 
400.0, 304.0, - 224.0, 184.0, 8.0, 9.0, 1.0, 34.0, 256.0, 52.0, 120.0, 12.0, 80.0, - 320.0, 3.0, 16.0, 92.0, 1.0, 20.0, 176.0, 464.0, 144.0, 16.0, 464.0, - 192.0, 16.0, 38.0, 124.0, 56.0, 24.0, 108.0, 112.0, 88.0, 176.0, 352.0, - 48.0, 16.0, 18.0, 48.0, 112.0, 31.0, 18.0, 112.0, 4.0, 44.0, 216.0, - 136.0, 4.0, 26.0, 2.0, 208.0, 248.0, 6.0, 22.0, 5.0, 104.0, 26.0, - 176.0, 272.0, 256.0, 11.0, 56.0, 40.0, 22.0, 58.0, 36.0, 0.0, 128.0, - 3.0, 21.0, 0.0, 30.0, 400.0, 6.0, 144.0, 128.0, 128.0, 6.0, 32.0, - 208.0, 21.0, 40.0, 80.0, 28.0, 50.0, 124.0, 60.0, 112.0, 0.0, 100.0, - 14.0, 40.0, 288.0, 3.0, 60.0, 72.0, 176.0, 4.0, 8.0, 26.0, 40.0, - 6.0, 34.0, 18.0, 6.0, 56.0, 8.0, 42.0, 64.0, 120.0, 80.0, 16.0, - 116.0, 352.0, 24.0, 58.0, 27.0, 60.0, 192.0, 6.0, 336.0, 64.0, 24.0, - 400.0, 58.0, 24.0, 34.0, 0.0, 128.0, 46.0, 19.0, 26.0, 88.0, 18.0, - 60.0, 152.0, 48.0, 36.0, 56.0, 58.0, 0.0, 176.0, 208.0, 32.0, 10.0, - 16.0, 16.0, 400.0, 32.0, 4.0, 10.0, 16.0, 448.0, 216.0, 240.0, 160.0, - 352.0, 31.0, 27.0, 6.0, 32.0, 176.0, 58.0, 58.0, 48.0, 32.0, 208.0, - 11.0, 80.0, 448.0, 7.0, 96.0, 32.0, 84.0, 56.0, 124.0, 4.0, 26.0, - 48.0, 8.0, 9.0, 304.0, 24.0, 144.0, 272.0, 208.0, 28.0, 184.0, 240.0, - 8.0, 17.0, 40.0, 22.0, 40.0, 448.0, 160.0, 4.0, 2.0, 256.0, 232.0, - 48.0, 272.0, 11.0, 12.0, 62.0, 28.0, 80.0, 208.0, 8.0, 8.0, 44.0, - 20.0, 0.0, 152.0, 62.0, 20.0, 144.0, 96.0, 84.0, 0.0, 192.0, 24.0, - 160.0, 8.0, 384.0, 128.0, 120.0, 64.0, 184.0, 224.0, 96.0, 48.0, 96.0, - 64.0, 464.0, 100.0, 20.0, 24.0, 208.0, 6.0, 208.0, 100.0, 28.0, 96.0, - 36.0, 152.0, 12.0, 0.0, 84.0, 58.0, 34.0, 52.0, 31.0, 64.0, 24.0, - 4.0, 18.0, 28.0, 72.0, 13.0, 15.0, 56.0, 56.0, 48.0, 128.0, 7.0, - 32.0, 38.0, 96.0, 192.0, 256.0, 6.0, 16.0, 96.0, 56.0, 4.0, 112.0, - 12.0, 48.0, 42.0, 2.0, 50.0, 80.0, 25.0, 8.0, 7.0, 208.0, 5.0, - 160.0, 120.0, 96.0, 120.0, 50.0, 336.0, 42.0, 7.0, 44.0, 192.0, 30.0, - 184.0, 34.0, 16.0, 88.0, 48.0, 4.0, 46.0, 18.0, 8.0, 36.0, 60.0, - 18.0, 64.0, 26.0, 8.0, 38.0, 34.0, 
56.0, 24.0, 40.0, 44.0, 136.0, - 112.0, 192.0, 104.0, 80.0, 32.0, 48.0, 20.0, 34.0, 32.0, 72.0, 60.0, - 46.0, 56.0, 120.0, 384.0, 11.0, 96.0, 14.0, 28.0, 52.0, 58.0, 120.0, - 18.0, 20.0, 496.0, 52.0, 32.0, 192.0, 20.0, 52.0, 20.0, 24.0, 48.0, - 40.0, 72.0, 2.0, 56.0, 80.0, 64.0, 124.0, 336.0, 84.0, 448.0, 2.0, - 432.0, 12.0, 64.0, 1.0, 288.0, 100.0, 50.0, 80.0, 116.0, 112.0, 20.0, - 184.0, 30.0, 40.0, 12.0, 15.0, 22.0, 68.0, 48.0, 208.0, 256.0, 304.0, - 8.0, 8.0, 216.0, 32.0, 38.0, 50.0, 136.0, 120.0, 52.0, 17.0, 15.0, - 12.0, 200.0, 20.0, 192.0, 44.0, 480.0, 8.0, 272.0, 88.0, 14.0, 52.0, - 18.0, 46.0, 58.0, 16.0, 464.0, 2.0, 44.0, 4.0, 23.0, 112.0, 48.0, - 52.0, 27.0, 224.0, 18.0, 56.0, 18.0, 22.0, 120.0, 304.0, 42.0, 52.0, - 56.0, 27.0, 21.0, 8.0, 0.0, 32.0, 100.0, 30.0, 4.0, 18.0, 20.0, - 23.0, 0.0, 56.0, 36.0, 192.0, 64.0, 40.0, 48.0, 21.0, 100.0, 11.0, - 28.0, 104.0, 34.0, 176.0, 0.0, 432.0, 352.0, 9.0, 60.0, 19.0, 112.0, - 42.0, 24.0, 4.0, 72.0, 208.0, 168.0, 42.0, 56.0, 112.0, 32.0, 16.0, - 12.0, 64.0, 88.0, 22.0, 416.0, 240.0, 136.0, 15.0, 54.0, 304.0, 56.0, - 56.0, 36.0, 112.0, 84.0, 21.0, 21.0, 25.0, 104.0, 336.0, 48.0, 8.0, - 38.0, 30.0, 128.0, 116.0, 448.0, 248.0, 60.0, 46.0, 24.0, 176.0, 38.0, - 192.0, 52.0, 14.0, 168.0, 104.0, 240.0, 8.0, 2.0, 496.0, 13.0, 40.0, - 96.0, 8.0, 0.0, 224.0, 3.0, 36.0, 0.0, 28.0, 32.0, 144.0, 176.0, - 116.0, 400.0, 48.0, 120.0, 112.0, 368.0, 384.0, 416.0, 144.0, 256.0, 116.0, - 8.0, 0.0, 9.0, 16.0, 160.0, 46.0, 4.0, 152.0, 144.0, 12.0, 60.0, - 368.0, 224.0, 232.0, 192.0, 72.0, 24.0, 28.0, 28.0, 6.0, 416.0, 32.0, - 112.0, 320.0, 100.0, 112.0, 42.0, 120.0, 64.0, 64.0, 432.0, 24.0, 480.0, - 152.0, 6.0, 96.0, 54.0, 34.0, 92.0, 24.0, 240.0, 32.0, 18.0, 20.0, - 48.0, 18.0, 18.0, 64.0, 0.0, 352.0, 36.0, 320.0, 288.0, 200.0, 17.0, - 84.0, 160.0, 4.0, 112.0, 48.0, 320.0, 304.0, 38.0, 24.0, 124.0, 100.0, - 336.0, 108.0, 8.0, 76.0, 48.0, 52.0, 54.0, 22.0, 20.0, 9.0, 8.0, - 40.0, 30.0, 320.0, 8.0, 120.0, 72.0, 40.0, 12.0, 32.0, 4.0, 
7.0, - 24.0, 11.0, 304.0, 16.0, 27.0, 48.0, 0.0, 20.0, 352.0, 24.0, 100.0, - 112.0, 208.0, 8.0, 448.0, 26.0, 42.0, 40.0, 0.0, 36.0, 384.0, 44.0, - 62.0, 240.0, 108.0, 192.0, 2.0, 0.0, 88.0, 13.0, 208.0, 128.0, 28.0, - 16.0, 26.0, 152.0, 464.0, 124.0, 13.0, 184.0, 10.0, 30.0, 26.0, 68.0, - 16.0, 224.0, 176.0, 14.0, 64.0, 48.0, 8.0, 416.0, 4.0, 6.0, 112.0, - 192.0, 112.0, 16.0, 152.0, 96.0, 24.0, 304.0, 20.0, 116.0, 16.0, 128.0, - 4.0, 5.0, 44.0, 96.0, 6.0, 232.0, 36.0, 40.0, 288.0, 7.0, 208.0, - 80.0, 192.0, 5.0, 48.0, 36.0, 14.0, 36.0, 17.0, 152.0, 0.0, 160.0, - 256.0, 184.0, 8.0, 46.0, 48.0, 3.0, 272.0, 288.0, 1.0, 128.0, 240.0, - 104.0, 100.0, 120.0, 16.0, 240.0, 60.0, 38.0, 11.0, 104.0, 38.0, 60.0, - 80.0, 112.0, 216.0, 192.0, 6.0, 136.0, 16.0, 72.0, 112.0, 64.0, 12.0, - 256.0, 3.0, 40.0, 16.0, 176.0, 19.0, 116.0, 464.0, 96.0, 368.0, 352.0, - 80.0, 44.0, 38.0, 136.0, 5.0, 30.0, 6.0, 304.0, 27.0, 12.0, 200.0, - 248.0, 120.0, 100.0, 9.0, 160.0, 128.0, 22.0, 24.0, 224.0, 96.0, 29.0, - 19.0, 80.0, 52.0, 432.0, 200.0, 10.0, 352.0, 128.0, 208.0, 176.0, 15.0, - 24.0, 8.0, 112.0, 3.0, 16.0, 448.0, 304.0, 352.0, 56.0, 11.0, 112.0, - 14.0, 32.0, 176.0, 336.0, 2.0, 124.0, 480.0, 52.0, 16.0, 4.0, 56.0, - 320.0, 304.0, 384.0, 72.0, 0.0, 320.0, 24.0, 464.0, 9.0, 40.0, 50.0, - 12.0, 96.0, 30.0, 48.0, 20.0, 104.0, 96.0, 24.0, 14.0, 288.0, 18.0, - 25.0, 0.0, 7.0, 200.0, 216.0, 116.0, 16.0, 18.0, 8.0, 384.0, 34.0, - 480.0, 22.0, 100.0, 384.0, 304.0, 5.0, 116.0, 144.0, 0.0, 100.0, 4.0, - 100.0, 20.0, 336.0, 112.0, 21.0, 464.0, 64.0, 232.0, 21.0, 30.0, 3.0, - 12.0, 21.0, 104.0, 6.0, 9.0, 20.0, 336.0, 368.0, 28.0, 25.0, 48.0, - 232.0, 12.0, 24.0, 68.0, 14.0, 44.0, 25.0, 50.0, 36.0, 28.0, 416.0, - 72.0, 4.0, 464.0, 60.0, 4.0, 400.0, 248.0, 13.0, 22.0, 104.0, 100.0, - 1.0, 1.0, 72.0, 18.0, 27.0, 28.0, 384.0, 12.0, 144.0, 112.0, 20.0, - 46.0, 112.0, 64.0, 124.0, 272.0, 32.0, 8.0, 7.0, 76.0, 8.0, 104.0, - 8.0, 84.0, 80.0, 42.0, 184.0, 112.0, 52.0, 12.0, 0.0, 88.0, 56.0, - 4.0, 22.0, 
24.0, 192.0, 136.0, 4.0, 336.0, 144.0, 30.0, 72.0, 38.0, - 48.0, 20.0, 192.0, 2.0, 24.0, 28.0, 56.0, 208.0, 60.0, 27.0, 0.0, - 23.0, 6.0, 84.0, 0.0, 176.0, 64.0, 25.0, 144.0, 48.0, 24.0, 15.0, - 52.0, 448.0, 64.0, 192.0, 30.0, 20.0, 52.0, 176.0, 8.0, 104.0, 152.0, - 496.0, 36.0, 12.0, 256.0, 32.0, 18.0, 32.0, 7.0, 168.0, 72.0, 14.0, - 104.0, 20.0, 0.0, 25.0, 12.0, 384.0, 64.0, 42.0, 240.0, 80.0, 352.0, - 9.0, 0.0, 432.0, 112.0, 48.0, 12.0, 128.0, 58.0, 76.0, 184.0, 320.0, - 32.0, 320.0, 160.0, 336.0, 128.0, 480.0, 136.0, 0.0, 10.0, 240.0, 96.0, - 80.0, 104.0, 52.0, 48.0, 80.0, 20.0, 288.0, 232.0, 50.0, 10.0, 116.0, - 112.0, 48.0, 192.0, 4.0, 224.0, 208.0, 16.0, 496.0, 72.0, 64.0, 1.0, - 27.0, 38.0, 21.0, 52.0, 14.0, 20.0, 40.0, 224.0, 48.0, 46.0, 24.0, - 64.0, 2.0, 28.0, 2.0, 19.0, 48.0, 11.0, 48.0, 32.0, 32.0, 72.0, - 120.0, 7.0, 56.0, 152.0, 176.0, 272.0, 12.0, 25.0, 120.0, 68.0, 200.0, - 288.0, 92.0, 448.0, 336.0, 152.0, 44.0, 19.0, 320.0, 128.0, 44.0, 38.0, - 20.0, 136.0, 52.0, 96.0, 52.0, 68.0, 2.0, 0.0, 176.0, 168.0, 16.0, - 56.0, 208.0, 20.0, 240.0, 160.0, 60.0, 60.0, 112.0, 14.0, 216.0, 6.0, - 76.0, 30.0, 2.0, 116.0, 448.0, 52.0, 6.0, 40.0, 224.0, 1.0, 46.0, - 20.0, 34.0, 92.0, 46.0, 144.0, 10.0, 128.0, 27.0, 16.0, 432.0, 240.0, - 176.0, 26.0, 52.0, 60.0, 28.0, 448.0, 72.0, 152.0, 128.0, 304.0, 464.0, - 80.0, 184.0, 22.0, 32.0, 20.0, 15.0, 26.0, 272.0, 352.0, 40.0, 248.0, - 4.0, 4.0, 6.0, 208.0, 8.0, 104.0, 12.0, 496.0, 68.0, 184.0, 400.0, - 112.0, 8.0, 10.0, 84.0, 112.0, 208.0, 5.0, 496.0, 136.0, 14.0, 92.0, - 152.0, 160.0, 88.0, 48.0, 0.0, 44.0, 368.0, 88.0, 4.0, 216.0, 336.0, - 56.0, 22.0, 38.0, 28.0, 176.0, 72.0, 9.0, 232.0, 28.0, 72.0, 240.0, - 32.0, 288.0, 304.0, 64.0, 96.0, 84.0, 30.0, 11.0, 208.0, 208.0, 496.0, - 7.0, 448.0, 8.0, 80.0, 23.0, 56.0, 60.0, 12.0, 88.0, 240.0, 10.0, - 29.0, 480.0, 20.0, 288.0, 10.0, 116.0, 30.0, 7.0, 96.0, 3.0, 31.0, - 168.0, 224.0, 27.0, 0.0, 480.0, 3.0, 4.0, 16.0, 432.0, 152.0, 0.0, - 10.0, 368.0, 20.0, 104.0, 88.0, 
16.0, 48.0, 248.0, 17.0, 32.0, 26.0, - 30.0, 72.0, 176.0, 8.0, 44.0, 50.0, 52.0, 9.0, 38.0, 38.0, 464.0, - 56.0, 0.0, 104.0, 108.0, 5.0, 56.0, 32.0, 0.0, 4.0, 320.0, 26.0, - 272.0, 0.0, 120.0, 216.0, 96.0, 272.0, 152.0, 184.0, 176.0, 120.0, 24.0, - 48.0, 88.0, 160.0, 48.0, 23.0, 36.0, 32.0, 240.0, 112.0, 12.0, 2.0, - 13.0, 32.0, 108.0, 216.0, 22.0, 72.0, 58.0, 368.0, 84.0, 208.0, 128.0, - 16.0, 18.0, 352.0, 32.0, 23.0, 88.0, 52.0, 272.0, 4.0, 6.0, 16.0, - 288.0, 256.0, 152.0, 48.0, 480.0, 176.0, 176.0, 44.0, 38.0, 24.0, 320.0, - 448.0, 208.0, 56.0, 15.0, 4.0, 7.0, 40.0, 0.0, 32.0, 208.0, 40.0, - 384.0, 34.0, 192.0, 40.0, 20.0, 30.0, 240.0, 240.0, 0.0, 22.0, 40.0, - 64.0, 88.0, 120.0, 20.0, 304.0, 92.0, 2.0, 2.0, 25.0, 272.0, 464.0, - 13.0, 20.0, 92.0, 2.0, 16.0, 160.0, 168.0, 92.0, 320.0, 12.0, 416.0, - 44.0, 304.0, 28.0, 320.0, 8.0, 18.0, 48.0, 8.0, 11.0, 240.0, 20.0, - 184.0, 128.0, 112.0, 240.0, 6.0, 480.0, 44.0, 384.0, 144.0, 112.0, 48.0, - 16.0, 200.0, 92.0, 11.0, 32.0, 40.0, 10.0, 208.0, 13.0, 72.0, 3.0, - 40.0, 104.0, 22.0, 40.0, 2.0, 192.0, 128.0, 52.0, 248.0, 112.0, 144.0, - 8.0, 80.0, 80.0, 30.0, 76.0, 38.0, 1.0, 144.0, 16.0, 6.0, 16.0, - 1.0, 8.0, 25.0, 200.0, 32.0, 32.0, 496.0, 5.0, 152.0, 76.0, 6.0, - 2.0, 5.0, 432.0, 288.0, 384.0, 64.0, 288.0, 4.0, 184.0, 64.0, 40.0, - 32.0, 72.0, 31.0, 60.0, 0.0, 144.0, 29.0, 48.0, 176.0, 12.0, 34.0, - 20.0, 176.0, 24.0, 96.0, 120.0, 44.0, 320.0, 496.0, 108.0, 108.0, 384.0, - 32.0, 76.0, 304.0, 0.0, 12.0, 15.0, 96.0, 80.0, 0.0, 7.0, 64.0, - 216.0, 21.0, 120.0, 7.0, 0.0, 200.0, 416.0, 0.0, 38.0, 224.0, 36.0, - 7.0, 224.0, 120.0, 112.0, 32.0, 44.0, 128.0, 200.0, 84.0, 15.0, 224.0, - 24.0, 58.0, 60.0, 80.0, 7.0, 10.0, 32.0, 8.0, 21.0, 108.0, 14.0, - 200.0, 19.0, 44.0, 4.0, 104.0, 4.0, 20.0, 248.0, 28.0, 120.0, 108.0, - 288.0, 368.0, 72.0, 68.0, 60.0, 80.0, 16.0, 30.0, 29.0, 20.0, 40.0, - 60.0, 32.0, 152.0, 336.0, 120.0, 32.0, 8.0, 160.0, 40.0, 88.0, 176.0, - 11.0, 124.0, 208.0, 46.0, 32.0, 288.0, 272.0, 96.0, 0.0, 20.0, 
13.0, - 208.0, 46.0, 5.0, 20.0, 224.0, 112.0, 31.0, 192.0, 10.0, 224.0, 28.0, - 22.0, 160.0, 112.0, 80.0, 62.0, 12.0, 80.0, 22.0, 14.0, 496.0, 48.0, - 60.0, 76.0, 32.0, 10.0, 18.0, 72.0, 23.0, 480.0, 240.0, 96.0, 240.0, - 112.0, 192.0, 200.0, 120.0, 112.0, 152.0, 27.0, 208.0, 448.0, 0.0, 1.0, - 48.0, 52.0, 184.0, 8.0, 384.0, 26.0, 24.0, 4.0, 12.0, 48.0, 104.0, - 20.0, 120.0, 200.0, 6.0, 3.0, 144.0, 8.0, 96.0, 14.0, 80.0, 12.0, - 48.0, 168.0, 336.0, 112.0, 128.0, 48.0, 62.0, 28.0, 56.0, 240.0, 28.0, - 416.0, 88.0, 14.0, 104.0, 224.0, 27.0, 104.0, 448.0, 4.0, 136.0, 96.0, - 60.0, 46.0, 84.0, 18.0, 4.0, 52.0, 12.0, 17.0, 12.0, 320.0, 28.0, - 8.0, 11.0, 28.0, 9.0, 432.0, 272.0, 144.0, 96.0, 38.0, 84.0, 40.0, - 216.0, 5.0, 232.0, 3.0, 432.0, 128.0, 184.0, 216.0, 48.0, 32.0, 2.0, - 224.0, 30.0, 36.0, 256.0, 28.0, 58.0, 23.0, 32.0, 108.0, 3.0, 6.0, - 16.0, 23.0, 480.0, 3.0, 108.0, 10.0, 32.0, 38.0, 152.0, 21.0, 32.0, - 12.0, 192.0, 16.0, 80.0, 9.0, 144.0, 10.0, 14.0, 124.0, 96.0, 23.0, - 400.0, 176.0, 200.0, 56.0, 30.0, 7.0, 100.0, 2.0, 32.0, 30.0, 17.0, - 18.0, 400.0, 2.0, 200.0, 304.0, 16.0, 32.0, 9.0, 24.0, 352.0, 336.0, - 152.0, 128.0, 240.0, 54.0, 96.0, 88.0, 28.0, 24.0, 136.0, 112.0, 11.0, - 224.0, 26.0, 18.0, 160.0, 26.0, 8.0, 12.0, 68.0, 0.0, 64.0, 18.0, - 6.0, 27.0, 176.0, 336.0, 192.0, 32.0, 6.0, 192.0, 52.0, 48.0, 200.0, - 17.0, 13.0, 72.0, 21.0, 76.0, 184.0, 28.0, 26.0, 4.0, 54.0, 208.0, - 24.0, 30.0, 88.0, 184.0, 52.0, 168.0, 34.0, 92.0, 68.0, 480.0, 9.0, - 0.0, 36.0, 8.0, 52.0, 16.0, 72.0, 200.0, 30.0, 8.0, 416.0, 112.0, - 56.0, 54.0, 176.0, 352.0, 10.0, 32.0, 112.0, 8.0, 168.0, 64.0, 144.0, - 24.0, 11.0, 80.0, 14.0, 38.0, 200.0, 24.0, 152.0, 96.0, 320.0, 6.0, - 32.0, 32.0, 96.0, 36.0, 64.0, 108.0, 12.0, 224.0, 176.0, 16.0, 0.0, - 48.0, 200.0, 48.0, 34.0, 28.0, 10.0, 6.0, 248.0, 352.0, 46.0, 30.0, - 256.0, 32.0, 116.0, 248.0, 184.0, 256.0, 8.0, 17.0, 76.0, 56.0, 16.0, - 192.0, 84.0, 28.0, 60.0, 24.0, 232.0, 62.0, 56.0, 496.0, 3.0, 480.0, - 36.0, 336.0, 
432.0, 4.0, 96.0, 400.0, 5.0, 208.0, 96.0, 28.0, 48.0, - 28.0, 64.0, 36.0, 104.0, 14.0, 26.0, 13.0, 416.0, 32.0, 27.0, 480.0, - 224.0, 5.0, 72.0, 448.0, 34.0, 28.0, 38.0, 24.0, 64.0, 19.0, 448.0, - 112.0, 104.0, 120.0, 184.0, 116.0, 14.0, 176.0, 14.0, 112.0, 448.0, 10.0, - 5.0, 400.0, 56.0, 92.0, 320.0, 40.0, 4.0, 336.0, 48.0, 50.0, 120.0, - 25.0, 168.0, 8.0, 19.0, 144.0, 48.0, 80.0, 96.0, 42.0, 44.0, 8.0, - 240.0, 15.0, 42.0, 80.0, 31.0, 12.0, 30.0, 11.0, 2.0, 27.0, 80.0, - 64.0, 6.0, 27.0, 108.0, 60.0, 144.0, 28.0, 22.0, 11.0, 50.0, 144.0, - 16.0, 304.0, 44.0, 28.0, 6.0, 54.0, 104.0, 108.0, 20.0, 160.0, 28.0, - 32.0, 124.0, 29.0, 6.0, 56.0, 168.0, 17.0, 14.0, 28.0, 216.0, 304.0, - 224.0, 17.0, 64.0, 128.0, 36.0, 25.0, 16.0, 136.0, 42.0, 8.0, 16.0, - 32.0, 62.0, 24.0, 28.0, 0.0, 96.0, 240.0, 112.0, 7.0, 0.0, 31.0, - 48.0, 160.0, 40.0, 40.0, 30.0, 100.0, 36.0, 0.0, 72.0, 152.0, 64.0, - 4.0, 224.0, 32.0, 32.0, 56.0, 352.0, 160.0, 16.0, 116.0, 46.0, 88.0, - 17.0, 160.0, 24.0, 10.0, 28.0, 32.0, 152.0, 400.0, 352.0, 496.0, 288.0, - 34.0, 192.0, 32.0, 192.0, 52.0, 19.0, 160.0, 432.0, 8.0, 31.0, 18.0, - 54.0, 19.0, 56.0, 224.0, 88.0, 76.0, 112.0, 40.0, 76.0, 42.0, 22.0, - 80.0, 15.0, 40.0, 9.0, 104.0, 128.0, 64.0, 12.0, 136.0, 116.0, 336.0, - 480.0, 60.0, 144.0, 432.0, 224.0, 224.0, 192.0, 88.0, 64.0, 192.0, 124.0, - 62.0, 136.0, 4.0, 112.0, 16.0, 27.0, 1.0, 7.0, 30.0, 0.0, 20.0, - 168.0, 16.0, 224.0, 40.0, 448.0, 24.0, 29.0, 68.0, 120.0, 112.0, 18.0, - 24.0, 80.0, 240.0, 20.0, 28.0, 26.0, 432.0, 8.0, 200.0, 22.0, 80.0, - 0.0, 48.0, 304.0, 62.0, 192.0, 176.0, 21.0, 13.0, 0.0, 352.0, 88.0, - 32.0, 80.0, 26.0, 5.0, 28.0, 240.0, 8.0, 224.0, 16.0, 72.0, 256.0, - 56.0, 288.0, 16.0, 104.0, 18.0, 28.0, 29.0, 104.0, 144.0, 128.0, 112.0, - 144.0, 12.0, 120.0, 384.0, 64.0, 25.0, 80.0, 30.0, 12.0, 9.0, 168.0, - 48.0, 6.0, 16.0, 176.0, 64.0, 60.0, 30.0, 240.0, 48.0, 50.0, 16.0, - 54.0, 128.0, 30.0, 42.0, 26.0, 18.0, 27.0, 44.0, 14.0, 11.0, 40.0, - 72.0, 64.0, 8.0, 176.0, 76.0, 12.0, 
80.0, 16.0, 29.0, 48.0, 160.0, - 32.0, 30.0, 52.0, 58.0, 0.0, 48.0, 10.0, 30.0, 12.0, 32.0, 168.0, - 60.0, 9.0, 44.0, 20.0, 24.0, 56.0, 4.0, 240.0, 18.0, 144.0, 40.0, - 50.0, 27.0, 56.0, 15.0, 384.0, 124.0, 216.0, 208.0, 28.0, 112.0, 116.0, - 368.0, 30.0, 28.0, 32.0, 0.0, 84.0, 6.0, 192.0, 136.0, 7.0, 42.0, - 34.0, 112.0, 104.0, 200.0, 116.0, 184.0, 10.0, 368.0, 46.0, 42.0, 320.0, - 30.0, 6.0, 16.0, 20.0, 48.0, 384.0, 416.0, 152.0, 120.0, 9.0, 0.0, - 3.0, 16.0, 16.0, 36.0, 104.0, 36.0, 128.0, 40.0, 58.0, 23.0, 24.0, - 124.0, 112.0, 144.0, 176.0, 224.0, 124.0, 128.0, 448.0, 48.0, 24.0, 18.0, - 64.0, 34.0, 128.0, 448.0, 16.0, 26.0, 16.0, 208.0, 54.0, 36.0, 34.0, - 60.0, 160.0, 34.0, 30.0, 208.0, 21.0, 36.0, 48.0, 22.0, 68.0, 200.0, - 52.0, 224.0, 5.0, 25.0, 20.0, 40.0, 68.0, 40.0, 14.0, 0.0, 60.0, - 176.0, 112.0, 32.0, 18.0, 38.0, 80.0, 7.0, 88.0, 88.0, 48.0, 336.0, - 272.0, 384.0, 8.0, 152.0, 224.0, 400.0, 144.0, 10.0, 10.0, 32.0, 27.0, - 62.0, 9.0, 24.0, 200.0, 22.0, 416.0, 144.0, 184.0, 400.0, 28.0, 31.0, - 208.0, 38.0, 176.0, 64.0, 14.0, 21.0, 28.0, 200.0, 54.0, 16.0, 28.0, - 26.0, 40.0, 8.0, 192.0, 28.0, 72.0, 96.0, 104.0, 216.0, 80.0, 62.0, - 64.0, 48.0, 88.0, 88.0, 272.0, 272.0, 64.0, 34.0, 448.0, 17.0, 256.0, - 20.0, 11.0, 17.0, 14.0, 40.0, 176.0, 4.0, 352.0, 30.0, 272.0, 76.0, - 2.0, 176.0, 240.0, 96.0, 14.0, 216.0, 32.0, 16.0, 0.0, 16.0, 5.0, - 104.0, 224.0, 112.0, 11.0, 232.0, 256.0, 320.0, 240.0, 336.0, 432.0, 0.0, - 58.0, 8.0, 60.0, 128.0, 18.0, 56.0, 0.0, 38.0, 26.0, 38.0, 72.0, - 176.0, 12.0, 32.0, 52.0, 15.0, 22.0, 40.0, 224.0, 416.0, 240.0, 20.0, - 40.0, 60.0, 32.0, 26.0, 92.0, 24.0, 32.0, 60.0, 48.0, 88.0, 30.0, - 80.0, 56.0, 18.0, 168.0, 16.0, 24.0, 8.0, 30.0, 27.0, 8.0, 22.0, - 30.0, 5.0, 248.0, 24.0, 16.0, 12.0, 12.0, 112.0, 30.0, 192.0, 23.0, - 16.0, 16.0, 14.0, 240.0, 58.0, 248.0, 24.0, 24.0, 432.0, 20.0, 120.0, - 30.0, 176.0, 50.0, 96.0, 16.0, 11.0, 44.0, 16.0, 3.0, 10.0, 224.0, - 152.0, 56.0, 23.0, 1.0, 52.0, 88.0, 136.0, 56.0, 46.0, 16.0, 
28.0, - 3.0, 100.0, 24.0, 432.0, 240.0, 448.0, 256.0, 20.0, 80.0, 62.0, 6.0, - 336.0, 52.0, 216.0, 12.0, 12.0, 96.0, 124.0, 152.0, 16.0, 144.0, 16.0, - 64.0, 30.0, 60.0, 36.0, 4.0, 56.0, 25.0, 144.0, 80.0, 304.0, 96.0, - 100.0, 4.0, 56.0, 80.0, 256.0, 15.0, 40.0, 14.0, 58.0, 22.0, 64.0, - 216.0, 8.0, 120.0, 72.0, 112.0, 120.0, 200.0, 368.0, 20.0, 12.0, 36.0, - 64.0, 12.0, 80.0, 26.0, 160.0, 25.0, 0.0, 464.0, 68.0, 2.0, 80.0, - 48.0, 96.0, 120.0, 20.0, 112.0, 40.0, 64.0, 28.0, 0.0, 432.0, 76.0, - 2.0, 20.0, 42.0, 64.0, 112.0, 160.0, 44.0, 19.0, 224.0, 88.0, 40.0, - 28.0, 44.0, 7.0, 62.0, 52.0, 40.0, 120.0, 31.0, 15.0, 19.0, 10.0, - 192.0, 15.0, 30.0, 16.0, 56.0, 56.0, 1.0, 24.0, 62.0, 124.0, 96.0, - 8.0, 8.0, 0.0, 72.0, 22.0, 336.0, 6.0, 11.0, 44.0, 12.0, 2.0, - 22.0, 14.0, 44.0, 48.0, 34.0, 56.0, 96.0, 10.0, 432.0, 3.0, 24.0, - 152.0, 18.0, 0.0, 27.0, 40.0, 24.0, 20.0, 368.0, 192.0, 31.0, 144.0, - 168.0, 32.0, 28.0, 60.0, 22.0, 14.0, 32.0, 16.0, 128.0, 18.0, 2.0, - 24.0, 160.0, 108.0, 144.0, 496.0, 124.0, 48.0, 184.0, 100.0, 21.0, 2.0, - 104.0, 23.0, 24.0, 92.0, 0.0, 28.0, 36.0, 34.0, 100.0, 38.0, 34.0, - 40.0, 50.0, 40.0, 62.0, 19.0, 9.0, 58.0, 9.0, 72.0, 68.0, 36.0, - 34.0, 224.0, 80.0, 272.0, 8.0, 29.0, 124.0, 26.0, 10.0, 304.0, 52.0, - 192.0, 23.0, 336.0, 36.0, 144.0, 40.0, 248.0, 22.0, 50.0, 13.0, 1.0, - 108.0, 0.0, 60.0, 104.0, 80.0, 50.0, 4.0, 28.0, 14.0, 14.0, 25.0, - 60.0, 160.0, 0.0, 60.0, 62.0, 160.0, 184.0, 136.0, 34.0, 13.0, 96.0, - 416.0, 464.0, 16.0, 24.0, 26.0, 480.0, 76.0, 8.0, 32.0, 6.0, 60.0, - 108.0, 52.0, 60.0, 2.0, 16.0, 120.0, 26.0, 34.0, 124.0, 112.0, 56.0, - 64.0, 24.0, 448.0, 416.0, 124.0, 116.0, 4.0, 0.0, 3.0, 6.0, 17.0, - 28.0, 30.0, 60.0, 26.0, 56.0, 232.0, 64.0, 36.0, 432.0, 80.0, 304.0, - 160.0, 18.0, 240.0, 28.0, 18.0, 92.0, 48.0, 44.0, 28.0, 68.0, 104.0, - 4.0, 20.0, 40.0, 224.0, 46.0, 216.0, 248.0, 40.0, 208.0, 304.0, 56.0, - 6.0, 52.0, 34.0, 20.0, 38.0, 112.0, 12.0, 128.0, 28.0, 192.0, 0.0, - 100.0, 248.0, 48.0, 96.0, 8.0, 160.0, 
0.0, 112.0, 36.0, 8.0, 2.0, - 8.0, 0.0, 62.0, 40.0, 112.0, 8.0, 64.0, 240.0, 1.0, 58.0, 2.0, - 64.0, 128.0, 42.0, 40.0, 248.0, 1.0, 44.0, 20.0, 25.0, 56.0, 56.0, - 20.0, 384.0, 64.0, 60.0, 208.0, 50.0, 48.0, 3.0, 232.0, 68.0, 56.0, - 4.0, 0.0, 17.0, 60.0, 176.0, 104.0, 32.0, 208.0, 2.0, 8.0, 32.0, - 20.0, 4.0, 80.0, 12.0, 400.0, 88.0, 84.0, 272.0, 36.0, 11.0, 0.0, - 16.0, 2.0, 9.0, 60.0, 80.0, 320.0, 20.0, 92.0, 72.0, 54.0, 20.0, - 16.0, 288.0, 120.0, 2.0, 124.0, 16.0, 52.0, 28.0, 30.0, 60.0, 496.0, - 20.0, 10.0, 80.0, 256.0, 31.0, 176.0, 16.0, 416.0, 27.0, 48.0, 10.0, - 60.0, 36.0, 64.0, 216.0, 100.0, 4.0, 0.0, 160.0, 432.0, 10.0, 96.0, - 336.0, 8.0, 9.0, 272.0, 56.0, 56.0, 36.0, 26.0, 29.0, 240.0, 104.0, - 384.0, 21.0, 28.0, 352.0, 76.0, 432.0, 4.0, 10.0, 48.0, 104.0, 320.0, - 17.0, 112.0, 432.0, 11.0, 28.0, 8.0, 58.0, 72.0, 16.0, 192.0, 20.0, - 224.0, 0.0, 28.0, 64.0, 2.0, 208.0, 128.0, 44.0, 4.0, 24.0, 248.0, - 48.0, 17.0, 30.0, 20.0, 46.0, 96.0, 36.0, 232.0, 10.0, 32.0, 24.0, - 10.0, 80.0, 2.0, 48.0, 28.0, 448.0, 6.0, 0.0, 6.0, 28.0, 17.0, - 38.0, 336.0, 6.0, 24.0, 96.0, 80.0, 96.0, 88.0, 13.0, 112.0, 13.0, - 16.0, 6.0, 432.0, 28.0, 304.0, 272.0, 320.0, 184.0, 56.0, 0.0, 40.0, - 116.0, 6.0, 272.0, 432.0, 120.0, 16.0, 248.0, 176.0, 232.0, 12.0, 26.0, - 16.0, 96.0, 26.0, 256.0, 124.0, 240.0, 58.0, 48.0, 448.0, 400.0, 62.0, - 52.0, 48.0, 464.0, 112.0, 36.0, 2.0, 12.0, 104.0, 96.0, 72.0, 48.0, - 14.0, 0.0, 31.0, 116.0, 448.0, 12.0, 52.0, 20.0, 80.0, 30.0, 12.0, - 14.0, 72.0, 128.0, 464.0, 248.0, 9.0, 4.0, 0.0, 224.0, 2.0, 480.0, - 30.0, 352.0, 208.0, 96.0, 288.0, 48.0, 14.0, 352.0, 240.0, 160.0, 16.0, - 20.0, 25.0, 5.0, 22.0, 11.0, 416.0, 72.0, 22.0, 38.0, 30.0, 26.0, - 4.0, 232.0, 60.0, 88.0, 17.0, 12.0, 1.0, 11.0, 60.0, 6.0, 56.0, - 72.0, 248.0, 34.0, 32.0, 8.0, 42.0, 120.0, 80.0, 168.0, 216.0, 208.0, - 24.0, 208.0, 208.0, 200.0, 4.0, 12.0, 384.0, 0.0, 28.0, 16.0, 496.0, - 48.0, 112.0, 224.0, 31.0, 64.0, 50.0, 16.0, 7.0, 16.0, 80.0, 4.0, - 26.0, 23.0, 48.0, 
20.0, 38.0, 32.0, 54.0, 224.0, 30.0, 432.0, 120.0, - 248.0, 54.0, 64.0, 480.0, 26.0, 416.0, 17.0, 160.0, 28.0, 24.0, 46.0, - 48.0, 104.0, 24.0, 32.0, 96.0, 8.0, 64.0, 80.0, 192.0, 6.0, 29.0, - 44.0, 58.0, 0.0, 24.0, 64.0, 72.0, 26.0, 112.0, 208.0, 76.0, 0.0, - 36.0, 128.0, 9.0, 40.0, 112.0, 28.0, 320.0, 8.0, 56.0, 26.0, 62.0, - 20.0, 6.0, 32.0, 240.0, 16.0, 124.0, 16.0, 152.0, 352.0, 0.0, 48.0, - 152.0, 96.0, 104.0, 40.0, 0.0, 8.0, 232.0, 3.0, 28.0, 0.0, 144.0, - 160.0, 0.0, 13.0, 248.0, 36.0, 1.0, 400.0, 200.0, 2.0, 36.0, 9.0, - 2.0, 29.0, 18.0, 320.0, 336.0, 240.0, 62.0, 22.0, 29.0, 38.0, 26.0, - 368.0, 64.0, 168.0, 24.0, 432.0, 16.0, 13.0, 25.0, 23.0, 64.0, 38.0, - 184.0, 58.0, 32.0, 22.0, 62.0, 104.0, 6.0, 32.0, 16.0, 18.0, 50.0, - 24.0, 46.0, 24.0, 80.0, 22.0, 168.0, 20.0, 64.0, 32.0, 25.0, 64.0, - 96.0, 112.0, 12.0, 0.0, 9.0, 88.0, 13.0, 112.0, 16.0, 72.0, 48.0, - 216.0, 5.0, 8.0, 72.0, 208.0, 2.0, 272.0, 2.0, 62.0, 56.0, 88.0, - 18.0, 0.0, 50.0, 2.0, 21.0, 16.0, 9.0, 18.0, 144.0, 416.0, 5.0, - 1.0, 192.0, 25.0, 8.0, 96.0, 100.0, 34.0, 20.0, 64.0, 40.0, 36.0, - 23.0, 26.0, 15.0, 240.0, 448.0, 22.0, 1.0, 31.0, 16.0, 416.0, 16.0, - 17.0, 20.0, 26.0, 20.0, 352.0, 76.0, 88.0, 44.0, 116.0, 1.0, 54.0, - 112.0, 40.0, 14.0, 56.0, 368.0, 336.0, 64.0, 9.0, 16.0, 8.0, 464.0, - 480.0, 22.0, 16.0, 23.0, 30.0, 4.0, 2.0, 240.0, 176.0, 64.0, 68.0, - 48.0, 32.0, 26.0, 304.0, 80.0, 64.0, 64.0, 88.0, 30.0, 80.0, 27.0, - 17.0, 18.0, 1.0, 16.0, 232.0, 80.0, 0.0, 16.0, 16.0, 152.0, 12.0, - 464.0, 400.0, 14.0, 28.0, 64.0, 136.0, 480.0, 13.0, 176.0, 46.0, 100.0, - 124.0, 4.0, 160.0, 36.0, 24.0, 448.0, 124.0, 8.0, 16.0, 4.0, 232.0, - 2.0, 25.0, 2.0, 64.0, 24.0, 20.0, 216.0, 76.0, 208.0, 272.0, 136.0, - 144.0, 28.0, 400.0, 464.0, 48.0, 17.0, 304.0, 44.0, 22.0, 168.0, 68.0, - 32.0, 8.0, 24.0, 25.0, 2.0, 52.0, 100.0, 336.0, 62.0, 8.0, 128.0, - 88.0, 22.0, 6.0, 160.0, 0.0, 56.0, 288.0, 48.0, 60.0, 120.0, 8.0, - 8.0, 120.0, 240.0, 176.0, 0.0, 24.0, 18.0, 32.0, 12.0, 24.0, 96.0, - 15.0, 
200.0, 13.0, 48.0, 320.0, 48.0, 108.0, 224.0, 30.0, 4.0, 19.0, - 112.0, 34.0, 192.0, 128.0, 25.0, 84.0, 120.0, 10.0, 9.0, 0.0, 27.0, - 10.0, 10.0, 16.0, 32.0, 1.0, 384.0, 112.0, 432.0, 248.0, 21.0, 72.0, - 336.0, 16.0, 112.0, 50.0, 46.0, 46.0, 58.0, 6.0, 18.0, 10.0, 32.0, - 24.0, 20.0, 104.0, 96.0, 104.0, 0.0, 54.0, 120.0, 464.0, 84.0, 192.0, - 12.0, 72.0, 84.0, 216.0, 184.0, 30.0, 88.0, 100.0, 432.0, 8.0, 18.0, - 28.0, 4.0, 18.0, 20.0, 3.0, 240.0, 50.0, 384.0, 31.0, 96.0, 64.0, - 136.0, 8.0, 352.0, 2.0, 19.0, 10.0, 27.0, 352.0, 208.0, 0.0, 144.0, - 184.0, 20.0, 18.0, 96.0, 32.0, 36.0, 4.0, 112.0, 16.0, 28.0, 240.0, - 72.0, 11.0, 10.0, 184.0, 84.0, 13.0, 12.0, 6.0, 240.0, 28.0, 152.0, - 104.0, 240.0, 184.0, -}; -float verify_data[O_SIZE] = { - 979.0, 969.0, 970.0, 420.0, 734.0, 706.0, 662.0, 408.0, 560.0, - 678.0, 810.0, 537.0, 821.0, 716.0, 755.0, 387.0, 476.0, 485.0, - 470.0, 588.0, 547.0, 561.0, 537.0, 611.0, 1006.0, 834.0, 816.0, - 1178.0, 1170.0, 1514.0, 834.0, 927.0, 543.0, 655.0, 837.0, 1309.0, - 1181.0, 900.0, 654.0, 900.0, 968.0, 1210.0, 1122.0, 1103.0, 672.0, - 506.0, 493.0, 472.0, 592.0, 576.0, 946.0, 946.0, 866.0, 604.0, - 566.0, 649.0, 945.0, 751.0, 712.0, 612.0, 664.0, 736.0, 680.0, - 866.0, 876.0, 1180.0, 1554.0, 1468.0, 883.0, 353.0, 1661.0, 1543.0, - 1032.0, 415.0, 475.0, 471.0, 498.0, 290.0, 744.0, 1008.0, 1112.0, - 545.0, 296.0, 521.0, 668.0, 671.0, 814.0, 776.0, 1205.0, 1100.0, - 1434.0, 964.0, 914.0, 570.0, 628.0, 933.0, 885.0, 1198.0, 724.0, - 688.0, 626.0, 831.0, 843.0, 803.0, 737.0, 1253.0, 1181.0, 1260.0, - 1172.0, 1092.0, 758.0, 798.0, 1116.0, 1326.0, 933.0, 619.0, 735.0, - 676.0, 904.0, 628.0, 874.0, 730.0, 894.0, 732.0, 844.0, 1036.0, - 1362.0, 1394.0, 1112.0, 1018.0, 702.0, 656.0, 816.0, 1462.0, 1562.0, - 1416.0, 915.0, 871.0, 524.0, 447.0, 1236.0, 1360.0, 1202.0, 819.0, - 613.0, 323.0, 422.0, 220.0, 784.0, 1358.0, 1502.0, 1129.0, 676.0, - 765.0, 778.0, 601.0, 880.0, 750.0, 1197.0, 957.0, 1325.0, 921.0, - 722.0, 700.0, 682.0, 1241.0, 
945.0, 1302.0, 806.0, 692.0, 910.0, - 1394.0, 1408.0, 819.0, 344.0, 472.0, 565.0, 948.0, 1056.0, 1098.0, - 614.0, 490.0, 554.0, 830.0, 777.0, 648.0, 618.0, 567.0, 799.0, - 553.0, 741.0, 584.0, 806.0, 732.0, 820.0, 952.0, 1222.0, 1324.0, - 1076.0, 906.0, 630.0, 570.0, 892.0, 1308.0, 1374.0, 960.0, 539.0, - 536.0, 365.0, 440.0, 938.0, 1155.0, 851.0, 800.0, 564.0, 389.0, - 476.0, 348.0, 704.0, 1280.0, 1332.0, 1188.0, 877.0, 1043.0, 1323.0, - 992.0, 1060.0, 670.0, 1103.0, 941.0, 1647.0, 1420.0, 1323.0, 793.0, - 680.0, 1265.0, 1151.0, 1438.0, 862.0, 620.0, 886.0, 1438.0, 1692.0, - 1049.0, 961.0, 859.0, 994.0, 922.0, 993.0, 1027.0, 725.0, 680.0, - 691.0, 729.0, 688.0, 518.0, 872.0, 825.0, 1095.0, 559.0, 735.0, - 558.0, 1082.0, 916.0, 1400.0, 1149.0, 1051.0, 927.0, 711.0, 935.0, - 531.0, 433.0, 673.0, 1057.0, 1218.0, 770.0, 367.0, 362.0, 366.0, - 537.0, 666.0, 991.0, 1169.0, 765.0, 591.0, 430.0, 630.0, 529.0, - 767.0, 1227.0, 1334.0, 1254.0, 1282.0, 1195.0, 1335.0, 681.0, 740.0, - 426.0, 363.0, 716.0, 1398.0, 1645.0, 1256.0, 870.0, 852.0, 926.0, - 736.0, 978.0, 816.0, 600.0, 674.0, 1068.0, 1456.0, 1157.0, 1214.0, - 1052.0, 907.0, 734.0, 549.0, 693.0, 769.0, 924.0, 701.0, 523.0, - 426.0, 413.0, 997.0, 978.0, 1096.0, 404.0, 392.0, 435.0, 819.0, - 807.0, 1250.0, 795.0, 720.0, 244.0, 288.0, 365.0, 307.0, 281.0, - 318.0, 280.0, 575.0, 502.0, 918.0, 651.0, 661.0, 430.0, 1189.0, - 1222.0, 1253.0, 599.0, 463.0, 428.0, 470.0, 513.0, 701.0, 749.0, - 806.0, 970.0, 1307.0, 1271.0, 1195.0, 696.0, 784.0, 508.0, 481.0, - 793.0, 1455.0, 1598.0, 1178.0, 646.0, 670.0, 656.0, 502.0, 672.0, - 633.0, 503.0, 407.0, 610.0, 1060.0, 1088.0, 1283.0, 1019.0, 767.0, - 668.0, 495.0, 771.0, 955.0, 1138.0, 979.0, 593.0, 631.0, 685.0, - 1679.0, 1529.0, 1731.0, 681.0, 697.0, 549.0, 971.0, 903.0, 1200.0, - 743.0, 788.0, 332.0, 340.0, 284.0, 204.0, 432.0, 464.0, 542.0, - 503.0, 472.0, 814.0, 826.0, 852.0, 911.0, 1378.0, 1372.0, 1152.0, - 884.0, 810.0, 750.0, 610.0, 1017.0, 1253.0, 1161.0, 912.0, 976.0, - 
1093.0, 1367.0, 999.0, 736.0, 432.0, 404.0, 368.0, 632.0, 1030.0, - 1250.0, 839.0, 793.0, 704.0, 1053.0, 797.0, 920.0, 617.0, 472.0, - 508.0, 598.0, 881.0, 855.0, 695.0, 580.0, 402.0, 775.0, 1070.0, - 1204.0, 1276.0, 966.0, 868.0, 522.0, 918.0, 1027.0, 1757.0, 1219.0, - 1323.0, 615.0, 749.0, 533.0, 653.0, 511.0, 516.0, 291.0, 364.0, - 206.0, 241.0, 175.0, 275.0, 585.0, 673.0, 1013.0, 897.0, 1002.0, - 980.0, 958.0, 986.0, 1001.0, 1167.0, 1206.0, 765.0, 1173.0, 1207.0, - 1194.0, 682.0, 988.0, 1124.0, 931.0, 545.0, 913.0, 927.0, 1286.0, - 1138.0, 1439.0, 1078.0, 814.0, 500.0, 574.0, 450.0, 850.0, 679.0, - 891.0, 608.0, 1101.0, 891.0, 964.0, 719.0, 610.0, 752.0, 612.0, - 1165.0, 1047.0, 1170.0, 807.0, 973.0, 1047.0, 1311.0, 1059.0, 1115.0, - 705.0, 789.0, 969.0, 1532.0, 1572.0, 1415.0, 839.0, 935.0, 640.0, - 812.0, 736.0, 816.0, 700.0, 790.0, 1077.0, 1097.0, 915.0, 800.0, - 1047.0, 941.0, 973.0, 742.0, 1316.0, 1168.0, 1214.0, 596.0, 578.0, - 726.0, 1212.0, 797.0, 936.0, 593.0, 1337.0, 1461.0, 1319.0, 782.0, - 980.0, 1257.0, 1085.0, 849.0, 669.0, 974.0, 1140.0, 1496.0, 1690.0, - 1314.0, 928.0, 378.0, 466.0, 714.0, 1196.0, 1089.0, 923.0, 584.0, - 1031.0, 1069.0, 1514.0, 1255.0, 952.0, 928.0, 910.0, 1385.0, 1071.0, - 1084.0, 851.0, 1057.0, 933.0, 1261.0, 859.0, 950.0, 580.0, 802.0, - 1359.0, 1688.0, 1968.0, 1241.0, 945.0, 505.0, 555.0, 495.0, 641.0, - 704.0, 664.0, 824.0, 1093.0, 1085.0, 973.0, 1074.0, 1313.0, 1521.0, - 1146.0, 851.0, 1277.0, 1408.0, 1499.0, 643.0, 569.0, 770.0, 1238.0, - 713.0, 886.0, 867.0, 1527.0, 1577.0, 1183.0, 1086.0, 896.0, 1074.0, - 564.0, 732.0, 619.0, 1062.0, 829.0, 1305.0, 1357.0, 1430.0, 946.0, - 420.0, 498.0, 650.0, 880.0, 754.0, 568.0, 480.0, 650.0, 705.0, - 1125.0, 1100.0, 897.0, 949.0, 1008.0, 1618.0, 1316.0, 1342.0, 1034.0, - 1048.0, 810.0, 711.0, 377.0, 482.0, 463.0, 759.0, 1536.0, 1457.0, - 1659.0, 710.0, 815.0, 483.0, 579.0, 451.0, 515.0, 698.0, 910.0, - 1048.0, 1233.0, 1085.0, 1075.0, 1133.0, 1484.0, 1558.0, 1108.0, 655.0, - 840.0, 
1161.0, 1262.0, 706.0, 550.0, 975.0, 1528.0, 676.0, 754.0, - 826.0, 1166.0, 1188.0, 1227.0, 1362.0, 1408.0, 1166.0, 745.0, 748.0, - 507.0, 977.0, 773.0, 981.0, 641.0, 708.0, 620.0, 500.0, 768.0, - 978.0, 934.0, 632.0, 388.0, 410.0, 632.0, 673.0, 1643.0, 1578.0, - 1469.0, 959.0, 960.0, 1282.0, 905.0, 705.0, 525.0, 448.0, 550.0, - 602.0, 570.0, 675.0, 587.0, 877.0, 1130.0, 993.0, 1144.0, 638.0, - 851.0, 490.0, 517.0, 277.0, 199.0, 576.0, 1024.0, 1032.0, 813.0, - 484.0, 576.0, 682.0, 706.0, 1068.0, 820.0, 837.0, 714.0, 907.0, - 887.0, 621.0, 566.0, 966.0, 1383.0, 793.0, 800.0, 908.0, 928.0, - 958.0, 982.0, 1242.0, 1364.0, 1081.0, 625.0, 534.0, 367.0, 583.0, - 517.0, 635.0, 463.0, 616.0, 615.0, 715.0, 907.0, 754.0, 684.0, - 348.0, 272.0, 180.0, 422.0, 489.0, 1343.0, 1261.0, 1460.0, 1142.0, - 1194.0, 1370.0, 935.0, 684.0, 405.0, 434.0, 529.0, 528.0, 400.0, - 960.0, 896.0, 1242.0, 930.0, 1001.0, 633.0, 293.0, 370.0, 449.0, - 461.0, 329.0, 372.0, 864.0, 1352.0, 1267.0, 928.0, 431.0, 451.0, - 399.0, 486.0, 545.0, 572.0, 918.0, 829.0, 791.0, 512.0, 550.0, - 469.0, 748.0, 841.0, 858.0, 862.0, 882.0, 688.0, 792.0, 944.0, - 1006.0, 1096.0, 711.0, 625.0, 472.0, 411.0, 505.0, 458.0, 460.0, - 410.0, 602.0, 625.0, 883.0, 935.0, 788.0, 696.0, 474.0, 506.0, - 252.0, 416.0, 390.0, 1164.0, 1312.0, 1530.0, 1170.0, 1114.0, 1146.0, - 786.0, 537.0, 338.0, 470.0, 501.0, 440.0, 398.0, 954.0, 1027.0, - 1269.0, 749.0, 992.0, 612.0, 513.0, 409.0, 438.0, 489.0, 509.0, - 1086.0, 1434.0, 1624.0, 1097.0, 1085.0, 834.0, 866.0, 626.0, 501.0, - 520.0, 566.0, 1028.0, 948.0, 832.0, 345.0, 443.0, 686.0, 743.0, - 879.0, 804.0, 990.0, 1094.0, 1092.0, 978.0, 677.0, 539.0, 504.0, - 470.0, 300.0, 467.0, 554.0, 577.0, 374.0, 316.0, 473.0, 750.0, - 623.0, 697.0, 473.0, 468.0, 450.0, 389.0, 428.0, 326.0, 409.0, - 416.0, 768.0, 832.0, 948.0, 864.0, 1032.0, 978.0, 625.0, 728.0, - 673.0, 755.0, 337.0, 218.0, 204.0, 685.0, 925.0, 1410.0, 1020.0, - 1442.0, 861.0, 970.0, 550.0, 676.0, 579.0, 715.0, 1186.0, 1488.0, - 
1398.0, 793.0, 1117.0, 1271.0, 1371.0, 1019.0, 1005.0, 1108.0, 1008.0, - 978.0, 809.0, 634.0, 343.0, 386.0, 707.0, 716.0, 872.0, 830.0, - 1031.0, 1068.0, 1081.0, 873.0, 644.0, 461.0, 506.0, 368.0, 229.0, - 596.0, 777.0, 1235.0, 770.0, 730.0, 471.0, 734.0, 584.0, 554.0, - 758.0, 766.0, 720.0, 297.0, 406.0, 460.0, 557.0, 486.0, 618.0, - 828.0, 791.0, 567.0, 323.0, 430.0, 665.0, 1009.0, 987.0, 701.0, - 267.0, 137.0, 239.0, 429.0, 809.0, 1063.0, 907.0, 1021.0, 802.0, - 955.0, 545.0, 563.0, 594.0, 798.0, 1078.0, 1422.0, 1298.0, 850.0, - 1306.0, 1521.0, 1687.0, 1199.0, 1442.0, 1472.0, 1096.0, 548.0, 497.0, - 474.0, 551.0, 467.0, 746.0, 723.0, 849.0, 506.0, 674.0, 843.0, - 853.0, 621.0, 438.0, 179.0, 258.0, 281.0, 246.0, 643.0, 693.0, - 1181.0, 703.0, 701.0, 462.0, 672.0, 694.0, 446.0, 894.0, 1008.0, - 1040.0, 425.0, 678.0, 936.0, 1013.0, 994.0, 1038.0, 1148.0, 855.0, - 1041.0, 1245.0, 1137.0, 906.0, 982.0, 991.0, 713.0, 249.0, 179.0, - 205.0, 381.0, 755.0, 1089.0, 909.0, 789.0, 688.0, 834.0, 628.0, - 576.0, 574.0, 599.0, 523.0, 831.0, 875.0, 755.0, 1247.0, 1529.0, - 1751.0, 1261.0, 1416.0, 1370.0, 982.0, 442.0, 388.0, 535.0, 642.0, - 713.0, 708.0, 725.0, 807.0, 675.0, 705.0, 774.0, 601.0, 701.0, - 443.0, 377.0, 221.0, 193.0, 259.0, 623.0, 621.0, 1035.0, 622.0, - 696.0, 563.0, 776.0, 832.0, 728.0, 1472.0, 1668.0, 1514.0, 594.0, - 912.0, 978.0, 994.0, 810.0, 834.0, 993.0, 692.0, 1052.0, 1271.0, - 1151.0, 912.0, 682.0, 680.0, 377.0, 230.0, 201.0, 335.0, 518.0, - 838.0, 809.0, 643.0, 511.0, 648.0, 744.0, 574.0, 475.0, 503.0, - 476.0, 537.0, 983.0, 1035.0, 881.0, 793.0, 928.0, 1298.0, 929.0, - 1039.0, 703.0, 519.0, 311.0, 732.0, 1337.0, 1411.0, 1351.0, 1259.0, - 1167.0, 1165.0, 634.0, 816.0, 726.0, 578.0, 564.0, 542.0, 501.0, - 679.0, 563.0, 630.0, 438.0, 472.0, 437.0, 256.0, 226.0, 419.0, - 468.0, 611.0, 683.0, 1187.0, 1390.0, 1204.0, 600.0, 892.0, 1082.0, - 1152.0, 1176.0, 1012.0, 997.0, 485.0, 838.0, 1237.0, 1238.0, 842.0, - 590.0, 480.0, 376.0, 245.0, 454.0, 895.0, 
1000.0, 904.0, 568.0, - 672.0, 716.0, 834.0, 934.0, 978.0, 887.0, 585.0, 384.0, 495.0, - 821.0, 803.0, 815.0, 767.0, 975.0, 1081.0, 768.0, 727.0, 415.0, - 380.0, 508.0, 937.0, 1625.0, 1353.0, 1500.0, 1772.0, 2210.0, 2022.0, - 789.0, 907.0, 721.0, 592.0, 564.0, 832.0, 800.0, 888.0, 532.0, - 630.0, 426.0, 492.0, 471.0, 467.0, 421.0, 560.0, 448.0, 473.0, - 537.0, 1021.0, 1128.0, 902.0, 378.0, 433.0, 685.0, 731.0, 1008.0, - 692.0, 639.0, 153.0, 226.0, 335.0, 494.0, 569.0, 509.0, 571.0, - 636.0, 614.0, 1107.0, 1308.0, 1458.0, 860.0, 624.0, 878.0, 1024.0, - 978.0, 1090.0, 1070.0, 999.0, 473.0, 295.0, 342.0, 696.0, 1040.0, - 1176.0, 984.0, 735.0, 727.0, 604.0, 628.0, 450.0, 399.0, 510.0, - 933.0, 1402.0, 1188.0, 1117.0, 1514.0, 2302.0, 2018.0, 966.0, 806.0, - 1160.0, 988.0, 720.0, 591.0, 591.0, 931.0, 668.0, 782.0, 422.0, - 436.0, 580.0, 620.0, 638.0, 644.0, 470.0, 411.0, 243.0, 493.0, - 672.0, 632.0, 422.0, 311.0, 1027.0, 1061.0, 1428.0, 720.0, 716.0, - 368.0, 377.0, 487.0, 466.0, 921.0, 769.0, 920.0, 767.0, 981.0, - 1463.0, 1400.0, 1394.0, 932.0, 748.0, 1062.0, 868.0, 1212.0, 1284.0, - 1448.0, 1484.0, 908.0, 694.0, 244.0, 300.0, 780.0, 958.0, 1016.0, - 700.0, 542.0, 464.0, 379.0, 453.0, 406.0, 1017.0, 984.0, 1350.0, - 876.0, 925.0, 1134.0, 1792.0, 1612.0, 1056.0, 518.0, 814.0, 791.0, - 737.0, 544.0, 771.0, 708.0, 491.0, 439.0, 392.0, 366.0, 570.0, - 680.0, 752.0, 688.0, 898.0, 1198.0, 1114.0, 1020.0, 836.0, 640.0, - 427.0, 240.0, 892.0, 885.0, 1288.0, 670.0, 878.0, 578.0, 628.0, - 570.0, 449.0, 789.0, 611.0, 816.0, 900.0, 1113.0, 1457.0, 903.0, - 1236.0, 1162.0, 1496.0, 1426.0, 1004.0, 1029.0, 934.0, 884.0, 1259.0, - 942.0, 1012.0, 308.0, 336.0, 1044.0, 1035.0, 1005.0, 427.0, 467.0, - 433.0, 368.0, 921.0, 1049.0, 1364.0, 1219.0, 1526.0, 1278.0, 829.0, - 610.0, 882.0, 884.0, 1124.0, 702.0, 1322.0, 1221.0, 1031.0, 248.0, - 492.0, 597.0, 625.0, 504.0, 457.0, 398.0, 600.0, 670.0, 884.0, - 812.0, 976.0, 1146.0, 1318.0, 1294.0, 1114.0, 726.0, 390.0, 298.0, - 824.0, 992.0, 
1326.0, 1066.0, 1119.0, 833.0, 635.0, 502.0, 400.0, - 728.0, 608.0, 708.0, 591.0, 836.0, 756.0, 449.0, 1120.0, 1518.0, - 1854.0, 1382.0, 1036.0, 1003.0, 812.0, 874.0, 1264.0, 1131.0, 1089.0, - 392.0, 402.0, 766.0, 789.0, 801.0, 379.0, 441.0, 321.0, 395.0, - 1154.0, 1275.0, 1631.0, 1279.0, 1780.0, 1452.0, 946.0, 508.0, 392.0, - 392.0, 726.0, 606.0, 772.0, 741.0, 619.0, 409.0, 682.0, 695.0, - 662.0, 736.0, 761.0, 664.0, 437.0, 575.0, 887.0, 738.0, 1160.0, - 1246.0, 1636.0, 1622.0, 1260.0, 858.0, 226.0, 226.0, 438.0, 620.0, - 1015.0, 1049.0, 1106.0, 719.0, 575.0, 496.0, 534.0, 486.0, 322.0, - 394.0, 481.0, 626.0, 708.0, 601.0, 1202.0, 1224.0, 1520.0, 1422.0, - 1264.0, 899.0, 230.0, 320.0, 1026.0, 1291.0, 1473.0, 948.0, 1122.0, - 1157.0, 1047.0, 699.0, 348.0, 535.0, 493.0, 643.0, 1166.0, 1253.0, - 1263.0, 1010.0, 1279.0, 1463.0, 969.0, 767.0, 427.0, 488.0, 550.0, - 610.0, 764.0, 764.0, 595.0, 629.0, 683.0, 866.0, 843.0, 1269.0, - 1161.0, 988.0, 721.0, 841.0, 940.0, 679.0, 779.0, 540.0, 1170.0, - 1594.0, 1556.0, 924.0, 411.0, 679.0, 833.0, 1032.0, 998.0, 1090.0, - 757.0, 557.0, 443.0, 664.0, 676.0, 576.0, 294.0, 386.0, 309.0, - 558.0, 622.0, 701.0, 906.0, 754.0, 916.0, 890.0, 1124.0, 914.0, - 588.0, 468.0, 1287.0, 1469.0, 1677.0, 1076.0, 1278.0, 1217.0, 1148.0, - 728.0, 477.0, 634.0, 1004.0, 1074.0, 1090.0, 613.0, 891.0, 624.0, - 1029.0, 1105.0, 1059.0, 813.0, 369.0, 568.0, 338.0, 376.0, 324.0, - 332.0, 296.0, 694.0, 720.0, 780.0, 698.0, 1158.0, 1118.0, 967.0, - 661.0, 661.0, 671.0, 483.0, 845.0, 691.0, 1101.0, 1213.0, 1260.0, - 974.0, 922.0, 1002.0, 976.0, 844.0, 692.0, 542.0, 390.0, 362.0, - 508.0, 648.0, 648.0, 554.0, 332.0, 377.0, 291.0, 446.0, 821.0, - 870.0, 849.0, 603.0, 742.0, 948.0, 990.0, 1158.0, 850.0, 607.0, - 1119.0, 1443.0, 1802.0, 1334.0, 1368.0, 1249.0, 1052.0, 584.0, 445.0, - 520.0, 958.0, 992.0, 810.0, 386.0, 654.0, 631.0, 829.0, 951.0, - 943.0, 855.0, 423.0, 610.0, 500.0, 588.0, 694.0, 634.0, 558.0, - 684.0, 616.0, 664.0, 588.0, 760.0, 764.0, 697.0, 
730.0, 678.0, - 614.0, 785.0, 826.0, 776.0, 798.0, 863.0, 872.0, 630.0, 1036.0, - 1140.0, 972.0, 1204.0, 1067.0, 1031.0, 533.0, 474.0, 606.0, 510.0, - 484.0, 410.0, 304.0, 387.0, 291.0, 342.0, 575.0, 654.0, 867.0, - 1069.0, 1244.0, 1384.0, 1076.0, 1604.0, 1250.0, 1085.0, 722.0, 1116.0, - 1287.0, 1156.0, 846.0, 952.0, 950.0, 768.0, 696.0, 540.0, 946.0, - 782.0, 713.0, 297.0, 471.0, 518.0, 854.0, 924.0, 1176.0, 872.0, - 608.0, 630.0, 436.0, 658.0, 922.0, 860.0, 682.0, 447.0, 323.0, - 419.0, 380.0, 548.0, 512.0, 993.0, 810.0, 1006.0, 717.0, 1094.0, - 901.0, 758.0, 404.0, 337.0, 380.0, 1002.0, 1256.0, 1160.0, 486.0, - 723.0, 875.0, 919.0, 660.0, 499.0, 739.0, 463.0, 486.0, 278.0, - 340.0, 475.0, 673.0, 636.0, 759.0, 570.0, 776.0, 913.0, 976.0, - 1233.0, 840.0, 1520.0, 1036.0, 1027.0, 222.0, 638.0, 716.0, 887.0, - 557.0, 572.0, 730.0, 692.0, 752.0, 484.0, 431.0, 323.0, 261.0, - 292.0, 362.0, 446.0, 746.0, 820.0, 1448.0, 1432.0, 1241.0, 767.0, - 513.0, 694.0, 816.0, 816.0, 849.0, 748.0, 622.0, 593.0, 544.0, - 698.0, 554.0, 940.0, 802.0, 1086.0, 764.0, 1460.0, 1347.0, 1367.0, - 628.0, 465.0, 587.0, 1090.0, 1128.0, 854.0, 412.0, 837.0, 924.0, - 889.0, 555.0, 521.0, 1032.0, 849.0, 786.0, 278.0, 284.0, 516.0, - 834.0, 1288.0, 1348.0, 1106.0, 757.0, 758.0, 846.0, 1135.0, 980.0, - 1296.0, 866.0, 876.0, 415.0, 907.0, 802.0, 731.0, 815.0, 849.0, - 1141.0, 689.0, 756.0, 490.0, 595.0, 478.0, 462.0, 227.0, 312.0, - 364.0, 842.0, 1074.0, 1812.0, 2044.0, 1677.0, 1221.0, 1157.0, 1570.0, - 1320.0, 1224.0, 989.0, 912.0, 586.0, 510.0, 467.0, 797.0, 638.0, - 986.0, 826.0, 1224.0, 850.0, 1116.0, 1272.0, 1528.0, 1331.0, 975.0, - 989.0, 1240.0, 1022.0, 738.0, 316.0, 753.0, 810.0, 869.0, 743.0, - 809.0, 1300.0, 1011.0, 964.0, 412.0, 420.0, 916.0, 1240.0, 2094.0, - 1866.0, 1544.0, 743.0, 390.0, 486.0, 447.0, 596.0, 770.0, 716.0, - 658.0, 408.0, 640.0, 499.0, 377.0, 685.0, 685.0, 1249.0, 829.0, - 1148.0, 900.0, 1149.0, 901.0, 696.0, 439.0, 455.0, 502.0, 682.0, - 1042.0, 1716.0, 2138.0, 1723.0, 
1175.0, 1581.0, 1798.0, 1206.0, 1048.0, - 976.0, 982.0, 670.0, 379.0, 286.0, 422.0, 439.0, 348.0, 684.0, - 822.0, 930.0, 942.0, 1350.0, 1568.0, 1299.0, 901.0, 912.0, 737.0, - 582.0, 567.0, 597.0, 1029.0, 762.0, 761.0, 599.0, 1054.0, 1393.0, - 1218.0, 902.0, 740.0, 652.0, 1420.0, 1276.0, 2144.0, 1576.0, 1476.0, - 684.0, 706.0, 894.0, 908.0, 902.0, 1048.0, 1094.0, 826.0, 550.0, - 688.0, 539.0, 643.0, 955.0, 957.0, 1137.0, 607.0, 1046.0, 1390.0, - 1780.0, 1333.0, 885.0, 507.0, 536.0, 402.0, 492.0, 942.0, 1416.0, - 1770.0, 1344.0, 1008.0, 1880.0, 2048.0, 1562.0, 1070.0, 1002.0, 950.0, - 572.0, 393.0, 340.0, 494.0, 703.0, 636.0, 1032.0, 810.0, 1016.0, - 564.0, 809.0, 829.0, 1053.0, 842.0, 873.0, 597.0, 474.0, 783.0, - 728.0, 1300.0, 840.0, 1115.0, 809.0, 1149.0, 1017.0, 828.0, 596.0, - 713.0, 781.0, 1393.0, 1314.0, 1868.0, 1364.0, 1168.0, 568.0, 678.0, - 745.0, 993.0, 1059.0, 1327.0, 1253.0, 961.0, 736.0, 620.0, 511.0, - 629.0, 825.0, 762.0, 926.0, 567.0, 1055.0, 1439.0, 1750.0, 1284.0, - 874.0, 876.0, 992.0, 756.0, 622.0, 748.0, 1204.0, 1278.0, 1028.0, - 628.0, 1315.0, 939.0, 850.0, 465.0, 797.0, 1169.0, 1124.0, 1304.0, - 1131.0, 1109.0, 957.0, 588.0, 1048.0, 689.0, 879.0, 389.0, 709.0, - 514.0, 572.0, 433.0, 837.0, 877.0, 818.0, 932.0, 935.0, 1147.0, - 696.0, 729.0, 615.0, 961.0, 834.0, 718.0, 450.0, 638.0, 1097.0, - 1517.0, 1436.0, 1494.0, 1198.0, 1218.0, 826.0, 921.0, 776.0, 860.0, - 1053.0, 1094.0, 1088.0, 1138.0, 1434.0, 1638.0, 1137.0, 964.0, 904.0, - 817.0, 668.0, 401.0, 891.0, 1309.0, 1404.0, 1385.0, 1229.0, 1163.0, - 816.0, 556.0, 838.0, 948.0, 1540.0, 1376.0, 1118.0, 834.0, 1268.0, - 1000.0, 757.0, 435.0, 779.0, 1153.0, 1170.0, 1306.0, 1252.0, 1292.0, - 1124.0, 644.0, 984.0, 651.0, 762.0, 232.0, 509.0, 597.0, 911.0, - 917.0, 1142.0, 992.0, 818.0, 1003.0, 1030.0, 1698.0, 1141.0, 1256.0, - 1053.0, 1377.0, 1131.0, 667.0, 479.0, 478.0, 1013.0, 957.0, 1158.0, - 1102.0, 1546.0, 1604.0, 1219.0, 1098.0, 861.0, 1130.0, 1024.0, 967.0, - 602.0, 881.0, 1276.0, 1685.0, 
1167.0, 774.0, 590.0, 818.0, 1220.0, - 1063.0, 1118.0, 768.0, 969.0, 1076.0, 1176.0, 1060.0, 698.0, 616.0, - 797.0, 957.0, 1343.0, 1074.0, 1184.0, 1096.0, 1324.0, 1100.0, 795.0, - 965.0, 1130.0, 1020.0, 989.0, 1252.0, 1193.0, 1122.0, 774.0, 387.0, - 661.0, 572.0, 941.0, 508.0, 912.0, 812.0, 1042.0, 949.0, 986.0, - 940.0, 810.0, 1099.0, 1123.0, 1507.0, 893.0, 867.0, 867.0, 1320.0, - 1162.0, 765.0, 565.0, 617.0, 1114.0, 1126.0, 1194.0, 862.0, 1010.0, - 1256.0, 1125.0, 1144.0, 846.0, 857.0, 699.0, 758.0, 517.0, 988.0, - 1288.0, 1751.0, 1089.0, 678.0, 372.0, 608.0, 1142.0, 1168.0, 1183.0, - 669.0, 875.0, 1190.0, 1248.0, 947.0, 265.0, 252.0, 492.0, 912.0, - 1209.0, 1134.0, 1176.0, 1173.0, 1241.0, 1128.0, 654.0, 1269.0, 1383.0, - 1125.0, 599.0, 566.0, 479.0, 570.0, 534.0, 513.0, 707.0, 1145.0, - 1454.0, 1011.0, 876.0, 775.0, 1005.0, 792.0, 583.0, 579.0, 589.0, - 1124.0, 1044.0, 1348.0, 707.0, 738.0, 634.0, 1397.0, 1273.0, 1111.0, - 783.0, 828.0, 1222.0, 883.0, 913.0, 741.0, 1064.0, 1160.0, 785.0, - 933.0, 829.0, 1072.0, 873.0, 1215.0, 922.0, 1193.0, 1032.0, 1463.0, - 1001.0, 1203.0, 937.0, 1087.0, 1130.0, 1196.0, 985.0, 571.0, 693.0, - 820.0, 796.0, 579.0, 341.0, 564.0, 652.0, 802.0, 493.0, 550.0, - 958.0, 1057.0, 1164.0, 1051.0, 1003.0, 1887.0, 1809.0, 1619.0, 562.0, - 560.0, 373.0, 436.0, 421.0, 466.0, 467.0, 899.0, 1407.0, 1280.0, - 1162.0, 686.0, 716.0, 350.0, 329.0, 461.0, 596.0, 783.0, 696.0, - 557.0, 262.0, 248.0, 264.0, 833.0, 779.0, 869.0, 733.0, 952.0, - 1578.0, 1239.0, 1018.0, 774.0, 821.0, 1080.0, 872.0, 916.0, 1108.0, - 1188.0, 1492.0, 1464.0, 1083.0, 1531.0, 1463.0, 1818.0, 1080.0, 1186.0, - 1096.0, 930.0, 690.0, 664.0, 780.0, 592.0, 458.0, 706.0, 872.0, - 835.0, 475.0, 718.0, 853.0, 1151.0, 802.0, 896.0, 758.0, 773.0, - 872.0, 821.0, 651.0, 1343.0, 1296.0, 1902.0, 1039.0, 840.0, 266.0, - 604.0, 612.0, 646.0, 417.0, 926.0, 1114.0, 1028.0, 904.0, 618.0, - 722.0, 337.0, 322.0, 714.0, 984.0, 1171.0, 710.0, 499.0, 353.0, - 297.0, 709.0, 1193.0, 1331.0, 1025.0, 
1145.0, 1642.0, 2120.0, 1629.0, - 956.0, 836.0, 831.0, 1047.0, 789.0, 1081.0, 1676.0, 1802.0, 1894.0, - 1404.0, 1007.0, 1069.0, 1197.0, 1530.0, 1154.0, 1170.0, 1550.0, 1400.0, - 1178.0, 1102.0, 1258.0, 1058.0, 480.0, 516.0, 754.0, 944.0, 713.0, - 1377.0, 1461.0, 1684.0, 954.0, 894.0, 922.0, 908.0, 748.0, 1146.0, - 1060.0, 1288.0, 1040.0, 1754.0, 1361.0, 1012.0, 280.0, 444.0, 618.0, - 634.0, 410.0, 487.0, 699.0, 816.0, 940.0, 662.0, 976.0, 879.0, - 921.0, 971.0, 981.0, 1127.0, 645.0, 610.0, 656.0, 1164.0, 1383.0, - 1723.0, 1334.0, 1071.0, 910.0, 1439.0, 1587.0, 1316.0, 755.0, 735.0, - 625.0, 734.0, 1136.0, 1422.0, 2266.0, 1858.0, 1990.0, 980.0, 624.0, - 530.0, 791.0, 844.0, 732.0, 649.0, 1225.0, 1035.0, 1539.0, 1335.0, - 1399.0, 665.0, 652.0, 880.0, 1134.0, 926.0, 943.0, 1387.0, 1413.0, - 1326.0, 870.0, 870.0, 996.0, 945.0, 416.0, 838.0, 890.0, 824.0, - 684.0, 1224.0, 1370.0, 1031.0, 313.0, 443.0, 1047.0, 1105.0, 844.0, - 573.0, 627.0, 721.0, 791.0, 637.0, 930.0, 825.0, 1151.0, 1289.0, - 1550.0, 1356.0, 959.0, 839.0, 997.0, 1636.0, 1767.0, 2107.0, 1647.0, - 1368.0, 1015.0, 1247.0, 1147.0, 1152.0, 742.0, 809.0, 519.0, 437.0, - 888.0, 1226.0, 1802.0, 1108.0, 1081.0, 495.0, 562.0, 503.0, 670.0, - 662.0, 674.0, 1045.0, 1541.0, 1487.0, 1675.0, 1487.0, 1345.0, 929.0, - 984.0, 1109.0, 933.0, 729.0, 1115.0, 1435.0, 1411.0, 884.0, 760.0, - 970.0, 1412.0, 1169.0, 1012.0, 936.0, 1036.0, 822.0, 918.0, 1450.0, - 1730.0, 1237.0, 494.0, 248.0, 808.0, 885.0, 824.0, 545.0, 721.0, - 829.0, 1141.0, 895.0, 1258.0, 926.0, 1232.0, 904.0, 1152.0, 1018.0, - 997.0, 847.0, 921.0, 1622.0, 1423.0, 1831.0, 1203.0, 1460.0, 847.0, - 749.0, 395.0, 453.0, 755.0, 1332.0, 1194.0, 867.0, 952.0, 1027.0, - 1262.0, 622.0, 785.0, 623.0, 886.0, 913.0, 1200.0, 912.0, 1118.0, - 1297.0, 1581.0, 1267.0, 1273.0, 909.0, 723.0, 476.0, 1113.0, 1278.0, - 1171.0, 657.0, 1214.0, 1206.0, 1132.0, 964.0, 1048.0, 1620.0, 1580.0, - 1409.0, 1184.0, 832.0, 839.0, 497.0, 693.0, 976.0, 1380.0, 1054.0, - 671.0, 248.0, 681.0, 
756.0, 733.0, 398.0, 516.0, 731.0, 1123.0, - 923.0, 1126.0, 714.0, 1080.0, 704.0, 1140.0, 976.0, 1062.0, 784.0, - 638.0, 825.0, 895.0, 1041.0, 906.0, 962.0, 759.0, 615.0, 290.0, - 558.0, 852.0, 1256.0, 1416.0, 1121.0, 1086.0, 731.0, 708.0, 388.0, - 384.0, 906.0, 1458.0, 1884.0, 1830.0, 1518.0, 1272.0, 1350.0, 1414.0, - 1304.0, 902.0, 570.0, 371.0, 616.0, 752.0, 882.0, 760.0, 716.0, - 1089.0, 1044.0, 862.0, 818.0, 1070.0, 1688.0, 1564.0, 1336.0, 1956.0, - 1628.0, 1273.0, 841.0, 989.0, 1366.0, 1646.0, 1323.0, 888.0, 190.0, - 264.0, 389.0, 389.0, 325.0, 375.0, 801.0, 1255.0, 1187.0, 1186.0, - 805.0, 963.0, 970.0, 1138.0, 1132.0, 932.0, 746.0, 473.0, 508.0, - 726.0, 889.0, 658.0, 670.0, 643.0, 795.0, 542.0, 561.0, 879.0, - 1288.0, 1615.0, 1172.0, 976.0, 494.0, 441.0, 311.0, 407.0, 920.0, - 1442.0, 1671.0, 1806.0, 1441.0, 1383.0, 919.0, 1056.0, 756.0, 610.0, - 490.0, 505.0, 626.0, 606.0, 991.0, 1381.0, 1281.0, 1185.0, 916.0, - 788.0, 948.0, 886.0, 1203.0, 947.0, 969.0, 1368.0, 1752.0, 1623.0, - 1223.0, 767.0, 774.0, 878.0, 899.0, 913.0, 390.0, 549.0, 681.0, - 720.0, 539.0, 501.0, 1113.0, 1279.0, 1097.0, 692.0, 572.0, 952.0, - 1147.0, 1107.0, 693.0, 701.0, 840.0, 778.0, 717.0, 767.0, 1043.0, - 692.0, 500.0, 437.0, 691.0, 798.0, 786.0, 734.0, 681.0, 980.0, - 791.0, 1383.0, 938.0, 935.0, 317.0, 231.0, 618.0, 922.0, 1347.0, - 1171.0, 1544.0, 1362.0, 1359.0, 980.0, 848.0, 808.0, 920.0, 943.0, - 991.0, 731.0, 1011.0, 1324.0, 1632.0, 1417.0, 885.0, 479.0, 323.0, - 640.0, 539.0, 951.0, 747.0, 1144.0, 1508.0, 1484.0, 1260.0, 928.0, - 1032.0, 946.0, 877.0, 773.0, 457.0, 682.0, 652.0, 744.0, 479.0, - 577.0, 1107.0, 1509.0, 1341.0, 840.0, 448.0, 686.0, 1021.0, 1021.0, - 849.0, 835.0, 1338.0, 1166.0, 1120.0, 790.0, 1224.0, 881.0, 817.0, - 993.0, 1222.0, 1352.0, 789.0, 769.0, 619.0, 610.0, 406.0, 836.0, - 779.0, 769.0, 314.0, 259.0, 528.0, 557.0, 599.0, 482.0, 855.0, - 1054.0, 1352.0, 1441.0, 1408.0, 1184.0, 1054.0, 1026.0, 990.0, 980.0, - 1068.0, 1604.0, 1651.0, 1831.0, 1212.0, 
1001.0, 785.0, 858.0, 615.0, - 737.0, 765.0, 606.0, 904.0, 1114.0, 769.0, 517.0, 703.0, 677.0, - 614.0, 482.0, 475.0, 1078.0, 967.0, 1159.0, 513.0, 656.0, 900.0, - 1430.0, 1195.0, 897.0, 386.0, 715.0, 625.0, 693.0, 500.0, 806.0, - 1203.0, 1067.0, 1001.0, 673.0, 1011.0, 1136.0, 1228.0, 1565.0, 1282.0, - 1226.0, 583.0, 583.0, 465.0, 438.0, 328.0, 717.0, 697.0, 1107.0, - 685.0, 811.0, 973.0, 940.0, 808.0, 217.0, 687.0, 1176.0, 1997.0, - 2129.0, 1912.0, 1576.0, 1204.0, 1120.0, 832.0, 936.0, 692.0, 1028.0, - 1195.0, 1582.0, 1359.0, 1054.0, 897.0, 936.0, 744.0, 852.0, 722.0, - 494.0, 466.0, 594.0, 332.0, 566.0, 1042.0, 1173.0, 886.0, 408.0, - 430.0, 1017.0, 916.0, 1081.0, 492.0, 520.0, 410.0, 880.0, 996.0, - 1062.0, 641.0, 712.0, 416.0, 518.0, 695.0, 843.0, 1109.0, 842.0, - 968.0, 764.0, 989.0, 1128.0, 1248.0, 1549.0, 1628.0, 1431.0, 818.0, - 474.0, 355.0, 420.0, 288.0, 325.0, 427.0, 805.0, 739.0, 1097.0, - 1391.0, 1464.0, 930.0, 390.0, 694.0, 971.0, 1575.0, 1707.0, 1616.0, - 1320.0, 940.0, 984.0, 632.0, 731.0, 516.0, 746.0, 644.0, 1178.0, - 1172.0, 1381.0, 1068.0, 1220.0, 922.0, 634.0, 374.0, 650.0, 610.0, - 614.0, 504.0, 734.0, 1118.0, 1039.0, 741.0, 503.0, 571.0, 1051.0, - 908.0, 963.0, 430.0, 644.0, 788.0, 974.0, 808.0, 818.0, 671.0, - 878.0, 674.0, 920.0, 1059.0, 967.0, 706.0, 411.0, 599.0, 706.0, - 911.0, 1065.0, 1197.0, 1142.0, 1184.0, 1159.0, 900.0, 532.0, 767.0, - 785.0, 1119.0, 738.0, 1205.0, 1541.0, 1430.0, 1588.0, 1542.0, 1605.0, - 1187.0, 652.0, 979.0, 1095.0, 1210.0, 1134.0, 896.0, 1012.0, 816.0, - 910.0, 523.0, 846.0, 716.0, 768.0, 943.0, 1242.0, 1318.0, 938.0, - 1065.0, 1326.0, 1136.0, 652.0, 206.0, 859.0, 869.0, 936.0, 731.0, - 1029.0, 1375.0, 1210.0, 930.0, 682.0, 828.0, 1106.0, 972.0, 835.0, - 724.0, 955.0, 1256.0, 1220.0, 1049.0, 977.0, 986.0, 1030.0, 888.0, - 850.0, 1149.0, 1047.0, 744.0, 438.0, 622.0, 683.0, 940.0, 770.0, - 816.0, 548.0, 846.0, 1165.0, 1070.0, 692.0, 795.0, 804.0, 1178.0, - 732.0, 1193.0, 1149.0, 1050.0, 1095.0, 1207.0, 1235.0, 
932.0, 736.0, - 1109.0, 1022.0, 730.0, 488.0, 341.0, 575.0, 599.0, 730.0, 749.0, - 998.0, 952.0, 658.0, 759.0, 1147.0, 1237.0, 909.0, 1085.0, 1330.0, - 1152.0, 642.0, 345.0, 957.0, 1028.0, 1047.0, 739.0, 1078.0, 1384.0, - 1250.0, 915.0, 757.0, 807.0, 1084.0, 869.0, 612.0, 636.0, 919.0, - 1340.0, 1180.0, 936.0, 806.0, 816.0, 906.0, 882.0, 890.0, 1219.0, - 1209.0, 841.0, 415.0, 358.0, 438.0, 775.0, 764.0, 762.0, 464.0, - 376.0, 774.0, 717.0, 755.0, 975.0, 996.0, 1322.0, 884.0, 1165.0, - 1135.0, 932.0, 1081.0, 1239.0, 1179.0, 1002.0, 628.0, 777.0, 635.0, - 601.0, 535.0, 383.0, 349.0, 429.0, 566.0, 1101.0, 1723.0, 2085.0, - 1689.0, 1445.0, 1409.0, 1233.0, 949.0, 1089.0, 1025.0, 961.0, 895.0, - 875.0, 864.0, 911.0, 992.0, 694.0, 1034.0, 1356.0, 1334.0, 1079.0, - 737.0, 669.0, 942.0, 903.0, 657.0, 727.0, 719.0, 887.0, 775.0, - 804.0, 678.0, 710.0, 766.0, 1074.0, 1274.0, 1407.0, 1445.0, 1196.0, - 1114.0, 1089.0, 930.0, 1015.0, 816.0, 802.0, 672.0, 642.0, 780.0, - 601.0, 657.0, 493.0, 596.0, 387.0, 400.0, 351.0, 368.0, 210.0, - 737.0, 943.0, 1086.0, 669.0, 1031.0, 957.0, 964.0, 618.0, 612.0, - 555.0, 395.0, 429.0, 544.0, 1392.0, 1608.0, 1912.0, 1288.0, 1392.0, - 1144.0, 1164.0, 878.0, 866.0, 573.0, 579.0, 1147.0, 1199.0, 722.0, - 542.0, 362.0, 723.0, 1066.0, 1408.0, 1220.0, 911.0, 644.0, 426.0, - 641.0, 609.0, 666.0, 560.0, 570.0, 683.0, 691.0, 952.0, 674.0, - 738.0, 516.0, 790.0, 1131.0, 1373.0, 1409.0, 1371.0, 1303.0, 1302.0, - 987.0, 1019.0, 1108.0, 1152.0, 963.0, 717.0, 623.0, 460.0, 411.0, - 545.0, 668.0, 486.0, 397.0, 496.0, 581.0, 470.0, 782.0, 1102.0, - 1254.0, 910.0, 1004.0, 882.0, 850.0, 598.0, 618.0, 592.0, 398.0, - 295.0, 423.0, 1013.0, 1313.0, 1601.0, 1333.0, 1480.0, 1124.0, 1146.0, - 906.0, 842.0, 533.0, 467.0, 1088.0, 1107.0, 714.0, 415.0, 303.0, - 774.0, 914.0, 918.0, 564.0, 418.0, 413.0, 429.0, 673.0, 657.0, - 702.0, 456.0, 464.0, 597.0, 709.0, 1196.0, 826.0, 1022.0, 604.0, - 972.0, 1124.0, 1170.0, 1059.0, 1467.0, 1707.0, 1988.0, 1456.0, 1232.0, - 1208.0, 
1168.0, 1449.0, 1061.0, 883.0, 414.0, 363.0, 437.0, 470.0, - 548.0, 439.0, 571.0, 478.0, 433.0, 530.0, 784.0, 1142.0, 1038.0, - 1326.0, 942.0, 1261.0, 757.0, 873.0, 806.0, 728.0, 581.0, 263.0, - 471.0, 493.0, 805.0, 1045.0, 1614.0, 1298.0, 1169.0, 619.0, 578.0, - 289.0, 271.0, 545.0, 663.0, 815.0, 550.0, 295.0, 580.0, 717.0, - 790.0, 668.0, 622.0, 729.0, 795.0, 1009.0, 1213.0, 1209.0, 841.0, - 490.0, 550.0, 584.0, 1126.0, 1232.0, 1402.0, 830.0, 442.0, 442.0, - 572.0, 673.0, 955.0, 1037.0, 1362.0, 1070.0, 952.0, 1136.0, 1083.0, - 1334.0, 788.0, 667.0, 350.0, 401.0, 625.0, 791.0, 785.0, 658.0, - 583.0, 526.0, 511.0, 490.0, 702.0, 788.0, 832.0, 850.0, 778.0, - 1080.0, 798.0, 744.0, 642.0, 603.0, 602.0, 324.0, 351.0, 625.0, - 1173.0, 1963.0, 1792.0, 1376.0, 913.0, 1165.0, 1044.0, 741.0, 307.0, - 245.0, 318.0, 521.0, 573.0, 507.0, 853.0, 859.0, 1062.0, 1196.0, - 1096.0, 1077.0, 1003.0, 1013.0, 1359.0, 1015.0, 847.0, 277.0, 269.0, - 449.0, 808.0, 1142.0, 1092.0, 762.0, 386.0, 441.0, 651.0, 624.0, - 871.0, 849.0, 1250.0, 1218.0, 1076.0, 978.0, 721.0, 1035.0, 819.0, - 756.0, 676.0, 982.0, 1044.0, 936.0, 1109.0, 1087.0, 1106.0, 563.0, - 498.0, 384.0, 414.0, 541.0, 724.0, 956.0, 1190.0, 1268.0, 1240.0, - 922.0, 894.0, 771.0, 949.0, 747.0, 616.0, 882.0, 1370.0, 2104.0, - 1664.0, 1264.0, 686.0, 1030.0, 1113.0, 1013.0, 546.0, 475.0, 724.0, - 1195.0, 1467.0, 1135.0, 1233.0, 819.0, 1130.0, 1304.0, 1250.0, 1205.0, - 865.0, 847.0, 1186.0, 1004.0, 910.0, 475.0, 429.0, 585.0, 604.0, - 979.0, 779.0, 637.0, 438.0, 1004.0, 1398.0, 1202.0, 716.0, 446.0, - 658.0, 788.0, 774.0, 772.0, 613.0, 555.0, 445.0, 526.0, 752.0, - 1120.0, 1094.0, 1000.0, 1069.0, 955.0, 979.0, 426.0, 477.0, 261.0, - 229.0, 182.0, 364.0, 513.0, 1163.0, 983.0, 1104.0, 518.0, 526.0, - 515.0, 747.0, 1029.0, 819.0, 1029.0, 973.0, 1302.0, 982.0, 888.0, - 951.0, 1231.0, 1397.0, 1474.0, 987.0, 979.0, 738.0, 1313.0, 1403.0, - 1137.0, 1199.0, 719.0, 933.0, 953.0, 1055.0, 1047.0, 773.0, 602.0, - 566.0, 456.0, 531.0, 661.0, 
603.0, 707.0, 496.0, 459.0, 234.0, - 278.0, 507.0, 1310.0, 1816.0, 2024.0, 1536.0, 1040.0, 760.0, 696.0, - 698.0, 904.0, 748.0, 894.0, 620.0, 660.0, 734.0, 1062.0, 1146.0, - 816.0, 1100.0, 877.0, 1153.0, 451.0, 503.0, 239.0, 207.0, 224.0, - 375.0, 502.0, 1064.0, 1191.0, 1422.0, 930.0, 590.0, 550.0, 712.0, - 1238.0, 946.0, 868.0, 386.0, 422.0, 412.0, 488.0, 869.0, 743.0, - 973.0, 1120.0, 1127.0, 1231.0, 1101.0, 1618.0, 1238.0, 1003.0, 673.0, - 293.0, 307.0, 491.0, 654.0, 773.0, 995.0, 857.0, 678.0, 328.0, - 407.0, 786.0, 804.0, 694.0, 466.0, 361.0, 352.0, 340.0, 505.0, - 1696.0, 2084.0, 2306.0, 1342.0, 908.0, 614.0, 498.0, 634.0, 862.0, - 765.0, 1007.0, 751.0, 788.0, 442.0, 542.0, 617.0, 559.0, 675.0, - 532.0, 644.0, 349.0, 382.0, 256.0, 209.0, 374.0, 362.0, 519.0, - 593.0, 907.0, 896.0, 665.0, 321.0, 383.0, 504.0, 1008.0, 824.0, - 654.0, 140.0, 238.0, 458.0, 886.0, 1212.0, 971.0, 701.0, 747.0, - 838.0, 912.0, 640.0, 848.0, 368.0, 421.0, 345.0, 765.0, 626.0, - 754.0, 627.0, 765.0, 1091.0, 808.0, 681.0, 317.0, 355.0, 912.0, - 1124.0, 1065.0, 737.0, 605.0, 607.0, 491.0, 339.0, 1126.0, 1300.0, - 1698.0, 1104.0, 964.0, 644.0, 528.0, 528.0, 906.0, 909.0, 1303.0, - 1359.0, 1356.0, 944.0, 700.0, 783.0, 710.0, 556.0, 457.0, 565.0, - 452.0, 459.0, 307.0, 264.0, 381.0, 400.0, 632.0, 482.0, 802.0, - 720.0, 737.0, 411.0, 399.0, 471.0, 764.0, 665.0, 499.0, 224.0, - 341.0, 364.0, 712.0, 960.0, 979.0, 981.0, 627.0, 671.0, 501.0, - 649.0, 892.0, 754.0, 565.0, 825.0, 1089.0, 1079.0, 945.0, 822.0, - 689.0, 979.0, 662.0, 665.0, 363.0, 323.0, 804.0, 1008.0, 1071.0, - 853.0, 649.0, 642.0, 406.0, 268.0, 623.0, 703.0, 823.0, 380.0, - 460.0, 410.0, 648.0, 786.0, 982.0, 869.0, 882.0, 1222.0, 1641.0, - 1800.0, 1364.0, 915.0, 484.0, 774.0, 863.0, 905.0, 458.0, 351.0, - 279.0, 338.0, 505.0, 535.0, 521.0, 311.0, 412.0, 402.0, 417.0, - 395.0, 509.0, 623.0, 604.0, 458.0, 724.0, 1043.0, 1217.0, 926.0, - 812.0, 778.0, 698.0, 646.0, 453.0, 418.0, 272.0, 335.0, 806.0, - 1038.0, 785.0, 1245.0, 1481.0, 
1777.0, 1223.0, 1093.0, 922.0, 776.0, - 416.0, 269.0, 371.0, 377.0, 700.0, 884.0, 1035.0, 777.0, 605.0, - 490.0, 558.0, 486.0, 649.0, 705.0, 817.0, 778.0, 948.0, 852.0, - 796.0, 564.0, 758.0, 752.0, 687.0, 1061.0, 1451.0, 1896.0, 1420.0, - 1184.0, 591.0, 945.0, 1057.0, 1089.0, 659.0, 337.0, 478.0, 462.0, - 556.0, 466.0, 440.0, 296.0, 328.0, 284.0, 374.0, 416.0, 636.0, - 573.0, 506.0, 542.0, 917.0, 1326.0, 1198.0, 956.0, 576.0, 648.0, - 555.0, 915.0, 675.0, 653.0, 301.0, 386.0, 812.0, 1360.0, 1067.0, - 1591.0, 1275.0, 1650.0, 1070.0, 1015.0, 1011.0, 909.0, 727.0, 330.0, - 340.0, 314.0, 365.0, 557.0, 784.0, 722.0, 606.0, 388.0, 548.0, - 408.0, 599.0, 665.0, 853.0, 812.0, 1264.0, 1190.0, 1220.0, 964.0, - 964.0, 779.0, 322.0, 340.0, 705.0, 1255.0, 1135.0, 1001.0, 746.0, - 1242.0, 1362.0, 1144.0, 726.0, 344.0, 452.0, 666.0, 916.0, 778.0, - 516.0, 266.0, 348.0, 268.0, 360.0, 404.0, 1070.0, 1028.0, 942.0, - 566.0, 1187.0, 1554.0, 1512.0, 1003.0, 1046.0, 813.0, 708.0, 424.0, - 433.0, 450.0, 300.0, 523.0, 550.0, 976.0, 1392.0, 1642.0, 1418.0, - 1576.0, 1290.0, 936.0, 990.0, 926.0, 1026.0, 414.0, 336.0, 296.0, - 492.0, 766.0, 1268.0, 997.0, 905.0, 707.0, 958.0, 1212.0, 1058.0, - 1012.0, 670.0, 754.0, 1516.0, 1509.0, 1387.0, 1119.0, 1039.0, 896.0, - 186.0, 291.0, 318.0, 425.0, 519.0, 725.0, 1116.0, 956.0, 1032.0, - 672.0, 704.0, 572.0, 692.0, 872.0, 872.0, 630.0, 678.0, 508.0, - 523.0, 201.0, 247.0, 316.0, 962.0, 914.0, 1074.0, 700.0, 925.0, - 732.0, 590.0, 461.0, 894.0, 823.0, 779.0, 939.0, 889.0, 853.0, - 237.0, 439.0, 627.0, 1004.0, 1426.0, 1236.0, 1070.0, 1032.0, 1010.0, - 651.0, 785.0, 778.0, 881.0, 311.0, 632.0, 578.0, 1136.0, 1230.0, - 1964.0, 1597.0, 1337.0, 883.0, 770.0, 1058.0, 734.0, 896.0, 876.0, - 1082.0, 1532.0, 1325.0, 1461.0, 1661.0, 1367.0, 1060.0, 148.0, 259.0, - 294.0, 551.0, 757.0, 997.0, 1318.0, 1010.0, 1238.0, 1172.0, 1222.0, - 934.0, 590.0, 1268.0, 1224.0, 1094.0, 672.0, 512.0, 623.0, 403.0, - 449.0, 430.0, 882.0, 816.0, 935.0, 535.0, 1215.0, 1009.0, 
1041.0, - 421.0, 852.0, 719.0, 691.0, 665.0, 698.0, 774.0, 836.0, 1190.0, - 508.0, 772.0, 1184.0, 1434.0, 1250.0, 1576.0, 1294.0, 1206.0, 1094.0, - 907.0, 875.0, 597.0, 1178.0, 1309.0, 1475.0, 1173.0, 1685.0, 1335.0, - 1045.0, 679.0, 1001.0, 1515.0, 1184.0, 972.0, 862.0, 1080.0, 1100.0, - 1087.0, 1185.0, 1597.0, 1140.0, 858.0, 131.0, 256.0, 348.0, 601.0, - 808.0, 982.0, 1234.0, 944.0, 1120.0, 1062.0, 1086.0, 958.0, 742.0, - 1312.0, 1264.0, 1276.0, 874.0, 810.0, 871.0, 923.0, 751.0, 528.0, - 696.0, 636.0, 821.0, 469.0, 1409.0, 1202.0, 1234.0, 582.0, 611.0, - 579.0, 345.0, 710.0, 653.0, 707.0, 779.0, 974.0, 501.0, 699.0, - 698.0, 922.0, 836.0, 1444.0, 1162.0, 1539.0, 1387.0, 1370.0, 913.0, - 815.0, 1658.0, 1788.0, 2102.0, 1259.0, 1550.0, 1031.0, 824.0, 376.0, - 663.0, 853.0, 848.0, 686.0, 952.0, 1054.0, 796.0, 798.0, 1072.0, - 1314.0, 812.0, 466.0, 215.0, 264.0, 292.0, 543.0, 688.0, 916.0, - 1008.0, 1046.0, 1264.0, 1254.0, 1104.0, 848.0, 577.0, 1109.0, 1249.0, - 1384.0, 1058.0, 846.0, 942.0, 918.0, 828.0, 540.0, 622.0, 516.0, - 735.0, 479.0, 1515.0, 1292.0, 1460.0, 988.0, 1023.0, 849.0, 467.0, - 482.0, 523.0, 493.0, 911.0, 988.0, 373.0, 350.0, 504.0, 1100.0, - 940.0, 1345.0, 1159.0, 1639.0, 1459.0, 1669.0, 1373.0, 1382.0, 1442.0, - 1369.0, 1385.0, 695.0, 703.0, 622.0, 938.0, 1032.0, 1205.0, 1021.0, - 799.0, 381.0, 335.0, 662.0, 686.0, 1014.0, 888.0, 1282.0, 864.0, - 657.0, 215.0, 187.0, 266.0, 290.0, 568.0, 576.0, 860.0, 812.0, - 826.0, 546.0, 366.0, 476.0, 469.0, 611.0, 709.0, 864.0, 1028.0, - 890.0, 974.0, 832.0, 826.0, 534.0, 740.0, 570.0, 811.0, 453.0, - 1105.0, 988.0, 1168.0, 1028.0, 892.0, 816.0, 455.0, 639.0, 566.0, - 479.0, 383.0, 484.0, 445.0, 410.0, 560.0, 876.0, 704.0, 819.0, - 689.0, 1189.0, 1029.0, 1343.0, 1091.0, 1020.0, 909.0, 656.0, 1015.0, - 542.0, 813.0, 1150.0, 1451.0, 1416.0, 954.0, 740.0, 514.0, 378.0, - 449.0, 798.0, 780.0, 870.0, 716.0, 1042.0, 798.0, 625.0, 249.0, - 444.0, 897.0, 923.0, 917.0, 496.0, 532.0, 921.0, 960.0, 921.0, - 337.0, 369.0, 
293.0, 641.0, 693.0, 788.0, 788.0, 698.0, 695.0, - 387.0, 581.0, 524.0, 506.0, 284.0, 837.0, 1169.0, 1333.0, 837.0, - 601.0, 677.0, 1013.0, 1449.0, 1276.0, 1027.0, 554.0, 509.0, 337.0, - 420.0, 828.0, 627.0, 634.0, 934.0, 686.0, 871.0, 791.0, 951.0, - 655.0, 759.0, 851.0, 852.0, 448.0, 210.0, 155.0, 119.0, 255.0, - 1194.0, 1567.0, 1630.0, 932.0, 574.0, 522.0, 398.0, 925.0, 1538.0, - 1490.0, 1160.0, 486.0, 992.0, 1251.0, 1175.0, 787.0, 577.0, 1137.0, - 1105.0, 985.0, 622.0, 522.0, 795.0, 1072.0, 1233.0, 805.0, 353.0, - 148.0, 716.0, 770.0, 867.0, 357.0, 433.0, 401.0, 405.0, 697.0, - 728.0, 918.0, 782.0, 1175.0, 1433.0, 1261.0, 1253.0, 1201.0, 1069.0, - 1038.0, 1418.0, 1441.0, 1101.0, 436.0, 527.0, 461.0, 480.0, 616.0, - 675.0, 616.0, 1118.0, 880.0, 1104.0, 728.0, 878.0, 608.0, 438.0, - 438.0, 416.0, 426.0, 507.0, 500.0, 475.0, 434.0, 1015.0, 1029.0, - 870.0, 662.0, 786.0, 825.0, 509.0, 874.0, 1180.0, 1094.0, 628.0, - 362.0, 503.0, 1090.0, 942.0, 994.0, 897.0, 1441.0, 1456.0, 1079.0, - 704.0, 506.0, 736.0, 1091.0, 1210.0, 971.0, 565.0, 439.0, 887.0, - 915.0, 989.0, 376.0, 358.0, 287.0, 316.0, 474.0, 554.0, 724.0, - 715.0, 1471.0, 1809.0, 2004.0, 1773.0, 1682.0, 1286.0, 1446.0, 1745.0, - 1795.0, 1066.0, 503.0, 469.0, 475.0, 254.0, 591.0, 453.0, 410.0, - 784.0, 824.0, 1072.0, 808.0, 830.0, 620.0, 970.0, 1038.0, 1104.0, - 641.0, 744.0, 622.0, 804.0, 747.0, 867.0, 766.0, 666.0, 1084.0, - 1108.0, 1058.0, 434.0, 720.0, 1000.0, 982.0, 551.0, 501.0, 656.0, - 1665.0, 1322.0, 1441.0, 807.0, 1034.0, 954.0, 1104.0, 1012.0, 831.0, - 304.0, 699.0, 1150.0, 1505.0, 1062.0, 725.0, 817.0, 815.0, 695.0, - 302.0, 256.0, 290.0, 217.0, 459.0, 480.0, 714.0, 889.0, 1373.0, - 1372.0, 1329.0, 1569.0, 1871.0, 1534.0, 1067.0, 824.0, 1087.0, 822.0, - 862.0, 684.0, 914.0, 746.0, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/gendata.py deleted file mode 100755 index 6957e0e9..00000000 --- 
a/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/gendata.py +++ /dev/null @@ -1,65 +0,0 @@ -#!/usr/bin/env python3 - -import numpy as np - -KH = 3 -KW = 3 -IH = 72 -IW = 72 -OH = IH - KH + 1 -OW = IW - KW + 1 - -info = np.finfo(np.float32) -nmant = 5 # Limit precision to avoid rounding errors -maxmant = 1 << nmant -minexp = 0 -maxexp = 5 - - -# Generate floating-point values with exact mantissa and exponent -def randf(n): - return np.ldexp( - np.random.randint(maxmant, size=n), np.random.randint(minexp, maxexp, size=n) - ) - - -inputs = randf((IH, IW)).astype(np.float32) -weights = np.ones((KH, KW), dtype=np.float32) -weights_1 = np.ones(KW, dtype=np.float32) -weights_2 = np.ones(KH, dtype=np.float32) -outputs = np.full((OH, OW), np.float32(0.0)) - -# Convolution -for kh in range(KH): - for kw in range(KW): - outputs += inputs[kh : (kh + OH), kw : (kw + OW)] * weights[kh][kw] - -print( - """#define KH {} -#define KW {} -#define IH {} -#define IW {} -#define I_SIZE {} -#define OH {} -#define OW {} -#define O_SIZE {} - -""".format( - KH, KW, IH, IW, IH * IW, OH, OW, OH * OW - ) -) - - -def print_array(name, data, data_size, data_type="float", data_fmt="{}", fold=10): - print("{} {}[{}] = {{".format(data_type, name, data_size)) - for i in range(0, len(data), fold): - print( - " ", ", ".join(data_fmt.format(x) for x in data[i : i + fold]), ",", sep="" - ) - print("};") - - -print_array("input_k1", weights_1, "IW") -print_array("input_k2", weights_2, "IH") -print_array("input_image", inputs.flatten(), "I_SIZE") -print_array("verify_data", outputs.flatten(), "O_SIZE") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/vec-slide-conv.S b/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/vec-slide-conv.S deleted file mode 100644 index df139ab3..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/vec-slide-conv.S +++ /dev/null @@ -1,217 +0,0 @@ -// See LICENSE for license details. 
- -//************************************************************************** -// Vectorized 2D separable convolution -//-------------------------------------------------------------------------- - - .text - .balign 4 - - .global vec_sep_conv -/* - * Calling convention: - * a0: size_t rows - * a1: size_t cols - * a2: size_t a_stride - * a3: size_t b_stride - * a4: const float *kw - * a5: const float *kh - * a6: const float *a - * a7: float *b - */ - -#define rows a0 -#define cols a1 -#define a_stride a2 -#define b_stride a3 -#define kw a4 -#define kh a5 -#define a a6 -#define b a7 - -#define ap t0 -#define bp t1 -#define vlen t2 -#define row_count t3 -#define VLEN_stride t4 -#define ap_4 t5 -#define ap_8 t6 - -#define row_check s0 -#define rows_odd s1 -#define slide_ptr s2 - -#define kw0 ft0 -#define kw1 ft1 -#define kw2 ft2 -#define kh0 ft3 -#define kh1 ft4 -#define kh2 ft5 - -#define vload0 v0 -#define vload1 v4 -#define vload2 v8 -#define vrow0 v16 -#define vrow1 v20 -#define vtmp v24 - -#define FRAMESIZE 32 - -vec_sep_conv: - addi sp, sp, -FRAMESIZE - sd s0, 0(sp) - sd s1, 8(sp) - sd s2, 16(sp) - - # load the kernel into scalar registers - flw kw0, 0(kw) - flw kw1, 4(kw) - flw kw2, 8(kw) - flw kh0, 0(kh) - flw kh1, 4(kh) - flw kh2, 8(kh) - - slli a_stride, a_stride, 2 - slli b_stride, b_stride, 2 - - mv row_check, rows - addi row_check, row_check, -2 - - andi rows_odd, rows, 1 - -# Prolog -loop_prolog: - mv ap, a - mv bp, b - mv row_count, row_check - - vsetvli vlen, cols, e32, m4, ta, ma - slli VLEN_stride, vlen, 2 - - # Load the first row and compute horizontal - vle32.v vload0, (ap) - add slide_ptr, ap, VLEN_stride # Bump slide_ptr to the n+1 element - flw ft6, (slide_ptr) - vfmul.vf vrow0, vload0, kw0 - vfslide1down.vf vload1, vload0, ft6 - flw ft7, 4(slide_ptr) - vfmacc.vf vrow0, kw1, vload1 - vfslide1down.vf vload2, vload1, ft7 - vfmacc.vf vrow0, kw2, vload2 - - add ap, ap, a_stride - add slide_ptr, ap, VLEN_stride - - # Load the second row and compute 
horizontal - vle32.v vload0, (ap) - flw ft6, (slide_ptr) - vfmul.vf vrow1, vload0, kw0 - vfslide1down.vf vload1, vload0, ft6 - flw ft7, 4(slide_ptr) - vfmacc.vf vrow1, kw1, vload1 - vfslide1down.vf vload2, vload1, ft7 - vfmacc.vf vrow1, kw2, vload2 - - add ap, ap, a_stride - add slide_ptr, ap, VLEN_stride - - # Begin the vertical computation with the first and second rows - vfmul.vf vrow0, vrow0, kh0 - vfmacc.vf vrow0, kh1, vrow1 - vfmul.vf vrow1, vrow1, kh0 - - # Load the third row and compute horizontal - vle32.v vload0, (ap) - flw ft6, (slide_ptr) - vfmul.vf vtmp, vload0, kw0 - vfslide1down.vf vload1, vload0, ft6 - flw ft7, 4(slide_ptr) - vfmacc.vf vtmp, kw1, vload1 - vfslide1down.vf vload2, vload1, ft7 - vfmacc.vf vtmp, kw2, vload2 - -# Main Loop -conv_loop: - add ap, ap, a_stride - add slide_ptr, ap, VLEN_stride - - vle32.v vload0, (ap) - flw ft6, (slide_ptr) - - vfmacc.vf vrow0, kh2, vtmp - - vfslide1down.vf vload1, vload0, ft6 - - vse32.v vrow0, (bp) - - flw ft7, 4(slide_ptr) - vfslide1down.vf vload2, vload1, ft7 - - vfmacc.vf vrow1, kh1, vtmp - vfmul.vf vrow0, vtmp, kh0 - - add ap, ap, a_stride - vfmul.vf vtmp, vload0, kw0 - add slide_ptr, ap, VLEN_stride - vle32.v vload0, (ap) - flw ft6, (slide_ptr) - vfmacc.vf vtmp, kw1, vload1 - vfslide1down.vf vload1, vload0, ft6 - flw ft7, 4(slide_ptr) - vfmacc.vf vtmp, kw2, vload2 - vfslide1down.vf vload2, vload1, ft7 - add bp, bp, b_stride - - vfmacc.vf vrow1, kh2, vtmp - vfmacc.vf vrow0, kh1, vtmp - vse32.v vrow1, (bp) - vfmul.vf vrow1, vtmp, kh0 - - vfmul.vf vtmp, vload0, kw0 - add bp, bp, b_stride - vfmacc.vf vtmp, kw1, vload1 - addi row_count, row_count, -2 - vfmacc.vf vtmp, kw2, vload2 - - - bgtz row_count, conv_loop - -epilog: - vfmacc.vf vrow0, kh2, vtmp - vse32.v vrow0, (bp) - - bnez rows_odd, row_loop_complete - - vfmacc.vf vrow1, kh1, vtmp - - add ap, ap, a_stride - add slide_ptr, ap, VLEN_stride - add bp, bp, b_stride - - vle32.v vload0, (ap) - flw ft6, (slide_ptr) - vfslide1down.vf vload1, vload0, ft6 - 
flw ft7, 4(slide_ptr) - vfslide1down.vf vload2, vload1, ft7 - - vfmul.vf vtmp, vload0, kw0 - vfmacc.vf vtmp, kw1, vload1 - vfmacc.vf vtmp, kw2, vload2 - - vfmacc.vf vrow1, kh2, vtmp - vse32.v vrow1, (bp) - -row_loop_complete: - add a, a, VLEN_stride - add b, b, VLEN_stride - - sub cols, cols, vlen - bnez cols, loop_prolog - -exit: - ld s0, 0(sp) - ld s1, 8(sp) - ld s2, 16(sp) - addi sp, sp, FRAMESIZE - - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/vec-slide-conv_main.c b/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/vec-slide-conv_main.c deleted file mode 100644 index a6a9ea70..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-slide-conv/vec-slide-conv_main.c +++ /dev/null @@ -1,42 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// Separable Convolution Benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized 2D separable convolution implementation. 
- -#include "util.h" -#include -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_sep_conv(size_t, size_t, size_t, size_t, const float *, const float *, - const float *, float *); - -int main(int argc, char *argv[]) { - float results_data[O_SIZE] = {0}; - printf("slideconv OH,OW,KH,KW,IH,IW = %ld,%ld,%ld,%ld,%ld,%ld\n", OH, OW, KH, - KW, IH, IW); - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_sep_conv(OH, OW, IW, OW, input_k1, input_k2, input_image, results_data); - memset(results_data, 0, sizeof(results_data)); -#endif - - // Do the convolution - setStats(1); - vec_sep_conv(OH, OW, IW, OW, input_k1, input_k2, input_image, results_data); - setStats(0); - - // Check the results - return verifyFloat(O_SIZE, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-softmax/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-softmax/gen_data.py deleted file mode 100644 index df20a8d6..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-softmax/gen_data.py +++ /dev/null @@ -1,65 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# arg1: vector size, arg2: filter size - -import random as rand -import numpy as np -import sys - - -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -def rand_matrix(N, dtype): - return np.random.rand(N).astype(dtype) - - -# SCRIPT - -if len(sys.argv) == 3: - channels = int(sys.argv[1]) - innerSize = int(sys.argv[2]) -else: - print("Error. Give me two arguments: the number of channels and the inner size.") - sys.exit() - -# Vector of samples -i = rand_matrix(channels * innerSize, np.float32).astype(np.float32) - -# Results buffer -buf = np.zeros(channels * innerSize, dtype=np.float32) -o_s = np.zeros(channels * innerSize, dtype=np.float32) -o_g = np.zeros(channels * innerSize, dtype=np.float32) - -# Create the file -print('.section .data,"aw",@progbits') -emit("channels", np.array(channels, dtype=np.uint64)) -emit("innerSize", np.array(innerSize, dtype=np.uint64)) -emit("i", i, "NR_LANES*4") -emit("buf", i, "NR_LANES*4") -emit("o_s", i, "NR_LANES*4") -emit("o_v", i, "NR_LANES*4") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-softmax/main.c b/bb-tests/workloads/src/CTest/rvv/vec-softmax/main.c deleted file mode 100644 index a4d95b69..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-softmax/main.c +++ /dev/null @@ -1,85 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. -// -// Author: Matteo Perotti - -#include -#include - -#include "ara/util.h" -#include "softmax.h" -#include "util.h" -#include - -// Check the results using a threshold -#define CHECK - -// Sanity check to see that there are some precision differences -// between the two algorithms -// #define SANITY_CHECK - -// Sanity check to see the results -// #define PRINT_RESULTS - -#define THRESHOLD 0.1 - -extern uint64_t channels; -extern uint64_t innerSize; -extern float i[] __attribute__((aligned(32))); -extern float buf[] __attribute__((aligned(32))); -extern float o_s[] __attribute__((aligned(32))); -extern float o_v[] __attribute__((aligned(32))); - -int main() { - printf("SOFTMAX\n"); - printf("Channels: %lu\nInner Size: %lu\n", channels, innerSize); - - int64_t runtime; - int error = 0; - unsigned long cycles1, cycles2, instr2, instr1; - - printf("Scalar Softmax...\n"); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - softmax(i, o_s, buf, channels, innerSize); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - runtime = cycles2 - cycles1; - printf("The scalar SOFTMAX execution took %d cycles.\n", runtime); - - printf("Vector Softmax...\n"); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - softmax_vec(i, o_v, channels, innerSize); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - runtime = cycles2 - cycles1; - printf("The vector Softmax execution took %d cycles.\n", runtime); - - for (uint64_t k = 0; k < channels * innerSize; ++k) { - if 
(!similarity_check(o_s[k], o_v[k], THRESHOLD)) { - error = 1; - printf("Error at index %d. %x != %x\n", k, *(uint32_t *)(&o_v[k]), - *(uint32_t *)(&o_s[k])); - } - } - if (!error) - printf("Check okay. No errors.\n"); - - return error; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-softmax/softmax.c b/bb-tests/workloads/src/CTest/rvv/vec-softmax/softmax.c deleted file mode 100644 index 37a86196..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-softmax/softmax.c +++ /dev/null @@ -1,213 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
-// -// Author: Matteo Perotti - -#include -#include - -#include "riscv_vector.h" - -#include "ara/exp.h" - -// Our fdiv cannot receive any X in input -// The following macro is just a trick and should NOT be used -#define RESET_VREGS - -// Scalar implmentation inspired by OpenCV softmax: -// https://github.com/opencv/opencv/blob/master/modules/dnn/src/layers/softmax_layer.cpp -void softmax(const float *i, const float *o, const float *buf, - uint64_t channels, uint64_t innerSize) { - - // OpenCV names - float *srcPtr = (float *)i; - float *bufPtr = (float *)buf; - float *dstPtr = (float *)o; - - // Batch size == 1 - size_t outerSize = 1; - - // Steps - size_t outerStep = channels * innerSize; - size_t cnStep = innerSize; - - // Compute max along axis - for (size_t outerDim = 0; outerDim < outerSize; outerDim++) { - - size_t srcOffset = outerDim * outerStep; - size_t bufOffset = outerDim * cnStep; - - memcpy(bufPtr + bufOffset, srcPtr + srcOffset, innerSize * sizeof(float)); - - for (size_t cnDim = 1; cnDim < channels; cnDim++) { - for (size_t i = 0; i < innerSize; i++) { - bufPtr[bufOffset + i] = - fmax(bufPtr[bufOffset + i], srcPtr[srcOffset + cnDim * cnStep + i]); - } - } - - // Subtract max - for (size_t outerDim = 0; outerDim < outerSize; outerDim++) { - size_t srcOffset = outerDim * outerStep; - size_t bufOffset = outerDim * cnStep; - - for (size_t cnDim = 0; cnDim < channels; cnDim++) { - const int offset = srcOffset + cnDim * cnStep; - for (size_t i = 0; i < innerSize; i++) - dstPtr[offset + i] = srcPtr[offset + i] - bufPtr[bufOffset + i]; - } - } - - // Exponentiate - for (size_t outerDim = 0; outerDim < outerSize; outerDim++) { - size_t srcOffset = outerDim * outerStep; - - for (size_t cnDim = 0; cnDim < channels; cnDim++) { - const int offset = srcOffset + cnDim * cnStep; - for (size_t i = 0; i < innerSize; i++) - dstPtr[offset + i] = exp(dstPtr[offset + i]); - } - } - - // Sum exps and divide - for (size_t outerDim = 0; outerDim < outerSize; outerDim++) 
{ - size_t srcOffset = outerDim * outerStep; - size_t bufOffset = outerDim * cnStep; - - // Sum exp along axis - for (size_t i = 0; i < innerSize; i++) - bufPtr[bufOffset + i] = 0.f; - - for (size_t cnDim = 0; cnDim < channels; cnDim++) { - const int offset = srcOffset + cnDim * cnStep; - for (size_t i = 0; i < innerSize; i++) - bufPtr[bufOffset + i] += dstPtr[offset + i]; - } - - // Divide by computed sum - for (size_t cnDim = 0; cnDim < channels; cnDim++) { - const int offset = srcOffset + cnDim * cnStep; - for (size_t i = 0; i < innerSize; i++) - dstPtr[offset + i] /= bufPtr[bufOffset + i]; - } - } - } -} - -void softmax_vec(const float *i, const float *o, uint64_t channels, - uint64_t innerSize) { - - /* ONLY FOR DEBUGGING PURPOSE. DELETE THE FOLLOWING ASM LINES - */ - // Clean the regs from Xes -#ifdef RESET_VREGS - volatile int temp; - asm volatile("vsetvli %0, zero, e32, m8, ta, ma" : "=r"(temp)); - - asm volatile("vmv.v.i v0, 0"); - asm volatile("vmv.v.i v8, 0"); - asm volatile("vmv.v.i v16, 0"); - asm volatile("vmv.v.i v24, 0"); -#endif - - size_t avl = innerSize; - size_t vl; - - // Stripmining pointers - float *_i = (float *)i; - float *_o = (float *)o; - // Channel pointers - float *__i = (float *)i; - float *__o = (float *)o; - - // Vector registers - vfloat32m1_t max_chunk_v; - vfloat32m1_t buf_chunk_v; - vfloat32m1_t num_chunk_v; - vfloat32m1_t den_chunk_v; - vfloat32m1_t res_chunk_v; - - // Stripmine on innerSize - for (vl = __riscv_vsetvl_e32m1(avl); avl > 0; avl -= vl) { - - vl = __riscv_vsetvl_e32m1(avl); - - /* - Calculate the maximum along the channel dimension - */ - - // Initialize the max vector - max_chunk_v = __riscv_vle32_v_f32m1(__i, vl); - // Bump the pointer - __i += innerSize; - for (uint64_t ch = 1; ch < channels; ++ch) { - // Load a chunk of the input vector - buf_chunk_v = __riscv_vle32_v_f32m1(__i, vl); - // Bump the channel pointer - __i += innerSize; - // Calculate the elm-wise maximum between the two chunks - max_chunk_v = 
__riscv_vfmax_vv_f32m1(max_chunk_v, buf_chunk_v, vl); - } - // Restore the channel pointer - __i = _i; - - /* - Fetch, subtract, exponentiate along the channel dimension - */ - - // Initialize accumulator - den_chunk_v = __riscv_vfmv_v_f_f32m1(0, vl); - for (uint64_t ch = 0; ch < channels; ++ch) { - // Fetch one chunk from channel ch - buf_chunk_v = __riscv_vle32_v_f32m1(__i, vl); - // Subtract the maximum - buf_chunk_v = __riscv_vfsub_vv_f32m1(buf_chunk_v, max_chunk_v, vl); - // Exponentiate - buf_chunk_v = __exp_f32m1(buf_chunk_v, vl); - // Store the numerator to memory - __riscv_vse32_v_f32m1(__o, buf_chunk_v, vl); - // Accumulate - den_chunk_v = __riscv_vfadd_vv_f32m1(den_chunk_v, buf_chunk_v, vl); - // Bump channel pointers - __i += innerSize; - __o += innerSize; - } - // Restore the pointers - __i = _i; - __o = _o; - - /* - Divide by the computed sum - */ - - for (uint64_t ch = 0; ch < channels; ++ch) { - // Load numerator from memory - num_chunk_v = __riscv_vle32_v_f32m1(__o, vl); - // Divide - res_chunk_v = __riscv_vfdiv_vv_f32m1(num_chunk_v, den_chunk_v, vl); - // Store the result to memory - __riscv_vse32_v_f32m1(__o, res_chunk_v, vl); - // Bump channel pointers - __o += innerSize; - } - // Bump stripmining pointers - _i += vl; - _o += vl; - // Reset channel pointers - __i = _i; - __o = _o; - } -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-softmax/softmax.h b/bb-tests/workloads/src/CTest/rvv/vec-softmax/softmax.h deleted file mode 100644 index 65c5beb9..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-softmax/softmax.h +++ /dev/null @@ -1,28 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. -// -// Author: Matteo Perotti - -#ifndef _SOFTMAX_H_ -#define _SOFTMAX_H_ - -void softmax(const float *i, const float *o, const float *buf, - uint64_t channels, uint64_t innerSize); - -void softmax_vec(const float *i, const float *o, uint64_t channels, - uint64_t innerSize); - -#endif diff --git a/bb-tests/workloads/src/CTest/rvv/vec-spmv/gen_data.py b/bb-tests/workloads/src/CTest/rvv/vec-spmv/gen_data.py deleted file mode 100644 index 9fff2b32..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-spmv/gen_data.py +++ /dev/null @@ -1,174 +0,0 @@ -#!/usr/bin/env python3 -# Copyright 2021 ETH Zurich and University of Bologna. -# -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# // Author: Chi Zhang, ETH Zurich - - -# arg1: row, arg2: column, arg3: density -# default configuration: -# # INT32 idx -# # FP64 data - -import random -import numpy as np -import sys - - -# fun for froming file -def emit(name, array, alignment="8"): - print(".global %s" % name) - print(".balign " + alignment) - print("%s:" % name) - bs = array.tobytes() - for i in range(0, len(bs), 4): - s = "" - for n in range(4): - s += "%02x" % bs[i + 3 - n] - print(" .word 0x%s" % s) - - -# generate random CSR format sparse matrix -def randomCSR(num_row, num_col, density, element_byte): - non_zero = int(num_row * num_col * density) - # print("non_zero="+str(non_zero)) - # random insert - insert_list = [] - pool = list(range(num_row * num_col)) - for x in range(non_zero): - insert = random.choice(pool) - pool.remove(insert) - # print(insert) - insert_list.append(insert) - # print("inseting: "+str(x)+"/"+str(non_zero), end="\r") - pass - insert_list.sort() - # print(insert_list) - - # Count for p_row - p_row = [] - p_row.append(0) - acc_bar = num_col - acc_cnt = 0 - for x in range(non_zero): - # print("generate p_row: "+str(x)+"/"+str(non_zero), end="\r") - if insert_list[x] >= acc_bar: - p_row.append(x) - acc_bar = acc_bar + num_col - acc_cnt = acc_cnt + 1 - while insert_list[x] >= acc_bar: - p_row.append(x) - acc_bar = acc_bar + num_col - acc_cnt = acc_cnt + 1 - pass - pass - pass - for x in range(num_row - acc_cnt): - p_row.append(non_zero) - pass - # print(p_row) - - # generate indicies - index_list = [] - for x in range(num_row): - # print("generate index: "+str(x)+"/"+str(num_row), end="\r") - length = p_row[x + 1] - p_row[x] - row_idx_list = [] - pool = list(range(0, num_col * element_byte, element_byte)) - for x in range(length): - index = random.choice(pool) - pool.remove(index) - row_idx_list.append(index) - pass - row_idx_list.sort() - index_list = index_list + row_idx_list - pass - - # generate data - # print("start generate data") - data_list = [] - for x in 
range(non_zero): - # data_list.append(random.random()) - data_list.append(x) - pass - - # generate vector - # print("start generate vector") - vector_list = [] - for x in range(num_col): - vector_list.append(random.random()) - # vector_list.append(x) - pass - - return non_zero, p_row, index_list, data_list, vector_list - - pass - - -# SCRIPT - - -if len(sys.argv) == 4: - R = int(sys.argv[1]) - C = int(sys.argv[2]) - D = float(sys.argv[3]) -else: - print("Error. Give me one argument: the number of vector elements.") - sys.exit() - -data_type = np.float64 -idx_type = np.int32 -element_byte = 8 -idx_byte = 4 - -# generate sparse matrix -non_zero, p_row, index_list, data_list, vector_list = randomCSR(R, C, D, element_byte) - -# Create the file -print('.section .data,"aw",@progbits') -emit("R", np.array(R, dtype=np.uint64)) -emit("C", np.array(C, dtype=np.uint64)) -emit("NZ", np.array(non_zero, dtype=np.uint64)) -emit("CSR_PROW", np.array(p_row, dtype=idx_type), "NR_LANES*4") -emit("CSR_INDEX", np.array(index_list, dtype=idx_type), "NR_LANES*4") -emit("CSR_DATA", np.array(data_list, dtype=data_type), "NR_LANES*4") -emit("CSR_IN_VECTOR", np.array(vector_list, dtype=data_type), "NR_LANES*4") -emit("CSR_OUT_VECTOR", np.zeros([C], dtype=data_type), "NR_LANES*4") - - -# TSTEPS = 1 - -# # Fill in the extra data to align the matrices to 4*NrLanes in SW -# maxNrLanes = 16 -# maxAlignment = 4*maxNrLanes # [B] -# sizeOfDType = np.dtype(dtype).itemsize # [B] -# R_ext = int(R + (maxAlignment / sizeOfDType)) -# C_ext = int(C + (maxAlignment / sizeOfDType)) - -# # Vector of samples (padding is random since it does not impact performance) -# A = np.random.rand(R_ext, C_ext).astype(dtype) -# B = np.zeros([R_ext, C_ext], dtype=dtype) - -# Create the file -# print(".section .data,\"aw\",@progbits") -# emit("R", np.array(R, dtype=np.uint64)) -# emit("C", np.array(C, dtype=np.uint64)) -# emit("TSTEPS", np.array(TSTEPS, dtype=np.uint64)) -# emit("A_v", A, 'NR_LANES*4') -# emit("B_v", B, 
'NR_LANES*4') -# if not OnlyVec: -# emit("A_s", A, 'NR_LANES*4') -# emit("B_s", B, 'NR_LANES*4') diff --git a/bb-tests/workloads/src/CTest/rvv/vec-spmv/main.c b/bb-tests/workloads/src/CTest/rvv/vec-spmv/main.c deleted file mode 100644 index 5565b212..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-spmv/main.c +++ /dev/null @@ -1,78 +0,0 @@ -// Copyright 2022 ETH Zurich and University of Bologna. -// -// SPDX-License-Identifier: Apache-2.0 -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. 
- -// Author: Chi Zhang, ETH Zurich - -#include -#include - -#include "ara/spmv.h" -#include "util.h" -#include - -extern uint64_t R; -extern uint64_t C; -extern uint64_t NZ; - -extern int32_t CSR_PROW[] __attribute__((aligned(32), section(".l2"))); -extern int32_t CSR_INDEX[] __attribute__((aligned(32), section(".l2"))); -extern double CSR_DATA[] __attribute__((aligned(32), section(".l2"))); -extern double CSR_IN_VECTOR[] __attribute__((aligned(32), section(".l2"))); -extern double CSR_OUT_VECTOR[] __attribute__((aligned(32), section(".l2"))); - -int main() { - printf("SpMV\n"); - - unsigned long cycles1, cycles2, instr2, instr1; - double density = ((double)NZ) / (R * C); - double nz_per_row = ((double)NZ) / R; - -#if PREALLOCATE - spmv_csr_idx32(R, CSR_PROW, CSR_INDEX, CSR_DATA, CSR_IN_VECTOR, - CSR_OUT_VECTOR); -#endif - - printf( - "Calculating a (%d x %d) x %d sparse matrix vector multiplication...\n", - R, C, C); - printf("CSR format with %d nozeros: %ld nonzeros per 1000 elements, %ld " - "nonzeros per row \n", - NZ, (uint64_t)(density * 1000.0), (uint64_t)nz_per_row); - instr1 = read_csr(minstret); - cycles1 = read_csr(mcycle); - spmv_csr_idx32(R, CSR_PROW, CSR_INDEX, CSR_DATA, CSR_IN_VECTOR, - CSR_OUT_VECTOR); - asm volatile("fence"); - instr2 = read_csr(minstret); - cycles2 = read_csr(mcycle); - - // Metrics - int64_t runtime = cycles2 - cycles1; - float performance = 2.0 * NZ / runtime; - - printf("The execution took %d cycles.\n", runtime); - printf("The performance is %ld FLOPs/1000 cycles.\n", - (uint64_t)(1000.0 * performance)); - - printf("Verifying ...\n"); - if (spmv_verify(R, CSR_PROW, CSR_INDEX, CSR_DATA, CSR_IN_VECTOR, - CSR_OUT_VECTOR)) { - return 1; - } else { - printf("Passed.\n"); - } - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/dataset1.h deleted file mode 100644 index b195bf11..00000000 --- 
a/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/dataset1.h +++ /dev/null @@ -1,19 +0,0 @@ -#define DATA_SIZE 300 - -float input1_data[DATA_SIZE] = { - 2, 22, 41, 16, 28, 0, 9, 49, 37, 18, 17, 28, 6, 3, 47, 7, 29, 10, 40, - 7, 31, 10, 0, 28, 46, 17, 44, 29, 19, 44, 34, 11, 48, 0, 5, 44, 5, 37, - 14, 32, 21, 25, 15, 21, 33, 35, 38, 40, 15, 49, 33, 7, 43, 34, 18, 28, 23, - 13, 46, 12, 27, 32, 28, 25, 10, 4, 14, 35, 37, 33, 30, 18, 25, 27, 32, 46, - 9, 29, 4, 28, 13, 47, 11, 40, 16, 29, 47, 32, 45, 18, 12, 24, 45, 16, 41, - 15, 46, 29, 49, 19, 9, 27, 48, 32, 28, 49, 17, 49, 32, 40, 32, 3, 9, 10, - 5, 49, 42, 31, 3, 42, 14, 35, 17, 49, 7, 12, 45, 35, 44, 21, 13, 20, 28, - 26, 49, 35, 38, 0, 12, 24, 23, 5, 25, 43, 20, 34, 11, 20, 22, 14, 36, 14, - 33, 31, 15, 3, 11, 17, 46, 27, 14, 32, 5, 8, 30, 26, 30, 14, 19, 39, 17, - 40, 22, 36, 13, 37, 18, 37, 17, 4, 29, 49, 3, 13, 49, 42, 20, 39, 17, 26, - 25, 11, 27, 23, 45, 12, 38, 17, 12, 16, 38, 34, 29, 37, 7, 22, 1, 15, 6, - 37, 47, 27, 46, 23, 40, 27, 27, 46, 21, 39, 32, 17, 47, 44, 14, 33, 0, 25, - 2, 27, 4, 43, 4, 42, 22, 22, 29, 3, 30, 33, 19, 19, 14, 0, 40, 0, 48, - 8, 41, 14, 17, 19, 22, 26, 25, 9, 48, 45, 17, 27, 40, 49, 44, 44, 26, 2, - 47, 30, 35, 47, 17, 24, 28, 7, 44, 0, 1, 32, 28, 16, 33, 36, 21, 4, 21, - 36, 7, 14, 0, 19, 4, 34, 12, 45, 2, 23, 21, 40, 39, 36}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/root_approx_gendata.pl b/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/root_approx_gendata.pl deleted file mode 100755 index 646862de..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/root_approx_gendata.pl +++ /dev/null @@ -1,131 +0,0 @@ -#!/usr/bin/perl -w -#========================================================================== -# root_approx.pl -# -# Author: Generated -# Date: Today -# -(our $usageMsg = <<'ENDMSG') =~ s/^\#//gm; -# -# Simple script which creates an input data set and the reference data -# for the given conditional operation. 
-# -ENDMSG - -use strict "vars"; -use warnings; -no warnings("once"); -use Getopt::Long; - -#-------------------------------------------------------------------------- -# Command line processing -#-------------------------------------------------------------------------- - -our %opts; - -sub usage() -{ - - print "\n"; - print " Usage: conditional_gendata.pl [options] \n"; - print "\n"; - print " Options:\n"; - print " --help print this message\n"; - print " --size size of input data [1000]\n"; - print " --seed random seed [1]\n"; - print "$usageMsg"; - - exit(); -} - -sub processCommandLine() -{ - - $opts{"help"} = 0; - $opts{"size"} = 300; - $opts{"seed"} = 1; - Getopt::Long::GetOptions( \%opts, 'help|?', 'size:i', 'seed:i' ) or usage(); - $opts{"help"} and usage(); - -} - -#-------------------------------------------------------------------------- -# Helper Functions -#-------------------------------------------------------------------------- -sub printArray -{ - my $arrayName = $_[0]; - my $arrayRef = $_[1]; - my $type = $_[2]; - - my $numCols = 20; - my $arrayLen = scalar(@{$arrayRef}); - - print $type." 
".$arrayName."[DATA_SIZE] = \n"; - print "{\n"; - - if ( $arrayLen <= $numCols ) { - print " "; - for ( my $i = 0; $i < $arrayLen; $i++ ) { - print sprintf("%3d",$arrayRef->[$i]); - if ( $i != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - else { - my $numRows = int($arrayLen/$numCols); - for ( my $j = 0; $j < $numRows; $j++ ) { - print " "; - for ( my $i = 0; $i < $numCols; $i++ ) { - my $index = $j*$numCols + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - if ( $arrayLen > ($numRows*$numCols) ) { - print " "; - for ( my $i = 0; $i < ($arrayLen-($numRows*$numCols)); $i++ ) { - my $index = $numCols*$numRows + $i; - print sprintf("%3d",$arrayRef->[$index]); - if ( $index != $arrayLen-1 ) { - print ", "; - } - } - print "\n"; - } - - } - - print "};\n\n"; -} - -#-------------------------------------------------------------------------- -# Main -#-------------------------------------------------------------------------- - -sub main() -{ - - processCommandLine(); - srand($opts{"seed"}); - - my @input1_data; # x - my @input2_data; # y - for ( my $i = 0; $i < $opts{"size"}; $i++ ) { - my $valueX = int(rand(50)); # x - - push( @input1_data, $valueX ); - } - - print "\n\#define DATA_SIZE ".$opts{"size"}." \n\n"; - printArray( "input1_data", \@input1_data, "float" ); # x -} - -main(); diff --git a/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/vec-square_root.S b/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/vec-square_root.S deleted file mode 100644 index ad3ad4b7..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/vec-square_root.S +++ /dev/null @@ -1,32 +0,0 @@ - .text - .balign 4 - .global vec_root_approx - -# v1 = sqrt(v1) to almost 23 bits of precision. 
- -vec_root_approx: - vsetvli t1, a0, e32, m1, ta, mu - vle32.v v1, (a1) # load x values - sub a0, a0, t1 - slli t1, t1, 2 - fmv.w.x ft0, x0 # Mask off zero inputs - vmfne.vf v0, v1, ft0 # to avoid div by zero - vfrsqrt7.v v2, v1, v0.t # Estimate 1/sqrt(x) - vmfne.vf v0, v2, ft0, v0.t # Additionally mask off +inf inputs - li t0, 0x40400000 - vmv.v.x v4, t0 # Splat 3.0 - vfmul.vv v3, v1, v2, v0.t # x * est - vfnmsub.vv v3, v2, v4, v0.t # - x * est * est + 3 - vfmul.vv v3, v3, v2, v0.t # est * (-x * est * est + 3) - li t0, 0x3f000000 - fmv.w.x ft0, t0 # 0.5 - vfmul.vf v2, v3, ft0, v0.t # Estimate to 14 bits - vfmul.vv v3, v1, v2, v0.t # x * est - vfnmsub.vv v3, v2, v4, v0.t # - x * est * est + 3 - vfmul.vv v3, v3, v2, v0.t # est * (-x * est * est + 3) - vfmul.vf v2, v3, ft0, v0.t # Estimate to 23 bits - vfmul.vv v1, v2, v1, v0.t # x * 1/sqrt(x) - vse32.v v1, (a1) - add a1, a1, t1 # Bump pointer - bnez a0, vec_root_approx # Any more? - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/vec-square_root_main.c b/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/vec-square_root_main.c deleted file mode 100644 index f3e74eff..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-square-root-approx/vec-square_root_main.c +++ /dev/null @@ -1,37 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// square root approximation benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized conditional implementation. -// The input data (and reference data) should be generated using -// the root_approx_gendata.pl perl script and dumped to a file named -// dataset1.h. 
- -#include "util.h" -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" -#include - -//-------------------------------------------------------------------------- -// Main - -void vec_root_approx(size_t n, float x[]); - -int main(int argc, char *argv[]) { - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_root_approx(DATA_SIZE, input1_data); -#endif - - // Do the root - setStats(1); - vec_root_approx(DATA_SIZE, input1_data); - setStats(0); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-strlen/main.c b/bb-tests/workloads/src/CTest/rvv/vec-strlen/main.c deleted file mode 100644 index de4aa346..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-strlen/main.c +++ /dev/null @@ -1,32 +0,0 @@ -#include "util.h" -#include -#include -#include - -const char *input = - "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod " - "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim " - "veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea " - "commodo consequat. Duis aute irure dolor in reprehenderit in voluptate " - "velit esse cillum dolore eu fugiat nulla pariatur. 
Excepteur sint " - "occaecat cupidatat non proident, sunt in culpa qui officia deserunt " - "mollit anim id est laborum"; - -size_t strlen_rvv(const char *s); - -int main() { - size_t cycles1, cycles2; - size_t max = strlen(input); - printf("Performing strlen with max len = %ld\n", max); - - cycles1 = read_csr(mcycle); - for (size_t i = 0; i < max; i += 15) { - size_t r = strlen_rvv(input + i); - if (r != max - i) { - return 1; - } - } - cycles2 = read_csr(mcycle); - printf("The execution took %ld cycles.\n", cycles2 - cycles1); - return 0; -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-strlen/vec_strlen.S b/bb-tests/workloads/src/CTest/rvv/vec-strlen/vec_strlen.S deleted file mode 100644 index 5b323815..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-strlen/vec_strlen.S +++ /dev/null @@ -1,17 +0,0 @@ -.text -.balign 4 -.global strlen_rvv -strlen_rvv: - mv a3, a0 -loop: - vsetvli a1, x0, e8, m8, ta, ma - vle8ff.v v8, (a3) - csrr a1, vl - vmseq.vi v0, v8, 0 - vfirst.m a2, v0 - add a3, a3, a1 - bltz a2, loop - add a0, a0, a1 - add a3, a3, a2 - sub a0, a3, a0 - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/dataset1.h deleted file mode 100644 index ffeebcdd..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/dataset1.h +++ /dev/null @@ -1,116 +0,0 @@ -#define DIM_M 32 -#define DIM_N 32 -#define ARRAY_SIZE 1024 - -float input_matrix[ARRAY_SIZE] = { - 94, 91, 7, 31, 65, 87, 56, 25, 45, 24, 22, 28, 8, 18, 79, 25, 41, 73, 91, - 80, 23, 34, 73, 42, 36, 51, 79, 30, 47, 81, 50, 41, 68, 79, 3, 47, 52, 91, - 17, 11, 66, 8, 11, 33, 82, 7, 67, 95, 20, 15, 80, 19, 64, 81, 57, 14, 50, - 31, 55, 80, 26, 54, 67, 31, 64, 25, 38, 53, 32, 93, 97, 35, 44, 71, 22, 93, - 58, 70, 56, 70, 98, 14, 10, 52, 14, 8, 17, 78, 78, 18, 16, 91, 65, 70, 74, - 81, 32, 3, 60, 37, 53, 82, 24, 46, 22, 96, 71, 44, 50, 98, 7, 24, 74, 13, - 62, 25, 96, 97, 90, 29, 27, 20, 84, 60, 28, 95, 81, 94, 
24, 21, 40, 84, 88, - 66, 1, 81, 76, 18, 56, 47, 73, 25, 66, 44, 59, 26, 78, 2, 87, 10, 65, 16, - 44, 92, 53, 56, 29, 86, 31, 99, 54, 39, 76, 40, 26, 22, 85, 9, 11, 12, 73, - 18, 8, 27, 14, 61, 6, 4, 99, 44, 27, 75, 77, 4, 74, 30, 79, 55, 42, 72, - 25, 92, 54, 18, 95, 7, 71, 33, 98, 99, 63, 35, 85, 81, 29, 17, 33, 49, 36, - 88, 41, 27, 84, 42, 31, 53, 98, 19, 1, 56, 43, 43, 63, 47, 6, 89, 42, 58, - 74, 90, 7, 59, 98, 7, 31, 89, 45, 86, 47, 61, 72, 47, 20, 34, 91, 1, 6, - 12, 35, 70, 90, 13, 24, 93, 30, 7, 61, 29, 66, 86, 87, 27, 78, 36, 9, 6, - 79, 88, 49, 94, 75, 46, 93, 19, 64, 65, 46, 30, 28, 75, 39, 16, 38, 39, 82, - 48, 9, 3, 32, 15, 71, 55, 3, 73, 28, 92, 22, 59, 73, 19, 9, 80, 50, 60, - 37, 11, 69, 5, 78, 98, 34, 48, 37, 14, 34, 5, 10, 72, 15, 6, 81, 33, 88, - 74, 9, 16, 8, 64, 39, 40, 62, 63, 67, 63, 16, 61, 10, 14, 17, 43, 65, 40, - 41, 10, 34, 46, 63, 87, 5, 81, 26, 61, 76, 70, 44, 77, 31, 91, 3, 5, 9, - 53, 40, 31, 41, 71, 62, 31, 68, 62, 76, 2, 17, 12, 48, 5, 76, 26, 78, 79, - 16, 34, 6, 54, 7, 78, 4, 99, 77, 70, 99, 69, 83, 4, 98, 39, 28, 35, 16, - 7, 43, 13, 71, 56, 79, 38, 44, 29, 94, 96, 93, 53, 48, 27, 10, 74, 6, 80, - 74, 60, 40, 84, 22, 67, 47, 3, 27, 81, 62, 94, 32, 95, 48, 84, 70, 15, 83, - 96, 23, 37, 39, 62, 51, 31, 52, 95, 24, 35, 67, 80, 16, 74, 82, 1, 88, 42, - 51, 88, 70, 90, 19, 18, 25, 97, 40, 20, 65, 63, 81, 62, 73, 5, 35, 39, 17, - 66, 85, 51, 14, 87, 30, 27, 58, 3, 89, 43, 38, 6, 53, 43, 72, 83, 91, 11, - 73, 69, 6, 91, 67, 34, 18, 31, 5, 28, 74, 4, 8, 5, 39, 39, 21, 41, 6, - 13, 60, 74, 53, 41, 6, 90, 80, 30, 71, 35, 94, 29, 17, 30, 7, 57, 66, 59, - 7, 0, 74, 13, 25, 62, 93, 73, 12, 20, 79, 11, 15, 84, 0, 72, 73, 96, 96, - 19, 42, 67, 4, 25, 49, 1, 60, 93, 10, 46, 63, 2, 67, 70, 28, 88, 9, 44, - 14, 55, 18, 99, 64, 46, 40, 72, 84, 37, 59, 44, 23, 31, 14, 12, 74, 29, 64, - 54, 95, 94, 50, 25, 81, 15, 27, 27, 13, 11, 31, 76, 87, 59, 35, 76, 91, 61, - 64, 31, 22, 37, 56, 2, 66, 65, 90, 61, 17, 52, 69, 80, 49, 98, 61, 16, 58, - 54, 0, 
59, 39, 76, 21, 83, 59, 3, 84, 75, 90, 59, 56, 2, 28, 5, 9, 7, - 40, 72, 23, 39, 86, 35, 36, 91, 17, 92, 17, 16, 27, 2, 74, 67, 92, 95, 22, - 7, 50, 92, 97, 8, 71, 33, 32, 51, 78, 72, 33, 8, 70, 66, 5, 7, 35, 21, - 59, 85, 69, 87, 3, 17, 53, 59, 80, 73, 15, 51, 25, 49, 57, 19, 25, 86, 29, - 20, 78, 65, 53, 7, 16, 48, 45, 50, 18, 80, 4, 64, 16, 88, 30, 72, 43, 19, - 36, 77, 48, 9, 65, 91, 94, 54, 1, 73, 83, 32, 12, 61, 79, 6, 9, 5, 56, - 74, 38, 56, 32, 68, 7, 68, 63, 57, 79, 49, 93, 40, 6, 13, 37, 62, 99, 58, - 78, 35, 86, 65, 93, 43, 55, 11, 30, 22, 79, 3, 33, 93, 71, 19, 11, 78, 60, - 51, 57, 72, 45, 61, 14, 21, 51, 0, 58, 1, 70, 7, 94, 50, 35, 12, 38, 24, - 47, 9, 91, 12, 73, 56, 58, 28, 46, 98, 99, 84, 89, 29, 74, 22, 66, 83, 26, - 83, 73, 5, 80, 84, 71, 57, 54, 56, 8, 26, 57, 56, 4, 12, 7, 40, 41, 47, - 68, 88, 44, 6, 9, 74, 75, 16, 20, 6, 48, 0, 11, 56, 19, 13, 4, 23, 62, - 40, 45, 15, 81, 38, 7, 30, 76, 87, 43, 85, 41, 10, 60, 65, 95, 51, 90, 21, - 37, 22, 26, 45, 89, 59, 81, 93, 87, 87, 49, 20, 39, 82, 99, 16, 14, 23, 63, - 80, 71, 7, 10, 31, 12, 65, 28, 5, 28, 38, 30, 19, 46, 94, 4, 8, 51, 7, - 60, 9, 89, 10, 57, 94, 56, 0, 42, 88, 25, 82, 18, 8, 93, 47, 15, 74, 70, - 51, 86, 84, 42, 13, 74, 97, 2, 57, 58, 7, 77, 11, 14, 58, 42, 95, 43, 85, - 53, 74, 9, 42, 52, 49, 52, 7, 38, 72, 9, 73, 99, 9, 20, 19, 69, 53, 67, - 96, 70, 1, 11, 20, 36, 44, 10, 68, 50, 22, 71, 28, 90, 84, 45, 4, 36, 96, - 86, 33, 56, 0, 8, 99, 24, 82, 52, 35, 96, 17, 80, 74, 0, 26, 50, 13, 47, - 83, 78, 56, 91, 38, 47, 51, 78, 6, 75, 51, 96, 76, 8, 81, 38, 37, -}; -float verify_data[ARRAY_SIZE] = { - 94, 68, 64, 32, 24, 54, 54, 6, 61, 32, 81, 76, 7, 6, 67, 30, 6, 15, 40, - 64, 56, 32, 78, 79, 30, 91, 7, 43, 10, 15, 9, 8, 91, 79, 25, 3, 21, 39, - 18, 89, 29, 15, 33, 70, 78, 80, 80, 27, 13, 84, 72, 31, 2, 51, 65, 6, 22, - 12, 40, 85, 31, 74, 73, 99, 7, 3, 38, 60, 40, 76, 95, 42, 66, 71, 88, 44, - 4, 74, 16, 58, 60, 0, 84, 22, 28, 78, 53, 9, 79, 73, 41, 41, 12, 70, 99, - 24, 31, 47, 53, 37, 
84, 40, 7, 58, 86, 55, 74, 77, 99, 60, 74, 3, 74, 72, - 37, 37, 5, 72, 7, 5, 3, 56, 47, 10, 65, 51, 9, 82, 65, 52, 32, 53, 88, - 26, 71, 74, 87, 3, 9, 31, 77, 40, 82, 89, 53, 73, 59, 56, 9, 33, 16, 56, - 33, 58, 68, 60, 28, 86, 20, 52, 87, 91, 93, 82, 66, 22, 33, 90, 27, 73, 16, - 91, 70, 84, 1, 43, 41, 96, 44, 2, 7, 8, 48, 74, 93, 28, 88, 65, 5, 84, - 19, 35, 56, 17, 97, 24, 1, 85, 98, 7, 78, 28, 8, 3, 99, 22, 88, 38, 6, - 96, 23, 66, 40, 70, 45, 38, 71, 46, 44, 95, 28, 42, 69, 96, 25, 11, 35, 46, - 81, 9, 99, 59, 36, 92, 64, 5, 69, 67, 42, 6, 90, 19, 31, 65, 72, 66, 50, - 56, 19, 98, 6, 51, 38, 13, 53, 17, 45, 66, 44, 22, 76, 11, 63, 98, 9, 22, - 39, 9, 83, 47, 51, 53, 80, 42, 14, 90, 23, 5, 18, 32, 11, 99, 9, 90, 30, - 74, 67, 80, 24, 8, 71, 96, 18, 12, 35, 7, 6, 59, 40, 53, 4, 3, 88, 43, - 30, 67, 12, 61, 39, 7, 80, 68, 78, 84, 74, 21, 19, 97, 96, 74, 22, 11, 22, - 71, 56, 73, 85, 31, 79, 73, 62, 40, 98, 27, 70, 72, 71, 4, 74, 17, 86, 35, - 4, 7, 60, 89, 75, 37, 46, 2, 70, 0, 28, 33, 93, 44, 47, 18, 81, 89, 88, - 19, 63, 31, 39, 81, 90, 83, 35, 25, 29, 52, 35, 21, 64, 68, 51, 29, 16, 22, - 94, 57, 1, 26, 8, 82, 58, 50, 73, 8, 29, 45, 49, 9, 67, 41, 28, 62, 19, - 91, 94, 49, 64, 69, 36, 59, 16, 63, 57, 74, 20, 26, 4, 58, 11, 50, 18, 7, - 70, 98, 25, 27, 17, 86, 94, 80, 63, 71, 35, 94, 18, 11, 29, 1, 54, 80, 91, - 85, 88, 57, 72, 22, 6, 45, 8, 7, 20, 13, 79, 67, 56, 7, 66, 14, 33, 47, - 75, 50, 16, 62, 16, 32, 25, 73, 17, 60, 95, 49, 17, 69, 30, 79, 45, 66, 48, - 89, 51, 77, 36, 47, 25, 95, 70, 24, 44, 61, 49, 61, 46, 60, 61, 31, 7, 95, - 97, 69, 30, 93, 94, 98, 92, 87, 72, 49, 61, 83, 0, 59, 7, 11, 44, 83, 41, - 20, 98, 74, 59, 6, 36, 72, 93, 37, 10, 68, 43, 48, 40, 6, 7, 10, 50, 61, - 17, 3, 43, 93, 14, 26, 11, 81, 60, 14, 10, 78, 73, 15, 14, 13, 26, 4, 88, - 47, 19, 11, 14, 62, 13, 84, 20, 91, 57, 46, 25, 16, 16, 17, 19, 40, 21, 83, - 56, 93, 9, 58, 68, 56, 91, 80, 10, 62, 78, 99, 41, 20, 64, 69, 17, 76, 71, - 70, 65, 67, 66, 63, 81, 58, 27, 53, 36, 6, 51, 73, 
19, 87, 89, 42, 50, 91, - 80, 19, 52, 25, 2, 44, 27, 34, 65, 5, 43, 2, 56, 15, 63, 34, 59, 2, 15, - 54, 2, 59, 77, 13, 0, 5, 13, 87, 10, 95, 22, 38, 23, 64, 14, 96, 87, 27, - 84, 91, 46, 78, 65, 17, 79, 83, 81, 18, 7, 67, 27, 0, 74, 80, 48, 37, 58, - 80, 4, 49, 57, 43, 71, 47, 34, 81, 8, 97, 10, 75, 42, 1, 30, 98, 40, 12, - 38, 96, 62, 31, 0, 70, 27, 59, 67, 73, 9, 62, 1, 84, 23, 20, 94, 85, 28, - 51, 73, 57, 17, 90, 65, 77, 31, 6, 28, 34, 41, 48, 44, 23, 73, 5, 74, 28, - 13, 39, 92, 15, 65, 99, 70, 71, 62, 39, 56, 53, 90, 78, 42, 14, 78, 29, 16, - 4, 53, 12, 75, 48, 10, 5, 29, 37, 5, 28, 13, 88, 11, 76, 95, 51, 91, 58, - 7, 57, 40, 82, 0, 74, 84, 6, 36, 50, 78, 27, 44, 74, 98, 35, 39, 37, 34, - 76, 94, 39, 35, 74, 25, 9, 31, 21, 22, 25, 94, 78, 94, 54, 45, 99, 42, 9, - 45, 75, 51, 31, 18, 20, 92, 30, 19, 70, 16, 14, 46, 26, 96, 62, 39, 4, 62, - 44, 76, 83, 7, 49, 54, 35, 50, 56, 15, 16, 88, 42, 4, 51, 79, 55, 16, 84, - 53, 79, 1, 90, 38, 34, 63, 78, 93, 51, 17, 8, 93, 14, 87, 59, 50, 57, 1, - 86, 35, 8, 81, 14, 25, 52, 36, 96, 30, 80, 91, 60, 56, 55, 56, 13, 39, 5, - 87, 79, 53, 31, 66, 5, 73, 55, 59, 3, 92, 19, 73, 65, 12, 26, 38, 23, 82, - 49, 96, 76, 47, 26, 65, 28, 29, 42, 43, 24, 82, 10, 5, 16, 48, 52, 85, 39, - 12, 18, 35, 84, 97, 25, 83, 93, 38, 57, 7, 63, 18, 52, 86, 8, 81, 54, 70, - 95, 86, 72, 43, 93, 48, 72, 81, 34, 27, 95, 51, 39, 20, 99, 76, 75, 8, 86, - 32, 43, 24, 56, 30, 80, 8, 7, 33, 81, 50, 67, 74, 81, 31, 25, 63, 30, 9, - 15, 26, 6, 10, 24, 14, 21, 79, 64, 91, 90, 71, 29, 12, 55, 47, 4, 76, 71, - 93, 38, 56, 38, 41, 31, 81, 94, 99, 92, 47, 7, 3, 6, 61, 54, 74, 35, 87, - 41, 11, 46, 61, 59, 33, 20, 61, 11, 9, 12, 87, 7, 47, 72, 0, 37, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/gendata.py deleted file mode 100755 index 54bde6c3..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/gendata.py +++ /dev/null @@ -1,34 +0,0 @@ -#!/usr/bin/env 
python3 - -# Script for generating a basic transpose test case - -import numpy as np - -dim_m = 32 -dim_n = 32 - -input_matrix = np.random.randint(0, 100, size=(dim_m, dim_n), dtype=np.int32) -transpose_matrix = input_matrix.T - -print( - """#define DIM_M {} -#define DIM_N {} -#define ARRAY_SIZE {} - -""".format( - dim_m, dim_n, dim_m * dim_n - ) -) - - -def print_array(name, data, data_size, data_type="float", data_fmt="{}", fold=8): - print("{} {}[{}] = {{".format(data_type, name, data_size)) - for i in range(0, len(data), fold): - print( - " ", ", ".join(data_fmt.format(x) for x in data[i : i + fold]), ",", sep="" - ) - print("};") - - -print_array("input_matrix", input_matrix.flatten(), "ARRAY_SIZE") -print_array("verify_data", input_matrix.T.flatten(), "ARRAY_SIZE") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/vec-transpose.S b/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/vec-transpose.S deleted file mode 100644 index d7402c04..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/vec-transpose.S +++ /dev/null @@ -1,89 +0,0 @@ - .text - .balign 4 - .global vec_transpose -# RV64IDV system -# -# void -# vec_transpose(size_t n, -# size_t m, -# const float*a, // m * n matrix -# float*b, // n * m matrix -# - -############### UNOPTIMIZED ###################### - -#define n a0 -#define m a1 -#define ap a2 -#define bp a3 - -#define astride t0 -#define bstride t1 -#define nvl t2 -#define amp t3 -#define bmp t4 -#define mt t5 -#define nt t6 - -#define bnp a4 - - -vec_transpose: - # Check for zero size matrices - beqz n, exit - beqz m, exit - - # Convert elements strides to byte strides. 
- slli astride, n, 2 - slli bstride, m, 2 - - slti t6, m, 4 - bnez t6, end_rows - -a_row_loop: - mv mt, m - - mv amp, ap - mv bmp, bp - -a_col_loop: - vsetvli nvl, mt, e32, m1, ta, ma - - mv bnp, bmp - - // Load the input matrix using strided segment loads - vlsseg4e32.v v0, (amp), astride - - // Store the transposed output matrix using unit stride stores - vse32.v v0, (bnp) - add bnp, bnp, bstride - vse32.v v1, (bnp) - add bnp, bnp, bstride - vse32.v v2, (bnp) - add bnp, bnp, bstride - vse32.v v3, (bnp) - - slli a5, nvl, 2 - add bmp, bmp, a5 - - mul a5, astride, nvl - add amp, amp, a5 - - sub mt, mt, nvl - - bnez mt, a_col_loop - - addi n, n, -4 - - slli a5, bstride, 2 - add bp, bp, a5 - - addi ap, ap, 16 - - bnez n, a_row_loop - - -end_rows: - -exit: - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/vec-transpose_main.c b/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/vec-transpose_main.c deleted file mode 100644 index 61fbcb3e..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-transpose-load/vec-transpose_main.c +++ /dev/null @@ -1,37 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// Transpose benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized matrix transpose implementation. 
- -#include "util.h" -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_transpose(size_t, size_t, const float *, float *); - -int main(int argc, char *argv[]) { - float results_data[ARRAY_SIZE] = {0}; - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_transpose(DIM_N, DIM_M, input_matrix, results_data); - memset(results_data, 0, sizeof(results_data)); -#endif - - setStats(1); - vec_transpose(DIM_N, DIM_M, input_matrix, results_data); - setStats(0); - - // Check the results - return verifyFloat(ARRAY_SIZE, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/dataset1.h b/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/dataset1.h deleted file mode 100644 index 39ec76e8..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/dataset1.h +++ /dev/null @@ -1,116 +0,0 @@ -#define DIM_M 32 -#define DIM_N 32 -#define ARRAY_SIZE 1024 - -float input_matrix[ARRAY_SIZE] = { - 44, 76, 26, 36, 98, 42, 0, 26, 30, 53, 44, 9, 72, 4, 58, 54, 28, 78, 47, - 42, 64, 93, 96, 44, 67, 53, 82, 21, 78, 84, 12, 95, 60, 3, 42, 54, 79, 74, - 48, 81, 41, 51, 13, 86, 16, 4, 8, 69, 71, 44, 19, 11, 53, 49, 97, 35, 93, - 87, 4, 87, 36, 54, 2, 70, 33, 34, 33, 5, 18, 33, 76, 12, 90, 61, 38, 49, - 85, 86, 62, 0, 23, 33, 70, 79, 93, 28, 36, 54, 35, 98, 95, 51, 41, 81, 45, - 21, 17, 4, 80, 85, 14, 99, 83, 78, 69, 42, 90, 25, 46, 93, 74, 78, 17, 64, - 51, 89, 42, 77, 1, 46, 65, 85, 28, 81, 0, 87, 22, 9, 72, 55, 27, 17, 86, - 68, 96, 85, 91, 44, 20, 29, 87, 44, 98, 81, 85, 38, 4, 70, 72, 79, 99, 38, - 45, 43, 0, 11, 99, 81, 36, 58, 61, 63, 44, 72, 33, 25, 15, 51, 85, 66, 47, - 57, 21, 82, 6, 59, 6, 60, 80, 66, 89, 21, 92, 36, 5, 87, 51, 24, 96, 43, - 9, 55, 85, 26, 19, 97, 10, 19, 58, 33, 28, 9, 47, 35, 36, 0, 30, 45, 37, - 61, 
76, 41, 69, 67, 35, 90, 16, 43, 41, 10, 3, 55, 30, 90, 22, 57, 12, 31, - 85, 16, 76, 72, 8, 55, 38, 92, 33, 71, 66, 7, 82, 27, 77, 88, 67, 20, 11, - 37, 70, 74, 19, 39, 87, 61, 7, 57, 36, 86, 73, 99, 25, 82, 82, 10, 9, 61, - 96, 89, 76, 33, 3, 48, 12, 69, 53, 46, 85, 35, 73, 99, 51, 8, 12, 95, 39, - 23, 79, 97, 6, 10, 2, 98, 90, 51, 19, 29, 12, 24, 2, 6, 80, 46, 67, 16, - 43, 42, 82, 56, 24, 84, 88, 55, 63, 79, 21, 83, 51, 68, 67, 17, 89, 99, 26, - 20, 27, 0, 52, 21, 26, 25, 71, 83, 44, 28, 46, 89, 63, 40, 77, 93, 31, 77, - 48, 96, 79, 69, 0, 15, 69, 21, 17, 58, 87, 21, 2, 15, 56, 32, 60, 17, 86, - 94, 3, 75, 57, 83, 31, 21, 93, 89, 83, 74, 17, 24, 21, 43, 6, 18, 56, 2, - 93, 87, 85, 78, 80, 11, 93, 24, 97, 58, 2, 38, 67, 57, 13, 96, 69, 28, 23, - 13, 76, 86, 27, 82, 62, 26, 74, 42, 8, 14, 34, 71, 61, 68, 47, 25, 26, 58, - 90, 8, 23, 77, 83, 49, 91, 28, 61, 57, 2, 3, 39, 54, 23, 57, 99, 53, 48, - 11, 53, 97, 56, 3, 53, 46, 22, 16, 56, 25, 41, 6, 67, 61, 15, 82, 12, 44, - 52, 85, 86, 4, 51, 23, 96, 36, 43, 53, 71, 17, 26, 43, 58, 90, 24, 65, 56, - 93, 32, 52, 75, 20, 53, 24, 78, 2, 72, 43, 92, 67, 28, 93, 55, 60, 10, 21, - 10, 65, 89, 10, 46, 54, 13, 19, 91, 39, 4, 41, 93, 23, 4, 36, 26, 49, 71, - 64, 77, 69, 63, 38, 56, 84, 98, 94, 75, 53, 45, 93, 61, 33, 34, 31, 25, 34, - 37, 13, 11, 48, 89, 52, 27, 39, 84, 34, 81, 81, 47, 83, 54, 35, 78, 62, 30, - 84, 16, 4, 81, 29, 24, 55, 57, 29, 1, 90, 10, 28, 22, 69, 37, 15, 53, 90, - 69, 54, 58, 88, 38, 12, 16, 35, 88, 86, 13, 16, 27, 28, 93, 12, 49, 63, 92, - 65, 40, 73, 50, 8, 16, 37, 13, 98, 38, 2, 80, 39, 75, 56, 28, 47, 94, 33, - 81, 79, 48, 41, 20, 0, 47, 57, 52, 36, 42, 4, 9, 86, 73, 3, 73, 87, 48, - 94, 0, 36, 75, 20, 17, 76, 99, 18, 61, 50, 77, 8, 72, 18, 83, 33, 29, 98, - 17, 21, 62, 17, 36, 30, 87, 4, 65, 89, 10, 85, 31, 42, 34, 10, 19, 28, 31, - 48, 94, 61, 83, 36, 35, 0, 5, 90, 62, 58, 6, 98, 66, 60, 27, 5, 15, 51, - 64, 1, 51, 86, 5, 18, 53, 23, 45, 46, 7, 47, 97, 0, 10, 51, 64, 42, 46, - 1, 11, 37, 90, 22, 21, 
92, 72, 55, 62, 57, 24, 84, 26, 21, 6, 39, 35, 48, - 70, 89, 46, 83, 40, 79, 31, 89, 55, 8, 85, 83, 81, 73, 22, 62, 73, 9, 35, - 78, 84, 77, 89, 42, 67, 55, 82, 74, 78, 66, 92, 78, 56, 53, 86, 97, 33, 68, - 56, 23, 7, 17, 14, 20, 20, 14, 53, 87, 4, 47, 18, 36, 10, 80, 63, 88, 49, - 81, 46, 5, 11, 80, 39, 54, 28, 98, 47, 76, 34, 81, 5, 55, 25, 78, 56, 40, - 17, 14, 59, 54, 39, 87, 16, 53, 25, 82, 68, 83, 67, 92, 83, 7, 77, 20, 68, - 33, 15, 14, 32, 99, 43, 55, 18, 50, 57, 35, 99, 49, 46, 4, 86, 19, 54, 64, - 93, 6, 59, 50, 36, 88, 33, 41, 6, 25, 67, 17, 13, 56, 47, 84, 93, 72, 47, - 3, 39, 60, 75, 56, 26, 32, 21, 26, 88, 26, 75, 95, 85, 55, 0, 5, 27, 10, - 89, 63, 38, 39, 65, 10, 74, 13, 72, 17, 28, 39, 4, 52, 67, 62, 22, 69, 70, - 17, 79, 20, 14, 91, 68, 80, 77, 54, 90, 31, 75, 89, 29, 85, 74, 50, 0, 19, - 35, 84, 38, 88, 91, 80, 33, 86, 46, 48, 91, 53, 7, 24, 74, 74, 80, 33, 16, - 40, 61, 58, 73, 2, 8, 81, 46, 19, 77, 10, 43, 85, 14, 86, 63, 51, 26, 76, - 1, 12, 66, 48, 70, 12, 79, 24, 66, 11, 14, 77, 10, 94, 53, 34, 20, 62, 33, - 64, 73, 77, 53, 39, 60, 11, 20, 90, 30, 55, 83, 47, 88, 26, 34, 4, 60, 15, - 19, 61, 87, 64, 9, 21, 17, 57, 16, 79, 22, 70, 11, 16, 24, 85, 89, 74, 37, - 40, 45, 54, 60, 44, 59, 52, 40, 21, 64, 59, 39, 95, 87, 44, 48, 1, -}; -float verify_data[ARRAY_SIZE] = { - 44, 60, 33, 17, 72, 61, 85, 22, 36, 6, 89, 87, 80, 26, 41, 53, 71, 47, 16, - 81, 72, 5, 11, 22, 53, 59, 86, 88, 14, 80, 14, 9, 76, 3, 34, 4, 55, 63, - 26, 57, 86, 10, 99, 21, 11, 58, 6, 24, 64, 83, 35, 79, 18, 90, 37, 62, 87, - 54, 19, 26, 91, 33, 77, 21, 26, 42, 33, 80, 27, 44, 19, 12, 73, 2, 26, 2, - 93, 90, 67, 78, 77, 54, 88, 48, 83, 62, 90, 73, 4, 39, 54, 75, 68, 16, 10, - 17, 36, 54, 5, 85, 17, 72, 97, 31, 99, 98, 20, 15, 24, 8, 61, 2, 69, 35, - 86, 41, 33, 58, 22, 9, 47, 87, 64, 95, 80, 40, 94, 57, 98, 79, 18, 14, 86, - 33, 10, 85, 25, 90, 27, 56, 97, 23, 15, 72, 63, 78, 13, 20, 29, 6, 21, 35, - 18, 16, 93, 85, 77, 61, 53, 16, 42, 74, 33, 99, 68, 25, 19, 16, 82, 51, 0, - 
32, 58, 77, 82, 43, 38, 62, 16, 0, 98, 98, 92, 78, 36, 53, 6, 55, 54, 58, - 34, 79, 0, 48, 76, 83, 96, 15, 58, 76, 82, 19, 52, 60, 2, 83, 12, 92, 56, - 30, 27, 47, 17, 66, 72, 84, 10, 25, 59, 0, 90, 73, 20, 22, 26, 81, 12, 78, - 85, 51, 33, 72, 10, 29, 21, 17, 38, 49, 44, 67, 84, 84, 28, 57, 21, 60, 55, - 77, 80, 82, 50, 5, 31, 2, 62, 70, 30, 41, 90, 69, 91, 85, 28, 8, 9, 12, - 26, 86, 67, 91, 52, 28, 98, 16, 93, 52, 62, 27, 62, 89, 63, 68, 36, 27, 75, - 8, 33, 11, 53, 51, 61, 42, 44, 66, 9, 55, 61, 24, 25, 94, 57, 28, 85, 93, - 94, 4, 12, 36, 17, 5, 57, 42, 88, 83, 88, 10, 89, 81, 64, 16, 44, 13, 38, - 90, 20, 47, 47, 38, 96, 2, 71, 3, 13, 61, 86, 55, 75, 81, 49, 42, 36, 15, - 24, 67, 49, 67, 33, 89, 29, 46, 73, 24, 9, 86, 49, 25, 29, 57, 35, 92, 89, - 6, 83, 75, 96, 57, 4, 60, 53, 29, 63, 4, 30, 51, 84, 55, 81, 92, 41, 63, - 85, 19, 77, 85, 72, 16, 85, 46, 87, 21, 36, 33, 76, 80, 44, 57, 69, 2, 51, - 10, 45, 24, 92, 9, 87, 64, 26, 82, 46, 83, 6, 38, 74, 77, 53, 89, 4, 4, - 86, 93, 44, 82, 0, 71, 33, 46, 28, 83, 28, 3, 23, 21, 93, 55, 65, 86, 4, - 1, 21, 74, 5, 7, 25, 39, 50, 10, 39, 74, 58, 8, 62, 74, 98, 6, 30, 66, - 3, 67, 46, 31, 23, 39, 96, 10, 61, 57, 40, 73, 65, 51, 6, 78, 11, 77, 67, - 65, 0, 43, 60, 37, 54, 69, 0, 78, 81, 59, 45, 7, 48, 16, 89, 21, 13, 54, - 36, 65, 33, 29, 73, 3, 89, 86, 39, 66, 80, 20, 17, 10, 19, 85, 11, 40, 28, - 71, 23, 17, 85, 6, 37, 82, 12, 43, 63, 93, 76, 23, 43, 89, 34, 1, 50, 73, - 10, 5, 35, 92, 39, 68, 13, 74, 35, 14, 20, 45, 78, 44, 33, 64, 38, 60, 61, - 27, 69, 42, 40, 89, 86, 57, 53, 10, 31, 90, 8, 87, 85, 18, 48, 78, 54, 33, - 56, 13, 84, 86, 90, 54, 47, 19, 70, 51, 4, 80, 76, 77, 53, 82, 77, 83, 27, - 99, 71, 46, 25, 10, 16, 48, 31, 53, 70, 56, 28, 15, 47, 72, 38, 63, 30, 60, - 42, 11, 79, 89, 70, 66, 41, 88, 46, 56, 93, 74, 82, 53, 17, 54, 34, 28, 37, - 94, 42, 23, 89, 53, 98, 14, 84, 17, 88, 51, 55, 44, 64, 53, 93, 42, 72, 89, - 69, 67, 85, 24, 31, 17, 62, 48, 26, 13, 37, 22, 13, 0, 34, 45, 46, 86, 47, - 32, 93, 28, 91, 
26, 83, 59, 93, 49, 28, 77, 79, 21, 67, 20, 35, 84, 77, 24, - 26, 11, 43, 19, 13, 69, 98, 36, 10, 46, 83, 97, 76, 99, 72, 39, 80, 76, 47, - 52, 96, 97, 36, 1, 99, 92, 35, 11, 73, 88, 48, 21, 74, 53, 58, 91, 11, 37, - 38, 75, 19, 7, 40, 33, 34, 43, 47, 4, 33, 1, 88, 40, 44, 35, 54, 46, 38, - 36, 90, 37, 99, 55, 96, 43, 42, 97, 90, 39, 48, 15, 2, 20, 28, 47, 79, 68, - 81, 55, 3, 52, 86, 12, 26, 21, 67, 93, 35, 65, 45, 5, 16, 70, 51, 63, 79, - 6, 8, 56, 24, 4, 89, 53, 80, 17, 31, 97, 31, 56, 5, 18, 39, 67, 46, 66, - 34, 64, 53, 87, 98, 85, 43, 87, 43, 74, 8, 79, 69, 18, 14, 3, 65, 41, 52, - 90, 39, 76, 48, 0, 89, 23, 55, 50, 60, 62, 48, 48, 4, 59, 82, 4, 95, 28, - 0, 51, 41, 19, 12, 21, 0, 56, 34, 53, 56, 93, 27, 69, 75, 99, 94, 10, 55, - 7, 25, 57, 75, 22, 91, 70, 60, 39, 21, 87, 51, 81, 11, 24, 10, 39, 95, 83, - 15, 2, 71, 46, 93, 23, 39, 54, 56, 18, 61, 51, 8, 17, 78, 35, 56, 69, 53, - 12, 15, 95, 78, 36, 41, 0, 99, 96, 3, 87, 39, 51, 69, 93, 61, 22, 32, 4, - 84, 58, 28, 61, 83, 64, 85, 14, 56, 99, 26, 70, 7, 79, 19, 87, 84, 54, 81, - 87, 81, 43, 55, 61, 23, 68, 21, 87, 68, 16, 52, 36, 34, 88, 47, 50, 36, 42, - 83, 20, 40, 49, 32, 17, 24, 24, 61, 44, 12, 2, 45, 22, 36, 9, 30, 7, 79, - 67, 17, 85, 47, 56, 75, 26, 81, 38, 94, 77, 35, 46, 81, 20, 17, 46, 21, 79, - 74, 66, 87, 48, 95, 70, 21, 9, 58, 55, 90, 57, 97, 17, 58, 78, 25, 25, 20, - 49, 81, 12, 33, 8, 0, 1, 73, 14, 14, 4, 26, 20, 74, 11, 64, 1, -}; diff --git a/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/gendata.py b/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/gendata.py deleted file mode 100755 index 54bde6c3..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/gendata.py +++ /dev/null @@ -1,34 +0,0 @@ -#!/usr/bin/env python3 - -# Script for generating a basic transpose test case - -import numpy as np - -dim_m = 32 -dim_n = 32 - -input_matrix = np.random.randint(0, 100, size=(dim_m, dim_n), dtype=np.int32) -transpose_matrix = input_matrix.T - -print( - """#define DIM_M {} 
-#define DIM_N {} -#define ARRAY_SIZE {} - -""".format( - dim_m, dim_n, dim_m * dim_n - ) -) - - -def print_array(name, data, data_size, data_type="float", data_fmt="{}", fold=8): - print("{} {}[{}] = {{".format(data_type, name, data_size)) - for i in range(0, len(data), fold): - print( - " ", ", ".join(data_fmt.format(x) for x in data[i : i + fold]), ",", sep="" - ) - print("};") - - -print_array("input_matrix", input_matrix.flatten(), "ARRAY_SIZE") -print_array("verify_data", input_matrix.T.flatten(), "ARRAY_SIZE") diff --git a/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/vec-transpose.S b/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/vec-transpose.S deleted file mode 100644 index e560b9a1..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/vec-transpose.S +++ /dev/null @@ -1,90 +0,0 @@ - .text - .balign 4 - .global vec_transpose -# RV64IDV system -# -# void -# vec_transpose(size_t n, -# size_t m, -# const float*a, // m * n matrix -# float*b, // n * m matrix -# - -############### UNOPTIMIZED ###################### - -#define n a0 -#define m a1 -#define ap a2 -#define bp a3 - -#define astride t0 -#define bstride t1 -#define nvl t2 -#define amp t3 -#define bnp t4 -#define mt t5 -#define nt t6 -#define anp a4 - - -vec_transpose: - # Check for zero size matrices - beqz n, exit - beqz m, exit - - # Convert elements strides to byte strides. 
- slli astride, n, 2 - slli bstride, m, 2 - - slti t6, m, 4 - bnez t6, end_rows - -a_row_loop: - mv nt, n - - mv anp, ap - mv bnp, bp - -a_col_loop: - vsetvli nvl, nt, e32, m1, ta, ma - - mv amp, anp - - // Load the input matrix with unit stride - vle32.v v0, (amp) - add amp, amp, astride - vle32.v v1, (amp) - add amp, amp, astride - vle32.v v2, (amp) - add amp, amp, astride - vle32.v v3, (amp) - - // Output the transpose using strided segment store - vssseg4e32.v v0, (bnp), bstride - - slli a5, nvl, 2 - add anp, anp, a5 - - mul a5, bstride, nvl - add bnp, bnp, a5 - - sub nt, nt, nvl - - bnez nt, a_col_loop - - mv nt, n - addi m, m, -4 - - slli a5, astride, 2 - add ap, ap, a5 - - addi bp, bp, 16 - - bnez m, a_row_loop - - -end_rows: - # Not done - -exit: - ret diff --git a/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/vec-transpose_main.c b/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/vec-transpose_main.c deleted file mode 100644 index 61fbcb3e..00000000 --- a/bb-tests/workloads/src/CTest/rvv/vec-transpose-store/vec-transpose_main.c +++ /dev/null @@ -1,37 +0,0 @@ -// See LICENSE for license details. - -//************************************************************************** -// Transpose benchmark -//-------------------------------------------------------------------------- -// -// This benchmark tests a vectorized matrix transpose implementation. 
- -#include "util.h" -#include - -//-------------------------------------------------------------------------- -// Input/Reference Data - -#include "dataset1.h" - -//-------------------------------------------------------------------------- -// Main - -void *vec_transpose(size_t, size_t, const float *, float *); - -int main(int argc, char *argv[]) { - float results_data[ARRAY_SIZE] = {0}; - -#if PREALLOCATE - // If needed we preallocate everything in the caches - vec_transpose(DIM_N, DIM_M, input_matrix, results_data); - memset(results_data, 0, sizeof(results_data)); -#endif - - setStats(1); - vec_transpose(DIM_N, DIM_M, input_matrix, results_data); - setStats(0); - - // Check the results - return verifyFloat(ARRAY_SIZE, results_data, verify_data); -} diff --git a/bb-tests/workloads/src/CTest/toy/CMakeLists.txt b/bb-tests/workloads/src/CTest/toy/CMakeLists.txt index 212ec222..d2dd6a8d 100644 --- a/bb-tests/workloads/src/CTest/toy/CMakeLists.txt +++ b/bb-tests/workloads/src/CTest/toy/CMakeLists.txt @@ -1,34 +1,28 @@ set(ELF_CC "riscv64-unknown-elf-gcc") -set(LINUX_CC "riscv64-unknown-linux-gnu-g++") #------------------------------------------------------------------------------- # Set baremetal compilation flags #------------------------------------------------------------------------------- -set(C_FLAGS -g -fno-common -O1 -static -march=rv64gc -mcmodel=medany - -fno-builtin-printf -specs=htif_nano.specs -I${CTEST_TOY_WORKLOAD_DIR}) - -#------------------------------------------------------------------------------- -# Define common compilation step functions -#------------------------------------------------------------------------------- +set(BBSIM_LD ${CTEST_TOY_WORKLOAD_DIR}/bbsim.ld) +set(C_FLAGS -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany + -fno-builtin-printf -specs=nano.specs -specs=nosys.specs -nostartfiles + -Wl,-T,${BBSIM_LD} -I${CTEST_TOY_WORKLOAD_DIR}) #------------------------------------------------------------------------------- # 
Generate executables for different platforms #------------------------------------------------------------------------------- set(CMAKE_C_COMPILER "riscv64-unknown-linux-gnu-gcc") -set(CMAKE_CXX_COMPILER "riscv64-unknown-linux-gnu-g++") set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=rv64gc") -set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=rv64gc") + # Generate Linux version executables function(add_linux_test_target TEST_NAME SOURCE_FILE) set(EXECUTABLE "${TEST_NAME}-linux") - add_executable(${EXECUTABLE} ${CTEST_TOY_WORKLOAD_DIR}/${SOURCE_FILE} ${CTEST_TOY_WORKLOAD_DIR}/buckyball.c) - target_include_directories(${EXECUTABLE} PRIVATE - ${WORKLOAD_LIB_DIR} - ) - # Ensure dependent libraries are built first and link merged library files - add_dependencies(${EXECUTABLE} bbhw-linux) - target_link_libraries(${EXECUTABLE} ${CMAKE_BINARY_DIR}/workloads/lib/bbhw/libbbhw-linux.a) + add_executable(${EXECUTABLE} + ${CTEST_TOY_WORKLOAD_DIR}/${SOURCE_FILE} + ${CTEST_TOY_WORKLOAD_DIR}/buckyball.c) + target_include_directories(${EXECUTABLE} PRIVATE ${WORKLOAD_LIB_DIR}) + set_target_properties(${EXECUTABLE} PROPERTIES LINKER_LANGUAGE C) endfunction() # Generate multicore baremetal version executables @@ -42,16 +36,14 @@ function(add_multicore_baremetal_test_target TEST_NAME SOURCE_FILE) ${CTEST_TOY_WORKLOAD_DIR}/buckyball.c ${CTEST_TOY_WORKLOAD_DIR}/${SOURCE_FILE} -I${WORKLOAD_LIB_DIR} - ${CMAKE_BINARY_DIR}/workloads/lib/bbhw/libbbhw-baremetal.a DEPENDS ${CTEST_TOY_WORKLOAD_DIR}/${SOURCE_FILE} ${CTEST_TOY_WORKLOAD_DIR}/start.S - ${CMAKE_BINARY_DIR}/workloads/lib/bbhw/libbbhw-baremetal.a COMMENT "Building multicore baremetal executable: ${EXECUTABLE}" WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} ) add_custom_target(${TEST_NAME}_multicore_baremetal - DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} bbhw-baremetal ) + DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE}) endfunction() # Generate singlecore baremetal version executables @@ -61,20 +53,19 @@ 
function(add_singlecore_baremetal_test_target TEST_NAME SOURCE_FILE) add_custom_command( OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} COMMAND ${ELF_CC} ${C_FLAGS} -o ${EXECUTABLE} + ${CTEST_TOY_WORKLOAD_DIR}/crt0.S ${CTEST_TOY_WORKLOAD_DIR}/buckyball.c ${CTEST_TOY_WORKLOAD_DIR}/${SOURCE_FILE} -I${WORKLOAD_LIB_DIR} - ${CMAKE_BINARY_DIR}/workloads/lib/bbhw/libbbhw-baremetal.a DEPENDS ${CTEST_TOY_WORKLOAD_DIR}/${SOURCE_FILE} + ${CTEST_TOY_WORKLOAD_DIR}/crt0.S ${CTEST_TOY_WORKLOAD_DIR}/buckyball.c - ${CMAKE_BINARY_DIR}/workloads/lib/bbhw/libbbhw-baremetal.a COMMENT "Building singlecore baremetal executable: ${EXECUTABLE}" WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} ) add_custom_target(${TEST_NAME}_singlecore_baremetal - DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE} bbhw-baremetal - ) + DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${EXECUTABLE}) endfunction() # Create cross-platform test targets @@ -110,35 +101,39 @@ add_cross_platform_test_target(ctest_vecunit_matmul_row_col_vector vecunit_matmu add_cross_platform_test_target(ctest_vecunit_matmul_col_row_vector vecunit_matmul_col_row_vector.c) add_cross_platform_test_target(ctest_vecunit_matmul_random1 vecunit_matmul_random1.c) add_cross_platform_test_target(ctest_vecunit_matmul_random2 vecunit_matmul_random2.c) -add_cross_platform_test_target(ctest_vecunit_matmul_random3 vecunit_matmul_random3.c) +add_cross_platform_test_target(ctest_vecunit_tiled_matmul vecunit_tiled_matmul.c) add_cross_platform_test_target(ctest_vecunit_matmul_zero_random vecunit_matmul_zero_random.c) add_cross_platform_test_target(ctest_vecunit_matmul_16xn_ones vecunit_matmul_16xn_ones.c) add_cross_platform_test_target(ctest_vecunit_matmul_16xn_random1 vecunit_matmul_16xn_random1.c) add_cross_platform_test_target(ctest_vecunit_matmul_16xn_random2 vecunit_matmul_16xn_random2.c) add_cross_platform_test_target(ctest_vecunit_matmul_16xn_random3 vecunit_matmul_16xn_random3.c) add_cross_platform_test_target(ctest_vecunit_matmul_16xn_zero_random 
vecunit_matmul_16xn_zero_random.c) -add_cross_platform_test_target(ctest_vecunit_simple_nn_forward_pass_test vecunit_simple_nn_forward_pass_test.c) add_cross_platform_test_target(ctest_im2col_test im2col_test.c) -add_cross_platform_test_target(ctest_bbfp_matmul_test bbfpmatmul.c) -add_cross_platform_test_target(ctest_bbfp_matmul_random1 bbfp_matmul_random1.c) -add_cross_platform_test_target(ctest_bbfp_matmul_random2 bbfp_matmul_random2.c) -add_cross_platform_test_target(ctest_bbfp_matmul_random3 bbfp_matmul_random3.c) add_cross_platform_test_target(ctest_transpose_test transpose_test.c) +add_cross_platform_test_target(ctest_transpose_16xn_test transpose_16xn_test.c) add_cross_platform_test_target(ctest_transpose_matmul transpose_matmul.c) -add_cross_platform_test_target(ctest_tiled_matmul tiled_matmul.c) add_cross_platform_test_target(ctest_relu_test relu_test.c) -add_cross_platform_test_target(ctest_nnlut_test nnlut_test.c) -add_cross_platform_test_target(ctest_snn_test snn_test.c) -add_cross_platform_test_target(ctest_abft_systolic_test abft_systolic_test.c) -add_cross_platform_test_target(ctest_conv_test conv_test.c) -add_cross_platform_test_target(ctest_cim_test cim_test.c) -add_cross_platform_test_target(ctest_transfer_test transfer_test.c) - -# Create library dependency target to ensure all libraries are built first -# add_custom_target(buckyball-libs-build -# DEPENDS bbmem-linux-build bbisa-linux-build bbmem-baremetal-build bbisa-baremetal-build -# COMMENT "Building all required libraries" -# ) +add_cross_platform_test_target(ctest_mxfp_test mxfp_test.c) +add_cross_platform_test_target(ctest_vsetvli test_vsetvli.c) +add_cross_platform_test_target(ctest_bfp_test bfp_test.c) +add_cross_platform_test_target(ctest_quant_test quant_test.c) +add_cross_platform_test_target(ctest_dequant_test dequant_test.c) +add_cross_platform_test_target(ctest_tlb_test tlb_test.c) +add_cross_platform_test_target(ctest_bdb_counter_test bdb_counter_test.c) 
+add_cross_platform_test_target(ctest_bdb_backdoor_test bdb_backdoor_test.c) +add_cross_platform_test_target(ctest_gemmini_os_risc_basic_test gemmini_os_risc_basic_test.c) +add_cross_platform_test_target(ctest_gemmini_os_risc_shift_test gemmini_os_risc_shift_test.c) +add_cross_platform_test_target(ctest_gemmini_os_risc_atranspose_test gemmini_os_risc_atranspose_test.c) +add_cross_platform_test_target(ctest_gemmini_os_risc_btranspose_test gemmini_os_risc_btranspose_test.c) +add_cross_platform_test_target(ctest_gemmini_os_risc_abtranspose_test gemmini_os_risc_abtranspose_test.c) +add_cross_platform_test_target(ctest_gemmini_ws_risc_basic_test gemmini_ws_risc_basic_test.c) +add_cross_platform_test_target(ctest_gemmini_ws_risc_shift_test gemmini_ws_risc_shift_test.c) +add_cross_platform_test_target(ctest_gemmini_ws_risc_btranspose_test gemmini_ws_risc_btranspose_test.c) +add_cross_platform_test_target(ctest_gemmini_os_cisc_basic_test gemmini_os_cisc_basic_test.c) +add_cross_platform_test_target(ctest_gemmini_os_cisc_shift_test gemmini_os_cisc_shift_test.c) +add_cross_platform_test_target(ctest_gemmini_os_cisc_atranspose_test gemmini_os_cisc_atranspose_test.c) +add_cross_platform_test_target(ctest_gemmini_os_cisc_btranspose_test gemmini_os_cisc_btranspose_test.c) +add_cross_platform_test_target(ctest_gemmini_ws_cisc_conv_test gemmini_ws_cisc_conv_test.c) # Create master build target add_custom_target(buckyball-CTest-build ALL DEPENDS @@ -150,28 +145,38 @@ add_custom_target(buckyball-CTest-build ALL DEPENDS ctest_vecunit_matmul_col_row_vector ctest_vecunit_matmul_random1 ctest_vecunit_matmul_random2 - ctest_vecunit_matmul_random3 + ctest_vecunit_tiled_matmul ctest_vecunit_matmul_zero_random ctest_vecunit_matmul_16xn_ones ctest_vecunit_matmul_16xn_random1 ctest_vecunit_matmul_16xn_random2 ctest_vecunit_matmul_16xn_random3 ctest_vecunit_matmul_16xn_zero_random - ctest_vecunit_simple_nn_forward_pass_test ctest_im2col_test - ctest_bbfp_matmul_test - ctest_bbfp_matmul_random1 
- ctest_bbfp_matmul_random2 - ctest_bbfp_matmul_random3 ctest_transpose_test + ctest_transpose_16xn_test ctest_transpose_matmul - ctest_tiled_matmul ctest_relu_test - ctest_nnlut_test - ctest_snn_test - ctest_abft_systolic_test - ctest_conv_test - ctest_cim_test - ctest_transfer_test + ctest_mxfp_test + ctest_vsetvli + ctest_bfp_test + ctest_quant_test + ctest_dequant_test + ctest_tlb_test + ctest_bdb_counter_test + ctest_bdb_backdoor_test + ctest_gemmini_os_risc_basic_test + ctest_gemmini_os_risc_shift_test + ctest_gemmini_os_risc_atranspose_test + ctest_gemmini_os_risc_btranspose_test + ctest_gemmini_os_risc_abtranspose_test + ctest_gemmini_ws_risc_basic_test + ctest_gemmini_ws_risc_shift_test + ctest_gemmini_ws_risc_btranspose_test + ctest_gemmini_os_cisc_basic_test + ctest_gemmini_os_cisc_shift_test + ctest_gemmini_os_cisc_atranspose_test + ctest_gemmini_os_cisc_btranspose_test + ctest_gemmini_ws_cisc_conv_test COMMENT "Building all workloads for Buckyball" VERBATIM) diff --git a/bb-tests/workloads/src/CTest/toy/abft_systolic_test.c b/bb-tests/workloads/src/CTest/toy/abft_systolic_test.c deleted file mode 100644 index 92dcbf1d..00000000 --- a/bb-tests/workloads/src/CTest/toy/abft_systolic_test.c +++ /dev/null @@ -1,108 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include -#include - -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); -static elem_t output_matrix_c[DIM * DIM] __attribute__((aligned(64))); - -// CPU reference computation for matrix multiplication with ABFT -int abft_systolic_cpu_reference(elem_t *a, elem_t *b, elem_t *c, int size) { - // Compute C = A * B - for (int i = 0; i < size; i++) { - for (int j = 0; j < size; j++) { - int32_t sum = 0; - for (int k = 0; k < size; k++) { - sum += (int32_t)a[i * size + k] * (int32_t)b[k * size + j]; - } - // Clamp to int8_t range - if (sum > 127) { - c[i * size + j] = 127; - } else if (sum < -128) { - 
c[i * size + j] = -128; - } else { - c[i * size + j] = (elem_t)sum; - } - } - } - return 1; -} - -void hw_abft_systolic(const char *test_name, elem_t *a, elem_t *b, elem_t *c, - int size) { - // Matrix A in spad bank 0, Matrix B in spad bank 1, result in spad bank 2 - uint32_t op1_addr = spad_addr(0, 0); - uint32_t op2_addr = spad_addr(1, 0); - uint32_t wr_addr = spad_addr(2, 0); - - // Move input matrices into scratchpad - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_fence(); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - - // Call ABFT systolic array instruction - bb_abft_systolic(op1_addr, op2_addr, wr_addr, size); - bb_fence(); - - // Result will be moved back in run_test for verification -} - -int run_test(const char *test_name, elem_t *a, elem_t *b, elem_t *c, int size) { - // CPU reference computation - abft_systolic_cpu_reference(a, b, c, size); - - // Hardware computation - hw_abft_systolic(test_name, a, b, c, size); - - // Move result back from scratchpad for verification - uint32_t wr_addr = spad_addr(2, 0); - bb_mvout((uintptr_t)output_matrix_c, wr_addr, size, 1); - bb_fence(); - - // Verify results - int passed = 1; - for (int i = 0; i < size; i++) { - for (int j = 0; j < size; j++) { - int idx = i * size + j; - if (output_matrix_c[idx] != c[idx]) { - printf("Mismatch at [%d][%d]: expected %d, got %d\n", i, j, c[idx], - output_matrix_c[idx]); - passed = 0; - } - } - } - - return passed; -} - -int test_abft_systolic(int seed) { - // Initialize input matrices with random values - init_i8_random_matrix(input_matrix_a, DIM, DIM, seed); - init_i8_random_matrix(input_matrix_b, DIM, DIM, seed + 1); - - // Run hardware test with verification - return run_test("ABFT-Systolic", input_matrix_a, input_matrix_b, - output_matrix_c, DIM); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - - int passed = test_abft_systolic(5); - if (passed) { - printf("ABFT-Systolic test PASSED!!!!\n"); - } else { - printf("ABFT-Systolic test 
FAILED\n"); - } - return (!passed); - -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/aligned_matmul.c b/bb-tests/workloads/src/CTest/toy/aligned_matmul.c index ec3fdbe8..e37b702f 100644 --- a/bb-tests/workloads/src/CTest/toy/aligned_matmul.c +++ b/bb-tests/workloads/src/CTest/toy/aligned_matmul.c @@ -1,9 +1,11 @@ #include "buckyball.h" #include -#include +#include #include #include +#define DIM (BANK_WIDTH / sizeof(elem_t)) + // Column count n for 16xn matrix multiplication #define MATMUL_COL 50 // 16-byte alignment of n @@ -18,23 +20,21 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); + uint32_t op1_bank_id = 0; // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); + uint32_t op2_bank_id = 1; // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); uint32_t col_stride = (size + DIM - 1) / DIM; for (int i = 0; i < col_stride; i++) { - bb_mvin((uintptr_t)a + i * DIM, op2_addr + size + i * DIM, DIM, col_stride); + bb_mvin((uintptr_t)a + i * DIM, op2_bank_id, size + i * DIM, col_stride); } - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_mvin((uintptr_t)c, wr_addr, DIM << 2, 1); - bb_fence(); - bb_transpose(op2_addr + size, op1_addr, size, 0); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, DIM << 2, 1); + int acc_bank_id = 2; // virtual bank id + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_mvin((uintptr_t)c, op2_bank_id, DIM, 1); + bb_transpose(op2_bank_id, op1_bank_id, size, 0); + bb_mul_warp16(op1_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, op2_bank_id, DIM, 1); bb_fence(); } diff --git a/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random1.c 
b/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random1.c deleted file mode 100644 index 5740649f..00000000 --- a/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random1.c +++ /dev/null @@ -1,66 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include - -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); -static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); -static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); - -void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, - int size) { - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - - printf("op1_addr: %d\n", op1_addr); - printf("op2_addr: %d\n", op2_addr); - printf("wr_addr: %d\n", wr_addr); - - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_bbfp_mul(op1_addr, op2_addr, wr_addr, size); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); - bb_fence(); -} - -int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); - cpu_matmul(b, a, expected_matrix, size, size, size); - hw_matmul(test_name, a, b, output_matrix, size); - if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { - printf("Test %s PASSED\n", test_name); - return 1; - } else { - printf("Test %s FAILED\n", test_name); - return 0; - } -} - -int test_random1() { - init_bbfp_random_matrix(input_matrix_a, DIM, DIM, 456); - init_bbfp_random_matrix(input_matrix_b, DIM, DIM, 789); - return run_test("Random matrices 1", input_matrix_a, input_matrix_b, DIM); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - int passed = test_random1(); - if (passed) { - printf("bbfp_matmul_random1 
test PASSED\n"); - } else { - printf("bbfp_matmul_random1 test FAILED\n"); - } -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random2.c b/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random2.c deleted file mode 100644 index 1447b373..00000000 --- a/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random2.c +++ /dev/null @@ -1,66 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include - -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); -static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); -static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); - -void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, - int size) { - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - - printf("op1_addr: %d\n", op1_addr); - printf("op2_addr: %d\n", op2_addr); - printf("wr_addr: %d\n", wr_addr); - - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_bbfp_mul(op1_addr, op2_addr, wr_addr, size); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); - bb_fence(); -} - -int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); - cpu_matmul(b, a, expected_matrix, size, size, size); - hw_matmul(test_name, a, b, output_matrix, size); - if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { - printf("Test %s PASSED\n", test_name); - return 1; - } else { - printf("Test %s FAILED\n", test_name); - return 0; - } -} - -int test_random1() { - init_bbfp_random_matrix(input_matrix_a, DIM, DIM, 111); - init_bbfp_random_matrix(input_matrix_b, DIM, DIM, 222); - return run_test("Random matrices 
2", input_matrix_a, input_matrix_b, DIM); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - int passed = test_random1(); - if (passed) { - printf("bbfp_matmul_random2 test PASSED\n"); - } else { - printf("bbfp_matmul_random2 test FAILED\n"); - } -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random3.c b/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random3.c deleted file mode 100644 index 064a4466..00000000 --- a/bb-tests/workloads/src/CTest/toy/bbfp_matmul_random3.c +++ /dev/null @@ -1,62 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include - -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); -static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); -static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); - -void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, - int size) { - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_bbfp_mul(op1_addr, op2_addr, wr_addr, size); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); - bb_fence(); -} - -int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); - cpu_matmul(b, a, expected_matrix, size, size, size); - hw_matmul(test_name, a, b, output_matrix, size); - if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { - printf("Test %s PASSED\n", test_name); - return 1; - } else { - printf("Test %s FAILED\n", test_name); - return 0; - } -} - -int test_random1() { - init_bbfp_random_matrix(input_matrix_a, DIM, DIM, 333); - 
init_bbfp_random_matrix(input_matrix_b, DIM, DIM, 444); - return run_test("Random matrices 3", input_matrix_a, input_matrix_b, DIM); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - int passed = test_random1(); - if (passed) { - printf("bbfp_matmul_random3 test PASSED\n"); - } else { - printf("bbfp_matmul_random3 test FAILED\n"); - } -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/bbfpmatmul.c b/bb-tests/workloads/src/CTest/toy/bbfpmatmul.c deleted file mode 100644 index c2a00a4c..00000000 --- a/bb-tests/workloads/src/CTest/toy/bbfpmatmul.c +++ /dev/null @@ -1,73 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include -#include -#include - -void init_matrix(elem_t *matrix, int rows, int cols, int seed) { - srand(seed); - for (int i = 0; i < rows * cols; i++) { - matrix[i] = - rand() % 4; // Initialize with random values in the range [0, 127] - } -} -void flip_matrix(elem_t *matrix, int rows, int cols) { - for (int i = 0; i < rows / 2; i++) { - for (int j = 0; j < cols; j++) { - elem_t temp = matrix[i * cols + j]; - matrix[i * cols + j] = matrix[(rows - 1 - i) * cols + j]; - matrix[(rows - 1 - i) * cols + j] = temp; - } - } -} -// Test matrices -static elem_t input_matrix[DIM * DIM] __attribute__((aligned(64))); -static elem_t transposed_matrix[DIM * DIM] __attribute__((aligned(64))); -static elem_t weight_matrix[DIM * DIM] __attribute__((aligned(64))); -static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); -static result_t expected_output_matrix[DIM * DIM] __attribute__((aligned(64))); - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); // Only allow specified hart to continue -#endif - - // Initialize weight matrix - init_matrix(weight_matrix, DIM, DIM, 42); - init_matrix(input_matrix, DIM, DIM, 51); - - // Clear output matrix - memset(output_matrix, 0, sizeof(output_matrix)); - cpu_matmul(input_matrix, weight_matrix, expected_output_matrix, DIM, DIM, - DIM); - // 
print_matrix("Input", input_matrix, DIM, DIM); - - // Move input to scratchpad - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - - bb_mvin((uintptr_t)weight_matrix, op1_addr, DIM, 1); - bb_mvin((uintptr_t)input_matrix, op2_addr, DIM, 1); - bb_fence(); - bb_bbfp_mul(op1_addr, op2_addr, wr_addr, DIM); - bb_fence(); - // Move back from scratchpad to output - bb_mvout((uintptr_t)output_matrix, wr_addr, DIM << 2, 1); - bb_fence(); - if (compare_u32_matrices(output_matrix, expected_output_matrix, DIM, DIM)) { - printf("Test passed!\n"); - } else { - printf("Test failed!\n"); - } - // print_matrix("Output", output_matrix, DIM, DIM); - -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/bbfptest.c b/bb-tests/workloads/src/CTest/toy/bbfptest.c deleted file mode 100644 index 5f4e37ff..00000000 --- a/bb-tests/workloads/src/CTest/toy/bbfptest.c +++ /dev/null @@ -1,80 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include -#include -#include - -// Test matrices -static elem_t input_matrix[DIM * DIM] __attribute__((aligned(64))); -static elem_t weight_matrix[DIM * DIM] __attribute__((aligned(64))); -static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); - -// Utility function -void print_result_matrix(const char *name, result_t *matrix, int rows, - int cols) { - printf("Matrix %s:\n", name); - for (int i = 0; i < rows; i++) { - for (int j = 0; j < cols; j++) { - printf("%4d ", (int32_t)matrix[i * cols + j]); - } - printf("\n"); - } - printf("\n"); -} - -void init_matrixv2(elem_t *matrix, int rows, int cols, int seed, int value) { - for (int i = 0; i < rows * cols; i++) { - matrix[i] = value; - } -} - -int compare_matrices(result_t *a, result_t *b, int rows, int cols) { - for (int i = 0; i < rows * cols; i++) { - if (a[i] != b[i]) { - return 
0; // Matrices are different - } - } - return 1; // Matrices are the same -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); // Only allow specified hart to continue -#endif - - // Initialize input matrix - init_matrixv2(input_matrix, 16, 16, 42, 3); - init_matrixv2(weight_matrix, 16, 16, 42, 2); - // Clear output matrix - memset(output_matrix, 0, sizeof(output_matrix)); - - // print_matrix("Input", input_matrix, DIM, DIM); - - // Move input to scratchpad - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - - bb_mvin((uintptr_t)input_matrix, op1_addr, DIM, 1); - bb_mvin((uintptr_t)weight_matrix, op2_addr, DIM, 1); - printf("Perform Matmul\n"); - bb_bbfp_mul(op1_addr, op2_addr, wr_addr, DIM); - - printf("change"); - bb_matmul_ws(wr_addr, op2_addr, wr_addr, 16); - init_matrixv2(input_matrix, 16, 16, 42, 4); - bb_matmul_ws(wr_addr, op2_addr, wr_addr, 16); - printf("Matmul Done\n"); - bb_mvout(((uintptr_t)output_matrix), wr_addr, DIM << 2, 1); - - print_result_matrix("Output", output_matrix, DIM, DIM); - -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/bbsim.ld b/bb-tests/workloads/src/CTest/toy/bbsim.ld new file mode 100644 index 00000000..7d8e660d --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/bbsim.ld @@ -0,0 +1,45 @@ +/* bbsim.ld — baremetal linker script for BBSimHarness (DRAM at 0x8000_0000) */ +/* Bootrom jumps directly to 0x80000000 (ELF entry), no _start needed. */ +OUTPUT_ARCH("riscv") +ENTRY(main) + +SECTIONS { + . 
= 0x80000000; + + .text : { + *(.text.init) /* crt0.S _start must be first */ + *(.text.startup .text.startup.*) + *(.text .text.*) + *(.gnu.linkonce.t.*) + } + + .rodata : { + *(.rodata .rodata.*) + *(.srodata .srodata.*) + *(.gnu.linkonce.r.*) + } + + .data : { + *(.data .data.*) + *(.sdata .sdata.*) + *(.gnu.linkonce.d.*) + *(.gnu.linkonce.s.*) + PROVIDE(_edata = .); + } + + /* All zero-init sections: crt0.S clears __bss_start .. __bss_end, + so heap (at _end) starts after ALL bss including sbss. */ + .bss (NOLOAD) : { + PROVIDE(__bss_start = .); + *(.bss .bss.*) + *(.sbss .sbss.*) + *(.gnu.linkonce.b.*) + *(.gnu.linkonce.sb.*) + *(COMMON) + PROVIDE(__bss_end = .); + } + + . = ALIGN(16); + PROVIDE(end = .); + PROVIDE(_end = .); +} diff --git a/bb-tests/workloads/src/CTest/toy/bdb_backdoor_test.c b/bb-tests/workloads/src/CTest/toy/bdb_backdoor_test.c new file mode 100644 index 00000000..be04c941 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/bdb_backdoor_test.c @@ -0,0 +1,52 @@ +#include "buckyball.h" +#include +#include +#include +#include +#include + +// Test bdb_backdoor: SRAM backdoor write + read via DPI-C +// +// C++ side (ctrace.cc) generates test data: each 128-bit row is filled +// with (row*16 + byte_offset) & 0xFF pattern via generate_test_data(). +// +// Flow: +// 1. bdb_backdoor_write: C++ generates (row, data) per iteration via DPI-C, +// RTL writes to external bank 0 +// 2. bdb_backdoor_read: RTL reads bank 0, sends data back via DPI-C, +// C++ logs to bdb.log as [BANK-TRACE] +// 3. 
mvout + fence: verify data was actually written (no hang = pass) + +#define DIM 16 + +static elem_t output_matrix[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + // Build expected pattern (must match C++ generate_test_data): + // row i, byte j -> (i*16 + j) & 0xFF + + uint32_t bank_id = 0; + bb_mem_alloc(bank_id, 1, 1); + + // C++ injects DIM rows into bank 0 via DPI-C (data decided by C++) + bdb_backdoor_write(bank_id, DIM); + + // Dump bank 0 contents to bdb.log via DPI-C + bdb_backdoor_read(bank_id, DIM); + + // mvout to verify data integrity + clear_i8_matrix(output_matrix, DIM, DIM); + bb_mvout((uintptr_t)output_matrix, bank_id, DIM, 1); + bb_fence(); + + printf("bdb_backdoor test PASSED\n"); + return 0; + +#ifdef MULTICORE + exit(0); +#endif +} diff --git a/bb-tests/workloads/src/CTest/toy/bdb_counter_test.c b/bb-tests/workloads/src/CTest/toy/bdb_counter_test.c new file mode 100644 index 00000000..5acc8693 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/bdb_counter_test.c @@ -0,0 +1,65 @@ +#include "buckyball.h" +#include +#include +#include +#include +#include + +// Test bdb_counter: start/stop/read cycle counters +// Verification: if the instruction reaches TraceBall and completes without +// hanging, the test passes. The actual [CTRACE] output goes to bdb.log. 
+ +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== bdb_counter test ===\n"); + + // Test 1: basic start/stop on counter 0 + printf("Test 1: basic start/stop\n"); + bdb_counter_start(0, 0xA001); + // Do some work to burn cycles + volatile int x = 0; + for (int i = 0; i < 10; i++) x += i; + bdb_counter_stop(0); + printf("Test 1 PASSED\n"); + + // Test 2: read without stopping + printf("Test 2: start/read/stop\n"); + bdb_counter_start(1, 0xA002); + volatile int y = 0; + for (int i = 0; i < 5; i++) y += i; + bdb_counter_read(1); + bdb_counter_stop(1); + printf("Test 2 PASSED\n"); + + // Test 3: nested counters (two levels) + printf("Test 3: nested counters\n"); + bdb_counter_start(0, 0xB001); // outer + bdb_counter_start(1, 0xB002); // inner + volatile int z = 0; + for (int i = 0; i < 5; i++) z += i; + bdb_counter_stop(1); // inner done + bdb_counter_stop(0); // outer done + printf("Test 3 PASSED\n"); + + // Test 4: multiple independent counters + printf("Test 4: multiple counters\n"); + bdb_counter_start(0, 0xC000); + bdb_counter_start(1, 0xC001); + bdb_counter_start(2, 0xC002); + bdb_counter_start(3, 0xC003); + bdb_counter_stop(3); + bdb_counter_stop(2); + bdb_counter_stop(1); + bdb_counter_stop(0); + printf("Test 4 PASSED\n"); + + printf("bdb_counter test PASSED\n"); + return 0; + +#ifdef MULTICORE + exit(0); +#endif +} diff --git a/bb-tests/workloads/src/CTest/toy/bfp_test.c b/bb-tests/workloads/src/CTest/toy/bfp_test.c new file mode 100644 index 00000000..a25c9e20 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/bfp_test.c @@ -0,0 +1,78 @@ +#include "buckyball.h" +#include +#include +#include +#include + +#define DIM 16 + +static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); +static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); +static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); + +void hw_matmul(const 
char *test_name, elem_t *a, elem_t *b, result_t *c, + int size) { + // static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); + // transpose_u8_matrix(a, a_transposed, size, size); + // spad0: operand A, offset 0 + uint32_t op1_bank_id = 0; + // spad1: operand B, offset 0 + uint32_t op2_bank_id = 1; + // acc0: write to accumulator, offset 0 + int acc_bank_id = 2; // virtual bank id + // bb_mem_alloc(acc_bank_id, 1, 4); + + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + + bb_mvin((uintptr_t)a, op1_bank_id, DIM, 1); + bb_mvin((uintptr_t)b, op2_bank_id, DIM, 1); + + bb_BFP(op1_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, size << 2, 1); + bb_fence(); +} + +int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { + cpu_matmul(a, b, expected_matrix, size, size, size); + hw_matmul(test_name, a, b, output_matrix, size); + + if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { + printf("Test %s PASSED\n", test_name); + return 1; + } else { + printf("Test %s FAILED\n", test_name); + return 0; + } + return 1; +} + +int test_ones() { + /* + init_sequence_matrix(input_matrix_a, DIM, DIM); + init_sequence_matrix(input_matrix_b, DIM, DIM); + */ + init_u8_random_matrix(input_matrix_a, DIM, DIM, 111); + init_u8_random_matrix(input_matrix_b, DIM, DIM, 222); + return run_test("BFP Matmul", input_matrix_a, input_matrix_b, DIM); +} + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + int passed = test_ones(); + if (passed) { + printf("BFP Matmul test PASSED\n"); + return 0; + } else { + printf("BFP Matmul test FAILED\n"); + return 1; + } + +#ifdef MULTICORE + exit(0); +#endif +} diff --git a/bb-tests/workloads/src/CTest/toy/buckyball.c b/bb-tests/workloads/src/CTest/toy/buckyball.c index d8a61d26..331adb1c 100644 --- a/bb-tests/workloads/src/CTest/toy/buckyball.c +++ b/bb-tests/workloads/src/CTest/toy/buckyball.c @@ -1,10 +1,12 @@ 
#include "buckyball.h" #include -#include +#include #include #include #include +#define DIM (BANK_WIDTH / sizeof(elem_t)) + /* Read cycle counter (rdcycle) helper. Works on RV64 with a single rdcycle. On RV32 we read low/high and detect rollover to produce a 64-bit value. */ unsigned long long read_rdcycle(void) { @@ -31,6 +33,14 @@ void init_u8_random_matrix(elem_t *matrix, int rows, int cols, int seed) { } } +// Initialize matrix with incrementing values for debugging +void init_u8_incremental_matrix(elem_t *matrix, int rows, int cols, + int start_value) { + for (int i = 0; i < rows * cols; i++) { + matrix[i] = (start_value + i) & 0xFF; // Keep values in 0-255 range + } +} + void init_u32_random_matrix(result_t *matrix, int rows, int cols, int seed) { srand(seed); for (int i = 0; i < rows * cols; i++) { @@ -263,3 +273,32 @@ unsigned long long read_cycle(void) { asm volatile("csrr %0, cycle" : "=r"(c)); return c; } + +// MMIO stubs are for baremetal/BBSim only. +// Linux user-mode tests (`*-linux` under `spike pk`) must use libc/syscall exit +// path. +#if !defined(__linux__) +// MMIO address map (BBSimHarness, WithDefaultMMIOPort base=0x6000_0000): +// 0x6000_0000 : simulation exit — write triggers sim_exit() +// 0x6002_0000 : UART0 TX — write low byte → putchar in C++ +#define MMIO_SIM_EXIT ((volatile uint32_t *)0x60000000UL) +#define MMIO_UART_TX ((volatile uint32_t *)0x60020000UL) + +// _write: route stdout/stderr through MMIO UART so printf works in simulation. +// nosys.specs provides a weak _write stub; we override it here. +int _write(int fd, const char *buf, int len) { + (void)fd; + for (int i = 0; i < len; i++) { + *MMIO_UART_TX = (uint32_t)(unsigned char)buf[i]; + } + return len; +} + +// _exit: write exit code to MMIO sim-exit register; C++ mmio_tick() detects +// this and calls sim_exit(). 
+void __attribute__((noreturn)) _exit(int code) { + *MMIO_SIM_EXIT = (uint32_t)code; + while (1) { + } // wait for C++ to process the MMIO write and call sim_exit() +} +#endif diff --git a/bb-tests/workloads/src/CTest/toy/buckyball.h b/bb-tests/workloads/src/CTest/toy/buckyball.h index a3449960..0ab1d148 100644 --- a/bb-tests/workloads/src/CTest/toy/buckyball.h +++ b/bb-tests/workloads/src/CTest/toy/buckyball.h @@ -44,6 +44,8 @@ void print_i32_matrix(const char *name, result_t *matrix, int rows, int cols); void print_i8_matrix(const char *name, elem_t *matrix, int rows, int cols); void init_u8_random_matrix(elem_t *matrix, int rows, int cols, int seed); +void init_u8_incremental_matrix(elem_t *matrix, int rows, int cols, + int start_value); void init_u32_random_matrix(result_t *matrix, int rows, int cols, int seed); void init_i8_random_matrix(elem_t *matrix, int rows, int cols, int seed); void init_i32_random_matrix(result_t *matrix, int rows, int cols, int seed); @@ -77,4 +79,5 @@ void cpu_relu(elem_t *a, elem_t *matrix, int rows, int cols); void cpu_transfer(elem_t *src, elem_t *dst, int rows, int cols); unsigned long long read_cycle(void); + #endif diff --git a/bb-tests/workloads/src/CTest/toy/cim_test.c b/bb-tests/workloads/src/CTest/toy/cim_test.c deleted file mode 100644 index 430bc2d0..00000000 --- a/bb-tests/workloads/src/CTest/toy/cim_test.c +++ /dev/null @@ -1,170 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include -#include - -#define OP1_SIZE 64 -#define OP2_SIZE 64 -#define RESULT_SIZE 64 - -static elem_t operand1[OP1_SIZE] __attribute__((aligned(64))); -static elem_t operand2[OP2_SIZE] __attribute__((aligned(64))); -static elem_t result[RESULT_SIZE] __attribute__((aligned(64))); -static elem_t expected_result[RESULT_SIZE] __attribute__((aligned(64))); - -// CPU reference computation for CIM operations -int cim_cpu_reference(elem_t *op1, elem_t *op2, elem_t *result, int rows, - int cols, int op_type) { - // op_type: 0=matmul, 1=add, 
2=mul - if (op_type == 0) { - // Matrix multiplication: result = op1 * op2 - // Assume op1 is rows x cols, op2 is cols x cols - for (int i = 0; i < rows; i++) { - for (int j = 0; j < cols; j++) { - int32_t sum = 0; - for (int k = 0; k < cols; k++) { - sum += (int32_t)op1[i * cols + k] * (int32_t)op2[k * cols + j]; - } - // Clamp to int8_t range - if (sum > 127) { - result[i * cols + j] = 127; - } else if (sum < -128) { - result[i * cols + j] = -128; - } else { - result[i * cols + j] = (elem_t)sum; - } - } - } - } else if (op_type == 1) { - // Element-wise addition - for (int i = 0; i < rows * cols; i++) { - int32_t sum = (int32_t)op1[i] + (int32_t)op2[i]; - if (sum > 127) { - result[i] = 127; - } else if (sum < -128) { - result[i] = -128; - } else { - result[i] = (elem_t)sum; - } - } - } else if (op_type == 2) { - // Element-wise multiplication - for (int i = 0; i < rows * cols; i++) { - int32_t prod = (int32_t)op1[i] * (int32_t)op2[i]; - if (prod > 127) { - result[i] = 127; - } else if (prod < -128) { - result[i] = -128; - } else { - result[i] = (elem_t)prod; - } - } - } - return 1; -} - -void hw_cim(const char *test_name, elem_t *op1, elem_t *op2, elem_t *result, - int rows, int cols, int op_type) { - // Operand 1 in spad bank 0, operand 2 in spad bank 1, result in spad bank 2 - uint32_t op1_addr = spad_addr(0, 0); - uint32_t op2_addr = spad_addr(1, 0); - uint32_t result_addr = spad_addr(2, 0); - - // Move operand 1 into scratchpad - bb_mvin((uintptr_t)op1, op1_addr, OP1_SIZE, 1); - bb_fence(); - - // Move operand 2 into scratchpad - bb_mvin((uintptr_t)op2, op2_addr, OP2_SIZE, 1); - bb_fence(); - - // Call CIM instruction - // iter is the number of iterations (simplified: use rows*cols for now) - uint32_t iter = rows * cols; - bb_cim(op1_addr, op2_addr, result_addr, iter, rows, cols, op_type); - bb_fence(); - - // Result will be moved back in run_test for verification -} - -int run_test(const char *test_name, elem_t *op1, elem_t *op2, elem_t *result, - int rows, 
int cols, int op_type) { - // CPU reference computation - cim_cpu_reference(op1, op2, expected_result, rows, cols, op_type); - - // Hardware computation - hw_cim(test_name, op1, op2, result, rows, cols, op_type); - - // Move result back from scratchpad for verification - uint32_t result_addr = spad_addr(2, 0); - bb_mvout((uintptr_t)result, result_addr, RESULT_SIZE, 1); - bb_fence(); - - // Verify results - int passed = 1; - for (int i = 0; i < rows; i++) { - for (int j = 0; j < cols; j++) { - int idx = i * cols + j; - if (result[idx] != expected_result[idx]) { - printf("Mismatch at [%d][%d]: expected %d, got %d\n", i, j, - expected_result[idx], result[idx]); - passed = 0; - } - } - } - - return passed; -} - -int test_cim(int seed) { - // Initialize operands with random values (8x8 matrices) - int rows = 8; - int cols = 8; - for (int i = 0; i < OP1_SIZE; i++) { - operand1[i] = (elem_t)(rand() % 256 - 128); - } - for (int i = 0; i < OP2_SIZE; i++) { - operand2[i] = (elem_t)(rand() % 256 - 128); - } - - // Test matrix multiplication (op_type = 0) - int passed = - run_test("CIM-MATMUL", operand1, operand2, result, rows, cols, 0); - if (!passed) { - return 0; - } - - // Test element-wise addition (op_type = 1) - passed = run_test("CIM-ADD", operand1, operand2, result, rows, cols, 1); - if (!passed) { - return 0; - } - - // Test element-wise multiplication (op_type = 2) - passed = run_test("CIM-MUL", operand1, operand2, result, rows, cols, 2); - if (!passed) { - return 0; - } - - return passed; -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - - int passed = test_cim(5); - if (passed) { - printf("CIM test PASSED!!!!\n"); - } else { - printf("CIM test FAILED\n"); - } - return (!passed); - -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/conv_test.c b/bb-tests/workloads/src/CTest/toy/conv_test.c deleted file mode 100644 index 785cce12..00000000 --- a/bb-tests/workloads/src/CTest/toy/conv_test.c +++ /dev/null @@ 
-1,141 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include -#include - -#define IFMAP_SIZE 64 -#define WEIGHT_SIZE 16 -#define OFMAP_SIZE 16 - -static elem_t input_feature_map[IFMAP_SIZE] __attribute__((aligned(64))); -static elem_t weights[WEIGHT_SIZE] __attribute__((aligned(64))); -static elem_t output_feature_map[OFMAP_SIZE] __attribute__((aligned(64))); - -// CPU reference computation for simple convolution -int conv_cpu_reference(elem_t *ifmap, elem_t *weight, elem_t *ofmap, int in_h, - int in_w, int kernel_h, int kernel_w) { - // Simplified 2D convolution: assume stride=1, pad=0 - int out_h = in_h - kernel_h + 1; - int out_w = in_w - kernel_w + 1; - - for (int oh = 0; oh < out_h; oh++) { - for (int ow = 0; ow < out_w; ow++) { - int32_t sum = 0; - for (int kh = 0; kh < kernel_h; kh++) { - for (int kw = 0; kw < kernel_w; kw++) { - int ih = oh + kh; - int iw = ow + kw; - int ifmap_idx = ih * in_w + iw; - int weight_idx = kh * kernel_w + kw; - sum += (int32_t)ifmap[ifmap_idx] * (int32_t)weight[weight_idx]; - } - } - // Clamp to int8_t range - if (sum > 127) { - ofmap[oh * out_w + ow] = 127; - } else if (sum < -128) { - ofmap[oh * out_w + ow] = -128; - } else { - ofmap[oh * out_w + ow] = (elem_t)sum; - } - } - } - return 1; -} - -void hw_conv(const char *test_name, elem_t *ifmap, elem_t *weight, - elem_t *ofmap, int in_h, int in_w, int kernel_h, int kernel_w) { - // Input feature map in spad bank 0, weights in spad bank 1, output in spad - // bank 2 - uint32_t ifmap_addr = spad_addr(0, 0); - uint32_t weight_addr = spad_addr(1, 0); - uint32_t ofmap_addr = spad_addr(2, 0); - - // Move input feature map into scratchpad - bb_mvin((uintptr_t)ifmap, ifmap_addr, IFMAP_SIZE, 1); - bb_fence(); - - // Move weights into scratchpad - bb_mvin((uintptr_t)weight, weight_addr, WEIGHT_SIZE, 1); - bb_fence(); - - // Call CONV instruction - // iter is the number of iterations (simplified: use 1 for now) - uint32_t iter = 1; - bb_conv(ifmap_addr, weight_addr, 
ofmap_addr, iter, in_h, in_w, kernel_h, - kernel_w); - bb_fence(); - - // Result will be moved back in run_test for verification -} - -int run_test(const char *test_name, elem_t *ifmap, elem_t *weight, - elem_t *ofmap, int in_h, int in_w, int kernel_h, int kernel_w) { - // CPU reference computation - conv_cpu_reference(ifmap, weight, ofmap, in_h, in_w, kernel_h, kernel_w); - - // Hardware computation - hw_conv(test_name, ifmap, weight, ofmap, in_h, in_w, kernel_h, kernel_w); - - // Move result back from scratchpad for verification - uint32_t ofmap_addr = spad_addr(2, 0); - bb_mvout((uintptr_t)output_feature_map, ofmap_addr, OFMAP_SIZE, 1); - bb_fence(); - - // Verify results - int out_h = in_h - kernel_h + 1; - int out_w = in_w - kernel_w + 1; - int passed = 1; - for (int i = 0; i < out_h; i++) { - for (int j = 0; j < out_w; j++) { - int idx = i * out_w + j; - if (output_feature_map[idx] != ofmap[idx]) { - printf("Mismatch at [%d][%d]: expected %d, got %d\n", i, j, ofmap[idx], - output_feature_map[idx]); - passed = 0; - } - } - } - - return passed; -} - -int test_conv(int seed) { - // Initialize input feature map with random values (8x8 image) - int in_h = 8; - int in_w = 8; - for (int i = 0; i < IFMAP_SIZE; i++) { - input_feature_map[i] = (elem_t)(rand() % 256 - 128); - } - - // Initialize weights with random values (3x3 kernel) - int kernel_h = 3; - int kernel_w = 3; - for (int i = 0; i < WEIGHT_SIZE; i++) { - weights[i] = (elem_t)(rand() % 256 - 128); - } - - // Run hardware test with verification - return run_test("CONV", input_feature_map, weights, output_feature_map, in_h, - in_w, kernel_h, kernel_w); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - - int passed = test_conv(5); - if (passed) { - printf("CONV test PASSED!!!!\n"); - } else { - printf("CONV test FAILED\n"); - } - return (!passed); - -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/crt0.S b/bb-tests/workloads/src/CTest/toy/crt0.S new file 
mode 100644 index 00000000..9e143f3a --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/crt0.S @@ -0,0 +1,25 @@ +/* crt0.S — minimal startup for BBSimHarness baremetal + * Bootrom jumps to 0x80000000 (this code). + * Sets up stack, clears BSS, then calls main. */ +.section .text.init,"ax",@progbits +.global _start +_start: + /* stack pointer: use a fixed address well above code */ + li sp, 0x80400000 + + /* clear BSS */ + la a0, __bss_start + la a1, __bss_end +bss_loop: + bgeu a0, a1, bss_done + sd zero, 0(a0) + addi a0, a0, 8 + j bss_loop +bss_done: + + call main + + /* if main returns, write 0 to sim_exit */ + li a0, 0x60000000 + sw zero, 0(a0) +1: j 1b diff --git a/bb-tests/workloads/src/CTest/toy/dequant_test.c b/bb-tests/workloads/src/CTest/toy/dequant_test.c new file mode 100644 index 00000000..8d8ff0bc --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/dequant_test.c @@ -0,0 +1,87 @@ +#include "buckyball.h" +#include +#include +#include +#include +#include + +#define DIM 16 + +// INT32 input data (4 elements per SRAM word, 16 words) +static int32_t int32_input[DIM * 4] __attribute__((aligned(64))) = { + 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 7, 100, -100, 8, 16, -8, + 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 7, 100, -100, 8, 16, -8, + 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 7, 100, -100, 8, 16, -8, + 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 7, 100, -100, 8, 16, -8, +}; + +// Expected FP32 output as bit patterns: int32_val * 1.0 = float(int32_val) +// 1.0=0x3F800000, 2.0=0x40000000, 3.0=0x40400000, -1.0=0xBF800000 +// -2.0=0xC0000000, 0.0=0x00000000, 4.0=0x40800000, 5.0=0x40A00000 +// 10.0=0x41200000, -10.0=0xC1200000, 7.0=0x40E00000, 100.0=0x42C80000 +// -100.0=0xC2C80000, 8.0=0x41000000, 16.0=0x41800000, -8.0=0xC1000000 +static uint32_t expected_fp32[DIM * 4] __attribute__((aligned(64))) = { + 0x3F800000, 0x40000000, 0x40400000, 0xBF800000, 0xC0000000, 0x00000000, + 0x40800000, 0x40A00000, 0x41200000, 0xC1200000, 0x40E00000, 0x42C80000, + 0xC2C80000, 0x41000000, 0x41800000, 
0xC1000000, 0x3F800000, 0x40000000, + 0x40400000, 0xBF800000, 0xC0000000, 0x00000000, 0x40800000, 0x40A00000, + 0x41200000, 0xC1200000, 0x40E00000, 0x42C80000, 0xC2C80000, 0x41000000, + 0x41800000, 0xC1000000, 0x3F800000, 0x40000000, 0x40400000, 0xBF800000, + 0xC0000000, 0x00000000, 0x40800000, 0x40A00000, 0x41200000, 0xC1200000, + 0x40E00000, 0x42C80000, 0xC2C80000, 0x41000000, 0x41800000, 0xC1000000, + 0x3F800000, 0x40000000, 0x40400000, 0xBF800000, 0xC0000000, 0x00000000, + 0x40800000, 0x40A00000, 0x41200000, 0xC1200000, 0x40E00000, 0x42C80000, + 0xC2C80000, 0x41000000, 0x41800000, 0xC1000000, +}; + +static uint32_t output_fp32[DIM * 4] __attribute__((aligned(64))); + +// FP32 bit pattern for scale = 1.0 +#define SCALE_1_0 0x3F800000U + +void hw_dequant(int32_t *input, uint32_t *output, int num_words) { + uint32_t op1_bank_id = 0; + uint32_t wr_bank_id = 1; + + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(wr_bank_id, 1, 1); + + bb_mvin((uintptr_t)input, op1_bank_id, num_words, 1); + + bb_dequant(op1_bank_id, wr_bank_id, num_words, SCALE_1_0); + + bb_mvout((uintptr_t)output, wr_bank_id, num_words, 1); + bb_fence(); +} + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + for (int i = 0; i < DIM * 4; i++) { + output_fp32[i] = 0; + } + + hw_dequant(int32_input, output_fp32, DIM); + + int passed = 1; + for (int i = 0; i < DIM * 4; i++) { + if (output_fp32[i] != expected_fp32[i]) { + printf("MISMATCH at [%d]: got 0x%08X, expected 0x%08X\n", i, + output_fp32[i], expected_fp32[i]); + passed = 0; + } + } + + if (passed) { + printf("Dequant test PASSED\n"); + } else { + printf("Dequant test FAILED\n"); + } + return (!passed); + +#ifdef MULTICORE + exit(0); +#endif +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_atranspose_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_atranspose_test.c new file mode 100644 index 00000000..f9d80866 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_atranspose_test.c @@ -0,0 
+1,43 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_at[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini OS CISC a_transpose Test ===\n"); + + init_u8_random_matrix(mat_a, DIM, DIM, 42); + init_u8_random_matrix(mat_b, DIM, DIM, 84); + transpose_u8_matrix(mat_a, mat_at, DIM, DIM); + cpu_matmul(mat_at, mat_b, expected, DIM, DIM, DIM); + + bb_gemmini_config(0, 0, 1, 0, 0); + bb_gemmini_loop_ws_config_bounds(1, 1, 1); + bb_gemmini_loop_ws_config_addr_a((uintptr_t)mat_a); + bb_gemmini_loop_ws_config_addr_b((uintptr_t)mat_b); + bb_gemmini_loop_ws_config_addr_d(0); + bb_gemmini_loop_ws_config_addr_c((uintptr_t)mat_c); + bb_gemmini_loop_ws_config_strides_ab(DIM, DIM); + bb_gemmini_loop_ws_config_strides_dc(0, DIM * 4); + bb_gemmini_loop_ws(0, 1, 2, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini OS CISC a_transpose Test PASSED\n"); + return 0; + } + printf("Gemmini OS CISC a_transpose Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_basic_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_basic_test.c new file mode 100644 index 00000000..a0f0ea1d --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_basic_test.c @@ -0,0 +1,45 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_a_t[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * 
DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini OS CISC Basic Test ===\n"); + + init_u8_random_matrix(mat_a, DIM, DIM, 42); + init_u8_random_matrix(mat_b, DIM, DIM, 84); + // OS mode (transposer disabled): mesh computes A_loaded^T * B. + // Pre-transpose A so result = mat_a_t^T * mat_b = mat_a * mat_b. + transpose_u8_matrix(mat_a, mat_a_t, DIM, DIM); + cpu_matmul(mat_a, mat_b, expected, DIM, DIM, DIM); + + bb_gemmini_config(0, 0, 0, 0, 0); + bb_gemmini_loop_ws_config_bounds(1, 1, 1); + bb_gemmini_loop_ws_config_addr_a((uintptr_t)mat_a_t); + bb_gemmini_loop_ws_config_addr_b((uintptr_t)mat_b); + bb_gemmini_loop_ws_config_addr_d(0); + bb_gemmini_loop_ws_config_addr_c((uintptr_t)mat_c); + bb_gemmini_loop_ws_config_strides_ab(DIM, DIM); + bb_gemmini_loop_ws_config_strides_dc(0, DIM * 4); + bb_gemmini_loop_ws(0, 1, 2, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini OS CISC Basic Test PASSED\n"); + return 0; + } + printf("Gemmini OS CISC Basic Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_btranspose_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_btranspose_test.c new file mode 100644 index 00000000..dc2d2575 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_btranspose_test.c @@ -0,0 +1,43 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_bt[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini OS CISC b_transpose Test ===\n"); + + init_u8_random_matrix(mat_a, DIM, DIM, 42); + 
init_u8_random_matrix(mat_b, DIM, DIM, 84); + transpose_u8_matrix(mat_b, mat_bt, DIM, DIM); + cpu_matmul(mat_a, mat_bt, expected, DIM, DIM, DIM); + + bb_gemmini_config(0, 0, 0, 1, 0); + bb_gemmini_loop_ws_config_bounds(1, 1, 1); + bb_gemmini_loop_ws_config_addr_a((uintptr_t)mat_a); + bb_gemmini_loop_ws_config_addr_b((uintptr_t)mat_b); + bb_gemmini_loop_ws_config_addr_d(0); + bb_gemmini_loop_ws_config_addr_c((uintptr_t)mat_c); + bb_gemmini_loop_ws_config_strides_ab(DIM, DIM); + bb_gemmini_loop_ws_config_strides_dc(0, DIM * 4); + bb_gemmini_loop_ws(0, 1, 2, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini OS CISC b_transpose Test PASSED\n"); + return 0; + } + printf("Gemmini OS CISC b_transpose Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_shift_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_shift_test.c new file mode 100644 index 00000000..d66bd470 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_os_cisc_shift_test.c @@ -0,0 +1,44 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 +#define SHIFT 4 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini OS CISC in_shift Test ===\n"); + + init_u8_random_matrix(mat_a, DIM, DIM, 42); + init_u8_random_matrix(mat_b, DIM, DIM, 84); + cpu_matmul(mat_a, mat_b, expected, DIM, DIM, DIM); + for (int i = 0; i < DIM * DIM; i++) + expected[i] >>= SHIFT; + + bb_gemmini_config(0, 0, 0, 0, SHIFT); + bb_gemmini_loop_ws_config_bounds(1, 1, 1); + bb_gemmini_loop_ws_config_addr_a((uintptr_t)mat_a); + bb_gemmini_loop_ws_config_addr_b((uintptr_t)mat_b); + bb_gemmini_loop_ws_config_addr_d(0); + 
bb_gemmini_loop_ws_config_addr_c((uintptr_t)mat_c); + bb_gemmini_loop_ws_config_strides_ab(DIM, DIM); + bb_gemmini_loop_ws_config_strides_dc(0, DIM * 4); + bb_gemmini_loop_ws(0, 1, 2, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini OS CISC in_shift Test PASSED\n"); + return 0; + } + printf("Gemmini OS CISC in_shift Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_abtranspose_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_abtranspose_test.c new file mode 100644 index 00000000..c15dbe32 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_abtranspose_test.c @@ -0,0 +1,46 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_at[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_bt[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini OS RISC ab_transpose Test ===\n"); + + init_u8_random_matrix(mat_a, DIM, DIM, 42); + init_u8_random_matrix(mat_b, DIM, DIM, 84); + transpose_u8_matrix(mat_a, mat_at, DIM, DIM); + transpose_u8_matrix(mat_b, mat_bt, DIM, DIM); + cpu_matmul(mat_at, mat_bt, expected, DIM, DIM, DIM); + + uint32_t bank_a = 0, bank_b = 1, bank_c = 2; + bb_mem_alloc(bank_a, 1, 1); + bb_mem_alloc(bank_b, 1, 1); + bb_mem_alloc(bank_c, 1, 4); + bb_mvin((uintptr_t)mat_a, bank_a, DIM, 1); + bb_mvin((uintptr_t)mat_b, bank_b, DIM, 1); + bb_gemmini_config(0, 0, 1, 1, 0); + bb_gemmini_preload(bank_a, bank_c, DIM); + bb_gemmini_compute_preloaded(bank_a, bank_b, bank_c, DIM); + bb_mvout((uintptr_t)mat_c, bank_c, DIM, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, 
DIM, DIM)) { + printf("Gemmini OS RISC ab_transpose Test PASSED\n"); + return 0; + } + printf("Gemmini OS RISC ab_transpose Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_atranspose_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_atranspose_test.c new file mode 100644 index 00000000..c4bbddcb --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_atranspose_test.c @@ -0,0 +1,44 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_at[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini OS RISC a_transpose Test ===\n"); + + init_u8_random_matrix(mat_a, DIM, DIM, 42); + init_u8_random_matrix(mat_b, DIM, DIM, 84); + transpose_u8_matrix(mat_a, mat_at, DIM, DIM); + cpu_matmul(mat_at, mat_b, expected, DIM, DIM, DIM); + + uint32_t bank_a = 0, bank_b = 1, bank_c = 2; + bb_mem_alloc(bank_a, 1, 1); + bb_mem_alloc(bank_b, 1, 1); + bb_mem_alloc(bank_c, 1, 4); + bb_mvin((uintptr_t)mat_a, bank_a, DIM, 1); + bb_mvin((uintptr_t)mat_b, bank_b, DIM, 1); + bb_gemmini_config(0, 0, 1, 0, 0); + bb_gemmini_preload(bank_a, bank_c, DIM); + bb_gemmini_compute_preloaded(bank_a, bank_b, bank_c, DIM); + bb_mvout((uintptr_t)mat_c, bank_c, DIM, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini OS RISC a_transpose Test PASSED\n"); + return 0; + } + printf("Gemmini OS RISC a_transpose Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_basic_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_basic_test.c new file mode 100644 index 00000000..bdde74fb --- /dev/null +++ 
b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_basic_test.c @@ -0,0 +1,68 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_a_t[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static elem_t zeros[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + printf("=== Gemmini OS RISC Basic Test ===\n"); + + for (int i = 0; i < DIM * DIM; i++) { + mat_a[i] = (elem_t)((i + 1) % 128); + mat_b[i] = (elem_t)((2 * (i + 1)) % 128); + } + transpose_u8_matrix(mat_a, mat_a_t, DIM, DIM); + // Hardware (transposer disabled) computes mat_a_t^T * mat_b = mat_a * mat_b + cpu_matmul(mat_a, mat_b, expected, DIM, DIM, DIM); + + uint32_t bank_a = 0, bank_b = 1, bank_c = 2, bank_d_zeros = 3; + bb_mem_alloc(bank_a, 1, 1); + bb_mem_alloc(bank_b, 1, 1); + bb_mem_alloc(bank_c, 1, 4); + bb_mem_alloc(bank_d_zeros, 1, 1); + bb_mvin((uintptr_t)mat_a_t, bank_a, DIM, 1); + bb_mvin((uintptr_t)mat_b, bank_b, DIM, 1); + bb_mvin((uintptr_t)zeros, bank_d_zeros, DIM, 1); + bb_gemmini_config(0, 0, 0, 0, 0); + /* Preload D from zeros bank so C = A*B + D = A*B (not A*B + A) */ + bb_gemmini_preload(bank_d_zeros, bank_c, DIM); + bb_gemmini_compute_preloaded(bank_a, bank_b, bank_c, DIM); + bb_mvout((uintptr_t)mat_c, bank_c, DIM, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini OS RISC Basic Test PASSED\n"); + return 0; + } + + printf("got mat_c: "); + for (int i = 0; i < DIM; i++) { + printf("%d ", mat_c[i]); + } + printf("\n"); + // for (int i = 0; i < DIM; i++) { + // for (int j = 0; j < DIM; j++) { + // printf("%d ", mat_c[i * DIM + j]); + // } + // printf("\n"); + // } + + printf("exp row0: "); + for (int i = 0; i < 
DIM; i++) { + printf("%d ", expected[i]); + } + printf("\n"); + printf("Gemmini OS RISC Basic Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_btranspose_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_btranspose_test.c new file mode 100644 index 00000000..1aa09779 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_btranspose_test.c @@ -0,0 +1,44 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_bt[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini OS RISC b_transpose Test ===\n"); + + init_u8_random_matrix(mat_a, DIM, DIM, 42); + init_u8_random_matrix(mat_b, DIM, DIM, 84); + transpose_u8_matrix(mat_b, mat_bt, DIM, DIM); + cpu_matmul(mat_a, mat_bt, expected, DIM, DIM, DIM); + + uint32_t bank_a = 0, bank_b = 1, bank_c = 2; + bb_mem_alloc(bank_a, 1, 1); + bb_mem_alloc(bank_b, 1, 1); + bb_mem_alloc(bank_c, 1, 4); + bb_mvin((uintptr_t)mat_a, bank_a, DIM, 1); + bb_mvin((uintptr_t)mat_b, bank_b, DIM, 1); + bb_gemmini_config(0, 0, 0, 1, 0); + bb_gemmini_preload(bank_a, bank_c, DIM); + bb_gemmini_compute_preloaded(bank_a, bank_b, bank_c, DIM); + bb_mvout((uintptr_t)mat_c, bank_c, DIM, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini OS RISC b_transpose Test PASSED\n"); + return 0; + } + printf("Gemmini OS RISC b_transpose Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_shift_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_shift_test.c new file mode 100644 index 00000000..8921bb66 --- /dev/null +++ 
b/bb-tests/workloads/src/CTest/toy/gemmini_os_risc_shift_test.c @@ -0,0 +1,45 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 +#define SHIFT 4 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini OS RISC in_shift Test ===\n"); + + init_u8_random_matrix(mat_a, DIM, DIM, 42); + init_u8_random_matrix(mat_b, DIM, DIM, 84); + cpu_matmul(mat_a, mat_b, expected, DIM, DIM, DIM); + for (int i = 0; i < DIM * DIM; i++) + expected[i] >>= SHIFT; + + uint32_t bank_a = 0, bank_b = 1, bank_c = 2; + bb_mem_alloc(bank_a, 1, 1); + bb_mem_alloc(bank_b, 1, 1); + bb_mem_alloc(bank_c, 1, 4); + bb_mvin((uintptr_t)mat_a, bank_a, DIM, 1); + bb_mvin((uintptr_t)mat_b, bank_b, DIM, 1); + bb_gemmini_config(0, 0, 0, 0, SHIFT); + bb_gemmini_preload(bank_a, bank_c, DIM); + bb_gemmini_compute_preloaded(bank_a, bank_b, bank_c, DIM); + bb_mvout((uintptr_t)mat_c, bank_c, DIM, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini OS RISC in_shift Test PASSED\n"); + return 0; + } + printf("Gemmini OS RISC in_shift Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_ws_cisc_conv_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_ws_cisc_conv_test.c new file mode 100644 index 00000000..e19af1a0 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_ws_cisc_conv_test.c @@ -0,0 +1,71 @@ +#include "buckyball.h" +#include +#include +#include + +// Pointwise conv: batch=1, in_dim=1, in_channels=DIM, out_channels=DIM, +// kernel=1 Degenerates to a single DIM x DIM matmul (input[1x1xIC] * +// weight[1x1xICxOC]) +#define DIM 16 +#define BATCH 1 +#define IN_DIM 1 +#define OUT_DIM 1 +#define IN_CH DIM +#define OUT_CH DIM 
+#define KERNEL_DIM 1 + +// input layout: [batch, in_dim, in_dim, in_ch] = [1, 1, 1, 16] = 16 bytes +// weight layout: [krow, kcol, in_ch, out_ch] = [1, 1, 16, 16] = 256 bytes +// output layout: [batch, out_dim, out_dim, out_ch] = [1, 1, 1, 16], +// 4 bytes per element = 64 bytes +static elem_t input[BATCH * IN_DIM * IN_DIM * IN_CH] + __attribute__((aligned(64))); +static elem_t weight[KERNEL_DIM * KERNEL_DIM * IN_CH * OUT_CH] + __attribute__((aligned(64))); +static result_t output[BATCH * OUT_DIM * OUT_DIM * OUT_CH] + __attribute__((aligned(64))); +static result_t expected[BATCH * OUT_DIM * OUT_DIM * OUT_CH] + __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini WS CISC Loop Conv Test ===\n"); + + // Initialize: input = 1x16 row vector, weight = 16x16 matrix + init_u8_random_matrix(input, IN_CH, 1, 42); + init_u8_random_matrix(weight, IN_CH, OUT_CH, 84); + + // CPU reference: output[j] = sum_k(input[k] * weight[k][j]), j in [0, OUT_CH) + // cpu_matmul(A[M×K], B[K×N], C[M×N]), where A=input(1×IN_CH), + // B=weight(IN_CH×OUT_CH) + cpu_matmul(input, weight, expected, 1, OUT_CH, IN_CH); + + // Strides (all in bytes): + // input_stride = in_ch * elemSize = 16 * 1 = 16 (per spatial step) + // weight_stride = in_ch * out_ch * elemSize = 16 * 16 * 1 = 256 (per kernel) + // output_stride = out_ch * accBytes = 16 * 4 = 64 + bb_gemmini_config(0, 0, 0, 0, 0); + bb_gemmini_loop_conv_ws_config_1(BATCH, IN_DIM, IN_CH); + bb_gemmini_loop_conv_ws_config_2(OUT_CH, OUT_DIM, 1, 0); + bb_gemmini_loop_conv_ws_config_3(KERNEL_DIM, 0, 0, 0); + bb_gemmini_loop_conv_ws_config_4(0); // no bias + bb_gemmini_loop_conv_ws_config_5((uintptr_t)input); + bb_gemmini_loop_conv_ws_config_6((uintptr_t)weight); + bb_gemmini_loop_conv_ws_config_7((uintptr_t)output); + bb_gemmini_loop_conv_ws_config_8( + IN_CH, IN_CH * OUT_CH); // input_stride, weight_stride + bb_gemmini_loop_conv_ws_config_9(OUT_CH * 4); // output_stride (accBytes=4) +
bb_gemmini_loop_conv_ws( + 0, 1, 2, 1); // bank_input=0, bank_weight=1, bank_output=2, no_bias=1 + bb_fence(); + + if (compare_u32_matrices(output, expected, 1, OUT_CH)) { + printf("Gemmini WS CISC Loop Conv Test PASSED\n"); + return 0; + } + printf("Gemmini WS CISC Loop Conv Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_basic_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_basic_test.c new file mode 100644 index 00000000..66a71950 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_basic_test.c @@ -0,0 +1,47 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_d[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + printf("=== Gemmini WS RISC Basic Test ===\n"); + + for (int i = 0; i < DIM * DIM; i++) { + mat_a[i] = (elem_t)((i + 1) % 128); + mat_b[i] = (elem_t)((2 * (i + 1)) % 128); + } + clear_u8_matrix(mat_d, DIM, DIM); + cpu_matmul(mat_a, mat_b, expected, DIM, DIM, DIM); + + // WS: bank_w=weights(B), bank_a=activations(A), bank_d=bias(0), bank_c=output + uint32_t bank_w = 0, bank_a = 1, bank_d = 2, bank_c = 3; + bb_mem_alloc(bank_w, 1, 1); + bb_mem_alloc(bank_a, 1, 1); + bb_mem_alloc(bank_d, 1, 1); + bb_mem_alloc(bank_c, 1, 4); + bb_mvin((uintptr_t)mat_b, bank_w, DIM, 1); + bb_mvin((uintptr_t)mat_a, bank_a, DIM, 1); + bb_mvin((uintptr_t)mat_d, bank_d, DIM, 1); + bb_gemmini_config(1, 0, 0, 0, 0); + bb_gemmini_preload(bank_w, bank_c, DIM); + bb_gemmini_compute_preloaded(bank_a, bank_d, bank_c, DIM); + bb_mvout((uintptr_t)mat_c, bank_c, DIM, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini WS RISC 
Basic Test PASSED\n"); + return 0; + } + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_btranspose_test.c b/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_btranspose_test.c new file mode 100644 index 00000000..b7063a1a --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_btranspose_test.c @@ -0,0 +1,50 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_bt[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_d[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini WS RISC b_transpose Test ===\n"); + + for (int i = 0; i < DIM * DIM; i++) { + mat_a[i] = (elem_t)((i + 1) % 128); + mat_b[i] = (elem_t)((2 * (i + 1)) % 128); + } + transpose_u8_matrix(mat_b, mat_bt, DIM, DIM); + clear_u8_matrix(mat_d, DIM, DIM); + cpu_matmul(mat_a, mat_bt, expected, DIM, DIM, DIM); + + uint32_t bank_w = 0, bank_a = 1, bank_d = 2, bank_c = 3; + bb_mem_alloc(bank_w, 1, 1); + bb_mem_alloc(bank_a, 1, 1); + bb_mem_alloc(bank_d, 1, 1); + bb_mem_alloc(bank_c, 1, 4); + bb_mvin((uintptr_t)mat_b, bank_w, DIM, 1); + bb_mvin((uintptr_t)mat_a, bank_a, DIM, 1); + bb_mvin((uintptr_t)mat_d, bank_d, DIM, 1); + bb_gemmini_config(1, 0, 0, 1, 0); + bb_gemmini_preload(bank_w, bank_c, DIM); + bb_gemmini_compute_preloaded(bank_a, bank_d, bank_c, DIM); + bb_mvout((uintptr_t)mat_c, bank_c, DIM, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini WS RISC b_transpose Test PASSED\n"); + return 0; + } + printf("Gemmini WS RISC b_transpose Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_shift_test.c 
b/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_shift_test.c new file mode 100644 index 00000000..305a07f7 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/gemmini_ws_risc_shift_test.c @@ -0,0 +1,51 @@ +#include "buckyball.h" +#include +#include +#include + +#define DIM 16 +#define SHIFT 4 + +static elem_t mat_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_b[DIM * DIM] __attribute__((aligned(64))); +static elem_t mat_d[DIM * DIM] __attribute__((aligned(64))); +static result_t mat_c[DIM * DIM] __attribute__((aligned(64))); +static result_t expected[DIM * DIM] __attribute__((aligned(64))); + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + printf("=== Gemmini WS RISC in_shift Test ===\n"); + + for (int i = 0; i < DIM * DIM; i++) { + mat_a[i] = (elem_t)((i + 1) % 128); + mat_b[i] = (elem_t)((2 * (i + 1)) % 128); + } + clear_u8_matrix(mat_d, DIM, DIM); + cpu_matmul(mat_a, mat_b, expected, DIM, DIM, DIM); + for (int i = 0; i < DIM * DIM; i++) + expected[i] >>= SHIFT; + + uint32_t bank_w = 0, bank_a = 1, bank_d = 2, bank_c = 3; + bb_mem_alloc(bank_w, 1, 1); + bb_mem_alloc(bank_a, 1, 1); + bb_mem_alloc(bank_d, 1, 1); + bb_mem_alloc(bank_c, 1, 4); + bb_mvin((uintptr_t)mat_b, bank_w, DIM, 1); + bb_mvin((uintptr_t)mat_a, bank_a, DIM, 1); + bb_mvin((uintptr_t)mat_d, bank_d, DIM, 1); + bb_gemmini_config(1, 0, 0, 0, SHIFT); + bb_gemmini_preload(bank_w, bank_c, DIM); + bb_gemmini_compute_preloaded(bank_a, bank_d, bank_c, DIM); + bb_mvout((uintptr_t)mat_c, bank_c, DIM, 1); + bb_fence(); + + if (compare_u32_matrices(mat_c, expected, DIM, DIM)) { + printf("Gemmini WS RISC in_shift Test PASSED\n"); + return 0; + } + printf("Gemmini WS RISC in_shift Test FAILED\n"); + return 1; +} diff --git a/bb-tests/workloads/src/CTest/toy/im2col_test.c b/bb-tests/workloads/src/CTest/toy/im2col_test.c index 18641ef9..d3ebae5a 100644 --- a/bb-tests/workloads/src/CTest/toy/im2col_test.c +++ b/bb-tests/workloads/src/CTest/toy/im2col_test.c @@ -1,37 +1,103 @@ 
#include "buckyball.h" #include -#include +#include #include #include -static elem_t input_matrix_a[DIM * 64] __attribute__((aligned(64))); -static elem_t output_matrix_b[DIM * 1024] __attribute__((aligned(64))); +#define DIM 16 +#define KROW 4 +#define KCOL 1 +#define INROW 16 +#define INCOL 16 +#define STARTROW 0 +#define STARTCOL 0 + +#define EXPECTED_ROWS 1024 +#define EXPECTED_COLS 4 +#define LANES_PER_BEAT 16 + +static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); +static elem_t output_matrix_b[EXPECTED_ROWS * EXPECTED_COLS] + __attribute__((aligned(64))); +static elem_t expected_matrix[EXPECTED_ROWS * EXPECTED_COLS] + __attribute__((aligned(64))); + +static int conv_num(void) { + return (INROW - KROW + 1 - STARTROW) * (INCOL - KCOL + 1 - STARTCOL); +} + +static int kernel_elems(void) { return KROW * KCOL; } + +static void build_expected_im2col_matrix(elem_t *input, elem_t *expected, + int inrow, int incol, int krow, + int kcol, int startrow, int startcol) { + clear_i8_matrix(expected, EXPECTED_ROWS, EXPECTED_COLS); + + int row_end = inrow - krow; + int col_end = incol - kcol; + int kernel = krow * kcol; + int window_idx = 0; + + for (int r = startrow; r <= row_end; r++) { + for (int c = startcol; c <= col_end; c++) { + int elem_idx = 0; + for (int kr = 0; kr < krow; kr++) { + for (int kc = 0; kc < kcol; kc++) { + expected[window_idx * kernel + elem_idx] = + input[(r + kr) * incol + (c + kc)]; + elem_idx++; + } + } + window_idx++; + } + } +} void hw_im2col(const char *test_name, elem_t *a, elem_t *b, int size) { + (void)test_name; + (void)size; + // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); + uint32_t op1_bank_id = 0; // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); + uint32_t op2_bank_id = 1; + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); - bb_mvin((uintptr_t)a, op1_addr, size, 1); + bb_mvin((uintptr_t)a, op1_bank_id, 32, 1); + uint64_t krow = KROW; + uint64_t kcol = KCOL; + 
uint64_t inrow = INROW; + uint64_t incol = INCOL; + uint64_t startrow = STARTROW; + uint64_t startcol = STARTCOL; + bb_im2col(op1_bank_id, op2_bank_id, krow, kcol, inrow, incol, startrow, + startcol); + bb_mvout((uintptr_t)b, op2_bank_id, conv_num() / kernel_elems(), 1); bb_fence(); - uint64_t krow = 4; - uint64_t kcol = 1; - uint64_t inrow = 16; - uint64_t incol = 16; - uint64_t startrow = 1; - uint64_t startcol = 1; - // bb_im2col(op1_addr, op2_addr, krow, kcol, inrow, incol, startrow, - // startcol); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { + int conv = conv_num(); + int elems = kernel_elems(); + + clear_i8_matrix(b, EXPECTED_ROWS, EXPECTED_COLS); + build_expected_im2col_matrix(a, expected_matrix, INROW, INCOL, KROW, KCOL, + STARTROW, STARTCOL); + hw_im2col(test_name, a, b, size); - return 1; + + if (compare_i8_matrices(b, expected_matrix, conv, elems)) { + printf("%s compare test PASSED\n", test_name); + return 1; + } else { + printf("%s compare test FAILED\n", test_name); + return 0; + } } int test_im2col() { - init_sequence_matrix(input_matrix_a, DIM, 32); + init_sequence_matrix(input_matrix_a, DIM, DIM); return run_test("Im2col", input_matrix_a, output_matrix_b, DIM); } diff --git a/bb-tests/workloads/src/CTest/toy/mvin_mvout_acc_test.c b/bb-tests/workloads/src/CTest/toy/mvin_mvout_acc_test.c index c829ce82..38196e37 100644 --- a/bb-tests/workloads/src/CTest/toy/mvin_mvout_acc_test.c +++ b/bb-tests/workloads/src/CTest/toy/mvin_mvout_acc_test.c @@ -1,10 +1,12 @@ #include "buckyball.h" #include -#include +#include #include #include #include +#define DIM (BANK_WIDTH / sizeof(elem_t)) + // Test matrices static elem_t input_matrix_a[DIM * 1024] __attribute__((aligned(64))); static elem_t input_matrix_b[DIM * 1024] __attribute__((aligned(64))); @@ -16,14 +18,15 @@ int acc_mvin_mvout_pressure_test() { for (int i = 0; i < 4; i++) { init_u32_random_matrix(expected_matrix, DIM, DIM, i * 10 + i); - uint32_t wr_addr = 
spad_addr(4, i); - bb_mvin((uintptr_t)expected_matrix, wr_addr, DIM << 2, 1); + uint32_t acc_bank_id = 2; // virtual bank id + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mvin((uintptr_t)expected_matrix, acc_bank_id, DIM, 1); init_u32_random_matrix(expected_matrix, DIM, DIM, i * 10 + i); clear_u32_matrix(output_matrix, DIM, DIM); - wr_addr = spad_addr(4, i); - bb_mvout((uintptr_t)output_matrix, wr_addr, DIM << 2, 1); - bb_fence(); + acc_bank_id = 2; // virtual bank id + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mvout((uintptr_t)output_matrix, acc_bank_id, DIM, 1); if (!compare_u32_matrices(output_matrix, expected_matrix, DIM, DIM)) { printf("Test ACC mvin/mvout pressure %d FAILED\n", i); return 0; diff --git a/bb-tests/workloads/src/CTest/toy/mvin_mvout_test.c b/bb-tests/workloads/src/CTest/toy/mvin_mvout_test.c index 84dc9d96..b6325d7c 100644 --- a/bb-tests/workloads/src/CTest/toy/mvin_mvout_test.c +++ b/bb-tests/workloads/src/CTest/toy/mvin_mvout_test.c @@ -1,30 +1,30 @@ #include "buckyball.h" #include -#include +#include #include #include #include +#define DIM 16 + // Test matrices -static elem_t input_matrix_a[DIM * 1024] __attribute__((aligned(64))); -static elem_t input_matrix_b[DIM * 1024] __attribute__((aligned(64))); -static elem_t output_matrix[DIM * DIM] __attribute__((aligned(64))); -static elem_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); -static elem_t a_transposed[DIM * 1024] __attribute__((aligned(64))); +static elem_t input_matrix[DIM * DIM] __attribute__((aligned(128))); +static elem_t output_matrix[DIM * DIM] __attribute__((aligned(128))); + +int mvin_mvout_simple_test() { + uint32_t bank_id = 0; + bb_mem_alloc(bank_id, 1, 1); -int alternately_mvin_mvout_pressure_test() { - for (int i = 0; i < 4; i++) { - init_u8_random_matrix(expected_matrix, DIM, DIM, i * 10 + i); - uint32_t wr_addr = spad_addr(0, i); - bb_mvin((uintptr_t)expected_matrix, wr_addr, DIM, 1); - clear_u32_matrix(output_matrix, DIM, DIM); - bb_mvout((uintptr_t)output_matrix, 
wr_addr, DIM, 1); - bb_fence(); - if (!compare_u8_matrices(output_matrix, expected_matrix, DIM, DIM)) { - printf("Test mvin/mvout pressure %d FAILED\n", i); + for (int i = 0; i < 1; i++) { + init_u8_random_matrix(input_matrix, DIM, DIM, 111); + bb_mvin((uintptr_t)input_matrix, bank_id, DIM, 1); + clear_u8_matrix(output_matrix, DIM, DIM); + bb_mvout((uintptr_t)output_matrix, bank_id, DIM, 1); + if (!compare_u8_matrices(output_matrix, input_matrix, DIM, DIM)) { + printf("Test mvin/mvout simple %d FAILED\n", i); return 0; } else { - printf("Test mvin/mvout pressure %d PASSED\n", i); + printf("Test mvin/mvout simple %d PASSED\n", i); } } return 1; @@ -34,11 +34,11 @@ int main() { #ifdef MULTICORE multicore(MULTICORE); #endif - int passed = alternately_mvin_mvout_pressure_test(); + int passed = mvin_mvout_simple_test(); if (passed) { - printf("Alternately mvin/mvout pressure test PASSED\n"); + printf("mvin/mvout simple test PASSED\n"); } else { - printf("Alternately mvin/mvout pressure test FAILED\n"); + printf("mvin/mvout simple test FAILED\n"); } #ifdef MULTICORE exit(0); diff --git a/bb-tests/workloads/src/CTest/toy/mxfp_test.c b/bb-tests/workloads/src/CTest/toy/mxfp_test.c new file mode 100644 index 00000000..8b80d7cd --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/mxfp_test.c @@ -0,0 +1,218 @@ +#include "buckyball.h" +#include +#include + +#include +#include +#include +#include + +#define BLOCK_ELEMS 16 +#define NUM_BLOCKS 16 + +#define BANK_WORD_BYTES 16 // 128-bit bank word +#define ELEMS_PER_BANK_WORD 4 +#define WORDS_PER_BLOCK 4 +#define OUT_BYTES_PER_BLOCK 16 // one packed MX block = one 128-bit word +#define TOTAL_INPUT_WORDS (NUM_BLOCKS * BLOCK_ELEMS) +#define TOTAL_OUT_BYTES (NUM_BLOCKS * OUT_BYTES_PER_BLOCK) + +static uint32_t input_bits[TOTAL_INPUT_WORDS] __attribute__((aligned(64))); +static uint8_t expected_output[TOTAL_OUT_BYTES] __attribute__((aligned(64))); +static uint8_t output_buffer[TOTAL_OUT_BYTES] __attribute__((aligned(64))); + +static 
inline uint32_t fp_sign(uint32_t x) { return (x >> 31) & 0x1u; } +static inline uint32_t fp_exp(uint32_t x) { return (x >> 23) & 0xffu; } +static inline uint32_t fp_frac(uint32_t x) { return x & 0x7fffffu; } + +static inline int is_zero_u32(uint32_t x) { + return (fp_exp(x) == 0u) && (fp_frac(x) == 0u); +} + +static inline int is_subnormal_u32(uint32_t x) { + return (fp_exp(x) == 0u) && (fp_frac(x) != 0u); +} + +static inline int is_special_u32(uint32_t x) { return fp_exp(x) == 0xffu; } + +static inline uint8_t normal_exp_or_zero(uint32_t x) { + if (is_zero_u32(x) || is_subnormal_u32(x) || is_special_u32(x)) { + return 0; + } + return (uint8_t)fp_exp(x); +} + +static uint8_t quantize_mag4(uint32_t x, uint8_t shared_exp) { + uint32_t exp = fp_exp(x); + uint32_t frac = fp_frac(x); + + if (is_zero_u32(x) || is_subnormal_u32(x)) + return 0; + if (is_special_u32(x)) + return 15; + if (exp > shared_exp) + return 15; + + uint64_t sig24 = (1ull << 23) | frac; + uint32_t shift_amt = (20u + (uint32_t)shared_exp - exp) & 0x3fu; + uint64_t shifted = sig24 >> shift_amt; + + if (shifted >= 15u) + return 15; + return (uint8_t)(shifted & 0xfu); +} + +static void pack_mx6_block(const uint32_t *block_in, uint8_t *block_out_16B) { + uint8_t exps[BLOCK_ELEMS]; + uint8_t global_exp = 0; + uint8_t micro_byte = 0; + uint8_t payloads[BLOCK_ELEMS]; + + memset(block_out_16B, 0, OUT_BYTES_PER_BLOCK); + + for (int i = 0; i < BLOCK_ELEMS; ++i) { + exps[i] = normal_exp_or_zero(block_in[i]); + if (exps[i] > global_exp) + global_exp = exps[i]; + } + + for (int p = 0; p < BLOCK_ELEMS / 2; ++p) { + uint8_t e0 = exps[2 * p]; + uint8_t e1 = exps[2 * p + 1]; + uint8_t pair_max = (e0 > e1) ? e0 : e1; + + uint8_t micro = 0; + if ((global_exp != 0u) && + ((uint16_t)pair_max + 1u <= (uint16_t)global_exp)) { + micro = 1; + } + if (micro) + micro_byte |= (uint8_t)(1u << p); + + uint8_t local_exp = micro ? 
(uint8_t)(global_exp - 1u) : global_exp; + + for (int k = 0; k < 2; ++k) { + int idx = 2 * p + k; + uint8_t sign = (uint8_t)fp_sign(block_in[idx]); + uint8_t mag = quantize_mag4(block_in[idx], local_exp); + payloads[idx] = (uint8_t)((sign << 4) | mag); + } + } + + block_out_16B[0] = global_exp; + block_out_16B[1] = micro_byte; + + for (int i = 0; i < BLOCK_ELEMS; ++i) { + uint32_t bit_pos = (uint32_t)(i * 5); + uint32_t byte_pos = bit_pos / 8; + uint32_t bit_off = bit_pos % 8; + uint16_t val = (uint16_t)(payloads[i] & 0x1fu); + + block_out_16B[2 + byte_pos] |= (uint8_t)(val << bit_off); + if (bit_off > 3) { + block_out_16B[2 + byte_pos + 1] |= (uint8_t)(val >> (8 - bit_off)); + } + } +} + +static void init_input_bits(void) { + static const uint32_t base_block[BLOCK_ELEMS] = { + 0xBF000000u, // -0.5 + 0x3F400000u, // 0.75 + 0xBF800000u, // -1.0 + 0x3FC00000u, // 1.5 + 0xC0000000u, // -2.0 + 0x40400000u, // 3.0 + 0xC0800000u, // -4.0 + 0x40C00000u, // 6.0 + 0xC1000000u, // -8.0 + 0x41400000u, // 12.0 + 0xC1800000u, // -16.0 + 0x41C00000u, // 24.0 + 0xC2000000u, // -32.0 + 0x42400000u, // 48.0 + 0xC2800000u, // -64.0 + 0x42C00000u // 96.0 + }; + + for (int blk = 0; blk < NUM_BLOCKS; ++blk) { + for (int i = 0; i < BLOCK_ELEMS; ++i) { + input_bits[blk * BLOCK_ELEMS + i] = base_block[i]; + } + } +} + +static void build_expected_output(void) { + memset(expected_output, 0, sizeof(expected_output)); + for (int blk = 0; blk < NUM_BLOCKS; ++blk) { + pack_mx6_block(&input_bits[blk * BLOCK_ELEMS], + &expected_output[blk * OUT_BYTES_PER_BLOCK]); + } +} + +static int compare_bytes(const uint8_t *got, const uint8_t *exp, int n) { + for (int i = 0; i < n; ++i) { + if (got[i] != exp[i]) { + printf("Mismatch at index %d: Expected %u, Got %u\n", i, (unsigned)exp[i], + (unsigned)got[i]); + return 0; + } + } + return 1; +} + +static void hw_mxfp(const char *test_name, uint32_t *src_bits, + uint8_t *dst_bytes) { + (void)test_name; + + uint32_t op1_bank_id = 0; + uint32_t wr_bank_id = 1; 
+ + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(wr_bank_id, 1, 1); + + bb_mvin((uintptr_t)src_bits, op1_bank_id, NUM_BLOCKS * WORDS_PER_BLOCK, 1); + bb_mxfp(op1_bank_id, wr_bank_id, NUM_BLOCKS); + bb_mvout((uintptr_t)dst_bytes, wr_bank_id, NUM_BLOCKS, 1); + bb_fence(); +} + +static int run_test(const char *test_name) { + memset(output_buffer, 0, sizeof(output_buffer)); + + init_input_bits(); + build_expected_output(); + hw_mxfp(test_name, input_bits, output_buffer); + + if (compare_bytes(output_buffer, expected_output, TOTAL_OUT_BYTES)) { + printf("%s compare PASSED\n", test_name); + return 1; + } else { + printf("%s compare FAILED\n", test_name); + return 0; + } +} + +int test_mxfp(int seed) { + (void)seed; + return run_test("MXFP"); +} + +int main(void) { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + int passed = test_mxfp(5); + if (passed) { + printf("MXFP hardware test PASSED!\n"); + } else { + printf("MXFP hardware test FAILED!\n"); + } + +#ifdef MULTICORE + exit(0); +#endif + + return !passed; +} diff --git a/bb-tests/workloads/src/CTest/toy/nnlut_test.c b/bb-tests/workloads/src/CTest/toy/nnlut_test.c deleted file mode 100644 index 59359c3e..00000000 --- a/bb-tests/workloads/src/CTest/toy/nnlut_test.c +++ /dev/null @@ -1,109 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include -#include - -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t output_matrix_b[DIM * 1024] __attribute__((aligned(64))); - -// Simple LUT for reference (256 entries) -static elem_t cpu_lut[256]; - -void init_lut() { - // Initialize a simple LUT: identity function with saturation - // In real NN-LUT, this would contain pre-computed activation function values - for (int i = 0; i < 256; i++) { - int val = (int8_t)i; - if (val < -128) { - cpu_lut[i] = -128; - } else if (val > 127) { - cpu_lut[i] = 127; - } else { - cpu_lut[i] = val; - } - } -} - -void hw_nnlut(const char *test_name, elem_t *a, elem_t *b, int size) { - // 
Source operand in spad bank 0, write target in spad bank 1 - uint32_t op1_addr = spad_addr(0, 0); - uint32_t wr_addr = spad_addr(1, 0); - - // Move input into scratchpad bank0, starting at offset 0, iterate size times - // row-wise - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_fence(); - // Call NN-LUT instruction - bb_nnlut(op1_addr, wr_addr, size); - bb_fence(); - - // Result will be moved back in run_test for verification -} - -int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - // CPU reference computation - nnlut_cpu_reference(a, b, size); - - // Hardware computation - hw_nnlut(test_name, a, b, size); - - // Move result back from scratchpad for verification - uint32_t wr_addr = spad_addr(1, 0); - bb_mvout((uintptr_t)output_matrix_b, wr_addr, size, 1); - bb_fence(); - - // Verify results - int passed = 1; - for (int i = 0; i < size; i++) { - for (int j = 0; j < size; j++) { - int idx = i * size + j; - if (output_matrix_b[idx] != b[idx]) { - printf("Mismatch at [%d][%d]: expected %d, got %d\n", i, j, b[idx], - output_matrix_b[idx]); - passed = 0; - } - } - } - - return passed; -} - -int nnlut_cpu_reference(elem_t *input, elem_t *output, int size) { - for (int i = 0; i < size; i++) { - for (int j = 0; j < size; j++) { - elem_t val = input[i * size + j]; - // Convert to unsigned index (0-255) - uint8_t idx = (uint8_t)val; - output[i * size + j] = cpu_lut[idx]; - } - } - return 1; -} - -int test_nnlut(int seed) { - init_lut(); - init_i8_random_matrix(input_matrix_a, DIM, DIM, seed); - - // Run hardware test with verification - return run_test("NN-LUT", input_matrix_a, output_matrix_b, DIM); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - - int passed = test_nnlut(5); - if (passed) { - printf("NN-LUT test PASSED!!!!\n"); - } else { - printf("NN-LUT test FAILED\n"); - } - return (!passed); - -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/quant_test.c 
b/bb-tests/workloads/src/CTest/toy/quant_test.c new file mode 100644 index 00000000..14d8a87f --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/quant_test.c @@ -0,0 +1,147 @@ +#include "buckyball.h" +#include +#include +#include +#include +#include + +#define DIM 16 + +// FP32 bit patterns for input data (4 elements per SRAM word, 16 words = 64 +// FP32 values) Using scale = 1.0 so INT32 result = round(fp32_val * 1.0) = +// round(fp32_val) Values: 1.0, 2.0, 3.0, -1.0, -2.0, 0.0, 4.0, 5.0, 10.0, +// -10.0, 0.5, 100.0, -100.0, 7.0, 8.0, -8.0 Repeated 4 times to fill 16 words +// (64 elements) + +static uint32_t fp32_input[DIM * 4] __attribute__((aligned(64))) = { + // Word 0-3 (row 0): 1.0, 2.0, 3.0, -1.0 | -2.0, 0.0, 4.0, 5.0 | 10.0, + // -10.0, 0.5, 100.0 | -100.0, 7.0, 8.0, -8.0 + 0x3F800000, + 0x40000000, + 0x40400000, + 0xBF800000, + 0xC0000000, + 0x00000000, + 0x40800000, + 0x40A00000, + 0x41200000, + 0xC1200000, + 0x3F000000, + 0x42C80000, + 0xC2C80000, + 0x40E00000, + 0x41000000, + 0xC1000000, + // Word 4-7 (row 1): same pattern + 0x3F800000, + 0x40000000, + 0x40400000, + 0xBF800000, + 0xC0000000, + 0x00000000, + 0x40800000, + 0x40A00000, + 0x41200000, + 0xC1200000, + 0x3F000000, + 0x42C80000, + 0xC2C80000, + 0x40E00000, + 0x41000000, + 0xC1000000, + // Word 8-11 (row 2): same pattern + 0x3F800000, + 0x40000000, + 0x40400000, + 0xBF800000, + 0xC0000000, + 0x00000000, + 0x40800000, + 0x40A00000, + 0x41200000, + 0xC1200000, + 0x3F000000, + 0x42C80000, + 0xC2C80000, + 0x40E00000, + 0x41000000, + 0xC1000000, + // Word 12-15 (row 3): same pattern + 0x3F800000, + 0x40000000, + 0x40400000, + 0xBF800000, + 0xC0000000, + 0x00000000, + 0x40800000, + 0x40A00000, + 0x41200000, + 0xC1200000, + 0x3F000000, + 0x42C80000, + 0xC2C80000, + 0x40E00000, + 0x41000000, + 0xC1000000, +}; + +// Expected INT32 output: round(fp32 * 1.0) +// 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 1, 100, -100, 7, 8, -8 (0.5 rounds to 1) +static int32_t expected_int32[DIM * 4] 
__attribute__((aligned(64))) = { + 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 1, 100, -100, 7, 8, -8, + 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 1, 100, -100, 7, 8, -8, + 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 1, 100, -100, 7, 8, -8, + 1, 2, 3, -1, -2, 0, 4, 5, 10, -10, 1, 100, -100, 7, 8, -8, +}; + +static int32_t output_int32[DIM * 4] __attribute__((aligned(64))); + +// FP32 bit pattern for scale = 1.0 +#define SCALE_1_0 0x3F800000U + +void hw_quant(uint32_t *input, int32_t *output, int num_words) { + uint32_t op1_bank_id = 0; + uint32_t wr_bank_id = 1; + + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(wr_bank_id, 1, 1); + + bb_mvin((uintptr_t)input, op1_bank_id, num_words, 1); + + bb_quant(op1_bank_id, wr_bank_id, num_words, SCALE_1_0); + + bb_mvout((uintptr_t)output, wr_bank_id, num_words, 1); + bb_fence(); +} + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + + for (int i = 0; i < DIM * 4; i++) { + output_int32[i] = 0; + } + + hw_quant(fp32_input, output_int32, DIM); + + int passed = 1; + for (int i = 0; i < DIM * 4; i++) { + if (output_int32[i] != expected_int32[i]) { + printf("MISMATCH at [%d]: got %d, expected %d\n", i, output_int32[i], + expected_int32[i]); + passed = 0; + } + } + + if (passed) { + printf("Quant test PASSED\n"); + } else { + printf("Quant test FAILED\n"); + } + return (!passed); + +#ifdef MULTICORE + exit(0); +#endif +} diff --git a/bb-tests/workloads/src/CTest/toy/relu_test.c b/bb-tests/workloads/src/CTest/toy/relu_test.c index 607b3db1..88490015 100644 --- a/bb-tests/workloads/src/CTest/toy/relu_test.c +++ b/bb-tests/workloads/src/CTest/toy/relu_test.c @@ -1,41 +1,97 @@ #include "buckyball.h" #include -#include +#include #include #include #include -static elem_t input_matrix[DIM * DIM] __attribute__((aligned(64))); +#define DIM 16 // force 16x16 + +// ======================= +// Fixed input matrix (repeats every 4 rows) +// ======================= +static elem_t input_matrix[DIM * DIM] __attribute__((aligned(64))) = { + // ---- Cycle 1 ---- + // Row1 + -7, -6, -5, -4,
-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, + // Row2 + -17, -16, -15, -14, -13, -12, -11, 10, 11, 12, 13, 14, 15, 16, 17, 18, + // Row3 + -27, -26, -25, -24, -23, -22, -21, 20, 21, 22, 23, 24, 25, 26, 27, 28, + // Row4 + -37, -36, -35, -34, -33, -32, -31, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 2 ---- + -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, -17, -16, -15, -14, + -13, -12, -11, 10, 11, 12, 13, 14, 15, 16, 17, 18, -27, -26, -25, -24, -23, + -22, -21, 20, 21, 22, 23, 24, 25, 26, 27, 28, -37, -36, -35, -34, -33, -32, + -31, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 3 ---- + -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, -17, -16, -15, -14, + -13, -12, -11, 10, 11, 12, 13, 14, 15, 16, 17, 18, -27, -26, -25, -24, -23, + -22, -21, 20, 21, 22, 23, 24, 25, 26, 27, 28, -37, -36, -35, -34, -33, -32, + -31, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 4 ---- + -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, -17, -16, -15, -14, + -13, -12, -11, 10, 11, 12, 13, 14, 15, 16, 17, 18, -27, -26, -25, -24, -23, + -22, -21, 20, 21, 22, 23, 24, 25, 26, 27, 28, -37, -36, -35, -34, -33, -32, + -31, 30, 31, 32, 33, 34, 35, 36, 37, 38}; + +// ======================= +// Hard-coded ReLU results +// ======================= +static elem_t expected_matrix[DIM * DIM] __attribute__((aligned(64))) = { + // ---- Cycle 1 ---- + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 10, 11, + 12, 13, 14, 15, 16, 17, 18, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 23, 24, 25, 26, + 27, 28, 0, 0, 0, 0, 0, 0, 0, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 2 ---- + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 10, 11, + 12, 13, 14, 15, 16, 17, 18, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 23, 24, 25, 26, + 27, 28, 0, 0, 0, 0, 0, 0, 0, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 3 ---- + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 10, 11, + 12, 13, 14, 15, 16, 17, 18, 0, 0, 0, 0, 0, 0, 0, 20,
21, 22, 23, 24, 25, 26, + 27, 28, 0, 0, 0, 0, 0, 0, 0, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 4 ---- + 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 10, 11, + 12, 13, 14, 15, 16, 17, 18, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 23, 24, 25, 26, + 27, 28, 0, 0, 0, 0, 0, 0, 0, 30, 31, 32, 33, 34, 35, 36, 37, 38}; + static elem_t output_matrix[DIM * DIM] __attribute__((aligned(64))); -static elem_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); -// Used to verify content in SPAD after MVIN - -// Expected: provide a ReLU flow similar to TRANSPOSE -// Currently bbhw/isa does not have bb_relu high-level API, this example uses -// the same move-in->execute->fence flow as transpose. Need to add -// bb_relu(op1_addr, wr_addr, iter) wrapper in bbhw implementation -// (func7=RELU_FUNC7). - -void hw_relu(const char *test_name, elem_t *a, result_t*b, int size) { - // Source operand in spad bank 0, write target in spad bank 1 - uint32_t op1_addr = spad_addr(0, 0); - uint32_t wr_addr = spad_addr(1, 0); - - // Move input into scratchpad bank0, starting at offset 0, iterate size times - // row-wise - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_fence(); - // Call ReLU instruction - bb_relu(op1_addr, wr_addr, size); + +// ======================= +// HW ReLU flow (unchanged) +// ======================= +void hw_relu(const char *test_name, elem_t *a, elem_t *b, int size) { + uint32_t op1_bank_id = 0; + uint32_t wr_bank_id = 1; + + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(wr_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + + bb_relu(op1_bank_id, wr_bank_id, size); + + bb_mvout((uintptr_t)b, wr_bank_id, size, 1); bb_fence(); - bb_mvout((uintptr_t)b, wr_addr, size, 1); } +// ======================= +// Test function (CPU computation removed) +// ======================= int run_test(const char *test_name, elem_t *a, int size) { clear_i8_matrix(output_matrix, size, size); - cpu_relu(a, expected_matrix, size, size); + hw_relu(test_name, a, output_matrix,
size); - if(compare_i8_matrices(output_matrix, expected_matrix, size, size)) { + + if (compare_i8_matrices(output_matrix, expected_matrix, size, size)) { printf("%s compare test PASSED\n", test_name); return 1; } else { @@ -44,35 +100,7 @@ int run_test(const char *test_name, elem_t *a, int size) { } } -int relu_cpu_reference(elem_t *input, elem_t *output, int size) { - for (int i = 0; i < size; i++) { - for (int j = 0; j < size; j++) { - elem_t val = input[i * size + j]; - output[i * size + j] = (val < 0) ? 0 : val; - } - } - return 1; -} - -int test_relu(int seed) { - init_i8_random_matrix(input_matrix, DIM, DIM, seed); - // // CPU TEST BEGIN - // // Measure cycles for the CPU ReLU reference implementation - // unsigned long long start = read_rdcycle(); - // // CPU verification - // int ok = relu_cpu_reference(input_matrix_a, output_matrix_b, DIM); - // unsigned long long end = read_rdcycle(); - // unsigned long long cycles = end - start; - // /* Print as hex high/low 32-bit parts to avoid embedded printf lacking - // full long long support. This produces a stable, greppable output. 
*/ - // uint32_t lo = (uint32_t)(cycles & 0xffffffffULL); - // uint32_t hi = (uint32_t)(cycles >> 32); - // printf("BB_CYCLES_RELU: 0x%08x%08x\n", hi, lo); - // return ok; - // // CPU TEST END - return run_test("ReLU", input_matrix, DIM); - // ReLUBall test code, need to comment out the code block above -} +int test_relu(int seed) { return run_test("ReLU", input_matrix, DIM); } int main() { #ifdef MULTICORE @@ -81,7 +109,7 @@ int main() { int passed = test_relu(5); if (passed) { - printf("ReLU test PASSED!!!\n"); + printf("ReLU test PASSED\n"); } else { printf("ReLU test FAILED\n"); } diff --git a/bb-tests/workloads/src/CTest/toy/snn_test.c b/bb-tests/workloads/src/CTest/toy/snn_test.c deleted file mode 100644 index 9c3f8c1e..00000000 --- a/bb-tests/workloads/src/CTest/toy/snn_test.c +++ /dev/null @@ -1,110 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include -#include - -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t output_matrix_b[DIM * DIM] __attribute__((aligned(64))); - -// Simple LIF neuron model for CPU reference -int snn_cpu_reference(elem_t *input, elem_t *output, int size, - uint8_t threshold, uint8_t leak_factor) { - for (int i = 0; i < size; i++) { - for (int j = 0; j < size; j++) { - int val = (int8_t)input[i * size + j]; - - // Apply leak: multiply by leak_factor, then divide by 256 - int leaked = (val * (int)leak_factor) >> 8; - - // Fire condition: if leaked >= threshold, output threshold, else output - // leaked Clamp to threshold range - if (leaked >= (int)threshold) { - output[i * size + j] = (elem_t)threshold; - } else if (leaked < -(int)threshold) { - output[i * size + j] = (elem_t)(-threshold); - } else { - output[i * size + j] = (elem_t)leaked; - } - } - } - return 1; -} - -void hw_snn(const char *test_name, elem_t *a, elem_t *b, int size, - uint8_t threshold, uint8_t leak_factor) { - // Source operand in spad bank 0, write target in spad bank 1 - uint32_t op1_addr = spad_addr(0, 0); - 
uint32_t wr_addr = spad_addr(1, 0); - - // Move input into scratchpad bank0, starting at offset 0, iterate size times - // row-wise - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_fence(); - // Call SNN instruction - bb_snn(op1_addr, wr_addr, size, threshold, leak_factor); - bb_fence(); - - // Result will be moved back in run_test for verification -} - -int run_test(const char *test_name, elem_t *a, elem_t *b, int size, - uint8_t threshold, uint8_t leak_factor) { - // CPU reference computation - snn_cpu_reference(a, b, size, threshold, leak_factor); - - // Hardware computation - hw_snn(test_name, a, b, size, threshold, leak_factor); - - // Move result back from scratchpad for verification - uint32_t wr_addr = spad_addr(1, 0); - bb_mvout((uintptr_t)output_matrix_b, wr_addr, size, 1); - bb_fence(); - - // Verify results - int passed = 1; - for (int i = 0; i < size; i++) { - for (int j = 0; j < size; j++) { - int idx = i * size + j; - if (output_matrix_b[idx] != b[idx]) { - printf("Mismatch at [%d][%d]: expected %d, got %d\n", i, j, b[idx], - output_matrix_b[idx]); - passed = 0; - } - } - } - - return passed; -} - -int test_snn(int seed) { - // Initialize input matrix with random values - init_i8_random_matrix(input_matrix_a, DIM, DIM, seed); - - // Test with default parameters: threshold=127, leak_factor=240 - uint8_t threshold = 127; - uint8_t leak_factor = 240; // 240/256 ≈ 0.9375 - - // Run hardware test with verification - return run_test("SNN", input_matrix_a, output_matrix_b, DIM, threshold, - leak_factor); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - - int passed = test_snn(5); - if (passed) { - printf("SNN test PASSED!!!!\n"); - } else { - printf("SNN test FAILED\n"); - } - return (!passed); - -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/test_vsetvli.c b/bb-tests/workloads/src/CTest/toy/test_vsetvli.c new file mode 100644 index 00000000..1e1a94ef --- /dev/null +++ 
b/bb-tests/workloads/src/CTest/toy/test_vsetvli.c @@ -0,0 +1,20 @@ +#include + +int main() { + printf("Testing RVV RoCC decode with VSETVLI\n"); + + register unsigned long vl asm("a0"); + + // vsetvli a0, zero, e32, m1, ta, ma + // opcode=1010111, rd=a0(x10), funct3=111, rs1=x0, zimm=0x08 + // Encoding: 0000 0000 1000 00000 111 01010 1010111 + // = 0x00807557 + asm volatile(".word 0x00807557\n" // vsetvli a0, x0, 0x08 (e32, m1) + : "=r"(vl) + : + : "memory"); + + printf("VSETVLI executed, vl = %lu\n", vl); + printf("Test PASSED\n"); + return 0; +} diff --git a/bb-tests/workloads/src/CTest/toy/tiled_matmul.c b/bb-tests/workloads/src/CTest/toy/tiled_matmul.c deleted file mode 100644 index 074fe64c..00000000 --- a/bb-tests/workloads/src/CTest/toy/tiled_matmul.c +++ /dev/null @@ -1,217 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include - -#define DIM_I 32 -#define DIM_J 32 -#define DIM_K 32 - -static elem_t input_a[DIM_I * DIM_J] __attribute__((aligned(16))); -static elem_t input_b[DIM_J * DIM_K] __attribute__((aligned(16))); -static result_t output_c[DIM_I * DIM_K] __attribute__((aligned(64))); -static result_t expected_c[DIM_I * DIM_K] __attribute__((aligned(64))); -static result_t c_zero[DIM * DIM] __attribute__((aligned(64))); -/** -void tiled_matmul(uint32_t dim_i, uint32_t dim_j, uint32_t dim_k, - elem_t *a, elem_t *b, result_t *c, result_t *zero_c) { - int row_iter = dim_i / DIM; - int col_iter = dim_k / DIM; - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - uint32_t j_stride = dim_j / DIM; - uint32_t k_stride = dim_k / DIM; - unsigned long long start_compute, end_compute; - start_compute = read_cycle(); - for (int i = 0; i < row_iter; i++) { - for (int k = 0; k < k_stride; k++) { - if(k == 0){ - for(int j = 0; j < j_stride; j++) - bb_mvin((uintptr_t)a + i * dim_j * DIM
+ j * DIM, op2_addr + -dim_j + j * DIM, DIM, j_stride); - } - bb_mvin((uintptr_t)b + k * DIM, op2_addr, dim_k, k_stride); - bb_mvin((uintptr_t)zero_c, wr_addr, DIM << 2, 1); - bb_fence(); - if(k == 0){ - bb_transpose(op2_addr + dim_j, op1_addr, dim_j, 0); - bb_fence(); - } - bb_mul_warp16(op1_addr, op2_addr, wr_addr, dim_j, 0); - bb_fence(); - bb_mvout((uintptr_t)c + i * dim_k * DIM * 4 + k * DIM * 4, wr_addr, -DIM << 2, k_stride); bb_fence(); - } - } - bb_fence(); - end_compute = read_cycle(); - printf("Cycles for matmul: %d\n", end_compute - start_compute); -} -*/ -void tiled_matmul_connect_mode(uint32_t dim_i, uint32_t dim_j, uint32_t dim_k, - elem_t *a, elem_t *b, result_t *c) { - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - uint32_t i_stride = dim_i / DIM; - uint32_t j_stride = dim_j / DIM; - uint32_t k_stride = dim_k / DIM; - uint64_t en = 1; - - unsigned long long start_compute, end_compute; - start_compute = read_cycle(); - bb_bbus_config(3, 0, en); - // mvin matrix a - for (int i = 0; i < i_stride; i++) { - for (int j = 0; j < j_stride; j++) { - bb_mvin((uintptr_t)a + i * dim_j * DIM + j * DIM, - op1_addr + dim_j * k_stride + dim_j * i + j * DIM, DIM, j_stride); - } - } - // mvin matrix b - for (int k = 0; k < k_stride; k++) { - bb_mvin((uintptr_t)b + k * DIM, op2_addr + dim_j * k, dim_j, k_stride); - } - bb_fence(); - // perform matmul - for (int i = 0; i < i_stride; i++) { - for (int k = 0; k < k_stride; k++) { - bb_mul_warp16(op1_addr + dim_j * i, op2_addr + dim_j * k, - wr_addr + i * dim_k / 2 + k * DIM / 2, dim_j, 1); - bb_transpose((uintptr_t)op1_addr + dim_j * k_stride + dim_j * i, - op1_addr + dim_j * i, dim_j, 1); - } - } - bb_fence(); - // mvout matrix c - for (int i = 0; i < i_stride; i++) { - for (int k = 0; k < k_stride; k++) { - bb_mvout((uintptr_t)c + i * dim_k * DIM 
* 4 + k * DIM * 4, - wr_addr + i * dim_k / 2 + k * DIM / 2, DIM << 2, k_stride); - bb_fence(); - } - } - bb_bbus_config(3, 0, 0); - end_compute = read_cycle(); - printf("Cycles for matmul: %d\n", end_compute - start_compute); -} - -void tiled_matmul_normal_mode(uint32_t dim_i, uint32_t dim_j, uint32_t dim_k, - elem_t *a, elem_t *b, result_t *c) { - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - uint32_t i_stride = dim_i / DIM; - uint32_t j_stride = dim_j / DIM; - uint32_t k_stride = dim_k / DIM; - - unsigned long long start_compute, end_compute; - start_compute = read_cycle(); - - // mvin matrix a - for (int i = 0; i < i_stride; i++) { - for (int j = 0; j < j_stride; j++) { - bb_mvin((uintptr_t)a + i * dim_j * DIM + j * DIM, - op2_addr + dim_j * k_stride + dim_j * i + j * DIM, DIM, j_stride); - } - } - // mvin matrix b - for (int k = 0; k < k_stride; k++) { - bb_mvin((uintptr_t)b + k * DIM, op2_addr + dim_j * k, dim_j, k_stride); - } - bb_fence(); - - // transpose matrix a - for (int i = 0; i < i_stride; i++) { - bb_transpose((uintptr_t)op2_addr + dim_j * k_stride + dim_j * i, - op1_addr + dim_j * i, dim_j, 0); - } - bb_fence(); - - // perform matmul - for (int i = 0; i < i_stride; i++) { - for (int k = 0; k < k_stride; k++) { - bb_mul_warp16(op1_addr + dim_j * i, op2_addr + dim_j * k, - wr_addr + i * dim_k / 2 + k * DIM / 2, dim_j, 0); - bb_fence(); - } - } - bb_fence(); - // mvout matrix c - for (int i = 0; i < i_stride; i++) { - for (int k = 0; k < k_stride; k++) { - bb_mvout((uintptr_t)c + i * dim_k * DIM * 4 + k * DIM * 4, - wr_addr + i * dim_k / 2 + k * DIM / 2, DIM << 2, k_stride); - bb_fence(); - } - } - - end_compute = read_cycle(); - printf("Cycles for matmul: %d\n", end_compute - start_compute); -} - -int run_test(const char *test_name) { - - tiled_matmul_normal_mode(DIM_I, DIM_J, 
DIM_K, input_a, input_b, output_c); - cpu_matmul(input_a, input_b, expected_c, DIM_I, DIM_K, DIM_J); - if (compare_u32_matrices(output_c, expected_c, DIM_I, DIM_K)) { - printf("Test Connect Mode %s PASSED\n", test_name); - } else { - printf("Test Connect Mode %s FAILED\n", test_name); - return 0; - } - - clear_u32_matrix(output_c, DIM_I, DIM_K); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - // TODO: ACC overwrite write can skip this step - bb_mvin((uintptr_t)output_c, wr_addr, DIM_I * DIM_K * 4 / DIM, 1); - tiled_matmul_normal_mode(DIM_I, DIM_J, DIM_K, input_a, input_b, output_c); - if (compare_u32_matrices(output_c, expected_c, DIM_I, DIM_K)) { - printf("Test Normal Mode %s PASSED\n", test_name); - return 1; - } else { - printf("Test Normal Mode %s FAILED\n", test_name); - return 0; - } -} - -int test_tiled_matmul() { - /** - init_u8_random_matrix(input_a, DIM_I, DIM_J, 111); - init_u8_random_matrix(input_b, DIM_J, DIM_K, 222); - */ - init_sequence_matrix(input_a, DIM_I, DIM_J); - init_sequence_matrix(input_b, DIM_I, DIM_J); - return run_test("Tiled Matmul"); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - // printf("%p ,%p", input_a, input_b); - // printf("Testing Tiled Matmul\n"); - int passed = test_tiled_matmul(); - if (passed) { - printf("Tiled Matmul test PASSED\n"); - return 0; - } else { - printf("Tiled Matmul test FAILED\n"); - return 1; - } -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/tlb_test.c b/bb-tests/workloads/src/CTest/toy/tlb_test.c new file mode 100644 index 00000000..06e3926b --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/tlb_test.c @@ -0,0 +1,217 @@ +#include "buckyball.h" +#include +#include +#include +#include + +// --------------------------------------------------------------------------- +// Sv39 page table setup for bare-metal TLB test +// --------------------------------------------------------------------------- +// Sv39: 3-level 
page table, 4KB pages +// VA bits: [38:30] VPN[2], [29:21] VPN[1], [20:12] VPN[0], [11:0] offset +// We use 1GB gigapages (level-2 only) for simplicity: one PTE covers 1GB. +// This creates an identity mapping (VA == PA) for the first 4GB. + +#define PAGESIZE 4096 +#define PTE_V 0x01 // Valid +#define PTE_R 0x02 // Read +#define PTE_W 0x04 // Write +#define PTE_X 0x08 // Execute +#define PTE_U 0x00 // Supervisor only (U=0) +#define PTE_A 0x40 // Accessed +#define PTE_D 0x80 // Dirty +#define PTE_RWXAD (PTE_R | PTE_W | PTE_X | PTE_A | PTE_D) + +#define SATP_MODE_SV39 8UL +#define SATP_MODE_BARE 0UL + +// Page table: one page, 512 entries (level-2 table for Sv39) +// Must be page-aligned (4KB) +static uint64_t page_table[512] __attribute__((aligned(PAGESIZE))); + +static void setup_identity_mapping(void) { + // Create identity mapping using 1GB gigapages + // PTE format: [53:10] = PPN, [7:0] = flags + // For a 1GB gigapage at level 2: PPN[2] = physical GB index + for (int i = 0; i < 4; i++) { + // PPN for 1GB gigapage: GB_index << 18 (the Sv39 PPN is 44 bits wide, + // and 1GB = 2^30 bytes, PPN = PA >> 12, so PPN = i << 18) + uint64_t ppn = (uint64_t)i << 18; + page_table[i] = (ppn << 10) | PTE_V | PTE_RWXAD; + } + // Remaining entries are invalid (0) + for (int i = 4; i < 512; i++) { + page_table[i] = 0; + } +} + +static void enable_sv39(void) { + uint64_t satp_val = (SATP_MODE_SV39 << 60) | ((uint64_t)page_table >> 12); + asm volatile("csrw satp, %0" ::"r"(satp_val)); + // Flush TLB after changing satp + asm volatile("sfence.vma"); +} + +static void disable_vm(void) { + uint64_t satp_val = SATP_MODE_BARE << 60; + asm volatile("csrw satp, %0" ::"r"(satp_val)); + asm volatile("sfence.vma"); +} + +// --------------------------------------------------------------------------- +// Test data +// --------------------------------------------------------------------------- +#define DIM 16 + +static elem_t input_matrix[DIM * DIM] __attribute__((aligned(128))); +static elem_t
output_matrix[DIM * DIM] __attribute__((aligned(128))); + +// --------------------------------------------------------------------------- +// Test 1: DMA with VM disabled (satp.mode=Bare) +// Verifies passthrough works (vm_enabled=false → vaddr=paddr) +// --------------------------------------------------------------------------- +int test_bare_mode(void) { + printf("[TLB Test 1] DMA with VM disabled (Bare mode)...\n"); + + disable_vm(); + + uint32_t bank_id = 0; + bb_mem_alloc(bank_id, 1, 1); + + init_u8_random_matrix(input_matrix, DIM, DIM, 42); + clear_u8_matrix(output_matrix, DIM, DIM); + + bb_mvin((uintptr_t)input_matrix, bank_id, DIM, 1); + bb_mvout((uintptr_t)output_matrix, bank_id, DIM, 1); + bb_fence(); + + if (!compare_u8_matrices(output_matrix, input_matrix, DIM, DIM)) { + printf("[TLB Test 1] FAILED: mvin/mvout mismatch in Bare mode\n"); + return 0; + } + + printf("[TLB Test 1] PASSED\n"); + return 1; +} + +// --------------------------------------------------------------------------- +// Test 2: DMA with Sv39 identity mapping +// Verifies TLB translation works (vm_enabled=true, but VA==PA) +// --------------------------------------------------------------------------- +int test_sv39_identity(void) { + printf("[TLB Test 2] DMA with Sv39 identity mapping...\n"); + + setup_identity_mapping(); + enable_sv39(); + + uint32_t bank_id = 0; + bb_mem_alloc(bank_id, 1, 1); + + init_u8_random_matrix(input_matrix, DIM, DIM, 77); + clear_u8_matrix(output_matrix, DIM, DIM); + + bb_mvin((uintptr_t)input_matrix, bank_id, DIM, 1); + bb_mvout((uintptr_t)output_matrix, bank_id, DIM, 1); + bb_fence(); + + if (!compare_u8_matrices(output_matrix, input_matrix, DIM, DIM)) { + printf("[TLB Test 2] FAILED: mvin/mvout mismatch with Sv39\n"); + disable_vm(); + return 0; + } + + printf("[TLB Test 2] PASSED\n"); + return 1; +} + +// --------------------------------------------------------------------------- +// Test 3: sfence.vma flushes Buckyball TLB +// After Sv39 is active, 
issue sfence.vma, then verify DMA still works +// (TLB entries were flushed, must refill from page table) +// --------------------------------------------------------------------------- +int test_sfence_flush(void) { + printf("[TLB Test 3] sfence.vma flush + DMA refill...\n"); + + // sfence.vma: flush all TLBs (CPU + Buckyball) + asm volatile("sfence.vma"); + + uint32_t bank_id = 0; + bb_mem_alloc(bank_id, 1, 1); + + init_u8_random_matrix(input_matrix, DIM, DIM, 99); + clear_u8_matrix(output_matrix, DIM, DIM); + + bb_mvin((uintptr_t)input_matrix, bank_id, DIM, 1); + bb_mvout((uintptr_t)output_matrix, bank_id, DIM, 1); + bb_fence(); + + if (!compare_u8_matrices(output_matrix, input_matrix, DIM, DIM)) { + printf("[TLB Test 3] FAILED: mvin/mvout mismatch after sfence\n"); + disable_vm(); + return 0; + } + + printf("[TLB Test 3] PASSED\n"); + return 1; +} + +// --------------------------------------------------------------------------- +// Test 4: Multiple sfence cycles +// Repeatedly: do DMA → sfence → do DMA, to stress TLB flush/refill +// --------------------------------------------------------------------------- +int test_repeated_sfence(void) { + printf("[TLB Test 4] Repeated sfence + DMA cycles...\n"); + + uint32_t bank_id = 0; + + for (int i = 0; i < 4; i++) { + asm volatile("sfence.vma"); + + bb_mem_alloc(bank_id, 1, 1); + init_u8_random_matrix(input_matrix, DIM, DIM, 100 + i); + clear_u8_matrix(output_matrix, DIM, DIM); + + bb_mvin((uintptr_t)input_matrix, bank_id, DIM, 1); + bb_mvout((uintptr_t)output_matrix, bank_id, DIM, 1); + bb_fence(); + + if (!compare_u8_matrices(output_matrix, input_matrix, DIM, DIM)) { + printf("[TLB Test 4] FAILED at iteration %d\n", i); + disable_vm(); + return 0; + } + } + + printf("[TLB Test 4] PASSED\n"); + return 1; +} + +// --------------------------------------------------------------------------- +// Main +// --------------------------------------------------------------------------- +int main() { +#ifdef MULTICORE + 
multicore(MULTICORE); +#endif + + int all_passed = 1; + + all_passed &= test_bare_mode(); + all_passed &= test_sv39_identity(); + all_passed &= test_sfence_flush(); + all_passed &= test_repeated_sfence(); + + // Restore Bare mode + disable_vm(); + + if (all_passed) { + printf("\n=== ALL TLB TESTS PASSED ===\n"); + } else { + printf("\n=== SOME TLB TESTS FAILED ===\n"); + } + +#ifdef MULTICORE + exit(0); +#endif + return !all_passed; +} diff --git a/bb-tests/workloads/src/CTest/toy/transfer_test.c b/bb-tests/workloads/src/CTest/toy/transfer_test.c deleted file mode 100644 index c2c6a69f..00000000 --- a/bb-tests/workloads/src/CTest/toy/transfer_test.c +++ /dev/null @@ -1,66 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include -#include - -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t output_matrix_b[DIM * DIM] __attribute__((aligned(64))); -static elem_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); -// static elem_t probe_matrix[DIM * DIM] __attribute__((aligned(64))); -// Used to verify content in SPAD after MVIN - - -// bb_transfer(op1_addr, wr_addr, iter) wrapper in bbhw implementation -// (func7=TRANSFER_FUNC7). 
- -void hw_transfer(const char *test_name, elem_t *a, elem_t *b, int size) { - // Source operand in spad bank 0, write target in spad bank 1 - uint32_t op1_addr = spad_addr(0, 0); - uint32_t wr_addr = spad_addr(1, 0); - - // Move input into scratchpad bank0, starting at offset 0, iterate size times - // row-wise - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_fence(); - // Call Transfer instruction - bb_transfer(op1_addr, wr_addr, size); - bb_fence(); - bb_mvout((uintptr_t)b, wr_addr, size, 1); -} - -int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_i8_matrix(output_matrix_b, size, size); - cpu_transfer(a, expected_matrix, size, size); - hw_transfer(test_name, a, output_matrix_b, size); - if (!compare_i8_matrices(expected_matrix, output_matrix_b, size, size)) { - printf("%s: Output matrix does not match expected result!\n", test_name); - return 0; - } - printf("%s: Output matrix match expected result!\n", test_name); - return 1; -} - -int test_transfer(int seed) { - init_i8_random_matrix(input_matrix_a, DIM, DIM, seed); - return run_test("Transfer", input_matrix_a, output_matrix_b, DIM); - // TransferBall test code, need to comment out the code block above -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - int passed = test_transfer(5); - if (passed) { - printf("Transfer test PASSED!\n"); - } else { - printf("Transfer test FAILED!\n"); - } - return (!passed); - -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/transpose_16xn_test.c b/bb-tests/workloads/src/CTest/toy/transpose_16xn_test.c new file mode 100644 index 00000000..27301f89 --- /dev/null +++ b/bb-tests/workloads/src/CTest/toy/transpose_16xn_test.c @@ -0,0 +1,67 @@ +#include "buckyball.h" +#include +#include +#include +#include + +#define DIM 16 + +static elem_t input_matrix[DIM * 64] __attribute__((aligned(64))); +static elem_t output_matrix[64 * DIM] __attribute__((aligned(64))); +static elem_t expected_matrix[64 * DIM] 
__attribute__((aligned(64))); + +void hw_transpose(elem_t *a, elem_t *b, int size) { + uint32_t op1_bank_id = 0; + uint32_t op2_bank_id = 1; + + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_transpose(op1_bank_id, op2_bank_id, size, 0); + bb_mvout((uintptr_t)b, op2_bank_id, size, 1); + bb_fence(); +} + +int run_test(const char *test_name, int cols) { + int size = cols; // iter = number of rows in scratchpad = cols for 16xN + + // Initialize input: A[i][j] = i + j + init_sequence_matrix(input_matrix, DIM, cols); + + // Compute expected transpose on CPU + transpose_u8_matrix(input_matrix, expected_matrix, DIM, cols); + + // Run hardware transpose + hw_transpose(input_matrix, output_matrix, size); + + // Compare results + if (compare_u8_matrices(output_matrix, expected_matrix, cols, DIM)) { + printf("Test %s PASSED\n", test_name); + return 1; + } else { + printf("Test %s FAILED\n", test_name); + return 0; + } +} + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + int all_passed = 1; + + all_passed &= run_test("transpose 16x16", 16); + all_passed &= run_test("transpose 16x32", 32); + + if (all_passed) { + printf("All transpose 16xn tests PASSED\n"); + return 0; + } else { + printf("Some transpose 16xn tests FAILED\n"); + return 1; + } +#ifdef MULTICORE + exit(0); +#endif +} diff --git a/bb-tests/workloads/src/CTest/toy/transpose_matmul.c b/bb-tests/workloads/src/CTest/toy/transpose_matmul.c index a96da9e8..18f7bb42 100644 --- a/bb-tests/workloads/src/CTest/toy/transpose_matmul.c +++ b/bb-tests/workloads/src/CTest/toy/transpose_matmul.c @@ -1,9 +1,11 @@ #include "buckyball.h" #include -#include +#include #include #include +#define DIM (BANK_WIDTH / sizeof(elem_t)) + // Column count n for 16xn matrix multiplication #define MATMUL_COL 50 // 16-byte alignment of n @@ -18,22 +20,21 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char 
*test_name, elem_t *a, elem_t *b, result_t *c, int size) { // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); + uint32_t op1_bank_id = 0; // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); + uint32_t op2_bank_id = 1; // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + int acc_bank_id = 2; // virtual bank id + bb_mem_alloc(acc_bank_id, 1, 4); uint32_t col_stride = (size + DIM - 1) / DIM; for (int i = 0; i < col_stride; i++) { - bb_mvin((uintptr_t)a + i * DIM, op2_addr + size + i * DIM, DIM, col_stride); + bb_mvin((uintptr_t)a + i * DIM, op2_bank_id + size + i * DIM, DIM, + col_stride); } - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_transpose(op2_addr + size, op1_addr, size, 0); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, DIM << 2, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op2_bank_id + size, op1_bank_id, size, 0); + bb_mul_warp16(op1_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, DIM, 1); bb_fence(); } diff --git a/bb-tests/workloads/src/CTest/toy/transpose_test.c b/bb-tests/workloads/src/CTest/toy/transpose_test.c index a79a69c0..597244ce 100644 --- a/bb-tests/workloads/src/CTest/toy/transpose_test.c +++ b/bb-tests/workloads/src/CTest/toy/transpose_test.c @@ -1,32 +1,104 @@ #include "buckyball.h" #include -#include +#include #include #include -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); +#define DIM 16 + +static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))) = { + // ---- Cycle 1 ---- + // Row1 + -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, + // Row2 + -17, -16, -15, -14, -13, -12, -11, 10, 11, 12, 13, 14, 15, 16, 17, 18, + // Row3 + -27, -26, -25, -24, -23, -22, -21, 20, 21, 22, 23, 24, 25, 26, 27, 28, + // Row4 + -37, -36, -35, -34, -33, -32, -31, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 2 
---- + -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, -17, -16, -15, -14, + -13, -12, -11, 10, 11, 12, 13, 14, 15, 16, 17, 18, -27, -26, -25, -24, -23, + -22, -21, 20, 21, 22, 23, 24, 25, 26, 27, 28, -37, -36, -35, -34, -33, -32, + -31, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 3 ---- + -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, -17, -16, -15, -14, + -13, -12, -11, 10, 11, 12, 13, 14, 15, 16, 17, 18, -27, -26, -25, -24, -23, + -22, -21, 20, 21, 22, 23, 24, 25, 26, 27, 28, -37, -36, -35, -34, -33, -32, + -31, 30, 31, 32, 33, 34, 35, 36, 37, 38, + + // ---- Cycle 4 ---- + -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, -17, -16, -15, -14, + -13, -12, -11, 10, 11, 12, 13, 14, 15, 16, 17, 18, -27, -26, -25, -24, -23, + -22, -21, 20, 21, 22, 23, 24, 25, 26, 27, 28, -37, -36, -35, -34, -33, -32, + -31, 30, 31, 32, 33, 34, 35, 36, 37, 38}; + +static elem_t expected_matrix[DIM * DIM] __attribute__((aligned(64))) = { + // Row 1 of A^T (col 1 of A) + -7, -17, -27, -37, -7, -17, -27, -37, -7, -17, -27, -37, -7, -17, -27, -37, + // Row 2 + -6, -16, -26, -36, -6, -16, -26, -36, -6, -16, -26, -36, -6, -16, -26, -36, + // Row 3 + -5, -15, -25, -35, -5, -15, -25, -35, -5, -15, -25, -35, -5, -15, -25, -35, + // Row 4 + -4, -14, -24, -34, -4, -14, -24, -34, -4, -14, -24, -34, -4, -14, -24, -34, + // Row 5 + -3, -13, -23, -33, -3, -13, -23, -33, -3, -13, -23, -33, -3, -13, -23, -33, + // Row 6 + -2, -12, -22, -32, -2, -12, -22, -32, -2, -12, -22, -32, -2, -12, -22, -32, + // Row 7 + -1, -11, -21, -31, -1, -11, -21, -31, -1, -11, -21, -31, -1, -11, -21, -31, + // Row 8 + 0, 10, 20, 30, 0, 10, 20, 30, 0, 10, 20, 30, 0, 10, 20, 30, + // Row 9 + 1, 11, 21, 31, 1, 11, 21, 31, 1, 11, 21, 31, 1, 11, 21, 31, + // Row 10 + 2, 12, 22, 32, 2, 12, 22, 32, 2, 12, 22, 32, 2, 12, 22, 32, + // Row 11 + 3, 13, 23, 33, 3, 13, 23, 33, 3, 13, 23, 33, 3, 13, 23, 33, + // Row 12 + 4, 14, 24, 34, 4, 14, 24, 34, 4, 14, 24, 34, 4, 14, 24, 34, + // Row 13 + 5, 15, 25, 
35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, + // Row 14 + 6, 16, 26, 36, 6, 16, 26, 36, 6, 16, 26, 36, 6, 16, 26, 36, + // Row 15 + 7, 17, 27, 37, 7, 17, 27, 37, 7, 17, 27, 37, 7, 17, 27, 37, + // Row 16 + 8, 18, 28, 38, 8, 18, 28, 38, 8, 18, 28, 38, 8, 18, 28, 38}; static elem_t output_matrix_b[DIM * 1024] __attribute__((aligned(64))); void hw_transpose(const char *test_name, elem_t *a, elem_t *b, int size) { // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); + uint32_t op1_bank_id = 0; // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); + uint32_t op2_bank_id = 1; - bb_mvin((uintptr_t)a, op1_addr, size, 1); - bb_fence(); - bb_transpose(op1_addr, op2_addr, size, 0); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_transpose(op1_bank_id, op2_bank_id, size, 0); + bb_mvout((uintptr_t)b, op2_bank_id, size, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { hw_transpose(test_name, a, b, size); - return 1; + if (compare_i8_matrices(output_matrix_b, expected_matrix, size, size)) { + printf("%s compare test PASSED\n", test_name); + return 1; + } else { + printf("%s compare test FAILED\n", test_name); + return 0; + } } int test_transpose() { - init_sequence_matrix(input_matrix_a, DIM, DIM); - return run_test("Im2col", input_matrix_a, output_matrix_b, DIM); + // init_sequence_matrix(input_matrix_a, DIM, DIM); + return run_test("transpose", input_matrix_a, output_matrix_b, DIM); } int main() { diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_ones.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_ones.c index 4634c17d..9ba00564 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_ones.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_ones.c @@ -1,38 +1,44 @@ #include "buckyball.h" #include -#include +#include #include #include -static elem_t input_matrix_a[DIM * 32] 
__attribute__((aligned(16))); -static elem_t input_matrix_b[32 * DIM] __attribute__((aligned(16))); +#define DIM 16 + +static elem_t input_matrix_a[DIM * 32] __attribute__((aligned(64))); +static elem_t input_matrix_b[32 * DIM] __attribute__((aligned(64))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[32 * DIM] __attribute__((aligned(16))); - transpose_u8_matrix(a, a_transposed, DIM, 32); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, DIM << 2, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, DIM, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); - cpu_matmul(a, b, expected_matrix, DIM, DIM, size); hw_matmul(test_name, a, b, output_matrix, size); + cpu_matmul(a, b, expected_matrix, DIM, DIM, 
size); if (compare_u32_matrices(output_matrix, expected_matrix, DIM, DIM)) { printf("Test %s PASSED\n", test_name); return 1; @@ -43,8 +49,8 @@ int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { } int test_ones_16x32() { - init_ones_matrix(input_matrix_a, DIM, 32); - init_ones_matrix(input_matrix_b, 32, DIM); + init_sequence_matrix(input_matrix_a, DIM, 32); + init_sequence_matrix(input_matrix_b, 32, DIM); return run_test("All-ones matrices", input_matrix_a, input_matrix_b, 32); } diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random1.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random1.c index eb644a7b..70842579 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random1.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random1.c @@ -1,36 +1,42 @@ #include "buckyball.h" #include -#include +#include #include #include -static elem_t input_matrix_a[DIM * 64] __attribute__((aligned(16))); -static elem_t input_matrix_b[64 * DIM] __attribute__((aligned(16))); +#define DIM 16 + +static elem_t input_matrix_a[DIM * 64] __attribute__((aligned(64))); +static elem_t input_matrix_b[64 * DIM] __attribute__((aligned(64))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[64 * DIM] __attribute__((aligned(16))); - transpose_u8_matrix(a, a_transposed, DIM, 64); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t 
a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, DIM << 2, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, DIM, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); cpu_matmul(a, b, expected_matrix, DIM, DIM, size); hw_matmul(test_name, a, b, output_matrix, size); if (compare_u32_matrices(output_matrix, expected_matrix, DIM, DIM)) { diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random2.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random2.c index 9c0eb366..874959c4 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random2.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random2.c @@ -1,9 +1,11 @@ #include "buckyball.h" #include -#include +#include #include #include +#define DIM 16 + static elem_t input_matrix_a[DIM * 32] __attribute__((aligned(16))); static elem_t input_matrix_b[32 * DIM] __attribute__((aligned(16))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); @@ -11,26 +13,30 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[32 * DIM] __attribute__((aligned(16))); - transpose_u8_matrix(a, a_transposed, DIM, 32); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, 
offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, DIM << 2, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, DIM, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); cpu_matmul(a, b, expected_matrix, DIM, DIM, size); hw_matmul(test_name, a, b, output_matrix, size); if (compare_u32_matrices(output_matrix, expected_matrix, DIM, DIM)) { diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random3.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random3.c index a55ced5f..79f23613 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random3.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_random3.c @@ -1,31 +1,38 @@ #include "buckyball.h" #include -#include +#include #include #include -static elem_t input_matrix_a[DIM * 48] __attribute__((aligned(16))); -static elem_t input_matrix_b[48 * DIM] __attribute__((aligned(16))); +#define DIM 16 + +static elem_t input_matrix_a[DIM * 48] __attribute__((aligned(64))); +static elem_t input_matrix_b[48 * DIM] 
__attribute__((aligned(64))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[48 * DIM] __attribute__((aligned(16))); - transpose_u8_matrix(a, a_transposed, DIM, 48); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, DIM << 2, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, DIM, 1); bb_fence(); } diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_zero_random.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_zero_random.c index 68e291bd..bf53b6ce 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_zero_random.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_16xn_zero_random.c @@ -1,36 +1,42 @@ #include "buckyball.h" #include -#include +#include #include #include -static elem_t input_matrix_a[DIM * 32] __attribute__((aligned(16))); 
-static elem_t input_matrix_b[32 * DIM] __attribute__((aligned(16))); +#define DIM 16 + +static elem_t input_matrix_a[DIM * 32] __attribute__((aligned(64))); +static elem_t input_matrix_b[32 * DIM] __attribute__((aligned(64))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[32 * DIM] __attribute__((aligned(16))); - transpose_u8_matrix(a, a_transposed, DIM, 32); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, DIM << 2, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, DIM, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); cpu_matmul(a, b, expected_matrix, DIM, DIM, size); hw_matmul(test_name, a, b, output_matrix, size); if (compare_u32_matrices(output_matrix, expected_matrix, DIM, DIM)) { diff 
--git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_col_row_vector.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_col_row_vector.c index 78071ec3..5bbc1204 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_col_row_vector.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_col_row_vector.c @@ -1,8 +1,10 @@ #include "buckyball.h" #include -#include +#include #include #include + +#define DIM 16 #include static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); @@ -12,22 +14,26 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, size, 1); bb_fence(); } diff --git 
a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_identity_random.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_identity_random.c index 14301583..c2d63900 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_identity_random.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_identity_random.c @@ -1,9 +1,11 @@ #include "buckyball.h" #include -#include +#include #include #include +#define DIM 16 + static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); @@ -11,22 +13,26 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + 
bb_mvout((uintptr_t)c, acc_bank_id, size, 1); bb_fence(); } diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_ones.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_ones.c index 4a2ab8e3..75b0b131 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_ones.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_ones.c @@ -1,9 +1,11 @@ #include "buckyball.h" #include -#include +#include #include #include +#define DIM 16 + static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); @@ -11,27 +13,29 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); + // static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); + // transpose_u8_matrix(a, a_transposed, size, size); + // spad0: operand A, offset 0 + uint32_t op1_bank_id = 0; // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); + uint32_t op2_bank_id = 1; // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + int acc_bank_id = 2; // virtual bank id + // bb_mem_alloc(acc_bank_id, 1, 4); - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); + bb_mvin((uintptr_t)a, op1_bank_id, DIM, 1); + bb_mvin((uintptr_t)b, op2_bank_id, DIM, 1); + + bb_mul_warp16(op1_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, 
acc_bank_id, size << 2, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); cpu_matmul(a, b, expected_matrix, size, size, size); hw_matmul(test_name, a, b, output_matrix, size); if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { @@ -44,8 +48,8 @@ int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { } int test_ones() { - init_ones_matrix(input_matrix_a, DIM, DIM); - init_ones_matrix(input_matrix_b, DIM, DIM); + init_sequence_matrix(input_matrix_a, DIM, DIM); + init_sequence_matrix(input_matrix_b, DIM, DIM); return run_test("All-ones matrices", input_matrix_a, input_matrix_b, DIM); } @@ -54,6 +58,7 @@ int main() { multicore(MULTICORE); #endif int passed = test_ones(); + if (passed) { printf("vecunit_matmul_ones test PASSED\n"); return 0; @@ -61,6 +66,7 @@ int main() { printf("vecunit_matmul_ones test FAILED\n"); return 1; } + #ifdef MULTICORE exit(0); #endif diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random1.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random1.c index c8b3b0df..9230e985 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random1.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random1.c @@ -1,37 +1,89 @@ #include "buckyball.h" #include -#include +#include #include #include -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); +#define DIM 16 + +// Simple test matrices: A = identity-like, B = simple pattern +static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))) = { + 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}; + +static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))) = { + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, + 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, + 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, + 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, + 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, + 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, + 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, + 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, + 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, + 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, + 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, 16}; + static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); -static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); +static result_t zero_matrix[DIM * DIM] __attribute__((aligned(64))) = {0}; + +// Expected result: A * B where A is diagonal-like +static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))) = { + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, + 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, + 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, + 10, 
11, 12, 13, 14, 15, 16, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, + 26, 28, 30, 32, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, + 32, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 2, 4, + 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 1, 2, 3, 4, 5, + 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, + 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, + 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, + 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, + 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, + 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, 16}; void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + // spad3: transposed A + // spad1: operand B + // acc0: write to accumulator + uint32_t op1_bank_id = 0; + uint32_t op2_bank_id = 1; + uint32_t acc_bank_id = 2; + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); + + // Initialize accumulator bank with zeros before matrix multiplication + bb_mvin((uintptr_t)zero_matrix, acc_bank_id, DIM, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, DIM, 1); + bb_mvin((uintptr_t)b, op2_bank_id, DIM, 1); + 
bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, size, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); - cpu_matmul(a, b, expected_matrix, size, size, size); hw_matmul(test_name, a, b, output_matrix, size); if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { printf("Test %s PASSED\n", test_name); @@ -42,24 +94,26 @@ int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { } } -int test_random1() { - init_u8_random_matrix(input_matrix_a, DIM, DIM, 456); - init_u8_random_matrix(input_matrix_b, DIM, DIM, 789); - return run_test("Random matrices 1", input_matrix_a, input_matrix_b, DIM); +int test_ones() { + // init_u8_random_matrix(input_matrix_a, DIM, DIM, 456); + // init_u8_random_matrix(input_matrix_b, DIM, DIM, 789); + return run_test("Random matrices", input_matrix_a, input_matrix_b, DIM); } int main() { #ifdef MULTICORE multicore(MULTICORE); #endif - int passed = test_random1(); + int passed = test_ones(); + if (passed) { - printf("vecunit_matmul_random1 test PASSED\n"); + printf("vecunit_matmul_random test PASSED\n"); return 0; } else { - printf("vecunit_matmul_random1 test FAILED\n"); + printf("vecunit_matmul_random test FAILED\n"); return 1; } + #ifdef MULTICORE exit(0); #endif diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random2.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random2.c index 9a627dac..d08a62d8 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random2.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random2.c @@ -1,9 +1,11 @@ #include "buckyball.h" #include -#include +#include #include #include +#define DIM 16 + static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); static result_t 
output_matrix[DIM * DIM] __attribute__((aligned(64))); @@ -11,26 +13,30 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); + + bb_mvin((uintptr_t)a, op1_bank_id, DIM, 1); + bb_mvin((uintptr_t)b, op2_bank_id, DIM, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, size, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); cpu_matmul(a, b, expected_matrix, size, size, size); hw_matmul(test_name, a, b, output_matrix, size); if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random3.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random3.c deleted file mode 100644 index 01f209e7..00000000 --- 
a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_random3.c +++ /dev/null @@ -1,66 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include - -static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); -static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); -static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); -static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); - -void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, - int size) { - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); - bb_fence(); -} - -int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); - cpu_matmul(a, b, expected_matrix, size, size, size); - hw_matmul(test_name, a, b, output_matrix, size); - if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { - printf("Test %s PASSED\n", test_name); - return 1; - } else { - printf("Test %s FAILED\n", test_name); - return 0; - } -} - -int test_random3() { - init_u8_random_matrix(input_matrix_a, DIM, DIM, 333); - init_u8_random_matrix(input_matrix_b, DIM, DIM, 444); - return run_test("Random matrices 3", input_matrix_a, input_matrix_b, DIM); -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - int passed = test_random3(); - if (passed) { - printf("vecunit_matmul_random3 test PASSED\n"); - return 0; - } else { - printf("vecunit_matmul_random3 test 
FAILED\n"); - return 1; - } -#ifdef MULTICORE - exit(0); -#endif -} diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_row_col_vector.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_row_col_vector.c index 581afbcd..88a53ec5 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_row_col_vector.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_row_col_vector.c @@ -1,9 +1,11 @@ #include "buckyball.h" #include -#include +#include #include #include +#define DIM 16 + static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); @@ -11,27 +13,30 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + 
bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, size, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); cpu_matmul(a, b, expected_matrix, size, size, size); hw_matmul(test_name, a, b, output_matrix, size); if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_zero_random.c b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_zero_random.c index 99718c73..56f19874 100644 --- a/bb-tests/workloads/src/CTest/toy/vecunit_matmul_zero_random.c +++ b/bb-tests/workloads/src/CTest/toy/vecunit_matmul_zero_random.c @@ -1,9 +1,11 @@ #include "buckyball.h" #include -#include +#include #include #include +#define DIM 16 + static elem_t input_matrix_a[DIM * DIM] __attribute__((aligned(64))); static elem_t input_matrix_b[DIM * DIM] __attribute__((aligned(64))); static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); @@ -11,27 +13,30 @@ static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); void hw_matmul(const char *test_name, elem_t *a, elem_t *b, result_t *c, int size) { - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); + // spad0: original A + uint32_t op1_bank_id = 0; + // spad1: operand B + uint32_t op2_bank_id = 1; + // acc0: write to accumulator + int acc_bank_id = 2; // virtual bank id + // spad3: transposed A + uint32_t a_transposed_bank_id = 3; - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + 
bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); - bb_fence(); - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); + bb_mvin((uintptr_t)a, op1_bank_id, size, 1); + bb_mvin((uintptr_t)b, op2_bank_id, size, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, size, 0); + + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, size, 0); + bb_mvout((uintptr_t)c, acc_bank_id, size, 1); bb_fence(); } int run_test(const char *test_name, elem_t *a, elem_t *b, int size) { - clear_u32_matrix(output_matrix, DIM, DIM); cpu_matmul(a, b, expected_matrix, size, size, size); hw_matmul(test_name, a, b, output_matrix, size); if (compare_u32_matrices(output_matrix, expected_matrix, size, size)) { diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_simple_nn_forward_pass_test.c b/bb-tests/workloads/src/CTest/toy/vecunit_simple_nn_forward_pass_test.c deleted file mode 100644 index 608f0067..00000000 --- a/bb-tests/workloads/src/CTest/toy/vecunit_simple_nn_forward_pass_test.c +++ /dev/null @@ -1,156 +0,0 @@ -#include "buckyball.h" -#include -#include -#include -#include - -// Define neural network parameters -#define INPUT_SIZE DIM -#define HIDDEN_SIZE DIM -#define OUTPUT_SIZE DIM - -// Test matrices and data buffers -static elem_t input_data[DIM * DIM] __attribute__((aligned(64))); -static elem_t weights1[HIDDEN_SIZE * INPUT_SIZE] __attribute__((aligned(64))); -static elem_t weights2[OUTPUT_SIZE * HIDDEN_SIZE] __attribute__((aligned(64))); -static result_t hidden_output[DIM * DIM] __attribute__((aligned(64))); -static result_t final_output[DIM * DIM] __attribute__((aligned(64))); -static result_t expected_output[DIM * DIM] __attribute__((aligned(64))); - -// ReLU activation function (executed on CPU) -void relu(result_t *matrix, int rows, int cols) { - for (int i = 0; i < rows * cols; i++) { - if (matrix[i] < 0) { - matrix[i] = 0; - } - } -} - -// Quantization function (quantize 
int32 results to elem_t type) -void quantize_matrix(result_t *src, elem_t *dst, int size) { - for (int i = 0; i < size * size; i++) { - dst[i] = (src[i] > 127) ? 127 : (src[i] < -128) ? -128 : (elem_t)src[i]; - } -} - -// Neural network forward propagation on CPU -void cpu_nn_forward(elem_t *input, elem_t *w1, elem_t *w2, result_t *hidden, - result_t *output, int size) { - // Input layer -> hidden layer - cpu_matmul(input, w1, hidden, size, size, size); - // Apply ReLU activation - relu(hidden, size, size); - - // Quantize hidden layer output as input for next layer - static elem_t hidden_quantized[DIM * DIM]; - quantize_matrix(hidden, hidden_quantized, size); - - // Hidden layer -> output layer - cpu_matmul(hidden_quantized, w2, output, size, size, size); -} - -// Execute hardware matrix multiplication -void hw_matmul(elem_t *a, elem_t *b, result_t *c, int size) { - // Transpose left matrix - static elem_t a_transposed[DIM * DIM] __attribute__((aligned(64))); - transpose_u8_matrix(a, a_transposed, size, size); - - // Move matrices to scratchpad - // spad0: operand A, offset 0 - uint32_t op1_addr = spad_addr(0, 0); - // spad1: operand B, offset 0 - uint32_t op2_addr = spad_addr(1, 0); - // acc0: write to accumulator, offset 0 - uint32_t wr_addr = spad_addr(4, 0); - - bb_mvin((uintptr_t)a_transposed, op1_addr, size, 1); - bb_mvin((uintptr_t)b, op2_addr, size, 1); - bb_mvin((uintptr_t)c, wr_addr, size << 2, 1); - bb_fence(); - - // Execute matrix multiplication - bb_mul_warp16(op1_addr, op2_addr, wr_addr, size, 0); - bb_fence(); - - // Move result back - bb_mvout((uintptr_t)c, wr_addr, size << 2, 1); - bb_fence(); -} - -void hw_nn_forward(elem_t *input, elem_t *w1, elem_t *w2, result_t *hidden, - result_t *output, int size) { - // Input layer -> hidden layer - hw_matmul(input, w1, hidden, size); - // Apply ReLU on CPU - relu(hidden, size, size); - - // Quantize hidden layer output as input for next layer - static elem_t hidden_quantized[DIM * DIM]; - 
quantize_matrix(hidden, hidden_quantized, size); - - // Hidden layer -> output layer - hw_matmul(hidden_quantized, w2, output, size); -} - -// Execute neural network test -int test_neural_network() { - // Initialize data - printf("Initializing random input data and weights...\n"); - init_u8_random_matrix(input_data, DIM, DIM, 123); - - // Initialize weights - srand(114); - for (int i = 0; i < HIDDEN_SIZE * INPUT_SIZE; i++) { - weights1[i] = rand() % 128; - } - srand(514); - for (int i = 0; i < OUTPUT_SIZE * HIDDEN_SIZE; i++) { - weights2[i] = rand() % 128; - } - - // Clear output buffers - clear_u32_matrix(hidden_output, DIM, DIM); - clear_u32_matrix(expected_output, DIM, DIM); - - // Generate expected results on CPU - printf("Running CPU Neural Network Forward Pass...\n"); - cpu_nn_forward(input_data, weights1, weights2, hidden_output, expected_output, - DIM); - - // Clear hidden_output again for hardware computation - clear_u32_matrix(hidden_output, DIM, DIM); - clear_u32_matrix(final_output, DIM, DIM); - - // Execute neural network forward propagation on hardware - printf("Running Hardware Neural Network Forward Pass...\n"); - hw_nn_forward(input_data, weights1, weights2, hidden_output, final_output, - DIM); - - // Compare hardware output with expected output - printf("Comparing hardware output with expected output...\n"); - if (compare_u32_matrices(final_output, expected_output, DIM, DIM)) { - return 1; - } else { - return 0; - } -} - -int main() { -#ifdef MULTICORE - multicore(MULTICORE); -#endif - printf("Neural Network Test Starting...\n"); - int pass = test_neural_network(); - if (pass) { - printf("Neural Network test PASSED\n"); - return 0; - } else { - printf("Neural Network test FAILED\n"); - return 1; - } - -#ifdef MULTICORE - exit(0); -#endif - return 0; -} diff --git a/bb-tests/workloads/src/CTest/toy/vecunit_tiled_matmul.c b/bb-tests/workloads/src/CTest/toy/vecunit_tiled_matmul.c new file mode 100644 index 00000000..a98bc83d --- /dev/null +++ 
b/bb-tests/workloads/src/CTest/toy/vecunit_tiled_matmul.c @@ -0,0 +1,109 @@ +#include "buckyball.h" +#include +#include +#include +#include +#include + +#define DIM 16 +#define KDIM 1024 +#define KTILE 512 + +_Static_assert(KDIM % KTILE == 0, "KDIM must be divisible by KTILE"); +_Static_assert(KDIM % DIM == 0, "KDIM must be divisible by DIM"); +_Static_assert(KTILE % 16 == 0, + "KTILE must be multiple of 16 (mvin line size)"); + +static elem_t input_matrix_a[DIM * KDIM] __attribute__((aligned(64))); +static elem_t input_matrix_b[KDIM * DIM] __attribute__((aligned(64))); +static elem_t tile_a[DIM * KTILE] __attribute__((aligned(64))); +static result_t output_matrix[DIM * DIM] __attribute__((aligned(64))); +static result_t expected_matrix[DIM * DIM] __attribute__((aligned(64))); +static result_t zero_matrix[DIM * DIM] __attribute__((aligned(64))) = {0}; + +/// C = A * B with A row-major DIM×KDIM, B row-major KDIM×DIM; the K dimension +/// is tiled by KTILE and accumulated into acc. +void hw_matmul_tiled(const char *test_name, elem_t *a, elem_t *b, result_t *c) { + (void)test_name; + uint32_t op1_bank_id = 0; + uint32_t op2_bank_id = 1; + int acc_bank_id = 2; + uint32_t a_transposed_bank_id = 3; + + bb_mem_alloc(op1_bank_id, 1, 1); + bb_mem_alloc(op2_bank_id, 1, 1); + bb_mem_alloc(acc_bank_id, 1, 4); + bb_mem_alloc(a_transposed_bank_id, 1, 1); + + bb_mvin((uintptr_t)zero_matrix, acc_bank_id, DIM, 1); + + for (int k0 = 0; k0 < KDIM; k0 += KTILE) { + /* Row-major A: column block [k0, k0+KTILE) is not contiguous across rows; + * pack to tile_a. */ + for (int r = 0; r < DIM; r++) { + memcpy(&tile_a[r * KTILE], &a[r * KDIM + k0], (size_t)KTILE); + } + bb_mvin((uintptr_t)tile_a, op1_bank_id, DIM * (KTILE / 16), 1); + /* B rows k0..k0+KTILE-1 are contiguous: each row is DIM bytes. 
*/ + bb_mvin((uintptr_t)(b + k0 * DIM), op2_bank_id, KTILE, 1); + bb_transpose(op1_bank_id, a_transposed_bank_id, KTILE, 0); + bb_mul_warp16(a_transposed_bank_id, op2_bank_id, acc_bank_id, KTILE, 0); + } + + bb_mvout((uintptr_t)c, acc_bank_id, DIM, 1); + bb_fence(); +} + +void init_diag_ones(elem_t *a, elem_t *b, result_t *expected) { + clear_u8_matrix(a, DIM, KDIM); + clear_u8_matrix(b, KDIM, DIM); + clear_u32_matrix(expected, DIM, DIM); + + /* One 1 per k: A[r,k]=1 iff r==k%DIM, B[k,c]=1 iff c==k%DIM => C[r,c] nonzero + * only on diagonal. */ + for (int k = 0; k < KDIM; k++) { + int i = k % DIM; + a[i * KDIM + k] = 1; + b[k * DIM + i] = 1; + } + + int diag_val = KDIM / DIM; + for (int r = 0; r < DIM; r++) { + expected[r * DIM + r] = diag_val; + } +} + +int run_test(const char *test_name, elem_t *a, elem_t *b) { + hw_matmul_tiled(test_name, a, b, output_matrix); + if (compare_u32_matrices(output_matrix, expected_matrix, DIM, DIM)) { + printf("Test %s PASSED\n", test_name); + return 1; + } else { + printf("Test %s FAILED\n", test_name); + return 0; + } +} + +int test_tiled_matmul() { + init_diag_ones(input_matrix_a, input_matrix_b, expected_matrix); + return run_test("K-tiled diag-ones matmul", input_matrix_a, input_matrix_b); +} + +int main() { +#ifdef MULTICORE + multicore(MULTICORE); +#endif + int passed = test_tiled_matmul(); + + if (passed) { + printf("tiled_matmul test PASSED\n"); + return 0; + } else { + printf("tiled_matmul test FAILED\n"); + return 1; + } + +#ifdef MULTICORE + exit(0); +#endif +} diff --git a/bb-tests/workloads/src/ModelTest/models/LeNet/pytorch-lenet-train.py b/bb-tests/workloads/src/ModelTest/models/LeNet/pytorch-lenet-train.py index 4ee9e3ed..b474a69b 100644 --- a/bb-tests/workloads/src/ModelTest/models/LeNet/pytorch-lenet-train.py +++ b/bb-tests/workloads/src/ModelTest/models/LeNet/pytorch-lenet-train.py @@ -52,7 +52,7 @@ loss.backward() optimizer.step() running_loss += loss.item() - print(f"Epoch {epoch+1}/{epochs} - Loss: 
{running_loss/len(train_loader)}") + print(f"Epoch {epoch + 1}/{epochs} - Loss: {running_loss / len(train_loader)}") # Save the complete model torch.save(model, "lenet-model.pth") diff --git a/bb-tests/workloads/src/OpTest/CMakeLists.txt b/bb-tests/workloads/src/OpTest/CMakeLists.txt index 784d4753..0640aae5 100644 --- a/bb-tests/workloads/src/OpTest/CMakeLists.txt +++ b/bb-tests/workloads/src/OpTest/CMakeLists.txt @@ -1,10 +1,9 @@ set(OPTEST_TOY_DIR ${OPTEST_WORKLOAD_DIR}/toy) -set(OPTEST_GEMMINI_DIR ${OPTEST_WORKLOAD_DIR}/gemmini) +set(OPTEST_TILE_DIR ${OPTEST_WORKLOAD_DIR}/tile) -add_subdirectory(toy) -add_subdirectory(gemmini) +# add_subdirectory(toy) +# add_subdirectory(tile) add_custom_target(OpTest-all ALL DEPENDS - OpTest-gemmini -# OpTest-toy + ) diff --git a/bb-tests/workloads/src/OpTest/gemmini/.gitignore b/bb-tests/workloads/src/OpTest/gemmini/.gitignore deleted file mode 100644 index b65d24ce..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/.gitignore +++ /dev/null @@ -1,4 +0,0 @@ -log.* -core -*pk -a.out diff --git a/bb-tests/workloads/src/OpTest/gemmini/CMakeLists.txt b/bb-tests/workloads/src/OpTest/gemmini/CMakeLists.txt deleted file mode 100644 index bc16b256..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/CMakeLists.txt +++ /dev/null @@ -1,163 +0,0 @@ - - - -#------------------------------------------------------------------------------- -# Define common compilation step functions .mlir -> .o -#------------------------------------------------------------------------------- - -#------------------------------------------------------------------------------- -# Generate executables for different platforms -#------------------------------------------------------------------------------- -# Generate baremetal executables -function(add_baremetal_target TARGET_NAME MLIR_FILE) - set(OBJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-baremetal.o") - set(EXECUTABLE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-baremetal") - - # Compile xx.mlir to 
xx-baremetal.o - add_custom_command( - OUTPUT ${OBJ_FILE} - COMMAND ${BUDDY_OPT} ${MLIR_FILE} - -convert-linalg-to-gemmini - -expand-strided-metadata - -convert-linalg-to-loops - -lower-gemmini | - ${BUDDY_TRANSLATE} --buddy-to-llvmir | - ${BUDDY_LLC} -filetype=obj -mtriple=riscv64 - -mattr=+buddyext,+D -float-abi=hard - -relocation-model=pic - -o ${OBJ_FILE} - DEPENDS ${MLIR_FILE} - COMMENT "Compiling ${MLIR_FILE} to object file" - ) - - # Link xx-baremetal.o to xx-baremetal - add_custom_command( - - OUTPUT ${EXECUTABLE} - COMMAND ${ELF_CC} -O2 -static -specs=htif_nano.specs - ${OBJ_FILE} -o ${EXECUTABLE} - DEPENDS ${OBJ_FILE} - COMMENT "Linking baremetal executable: ${EXECUTABLE}" - ) - - # Create target for xx-baremetal - add_custom_target(${TARGET_NAME}_baremetal - DEPENDS ${EXECUTABLE} - ) -endfunction() - -# Generate Linux executables -function(add_linux_target TARGET_NAME MLIR_FILE) - set(OBJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-linux.o") - set(EXECUTABLE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-linux") - - # Compile xx.mlir to xx-linux.o - add_custom_command( - OUTPUT ${OBJ_FILE} - COMMAND ${BUDDY_OPT} ${MLIR_FILE} - -convert-linalg-to-gemmini - -expand-strided-metadata - -convert-linalg-to-loops - -lower-gemmini | - ${BUDDY_TRANSLATE} --buddy-to-llvmir | - ${BUDDY_LLC} -filetype=obj -mtriple=riscv64 - -mattr=+buddyext,+D -float-abi=hard - -o ${OBJ_FILE} - DEPENDS ${MLIR_FILE} - COMMENT "Compiling ${MLIR_FILE} to Linux object file" - ) - - # Link xx-linux.o to xx-linux - add_custom_command( - OUTPUT ${EXECUTABLE} - COMMAND ${LINUX_CC} -O2 -static ${OBJ_FILE} -o ${EXECUTABLE} - DEPENDS ${OBJ_FILE} - COMMENT "Linking Linux executable: ${EXECUTABLE}" - ) - - # Create target for xx-linux - use target name prefix to avoid conflicts - add_custom_target(${TARGET_NAME}_linux - DEPENDS ${EXECUTABLE} - ) -endfunction() - -# Create executables for different platforms -function(add_cross_platform_target TARGET_NAME MLIR_FILE) - 
add_baremetal_target(${TARGET_NAME} ${MLIR_FILE}) - add_linux_target(${TARGET_NAME} ${MLIR_FILE}) - - # Create a master target that builds both platforms simultaneously - add_custom_target(${TARGET_NAME} - DEPENDS ${TARGET_NAME}_baremetal ${TARGET_NAME}_linux - COMMENT "Building ${TARGET_NAME} for both baremetal and Linux" - ) -endfunction() - -#------------------------------------------------------------------------------- -# Build list -#------------------------------------------------------------------------------- -add_cross_platform_target(matrix-add ${OPTEST_GEMMINI_DIR}/matrix-add.mlir) -add_cross_platform_target(matrix-add-scale ${OPTEST_GEMMINI_DIR}/matrix-add-scale.mlir) -add_cross_platform_target(mvin-mvout ${OPTEST_GEMMINI_DIR}/mvin-mvout.mlir) -add_cross_platform_target(transpose ${OPTEST_GEMMINI_DIR}/transpose.mlir) -add_cross_platform_target(compute-accumulated ${OPTEST_GEMMINI_DIR}/compute-accumulated.mlir) - -add_cross_platform_target(matmul ${OPTEST_GEMMINI_DIR}/matmul.mlir) -add_cross_platform_target(matmul-os ${OPTEST_GEMMINI_DIR}/matmul-os.mlir) -add_cross_platform_target(matmul-ws ${OPTEST_GEMMINI_DIR}/matmul-ws.mlir) -add_cross_platform_target(batch_matmul ${OPTEST_GEMMINI_DIR}/batch_matmul.mlir) - -add_cross_platform_target(tile-matmul ${OPTEST_GEMMINI_DIR}/tile-matmul.mlir) -add_cross_platform_target(tile-matmul-os ${OPTEST_GEMMINI_DIR}/tile-matmul-os.mlir) -add_cross_platform_target(tile-matmul-ws-relu ${OPTEST_GEMMINI_DIR}/tile-matmul-ws-relu.mlir) -add_cross_platform_target(tile-matmul-ws-igelu ${OPTEST_GEMMINI_DIR}/tile-matmul-ws-igelu.mlir) -add_cross_platform_target(tile-matmul-ws-softmax ${OPTEST_GEMMINI_DIR}/tile-matmul-ws-softmax.mlir) -add_cross_platform_target(tile-matmul-ws-layernorm ${OPTEST_GEMMINI_DIR}/tile-matmul-ws-layernorm.mlir) - -add_cross_platform_target(conv_2d_nchw_fchw_f32 ${OPTEST_GEMMINI_DIR}/conv_2d_nchw_fchw_f32.mlir) -add_cross_platform_target(conv_2d_nchw_fchw_i8 
${OPTEST_GEMMINI_DIR}/conv_2d_nchw_fchw_i8.mlir) -add_cross_platform_target(conv_2d_nhwc_hwcf_i8 ${OPTEST_GEMMINI_DIR}/conv_2d_nhwc_hwcf_i8.mlir) -add_cross_platform_target(conv_2d_nhwc_hwcf_f32 ${OPTEST_GEMMINI_DIR}/conv_2d_nhwc_hwcf_f32.mlir) -add_cross_platform_target(conv_2d_nhwc_hwcf_5x5_i8 ${OPTEST_GEMMINI_DIR}/conv_2d_nhwc_hwcf_5x5_i8.mlir) -add_cross_platform_target(conv_2d_nhwc_fhwc_f32 ${OPTEST_GEMMINI_DIR}/conv_2d_nhwc_fhwc_f32.mlir) -add_cross_platform_target(conv_2d_nhwc_fhwc_i8 ${OPTEST_GEMMINI_DIR}/conv_2d_nhwc_fhwc_i8.mlir) -add_cross_platform_target(conv_2d_nhwc_fhwc_5x5_i8 ${OPTEST_GEMMINI_DIR}/conv_2d_nhwc_fhwc_5x5_i8.mlir) - -add_cross_platform_target(tile-conv ${OPTEST_GEMMINI_DIR}/tile-conv.mlir) -add_cross_platform_target(tile-conv-relu ${OPTEST_GEMMINI_DIR}/tile-conv-relu.mlir) -add_cross_platform_target(tile-conv-softmax ${OPTEST_GEMMINI_DIR}/tile-conv-softmax.mlir) -add_cross_platform_target(tile-rect-conv ${OPTEST_GEMMINI_DIR}/tile-rect-conv.mlir) -add_cross_platform_target(tile-conv-igelu ${OPTEST_GEMMINI_DIR}/tile-conv-igelu.mlir) -add_cross_platform_target(tile-conv-layernorm ${OPTEST_GEMMINI_DIR}/tile-conv-layernorm.mlir) - -add_custom_target(OpTest-gemmini ALL DEPENDS - matrix-add - matrix-add-scale - mvin-mvout - transpose - compute-accumulated - matmul - matmul-os - matmul-ws - batch_matmul - tile-matmul - tile-matmul-os - tile-matmul-ws-relu - tile-matmul-ws-igelu - tile-matmul-ws-softmax - tile-matmul-ws-layernorm - conv_2d_nchw_fchw_f32 - conv_2d_nchw_fchw_i8 - conv_2d_nhwc_hwcf_i8 - conv_2d_nhwc_hwcf_f32 - conv_2d_nhwc_hwcf_5x5_i8 - conv_2d_nhwc_fhwc_f32 - conv_2d_nhwc_fhwc_i8 - conv_2d_nhwc_fhwc_5x5_i8 - tile-conv - tile-conv-relu - tile-conv-softmax - tile-rect-conv - tile-conv-igelu - tile-conv-layernorm -) diff --git a/bb-tests/workloads/src/OpTest/gemmini/batch_matmul.mlir b/bb-tests/workloads/src/OpTest/gemmini/batch_matmul.mlir deleted file mode 100644 index 0244443c..00000000 --- 
a/bb-tests/workloads/src/OpTest/gemmini/batch_matmul.mlir +++ /dev/null @@ -1,30 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini | \ -// RUN: FileCheck %s - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - %1 = arith.constant 1 : i8 - %2 = arith.constant 2 : i8 - %input0 = memref.alloc() : memref<3x3x3xi8> - %input1 = memref.alloc() : memref<3x3x3xi8> - %output = memref.alloc() : memref<3x3x3xi8> - linalg.fill - ins(%1 : i8) - outs(%input0 : memref<3x3x3xi8>) - linalg.fill - ins(%2 : i8) - outs(%input1 : memref<3x3x3xi8>) - // CHECK: gemmini.tile_matmul %subview %subview_2 %subview_3 %alloc_4 : - // CHECK-SAME: memref<3x3xi8, strided<[3, 1]>> memref<3x3xi8, strided<[3, 1]>> memref<3x3xi8, strided<[3, 1]>> memref<3x3xi32> - // CHECK: gemmini.tile_matmul %subview_5 %subview_6 %subview_7 %alloc_8 : - // CHECK-SAME: memref<3x3xi8, strided<[3, 1], offset: 9>> memref<3x3xi8, strided<[3, 1], offset: 9>> memref<3x3xi8, strided<[3, 1], offset: 9>> memref<3x3xi32> - // CHECK: gemmini.tile_matmul %subview_10 %subview_11 %subview_12 %alloc_13 : - // CHECK-SAME: memref<3x3xi8, strided<[3, 1], offset: 18>> memref<3x3xi8, strided<[3, 1], offset: 18>> memref<3x3xi8, strided<[3, 1], offset: 18>> memref<3x3xi32> - linalg.batch_matmul - ins(%input0, %input1: memref<3x3x3xi8>, memref<3x3x3xi8>) - outs(%output : memref<3x3x3xi8>) - gemmini.print %output : memref<3x3x3xi8> - memref.dealloc %output : memref<3x3x3xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/compute-accumulated.mlir b/bb-tests/workloads/src/OpTest/gemmini/compute-accumulated.mlir deleted file mode 100644 index 154a4f60..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/compute-accumulated.mlir +++ /dev/null @@ -1,49 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @gv1 : memref<4x4xi8> = dense<[[1, 2, 3, 4], - [5, 6, 7, 8], - [9, 10, 11, 12], - [13, 14, 15, 16]]> -memref.global "private" @gv2 : 
memref<4x4xi8> = dense<[[1, 1, 1, 1], - [1, 1, 1, 1], - [1, 1, 1, 1], - [1, 1, 1, 1]]> - -func.func @main() -> i64 { - %in = memref.get_global @gv1 : memref<4x4xi8> - %identity = memref.get_global @gv2 : memref<4x4xi8> - %out = memref.alloc() : memref<4x4xi8> - gemmini.print %out : memref<4x4xi8> - %inSpAddr = arith.constant 0 : i64 - %outSpAddr = arith.constant 4 : i64 - %identitySpAddr = arith.constant 8 : i64 - %cst4 = arith.constant 4 : i64 - %cst0 = arith.constant 0 : i64 - // CHECK: "gemmini.intr.config_st" - gemmini.config_st %cst4 : i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %in %inSpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %identity %identitySpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ex" - gemmini.config_ex - // CHECK: "gemmini.intr.preload" - gemmini.preload_zeros %outSpAddr %cst4 %cst4 : i64 i64 i64 - // CHECK: "gemmini.intr.compute_preloaded" - gemmini.compute_preloaded %inSpAddr %identitySpAddr %cst4 %cst4 %cst4 %cst4 : i64 i64 i64 i64 i64 i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %out %outSpAddr : memref<4x4xi8> i64 - gemmini.print %out : memref<4x4xi8> - // CHECK: "gemmini.intr.compute_accumulated" - gemmini.compute_accumulated %inSpAddr %identitySpAddr %cst4 %cst4 %cst4 %cst4 : i64 i64 i64 i64 i64 i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %out %outSpAddr : memref<4x4xi8> i64 - gemmini.print %out : memref<4x4xi8> - return %cst0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nchw_fchw_f32.mlir b/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nchw_fchw_f32.mlir deleted file mode 100644 index eda6cb2c..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nchw_fchw_f32.mlir +++ /dev/null @@ -1,51 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini="acc_t=f32" | \ -// RUN: FileCheck %s - 
-memref.global "private" @input : memref<2x2x5x5xf32> = dense<[[[[1., 0., -1., 0., 1.], - [1., 0., -1., 0., 1.], - [1., 0., -1., 0., 1.], - [1., 0., -1., 0., 1.], - [-1., 0., 1., 0., -1.]], - [[-1., 0., 1., 0., -1.], - [-1., 0., 1., 0., -1.], - [-1., 0., 1., 0., -1.], - [-1., 0., 1., 0., -1.], - [-1., 0., 1., 0., -1.]]], - [[[1., 0., 2., 0., 1.], - [1., 0., 2., 0., 1.], - [1., 0., 2., 0., 1.], - [1., 0., 2., 0., 1.], - [-1., 0., 2., 0., -1.]], - [[-1., 0., 2., 0., -1.], - [-1., 0., 2., 0., -1.], - [-1., 0., 2., 0., -1.], - [-1., 0., 2., 0., -1.], - [-1., 0., 2., 0., -1.]]]]> - -memref.global "private" @weight : memref<2x2x3x3xf32> = dense<[[[[1., 2., 3.], - [3., 2., 1.], - [1., 2., 3.]], - [[3., 2., 1.], - [1., 2., 3.], - [3., 2., 1.]]], - [[[1., 2., 3.], - [3., 2., 1.], - [1., 2., 3.]], - [[3., 2., 1.], - [1., 2., 3.], - [3., 2., 1.]]]]> - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - %mem0 = memref.get_global @input : memref<2x2x5x5xf32> - %mem1 = memref.get_global @weight : memref<2x2x3x3xf32> - %mem2 = memref.alloc() : memref<2x2x3x3xf32> - // CHECK: gemmini.tile_conv %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %{{.+}} %{{.+}} : - // CHECK-SAME: memref<2x5x5x2xf32> memref<18x2xf32> memref<2xf32> memref<18x2xf32> i64 i64 - linalg.conv_2d_nchw_fchw - ins (%mem0, %mem1 : memref<2x2x5x5xf32>, memref<2x2x3x3xf32>) - outs(%mem2 : memref<2x2x3x3xf32>) - gemmini.print %mem2 : memref<2x2x3x3xf32> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nchw_fchw_i8.mlir b/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nchw_fchw_i8.mlir deleted file mode 100644 index eaec8b32..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nchw_fchw_i8.mlir +++ /dev/null @@ -1,51 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @input : memref<2x2x5x5xi8> = dense<[[[[1, 0, -1, 0, 1], - [1, 0, -1, 0, 1], - [1, 0, -1, 0, 1], - [1, 0, -1, 0, 1], 
- [-1, 0, 1, 0, -1]], - [[-1, 0, 1, 0, -1], - [-1, 0, 1, 0, -1], - [-1, 0, 1, 0, -1], - [-1, 0, 1, 0, -1], - [-1, 0, 1, 0, -1]]], - [[[1, 0, 2, 0, 1], - [1, 0, 2, 0, 1], - [1, 0, 2, 0, 1], - [1, 0, 2, 0, 1], - [-1, 0, 2, 0, -1]], - [[-1, 0, 2, 0, -1], - [-1, 0, 2, 0, -1], - [-1, 0, 2, 0, -1], - [-1, 0, 2, 0, -1], - [-1, 0, 2, 0, -1]]]]> - -memref.global "private" @weight : memref<2x2x3x3xi8> = dense<[[[[1, 2, 3], - [3, 2, 1], - [1, 2, 3]], - [[3, 2, 1], - [1, 2, 3], - [3, 2, 1]]], - [[[1, 2, 3], - [3, 2, 1], - [1, 2, 3]], - [[3, 2, 1], - [1, 2, 3], - [3, 2, 1]]]]> - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - %mem0 = memref.get_global @input : memref<2x2x5x5xi8> - %mem1 = memref.get_global @weight : memref<2x2x3x3xi8> - %mem2 = memref.alloc() : memref<2x2x3x3xi8> - // CHECK: gemmini.tile_conv %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %{{.+}} %{{.+}} : - // memref<2x5x5x2xi8> memref<18x2xi8> memref<2xi32> memref<18x2xi8> i64 i64 - linalg.conv_2d_nchw_fchw - ins (%mem0, %mem1 : memref<2x2x5x5xi8>, memref<2x2x3x3xi8>) - outs(%mem2 : memref<2x2x3x3xi8>) - gemmini.print %mem2 : memref<2x2x3x3xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_5x5_i8.mlir b/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_5x5_i8.mlir deleted file mode 100644 index a7d7662b..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_5x5_i8.mlir +++ /dev/null @@ -1,32 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @input : memref<1x7x7x1xi8> = dense<[[[[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]]]]> - -memref.global "private" @kernel : memref<1x5x5x1xi8> = dense<[[[[1], [1], [1], [1], [1]], - [[1], [1], [1], [1], [1]], - [[1], [1], 
[1], [1], [1]], - [[1], [1], [1], [1], [1]], - [[1], [1], [1], [1], [1]]]]> - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - %input = memref.get_global @input : memref<1x7x7x1xi8> - %kernel = memref.get_global @kernel : memref<1x5x5x1xi8> - %output = memref.alloc() : memref<1x3x3x1xi8> - - // CHECK: gemmini.tile_conv %{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %{{.+}} %{{.+}} : - // CHECK-SAME: memref<1x7x7x1xi8> memref<25x1xi8> memref<1xi32> memref<9x1xi8> i64 i64 - linalg.conv_2d_nhwc_fhwc - ins(%input, %kernel : memref<1x7x7x1xi8>, memref<1x5x5x1xi8>) - outs(%output : memref<1x3x3x1xi8>) - gemmini.print %output : memref<1x3x3x1xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_f32.mlir b/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_f32.mlir deleted file mode 100644 index 998b2d43..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_f32.mlir +++ /dev/null @@ -1,31 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini="acc_t=f32" | \ -// RUN: FileCheck %s - -memref.global "private" @input : memref<1x5x5x1xf32> = dense<[[[[1.],[2.],[3.],[4.],[5.]], - [[6.],[7.],[8.],[9.],[10.]], - [[11.],[12.],[13.],[14.],[15.]], - [[16.],[17.],[18.],[19.],[20.]], - [[21.],[22.],[23.],[24.],[25.]]]]> - -memref.global "private" @kernel : memref<1x3x3x1xf32> = dense<[[[[1.], [1.], [1.]], - [[1.], [1.], [1.]], - [[1.], [1.], [1.]]]]> - - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - // batchsize = 2 inputchannel = 2 - %input = memref.get_global @input : memref<1x5x5x1xf32> - // outputchannel = 3 - %kernel = memref.get_global @kernel : memref<1x3x3x1xf32> - // batchsize h w outputchannel - %output = memref.alloc() : memref<1x3x3x1xf32> - // CHECK: gemmini.tile_conv %{{.+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %{{.+}} %{{.+}} : - // CHECK: memref<1x5x5x1xf32> memref<9x1xf32> memref<1xf32> memref<9x1xf32> i64 i64 - linalg.conv_2d_nhwc_fhwc 
- ins(%input, %kernel : memref<1x5x5x1xf32>, memref<1x3x3x1xf32>) - outs(%output : memref<1x3x3x1xf32>) - gemmini.print %output : memref<1x3x3x1xf32> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_i8.mlir b/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_i8.mlir deleted file mode 100644 index 0bfeafca..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_fhwc_i8.mlir +++ /dev/null @@ -1,30 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @input : memref<1x7x7x1xi8> = dense<[[[[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]]]]> - -memref.global "private" @kernel : memref<1x3x3x1xi8> = dense<[[[[1], [1], [1]], - [[1], [1], [1]], - [[1], [1], [1]]]]> - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - %input = memref.get_global @input : memref<1x7x7x1xi8> - %kernel = memref.get_global @kernel : memref<1x3x3x1xi8> - %output = memref.alloc() : memref<1x5x5x1xi8> - - // CHECK: gemmini.tile_conv %{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %{{.+}} %{{.+}} : - // CHECK-SAME: memref<1x7x7x1xi8> memref<9x1xi8> memref<1xi32> memref<25x1xi8> i64 i64 - linalg.conv_2d_nhwc_fhwc - ins(%input, %kernel : memref<1x7x7x1xi8>, memref<1x3x3x1xi8>) - outs(%output : memref<1x5x5x1xi8>) - gemmini.print %output : memref<1x5x5x1xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_5x5_i8.mlir b/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_5x5_i8.mlir deleted file mode 100644 index ee9af7d2..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_5x5_i8.mlir +++ /dev/null @@ -1,32 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini | \ -// RUN: FileCheck %s - -memref.global 
"private" @input : memref<1x7x7x1xi8> = dense<[[[[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]], - [[1],[1],[1],[1],[1],[1],[1]]]]> - -memref.global "private" @kernel : memref<5x5x1x1xi8> = dense<[[[[1]], [[1]], [[1]], [[1]], [[1]]], - [[[1]], [[1]], [[1]], [[1]], [[1]]], - [[[1]], [[1]], [[1]], [[1]], [[1]]], - [[[1]], [[1]], [[1]], [[1]], [[1]]], - [[[1]], [[1]], [[1]], [[1]], [[1]]]]> - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - %input = memref.get_global @input : memref<1x7x7x1xi8> - %kernel = memref.get_global @kernel : memref<5x5x1x1xi8> - %output = memref.alloc() : memref<1x3x3x1xi8> - - // CHECK: gemmini.tile_conv %{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %{{.+}} %{{.+}} : - // CHECK-SAME: memref<1x7x7x1xi8> memref<25x1xi8> memref<1xi32> memref<9x1xi8> i64 i64 - linalg.conv_2d_nhwc_hwcf - ins(%input, %kernel : memref<1x7x7x1xi8>, memref<5x5x1x1xi8>) - outs(%output : memref<1x3x3x1xi8>) - gemmini.print %output : memref<1x3x3x1xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_f32.mlir b/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_f32.mlir deleted file mode 100644 index f3c094dc..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_f32.mlir +++ /dev/null @@ -1,30 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini="acc_t=f32" | \ -// RUN: FileCheck %s - -memref.global "private" @input : memref<1x5x5x1xf32> = dense<[[[[1.],[2.],[3.],[4.],[5.]], - [[6.],[7.],[8.],[9.],[10.]], - [[11.],[12.],[13.],[14.],[15.]], - [[16.],[17.],[18.],[19.],[20.]], - [[21.],[22.],[23.],[24.],[25.]]]]> -memref.global "private" @kernel : memref<3x3x1x1xf32> = dense<[[[[1.]], [[1.]], [[1.]]], - [[[1.]], [[1.]], [[1.]]], - [[[1.]], [[1.]], [[1.]]]]> - - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - // 
batchsize = 2 inputchannel = 2 - %input = memref.get_global @input : memref<1x5x5x1xf32> - // outputchannel = 3 - %kernel = memref.get_global @kernel : memref<3x3x1x1xf32> - // batchsize h w outputchannel - %output = memref.alloc() : memref<1x3x3x1xf32> - // CHECK: gemmini.tile_conv %{{.+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %{{.+}} %{{.+}} : - // CHECK: memref<1x5x5x1xf32> memref<9x1xf32> memref<1xf32> memref<9x1xf32> i64 i64 - linalg.conv_2d_nhwc_hwcf - ins(%input, %kernel : memref<1x5x5x1xf32>, memref<3x3x1x1xf32>) - outs(%output : memref<1x3x3x1xf32>) - gemmini.print %output : memref<1x3x3x1xf32> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_i8.mlir b/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_i8.mlir deleted file mode 100644 index c6472d1b..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/conv_2d_nhwc_hwcf_i8.mlir +++ /dev/null @@ -1,30 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @input : memref<1x5x5x1xi8> = dense<[[[[1],[2],[3],[4],[5]], - [[6],[7],[8],[9],[10]], - [[11],[12],[13],[14],[15]], - [[16],[17],[18],[19],[20]], - [[21],[22],[23],[24],[25]]]]> -memref.global "private" @kernel : memref<3x3x1x1xi8> = dense<[[[[1]], [[1]], [[1]]], - [[[1]], [[1]], [[1]]], - [[[1]], [[1]], [[1]]]]> - - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - // batchsize = 2 inputchannel = 2 - %input = memref.get_global @input : memref<1x5x5x1xi8> - // outputchannel = 3 - %kernel = memref.get_global @kernel : memref<3x3x1x1xi8> - // batchsize h w outputchannel - %output = memref.alloc() : memref<1x3x3x1xi8> - // CHECK: gemmini.tile_conv %{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %{{.+}} %{{.+}} : - // CHECK-SAME: memref<1x5x5x1xi8> memref<9x1xi8> memref<1xi32> memref<9x1xi8> i64 i64 - linalg.conv_2d_nhwc_hwcf - ins(%input, %kernel : memref<1x5x5x1xi8>, memref<3x3x1x1xi8>) - outs(%output : 
memref<1x3x3x1xi8>) - gemmini.print %output : memref<1x3x3x1xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/matmul-os.mlir b/bb-tests/workloads/src/OpTest/gemmini/matmul-os.mlir deleted file mode 100644 index 9d704586..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/matmul-os.mlir +++ /dev/null @@ -1,44 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @gv1 : memref<4x4xi8> = dense<[[1, 2, 3, 4], - [5, 6, 7, 8], - [9, 10, 11, 12], - [13, 14, 15, 16]]> -memref.global "private" @gv2 : memref<4x4xi8> = dense<[[1, 1, 1, 1], - [1, 1, 1, 1], - [1, 1, 1, 1], - [1, 1, 1, 1]]> - -func.func @main() -> i64 { - %in = memref.get_global @gv1 : memref<4x4xi8> - %identity = memref.get_global @gv2 : memref<4x4xi8> - %out = memref.alloc() : memref<4x4xi8> - gemmini.print %out : memref<4x4xi8> - %inSpAddr = arith.constant 0 : i64 - %outSpAddr = arith.constant 4 : i64 - %identitySpAddr = arith.constant 8 : i64 - %cst4 = arith.constant 4 : i64 - %cst0 = arith.constant 0 : i64 - // CHECK: "gemmini.intr.config_st" - gemmini.config_st %cst4 : i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %in %inSpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %identity %identitySpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ex" - gemmini.config_ex {dataflow = 0 } - // CHECK: "gemmini.intr.preload" - gemmini.preload_zeros %outSpAddr %cst4 %cst4 : i64 i64 i64 - // CHECK: "gemmini.intr.compute_preloaded" - gemmini.compute_preloaded %inSpAddr %identitySpAddr %cst4 %cst4 %cst4 %cst4 : i64 i64 i64 i64 i64 i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %out %outSpAddr : memref<4x4xi8> i64 - gemmini.print %out : memref<4x4xi8> - return %cst0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/matmul-ws.mlir 
b/bb-tests/workloads/src/OpTest/gemmini/matmul-ws.mlir deleted file mode 100644 index 0152565b..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/matmul-ws.mlir +++ /dev/null @@ -1,52 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @gv1 : memref<4x4xi8> = dense<[[1, 2, 3, 4], - [5, 6, 7, 8], - [9, 10, 11, 12], - [13, 14, 15, 16]]> -memref.global "private" @gv2 : memref<4x4xi8> = dense<[[1, 1, 1, 1], - [1, 1, 1, 1], - [1, 1, 1, 1], - [1, 1, 1, 1]]> -memref.global "private" @gv3 : memref<4x4xi8> = dense<[[2, 2, 2, 2], - [2, 2, 2, 2], - [2, 2, 2, 2], - [2, 2, 2, 2]]> - -func.func @main() -> i64 { - %aArray = memref.get_global @gv1 : memref<4x4xi8> - %bArray = memref.get_global @gv2 : memref<4x4xi8> - %cArray = memref.alloc() : memref<4x4xi8> - %dArray = memref.get_global @gv3 : memref<4x4xi8> - gemmini.print %cArray : memref<4x4xi8> - %aSpAddr = arith.constant 0 : i64 - %bSpAddr = arith.constant 4 : i64 - %cSpAddr = arith.constant 8 : i64 - %dSpAddr = arith.constant 12 : i64 - %cst4 = arith.constant 4 : i64 - %cst0 = arith.constant 0 : i64 - // CHECK: "gemmini.intr.config_st" - gemmini.config_st %cst4 : i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %aArray %aSpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %bArray %bSpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %cArray %cSpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %dArray %dSpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ex" - gemmini.config_ex {dataflow = 1} - // CHECK: "gemmini.intr.preload" - gemmini.preload %bSpAddr %cSpAddr %cst4 %cst4 %cst4 %cst4: i64 i64 i64 i64 i64 i64 - // CHECK: "gemmini.intr.compute_preloaded" - gemmini.compute_preloaded %aSpAddr %dSpAddr %cst4 %cst4 %cst4 %cst4 : i64 i64 i64 i64 i64 i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %cArray %cSpAddr : 
memref<4x4xi8> i64 - gemmini.print %cArray : memref<4x4xi8> - return %cst0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/matmul.mlir b/bb-tests/workloads/src/OpTest/gemmini/matmul.mlir deleted file mode 100644 index ee53ea39..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/matmul.mlir +++ /dev/null @@ -1,28 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --convert-linalg-to-gemmini | \ -// RUN: FileCheck %s - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - %1 = arith.constant 1 : i8 - %2 = arith.constant 2 : i8 - %mem0 = memref.alloc() : memref<8x8xi8> - %mem1 = memref.alloc() : memref<8x8xi8> - %mem2 = memref.alloc() : memref<8x8xi8> - linalg.fill - ins(%2 : i8) - outs(%mem0 : memref<8x8xi8>) - linalg.fill - ins(%1 : i8) - outs(%mem1 : memref<8x8xi8>) - // CHECK: gemmini.tile_matmul %alloc %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} %alloc_{{[0-9]+}} - // CHECK-SAME: memref<8x8xi8> memref<8x8xi8> memref<8x8xi8> memref<8x8xi32> - linalg.matmul - ins(%mem0, %mem1 : memref<8x8xi8>, memref<8x8xi8>) - outs(%mem2 : memref<8x8xi8>) - gemmini.print %mem2 : memref<8x8xi8> - memref.dealloc %mem0 : memref<8x8xi8> - memref.dealloc %mem1 : memref<8x8xi8> - memref.dealloc %mem2 : memref<8x8xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/matrix-add-scale.mlir b/bb-tests/workloads/src/OpTest/gemmini/matrix-add-scale.mlir deleted file mode 100644 index babebf85..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/matrix-add-scale.mlir +++ /dev/null @@ -1,40 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s -memref.global "private" @gv1 : memref<4x4xi8> = dense<[[1, 2, 3, 4], - [5, 6, 7, 8], - [9, 10, 11, 12], - [13, 14, 15, 16]]> -memref.global "private" @gv2 : memref<4x4xi8> = dense<[[17, 18, 19, 20], - [21, 22, 23, 24], - [25, 26, 27, 28], - [29, 30, 31, 32]]> - -func.func @main() -> i64 { - %arrayA = memref.get_global @gv1 : memref<4x4xi8> - %arrayB = memref.get_global @gv2 : memref<4x4xi8> - %arrayC = 
memref.alloc() : memref<4x4xi8> - gemmini.print %arrayC : memref<4x4xi8> - // 10000000000000000000000000000000 - %aAccAddr = arith.constant 2147483648 : i64 - // 11000000000000000000000000000000 - %bAccAddr = arith.constant 3221225472 : i64 - // 10000000000000000000000000000000 - %cAccAddr = arith.constant 2147483648 - %cst4 = arith.constant 4 : i64 - %cst0 = arith.constant 0 : i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 {shrunk = true, scale = 2.0 : f32 } : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %arrayA %aAccAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 {shrunk = true, scale = 2.0 : f32 } : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %arrayB %bAccAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_st" - gemmini.config_st %cst4 : i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %arrayC %cAccAddr : memref<4x4xi8> i64 - gemmini.print %arrayC : memref<4x4xi8> - return %cst0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/matrix-add.mlir b/bb-tests/workloads/src/OpTest/gemmini/matrix-add.mlir deleted file mode 100644 index 3011f562..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/matrix-add.mlir +++ /dev/null @@ -1,41 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @gv1 : memref<4x4xi8> = dense<[[1, 2, 3, 4], - [5, 6, 7, 8], - [9, 10, 11, 12], - [13, 14, 15, 16]]> -memref.global "private" @gv2 : memref<4x4xi8> = dense<[[17, 18, 19, 20], - [21, 22, 23, 24], - [25, 26, 27, 28], - [29, 30, 31, 32]]> - -func.func @main() -> i64 { - %arrayA = memref.get_global @gv1 : memref<4x4xi8> - %arrayB = memref.get_global @gv2 : memref<4x4xi8> - %arrayC = memref.alloc() : memref<4x4xi8> - gemmini.print %arrayC : memref<4x4xi8> - // 10000000000000000000000000000000 - %aAccAddr = arith.constant 2147483648 : i64 - // 11000000000000000000000000000000 - %bAccAddr = arith.constant 3221225472 : i64 - // 
10000000000000000000000000000000 - %cAccAddr = arith.constant 2147483648 - %cst4 = arith.constant 4 : i64 - %cst0 = arith.constant 0 : i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 {shrunk = true} : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %arrayA %aAccAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 {shrunk = true} : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %arrayB %bAccAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_st" - gemmini.config_st %cst4 : i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %arrayC %cAccAddr : memref<4x4xi8> i64 - gemmini.print %arrayC : memref<4x4xi8> - return %cst0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/mvin-mvout.mlir b/bb-tests/workloads/src/OpTest/gemmini/mvin-mvout.mlir deleted file mode 100644 index bf3c5c88..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/mvin-mvout.mlir +++ /dev/null @@ -1,36 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @gv : memref<2x16xi8> = dense<[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], - [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]]> - -func.func @main() -> i64 { - %0 = arith.constant 0 : i64 - %stride16 = arith.constant 16 : i64 - %stride8 = arith.constant 8 : i64 - %spadAddr = arith.constant 0 : i64 - %arrayA = memref.get_global @gv : memref<2x16xi8> - %arrayB = memref.alloc() : memref<3x16xi8> - %arrayC = memref.alloc() : memref<2x8xi8> - gemmini.print %arrayB : memref<3x16xi8> - gemmini.print %arrayC : memref<2x8xi8> - // CHECK: "gemmini.intr.config_st" - // The mvout op's stride is 16. 
- gemmini.config_st %stride16 : i64 - // CHECK: "gemmini.intr.config_ld" - // The mvin op's stride is 16 - gemmini.config_ld %stride16 : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %arrayA %spadAddr : memref<2x16xi8> i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %arrayB %spadAddr : memref<3x16xi8> i64 - // CHECK: "gemmini.intr.config_st" - // The mvout op's stride is 8 - gemmini.config_st %stride8 : i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %arrayC %spadAddr : memref<2x8xi8> i64 - gemmini.print %arrayB : memref<3x16xi8> - gemmini.print %arrayC : memref<2x8xi8> - return %0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-conv-igelu.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-conv-igelu.mlir deleted file mode 100644 index cea3487c..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-conv-igelu.mlir +++ /dev/null @@ -1,52 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -// batchSize = 1 inputDim = 5 inChannels = 1 -memref.global "private" @input : memref<1x5x5x1xi8> = dense<[[[[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]]]]> - -// outChannels = 2 kernelDim = 3 inChannels = 1 -memref.global "private" @weight : memref<9x2xi8> = dense<[[-1, 2], [-1, 2], [-1, 2], - [-1, 2], [-1, 2], [-1, 2], - [-1, 2], [-1, 2], [-1, 2]]> - -// outChannels = 2 -memref.global "private" @bias : memref<2xi32> = dense<[1,1]> - -func.func @main() -> i64 { - %0 = arith.constant 0 : i64 - %3 = arith.constant 3 : i64 - %input = memref.get_global @input : memref<1x5x5x1xi8> - %weight = memref.get_global @weight : memref<9x2xi8> - %bias = memref.get_global @bias : memref<2xi32> - %output = memref.alloc() : memref<9x2xi8> - - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - 
// CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - // CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1, act = 3}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - return %0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-conv-layernorm.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-conv-layernorm.mlir deleted file mode 100644 index 8d89ca89..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-conv-layernorm.mlir +++ /dev/null @@ -1,52 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -// batchSize = 1 inputDim = 5 inChannels = 1 -memref.global "private" @input : memref<1x5x5x1xi8> = dense<[[[[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]]]]> - -// outChannels = 2 kernelDim = 3 inChannels = 1 -memref.global "private" @weight : memref<9x2xi8> = dense<[[-1, 2], [-1, 2], [-1, 2], - [-1, 2], [-1, 2], [-1, 2], - [-1, 2], [-1, 2], [-1, 2]]> - -// outChannels = 2 -memref.global "private" @bias : memref<2xi32> = dense<[1,1]> - -func.func @main() -> i64 { - %0 = arith.constant 0 : i64 - %3 = arith.constant 3 : i64 - %input = memref.get_global @input : memref<1x5x5x1xi8> - 
%weight = memref.get_global @weight : memref<9x2xi8> - %bias = memref.get_global @bias : memref<2xi32> - %output = memref.alloc() : memref<9x2xi8> - - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - // CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - // CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1, act = 2}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - return %0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-conv-relu.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-conv-relu.mlir deleted file mode 100644 index d2bac046..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-conv-relu.mlir +++ /dev/null @@ -1,52 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -// batchSize = 1 inputDim = 5 inChannels = 1 -memref.global "private" @input : memref<1x5x5x1xi8> = dense<[[[[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]]]]> - -// outChannels = 2 kernelDim = 3 inChannels = 1 -memref.global "private" @weight : memref<9x2xi8> = 
dense<[[-1, 2], [-1, 2], [-1, 2], - [-1, 2], [-1, 2], [-1, 2], - [-1, 2], [-1, 2], [-1, 2]]> - -// outChannels = 2 -memref.global "private" @bias : memref<2xi32> = dense<[1,1]> - -func.func @main() -> i64 { - %0 = arith.constant 0 : i64 - %3 = arith.constant 3 : i64 - %input = memref.get_global @input : memref<1x5x5x1xi8> - %weight = memref.get_global @weight : memref<9x2xi8> - %bias = memref.get_global @bias : memref<2xi32> - %output = memref.alloc() : memref<9x2xi8> - - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - // CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - // CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1, act = 1}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - return %0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-conv-softmax.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-conv-softmax.mlir deleted file mode 100644 index a2958bf1..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-conv-softmax.mlir +++ /dev/null @@ -1,52 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -// batchSize = 1 inputDim = 
5 inChannels = 1 -memref.global "private" @input : memref<1x5x5x1xi8> = dense<[[[[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]]]]> - -// outChannels = 2 kernelDim = 3 inChannels = 1 -memref.global "private" @weight : memref<9x2xi8> = dense<[[-1, 2], [-1, 2], [-1, 2], - [-1, 2], [-1, 2], [-1, 2], - [-1, 2], [-1, 2], [-1, 2]]> - -// outChannels = 2 -memref.global "private" @bias : memref<2xi32> = dense<[1,1]> - -func.func @main() -> i64 { - %0 = arith.constant 0 : i64 - %3 = arith.constant 3 : i64 - %input = memref.get_global @input : memref<1x5x5x1xi8> - %weight = memref.get_global @weight : memref<9x2xi8> - %bias = memref.get_global @bias : memref<2xi32> - %output = memref.alloc() : memref<9x2xi8> - - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - // CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - // CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1, act = 4}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - return %0 : i64 -} diff --git 
a/bb-tests/workloads/src/OpTest/gemmini/tile-conv.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-conv.mlir deleted file mode 100644 index 567e657b..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-conv.mlir +++ /dev/null @@ -1,39 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -// batchSize = 1 inputDim = 5 inChannels = 1 -memref.global "private" @input : memref<1x5x5x1xi8> = dense<[[[[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1]]]]> - -// outChannels = 2 kernelDim = 3 inChannels = 1 -memref.global "private" @weight : memref<9x2xi8> = dense<[[1, 2], [1, 2], [1, 2], - [1, 2], [1, 2], [1, 2], - [1, 2], [1, 2], [1, 2]]> - -// outChannels = 2 -memref.global "private" @bias : memref<2xi32> = dense<[1,1]> - -func.func @main() -> i64 { - %0 = arith.constant 0 : i64 - %3 = arith.constant 3 : i64 - %input = memref.get_global @input : memref<1x5x5x1xi8> - %weight = memref.get_global @weight : memref<9x2xi8> - %bias = memref.get_global @bias : memref<2xi32> - %output = memref.alloc() : memref<9x2xi8> - // CHECK: "gemmini.intr.loop_conv_ws_config1" - // CHECK: "gemmini.intr.loop_conv_ws_config2" - // CHECK: "gemmini.intr.loop_conv_ws_config3" - // CHECK: "gemmini.intr.loop_conv_ws_config4" - // CHECK: "gemmini.intr.loop_conv_ws_config5" - // CHECK: "gemmini.intr.loop_conv_ws_config6" - // CHECK: "gemmini.intr.loop_conv_ws" - // CHECK: "gemmini.intr.flush" - gemmini.tile_conv %input %weight %bias %output %3 %3 %3 {stride = 1}: - memref<1x5x5x1xi8> memref<9x2xi8> memref<2xi32> memref<9x2xi8> i64 i64 i64 - gemmini.print %output : memref<9x2xi8> - return %0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-os.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-os.mlir deleted file mode 100644 index 78412703..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-os.mlir +++ /dev/null @@ -1,37 +0,0 
@@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -func.func @main() -> i8 { - %i0 = arith.constant 0 : i8 - %i1I8 = arith.constant 1 : i8 - %i2I8 = arith.constant 2 : i8 - %i2I32 = arith.constant 2 : i32 - %c0 = arith.constant 0 : index - %c1 = arith.constant 1 : index - %aArray = memref.alloc() {alignment = 16} : memref<64x64xi8> - %bArray = memref.alloc() {alignment = 16}: memref<64x64xi8> - %cArray = memref.alloc() {alignment = 16}: memref<64x64xi8> - %dArray = memref.alloc() {alignment = 64} : memref<64x64xi32> - %dim = memref.dim %aArray, %c0 : memref<64x64xi8> - scf.for %i = %c0 to %dim step %c1 { - scf.for %j = %c0 to %dim step %c1 { - memref.store %i1I8, %aArray[%i, %j] : memref<64x64xi8> - memref.store %i1I8, %bArray[%i, %j] : memref<64x64xi8> - memref.store %i2I32, %dArray[%i, %j] : memref<64x64xi32> - } - } - - gemmini.print %aArray : memref<64x64xi8> - gemmini.print %bArray : memref<64x64xi8> - gemmini.print %dArray : memref<64x64xi32> - // CHECK: "gemmini.intr.config_ld" - // CHECK: "gemmini.intr.mvin" - // CHECK: "gemmini.intr.preload" - // CHECK: "gemmini.intr.compute_preloaded" - // CHECK: "gemmini.intr.compute_accumulated" - // CHECK: "gemmini.intr.mvout" - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=0} : memref<64x64xi8> memref<64x64xi8> memref<64x64xi8> memref <64x64xi32> - gemmini.print %cArray : memref<64x64xi8> - return %i0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-igelu.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-igelu.mlir deleted file mode 100644 index fdc318bf..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-igelu.mlir +++ /dev/null @@ -1,49 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @g1 : memref<5x5xi8> = dense<[[1, 0, 0, 1, 0], [1, -1, 1, 0, 0], [-1, 0, 1, -1, 1], [1, 0, 0, 1, 0], [-1, 0, 0, -1, 0]]> -memref.global "private" @g2 : memref<5x5xi8> = 
dense<[[1, -1, 0, 0, 1], [1, 0, -1, 0, -1], [-1, -1, 0, -1, 1], [-1, 0, 0, 1, 0], [1, 0, 0, -1, 0]]> - - -func.func @main() -> i8 { - %i0 = arith.constant 0 : i8 - %i1I8 = arith.constant 1 : i8 - %minus1 = arith.constant -2 : i8 - %i2I8 = arith.constant 2 : i8 - %i2I32 = arith.constant 2 : i32 - %dI32 = arith.constant 0 : i32 - %c0 = arith.constant 0 : index - %c1 = arith.constant 1 : index - %aArray = memref.get_global @g1 : memref<5x5xi8> - %bArray = memref.get_global @g2 : memref<5x5xi8> - %cArray = memref.alloc() : memref<5x5xi8> - %dArray = memref.alloc() : memref<5x5xi32> - %dim_I = memref.dim %aArray, %c0 : memref<5x5xi8> - %dim_J = memref.dim %bArray, %c1 : memref<5x5xi8> - %dim_K = memref.dim %aArray, %c1 : memref<5x5xi8> - - scf.for %i3 = %c0 to %dim_I step %c1 { - scf.for %j3 = %c0 to %dim_J step %c1 { - memref.store %dI32, %dArray[%i3, %j3] : memref<5x5xi32> - } - } - - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=1}: memref<5x5xi8> memref<5x5xi8> memref<5x5xi8> memref<5x5xi32> - gemmini.print %cArray : memref<5x5xi8> - - // CHECK: "gemmini.intr.config_ex" - // CHECK: "gemmini.intr.config_st" - // CHECK: "gemmini.intr.config_ld" - // CHECK: "gemmini.intr.config_norm" - // CHECK: "gemmini.intr.loop_ws_config_bounds" - // CHECK: "gemmini.intr.loop_ws_config_addrs_ab" - // CHECK: "gemmini.intr.loop_ws_config_addrs_dc" - // CHECK: "gemmini.intr.loop_ws_config_strides_ab" - // CHECK: "gemmini.intr.loop_ws_config_strides_dc" - // CHECK: "gemmini.intr.loop_ws" - // CHECk: "gemmini.intr.flush" - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=1, act=3, bertScale=0.8:f32}: memref<5x5xi8> memref<5x5xi8> memref<5x5xi8> memref<5x5xi32> - gemmini.print %cArray : memref<5x5xi8> - return %i0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-layernorm.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-layernorm.mlir deleted file mode 100644 index b9e5d933..00000000 --- 
a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-layernorm.mlir +++ /dev/null @@ -1,48 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @g1 : memref<5x5xi8> = dense<[[1, 0, 0, 0, 0], [0, -1, 1, 0, 1], [-1, 0, -1, -1, 0], [-1, 0, 0, 1, 0], [0, 0, 0, 0, 0]]> -memref.global "private" @g2 : memref<5x5xi8> = dense<[[-1, 0, 1, 0, -1], [1, -1, 1, 0, -1], [-1, -1, -1, 1, 1], [-1, 1, 0, -1, 1], [-1, 0, 1, 1, 1]]> - - -func.func @main() -> i8 { - %i0 = arith.constant 0 : i8 - %i1I8 = arith.constant 1 : i8 - %minus1 = arith.constant -2 : i8 - %i2I8 = arith.constant 2 : i8 - %i2I32 = arith.constant 2 : i32 - %dI32 = arith.constant 0 : i32 - %c0 = arith.constant 0 : index - %c1 = arith.constant 1 : index - %aArray = memref.get_global @g1 : memref<5x5xi8> - %bArray = memref.get_global @g2 : memref<5x5xi8> - %cArray = memref.alloc() : memref<5x5xi8> - %dArray = memref.alloc() : memref<5x5xi32> - %dim_I = memref.dim %aArray, %c0 : memref<5x5xi8> - %dim_J = memref.dim %bArray, %c1 : memref<5x5xi8> - %dim_K = memref.dim %aArray, %c1 : memref<5x5xi8> - - scf.for %i3 = %c0 to %dim_I step %c1 { - scf.for %j3 = %c0 to %dim_J step %c1 { - memref.store %dI32, %dArray[%i3, %j3] : memref<5x5xi32> - } - } - - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=1}: memref<5x5xi8> memref<5x5xi8> memref<5x5xi8> memref<5x5xi32> - gemmini.print %cArray : memref<5x5xi8> - - // CHECK: "gemmini.intr.config_ex" - // CHECK: "gemmini.intr.config_st" - // CHECK: "gemmini.intr.config_ld" - // CHECK: "gemmini.intr.loop_ws_config_bounds" - // CHECK: "gemmini.intr.loop_ws_config_addrs_ab" - // CHECK: "gemmini.intr.loop_ws_config_addrs_dc" - // CHECK: "gemmini.intr.loop_ws_config_strides_ab" - // CHECK: "gemmini.intr.loop_ws_config_strides_dc" - // CHECK: "gemmini.intr.loop_ws" - // CHECk: "gemmini.intr.flush" - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=1, act=2}: memref<5x5xi8> memref<5x5xi8> memref<5x5xi8> 
memref<5x5xi32> - gemmini.print %cArray : memref<5x5xi8> - return %i0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-relu.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-relu.mlir deleted file mode 100644 index 2e63385d..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-relu.mlir +++ /dev/null @@ -1,48 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @g1 : memref<5x5xi8> = dense<[[1, 0, 0, 1, 0], [1, -1, 1, 0, 0], [-1, 0, 1, -1, 1], [1, 0, 0, 1, 0], [-1, 0, 0, -1, 0]]> -memref.global "private" @g2 : memref<5x5xi8> = dense<[[1, -1, 0, 0, 1], [1, 0, -1, 0, -1], [-1, -1, 0, -1, 1], [-1, 0, 0, 1, 0], [1, 0, 0, -1, 0]]> - - -func.func @main() -> i8 { - %i0 = arith.constant 0 : i8 - %i1I8 = arith.constant 1 : i8 - %minus1 = arith.constant -2 : i8 - %i2I8 = arith.constant 2 : i8 - %i2I32 = arith.constant 2 : i32 - %dI32 = arith.constant 0 : i32 - %c0 = arith.constant 0 : index - %c1 = arith.constant 1 : index - %aArray = memref.get_global @g1 : memref<5x5xi8> - %bArray = memref.get_global @g2 : memref<5x5xi8> - %cArray = memref.alloc() : memref<5x5xi8> - %dArray = memref.alloc() : memref<5x5xi32> - %dim_I = memref.dim %aArray, %c0 : memref<5x5xi8> - %dim_J = memref.dim %bArray, %c1 : memref<5x5xi8> - %dim_K = memref.dim %aArray, %c1 : memref<5x5xi8> - - scf.for %i3 = %c0 to %dim_I step %c1 { - scf.for %j3 = %c0 to %dim_J step %c1 { - memref.store %dI32, %dArray[%i3, %j3] : memref<5x5xi32> - } - } - - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=1}: memref<5x5xi8> memref<5x5xi8> memref<5x5xi8> memref<5x5xi32> - gemmini.print %cArray : memref<5x5xi8> - - // CHECK: "gemmini.intr.config_ex" - // CHECK: "gemmini.intr.config_st" - // CHECK: "gemmini.intr.config_ld" - // CHECK: "gemmini.intr.loop_ws_config_bounds" - // CHECK: "gemmini.intr.loop_ws_config_addrs_ab" - // CHECK: "gemmini.intr.loop_ws_config_addrs_dc" - // CHECK: 
"gemmini.intr.loop_ws_config_strides_ab" - // CHECK: "gemmini.intr.loop_ws_config_strides_dc" - // CHECK: "gemmini.intr.loop_ws" - // CHECk: "gemmini.intr.flush" - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=1, act=1}: memref<5x5xi8> memref<5x5xi8> memref<5x5xi8> memref<5x5xi32> - gemmini.print %cArray : memref<5x5xi8> - return %i0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-softmax.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-softmax.mlir deleted file mode 100644 index 18c46ad9..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul-ws-softmax.mlir +++ /dev/null @@ -1,49 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @g1 : memref<5x5xi8> = dense<[[1, 0, 0, 1, 0], [1, -1, 1, 0, 0], [-1, 0, 1, -1, 1], [1, 0, 0, 1, 0], [-1, 0, 0, -1, 0]]> -memref.global "private" @g2 : memref<5x5xi8> = dense<[[1, -1, 0, 0, 1], [1, 0, -1, 0, -1], [-1, -1, 0, -1, 1], [-1, 0, 0, 1, 0], [1, 0, 0, -1, 0]]> - - -func.func @main() -> i8 { - %i0 = arith.constant 0 : i8 - %i1I8 = arith.constant 1 : i8 - %minus1 = arith.constant -2 : i8 - %i2I8 = arith.constant 2 : i8 - %i2I32 = arith.constant 2 : i32 - %dI32 = arith.constant 0 : i32 - %c0 = arith.constant 0 : index - %c1 = arith.constant 1 : index - %aArray = memref.get_global @g1 : memref<5x5xi8> - %bArray = memref.get_global @g2 : memref<5x5xi8> - %cArray = memref.alloc() : memref<5x5xi8> - %dArray = memref.alloc() : memref<5x5xi32> - %dim_I = memref.dim %aArray, %c0 : memref<5x5xi8> - %dim_J = memref.dim %bArray, %c1 : memref<5x5xi8> - %dim_K = memref.dim %aArray, %c1 : memref<5x5xi8> - - scf.for %i3 = %c0 to %dim_I step %c1 { - scf.for %j3 = %c0 to %dim_J step %c1 { - memref.store %dI32, %dArray[%i3, %j3] : memref<5x5xi32> - } - } - - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=1}: memref<5x5xi8> memref<5x5xi8> memref<5x5xi8> memref<5x5xi32> - gemmini.print %cArray : memref<5x5xi8> - - 
// CHECK: "gemmini.intr.config_ex" - // CHECK: "gemmini.intr.config_st" - // CHECK: "gemmini.intr.config_ld" - // CHECK: "gemmini.intr.config_norm" - // CHECK: "gemmini.intr.loop_ws_config_bounds" - // CHECK: "gemmini.intr.loop_ws_config_addrs_ab" - // CHECK: "gemmini.intr.loop_ws_config_addrs_dc" - // CHECK: "gemmini.intr.loop_ws_config_strides_ab" - // CHECK: "gemmini.intr.loop_ws_config_strides_dc" - // CHECK: "gemmini.intr.loop_ws" - // CHECk: "gemmini.intr.flush" - gemmini.tile_matmul %aArray %bArray %cArray %dArray {dataflow=1, act=4, bertScale=0.05:f32}: memref<5x5xi8> memref<5x5xi8> memref<5x5xi8> memref<5x5xi32> - gemmini.print %cArray : memref<5x5xi8> - return %i0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-matmul.mlir deleted file mode 100644 index ef8e0041..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-matmul.mlir +++ /dev/null @@ -1,37 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -func.func @main() -> i8 { - %i0 = arith.constant 0 : i8 - %i1I8 = arith.constant 1 : i8 - %i2I8 = arith.constant 2 : i8 - %i2I32 = arith.constant 2 : i32 - %c0 = arith.constant 0 : index - %c1 = arith.constant 1 : index - %aArray = memref.alloc() : memref<64x64xi8> - %bArray = memref.alloc() : memref<64x64xi8> - %cArray = memref.alloc() : memref<64x64xi8> - %dArray = memref.alloc() : memref<64x64xi32> - %dim = memref.dim %aArray, %c0 : memref<64x64xi8> - scf.for %i = %c0 to %dim step %c1 { - scf.for %j = %c0 to %dim step %c1 { - memref.store %i1I8, %aArray[%i, %j] : memref<64x64xi8> - memref.store %i1I8, %bArray[%i, %j] : memref<64x64xi8> - memref.store %i2I32, %dArray[%i, %j] : memref<64x64xi32> - } - } - gemmini.print %aArray : memref<64x64xi8> - gemmini.print %bArray : memref<64x64xi8> - gemmini.print %dArray : memref<64x64xi32> - // CHECK: "gemmini.intr.loop_ws_config_bounds" - // CHECK: "gemmini.intr.loop_ws_config_addrs_ab" - // CHECK: 
"gemmini.intr.loop_ws_config_addrs_dc" - // CHECK: "gemmini.intr.loop_ws_config_strides_ab" - // CHECK: "gemmini.intr.loop_ws_config_strides_dc" - // CHECK: "gemmini.intr.loop_ws" - // CHECk: "gemmini.intr.flush" - gemmini.tile_matmul %aArray %bArray %cArray %dArray : memref<64x64xi8> memref<64x64xi8> memref<64x64xi8> memref <64x64xi32> - gemmini.print %cArray : memref<64x64xi8> - return %i0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/tile-rect-conv.mlir b/bb-tests/workloads/src/OpTest/gemmini/tile-rect-conv.mlir deleted file mode 100644 index 6d083d73..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/tile-rect-conv.mlir +++ /dev/null @@ -1,42 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -// batchSize = 1 inputRowDim = 5 inputColDim = 10 inChannels = 1 -memref.global "private" @input : memref<1x5x10x1xi8> = dense<[[[[1], [0], [-1], [0], [1], [1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1], [1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1], [1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1], [1], [0], [-1], [0], [1]], - [[1], [0], [-1], [0], [1], [1], [0], [-1], [0], [1]]]]> - -// outChannels = 2 kernelDim = 3 inChannels = 1 -memref.global "private" @weight : memref<9x2xi8> = dense<[[1, 2], [1, 2], [1, 2], - [1, 2], [1, 2], [1, 2], - [1, 2], [1, 2], [1, 2]]> - -// outChannels = 2 -memref.global "private" @bias : memref<2xi32> = dense<[1,1]> - -func.func @main() -> i64 { - %0 = arith.constant 0 : i64 - %3 = arith.constant 3 : i64 - %8 = arith.constant 8 : i64 - - %input = memref.get_global @input : memref<1x5x10x1xi8> - %weight = memref.get_global @weight : memref<9x2xi8> - %bias = memref.get_global @bias : memref<2xi32> - %output = memref.alloc() : memref<24x2xi8> - // CHECK: "gemmini.intr.config_st" - // CHECK: "gemmini.intr.config_ex" - // CHECK: "gemmini.intr.config_ld" - // CHECK: "gemmini.intr.mvin3" - // CHECK: "gemmini.intr.mvin" - // CHECK: "gemmini.intr.mvin2" - // CHECK: 
"gemmini.intr.preload" - // CHECK: "gemmini.intr.compute_preloaded" - // CHECK: "gemmini.intr.compute_accumulated" - gemmini.tile_conv %input %weight %bias %output %3 %8 %3 {stride = 1}: - memref<1x5x10x1xi8> memref<9x2xi8> memref<2xi32> memref<24x2xi8> i64 i64 i64 - gemmini.print %output : memref<24x2xi8> - return %0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/gemmini/transpose.mlir b/bb-tests/workloads/src/OpTest/gemmini/transpose.mlir deleted file mode 100644 index 07508383..00000000 --- a/bb-tests/workloads/src/OpTest/gemmini/transpose.mlir +++ /dev/null @@ -1,44 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: --lower-gemmini | \ -// RUN: FileCheck %s - -memref.global "private" @gv1 : memref<4x4xi8> = dense<[[1, 2, 3, 4], - [5, 6, 7, 8], - [9, 10, 11, 12], - [13, 14, 15, 16]]> -memref.global "private" @gv2 : memref<4x4xi8> = dense<[[1, 0, 0, 0], - [0, 1, 0, 0], - [0, 0, 1, 0], - [0, 0, 0, 1]]> - -func.func @main() -> i64 { - %in = memref.get_global @gv1 : memref<4x4xi8> - %identity = memref.get_global @gv2 : memref<4x4xi8> - %out = memref.alloc() : memref<4x4xi8> - gemmini.print %out : memref<4x4xi8> - %inSpAddr = arith.constant 0 : i64 - %outSpAddr = arith.constant 4 : i64 - %identitySpAddr = arith.constant 8 : i64 - %cst4 = arith.constant 4 : i64 - %cst0 = arith.constant 0 : i64 - // CHECK: "gemmini.intr.config_st" - gemmini.config_st %cst4 : i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %in %inSpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ld" - gemmini.config_ld %cst4 : i64 - // CHECK: "gemmini.intr.mvin" - gemmini.mvin %identity %identitySpAddr : memref<4x4xi8> i64 - // CHECK: "gemmini.intr.config_ex" - gemmini.config_ex {dataflow = 0, aTranspose = true } - // CHECK: "gemmini.intr.preload" - gemmini.preload_zeros %outSpAddr %cst4 %cst4 : i64 i64 i64 - // CHECK: "gemmini.intr.compute_preloaded" - gemmini.compute_preloaded %inSpAddr %identitySpAddr %cst4 %cst4 %cst4 
%cst4 : i64 i64 i64 i64 i64 i64 - // CHECK: "gemmini.intr.mvout" - gemmini.mvout %out %outSpAddr : memref<4x4xi8> i64 - gemmini.print %out : memref<4x4xi8> - return %cst0 : i64 -} diff --git a/bb-tests/workloads/src/OpTest/tile/README.md b/bb-tests/workloads/src/OpTest/tile/README.md deleted file mode 100644 index 2ce05f8b..00000000 --- a/bb-tests/workloads/src/OpTest/tile/README.md +++ /dev/null @@ -1,43 +0,0 @@ -# Tile Dialect Test Cases - -## Test Files - -### Stage-by-Stage Tests -- `tile-matmul.mlir` - Test Linalg → Tile conversion -- `tile-to-buckyball.mlir` - Test Tile → Buckyball conversion -- `buckyball-to-llvm.mlir` - Test Buckyball MatMulOp → LLVM Intrinsics conversion - -### End-to-End Tests -- `end-to-end.mlir` - Complete conversion pipeline test (Linalg → Tile → Buckyball → LLVM) - -## Running Tests - -```bash -# Stage 1: Linalg → Tile -./compiler/build/bin/buddy-opt bb-tests/workloads/src/OpTest/tile/tile-matmul.mlir -convert-linalg-to-tile - -# Stage 2: Tile → Buckyball -./compiler/build/bin/buddy-opt bb-tests/workloads/src/OpTest/tile/tile-to-buckyball.mlir -convert-tile-to-buckyball - -# Stage 3: Buckyball → LLVM Intrinsics -./compiler/build/bin/buddy-opt bb-tests/workloads/src/OpTest/tile/buckyball-to-llvm.mlir \ - -lower-buckyball="dim=16 sp_addr_len=14 spad_rows=1024 acc_rows=1024 warp=16 lane=16" - -# End-to-end test: Complete pipeline -./compiler/build/bin/buddy-opt bb-tests/workloads/src/OpTest/tile/end-to-end.mlir \ - -convert-linalg-to-tile \ - -convert-tile-to-buckyball \ - -lower-buckyball="dim=16 sp_addr_len=14 spad_rows=1024 acc_rows=1024 warp=16 lane=16" -``` - -## New Architecture Description - -``` -Linalg MatmulOp - ↓ (convert-linalg-to-tile) -Tile TileMatMulOp - ↓ (convert-tile-to-buckyball) -Buckyball MatMulOp - ↓ (lower-buckyball) -LLVM Intrinsics (mvin/mvout/mul_warp16) -``` diff --git a/bb-tests/workloads/src/OpTest/tile/buckyball-to-llvm.mlir b/bb-tests/workloads/src/OpTest/tile/buckyball-to-llvm.mlir deleted file mode 100644 
index ea10d8a3..00000000 --- a/bb-tests/workloads/src/OpTest/tile/buckyball-to-llvm.mlir +++ /dev/null @@ -1,10 +0,0 @@ -// RUN: buddy-opt %s -lower-buckyball="dim=16 sp_addr_len=14 spad_rows=1024 acc_rows=1024 warp=16 lane=16" | FileCheck %s - -func.func @buckyball_matmul_to_llvm(%arg0: memref<32x16xi8>, %arg1: memref<16x32xi8>, %arg2: memref<32x32xi32>) { - // CHECK: buckyball.intr.bb_mvin - // CHECK: buckyball.intr.bb_mvin - // CHECK: buckyball.intr.bb_mul_warp16 - // CHECK: buckyball.intr.bb_mvout - buckyball.bb_matmul %arg0 %arg1 %arg2 : memref<32x16xi8> memref<16x32xi8> memref<32x32xi32> - return -} diff --git a/bb-tests/workloads/src/OpTest/tile/end-to-end.mlir b/bb-tests/workloads/src/OpTest/tile/end-to-end.mlir deleted file mode 100644 index 09b69d2a..00000000 --- a/bb-tests/workloads/src/OpTest/tile/end-to-end.mlir +++ /dev/null @@ -1,16 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: -convert-linalg-to-tile \ -// RUN: -convert-tile-to-buckyball \ -// RUN: -lower-buckyball="dim=16 sp_addr_len=14 spad_rows=1024 acc_rows=1024 warp=16 lane=16" \ -// RUN: | FileCheck %s - -// Complete conversion flow test: linalg.matmul → tile.tile_matmul → buckyball.bb_matmul → intrinsics -func.func @end_to_end_test(%arg0: memref<32x32xi8>, %arg1: memref<32x32xi8>, %arg2: memref<32x32xi32>) { - // CHECK: buckyball.intr.bb_mvin - // CHECK: buckyball.intr.bb_mul_warp16 - // CHECK: buckyball.intr.bb_mvout - linalg.matmul - ins(%arg0, %arg1 : memref<32x32xi8>, memref<32x32xi8>) - outs(%arg2 : memref<32x32xi32>) - return -} diff --git a/bb-tests/workloads/src/OpTest/tile/tile-matmul.mlir b/bb-tests/workloads/src/OpTest/tile/tile-matmul.mlir deleted file mode 100644 index 1df68ff5..00000000 --- a/bb-tests/workloads/src/OpTest/tile/tile-matmul.mlir +++ /dev/null @@ -1,14 +0,0 @@ -// RUN: buddy-opt %s -convert-linalg-to-tile | FileCheck %s - -#map0 = affine_map<(d0, d1) -> (d0, d1)> -#map1 = affine_map<(d0, d1, d2) -> (d0, d2)> -#map2 = affine_map<(d0, d1, d2) -> (d2, d1)> -#map3 = 
affine_map<(d0, d1, d2) -> (d0, d1)> - -func.func @matmul_tile(%arg0: memref<32x32xi8>, %arg1: memref<32x32xi8>, %arg2: memref<32x32xi32>) { - // CHECK: tile.tile_matmul - linalg.matmul - ins(%arg0, %arg1 : memref<32x32xi8>, memref<32x32xi8>) - outs(%arg2 : memref<32x32xi32>) - return -} diff --git a/bb-tests/workloads/src/OpTest/tile/tile-to-buckyball.mlir b/bb-tests/workloads/src/OpTest/tile/tile-to-buckyball.mlir deleted file mode 100644 index 3ea2f41a..00000000 --- a/bb-tests/workloads/src/OpTest/tile/tile-to-buckyball.mlir +++ /dev/null @@ -1,7 +0,0 @@ -// RUN: buddy-opt %s -convert-tile-to-buckyball | FileCheck %s - -func.func @tile_to_buckyball(%arg0: memref<32x32xi8>, %arg1: memref<32x32xi8>, %arg2: memref<32x32xi32>) { - // CHECK: buckyball.bb_matmul - tile.tile_matmul %arg0 %arg1 %arg2 : memref<32x32xi8> memref<32x32xi8> memref<32x32xi32> - return -} diff --git a/bb-tests/workloads/src/OpTest/tile/tile-transpose.mlir b/bb-tests/workloads/src/OpTest/tile/tile-transpose.mlir deleted file mode 100644 index f281e9bd..00000000 --- a/bb-tests/workloads/src/OpTest/tile/tile-transpose.mlir +++ /dev/null @@ -1,13 +0,0 @@ -// RUN: buddy-opt %s -convert-linalg-to-tile | FileCheck %s - -#map0 = affine_map<(d0, d1) -> (d0, d1)> -#map1 = affine_map<(d0, d1) -> (d1, d0)> - -func.func @transpose_tile(%arg0: memref<32x32xi8>, %arg1: memref<32x32xi8>) { - // CHECK: tile.tile_transpose - linalg.transpose - ins(%arg0 : memref<32x32xi8>) - outs(%arg1 : memref<32x32xi8>) - permutation = [1, 0] - return -} diff --git a/bb-tests/workloads/src/OpTest/toy/CMakeLists.txt b/bb-tests/workloads/src/OpTest/toy/CMakeLists.txt deleted file mode 100644 index 2f5f306e..00000000 --- a/bb-tests/workloads/src/OpTest/toy/CMakeLists.txt +++ /dev/null @@ -1,137 +0,0 @@ - - -#------------------------------------------------------------------------------- -# Generate executables for different platforms -#------------------------------------------------------------------------------- -# single-core 
baremetal -# function(add_baremetal_target TARGET_NAME MLIR_FILE) -# set(OBJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-baremetal.o") -# set(EXECUTABLE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-baremetal") - -# # Compile xx.mlir to xx-baremetal.o -# add_custom_command( -# OUTPUT ${OBJ_FILE} -# COMMAND ${BUDDY_OPT} ${MLIR_FILE} -# -lower-buckyball | -# ${BUDDY_TRANSLATE} --buddy-to-llvmir | -# ${BUDDY_LLC} -filetype=obj -mtriple=riscv64 -# -mattr=+buddyext,+D -float-abi=hard -# -relocation-model=pic -# -o ${OBJ_FILE} -# DEPENDS ${MLIR_FILE} -# COMMENT "Compiling ${MLIR_FILE} to object file" -# ) - -# # Link xx-baremetal.o to xx-baremetal -# add_custom_command( - -# OUTPUT ${EXECUTABLE} -# COMMAND ${ELF_CC} -O2 -static -specs=htif_nano.specs -# ${OBJ_FILE} -o ${EXECUTABLE} -# DEPENDS ${OBJ_FILE} -# COMMENT "Linking baremetal executable: ${EXECUTABLE}" -# ) - -# # Create target for xx-baremetal -# add_custom_target(${TARGET_NAME}_baremetal -# DEPENDS ${EXECUTABLE} -# ) -# endfunction() - -# multi-core baremetal -# function(add_baremetal_target TARGET_NAME MLIR_FILE) -# set(OBJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-baremetal.o") -# set(EXECUTABLE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-baremetal") - -# # Compile xx.mlir to xx-baremetal.o -# add_custom_command( -# OUTPUT ${OBJ_FILE} -# COMMAND ${BUDDY_OPT} ${MLIR_FILE} -# -lower-buckyball=hartId=3 | -# ${BUDDY_TRANSLATE} --buddy-to-llvmir | -# ${BUDDY_LLC} -filetype=obj -mtriple=riscv64 -# -mattr=+buddyext,+D -float-abi=hard -# -relocation-model=pic -# -o ${OBJ_FILE} -# DEPENDS ${MLIR_FILE} -# COMMENT "Compiling ${MLIR_FILE} to object file" -# ) - -# # Link xx-baremetal.o to xx-baremetal -# add_custom_command( - -# OUTPUT ${EXECUTABLE} -# COMMAND ${ELF_CC} -O2 -static -specs=htif_nano.specs -# ${OBJ_FILE} -o ${EXECUTABLE} -# DEPENDS ${OBJ_FILE} -# COMMENT "Linking baremetal executable: ${EXECUTABLE}" -# ) - -# # Create target for xx-baremetal -# add_custom_target(${TARGET_NAME}_baremetal -# 
DEPENDS ${EXECUTABLE} -# ) -# endfunction() - -# Generate Linux executables -# function(add_linux_target TARGET_NAME MLIR_FILE) -# set(OBJ_FILE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-linux.o") -# set(EXECUTABLE "${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}-linux") - -# # Compile xx.mlir to xx-linux.o -# add_custom_command( -# OUTPUT ${OBJ_FILE} -# COMMAND ${BUDDY_OPT} ${MLIR_FILE} -# -lower-buckyball | -# ${BUDDY_TRANSLATE} --buddy-to-llvmir | -# ${BUDDY_LLC} -filetype=obj -mtriple=riscv64 -# -mattr=+buddyext,+D -float-abi=hard -# -o ${OBJ_FILE} -# DEPENDS ${MLIR_FILE} -# COMMENT "Compiling ${MLIR_FILE} to Linux object file" -# ) - -# # Link xx-linux.o to xx-linux -# add_custom_command( -# OUTPUT ${EXECUTABLE} -# COMMAND ${LINUX_CC} -O2 -static ${OBJ_FILE} -o ${EXECUTABLE} -# DEPENDS ${OBJ_FILE} -# COMMENT "Linking Linux executable: ${EXECUTABLE}" -# ) - -# # Create target for xx-linux - use target name prefix to avoid conflicts -# add_custom_target(${TARGET_NAME}_linux -# DEPENDS ${EXECUTABLE} -# ) -# endfunction() - - -# Unified build function -# function(add_cross_platform_target TARGET_NAME MLIR_FILE) -# add_multicore_baremetal_target(${TARGET_NAME} ${MLIR_FILE}) -# add_singlecore_baremetal_target(${TARGET_NAME} ${MLIR_FILE}) -# add_linux_target(${TARGET_NAME} ${MLIR_FILE}) - -# add_custom_target(${TARGET_NAME} -# DEPENDS ${TARGET_NAME}_multicore_baremetal ${TARGET_NAME}_singlecore_baremetal ${TARGET_NAME}_linux -# COMMENT "Building ${TARGET_NAME} for both baremetal and Linux" -# ) -# endfunction() - - -#------------------------------------------------------------------------------- -# Build list -#------------------------------------------------------------------------------- -# add_cross_platform_target(bb_mvin_mvout ${OPTEST_TOY_DIR}/bb_mvin_mvout.mlir) -# add_cross_platform_target(bb_dma1 ${OPTEST_TOY_DIR}/bb_dma1.mlir) -# add_cross_platform_target(bb_dma2 ${OPTEST_TOY_DIR}/bb_dma2.mlir) -# add_cross_platform_target(bb_dma3 
${OPTEST_TOY_DIR}/bb_dma3.mlir) -# add_cross_platform_target(bb_mul_warp16 ${OPTEST_TOY_DIR}/bb_mul_warp16.mlir) - -# add_custom_target(OpTest-toy ALL DEPENDS -# bb_mvin_mvout -# bb_dma1 -# bb_dma2 -# bb_dma3 -# bb_mul_warp16 -# ) diff --git a/bb-tests/workloads/src/OpTest/toy/bb_dma1.mlir b/bb-tests/workloads/src/OpTest/toy/bb_dma1.mlir deleted file mode 100644 index c5810df4..00000000 --- a/bb-tests/workloads/src/OpTest/toy/bb_dma1.mlir +++ /dev/null @@ -1,46 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: -lower-buckyball | \ -// RUN: FileCheck %s - -// Spec: -// Purpose: verify correctness of DMA module when facing 16-byte aligned addresses -// 1. Print input matrix -// 2. Print matrix at target address before move [CHECK1] print result should be all-zero matrix -// 3. Use mvin to move data from 16-byte aligned source address to scratchpad -// 4. Use mvout to move data from scratchpad to 16-byte aligned target address -// 5. Print matrix at target address after move [CHECK2] print result should be same as input matrix - -memref.global "private" @input_matrix_aligned : memref<4x16xi8> = dense<[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], - [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], - [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], - [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]]> - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - // 16-byte aligned scratchpad address - %spadAddr16 = arith.constant 16 : i64 - // 32-byte aligned scratchpad address - %spadAddr32 = arith.constant 32 : i64 - - %arrayA = memref.get_global @input_matrix_aligned : memref<4x16xi8> - %arrayB = memref.alloc() {alignment = 16} : memref<4x16xi8> - - // Print input matrix - // Print target matrix before move [CHECK1] - buckyball.print %arrayA : memref<4x16xi8> - buckyball.print %arrayB : memref<4x16xi8> - - // Use mvin to move data from 16-byte aligned address to scratchpad - // CHECK: mvin - // Use mvout to move data 
from scratchpad to 16-byte aligned target address - buckyball.bb_mvin %arrayA %spadAddr16 : memref<4x16xi8> i64 - // CHECK: mvout - buckyball.bb_mvout %arrayB %spadAddr16 : memref<4x16xi8> i64 - - // Print output matrix after move [CHECK2] - buckyball.print %arrayB : memref<4x16xi8> - - // Release allocated memory - memref.dealloc %arrayB : memref<4x16xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/toy/bb_dma2.mlir b/bb-tests/workloads/src/OpTest/toy/bb_dma2.mlir deleted file mode 100644 index 81e847ca..00000000 --- a/bb-tests/workloads/src/OpTest/toy/bb_dma2.mlir +++ /dev/null @@ -1,64 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: -lower-buckyball | \ -// RUN: FileCheck %s - -// Spec: -// Purpose: verify correctness of DMA module's fast alternating read/write -// 1. Print input matrices A and B -// 2. Print matrix at target address before move [CHECK1] print result should be all-zero matrix -// 3. Execute fast alternating operations: use mvin to read row by row, mvout to write -// 4. 
Print matrix at target address after move [CHECK2] print result should show A and B contents swapped - -memref.global "private" @input_matrix_a : memref<2x16xi8> = dense<[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], - [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]]> - -memref.global "private" @input_matrix_b : memref<2x16xi8> = dense<[[100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115], - [116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 126, 125, 124, 123]]> - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - // Scratchpad address 1 - %spadAddr1 = arith.constant 10 : i64 - // Scratchpad address 2 - %spadAddr2 = arith.constant 50 : i64 - - %arrayA = memref.get_global @input_matrix_a : memref<2x16xi8> - %arrayB = memref.get_global @input_matrix_b : memref<2x16xi8> - %arrayTemp = memref.alloc() {alignment = 16} : memref<2x16xi8> - - // Print input matrices A and B - buckyball.print %arrayA : memref<2x16xi8> - buckyball.print %arrayB : memref<2x16xi8> - // Print temporary matrix before move [CHECK1] - buckyball.print %arrayTemp : memref<2x16xi8> - - // Fast alternating mvin/mvout operation sequence - // Step 1: A -> scratchpad 1 - // CHECK: mvin - // Step 2: scratchpad 1 -> temp - buckyball.bb_mvin %arrayA %spadAddr1 : memref<2x16xi8> i64 - // CHECK: mvout - buckyball.bb_mvout %arrayTemp %spadAddr1 : memref<2x16xi8> i64 - - // Step 3: B -> scratchpad 2 - // CHECK: mvin - // Step 4: scratchpad 2 -> A - buckyball.bb_mvin %arrayB %spadAddr2 : memref<2x16xi8> i64 - // CHECK: mvout - buckyball.bb_mvout %arrayA %spadAddr2 : memref<2x16xi8> i64 - - // Step 5: temp -> scratchpad 1 - // CHECK: mvin - // Step 6: scratchpad 1 -> B - buckyball.bb_mvin %arrayTemp %spadAddr1 : memref<2x16xi8> i64 - // CHECK: mvout - buckyball.bb_mvout %arrayB %spadAddr1 : memref<2x16xi8> i64 - - // Print swapped matrices [CHECK2] - buckyball.print %arrayA : memref<2x16xi8> - buckyball.print %arrayB : memref<2x16xi8> - - // 
Release allocated memory - memref.dealloc %arrayTemp : memref<2x16xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/toy/bb_dma3.mlir b/bb-tests/workloads/src/OpTest/toy/bb_dma3.mlir deleted file mode 100644 index 7a58e6c8..00000000 --- a/bb-tests/workloads/src/OpTest/toy/bb_dma3.mlir +++ /dev/null @@ -1,82 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: -lower-buckyball | \ -// RUN: FileCheck %s - -// Spec: -// Purpose: verify correctness of DMA module's long read/write operations -// 1. Dynamically generate large input matrix, repeatedly fill with data 0~127 -// 2. Print matrix at target address before move [CHECK1] print result should be all-zero matrix -// 3. Use mvin to read large amount of data from memory into scratchpad -// 4. Use mvout to write large amount of data from scratchpad to memory -// 5. Print matrix at target address after move [CHECK2] print result should be same as input matrix - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - %c0 = arith.constant 0 : index - %c1 = arith.constant 1 : index - %c2 = arith.constant 2 : index - %c16 = arith.constant 16 : index - %c127 = arith.constant 127 : index - // Scratchpad start address - %spadAddr = arith.constant 0 : i64 - - // ========== Row count configuration area (only modify here) ========== - // Total number of rows - %total_rows = arith.constant 1024 : index - // Last two rows start = total_rows - 2 - %last_rows_start = arith.constant 1022 : index - // %offset_elements = arith.constant 16336 : index // Offset = last_rows_start × 16 - // Offset = last_rows_start × 16 - %offset_elements = arith.constant 16352 : index - // ================================================ - - // Allocate large matrix - %arrayA = memref.alloc() {alignment = 16} : memref<1023x16xi8> - %arrayB = memref.alloc() {alignment = 16} : memref<1023x16xi8> - - // Use linalg.fill to initialize arrayB to 0 - linalg.fill ins(%0 : i8) outs(%arrayB : memref<1023x16xi8>) - - // Dynamically fill arrayA: repeatedly fill 
with 0~127 - scf.for %i = %c0 to %total_rows step %c1 { - scf.for %j = %c0 to %c16 step %c1 { - %row_offset = arith.muli %i, %c16 : index - %linear_idx = arith.addi %row_offset, %j : index - %mod_val = arith.remui %linear_idx, %c127 : index - %val = arith.index_cast %mod_val : index to i8 - memref.store %val, %arrayA[%i, %j] : memref<1023x16xi8> - } - } - - // Print first two rows and last two rows of input matrix - %arrayA_head = memref.subview %arrayA[0, 0] [2, 16] [1, 1] : memref<1023x16xi8> to memref<2x16xi8, strided<[16, 1]>> - %arrayA_tail = memref.subview %arrayA[1021, 0] [2, 16] [1, 1] : memref<1023x16xi8> to memref<2x16xi8, strided<[16, 1], offset: 16336>> - buckyball.print %arrayA_head : memref<2x16xi8, strided<[16, 1]>> - buckyball.print %arrayA_tail : memref<2x16xi8, strided<[16, 1], offset: 16336>> - - // Print first two rows and last two rows of target matrix before move [CHECK1] - %arrayB_head = memref.subview %arrayB[0, 0] [2, 16] [1, 1] : memref<1023x16xi8> to memref<2x16xi8, strided<[16, 1]>> - %arrayB_tail = memref.subview %arrayB[1021, 0] [2, 16] [1, 1] : memref<1023x16xi8> to memref<2x16xi8, strided<[16, 1], offset: 16336>> - buckyball.print %arrayB_head : memref<2x16xi8, strided<[16, 1]>> - buckyball.print %arrayB_tail : memref<2x16xi8, strided<[16, 1], offset: 16336>> - - // Execute long read/write operations - // Step 1: long read - read large amount of data from memory into scratchpad - // CHECK: mvin - buckyball.bb_mvin %arrayA %spadAddr : memref<1023x16xi8> i64 - - // Step 2: long write - write large amount of data from scratchpad to memory - // CHECK: mvout - buckyball.bb_mvout %arrayB %spadAddr : memref<1023x16xi8> i64 - - // Print first two rows and last two rows of output matrix after move [CHECK2] - %arrayB_head_after = memref.subview %arrayB[0, 0] [2, 16] [1, 1] : memref<1023x16xi8> to memref<2x16xi8, strided<[16, 1]>> - %arrayB_tail_after = memref.subview %arrayB[1021, 0] [2, 16] [1, 1] : memref<1023x16xi8> to memref<2x16xi8, 
strided<[16, 1], offset: 16336>> - buckyball.print %arrayB_head_after : memref<2x16xi8, strided<[16, 1]>> - buckyball.print %arrayB_tail_after : memref<2x16xi8, strided<[16, 1], offset: 16336>> - - // Release allocated memory - memref.dealloc %arrayA : memref<1023x16xi8> - memref.dealloc %arrayB : memref<1023x16xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/toy/bb_mul_warp16.mlir b/bb-tests/workloads/src/OpTest/toy/bb_mul_warp16.mlir deleted file mode 100644 index 5e246f7a..00000000 --- a/bb-tests/workloads/src/OpTest/toy/bb_mul_warp16.mlir +++ /dev/null @@ -1,77 +0,0 @@ -// Test for bb_mul_warp16: 16x16 * 16x16 = 16x16 matrix multiplication -// RUN: %run - -// Matrix A: 16x16 (identity-like matrix for easier verification) -memref.global "private" @matrix_a : memref<16x16xi8> = dense<[[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [-1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [-1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [-1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], - [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], - [-1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], - [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], - [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], - [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], - [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], - [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], - [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]> - -// Matrix B: 16x16 (test matrix with simple pattern) -memref.global "private" @matrix_b : memref<16x16xi8> = dense<[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]]> - -memref.global "private" @val : memref<1x1xi8> = dense<[[1]]> - -func.func @main() -> i8 { - %0 = arith.constant 0 : i8 - - // Scratchpad addresses - %spadAddr_a = arith.constant 0: i64 // Matrix A (16x16) at spad address 0-15 - %spadAddr_b = arith.constant 4096 : i64 // Matrix B (16x16) at spad address 16-31 - %spadAddr_c = arith.constant 8192 : i64 // Result C (16x16) at spad address 32-47 - // Get global matrices - %arrayA = memref.get_global @matrix_a : memref<16x16xi8> - %arrayB = memref.get_global @matrix_b : memref<16x16xi8> - %arrayC = memref.alloc() : memref<16x16xi8> - %val = memref.get_global @val : memref<1x1xi8> - // Initialize result matrix C to zero - - // MVIN: Move matrices to scratchpad - buckyball.bb_mvin %arrayA %spadAddr_a : memref<16x16xi8> i64 - buckyball.bb_mvin %arrayB %spadAddr_b : memref<16x16xi8> i64 - buckyball.print %val : memref<1x1xi8> - // WARP16: Matrix multiplication (16x16 * 16x16) - // aSpAddr = 0 (Matrix A address) - // bSpAddr = 20 (Matrix B address) - // cSpAddr = 40 (Result C address) - // nLen = 16 (matrix dimension) - %nLen = arith.constant 16 : i64 - buckyball.bb_mul_warp16 %spadAddr_a %spadAddr_b %spadAddr_c %nLen : i64 i64 i64 i64 - // MVOUT: Move result from scratchpad to memory - buckyball.print %val : 
memref<1x1xi8> - buckyball.bb_mvout %arrayC %spadAddr_c : memref<16x16xi8> i64 - - // Print result matrix - // Expected: Since A is identity matrix, result should be same as matrix B - buckyball.print %arrayC : memref<16x16xi8> - - memref.dealloc %arrayC : memref<16x16xi8> - return %0 : i8 -} diff --git a/bb-tests/workloads/src/OpTest/toy/bb_mvin_mvout.mlir b/bb-tests/workloads/src/OpTest/toy/bb_mvin_mvout.mlir deleted file mode 100644 index aaec6cb9..00000000 --- a/bb-tests/workloads/src/OpTest/toy/bb_mvin_mvout.mlir +++ /dev/null @@ -1,59 +0,0 @@ -// RUN: buddy-opt %s \ -// RUN: -lower-buckyball | \ -// RUN: FileCheck %s - -// Spec: -// Purpose: correctness of mvin and mvout instructions -// 1. Print input matrix -// 2. Print matrix at target address before move [CHECK] print result should be all-zero matrix -// 3. Use mvin to move data from memory to scratchpad -// 4. Use mvout to move data from scratchpad back to output memory -// 5. Print matrix at target address after move [CHECK] print result should be same as input matrix - -// Matrix B: 4x16 (test matrix with simple pattern) -memref.global "private" @input_matrix : memref<4x16xi8> = dense<[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]]> - -// // === Check if current hartid is 5, otherwise exit === -func.func @main() -> i8 { -// %hartid = llvm.inline_asm "csrr $0, mhartid", "=r" : () -> i32 -// // buckyball.print_scalar %hartid : i32 - -// %target_hart = arith.constant 5 : i32 -// %is_correct_hart = arith.cmpi eq, %hartid, %target_hart : i32 -// cf.cond_br %is_correct_hart, ^continue, ^exit - -// ^exit: -// // If not hart 5, return 0 -// %error = arith.constant -1 : i8 -// return %error : i8 - -// ^continue: - buckyball.multicore - // %hartid = llvm.inline_asm "csrr $0, mhartid", "=r" : () -> i32 - // buckyball.print_scalar 
%hartid : i32 - // === Main program === - %0 = arith.constant 0 : i8 - %spadAddr = arith.constant 1040 : i64 - %arrayA = memref.get_global @input_matrix : memref<4x16xi8> - %arrayB = memref.alloc() : memref<4x16xi8> - // Use mvin to move data from memory to scratchpad - // CHECK: mvin - buckyball.bb_mvin %arrayA %spadAddr : memref<4x16xi8> i64 - // Use mvout to move data from scratchpad back to output memory - // CHECK: mvout - buckyball.bb_mvout %arrayB %spadAddr : memref<4x16xi8> i64 - // Print moved output matrix - buckyball.print %arrayB : memref<4x16xi8> - // Release allocated memory - memref.dealloc %arrayB : memref<4x16xi8> - - // exit - %exit_code = arith.constant 0 : i32 - func.call @exit(%exit_code) : (i32) -> () - llvm.unreachable -} - -func.func private @exit(i32) -> () diff --git a/bb-tests/workloads/src/tutorial/CMakeLists.txt b/bb-tests/workloads/src/tutorial/CMakeLists.txt index 273011f6..5eb6ba04 100644 --- a/bb-tests/workloads/src/tutorial/CMakeLists.txt +++ b/bb-tests/workloads/src/tutorial/CMakeLists.txt @@ -3,33 +3,40 @@ project(tutorial C) #------------------------------------------------------------------------------- # build linux version workload #------------------------------------------------------------------------------- -set(LINK_FLAGS "-static -Wl,--no-dynamic-linker") -set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${LINK_FLAGS}") +set(CMAKE_C_COMPILER "riscv64-unknown-linux-gnu-gcc") +set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=rv64gc") -set(CMAKE_C_COMPILER "riscv64-unknown-linux-gnu-g++") add_executable(tutorial-linux ${TUTORIAL_WORKLOAD_DIR}/tutorial.c) -add_custom_target(tutorial-linux-build ALL DEPENDS - tutorial-linux +add_custom_target(tutorial-linux-build ALL DEPENDS tutorial-linux COMMENT "Building linux workloads for tutorial" VERBATIM) #------------------------------------------------------------------------------- -# build baremetal version workload +# build baremetal version workload (BBSimHarness, same flags as 
toy) #------------------------------------------------------------------------------- set(ELF_CC "riscv64-unknown-elf-gcc") +set(BBSIM_LD ${CTEST_TOY_WORKLOAD_DIR}/bbsim.ld) -set(C_FLAGS -std=c11 -g -fno-common -O2 -static - -fno-builtin-printf -specs=htif_nano.specs) +set(C_FLAGS -std=gnu11 -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany + -fno-builtin-printf -specs=nano.specs -specs=nosys.specs -nostartfiles + -Wl,-T,${BBSIM_LD} -I${WORKLOAD_LIB_DIR}) add_custom_target(tutorial-baremetal-build ALL - COMMAND ${ELF_CC} ${C_FLAGS} -o tutorial-baremetal ${TUTORIAL_WORKLOAD_DIR}/tutorial.c - DEPENDS ${TUTORIAL_WORKLOAD_DIR}/tutorial.c + COMMAND ${ELF_CC} ${C_FLAGS} + -o tutorial-baremetal + ${CTEST_TOY_WORKLOAD_DIR}/crt0.S + ${CTEST_TOY_WORKLOAD_DIR}/buckyball.c + ${TUTORIAL_WORKLOAD_DIR}/tutorial.c + DEPENDS + ${TUTORIAL_WORKLOAD_DIR}/tutorial.c + ${CTEST_TOY_WORKLOAD_DIR}/crt0.S + ${CTEST_TOY_WORKLOAD_DIR}/buckyball.c COMMENT "Building baremetal workloads for tutorial" VERBATIM) #------------------------------------------------------------------------------- -# build all version workload +# build all #------------------------------------------------------------------------------- add_custom_target(tutorial-build ALL DEPENDS tutorial-linux-build diff --git a/bbdev b/bbdev new file mode 160000 index 00000000..8fae4219 --- /dev/null +++ b/bbdev @@ -0,0 +1 @@ +Subproject commit 8fae421972381c16fb66ab82192102245541a3f9 diff --git a/bebop b/bebop new file mode 160000 index 00000000..8c4414a4 --- /dev/null +++ b/bebop @@ -0,0 +1 @@ +Subproject commit 8c4414a474d67962b135b952bd4dc650d543b398 diff --git a/bebop/.gitignore b/bebop/.gitignore deleted file mode 100644 index b66ba4e3..00000000 --- a/bebop/.gitignore +++ /dev/null @@ -1,3 +0,0 @@ -target/ -Cargo.lock -build/ diff --git a/bebop/CMakeLists.txt b/bebop/CMakeLists.txt deleted file mode 100644 index e69de29b..00000000 diff --git a/bebop/README.md b/bebop/README.md deleted file mode 100644 index 
a4a3c76a..00000000 --- a/bebop/README.md +++ /dev/null @@ -1,25 +0,0 @@ -# bebop -A buckyball emulator written in Rust - - -### Quick start - -1. Activate the virtual environment -``` -source $BUCKYBALL_PATH/env.sh -``` - - - -3. Start the socket server -``` -./scripts/bebop_setup.sh -``` - -4. Run the program -``` -$BUCKYBALL_PATH/bebop/host/spike/riscv-isa-sim/install/bin/spike --extension=bebop --log-commits $BUCKYBALL_PATH/bb-tests/build/workloads/src/OpTest/gemmini/transpose-baremetal 2>/dev/null -``` diff --git a/bebop/bebop/Cargo.toml b/bebop/bebop/Cargo.toml deleted file mode 100644 index eaba853b..00000000 --- a/bebop/bebop/Cargo.toml +++ /dev/null @@ -1,16 +0,0 @@ -[package] -name = "bebop" -version = "0.1.0" -edition = "2021" - -[lib] -name = "bebop" -path = "src/lib.rs" - -[dependencies] -serde = { version = "1.0", features = ["derive"] } -serde_json = "1.0" - -[[bin]] -name = "bebop" -path = "src/bin/bebop.rs" diff --git a/bebop/bebop/rustfmt.toml b/bebop/bebop/rustfmt.toml deleted file mode 100644 index 5de02797..00000000 --- a/bebop/bebop/rustfmt.toml +++ /dev/null @@ -1,23 +0,0 @@ -# Rust code formatting configuration file (stable features only) -# More configuration options: https://rust-lang.github.io/rustfmt/ - -# Maximum width per line -max_width = 120 - -# Hard tab width -tab_spaces = 2 - -# Use field initialization shorthand -use_field_init_shorthand = true - -# Use small layout on small arrays -use_small_heuristics = "Default" - -# Match block trailing comma -match_block_trailing_comma = true - -# Newline style -newline_style = "Unix" - -# Edition -edition = "2021" diff --git a/bebop/bebop/src/bin/bebop.rs b/bebop/bebop/src/bin/bebop.rs deleted file mode 100644 index ee49378b..00000000 --- a/bebop/bebop/src/bin/bebop.rs +++ /dev/null @@ -1,24 +0,0 @@ -/// Bebop - Accelerator Simulator -/// -/// Main executable for running the Bebop accelerator simulator. 
-/// This program listens for custom instruction requests from Host -/// and simulates accelerator behavior. -use bebop::{log_info, Simulator, StepMode}; -use std::env; - -fn main() -> std::io::Result<()> { - let args: Vec<String> = env::args().collect(); - let step_mode = if args.iter().any(|arg| arg == "--step" || arg == "-s") { - StepMode::Step - } else { - StepMode::Run - }; - - if step_mode == StepMode::Step { - log_info!("Bebop Accelerator Simulator (Step Mode)"); - log_info!("Commands: Enter=step, r=run, q=quit"); - } - - let simulator = Simulator::new("127.0.0.1", 9999, step_mode); - simulator.run() -} diff --git a/bebop/bebop/src/buckyball/config.rs b/bebop/bebop/src/buckyball/config.rs deleted file mode 100644 index 4bb942ff..00000000 --- a/bebop/bebop/src/buckyball/config.rs +++ /dev/null @@ -1,16 +0,0 @@ -#[derive(Clone, Debug)] -pub struct NpuConfig { - pub mem_size: usize, -} - -impl NpuConfig { - pub fn new() -> Self { - Self { mem_size: 1024 } - } -} - -impl Default for NpuConfig { - fn default() -> Self { - Self::new() - } -} diff --git a/bebop/bebop/src/buckyball/frontend/decoder/decoder.rs b/bebop/bebop/src/buckyball/frontend/decoder/decoder.rs deleted file mode 100644 index aa7c11a3..00000000 --- a/bebop/bebop/src/buckyball/frontend/decoder/decoder.rs +++ /dev/null @@ -1,31 +0,0 @@ -use crate::log_backward; - -pub struct Decoder { - pub funct: u32, - pub xs1: u64, - pub xs2: u64, - pub is_fence: bool, -} - -impl Decoder { - pub fn new() -> Self { - Self { funct: 0, xs1: 0, xs2: 0, is_fence: false } - } - - pub fn decode_cmd(&mut self, funct: u32, xs1: u64, xs2: u64) { - self.funct = funct; - self.xs1 = xs1; - self.xs2 = xs2; - self.is_fence = funct == 31; - if self.is_fence { - log_backward!("Fence instruction decoded!"); - } else { - log_backward!("Inst decode!"); - } - } - - pub fn print_status(&self) { - println!(" [Decoder] funct={}, xs1=0x{:x}, xs2=0x{:x}, is_fence={}", - self.funct, self.xs1, self.xs2, self.is_fence); - } -} diff --git
a/bebop/bebop/src/buckyball/frontend/decoder/mod.rs b/bebop/bebop/src/buckyball/frontend/decoder/mod.rs deleted file mode 100644 index 32f89d6d..00000000 --- a/bebop/bebop/src/buckyball/frontend/decoder/mod.rs +++ /dev/null @@ -1,3 +0,0 @@ -mod decoder; - -pub use decoder::Decoder; diff --git a/bebop/bebop/src/buckyball/frontend/domain_scheduler/domain_scheduler.rs b/bebop/bebop/src/buckyball/frontend/domain_scheduler/domain_scheduler.rs deleted file mode 100644 index dfbd73c6..00000000 --- a/bebop/bebop/src/buckyball/frontend/domain_scheduler/domain_scheduler.rs +++ /dev/null @@ -1,38 +0,0 @@ -use crate::log_backward; - -pub struct DomainScheduler { - pub funct: u32, - pub xs1: u64, - pub xs2: u64, - pub rob_id: u64, - pub domain_id: u64, -} - -impl DomainScheduler { - pub fn new() -> Self { - Self { funct: 0, xs1: 0, xs2: 0, rob_id: 0, domain_id: 0 } - } - - pub fn dispatch_cmd(&mut self, funct: u32, xs1: u64, xs2: u64, rob_id: u64) { - self.funct = funct; - self.xs1 = xs1; - self.xs2 = xs2; - self.rob_id = rob_id; - - // Allocate domain_id based on instruction type: - // - funct=24 (mvin) or funct=25 (mvout) -> domain_id=0 (MemDomain) - // - Other instructions -> domain_id=1 (other domains) - self.domain_id = match funct { - 0 => 0, // unknown instruction - 24 | 25 => 1, // mvin/mvout -> MemDomain - _ => 2, // other instructions - }; - - log_backward!("Inst dispatch (funct={}, domain_id={})!", funct, self.domain_id); - } - - pub fn print_status(&self) { - println!(" [DomainScheduler] funct={}, xs1=0x{:x}, xs2=0x{:x}, rob_id={}, domain_id={}", - self.funct, self.xs1, self.xs2, self.rob_id, self.domain_id); - } -} diff --git a/bebop/bebop/src/buckyball/frontend/domain_scheduler/mod.rs b/bebop/bebop/src/buckyball/frontend/domain_scheduler/mod.rs deleted file mode 100644 index 96d4b93b..00000000 --- a/bebop/bebop/src/buckyball/frontend/domain_scheduler/mod.rs +++ /dev/null @@ -1,3 +0,0 @@ -mod domain_scheduler; - -pub use domain_scheduler::DomainScheduler; diff 
--git a/bebop/bebop/src/buckyball/frontend/frontend.rs b/bebop/bebop/src/buckyball/frontend/frontend.rs deleted file mode 100644 index 5b3df0ad..00000000 --- a/bebop/bebop/src/buckyball/frontend/frontend.rs +++ /dev/null @@ -1,174 +0,0 @@ -use crate::builtin::{Sim, EventQueue}; -use crate::buckyball::top::{MemCmd, CmdResponse}; -use super::{Decoder, Rob, Rs, DomainScheduler}; -use std::sync::mpsc::Sender; -use crate::{log_forward, log_error, log_tpc}; - -pub struct Frontend { - decoder: Decoder, - rob: Rob, - rs: Rs, - domain_scheduler: DomainScheduler, - pub event_queue: EventQueue<Frontend>, - - // Channel to send memory commands to MemDomain - mem_cmd_tx: Option<Sender<MemCmd>>, - - // Channel to send command responses - cmd_response_tx: Option<Sender<CmdResponse>>, - - pub funct: u32, - pub xs1: u64, - pub xs2: u64, -} - -impl Frontend { - pub fn new() -> Self { - Self { - decoder: Decoder::new(), - rob: Rob::new(), - rs: Rs::new(), - domain_scheduler: DomainScheduler::new(), - event_queue: EventQueue::new(), - mem_cmd_tx: None, - cmd_response_tx: None, - - funct: 0, - xs1: 0, - xs2: 0, - } - } - - /// Set the memory command sender channel - pub fn set_mem_cmd_sender(&mut self, sender: Sender<MemCmd>) { - self.mem_cmd_tx = Some(sender); - } - - /// Set the command response sender channel - pub fn set_cmd_response_sender(&mut self, sender: Sender<CmdResponse>) { - self.cmd_response_tx = Some(sender); - } - - pub fn rocc_cmd(&mut self) { - let funct = self.funct; - let xs1 = self.xs1; - let xs2 = self.xs2; - - self.event_queue.push("RoccCmd", move |frontend: &mut Frontend| { - frontend.decoder.decode_cmd(funct, xs1, xs2); - }); - log_forward!("Inst decode!"); - self.enter_rob(); - } - - pub fn enter_rob(&mut self) { - let funct = self.decoder.funct; - let xs1 = self.decoder.xs1; - let xs2 = self.decoder.xs2; - let is_fence = self.decoder.is_fence; - - self.event_queue.push("EnterRob", move |frontend: &mut Frontend| { - frontend.rob.enter_rob(funct, xs1, xs2, is_fence); - }); - log_forward!("Inst enter rob!"); - - // Fence
instructions don't go through dispatch/issue - if !is_fence { - // Normal instructions: send cmd_response immediately after entering ROB - if let Some(ref tx) = self.cmd_response_tx { - let cmd_response = CmdResponse { result: 0 }; - if let Err(e) = tx.send(cmd_response) { - log_error!("Failed to send CmdResponse: {}", e); - } else { - log_tpc!("Normal instruction entered ROB, sending cmd_response immediately"); - } - } - } else { - // For fence, we'll send response when ROB is empty (checked in backward) - log_forward!("Fence instruction - will wait for ROB empty"); - } - - self.dispatch_cmd(); - - } - - pub fn dispatch_cmd(&mut self) { - let funct = self.rob.funct; - let xs1 = self.rob.xs1; - let xs2 = self.rob.xs2; - let rob_id = self.rob.rob_id; - - self.event_queue.push("DispatchCmd", move |frontend: &mut Frontend| { - frontend.domain_scheduler.dispatch_cmd(funct, xs1, xs2, rob_id); - }); - log_forward!("Inst dispatch!"); - self.issue_cmd(); - } - - pub fn issue_cmd(&mut self) { - let funct = self.domain_scheduler.funct; - let xs1 = self.domain_scheduler.xs1; - let xs2 = self.domain_scheduler.xs2; - let rob_id = self.domain_scheduler.rob_id; - let domain_id = self.domain_scheduler.domain_id; - - self.event_queue.push("IssueCmd", move |frontend: &mut Frontend| { - frontend.rs.issue_cmd(funct, xs1, xs2, rob_id, domain_id); - - // Send memory command to MemDomain if it's a memory operation (funct=24 or 25) - if domain_id == 1 { - if let Some(ref tx) = frontend.mem_cmd_tx { - let mem_cmd = MemCmd {funct, xs1, xs2, rob_id, domain_id}; - if let Err(e) = tx.send(mem_cmd) { - log_error!("Failed to send MemCmd to MemDomain: {}", e); - } else { - log_tpc!("Sent MemCmd to MemDomain: funct={}, rob_id={}, domain_id={}", funct, rob_id, domain_id); - } - } - } - }); - log_forward!("Inst issue!"); - } -} - -impl Sim for Frontend { - fn forward(&mut self) { - // self.rocc_cmd(); - // self.enter_rob(); - // self.dispatch_cmd(); - // self.issue_cmd(); - } - - fn backward(&mut 
self) { - // Pop in reverse order and process: handle all events in the queue - // Each event function receives &mut self as its argument - // Temporarily take the queue out to avoid a borrow conflict - let mut queue = std::mem::take(&mut self.event_queue); - queue.process_all(self); - self.event_queue = queue; - - // Check if fence instruction is ready (ROB is empty) - if let Some(fence_rob_id) = self.rob.check_fence_ready() { - if let Some(ref tx) = self.cmd_response_tx { - let cmd_response = CmdResponse { result: 0 }; - if let Err(e) = tx.send(cmd_response) { - log_error!("Failed to send CmdResponse for fence: {}", e); - } else { - log_tpc!("Fence instruction completed! fence_rob_id={}, sending cmd_response", fence_rob_id); - } - } - } - } - - fn module_name(&self) -> &str { - "Frontend" - } - - fn print_status(&self) { - println!(" [Frontend] Module Status:"); - self.decoder.print_status(); - self.rob.print_status(); - self.domain_scheduler.print_status(); - self.rs.print_status(); - } -} diff --git a/bebop/bebop/src/buckyball/frontend/mod.rs b/bebop/bebop/src/buckyball/frontend/mod.rs deleted file mode 100644 index 18755af6..00000000 --- a/bebop/bebop/src/buckyball/frontend/mod.rs +++ /dev/null @@ -1,11 +0,0 @@ -mod decoder; -mod frontend; -mod rob; -mod rs; -mod domain_scheduler; - -pub use decoder::Decoder; -pub use frontend::Frontend; -pub use rob::Rob; -pub use rs::Rs; -pub use domain_scheduler::DomainScheduler; diff --git a/bebop/bebop/src/buckyball/frontend/rob/mod.rs b/bebop/bebop/src/buckyball/frontend/rob/mod.rs deleted file mode 100644 index 0c2e91f8..00000000 --- a/bebop/bebop/src/buckyball/frontend/rob/mod.rs +++ /dev/null @@ -1,3 +0,0 @@ -mod rob; - -pub use rob::Rob; diff --git a/bebop/bebop/src/buckyball/frontend/rob/rob.rs b/bebop/bebop/src/buckyball/frontend/rob/rob.rs deleted file mode 100644 index edfe8fac..00000000 --- a/bebop/bebop/src/buckyball/frontend/rob/rob.rs +++ /dev/null @@ -1,66 +0,0 @@ -use crate::log_backward; - -pub struct Rob { - pub funct: u32, - pub xs1: u64, - pub xs2: u64, - pub rob_id: u64, - pub is_fence: bool, - pub pending_fence_rob_id:
Option<u64>, - committed_rob_id: u64, -} - -impl Rob { - pub fn new() -> Self { - Self { - funct: 0, - xs1: 0, - xs2: 0, - rob_id: 0, - is_fence: false, - pending_fence_rob_id: None, - committed_rob_id: 0, - } - } - - pub fn enter_rob(&mut self, funct: u32, xs1: u64, xs2: u64, is_fence: bool) { - self.funct = funct; - self.xs1 = xs1; - self.xs2 = xs2; - self.is_fence = is_fence; - self.rob_id += 1; - - if is_fence { - self.pending_fence_rob_id = Some(self.rob_id); - log_backward!("Fence instruction entered ROB! rob_id={}", self.rob_id); - } else { - log_backward!("Inst enter rob!"); - } - } - - - - pub fn is_empty(&self) -> bool { - self.committed_rob_id >= self.rob_id - } - - pub fn commit_instruction(&mut self) { - self.committed_rob_id += 1; - } - - pub fn check_fence_ready(&mut self) -> Option<u64> { - if let Some(fence_rob_id) = self.pending_fence_rob_id { - if self.is_empty() { - log_backward!("Fence instruction ready! ROB is empty, fence_rob_id={}", fence_rob_id); - self.pending_fence_rob_id = None; - return Some(fence_rob_id); - } - } - None - } - - pub fn print_status(&self) { - println!(" [Rob] funct={}, xs1=0x{:x}, xs2=0x{:x}, rob_id={}, is_fence={}, pending_fence={:?}, committed={}", - self.funct, self.xs1, self.xs2, self.rob_id, self.is_fence, self.pending_fence_rob_id, self.committed_rob_id); - } -} diff --git a/bebop/bebop/src/buckyball/frontend/rs/mod.rs b/bebop/bebop/src/buckyball/frontend/rs/mod.rs deleted file mode 100644 index 5a89c733..00000000 --- a/bebop/bebop/src/buckyball/frontend/rs/mod.rs +++ /dev/null @@ -1,3 +0,0 @@ -mod rs; - -pub use rs::Rs; diff --git a/bebop/bebop/src/buckyball/frontend/rs/rs.rs b/bebop/bebop/src/buckyball/frontend/rs/rs.rs deleted file mode 100644 index 10a8f31e..00000000 --- a/bebop/bebop/src/buckyball/frontend/rs/rs.rs +++ /dev/null @@ -1,29 +0,0 @@ -use crate::log_backward; - -pub struct Rs { - pub funct: u32, - pub xs1: u64, - pub xs2: u64, - pub rob_id: u64, - pub domain_id: u64, -} - -impl Rs { - pub fn new() -> Self {
- Self { funct: 0, xs1: 0, xs2: 0, rob_id: 0, domain_id: 0 } - } - - pub fn issue_cmd(&mut self, funct: u32, xs1: u64, xs2: u64, rob_id: u64, domain_id: u64) { - self.funct = funct; - self.xs1 = xs1; - self.xs2 = xs2; - self.rob_id = rob_id; - self.domain_id = domain_id; - log_backward!("Inst issue!"); - } - - pub fn print_status(&self) { - println!(" [Rs] funct={}, xs1=0x{:x}, xs2=0x{:x}, rob_id={}, domain_id={}", - self.funct, self.xs1, self.xs2, self.rob_id, self.domain_id); - } -} diff --git a/bebop/bebop/src/buckyball/memdomain/bank/bank.rs b/bebop/bebop/src/buckyball/memdomain/bank/bank.rs deleted file mode 100644 index 763b57fd..00000000 --- a/bebop/bebop/src/buckyball/memdomain/bank/bank.rs +++ /dev/null @@ -1,141 +0,0 @@ -// Bank operation state -use crate::log_backward; - -#[derive(Debug, Clone, Copy, PartialEq)] -pub enum BankState { - Idle, - SramReq, // Issuing SRAM request - SramWait, // Waiting for SRAM response -} - -// Bank configuration (unified for all banks) -const NUM_BANKS: usize = 12; // Total number of banks (e.g., 4 SP + 8 ACC) -const BANK_ENTRIES: usize = 4096; // Entries per bank -const ENTRY_BYTES: usize = 16; // 16 bytes per entry (128 bits) - -pub struct Bank { - pub funct: u32, - pub xs1: u64, - pub xs2: u64, - pub bank_id: u32, - - // State machine - pub state: BankState, - - // Cached instruction parameters - pub is_load: bool, - pub is_store: bool, - pub sp_bank: u32, - pub sp_bank_addr: u32, - pub iter: u32, - - // SRAM access tracking - pub sram_count: u32, // Current iteration counter - pub data_buffer: Vec, // Data buffer for streaming - - // Actual storage: Unified banks (NUM_BANKS x BANK_ENTRIES x ENTRY_BYTES) - banks: Vec<Vec<Vec<u8>>>, -} - -impl Bank { - pub fn new() -> Self { - // Initialize all banks: NUM_BANKS x BANK_ENTRIES x ENTRY_BYTES - let banks = (0..NUM_BANKS) - .map(|_| { - (0..BANK_ENTRIES) - .map(|_| vec![0u8; ENTRY_BYTES]) - .collect() - }) - .collect(); - - Self { - funct: 0, - xs1: 0, - xs2: 0, - bank_id: 0, - state:
BankState::Idle, - is_load: false, - is_store: false, - sp_bank: 0, - sp_bank_addr: 0, - iter: 0, - sram_count: 0, - data_buffer: Vec::new(), - banks, - } - } - - pub fn bank_read(&mut self, sp_bank: u32, sp_bank_addr: u32, iter: u32) { - self.is_load = false; - self.is_store = true; - self.sp_bank = sp_bank; - self.sp_bank_addr = sp_bank_addr; - self.iter = iter; - - // Initialize state - self.state = BankState::SramReq; - self.sram_count = 0; - self.bank_id += 1; - - // MVOUT: Read data from bank (to DMA) - log_backward!( - "Bank READ: mvout (bank_id={}, bank={}, addr=0x{:x}, iter={})", - self.bank_id, self.sp_bank, self.sp_bank_addr, self.iter - ); - } - - /// Read one entry from bank - pub fn read_entry(&self, bank: u32, addr: u32) -> Vec<u8> { - let bank_idx = bank as usize; - let addr_idx = addr as usize; - - if bank_idx < NUM_BANKS && addr_idx < BANK_ENTRIES { - self.banks[bank_idx][addr_idx].clone() - } else { - log_backward!("Bank READ ERROR: addr out of range (bank={}, addr={})", bank, addr); - vec![0u8; ENTRY_BYTES] - } - } - - pub fn bank_write(&mut self, sp_bank: u32, sp_bank_addr: u32, iter: u32) { - self.is_load = true; - self.is_store = false; - self.sp_bank = sp_bank; - self.sp_bank_addr = sp_bank_addr; - self.iter = iter; - - // Initialize state - self.state = BankState::SramReq; - self.sram_count = 0; - self.bank_id += 1; - - // MVIN: Write data to bank (from DMA) - log_backward!( - "Bank WRITE: mvin (bank_id={}, bank={}, addr=0x{:x}, iter={})", - self.bank_id, self.sp_bank, self.sp_bank_addr, self.iter - ); - } - - /// Write one entry to bank - pub fn write_entry(&mut self, bank: u32, addr: u32, data: Vec<u8>) { - let bank_idx = bank as usize; - let addr_idx = addr as usize; - - if bank_idx < NUM_BANKS && addr_idx < BANK_ENTRIES { - // Ensure data is correct size - if data.len() == ENTRY_BYTES { - self.banks[bank_idx][addr_idx] = data; - log_backward!("Bank WRITE: bank[{}][0x{:x}] = {} bytes", bank, addr, ENTRY_BYTES); - } else { - log_backward!("Bank
WRITE ERROR: data size mismatch (expected {}, got {})", ENTRY_BYTES, data.len()); - } - } else { - log_backward!("Bank WRITE ERROR: addr out of range (bank={}, addr={})", bank, addr); - } - } - - pub fn print_status(&self) { - println!(" [Bank] bank_id={}, state={:?}, sp_bank={}, sp_bank_addr=0x{:x}, iter={}", - self.bank_id, self.state, self.sp_bank, self.sp_bank_addr, self.iter); - } -} diff --git a/bebop/bebop/src/buckyball/memdomain/bank/mod.rs b/bebop/bebop/src/buckyball/memdomain/bank/mod.rs deleted file mode 100644 index d269ac13..00000000 --- a/bebop/bebop/src/buckyball/memdomain/bank/mod.rs +++ /dev/null @@ -1,2 +0,0 @@ -mod bank; -pub use bank::Bank; diff --git a/bebop/bebop/src/buckyball/memdomain/decoder/decoder.rs b/bebop/bebop/src/buckyball/memdomain/decoder/decoder.rs deleted file mode 100644 index d35581cf..00000000 --- a/bebop/bebop/src/buckyball/memdomain/decoder/decoder.rs +++ /dev/null @@ -1,83 +0,0 @@ -use crate::log_backward; - -pub struct Decoder { - pub funct: u32, - pub xs1: u64, - pub xs2: u64, - - // Decoded instruction fields - pub is_load: bool, - pub is_store: bool, - pub mem_addr: u64, // Memory address from xs1 (rs1[31:0]) - pub sp_addr: u32, // Scratchpad address (rs2[14:0]) - linear address - pub sp_bank: u32, // Bank ID extracted from sp_addr - pub sp_bank_addr: u32, // Address within bank extracted from sp_addr - pub iter: u32, // Number of iterations (rs2[24:15]) - pub stride: u32, // Stride/col_stride (rs2[33:24]) -} - -impl Decoder { - pub fn new() -> Self { - Self { - funct: 0, - xs1: 0, - xs2: 0, - is_load: false, - is_store: false, - mem_addr: 0, - sp_addr: 0, - sp_bank: 0, - sp_bank_addr: 0, - iter: 0, - stride: 0, - } - } - - pub fn decode_cmd(&mut self, funct: u32, xs1: u64, xs2: u64) { - self.funct = funct; - self.xs1 = xs1; - self.xs2 = xs2; - - // Decode instruction type - self.is_load = funct == 24; // mvin - self.is_store = funct == 25; // mvout - - // Parse xs1: memory address (DRAM address) - // rs1[31:0] = 
base_dram_addr - self.mem_addr = xs1 & 0xFFFFFFFF; - - // Parse xs2 fields (matching C implementation): - // rs2[14:0] = base_sp_addr (15 bits) - // rs2[24:15] = iter (10 bits) - // rs2[33:24] = col_stride/stride (10 bits) - self.sp_addr = (xs2 & 0x7FFF) as u32; // bits [14:0] - self.iter = ((xs2 >> 15) & 0x3FF) as u32; // bits [24:15] - self.stride = ((xs2 >> 24) & 0x3FF) as u32; // bits [33:24] - - // Parse sp_addr into bank and bank_addr - // Assuming: 12 banks total, 4096 entries per bank - // sp_bank_addr_bits = log2(4096) = 12 - // sp_bank_bits = log2(12) = 4 (rounded up) - const SP_BANK_ADDR_BITS: u32 = 12; - self.sp_bank_addr = self.sp_addr & ((1 << SP_BANK_ADDR_BITS) - 1); // bits [11:0] - self.sp_bank = self.sp_addr >> SP_BANK_ADDR_BITS; // bits [14:12] - - // funct=24: mvin, funct=25: mvout - match funct { - 24 => log_backward!( - "MemDomain decode: mvin (mem_addr=0x{:x}, sp_addr=0x{:x} [bank={}, addr=0x{:x}], iter={}, col_stride={})", - self.mem_addr, self.sp_addr, self.sp_bank, self.sp_bank_addr, self.iter, self.stride - ), - 25 => log_backward!( - "MemDomain decode: mvout (mem_addr=0x{:x}, sp_addr=0x{:x} [bank={}, addr=0x{:x}], iter={}, stride={})", - self.mem_addr, self.sp_addr, self.sp_bank, self.sp_bank_addr, self.iter, self.stride - ), - _ => log_backward!("MemDomain decode: unknown funct={}", funct), - } - } - - pub fn print_status(&self) { - println!(" [Decoder] funct={}, mem_addr=0x{:x}, sp_addr=0x{:x} [bank={}, addr=0x{:x}], iter={}, stride={}, is_load={}, is_store={}", - self.funct, self.mem_addr, self.sp_addr, self.sp_bank, self.sp_bank_addr, self.iter, self.stride, self.is_load, self.is_store); - } -} diff --git a/bebop/bebop/src/buckyball/memdomain/decoder/mod.rs b/bebop/bebop/src/buckyball/memdomain/decoder/mod.rs deleted file mode 100644 index 3df94419..00000000 --- a/bebop/bebop/src/buckyball/memdomain/decoder/mod.rs +++ /dev/null @@ -1,2 +0,0 @@ -mod decoder; -pub use decoder::Decoder; diff --git 
a/bebop/bebop/src/buckyball/memdomain/memdomain.rs b/bebop/bebop/src/buckyball/memdomain/memdomain.rs deleted file mode 100644 index 38e427c8..00000000 --- a/bebop/bebop/src/buckyball/memdomain/memdomain.rs +++ /dev/null @@ -1,177 +0,0 @@ -use crate::builtin::{Sim, EventQueue}; -use super::{Decoder, Reader, Writer, Bank, OutController}; -use crate::{log_forward}; -use crate::buckyball::top::{DmaRequest, DmaResponse}; -use std::sync::mpsc::{Sender, Receiver}; - -pub struct MemDomain { - decoder: Decoder, - reader: Reader, - writer: Writer, - bank: Bank, - out_ctrl: OutController, - pub event_queue: EventQueue<MemDomain>, - - pub funct: u32, - pub xs1: u64, - pub xs2: u64, - pub rob_id: u32, // ROB ID for tracking instruction completion - - // DMA channels - dma_req_tx: Option<Sender<DmaRequest>>, - dma_resp_rx: Option<Receiver<DmaResponse>>, -} - -impl MemDomain { - pub fn new() -> Self { - Self { - decoder: Decoder::new(), - reader: Reader::new(), - writer: Writer::new(), - bank: Bank::new(), - out_ctrl: OutController::new(), - event_queue: EventQueue::new(), - - funct: 0, - xs1: 0, - xs2: 0, - rob_id: 0, - - dma_req_tx: None, - dma_resp_rx: None, - } - } - - pub fn set_dma_channels(&mut self, req_tx: Sender<DmaRequest>, resp_rx: Receiver<DmaResponse>) { - self.dma_req_tx = Some(req_tx.clone()); - self.dma_resp_rx = Some(resp_rx); - self.reader.set_dma_sender(req_tx.clone()); - self.writer.set_dma_sender(req_tx); - } - - pub fn mem_cmd(&mut self) { - let funct = self.funct; - let xs1 = self.xs1; - let xs2 = self.xs2; - - println!("[MemDomain] mem_cmd called: funct={}, xs1=0x{:x}, xs2=0x{:x}", funct, xs1, xs2); - - self.event_queue.push("MemCmd", move |memdomain: &mut MemDomain| { - memdomain.decoder.decode_cmd(funct, xs1, xs2); - }); - self.outside_schedule(funct); - } - - pub fn outside_schedule(&mut self, funct: u32) { - let mem_addr = self.decoder.mem_addr; - let is_load = self.decoder.is_load; - let is_store = self.decoder.is_store; - let bank_id = self.decoder.sp_bank; - let sp_bank_addr = self.decoder.sp_bank_addr; - let iter =
self.decoder.iter; - let stride = self.decoder.stride; - - println!("[MemDomain] outside_schedule: is_load={}, is_store={}, mem_addr=0x{:x}", is_load, is_store, mem_addr); - - self.event_queue.push("ChooseDma", move |memdomain: &mut MemDomain| { - memdomain.out_ctrl.dma_schedule( bank_id, is_load, is_store, - mem_addr, sp_bank_addr, iter, stride); - }); - log_forward!("MemDomain: Drive DMA (mem_addr=0x{:x}, iter={}, stride={})", - mem_addr, iter, stride); - if funct == 24 { - println!("[MemDomain] Calling dma_read()"); - self.dma_read(); - } else if funct == 25 { - println!("[MemDomain] Calling dma_write()"); - self.dma_write(); - } else { - println!("[MemDomain] Not Calling dma"); - } - } - - pub fn dma_read(&mut self) { - // MVIN: DMA read from memory - let mem_addr = self.out_ctrl.mem_addr; - let iter = self.out_ctrl.iter; - let stride = self.out_ctrl.stride; - - println!("[MemDomain] dma_read: pushing DmaRead event to queue (mem_addr=0x{:x})", mem_addr); - - self.event_queue.push("DmaRead", move |memdomain: &mut MemDomain| { - println!("[MemDomain] DmaRead event executing, calling reader.dma_read()"); - memdomain.reader.dma_read(mem_addr, iter, stride); - }); - log_forward!("MemDomain: DMA read request"); - self.bank_write(); - } - - pub fn dma_write(&mut self) { - // MVOUT: First read from bank, then DMA write to memory - self.bank_read(); - - // MVOUT: DMA write to memory after reading from bank - let mem_addr = self.out_ctrl.mem_addr; - let iter = self.out_ctrl.iter; - let stride = self.out_ctrl.stride; - - self.event_queue.push("DmaWrite", move |memdomain: &mut MemDomain| { - memdomain.writer.dma_write(mem_addr, iter, stride); - }); - log_forward!("MemDomain: DMA write request"); - } - - pub fn bank_read(&mut self) { - // MVOUT: Read data from bank - let bank_id = self.out_ctrl.bank_id; - let sp_bank_addr = self.out_ctrl.sp_bank_addr; - let iter = self.out_ctrl.iter; - - self.event_queue.push("BankRead", move |memdomain: &mut MemDomain| { - 
memdomain.bank.bank_read(bank_id, sp_bank_addr, iter); - }); - log_forward!("MemDomain: Bank read for mvout"); - } - - pub fn bank_write(&mut self) { - // MVIN: Write data to bank from DMA - let bank_id = self.out_ctrl.bank_id; - let sp_bank_addr = self.out_ctrl.sp_bank_addr; - let iter = self.out_ctrl.iter; - - self.event_queue.push("BankWrite", move |memdomain: &mut MemDomain| { - memdomain.bank.bank_write(bank_id, sp_bank_addr, iter); - }); - log_forward!("MemDomain: Bank write for mvin"); - } - -} - -impl Sim for MemDomain { - fn forward(&mut self) { - // Forward phase is triggered by external command - // Actual forward logic is in mem_cmd() - } - - fn backward(&mut self) { - // Process all events in the queue (LIFO) - println!("[MemDomain] backward: processing event queue"); - let mut queue = std::mem::take(&mut self.event_queue); - queue.process_all(self); - self.event_queue = queue; - println!("[MemDomain] backward: event queue processed"); - } - - fn module_name(&self) -> &str { - "MemDomain" - } - - fn print_status(&self) { - println!(" [MemDomain] Module Status:"); - self.decoder.print_status(); - self.out_ctrl.print_status(); - self.reader.print_status(); - self.writer.print_status(); - self.bank.print_status(); - } -} diff --git a/bebop/bebop/src/buckyball/memdomain/mod.rs b/bebop/bebop/src/buckyball/memdomain/mod.rs deleted file mode 100644 index 279181ac..00000000 --- a/bebop/bebop/src/buckyball/memdomain/mod.rs +++ /dev/null @@ -1,13 +0,0 @@ -mod decoder; -mod reader; -mod writer; -mod bank; -mod out_controller; -mod memdomain; - -pub use decoder::Decoder; -pub use reader::Reader; -pub use writer::Writer; -pub use bank::Bank; -pub use out_controller::OutController; -pub use memdomain::MemDomain; diff --git a/bebop/bebop/src/buckyball/memdomain/out_controller/controller.rs b/bebop/bebop/src/buckyball/memdomain/out_controller/controller.rs deleted file mode 100644 index f66db95b..00000000 --- 
a/bebop/bebop/src/buckyball/memdomain/out_controller/controller.rs +++ /dev/null @@ -1,52 +0,0 @@ -use crate::log_backward; - -pub struct OutController { - pub bank_id: u32, - - // Completion tracking - pub is_load: bool, - pub is_store: bool, - pub mem_addr: u64, - pub sp_bank_addr: u32, - pub iter: u32, - pub stride: u32, - pub task_complete: bool, - pub rob_id: u32, // ROB ID for tracking instruction completion -} - -impl OutController { - pub fn new() -> Self { - Self { - bank_id: 0, - is_load: false, - is_store: false, - mem_addr: 0, - sp_bank_addr: 0, - iter: 0, - stride: 0, - task_complete: false, - rob_id: 0, - } - } - - pub fn dma_schedule(&mut self, bank_id: u32, is_load: bool, is_store: bool, - mem_addr: u64, sp_bank_addr: u32, iter: u32, stride: u32) { - - self.bank_id = bank_id; - - self.is_load = is_load; - self.is_store = is_store; - - self.mem_addr = mem_addr; - self.sp_bank_addr = sp_bank_addr; - self.iter = iter; - self.stride = stride; - } - - pub fn print_status(&self) { - println!(" [OutController] bank_id={}, task_complete={}, rob_id={}, op={}, mem_addr=0x{:x}, sp_bank_addr=0x{:x}, iter={}, stride={}", - self.bank_id, self.task_complete, self.rob_id, - if self.is_load { "mvin" } else if self.is_store { "mvout" } else { "idle" }, - self.mem_addr, self.sp_bank_addr, self.iter, self.stride); - } -} diff --git a/bebop/bebop/src/buckyball/memdomain/out_controller/mod.rs b/bebop/bebop/src/buckyball/memdomain/out_controller/mod.rs deleted file mode 100644 index 0c547d8a..00000000 --- a/bebop/bebop/src/buckyball/memdomain/out_controller/mod.rs +++ /dev/null @@ -1,2 +0,0 @@ -mod controller; -pub use controller::OutController; diff --git a/bebop/bebop/src/buckyball/memdomain/reader/mod.rs b/bebop/bebop/src/buckyball/memdomain/reader/mod.rs deleted file mode 100644 index 1661cf05..00000000 --- a/bebop/bebop/src/buckyball/memdomain/reader/mod.rs +++ /dev/null @@ -1,3 +0,0 @@ -mod reader; - -pub use reader::Reader; diff --git 
a/bebop/bebop/src/buckyball/memdomain/reader/reader.rs b/bebop/bebop/src/buckyball/memdomain/reader/reader.rs deleted file mode 100644 index edfb1594..00000000 --- a/bebop/bebop/src/buckyball/memdomain/reader/reader.rs +++ /dev/null @@ -1,60 +0,0 @@ -use crate::log_backward; -use crate::buckyball::top::DmaRequest; -use std::sync::mpsc::Sender; - -pub struct Reader { - pub mem_addr: u64, // Memory address from xs1 (rs1[31:0]) - // pub bank_id: u32, // Bank ID extracted from sp_addr - // pub bank_addr: u32, // Address within bank extracted from sp_addr - pub iter: u32, // Number of iterations (rs2[24:15]) - pub stride: u32, // Stride/col_stride (rs2[33:24]) - - dma_req_tx: Option<Sender<DmaRequest>>, -} - -impl Reader { - pub fn new() -> Self { - Self { - mem_addr: 0, - iter: 0, - stride: 0, - dma_req_tx: None, - } - } - - pub fn set_dma_sender(&mut self, sender: Sender<DmaRequest>) { - self.dma_req_tx = Some(sender); - } - - pub fn dma_read(&mut self, mem_addr: u64, iter: u32, stride: u32) { - self.mem_addr = mem_addr; - self.iter = iter; - self.stride = stride; - - println!("[Reader] dma_read called: mem_addr=0x{:x}, iter={}, stride={}", mem_addr, iter, stride); - - // Send DMA read request via channel - if let Some(ref tx) = self.dma_req_tx { - println!("[Reader] dma_req_tx channel exists, creating DmaRequest::Read"); - // For now, read 8 bytes (64-bit word) - if mem_addr == 0 { - eprintln!("[Reader] ERROR: mem_addr is 0!"); - return; - } - let req = DmaRequest::Read { addr: mem_addr, size: 8 }; - println!("[Reader] Sending DmaRequest::Read to channel..."); - if let Err(e) = tx.send(req) { - eprintln!("[Reader] ERROR: Failed to send DMA read request: {}", e); - } else { - println!("[Reader] SUCCESS: Sent DMA read request to channel: addr=0x{:x}, size=8", mem_addr); - } - } else { - println!("[Reader] ERROR: dma_req_tx channel is None!"); - } - } - - pub fn print_status(&self) { - println!(" [Reader] mem_addr=0x{:x}, iter={}, stride={}", - self.mem_addr, self.iter, self.stride); - } -} diff
--git a/bebop/bebop/src/buckyball/memdomain/writer/mod.rs b/bebop/bebop/src/buckyball/memdomain/writer/mod.rs deleted file mode 100644 index d93ae306..00000000 --- a/bebop/bebop/src/buckyball/memdomain/writer/mod.rs +++ /dev/null @@ -1,3 +0,0 @@ -mod writer; - -pub use writer::Writer; diff --git a/bebop/bebop/src/buckyball/memdomain/writer/writer.rs b/bebop/bebop/src/buckyball/memdomain/writer/writer.rs deleted file mode 100644 index fbcb827e..00000000 --- a/bebop/bebop/src/buckyball/memdomain/writer/writer.rs +++ /dev/null @@ -1,53 +0,0 @@ -use crate::log_backward; -use crate::buckyball::top::DmaRequest; -use std::sync::mpsc::Sender; - -pub struct Writer { - pub mem_addr: u64, // Memory address from xs1 (rs1[31:0]) - pub bank_id: u32, // Bank ID extracted from sp_addr - pub bank_addr: u32, // Address within bank extracted from sp_addr - pub iter: u32, // Number of iterations (rs2[24:15]) - pub stride: u32, // Stride/col_stride (rs2[33:24]) - - dma_req_tx: Option<Sender<DmaRequest>>, -} - -impl Writer { - pub fn new() -> Self { - Self { - mem_addr: 0, - bank_id: 0, - bank_addr: 0, - iter: 0, - stride: 0, - dma_req_tx: None, - } - } - - pub fn set_dma_sender(&mut self, sender: Sender<DmaRequest>) { - self.dma_req_tx = Some(sender); - } - - pub fn dma_write(&mut self, mem_addr: u64, iter: u32, stride: u32) { - self.mem_addr = mem_addr; - self.iter = iter; - self.stride = stride; - - // Send DMA write request via channel - if let Some(ref tx) = self.dma_req_tx { - // For now, write dummy data (0x0) as 8 bytes (64-bit word) - // In a real implementation, this would read data from the bank - let req = DmaRequest::Write { addr: mem_addr, data: 0x0, size: 8 }; - if let Err(e) = tx.send(req) { - eprintln!("[Writer] Failed to send DMA write request: {}", e); - } else { - println!("[Writer] Sent DMA write request: addr=0x{:x}, data=0x0, size=8", mem_addr); - } - } - } - - pub fn print_status(&self) { - println!(" [Writer] mem_addr=0x{:x}, bank_id={}, bank_addr=0x{:x}, iter={}, stride={}", - self.mem_addr,
self.bank_id, self.bank_addr, self.iter, self.stride); - } -} diff --git a/bebop/bebop/src/buckyball/mod.rs b/bebop/bebop/src/buckyball/mod.rs deleted file mode 100644 index 49b7f6af..00000000 --- a/bebop/bebop/src/buckyball/mod.rs +++ /dev/null @@ -1,16 +0,0 @@ -/// Buckyball - Core NPU modules -/// -/// - `top` - top-level module -/// - `config` - NPU configuration -/// - `frontend` - frontend module -/// - `memdomain` - memory domain module - -pub mod config; -pub mod frontend; -pub mod memdomain; -pub mod top; - -pub use crate::builtin::Sim; -pub use config::NpuConfig; -pub use memdomain::MemDomain; -pub use top::Top; diff --git a/bebop/bebop/src/buckyball/top.rs b/bebop/bebop/src/buckyball/top.rs deleted file mode 100644 index 5d4c8ee1..00000000 --- a/bebop/bebop/src/buckyball/top.rs +++ /dev/null @@ -1,163 +0,0 @@ -/// Top Module - NPU top-level -use crate::builtin::Sim; -use crate::buckyball::frontend::Frontend; -use crate::buckyball::memdomain::MemDomain; -use std::sync::mpsc::{channel, Receiver, Sender}; -use std::sync::{Arc, Mutex}; -use crate::log_tpc; - -/// RoCC command from host -pub struct RoccCmd { - pub funct: u32, - pub xs1: u64, - pub xs2: u64, -} - -/// Command response notification (sent from Frontend to CmdHandler) -pub struct CmdResponse { - pub result: u64, -} - -/// Memory command sent from Frontend to MemDomain -#[derive(Debug, Clone)] -pub struct MemCmd { - pub funct: u32, - pub xs1: u64, - pub xs2: u64, - pub rob_id: u64, - pub domain_id: u64, -} - -/// DMA request types -#[derive(Debug, Clone)] -pub enum DmaRequest { - Read { addr: u64, size: u32 }, - Write { addr: u64, data: u64, size: u32 }, -} - -/// DMA response types -#[derive(Debug, Clone)] -pub enum DmaResponse { - ReadComplete { data: u64 }, - WriteComplete, - Error(String), -} - -/// Top - NPU top-level module -pub struct Top { - name: String, - frontend: Frontend, - memdomain: MemDomain, - - // RoCC command channel (host -> Top) - cmd_rx: Receiver<RoccCmd>, - cmd_tx: Sender<RoccCmd>, - - // Memory command channel (Frontend -> MemDomain) -
#[allow(dead_code)] - mem_cmd_tx: Sender<MemCmd>, - mem_cmd_rx: Receiver<MemCmd>, - - // Command response channel (Frontend -> CmdHandler) - pub cmd_response_rx: Arc<Mutex<Receiver<CmdResponse>>>, - cmd_response_tx: Sender<CmdResponse>, - - // DMA request/response channels (MemDomain <-> ConnectionHandler) - pub dma_req_rx: Arc<Mutex<Receiver<DmaRequest>>>, - pub dma_resp_tx: Arc<Mutex<Sender<DmaResponse>>>, -} - -impl Top { - pub fn new(name: impl Into<String>) -> Self { - let (cmd_tx, cmd_rx) = channel(); - let (mem_cmd_tx, mem_cmd_rx) = channel(); - let (cmd_response_tx, cmd_response_rx) = channel(); - let (dma_req_tx, dma_req_rx) = channel(); - let (dma_resp_tx, dma_resp_rx) = channel(); - - let mut frontend = Frontend::new(); - frontend.set_mem_cmd_sender(mem_cmd_tx.clone()); - frontend.set_cmd_response_sender(cmd_response_tx.clone()); - - let mut memdomain = MemDomain::new(); - memdomain.set_dma_channels(dma_req_tx, dma_resp_rx); - - Self { - name: name.into(), - frontend, - memdomain, - cmd_rx, - cmd_tx, - mem_cmd_tx, - mem_cmd_rx, - cmd_response_rx: Arc::new(Mutex::new(cmd_response_rx)), - cmd_response_tx, - dma_req_rx: Arc::new(Mutex::new(dma_req_rx)), - dma_resp_tx: Arc::new(Mutex::new(dma_resp_tx)), - } - } - - pub fn frontend_cmd(&mut self, funct: u32, xs1: u64, xs2: u64) { - self.frontend.funct = funct; - self.frontend.xs1 = xs1; - self.frontend.xs2 = xs2; - self.frontend.rocc_cmd(); - } - - pub fn memdomain_cmd(&mut self, funct: u32, xs1: u64, xs2: u64, rob_id: u64, domain_id: u64) { - self.memdomain.funct = funct; - self.memdomain.xs1 = xs1; - self.memdomain.xs2 = xs2; - self.memdomain.mem_cmd(); - } - - /// Get a sender for sending RoCC commands - pub fn get_cmd_sender(&self) -> Sender<RoccCmd> { - self.cmd_tx.clone() - } - - /// Get command response receiver for CmdHandler - pub fn get_cmd_response_receiver(&self) -> Arc<Mutex<Receiver<CmdResponse>>> { - self.cmd_response_rx.clone() - } - - /// Get DMA channel handles for ConnectionHandler - pub fn get_dma_channels(&self) -> (Arc<Mutex<Receiver<DmaRequest>>>, Arc<Mutex<Sender<DmaResponse>>>) { - (self.dma_req_rx.clone(), self.dma_resp_tx.clone()) - } -} - -impl Sim for Top { - fn forward(&mut self) { -
// 1. Receive RoCC command from host - if let Ok(rocc_cmd) = self.cmd_rx.try_recv() { - self.frontend_cmd(rocc_cmd.funct, rocc_cmd.xs1, rocc_cmd.xs2); - } - - // 2. Check if Frontend sent memory commands to MemDomain - if let Ok(mem_cmd) = self.mem_cmd_rx.try_recv() { - log_tpc!("[Top] Received MemCmd from Frontend: funct={}, xs1=0x{:x}, xs2=0x{:x}, rob_id={}, domain_id={}", - mem_cmd.funct, mem_cmd.xs1, mem_cmd.xs2, mem_cmd.rob_id, mem_cmd.domain_id); - self.memdomain_cmd(mem_cmd.funct, mem_cmd.xs1, mem_cmd.xs2, mem_cmd.rob_id, mem_cmd.domain_id); - } - - // Debug: print queue status - self.frontend.event_queue.print_status("Frontend"); - self.memdomain.event_queue.print_status("MemDomain"); - } - - fn backward(&mut self) { - self.frontend.backward(); - self.memdomain.backward(); - - // Print all module status - println!("\n=== Module Status ==="); - self.frontend.print_status(); - self.memdomain.print_status(); - println!("====================\n"); - } - - fn module_name(&self) -> &str { - &self.name - } -} diff --git a/bebop/bebop/src/builtin/event.rs b/bebop/bebop/src/builtin/event.rs deleted file mode 100644 index 8066739f..00000000 --- a/bebop/bebop/src/builtin/event.rs +++ /dev/null @@ -1,91 +0,0 @@ -use std::collections::VecDeque; - -/// Event item: holds the event name and its callback -struct EventItem<T> { - name: String, - callback: Box<dyn FnOnce(&mut T)>, -} - -/// Event queue for event management inside a module -/// The generic parameter T is the context type received by event callbacks (usually the module itself) -/// Events are pushed in forward order and popped in reverse order, running each callback -pub struct EventQueue<T> { - queue: VecDeque<EventItem<T>>, -} - -impl<T> EventQueue<T> { - /// Create a new event queue - pub fn new() -> Self { - Self { - queue: VecDeque::new(), - } - } - - /// Forward push: append the event function to the back of the queue - /// The event function takes a mutable reference and may read and modify module state - pub fn push<F>(&mut self, name: impl Into<String>, event: F) - where - F: FnOnce(&mut T) + 'static, - { - self.queue.push_back(EventItem { - name: name.into(), - callback: Box::new(event), - }); - } - - /// Reverse pop and process: pop an event from the back of the queue (LIFO) and run it - pub fn pop_and_process(&mut self, context: &mut T) -> bool { - if let Some(event_item) = self.queue.pop_back() { -
(event_item.callback)(context); // Run the event function, passing in the context - true - } else { - false - } - } - - /// Process all events in the queue - pub fn process_all(&mut self, context: &mut T) { - while self.pop_and_process(context) {} - } - - /// Get the queue length - pub fn len(&self) -> usize { - self.queue.len() - } - - /// Check whether the queue is empty - pub fn is_empty(&self) -> bool { - self.queue.is_empty() - } - - /// Clear the queue - pub fn clear(&mut self) { - self.queue.clear(); - } - - /// Print the queue's internal state - pub fn print_status(&self, module_name: &str) { - println!("╔═══════════════════════════════════════════════════════════════="); - println!("║ EventQueue Status - {}", module_name); - println!("╠═══════════════════════════════════════════════════════════════="); - println!("║ Queue Length: {}", self.queue.len()); - println!("║ Is Empty: {}", self.queue.is_empty()); - println!("║ Capacity: {}", self.queue.capacity()); - println!("╠═══════════════════════════════════════════════════════════════="); - if self.queue.is_empty() { - println!("║ [Queue is empty]"); - } else { - println!("║ Events in queue (from front to back):"); - for (idx, event_item) in self.queue.iter().enumerate() { - println!("║ [{}] {}", idx, event_item.name); - } - } - println!("╚═══════════════════════════════════════════════════════════════="); - } -} - -impl Default for EventQueue { - fn default() -> Self { - Self::new() - } -} diff --git a/bebop/bebop/src/builtin/interface.rs b/bebop/bebop/src/builtin/interface.rs deleted file mode 100644 index 3fc3ed25..00000000 --- a/bebop/bebop/src/builtin/interface.rs +++ /dev/null @@ -1,49 +0,0 @@ -/// Macro for calling the function pointer stored in an Interface -/// -/// # Example -/// ``` -/// // Call a static function -/// call_interface!(interface, fn(u32, u64, u64), arg1, arg2, arg3); -/// -/// // Call a method (self must be passed in) -/// call_interface!(interface, fn(&mut Self, u32, u64, u64), &mut obj, arg1, arg2, arg3); -/// ``` -#[macro_export] -macro_rules! 
call_interface { - ($interface:expr, fn($($arg_type:ty),*) $(-> $ret:ty)?, $($arg:expr),*) => { - if !$interface.function.is_null() { - unsafe { - let f: fn($($arg_type),*) $(-> $ret)? = std::mem::transmute($interface.function); - f($($arg),*) - } - } - }; -} - -pub struct Interface { - pub name: String, - pub latency: u32, - pub ready: fn() -> bool, - pub function: *const (), -} - -impl Interface { - pub fn new(name: impl Into, latency: u32) -> Self { - fn default_ready() -> bool { true } - - Self { - name: name.into(), - latency, - ready: default_ready, - function: std::ptr::null(), - } - } - - pub fn ready(&self) -> bool { - (self.ready)() - } - - pub fn set_function(&mut self, f: *const ()) { - self.function = f; - } -} diff --git a/bebop/bebop/src/builtin/mod.rs b/bebop/bebop/src/builtin/mod.rs deleted file mode 100644 index 5962a0fb..00000000 --- a/bebop/bebop/src/builtin/mod.rs +++ /dev/null @@ -1,9 +0,0 @@ -pub mod module; -pub mod sim; -pub mod interface; -pub mod event; - -pub use module::Module; -pub use sim::Sim; -pub use interface::Interface; -pub use event::EventQueue; diff --git a/bebop/bebop/src/builtin/module.rs b/bebop/bebop/src/builtin/module.rs deleted file mode 100644 index 25482ae6..00000000 --- a/bebop/bebop/src/builtin/module.rs +++ /dev/null @@ -1,3 +0,0 @@ -pub trait Module { - fn module_name(&self) -> &str; -} diff --git a/bebop/bebop/src/builtin/sim.rs b/bebop/bebop/src/builtin/sim.rs deleted file mode 100644 index 24dc43bf..00000000 --- a/bebop/bebop/src/builtin/sim.rs +++ /dev/null @@ -1,11 +0,0 @@ -pub trait Sim { - fn module_name(&self) -> &str; - fn forward(&mut self); - fn backward(&mut self); - - /// Optional method to print module status - /// Override this to print module-specific state - fn print_status(&self) { - // Default: do nothing - } -} diff --git a/bebop/bebop/src/lib.rs b/bebop/bebop/src/lib.rs deleted file mode 100644 index 113d36ce..00000000 --- a/bebop/bebop/src/lib.rs +++ /dev/null @@ -1,19 +0,0 @@ -/// Bebop - 
Accelerator simulator for RISC-V host -/// -/// This library provides socket-based communication between host (RISC-V ISA simulator) -/// and custom accelerator implementations. -#[macro_use] -pub mod builtin; -pub mod buckyball; -#[macro_use] -pub mod simulator; - -// Re-export log configuration functions for convenience -pub use simulator::utils::log_config::{ - set_forward_log, set_backward_log, - is_forward_log_enabled, is_backward_log_enabled, - enable_all_logs, disable_all_logs, -}; - -pub use buckyball::{NpuConfig, Top}; -pub use simulator::{Simulator, StepMode}; diff --git a/bebop/bebop/src/simulator/mod.rs b/bebop/bebop/src/simulator/mod.rs deleted file mode 100644 index 4aca9248..00000000 --- a/bebop/bebop/src/simulator/mod.rs +++ /dev/null @@ -1,7 +0,0 @@ -/// Simulator module -#[macro_use] -pub mod utils; -pub mod server; -pub mod simulator; - -pub use simulator::{Simulator, StepMode}; diff --git a/bebop/bebop/src/simulator/server/cmd.rs b/bebop/bebop/src/simulator/server/cmd.rs deleted file mode 100644 index d83692ed..00000000 --- a/bebop/bebop/src/simulator/server/cmd.rs +++ /dev/null @@ -1,80 +0,0 @@ -/// CMD channel handler - receives RoCC commands from host and sends to Top -use std::io::{Read, Write}; -use std::net::TcpStream; -use std::sync::mpsc::{Sender, Receiver}; -use std::sync::{Arc, Mutex}; - -use crate::buckyball::top::{RoccCmd, CmdResponse}; -use super::socket::cmd::{CmdReq, CmdResp}; -use super::dma::DmaHandler; - -pub struct CmdHandler { - cmd_tx: Sender, - cmd_response_rx: Arc>>, -} - -impl CmdHandler { - pub fn new(cmd_tx: Sender, cmd_response_rx: Arc>>) -> Self { - Self { cmd_tx, cmd_response_rx } - } - - /// Handle a single CMD request - pub fn handle_cmd_request(&mut self, stream: &mut TcpStream, msg_type_val: u32, dma_handler: &mut DmaHandler) -> std::io::Result<()> { - // Read remaining CMD request bytes (already read 4 bytes for msg_type) - let mut remaining_bytes = [0u8; CmdReq::SIZE - 4]; - stream.read_exact(&mut 
remaining_bytes)?; - - // Reconstruct full message - let mut msg_bytes = [0u8; CmdReq::SIZE]; - msg_bytes[0..4].copy_from_slice(&msg_type_val.to_le_bytes()); - msg_bytes[4..].copy_from_slice(&remaining_bytes); - - // Parse CMD request - let cmd_req = CmdReq::from_bytes(&msg_bytes); - let funct = cmd_req.funct; - let xs1 = cmd_req.xs1; - let xs2 = cmd_req.xs2; - log_ipc!("[CMD] Received: funct={}, xs1=0x{:016x}, xs2=0x{:016x}", funct, xs1, xs2); - - // Forward command to Top via channel - let rocc_cmd = RoccCmd { funct, xs1, xs2 }; - if let Err(e) = self.cmd_tx.send(rocc_cmd) { - log_error!("[CMD] Failed to send to Top: {}", e); - return Err(std::io::Error::new(std::io::ErrorKind::BrokenPipe, e)); - } - - // Wait for cmd_response from Frontend - // While waiting, send any pending DMA requests - let result = loop { - // Try to send DMA requests (non-blocking) - dma_handler.try_send_dma_request(stream)?; - - // Try to receive cmd_response (non-blocking) - let response_rx = self.cmd_response_rx.lock().unwrap(); - match response_rx.try_recv() { - Ok(cmd_response) => { - log_ipc!("[CMD] Received cmd_response from Frontend: result=0x{:016x}", cmd_response.result); - break cmd_response.result; - } - Err(std::sync::mpsc::TryRecvError::Empty) => { - // No response yet, continue loop - drop(response_rx); - std::thread::sleep(std::time::Duration::from_micros(100)); - continue; - } - Err(std::sync::mpsc::TryRecvError::Disconnected) => { - log_error!("[CMD] cmd_response channel disconnected"); - return Err(std::io::Error::new(std::io::ErrorKind::BrokenPipe, "cmd_response channel disconnected")); - } - } - }; - - // Send CMD response to host - let cmd_resp = CmdResp::new(result); - let resp_bytes = cmd_resp.to_bytes(); - stream.write_all(&resp_bytes)?; - log_ipc!("[CMD] Sent response to host: result=0x{:016x}\n", result); - - Ok(()) - } -} diff --git a/bebop/bebop/src/simulator/server/dma.rs b/bebop/bebop/src/simulator/server/dma.rs deleted file mode 100644 index 
a5316c26..00000000 --- a/bebop/bebop/src/simulator/server/dma.rs +++ /dev/null @@ -1,110 +0,0 @@ -/// DMA channel handler - processes DMA requests from MemDomain and sends to host via TCP -use std::io::{Read, Write}; -use std::net::TcpStream; -use std::sync::mpsc::{Receiver, Sender}; -use std::sync::{Arc, Mutex}; - -use crate::buckyball::top::{DmaRequest, DmaResponse}; -use super::socket::read::{DmaReadReq, DmaReadResp}; -use super::socket::write::{DmaWriteReq, DmaWriteResp}; - -pub struct DmaHandler { - dma_req_rx: Arc>>, - dma_resp_tx: Arc>>, -} - -impl DmaHandler { - pub fn new( - dma_req_rx: Arc>>, - dma_resp_tx: Arc>>, - ) -> Self { - Self { - dma_req_rx, - dma_resp_tx, - } - } - - /// Try to send a DMA request to host (non-blocking check) - /// Returns true if a request was sent, false if no request available - pub fn try_send_dma_request(&mut self, stream: &mut TcpStream) -> std::io::Result { - // Non-blocking check for DMA request from MemDomain - let rx = self.dma_req_rx.lock().unwrap(); - match rx.try_recv() { - Ok(dma_req) => { - drop(rx); // Release lock before I/O - - match dma_req { - DmaRequest::Read { addr, size } => { - println!("[DMA] Sending Read request: addr=0x{:x}, size={}", addr, size); - let req = DmaReadReq::new(addr, size); - let req_bytes = req.to_bytes(); - stream.write_all(&req_bytes)?; - log_ipc!("[DMA] Read request sent: addr=0x{:x}, size={}", addr, size); - } - DmaRequest::Write { addr, data, size } => { - println!("[DMA] Sending Write request: addr=0x{:x}, data=0x{:x}, size={}", addr, data, size); - let req = DmaWriteReq::new(addr, data, size); - let req_bytes = req.to_bytes(); - stream.write_all(&req_bytes)?; - log_ipc!("[DMA] Write request sent: addr=0x{:x}, data=0x{:x}, size={}", addr, data, size); - } - } - Ok(true) - } - Err(std::sync::mpsc::TryRecvError::Empty) => { - Ok(false) // No request available - } - Err(std::sync::mpsc::TryRecvError::Disconnected) => { - Err(std::io::Error::new( - std::io::ErrorKind::BrokenPipe, - "DMA 
request channel disconnected", - )) - } - } - } - - /// Handle DMA read response from host - pub fn handle_dma_read_response(&mut self, stream: &mut TcpStream, msg_type_val: u32) -> std::io::Result<()> { - // Read remaining DMA read response bytes - let mut remaining_bytes = [0u8; DmaReadResp::SIZE - 4]; - stream.read_exact(&mut remaining_bytes)?; - - // Reconstruct full message - let mut msg_bytes = [0u8; DmaReadResp::SIZE]; - msg_bytes[0..4].copy_from_slice(&msg_type_val.to_le_bytes()); - msg_bytes[4..].copy_from_slice(&remaining_bytes); - - let resp = DmaReadResp::from_bytes(&msg_bytes); - let data = resp.data; - log_ipc!("[DMA] Read response received: data=0x{:x}\n", data); - - // Send response back to MemDomain via channel - let dma_resp = DmaResponse::ReadComplete { data }; - let tx = self.dma_resp_tx.lock().unwrap(); - if let Err(e) = tx.send(dma_resp) { - log_error!("[DMA] Failed to send response to MemDomain: {}", e); - return Err(std::io::Error::new(std::io::ErrorKind::BrokenPipe, e)); - } - - Ok(()) - } - - /// Handle DMA write response from host - pub fn handle_dma_write_response(&mut self, stream: &mut TcpStream, msg_type_val: u32) -> std::io::Result<()> { - // Read remaining DMA write response bytes - let mut remaining_bytes = [0u8; DmaWriteResp::SIZE - 4]; - stream.read_exact(&mut remaining_bytes)?; - - log_ipc!("[DMA] Write response received\n"); - - // Send response back to MemDomain via channel - let dma_resp = DmaResponse::WriteComplete; - let tx = self.dma_resp_tx.lock().unwrap(); - if let Err(e) = tx.send(dma_resp) { - log_error!("[DMA] Failed to send response to MemDomain: {}", e); - return Err(std::io::Error::new(std::io::ErrorKind::BrokenPipe, e)); - } - - Ok(()) - } -} diff --git a/bebop/bebop/src/simulator/server/mod.rs b/bebop/bebop/src/simulator/server/mod.rs deleted file mode 100644 index c926dd3d..00000000 --- a/bebop/bebop/src/simulator/server/mod.rs +++ /dev/null @@ -1,7 +0,0 @@ -/// Server module for handling host connections -mod 
cmd; -mod dma; -mod server; -mod socket; - -pub use server::SocketServer; diff --git a/bebop/bebop/src/simulator/server/server.rs b/bebop/bebop/src/simulator/server/server.rs deleted file mode 100644 index d3fb59b2..00000000 --- a/bebop/bebop/src/simulator/server/server.rs +++ /dev/null @@ -1,128 +0,0 @@ -/// TCP server for accepting host connections -use std::net::{TcpListener, TcpStream}; -use std::sync::mpsc::{Sender, Receiver}; -use std::sync::{Arc, Mutex}; -use std::thread; - -use super::cmd::CmdHandler; -use super::dma::DmaHandler; -use super::socket::protocol::MsgType; -use crate::buckyball::top::{RoccCmd, DmaRequest, DmaResponse, CmdResponse}; - -pub struct SocketServer { - host: String, - port: u16, - step_mode: bool, - cmd_tx: Sender, - cmd_response_rx: Arc>>, - dma_req_rx: Arc>>, - dma_resp_tx: Arc>>, -} - -impl SocketServer { - pub fn new( - host: impl Into, - port: u16, - step_mode: bool, - cmd_tx: Sender, - cmd_response_rx: Arc>>, - dma_req_rx: Arc>>, - dma_resp_tx: Arc>>, - ) -> Self { - Self { - host: host.into(), - port, - step_mode, - cmd_tx, - cmd_response_rx, - dma_req_rx, - dma_resp_tx, - } - } - - /// Start the server and listen for connections - pub fn run(&self) -> std::io::Result<()> { - let addr = format!("{}:{}", self.host, self.port); - - log_info!("Socket server starting..."); - log_info!(" Listening on: {}", addr); - log_info!("Waiting for host connections...\n"); - - let listener = TcpListener::bind(&addr)?; - log_info!("[Server] Listener ready on {}", addr); - - for stream in listener.incoming() { - match stream { - Ok(stream) => { - let cmd_tx = self.cmd_tx.clone(); - let cmd_response_rx = self.cmd_response_rx.clone(); - let dma_req_rx = self.dma_req_rx.clone(); - let dma_resp_tx = self.dma_resp_tx.clone(); - - thread::spawn(move || { - if let Err(e) = Self::handle_connection(stream, cmd_tx, cmd_response_rx, dma_req_rx, dma_resp_tx) { - log_error!("[Server] Connection handler error: {}", e); - } - }); - } - Err(e) => { - 
log_error!("[Server] Connection error: {}", e); - } - } - } - - Ok(()) - } - - /// Handle a single connection, dispatching messages to CMD or DMA handlers - fn handle_connection( - mut stream: TcpStream, - cmd_tx: Sender, - cmd_response_rx: Arc>>, - dma_req_rx: Arc>>, - dma_resp_tx: Arc>>, - ) -> std::io::Result<()> { - use std::io::Read; - - let peer_addr = stream.peer_addr()?; - log_info!("[Server] Connected from: {}", peer_addr); - - let mut cmd_handler = CmdHandler::new(cmd_tx, cmd_response_rx); - let mut dma_handler = DmaHandler::new(dma_req_rx, dma_resp_tx); - - loop { - // Read message from host (blocking) - let mut msg_type_bytes = [0u8; 4]; - match stream.read_exact(&mut msg_type_bytes) { - Ok(_) => { - let msg_type_val = u32::from_le_bytes(msg_type_bytes); - let msg_type = MsgType::from_u32(msg_type_val); - - match msg_type { - Some(MsgType::CmdReq) => { - cmd_handler.handle_cmd_request(&mut stream, msg_type_val, &mut dma_handler)?; - } - Some(MsgType::DmaReadResp) => { - dma_handler.handle_dma_read_response(&mut stream, msg_type_val)?; - } - Some(MsgType::DmaWriteResp) => { - dma_handler.handle_dma_write_response(&mut stream, msg_type_val)?; - } - _ => { - log_error!("[Server] Unknown message type: {}", msg_type_val); - return Err(std::io::Error::new(std::io::ErrorKind::InvalidData, "Unknown message type")); - } - } - } - Err(e) => { - if e.kind() == std::io::ErrorKind::UnexpectedEof { - log_info!("[Server] Client {} disconnected", peer_addr); - return Ok(()); - } else { - return Err(e); - } - } - } - } - } -} diff --git a/bebop/bebop/src/simulator/server/socket/cmd.rs b/bebop/bebop/src/simulator/server/socket/cmd.rs deleted file mode 100644 index f4766d06..00000000 --- a/bebop/bebop/src/simulator/server/socket/cmd.rs +++ /dev/null @@ -1,72 +0,0 @@ -/// CMD protocol definitions -use super::protocol::{MsgHeader, MsgType}; - -/// Command request from client (CMD path) - 32 bytes -#[repr(C, packed)] -#[derive(Debug, Clone, Copy)] -pub struct CmdReq { - pub 
header: MsgHeader, // 8 bytes - pub funct: u32, // 4 bytes - pub padding: u32, // 4 bytes - pub xs1: u64, // 8 bytes - pub xs2: u64, // 8 bytes -} - -impl CmdReq { - pub const SIZE: usize = 32; // 8 + 4 + 4 + 8 + 8 - - pub fn from_bytes(bytes: &[u8; Self::SIZE]) -> Self { - let msg_type = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]); - let reserved = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]); - let funct = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]); - let padding = u32::from_le_bytes([bytes[12], bytes[13], bytes[14], bytes[15]]); - let xs1 = u64::from_le_bytes([ - bytes[16], bytes[17], bytes[18], bytes[19], bytes[20], bytes[21], bytes[22], bytes[23], - ]); - let xs2 = u64::from_le_bytes([ - bytes[24], bytes[25], bytes[26], bytes[27], bytes[28], bytes[29], bytes[30], bytes[31], - ]); - - Self { - header: MsgHeader { msg_type, reserved }, - funct, - padding, - xs1, - xs2, - } - } -} - -/// Command response from server (CMD path) - 16 bytes -#[repr(C, packed)] -#[derive(Debug, Clone, Copy)] -pub struct CmdResp { - pub header: MsgHeader, // 8 bytes - pub result: u64, // 8 bytes -} - -impl CmdResp { - pub const SIZE: usize = 16; - - pub fn new(result: u64) -> Self { - Self { - header: MsgHeader { - msg_type: MsgType::CmdResp as u32, - reserved: 0, - }, - result, - } - } - - pub fn to_bytes(&self) -> [u8; Self::SIZE] { - let mut bytes = [0u8; Self::SIZE]; - bytes[0..4].copy_from_slice(&self.header.msg_type.to_le_bytes()); - bytes[4..8].copy_from_slice(&self.header.reserved.to_le_bytes()); - bytes[8..16].copy_from_slice(&self.result.to_le_bytes()); - bytes - } -} - -// Backward compatibility aliases -pub type SocketMsg = CmdReq; -pub type SocketResp = CmdResp; diff --git a/bebop/bebop/src/simulator/server/socket/mod.rs b/bebop/bebop/src/simulator/server/socket/mod.rs deleted file mode 100644 index 4bf719cb..00000000 --- a/bebop/bebop/src/simulator/server/socket/mod.rs +++ /dev/null @@ -1,8 +0,0 @@ -/// Socket 
communication module for host-Bebop interface -pub mod cmd; -pub mod protocol; -pub mod read; -pub mod write; - -pub use cmd::{CmdReq, CmdResp, SocketMsg, SocketResp}; -pub use protocol::MsgType; diff --git a/bebop/bebop/src/simulator/server/socket/protocol.rs b/bebop/bebop/src/simulator/server/socket/protocol.rs deleted file mode 100644 index 84b4e7c9..00000000 --- a/bebop/bebop/src/simulator/server/socket/protocol.rs +++ /dev/null @@ -1,36 +0,0 @@ -/// Message protocol definitions for host-Bebop communication -/// Matches the C++ structures in customext/include/socket.h - -/// Message types for socket communication -#[repr(u32)] -#[derive(Debug, Clone, Copy, PartialEq, Eq)] -pub enum MsgType { - CmdReq = 0, // Command request from client - CmdResp = 1, // Command response from server - DmaReadReq = 2, // DMA read request from server - DmaReadResp = 3, // DMA read response from client - DmaWriteReq = 4, // DMA write request from server - DmaWriteResp = 5, // DMA write response from client -} - -impl MsgType { - pub fn from_u32(value: u32) -> Option { - match value { - 0 => Some(MsgType::CmdReq), - 1 => Some(MsgType::CmdResp), - 2 => Some(MsgType::DmaReadReq), - 3 => Some(MsgType::DmaReadResp), - 4 => Some(MsgType::DmaWriteReq), - 5 => Some(MsgType::DmaWriteResp), - _ => None, - } - } -} - -/// Common message header (8 bytes) -#[repr(C, packed)] -#[derive(Debug, Clone, Copy)] -pub struct MsgHeader { - pub msg_type: u32, - pub reserved: u32, -} diff --git a/bebop/bebop/src/simulator/server/socket/read.rs b/bebop/bebop/src/simulator/server/socket/read.rs deleted file mode 100644 index a00671d4..00000000 --- a/bebop/bebop/src/simulator/server/socket/read.rs +++ /dev/null @@ -1,63 +0,0 @@ -/// DMA read protocol definitions -use super::protocol::{MsgHeader, MsgType}; - -/// DMA read request from server (DMA path) - 24 bytes -#[repr(C, packed)] -#[derive(Debug, Clone, Copy)] -pub struct DmaReadReq { - pub header: MsgHeader, // 8 bytes - pub size: u32, // 4 bytes - pub 
padding: u32, // 4 bytes - pub addr: u64, // 8 bytes -} - -impl DmaReadReq { - pub const SIZE: usize = 24; - - pub fn new(addr: u64, size: u32) -> Self { - Self { - header: MsgHeader { - msg_type: MsgType::DmaReadReq as u32, - reserved: 0, - }, - size, - padding: 0, - addr, - } - } - - pub fn to_bytes(&self) -> [u8; Self::SIZE] { - let mut bytes = [0u8; Self::SIZE]; - bytes[0..4].copy_from_slice(&self.header.msg_type.to_le_bytes()); - bytes[4..8].copy_from_slice(&self.header.reserved.to_le_bytes()); - bytes[8..12].copy_from_slice(&self.size.to_le_bytes()); - bytes[12..16].copy_from_slice(&self.padding.to_le_bytes()); - bytes[16..24].copy_from_slice(&self.addr.to_le_bytes()); - bytes - } -} - -/// DMA read response from client (DMA path) - 16 bytes -#[repr(C, packed)] -#[derive(Debug, Clone, Copy)] -pub struct DmaReadResp { - pub header: MsgHeader, // 8 bytes - pub data: u64, // 8 bytes -} - -impl DmaReadResp { - pub const SIZE: usize = 16; - - pub fn from_bytes(bytes: &[u8; Self::SIZE]) -> Self { - let msg_type = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]); - let reserved = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]); - let data = u64::from_le_bytes([ - bytes[8], bytes[9], bytes[10], bytes[11], bytes[12], bytes[13], bytes[14], bytes[15], - ]); - - Self { - header: MsgHeader { msg_type, reserved }, - data, - } - } -} diff --git a/bebop/bebop/src/simulator/server/socket/write.rs b/bebop/bebop/src/simulator/server/socket/write.rs deleted file mode 100644 index 4c24485e..00000000 --- a/bebop/bebop/src/simulator/server/socket/write.rs +++ /dev/null @@ -1,69 +0,0 @@ -/// DMA write protocol definitions -use super::protocol::{MsgHeader, MsgType}; - -/// DMA write request from server (DMA path) - 32 bytes -#[repr(C, packed)] -#[derive(Debug, Clone, Copy)] -pub struct DmaWriteReq { - pub header: MsgHeader, // 8 bytes - pub size: u32, // 4 bytes - pub padding: u32, // 4 bytes - pub addr: u64, // 8 bytes - pub data: u64, // 8 bytes -} - -impl 
DmaWriteReq { - pub const SIZE: usize = 32; - - pub fn new(addr: u64, data: u64, size: u32) -> Self { - Self { - header: MsgHeader { - msg_type: MsgType::DmaWriteReq as u32, - reserved: 0, - }, - size, - padding: 0, - addr, - data, - } - } - - pub fn to_bytes(&self) -> [u8; Self::SIZE] { - let mut bytes = [0u8; Self::SIZE]; - bytes[0..4].copy_from_slice(&self.header.msg_type.to_le_bytes()); - bytes[4..8].copy_from_slice(&self.header.reserved.to_le_bytes()); - bytes[8..12].copy_from_slice(&self.size.to_le_bytes()); - bytes[12..16].copy_from_slice(&self.padding.to_le_bytes()); - bytes[16..24].copy_from_slice(&self.addr.to_le_bytes()); - bytes[24..32].copy_from_slice(&self.data.to_le_bytes()); - bytes - } -} - -/// DMA write response from client (DMA path) - 16 bytes -#[repr(C, packed)] -#[derive(Debug, Clone, Copy)] -pub struct DmaWriteResp { - pub header: MsgHeader, // 8 bytes - pub reserved: u64, // 8 bytes -} - -impl DmaWriteResp { - pub const SIZE: usize = 16; - - pub fn from_bytes(bytes: &[u8; Self::SIZE]) -> Self { - let msg_type = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]); - let reserved_header = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]); - let reserved = u64::from_le_bytes([ - bytes[8], bytes[9], bytes[10], bytes[11], bytes[12], bytes[13], bytes[14], bytes[15], - ]); - - Self { - header: MsgHeader { - msg_type, - reserved: reserved_header, - }, - reserved, - } - } -} diff --git a/bebop/bebop/src/simulator/simulator.rs b/bebop/bebop/src/simulator/simulator.rs deleted file mode 100644 index 746ce4ee..00000000 --- a/bebop/bebop/src/simulator/simulator.rs +++ /dev/null @@ -1,76 +0,0 @@ -/// Accelerator simulator with state management -use super::server::SocketServer; -use crate::buckyball::Top; -use crate::builtin::Sim; -use std::thread; - -/// Execution mode for the simulator -#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] -pub enum StepMode { - #[default] - Run, - Step, -} - -/// Accelerator simulator - drives Top 
module -pub struct Simulator { - host: String, - port: u16, - step_mode: StepMode, -} - -impl Simulator { - pub fn new(host: impl Into, port: u16, step_mode: StepMode) -> Self { - Self { - host: host.into(), - port, - step_mode, - } - } - - /// Run the simulator server - pub fn run(self) -> std::io::Result<()> { - let step_mode = matches!(self.step_mode, StepMode::Step); - let mut buckyball = Top::new("buckyball"); - let cmd_tx = buckyball.get_cmd_sender(); - let cmd_response_rx = buckyball.get_cmd_response_receiver(); - let (dma_req_rx, dma_resp_tx) = buckyball.get_dma_channels(); - - // Start server in background thread (lower priority) - let server = SocketServer::new(self.host, self.port, step_mode, cmd_tx, cmd_response_rx, dma_req_rx, dma_resp_tx); - thread::spawn(move || { - if let Err(e) = server.run() { - log_error!("Server error: {}", e); - } - }); - - // Run buckyball in main thread (high priority) - log_info!("Buckyball main loop started"); - - if step_mode { - // Step mode: wait for user input (press Enter) before each tick - log_info!("Running in STEP mode - press Enter to tick"); - use std::io::{self, BufRead}; - let stdin = io::stdin(); - let mut lines = stdin.lock().lines(); - - loop { - // Wait for Enter key - if lines.next().is_some() { - buckyball.forward(); - buckyball.backward(); - } else { - break; // EOF or error - } - } - } else { - // Free-running mode: tick as fast as possible - loop { - buckyball.forward(); - buckyball.backward(); - } - } - - Ok(()) - } -} diff --git a/bebop/bebop/src/simulator/utils/log.rs b/bebop/bebop/src/simulator/utils/log.rs deleted file mode 100644 index 769eaaaa..00000000 --- a/bebop/bebop/src/simulator/utils/log.rs +++ /dev/null @@ -1,57 +0,0 @@ -/// Logging utilities with colored output - -/// Print a log message with blue [Log] prefix -#[macro_export] -macro_rules! log_info { - ($($arg:tt)*) => { - println!("\x1b[34m[Log]\x1b[0m {}", format!($($arg)*)); - }; -} - -#[macro_export] -macro_rules! 
log_ipc { - ($($arg:tt)*) => { - println!("\x1b[32m[IPC]\x1b[0m {}", format!($($arg)*)); - }; -} - -/// Print an error message with red [Error] prefix -#[macro_export] -macro_rules! log_error { - ($($arg:tt)*) => { - eprintln!("\x1b[31m[Error]\x1b[0m {}", format!($($arg)*)); - }; -} - -/// Print an event message with yellow [Event] prefix -#[macro_export] -macro_rules! log_event { - ($($arg:tt)*) => { - println!("\x1b[33m[Event]\x1b[0m {}", format!($($arg)*)); - }; -} - -#[macro_export] -macro_rules! log_forward { - ($($arg:tt)*) => {{ - if $crate::simulator::utils::log_config::is_forward_log_enabled() { - println!("\x1b[33m[Forward]\x1b[0m {}", format!($($arg)*)); - } - }}; -} - -#[macro_export] -macro_rules! log_backward { - ($($arg:tt)*) => {{ - if $crate::simulator::utils::log_config::is_backward_log_enabled() { - println!("\x1b[33m[Backward]\x1b[0m {}", format!($($arg)*)); - } - }}; -} - -#[macro_export] -macro_rules! log_tpc { - ($($arg:tt)*) => {{ - println!("\x1b[35m[Tpc]\x1b[0m {}", format!($($arg)*)); - }}; -} diff --git a/bebop/bebop/src/simulator/utils/log_config.rs b/bebop/bebop/src/simulator/utils/log_config.rs deleted file mode 100644 index 3e2483c2..00000000 --- a/bebop/bebop/src/simulator/utils/log_config.rs +++ /dev/null @@ -1,38 +0,0 @@ -/// Global logging configuration -use std::sync::atomic::{AtomicBool, Ordering}; - -/// Global flags for controlling log output -static ENABLE_FORWARD_LOG: AtomicBool = AtomicBool::new(true); -static ENABLE_BACKWARD_LOG: AtomicBool = AtomicBool::new(true); - -/// Enable or disable forward phase logging -pub fn set_forward_log(enabled: bool) { - ENABLE_FORWARD_LOG.store(enabled, Ordering::Relaxed); -} - -/// Enable or disable backward phase logging -pub fn set_backward_log(enabled: bool) { - ENABLE_BACKWARD_LOG.store(enabled, Ordering::Relaxed); -} - -/// Check if forward logging is enabled -pub fn is_forward_log_enabled() -> bool { - ENABLE_FORWARD_LOG.load(Ordering::Relaxed) -} - -/// Check if backward logging is 
enabled -pub fn is_backward_log_enabled() -> bool { - ENABLE_BACKWARD_LOG.load(Ordering::Relaxed) -} - -/// Enable all phase logs -pub fn enable_all_logs() { - set_forward_log(true); - set_backward_log(true); -} - -/// Disable all phase logs -pub fn disable_all_logs() { - set_forward_log(false); - set_backward_log(false); -} diff --git a/bebop/bebop/src/simulator/utils/mod.rs b/bebop/bebop/src/simulator/utils/mod.rs deleted file mode 100644 index 7e0ea083..00000000 --- a/bebop/bebop/src/simulator/utils/mod.rs +++ /dev/null @@ -1,3 +0,0 @@ -#[macro_use] -pub mod log; -pub mod log_config; diff --git a/bebop/host/CMakeLists.txt b/bebop/host/CMakeLists.txt deleted file mode 100644 index 7b543000..00000000 --- a/bebop/host/CMakeLists.txt +++ /dev/null @@ -1,6 +0,0 @@ -cmake_minimum_required(VERSION 3.16) -project(bebop_host LANGUAGES C CXX) - -add_subdirectory(ipc) -add_subdirectory(spike) -# add_subdirectory(gem5) diff --git a/bebop/host/gem5/CMakeLists.txt b/bebop/host/gem5/CMakeLists.txt deleted file mode 100644 index e69de29b..00000000 diff --git a/bebop/host/gem5/bebop.patch b/bebop/host/gem5/bebop.patch deleted file mode 100644 index ee58851c..00000000 --- a/bebop/host/gem5/bebop.patch +++ /dev/null @@ -1,211 +0,0 @@ -diff --git a/SConstruct b/SConstruct -index a5802ad371..b3008bd565 100755 ---- a/SConstruct -+++ b/SConstruct -@@ -198,6 +198,15 @@ main = Environment(tools=[ - main.Tool(SCons.Tool.FindTool(['gcc', 'clang'], main)) - main.Tool(SCons.Tool.FindTool(['g++', 'clang++'], main)) - -+bebop_ipc_lib = environ.get('BEBOP_IPC_LIB') -+bebop_ipc_include = environ.get('BEBOP_IPC_INCLUDE') -+if bebop_ipc_lib: -+ print(f'Linking with bebop IPC library: {bebop_ipc_lib}') -+ main.Append(_LIBFLAGS=[bebop_ipc_lib]) -+if bebop_ipc_include: -+ print(f'Adding bebop IPC include path: {bebop_ipc_include}') -+ main.Append(CPPPATH=[bebop_ipc_include]) -+ - Export('main') - - from gem5_scons.util import get_termcap -diff --git a/src/arch/riscv/faults.cc 
b/src/arch/riscv/faults.cc -index dc312b5f67..4e58948316 100644 ---- a/src/arch/riscv/faults.cc -+++ b/src/arch/riscv/faults.cc -@@ -31,6 +31,7 @@ - - #include "arch/riscv/faults.hh" - -+#include "arch/riscv/insts/custom.hh" - #include "arch/riscv/insts/static_inst.hh" - #include "arch/riscv/isa.hh" - #include "arch/riscv/mmu.hh" -@@ -286,6 +287,12 @@ void - UnknownInstFault::invokeSE(ThreadContext *tc, const StaticInstPtr &inst) - { - auto *rsi = static_cast(inst.get()); -+ const uint8_t opcode = rsi->machInst.opcode; -+ // Handle custom-3 (opcode 0x7b) which conflicts with M5Op -+ if (opcode == 0x7b) { -+ handleRiscvCustomInstruction(tc, rsi->machInst, inst.get()); -+ return; -+ } - panic("Unknown instruction 0x%08x at pc %s", rsi->machInst, - tc->pcState()); - } -diff --git a/src/arch/riscv/insts/SConscript b/src/arch/riscv/insts/SConscript -index 9694cc1405..8449eb7c80 100644 ---- a/src/arch/riscv/insts/SConscript -+++ b/src/arch/riscv/insts/SConscript -@@ -33,6 +33,7 @@ if not env['CONF']['USE_RISCV_ISA']: - Source('amo.cc', tags=['riscv isa']) - Source('bs.cc', tags=['riscv isa']) - Source('compressed.cc', tags=['riscv isa']) -+Source('custom.cc', tags=['riscv isa']) - Source('mem.cc', tags=['riscv isa']) - Source('standard.cc', tags=['riscv isa']) - Source('static_inst.cc', tags=['riscv isa']) -diff --git a/src/arch/riscv/insts/custom.cc b/src/arch/riscv/insts/custom.cc -new file mode 100644 -index 0000000000..e06548d6ec ---- /dev/null -+++ b/src/arch/riscv/insts/custom.cc -@@ -0,0 +1,83 @@ -+#include "arch/riscv/insts/custom.hh" -+ -+#include "ipc/socket.h" -+#include "arch/riscv/insts/static_inst.hh" -+#include "arch/riscv/pcstate.hh" -+#include "arch/riscv/regs/int.hh" -+#include "debug/Faults.hh" -+#include "mem/se_translating_port_proxy.hh" -+#include "sim/debug.hh" -+ -+namespace gem5 -+{ -+ -+namespace RiscvISA -+{ -+namespace -+{ -+ -+struct RoCCInstFields { -+ unsigned opcode : 7; -+ unsigned rd : 5; -+ unsigned xs2 : 1; -+ unsigned xs1 : 1; -+ 
unsigned xd : 1; -+ unsigned rs1 : 5; -+ unsigned rs2 : 5; -+ unsigned funct : 7; -+}; -+ -+union RoCCInst { -+ RoCCInstFields r; -+ uint32_t bits; -+}; -+ -+SocketClient &getSocketClient() { -+ static SocketClient client; -+ return client; -+} -+ -+} // anonymous namespace -+ -+void -+handleRiscvCustomInstruction(ThreadContext *tc, ExtMachInst instBits, -+ const StaticInst *inst) -+{ -+ RoCCInst rocc{}; -+ rocc.bits = instBits.instBits; -+ -+ RegVal xs1 = rocc.r.xs1 ? -+ tc->getReg(intRegClass[rocc.r.rs1]) : -+ static_cast(-1); -+ RegVal xs2 = rocc.r.xs2 ? -+ tc->getReg(intRegClass[rocc.r.rs2]) : -+ static_cast(-1); -+ -+ auto read_cb = [tc](uint64_t addr, uint32_t size) -> uint64_t { -+ SETranslatingPortProxy proxy(tc); -+ uint64_t value = 0; -+ proxy.readBlob(addr, reinterpret_cast(&value), size); -+ return value; -+ }; -+ -+ auto write_cb = [tc](uint64_t addr, uint64_t data, uint32_t size) { -+ SETranslatingPortProxy proxy(tc); -+ proxy.writeBlob(addr, reinterpret_cast(&data), size); -+ }; -+ -+ auto &client = getSocketClient(); -+ client.set_dma_callbacks(read_cb, write_cb); -+ uint64_t result = client.send_and_wait(rocc.r.funct, xs1, xs2); -+ client.set_dma_callbacks(dma_read_cb_t(), dma_write_cb_t()); -+ -+ if (rocc.r.xd) -+ tc->setReg(intRegClass[rocc.r.rd], result); -+ -+ auto pc_state = tc->pcState().as(); -+ inst->advancePC(pc_state); -+ tc->pcState(pc_state); -+} -+ -+} // namespace RiscvISA -+} // namespace gem5 -+ -diff --git a/src/arch/riscv/insts/custom.hh b/src/arch/riscv/insts/custom.hh -new file mode 100644 -index 0000000000..3e09b67199 ---- /dev/null -+++ b/src/arch/riscv/insts/custom.hh -@@ -0,0 +1,20 @@ -+#ifndef __ARCH_RISCV_CUSTOM_INST_HH__ -+#define __ARCH_RISCV_CUSTOM_INST_HH__ -+ -+#include "arch/riscv/types.hh" -+#include "cpu/static_inst.hh" -+#include "cpu/thread_context.hh" -+ -+namespace gem5 -+{ -+ -+namespace RiscvISA -+{ -+ -+void handleRiscvCustomInstruction(ThreadContext *tc, ExtMachInst instBits, -+ const StaticInst *inst); -+ 
-+} // namespace RiscvISA -+} // namespace gem5 -+ -+#endif // __ARCH_RISCV_CUSTOM_INST_HH__ -diff --git a/src/arch/riscv/isa/decoder.isa b/src/arch/riscv/isa/decoder.isa -index 6235b34aee..678c3db2ea 100644 ---- a/src/arch/riscv/isa/decoder.isa -+++ b/src/arch/riscv/isa/decoder.isa -@@ -6360,6 +6360,22 @@ decode QUADRANT default Unknown::unknown() { - } - } - -+ // Custom instructions (bebop extension) -+ // custom-0 (opcode 0x0b), custom-1 (opcode 0x2b), custom-2 (opcode 0x5b) -+ format ROp { -+ 0x02: bebop_custom0({{ -+ handleRiscvCustomInstruction(xc->tcBase(), machInst, this); -+ }}); -+ 0x0a: bebop_custom1({{ -+ handleRiscvCustomInstruction(xc->tcBase(), machInst, this); -+ }}); -+ 0x16: bebop_custom2({{ -+ handleRiscvCustomInstruction(xc->tcBase(), machInst, this); -+ }}); -+ } -+ -+ // M5Op uses 0x1e which conflicts with custom-3 (opcode 0x7b) -+ // Keep M5Op for now, custom-3 handled in unknown fault if needed - 0x1e: M5Op::M5Op(); - } - } -diff --git a/src/arch/riscv/isa/includes.isa b/src/arch/riscv/isa/includes.isa -index b4be0e4ac6..d4f27fb845 100644 ---- a/src/arch/riscv/isa/includes.isa -+++ b/src/arch/riscv/isa/includes.isa -@@ -53,6 +53,7 @@ output header {{ - #include "arch/riscv/insts/pseudo.hh" - #include "arch/riscv/insts/standard.hh" - #include "arch/riscv/insts/static_inst.hh" -+#include "arch/riscv/insts/custom.hh" - #include "arch/riscv/insts/unknown.hh" - #include "arch/riscv/insts/vector.hh" - #include "arch/riscv/insts/zcmp.hh" diff --git a/bebop/host/gem5/gem5 b/bebop/host/gem5/gem5 deleted file mode 160000 index ddd4ae35..00000000 --- a/bebop/host/gem5/gem5 +++ /dev/null @@ -1 +0,0 @@ -Subproject commit ddd4ae35adb0a3df1f1ba11e9a973a5c2f8c2944 diff --git a/bebop/host/gem5/install-gem5.sh b/bebop/host/gem5/install-gem5.sh deleted file mode 100755 index 8c9ed971..00000000 --- a/bebop/host/gem5/install-gem5.sh +++ /dev/null @@ -1,46 +0,0 @@ -#!/usr/bin/env bash - -set -euo pipefail - -SCRIPT_DIR="$(dirname "$(realpath "$0")")" 
-HOST_ROOT=${SCRIPT_DIR}/.. -GEM5_ROOT=${SCRIPT_DIR}/gem5 -HOST_BUILD=${HOST_ROOT}/build -IPC_BUILD_LIB=${HOST_BUILD}/ipc -IPC_INCLUDE=${HOST_ROOT}/ipc/include - -cmake -S ${HOST_ROOT} -B ${HOST_BUILD} -cmake --build ${HOST_BUILD} --target bebop_ipc -j$(nproc) - - -# Install gem5 and integerate bebop into gem5 -# sudo apt install build-essential git m4 scons zlib1g zlib1g-dev \ -# libprotobuf-dev protobuf-compiler libprotoc-dev libgoogle-perftools-dev \ -# python3-dev python-is-python3 libboost-all-dev pkg-config gcc-10 g++-10 \ -# python3-tk clang-format-18 -# cd $ROOT/thirdparty/gem5 -# export PKG_CONFIG_PATH=$CONDA_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH -# scons build/RISCV/gem5.opt -j $(nproc) LIBS="absl_log_internal_check_op \ - -cd ${GEM5_ROOT} -# Apply the patch to gem5 -git apply ${SCRIPT_DIR}/bebop.patch -# We need to update the patch in this way if we make changes to gem5 -# git add -A && git diff --cached > ../bebop.patch - -# Build gem5 -export PKG_CONFIG_PATH=${CONDA_PREFIX:-}/lib/pkgconfig:${PKG_CONFIG_PATH:-} -BEBOP_IPC_LIB=${IPC_BUILD_LIB}/libbebop_ipc.a \ - BEBOP_IPC_INCLUDE=${IPC_INCLUDE} \ - scons build/RISCV/gem5.opt -j $(nproc) \ - LIBS="absl_log_internal_check_op \ - absl_log_internal_conditions \ - absl_log_internal_message \ - absl_base \ - absl_raw_logging_internal \ - absl_strings \ - absl_throw_delegate \ - absl_string_view \ - absl_spinlock_wait \ - absl_int128 \ - absl_log_severity" diff --git a/bebop/host/install.sh b/bebop/host/install.sh deleted file mode 100755 index 245b31ad..00000000 --- a/bebop/host/install.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash - -set -e - -ROOT="$(dirname "$(realpath "$0")")" - -cd $ROOT -# git submodule update --init - -$ROOT/spike/install-spike.sh -$ROOT/gem5/install-gem5.sh diff --git a/bebop/host/ipc/CMakeLists.txt b/bebop/host/ipc/CMakeLists.txt deleted file mode 100644 index 01b57f18..00000000 --- a/bebop/host/ipc/CMakeLists.txt +++ /dev/null @@ -1,28 +0,0 @@ -set(BEBOP_IPC_SOURCES - 
src/socket/socket.cc - src/socket/socket_cmd.cc - src/socket/socket_dma.cc -) - -add_library(bebop_ipc STATIC ${BEBOP_IPC_SOURCES}) - -target_include_directories(bebop_ipc - PUBLIC - ${CMAKE_CURRENT_SOURCE_DIR}/include - PRIVATE - $ENV{RISCV}/include -) - -set_target_properties(bebop_ipc PROPERTIES - OUTPUT_NAME "bebop_ipc" - POSITION_INDEPENDENT_CODE ON -) - -set(_bebop_ipc_install_dir ${CMAKE_INSTALL_RPATH}) -if(NOT _bebop_ipc_install_dir) - set(_bebop_ipc_install_dir $ENV{RISCV}/lib) -endif() - -install(TARGETS bebop_ipc - ARCHIVE DESTINATION ${_bebop_ipc_install_dir} -) diff --git a/bebop/host/ipc/include/ipc/socket.h b/bebop/host/ipc/include/ipc/socket.h deleted file mode 100644 index 9705f368..00000000 --- a/bebop/host/ipc/include/ipc/socket.h +++ /dev/null @@ -1,120 +0,0 @@ -#ifndef IPC_SOCKET_H_ -#define IPC_SOCKET_H_ - -#include -#include - -// Socket configuration -#define SOCKET_PORT 9999 -#define SOCKET_HOST "127.0.0.1" - -// Message types for socket communication -enum socket_msg_type_t : uint32_t { - MSG_TYPE_CMD_REQ = 0, // Command request from client - MSG_TYPE_CMD_RESP = 1, // Command response from server - MSG_TYPE_DMA_READ_REQ = 2, // DMA read request from server - MSG_TYPE_DMA_READ_RESP = 3, // DMA read response from client - MSG_TYPE_DMA_WRITE_REQ = 4, // DMA write request from server - MSG_TYPE_DMA_WRITE_RESP = 5 // DMA write response from client -}; - -// Common message header -struct msg_header_t { - uint32_t msg_type; // socket_msg_type_t - uint32_t reserved; -}; - -// Command request from client (CMD path) -struct cmd_req_t { - msg_header_t header; // header.msg_type = MSG_TYPE_CMD_REQ - uint32_t funct; - uint32_t padding; - uint64_t xs1; - uint64_t xs2; -}; - -// Command response from server (CMD path) -struct cmd_resp_t { - msg_header_t header; // header.msg_type = MSG_TYPE_CMD_RESP - uint64_t result; -}; - -// DMA read request from server (DMA path) -struct dma_read_req_t { - msg_header_t header; // header.msg_type = 
MSG_TYPE_DMA_READ_REQ - uint32_t size; // Size in bytes (1, 2, 4, or 8) - uint32_t padding; - uint64_t addr; // Memory address -}; - -// DMA read response from client (DMA path) -struct dma_read_resp_t { - msg_header_t header; // header.msg_type = MSG_TYPE_DMA_READ_RESP - uint64_t data; // Read data -}; - -// DMA write request from server (DMA path) -struct dma_write_req_t { - msg_header_t header; // header.msg_type = MSG_TYPE_DMA_WRITE_REQ - uint32_t size; // Size in bytes (1, 2, 4, or 8) - uint32_t padding; - uint64_t addr; // Memory address - uint64_t data; // Write data -}; - -// DMA write response from client (DMA path) -struct dma_write_resp_t { - msg_header_t header; // header.msg_type = MSG_TYPE_DMA_WRITE_RESP - uint64_t reserved; // Reserved for future use -}; - -using dma_read_cb_t = std::function; -using dma_write_cb_t = - std::function; - -// Socket client class -class SocketClient { -public: - SocketClient(); - ~SocketClient(); - - // Initialize and connect to socket server - bool init(); - - // Close socket connection - void close(); - - // Register DMA callbacks - void set_dma_callbacks(dma_read_cb_t read_cb, dma_write_cb_t write_cb); - - // Send request and wait for response (handles DMA requests during wait) - uint64_t send_and_wait(uint32_t funct, uint64_t xs1, uint64_t xs2); - - // Check if socket is connected - bool is_connected() const { return socket_initialized; } - -private: - int sock_fd; - bool socket_initialized; - dma_read_cb_t dma_read_cb; - dma_write_cb_t dma_write_cb; - - // CMD path functions - bool send_cmd_request(const cmd_req_t &req); - bool recv_cmd_response(cmd_resp_t &resp); - - // DMA path functions - bool recv_dma_read_request(dma_read_req_t &req); - bool send_dma_read_response(const dma_read_resp_t &resp); - bool recv_dma_write_request(dma_write_req_t &req); - bool send_dma_write_response(const dma_write_resp_t &resp); - - // Low-level recv/send - bool recv_header(msg_header_t &header); - - // DMA handlers - uint64_t 
handle_dma_read(uint64_t addr, uint32_t size); - void handle_dma_write(uint64_t addr, uint64_t data, uint32_t size); -}; - -#endif // IPC_SOCKET_H_ diff --git a/bebop/host/ipc/src/socket/socket.cc b/bebop/host/ipc/src/socket/socket.cc deleted file mode 100644 index 87316791..00000000 --- a/bebop/host/ipc/src/socket/socket.cc +++ /dev/null @@ -1,173 +0,0 @@ -#include "ipc/socket.h" -#include -#include -#include -#include -#include -#include - -SocketClient::SocketClient() : sock_fd(-1), socket_initialized(false) {} - -SocketClient::~SocketClient() { close(); } - -bool SocketClient::init() { - if (socket_initialized) { - return true; - } - - sock_fd = socket(AF_INET, SOCK_STREAM, 0); - if (sock_fd < 0) { - fprintf(stderr, "Socket: Failed to create socket\n"); - return false; - } - - struct sockaddr_in server_addr; - memset(&server_addr, 0, sizeof(server_addr)); - server_addr.sin_family = AF_INET; - server_addr.sin_port = htons(SOCKET_PORT); - - if (inet_pton(AF_INET, SOCKET_HOST, &server_addr.sin_addr) <= 0) { - fprintf(stderr, "Socket: Invalid address/Address not supported\n"); - ::close(sock_fd); - sock_fd = -1; - return false; - } - - if (connect(sock_fd, (struct sockaddr *)&server_addr, sizeof(server_addr)) < - 0) { - fprintf(stderr, "Socket: Connection failed to %s:%d\n", SOCKET_HOST, - SOCKET_PORT); - ::close(sock_fd); - sock_fd = -1; - return false; - } - - socket_initialized = true; - printf("Socket: Connected to %s:%d\n", SOCKET_HOST, SOCKET_PORT); - return true; -} - -void SocketClient::close() { - if (sock_fd >= 0) { - ::close(sock_fd); - sock_fd = -1; - } - socket_initialized = false; -} - -void SocketClient::set_dma_callbacks(dma_read_cb_t read_cb, - dma_write_cb_t write_cb) { - dma_read_cb = std::move(read_cb); - dma_write_cb = std::move(write_cb); -} - -// Receive message header (peek first to get type) -bool SocketClient::recv_header(msg_header_t &header) { - if (sock_fd < 0) { - fprintf(stderr, "Socket: Not connected\n"); - return false; - } - - 
ssize_t received = recv(sock_fd, &header, sizeof(header), MSG_PEEK); - - if (received < 0) { - fprintf(stderr, "Socket: Failed to peek header\n"); - close(); - return false; - } else if (received == 0) { - fprintf(stderr, "Socket: Connection closed by remote\n"); - close(); - return false; - } - - return true; -} - -uint64_t SocketClient::send_and_wait(uint32_t funct, uint64_t xs1, - uint64_t xs2) { - // Auto-connect if not connected - if (!socket_initialized) { - if (!init()) { - return 0; - } - } - - // Prepare and send CMD request - cmd_req_t cmd_req; - cmd_req.header.msg_type = MSG_TYPE_CMD_REQ; - cmd_req.header.reserved = 0; - cmd_req.funct = funct; - cmd_req.padding = 0; - cmd_req.xs1 = xs1; - cmd_req.xs2 = xs2; - - if (!send_cmd_request(cmd_req)) { - return 0; - } - - // Loop to handle responses (CMD response or DMA requests) - while (true) { - // Peek message header to determine type - msg_header_t header; - if (!recv_header(header)) { - return 0; - } - - // Handle based on message type - if (header.msg_type == MSG_TYPE_CMD_RESP) { - // Receive CMD response - cmd_resp_t cmd_resp; - if (!recv_cmd_response(cmd_resp)) { - return 0; - } - return cmd_resp.result; - - } else if (header.msg_type == MSG_TYPE_DMA_READ_REQ) { - // Receive DMA read request - dma_read_req_t dma_read_req; - if (!recv_dma_read_request(dma_read_req)) { - return 0; - } - - // Handle DMA read - uint64_t read_data = - handle_dma_read(dma_read_req.addr, dma_read_req.size); - - // Send DMA read response - dma_read_resp_t dma_read_resp; - dma_read_resp.header.msg_type = MSG_TYPE_DMA_READ_RESP; - dma_read_resp.header.reserved = 0; - dma_read_resp.data = read_data; - - if (!send_dma_read_response(dma_read_resp)) { - return 0; - } - - } else if (header.msg_type == MSG_TYPE_DMA_WRITE_REQ) { - // Receive DMA write request - dma_write_req_t dma_write_req; - if (!recv_dma_write_request(dma_write_req)) { - return 0; - } - - // Handle DMA write - handle_dma_write(dma_write_req.addr, dma_write_req.data, 
- dma_write_req.size); - - // Send DMA write response - dma_write_resp_t dma_write_resp; - dma_write_resp.header.msg_type = MSG_TYPE_DMA_WRITE_RESP; - dma_write_resp.header.reserved = 0; - dma_write_resp.reserved = 0; - - if (!send_dma_write_response(dma_write_resp)) { - return 0; - } - - } else { - fprintf(stderr, "Socket: Unknown message type %d\n", header.msg_type); - close(); - return 0; - } - } -} diff --git a/bebop/host/ipc/src/socket/socket_cmd.cc b/bebop/host/ipc/src/socket/socket_cmd.cc deleted file mode 100644 index 88b65c3b..00000000 --- a/bebop/host/ipc/src/socket/socket_cmd.cc +++ /dev/null @@ -1,45 +0,0 @@ -#include "ipc/socket.h" -#include -#include - -// CMD path: send command request -bool SocketClient::send_cmd_request(const cmd_req_t &req) { - if (sock_fd < 0) { - fprintf(stderr, "Socket: Not connected, cannot send CMD request\n"); - return false; - } - - fprintf(stderr, "Socket: Sending CMD request: sizeof(req)=%zu, funct=%u\n", - sizeof(req), req.funct); - ssize_t sent = send(sock_fd, &req, sizeof(req), 0); - if (sent < 0) { - fprintf(stderr, "Socket: Failed to send CMD request\n"); - close(); - return false; - } - fprintf(stderr, "Socket: Sent %zd bytes\n", sent); - - return true; -} - -// CMD path: receive command response -bool SocketClient::recv_cmd_response(cmd_resp_t &resp) { - if (sock_fd < 0) { - fprintf(stderr, "Socket: Not connected, cannot receive CMD response\n"); - return false; - } - - ssize_t received = recv(sock_fd, &resp, sizeof(resp), 0); - - if (received < 0) { - fprintf(stderr, "Socket: Failed to receive CMD response\n"); - close(); - return false; - } else if (received == 0) { - fprintf(stderr, "Socket: Connection closed by remote\n"); - close(); - return false; - } - - return true; -} diff --git a/bebop/host/ipc/src/socket/socket_dma.cc b/bebop/host/ipc/src/socket/socket_dma.cc deleted file mode 100644 index d7febc4f..00000000 --- a/bebop/host/ipc/src/socket/socket_dma.cc +++ /dev/null @@ -1,104 +0,0 @@ -#include 
"ipc/socket.h" -#include -#include - -// DMA path: receive DMA read request -bool SocketClient::recv_dma_read_request(dma_read_req_t &req) { - if (sock_fd < 0) { - fprintf(stderr, "Socket: Not connected, cannot receive DMA read request\n"); - return false; - } - - ssize_t received = recv(sock_fd, &req, sizeof(req), 0); - - if (received < 0) { - fprintf(stderr, "Socket: Failed to receive DMA read request\n"); - close(); - return false; - } else if (received == 0) { - fprintf(stderr, "Socket: Connection closed by remote\n"); - close(); - return false; - } - - return true; -} - -// DMA path: send DMA read response -bool SocketClient::send_dma_read_response(const dma_read_resp_t &resp) { - if (sock_fd < 0) { - fprintf(stderr, "Socket: Not connected, cannot send DMA read response\n"); - return false; - } - - ssize_t sent = send(sock_fd, &resp, sizeof(resp), 0); - if (sent < 0) { - fprintf(stderr, "Socket: Failed to send DMA read response\n"); - close(); - return false; - } - - return true; -} - -// DMA path: receive DMA write request -bool SocketClient::recv_dma_write_request(dma_write_req_t &req) { - if (sock_fd < 0) { - fprintf(stderr, - "Socket: Not connected, cannot receive DMA write request\n"); - return false; - } - - ssize_t received = recv(sock_fd, &req, sizeof(req), 0); - - if (received < 0) { - fprintf(stderr, "Socket: Failed to receive DMA write request\n"); - close(); - return false; - } else if (received == 0) { - fprintf(stderr, "Socket: Connection closed by remote\n"); - close(); - return false; - } - - return true; -} - -// DMA path: send DMA write response -bool SocketClient::send_dma_write_response(const dma_write_resp_t &resp) { - if (sock_fd < 0) { - fprintf(stderr, "Socket: Not connected, cannot send DMA write response\n"); - return false; - } - - ssize_t sent = send(sock_fd, &resp, sizeof(resp), 0); - if (sent < 0) { - fprintf(stderr, "Socket: Failed to send DMA write response\n"); - close(); - return false; - } - - return true; -} - -// DMA 
handlers -uint64_t SocketClient::handle_dma_read(uint64_t addr, uint32_t size) { - if (!dma_read_cb) { - fprintf(stderr, "Socket: DMA read callback not set\n"); - return 0; - } - uint64_t value = dma_read_cb(addr, size); - printf("Socket: DMA read addr=0x%lx size=%d value=0x%lx\n", addr, size, - value); - return value; -} - -void SocketClient::handle_dma_write(uint64_t addr, uint64_t data, - uint32_t size) { - if (!dma_write_cb) { - fprintf(stderr, "Socket: DMA write callback not set\n"); - return; - } - dma_write_cb(addr, data, size); - printf("Socket: DMA write addr=0x%lx size=%d data=0x%lx\n", addr, size, data); -} diff --git a/bebop/host/spike/CMakeLists.txt b/bebop/host/spike/CMakeLists.txt deleted file mode 100644 index 0059c828..00000000 --- a/bebop/host/spike/CMakeLists.txt +++ /dev/null @@ -1 +0,0 @@ -add_subdirectory(customext) diff --git a/bebop/host/spike/customext/CMakeLists.txt b/bebop/host/spike/customext/CMakeLists.txt deleted file mode 100644 index 1c4f5c3d..00000000 --- a/bebop/host/spike/customext/CMakeLists.txt +++ /dev/null @@ -1,38 +0,0 @@ -cmake_minimum_required(VERSION 3.10) -project(bebop) - -set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -O3") -set(SPIKE_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/../riscv-isa-sim") -set(SPIKE_PREFIX "${SPIKE_ROOT}/install") - -set(SPIKE_INCLUDE "${SPIKE_PREFIX}/include") -set(SPIKE_LIB_DIR "${SPIKE_PREFIX}/lib") - -link_directories("${SPIKE_LIB_DIR}") -set(CMAKE_INSTALL_RPATH "${SPIKE_LIB_DIR}") -set(CMAKE_BUILD_WITH_INSTALL_RPATH TRUE) - -set(BEBOP_INSTALL_LIB_DIR "${SPIKE_LIB_DIR}") - -add_library(bebop SHARED - src/bebop.cc -) - -if(NOT TARGET bebop_ipc) - add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/../../ipc - ${CMAKE_CURRENT_BINARY_DIR}/ipc) -endif() - - -target_include_directories(bebop PRIVATE - ${SPIKE_INCLUDE} - ${CMAKE_CURRENT_SOURCE_DIR}/include -) - -target_link_libraries(bebop PRIVATE bebop_ipc) - -set_target_properties(bebop PROPERTIES OUTPUT_NAME "bebop") - -install(TARGETS bebop - LIBRARY 
DESTINATION ${BEBOP_INSTALL_LIB_DIR} -) diff --git a/bebop/host/spike/customext/include/bebop.h b/bebop/host/spike/customext/include/bebop.h deleted file mode 100644 index 425a7f99..00000000 --- a/bebop/host/spike/customext/include/bebop.h +++ /dev/null @@ -1,44 +0,0 @@ -#ifndef _BEBOP_H -#define _BEBOP_H - -#include "common.h" -#include -#include -#include -#include - -#define MAKECUSTOMFN(opcode) custom##opcode -#define CUSTOMFN(opcode) MAKECUSTOMFN(opcode) - -// Forward declaration -class SocketClient; - -struct bebop_state_t { - void reset(); - bool enable; - bool resetted = false; -}; - -class bebop_t : public extension_t { -public: - bebop_t(); - ~bebop_t(); - const char *name() const override { return "bebop"; } - - reg_t CUSTOMFN(XCUSTOM_ACC)(rocc_insn_t insn, reg_t xs1, reg_t xs2); - void set_processor(processor_t *p) { this->p = p; } - std::vector get_instructions(const processor_t &proc) override; - std::vector - get_disasms(const processor_t *proc = nullptr) override; - -private: - bebop_state_t bebop_state; - processor_t *p; - - // Socket client - std::unique_ptr socket_client; - template T read_from_dram(reg_t addr); - template void write_to_dram(reg_t addr, T data); -}; - -#endif // _BEBOP_H diff --git a/bebop/host/spike/customext/include/common.h b/bebop/host/spike/customext/include/common.h deleted file mode 100644 index 706a1269..00000000 --- a/bebop/host/spike/customext/include/common.h +++ /dev/null @@ -1,9 +0,0 @@ -#ifndef _COMMON_H -#define _COMMON_H - -#include -#include - -#define XCUSTOM_ACC 3 - -#endif // _COMMON_H diff --git a/bebop/host/spike/customext/src/bebop.cc b/bebop/host/spike/customext/src/bebop.cc deleted file mode 100644 index 7121d96f..00000000 --- a/bebop/host/spike/customext/src/bebop.cc +++ /dev/null @@ -1,124 +0,0 @@ -#include "bebop.h" -#include "ipc/socket.h" -#include -#include -#include -#include - -using namespace std; - -REGISTER_EXTENSION(bebop, []() { return new bebop_t; }) - -bebop_t::bebop_t() : socket_client(new 
SocketClient()) {} - -bebop_t::~bebop_t() { - // socket_client will be automatically destroyed -} - -#define dprintf(...) \ - { \ - if (p->get_log_commits_enabled()) \ - printf(__VA_ARGS__); \ - } - -template T bebop_t::read_from_dram(reg_t addr) { - T value = 0; - for (size_t byte_idx = 0; byte_idx < sizeof(T); ++byte_idx) { - value |= p->get_mmu()->load(addr + byte_idx) << (byte_idx * 8); - } - return value; -} - -template void bebop_t::write_to_dram(reg_t addr, T data) { - for (size_t byte_idx = 0; byte_idx < sizeof(T); ++byte_idx) { - p->get_mmu()->store(addr + byte_idx, - (data >> (byte_idx * 8)) & 0xFF); - } -} - -void bebop_state_t::reset() { - enable = true; - resetted = true; -} - -reg_t bebop_t::CUSTOMFN(XCUSTOM_ACC)(rocc_insn_t insn, reg_t xs1, reg_t xs2) { - - if (!bebop_state.resetted) { - bebop_state.reset(); - } - - auto read_cb = [this](uint64_t addr, uint32_t size) -> uint64_t { - switch (size) { - case 1: - return read_from_dram(addr); - case 2: - return read_from_dram(addr); - case 4: - return read_from_dram(addr); - case 8: - return read_from_dram(addr); - default: - fprintf(stderr, "bebop: Invalid DMA read size %u\n", size); - return 0; - } - }; - - auto write_cb = [this](uint64_t addr, uint64_t data, uint32_t size) { - switch (size) { - case 1: - write_to_dram(addr, static_cast(data)); - break; - case 2: - write_to_dram(addr, static_cast(data)); - break; - case 4: - write_to_dram(addr, static_cast(data)); - break; - case 8: - write_to_dram(addr, data); - break; - default: - fprintf(stderr, "bebop: Invalid DMA write size %u\n", size); - break; - } - }; - - socket_client->set_dma_callbacks(read_cb, write_cb); - - // Send socket request and wait for response - dprintf("bebop: Processing custom instruction with funct=%d\n", insn.funct); - reg_t result = socket_client->send_and_wait(insn.funct, xs1, xs2); - - dprintf("bebop: custom instruction funct=%d completed with result=0x%lx\n", - insn.funct, result); - - return result; -} - -static reg_t 
bebop_custom(processor_t *p, insn_t insn, reg_t pc) { - bebop_t *bebop = static_cast(p->get_extension("bebop")); - rocc_insn_union_t u; - state_t *state = p->get_state(); - bebop->set_processor(p); - u.i = insn; - reg_t xs1 = u.r.xs1 ? state->XPR[insn.rs1()] : -1; - reg_t xs2 = u.r.xs2 ? state->XPR[insn.rs2()] : -1; - reg_t xd = bebop->CUSTOMFN(XCUSTOM_ACC)(u.r, xs1, xs2); - if (u.r.xd) { - state->log_reg_write[insn.rd() << 4] = {xd, 0}; - state->XPR.write(insn.rd(), xd); - } - return pc + 4; -} - -std::vector bebop_t::get_instructions(const processor_t &proc) { - std::vector insns; - push_custom_insn(insns, ROCC_OPCODE3, ROCC_OPCODE_MASK, ILLEGAL_INSN_FUNC, - bebop_custom); - return insns; -} - -std::vector bebop_t::get_disasms(const processor_t *proc) { - std::vector insns; - return insns; -} diff --git a/bebop/host/spike/install-spike.sh b/bebop/host/spike/install-spike.sh deleted file mode 100755 index 79d8b816..00000000 --- a/bebop/host/spike/install-spike.sh +++ /dev/null @@ -1,26 +0,0 @@ -#!/usr/bin/env bash - -set -euo pipefail - -SCRIPT_DIR="$(dirname "$(realpath "$0")")" -# HOST_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)" -SPIKE_SRC="${SCRIPT_DIR}/riscv-isa-sim" -SPIKE_BUILD="${SPIKE_SRC}/build" -SPIKE_INSTALL="${SPIKE_SRC}/install" -# HOST_BUILD="${HOST_ROOT}/build" - -mkdir -p "${SPIKE_BUILD}" -( - cd "${SPIKE_BUILD}" - ../configure --prefix="${SPIKE_INSTALL}" \ - --with-boost=no \ - --with-boost-asio=no \ - --with-boost-regex=no - make -j$(nproc) - make install -) - -cd ${SCRIPT_DIR} -mkdir -p build && cd build -cmake .. 
-make install
diff --git a/bebop/host/spike/riscv-isa-sim b/bebop/host/spike/riscv-isa-sim
deleted file mode 160000
index 88edb8b8..00000000
--- a/bebop/host/spike/riscv-isa-sim
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit 88edb8b81383bf282949be30476c9e4d5459cec4
diff --git a/bebop/scripts/bebop_setup.sh b/bebop/scripts/bebop_setup.sh
deleted file mode 100755
index 5827900a..00000000
--- a/bebop/scripts/bebop_setup.sh
+++ /dev/null
@@ -1,26 +0,0 @@
-#!/bin/bash
-
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-BEBOP_DIR="$(dirname "$SCRIPT_DIR")"
-BEBOP_BIN="$BEBOP_DIR/bebop/target/release/bebop"
-
-cd "$BEBOP_DIR/bebop"
-cargo build --release --bin bebop
-
-if [ $? -ne 0 ]; then
-    echo "Error: Failed to build bebop"
-    exit 1
-fi
-
-if netstat -tuln 2>/dev/null | grep -q ":9999 "; then
-    echo "Warning: Port 9999 is already in use"
-    echo "Please stop the existing process or change SOCKET_PORT"
-    exit 1
-fi
-
-echo "Starting Bebop simulator..."
-echo "Listening on 127.0.0.1:9999"
-echo "Press Ctrl+C to stop"
-echo ""
-
-exec "$BEBOP_BIN"
diff --git a/compiler b/compiler
index 4ca70f74..d5e80948 160000
--- a/compiler
+++ b/compiler
@@ -1 +1 @@
-Subproject commit 4ca70f74fef248bd8418fa1eb6231733175bad73
+Subproject commit d5e809481d7ff2a5009101d5135df99e851a4f69
diff --git a/docs b/docs
new file mode 160000
index 00000000..597d0cd9
--- /dev/null
+++ b/docs
@@ -0,0 +1 @@
+Subproject commit 597d0cd91dbbfc9e9de338ae00f7e7384ae925a9
diff --git a/docs/.gitignore b/docs/.gitignore
deleted file mode 100644
index baa790a6..00000000
--- a/docs/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-book/
-index.html
diff --git a/docs/bb-note/book.toml b/docs/bb-note/book.toml
deleted file mode 100644
index 28791b47..00000000
--- a/docs/bb-note/book.toml
+++ /dev/null
@@ -1,20 +0,0 @@
-[book]
-authors = ["Dango"]
-language = "en"
-src = "src"
-title = "Buckyball Technical Documentation"
-
-[preprocessor.toc]
-command = "mdbook-toc"
-renderer = ["html"]
-marker = "[[_TOC_]]"
-max-level = 5 - -[preprocessor.mermaid] -command = "mdbook-mermaid" - -[output.html] -additional-js = ["mermaid.min.js", "mermaid-init.js"] - -[output.html.fold] -enable = true diff --git a/docs/bb-note/mermaid-init.js b/docs/bb-note/mermaid-init.js deleted file mode 100644 index 15a7f4e5..00000000 --- a/docs/bb-note/mermaid-init.js +++ /dev/null @@ -1,35 +0,0 @@ -(() => { - const darkThemes = ['ayu', 'navy', 'coal']; - const lightThemes = ['light', 'rust']; - - const classList = document.getElementsByTagName('html')[0].classList; - - let lastThemeWasLight = true; - for (const cssClass of classList) { - if (darkThemes.includes(cssClass)) { - lastThemeWasLight = false; - break; - } - } - - const theme = lastThemeWasLight ? 'default' : 'dark'; - mermaid.initialize({ startOnLoad: true, theme }); - - // Simplest way to make mermaid re-render the diagrams in the new theme is via refreshing the page - - for (const darkTheme of darkThemes) { - document.getElementById(darkTheme).addEventListener('click', () => { - if (lastThemeWasLight) { - window.location.reload(); - } - }); - } - - for (const lightTheme of lightThemes) { - document.getElementById(lightTheme).addEventListener('click', () => { - if (!lastThemeWasLight) { - window.location.reload(); - } - }); - } -})(); diff --git a/docs/bb-note/src/README.md b/docs/bb-note/src/README.md deleted file mode 120000 index 8a33348c..00000000 --- a/docs/bb-note/src/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../README.md \ No newline at end of file diff --git a/docs/bb-note/src/SUMMARY.md b/docs/bb-note/src/SUMMARY.md deleted file mode 100644 index c1ce3e07..00000000 --- a/docs/bb-note/src/SUMMARY.md +++ /dev/null @@ -1,45 +0,0 @@ -# Summary - -- [Project Overview](./README.md) - - [Project Structure](./overview/overview.md) - - [Development Tutorial](./tutorial/tutorial.md) - -- [Architecture Design](./arch/arch.md) - - [Scala Source Code](./arch/src/main/scala/README.md) - - [Util](./arch/src/main/scala/Util/README.md) - - 
[framework](./arch/src/main/scala/framework/README.md) - - [rocket](./arch/src/main/scala/framework/rocket/README.md) - - [prototype](./arch/src/main/scala/prototype/README.md) - - [format](./arch/src/main/scala/prototype/format/README.md) - - [im2col](./arch/src/main/scala/prototype/im2col/README.md) - - [matrix](./arch/src/main/scala/prototype/matrix/README.md) - - [transpose](./arch/src/main/scala/prototype/transpose/README.md) - - [vector](./arch/src/main/scala/prototype/vector/README.md) - - [bond](./arch/src/main/scala/prototype/vector/bond/README.md) - - [op](./arch/src/main/scala/prototype/vector/op/README.md) - - [thread](./arch/src/main/scala/prototype/vector/thread/README.md) - - [warp](./arch/src/main/scala/prototype/vector/warp/README.md) - - [examples](./arch/src/main/scala/examples/README.md) - - [toy](./arch/src/main/scala/examples/toy/README.md) - - [balldomain](./arch/src/main/scala/examples/toy/balldomain/README.md) - - [bbus](./arch/src/main/scala/examples/toy/balldomain/bbus/README.md) - - [rs](./arch/src/main/scala/examples/toy/balldomain/rs/README.md) - - [sims](./arch/src/main/scala/sims/README.md) - - [firesim](./arch/src/main/scala/sims/firesim/README.md) - - [verilator](./arch/src/main/scala/sims/verilator/README.md) -- [compiler](./compiler/README.md) - - [BuckyballDialect](./compiler/BuckyballDilact/README.md) -- [bbdev](./workflow/README.md) - - [agent](./workflow/steps/agent/README.md) - - [compiler](./workflow/steps/compiler/README.md) - - [doc-agent](./workflow/steps/doc-agent/README.md) - - [funcsim](./workflow/steps/funcsim/README.md) - - [marshal](./workflow/steps/marshal/README.md) - - [sardine](./workflow/steps/sardine/README.md) - - [uvm](./workflow/steps/uvm/README.md) - - [verilator](./workflow/steps/verilator/README.md) - - [workload](./workflow/steps/workload/README.md) - ---- - -[Contributors](misc/contributors.md) diff --git a/docs/bb-note/src/arch/arch.md b/docs/bb-note/src/arch/arch.md deleted file mode 100644 index 
8439785c..00000000 --- a/docs/bb-note/src/arch/arch.md +++ /dev/null @@ -1,68 +0,0 @@ -# Buckyball Architecture Design Overview - -The Buckyball architecture module contains complete hardware design implementations, based on the RISC-V instruction set architecture, developed using the Scala/Chisel hardware description language. The architecture design follows modular and extensible principles, supporting various configurations and custom extensions. - -## Architecture Hierarchy - -### System-Level Architecture -Buckyball adopts a layered design, including from top to bottom: -- **SoC Subsystem**: Integrates multi-core processors, cache hierarchy, interconnect networks -- **Processor Core**: Custom implementation based on Rocket core -- **Coprocessor**: Dedicated accelerators supporting RoCC interface -- **Memory Subsystem**: High-performance memory controllers and DMA engines - -### Core Features -- **Configurability**: Supports parameter configuration for core count, cache size, bus width, etc. -- **Extensibility**: Provides standardized coprocessor interfaces and extension mechanisms -- **Compatibility**: Maintains compatibility with the standard RISC-V ecosystem -- **Performance Optimization**: Performance-optimized design for specific application scenarios - -## Directory Structure - -``` -arch/ -├── src/main/scala/ -│ └── framework/ - Buckyball framework core -│ ├── rocket/ - Rocket core custom implementation -│ └── builtin/ - Built-in component library -│ └── memdomain/ - Memory domain implementation -│ ├── mem/ - Memory components -│ └── dma/ - DMA engine -└── thirdparty/ - Third-party dependencies - └── chipyard/ - Chipyard framework -``` - -## Design Principles - -### Modular Design -Each functional module has clear interface definitions and independent implementations, facilitating testing, verification, and reuse. Modules communicate through standardized interfaces, reducing coupling. 
- -### Parameterized Configuration -All hardware modules support parameterized configuration, achieving flexible hardware generation through Scala's type system and configuration framework. Configuration parameters include: -- Data path width -- Cache size and organization -- Parallelism and pipeline depth -- Coprocessor types and quantities - -### Performance Optimization -Specialized performance optimizations for target application scenarios: -- Memory access pattern optimization -- Data pipeline design -- Parallel computing support -- Low-latency communication mechanisms - -## Development Workflow - -1. **Requirement Analysis**: Determine performance and functional requirements for target applications -2. **Architecture Design**: Select appropriate configuration parameters and extension modules -3. **RTL Implementation**: Use Chisel for hardware description and implementation -4. **Functional Verification**: Verify functional correctness through unit tests and integration tests -5. **Performance Evaluation**: Use simulators and FPGA for performance analysis and optimization - -## Toolchain Support - -- **Chisel/FIRRTL**: Hardware description and synthesis toolchain -- **Verilator**: Fast simulation and verification -- **VCS**: Commercial-grade simulation tools -- **FireSim**: FPGA accelerated simulation platform -- **Chipyard**: Integrated development environment and toolchain diff --git a/docs/bb-note/src/arch/src/main/scala/README.md b/docs/bb-note/src/arch/src/main/scala/README.md deleted file mode 120000 index 74e333ea..00000000 --- a/docs/bb-note/src/arch/src/main/scala/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../arch/src/main/scala/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/Util/README.md b/docs/bb-note/src/arch/src/main/scala/Util/README.md deleted file mode 120000 index 48297a46..00000000 --- a/docs/bb-note/src/arch/src/main/scala/Util/README.md +++ /dev/null @@ -1 +0,0 @@ 
-../../../../../../../../arch/src/main/scala/Util/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/examples/README.md b/docs/bb-note/src/arch/src/main/scala/examples/README.md deleted file mode 120000 index d8f07b00..00000000 --- a/docs/bb-note/src/arch/src/main/scala/examples/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../arch/src/main/scala/examples/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/examples/toy/README.md b/docs/bb-note/src/arch/src/main/scala/examples/toy/README.md deleted file mode 120000 index f6d2314b..00000000 --- a/docs/bb-note/src/arch/src/main/scala/examples/toy/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/examples/toy/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/README.md b/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/README.md deleted file mode 120000 index 6309bf6e..00000000 --- a/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../../arch/src/main/scala/examples/toy/balldomain/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/bbus/README.md b/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/bbus/README.md deleted file mode 120000 index 01677810..00000000 --- a/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/bbus/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../../../arch/src/main/scala/examples/toy/balldomain/bbus/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/rs/README.md b/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/rs/README.md deleted file mode 120000 index 3175d85d..00000000 --- a/docs/bb-note/src/arch/src/main/scala/examples/toy/balldomain/rs/README.md +++ 
/dev/null @@ -1 +0,0 @@ -../../../../../../../../../../../arch/src/main/scala/examples/toy/balldomain/rs/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/framework/README.md b/docs/bb-note/src/arch/src/main/scala/framework/README.md deleted file mode 120000 index 17a49722..00000000 --- a/docs/bb-note/src/arch/src/main/scala/framework/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../arch/src/main/scala/framework/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/framework/builtin/README.md b/docs/bb-note/src/arch/src/main/scala/framework/builtin/README.md deleted file mode 120000 index 45c7b104..00000000 --- a/docs/bb-note/src/arch/src/main/scala/framework/builtin/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/framework/builtin/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/framework/builtin/builtin.md b/docs/bb-note/src/arch/src/main/scala/framework/builtin/builtin.md deleted file mode 100644 index dfa8cdef..00000000 --- a/docs/bb-note/src/arch/src/main/scala/framework/builtin/builtin.md +++ /dev/null @@ -1 +0,0 @@ -# builtin diff --git a/docs/bb-note/src/arch/src/main/scala/framework/builtin/util/README.md b/docs/bb-note/src/arch/src/main/scala/framework/builtin/util/README.md deleted file mode 120000 index dbbf2331..00000000 --- a/docs/bb-note/src/arch/src/main/scala/framework/builtin/util/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../../arch/src/main/scala/framework/builtin/util/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/framework/framework.md b/docs/bb-note/src/arch/src/main/scala/framework/framework.md deleted file mode 100644 index 6843d6ed..00000000 --- a/docs/bb-note/src/arch/src/main/scala/framework/framework.md +++ /dev/null @@ -1 +0,0 @@ -# framework diff --git 
a/docs/bb-note/src/arch/src/main/scala/framework/rocket/README.md b/docs/bb-note/src/arch/src/main/scala/framework/rocket/README.md deleted file mode 120000 index 72c54f5b..00000000 --- a/docs/bb-note/src/arch/src/main/scala/framework/rocket/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/framework/rocket/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/README.md deleted file mode 120000 index e157e9a5..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../arch/src/main/scala/prototype/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/format/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/format/README.md deleted file mode 120000 index 41014440..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/format/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/prototype/format/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/im2col/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/im2col/README.md deleted file mode 120000 index b6df17ce..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/im2col/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/prototype/im2col/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/matrix/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/matrix/README.md deleted file mode 120000 index 2e67de76..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/matrix/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/prototype/matrix/README.md \ No newline at end of file diff --git 
a/docs/bb-note/src/arch/src/main/scala/prototype/transpose/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/transpose/README.md deleted file mode 120000 index be6c792c..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/transpose/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/prototype/transpose/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/vector/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/vector/README.md deleted file mode 120000 index db1f585d..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/vector/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/prototype/vector/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/vector/bond/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/vector/bond/README.md deleted file mode 120000 index fd054912..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/vector/bond/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../../arch/src/main/scala/prototype/vector/bond/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/vector/op/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/vector/op/README.md deleted file mode 120000 index efe9e6c4..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/vector/op/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../../arch/src/main/scala/prototype/vector/op/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/vector/thread/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/vector/thread/README.md deleted file mode 120000 index a20e5317..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/vector/thread/README.md +++ /dev/null @@ -1 +0,0 @@ 
-../../../../../../../../../../arch/src/main/scala/prototype/vector/thread/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/prototype/vector/warp/README.md b/docs/bb-note/src/arch/src/main/scala/prototype/vector/warp/README.md deleted file mode 120000 index 5850ac88..00000000 --- a/docs/bb-note/src/arch/src/main/scala/prototype/vector/warp/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../../arch/src/main/scala/prototype/vector/warp/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/sims/README.md b/docs/bb-note/src/arch/src/main/scala/sims/README.md deleted file mode 120000 index 4cf84ab2..00000000 --- a/docs/bb-note/src/arch/src/main/scala/sims/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../arch/src/main/scala/sims/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/sims/firesim/README.md b/docs/bb-note/src/arch/src/main/scala/sims/firesim/README.md deleted file mode 120000 index a292a501..00000000 --- a/docs/bb-note/src/arch/src/main/scala/sims/firesim/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/sims/firesim/README.md \ No newline at end of file diff --git a/docs/bb-note/src/arch/src/main/scala/sims/verilator/README.md b/docs/bb-note/src/arch/src/main/scala/sims/verilator/README.md deleted file mode 120000 index f0a79884..00000000 --- a/docs/bb-note/src/arch/src/main/scala/sims/verilator/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../../../../arch/src/main/scala/sims/verilator/README.md \ No newline at end of file diff --git a/docs/bb-note/src/compiler/BuckyballDilact/README.md b/docs/bb-note/src/compiler/BuckyballDilact/README.md deleted file mode 100644 index fdeeb7b6..00000000 --- a/docs/bb-note/src/compiler/BuckyballDilact/README.md +++ /dev/null @@ -1,41 +0,0 @@ -# Tile Dialect Refactoring Documentation - -## Refactoring Background and Goals - -The core goal of 
this refactoring is to introduce a new intermediate layer between Linalg Dialect and Buckyball Dialect - the Tile Dialect - to achieve clearer separation of responsibilities and better code organization. In the original architecture, the conversion from `linalg.matmul` to hardware instructions was completed in one step through `convert-linalg-to-buckyball`, which caused Buckyball Dialect to handle both the slicing logic for arbitrary-size matrices and hardware-level memory management and computation scheduling, resulting in overly mixed responsibilities. The new architecture splits the conversion process into two phases: `convert-linalg-to-tile` and `convert-tile-to-buckyball`, making each layer have a clear and single responsibility. - -## New Architecture Design - -The entire compilation flow is now divided into three clear layers. First is the Linalg layer, which represents high-level linear algebra operations, such as `linalg.matmul` representing matrix multiplication of arbitrary size. This layer does not care about hardware constraints. Next is the newly introduced Tile layer, whose core responsibility is to tile arbitrary-size matrix operations into fixed-size blocks that conform to hardware constraints. The Tile layer expresses this high-level tiling intent through the `tile.tile_matmul` operation. The specific tiling strategy, loop generation, and boundary handling are all implemented in the `convert-tile-to-buckyball` pass. Finally, the Buckyball layer focuses on hardware-level operations. `buckyball.bb_matmul` receives pre-tiled fixed-size matrix blocks and is responsible for generating precise hardware instruction sequences, including data movement (mvin/mvout), computation scheduling (mul_warp16), and memory address calculation. - -## Tile Dialect Design Details - -The Tile Dialect defines the `TileMatMulOp` operation, which accepts three memref parameters representing matrices A, B, and C respectively. 
The semantics of this operation are: perform multiplication on input matrices of arbitrary size, automatically handling tiling, padding, and loops. In implementation, `TileMatMulOp` will be converted by the `convert-tile-to-buckyball` pass into multiple `buckyball.bb_matmul` operations and corresponding `memref.subview` operations. This conversion process will consider hardware scratchpad size limitations, warp and lane parallelism constraints, and generate an optimal tiling strategy. The design philosophy of the Tile layer is to provide a platform-independent intermediate representation, allowing upper-layer optimizations to transform matrix operations without understanding specific hardware details. - -## Buckyball Dialect Simplification - -In the new architecture, the Buckyball Dialect has been significantly simplified. The original four operations `VecTileMatMulOp`, `MergeTileMatMulOp`, `MetaTileMatMulOp`, and `VecMulWarp16Op` have been unified into a single `MatMulOp`. This simplification is reasonable because the tiling logic has been moved up to the Tile layer, and the Buckyball layer only needs to express the single concept of "performing hardware-level multiplication on a matrix block that already conforms to hardware constraints." The lowering process of `buckyball.bb_matmul` will directly generate LLVM intrinsics: first load matrices A and B into the scratchpad through `Mvin_IntrOp`, then generate multiple `Mul_Warp16_IntrOp` operations based on warp and lane parameters for computation, and finally write the results back to main memory through `Mvout_IntrOp`. All address calculations and encodings are completed in this lowering process. - -## Key Implementation Details - -When implementing the `convert-linalg-to-tile` pass, the core logic is very simple: match the `linalg.matmul` operation and directly replace it with `tile.tile_matmul`, passing the same three memref operands. 
The role of this pass is mainly type and semantic conversion, indicating that we have moved from the general linear algebra operation domain into the hardware-oriented tile operation domain. - -The `convert-tile-to-buckyball` pass is the most complex part of the entire refactoring. It needs to extract matrix dimension information (M, K, N) from the operands of `tile.tile_matmul`, then calculate the optimal tiling strategy based on hardware parameters (dim, warp, lane). For the K dimension, it will tile according to warp size; for M and N dimensions, it will consider scratchpad capacity limitations. Each tile corresponds to a `buckyball.bb_matmul` operation, and tiles are connected through `memref.subview` to create matrix views. Special attention should be paid to handling boundary cases: when matrix dimensions cannot be evenly divided by tile size, the actual size of the last tile needs to be calculated to avoid out-of-bounds access. - -When implementing `BuckyballMatMulLowering`, we encountered an important concept in MLIR's type conversion system: OpAdaptor. In conversion patterns, the types of the original operation (such as `memref<32x16xi8>`) will be converted to LLVM types (such as LLVM struct types) by the TypeConverter during the lowering process. OpAdaptor provides converted values, but we need to obtain type information (such as shape) from the original operation because this static information may no longer exist in the same form after conversion. Therefore, the correct approach is: obtain the original `MemRefType` from `matMulOp.getOperandTypes()` to extract shape information for address calculation and loop generation; for actual value operations (such as `ExtractAlignedPointerAsIndexOp`), use the original memref value, because MLIR's memref operations still require MemRefType. 
- -Another key design decision is: `MatMulOp`'s lowering should directly generate intrinsic operations (`Mvin_IntrOp`, `Mul_Warp16_IntrOp`, `Mvout_IntrOp`), rather than generating `MvinOp`, `MvoutOp` and then waiting for them to be lowered. The reason is that in the LLVM lowering stage, the type system has already been converted, and creating high-level Buckyball operations again would cause type mismatch issues. Directly generating intrinsics avoids multiple type conversions and makes the code clearer and more efficient. Referring to the Gemmini dialect implementation, we adopted the same strategy. - -## Test System - -To verify the correctness of the new architecture, we created complete test cases in the `bb-tests/workloads/src/OpTest/tile/` directory. Tests are divided into two categories: staged tests and end-to-end tests. - -`tile-matmul.mlir` tests the conversion from Linalg to Tile, verifying that `linalg.matmul` is correctly converted to `tile.tile_matmul`. This is the most basic type conversion test. `tile-to-buckyball.mlir` tests the conversion from Tile to Buckyball, verifying that the tiling logic is correct and that the correct number of `buckyball.bb_matmul` operations and `memref.subview` operations are generated. `buckyball-to-llvm.mlir` tests the conversion from Buckyball MatMulOp to LLVM intrinsics, verifying that the correct sequences of `buckyball.intr.bb_mvin`, `buckyball.intr.bb_mul_warp16`, and `buckyball.intr.bb_mvout` instructions are generated. - -`end-to-end.mlir` is the most important test, testing the complete conversion flow: starting from `linalg.matmul`, sequentially passing through the three passes `-convert-linalg-to-tile`, `-convert-tile-to-buckyball`, `-lower-buckyball`, and finally generating LLVM intrinsics. This test ensures that each part of the entire pipeline works correctly and that there are no issues with the connections between parts. 
- -## Pass Registration and Toolchain Integration - -The two newly added passes need to be registered in multiple places. First, register the pass creation functions `registerLowerLinalgToTilePass()` and `registerLowerTileToBuckyballPass()` in `InitAll.cpp`, and also register `buddy::tile::TileDialect`. In the `buddy-opt` tool, `buddy::tile::TileDialect` needs to be added to the dialect registry so that the tool can recognize and parse tile dialect operations. In the CMake build system, the new libraries `BuddyTile`, `LowerLinalgToTilePass`, and `LowerTileToBuckyballPass` need to be added to the link dependencies, ensuring correct dependency relationships. - -It is particularly worth noting that in the `configureBuckyballLegalizeForExportTarget` function in `LegalizeForLLVMExport.cpp`, we need to add `target.addLegalDialect()` and `target.addLegalDialect()`, because memref and arith operations will be used during the lowering process of `MatMulOp`. If these dialects are not marked as legal, the conversion framework will attempt to lower these operations, causing type conversion conflicts. diff --git a/docs/bb-note/src/compiler/README.md b/docs/bb-note/src/compiler/README.md deleted file mode 100644 index be7db00d..00000000 --- a/docs/bb-note/src/compiler/README.md +++ /dev/null @@ -1,28 +0,0 @@ -# Compiler Build Guide - -## Basic Workload Compilation - -To build the workload, follow these steps: - -```bash -mkdir build && cd build -cmake -G Ninja .. -ninja -``` - -## Model-Level Testing - -To enable model-level testing with specific models and architectures: - -```bash -mkdir build && cd build -cmake -G Ninja .. \ - -DMODEL="lenet,resnet18,mobilenetv3,bert,stablediffusion,llama2,deepseekr1" \ - -DARCH="gemmini,buckyball" -ninja -``` - -Note: -1. Model downloads for bert, whisper, stable-diffusion, llama2, DeepseekR1 require pre-configured HuggingFace access -2. whisper is currently not supported -3. 
llama2 model download requires additional API-key or cached credentials diff --git a/docs/bb-note/src/misc/contributors.md b/docs/bb-note/src/misc/contributors.md deleted file mode 100644 index 4be3bf33..00000000 --- a/docs/bb-note/src/misc/contributors.md +++ /dev/null @@ -1,50 +0,0 @@ -# Contributors - -Thank you to all developers and researchers who have contributed to the Buckyball project. - -## Core Development Team - -The Buckyball project is primarily developed by the DangoSys team, dedicated to building a high-performance domain-specific architecture framework. - -## Contribution Methods - -We welcome contributions of all kinds: - -### Code Contributions -- Hardware architecture design and optimization -- Software toolchain improvements -- Test cases and benchmark programs -- Documentation writing and maintenance - -### Issue Feedback -- Bug reports and fix suggestions -- Feature requirements and improvement suggestions -- Performance optimization suggestions -- Usage experience feedback - -### Academic Collaboration -- Research papers and technical reports -- Conference presentations and technical sharing -- Open source community promotion - -## Participation Guidelines - -1. **Fork Project**: Create a project branch from GitHub -2. **Local Development**: Set up development environment according to documentation -3. **Submit Changes**: Follow code standards and commit format -4. **Create PR**: Describe changes and test results in detail -5. 
**Code Review**: Cooperate with maintainers to complete code review process - -## Contact - -- **GitHub**: [DangoSys/buckyball](https://github.com/DangoSys/buckyball) -- **Issues**: Report issues through GitHub Issues -- **Discussions**: Participate in [Slack](https://buckyballhq.slack.com/) for discussions - -## Acknowledgments - -Special thanks to the following open source projects and communities: -- Buddy-Compiler development team -- Chipyard project -- RISCV Foundation -- All test users and feedback providers diff --git a/docs/bb-note/src/overview/overview.md b/docs/bb-note/src/overview/overview.md deleted file mode 100644 index a9c6e69a..00000000 --- a/docs/bb-note/src/overview/overview.md +++ /dev/null @@ -1,68 +0,0 @@ -# Buckyball Project Structure Overview - -Buckyball is a scalable framework for domain-specific architectures. The project adopts a modular design with clear directory responsibilities, supporting a complete toolchain from hardware design to software development. - -## Main Directory Structure - -### Core Architecture Module -- **`arch/`** - Hardware architecture implementation, containing RTL code written in Scala/Chisel - - Based on Rocket-chip and Chipyard framework - - Implements custom RoCC coprocessors and memory subsystems - - Supports various configuration and extension options - -### Test Verification Module -- **`bb-tests/`** - Unified test framework - - `workloads/` - Application workload tests - - `customext/` - Custom extension verification - - `sardine/` - Sardine test framework - - `uvbb/` - Unit test suite - -### Simulation Environment Module -- **`sims/`** - Simulators and verification environments - - Supports Verilator, VCS and other simulators - - Integrates FireSim FPGA accelerated simulation - - Provides performance analysis and debugging tools - -### Development Tools Module -- **`scripts/`** - Build and deployment scripts - - Environment initialization scripts - - Automated build tools - - Dependency management and 
configuration - -- **`workflow/`** - Development workflows and automation - - CI/CD pipeline configuration - - Documentation generation tools - - Code quality checks - -### Documentation System -- **`docs/`** - Project documentation - - `bb-note/` - Technical documentation based on mdBook - - `img/` - Documentation image resources - - Supports automatic generation and updates - -### Third-party Dependencies -- **`thirdparty/`** - External dependency modules (**submodules**) - - `chipyard/` - Berkeley Chipyard SoC design framework - - `circt/` - CIRCT circuit compiler toolchain - -## Development Workflow - -1. **Environment Setup**: Use `scripts/init.sh` to initialize the development environment -2. **Architecture Development**: Perform hardware design and modifications in the `arch/` directory -3. **Test Verification**: Use test suites in `bb-tests/` for functional verification -4. **Simulation Debugging**: Perform performance analysis through simulation environments in the `sims/` directory -5. 
**Documentation Updates**: Automatically generate or manually update technical documentation in `docs/` - -## Build System - -The project supports multiple build methods: -- **Make**: Traditional Makefile builds -- **SBT**: Scala project build tool -- **CMake**: Test framework build system -- **Conda**: Python environment and dependency management - -## Version Management Notes - -- **Submodules**: Modules under `thirdparty/` need independent updates -- **Main Repository**: Core code and configuration update synchronously with the main branch -- **Documentation**: Supports automatic generation, keeping in sync with code changes diff --git a/docs/bb-note/src/tutorial/tutorial.md b/docs/bb-note/src/tutorial/tutorial.md deleted file mode 100644 index 13a6b4b1..00000000 --- a/docs/bb-note/src/tutorial/tutorial.md +++ /dev/null @@ -1,337 +0,0 @@ -# Tutorial for buckyball - -> by - Bohan Wang -> -> This document will be gradually updated as the author continues to solve and summarize encountered issues. - -This document explains the step-by-step process and problem-solving approaches for a complete `buckyball` development workflow. We use building a ball operator module for executing the `relu()` function as an example: - -First, we need to complete the hardware code writing for this module, i.e., write hardware code in Scala's Chisel language and generate corresponding `verilog` code. - -Second, we need to write test software to implement `relu()`, which can be a reference function that runs on `CPU` with software code and an experimental function that runs software code on the dedicated hardware written in step one. If the test results match, it's successful, or proceed to step three for testing. - -Third, simulate at the hardware level, view waveform diagrams for debugging. Additionally, there are other details such as compiler documentation changes, instruction set updates, etc., which will be explained below. 
- -When encountering issues during development, you can visit [DangoSys/buckyball | DeepWiki](https://deepwiki.com/DangoSys/buckyball) or [Project Overview - Buckyball Technical Documentation](https://dangosys.github.io/buckyball/index.html) - -Chisel learning resources: [binder](https://mybinder.org/v2/gh/freechipsproject/chisel-bootcamp/master) - -Before starting officially, let's initialize the environment: - -``` -cd /path/to/buckyball -source env.sh -// source ./env.sh if this gives an error -// All paths in this document are relative paths starting from ./buckyball -``` - -## I. Writing Chisel Hardware Module - -Create a Chisel implementation of the `ReLU` accelerator in the `arch/src/main/scala/prototype/` directory. Referring to existing accelerator structures, it's recommended to create a new subdirectory under `prototype/`, for example `prototype/relu/Relu.scala`, and write the hardware code. - -## II. Hardware Instruction Decoding - -Next, decode hardware instructions. Support for ReLU instructions needs to be added on the **hardware side** so that the hardware decoder recognizes this instruction, and register the instruction set for this ball. - -This work is mainly divided into the following five aspects: - -- Instruction enumeration (DISA) defines func7 → instruction name (RELU) -- Decoder (DomainDecoder) defines func7 → decoding rules (read/write/address/iter) → BID (e.g., 4) -- Bus registration (busRegister) defines BID → actual Ball instance (ReluBall indexed at 4) -- Reservation station registration (rsRegister) is used for RS/issue descriptions, aligned with BID, facilitating system issue/completion management and debugging -If any link is missing or inconsistent, the ReLU instruction cannot be correctly recognized/routed/executed on actual hardware. -- Create a new Ball execution unit `class ReluUnit` to handle ReLU operations. - -#### 1. 
Define RELU_BITPAT in DISA.scala - -`arch/src/main/scala/examples/toy/balldomain/DISA.scala` defines the funct7 encoding (BitPat) for Ball instructions, such as TRANSPOSE, IM2COL, etc. It can be viewed as an "instruction set enumeration table" for decoder matching. - -Add the bit pattern definition for the ReLU instruction in this file: - -``` -val RELU_BITPAT = BitPat("b0100110") // func7 = 38 = 0x23 -``` - -#### 2. Add ReLU instruction to Ball domain decoder - -`arch/src/main/scala/examples/toy/balldomain/DomainDecoder.scala` is the Ball domain decoder. -Its functions are as follows: -- Input: PostGDCmd from global decoding (already determined to be a Ball category command). -- Output: Structured BallDecodeCmd, including: - - Whether to use op1/op2, whether to write back to scratchpad, whether operands come from scratchpad - - Operand/writeback bank and address - - Iteration count iter - - Target Ball ID (BID) - - Other dedicated fields special, etc. -- Internally maps different funct7 instructions to a set of boolean switches and field extraction rules through ListLookup(func7, ...). - -Add the decoding entry for the ReLU instruction in the decoding list in this file. Referring to the implementation of other instructions (e.g., TRANSPOSE_FUNC7 = 38), you need: - -``` -// Add to BallDecodeFields ListLookup -RELU -> List(Y,N,Y,Y,N, rs1(spAddrLen-1,0), 0.U(spAddrLen.W), rs2(spAddrLen-1,0), rs2(spAddrLen + 9,spAddrLen), 7.U, rs2(63,spAddrLen + 10), Y) // Fill in decoding fields according to specific ReLU instruction requirements, the number of list parameters must be consistent, you can refer to other instructions -``` - -#### 3. Add ReLuBall generator and register it - -a. `arch/src/main/scala/examples/toy/balldomain/bbus/busRegister.scala` is the Ball bus registration table, using a `Seq(() => new SomeBall(...))` to register the actual Ball modules to be instantiated in the system. - -Find and add the new ID for ReLuBall in this file. 
- -``` -class BBusModule(implicit b: CustomBuckyballConfig, p: Parameters) - extends BBus( - // Define Ball device generator to register - Seq( - () => new examples.toy.balldomain.vecball.VecBall(0), - () => new examples.toy.balldomain.matrixball.MatrixBall(1), - () => new examples.toy.balldomain.im2colball.Im2colBall(2), - () => new examples.toy.balldomain.transposeball.TransposeBall(3), - ... - () =>new examples.toy.balldomain.reluball.ReluBall(7) // Ball ID 7 - newly added - ) - ) { - override lazy val desiredName = "BBusModule" -} -``` - -b. `arch/src/main/scala/examples/toy/balldomain/rs/rsRegister.scala` is the "Ball reservation station" registration table, using a list to register which Balls exist in the system (specifying ID and name by ballId). The reservation station (RS) is responsible for managing Ball issue, occupancy, completion and other metadata, usually also used for visualization/statistics, naming and logging. - -Register ReluBall in this file: - -``` -class BallRSModule(implicit b: CustomBuckyballConfig, p: Parameters) - extends BallReservationStation( - // Define Ball device information to register - Seq( - BallRsRegist(ballId = 0, ballName = "VecBall"), - BallRsRegist(ballId = 1, ballName = "MatrixBall"), - BallRsRegist(ballId = 2, ballName = "Im2colBall"), - BallRsRegist(ballId = 3, ballName = "TransposeBall"), - ... - BallRsRegist(ballId = 7, ballName = "ReluBall") // Ball ID 7 - newly added - ) - ) { - override lazy val desiredName = "BallRSModule" -} -``` -#### 4. Write ReluBall interface file - -Create a `reluball` folder in the `arch/src/main/scala/examples/toy/balldomain` directory, enter the folder and create `ReluBall.scala` to write the interface code. - -## III. Writing Test Software and Compilation Settings - -### 1. Create test file - -Create `relu_test.c` under `bb-tests/workloads/src/CTest/toy/`, write test code. 
The core function in the code will execute `void bb_relu(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter);` Note the declaration and definition of this function below. - -### 2. Modify CMakeLists.txt - -Add test target in `bb-tests/workloads/src/CTest/toy/CMakeLists.txt`: CMakeLists.txt:120-127 - -``` -add_cross_platform_test_target(ctest_relu_test relu_test.c) -``` - -And add to the main build target: CMakeLists.txt:137-162 - -``` -add_custom_target(buckyball-CTest-build ALL DEPENDS - # ... other tests ... - ctest_relu_test - COMMENT "Building all workloads for Buckyball" - VERBATIM) -``` - -### 3. Need to add ReLU instruction API - -#### a. isa.h - -- Add declaration for `ReLU` instruction in `bb-tests/workloads/lib/bbhw/isa/isa.h`: `isa.h:33-43` - -- Add to `InstructionType` enum: - -``` -RELU_FUNC7 = 38, // 0x26 - ReLU function code (or other value you choose) -``` - -- Add to function declaration section: `isa.h:72-73` - -``` -void bb_relu(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter); -``` - -#### b. isa.c - -- Add `38_relu.c` in `bb-tests/workloads/lib/bbhw/isa`, implement `void bb_relu(uint32_t op1_addr, uint32_t wr_addr, uint32_t iter)` inside - -- Add declaration in `bb-tests/workloads/lib/bbhw/isa/isa.c`: `isa.c:53-76` - -``` -case RELU_FUNC7: - return &relu_config; -``` - -- In `isa.c:37-47` - -``` -extern const InstructionConfig relu_config; -``` - -### 4. Update CMakeLists.txt - -Add compilation and linking of `38_relu.c` in all three compilation commands in `bb-tests/workloads/lib/bbhw/isa/CMakeLists.txt`: - -1. **Linux version**: Add in `COMMAND` of `add_custom_command`: - - ``` - && riscv64-unknown-linux-gnu-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/38_relu.c -march=rv64gc -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o linux-38_relu.o - ``` - - And add `linux-38_relu.o` to the `ar rcs` command - -2. 
**Baremetal version**: Add in `COMMAND` of `add_custom_command`: - - ``` - && riscv64-unknown-elf-gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/38_relu.c -g -fno-common -O2 -static -march=rv64gc -mcmodel=medany -fno-builtin-printf -D__BAREMETAL__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o baremetal-38_relu.o - ``` - - And add `baremetal-38_relu.o` to the `ar rcs` command - -3. **x86 version**: Add in `COMMAND` of `add_custom_command`: - - ``` - && gcc -c ${CMAKE_CURRENT_SOURCE_DIR}/38_relu.c -fPIC -D__x86_64__ -I${CMAKE_CURRENT_SOURCE_DIR} -I${CMAKE_CURRENT_SOURCE_DIR}/.. -o x86-38_relu.o - ``` - - And add `x86-38_relu.o` to the `ar rcs` command - -4. The ISA submodule library at the beginning needs to add the corresponding **38_relu.c** file. - - -## IV. Test Operation Steps - -### Step 1: Compile test program - -``` -cd bb-tests/build -rm -rf * -cmake -G Ninja ../ -``` - -**Warning**: Before executing `rm -rf *`, make sure you are in the `bb-tests/build` directory, otherwise forcing deletion in the wrong folder will be catastrophic! - -If a disaster occurs, you can pull the initial documents from GitHub again, but files updated on the server side cannot be recovered. - -``` -ninja ctest_relu_test // Software compilation -``` - -If `ninja ctest_relu_test` reports an error after execution, this means software compilation failed, please check **"III. Writing Test Software"** and related files. - -``` -bbdev workload --build -``` - -Compile/package the selected workload source code or configuration into artifacts (such as executable files, images, runtime scripts, input data packages, etc.) that can be used in the simulation or runtime environment for subsequent running on the Verilator/simulation platform or host side. - -### Step 2: Generate Verilog - -``` -cd buckyball -bbdev verilator --verilog -``` - -If `bbdev verilator --verilog` reports an error after execution, this means hardware compilation failed, please check **"I. 
Writing Chisel Hardware Module II. Compilation Adaptation Preparation"** related files.
-
-
-### Step 3: Run simulation
-
-```
-bbdev verilator --run '--jobs 16 --binary ctest_relu_test_singlecore-baremetal --batch'
-```
-
-If `bbdev verilator --run` reports an error, the hardware hit a timeout, deadlock, or similar problem; check the files related to **I. Writing Chisel Hardware Module**.
-
-### Step 4: View simulation files
-
-In `arch/waveform/SimulationFileName(E.g.2025-10-08-00-03-ctest_vecunit_matmul_random1_singlecore-baremetal)`, download the `waveform.fst` file to your local system using software like `Filezilla`, and view the waveform using a local waveform viewer (e.g. GTKWave).
-
-Note that the simulation folder should contain only the `waveform.fst` file. If a `waveform.fst.hier` file exists, the simulation failed.
-
-If the waveform does not match the expected behavior and the software test code is correct, check the files related to **I. Writing Chisel Hardware Module**.
-
-To check whether the software code has problems, compare against its results on the `CPU`. You can temporarily remove all hardware accelerator calls from `relu_test.c` and test only the CPU version.
-
-## V. Simulation Waveform
-After importing `waveform.fst` locally, use [GTKWAVE](https://zhuanlan.zhihu.com/p/647533706) to find in the project index:
-`TOP.TestHarness.chiptop0.system.tile_prci_domain.element_reset_domain_tile.buckyball.ballDomain.bbus.balls_4.reluUnit` The signals under this node are all hardware signals used by Relu.scala; double-click one to view its waveform.
-
-> Some names may differ slightly between routines, but they are basically similar
-
-## VI.
Performance Testing
-
-### Query number of clock cycles used - speed performance metric
-```bash
-cat /home/MikeNotFound/code/buckyball/arch/log/2025-10-24-16-59-ctest_relu_test_singlecore-baremetal/disasm.log | grep "PMC"
-```
-
-### DC Test - Check Timing, Frequency, Area, and Related Parameters
-
-* **Preparation**
-
-1. In the `/home//bash.sh` file, add the required environment variables at the end:
-
-   ```bash
-   export SNPSLMD_LICENSE_FILE=27000@amax
-   export PATH="$PATH:/opt/riscv/bin"
-   export VCS_HOME="/data0/tools/Synopsys/vcs/vcs/W-2024.09-SP1"
-   export PATH="$PATH:$VCS_HOME/bin"
-   export VERDI_HOME="/data0/tools/Synopsys/verdi/verdi/W-2024.09-SP1"
-   export PATH="$PATH:$VERDI_HOME/bin"
-   export SCL_HOME="/data0/tools/Synopsys/scl/scl/2024.06"
-   export PATH="$PATH:$SCL_HOME/linux64/bin"
-   export DC_HOME="/data0/tools/Synopsys/dc/syn/W-2024.09-SP1"
-   export PATH="$PATH:$DC_HOME/bin"
-   export PT_HOME="/data0/tools/Synopsys/ptpx/prime/W-2024.09-SP1/"
-   export PATH="$PATH:$PT_HOME/bin"
-
-   export LM_LICENSE_FILE=/data0/tools/Synopsys/lic/Synopsys.dat
-
-   alias vcs="vcs -full64"
-   alias lmli="lmgrd -c /data0/tools/Synopsys/lic/Synopsys.dat"
-   ```
-
-2. In the `/home//code/buckyball/evals/run-dc.sh` file, remove the `-retime` option around line 126.
-
----
-
-* **Formal Test**
-
-1. Go back to the `buckyball` directory and run the command
-
-   ```bash
-   bbdev verilator --verilog "--balltype ReluBall --output_dir ReluBall_1"
-   ```
-
-   This will generate a Verilog folder for the specified ball under the `arch` directory.
-
-2. Grant execution permission to the script:
-
-   ```bash
-   chmod 777 evals/run-dc.sh
-   ```
-
-3. Run the DC command:
-
-   ```bash
-   ./evals/run-dc.sh --srcdir arch/ReluBall_1 --top ReluBall
-   ```
-
-   This runs the DC test on the top-level file `ReluBall.sv` located in the `arch/ReluBall_1` folder.
-
-4.
You can find the test results in - - ``` - /home//buckyball/bb-tests/output/dc/reports - ``` diff --git a/docs/bb-note/src/workflow/README.md b/docs/bb-note/src/workflow/README.md deleted file mode 100644 index e69de29b..00000000 diff --git a/docs/bb-note/src/workflow/steps/agent/README.md b/docs/bb-note/src/workflow/steps/agent/README.md deleted file mode 120000 index 039f1835..00000000 --- a/docs/bb-note/src/workflow/steps/agent/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/agent/README.md \ No newline at end of file diff --git a/docs/bb-note/src/workflow/steps/compiler/README.md b/docs/bb-note/src/workflow/steps/compiler/README.md deleted file mode 120000 index 09be4abc..00000000 --- a/docs/bb-note/src/workflow/steps/compiler/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/compiler/README.md \ No newline at end of file diff --git a/docs/bb-note/src/workflow/steps/doc-agent/README.md b/docs/bb-note/src/workflow/steps/doc-agent/README.md deleted file mode 120000 index 0a6a9422..00000000 --- a/docs/bb-note/src/workflow/steps/doc-agent/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/doc-agent/README.md \ No newline at end of file diff --git a/docs/bb-note/src/workflow/steps/firesim/README.md b/docs/bb-note/src/workflow/steps/firesim/README.md deleted file mode 120000 index c11df3c6..00000000 --- a/docs/bb-note/src/workflow/steps/firesim/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/firesim/README.md \ No newline at end of file diff --git a/docs/bb-note/src/workflow/steps/marshal/README.md b/docs/bb-note/src/workflow/steps/marshal/README.md deleted file mode 120000 index 49253fcd..00000000 --- a/docs/bb-note/src/workflow/steps/marshal/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/marshal/README.md \ No newline at end of file diff --git a/docs/bb-note/src/workflow/steps/sardine/README.md b/docs/bb-note/src/workflow/steps/sardine/README.md 
deleted file mode 120000 index 438b4dc8..00000000 --- a/docs/bb-note/src/workflow/steps/sardine/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/sardine/README.md \ No newline at end of file diff --git a/docs/bb-note/src/workflow/steps/uvm/README.md b/docs/bb-note/src/workflow/steps/uvm/README.md deleted file mode 120000 index 91c79b4e..00000000 --- a/docs/bb-note/src/workflow/steps/uvm/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/uvm/README.md \ No newline at end of file diff --git a/docs/bb-note/src/workflow/steps/verilator/README.md b/docs/bb-note/src/workflow/steps/verilator/README.md deleted file mode 120000 index 89b70019..00000000 --- a/docs/bb-note/src/workflow/steps/verilator/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/verilator/README.md \ No newline at end of file diff --git a/docs/bb-note/src/workflow/steps/workload/README.md b/docs/bb-note/src/workflow/steps/workload/README.md deleted file mode 120000 index b6a35edc..00000000 --- a/docs/bb-note/src/workflow/steps/workload/README.md +++ /dev/null @@ -1 +0,0 @@ -../../../../../../workflow/steps/workload/README.md \ No newline at end of file diff --git a/docs/img/buckyball.png b/docs/img/buckyball.png deleted file mode 100644 index a8daaa45..00000000 Binary files a/docs/img/buckyball.png and /dev/null differ diff --git a/docs/img/logo.png b/docs/img/logo.png deleted file mode 100644 index 96c2cbea..00000000 Binary files a/docs/img/logo.png and /dev/null differ diff --git a/docs/index.html b/docs/index.html deleted file mode 100644 index 41375114..00000000 --- a/docs/index.html +++ /dev/null @@ -1,278 +0,0 @@ - - - - - - Buckyball - BuckyballNote - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - diff --git a/evals/.gitignore b/evals/.gitignore deleted file mode 100644 index e69de29b..00000000 diff --git a/evals/run-dc.sh b/evals/run-dc.sh deleted file mode 100755 index 1353c978..00000000 --- a/evals/run-dc.sh +++ /dev/null @@ -1,150 +0,0 @@ -#!/bin/bash - -# exit script if any command fails -set -e -set -o pipefail - - -# DigitalTop -while [[ $# -gt 0 ]]; do - case $1 in - --srcdir) - SRC_DIR="$2" - shift 2 - ;; - --top) - TOP_MODULE="$2" - shift 2 - ;; - *) - echo "Unknown option: $1" - echo "Usage: $0 --srcdir --top [top_module](Optional)" - echo "Example: $0 --srcdir arch/VecBall_1 --top VecBall_1" - exit 1 - ;; - esac -done - -if [ -z "$SRC_DIR" ]; then - echo "Error: Missing required parameter: srcdir" - echo "Usage: $0 --srcdir --top [top_module](Optional)" - echo "Example: $0 --srcdir arch/VecBall_1 --top VecBall_1" - exit 1 -fi - - - -CYDIR=$(git rev-parse --show-toplevel) - -WORK_DIR="${CYDIR}/bb-tests/output/dc" -DESIGN_DIR="${CYDIR}/bb-tests/output/dc/design" -REPORT_DIR="${CYDIR}/bb-tests/output/dc/reports" -TMP_DIR="${CYDIR}/bb-tests/output/dc/tmp" -TCL_FILE="${CYDIR}/bb-tests/output/dc/dc_script.tcl" -DB_FILE="/opt/dc/lib/TSMCHOME/SRAM_m4swbsoffg0p99v0c/" - -mkdir -p $WORK_DIR -mkdir -p $DESIGN_DIR -mkdir -p $REPORT_DIR -mkdir -p $TMP_DIR - -#------------------------------------------------------------------- -# Step0 Execute build Verilator -#------------------------------------------------------------------- -# ${CYDIR}/voyager-test/scripts/build-verilator.sh --config ${CONFIG} - -#------------------------------------------------------------------- -# Step1 Copy Verilog files for corresponding Config to work directory -#------------------------------------------------------------------- -DESIGN_SOURCE_DIR="${CYDIR}/${SRC_DIR}" -rm -rf ${DESIGN_DIR}/* -cp -r ${DESIGN_SOURCE_DIR}/* ${DESIGN_DIR}/ - -#------------------------------------------------------------------- -# Step2 Replace SRAM 
-#------------------------------------------------------------------- -# echo "Checking SRAM File..." -# python ${CYDIR}/voyager-test/scripts/read_json.py $DB_FILE $DESIGN_DIR "/home/hxm123/tapeout-Voyager/sims/verilator/generated-src/chipyard.harness.TestHarness.GemminiRocketConfig/gen-collateral/metadata/seq_mems.json" - -#------------------------------------------------------------------- -# Step3 Write tcl script -#------------------------------------------------------------------- - - -# Format file paths as Tcl required string (space-separated) -tcl_db_list="" - - -cat > $TCL_FILE << EOF -# Set search path -set search_path [list . $DESIGN_DIR] -define_design_lib work -path $TMP_DIR - -set target_library "$tcl_db_list\ -/data0/tools/lib/db/scc28nhkcp_hdc35p140_rvt_ffg_v0p99_0c_basic.db \ -/data0/tools/lib/db/scc28nhkcp_hdc35p140_rvt_ffg_v0p99_0c_ccs.db \ -/data0/tools/lib/db/scc28nhkcp_hdc35p140_rvt_ffg_v0p99_0c_ecsm.db \ -" -set link_library "$tcl_db_list\ -/data0/tools/lib/db/scc28nhkcp_hdc35p140_rvt_ffg_v0p99_0c_basic.db \ -/data0/tools/lib/db/scc28nhkcp_hdc35p140_rvt_ffg_v0p99_0c_ccs.db \ -/data0/tools/lib/db/scc28nhkcp_hdc35p140_rvt_ffg_v0p99_0c_ecsm.db \ -" - -# Read design files -set file_list [glob -nocomplain -directory $DESIGN_DIR *.sv ] -analyze -format sverilog \$file_list -elaborate $TOP_MODULE - -# Set top module name -set current_design "$TOP_MODULE" - -# Link design -link - -create_clock -name clk1 -period 2 [get_ports clock] - -set_clock_uncertainty 0.6 [get_clocks clock] - -set_input_delay 1.2 -clock clk1 [remove_from_collection [all_inputs] [get_ports clock]] -set_output_delay 0.6 -clock clk1 [all_outputs] - -# Same frequency same phase - -set_clock_equivalence clk1 - -# Output load - -set_load 0.08 [all_outputs] - -set_input_transition 0.2 [remove_from_collection [all_inputs] [get_ports clock]] - -set_clock_transition 0.08 [get_clocks clk1] - - - -compile_ultra -scan -write -format ddc -hierarchy -output $REPORT_DIR/design_compiled.ddc - -# 
Generate reports -report_area -hierarchy -nosplit > $REPORT_DIR/area.rpt -report_timing > $REPORT_DIR/timing.rpt -report_power -hierarchy > $REPORT_DIR/power.rpt - -# Save netlist -write -format verilog -output $REPORT_DIR/netlist.v - -# Exit -exit -EOF - -#------------------------------------------------------------------- -# Step4 Run DC -#------------------------------------------------------------------- -echo "Running DC synthesis for design: ${CONFIG}, top module: $TOP_MODULE" -dc_shell -f $TCL_FILE - -# rm $TCL_FILE -rm -rf ${CYDIR}/alib-52 - -echo "Synthesis completed. Reports are available in $REPORT_DIR directory." diff --git a/flake.lock b/flake.lock new file mode 100644 index 00000000..91ac58d3 --- /dev/null +++ b/flake.lock @@ -0,0 +1,61 @@ +{ + "nodes": { + "flake-utils": { + "inputs": { + "systems": "systems" + }, + "locked": { + "lastModified": 1731533236, + "narHash": "sha256-l0KFg5HjrsfsO/JpG+r7fRrqm12kzFHyUHqHCVpMMbI=", + "owner": "numtide", + "repo": "flake-utils", + "rev": "11707dc2f618dd54ca8739b309ec4fc024de578b", + "type": "github" + }, + "original": { + "owner": "numtide", + "repo": "flake-utils", + "type": "github" + } + }, + "nixpkgs": { + "locked": { + "lastModified": 1770019141, + "narHash": "sha256-VKS4ZLNx4PNrABoB0L8KUpc1fE7CLpQXQs985tGfaCU=", + "owner": "NixOS", + "repo": "nixpkgs", + "rev": "cb369ef2efd432b3cdf8622b0ffc0a97a02f3137", + "type": "github" + }, + "original": { + "owner": "NixOS", + "ref": "nixos-unstable", + "repo": "nixpkgs", + "type": "github" + } + }, + "root": { + "inputs": { + "flake-utils": "flake-utils", + "nixpkgs": "nixpkgs" + } + }, + "systems": { + "locked": { + "lastModified": 1681028828, + "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=", + "owner": "nix-systems", + "repo": "default", + "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e", + "type": "github" + }, + "original": { + "owner": "nix-systems", + "repo": "default", + "type": "github" + } + } + }, + "root": "root", + "version": 7 +} 
diff --git a/flake.nix b/flake.nix new file mode 100644 index 00000000..6332fda7 --- /dev/null +++ b/flake.nix @@ -0,0 +1,124 @@ +{ + description = "Development environment for Buckyball with Verilator"; + + inputs = { + nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable"; + flake-utils.url = "github:numtide/flake-utils"; + }; + + outputs = { self, nixpkgs, flake-utils }@inputs: + flake-utils.lib.eachDefaultSystem + (system: + let + overlay = import ./scripts/nix/overlay.nix; + pkgs = import nixpkgs { overlays = [ overlay ]; inherit system; }; + in + { + legacyPackages = pkgs; + + # nix build + packages.default = pkgs.buildEnv { + name = "buckyball-environment"; + paths = with pkgs; [ + tools.verilator + tools.dramsim2 + tools.ccache + tools.lld + tools.yosys + tools.opensta + tools.lcov + + # RISC-V toolchain + riscv.riscv-embedded-gcc + riscv.riscv-linux-gcc + + # python environment + python.python3Packages + pkgs."pre-commit" + pkgs.clang-tools # clang-format for pre-commit (language: system) + + # Rust toolchain + rustTools.rustc + rustTools.cargo + rustTools.rustfmt + rustTools.clippy + + # bbdev dependencies + bbdev.nodejs + bbdev.pnpm + bbdev.uv + bbdev.allure + bbdev.gcc + bbdev.gnumake + bbdev.pkg-config + + # C libraries (headers + link libs) + clibs.zlib-dev + clibs.zlib + clibs.readline-dev + clibs.readline + clibs.jpeg-dev + clibs.jpeg + clibs.png-dev + clibs.png + + # Scala tools + scala.mill + scala.sbt + scala.scalafmt + scala.coursier + + # Documentation tools + doc.mdbook + doc.mdbook-linkcheck + doc.mdbook-pdf + doc.mdbook-toc + doc.mdbook-mermaid + ]; + }; + + # nix develop + devShells.default = pkgs.mkShell { + buildInputs = with pkgs; [ + clibs.zlib-dev + clibs.zlib + clibs.readline-dev + clibs.readline + clibs.jpeg-dev + clibs.jpeg + clibs.png-dev + clibs.png + ]; + shellHook = '' + if [ -d "$PWD/result/bin" ]; then + export PATH="$PWD/result/bin:$PATH" + else + echo "Warning: result/bin not found. Run 'nix build' first." 
>&2
+              fi
+
+              source "$PWD/sourceme.sh"
+
+              # Verilator build acceleration: ccache via OBJCACHE
+              export OBJCACHE=ccache
+
+              if [ -z "$NIX_QUIET" ]; then
+                echo "================= Buckyball Environment Activated ========================="
+                echo "Development environment loaded:"
+                echo "Verilator: $(verilator --version 2>&1 | head -1)"
+                echo "RISC-V Embedded GCC: $(riscv64-unknown-elf-gcc --version 2>&1 | head -1)"
+                echo "RISC-V Linux GCC: $(riscv64-unknown-linux-gnu-gcc --version 2>&1 | head -1)"
+                echo "Mill: $(mill --version 2>&1 | head -1)"
+                echo "Cargo: $(cargo --version 2>&1 | head -1)"
+                echo "npm: $(npm --version 2>&1 | head -1)"
+                echo "bbdev: $(which bbdev)"
+                echo "RISCV: $RISCV"
+                echo "Yosys: $(yosys --version 2>&1 | head -1)"
+                echo "OpenSTA: $(sta -version 2>&1 | head -1)"
+                echo "Buddy MLIR: $(which buddy-opt)"
+                echo "==========================================================================="
+              fi
+            '';
+          };
+        }
+      );
+}
diff --git a/scripts/claude/README.md b/scripts/claude/README.md
new file mode 100644
index 00000000..70c88eec
--- /dev/null
+++ b/scripts/claude/README.md
@@ -0,0 +1,97 @@
+# Buckyball Claude Code Workflow
+
+Claude Code is the interactive front end and bbdev is the execution back end. Claude calls bbdev's HTTP API through the MCP Server (server mode, with the lifecycle managed automatically).
+
+## Three Workflows
+
+| # | Trigger | Function |
+|---|------|------|
+| 1 | `/ball ` | Create a Ball: implement → register → ISA macro → CTest → build → simulation verification |
+| 2 | `/verify ` | Verify a Ball: completeness check → fill gaps → build → simulate → coverage analysis |
+| 3 | `/optimize ` | Optimize a Ball: area (yosys) + timing (OpenSTA) + latency (simulation cycles) → optimize → regression verification |
+
+## Architecture
+
+```
+User ──→ Claude Code (slash commands + CLAUDE.md)
+           │
+           ├── Read/write code: Read/Edit/Write
+           ├── Static validation: MCP validate
+           └── Build/simulation/synthesis/test: MCP bbdev_* → bbdev HTTP API
+                    │
+                    └── bbdev server (Motia workflow back end; MCP manages its lifecycle automatically)
+                         ├── POST /verilator/run      full pipeline clean→verilog→build→sim
+                         ├── POST /verilator/verilog  generate Verilog (supports --balltype)
+                         ├── POST /verilator/build    build Verilator (supports --coverage)
+                         ├── POST /verilator/sim      run simulation (supports --coverage)
+                         ├── POST /workload/build     build CTest
+                         ├──
POST /sardine/run        batch tests (supports --coverage → coverage report)
+                         └── POST /yosys/synth        Yosys synthesis + OpenSTA timing analysis
+```
+
+## File List
+
+| File | Description |
+|------|------|
+| `scripts/claude/mcp_server.py` | MCP Server: validate + bbdev API wrappers + server lifecycle management |
+| `.claude/settings.json` | MCP configuration |
+| `CLAUDE.md` | Global instructions: project structure, Blink protocol, registration invariants, tool usage |
+| `.claude/commands/ball.md` | `/ball `: full flow for creating a Ball |
+| `.claude/commands/verify.md` | `/verify `: verify a Ball |
+| `.claude/commands/optimize.md` | `/optimize `: optimize a Ball |
+| `.claude/commands/check.md` | `/check`: static validation |
+
+## MCP Server Tool List
+
+### Validation
+| Tool | Function |
+|------|------|
+| `validate` | Checks the 6 registration invariants (ballId increasing, funct7 unique, bid aligned, etc.) |
+
+### bbdev API Wrappers
+| Tool | API | Description |
+|------|-----|------|
+| `bbdev_workload_build` | `/workload/build` | Build CTest |
+| `bbdev_verilator_run` | `/verilator/run` | Full pipeline clean→verilog→build→sim |
+| `bbdev_verilator_verilog` | `/verilator/verilog` | Generate Verilog |
+| `bbdev_verilator_build` | `/verilator/build` | Build Verilator |
+| `bbdev_verilator_sim` | `/verilator/sim` | Run simulation |
+| `bbdev_sardine_run` | `/sardine/run` | Batch tests |
+| `bbdev_yosys_synth` | `/yosys/synth` | Yosys synthesis + OpenSTA |
+
+## bbdev Server Lifecycle
+
+The MCP Server manages the bbdev server automatically:
+- Started automatically on the first bbdev_* call (`pnpm dev --port `)
+- Clears the BullMQ AOF before startup to prevent old events from being replayed
+- Port chosen automatically from 5100-5500
+- Returns only after the health check passes
+- Checks liveness before every call and restarts automatically if the server has died
+- Cleaned up automatically when the MCP Server exits
+
+## Detailed Workflow Steps
+
+### `/ball `: Create a Ball
+
+1. **Requirements gathering**: read default.json/DISA.scala to determine ballId/funct7; ask the user about functionality/inBW/outBW/op2
+2. **Implement the Ball**: following existing Ball code, create wrapper/core/config under prototype/
+3. **Register**: update default.json + busRegister + DISA + DomainDecoder
+4. **ISA macro**: create the C macro file and update isa.h
+5. **CTest**: create the test .c file, register it in CMakeLists.txt, and append it to the sardine list
+6. **Verify**: validate → bbdev_workload_build → bbdev_verilator_run → PASS/FAIL
+
+### `/verify `: Verify a Ball
+
+1. **Completeness check**: confirm that the registration, ISA macro, CTest, and sardine entries are all present, and fill in whatever is missing
+2. **Build + simulate**: bbdev_workload_build → bbdev_verilator_run
+3. **Coverage analysis**: bbdev_sardine_run(coverage=true) → read the coverage report → suggest additional tests
+4.
**Failure analysis**: read the simulation log → analyze the Chisel code → propose a fix
+
+### `/optimize `: Optimize a Ball
+
+1. **Baseline measurement**: bbdev_yosys_synth (area + timing) + bbdev_verilator_run (cycle count)
+2. **Area analysis**: extract per-submodule area from hierarchy_report and identify the largest contributors
+3. **Timing/latency analysis**: critical path from timing_report + simulation cycle count + FSM source analysis
+4. **Optimization plan**: a quantified list of options (technique / area change / latency change / frequency impact / trade-off)
+5. **Implement**: modify the Chisel code
+6. **Post-optimization measurement**: rerun yosys + verilator and output a before/after comparison report
diff --git a/scripts/claude/mcp_server.py b/scripts/claude/mcp_server.py
new file mode 100644
index 00000000..a0586014
--- /dev/null
+++ b/scripts/claude/mcp_server.py
@@ -0,0 +1,370 @@
+#!/usr/bin/env python3
+"""MCP Server for Buckyball Claude Code workflow.
+
+Provides:
+- validate: static registration invariant checks
+- bbdev_* tools: wrappers around bbdev HTTP API (server mode, auto-managed lifecycle)
+
+Uses the official `mcp` Python SDK for protocol compatibility with both
+Claude Code CLI and Cursor IDE.
+"""
+
+from __future__ import annotations
+
+import atexit
+import json
+import re
+import shutil
+import socket
+import subprocess
+import time
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+from urllib.error import HTTPError
+
+from mcp.server.fastmcp import FastMCP
+
+# ---------------------------------------------------------------------------
+# Paths
+# ---------------------------------------------------------------------------
+REPO_ROOT = Path(__file__).resolve().parents[2]
+BBDEV_API_DIR = REPO_ROOT / "bbdev" / "api"
+BBDEV_LOG_DIR = REPO_ROOT / "bbdev" / "api" / "steps"
+BBDEV_SERVER_LOG = REPO_ROOT / "bbdev" / "server.log"
+
+REGISTRATION_FILES = {
+    "default_json": REPO_ROOT
+    / "arch/src/main/scala/framework/balldomain/configs/default.json",
+    "bus_register": REPO_ROOT
+    / "arch/src/main/scala/examples/toy/balldomain/bbus/busRegister.scala",
+    "disa": REPO_ROOT / "arch/src/main/scala/examples/toy/balldomain/DISA.scala",
+    "domain_decoder": REPO_ROOT
+    / "arch/src/main/scala/examples/toy/balldomain/DomainDecoder.scala",
+}
+
+#
---------------------------------------------------------------------------
+# bbdev server lifecycle
+# ---------------------------------------------------------------------------
+_bbdev_proc: Optional[subprocess.Popen] = None
+_bbdev_port: Optional[int] = None
+
+
+def _find_available_port(start: int = 5200, end: int = 5500) -> int:
+    for port in range(start, end + 1):
+        try:
+            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+                s.bind(("localhost", port))
+                return port
+        except OSError:
+            continue
+    raise RuntimeError(f"No available port in {start}-{end}")
+
+
+def _ensure_bbdev_server() -> int:
+    """Start bbdev server if not running. Returns port."""
+    global _bbdev_proc, _bbdev_port
+
+    if _bbdev_port is not None and _bbdev_proc is not None:
+        if _bbdev_proc.poll() is None and _health_check(_bbdev_port):
+            return _bbdev_port
+        # Server died, clean up
+        _stop_bbdev_server()
+
+    # Clean AOF to prevent BullMQ replaying old events
+    aof_dir = BBDEV_API_DIR / ".motia" / "appendonlydir"
+    if aof_dir.exists():
+        shutil.rmtree(aof_dir)
+
+    port = _find_available_port()
+    _log_file = open(BBDEV_SERVER_LOG, "a", encoding="utf-8")
+    _bbdev_proc = subprocess.Popen(
+        ["pnpm", "dev", "--port", str(port)],
+        cwd=str(BBDEV_API_DIR),
+        stdout=_log_file,
+        stderr=_log_file,
+    )
+    _bbdev_port = port
+
+    # Wait for server to be ready
+    for _ in range(90):
+        if _health_check(port):
+            return port
+        time.sleep(1)
+
+    _stop_bbdev_server()
+    raise RuntimeError(f"bbdev server failed to start on port {port} within 90s")
+
+
+def _health_check(port: int) -> bool:
+    try:
+        import urllib.request
+
+        req = urllib.request.Request(
+            f"http://localhost:{port}",
+            method="GET",
+        )
+        with urllib.request.urlopen(req, timeout=2) as resp:
+            return resp.status == 200
+    except Exception:
+        return False
+
+
+def _stop_bbdev_server():
+    global _bbdev_proc, _bbdev_port
+    if _bbdev_proc is not None:
+        try:
+            _bbdev_proc.terminate()
+            _bbdev_proc.wait(timeout=5)
+        except Exception:
+            try:
+
_bbdev_proc.kill()
+            except Exception:
+                pass
+        _bbdev_proc = None
+    _bbdev_port = None
+
+
+atexit.register(_stop_bbdev_server)
+
+
+def _bbdev_call(
+    endpoint: str, params: Dict[str, Any], timeout: int = 600
+) -> Dict[str, Any]:
+    """Call bbdev HTTP API. Auto-starts server if needed."""
+    port = _ensure_bbdev_server()
+    url = f"http://localhost:{port}{endpoint}"
+
+    data = json.dumps(params).encode("utf-8")
+    import urllib.request
+
+    req = urllib.request.Request(
+        url,
+        data=data,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            body = resp.read().decode("utf-8")
+            return json.loads(body)
+    except urllib.error.HTTPError as e:
+        error_body = ""
+        try:
+            error_body = e.read().decode("utf-8")
+        except Exception:
+            pass
+        return {
+            "success": False,
+            "failure": True,
+            "error": str(e),
+            "status_code": e.code,
+            "response_body": error_body,
+            "server_log": str(BBDEV_SERVER_LOG),
+        }
+    except Exception as e:
+        return {"success": False, "failure": True, "error": str(e)}
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+def _read_json(path: Path) -> Any:
+    with path.open("r", encoding="utf-8") as f:
+        return json.load(f)
+
+
+def _extract_bitpat_values(text: str) -> List[int]:
+    vals = []
+    for m in re.finditer(r'BitPat\("b([01]+)"\)', text):
+        vals.append(int(m.group(1), 2))
+    return vals
+
+
+def _extract_bus_register_names(text: str) -> List[str]:
+    names = []
+    for m in re.finditer(r'case\s+"(\w+)"', text):
+        names.append(m.group(1))
+    return names
+
+
+def _extract_decoder_bids(text: str) -> List[int]:
+    bids = []
+    for m in re.finditer(r"(\d+)\.U,\s*rs2", text):
+        bids.append(int(m.group(1)))
+    return bids
+
+
+def _fmt(payload: Any) -> str:
+    return json.dumps(payload, ensure_ascii=False, indent=2)
+
+
+#
---------------------------------------------------------------------------
+# MCP Server (using official SDK)
+# ---------------------------------------------------------------------------
+mcp = FastMCP("buckyball-dev")
+
+
+@mcp.tool()
+def validate() -> str:
+    """Check 6 registration invariants: ballNum consistency, ballId strict increment, ballId no duplicates, funct7 no duplicates, busRegister matches default.json, decoder BIDs match default.json."""
+    missing_files = []
+    for name, path in REGISTRATION_FILES.items():
+        if not path.exists():
+            missing_files.append(str(path))
+
+    if missing_files:
+        return f"ERROR: Missing registration files: {', '.join(missing_files)}"
+
+    cfg = _read_json(REGISTRATION_FILES["default_json"])
+    mappings = cfg.get("ballIdMappings", [])
+    ids = [e.get("ballId") for e in mappings]
+    names_from_json = [e.get("ballName") for e in mappings]
+
+    disa_text = REGISTRATION_FILES["disa"].read_text(encoding="utf-8")
+    funct7_values = _extract_bitpat_values(disa_text)
+
+    bus_text = REGISTRATION_FILES["bus_register"].read_text(encoding="utf-8")
+    bus_names = _extract_bus_register_names(bus_text)
+
+    decoder_text = REGISTRATION_FILES["domain_decoder"].read_text(encoding="utf-8")
+    decoder_bids = _extract_decoder_bids(decoder_text)
+
+    checks = {
+        "ballNum_matches_count": {
+            "pass": cfg.get("ballNum") == len(mappings),
+            "expected": len(mappings),
+            "actual": cfg.get("ballNum"),
+        },
+        "ballId_strict_increment": {
+            "pass": ids == list(range(len(ids))),
+            "ids": ids,
+        },
+        "ballId_no_duplicates": {
+            "pass": len(ids) == len(set(ids)),
+            "duplicates": sorted(x for x in ids if ids.count(x) > 1),
+        },
+        "funct7_no_duplicates": {
+            "pass": len(funct7_values) == len(set(funct7_values)),
+            "duplicates": sorted(
+                x for x in funct7_values if funct7_values.count(x) > 1
+            ),
+        },
+        "busRegister_matches_json": {
+            "pass": set(bus_names) == set(names_from_json),
+            "in_json_not_bus": sorted(set(names_from_json) - set(bus_names)),
+            "in_bus_not_json":
sorted(set(bus_names) - set(names_from_json)),
+        },
+        "decoder_bids_match_json": {
+            "pass": sorted(decoder_bids) == sorted(ids),
+            "decoder_bids": sorted(decoder_bids),
+            "json_ids": sorted(ids),
+        },
+    }
+
+    all_passed = all(c["pass"] for c in checks.values())
+    return _fmt({"passed": all_passed, "checks": checks})
+
+
+@mcp.tool()
+def bbdev_workload_build() -> str:
+    """Compile CTest workloads (bb-tests). Calls bbdev POST /workload/build."""
+    result = _bbdev_call("/workload/build", {}, timeout=120)
+    return _fmt(result)
+
+
+@mcp.tool()
+def bbdev_verilator_run(
+    binary: str,
+    config: str = "sims.verilator.BuckyballToyVerilatorConfig",
+    batch: bool = True,
+    coverage: bool = False,
+    jobs: Optional[int] = None,
+) -> str:
+    """Full verilator pipeline: clean -> verilog -> build -> sim. Calls bbdev POST /verilator/run."""
+    api_params: Dict[str, Any] = {
+        "binary": binary,
+        "config": config,
+        "batch": batch,
+        "coverage": coverage,
+    }
+    if jobs is not None:
+        api_params["jobs"] = jobs
+    result = _bbdev_call("/verilator/run", api_params, timeout=1200)
+    return _fmt(result)
+
+
+@mcp.tool()
+def bbdev_verilator_verilog(
+    config: Optional[str] = None,
+    balltype: Optional[str] = None,
+) -> str:
+    """Generate Verilog from Chisel. Supports --balltype for single Ball elaboration. Calls bbdev POST /verilator/verilog."""
+    api_params: Dict[str, Any] = {}
+    if config:
+        api_params["config"] = config
+    if balltype:
+        api_params["balltype"] = balltype
+    result = _bbdev_call("/verilator/verilog", api_params, timeout=600)
+    return _fmt(result)
+
+
+@mcp.tool()
+def bbdev_verilator_build(
+    jobs: int = 16,
+    coverage: bool = False,
+) -> str:
+    """Build verilator simulation executable.
Calls bbdev POST /verilator/build."""
+    api_params: Dict[str, Any] = {"jobs": jobs}
+    if coverage:
+        api_params["coverage"] = True
+    result = _bbdev_call("/verilator/build", api_params, timeout=600)
+    return _fmt(result)
+
+
+@mcp.tool()
+def bbdev_verilator_sim(
+    binary: str,
+    batch: bool = True,
+    coverage: bool = False,
+) -> str:
+    """Run verilator simulation (assumes already built). Calls bbdev POST /verilator/sim."""
+    api_params: Dict[str, Any] = {
+        "binary": binary,
+        "batch": batch,
+    }
+    if coverage:
+        api_params["coverage"] = True
+    result = _bbdev_call("/verilator/sim", api_params, timeout=1200)
+    return _fmt(result)
+
+
+@mcp.tool()
+def bbdev_sardine_run(
+    workload: str = "ctest",
+    coverage: bool = False,
+) -> str:
+    """Run sardine batch tests. Calls bbdev POST /sardine/run. With coverage=true, generates coverage report at bb-tests/sardine/reports/coverage/."""
+    api_params: Dict[str, Any] = {"workload": workload}
+    if coverage:
+        api_params["coverage"] = True
+    result = _bbdev_call("/sardine/run", api_params, timeout=1200)
+    return _fmt(result)
+
+
+@mcp.tool()
+def bbdev_yosys_synth(
+    top: Optional[str] = None,
+    config: Optional[str] = None,
+) -> str:
+    """Run Yosys synthesis for area estimation + OpenSTA timing analysis. Generates hierarchy_report.txt, area_report.txt, and timing_report.txt in bbdev/api/steps/yosys/log/.
Calls bbdev POST /yosys/synth.""" + api_params: Dict[str, Any] = {} + if top: + api_params["top"] = top + if config: + api_params["config"] = config + result = _bbdev_call("/yosys/synth", api_params, timeout=600) + return _fmt(result) + + +if __name__ == "__main__": + mcp.run(transport="stdio") diff --git a/scripts/docker/.dockerignore b/scripts/docker/.dockerignore deleted file mode 100644 index dd935bda..00000000 --- a/scripts/docker/.dockerignore +++ /dev/null @@ -1,60 +0,0 @@ -# Git -.git -.gitignore -.gitmodules - -# Docker -docker/ -Dockerfile* -docker-compose* - -# IDE -.vscode/ -.idea/ -*.swp -*.swo - -# OS -.DS_Store -Thumbs.db - -# Logs -*.log -logs/ - -# Runtime data -pids -*.pid -*.seed -*.pid.lock - -# Coverage directory used by tools like istanbul -coverage/ - -# Dependency directories -node_modules/ -jspm_packages/ - -# Optional npm cache directory -.npm - -# Optional REPL history -.node_repl_history - -# Output of 'npm pack' -*.tgz - -# Yarn Integrity file -.yarn-integrity - -# dotenv environment variables file -.env - -# Build outputs -dist/ -build/ -target/ - -# Temporary folders -tmp/ -temp/ diff --git a/scripts/docker/Dockerfile b/scripts/docker/Dockerfile deleted file mode 100644 index d149120f..00000000 --- a/scripts/docker/Dockerfile +++ /dev/null @@ -1,43 +0,0 @@ -FROM ubuntu:20.04 - -# Set environment variables -ENV DEBIAN_FRONTEND=noninteractive -ENV PYTHONUNBUFFERED=1 - -# Install system dependencies -RUN apt-get update && apt-get install -y \ - curl \ - wget \ - git \ - build-essential \ - python3 \ - python3-pip \ - python3-venv \ - nodejs \ - npm \ - openjdk-11-jdk \ - scala \ - && rm -rf /var/lib/apt/lists/* - -# Set working directory -WORKDIR /buckyball - -# Copy project files -COPY . 
/buckyball/ - -# Install Python dependencies (if requirements.txt exists) -# RUN if [ -f requirements.txt ]; then pip3 install -r requirements.txt; fi - -# Install Node.js dependencies (if package.json exists) -# RUN if [ -f package.json ]; then npm install; fi - -# Create a non-root user -RUN useradd -m -u 1000 bb && \ - chown -R bb:bb /buckyball -USER bb - -# Expose common development ports -EXPOSE 3000 8080 8000 - -# Set default command -CMD ["/bin/bash"] diff --git a/scripts/docker/README.md b/scripts/docker/README.md deleted file mode 100644 index 2e41b5d5..00000000 --- a/scripts/docker/README.md +++ /dev/null @@ -1,80 +0,0 @@ -# Buckyball Docker Environment - -This directory contains Docker configurations for running the Buckyball project. - -## File Description - -- `Dockerfile`: Main Docker image build file -- `docker-compose.yml`: Docker Compose configuration file for managing containers -- `.dockerignore`: Specifies which files should not be copied into the Docker container -- `README.md`: This documentation file - -## Environment Requirements - -- Docker Engine 20.10+ -- Docker Compose 2.0+ - -## Quick Start - -### 1. Build Image - -```bash -# Execute from project root directory -docker build -f docker/Dockerfile -t buckyball-dev . -``` - -### 2. Start Environment Using Docker Compose - -```bash -# Execute from docker directory -cd docker -docker-compose up -d -``` - -### 3. Enter Container - -```bash -# Enter running container -docker exec -it buckyball-dev bash -``` - -## Detailed Usage Instructions - -### Using Docker Compose - -1. **Start Environment**: -```bash -cd docker -docker-compose up -d -``` - -2. **Check Container Status**: -```bash -docker-compose ps -``` - -3. **Enter Container**: -```bash -docker-compose exec buckyball bash -``` - -4. **Stop Environment**: -```bash -docker-compose down -``` - -5. **Rebuild and Start**: -```bash -docker-compose up --build -d -``` - -## Common Issues - -1. 
When executing `docker-compose up -d`, the following error occurs: -``` -permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.45/containers/json?all=1&filters=%7B%22label%22%3A%7B%22com.docker.compose.config-hash%22%3Atrue%2C%22com.docker.compose.project%3Ddocker%22%3Atrue%7D%7D": dial unix /var/run/docker.sock: connect: permission denied -``` -Execute the following command and logout and login again: -``` -sudo usermod -aG docker $USER -``` diff --git a/scripts/docker/docker-compose.yml b/scripts/docker/docker-compose.yml deleted file mode 100644 index 1ef2681e..00000000 --- a/scripts/docker/docker-compose.yml +++ /dev/null @@ -1,20 +0,0 @@ -services: - buckyball: - build: - context: .. - dockerfile: docker/Dockerfile - container_name: bb-dev - volumes: - - ..:/buckyball - ports: - - "3000:3000" - - "8080:8080" - - "8000:8000" - environment: - - NODE_ENV=development - - PYTHONPATH=/buckyball - working_dir: /buckyball - stdin_open: true - tty: true - command: /bin/bash - image: buckyball:latest diff --git a/scripts/env-exit.sh b/scripts/env-exit.sh deleted file mode 100755 index 9ea2ddc1..00000000 --- a/scripts/env-exit.sh +++ /dev/null @@ -1,5 +0,0 @@ -# conda exit -# conda deactivate - -# path -export PATH=$(echo $PATH | tr ':' '\n' | grep -v buckyball | tr '\n' ':' | sed 's/:$//') diff --git a/scripts/init-as-lib.sh b/scripts/init-as-lib.sh deleted file mode 100755 index e69de29b..00000000 diff --git a/scripts/init.sh b/scripts/init.sh deleted file mode 100755 index 6cc81334..00000000 --- a/scripts/init.sh +++ /dev/null @@ -1,184 +0,0 @@ -#!/usr/bin/env bash - -# exit script if any command fails -set -e -set -o pipefail - -BBDIR=$(git rev-parse --show-toplevel) - -source ${BBDIR}/scripts/utils.sh - -usage() { - echo "Usage: ${0} [OPTIONS] " - echo "" - echo "Helper script to fully initialize repository that wraps other scripts." 
- echo "By default it initializes/installs things in the following order:" - echo " 1. Setup workflow management system" - echo " 2. Buckyball submodules" - echo " 3. Toolchain installation" - echo " 4. Compiler (buddy-mlir) pre-compile sources" - echo " 5. bb-tests (workloads) pre-compile sources" - echo " 6. Install Chipyard and Firesim" - echo " 7. Buckyball pre-compile sources" - echo " 8. Setup document management system" - echo " 9. Install func-sim" - echo " 10. Runs repository clean-up" - echo "" - echo "**See below for options to skip parts of the setup. Skipping parts of the setup is not guaranteed to be tested/working.**" - echo "" - echo "Options" - echo " --help -h : Display this message" - echo " --verbose -v : Verbose printout" - echo " --skip -s N : Skip step N in the list above. Use multiple times to skip multiple steps ('-s N -s M ...')." - echo " --admin : Add this option to install the admin tools (You dont need do this)." - echo " --conda-env-name : Add this option to specify the conda environment name. Default is buckyball." - - exit "$1" -} - -SKIP_LIST=() -VERBOSE_FLAG="" -ADMIN_MODE=false -CONDA_ENV_NAME="buckyball" - -while [ "$1" != "" ]; -do - case $1 in - -h | --help ) - usage 3 ;; - --verbose | -v) - VERBOSE_FLAG=$1 - set -x ;; - --skip | -s) - shift - SKIP_LIST+=(${1}) ;; - --admin) - ADMIN_MODE=true ;; - --conda-env-name) - shift - CONDA_ENV_NAME=${1} ;; - * ) - echo "Error: invalid option $1" >&2 - usage 1 ;; - esac - shift -done - -# return true if the arg is not found in the SKIP_LIST -run_step() { - local value=$1 - [[ ! 
" ${SKIP_LIST[*]} " =~ " ${value} " ]] -} - -function begin_step -{ - thisStepNum=$1; - thisStepDesc=$2; - - # Color codes - local BLUE='\033[0;34m' - local GREEN='\033[0;32m' - local YELLOW='\033[1;33m' - local NC='\033[0m' # No Color - - echo -e "${BLUE} =========================================================================" - echo -e "${GREEN} ==== BUCKYBALL SETUP STEP ${YELLOW}$thisStepNum${GREEN}: ${YELLOW}$thisStepDesc${GREEN} " - echo -e "${BLUE} =========================================================================" - echo -e "${NC}" -} - -if run_step "0"; then - begin_step "0" "init env.sh" - replace_content ${BBDIR}/env.sh base-conda-setup "source $(conda info --base)/etc/profile.d/conda.sh" -fi - -if run_step "1"; then - begin_step "1" "submodules init" - git submodule update --init - replace_content ${BBDIR}/arch/thirdparty/chipyard/env.sh base-conda-setup "source $(conda info --base)/etc/profile.d/conda.sh" -fi - -# setup and install chipyard environment -if run_step "2"; then - begin_step "2" "Chipyard environment setup" - cd ${BBDIR}/arch/thirdparty/chipyard && ./build-setup.sh --conda-env-name ${CONDA_ENV_NAME} - cp ${BBDIR}/arch/thirdparty/chipyard/env.sh ${BBDIR}/env.sh - replace_content ${BBDIR}/env.sh build-setup-conda "conda activate ${CONDA_ENV_NAME} -source ${BBDIR}/arch/thirdparty/chipyard/scripts/fix-open-files.sh" - replace_content ${BBDIR}/env.sh bb-dir-helper "export BB_DIR=${BBDIR}" -fi - -if run_step "3"; then - begin_step "3" "Compiler (buddy-mlir) pre-compile sources" - cd ${BBDIR} - source ${BBDIR}/env.sh - ./scripts/install-compiler.sh -fi - -if run_step "4"; then - begin_step "4" "Install bebop" - source ${BBDIR}/env.sh - # ${BBDIR}/scripts/install-bebop.sh - echo "bebop is not installed" -fi - -if run_step "5"; then - begin_step "5" "bb-tests (workloads) pre-compile sources" - source ${BBDIR}/env.sh - cd ${BBDIR}/bb-tests - mkdir -p build && cd build - cmake -G Ninja ../ - ninja -j$(nproc) -fi - -if run_step "6"; then - 
begin_step "6" "Install requirements for sardine" - source ${BBDIR}/env.sh - pip install -r ${BBDIR}/bb-tests/sardine/requirements.txt - npm install --prefix ${BBDIR}/bb-tests/sardine allure-commandline -fi - - -if run_step "7"; then - begin_step "7" "Install document management system" - source ${BBDIR}/env.sh - ${BBDIR}/scripts/install-doc.sh -fi - -if run_step "8"; then - begin_step "8" "Init workflow management system" - source ${BBDIR}/env.sh - ${BBDIR}/scripts/install-workflow.sh -fi - -if run_step "9"; then - begin_step "9" "Install mill" - source ${BBDIR}/env.sh - cd ${BBDIR}/tools/mill - ./install-mill.sh -fi - -# if run_step "10"; then -# begin_step "10" "pre-compile buckyball arch code" -# source ${BBDIR}/env.sh -# cd ${BBDIR}/ -# bbdev verilator --verilog -# fi - -if run_step "11"; then - begin_step "11" "Install pre-commit" - source ${BBDIR}/env.sh - # ${BBDIR}/scripts/install-pre-commit.sh - pip install pre-commit - cd ${BBDIR} - pre-commit install -fi - -# if run_step "12"; then -# begin_step "12" "Install verify tools" -# source ${BBDIR}/env.sh -# # ${BBDIR}/scripts/install-verify-tools.sh -# echo "veriy toolchain is not installed" -# fi - -begin_step "END" "Setup completed successfully!" 
diff --git a/scripts/install-bebop.sh b/scripts/install-bebop.sh deleted file mode 100755 index 6e89ddc0..00000000 --- a/scripts/install-bebop.sh +++ /dev/null @@ -1,9 +0,0 @@ -#!/bin/bash - -set -e - -BBDIR=$(git rev-parse --show-toplevel) -source ${BBDIR}/scripts/utils.sh - -cd $BBDIR/bebop -./host/install.sh diff --git a/scripts/install-compiler.sh b/scripts/install-compiler.sh deleted file mode 100755 index 2a5e11eb..00000000 --- a/scripts/install-compiler.sh +++ /dev/null @@ -1,50 +0,0 @@ -#!/usr/bin/env bash - -# exit script if any command fails -set -e -set -o pipefail - -BBDIR=$(git rev-parse --show-toplevel) -BUDDY_MLIR_DIR=${BBDIR}/compiler - -# get helpful utilities -source ${BBDIR}/scripts/utils.sh - -source ${BBDIR}/env.sh -pip install -r ${BUDDY_MLIR_DIR}/requirements.txt - -cd ${BUDDY_MLIR_DIR} -git submodule update --init - -mkdir -p llvm/build && cd llvm/build -cmake -G Ninja ../llvm \ - -DLLVM_ENABLE_PROJECTS="mlir;clang" \ - -DLLVM_TARGETS_TO_BUILD="host;RISCV" \ - -DLLVM_ENABLE_ASSERTIONS=ON \ - -DCMAKE_BUILD_TYPE=RELEASE \ - -DMLIR_ENABLE_BINDINGS_PYTHON=ON \ - -DPython3_EXECUTABLE=$(which python3) -ninja #check-mlir check-clang - -cd ${BUDDY_MLIR_DIR} -mkdir -p build && cd build -cmake -G Ninja .. 
\ - -DMLIR_DIR=$PWD/../llvm/build/lib/cmake/mlir \ - -DLLVM_DIR=$PWD/../llvm/build/lib/cmake/llvm \ - -DLLVM_ENABLE_ASSERTIONS=ON \ - -DCMAKE_BUILD_TYPE=RELEASE \ - -DBUDDY_MLIR_ENABLE_PYTHON_PACKAGES=ON \ - -DPython3_EXECUTABLE=$(which python3) \ - -DPython_EXECUTABLE=$(which python3) \ - -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -ninja -#ninja check-buddy - -replace_content ${BBDIR}/env.sh install-compiler "# line auto-generated by $0 -export BUDDY_MLIR_BUILD_DIR=${BUDDY_MLIR_DIR}/build -export LLVM_MLIR_BUILD_DIR=${BUDDY_MLIR_DIR}/llvm/build -export PYTHONPATH=${BUDDY_MLIR_DIR}/llvm/build/tools/mlir/python_packages/mlir_core:${BUDDY_MLIR_DIR}/build/python_packages:\${PYTHONPATH}" - -export BUDDY_MLIR_BUILD_DIR=${BUDDY_MLIR_DIR}/build -export LLVM_MLIR_BUILD_DIR=${BUDDY_MLIR_DIR}/llvm/build -export PYTHONPATH=${BUDDY_MLIR_DIR}/llvm/build/tools/mlir/python_packages/mlir_core:${BUDDY_MLIR_DIR}/build/python_packages:${PYTHONPATH} diff --git a/scripts/install-doc.sh b/scripts/install-doc.sh deleted file mode 100755 index 278d4441..00000000 --- a/scripts/install-doc.sh +++ /dev/null @@ -1,23 +0,0 @@ -#!/bin/bash - -set -e - -BBDIR=$(git rev-parse --show-toplevel) -source ${BBDIR}/scripts/utils.sh - -# install rustup -curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y -source $HOME/.cargo/env - -replace_content ${BBDIR}/env.sh bb-doc-server "source $HOME/.cargo/env" - -# install mdbook and mdbook-linkcheck -cargo install mdbook -cargo install mdbook-linkcheck -cargo install mdbook-pdf -cargo install mdbook-toc -cargo install mdbook-mermaid - -mdbook-mermaid install ${BBDIR}/docs/bb-note/ - -# mdbook serve --open -p 3001 diff --git a/scripts/install-pre-commit.sh b/scripts/install-pre-commit.sh deleted file mode 100755 index 5c17b388..00000000 --- a/scripts/install-pre-commit.sh +++ /dev/null @@ -1,30 +0,0 @@ -#!/usr/bin/env bash -set -e - -BBDIR=$(git rev-parse --show-toplevel) -source ${BBDIR}/scripts/utils.sh - -# Check if scalafmt is already installed 
-if command -v scalafmt &> /dev/null; then - echo "scalafmt is already installed" - scalafmt --version - exit 0 -fi - -# If not, install coursier and scalafmt -if ! command -v cs &> /dev/null; then - echo "Installing coursier..." - curl -fL https://github.com/coursier/launchers/raw/master/cs-x86_64-pc-linux.gz | gzip -d > cs - chmod +x cs && ./cs setup --yes - rm -f cs -fi - -replace_content ${BBDIR}/env.sh install-pre-commit "export PATH=$HOME/.local/share/coursier/bin:\$PATH" - -# Install scalafmt -echo "Installing scalafmt..." -cs install scalafmt - -echo "Installation complete!" -echo "scalafmt version:" -scalafmt --version diff --git a/scripts/install-verify-tools.sh b/scripts/install-verify-tools.sh deleted file mode 100755 index 1e7c056a..00000000 --- a/scripts/install-verify-tools.sh +++ /dev/null @@ -1,36 +0,0 @@ -#!/bin/bash - -BBDIR=$(git rev-parse --show-toplevel) -source ${BBDIR}/scripts/utils.sh - -if [ -z "$CONDA_PREFIX" ]; then - echo "CONDA environment is not set, please source the env.sh" - exit 1 -else - PREFIX=$CONDA_PREFIX/picker - echo "Picker will be installed to $PREFIX" -fi - -# ================================================================== -# install picker -# ================================================================== - -# === install dependencies for picker -mkdir -p $BBDIR/tmp && cd $BBDIR/tmp -wget "https://github.com/chipsalliance/verible/releases/download/v0.0-4007-g98bdb38a/verible-v0.0-4007-g98bdb38a-linux-static-x86_64.tar.gz" -tar -xzf verible-v0.0-4007-g98bdb38a-linux-static-x86_64.tar.gz -mv verible-v0.0-4007-g98bdb38a/bin/* $PREFIX -cd $BBDIR && rm -rf $BBDIR/tmp - -conda install swig=4.2.0 -y - -# === install picker -cd $BBDIR/thirdparty/picker -make init - -cd $BBDIR/thirdparty/picker -make -j$(nproc) ARGS="-DCMAKE_INSTALL_PREFIX=$PREFIX" -sudo -E make install - -replace_content ${BBDIR}/env.sh picker-install "\ -export PATH=$PREFIX/bin:\$PATH" diff --git a/scripts/install-workflow.sh b/scripts/install-workflow.sh 
deleted file mode 100755 index 0b64e17e..00000000 --- a/scripts/install-workflow.sh +++ /dev/null @@ -1,50 +0,0 @@ -#!/bin/bash - -# exit script if any command fails -set -e -set -o pipefail - -BBDIR=$(git rev-parse --show-toplevel) - -source ${BBDIR}/scripts/utils.sh - -cd ${BBDIR} -replace_content ${BBDIR}/env.sh install-workflow "export PATH=${BBDIR}/workflow:\$PATH" -source ${BBDIR}/env.sh - -# lower node version is not supported for motia -conda install -c conda-forge nodejs=20 -y -# conda install -c conda-forge redis -y -pip install python-dotenv -pip install httpx - -cd ${BBDIR}/workflow -ln -s ${CONDA_PREFIX} ./python_modules || true - -cd ${BBDIR}/workflow -# if package.json does not exist, create and install motia -# if [ ! -f package.json ]; then - # install system dependencies for compiling Redis (redis-memory-server requires) -# if command -v apt-get &> /dev/null; then -# sudo apt-get update -qq -# sudo apt-get install -y libsystemd-dev build-essential || true -# fi -# npm init -y -# npm install motia@0.13.0-beta.161 -# fi -export USE_SYSTEMD=no -npm init -y -npm install motia@0.13.0-beta.161 -npx motia create -t python - -cd ${BBDIR}/workflow/steps && rm *.{py,json} || true -cd ${BBDIR}/workflow/steps && rm -r src/ || true -cd ${BBDIR}/workflow/steps && rm -r petstore/ || true -cd ${BBDIR}/workflow && rm -r src/ || true -cd ${BBDIR}/workflow && rm -r tutorial/ || true -cd ${BBDIR}/workflow && rm *.{md,tsx,rdb} || true - -# install MCP -pip install mcp -pip install redis -pip install httpx_sse diff --git a/scripts/nix/build-all.sh b/scripts/nix/build-all.sh new file mode 100755 index 00000000..41995485 --- /dev/null +++ b/scripts/nix/build-all.sh @@ -0,0 +1,163 @@ +#!/usr/bin/env bash + +# exit script if any command fails +set -e +set -o pipefail + +BBDIR=$(git rev-parse --show-toplevel) + +usage() { + echo "Usage: ${0} [OPTIONS] " + echo "" + echo "Helper script to fully initialize repository that wraps other scripts."
+ echo "By default it initializes/installs things in the following order:" + echo " 1. bbdev install" + echo " 2. Compiler installation" + echo " 3. RTL pre-compile sources" + echo " 4. bb-tests pre-compile sources" + echo " 5. waveform-mcp build" + echo " 6. pre-commit hooks installation" + echo "" + echo "**See below for options to skip parts of the setup. Skipping parts of the setup is not guaranteed to be tested/working.**" + echo "" + echo "Options" + echo " --help -h : Display this message" + echo " --skip -s N : Skip step N in the list above. Use multiple times to skip multiple steps ('-s N -s M ...')." + exit "$1" +} + +SKIP_LIST=() +VERBOSE_FLAG="" +INSTALL_IN_NIX=0 + +while [ "$1" != "" ]; +do + case $1 in + -h | --help ) + usage 3 ;; + --verbose | -v) + VERBOSE_FLAG=$1 + set -x ;; + --skip | -s) + shift + SKIP_LIST+=(${1}) ;; + --install-in-nix) + INSTALL_IN_NIX=1 ;; + * ) + echo "Error: invalid option $1" >&2 + usage 1 ;; + esac + shift +done + +# return true if the arg is not found in the SKIP_LIST +run_step() { + local value=$1 + [[ ! 
" ${SKIP_LIST[*]} " =~ " ${value} " ]] +} + +function begin_step +{ + thisStepNum=$1; + thisStepDesc=$2; + + # Color codes + local BLUE='\033[0;34m' + local GREEN='\033[0;32m' + local YELLOW='\033[1;33m' + local NC='\033[0m' # No Color + + echo -e "${BLUE} =========================================================================" + echo -e "${GREEN} ==== BUCKYBALL SETUP STEP ${YELLOW}$thisStepNum${GREEN}: ${YELLOW}$thisStepDesc${GREEN} " + echo -e "${BLUE} =========================================================================" + echo -e "${NC}" +} + +begin_step "0-1" "submodules init" +git submodule update --init +# fpga/fpga-shells is needed by palladium +cd ${BBDIR}/arch/thirdparty/chipyard && git submodule update --init fpga/fpga-shells generators/* tools/* sims/firesim + +begin_step "0-2" "Nix environment setup" +cd ${BBDIR} +nix build + +if [ "${INSTALL_IN_NIX}" != "1" ]; then + SKIP_ARGS="" + for skip in "${SKIP_LIST[@]}"; do + SKIP_ARGS="${SKIP_ARGS} -s ${skip}" + done + exec nix develop --command bash ${BBDIR}/scripts/nix/build-all.sh --install-in-nix ${SKIP_ARGS} ${VERBOSE_FLAG} +fi + +if run_step "1"; then + begin_step "1" "bbdev install" + + # Create python_modules venv FIRST (uses Nix Python with pydantic/requests) + # Motia postinstall would run pip install -> SOCKS proxy fails. Use Nix instead. + echo "Setting up bbdev Python environment (from Nix)..." + python3 -m venv --without-pip --system-site-packages "${BBDIR}/bbdev/api/python_modules" + + echo "Installing bbdev node dependencies..." 
+ cd ${BBDIR}/bbdev/api + pnpm install --ignore-scripts --frozen-lockfile 2>/dev/null +fi + +if run_step "2"; then + begin_step "2" "Compiler installation" + cd ${BBDIR}/compiler + git submodule update --init llvm + + mkdir -p llvm/build && cd llvm/build + cmake -G Ninja ../llvm \ + -DLLVM_ENABLE_PROJECTS="mlir;clang" \ + -DLLVM_TARGETS_TO_BUILD="host;RISCV" \ + -DLLVM_ENABLE_ASSERTIONS=ON \ + -DCMAKE_BUILD_TYPE=RELEASE \ + -DMLIR_ENABLE_BINDINGS_PYTHON=ON \ + -DPython3_EXECUTABLE=$(which python3) + ninja #check-mlir check-clang + + cd ${BBDIR}/compiler + mkdir -p build && cd build + cmake -G Ninja .. \ + -DMLIR_DIR=$PWD/../llvm/build/lib/cmake/mlir \ + -DLLVM_DIR=$PWD/../llvm/build/lib/cmake/llvm \ + -DLLVM_ENABLE_ASSERTIONS=ON \ + -DCMAKE_BUILD_TYPE=RELEASE \ + -DBUDDY_MLIR_ENABLE_PYTHON_PACKAGES=ON \ + -DPython3_EXECUTABLE=$(which python3) \ + -DPython_EXECUTABLE=$(which python3) \ + -DCMAKE_EXPORT_COMPILE_COMMANDS=ON + ninja # check-buddy +fi + +if run_step "3"; then + begin_step "3" "arch pre-compile sources" + # Generate firrtl2 ANTLR and compile firrtl2 in chipyard first (avoids antlr missing when arch compiles chipyard) + cd ${BBDIR}/arch/thirdparty/chipyard + sbt -J-Xms512m -J-Xmx4g -J-XX:+UseG1GC "firrtl2/compile" + cd ${BBDIR}/arch + bbdev verilator --verilog '--config sims.verilator.BuckyballToyVerilatorConfig' +fi + +if run_step "4"; then + begin_step "4" "bb-tests pre-compile sources" + bbdev workload --build +fi + +if run_step "5"; then + begin_step "5" "waveform-mcp build" + cd ${BBDIR}/thirdparty/waveform-mcp + cargo build --release +fi + +if run_step "6"; then + begin_step "6" "pre-commit hooks installation" + cd ${BBDIR} + pre-commit install + # Replace with wrapper so git commit gets nix env (result/bin in PATH) + cp "${BBDIR}/scripts/pre-commit-hook.sh" "${BBDIR}/.git/hooks/pre-commit" +fi + +begin_step "END" "Setup completed successfully!" 
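Note on the skip logic in scripts/nix/build-all.sh above: run_step pads both the joined SKIP_LIST and the step number with spaces before matching, so skipping step 1 would not also skip a step numbered 11. The following is a minimal bash sketch of that check in isolation. The SKIP_LIST values and the loop over steps 1 to 6 are sample inputs chosen for illustration, not taken from a real invocation.

```shell
#!/usr/bin/env bash
# Standalone sketch of the skip-list check from build-all.sh.
# SKIP_LIST here is a sample value (as if "-s 2 -s 5" had been passed).
SKIP_LIST=(2 5)

# Returns success (exit status 0) when the step is NOT in SKIP_LIST.
# Both sides are wrapped in spaces so only whole entries match; the quoted
# right-hand side of =~ is treated as a literal string, not a regex.
run_step() {
  local value=$1
  [[ ! " ${SKIP_LIST[*]} " =~ " ${value} " ]]
}

# Walk the six numbered steps and report which ones would execute.
for step in 1 2 3 4 5 6; do
  if run_step "$step"; then
    echo "run step $step"
  else
    echo "skip step $step"
  fi
done
```

Because the script re-executes itself inside nix develop, the same list is rebuilt as repeated "-s N" arguments and parsed again, so the check behaves the same in both passes.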
diff --git a/scripts/nix/build-env-bbdev.nix b/scripts/nix/build-env-bbdev.nix new file mode 100644 index 00000000..f38471e8 --- /dev/null +++ b/scripts/nix/build-env-bbdev.nix @@ -0,0 +1,17 @@ +{ pkgs }: + +{ + # Node.js environment (for bbdev Motia backend) + nodejs = pkgs.nodejs_22; + pnpm = pkgs.nodePackages.pnpm; + + # UV (motia install uses it for Python deps; avoid auto-install via broken pip) + uv = pkgs.uv; + # Allure CLI for sardine Allure reports + allure = pkgs.allure; + + # Build tools (for compiling native modules) + gcc = pkgs.gcc; + gnumake = pkgs.gnumake; + pkg-config = pkgs.pkg-config; +} diff --git a/scripts/nix/build-env-clibs.nix b/scripts/nix/build-env-clibs.nix new file mode 100644 index 00000000..90a2f9c4 --- /dev/null +++ b/scripts/nix/build-env-clibs.nix @@ -0,0 +1,16 @@ +{ pkgs }: + +{ + # C libraries needed by Verilator build + zlib-dev = pkgs.zlib.dev; + zlib = pkgs.zlib; + # C libraries needed by bdb debugger + readline-dev = pkgs.readline.dev; + readline = pkgs.readline; + # buddy DIP imgcodecs (grfmt_jpeg.h) + jpeg-dev = pkgs.libjpeg.dev; + jpeg = pkgs.libjpeg; + # buddy DIP imgcodecs (grfmt_png.h) + png-dev = pkgs.libpng.dev; + png = pkgs.libpng; +} diff --git a/scripts/nix/build-env-doc.nix b/scripts/nix/build-env-doc.nix new file mode 100644 index 00000000..02799b16 --- /dev/null +++ b/scripts/nix/build-env-doc.nix @@ -0,0 +1,12 @@ +{ pkgs }: + +{ + # Rust documentation generator + mdbook = pkgs.mdbook; + + # mdbook plugins + mdbook-linkcheck = pkgs.mdbook-linkcheck; + mdbook-pdf = pkgs.mdbook-pdf; + mdbook-toc = pkgs.mdbook-toc; + mdbook-mermaid = pkgs.mdbook-mermaid; +} diff --git a/scripts/nix/build-env-python.nix b/scripts/nix/build-env-python.nix new file mode 100644 index 00000000..80634b18 --- /dev/null +++ b/scripts/nix/build-env-python.nix @@ -0,0 +1,54 @@ +{ pkgs }: + +{ + # Python and pip packages + python3 = pkgs.python3; + + # Python packages + python3Packages = pkgs.python3.withPackages (ps: with ps; [ + # bbdev + 
pydantic + python-dotenv + httpx + mcp + redis + httpx-sse + requests + pysocks + allure-pytest + matplotlib + + # pre-commit hooks (language: system use) + black + flake8 + pre-commit-hooks + + # compiler + torch + numpy + transformers + tokenizers + sentencepiece + accelerate + protobuf + pybind11 + torchvision + tabulate + datasets + soundfile + librosa + pyyaml + certifi + idna + diffusers + nanobind + + # testing (sardine) + pytest + pytest-html + pytest-xdist + pytest-cov + allure-pytest + colorlog + ]); +} diff --git a/scripts/nix/build-env-riscv.nix b/scripts/nix/build-env-riscv.nix new file mode 100644 index 00000000..ea5c7eef --- /dev/null +++ b/scripts/nix/build-env-riscv.nix @@ -0,0 +1,68 @@ +{ pkgs }: + +let + # Build newlib-nano with size optimization flags + newlib-nano = pkgs.pkgsCross.riscv64-embedded.newlib.overrideAttrs (oldAttrs: { + pname = "newlib-nano"; + configureFlags = oldAttrs.configureFlags or [] ++ [ + "--enable-newlib-nano-malloc" + "--enable-newlib-nano-formatted-io" + "--enable-newlib-reent-small" + "--disable-newlib-fvwrite-in-streamio" + "--disable-newlib-fseek-optimization" + "--disable-newlib-wide-orient" + "--disable-newlib-unbuf-stream-opt" + "--enable-lite-exit" + "--enable-newlib-global-atexit" + ]; + CFLAGS_FOR_TARGET = "-Os -ffunction-sections -fdata-sections -mcmodel=medany"; + }); + + # Create a custom cross system with newlib-nano + riscv64EmbeddedWithNano = pkgs.pkgsCross.riscv64-embedded.stdenv.targetPlatform // { + libc = "newlib-nano"; + }; + + # Build the toolchain with the custom platform + pkgsCrossWithNano = import pkgs.path { + inherit (pkgs) system; + crossSystem = riscv64EmbeddedWithNano; + overlays = [ + (self: super: { + newlib = newlib-nano; + }) + ]; + }; +in + +{ + # RISC-V embedded toolchain (bare metal), with riscv64-unknown-elf-* symlinks + # Uses newlib-nano; baremetal runtime provided by bb-tests/workloads/src/CTest/toy/crt0.S + riscv-embedded-gcc = pkgs.symlinkJoin { + name = 
"riscv64-unknown-elf-gcc"; + paths = [ pkgsCrossWithNano.buildPackages.gcc ]; + postBuild = '' + cd $out/bin + for f in riscv64-none-elf-*; do + [ -e "$f" ] || continue + newname=''${f/riscv64-none-elf/riscv64-unknown-elf} + ln -sf "$f" "$newname" + done + ''; + }; + + # RISC-V Linux toolchain + riscv-linux-gcc = let + cc = pkgs.pkgsCross.riscv64.stdenv.cc; + libcStatic = pkgs.pkgsCross.riscv64.stdenv.cc.libc.static; + in pkgs.runCommand "riscv64-linux-gnu-toolchain" {} '' + mkdir -p $out/bin + for f in ${cc}/bin/riscv64-unknown-linux-gnu-*; do + [ -e "$f" ] || continue + name=$(basename "$f") + echo '#!${pkgs.stdenv.shell}' > $out/bin/$name + echo 'exec "'"$f"'" -L${libcStatic}/lib "$@"' >> $out/bin/$name + chmod +x $out/bin/$name + done + ''; +} diff --git a/scripts/nix/build-env-rust.nix b/scripts/nix/build-env-rust.nix new file mode 100644 index 00000000..57cc5d90 --- /dev/null +++ b/scripts/nix/build-env-rust.nix @@ -0,0 +1,9 @@ +{ pkgs }: + +{ + # Rust toolchain (waveform-mcp, etc.) 
+ rustc = pkgs.rustc; + cargo = pkgs.cargo; + rustfmt = pkgs.rustfmt; + clippy = pkgs.clippy; +} diff --git a/scripts/nix/build-env-scala.nix b/scripts/nix/build-env-scala.nix new file mode 100644 index 00000000..62fedc03 --- /dev/null +++ b/scripts/nix/build-env-scala.nix @@ -0,0 +1,51 @@ +{ pkgs }: + +let + millVersion = "0.11.4"; + millBinary = pkgs.fetchurl { + url = "https://github.com/com-lihaoyi/mill/releases/download/${millVersion}/${millVersion}"; + sha256 = "1swayysb1baqk7zhrlzvikd4plqznaa0nkx2bwc57dvwxp06whz2"; + }; + mill = pkgs.stdenv.mkDerivation { + name = "mill-${millVersion}"; + src = millBinary; + dontUnpack = true; + nativeBuildInputs = [ pkgs.makeWrapper ]; + installPhase = '' + mkdir -p $out/bin + cp $src $out/bin/mill + chmod +x $out/bin/mill + ''; + meta = with pkgs.lib; { + description = "Mill build tool ${millVersion}"; + homepage = "https://github.com/com-lihaoyi/mill"; + license = licenses.asl20; + platforms = platforms.all; + }; + }; +in +{ + # Build tool for Scala, Java and more + inherit mill; + + # sbt 1.8.2 + sbt = (pkgs.sbt.override { jre = pkgs.jdk17; }).overrideAttrs (old: { + version = "1.8.2"; + src = pkgs.fetchurl { + url = "https://github.com/sbt/sbt/releases/download/v1.8.2/sbt-1.8.2.tgz"; + sha256 = "11j6vyxpiqbaxg5pzm6awmrdf6fkz3pw14zszrnxdnvll16k8r8z"; + }; + }); + + # Scala formatter - use coursier to get 2.7.5 + scalafmt = pkgs.writeShellApplication { + name = "scalafmt"; + runtimeInputs = [ pkgs.coursier ]; + text = '' + exec coursier launch org.scalameta:scalafmt-cli_2.13:2.7.5 -- "$@" + ''; + }; + + # Coursier - Scala dependency manager and launcher + coursier = pkgs.coursier; +} diff --git a/scripts/nix/build-env-tools.nix b/scripts/nix/build-env-tools.nix new file mode 100644 index 00000000..f54d01fe --- /dev/null +++ b/scripts/nix/build-env-tools.nix @@ -0,0 +1,95 @@ +{ pkgs }: + +let + # DRAMSim2 from firesim (rev matches chipyard pin); -fPIC for PIE linking under Nix + dramsim2 = pkgs.stdenv.mkDerivation { + 
pname = "dramsim2"; + version = "2023-05-10"; + src = pkgs.fetchFromGitHub { + owner = "firesim"; + repo = "DRAMSim2"; + rev = "44322e2f935d7dac83b7adf8dd270b41a54c6acb"; + hash = "sha256-Vfb+MeWdUESc7gt6GhL6jBO1Uuvx8s1BdfhCikTyTh8="; + }; + buildPhase = '' + make CXXFLAGS="-DNO_STORAGE -Wall -DDEBUG_BUILD -O3 -fPIC" libdramsim.a + ''; + installPhase = '' + runHook preInstall + mkdir -p $out/lib $out/include + cp libdramsim.a $out/lib/ + cp *.h $out/include/ + runHook postInstall + ''; + }; + + # CUDD BDD library (required by OpenSTA) + cudd = pkgs.stdenv.mkDerivation { + pname = "cudd"; + version = "3.0.0"; + src = pkgs.fetchFromGitHub { + owner = "The-OpenROAD-Project"; + repo = "cudd"; + rev = "3.0.0"; + hash = "sha256-ybsFPcggPsb6lfZbWbwxNTuZSOC7lLNY/iZSTvyFmdU="; + }; + nativeBuildInputs = [ pkgs.autoreconfHook ]; + configureFlags = [ "--prefix=$(out)" "CFLAGS=-fPIC" "CXXFLAGS=-fPIC" ]; + installPhase = '' + runHook preInstall + make install + runHook postInstall + ''; + }; + + # OpenSTA - gate-level static timing analysis + opensta = pkgs.stdenv.mkDerivation { + pname = "opensta"; + version = "unstable-2025"; + src = pkgs.fetchFromGitHub { + owner = "The-OpenROAD-Project"; + repo = "OpenSTA"; + rev = "5e9e9db7061fddf1b0b9c47c49c920c56da140e3"; + hash = "sha256-SfxNh5PFWWTdTH0ZiiATV1F0qOBTh50+xM9roJMHtLg=="; + }; + nativeBuildInputs = with pkgs; [ cmake flex bison swig ]; + buildInputs = with pkgs; [ tcl eigen zlib ]; + cmakeFlags = [ + "-DCUDD_DIR=${cudd}" + "-DUSE_TCL_READLINE=OFF" + ]; + installPhase = '' + runHook preInstall + mkdir -p $out/bin + find . 
-name sta -type f -executable -exec cp {} $out/bin/ \; + runHook postInstall + ''; + }; +in +{ + # Pin to Verilator 5.022 2024-02-24 (nixpkgs-unstable ships 5.044) + verilator = pkgs.verilator.overrideAttrs (old: { + version = "5.022"; + src = pkgs.fetchurl { + url = "https://github.com/verilator/verilator/archive/refs/tags/v5.022.tar.gz"; + hash = "sha256-PC9TOPS2zn4vR6FCQBrN0Yy/TF2gYJJhjW0DbAr+8S0="; + }; + sourceRoot = "verilator-5.022"; + doCheck = false; + }); + + dramsim2 = dramsim2; + + # Build acceleration tools + ccache = pkgs.ccache; + lld = pkgs.lld; + + # Synthesis tools + yosys = pkgs.yosys; + + # Static timing analysis + opensta = opensta; + + # Coverage report (genhtml) + lcov = pkgs.lcov; +} diff --git a/scripts/nix/overlay.nix b/scripts/nix/overlay.nix new file mode 100644 index 00000000..90bd8b39 --- /dev/null +++ b/scripts/nix/overlay.nix @@ -0,0 +1,12 @@ +final: prev: +{ + bbdev = final.callPackage ./build-env-bbdev.nix { }; + # Named rustTools to avoid shadowing nixpkgs `rust` (used by rust hooks) + rustTools = final.callPackage ./build-env-rust.nix { }; + clibs = final.callPackage ./build-env-clibs.nix { }; + doc = final.callPackage ./build-env-doc.nix { }; + python = final.callPackage ./build-env-python.nix { }; + riscv = final.callPackage ./build-env-riscv.nix { }; + scala = final.callPackage ./build-env-scala.nix { }; + tools = final.callPackage ./build-env-tools.nix { }; +} diff --git a/scripts/pre-commit-hook.sh b/scripts/pre-commit-hook.sh new file mode 100755 index 00000000..4591db8c --- /dev/null +++ b/scripts/pre-commit-hook.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash +# Wrapper to run pre-commit with Nix env (result/bin in PATH). +# Git hooks run without nix develop, so we must load the env manually. + +HERE="$(cd "$(dirname "$0")" && pwd)" +REPO_ROOT="${HERE}/../.." 
+export PATH="${REPO_ROOT}/result/bin:${PATH}" + +exec pre-commit hook-impl --config="${REPO_ROOT}/.pre-commit-config.yaml" \ + --hook-type=pre-commit --hook-dir "$HERE" -- "$@" diff --git a/scripts/replace-content.py b/scripts/replace-content.py deleted file mode 100755 index 9798b875..00000000 --- a/scripts/replace-content.py +++ /dev/null @@ -1,67 +0,0 @@ -#!/usr/bin/env python3 - -# Replace text in a file given a key identifying a block to replace. -# If the file doesn't exist, create it. -# -# args -# $1 - file to replace text in -# $2 - key used to find block of text to replace -# $3 - text to fill in block that is replaced - -import re -import sys - - -def CY_INITIALIZE_RE_BLOCK(k): - return ( - r"^# >>> " + f"{k}" + r" initialize >>>(?:\n|\r\n)" - r"([\s\S]*?)" - r"# <<< " + f"{k}" + r" initialize <<<(?:\n|\r\n)?" - ) - - -def CY_INITIALIZE_START_TOKEN(k): - return "# >>> " + f"{k}" + " initialize >>>" - - -def CY_INITIALIZE_END_TOKEN(k): - return "# <<< " + f"{k}" + " initialize <<<" - - -# ------------------------------ - -try: - with open(sys.argv[1]) as fh: - fh_content = fh.read() -except FileNotFoundError: - fh_content = "" -except Exception: - raise - -initialize_comment_key = sys.argv[2] -inner_contents = ( - CY_INITIALIZE_START_TOKEN(initialize_comment_key) - + "\n" - + sys.argv[3] - + "\n" - + CY_INITIALIZE_END_TOKEN(initialize_comment_key) - + "\n" -) - -# ------------------------------ - -replace_str = "__CY_REPLACE_ME_123__" -fh_content = re.sub( - CY_INITIALIZE_RE_BLOCK(initialize_comment_key), - replace_str, - fh_content, - flags=re.MULTILINE, -) -# TODO: maybe remove all but last of replace_str, if there's more than one occurrence -fh_content = fh_content.replace(replace_str, inner_contents) - -if CY_INITIALIZE_START_TOKEN(initialize_comment_key) not in fh_content: - fh_content += "\n%s\n" % inner_contents - -with open(sys.argv[1], "w") as fh: - fh.write(fh_content) diff --git a/scripts/utils.sh b/scripts/utils.sh deleted file mode 100644 
index 522675cb..00000000 --- a/scripts/utils.sh +++ /dev/null @@ -1,19 +0,0 @@ -#!/bin/bash - - -####################################### -# Wrapper around replace-content.py. -# For a file ($1), write out text ($3) into it -# replacing any area designated by $2. -####################################### -function replace_content -{ - # If BASH_SOURCE is undefined, we may be running under zsh, in that case - # provide a zsh-compatible alternative - DIR="$(dirname "$(readlink -f "${BASH_SOURCE[0]:-${(%):-%x}}")")" - file="$1" - shift - key="$1" - shift - $DIR/replace-content.py "$file" "$key" "$@" -} diff --git a/sourceme.sh b/sourceme.sh new file mode 100644 index 00000000..2cdffb4f --- /dev/null +++ b/sourceme.sh @@ -0,0 +1,41 @@ +#!/usr/bin/env bash +# Source this file to add result/bin to PATH (requires 'nix build' first). +# It sets up the environment variables for the buckyball environment, whether you +# enter it through 'nix develop' or only need the environment variables. + +BBDIR=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")") +RESULT_PATH="${BBDIR}/result" + +# if [ ! -d "$RESULT_PATH" ]; then +# echo "Warning: result not found at $RESULT_PATH. Run 'nix build' first."
>&2 +# return 1 2>/dev/null || exit 1 +# fi + +#===----------------------------------------------------------------------------=== +# Source each submodule's ShellHooks +#===----------------------------------------------------------------------------=== +# source "${BBDIR}/bbdev/nix/init.sh" + +#===----------------------------------------------------------------------------=== +# Source Environment Variables +#===----------------------------------------------------------------------------=== +export BUDDY_MLIR_BUILD_DIR="${BBDIR}/compiler/build" +export LLVM_MLIR_BUILD_DIR="${BBDIR}/compiler/llvm/build" +export PYTHONPATH="${BBDIR}/compiler/llvm/build/tools/mlir/python_packages/mlir_core:${BBDIR}/compiler/build/python_packages:$PYTHONPATH" +export BUDDY_BINARY_DIR="${BBDIR}/compiler/build/bin" +export RISCV="${BBDIR}/result" +export PATH="${BBDIR}/thirdparty/libgloss/install/lib:$PATH" +export PATH="${BUDDY_BINARY_DIR}:${PATH}" + +#===----------------------------------------------------------------------------=== +# Export each submodule's PATH +#===----------------------------------------------------------------------------=== +export PATH="${RESULT_PATH}/riscv64-unknown-elf/lib:${PATH}" +export PATH="${RESULT_PATH}/bin:${PATH}" + +# bbdev CLI and Python utils +export PATH="${BBDIR}/bbdev:${PATH}" +export PYTHONPATH="${BBDIR}/bbdev/api:${PYTHONPATH}" + +# sardine +export PYTHONPATH="${BBDIR}/lib/python3.13/site-packages:${PYTHONPATH}" diff --git a/thirdparty/palladium b/thirdparty/palladium new file mode 160000 index 00000000..3990e3ed --- /dev/null +++ b/thirdparty/palladium @@ -0,0 +1 @@ +Subproject commit 3990e3edfc5176599d7789bd357ae17c2c6163cc diff --git a/thirdparty/picker b/thirdparty/picker deleted file mode 160000 index b3b96016..00000000 --- a/thirdparty/picker +++ /dev/null @@ -1 +0,0 @@ -Subproject commit b3b960163cea9ddeeab3ec4852931021c1e1471d diff --git a/thirdparty/waveform-mcp b/thirdparty/waveform-mcp new file mode 160000 index 00000000..0dc6a87d 
--- /dev/null +++ b/thirdparty/waveform-mcp @@ -0,0 +1 @@ +Subproject commit 0dc6a87dca16a151ca2313223b75c67e4270f610 diff --git a/tools/mill/.gitignore b/tools/mill/.gitignore deleted file mode 100644 index 50519060..00000000 --- a/tools/mill/.gitignore +++ /dev/null @@ -1 +0,0 @@ -mill diff --git a/tools/mill/install-mill.sh b/tools/mill/install-mill.sh deleted file mode 100755 index 0838e989..00000000 --- a/tools/mill/install-mill.sh +++ /dev/null @@ -1,21 +0,0 @@ -#!/bin/bash - -set -e - -BBDIR=$(git rev-parse --show-toplevel) -MILL_DIR=$BBDIR/tools/mill - -# Install Mill -curl -L https://raw.githubusercontent.com/lefou/millw/0.4.11/millw > $MILL_DIR/mill && chmod +x $MILL_DIR/mill - -# Add Mill to PATH -source $BBDIR/scripts/utils.sh - -replace_content ${BBDIR}/env.sh "install-mill.sh" "export PATH=${BBDIR}/tools/mill:\$PATH" -source ${BBDIR}/env.sh -# export PATH=${BBDIR}/tools/mill:\$PATH -# echo "#mill path" >> ~/.${SHELL##*/}rc -# echo "export PATH=\"${BBDIR}/tools/mill:\$PATH\"" >> ~/.${SHELL##*/}rc - -# Verify installation -# mill --version diff --git a/tools/palladium b/tools/palladium deleted file mode 160000 index 5334475a..00000000 --- a/tools/palladium +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 5334475ace67b2ac6dd08d09e7de0ddccb777364 diff --git a/verify/.gitignore b/verify/.gitignore deleted file mode 100644 index ab52b538..00000000 --- a/verify/.gitignore +++ /dev/null @@ -1,2 +0,0 @@ -dut/ -out/ diff --git a/verify/Adder/Adder.v b/verify/Adder/Adder.v deleted file mode 100644 index ae53970d..00000000 --- a/verify/Adder/Adder.v +++ /dev/null @@ -1,35 +0,0 @@ -// A verilog 128-bit full adder with carry in and carry out - -module Adder #( - parameter WIDTH = 128 -) ( - input [WIDTH-1:0] a, - input [WIDTH-1:0] b, - input cin, - output [WIDTH-1:0] sum, - output cout -); - -assign {cout, sum} = a + b + cin; - -endmodule - -module Adder_128 ( - input [127:0] a, - input [127:0] b, - input cin, - output [127:0] sum, - output cout -); - - Adder #( - 
.WIDTH(128) - ) adder ( - .a(a), - .b(b), - .cin(cin), - .sum(sum), - .cout(cout) - ); - -endmodule diff --git a/verify/Adder/example.py b/verify/Adder/example.py deleted file mode 100644 index 93af1fcd..00000000 --- a/verify/Adder/example.py +++ /dev/null @@ -1,71 +0,0 @@ -try: - from UT_Adder import * -except Exception as e: - try: - from Adder import * - except Exception as e: - from __init__ import * - -import random - - -class input_t: - def __init__(self, a, b, cin): - self.a = a - self.b = b - self.cin = cin - - -class output_t: - def __init__(self): - self.sum = 0 - self.cout = 0 - - -def random_int(): - return random.randint(-(2**127), 2**127 - 1) & ((1 << 128) - 1) - - -def as_uint(x, nbits): - return x & ((1 << nbits) - 1) - - -def main(): - dut = DUTAdder() # Assuming USE_VERILATOR - - print("Initialized UTAdder") - dut.RefreshComb() - dut.dut.PauseWaveformDump() - - for c in range(11451): - i = input_t(random_int(), random_int(), random_int() & 1) - o_dut, o_ref = output_t(), output_t() - - def dut_cal(): - dut.a.value, dut.b.value, dut.cin.value = i.a, i.b, i.cin - dut.Step(1) - o_dut.sum = dut.sum.value - o_dut.cout = dut.cout.value - - def ref_cal(): - sum = as_uint(i.a + i.b + i.cin, 128 + 1) - o_ref.sum = as_uint(sum, 128) - o_ref.cout = as_uint(sum >> 128, 1) - - dut_cal() - ref_cal() - - print(f"[cycle {dut.xclock.clk}] a=0x{i.a:x}, b=0x{i.b:x}, cin=0x{i.cin:x}") - print(f"DUT: sum=0x{o_dut.sum:x}, cout=0x{o_dut.cout:x}") - print(f"REF: sum=0x{o_ref.sum:x}, cout=0x{o_ref.cout:x}") - - assert o_dut.sum == o_ref.sum, "sum mismatch" - if c == 11401: - dut.dut.ResumeWaveformDump() - - print("Test Passed, destroy UTAdder") - dut.Finish() # When using VCS, DUT.Finish() will exit the program, so it should be the last line of the program - - -if __name__ == "__main__": - main() diff --git a/verify/Adder/verilator.sh b/verify/Adder/verilator.sh deleted file mode 100755 index 25e06bb2..00000000 --- a/verify/Adder/verilator.sh +++ /dev/null @@ -1,26 +0,0 
@@ -#!/bin/bash - -BBDIR=$(git rev-parse --show-toplevel) - -VERIFY_WORKSPACE=$BBDIR/verify -PICKERDIR=$BBDIR/thirdparty/picker - -if ! command -v verible-verilog-syntax &> /dev/null -then - echo "verible could not be found" - echo "please add verible-verilog-syntax into path first" - echo "https://chipsalliance.github.io/verible/verilog_syntax.html" - echo "https://github.com/chipsalliance/verible/releases/tag/v0.0-3428-gcfcbb82b" - exit -fi - -rm -rf $VERIFY_WORKSPACE/out/ -picker export Adder.v --autobuild false -w Adder.fst --sname Adder --tdir $VERIFY_WORKSPACE/out/Adder --sdir $PICKERDIR/template $@ -# if python in $@, then it will generate python binding -if [[ $@ == *"python"* ]]; then - cp $VERIFY_WORKSPACE/Adder/example.py $VERIFY_WORKSPACE/out/Adder/python/ -else - echo "unsupport" -fi - -cd $VERIFY_WORKSPACE/out/Adder && make EXAMPLE=ON diff --git a/verify/run.sh b/verify/run.sh deleted file mode 100755 index 729ece1e..00000000 --- a/verify/run.sh +++ /dev/null @@ -1,5 +0,0 @@ -#!/bin/bash - -VERIFY_WORKSPACE="$(dirname "$(realpath "$0")")" - -bbdev verilator --verilog "--balltype vecball --output_dir $VERIFY_WORKSPACE/dut/" diff --git a/workflow/.gitignore b/workflow/.gitignore deleted file mode 100644 index 37fafeda..00000000 --- a/workflow/.gitignore +++ /dev/null @@ -1,21 +0,0 @@ -node_modules -python_modules -.venv -venv -.motia -.mermaid -dist -*.pyc -*.json -motia-workbench.json -package-lock.json -package.json -types.d.ts -appendonlydir/ -dump.rdb -services/ -*.log -output/ - -.cursor/ -.claude/ diff --git a/workflow/bbdev b/workflow/bbdev deleted file mode 100755 index 293982bf..00000000 --- a/workflow/bbdev +++ /dev/null @@ -1,525 +0,0 @@ -#! 
/usr/bin/env python3 - -import sys -import argparse -import subprocess -import os -import shlex -import json -import time -import requests - -from utils import find_available_port - -workflow_dir = os.path.dirname(os.path.abspath(__file__)) - - -def parse_args(argv: list[str]) -> argparse.Namespace: - parser = argparse.ArgumentParser( - prog="bbdev", - description="Development tool for buckyball project", - formatter_class=argparse.RawTextHelpFormatter, - ) - - # Global parameters - parser.add_argument( - "--port", type=int, default=None, help="Port for dev server (default: None)" - ) - parser.add_argument("--server", action="store_true", help="server mode") - - # Subcommand parser - subparsers = parser.add_subparsers(dest="command", help="Available commands") - - # ===== start subcommand ============================================================================== - start_parser = subparsers.add_parser("start", help="Start dev server") - - # ===== stop subcommand ============================================================================== - stop_parser = subparsers.add_parser("stop", help="Stop dev server") - stop_group = stop_parser.add_mutually_exclusive_group(required=False) - stop_group.add_argument("--all", action="store_true", help="Stop all servers") - - # ===== verilator subcommand ========================================================================= - verilator_parser = subparsers.add_parser("verilator", help="Verilator operations") - # Mutually exclusive option group for verilator - only one operation can be selected - verilator_group = verilator_parser.add_mutually_exclusive_group(required=True) - verilator_group.add_argument( - "--clean", action="store_true", help="Clean verilator build directory" - ) - verilator_group.add_argument( - "--verilog", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Generate verilog files from chisel. 
Args: "[--balltype ] [--job ] [--batch]"', - ) - verilator_group.add_argument( - "--build", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Build verilator simulation executable. Args: "[--job ]"', - ) - verilator_group.add_argument( - "--sim", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Run verilator simulation. Args: "--binary [--batch]"', - ) - verilator_group.add_argument( - "--run", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Integrated build+sim+run. Args: "--binary [--batch] [--job ]"', - ) - - # ===== vcs subcommand ========================================================================= - vcs_parser = subparsers.add_parser("vcs", help="vcs operations") - # Mutually exclusive option group for vcs - only one operation can be selected - vcs_group = vcs_parser.add_mutually_exclusive_group(required=True) - vcs_group.add_argument( - "--clean", action="store_true", help="Clean vcs build directory" - ) - vcs_group.add_argument( - "--verilog", action="store_true", help="Generate verilog files" - ) - vcs_group.add_argument( - "--build", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Build vcs simulation executable. Args: "[--job ]"', - ) - vcs_group.add_argument( - "--sim", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Run vcs simulation. Args: "--binary [--batch]"', - ) - vcs_group.add_argument( - "--run", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Integrated build+sim+run. Args: "--binary [--batch] [--job ]"', - ) - - # ===== sardine subcommand ========================================================================= - sardine_parser = subparsers.add_parser("sardine", help="sardine operations") - sardine_group = sardine_parser.add_mutually_exclusive_group(required=True) - sardine_group.add_argument( - "--run", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Run sardine. 
Args: "--workload "', - ) - - # ===== agent subcommand ========================================================================= - agent_parser = subparsers.add_parser("agent", help="agent operations") - agent_group = agent_parser.add_mutually_exclusive_group(required=True) - agent_group.add_argument( - "--chat", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Run agent. Args: "--message ", "--model "', - ) - - # ===== workload subcommand ========================================================================= - workload_parser = subparsers.add_parser("workload", help="workload operations") - workload_group = workload_parser.add_mutually_exclusive_group(required=True) - workload_group.add_argument( - "--build", - type=str, - nargs="?", - const="", - metavar="ARGS", - help="Build workload. ", - ) - - # ===== doc subcommand ========================================================================= - doc_parser = subparsers.add_parser("doc", help="doc operations") - doc_group = doc_parser.add_mutually_exclusive_group(required=True) - doc_group.add_argument("--deploy", action="store_true", help="Deploy doc. ") - - # ===== marshal subcommand ========================================================================= - marshal_parser = subparsers.add_parser("marshal", help="marshal operations") - marshal_group = marshal_parser.add_mutually_exclusive_group(required=True) - marshal_group.add_argument( - "--build", type=str, nargs="?", const="", metavar="ARGS", help="Build marshal. " - ) - marshal_group.add_argument( - "--launch", - type=str, - nargs="?", - const="", - metavar="ARGS", - help="Launch marshal. 
", - ) - - # ===== firesim subcommand ========================================================================= - firesim_parser = subparsers.add_parser("firesim", help="firesim operations") - firesim_group = firesim_parser.add_mutually_exclusive_group(required=True) - firesim_group.add_argument( - "--buildbitstream", - type=str, - nargs="?", - const="", - metavar="ARGS", - help="Build bitstream. ", - ) - firesim_group.add_argument( - "--infrasetup", - type=str, - nargs="?", - const="", - metavar="ARGS", - help="Infrasetup. ", - ) - firesim_group.add_argument( - "--runworkload", - type=str, - nargs="?", - const="", - metavar="ARGS", - help="Run workload. ", - ) - - # ===== compiler subcommand ========================================================================= - compiler_parser = subparsers.add_parser("compiler", help="compiler operations") - compiler_group = compiler_parser.add_mutually_exclusive_group(required=True) - compiler_group.add_argument( - "--build", - type=str, - nargs="?", - const="", - metavar="ARGS", - help="Build compiler. ", - ) - - # ===== funcsim subcommand ========================================================================= - funcsim_parser = subparsers.add_parser("funcsim", help="funcsim operations") - funcsim_group = funcsim_parser.add_mutually_exclusive_group(required=True) - funcsim_group.add_argument( - "--build", type=str, nargs="?", const="", metavar="ARGS", help="Build funcsim. " - ) - funcsim_group.add_argument( - "--sim", type=str, nargs="?", const="", metavar="ARGS", help="Sim funcsim. " - ) - - # ===== uvm subcommand ========================================================================= - uvm_parser = subparsers.add_parser("uvm", help="uvm operations") - uvm_group = uvm_parser.add_mutually_exclusive_group(required=True) - uvm_group.add_argument( - "--builddut", type=str, nargs="?", const="", metavar="ARGS", help="Build dut. 
" - ) - uvm_group.add_argument( - "--build", type=str, nargs="?", const="", metavar="ARGS", help="Build uvm. " - ) - - # ===== palladium subcommand =================================================================== - palladium_parser = subparsers.add_parser("palladium", help="palladium operations") - palladium_group = palladium_parser.add_mutually_exclusive_group(required=True) - palladium_group.add_argument( - "--verilog", - type=str, - nargs="?", - const="", - metavar="ARGS", - help='Generate verilog files from chisel. Args example: "[--config sims.palladium.BuckyballToyP2EConfig]"', - ) - - # Parse arguments, allowing unknown arguments (so --port can work in subcommands) - args, unknown = parser.parse_known_args(argv) - - # If --port and --server are in unknown arguments, handle them manually - while unknown: - found = False - for i in range(len(unknown)): - if unknown[i] == "--port": - try: - args.port = int(unknown[i + 1]) - unknown = unknown[i + 2 :] # Remove processed arguments - found = True - break - except (ValueError, IndexError): - break - if unknown[i] == "--server": - args.server = True - unknown = unknown[i + 1 :] # Remove processed arguments - found = True - break - if not found: - break - # If there are other unknown arguments, throw error - if unknown: - parser.error(f"unrecognized arguments: {' '.join(unknown)}") - - return args - - -def extract_command_info(args): - """Generic command info extractor, applicable to all commands""" - # Basic return structure - result = {"command": getattr(args, "command", None), "operation": None, "args": {}} - - # Dynamically extract operation type (iterate through all attributes of args) - for attr_name in dir(args): - # Skip built-in attributes and known non-operation attributes - if attr_name.startswith("_") or attr_name in ["command", "port"]: - continue - - attr_value = getattr(args, attr_name) - # Find attributes with value True or non-empty string (skip None and False) - if attr_value is not None and 
attr_value is not False: - result["operation"] = attr_name - if attr_value is True: - # Boolean operations, such as --clean, --verilog - result["args"] = {} - else: - # Operations with arguments, such as --sim "arg_string" - # Parse argument string - args_dict = {} - if attr_value: - try: - # use shlex to parse args - arg_tokens = shlex.split(attr_value) - i = 0 - while i < len(arg_tokens): - token = arg_tokens[i] - if token.startswith("--"): - # long option - option_name = token[2:] # remove -- - if i + 1 < len(arg_tokens) and not arg_tokens[ - i + 1 - ].startswith("-"): - # next token is value - args_dict[option_name] = arg_tokens[i + 1] - i += 2 - else: - # boolean flag - args_dict[option_name] = True - i += 1 - elif token.startswith("-") and len(token) == 2: - # short option - option_name = token[1:] # remove - - if i + 1 < len(arg_tokens) and not arg_tokens[ - i + 1 - ].startswith("-"): - # next token is value - args_dict[option_name] = arg_tokens[i + 1] - i += 2 - else: - # boolean flag - args_dict[option_name] = True - i += 1 - else: - # position argument, skip - i += 1 - except ValueError as e: - print(f"Error parsing arguments: {e}") - result["args"] = args_dict - break - - return result - - -if __name__ == "__main__": - args = parse_args(sys.argv[1:]) - cmd_info = extract_command_info(args) - - # print(f"Command: {cmd_info['command']}") - # print(f"Operation: {cmd_info['operation']}") - # print(f"Arguments: {cmd_info['args']}") - - # ================================================================================== - # Two modes: server mode and script mode - # - # server mode: Manually start/stop server, can visualize and access workflows through browser - # script mode: Automatically assign port, start service when task begins, stop service when task ends - # ================================================================================== - if args.server: - if cmd_info["command"] == "start": - print(f"Starting dev server on port {args.port}") - 
subprocess.run( - ["npx", "motia", "dev", "--port", str(args.port)], - cwd=workflow_dir, - check=True, - ) - - elif cmd_info["command"] == "stop": - if cmd_info["operation"] == "all": - print("Stopping all servers") - subprocess.run( - "kill -9 $(ps aux | grep '[m]otia' | awk '{print $2}')", - shell=True, - check=False, - text=True, - ) - else: - print(f"Stopping server on port {args.port}") - subprocess.run( - f"kill -TERM $(lsof -t -i :{args.port})", - shell=True, - check=False, - text=True, - ) - - elif cmd_info["command"] in [ - "verilator", - "vcs", - "sardine", - "agent", - "workload", - "doc", - "marshal", - "firesim", - "compiler", - "funcsim", - "uvm", - "palladium", - ]: - api_path = f"/{cmd_info['command']}/{cmd_info['operation']}" - json_data = json.dumps(cmd_info["args"]) - subprocess.run( - [ - "curl", - "-X", - "POST", - f"http://localhost:{args.port}{api_path}", - "-H", - "Content-Type: application/json", - "-d", - json_data, - ], - cwd=workflow_dir, - check=True, - ) - else: - print(f"Unknown command: {cmd_info['command']}") - print( - "Available commands: start, stop, verilator, vcs, sardine, agent, \ - workload, doc, marshal, firesim, compiler, funcsim, uvm" - ) - sys.exit(1) - else: # script mode - if cmd_info["command"] == "start": - print(" 'start' do nothing in script mode") - elif cmd_info["command"] == "stop": - print(" 'stop' do nothing in script mode") - elif cmd_info["command"] in [ - "verilator", - "vcs", - "sardine", - "agent", - "workload", - "doc", - "marshal", - "firesim", - "compiler", - "funcsim", - "uvm", - "palladium", - ]: - # 1. 
Start service in background ================================ - # If port is specified, use the specified port; otherwise, automatically assign port - if args.port: - available_port = args.port - else: - available_port = find_available_port(start_port=5100, end_port=5500) - print(f"Starting server on port {available_port}...") - proc = subprocess.Popen( - ["npx", "motia", "dev", "--port", str(available_port)], cwd=workflow_dir - ) - - # Wait for service to start - max_retries = 30 - for i in range(max_retries): - try: - # Disable proxy, connect directly to localhost - # response = requests.get(f"http://localhost:{available_port}", timeout=1) - response = requests.get( - f"http://localhost:{available_port}", - timeout=1, - proxies={"http": None, "https": None, "all": None}, - ) - if response.status_code == 200: - print(f"Server is ready on port {available_port}") - break - except requests.exceptions.RequestException: - pass - time.sleep(3) - else: - print("Server failed to start within 30 seconds") - subprocess.run( - f"kill -TERM $(lsof -t -i :{available_port})", - shell=True, - check=False, - text=True, - ) - proc.terminate() - proc.wait() - sys.exit(1) - - # 2. Execute API call ================================ - api_path = f"/{cmd_info['command']}/{cmd_info['operation']}" - json_data = json.dumps(cmd_info["args"]) - print(f"Executing {cmd_info['command']} {cmd_info['operation']}...") - # Disable proxy, connect directly to localhost - result = subprocess.run( - [ - "curl", - "-sS", - "--noproxy", - "localhost", - "-X", - "POST", - f"http://localhost:{available_port}{api_path}", - "-H", - "Content-Type: application/json", - "-d", - json_data, - ], - cwd=workflow_dir, - capture_output=True, - text=True, - ) - - # 3. Shutdown service ================================ - # Give observability plugin time to finish async operations (e.g., Redis writes) - time.sleep(1) - proc.terminate() - proc.wait() - print( - f"\nTask completed. 
Command running on http://localhost:{available_port} is finished" - ) - - # Check the success field returned by API - try: - response = json.loads(result.stdout) - if not response.get("success", False): - print("Error: Task failed") - sys.exit(1) - except Exception: - print("Error: Invalid API response") - sys.exit(1) - - else: - print(f"Unknown command: {cmd_info['command']}") - print( - "Available commands: start, stop, verilator, vcs, sardine, agent, \ - workload, doc, marshal, firesim, compiler, funcsim, uvm" - ) - sys.exit(1) diff --git a/workflow/mcp-server/README.md b/workflow/mcp-server/README.md deleted file mode 100644 index 4629d8d1..00000000 --- a/workflow/mcp-server/README.md +++ /dev/null @@ -1,35 +0,0 @@ -# Buckyball MCP Server - -Wraps the bbdev tool as an MCP service so that it can be used directly in Claude/Cursor. - -## Installation - -```bash -pip install mcp -``` - -## Configuration - -Cursor - -Add the following to the MCP configuration: - -```json -{ - "mcpServers": { - "bbdev": { - "command": "bash", - "args": ["-c", "cd /path/to/your/buckyball && source env.sh && python /path/to/your/buckyball/workflow/mcp-server/server.py"] - } - } -} -``` - -## Usage - -After configuring, restart the client; all bbdev features can then be invoked in natural language. - -Examples: -- "Run gelu_test with verilator" -- "Clean the build directory" -- "Run the sardine tests" diff --git a/workflow/mcp-server/server.py b/workflow/mcp-server/server.py deleted file mode 100644 index 8041a3de..00000000 --- a/workflow/mcp-server/server.py +++ /dev/null @@ -1,47 +0,0 @@ -#!/usr/bin/env python3 -"""Buckyball Development Tools MCP Server""" - -import asyncio -import sys -from pathlib import Path - -# Add parent directory to path -workflow_dir = Path(__file__).parent.parent -sys.path.insert(0, str(workflow_dir)) - -from mcp.server import Server -from mcp.server.stdio import stdio_server -from mcp.types import Tool, TextContent - -from tools import get_all_tools, handle_tool_call - -# Create server -server = Server("bbdev-mcp") - - -@server.list_tools() -async def handle_list_tools() -> list[Tool]: - """List all available tools.""" - return get_all_tools() - - 
-@server.call_tool() -async def handle_call_tool(name: str, arguments: dict) -> list[TextContent]: - """Execute tool calls.""" - try: - result = await handle_tool_call(name, arguments) - return [TextContent(type="text", text=result)] - except Exception as e: - return [TextContent(type="text", text=f"Error: {str(e)}")] - - -async def main(): - """Main entry point.""" - async with stdio_server() as (read_stream, write_stream): - await server.run( - read_stream, write_stream, server.create_initialization_options() - ) - - -if __name__ == "__main__": - asyncio.run(main()) diff --git a/workflow/mcp-server/tools.py b/workflow/mcp-server/tools.py deleted file mode 100644 index a19da3f5..00000000 --- a/workflow/mcp-server/tools.py +++ /dev/null @@ -1,480 +0,0 @@ -""" -Tool definitions and handlers for Buckyball MCP server. -""" - -import asyncio -import json -import subprocess -import sys -from pathlib import Path -from typing import Any - -from mcp.types import Tool - -# Path to bbdev executable -BBDEV_PATH = Path(__file__).parent.parent / "bbdev" - - -async def execute_bbdev_command(command: list[str]) -> str: - """ - Execute a bbdev command and return the output. 
- - Args: - command: List of command arguments - - Returns: - Command output as string - """ - try: - # Run command in script mode (no --server flag) - process = await asyncio.create_subprocess_exec( - str(BBDEV_PATH), - *command, - stdout=asyncio.subprocess.PIPE, - stderr=asyncio.subprocess.PIPE, - cwd=BBDEV_PATH.parent, - ) - - stdout, stderr = await process.communicate() - - output = stdout.decode() if stdout else "" - error = stderr.decode() if stderr else "" - - if process.returncode != 0: - return f"Command failed with exit code {process.returncode}\n\nStdout:\n{output}\n\nStderr:\n{error}" - - return output if output else "Command completed successfully" - - except Exception as e: - return f"Error executing command: {str(e)}" - - -def get_all_tools() -> list[Tool]: - """Return all available tools.""" - return [ - # Verilator tools - Tool( - name="verilator_clean", - description="Clean verilator build directory", - inputSchema={ - "type": "object", - "properties": {}, - "required": [], - }, - ), - Tool( - name="verilator_verilog", - description="Generate verilog files from chisel", - inputSchema={ - "type": "object", - "properties": {}, - "required": [], - }, - ), - Tool( - name="verilator_build", - description="Build verilator simulation executable", - inputSchema={ - "type": "object", - "properties": { - "job": { - "type": "integer", - "description": "Number of parallel jobs for compilation (default: 16)", - "default": 16, - }, - }, - "required": [], - }, - ), - Tool( - name="verilator_sim", - description="Run verilator simulation with a binary", - inputSchema={ - "type": "object", - "properties": { - "binary": { - "type": "string", - "description": "Path or name of the binary to simulate", - }, - "batch": { - "type": "boolean", - "description": "Run in batch mode (no interactive output)", - "default": False, - }, - }, - "required": ["binary"], - }, - ), - Tool( - name="verilator_run", - description="Integrated build+sim+run for verilator (builds and runs 
simulation)", - inputSchema={ - "type": "object", - "properties": { - "binary": { - "type": "string", - "description": "Path or name of the binary to simulate", - }, - "batch": { - "type": "boolean", - "description": "Run in batch mode (no interactive output)", - "default": False, - }, - "job": { - "type": "integer", - "description": "Number of parallel jobs for compilation (default: 16)", - "default": 16, - }, - }, - "required": ["binary"], - }, - ), - # VCS tools - Tool( - name="vcs_clean", - description="Clean VCS build directory", - inputSchema={ - "type": "object", - "properties": {}, - "required": [], - }, - ), - Tool( - name="vcs_verilog", - description="Generate verilog files for VCS", - inputSchema={ - "type": "object", - "properties": {}, - "required": [], - }, - ), - Tool( - name="vcs_build", - description="Build VCS simulation executable", - inputSchema={ - "type": "object", - "properties": { - "job": { - "type": "integer", - "description": "Number of parallel jobs for compilation", - "default": 16, - }, - }, - "required": [], - }, - ), - Tool( - name="vcs_sim", - description="Run VCS simulation with a binary", - inputSchema={ - "type": "object", - "properties": { - "binary": { - "type": "string", - "description": "Path or name of the binary to simulate", - }, - "batch": { - "type": "boolean", - "description": "Run in batch mode", - "default": False, - }, - }, - "required": ["binary"], - }, - ), - Tool( - name="vcs_run", - description="Integrated build+sim+run for VCS", - inputSchema={ - "type": "object", - "properties": { - "binary": { - "type": "string", - "description": "Path or name of the binary to simulate", - }, - "batch": { - "type": "boolean", - "description": "Run in batch mode", - "default": False, - }, - "job": { - "type": "integer", - "description": "Number of parallel jobs", - "default": 16, - }, - }, - "required": ["binary"], - }, - ), - # Sardine tools - Tool( - name="sardine_run", - description="Run sardine test framework with a 
workload", - inputSchema={ - "type": "object", - "properties": { - "workload": { - "type": "string", - "description": "Path or name of the workload to run", - }, - }, - "required": ["workload"], - }, - ), - # Agent tools - Tool( - name="agent_chat", - description="Chat with the Buckyball development agent", - inputSchema={ - "type": "object", - "properties": { - "message": { - "type": "string", - "description": "Message to send to the agent", - }, - "model": { - "type": "string", - "description": "Model to use for the agent", - "default": "gpt-4", - }, - }, - "required": ["message"], - }, - ), - # Workload tools - Tool( - name="workload_build", - description="Build a workload for testing", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments for workload build", - "default": "", - }, - }, - "required": [], - }, - ), - # Documentation tools - Tool( - name="doc_deploy", - description="Deploy documentation", - inputSchema={ - "type": "object", - "properties": {}, - "required": [], - }, - ), - # Marshal tools - Tool( - name="marshal_build", - description="Build marshal", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments for marshal build", - "default": "", - }, - }, - "required": [], - }, - ), - Tool( - name="marshal_launch", - description="Launch marshal", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments for marshal launch", - "default": "", - }, - }, - "required": [], - }, - ), - # FireSim tools - Tool( - name="firesim_buildbitstream", - description="Build FireSim bitstream", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments", - "default": "", - }, - }, - "required": [], - }, - ), - Tool( - name="firesim_infrasetup", - description="Setup FireSim infrastructure", - 
inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments", - "default": "", - }, - }, - "required": [], - }, - ), - Tool( - name="firesim_runworkload", - description="Run workload on FireSim", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments", - "default": "", - }, - }, - "required": [], - }, - ), - # Compiler tools - Tool( - name="compiler_build", - description="Build compiler", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments", - "default": "", - }, - }, - "required": [], - }, - ), - # Funcsim tools - Tool( - name="funcsim_build", - description="Build functional simulation", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments", - "default": "", - }, - }, - "required": [], - }, - ), - Tool( - name="funcsim_sim", - description="Run functional simulation", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments", - "default": "", - }, - }, - "required": [], - }, - ), - # UVM tools - Tool( - name="uvm_builddut", - description="Build UVM DUT", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments", - "default": "", - }, - }, - "required": [], - }, - ), - Tool( - name="uvm_build", - description="Build UVM testbench", - inputSchema={ - "type": "object", - "properties": { - "args": { - "type": "string", - "description": "Additional arguments", - "default": "", - }, - }, - "required": [], - }, - ), - ] - - -async def handle_tool_call(name: str, arguments: dict) -> str: - """ - Handle a tool call by routing to the appropriate bbdev command. 
- - Args: - name: Tool name - arguments: Tool arguments - - Returns: - Tool execution result - """ - # Parse tool name to extract command and operation - parts = name.split("_", 1) - if len(parts) != 2: - return f"Invalid tool name format: {name}" - - command, operation = parts - - # Build bbdev command - cmd = [command, f"--{operation}"] - - # Build argument string based on the operation - if operation in ["build", "sim", "run"]: - arg_parts = [] - - # Handle common arguments - if "job" in arguments: - arg_parts.append(f"--jobs {arguments['job']}") - - if "binary" in arguments: - arg_parts.append(f"--binary {arguments['binary']}") - - if "batch" in arguments and arguments["batch"]: - arg_parts.append("--batch") - - if "workload" in arguments: - arg_parts.append(f"--workload {arguments['workload']}") - - if "message" in arguments: - arg_parts.append(f"--message '{arguments['message']}'") - - if "model" in arguments: - arg_parts.append(f"--model {arguments['model']}") - - if "args" in arguments and arguments["args"]: - arg_parts.append(arguments["args"]) - - # Join all arguments into a single string and pass as one argument - if arg_parts: - cmd.append(" ".join(arg_parts)) - - # Execute command - return await execute_bbdev_command(cmd) diff --git a/workflow/motia.config.ts b/workflow/motia.config.ts deleted file mode 100644 index c250ef7f..00000000 --- a/workflow/motia.config.ts +++ /dev/null @@ -1,9 +0,0 @@ -import { config } from '@motiadev/core' -const statesPlugin = require('@motiadev/plugin-states/plugin') -const endpointPlugin = require('@motiadev/plugin-endpoint/plugin') -const logsPlugin = require('@motiadev/plugin-logs/plugin') -const observabilityPlugin = require('@motiadev/plugin-observability/plugin') - -export default config({ - plugins: [observabilityPlugin, statesPlugin, endpointPlugin, logsPlugin], -}) diff --git a/workflow/prompts/doc/common/standards.md b/workflow/prompts/doc/common/standards.md deleted file mode 100644 index c36c3698..00000000 --- 
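a/workflow/prompts/doc/common/standards.md +++ /dev/null @@ -1,28 +0,0 @@ -# General documentation standards - -## Language and format requirements -1. All descriptive content must be written in Chinese -2. Keep technical terms, API names, code identifiers, and the like in their original English -3. Never use emoji, decorative symbols, or fancy Unicode characters -4. Use a professional, objective technical-writing tone and avoid colloquial expressions -5. Use strictly standard Markdown syntax - -## Content accuracy requirements -1. All technical information must be based on the actual code; do not invent or exaggerate -2. Code snippets must come from actual files and must not exceed 15 lines -3. Functional descriptions must accurately reflect the implementation; avoid unsupported adjectives such as "high-performance" or "advanced" -4. Interface and parameter descriptions must match the actual code -5. Dependencies must be based on actual imports and call relationships -6. Read the code before writing the documentation - -## Document structure requirements -1. Use the document structure that matches the directory type -2. Every section must contain substantive content; avoid vague descriptions -3. Focus on actual functionality, usage, and key implementation details -4. Avoid marketing-style content such as "features" or "advantages" - -## Code analysis requirements -1. Carefully analyze the actual file contents in the directory -2. Extract the real module relationships, interface definitions, and configuration parameters -3. Organize the document around the actual code structure -4. Quoted code snippets must be accurate diff --git a/workflow/prompts/doc/customext-doc.md b/workflow/prompts/doc/customext-doc.md deleted file mode 100644 index 35616dd8..00000000 --- a/workflow/prompts/doc/customext-doc.md +++ /dev/null @@ -1,41 +0,0 @@ -# Documentation-generation prompt for the custom extension test code directory - -You are a documentation-generation assistant specializing in hardware extensions and custom instructions. Your task is to create a comprehensive README for the code under the bb-tests/customext directory of the repository. Your target this time is the directory @[`relative directory path`]; describe in detail its custom hardware extensions, instruction-set extensions, accelerator integration, and the related testing and verification methods. - -Write strictly according to the following six parts: - -I. Extension Overview -Summarize the main functions and purpose of the custom extension, including its type and functionality, target application scenarios and performance-optimization goals, how it integrates and interfaces with the base architecture, and its technical implementation approach. - -II. Architecture Design -Analyze the extension's architecture and implementation, including hardware architecture and module organization, instruction-set definition and encoding format, datapath and control-logic design, and the interface and communication protocol with the main processor. - -III. Functional Modules -Describe each functional module in detail, including instruction definitions, hardware implementation, software interfaces, and integration tests. For each module, give its functional description, key interfaces and parameter definitions, and implementation details, and provide relevant code examples (at most 15 lines). - -IV. Testing and Verification -Explain the testing and verification methods and flow, including the design and execution of functional-correctness tests, performance benchmarking and evaluation methods, compatibility and regression testing, and debugging tools and the verification environment. - -V. Usage Guide -Provide guidance on usage and integration, including compilation, build, and dependency configuration, runtime environment setup and requirements, example programs and use cases, and common problems and troubleshooting. - -VI. Development Extension -Explain how to develop a new extension, including the development flow and conventions, code structure and organization, how to write test cases, and how to handle documentation and maintenance. - -Documentation rules: -1. Keep the document clear and concise, and use Markdown formatting for readability, including headings, bullet points, code blocks, and tables or flowcharts where necessary. -2. Do not use any fancy emoji or other decorative symbols in the document. -3. All technical information must be based on the actual code; do not invent or exaggerate functionality. -4. Code snippets must come from actual files and be accompanied by accurate explanations in Chinese. -5. Do not add content beyond the scope I have specified. - ---- - -## Usage -1. Replace the placeholders in the prompt above with actual information -2.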
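After generating the document, run the command below to link it directly into the documentation manager (remember to replace the paths) -```shell -cd [`your_buckyball_path`] -f="[`relative directory path`]/README.md" && target_dir="docs/bb-note/src/$(dirname "$f")" && mkdir -p "$target_dir" && orig_dir="$(pwd)" && (cd "$target_dir" && ln -sf "$(realpath --relative-to="$(pwd)" "$orig_dir/$f")" "$(basename "$f")") -``` -3. Finally, add the file path to `docs/bb-note/src/SUMMARY.md` diff --git a/workflow/prompts/doc/rtl-doc.md b/workflow/prompts/doc/rtl-doc.md deleted file mode 100644 index 4563463c..00000000 --- a/workflow/prompts/doc/rtl-doc.md +++ /dev/null @@ -1,38 +0,0 @@ -# Documentation-generation prompt for the RTL code directory - -You are an expert in generating code documentation. Your task is to create a comprehensive README for the code under the rtl directory of the repository. Your target this time is the directory @[`relative directory path`]; describe in detail its hardware-related aspects, such as the hardware logic design, module functionality, interface definitions, and RTL-level implementation details. - -Write strictly according to the following four parts: -I. Overview -Summarize its main purpose, including the overall function of the directory and the hardware architecture goals of the design. Also state where the directory sits in the hierarchy under @arch/src/main/scala and what role it plays - -II. Code structure -Analyze the directory structure and list all files and subdirectories. Then describe how the files relate to each other, including the module hierarchy, call dependencies, interface connections, and data flow. - -III. Detailed module description -For each file or key module: -Summarize its main purpose and functionality. -Explain the key components (such as port definitions, internal logic, state machines, combinational/sequential logic), using code snippets where relevant. -Describe the inputs, outputs, and edge cases. -Note any dependencies, external modules, or HDL-specific features used (such as Verilog constructs). - -IV. Additional information -List any caveats you noticed - -Documentation rules: -1. Keep the document clear and concise, and use Markdown formatting for readability, including headings, bullet points, code blocks, and tables or flowcharts where necessary. -2. Do not use any fancy emoji or other decorative symbols in the document. -3. Do not add content beyond the scope I have specified - -Finally, the entire output must be structured strictly according to the provided reference template. -Reference template: @arch/src/main/scala/framework/builtin/memdomain/mem/README.md --- - -## Usage -1. Replace the placeholders in the prompt above with actual information -2. After generating the document, run the command below to link it directly into the documentation manager (remember to replace the paths) -```shell -cd [`your_buckyball_path`] -f="[`relative directory path`]/README.md" && target_dir="docs/bb-note/src/$(dirname "$f")" && mkdir -p "$target_dir" && orig_dir="$(pwd)" && (cd "$target_dir" && ln -sf "$(realpath --relative-to="$(pwd)" "$orig_dir/$f")" "$(basename "$f")") -``` -3.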
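Finally, add the file path to `docs/bb-note/src/SUMMARY.md` diff --git a/workflow/prompts/doc/sardine-doc.md b/workflow/prompts/doc/sardine-doc.md deleted file mode 100644 index 1171465a..00000000 --- a/workflow/prompts/doc/sardine-doc.md +++ /dev/null @@ -1,41 +0,0 @@ -# Documentation-generation prompt for the Sardine test framework code directory - -You are a documentation-generation assistant specializing in test frameworks and automated testing. Your task is to create a comprehensive README for the code under the bb-tests/sardine directory of the repository. Your target this time is the directory @[`relative directory path`]; describe in detail its test framework architecture, automated test flow, test case management, and the related testing tools and methods. - -Write strictly according to the following six parts: - -I. Framework Overview -Summarize the main functions and purpose of the Sardine test framework, including its design philosophy and architecture, the supported test types and test scenarios, its integration with other testing tools and CI/CD systems, and the framework's core functionality. - -II. Architecture Design -Analyze the architecture of the test framework, including the overall architecture and component relationships, the test execution engine and scheduling mechanism, test data management and result collection, and the plugin system and extension mechanism. - -III. Core Components -Describe the framework's core components in detail, including the test executor, result analyzer, configuration management, and report generation. For each component, give its functional description, key interfaces and APIs, and implementation details, and provide relevant code examples (at most 15 lines). - -IV. Test Case Management -Explain how test cases are organized and managed, including their classification and organizational structure, how test data is prepared and managed, the conventions for writing test cases, and test case maintenance and version control. - -V. Usage Guide -Provide detailed guidance on using the framework, including environment configuration and dependency installation, commands and parameters for running tests, writing and customizing configuration files, and viewing and analyzing test results. - -VI. Extension Development -Explain how to extend and customize the framework, including how to add new test types, how to develop custom test plugins, how to integrate external tools and services, and how to optimize and tune the framework's performance. - -Documentation rules: -1. Keep the document clear and concise, and use Markdown formatting for readability, including headings, bullet points, code blocks, and tables or flowcharts where necessary. -2. Do not use any fancy emoji or other decorative symbols in the document. -3. All technical information must be based on the actual code; do not invent or exaggerate functionality. -4. Code snippets must come from actual files and be accompanied by accurate explanations in Chinese. -5. Do not add content beyond the scope I have specified. - ---- - -## Usage -1. Replace the placeholders in the prompt above with actual information -2.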
最后把文件路径添加到 `docs/bb-note/src/SUMMARY.md` 即可 diff --git a/workflow/prompts/doc/script-doc.md b/workflow/prompts/doc/script-doc.md deleted file mode 100644 index 1575093d..00000000 --- a/workflow/prompts/doc/script-doc.md +++ /dev/null @@ -1,41 +0,0 @@ -# 脚本工具代码目录文档生成prompt - -你是一位脚本开发和自动化工具专家文档生成助手。你的任务是为仓库中的scripts目录下的代码创建一份全面的README文档。你这次的目标是目录 @[`目录相对路径`] ,你需要详细描述其脚本功能、自动化流程、工具使用方法以及相关的开发和部署工具。 - -请严格按照以下六个部分进行书写: - -一、脚本概述 (Script Overview) -总结脚本工具的主要功能和目的,包括脚本的主要功能分类、自动化流程的设计目标和应用场景、与项目开发流程的集成关系、脚本工具的核心功能。 - -二、脚本分类 (Script Categories) -按功能对脚本进行分类说明,包括构建脚本、部署脚本、测试脚本、工具脚本、配置脚本等不同类型的脚本及其用途。 - -三、脚本详细说明 (Script Details) -对每个主要脚本进行详细说明,包括脚本的具体功能和解决的问题、输入参数和配置选项、执行流程和逻辑、输出结果和依赖要求,并提供关键代码片段(控制在15行以内)。 - -四、使用指南 (Usage Guide) -提供脚本使用的详细指导,包括环境准备和依赖安装、脚本执行的基本方法和参数、配置文件的编写和定制、批量执行和自动化集成。 - -五、开发和维护 (Development and Maintenance) -说明脚本的开发和维护方法,包括脚本开发的规范、代码结构和组织方式、测试和验证方法、版本控制和更新流程。 - -六、集成和扩展 (Integration and Extension) -指导如何集成和扩展脚本,包括与CI/CD系统的集成方法、添加新功能和脚本的步骤、自定义配置和参数化、性能优化和错误处理。 - -文档规范: -1. 确保文档清晰、简洁,并使用Markdown格式以提高可读性,包括标题、项目符号、代码块,以及必要时的表格或流程图。 -2. 禁止在文档中使用任何花哨的Emoji或其他装饰性符号。 -3. 所有技术信息必须基于实际代码内容,不得编造或夸大功能。 -4. 代码片段必须来自实际文件,并提供准确的中文解释。 -5. 禁止自行添加超出我限定范围的内容。 - ---- - -## 使用方法 -1. 替换上述prompt中的占位符为实际信息 -2. 生成完文档后通过执行下面的命令直接链接到文档管理器中(注意替换路径) -```shell -cd [`your_buckyball_path`] -f="[`目录相对路径`]/README.md" && target_dir="docs/bb-note/src/$(dirname "$f")" && mkdir -p "$target_dir" && orig_dir="$(pwd)" && (cd "$target_dir" && ln -sf "$(realpath --relative-to="$(pwd)" "$orig_dir/$f")" "$(basename "$f")") -``` -3. 
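Finally, add the file path to `docs/bb-note/src/SUMMARY.md` diff --git a/workflow/prompts/doc/sim-doc.md b/workflow/prompts/doc/sim-doc.md deleted file mode 100644 index 68fd1020..00000000 --- a/workflow/prompts/doc/sim-doc.md +++ /dev/null @@ -1,41 +0,0 @@ -# Documentation-generation prompt for the simulator code directory - -You are a documentation-generation assistant specializing in computer architecture simulation and system modeling. Your task is to create a comprehensive README for the code under the sims directory of the repository. Your target this time is the directory @[`relative directory path`]; describe in detail its simulator architecture, simulation models, performance modeling, and the related simulation tools and methods. - -Write strictly according to the following six parts: - -I. Simulator Overview -Summarize the main functions and purpose of the simulator, including its type and level of simulation accuracy, the target architecture and supported instruction sets, its performance and applicable scenarios, and its relationship to other simulation tools and development environments. - -II. Architecture Design -Analyze the architecture of the simulator, including the overall architecture and module organization, the processor model and memory-subsystem modeling, the simulated implementation of I/O devices and peripherals, and the simulation engine and scheduling mechanism. - -III. Core Components -Describe the simulator's core components in detail, including the processor model, memory system, I/O system, and debug interface. For each component, give its functional description and modeling accuracy, key interfaces and configuration parameters, and implementation details, and provide relevant code examples (at most 15 lines). - -IV. Usage Guide -Provide detailed guidance on using the simulator, including compilation, build, and dependency configuration, commands and parameters for running simulations, configuration files and simulation options, and how programs are loaded and executed. - -V. Performance Analysis -Explain the performance analysis and tuning methods, including performance statistics and metric collection, the trade-off between simulation speed and accuracy, bottleneck analysis and optimization, and validation against real hardware. - -VI. Extension Development -Explain how to extend and customize the simulator, including how to add new instructions and features, how to integrate custom devices and peripherals, how to modify and optimize simulation models, and how to develop debugging tools and analysis plugins. - -Documentation rules: -1. Keep the document clear and concise, and use Markdown formatting for readability, including headings, bullet points, code blocks, and tables or flowcharts where necessary. -2. Do not use any fancy emoji or other decorative symbols in the document. -3. All technical information must be based on the actual code; do not invent or exaggerate functionality. -4. Code snippets must come from actual files and be accompanied by accurate explanations in Chinese. -5. Do not add content beyond the scope I have specified. - ---- - -## Usage -1. Replace the placeholders in the prompt above with actual information -2. After generating the document, run the command below to link it directly into the documentation manager (remember to replace the paths) -```shell -cd [`your_buckyball_path`] -f="[`relative directory path`]/README.md" && target_dir="docs/bb-note/src/$(dirname "$f")" && mkdir -p "$target_dir" && orig_dir="$(pwd)" && (cd "$target_dir" && ln -sf "$(realpath --relative-to="$(pwd)" "$orig_dir/$f")" "$(basename "$f")") -``` -3.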
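Finally, add the file path to `docs/bb-note/src/SUMMARY.md` diff --git a/workflow/prompts/doc/uvbb-doc.md b/workflow/prompts/doc/uvbb-doc.md deleted file mode 100644 index d1a88db0..00000000 --- a/workflow/prompts/doc/uvbb-doc.md +++ /dev/null @@ -1,41 +0,0 @@ -# Documentation-generation prompt for the UVBB test code directory - -You are a documentation-generation assistant specializing in hardware verification and UVM testing. Your task is to create a comprehensive README for the code under the bb-tests/uvbb directory of the repository. Your target this time is the directory @[`relative directory path`]; describe in detail its UVM verification environment, testbench architecture, verification components, and the related hardware verification methods. - -Write strictly according to the following six parts: - -I. Verification Environment Overview -Summarize the main functions and purpose of the UVBB verification environment, including the design goals and verification strategy of the UVM environment, the characteristics and verification requirements of the design under test (DUT), the coverage goals and quality criteria of the environment, and its integration with other verification tools and flows. - -II. Verification Architecture -Analyze the architecture of the UVM verification environment, including the overall architecture and component hierarchy of the UVM testbench, the organization of components such as Agent, Driver, Monitor, and Scoreboard, the configuration and parameterization mechanism of the environment, and the architecture for managing test sequences and scenarios. - -III. Verification Components -Describe each verification component in detail, including the UVM Agent, Driver, Monitor, Scoreboard, and Sequence. For each component, give its functional description, key interfaces and configuration parameters, and implementation details, and provide relevant code examples (at most 15 lines). - -IV. Test Scenarios -Explain how test scenarios are designed and implemented, including the classification and coverage of functional test scenarios, tests for boundary conditions and exceptional cases, the design of performance and stress tests, and randomized testing and constraint definitions. - -V. Execution and Debug -Provide guidance on running and debugging, including configuring and launching the simulation environment, commands and parameters for running tests, waveform analysis and debugging methods, and coverage collection and analysis tools. - -VI. Verification Flow -Describe the complete verification flow, including creating and executing the verification plan, regression testing and continuous integration, coverage-driven verification, and analysis and reporting of verification results. - -Documentation rules: -1. Keep the document clear and concise, and use Markdown formatting for readability, including headings, bullet points, code blocks, and tables or flowcharts where necessary. -2. Do not use any fancy emoji or other decorative symbols in the document. -3. All technical information must be based on the actual code; do not invent or exaggerate functionality. -4. Code snippets must come from actual files and be accompanied by accurate explanations in Chinese. -5. Do not add content beyond the scope I have specified. - ---- - -## Usage -1. Replace the placeholders in the prompt above with actual information -2. After generating the document, run the command below to link it directly into the documentation manager (remember to replace the paths) -```shell -cd [`your_buckyball_path`] -f="[`relative directory path`]/README.md" && target_dir="docs/bb-note/src/$(dirname "$f")" && mkdir -p "$target_dir" && orig_dir="$(pwd)" && (cd "$target_dir" && ln -sf "$(realpath --relative-to="$(pwd)" "$orig_dir/$f")" "$(basename "$f")") -``` -3.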
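Finally, add the file path to `docs/bb-note/src/SUMMARY.md` diff --git a/workflow/prompts/doc/workflow-doc.md b/workflow/prompts/doc/workflow-doc.md deleted file mode 100644 index f06d5199..00000000 --- a/workflow/prompts/doc/workflow-doc.md +++ /dev/null @@ -1,41 +0,0 @@ -# Documentation-generation prompt for the workflow code directory - -You are a documentation-generation assistant specializing in development workflows and automated processes. Your task is to create a comprehensive README for the code under the workflow directory of the repository. Your target this time is the directory @[`relative directory path`]; describe in detail its workflow design, automated processes, development tools, and the related process management and collaboration methods. - -Write strictly according to the following six parts: - -I. Workflow Overview -Summarize the main functions and purpose of the workflow system, including its design philosophy and architectural goals, the supported development flows and automation scenarios, its integration with the development toolchain and CI/CD systems, and the core functionality of the workflow system. - -II. Architecture Design -Analyze the architecture of the workflow system, including the workflow engine and execution framework, the task scheduling and dependency management mechanism, the configuration management and parameterization system, and the plugin system and extension architecture. - -III. Core Components -Describe the workflow's core components in detail, including flow definition, task execution, state management, and result handling. For each component, give its functional description, key interfaces and configuration options, and implementation details, and provide relevant code examples (at most 15 lines). - -IV. Workflow Configuration -Explain how the workflow is configured and customized, including the structure and syntax of configuration files, parameter definitions and environment variable management, conditional execution and branch control, and error handling and retry mechanisms. - -V. Usage Guide -Provide detailed guidance on using the workflow, including environment setup and dependency installation, how to start and run workflows, how to use the monitoring and debugging tools, and common problems and troubleshooting. - -VI. Development and Extension -Explain how to develop and extend the workflow, including developing custom tasks and plugins, creating and reusing workflow templates, integrating with external systems, and performance optimization and best practices. - -Documentation rules: -1. Keep the document clear and concise, and use Markdown formatting for readability, including headings, bullet points, code blocks, and tables or flowcharts where necessary. -2. Do not use any fancy emoji or other decorative symbols in the document. -3. All technical information must be based on the actual code; do not invent or exaggerate functionality. -4. Code snippets must come from actual files and be accompanied by accurate explanations in Chinese. -5. Do not add content beyond the scope I have specified. - ---- - -## Usage -1. Replace the placeholders in the prompt above with actual information -2. After generating the document, run the command below to link it directly into the documentation manager (remember to replace the paths) -```shell -cd [`your_buckyball_path`] -f="[`relative directory path`]/README.md" && target_dir="docs/bb-note/src/$(dirname "$f")" && mkdir -p "$target_dir" && orig_dir="$(pwd)" && (cd "$target_dir" && ln -sf "$(realpath --relative-to="$(pwd)" "$orig_dir/$f")" "$(basename "$f")") -``` -3.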
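Finally, add the file path to `docs/bb-note/src/SUMMARY.md` diff --git a/workflow/prompts/doc/workloads-doc.md b/workflow/prompts/doc/workloads-doc.md deleted file mode 100644 index 8a90c65f..00000000 --- a/workflow/prompts/doc/workloads-doc.md +++ /dev/null @@ -1,41 +0,0 @@ -# Documentation-generation prompt for the workload test code directory - -You are a documentation-generation assistant specializing in test engineering and performance evaluation. Your task is to create a comprehensive README for the code under the bb-tests/workloads directory of the repository. Your target this time is the directory @[`relative directory path`]; describe in detail its test workloads, performance benchmarks, test cases, and the related test frameworks and evaluation methods. - -Write strictly according to the following six parts: - -I. Workload Overview -Summarize the main functions and purpose of the workloads, including the types of test workloads, the target test scenarios and application domains, the key metrics for performance evaluation, and their relationship to the system's other test components. - -II. Test Case Structure -Analyze how the test cases are organized and structured, including their classification and hierarchy, input data sets and parameter configuration, expected outputs and pass criteria, and dependencies between test cases. - -III. Workload Details -Describe each major workload in detail, including its functional description, algorithm implementation, performance characteristics, and configuration parameters, and provide key code snippets and implementation details (at most 15 lines). - -IV. Build and Execution -Provide detailed guidance on building and running, including build dependencies and build-system configuration, compile commands and build options, how to run and the command-line arguments, and how to read and analyze the output. - -V. Performance Evaluation -Explain the performance evaluation methods and tools, including how performance metrics are defined and measured, the benchmark execution flow, result analysis and tuning recommendations, and how to compare against other implementations or platforms. - -VI. Extension and Customization -Explain how to extend and customize the workloads, including how to add new test cases, the steps for modifying existing workloads, how to integrate new performance metrics, and how to adapt to different hardware platforms. - -Documentation rules: -1. Keep the document clear and concise, and use Markdown formatting for readability, including headings, bullet points, code blocks, and tables or flowcharts where necessary. -2. Do not use any fancy emoji or other decorative symbols in the document. -3. All technical information must be based on the actual code; do not invent or exaggerate functionality. -4. Code snippets must come from actual files and be accompanied by accurate explanations in Chinese. -5. Do not add content beyond the scope I have specified. - ---- - -## Usage -1. Replace the placeholders in the prompt above with actual information -2. After generating the document, run the command below to link it directly into the documentation manager (remember to replace the paths) -```shell -cd [`your_buckyball_path`] -f="[`relative directory path`]/README.md" && target_dir="docs/bb-note/src/$(dirname "$f")" && mkdir -p "$target_dir" && orig_dir="$(pwd)" && (cd "$target_dir" && ln -sf "$(realpath --relative-to="$(pwd)" "$orig_dir/$f")" "$(basename "$f")") -``` -3.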
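Finally, add the file path to `docs/bb-note/src/SUMMARY.md` diff --git a/workflow/prompts/new-ball.md b/workflow/prompts/new-ball.md deleted file mode 100644 index 873dbcee..00000000 --- a/workflow/prompts/new-ball.md +++ /dev/null @@ -1,17 +0,0 @@ -# Use this prompt to generate and integrate a new ball - -You are an expert in implementing custom AI accelerator units; your task is to implement and integrate a new hardware accelerator unit in this repository. - -Use Deepwiki to fetch, for `DangoSys/buckyball`, the content related to the blink protocol -Use Deepwiki to fetch, for `DangoSys/buckyball`, how to integrate a custom ball - -After the queries complete, carry out the following tasks; if anything about the repository is unclear, ask Deepwiki directly with ask_question -1. Based on spec.md in the [arch/src/main/scala/prototype/nagisa/layernorm] directory, implement a [LAYERNORM] ball and integrate it into the system -2. Implement the corresponding Ctest test case -3. Test it with bbdev verilator -4. Once the test passes, add it to the sardine list -5. Add the design's single README.md to bb-note via a relative-path symlink - -Rules: -1. Avoid generating summary documents where possible -2. Do not modify code outside the ball except where integration requires it diff --git a/workflow/prompts/new-spec.md b/workflow/prompts/new-spec.md deleted file mode 100644 index 5d86bc2d..00000000 --- a/workflow/prompts/new-spec.md +++ /dev/null @@ -1,911 +0,0 @@ -# Use this prompt to generate the spec for a new ball - -You are an expert in writing specs for custom AI accelerator units; your task is to write the spec for the design of a new hardware accelerator unit. - -Follow the format of the spec below (do not treat its implementation approach as a reference) -``` -# GELU Accelerator Unit Design Specification - -## 1. Overview - -The GELU (Gaussian Error Linear Unit) accelerator unit is a dedicated compute accelerator in the Buckyball framework, used to execute the GELU activation function efficiently. GELU is a nonlinear activation function widely used in modern deep learning models (such as Transformer, BERT, and GPT). - -### 1.1 Basic parameters - -- **Data format**: INT8 input, INT32 output -- **Vectorized processing**: 16 INT8 elements per operation (veclane=16) -- **Pipeline architecture**: multi-stage pipeline (ID, Load, Execute, Store) -- **Computation method**: GELU approximation (tanh formula) -- **Memory interface**: supports Scratchpad and Accumulator reads and writes - -### 1.2 Mathematical definition - -Exact definition of the GELU activation function: - -``` -GELU(x) = x · Φ(x) -``` - -where Φ(x) is the cumulative distribution function of the standard normal distribution. - -The hardware implementation uses the tanh approximation: - -``` -GELU(x) ≈ 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³))) -``` - -Simplified constants: -- √(2/π) ≈ 0.7978845608 -- 0.044715 - -## 2.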
系统架构 (Block Diagram) - -### 2.1 顶层架构 - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ GELU Accelerator │ -├─────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ │ │ │ │ │ │ -│ │ Control │───▶│ Load Unit │───▶│ Execute │ │ -│ │ Unit (ID) │ │ │ │ Unit (EX) │ │ -│ │ │ │ │ │ │ │ -│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ -│ │ │ │ │ -│ │ │ │ │ -│ │ ┌────▼────┐ ┌────▼────┐ │ -│ │ │ SRAM │ │ Store │ │ -│ │ │ Read │ │ Unit │ │ -│ │ │ Arbiter │ │ │ │ -│ │ └─────────┘ └────┬────┘ │ -│ │ │ │ -│ ┌──────▼────────────────────────────────┐ │ │ -│ │ Command Interface │ │ │ -│ │ (Ball Bus / RoCC Interface) │ │ │ -│ └───────────────────────────────────────┘ │ │ -│ │ │ -│ ┌──────────────────────────────────────┐ │ │ -│ │ Status Monitor │ │ │ -│ │ (ready/valid/idle/init/running) │ │ │ -│ └──────────────────────────────────────┘ │ │ -│ │ │ -└─────────────────────────────────────────────────┼───────────────┘ - │ - ┌─────▼─────┐ - │ Memory │ - │ System │ - └───────────┘ -``` - -### 2.2 流水线结构 - -``` -┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ -│ ID │───▶│ Load │───▶│ EX │───▶│ Store │ -│ Stage │ │ Stage │ │ Stage │ │ Stage │ -└────────┘ └────────┘ └────────┘ └────────┘ - │ │ │ │ - │ │ │ │ - Decode Load Data Compute GELU Write Back - Command from SRAM Approximation to ACC/SRAM -``` - -### 2.3 计算单元架构 - -``` -┌─────────────────────────────────────────────────────────┐ -│ GELU Compute Pipeline (EX Stage) │ -├─────────────────────────────────────────────────────────┤ -│ │ -│ Input x │ -│ │ │ -│ ├──────────┬──────────┬──────────┐ │ -│ │ │ │ │ │ -│ │ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ │ -│ │ │ x² │───▶│ x³ │───▶│ MUL │ (x³·0.044715)│ -│ │ └─────┘ └─────┘ └──┬──┘ │ -│ │ │ │ -│ │ ┌──────────────────────┘ │ -│ │ │ │ -│ │ ┌──▼──┐ │ -│ │ │ ADD │ (x + 0.044715·x³) │ -│ │ └──┬──┘ │ -│ │ │ │ -│ │ ┌──▼──┐ │ -│ │ │ MUL │ (0.7978845608 · ...) 
│ -│ │ └──┬──┘ │ -│ │ │ │ -│ │ ┌──▼──┐ │ -│ │ │TANH │ (lookup table / polynomial approx) │ -│ │ └──┬──┘ │ -│ │ │ │ -│ │ ┌──▼──┐ │ -│ │ │ADD+1│ (1 + tanh(...)) │ -│ │ └──┬──┘ │ -│ │ │ │ -│ └────┬─────┘ │ -│ │ │ -│ ┌──▼──┐ │ -│ │ MUL │ x · (...) │ -│ └──┬──┘ │ -│ │ │ -│ ┌──▼──┐ │ -│ │ MUL │ 0.5 · (...) │ -│ └──┬──┘ │ -│ │ │ -│ Output GELU(x) │ -└─────────────────────────────────────────────────────────┘ -``` - - -## 3. Interface Description - -The GELU unit exposes the following interfaces: -- **Command Interface**: receives GELU instructions and returns completion responses -- **Scratchpad interface** (SRAM Interface): accesses the memory holding INT8 data -- **Accumulator interface** (ACC Interface): accesses the memory holding INT32 data -- **Status Interface**: outputs the current run-state information -- **Clock and reset interface**: provides the clock and reset signals - -### 3.1 Instruction Semantics - -The complete semantics of one GELU instruction are as follows: - -**Meaning**: apply the GELU operation to vectors stored in the Scratchpad or the Accumulator - -**Data format**: -- **Scratchpad mode** (`is_acc=0`): INT8 input → GELU → INT8 output -- **Accumulator mode** (`is_acc=1`): INT32 input → GELU → INT32 output -- **Note**: the Scratchpad stores INT8, the Accumulator stores INT32 - -**Processing unit**: -- Each vector = 16 elements (veclane = 16) -- Each SRAM address stores one INT8 vector (16×8 bits = 128 bits = 16 bytes) -- Each ACC address stores one INT32 vector (16×32 bits = 512 bits = 64 bytes) - -**Instruction parameters**: - -| Parameter | Meaning | Example | -|-----|------|------| -| `iter` | Number of vectors to process | iter=64 means 64 vectors are processed | -| `op1_bank` | Bank number holding the input data | 0-3 | -| `op1_bank_addr` | Input start address | 0x100 | -| `wr_bank` | Bank number the output data is written to | 0-3 | -| `wr_bank_addr` | Output start address | 0x200 | -| `is_acc` | Data type select | 0=SRAM (INT8) mode, 1=ACC (INT32) mode | - -**Mode 1: is_acc=0 (SRAM mode, INT8)** - -Input range: -``` -Start address: SRAM[op1_bank][op1_bank_addr] -End address: SRAM[op1_bank][op1_bank_addr + iter - 1] -Per address: 16 INT8 elements (128 bits) -Total elements: iter × 16 INT8 elements -``` - -Output range: -``` -Start address: SRAM[wr_bank][wr_bank_addr] -End address: SRAM[wr_bank][wr_bank_addr + iter - 1] -Per address: 16 INT8 elements (128 bits) -Total elements: iter × 16 INT8 elements -``` - -**Mode 2: is_acc=1 (ACC mode, INT32)** - -Input range: -``` -Start address: ACC[op1_bank][op1_bank_addr] -End address: ACC[op1_bank][op1_bank_addr + iter - 1] -Per address: 16 INT32 elements (512 bits) -Total elements: iter × 16 INT32 elements -``` - -Output range: -``` -Start address: ACC[wr_bank][wr_bank_addr] -End address: ACC[wr_bank][wr_bank_addr +
iter - 1] -Per address: 16 INT32 elements (512 bits) -Total elements: iter × 16 INT32 elements -``` - -**Example 1**: SRAM mode, processing INT8 data -``` -iter = 64 -op1_bank = 0, op1_bank_addr = 0x000 -wr_bank = 1, wr_bank_addr = 0x000 -is_acc = 0 - -Input: 64 vectors at SRAM[0][0x000~0x03F] (1024 INT8 elements) -Output: 64 vectors at SRAM[1][0x000~0x03F] (1024 INT8 elements) -``` - -**Example 2**: ACC mode, processing INT32 data -``` -iter = 64 -op1_bank = 0, op1_bank_addr = 0x000 -wr_bank = 1, wr_bank_addr = 0x000 -is_acc = 1 - -Input: 64 vectors at ACC[0][0x000~0x03F] (1024 INT32 elements) -Output: 64 vectors at ACC[1][0x000~0x03F] (1024 INT32 elements) -``` - -**Example 3**: processing a single vector -``` -iter = 1 -op1_bank = 0, op1_bank_addr = 0x100 -wr_bank = 0, wr_bank_addr = 0x200 -is_acc = 0 - -Input: 16 INT8 elements at SRAM[0][0x100] -Output: 16 INT8 elements at SRAM[0][0x200] -``` - -### 3.2 Command Interface - -The GELU unit interacts with the system through the standard Ball Domain interface: - -| Signal | Direction | Width | Description | -|---------|------|------|------| -| `cmdReq.valid` | Input | 1 | Command request valid | -| `cmdReq.ready` | Output | 1 | Command request ready | -| `cmdReq.bits.rob_id` | Input | 10 | ROB (Reorder Buffer) identifier | -| `cmdReq.bits.iter` | Input | 10 | Vector iteration count (supports 1-1024) | -| `cmdReq.bits.op1_bank` | Input | 2 | Operand Bank select | -| `cmdReq.bits.op1_bank_addr` | Input | 12 | Operand address within the Bank | -| `cmdReq.bits.wr_bank` | Input | 2 | Write-back Bank select | -| `cmdReq.bits.wr_bank_addr` | Input | 12 | Write-back address within the Bank | -| `cmdReq.bits.is_acc` | Input | 1 | Target memory type (0=SRAM, 1=ACC) | - -| Signal | Direction | Width | Description | -|---------|------|------|------| -| `cmdResp.valid` | Output | 1 | Completion response valid | -| `cmdResp.ready` | Input | 1 | Completion response ready | -| `cmdResp.bits.rob_id` | Output | 10 | ROB ID of the completed instruction | -| `cmdResp.bits.commit` | Output | 1 | Commit flag | - -### 3.2 Scratchpad Memory Interface (SRAM Interface) - -An SRAM interface that supports parallel multi-Bank access and stores INT8 data. The number of Banks is set by the configuration parameter `sp_banks` (typical value 4). - -**Read interface** (per Bank): - -| Signal | Direction | Width | Description | -|---------|------|------|------| -| `sramRead[i].req.valid` | Output | 1 | Read request valid | -| `sramRead[i].req.ready` | Input | 1 | Read request ready | -| `sramRead[i].req.bits.addr` | Output | log2(entries) | Read address | -| `sramRead[i].resp.valid` | Input | 1 | Read response valid | -|
`sramRead[i].resp.bits.data` | Input | 128 | Read data (16 INT8 = 128 bits) | - -**Write interface** (per Bank): - -| Signal | Direction | Width | Description | -|---------|------|------|------| -| `sramWrite[i].valid` | Output | 1 | Write request valid | -| `sramWrite[i].ready` | Input | 1 | Write request ready | -| `sramWrite[i].bits.addr` | Output | log2(entries) | Write address | -| `sramWrite[i].bits.data` | Output | 128 | Write data (16 INT8 = 128 bits) | -| `sramWrite[i].bits.mask` | Output | 16 | Write mask (per INT8 element) | - -### 3.3 Accumulator Memory Interface (ACC Interface) - -The Accumulator stores INT32 data. The number of Banks is set by the configuration parameter `acc_banks` (typical value 2). - -**Read interface** (per Bank): - -| Signal | Direction | Width | Description | -|---------|------|------|------| -| `accRead[i].req.valid` | Output | 1 | Read request valid | -| `accRead[i].req.ready` | Input | 1 | Read request ready | -| `accRead[i].req.bits.addr` | Output | log2(acc_entries) | Read address | -| `accRead[i].resp.valid` | Input | 1 | Read response valid | -| `accRead[i].resp.bits.data` | Input | 512 | Read data (16 INT32 = 512 bits) | - -**Write interface** (per Bank): - -| Signal | Direction | Width | Description | -|---------|------|------|------| -| `accWrite[i].valid` | Output | 1 | Write request valid | -| `accWrite[i].ready` | Input | 1 | Write request ready | -| `accWrite[i].bits.addr` | Output | log2(acc_entries) | Write address | -| `accWrite[i].bits.data` | Output | 512 | Write data (16 INT32 = 512 bits) | -| `accWrite[i].bits.mask` | Output | 16 | Write mask (per INT32 element) | - -### 3.4 Status Interface - -The GELU unit provides a status interface so that the current run state can be observed externally: - -| Signal | Direction | Width | Description | -|---------|------|------|------| -| `status.ready` | Output | 1 | Device is ready to accept new input | -| `status.valid` | Output | 1 | Device has valid output | -| `status.idle` | Output | 1 | Idle state (no input, no output) | -| `status.init` | Output | 1 | Init state (input present but no output yet) | -| `status.running` | Output | 1 | Running state (output has started) | -| `status.complete` | Output | 1 | Completion signal (current batch fully finished) | -| `status.iter` | Output | 32 | Count of completed batch iterations | - -**State transitions**: - -``` -idle (ready=1, valid=0, idle=1) - ↓ [cmdReq.fire] -init (ready=0, valid=0, init=1) - ↓ [output starts] -running (ready=0, valid=1, running=1) - ↓ [all data processed] -complete (complete=1) - ↓ [cmdResp.fire] -idle (iter += 1) -``` - -**Typical implementation**: - -```scala -// Status
tracking registers -val iterCnt = RegInit(0.U(32.W)) -val hasInput = RegInit(false.B) -val hasOutput = RegInit(false.B) - -when(io.cmdReq.fire) { - hasInput := true.B -} -when(io.cmdResp.fire) { - hasOutput := false.B - hasInput := false.B - iterCnt := iterCnt + 1.U -} -when(io.cmdResp.valid && !hasOutput) { - hasOutput := true.B -} - -// Status signal assignments -io.status.ready := io.cmdReq.ready -io.status.valid := io.cmdResp.valid -io.status.idle := !hasInput && !hasOutput -io.status.init := hasInput && !hasOutput -io.status.running := hasOutput -io.status.complete := io.cmdResp.fire -io.status.iter := iterCnt -``` - -### 3.5 Clock and reset interface - -| Signal | Direction | Description | -|---------|------|------| -| `clock` | Input | Global clock | -| `reset` | Input | Global synchronous reset (active high) | - - -## 4. Register Map - -### 4.1 Internal control registers - -The GELU unit does not expose an APB register interface directly; it is controlled through the Ball Domain command interface. The internal state registers are: - -| Register | Width | Reset value | Description | -|-----------|------|--------|------| -| `state` | 3 | `idle` | State machine state: idle/load/exec/store/complete | -| `rob_id_reg` | 10 | 0 | ROB ID of the instruction being processed | -| `iter_reg` | 10 | 0 | Iteration count register | -| `iter_cnt` | 10 | 0 | Iteration counter | -| `op1_bank_reg` | 2 | 0 | Operand Bank register | -| `op1_addr_reg` | 12 | 0 | Operand address register | -| `wr_bank_reg` | 2 | 0 | Write-back Bank register | -| `wr_addr_reg` | 12 | 0 | Write-back address register | -| `is_acc_reg` | 1 | 0 | Write-back target type register | -| `load_cnt` | 4 | 0 | Load counter (tracks SRAM read latency) | -| `exec_cnt` | 5 | 0 | Execute counter (tracks pipeline progress) | -| `iter_cnt` | 32 | 0 | Batch iteration counter (for status.iter) | -| `has_input` | 1 | 0 | Input state flag (for status tracking) | -| `has_output` | 1 | 0 | Output state flag (for status tracking) | - -### 4.2 State machine encoding - -| State | Encoding | Description | -|---------|------|------| -| `idle` | 3'b000 | Idle, waiting for a command | -| `load` | 3'b001 | Loading data | -| `exec` | 3'b010 | Executing the computation | -| `store` | 3'b011 | Writing back results | -| `complete` | 3'b100 | Sending the completion response | - - -## 5. Functional Description - -### 5.1 Operation flow - -#### 5.1.1 Instruction receive (Idle → Load) - -1. **Idle wait**: the state machine is in the `idle` state and monitors the `cmdReq.valid` signal -2.
**Instruction decode**: when `cmdReq.valid && cmdReq.ready`, capture the instruction parameters: - ```scala - rob_id_reg := cmdReq.bits.rob_id - iter_reg := cmdReq.bits.iter - op1_bank_reg := cmdReq.bits.op1_bank - op1_addr_reg := cmdReq.bits.op1_bank_addr - wr_bank_reg := cmdReq.bits.wr_bank - wr_addr_reg := cmdReq.bits.wr_bank_addr - is_acc_reg := cmdReq.bits.is_acc - ``` -3. **State transition**: move to the `load` state - -#### 5.1.2 Data load (Load) - -1. **Issue read request**: issue a read request to the SRAM - ```scala - sramRead(op1_bank_reg).req.valid := true.B - sramRead(op1_bank_reg).req.bits.addr := op1_addr_reg + iter_cnt - ``` -2. **Receive data**: wait for `sramRead.resp.valid` and store the data in the buffer register -3. **Iteration control**: increment `iter_cnt` after each load completes -4. **State transition**: move to the `exec` state once all data has been loaded - -#### 5.1.3 GELU computation (Execute) - -Run the pipelined GELU approximation, with every vector element processed in parallel: - -**Step 1**: compute x³ -```scala -val x2 = x * x -val x3 = x2 * x -``` - -**Step 2**: compute the inner polynomial -```scala -val poly = x + (x3 * 0.044715.F(32.BP)) -``` - -**Step 3**: scale -```scala -val scaled = poly * 0.7978845608.F(32.BP) -``` - -**Step 4**: tanh approximation -```scala -val tanh_out = tanhApprox(scaled) -``` - -**Step 5**: final combination -```scala -val gelu_out = 0.5.F(32.BP) * x * (1.F(32.BP) + tanh_out) -``` - -#### 5.1.4 Result write-back (Store) - -1. **Target selection**: choose SRAM or ACC as the destination according to `is_acc_reg` -2. **Write request**: - ```scala - when(is_acc_reg) { - accWrite(wr_bank_reg).valid := true.B - accWrite(wr_bank_reg).bits.addr := wr_addr_reg + iter_cnt - accWrite(wr_bank_reg).bits.data := gelu_result - }.otherwise { - sramWrite(wr_bank_reg).valid := true.B - sramWrite(wr_bank_reg).bits.addr := wr_addr_reg + iter_cnt - sramWrite(wr_bank_reg).bits.data := gelu_result - } - ``` -3. **Iteration control**: move to the `complete` state once all results have been written back - -#### 5.1.5 Completion response (Complete) - -1. **Send completion signal**: - ```scala - cmdResp.valid := true.B - cmdResp.bits.rob_id := rob_id_reg - cmdResp.bits.commit := true.B - ``` -2.
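**State reset**: return to the `idle` state, ready to receive the next instruction - -### 5.2 Tanh approximation algorithm - -Because implementing a full tanh function in hardware is costly, a piecewise-linear approximation or a lookup-table method is used: - -#### Option 1: Lookup table (LUT) - -- **Input range**: [-4, 4], divided into 256 intervals -- **Table entries**: each interval stores a slope and an intercept -- **Interpolation**: linear interpolation computes the exact value -- **Error**: < 0.01 - -#### Option 2: Piecewise polynomial - -Divide the input domain into several intervals and approximate each with a second-order polynomial: - -``` -tanh(x) ≈ a₂x² + a₁x + a₀ (for x ∈ [x_min, x_max]) -``` - -- **Number of intervals**: 8-16 intervals -- **Coefficient storage**: 3 coefficients per interval (a₀, a₁, a₂) -- **Error**: < 0.001 - -### 5.3 Supported data formats - -#### 5.3.1 Floating-point format (FP32) - -- **Sign bit**: 1 bit -- **Exponent bits**: 8 bits (bias 127) -- **Mantissa bits**: 23 bits -- **Range**: ±1.4E-45 to ±3.4E+38 -- **Precision**: about 7 decimal digits - -#### 5.3.2 Half-precision floating point (FP16) - -- **Sign bit**: 1 bit -- **Exponent bits**: 5 bits (bias 15) -- **Mantissa bits**: 10 bits -- **Advantage**: saves 50% of storage and bandwidth - -#### 5.3.3 Fixed-point formats (INT8/INT16) - -For quantized models, fixed-point GELU computation is supported: - -- **Quantization parameters**: scale, zero_point -- **Computation flow**: - 1. Dequantize: `x_fp = (x_int - zero_point) * scale` - 2. GELU: `y_fp = GELU(x_fp)` - 3. Quantize: `y_int = round(y_fp / scale) + zero_point` - -### 5.4 Vectorized processing - -The GELU unit supports vector-level parallel computation, with a typical configuration of 16 lanes: - -``` -Input Vector: [x₀, x₁, x₂, ..., x₁₅] - │ │ │ │ - ┌────┼───┼───┼───────┼────┐ - │ GELU Compute Array (16x) │ - └────┼───┼───┼───────┼────┘ - │ │ │ │ -Output Vector: [y₀, y₁, y₂, ..., y₁₅] -``` - -Each lane computes independently; the lanes share control logic but use separate datapaths. - -### 5.5 Pipeline control - -The 4-stage pipeline lets different stages overlap: - -| Cycle | Stage 0 | Stage 1 | Stage 2 | Stage 3 | -|-----|---------|---------|---------|---------| -| T0 | Decode | - | - | - | -| T1 | Decode | Load | - | - | -| T2 | Decode | Load | Exec | - | -| T3 | Decode | Load | Exec | Store | -| T4 | Idle | Load | Exec | Store | - -**Throughput**: when processing consecutive vectors, up to 1 GELU per cycle (16 elements each) - - -## 6. Timing Characteristics - -### 6.1 Latency analysis - -| Stage | Cycles | Notes | -|---------|-------|------| -| Instruction decode (ID) | 1 | Command parameter capture | -| Data load (Load) | 3-4 | SRAM read latency | -| GELU computation (Exec) | 8-10 | Multi-stage multiplication and tanh approximation | -| Result write-back (Store) | 2-3 | SRAM/ACC write latency | -| Completion response (Complete) | 1 | ROB notification | -| **Total latency** | **15-19** | Typically 16 cycles | - -### 6.2 Critical path - -The longest combinational path is the multiplier chain in the Execute stage: - -``` -x → MUL(x²) → MUL(x³) → MUL(poly) → TANH → MUL(final) -``` - -**Optimization strategies**: -1. Insert pipeline registers to split the 8-10 cycle computation into several sub-stages -2.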
使用Booth编码乘法器减少关键路径 -3. Tanh LUT采用双端口SRAM提高访问速度 - - -## 7. 配置参数 (Configuration Parameters) - -### 7.1 编译时参数 - -通过`CustomBuckyballConfig`配置: - -| 参数名称 | 类型 | 默认值 | 描述 | -|---------|------|--------|------| -| `veclane` | Int | 16 | 向量通道数 | -| `sp_banks` | Int | 4 | Scratchpad Bank数量 | -| `acc_banks` | Int | 2 | Accumulator Bank数量 | -| `spad_bank_entries` | Int | 1024 | 每个SRAM Bank条目数 | -| `acc_bank_entries` | Int | 512 | 每个ACC Bank条目数 | -| `inputType` | DataType | FP32 | 输入数据类型 | -| `outputType` | DataType | FP32 | 输出数据类型 | -| `gelu_pipeline_depth` | Int | 10 | GELU流水线深度 | -| `tanh_lut_size` | Int | 256 | Tanh查找表大小 | - -### 7.2 运行时参数 - -通过Ball Domain指令传递: - -| 参数 | 位宽 | 范围 | 描述 | -|-----|------|------|------| -| `iter` | 10 | 1-1024 | 向量迭代次数 | -| `op1_bank` | 2 | 0-3 | 输入数据Bank | -| `op1_bank_addr` | 12 | 0-4095 | 输入起始地址 | -| `wr_bank` | 2 | 0-3 | 输出数据Bank | -| `wr_bank_addr` | 12 | 0-4095 | 输出起始地址 | -| `is_acc` | 1 | 0-1 | 输出目标 (0=SRAM, 1=ACC) | - - -## 9. 验证方案 (Verification Plan) - -### 9.1 功能验证 - -#### 9.1.1 单元测试 - -- **基本功能**: 单个元素GELU计算正确性 -- **边界条件**: 零值、最大值、最小值、NaN、Inf -- **向量处理**: 16元素并行计算一致性 -- **迭代处理**: 多次迭代的地址和数据正确性 - -#### 9.1.2 精度验证 - -与软件参考模型(PyTorch/NumPy)对比: - -```python -import torch -import numpy as np - -# Reference GELU -def gelu_ref(x): - return torch.nn.functional.gelu(x) - -# Test vectors -test_inputs = torch.randn(1000, 16) -golden_outputs = gelu_ref(test_inputs) - -# Compare with hardware outputs -max_error = torch.max(torch.abs(hw_outputs - golden_outputs)) -mean_error = torch.mean(torch.abs(hw_outputs - golden_outputs)) - -assert max_error < 1e-3, f"Max error {max_error} exceeds threshold" -``` - -### 9.3 集成验证 - -- **完整系统**: 在ToyBuckyball环境中集成测试 -- **编译器支持**: 验证MLIR lowering生成正确的GELU指令 -- **端到端**: 运行完整的Transformer模型,验证功能和性能 - - -## 10. 
软件接口 (Software Interface) - -### 10.1 MLIR Dialect扩展 - -在Buckyball Dialect中添加GELU操作: - -```mlir -// MLIR IR -%output = buckyball.gelu %input : tensor<1024xf32> - -// Lowering to hardware intrinsic -func.func @gelu_layer(%arg0: memref<1024xf32>, %arg1: memref<1024xf32>) { - %c0 = arith.constant 0 : index - %c1024 = arith.constant 1024 : index - - // Issue GELU instruction to hardware - buckyball.gelu.hw %arg0, %arg1, %c0, %c1024 - {op1_bank = 0, wr_bank = 1, is_acc = 0} - - return -} -``` - -### 10.2 C/C++ Intrinsics - -提供底层硬件访问接口: - -```c -// C intrinsic -void gelu_hw( - float* input, // Input vector address - float* output, // Output vector address - int iter, // Number of vectors (each 16 elements) - int op1_bank, // Input bank - int op1_addr, // Input address offset - int wr_bank, // Output bank - int wr_addr, // Output address offset - bool is_acc // Write to accumulator -) { - // Encode instruction - uint64_t inst = encode_gelu_inst( - iter, op1_bank, op1_addr, wr_bank, wr_addr, is_acc - ); - - // Issue RoCC instruction - ROCC_INSTRUCTION(GELU_OPCODE, inst); - - // Wait for completion (optional) - wait_gelu_complete(); -} -``` - -### 10.3 编译器优化 - -#### Fusion优化 - -将连续的GELU操作与其他算子融合: - -``` -// Before -%1 = matmul(%A, %B) -%2 = add(%1, %bias) -%3 = gelu(%2) - -// After (fused) -%3 = matmul_add_gelu(%A, %B, %bias) -``` - -#### Tiling优化 - -大张量分块处理,优化内存访问: - -``` -// Original -%out = gelu(%in : tensor<10240xf32>) - -// Tiled (64 iterations of 16 elements) -for i in 0..64: - %tile_in = extract(%in, i*16, 16) - %tile_out = gelu(%tile_in) - insert(%out, %tile_out, i*16) -``` - - -## 11. 
使用示例 (Usage Examples) - -### 11.1 基本用法 - -```scala -// Instantiate GELU unit -val geluUnit = Module(new GeluUnit) - -// Connect to Ball Domain -geluUnit.io.cmdReq <> ballDomain.io.geluReq -ballDomain.io.geluResp <> geluUnit.io.cmdResp - -// Connect to memory system -for (i <- 0 until sp_banks) { - scratchpad.io.read(i) <> geluUnit.io.sramRead(i) - scratchpad.io.write(i) <> geluUnit.io.sramWrite(i) -} - -for (i <- 0 until acc_banks) { - accumulator.io.read(i) <> geluUnit.io.accRead(i) - accumulator.io.write(i) <> geluUnit.io.accWrite(i) -} - -// Status monitoring (optional) -// Can be connected to performance counters or debug interface -val geluStatus = geluUnit.io.status -``` - -### 11.2 单次向量GELU - -```c -// Process single vector (16 elements) -float input[16] = {-2.0, -1.5, ..., 2.0}; -float output[16]; - -// Load input to SRAM bank 0, address 0x100 -load_to_sram(0, 0x100, input, 16); - -// Issue GELU instruction -gelu_hw( - input, output, - 1, // iter = 1 (single vector) - 0, // op1_bank = 0 - 0x100, // op1_addr - 1, // wr_bank = 1 - 0x200, // wr_addr - false // is_acc = false (write to SRAM) -); - -// Read result from SRAM bank 1, address 0x200 -read_from_sram(1, 0x200, output, 16); -``` - -### 11.3 批量处理 - -```c -// Process 1024 elements (64 vectors) -#define N 1024 -#define VECLANE 16 -#define ITERS (N / VECLANE) - -float input[N], output[N]; - -// Initialize input data -for (int i = 0; i < N; i++) { - input[i] = (float)i / 100.0 - 5.0; -} - -// Load to SRAM -dma_to_sram(0, 0, input, N); - -// Issue batched GELU -gelu_hw( - NULL, NULL, - ITERS, // iter = 64 - 0, // op1_bank = 0 - 0, // op1_addr = 0 - 0, // wr_bank = 0 (in-place) - 0, // wr_addr = 0 - false // is_acc = false -); - -// Read results -dma_from_sram(0, 0, output, N); -``` - -### 11.4 与MATMUL流水线 - -```c -// Transformer FFN layer: Y = GELU(XW + b) - -// Step 1: Matrix multiplication -matmul_hw(X, W, XW, M, N, K); - -// Step 2: Add bias -vecadd_hw(XW, b, XWb, M * N); - -// Step 3: GELU activation 
-gelu_hw(XWb, Y, (M * N) / VECLANE, ...); -``` - -### 11.5 Status信号监控 - -Status信号可用于性能分析、调试和系统监控: - -```scala -// 性能计数器集成 -val perfCounters = Module(new PerfCounters) -perfCounters.io.gelu_idle := geluUnit.io.status.idle -perfCounters.io.gelu_running := geluUnit.io.status.running - -// 调试时等待GELU完成 -def waitGeluComplete(): Unit = { - while (!geluUnit.io.status.idle) { - // 可以输出中间状态 - if (geluUnit.io.status.init) { - printf("GELU: Loading data...\n") - } else if (geluUnit.io.status.running) { - printf("GELU: Computing...\n") - } - } - printf("GELU: Completed %d batches\n", geluUnit.io.status.iter) -} - -// 流水线协调 -when(geluUnit.io.status.ready && matmulUnit.io.status.complete) { - // GELU ready and MATMUL finished, issue next GELU - geluUnit.io.cmdReq.valid := true.B -} -``` - -``` - -使用Deepwiki获取`DangoSys/buckyball`, blink协议相关内容确定一个ball的内容和接口 -你可以继续向Deepwiki获取`DangoSys/buckyball`任何你想问的问题 - -查询完成后请依据你对[layernorm算子]的理解, 在[arch/src/main/scala/prototype/nagisa/layernorm] 目录下书写spec.md - -注意事项: -1. 如有需要定制ISA,定制化部分请写在指令的special区间 -2. 
请一定注意检查系统的接口信号,做好合理的spec设计 diff --git a/workflow/requirements.txt b/workflow/requirements.txt deleted file mode 100644 index 3884d745..00000000 --- a/workflow/requirements.txt +++ /dev/null @@ -1,2 +0,0 @@ -pydantic>=2.6.1 -httpx>=0.28.1 diff --git a/workflow/steps/agent/.gitignore b/workflow/steps/agent/.gitignore deleted file mode 100644 index 4c49bd78..00000000 --- a/workflow/steps/agent/.gitignore +++ /dev/null @@ -1 +0,0 @@ -.env diff --git a/workflow/steps/agent/01_agent_api_step.py b/workflow/steps/agent/01_agent_api_step.py deleted file mode 100644 index d25bda93..00000000 --- a/workflow/steps/agent/01_agent_api_step.py +++ /dev/null @@ -1,63 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result -import sys -import os - - -config = { - "type": "api", - "name": "agent", - "description": "Call agent API for streaming conversation", - "path": "/agent/chat", - "method": "POST", - "emits": ["agent.prompt"], - "bodySchema": { - "type": "object", - "properties": { - "message": {"type": "string"}, - "model": {"type": "string", "default": "deepseek-chat"}, - "apiKey": {"type": "string"}, - "baseUrl": {"type": "string"}, - }, - "required": ["message"], - }, - "responseSchema": { - "200": { - "type": "object", - "properties": {"traceId": {"type": "string"}, "status": {"type": "string"}}, - } - }, - "flows": ["agent"], -} - - -async def handler(req, context): - context.logger.info("agent API - Request received", {"body": req.get("body")}) - body = req.get("body") - message = body.get("message") - model = body.get("model", "deepseek-chat") - api_key = body.get("apiKey") - base_url = body.get("baseUrl") - - # Send event to processing step - await context.emit( - { - "topic": "agent.prompt", - "data": { - "message": message, - "model": model, - "traceId": context.trace_id, - "apiKey": api_key, - "baseUrl": base_url, - }, - } - ) - - # ================================================================================== - # Wait for execution result - # 
================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/agent/01_agent_event_step.py b/workflow/steps/agent/01_agent_event_step.py deleted file mode 100644 index 492f3f49..00000000 --- a/workflow/steps/agent/01_agent_event_step.py +++ /dev/null @@ -1,154 +0,0 @@ -import httpx -import json -import os -from dotenv import load_dotenv -import sys - -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -load_dotenv() - -config = { - "type": "event", - "name": "agent", - "description": "Handle agent streaming response", - "subscribes": ["agent.prompt"], - "emits": ["agent.response"], - "input": { - "type": "object", - "properties": { - "message": {"type": "string"}, - "model": {"type": "string"}, - "traceId": {"type": "string"}, - "apiKey": {"type": "string"}, - "baseUrl": {"type": "string"}, - }, - }, - "flows": ["agent"], -} - - -async def handler(input_data, context): - context.logger.info("agent - Starting processing", {"input": input_data}) - - message = input_data.get("message") - model = input_data.get("model", "deepseek-chat") - trace_id = input_data.get("traceId") - - # API configuration: prefer parameters passed in, otherwise use environment variables - api_key = input_data.get("apiKey") or os.getenv("API_KEY") - base_url = input_data.get("baseUrl") or os.getenv( - "BASE_URL", "https://api.deepseek.com/v1" - ) - - if not api_key: - error_msg = "API Key not provided" - context.logger.error(error_msg) - await context.emit( - { - "topic": "agent.error", - "data": { - "error": error_msg, - "original_message": message, - "traceId": trace_id, - }, - } - ) - await check_result(context, 1, 
continue_run=False) - return - - headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"} - - payload = { - "model": model, - "messages": [{"role": "user", "content": message}], - "stream": True, - "temperature": 0.7, - } - - try: - async with httpx.AsyncClient() as client: - async with client.stream( - "POST", - f"{base_url}/chat/completions", - headers=headers, - json=payload, - timeout=60.0, - ) as response: - - if response.status_code != 200: - context.logger.error(f"agent API error: {response.status_code}") - return - - full_response = "" - - async for line in response.aiter_lines(): - if line.startswith("data: "): - # Remove "data: " prefix - data = line[6:] - - if data == "[DONE]": - break - - try: - chunk = json.loads(data) - if "choices" in chunk and len(chunk["choices"]) > 0: - delta = chunk["choices"][0].get("delta", {}) - content = delta.get("content", "") - - if content: - full_response += content - context.logger.info(f"{content}") - - except json.JSONDecodeError: - continue - - # Send complete response - await context.emit( - { - "topic": "agent.response", - "data": { - "response": full_response, - "original_message": message, - "traceId": trace_id, - }, - } - ) - - context.logger.info( - "agent processing completed", - {"response_length": len(full_response), "traceId": trace_id}, - ) - - # Pass response content back to API via extra_fields - success_result, failure_result = await check_result( - context, 0, continue_run=False, extra_fields={"response": full_response} - ) - - except Exception as e: - context.logger.error(f"agent API call failed: {str(e)}") - await context.emit( - { - "topic": "agent.error", - "data": { - "error": str(e), - "original_message": message, - "traceId": trace_id, - }, - } - ) - - # Pass error information back to API via extra_fields - success_result, failure_result = await check_result( - context, 1, continue_run=False, extra_fields={"error": str(e)} - ) - - # 
================================================================================== - # finish workflow - # ================================================================================== - return diff --git a/workflow/steps/agent/README.md b/workflow/steps/agent/README.md deleted file mode 100644 index e6fe4b89..00000000 --- a/workflow/steps/agent/README.md +++ /dev/null @@ -1,40 +0,0 @@ -# Agent Workflow - -AI assistant workflow in Buckyball framework, providing conversational interaction with AI models. - -## API Usage - -### `chat` -**Endpoint**: `POST /agent/chat` - -**Function**: Conversational interaction with AI assistant - -**Parameters**: -- **`message`** [Required] - Message content to send to AI -- **`model`** - AI model to use, default `"deepseek-chat"` - -**Examples**: -```bash -# Basic conversation -bbdev agent --chat "--message 'Hello, can you help me with Buckyball development?'" - -# Specify model -bbdev agent --chat "--message 'Explain this Scala code' --model deepseek-chat" - -# Code analysis -bbdev agent --chat "--message 'Please analyze this Chisel module and suggest optimizations'" -``` - -**Response**: -```json -{ - "traceId": "unique-trace-id", - "status": "success" -} -``` - -## Notes - -- Requires configured AI model API key -- Responses use streaming output -- Note message length limits diff --git a/workflow/steps/agent/example_prompt.md b/workflow/steps/agent/example_prompt.md deleted file mode 100644 index c834ca3b..00000000 --- a/workflow/steps/agent/example_prompt.md +++ /dev/null @@ -1,22 +0,0 @@ -# Code Generation Task - -## Objective -Create a simple Python calculator module with basic arithmetic operations. - -## Requirements -1. Create a file named `calculator.py` -2. 
Implement the following functions: - - `add(a, b)` - returns sum of two numbers - - `subtract(a, b)` - returns difference of two numbers - - `multiply(a, b)` - returns product of two numbers - - `divide(a, b)` - returns quotient of two numbers (handle division by zero) - -3. Each function should include: - - Type hints - - Docstrings - - Error handling where appropriate - -## Code Style -- Use 2 spaces for indentation -- Follow PEP 8 guidelines -- Keep function names concise but descriptive diff --git a/workflow/steps/agent/example_refactor_prompt.md b/workflow/steps/agent/example_refactor_prompt.md deleted file mode 100644 index 07089729..00000000 --- a/workflow/steps/agent/example_refactor_prompt.md +++ /dev/null @@ -1,37 +0,0 @@ -# 代码重构任务示例 - -## 目标 -重构现有的 calculator.py 文件,添加错误处理和日志功能 - -## 需求 - -### 1. 分析现有代码 -- 先读取 calculator.py 文件查看当前实现 -- 分析可以改进的地方 - -### 2. 添加日志功能 -- 为每个函数添加日志记录 -- 记录函数调用和参数 -- 记录计算结果 - -### 3. 增强错误处理 -- 添加输入类型检查 -- 完善除零错误处理 -- 添加有意义的错误消息 - -### 4. 代码优化 -- 保持现有的函数签名和文档字符串 -- 确保向后兼容性 -- 遵循 PEP 8 规范 - -## 代码风格 -- 使用 2 空格缩进 -- 添加详细的注释 -- 保持函数简洁 - -## 预期行为 -AI 应该: -1. 首先调用 read_file 读取现有的 calculator.py -2. 分析代码结构 -3. 调用 write_file 写入改进后的版本 -4. 解释做了哪些改进 diff --git a/workflow/steps/agent/test_code_agent.sh b/workflow/steps/agent/test_code_agent.sh deleted file mode 100755 index e53e64a1..00000000 --- a/workflow/steps/agent/test_code_agent.sh +++ /dev/null @@ -1,90 +0,0 @@ -#!/bin/bash - -# Code Agent 测试脚本 -# 演示如何使用 Function Calling 功能 - -BASE_URL="http://localhost:3001" -WORK_DIR="/home/mio/Code/buckyball/workflow/steps/agent" -MODEL="claude-sonnet-4-20250514" - -echo "==========================================" -echo "Code Agent Function Calling 测试" -echo "==========================================" -echo "" -echo "⚠️ 注意:Function Calling 模式下,AI 会多次调用工具" -echo " 每次请求可能需要 10-30 秒,请耐心等待" -echo "" - -echo "" -echo "测试 1: 单次代码生成" -echo "----------------------------" -echo "⏳ 请等待,AI 正在思考和调用工具..." 
-curl -s -X POST "$BASE_URL/agent/code" \ - -H "Content-Type: application/json" \ - -d "{ - \"promptPath\": \"example_prompt.md\", - \"model\": \"$MODEL\", - \"workDir\": \"$WORK_DIR\" - }" | jq - -echo "" -echo "" -echo "测试 2: 代码重构(需要先读取现有文件)" -echo "----------------------------" -echo "⏳ AI 会先读取文件,然后生成代码..." -curl -s -X POST "$BASE_URL/agent/code" \ - -H "Content-Type: application/json" \ - -d "{ - \"promptPath\": \"example_refactor_prompt.md\", - \"model\": \"$MODEL\", - \"workDir\": \"$WORK_DIR\" - }" | jq - -echo "" -echo "" -echo "测试 3: 多轮对话 - 第一轮" -echo "----------------------------" -echo "⏳ 创建新会话..." -SESSION_ID="test-session-$(date +%s)" -curl -s -X POST "$BASE_URL/agent/code" \ - -H "Content-Type: application/json" \ - -d "{ - \"promptPath\": \"example_prompt.md\", - \"workDir\": \"$WORK_DIR\", - \"model\": \"$MODEL\", - \"sessionId\": \"$SESSION_ID\" - }" | jq - -echo "" -echo "等待 2 秒..." -sleep 2 - -echo "" -echo "测试 3: 多轮对话 - 第二轮(使用同一个 session)" -echo "----------------------------" -echo "⏳ 继续之前的会话..." -# 创建一个临时 prompt -cat > /tmp/continue_prompt.md << EOF -# 继续任务 - -基于刚才创建的代码,请添加一个测试文件 test_calculator.py,包含基本的单元测试。 -EOF - -curl -s -X POST "$BASE_URL/agent/code" \ - -H "Content-Type: application/json" \ - -d "{ - \"promptPath\": \"/tmp/continue_prompt.md\", - \"workDir\": \"$WORK_DIR\", - \"model\": \"$MODEL\", - \"sessionId\": \"$SESSION_ID\" - }" | jq - -echo "" -echo "==========================================" -echo "测试完成!" 
-echo "==========================================" -echo "" -echo "查看生成的文件:" -echo " ls -la $WORK_DIR/*.py" -echo "" -echo "检查日志了解 AI 的工具调用过程" diff --git a/workflow/steps/compiler/01_build_api_step.py b/workflow/steps/compiler/01_build_api_step.py deleted file mode 100644 index e2420520..00000000 --- a/workflow/steps/compiler/01_build_api_step.py +++ /dev/null @@ -1,26 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Build Compiler", - "description": "build bitstream", - "path": "/compiler/build", - "method": "POST", - "emits": ["compiler.build"], - "flows": ["compiler"], -} - - -async def handler(req, context): - body = req.get("body") or {} - await context.emit({"topic": "compiler.build", "data": body}) - - # ================================================================================== - # Wait for build result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/compiler/01_build_event_step.py b/workflow/steps/compiler/01_build_event_step.py deleted file mode 100644 index 6cbd2dde..00000000 --- a/workflow/steps/compiler/01_build_event_step.py +++ /dev/null @@ -1,56 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "Build Compiler", - "description": "build bitstream", - "subscribes": ["compiler.build"], - "emits": [], - "flows": ["compiler"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - script_dir = 
f"{bbdir}/workflow/steps/compiler/scripts" - yaml_dir = f"{script_dir}/yaml" - # ================================================================================== - # Execute operation - # ================================================================================== - command = f"source {bbdir}/env.sh && mkdir -p {bbdir}/compiler/build" - result = stream_run_logger( - cmd=command, - logger=context.logger, - stdout_prefix="compiler build", - stderr_prefix="compiler build", - ) - command = f"cd {bbdir}/compiler/build && ninja -j{os.cpu_count()}" - result = stream_run_logger( - cmd=command, - logger=context.logger, - stdout_prefix="compiler build", - stderr_prefix="compiler build", - ) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # Continue routing - # ================================================================================== - return diff --git a/workflow/steps/compiler/README.md b/workflow/steps/compiler/README.md deleted file mode 100644 index 993560bc..00000000 --- a/workflow/steps/compiler/README.md +++ /dev/null @@ -1,33 +0,0 @@ -# Compiler Workflow - -Compiler build workflow in the Buckyball framework for building the Buckyball compiler toolchain. 
- -## API Usage - -### `build` -**Endpoint**: `POST /compiler/build` - -**Function**: Build Buckyball compiler - -**Parameters**: No specific parameters - -**Example**: -```bash -bbdev compiler --build -``` - -**Response**: -```json -{ - "status": 200, - "body": { - "success": true, - "processing": false, - "return_code": 0 - } -} -``` - -## Notes - -- Ensure the system has necessary build tools and dependencies diff --git a/workflow/steps/doc-agent/.env b/workflow/steps/doc-agent/.env deleted file mode 100644 index 596589c5..00000000 --- a/workflow/steps/doc-agent/.env +++ /dev/null @@ -1,2 +0,0 @@ -API_KEY=replace_with_your_key -BASE_URL=https://api.deepseek.com/v1 diff --git a/workflow/steps/doc-agent/00_doc_agent_api_step.py b/workflow/steps/doc-agent/00_doc_agent_api_step.py deleted file mode 100644 index 308a699d..00000000 --- a/workflow/steps/doc-agent/00_doc_agent_api_step.py +++ /dev/null @@ -1,112 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result -import sys -import os - - -config = { - "type": "api", - "name": "doc_agent", - "description": "Generate code directory documentation", - "path": "/doc/generate", - "method": "POST", - "emits": ["doc.generate"], - "bodySchema": { - "type": "object", - "properties": { - "target_path": {"type": "string"}, - "mode": {"type": "string", "enum": ["create", "update"]}, - }, - "required": ["target_path", "mode"], - }, - "responseSchema": { - "200": { - "type": "object", - "properties": { - "traceId": {"type": "string"}, - "status": {"type": "string"}, - "message": {"type": "string"}, - }, - }, - "400": { - "type": "object", - "properties": {"error": {"type": "string"}, "details": {"type": "string"}}, - }, - }, - "flows": ["doc_agent"], -} - - -async def handler(req, context): - context.logger.info("doc-agent API - Request received", {"body": req.get("body")}) - - # Parameter validation - body = req.get("body", {}) - target_path = body.get("target_path") - mode = body.get("mode") - - # Validate 
required parameters - if not target_path: - context.logger.error("doc-agent API - Missing target_path parameter") - return { - "status": 400, - "body": { - "error": "Missing required parameter", - "details": "target_path is required", - }, - } - - if not mode: - context.logger.error("doc-agent API - Missing mode parameter") - return { - "status": 400, - "body": { - "error": "Missing required parameter", - "details": "mode is required", - }, - } - - # Validate mode parameter value - if mode not in ["create", "update"]: - context.logger.error("doc-agent API - Invalid mode parameter", {"mode": mode}) - return { - "status": 400, - "body": { - "error": "Invalid parameter value", - "details": "mode must be either 'create' or 'update'", - }, - } - - # Validate target_path exists - if not os.path.exists(target_path): - context.logger.error( - "doc-agent API - Target path does not exist", {"target_path": target_path} - ) - return { - "status": 400, - "body": { - "error": "Invalid target path", - "details": f"Path '{target_path}' does not exist", - }, - } - - # Send event to processing step - await context.emit( - { - "topic": "doc.generate", - "data": { - "target_path": target_path, - "mode": mode, - "traceId": context.trace_id, - }, - } - ) - - # ================================================================================== - # Wait for execution result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/doc-agent/01_doc_agent_event_step.py b/workflow/steps/doc-agent/01_doc_agent_event_step.py deleted file mode 100644 index b24c39bf..00000000 --- a/workflow/steps/doc-agent/01_doc_agent_event_step.py +++ /dev/null @@ -1,195 +0,0 @@ -import httpx -import json -import os -import sys -from pathlib import Path -from dotenv import load_dotenv - -# Import local utility modules -current_dir = 
Path(__file__).parent -sys.path.insert(0, str(current_dir)) -from doc_utils import detect_doc_type, load_prompt_template, prepare_update_mode_prompt - -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.event_common import check_result - -load_dotenv() - -config = { - "type": "event", - "name": "doc_agent", - "description": "Handle documentation generation requests", - "subscribes": ["doc.generate"], - "emits": ["doc.response", "doc.integrate"], - "input": { - "type": "object", - "properties": { - "target_path": {"type": "string"}, - "mode": {"type": "string"}, - "traceId": {"type": "string"}, - }, - }, - "flows": ["doc_agent"], -} - - -async def handler(input_data, context): - context.logger.info("doc-agent - Start processing", {"input": input_data}) - - target_path = input_data.get("target_path") - mode = input_data.get("mode") - trace_id = input_data.get("traceId") - - try: - # 1. Detect document type and prepare prompt - doc_type = detect_doc_type(target_path) - context.logger.info( - "doc-agent - Document type detected", {"doc_type": doc_type} - ) - - prompt_template = load_prompt_template(doc_type, target_path) - prompt_template = prepare_update_mode_prompt(prompt_template, target_path, mode) - - # 2. Call LLM API to generate documentation - full_response = await generate_documentation(prompt_template, context) - - # 3. Save documentation - output_path = os.path.join(target_path, "README.md") - with open(output_path, "w", encoding="utf-8") as f: - f.write(full_response) - context.logger.info( - "doc-agent - Documentation saved", {"output_path": output_path} - ) - - # 4. Send integration event - await context.emit( - { - "topic": "doc.integrate", - "data": { - "target_path": target_path, - "output_path": output_path, - "doc_type": doc_type, - "traceId": trace_id, - }, - } - ) - - # 5. 
Send completion response - await send_success_response( - context, target_path, mode, doc_type, output_path, full_response, trace_id - ) - - except Exception as e: - await send_error_response(context, str(e), target_path, mode, trace_id) - - -async def generate_documentation(prompt_template, context): - """Call LLM API to generate documentation""" - api_key = os.getenv("API_KEY") - base_url = os.getenv("BASE_URL", "https://api.deepseek.com/v1") - - headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"} - payload = { - "model": "deepseek-chat", - "messages": [{"role": "user", "content": prompt_template}], - "stream": True, - "temperature": 0.3, - } - - async with httpx.AsyncClient() as client: - async with client.stream( - "POST", - f"{base_url}/chat/completions", - headers=headers, - json=payload, - timeout=120.0, - ) as response: - - if response.status_code != 200: - error_text = await response.atext() - raise Exception( - f"API call failed: {response.status_code}, {error_text}" - ) - - full_response = "" - async for line in response.aiter_lines(): - if line.startswith("data: ") and line[6:] != "[DONE]": - try: - chunk = json.loads(line[6:]) - content = ( - chunk.get("choices", [{}])[0] - .get("delta", {}) - .get("content", "") - ) - if content: - full_response += content - context.logger.info(f"{content}") - except json.JSONDecodeError: - continue - - return full_response - - -async def send_success_response( - context, target_path, mode, doc_type, output_path, full_response, trace_id -): - """Send success response""" - await context.emit( - { - "topic": "doc.response", - "data": { - "response": full_response, - "target_path": target_path, - "mode": mode, - "doc_type": doc_type, - "output_path": output_path, - "traceId": trace_id, - }, - } - ) - - success_result, failure_result = await check_result( - context, - 0, - continue_run=False, - extra_fields={ - "message": f"Documentation generation successful: {output_path}", - "data": { - 
"target_path": target_path, - "mode": mode, - "doc_type": doc_type, - "output_path": output_path, - "response_length": len(full_response), - }, - }, - ) - - -async def send_error_response(context, error_msg, target_path, mode, trace_id): - """Send error response""" - full_error_msg = f"doc-agent processing failed: {error_msg}" - context.logger.error(full_error_msg) - - await context.emit( - { - "topic": "doc.error", - "data": { - "error": full_error_msg, - "target_path": target_path, - "mode": mode, - "traceId": trace_id, - }, - } - ) - - success_result, failure_result = await check_result( - context, - 1, - continue_run=False, - extra_fields={ - "error": full_error_msg, - }, - ) diff --git a/workflow/steps/doc-agent/README.md b/workflow/steps/doc-agent/README.md deleted file mode 100644 index f9c15216..00000000 --- a/workflow/steps/doc-agent/README.md +++ /dev/null @@ -1,46 +0,0 @@ -# Doc-Agent Workflow - -Documentation generation workflow in the Buckyball framework, providing automated code documentation generation functionality. 
- -## API Usage Guide - -### `generate` -**Endpoint**: `POST /doc/generate` - -**Function**: Generate documentation for specified directory - -**Parameters**: -- **`target_path`** [Required] - Target directory path -- **`mode`** [Required] - Generation mode, options: `"create"`, `"update"` - -**Example**: -```bash -# Create new documentation for specified directory -bbdev doc --generate "--target_path arch/src/main/scala/framework --mode create" - -# Update existing documentation -bbdev doc --generate "--target_path arch/src/main/scala/framework --mode update" -``` - -**Response**: -```json -{ - "traceId": "unique-trace-id", - "status": "success", - "message": "Documentation generated successfully" -} -``` - -## Supported Document Types - -- RTL hardware documentation -- Test documentation -- Script documentation -- Simulator documentation -- Workflow documentation - -## Important Notes - -- Requires AI model API key configuration -- Generated documentation is automatically integrated into the mdBook system -- Supports symbolic link management and automatic SUMMARY.md updates diff --git a/workflow/steps/doc-agent/doc_agent_README.md b/workflow/steps/doc-agent/doc_agent_README.md deleted file mode 100644 index 252187dd..00000000 --- a/workflow/steps/doc-agent/doc_agent_README.md +++ /dev/null @@ -1,401 +0,0 @@ -# Doc-Agent User Documentation - -## Overview - -Doc-Agent is an automated documentation generation system built on the Motia framework, capable of automatically generating high-quality Chinese technical documentation for different types of directories in the codebase. The system supports various document types, including RTL hardware documentation, test documentation, script documentation, simulator documentation, and workflow documentation. 
- -## System Architecture - -Doc-Agent adopts an event-driven microservice architecture: - -``` -API Interface → Event Processing → Document Generation → Integration Management → mdBook Integration -``` - -- **API Step**: Receives HTTP requests and triggers document generation events -- **Event Step**: Handles document generation logic and calls LLM API -- **Integration Step**: Manages symbolic links and SUMMARY.md updates -- **Template System**: Provides dedicated templates for various document types - -## API Interface Description - -### Endpoint Information - -- **URL**: `POST /doc/generate` -- **Content-Type**: `application/json` -- **Description**: Generate documentation for specified directory - -### Request Parameters - -| Parameter | Type | Required | Description | -|--------|------|------|------| -| `target_path` | string | Yes | Relative path to target code directory | -| `mode` | string | Yes | Generation mode: `create` or `update` | - -#### Mode Description - -- **create**: Create new documentation from scratch, suitable for directories without existing documentation -- **update**: Update existing documentation, retain accurate content, correct outdated information - -### Response Format - -#### Success Response (200) -```json -{ - "status": "success", - "message": "Documentation generation task started", - "data": { - "target_path": "arch/src/main/scala/framework", - "mode": "create", - "doc_type": "rtl", - "trace_id": "doc-gen-20241201-001" - } -} -``` - -#### Error Response (400/500) -```json -{ - "status": "error", - "message": "Error description", - "error_code": "INVALID_PATH", - "details": { - "target_path": "Provided path does not exist or is inaccessible" - } -} -``` - -## Usage Examples - -### Basic Usage - -#### 1. 
Generate RTL Hardware Documentation -```bash -curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d '{ - "target_path": "arch/src/main/scala/framework/builtin", - "mode": "create" - }' -``` - -#### 2. Update Test Documentation -```bash -curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d '{ - "target_path": "bb-tests/workloads/src", - "mode": "update" - }' -``` - -#### 3. Generate Script Documentation -```bash -curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d '{ - "target_path": "scripts/docker", - "mode": "create" - }' -``` - -### Batch Processing Examples - -#### Process Entire Test Directory -```bash -# Process all bb-tests subdirectories -for dir in workloads customext sardine uvbb; do - curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d "{\"target_path\": \"bb-tests/$dir\", \"mode\": \"create\"}" - sleep 2 # Avoid concurrent overload -done -``` - -#### Batch Update Existing Documentation -```bash -# Update documentation for all major directories -targets=("arch/src/main/scala" "bb-tests/workloads" "scripts" "sims/func-sim" "workflow/steps") - -for target in "${targets[@]}"; do - echo "Updating documentation: $target" - curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d "{\"target_path\": \"$target\", \"mode\": \"update\"}" - echo "Waiting for processing to complete..." 
- sleep 5 -done -``` - -## Supported Document Types - -The system automatically identifies document types based on directory paths: - -| Path Pattern | Doc Type | Template File | Description | -|----------|----------|----------|------| -| `arch/src/main/scala/**` | RTL | rtl-doc.md | RTL hardware module documentation | -| `bb-tests/workloads/**` | Workloads | workloads-doc.md | Workload test documentation | -| `bb-tests/customext/**` | CustomExt | customext-doc.md | Custom extension test documentation | -| `bb-tests/sardine/**` | Sardine | sardine-doc.md | Sardine test framework documentation | -| `bb-tests/uvbb/**` | UVBB | uvbb-doc.md | UVBB test documentation | -| `scripts/**` | Script | script-doc.md | Script and tool documentation | -| `sims/**` | Simulator | sim-doc.md | Simulator documentation | -| `workflow/**` | Workflow | workflow-doc.md | Workflow and automation documentation | - -## Documentation Standards - -All generated documentation follows unified standards: - -### Language Specifications -- **Main Language**: Chinese -- **Technical Terms**: Keep original English -- **Code Comments**: Provide Chinese explanations -- **Professional Tone**: Avoid using emojis and informal expressions - -### Format Specifications -- **Markdown Format**: Standard Markdown syntax -- **Code Blocks**: Use syntax highlighting -- **Links**: Use relative paths -- **Diagrams**: Support Mermaid diagrams - -### Structure Specifications -Different document types have different structure requirements, but all include: -- Overview section -- Code structure analysis -- Detailed explanation -- Usage examples (if applicable) - -## Integration Features - -### Automatic Integration to mdBook - -Generated documentation is automatically integrated into the project's mdBook documentation system: - -1. **Symbolic Link Creation**: Create corresponding directory structure under `docs/bb-note/src/` -2. **SUMMARY.md Update**: Automatically add new documentation to the table of contents -3. 
**Structure Validation**: Ensure code directories and documentation directories correspond one-to-one - -### Directory Mapping Example - -``` -Code Directory → Documentation Directory -arch/src/main/scala/ → docs/bb-note/src/arch/src/main/scala/ -bb-tests/workloads/ → docs/bb-note/src/bb-tests/workloads/ -scripts/docker/ → docs/bb-note/src/scripts/docker/ -``` - -## Common Issues and Troubleshooting - -### Q1: Documentation generation fails with "path does not exist" error - -**Cause**: The provided `target_path` does not exist or is inaccessible - -**Solution**: -```bash -# Check if path exists -ls -la arch/src/main/scala/framework - -# Ensure using relative path, do not start with / -# Correct: "arch/src/main/scala/framework" -# Wrong: "/arch/src/main/scala/framework" -``` - -### Q2: Generated documentation is of poor quality or inaccurate content - -**Causes**: -- Few code files in directory or insufficient comments -- Wrong generation mode selected -- LLM API response anomaly - -**Solutions**: -```bash -# 1. Check directory contents -find arch/src/main/scala/framework -name "*.scala" | head -10 - -# 2. Try update mode instead of create mode -curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d '{ - "target_path": "arch/src/main/scala/framework", - "mode": "update" - }' - -# 3. 
Check system logs -tail -f logs/doc-agent.log -``` - -### Q3: SUMMARY.md update fails - -**Causes**: -- SUMMARY.md file permission issues -- File format does not meet expectations -- Concurrent update conflicts - -**Solutions**: -```bash -# Check file permissions -ls -la docs/bb-note/src/SUMMARY.md - -# Backup and reset SUMMARY.md -cp docs/bb-note/src/SUMMARY.md docs/bb-note/src/SUMMARY.md.backup - -# Check file format -head -20 docs/bb-note/src/SUMMARY.md -``` - -### Q4: Symbolic link creation fails - -**Causes**: -- Insufficient permissions for target directory -- Insufficient disk space -- File system does not support symbolic links - -**Solutions**: -```bash -# Check disk space -df -h docs/ - -# Check permissions -ls -la docs/bb-note/src/ - -# Manually test symbolic link creation -ln -s ../../../arch/src/main/scala/framework docs/bb-note/src/arch/src/main/scala/framework -``` - -### Q5: API request timeout - -**Causes**: -- Slow LLM API response -- Too many files in directory, long analysis time -- Network connection issues - -**Solutions**: -```bash -# Increase request timeout -curl -X POST http://localhost:8080/doc/generate \ - --max-time 300 \ - -H "Content-Type: application/json" \ - -d '{ - "target_path": "arch/src/main/scala/framework", - "mode": "create" - }' - -# Process large directories in batches -# Do not process arch/src/main/scala directly, but process its subdirectories -``` - -## Performance Optimization Recommendations - -### 1. Batch Processing Optimization -```bash -# Use parallel processing (use cautiously, avoid API limits) -parallel -j 2 curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d '{\"target_path\": \"{}\", \"mode\": \"create\"}' \ - ::: arch/src/main/scala/framework arch/src/main/scala/builtin -``` - -### 2. 
Incremental Update Strategy -```bash -# Only update recently modified directories -find arch/src/main/scala -type d -mtime -7 | while read dir; do - if [[ -f "$dir/README.md" ]]; then - curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d "{\"target_path\": \"$dir\", \"mode\": \"update\"}" - fi -done -``` - -### 3. Monitoring and Logging -```bash -# Monitor API response time -time curl -X POST http://localhost:8080/doc/generate \ - -H "Content-Type: application/json" \ - -d '{ - "target_path": "arch/src/main/scala/framework", - "mode": "create" - }' - -# View detailed logs -tail -f logs/doc-agent.log | grep -E "(ERROR|WARN|Generation complete)" -``` - -## Configuration Guide - -### Environment Variables - -| Variable Name | Description | Default Value | -|--------|------|--------| -| `DOC_AGENT_PORT` | API service port | 8080 | -| `LLM_API_KEY` | LLM API key | Required to set | -| `LLM_API_URL` | LLM API endpoint | Required to set | -| `DOC_OUTPUT_BASE` | Documentation output base path | `docs/bb-note/src` | -| `TEMPLATE_BASE_PATH` | Template file base path | `workflow/prompts/doc` | - -### Configuration File Example - -Create `.env` file: -```bash -# LLM API configuration -LLM_API_KEY=your_api_key_here -LLM_API_URL=https://api.openai.com/v1/chat/completions - -# Documentation system configuration -DOC_OUTPUT_BASE=docs/bb-note/src -TEMPLATE_BASE_PATH=workflow/prompts/doc - -# Performance configuration -MAX_CONCURRENT_REQUESTS=3 -REQUEST_TIMEOUT=300 -``` - -## Development and Debugging - -### Local Development Environment Setup - -```bash -# 1. Install dependencies -cd workflow -npm install - -# 2. Start Motia service -npm run dev - -# 3. 
Test API connection -curl http://localhost:8080/health -``` - -### Debug Mode - -```bash -# Enable verbose logging -export DEBUG=doc-agent:* -npm run dev - -# Test individual components -node -e " -const { loadTemplate } = require('./steps/doc-agent/template_loader'); -console.log(loadTemplate('rtl', 'arch/src/main/scala/test')); -" -``` - -## Version Information - -- **Current Version**: 1.0.0 -- **Motia Framework Version**: Compatible with v2.x -- **Supported Node.js Version**: >= 16.0.0 -- **Last Updated**: December 2024 - -## Support and Feedback - -If you encounter issues or need feature improvements, please: - -1. Check the troubleshooting section of this document -2. Review system log files -3. Create an Issue in the project repository -4. Contact the development team - ---- - -*This document is continuously maintained with system updates, please check regularly for the latest version.* diff --git a/workflow/steps/doc-agent/doc_integration_event_step.py b/workflow/steps/doc-agent/doc_integration_event_step.py deleted file mode 100644 index 65ccdc7e..00000000 --- a/workflow/steps/doc-agent/doc_integration_event_step.py +++ /dev/null @@ -1,122 +0,0 @@ -""" -Document integration event processing step -Responsible for handling symbolic link creation and SUMMARY.md updates for documentation -""" - -import sys -import os -from pathlib import Path - -# Add current directory to path for importing local modules -current_dir = Path(__file__).parent -sys.path.insert(0, str(current_dir)) - -from link_manager import LinkManager -from summary_manager import SummaryManager - -config = { - "type": "event", - "name": "doc_integration", - "description": "Handle documentation integration tasks", - "subscribes": ["doc.integrate"], - "emits": ["doc.integration.complete"], - "input": { - "type": "object", - "properties": { - "target_path": {"type": "string"}, - "output_path": {"type": "string"}, - "doc_type": {"type": "string"}, - "traceId": {"type": "string"}, - }, - }, - 
"flows": ["doc_agent"], -} - - -async def handler(input_data, context): - context.logger.info("doc-integration - Start processing", {"input": input_data}) - - target_path = input_data.get("target_path") - output_path = input_data.get("output_path") - doc_type = input_data.get("doc_type") - trace_id = input_data.get("traceId") - - try: - # 1. Initialize managers - link_manager = LinkManager() - summary_manager = SummaryManager() - - # 2. Create symbolic links - docs_path = None - try: - docs_path = link_manager.create_docs_structure(target_path) - link_manager.create_symbolic_link(output_path, docs_path) - context.logger.info( - "doc-integration - Symbolic link created", - {"source": output_path, "target": docs_path}, - ) - except Exception as e: - context.logger.warning( - "doc-integration - Symbolic link creation failed", - {"error": str(e), "source": output_path}, - ) - - # 3. Update SUMMARY.md - if docs_path: - try: - summary_path = "docs/bb-note/src/SUMMARY.md" - new_entry = summary_manager.generate_entry( - target_path, docs_path, doc_type - ) - success, message = summary_manager.update_summary( - summary_path, new_entry - ) - - if success: - context.logger.info( - "doc-integration - SUMMARY.md updated", - {"entry": new_entry["line"], "message": message}, - ) - else: - context.logger.info( - "doc-integration - SUMMARY.md update skipped", - {"message": message}, - ) - except Exception as e: - context.logger.warning( - "doc-integration - SUMMARY.md update failed", {"error": str(e)} - ) - - # 4. 
Send completion event - await context.emit( - { - "topic": "doc.integration.complete", - "data": { - "target_path": target_path, - "output_path": output_path, - "docs_path": docs_path, - "doc_type": doc_type, - "traceId": trace_id, - }, - } - ) - - context.logger.info( - "doc-integration processing complete", - {"target_path": target_path, "docs_path": docs_path, "traceId": trace_id}, - ) - - except Exception as e: - error_msg = f"doc-integration processing failed: {str(e)}" - context.logger.error(error_msg) - - await context.emit( - { - "topic": "doc.integration.error", - "data": { - "error": error_msg, - "target_path": target_path, - "traceId": trace_id, - }, - } - ) diff --git a/workflow/steps/doc-agent/doc_utils.py b/workflow/steps/doc-agent/doc_utils.py deleted file mode 100644 index e579db43..00000000 --- a/workflow/steps/doc-agent/doc_utils.py +++ /dev/null @@ -1,78 +0,0 @@ -""" -Documentation generation utility functions -""" - -import os -from pathlib import Path - - -def detect_doc_type(target_path): - """Automatically detect document type based on directory path""" - path_str = str(Path(target_path).resolve()).replace("\\", "/") - - if "arch/src/main/scala" in path_str: - return "rtl" - elif "bb-tests" in path_str: - if "/workloads/" in path_str or path_str.endswith("/workloads"): - return "workloads" - elif "/customext/" in path_str or path_str.endswith("/customext"): - return "customext" - elif "/sardine/" in path_str or path_str.endswith("/sardine"): - return "sardine" - elif "/uvbb/" in path_str or path_str.endswith("/uvbb"): - return "uvbb" - return "workloads" - elif "/scripts/" in path_str or path_str.endswith("/scripts"): - return "script" - elif "/sims/" in path_str or path_str.endswith("/sims"): - return "sim" - elif "/workflow/" in path_str or path_str.endswith("/workflow"): - return "workflow" - - return "script" - - -def load_prompt_template(doc_type, target_path): - """Load and process prompt template""" - template_path = 
f"workflow/prompts/doc/{doc_type}-doc.md" - - if not os.path.exists(template_path): - raise FileNotFoundError(f"Template file not found: {template_path}") - - with open(template_path, "r", encoding="utf-8") as f: - template = f.read() - - template = template.replace("[`目录相对路径`]", target_path) - template = template.replace("@[`目录相对路径`]", f"@{target_path}") - - return template - - -def prepare_update_mode_prompt(template, target_path, mode): - """Prepare prompt for update mode""" - if mode != "update": - return template - - existing_doc_path = os.path.join(target_path, "README.md") - if not os.path.exists(existing_doc_path): - return template - - with open(existing_doc_path, "r", encoding="utf-8") as f: - existing_content = f.read() - - update_instruction = f""" - -## Special Instructions for Update Mode - -You are updating existing documentation. Please note: -1. Carefully analyze existing documentation content, retain accurate and valuable information -2. Identify and update outdated, inaccurate or incomplete sections -3. Maintain overall document structure and style consistency -4. 
If existing content is accurate and complete, retain it - -Existing documentation content: -```markdown -{existing_content} -``` -""" - return template + update_instruction diff --git a/workflow/steps/doc-agent/link_manager.py b/workflow/steps/doc-agent/link_manager.py deleted file mode 100644 index c76a9440..00000000 --- a/workflow/steps/doc-agent/link_manager.py +++ /dev/null @@ -1,103 +0,0 @@ -""" -Symbolic link manager -Responsible for creating and validating symbolic links in the documentation directory structure -""" - -import os -import shutil -from pathlib import Path - - -class LinkManager: - """Symbolic link manager""" - - def __init__(self): - self.project_root = Path.cwd() - self.docs_base = self.project_root / "docs" / "bb-note" / "src" - - def create_docs_structure(self, target_path): - """Create corresponding documentation directory structure""" - docs_path = self._convert_to_docs_path(target_path) - docs_dir = Path(docs_path).parent - docs_dir.mkdir(parents=True, exist_ok=True) - return docs_path - - def _convert_to_docs_path(self, target_path): - """Convert code directory path to corresponding documentation directory path""" - target_path = Path(target_path).resolve() - - try: - relative_path = target_path.relative_to(self.project_root) - except ValueError: - relative_path = ( - Path(*target_path.parts[-3:]) - if len(target_path.parts) >= 3 - else target_path - ) - - docs_path = self.docs_base / relative_path / "README.md" - return str(docs_path) - - def create_symbolic_link(self, source_path, target_path): - """Create symbolic link""" - source = Path(source_path) - target = Path(target_path) - - if not source.exists(): - raise FileNotFoundError(f"Source file does not exist: {source_path}") - - target.parent.mkdir(parents=True, exist_ok=True) - - if target.exists() or target.is_symlink(): - target.unlink() - - try: - relative_source = os.path.relpath(source, target.parent) - target.symlink_to(relative_source) - return True - except Exception as e: 
- try: - shutil.copy2(source, target) - return True - except Exception as copy_error: - raise Exception( - f"Both creating symbolic link and copying file failed: {str(e)}, {str(copy_error)}" - ) - - def validate_links(self, docs_base_path=None): - """Validate symbolic links""" - if docs_base_path is None: - docs_base_path = self.docs_base - else: - docs_base_path = Path(docs_base_path) - - invalid_links = [] - valid_links = [] - - for link_path in docs_base_path.rglob("*"): - if link_path.is_symlink(): - try: - if link_path.exists(): - valid_links.append(str(link_path)) - else: - invalid_links.append( - { - "link": str(link_path), - "target": str(link_path.readlink()), - "error": "Target file does not exist", - } - ) - except Exception as e: - invalid_links.append( - { - "link": str(link_path), - "target": "Cannot read", - "error": str(e), - } - ) - - return { - "valid_links": valid_links, - "invalid_links": invalid_links, - "total_links": len(valid_links) + len(invalid_links), - } diff --git a/workflow/steps/doc-agent/summary_manager.py b/workflow/steps/doc-agent/summary_manager.py deleted file mode 100644 index eb495ccd..00000000 --- a/workflow/steps/doc-agent/summary_manager.py +++ /dev/null @@ -1,111 +0,0 @@ -""" -SUMMARY.md manager -Responsible for parsing and updating mdBook's SUMMARY.md file -""" - -import os -import re -from pathlib import Path - - -class SummaryManager: - """SUMMARY.md manager""" - - def __init__(self): - self.project_root = Path.cwd() - self.docs_base = self.project_root / "docs" / "bb-note" / "src" - - def parse_summary(self, summary_path): - """Parse SUMMARY.md file and return structured data""" - if not os.path.exists(summary_path): - return {"sections": [], "entries": [], "original_content": ""} - - with open(summary_path, "r", encoding="utf-8") as f: - content = f.read() - - lines = content.split("\n") - entries = [] - - for line in lines: - line = line.rstrip() - if line.strip().startswith("-"): - match = 
re.match(r"\s*-\s*\[([^\]]+)\]\(([^)]+)\)", line) - if match: - title, path = match.groups() - entries.append({"title": title, "path": path, "line": line}) - - return {"entries": entries, "original_content": content} - - def generate_entry(self, target_path, docs_path, doc_type): - """Generate SUMMARY.md entry for new documentation""" - docs_file = Path(docs_path) - - try: - relative_path = docs_file.relative_to(self.docs_base) - except ValueError: - relative_path = docs_file - - title = ( - Path(target_path).parts[-1] - if Path(target_path).parts - else "Unknown Document" - ) - indent_level = self._determine_indent_level(target_path, doc_type) - indent = "\t" * indent_level - - entry = f"{indent}- [{title}](./{relative_path})" - - return { - "title": title, - "path": f"./{relative_path}", - "line": entry, - "target_path": target_path, - "doc_type": doc_type, - } - - def _determine_indent_level(self, target_path, doc_type): - """Determine indent level based on directory path and document type""" - path_parts = Path(target_path).parts - base_level = 1 - - if doc_type == "rtl" and "scala" in path_parts: - scala_index = path_parts.index("scala") - base_level += len(path_parts) - scala_index - 1 - - return max(0, base_level) - - def update_summary(self, summary_path, new_entry): - """Update SUMMARY.md file, add new entry""" - summary_data = self.parse_summary(summary_path) - - # Check for duplicates - existing_paths = [entry["path"] for entry in summary_data["entries"]] - if new_entry["path"] in existing_paths: - return False, "Entry already exists" - - # Insert new entry - lines = summary_data["original_content"].split("\n") - new_lines = self._insert_entry(lines, new_entry) - - # Write back to file - new_content = "\n".join(new_lines) - with open(summary_path, "w", encoding="utf-8") as f: - f.write(new_content) - - return True, "SUMMARY.md updated" - - def _insert_entry(self, lines, new_entry): - """Insert new entry at appropriate position""" - new_lines = [] - 
inserted = False - - for line in lines: - new_lines.append(line) - if "contributors" in line.lower() and not inserted: - new_lines.insert(-1, new_entry["line"]) - inserted = True - - if not inserted: - new_lines.append(new_entry["line"]) - - return new_lines diff --git a/workflow/steps/firesim/.gitignore b/workflow/steps/firesim/.gitignore deleted file mode 100644 index 1e82fc7d..00000000 --- a/workflow/steps/firesim/.gitignore +++ /dev/null @@ -1 +0,0 @@ -*.yaml diff --git a/workflow/steps/firesim/01_buildbitstream_api_step.py b/workflow/steps/firesim/01_buildbitstream_api_step.py deleted file mode 100644 index 2c5fb9e9..00000000 --- a/workflow/steps/firesim/01_buildbitstream_api_step.py +++ /dev/null @@ -1,26 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Firesim Buildbitstream", - "description": "build bitstream", - "path": "/firesim/buildbitstream", - "method": "POST", - "emits": ["firesim.buildbitstream"], - "flows": ["firesim"], -} - - -async def handler(req, context): - body = req.get("body") or {} - await context.emit({"topic": "firesim.buildbitstream", "data": body}) - - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/firesim/01_buildbitstream_event_step.py b/workflow/steps/firesim/01_buildbitstream_event_step.py deleted file mode 100644 index 380be083..00000000 --- a/workflow/steps/firesim/01_buildbitstream_event_step.py +++ /dev/null @@ -1,54 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - 
-from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "Firesim Buildbitstream", - "description": "build bitstream", - "subscribes": ["firesim.buildbitstream"], - "emits": [], - "flows": ["firesim"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - script_dir = f"{bbdir}/workflow/steps/firesim/scripts" - yaml_dir = f"{script_dir}/yaml" - # ================================================================================== - # Execute operation - # ================================================================================== - command = f"source {bbdir}/env.sh && firesim buildbitstream " - command += f" -a {yaml_dir}/config_hwdb.yaml" - command += f" -b {yaml_dir}/config_build.yaml" - command += f" -r {yaml_dir}/config_build_recipes.yaml" - command += f" -c {yaml_dir}/config_runtime.yaml" - result = stream_run_logger( - cmd=command, - logger=context.logger, - stdout_prefix="firesim buildbitstream", - stderr_prefix="firesim buildbitstream", - ) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # Continue routing - # ================================================================================== - - return diff --git a/workflow/steps/firesim/02_infrasetup_api_step.py b/workflow/steps/firesim/02_infrasetup_api_step.py deleted file mode 100644 index 32d3eba9..00000000 --- a/workflow/steps/firesim/02_infrasetup_api_step.py +++ /dev/null @@ -1,27 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Firesim 
Infrasetup", - "description": "infrasetup", - "path": "/firesim/infrasetup", - "method": "POST", - "emits": ["firesim.infrasetup"], - "flows": ["firesim"], -} - - -async def handler(req, context): - body = req.get("body") or {} - data = {"jobs": body.get("jobs", 16)} - await context.emit({"topic": "firesim.infrasetup", "data": data}) - - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/firesim/02_infrasetup_event_step.py b/workflow/steps/firesim/02_infrasetup_event_step.py deleted file mode 100644 index ab0fc01e..00000000 --- a/workflow/steps/firesim/02_infrasetup_event_step.py +++ /dev/null @@ -1,56 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "Firesim Infrasetup", - "description": "infrasetup", - "subscribes": ["firesim.infrasetup"], - "emits": [], - "flows": ["firesim"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - script_dir = f"{bbdir}/workflow/steps/firesim/scripts" - yaml_dir = f"{script_dir}/yaml" - # ================================================================================== - # Execute operation - # ================================================================================== - command = f"source {bbdir}/env.sh && firesim infrasetup " - command += f" -a {yaml_dir}/config_hwdb.yaml" - command += f" -b {yaml_dir}/config_build.yaml" - command 
+= f" -r {yaml_dir}/config_build_recipes.yaml" - command += f" -c {yaml_dir}/config_runtime.yaml" - result = stream_run_logger( - cmd=command, - logger=context.logger, - stdout_prefix="firesim infrasetup", - stderr_prefix="firesim infrasetup", - ) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # Continue routing - # Routing to verilog or finish workflow - # For run workflow, continue to verilog; for standalone clean, complete - # ================================================================================== - - return diff --git a/workflow/steps/firesim/03_runworkload_api_step.py b/workflow/steps/firesim/03_runworkload_api_step.py deleted file mode 100644 index be070512..00000000 --- a/workflow/steps/firesim/03_runworkload_api_step.py +++ /dev/null @@ -1,27 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Firesim Runworkload", - "description": "run workload", - "path": "/firesim/runworkload", - "method": "POST", - "emits": ["firesim.runworkload"], - "flows": ["firesim"], -} - - -async def handler(req, context): - body = req.get("body") or {} - data = {"jobs": body.get("jobs", 16)} - await context.emit({"topic": "firesim.runworkload", "data": data}) - - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/firesim/03_runworkload_event_step.py 
b/workflow/steps/firesim/03_runworkload_event_step.py deleted file mode 100644 index 534e5c0a..00000000 --- a/workflow/steps/firesim/03_runworkload_event_step.py +++ /dev/null @@ -1,56 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "Firesim Runworkload", - "description": "run workload", - "subscribes": ["firesim.runworkload"], - "emits": [], - "flows": ["firesim"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - script_dir = f"{bbdir}/workflow/steps/firesim/scripts" - yaml_dir = f"{script_dir}/yaml" - # ================================================================================== - # Execute operation - # ================================================================================== - command = f"source {bbdir}/env.sh && firesim runworkload " - command += f" -a {yaml_dir}/config_hwdb.yaml" - command += f" -b {yaml_dir}/config_build.yaml" - command += f" -r {yaml_dir}/config_build_recipes.yaml" - command += f" -c {yaml_dir}/config_runtime.yaml" - result = stream_run_logger( - cmd=command, - logger=context.logger, - stdout_prefix="firesim runworkload", - stderr_prefix="firesim runworkload", - ) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # Continue routing - # Routing to verilog or finish workflow - # For run 
workflow, continue to verilog; for standalone clean, complete - # ================================================================================== - - return diff --git a/workflow/steps/firesim/README.md b/workflow/steps/firesim/README.md deleted file mode 100644 index e3cc4737..00000000 --- a/workflow/steps/firesim/README.md +++ /dev/null @@ -1,59 +0,0 @@ -# FireSim Workflow - -FireSim FPGA simulation workflow in Buckyball framework, providing FPGA-based hardware simulation environment. - -## API Usage - -### `buildbitstream` -**Endpoint**: `POST /firesim/buildbitstream` - -**Function**: Build FPGA bitstream file - -**Parameters**: No specific parameters - -**Example**: -```bash -bbdev firesim --buildbitstream -``` - -### `infrasetup` -**Endpoint**: `POST /firesim/infrasetup` - -**Function**: Setup FireSim infrastructure - -**Parameters**: No specific parameters - -**Example**: -```bash -bbdev firesim --infrasetup -``` - -### `runworkload` -**Endpoint**: `POST /firesim/runworkload` - -**Function**: Run workload on FireSim - -**Parameters**: No specific parameters - -**Example**: -```bash -bbdev firesim --runworkload -``` - -## Typical Workflow - -```bash -# 1. Build bitstream -bbdev firesim --buildbitstream - -# 2. Setup infrastructure -bbdev firesim --infrasetup - -# 3. Run workload -bbdev firesim --runworkload -``` - -## Notes - -- Bitstream build takes several hours -- infrasetup requires cloud computing resource configuration diff --git a/workflow/steps/firesim/scripts/makefrag/firesim/build.mk b/workflow/steps/firesim/scripts/makefrag/firesim/build.mk deleted file mode 100644 index 40450d4b..00000000 --- a/workflow/steps/firesim/scripts/makefrag/firesim/build.mk +++ /dev/null @@ -1,56 +0,0 @@ -# See LICENSE for license details. - -CHIPYARD_STAGING_DIR := $(chipyard_dir)/sims/firesim-staging - -# target scala directories to copy into midas. 
used by TARGET_COPY_TO_MIDAS_SCALA_DIRS -TARGET_COPY_TO_MIDAS_SCALA_DIRS := \ - $(addprefix $(chipyard_dir)/generators/firechip/,bridgeinterfaces goldengateimplementations) - -# this rule always is run, but may not update the timestamp of the targets (depending on what the Chipyard make does). -# if that is the case (Chipyard make doesn't update it's outputs), then downstream rules *should* be skipped. -# all other chipyard collateral is located in chipyard's generated sources area. -$(FIRRTL_FILE) $(ANNO_FILE) &: SHELL := /usr/bin/env bash # needed for running source in recipe -$(FIRRTL_FILE) $(ANNO_FILE) &: firesim_target_symlink_hook - @mkdir -p $(@D) - @mkdir -p $(TARGET_SBT_DIR)/target/generated-src/$(long_name) - source $(TARGET_SBT_DIR)/../env.sh - cd $(TARGET_SBT_DIR) && \ - pwd && \ - ${SBT} ";project $(TARGET_SBT_PROJECT); runMain chipyard.Generator \ - --target-dir $(TARGET_SBT_DIR)/target/generated-src/$(long_name) \ - --name $(long_name) \ - --top-module $(DESIGN_PACKAGE).$(DESIGN) \ - --legacy-configs $(TARGET_CONFIG_PACKAGE):$(TARGET_CONFIG) \ - --emit-legacy-sfc" - # Link to the generated files - ln -sf $(TARGET_SBT_DIR)/target/generated-src/$(long_name)/$(long_name).sfc.fir $(FIRRTL_FILE) - ln -sf $(TARGET_SBT_DIR)/target/generated-src/$(long_name)/$(long_name).anno.json $(ANNO_FILE) - # .d needed to run metasim CI tests - ln -sf $(TARGET_SBT_DIR)/target/generated-src/$(long_name)/$(long_name).d $(GENERATED_DIR)/$(long_name).d - -####################################### -# Setup Extra Verilator Compile Flags # -####################################### - -## default flags added for cva6 -CVA6_VERILATOR_FLAGS = \ - --unroll-count 256 \ - -Werror-PINMISSING \ - -Werror-IMPLICIT \ - -Wno-fatal \ - -Wno-PINCONNECTEMPTY \ - -Wno-ASSIGNDLY \ - -Wno-DECLFILENAME \ - -Wno-UNUSED \ - -Wno-UNOPTFLAT \ - -Wno-BLKANDNBLK \ - -Wno-style \ - -Wall - -# normal flags used for midas builds (that are incompatible with cva6) -DEFAULT_MIDAS_VERILATOR_FLAGS = \ - --assert 
- -# AJG: this must be evaluated after verilog generation to work (hence the =) -EXTRA_VERILATOR_FLAGS = \ - $(shell if ! grep -iq "module.*cva6" $(simulator_verilog); then echo "$(DEFAULT_MIDAS_VERILATOR_FLAGS)"; else echo "$(CVA6_VERILATOR_FLAGS)"; fi) diff --git a/workflow/steps/firesim/scripts/makefrag/firesim/config.mk b/workflow/steps/firesim/scripts/makefrag/firesim/config.mk deleted file mode 100644 index 55571068..00000000 --- a/workflow/steps/firesim/scripts/makefrag/firesim/config.mk +++ /dev/null @@ -1,30 +0,0 @@ -# Custom configuration for Buckyball FireSim builds -# This file overrides the default TARGET_CONFIG_PACKAGE to use our custom configs - -# Only used in this projects makefrags -makefile_path := $(abspath $(lastword $(MAKEFILE_LIST))) -makefile_dir := $(patsubst %/,%,$(dir $(makefile_path))) -chipyard_dir := $(abspath $(makefile_dir)/../../../../../../arch/thirdparty/chipyard) - -# These point at the main class of the target's Chisel generator -DESIGN_PACKAGE ?= firechip.chip -DESIGN ?= FireSim - -# Override to use our custom config package -TARGET_CONFIG_PACKAGE ?= sims.firesim -TARGET_CONFIG ?= FireSimBuckyballToyConfig - -# These guide chisel elaboration of simulation components by MIDAS, -# including models and widgets. -PLATFORM_CONFIG_PACKAGE ?= firesim.firesim -PLATFORM_CONFIG ?= BaseF1Config - -# Override project for the target. 
-TARGET_SBT_PROJECT := buckyball - -# Point to our project directory -TARGET_SBT_DIR := $(abspath $(makefile_dir)/../../../../../../arch) -TARGET_SOURCE_DIRS := $(abspath $(makefile_dir)/../../../../../../arch/src/main/scala) - -# SBT launcher -SBT ?= java -jar $(chipyard_dir)/scripts/sbt-launch.jar $(SBT_OPTS) diff --git a/workflow/steps/firesim/scripts/makefrag/firesim/driver.mk b/workflow/steps/firesim/scripts/makefrag/firesim/driver.mk deleted file mode 100644 index 4042d838..00000000 --- a/workflow/steps/firesim/scripts/makefrag/firesim/driver.mk +++ /dev/null @@ -1,70 +0,0 @@ -# See LICENSE for license details. - -# DOC include start: Bridge Build System Changes -########################## -# Driver Sources & Flags # -########################## - -ifeq (,$(wildcard $(RISCV)/lib/libriscv.so)) -$(warning libriscv not found) -LRISCV= -else -LRISCV=-lriscv -endif - -firechip_lib_dir = $(chipyard_dir)/generators/firechip/chip/src/main/cc -firechip_bridgestubs_lib_dir = $(chipyard_dir)/generators/firechip/bridgestubs/src/main/cc -testchipip_csrc_dir = $(chipyard_dir)/generators/testchipip/src/main/resources/testchipip/csrc - -# DRIVER_H only used to update recipe pre-reqs (ok to track more files) - -# fesvr and related srcs -DRIVER_H += \ - $(shell find $(testchipip_csrc_dir) -name "*.h") \ - $(shell find $(firechip_bridgestubs_lib_dir)/fesvr -name "*.h") -DRIVER_CC += \ - $(testchipip_csrc_dir)/cospike_impl.cc \ - $(testchipip_csrc_dir)/testchip_tsi.cc \ - $(testchipip_csrc_dir)/testchip_dtm.cc \ - $(testchipip_csrc_dir)/testchip_htif.cc \ - $(firechip_bridgestubs_lib_dir)/fesvr/firesim_tsi.cc \ - $(firechip_bridgestubs_lib_dir)/fesvr/firesim_dtm.cc \ - $(RISCV)/lib/libfesvr.a -# Disable missing override warning for testchipip. 
-TARGET_CXX_FLAGS += \ - -isystem $(testchipip_csrc_dir) \ - -isystem $(RISCV)/include \ - -Wno-inconsistent-missing-override -TARGET_LD_FLAGS += \ - -L$(RISCV)/lib \ - -Wl,-rpath,$(RISCV)/lib \ - $(LRISCV) - -# top-level sources -DRIVER_CC += $(addprefix $(firechip_lib_dir)/firesim/, $(addsuffix .cc, firesim_top)) -TARGET_CXX_FLAGS += -I$(firechip_bridgestubs_lib_dir)/bridge/test - -# bridge sources -DRIVER_H += $(shell find $(firechip_bridgestubs_lib_dir) -name "*.h") -DRIVER_CC += \ - $(wildcard \ - $(addprefix \ - $(firechip_bridgestubs_lib_dir)/, \ - $(addsuffix .cc,bridges/* bridges/tracerv/* bridges/cospike/*) \ - ) \ - ) -TARGET_CXX_FLAGS += \ - -I$(firechip_bridgestubs_lib_dir) \ - -I$(firechip_bridgestubs_lib_dir)/bridge \ - -I$(firechip_bridgestubs_lib_dir)/bridge/tracerv \ - -I$(firechip_bridgestubs_lib_dir)/bridge/cospike -TARGET_LD_FLAGS += \ - -l:libdwarf.so -l:libelf.so \ - -lz \ - -# other -TARGET_CXX_FLAGS += \ - -I$(GENERATED_DIR) \ - -g - -# DOC include end: Bridge Build System Changes diff --git a/workflow/steps/firesim/scripts/makefrag/firesim/metasim.mk b/workflow/steps/firesim/scripts/makefrag/firesim/metasim.mk deleted file mode 100644 index 1e7c50a3..00000000 --- a/workflow/steps/firesim/scripts/makefrag/firesim/metasim.mk +++ /dev/null @@ -1,140 +0,0 @@ -# See LICENSE for license details. 
- -################################################################ -# SW RTL Simulation Args -- for MIDAS- & FPGA-level Simulation # -################################################################ -TIMEOUT_CYCLES = 100000000 - -NET_SLOT ?= 0 -NET_LINK_LATENCY ?= 6405 -NET_BW ?= 100 -NET_SHMEMPORTNAME ?= $(shell printf '%0100d' $(NET_SLOT)) -NET_LOOPBACK ?= +nic-loopback0 -NET_MACADDR ?= $(shell printf '00:00:00:00:00:%02x' $$(($(NET_SLOT)+2))) -nic_args = +shmemportname0=$(NET_SHMEMPORTNAME) +macaddr0=$(NET_MACADDR) \ - +niclog0=niclog$(NET_SLOT) +linklatency0=$(NET_LINK_LATENCY) \ - +netbw0=$(NET_BW) +netburst0=8 $(NET_LOOPBACK) -tracer_args = +tracefile=TRACEFILE -blkdev_args = +blkdev-in-mem0=128 +blkdev-log0=blkdev-log$(NET_SLOT) -autocounter_args = +autocounter-readrate=1000 +autocounter-filename-base=AUTOCOUNTERFILE -# Neglecting this +arg will make the simulator use the same step size as on the -# FPGA. This will make ML simulation more closely match results seen on the -# FPGA at the expense of dramatically increased target runtime -serial_args = +fesvr-step-size=128 -#serial_args = - -COMMON_SIM_ARGS := $(COMMON_SIM_ARGS) $(serial_args) $(nic_args) $(tracer_args) $(blkdev_args) $(autocounter_args) - -# Arguments used only at a particular simulation abstraction -MIDAS_LEVEL_SIM_ARGS ?= +max-cycles=$(TIMEOUT_CYCLES) -FPGA_LEVEL_SIM_ARGS ?= - -################################ -# Verilator/VCS/XSIM execution # -################################ - -verilator = $(GENERATED_DIR)/V$(DESIGN) -verilator_debug = $(GENERATED_DIR)/V$(DESIGN)-debug -verilator_args = -vcs = $(GENERATED_DIR)/$(DESIGN) -vcs_debug = $(GENERATED_DIR)/$(DESIGN)-debug -vcs_args = +vcs+initreg+0 +vcs+initmem+0 -xsim = $(GENERATED_DIR)/$(DESIGN)-$(PLATFORM) -xcelium = $(GENERATED_DIR)/X$(DESIGN) -sim_binary_basename := $(basename $(notdir $(SIM_BINARY))) - -run-verilator: $(verilator) - cd $(dir $<) && \ - $(verilator) +permissive $(verilator_args) $(COMMON_SIM_ARGS) $(MIDAS_LEVEL_SIM_ARGS) 
$(EXTRA_SIM_ARGS) +permissive-off $(abspath $(SIM_BINARY)) -which_disasm := $(shell which spike-dasm 2> /dev/null) -ifneq ($(which_disasm),) - disasm := 3>&1 1>&2 2>&3 | $(which_disasm) $(DISASM_EXTENSION) > -endif - -# Some of the generated suites use specific plus args, that are prefixed with -# the binary name. These are captured with $($*_ARGS) -$(OUTPUT_DIR)/%.run: $(OUTPUT_DIR)/% $(EMUL) - cd $(dir $($(EMUL))) && \ - ./$(notdir $($(EMUL))) $< $($*_ARGS) $($(EMUL)_args) $(COMMON_SIM_ARGS) $(MIDAS_LEVEL_SIM_ARGS) $(EXTRA_SIM_ARGS) \ - 2> /dev/null 2> $@ && [ $$PIPESTATUS -eq 0 ] - -$(OUTPUT_DIR)/%.out: $(OUTPUT_DIR)/% $(EMUL) - cd $(dir $($(EMUL))) && \ - ./$(notdir $($(EMUL))) $< $($*_ARGS) $($(EMUL)_args) $(COMMON_SIM_ARGS) $(MIDAS_LEVEL_SIM_ARGS) $(EXTRA_SIM_ARGS) \ - $(disasm) $@ && [ $$PIPESTATUS -eq 0 ] - -$(OUTPUT_DIR)/%.vpd: $(OUTPUT_DIR)/% $(EMUL)-debug - cd $(dir $($(EMUL)_debug)) && \ - ./$(notdir $($(EMUL)_debug)) $< +vcdplusfile=$@ $($*_ARGS) $($(EMUL)_args) $(COMMON_SIM_ARGS) $(MIDAS_LEVEL_SIM_ARGS) $(EXTRA_SIM_ARGS) \ - $(disasm) $(patsubst %.vpd,%.out,$@) && [ $$PIPESTATUS -eq 0 ] - -$(OUTPUT_DIR)/%.fsdb: $(OUTPUT_DIR)/% $(EMUL)-debug - cd $(dir $($(EMUL)_debug)) && \ - ./$(notdir $($(EMUL)_debug)) $< +fsdbfile=$@ $($*_ARGS) $($(EMUL)_args) $(COMMON_SIM_ARGS) $(MIDAS_LEVEL_SIM_ARGS) $(EXTRA_SIM_ARGS) \ - $(disasm) $(patsubst %.fsdb,%.out,$@) && [ $$PIPESTATUS -eq 0 ] - -.PRECIOUS: $(OUTPUT_DIR)/%.fsdb $(OUTPUT_DIR)/%.vpd $(OUTPUT_DIR)/%.out $(OUTPUT_DIR)/%.run - -# TraceGen rules - -$(OUTPUT_DIR)/tracegen.out: $($(EMUL)) - mkdir -p $(OUTPUT_DIR) && \ - cd $(dir $($(EMUL))) && \ - ./$(notdir $($(EMUL))) $($(EMUL)_args) $(COMMON_SIM_ARGS) $(MIDAS_LEVEL_SIM_ARGS) $(EXTRA_SIM_ARGS) \ - 2> /dev/null 2> $@ && [ $$PIPESTATUS -eq 0 ] - -$(OUTPUT_DIR)/tracegen.result: $(OUTPUT_DIR)/tracegen.out $(AXE) - $(chipyard_dir)/scripts/check-tracegen.sh $< > $@ - -fsim-tracegen: $(OUTPUT_DIR)/tracegen.result - -.PHONY: fsim-tracegen diff --git 
a/workflow/steps/firesim/scripts/yaml/config_build.yaml b/workflow/steps/firesim/scripts/yaml/config_build.yaml deleted file mode 100644 index e6ee2d5c..00000000 --- a/workflow/steps/firesim/scripts/yaml/config_build.yaml +++ /dev/null @@ -1,44 +0,0 @@ -# Build-time build design / AGFI configuration for the FireSim Simulation Manager -# See https://docs.fires.im/en/stable/Advanced-Usage/Manager/Manager-Configuration-Files.html for documentation of all of these params. - -# ------------------------------------------------------------ -# Attention! This is the example on the Local U280 Machine -# Remember to modify this file use in the same way of firesim -# ------------------------------------------------------------ - -# this refers to build farms defined in config_build_farm.yaml -build_farm: - base_recipe: build-farm-recipes/externally_provisioned.yaml - recipe_arg_overrides: - # REQUIRED: (replace this) default location of build directory on build host. - default_build_dir: /home/mio/Code/buckyball/arch/thirdparty/chipyard/sims/firesim/deploy/FIRESIM_BUILD_DIR - # REQUIRED: List of IP addresses (or "localhost"). Each can have an OPTIONAL - # argument, called "override_build_dir", specifying to override the default - # build directory. 
- # - # Ex: - # build_farm_hosts: - # # use localhost and don't override the default build dir - # - localhost - # # use other IP address (don't override default build dir) - # - "111.111.1.111" - # # use other IP address (override default build dir for this build host) - # - "222.222.2.222": - # override_build_dir: /scratch/specific-build-host-build-dir - build_farm_hosts: - - localhost - -builds_to_run: - # this section references builds defined in config_build_recipes.yaml - # if you add a build here, it will be built when you run buildbitstream - # - alveo_u280_firesim_GemminiBuckyballConfig_no_nic - - alveo_u280_firesim_BuckyballToyConfig_no_nic - -agfis_to_share: - # - midasexamples_gcd - -share_with_accounts: - # To share with a specific user: - somebodysname: 123456789012 - # To share publicly: - # public: public diff --git a/workflow/steps/firesim/scripts/yaml/config_build_recipes.yaml b/workflow/steps/firesim/scripts/yaml/config_build_recipes.yaml deleted file mode 100644 index 01d8afb4..00000000 --- a/workflow/steps/firesim/scripts/yaml/config_build_recipes.yaml +++ /dev/null @@ -1,64 +0,0 @@ -# Build-time build recipe configuration for the FireSim Simulation Manager -# See https://docs.fires.im/en/stable/Advanced-Usage/Manager/Manager-Configuration-Files.html for documentation of all of these params. - -# this file contains sections that describe hardware designs that /can/ be built. -# edit config_build.yaml to actually "turn on" a config to be built when you run -# buildbitstream - -# ------------------------------------------------------------ -# Attention! 
This is the example on the Local U280 Machine -# Remember to modify this file use in the same way of firesim -# ------------------------------------------------------------ - -########### -# Schema: -########### -# : -# PLATFORM: -# TARGET_PROJECT: -# TARGET_PROJECT_MAKEFRAG: -# DESIGN: -# TARGET_CONFIG: -# PLATFORM_CONFIG: -# deploy_quintuplet: -# platform_config_args: -# fpga_frequency: -# build_strategy: -# post_build_hook: -# metasim_customruntimeconfig: -# bit_builder_recipe: -# # OPTIONAL: overrides for bit builder recipe -# # Arg structure should be identical to the args given -# # in the base_recipe. -# #bit_builder_arg_overrides: -# # : - -alveo_u280_firesim_GemminiBuckyballConfig_no_nic: - PLATFORM: xilinx_alveo_u280 - TARGET_PROJECT: firesim - TARGET_PROJECT_MAKEFRAG: ../makefrag/firesim - DESIGN: FireSim - TARGET_CONFIG: sims.firesim.FireSimGemminiBuckyballConfig - PLATFORM_CONFIG: BaseXilinxAlveoU280Config - deploy_quintuplet: null - platform_config_args: - fpga_frequency: 30 - build_strategy: TIMING - post_build_hook: null - metasim_customruntimeconfig: null - bit_builder_recipe: bit-builder-recipes/xilinx_alveo_u280.yaml - -alveo_u280_firesim_BuckyballToyConfig_no_nic: - PLATFORM: xilinx_alveo_u280 - TARGET_PROJECT: firesim - TARGET_PROJECT_MAKEFRAG: ../makefrag/firesim - DESIGN: FireSim - TARGET_CONFIG: sims.firesim.FireSimBuckyballToyConfig - PLATFORM_CONFIG: BaseXilinxAlveoU280Config - deploy_quintuplet: null - platform_config_args: - fpga_frequency: 30 - build_strategy: TIMING - post_build_hook: null - metasim_customruntimeconfig: null - bit_builder_recipe: bit-builder-recipes/xilinx_alveo_u280.yaml diff --git a/workflow/steps/firesim/scripts/yaml/config_hwdb.yaml b/workflow/steps/firesim/scripts/yaml/config_hwdb.yaml deleted file mode 100644 index 98b8155e..00000000 --- a/workflow/steps/firesim/scripts/yaml/config_hwdb.yaml +++ /dev/null @@ -1,43 +0,0 @@ -# Hardware config database for FireSim Simulation Manager -# See 
https://docs.fires.im/en/stable/Advanced-Usage/Manager/Manager-Configuration-Files.html for documentation of all of these params. - -# ------------------------------------------------------------ -# Attention! This is the example on the Local U280 Machine -# Remember to modify this file use in the same way of firesim -# ------------------------------------------------------------ - -# Hardware configs represent a combination of: -# - an agfi/bitstream_tar with the bitstream for an fpga -# - (optional) a deployquintuplet override -# - (optional) a deploy_makefrag_override -# - (optional) a custom_runtime_config - -# The HWDBs (and their AGFI's/bitstream_tar's) provided below are public and available to all users. - -# DOCREF START: Example HWDB Entry -midasexamples_gcd: - bitstream_tar: https://raw.githubusercontent.com/invalid/address.tar.gz - deploy_quintuplet_override: null - deploy_makefrag_override: null - custom_runtime_config: null -# DOCREF END: Example HWDB Entry - -# alveo_u280_firesim_GemminiBuckyballConfig_no_nic: -# bitstream_tar: file:///home/mio/Code/buckyball/arch/thirdparty/chipyard/sims/firesim/deploy/results-build/2025-09-24--17-27-06-alveo_u280_firesim_GemminiBuckyballConfig_no_nic/cl_xilinx_alveo_u280-firesim-FireSim-sims.firesim.FireSimGemminiBuckyballConfig-BaseXilinxAlveoU280Config/firesim.tar.gz -# deploy_quintuplet_override: null -# custom_runtime_config: null - -alveo_u280_firesim_GemminiBuckyballConfig_no_nic: - bitstream_tar: file:///home/mio/Code/buckyball/arch/thirdparty/chipyard/sims/firesim/deploy/results-build/2025-11-19--17-01-23-alveo_u280_firesim_GemminiBuckyballConfig_no_nic/cl_xilinx_alveo_u280-firesim-FireSim-sims.firesim.FireSimGemminiBuckyballConfig-BaseXilinxAlveoU280Config/firesim.tar.gz - deploy_quintuplet_override: null - custom_runtime_config: null - -# alveo_u280_firesim_BuckyballToyConfig_no_nic: -# bitstream_tar: 
file:///home/mio/Code/buckyball/arch/thirdparty/chipyard/sims/firesim/deploy/results-build/2025-09-29--19-18-46-alveo_u280_firesim_BuckyballToyConfig_no_nic/cl_xilinx_alveo_u280-firesim-FireSim-sims.firesim.FireSimBuckyballToyConfig-BaseXilinxAlveoU280Config/firesim.tar.gz -# deploy_quintuplet_override: null -# custom_runtime_config: null - -alveo_u280_firesim_BuckyballToyConfig_no_nic: - bitstream_tar: file:///home/mio/Code/buckyball/arch/thirdparty/chipyard/sims/firesim/deploy/results-build/2025-11-23--04-11-23-alveo_u280_firesim_BuckyballToyConfig_no_nic/cl_xilinx_alveo_u280-firesim-FireSim-sims.firesim.FireSimBuckyballToyConfig-BaseXilinxAlveoU280Config/firesim.tar.gz - deploy_quintuplet_override: null - custom_runtime_config: null diff --git a/workflow/steps/firesim/scripts/yaml/config_runtime.yaml b/workflow/steps/firesim/scripts/yaml/config_runtime.yaml deleted file mode 100644 index 490db5f9..00000000 --- a/workflow/steps/firesim/scripts/yaml/config_runtime.yaml +++ /dev/null @@ -1,110 +0,0 @@ -# Run-time configuration for the FireSim Simulation Manager -# See https://docs.fires.im/en/stable/Advanced-Usage/Manager/Manager-Configuration-Files.html for documentation of all of these params. - -# ------------------------------------------------------------ -# Attention! This is the example on the Local U280 Machine -# Remember to modify this file use in the same way of firesim -# ------------------------------------------------------------ - -run_farm: - base_recipe: run-farm-recipes/externally_provisioned.yaml - recipe_arg_overrides: - # REQUIRED: default platform used for run farm hosts. this is a class specifying - # how to run simulations on a run farm host. 
- default_platform: XilinxAlveoU280InstanceDeployManager - - # REQUIRED: default directory where simulations are run out of on the run farm hosts - default_simulation_dir: /home/mio/Code/buckyball/arch/thirdparty/chipyard/sims/firesim/deploy/FIRESIM_RUNS_DIR - - # REQUIRED: default fpga db file that enumerates what fpgas are available on the machine (used by XilinxU* Deploy Managers) - default_fpga_db: /opt/firesim-db.json - - # REQUIRED: List of unique hostnames/IP addresses, each with their - # corresponding specification that describes the properties of the host. - # - # Ex: - # run_farm_hosts_to_use: - # # use localhost which is described by "four_fpgas_spec" below. - # - localhost: four_fpgas_spec - # # supply IP address, which points to a machine that is described - # # by "four_fpgas_spec" below. - # - "111.111.1.111": four_fpgas_spec - run_farm_hosts_to_use: - - localhost: one_fpgas_spec - -metasimulation: - metasimulation_enabled: false - # vcs or verilator. use vcs-debug or verilator-debug for waveform generation - metasimulation_host_simulator: verilator - # plusargs passed to the simulator for all metasimulations - metasimulation_only_plusargs: "+fesvr-step-size=128 +max-cycles=100000000" - # plusargs passed to the simulator ONLY FOR vcs metasimulations - metasimulation_only_vcs_plusargs: "+vcs+initreg+0 +vcs+initmem+0" - -# DOCREF START: target_config area -target_config: - topology: no_net_config - no_net_num_nodes: 1 - link_latency: 6405 - switching_latency: 10 - net_bandwidth: 200 - profile_interval: -1 - - # This references a section from config_hwdb.yaml for fpga-accelerated simulation - # or from config_build_recipes.yaml for metasimulation - # In homogeneous configurations, use this to set the hardware config deployed - # for all simulators - default_hw_config: alveo_u280_firesim_BuckyballToyConfig_no_nic - - # Advanced: Specify any extra plusargs you would like to provide when - # booting the simulator (in both FPGA-sim and metasim modes). 
This is - # a string, with the contents formatted as if you were passing the plusargs - # at command line, e.g. "+a=1 +b=2" - plusarg_passthrough: "" -# DOCREF END: target_config area - -tracing: - enable: no - - # Trace output formats. Only enabled if "enable" is set to "yes" above - # 0 = human readable; 1 = binary (compressed raw data); 2 = flamegraph (stack - # unwinding -> Flame Graph) - output_format: 0 - - # Trigger selector. - # 0 = no trigger; 1 = cycle count trigger; 2 = program counter trigger; 3 = - # instruction trigger - selector: 1 - start: 0 - end: -1 - -autocounter: - read_rate: 0 - -workload: - workload_name: interactive.json # br-base.json - terminate_on_completion: no - suffix_tag: null - -host_debug: - # When enabled (=yes), Zeros-out FPGA-attached DRAM before simulations - # begin (takes 2-5 minutes). - # In general, this is not required to produce deterministic simulations on - # target machines running linux. Enable if you observe simulation non-determinism. - zero_out_dram: no - # If disable_synth_asserts: no, simulation will print assertion message and - # terminate simulation if synthesized assertion fires. - # If disable_synth_asserts: yes, simulation ignores assertion firing and - # continues simulation. - disable_synth_asserts: no - -# DOCREF START: Synthesized Prints -synth_print: - # Start and end cycles for outputting synthesized prints. - # They are given in terms of the base clock and will be converted - # for each clock domain. 
- start: 0 - end: -1 - # When enabled (=yes), prefix print output with the target cycle at which the print was triggered - cycle_prefix: yes -# DOCREF END: Synthesized Prints diff --git a/workflow/steps/marshal/.gitignore b/workflow/steps/marshal/.gitignore deleted file mode 100644 index 0521c5fb..00000000 --- a/workflow/steps/marshal/.gitignore +++ /dev/null @@ -1 +0,0 @@ -!*.json diff --git a/workflow/steps/marshal/01_build_api_step.py b/workflow/steps/marshal/01_build_api_step.py deleted file mode 100644 index 4c0936cf..00000000 --- a/workflow/steps/marshal/01_build_api_step.py +++ /dev/null @@ -1,25 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Marshal Build", - "description": "build marshal", - "path": "/marshal/build", - "method": "POST", - "emits": ["marshal.build"], - "flows": ["marshal"], -} - - -async def handler(req, context): - body = req.get("body") or {} - await context.emit({"topic": "marshal.build", "data": body}) - # ================================================================================== - # Wait for result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/marshal/01_build_event_step.py b/workflow/steps/marshal/01_build_event_step.py deleted file mode 100644 index db79e396..00000000 --- a/workflow/steps/marshal/01_build_event_step.py +++ /dev/null @@ -1,70 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "Marshal 
Build", - "description": "build marshal", - "subscribes": ["marshal.build"], - "emits": [], - "flows": ["marshal"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - script_dir = f"{bbdir}/workflow/steps/marshal/scripts" - # ================================================================================== - # Execute operation - # ================================================================================== - command = f"source {bbdir}/env.sh && ./marshal -v build interactive.json && ./marshal -v install interactive.json" - result = stream_run_logger( - cmd=command, - logger=context.logger, - cwd=script_dir, - stdout_prefix="marshal build", - stderr_prefix="marshal build", - ) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # Continue routing - # Routing to completion or error handling - # ================================================================================== - if result.returncode == 0: - await context.emit( - { - "topic": "marshal.complete", - "data": {**data, "task": "marshal", "result": success_result}, - } - ) - else: - await context.emit( - { - "topic": "marshal.error", - "data": { - **data, - "task": "marshal", - "result": failure_result, - "returncode": result.returncode, - }, - } - ) - - return diff --git a/workflow/steps/marshal/02_launch_api_step.py b/workflow/steps/marshal/02_launch_api_step.py deleted file mode 100644 index 422f47dd..00000000 --- a/workflow/steps/marshal/02_launch_api_step.py +++ /dev/null @@ -1,26 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Marshal Launch", - "description": 
"launch marshal", - "path": "/marshal/launch", - "method": "POST", - "emits": ["marshal.launch"], - "flows": ["marshal"], -} - - -async def handler(req, context): - body = req.get("body") or {} - await context.emit({"topic": "marshal.launch", "data": body}) - - # ================================================================================== - # Wait for result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/marshal/02_launch_event_step.py b/workflow/steps/marshal/02_launch_event_step.py deleted file mode 100644 index 69604449..00000000 --- a/workflow/steps/marshal/02_launch_event_step.py +++ /dev/null @@ -1,50 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "Marshal Launch", - "description": "launch marshal", - "subscribes": ["marshal.launch"], - "emits": [], - "flows": ["marshal"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - script_dir = f"{bbdir}/workflow/steps/marshal/scripts" - # ================================================================================== - # Execute operation - # ================================================================================== - command = f"source {bbdir}/env.sh && ./marshal -v launch interactive.json" - result = stream_run_logger( - cmd=command, - logger=context.logger, - cwd=script_dir, - stdout_prefix="marshal launch", - stderr_prefix="marshal launch", - ) - - # 
================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # Continue routing - # Finish workflow - # ================================================================================== - return diff --git a/workflow/steps/marshal/README.md b/workflow/steps/marshal/README.md deleted file mode 100644 index f8d25a32..00000000 --- a/workflow/steps/marshal/README.md +++ /dev/null @@ -1,53 +0,0 @@ -# Marshal Workflow - -Marshal workflow in the Buckyball framework, used to build and launch the Marshal component. - -## API Usage Guide - -### `build` -**Endpoint**: `POST /marshal/build` - -**Function**: Build Marshal component - -**Parameters**: No specific parameters - -**Example**: -```bash -bbdev marshal --build -``` - -### `launch` -**Endpoint**: `POST /marshal/launch` - -**Function**: Launch Marshal service - -**Parameters**: No specific parameters - -**Example**: -```bash -bbdev marshal --launch -``` - -## Typical Workflow - -```bash -# 1. Build Marshal -bbdev marshal --build - -# 2. 
Launch Marshal service -bbdev marshal --launch -``` - -## Response Format - -All API calls return a unified format: -```json -{ - "status": 200, - "body": { - "success": true, - "processing": false, - "return_code": 0 - } -} -``` diff --git a/workflow/steps/marshal/scripts/interactive.json b/workflow/steps/marshal/scripts/interactive.json deleted file mode 100644 index 8bb19496..00000000 --- a/workflow/steps/marshal/scripts/interactive.json +++ /dev/null @@ -1,9 +0,0 @@ -{ - "name" : "interactive", - "workdir" : ".", - "base" : "br-base.json", - "overlay" : "../output/overlay", - "host-init" : "./interactive/host-init.sh", - "rootfs-size" : "16GiB", - "spike-args" : "--extension=gemmini" -} diff --git a/workflow/steps/marshal/scripts/interactive/host-init.sh b/workflow/steps/marshal/scripts/interactive/host-init.sh deleted file mode 100755 index 45fa3b22..00000000 --- a/workflow/steps/marshal/scripts/interactive/host-init.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/bin/bash - -# This script will run on the host from the workload directory -# (e.g. workloads/example-fed) every time the workload is built. -# It is recommended to call into something like a makefile because -# this script may be called multiple times. 
- -BBDIR=$(git rev-parse --show-toplevel) - -cd $BBDIR && source env.sh - -echo "Building marshal workload" -bbdev workload --build - -mkdir -p $BBDIR/workflow/steps/marshal/output -rm -rf $BBDIR/workflow/steps/marshal/output/overlay -mkdir -p $BBDIR/workflow/steps/marshal/output/overlay/root - -# Copy workload binaries to /root directory -cp -r $BBDIR/bb-tests/output/workloads/src/* $BBDIR/workflow/steps/marshal/output/overlay/root/ diff --git a/workflow/steps/marshal/scripts/marshal b/workflow/steps/marshal/scripts/marshal deleted file mode 120000 index 6b68a668..00000000 --- a/workflow/steps/marshal/scripts/marshal +++ /dev/null @@ -1 +0,0 @@ -../../../../arch/thirdparty/chipyard/software/firemarshal/marshal \ No newline at end of file diff --git a/workflow/steps/marshal/scripts/marshal-config.yaml b/workflow/steps/marshal/scripts/marshal-config.yaml deleted file mode 100644 index 97b3664e..00000000 --- a/workflow/steps/marshal/scripts/marshal-config.yaml +++ /dev/null @@ -1,2 +0,0 @@ -firesim-dir: '../../../../arch/thirdparty/chipyard/sims/firesim' -# board-dir: '../../../../arch/thirdparty/chipyard/software/firemarshal/boards/prototype' diff --git a/workflow/steps/palladium/01_verilog_api_step.py b/workflow/steps/palladium/01_verilog_api_step.py deleted file mode 100644 index 975339f6..00000000 --- a/workflow/steps/palladium/01_verilog_api_step.py +++ /dev/null @@ -1,46 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - - -from utils.path import get_buckyball_path - - -config = { - "type": "api", - "name": "palladium Verilog", - "description": "generate verilog code", - "path": "/palladium/verilog", - "method": "POST", - "emits": ["palladium.verilog"], - "flows": ["palladium"], -} - - -async def handler(req, context): - bbdir = get_buckyball_path() - body = req.get("body") or {} - - # Get config name, must be provided - config_name = body.get("config") - if not config_name or config_name == "None": - return { - "status": "error", - 
"message": "Configuration name is required. Please specify --config_name parameter.", - "example": './bbdev palladium --verilog "--config_name sims.palladium.BuckyballToyP2EConfig"', - } - - data = { - "config": config_name, - "balltype": body.get("balltype"), - "output_dir": body.get("output_dir", f"{bbdir}/arch/build/"), - } - await context.emit({"topic": "palladium.verilog", "data": data}) - - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/palladium/01_verilog_event_step.py b/workflow/steps/palladium/01_verilog_event_step.py deleted file mode 100644 index 125b3d2a..00000000 --- a/workflow/steps/palladium/01_verilog_event_step.py +++ /dev/null @@ -1,184 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "make verilog", - "description": "generate verilog code", - "subscribes": ["palladium.verilog"], - "emits": ["palladium.build"], - "flows": ["palladium"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - # Use arch/build as the base directory for chipyard.Generator - base_build_dir = f"{data.get('output_dir', f'{bbdir}/arch/build')}/palladium" - # Output directory for final Verilog files - verilog_output_dir = f"{base_build_dir}/verilog" - arch_dir = f"{bbdir}/arch" - - # Get config name, must be provided - config_name = data.get("config") - if not config_name or 
config_name == "None": - context.logger.error("Configuration name is required but not provided") - success_result, failure_result = await check_result( - context, - 1, - continue_run=False, - extra_fields={ - "task": "validation", - "error": "Configuration name is required. Please specify --config_name parameter.", - "example": './bbdev palladium --verilog "--config_name sims.palladium.BuckyballToyP2EConfig"', - }, - ) - return failure_result - - context.logger.info(f"Using configuration: {config_name}") - - # ================================================================================== - # Step 1: Generate FIRRTL using chipyard.Generator - # ================================================================================== - context.logger.info("Step 1: Generating FIRRTL with chipyard.Generator...") - os.system(f"mkdir -p {verilog_output_dir}") - firrtl_command = ( - f"cd {arch_dir} && " - f"sbt -J-Xmx256G -J-Xss64M -J-XX:+UseG1GC -J-XX:MaxGCPauseMillis=1000 " - f'"runMain chipyard.Generator ' - f"-td {base_build_dir} " - f"-T palladium.fpga.VCU118FPGATestHarness " - f'-C {config_name}"' - ) - - result = stream_run_logger( - cmd=firrtl_command, - logger=context.logger, - cwd=arch_dir, - stdout_prefix="palladium firrtl", - stderr_prefix="palladium firrtl", - ) - - if result.returncode != 0: - context.logger.error(f"FIRRTL generation failed with code {result.returncode}") - success_result, failure_result = await check_result( - context, - result.returncode, - continue_run=False, - extra_fields={"task": "firrtl", "step": "generate"}, - ) - return failure_result - - # ================================================================================== - # Step 2: Convert FIRRTL to SystemVerilog using firtool - # ================================================================================== - context.logger.info("Step 2: Converting FIRRTL to SystemVerilog with firtool...") - - # Extract the simple class name from the full config name - # e.g., 
"sims.palladium.BuckyballToyP2EConfig" -> "BuckyballToyP2EConfig" - config_class_name = config_name.split(".")[-1] - - # Find the generated FIRRTL file (in base_build_dir, not verilog_output_dir) - fir_file = f"{base_build_dir}/palladium.fpga.{config_class_name}.fir" - if not os.path.exists(fir_file): - context.logger.error(f"FIRRTL file not found: {fir_file}") - context.logger.info(f"Looking for files in {base_build_dir}...") - # List files to help debug - if os.path.exists(base_build_dir): - files = os.listdir(base_build_dir) - context.logger.info(f"Files in build dir: {files}") - success_result, failure_result = await check_result( - context, - 1, - continue_run=False, - extra_fields={"task": "firrtl", "step": "file_check"}, - ) - return failure_result - - verilog_command = ( - f"firtool {fir_file} " - f"-o {verilog_output_dir} " - f"--split-verilog " - f"--disable-all-randomization " - f"--strip-debug-info " - f"--ignore-read-enable-mem " - f"--lowering-options=disallowLocalVariables " - f"--disable-annotation-unknown" - ) - - result = stream_run_logger( - cmd=verilog_command, - logger=context.logger, - cwd=arch_dir, - stdout_prefix="palladium verilog", - stderr_prefix="palladium verilog", - ) - - if result.returncode != 0: - context.logger.error(f"Verilog generation failed with code {result.returncode}") - success_result, failure_result = await check_result( - context, - result.returncode, - continue_run=False, - extra_fields={"task": "verilog", "step": "firtool"}, - ) - return failure_result - - # ================================================================================== - # Step 3: Verify and clean up - # ================================================================================== - # context.logger.info("Step 3: Verifying generated files...") - - # # Check if top-level Verilog was generated - # top_sv_dir = f"{verilog_output_dir}/VCU118FPGATestHarness.sv" - # top_sv_file = f"{top_sv_dir}/VCU118FPGATestHarness.sv" - - # if 
os.path.exists(top_sv_file): - # # Count generated files - # sv_files = [f for f in os.listdir(top_sv_dir) if f.endswith('.sv')] - # context.logger.info(f"Successfully generated {len(sv_files)} SystemVerilog files") - # context.logger.info(f"Top-level module: {top_sv_file}") - # else: - # context.logger.error(f"Top-level Verilog file not found: {top_sv_file}") - - # # Remove unwanted file - # topname_file = f"{arch_dir}/TestHarness.sv" - # if os.path.exists(topname_file): - # os.remove(topname_file) - # context.logger.info(f"Removed unwanted file: {topname_file}") - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, - result.returncode, - continue_run=data.get("from_run_workflow", False), - extra_fields={ - "task": "verilog", - "output_dir": verilog_output_dir, - "top_module": "VCU118FPGATestHarness", - }, - ) - - # ================================================================================== - # Continue routing - # Routing to verilog or finish workflow - # For run workflow, continue to verilog; for standalone clean, complete - # ================================================================================== - if data.get("from_run_workflow"): - await context.emit( - {"topic": "palladium.build", "data": {**data, "task": "run"}} - ) - - return diff --git a/workflow/steps/sardine/01_run_api_step.py b/workflow/steps/sardine/01_run_api_step.py deleted file mode 100644 index ac1e6a9f..00000000 --- a/workflow/steps/sardine/01_run_api_step.py +++ /dev/null @@ -1,42 +0,0 @@ -import subprocess -import sys -import os -import asyncio -from utils.event_common import wait_for_result - -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import 
get_buckyball_path - -config = { - "type": "api", - "name": "running sardine", - "description": "running sardine", - "path": "/sardine/run", - "method": "POST", - "emits": ["sardine.run"], - "flows": ["sardine"], -} - - -async def handler(req, context): - bbdir = get_buckyball_path() - - body = req.get("body") or {} - - data = {"workload": body.get("workload", "")} - - sardine_dir = f"{bbdir}/bb-tests/sardine" - - await context.emit({"topic": "sardine.run", "data": data}) - - # ================================================================================== - # Wait for execution result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/sardine/01_run_event_step.py b/workflow/steps/sardine/01_run_event_step.py deleted file mode 100644 index 500de852..00000000 --- a/workflow/steps/sardine/01_run_event_step.py +++ /dev/null @@ -1,56 +0,0 @@ -from contextlib import redirect_stdout -import os -from re import T -import subprocess -import sys -import time - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "running sardine", - "description": "running sardine", - "subscribes": ["sardine.run"], - "emits": [], - "flows": ["sardine"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - - sardine_dir = f"{bbdir}/bb-tests/sardine" - - command = f"source {bbdir}/env.sh && python run_tests.py --allure -m \"({data.get('workload', '')})\"" - context.logger.info( - "Executing sardine command", {"command": command, "cwd": sardine_dir} - ) - result = 
stream_run_logger( - cmd=command, - logger=context.logger, - cwd=sardine_dir, - executable="bash", - stdout_prefix="sardine run", - stderr_prefix="sardine run", - ) - - # ================================================================================== - # Return execution result - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # finish workflow - # ================================================================================== - return diff --git a/workflow/steps/sardine/README.md b/workflow/steps/sardine/README.md deleted file mode 100644 index 90a82f2a..00000000 --- a/workflow/steps/sardine/README.md +++ /dev/null @@ -1,34 +0,0 @@ -# Sardine Workflow - -Sardine workflow in the Buckyball framework for running Sardine-related tasks. - -## API Usage - -### `run` -**Endpoint**: `POST /sardine/run` - -**Function**: Run Sardine tasks - -**Parameters**: -- **`workload`** - Specify the workload to run - -**Example**: -```bash -# Run specified workload -bbdev sardine --run "--workload /path/to/workload" - -# Run default workload -bbdev sardine --run -``` - -**Response**: -```json -{ - "status": 200, - "body": { - "success": true, - "processing": false, - "return_code": 0 - } -} -``` diff --git a/workflow/steps/tools/README.md b/workflow/steps/tools/README.md deleted file mode 100644 index 9dd3d0ba..00000000 --- a/workflow/steps/tools/README.md +++ /dev/null @@ -1,435 +0,0 @@ -# Function Calling Tool Management System - -A modular and extensible Function Calling tool management framework for AI Agent interaction with external systems. 
- -## 📁 Directory Structure - -``` -tools/ -├── __init__.py # Module exports -├── base.py # Tool base class and context -├── registry.py # Tool registry and manager -├── file_tools.py # File operation tools -├── presets.py # Predefined tool sets -└── README.md # This document -``` - -## 🚀 Quick Start - -### 1. Using Predefined Tool Manager - -```python -from steps.tools import create_code_agent_manager - -# Create tool manager (file_tools already registered) -manager = create_code_agent_manager() - -# Get tool definitions (send to LLM) -tools_schema = manager.get_tools_schema() - -# Call LLM -response = llm.chat( - messages=messages, - tools=tools_schema # Pass in tool definitions -) - -# Execute tool calls returned by LLM -if response.tool_calls: - for tool_call in response.tool_calls: - result = manager.execute_tool( - tool_name=tool_call.function.name, - arguments=tool_call.function.arguments, - work_dir="/path/to/project", - logger=logger - ) -``` - -### 2. Custom Tools - -```python -from steps.tools import Tool, ToolManager -import json - -class RunCommandTool(Tool): - """Command execution tool""" - - def get_name(self) -> str: - return "run_command" - - def get_description(self) -> str: - return "Execute a shell command" - - def get_parameters(self) -> dict: - return { - "type": "object", - "properties": { - "command": {"type": "string", "description": "Command to execute"} - }, - "required": ["command"] - } - - def execute(self, arguments: dict, context) -> str: - import subprocess - cmd = arguments.get("command") - - try: - result = subprocess.run( - cmd, - shell=True, - capture_output=True, - text=True, - timeout=30 - ) - - return json.dumps({ - "stdout": result.stdout, - "stderr": result.stderr, - "returncode": result.returncode - }) - except Exception as e: - return json.dumps({"error": str(e)}) - -# Register custom tool -manager = ToolManager() -manager.register_tool(RunCommandTool()) -``` - -## 📚 Core Concepts - -### Tool (Base Class) - -All tools 
inherit from the `Tool` base class: - -```python -class MyTool(Tool): - def get_name(self) -> str: - """Tool name (unique identifier)""" - return "my_tool" - - def get_description(self) -> str: - """Tool description (tells AI what this tool does)""" - return "My custom tool" - - def get_parameters(self) -> dict: - """Parameter definition (JSON Schema format)""" - return { - "type": "object", - "properties": { - "param1": {"type": "string", "description": "Parameter 1"}, - "param2": {"type": "integer", "description": "Parameter 2"} - }, - "required": ["param1"] - } - - def execute(self, arguments: dict, context) -> str: - """ - Execute tool logic - - Args: - arguments: Parameters passed by AI - context: ToolContext object (contains work_dir, logger, etc.) - - Returns: - Execution result (string, can be JSON) - """ - # Implement tool logic - return json.dumps({"result": "success"}) -``` - -### ToolContext (Execution Context) - -Execution environment provided to tools: - -```python -context = ToolContext( - work_dir="/path/to/project", # Working directory - logger=logger, # Logger - extra_key="extra_value" # Custom extension fields -) - -# Use in tool -class MyTool(Tool): - def execute(self, arguments, context): - context.log_info("Starting execution") - work_dir = context.work_dir - custom = context.extra.get("extra_key") - # ... 
-``` - -### ToolRegistry (Tool Registry) - -Manage tool registration and lookup: - -```python -from steps.tools import ToolRegistry, ReadFileTool, WriteFileTool - -registry = ToolRegistry() -registry.register(ReadFileTool()) -registry.register(WriteFileTool()) - -# Get tool -tool = registry.get("read_file") - -# List all tools -tools = registry.list_tools() # ['read_file', 'write_file'] - -# Convert to OpenAI format -schema = registry.to_openai_format() -``` - -### ToolManager (Tool Manager) - -High-level wrapper providing more convenient interface: - -```python -from steps.tools import ToolManager - -manager = ToolManager() -manager.register_tools([ReadFileTool(), WriteFileTool()]) - -# Get tool definitions -schema = manager.get_tools_schema() - -# Execute tool -result = manager.execute_tool( - tool_name="read_file", - arguments={"path": "main.py"}, - work_dir="/project" -) - -# View execution log -log = manager.get_execution_log() -``` - -## 🛠️ Built-in Tools - -### File Operation Tools - -#### read_file -Read file content - -```json -{ - "name": "read_file", - "parameters": { - "path": "relative/path/to/file.txt" - } -} -``` - -#### write_file -Write file content (automatically creates directories) - -```json -{ - "name": "write_file", - "parameters": { - "path": "output/file.txt", - "content": "file content here" - } -} -``` - -#### list_files -List files in directory - -```json -{ - "name": "list_files", - "parameters": { - "path": "src" // Optional, defaults to current directory - } -} -``` - -## 🎨 Predefined Tool Sets - -```python -from steps.tools import get_preset, list_presets - -# View available tool sets -presets = list_presets() # ['file_tools', 'code_agent'] - -# Get tool set -tools = get_preset("file_tools") # Returns [ReadFileTool, WriteFileTool, ...] 
- -# Create manager -from steps.tools import create_code_agent_manager -manager = create_code_agent_manager() -``` - -## 📝 Complete Examples - -### Agent Integration Example - -```python -from steps.tools import create_code_agent_manager -import httpx -import json - -async def run_agent(prompt: str, work_dir: str): - # 1. Create tool manager - manager = create_code_agent_manager() - tools_schema = manager.get_tools_schema() - - # 2. Initialize conversation - messages = [ - {"role": "system", "content": "You are a code assistant"}, - {"role": "user", "content": prompt} - ] - - # 3. AI loop - max_iterations = 10 - for i in range(max_iterations): - # Call LLM - response = await call_llm(messages, tools_schema) - assistant_msg = response["choices"][0]["message"] - messages.append(assistant_msg) - - # Check for tool calls - if not assistant_msg.get("tool_calls"): - print(f"Done! Final response: {assistant_msg['content']}") - break - - # Execute all tool calls - for tool_call in assistant_msg["tool_calls"]: - result = manager.execute_tool( - tool_name=tool_call["function"]["name"], - arguments=tool_call["function"]["arguments"], - work_dir=work_dir - ) - - # Add tool result to conversation - messages.append({ - "role": "tool", - "tool_call_id": tool_call["id"], - "content": result - }) -``` - -## 🔒 Security Features - -### Path Security - -All file operation tools include path traversal checks: - -```python -# ✅ Allowed -read_file("src/main.py") - -# ❌ Denied (path traversal) -read_file("../../../etc/passwd") -``` - -### Error Handling - -Tool execution automatically catches exceptions: - -```python -# Internal exceptions are caught and returned as error messages -result = manager.execute_tool("read_file", {"path": "nonexist.txt"}) -# Returns: {"error": "File not found: nonexist.txt"} -``` - -## 🧪 Testing - -Create test file: - -```python -# test_tools.py -from steps.tools import create_code_agent_manager -import tempfile -import os - -def test_file_tools(): - manager = 
create_code_agent_manager() - - with tempfile.TemporaryDirectory() as tmpdir: - # Test write_file - result = manager.execute_tool( - "write_file", - {"path": "test.txt", "content": "hello"}, - work_dir=tmpdir - ) - assert "success" in result - - # Test read_file - result = manager.execute_tool( - "read_file", - {"path": "test.txt"}, - work_dir=tmpdir - ) - assert result == "hello" - - print("✅ Test passed") - -if __name__ == "__main__": - test_file_tools() -``` - -## 📦 Extending Tools - -### Adding New Tool Categories - -Create new file `network_tools.py`: - -```python -from .base import Tool -import json -import httpx - -class HttpGetTool(Tool): - def get_name(self) -> str: - return "http_get" - - def get_description(self) -> str: - return "Make HTTP GET request" - - def get_parameters(self) -> dict: - return { - "type": "object", - "properties": { - "url": {"type": "string"} - }, - "required": ["url"] - } - - def execute(self, arguments, context): - url = arguments["url"] - response = httpx.get(url) - return json.dumps({ - "status": response.status_code, - "body": response.text[:1000] - }) -``` - -Then add to `presets.py`: - -```python -def create_network_tools(): - from .network_tools import HttpGetTool - return [HttpGetTool()] -``` - -## 🎯 Best Practices - -1. **Tool Naming**: Use clear verb+noun format (`read_file`, `list_users`) -2. **Parameter Descriptions**: Describe each parameter in detail to help AI use them correctly -3. **Error Handling**: Return JSON format error messages with `error` field -4. **Logging**: Use `context.log_info/log_error` to record key operations -5. **Security Checks**: Validate input parameters, prevent path traversal and other security issues -6. **Result Format**: Return JSON strings or plain text, maintain consistency - -## 🤝 Contributing - -Steps to add new tools: - -1. Inherit from `Tool` base class -2. Implement 4 abstract methods -3. Add to appropriate tool set in `presets.py` -4. Export in `__init__.py` -5. 
Update README documentation - -## 📄 License - -Same as main project diff --git a/workflow/steps/tools/__init__.py b/workflow/steps/tools/__init__.py deleted file mode 100644 index 6de3040c..00000000 --- a/workflow/steps/tools/__init__.py +++ /dev/null @@ -1,84 +0,0 @@ -""" -Function Calling Tool Management Module - -This module provides a complete Function Calling tool management system, including: -- Tool base class definitions -- Tool registration and management -- Predefined tool sets -- Tool execution context - -Usage example: - -```python -from tools import create_code_agent_manager - -# Create tool manager -manager = create_code_agent_manager() - -# Get tool definitions (to send to LLM) -tools_schema = manager.get_tools_schema() - -# Execute tool call -result = manager.execute_tool( - tool_name="read_file", - arguments={"path": "main.py"}, - work_dir="/path/to/project", - logger=logger -) -``` -""" - -from .base import Tool, ToolContext -from .registry import ToolRegistry, ToolManager -from .file_tools import ( - ReadFileTool, - WriteFileTool, - ListFilesTool, - MakeDirTool, - DeleteFileTool, - GetPathTool, - GrepFilesTool, -) -from .workflow_tools import WorkflowAPITool -from .deepwiki_tools import DeepwikiAskTool, DeepwikiReadWikiTool -from .agent_tools import CallAgentTool -from .presets import ( - create_file_tools, - create_code_agent_tools, - create_default_manager, - create_code_agent_manager, - get_preset, - list_presets, -) - -__all__ = [ - # Base classes - "Tool", - "ToolContext", - # Registration and management - "ToolRegistry", - "ToolManager", - # File operation tools - "ReadFileTool", - "WriteFileTool", - "ListFilesTool", - "MakeDirTool", - "DeleteFileTool", - "GetPathTool", - "GrepFilesTool", - # Agent coordination - "CallAgentTool", - # API tools - "WorkflowAPITool", - "DeepwikiAskTool", - "DeepwikiReadWikiTool", - # Predefined tool sets - "create_file_tools", - "create_code_agent_tools", - "create_default_manager", - 
"create_code_agent_manager", - "get_preset", - "list_presets", -] - -__version__ = "1.0.0" diff --git a/workflow/steps/tools/agent_tools.py b/workflow/steps/tools/agent_tools.py deleted file mode 100644 index 623377aa..00000000 --- a/workflow/steps/tools/agent_tools.py +++ /dev/null @@ -1,129 +0,0 @@ -"""Agent coordination tools""" - -import httpx -import uuid -from typing import Dict, Any -from .base import Tool - - -class CallAgentTool(Tool): - """Tool for calling other agents""" - - def get_name(self) -> str: - return "call_agent" - - def get_description(self) -> str: - return """Call another agent (spec_agent, code_agent, review_agent, verify_agent) with a task. - Use this to delegate work to specialized agents. - Returns the agent's response and generated files.""" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "agent_role": { - "type": "string", - "enum": ["spec", "code", "review", "verify"], - "description": "Which agent to call (spec/code/review/verify)", - }, - "task_description": { - "type": "string", - "description": "Task description or instructions for the agent", - }, - "context_files": { - "type": "array", - "items": {"type": "string"}, - "description": "Optional: List of file paths the agent should read for context", - }, - "model": { - "type": "string", - "description": "Optional: LLM model to use (will inherit from parent if not specified)", - }, - }, - "required": ["agent_role", "task_description"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - agent_role = arguments.get("agent_role") - task_description = arguments.get("task_description") - context_files = arguments.get("context_files", []) - # Prefer model from parameters, otherwise inherit from context - model = arguments.get("model") or context.extra.get("model") - - # Generate temporary task file - import os - import tempfile - - # Create temporary task file - task_content = task_description - - # Add context 
file references if specified - if context_files: - task_content += "\n\n## Context Files\n" - for filepath in context_files: - task_content += f"- {filepath}\n" - - # Write to temporary file - temp_dir = os.path.join(context.work_dir, ".agent_tasks") - os.makedirs(temp_dir, exist_ok=True) - - task_id = str(uuid.uuid4())[:8] - task_file = os.path.join(temp_dir, f"task_{agent_role}_{task_id}.md") - - with open(task_file, "w", encoding="utf-8") as f: - f.write(task_content) - - context.log_info(f"Created task file: {task_file}") - context.log_info(f"Calling {agent_role}_agent with task") - - try: - # Get workflow API address (from environment or use defaults) - import os - - workflow_host = os.getenv("WORKFLOW_HOST", "localhost") - workflow_port = os.getenv("WORKFLOW_PORT", "3001") - base_url = f"http://{workflow_host}:{workflow_port}" - url = f"{base_url}/agent" - - context.log_info(f"Calling workflow API at: {url}") - - payload = { - "agentRole": agent_role, - "promptPath": task_file, - "workDir": context.work_dir, - } - - # Add model to payload if specified - if model: - payload["model"] = model - context.log_info(f"Using model: {model}") - - response = httpx.post(url, json=payload, timeout=600.0) - - if response.status_code == 200: - result = response.json() - - # Clean up temporary file - try: - os.remove(task_file) - except Exception: - pass - - return str( - { - "status": "success", - "agent": agent_role, - "result": result, - "files": result.get("files", []), - } - ) - else: - return str( - { - "error": f"Agent call failed with status {response.status_code}", - "response": response.text[:500], - } - ) - - except Exception as e: - return str({"error": f"Failed to call agent: {str(e)}"}) diff --git a/workflow/steps/tools/base.py b/workflow/steps/tools/base.py deleted file mode 100644 index 7b91d9ed..00000000 --- a/workflow/steps/tools/base.py +++ /dev/null @@ -1,114 +0,0 @@ -"""Function Calling tool base classes""" - -from abc import ABC, abstractmethod -from 
typing import Dict, Any, Optional -import json - - -class Tool(ABC): - """Tool base class""" - - def __init__(self): - self.name = self.get_name() - self.description = self.get_description() - self.parameters = self.get_parameters() - - @abstractmethod - def get_name(self) -> str: - """Return tool name""" - pass - - @abstractmethod - def get_description(self) -> str: - """Return tool description""" - pass - - @abstractmethod - def get_parameters(self) -> Dict[str, Any]: - """Return tool parameter definition (JSON Schema)""" - pass - - @abstractmethod - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - """ - Execute tool - - Args: - arguments: Tool parameters - context: Execution context (contains logger, work_dir, etc.) - - Returns: - Execution result (string format) - """ - pass - - def to_openai_format(self) -> Dict[str, Any]: - """Convert to OpenAI Function Calling format""" - return { - "type": "function", - "function": { - "name": self.name, - "description": self.description, - "parameters": self.parameters, - }, - } - - def safe_execute(self, arguments: Any, context: Any) -> str: - """ - Safely execute tool (with error handling) - - Args: - arguments: Tool parameters (can be string or dict) - context: Execution context - - Returns: - Execution result or error message - """ - try: - # Parse arguments - if isinstance(arguments, str): - args = json.loads(arguments) - else: - args = arguments - - # Execute tool - result = self.execute(args, context) - return result - - except json.JSONDecodeError as e: - error = f"Invalid JSON arguments: {str(e)}" - if hasattr(context, "logger"): - context.logger.error(f"Tool {self.name} - {error}") - return json.dumps({"error": error}) - - except Exception as e: - error = f"Tool execution failed: {str(e)}" - if hasattr(context, "logger"): - context.logger.error(f"Tool {self.name} - {error}") - return json.dumps({"error": error}) - - def __repr__(self) -> str: - return f"Tool({self.name})" - - -class 
ToolContext: - """Tool execution context""" - - def __init__(self, work_dir: str, logger: Any = None, **kwargs): - self.work_dir = work_dir - self.logger = logger - self.extra = kwargs - - def log_info(self, message: str): - """Log info message""" - if self.logger: - self.logger.info(message) - else: - print(f"[INFO] {message}") - - def log_error(self, message: str): - """Log error message""" - if self.logger: - self.logger.error(message) - else: - print(f"[ERROR] {message}") diff --git a/workflow/steps/tools/deepwiki_tools.py b/workflow/steps/tools/deepwiki_tools.py deleted file mode 100644 index 214cdbda..00000000 --- a/workflow/steps/tools/deepwiki_tools.py +++ /dev/null @@ -1,327 +0,0 @@ -"""Deepwiki tool wrapper - via MCP Streamable HTTP""" - -import httpx -import json -import uuid -from httpx_sse import connect_sse -from typing import Dict, Any -from .base import Tool - - -class DeepwikiAskTool(Tool): - """Deepwiki Q&A tool - via MCP""" - - def get_name(self) -> str: - return "deepwiki_ask" - - def get_description(self) -> str: - return """Ask questions about a GitHub repository using Deepwiki. - Use this to understand code, architecture, and implementation details.""" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "repo": { - "type": "string", - "description": ( - "GitHub repository in format 'owner/repo' " - "(e.g., 'DangoSys/buckyball', 'ucb-bar/gemmini')" - ), - }, - "question": { - "type": "string", - "description": "Question to ask about the repository", - }, - }, - "required": ["repo", "question"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - repo = arguments.get("repo") - question = arguments.get("question") - - try: - context.log_info( - f"Asking Deepwiki via MCP HTTP: {question[:100]}... 
(repo: {repo})" - ) - - # MCP tool call request (JSON-RPC 2.0) - request_payload = { - "jsonrpc": "2.0", - "id": 1, - "method": "tools/call", - "params": { - "name": "ask_question", - "arguments": {"repoName": repo, "question": question}, - }, - } - - # Use Streamable HTTP endpoint - mcp_url = "https://mcp.deepwiki.com/mcp" - - context.log_info(f"MCP URL: {mcp_url}") - - with httpx.Client(timeout=120.0) as client: - # Step 1: Initialize session (without sessionId) - init_payload = { - "jsonrpc": "2.0", - "id": 0, - "method": "initialize", - "params": { - "protocolVersion": "2024-11-05", - "capabilities": {}, - "clientInfo": { - "name": "buckyball-workflow", - "version": "1.0.0", - }, - }, - } - - context.log_info("Initializing MCP session (without sessionId)...") - - session_id = None - # Use POST request directly, get sessionId from response headers - init_resp = client.post( - mcp_url, - json=init_payload, - headers={ - "Content-Type": "application/json", - "Accept": "application/json, text/event-stream", - }, - ) - - # Get sessionId from response headers - session_id = init_resp.headers.get("mcp-session-id") - if not session_id: - context.log_error(f"Init response status: {init_resp.status_code}") - context.log_error( - f"Init response headers: {dict(init_resp.headers)}" - ) - return "Error: No session ID in init response headers" - - context.log_info(f"Session initialized, ID: {session_id}") - - # Step 2: Call tool (with sessionId) - use SSE streaming - context.log_info( - f"Calling tool: {json.dumps(request_payload, ensure_ascii=False)[:300]}" - ) - - # Use stream instead of connect_sse - with client.stream( - "POST", - mcp_url, - json=request_payload, - headers={ - "Content-Type": "application/json", - "Accept": "application/json, text/event-stream", - "Mcp-Session-Id": session_id, - }, - ) as tool_resp: - context.log_info(f"Tool response status: {tool_resp.status_code}") - - # Manually parse SSE stream - for line in tool_resp.iter_lines(): - if 
line.startswith("data: "): - # Remove "data: " prefix - data_str = line[6:] - - # Skip heartbeat - if data_str.strip() == "ping": - continue - - try: - result = json.loads(data_str) - context.log_info( - f"Tool response: {json.dumps(result, ensure_ascii=False)[:500]}" - ) - - # Handle JSON-RPC 2.0 response - if "error" in result: - error = result["error"] - error_msg = f"MCP Error: {error.get('message', 'Unknown error')}" - context.log_error(error_msg) - return error_msg - - if "result" in result: - content = result["result"].get("content", []) - if content and len(content) > 0: - answer = content[0].get("text", "") - context.log_info( - f"Deepwiki answer length: {len(answer)} chars" - ) - context.log_info( - f"Deepwiki answer preview: {answer[:200]}..." - ) - - # Limit return length - if len(answer) > 3000: - answer = answer[:3000] + "\n... (truncated)" - return answer - except json.JSONDecodeError as e: - context.log_error( - f"Failed to parse SSE data: {e}, line: {line[:100]}" - ) - continue - - return "No valid response from Deepwiki" - - except Exception as e: - error_msg = f"Error calling Deepwiki MCP: {str(e)}" - context.log_error(error_msg) - return error_msg - - -class DeepwikiReadWikiTool(Tool): - """Read Deepwiki wiki content - via MCP""" - - def get_name(self) -> str: - return "deepwiki_read_wiki" - - def get_description(self) -> str: - return """Read wiki documentation for a GitHub repository from Deepwiki. 
- Use this to get structured documentation about the repository.""" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "repo": { - "type": "string", - "description": "GitHub repository in format 'owner/repo'", - } - }, - "required": ["repo"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - repo = arguments.get("repo") - - try: - context.log_info(f"Reading Deepwiki wiki via MCP HTTP for: {repo}") - - # MCP tool call request - request_payload = { - "jsonrpc": "2.0", - "id": 2, - "method": "tools/call", - "params": { - "name": "read_wiki_contents", - "arguments": {"repoName": repo}, - }, - } - - # Use Streamable HTTP endpoint - mcp_url = "https://mcp.deepwiki.com/mcp" - - context.log_info(f"MCP URL: {mcp_url}") - - with httpx.Client(timeout=120.0) as client: - # Step 1: Initialize session (without sessionId) - init_payload = { - "jsonrpc": "2.0", - "id": 0, - "method": "initialize", - "params": { - "protocolVersion": "2024-11-05", - "capabilities": {}, - "clientInfo": { - "name": "buckyball-workflow", - "version": "1.0.0", - }, - }, - } - - context.log_info("Initializing MCP session (without sessionId)...") - - session_id = None - # Use POST request directly, get sessionId from response headers - init_resp = client.post( - mcp_url, - json=init_payload, - headers={ - "Content-Type": "application/json", - "Accept": "application/json, text/event-stream", - }, - ) - - # Get sessionId from response headers - session_id = init_resp.headers.get("mcp-session-id") - if not session_id: - context.log_error(f"Init response status: {init_resp.status_code}") - context.log_error( - f"Init response headers: {dict(init_resp.headers)}" - ) - return "Error: No session ID in init response headers" - - context.log_info(f"Session initialized, ID: {session_id}") - - # Step 2: Call tool (with sessionId) - use SSE streaming - context.log_info( - f"Calling tool: {json.dumps(request_payload, ensure_ascii=False)}" - ) - 
- # Use stream instead of connect_sse - with client.stream( - "POST", - mcp_url, - json=request_payload, - headers={ - "Content-Type": "application/json", - "Accept": "application/json, text/event-stream", - "Mcp-Session-Id": session_id, - }, - ) as tool_resp: - context.log_info(f"Tool response status: {tool_resp.status_code}") - - # Manually parse SSE stream - for line in tool_resp.iter_lines(): - if line.startswith("data: "): - # Remove "data: " prefix - data_str = line[6:] - - # Skip heartbeat - if data_str.strip() == "ping": - continue - - try: - result = json.loads(data_str) - context.log_info( - f"Tool response: {json.dumps(result, ensure_ascii=False)[:500]}" - ) - - # Handle JSON-RPC 2.0 response - if "error" in result: - error = result["error"] - error_msg = f"MCP Error: {error.get('message', 'Unknown error')}" - context.log_error(error_msg) - return error_msg - - if "result" in result: - content = result["result"].get("content", []) - if content and len(content) > 0: - wiki_text = content[0].get("text", "") - context.log_info( - f"Deepwiki wiki length: {len(wiki_text)} chars" - ) - context.log_info( - f"Deepwiki wiki preview: {wiki_text[:200]}..." - ) - - if len(wiki_text) > 5000: - wiki_text = ( - wiki_text[:5000] + "\n... 
(truncated)" - ) - return wiki_text - except json.JSONDecodeError as e: - context.log_error( - f"Failed to parse SSE data: {e}, line: {line[:100]}" - ) - continue - - return "No valid wiki content from Deepwiki" - - except Exception as e: - error_msg = f"Error reading Deepwiki wiki via MCP: {str(e)}" - context.log_error(error_msg) - return error_msg diff --git a/workflow/steps/tools/file_tools.py b/workflow/steps/tools/file_tools.py deleted file mode 100644 index cec4ca8c..00000000 --- a/workflow/steps/tools/file_tools.py +++ /dev/null @@ -1,359 +0,0 @@ -"""File operation related tools""" - -import os -import json -import shutil -from typing import Dict, Any -from .base import Tool - - -class MakeDirTool(Tool): - """Create directory tool""" - - def get_name(self) -> str: - return "make_dir" - - def get_description(self) -> str: - return "Create a new directory (supports creating parent directories)" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "path": {"type": "string", "description": "Directory path to create"} - }, - "required": ["path"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - path = arguments.get("path") - - if not os.path.isabs(path): - path = os.path.join(context.work_dir, path) - - try: - if os.path.exists(path): - return json.dumps({"status": "exists", "path": path}) - - os.makedirs(path, exist_ok=True) - context.log_info(f"Created directory: {path}") - return json.dumps({"status": "success", "path": path}) - - except Exception as e: - return json.dumps({"error": f"Failed to create directory: {str(e)}"}) - - -class GetPathTool(Tool): - """Get path information tool""" - - def get_name(self) -> str: - return "get_path_info" - - def get_description(self) -> str: - return "Get absolute path and check if path exists" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "path": { - "type": "string", - "description": "Path to 
check (optional, defaults to work_dir)", - } - }, - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - path = arguments.get("path", ".") - - if not os.path.isabs(path): - path = os.path.join(context.work_dir, path) - - abs_path = os.path.abspath(path) - exists = os.path.exists(abs_path) - - info = { - "absolute_path": abs_path, - "exists": exists, - "work_dir": context.work_dir, - } - - if exists: - info["is_file"] = os.path.isfile(abs_path) - info["is_dir"] = os.path.isdir(abs_path) - - return json.dumps(info, indent=2) - - -class GrepFilesTool(Tool): - """Search file content tool""" - - def get_name(self) -> str: - return "grep_files" - - def get_description(self) -> str: - return "Search for text pattern in files" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "pattern": {"type": "string", "description": "Text pattern to search"}, - "path": {"type": "string", "description": "Directory or file path"}, - "file_ext": { - "type": "string", - "description": "File extension filter (e.g., '.scala')", - }, - }, - "required": ["pattern", "path"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - pattern = arguments.get("pattern") - path = arguments.get("path") - file_ext = arguments.get("file_ext") - - if not os.path.isabs(path): - path = os.path.join(context.work_dir, path) - - try: - results = [] - files_to_search = [] - - if os.path.isfile(path): - files_to_search = [path] - else: - for root, _, files in os.walk(path): - for file in files: - if file_ext and not file.endswith(file_ext): - continue - files_to_search.append(os.path.join(root, file)) - - # Limit file count - for filepath in files_to_search[:100]: - try: - with open(filepath, "r", encoding="utf-8", errors="ignore") as f: - for line_num, line in enumerate(f, 1): - if pattern in line: - results.append( - { - "file": filepath, - "line": line_num, - "content": line.strip()[:150], - } - ) - # Limit result 
count - if len(results) >= 50: - break - except Exception: - continue - - if len(results) >= 50: - break - - return json.dumps({"matches": len(results), "results": results}, indent=2) - - except Exception as e: - return json.dumps({"error": f"Search failed: {str(e)}"}) - - -class DeleteFileTool(Tool): - """Delete file tool""" - - def get_name(self) -> str: - return "delete_file" - - def get_description(self) -> str: - return "Delete a file (use with caution)" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "path": {"type": "string", "description": "Path to file to delete"} - }, - "required": ["path"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - path = arguments.get("path") - - if not os.path.isabs(path): - path = os.path.join(context.work_dir, path) - - try: - if not os.path.exists(path): - return json.dumps({"status": "not_found", "path": path}) - - if os.path.isfile(path): - os.remove(path) - context.log_info(f"Deleted file: {path}") - return json.dumps({"status": "success", "path": path}) - else: - return json.dumps({"error": "Path is a directory, not a file"}) - - except Exception as e: - return json.dumps({"error": f"Failed to delete: {str(e)}"}) - - -class ReadFileTool(Tool): - """Read file content""" - - def get_name(self) -> str: - return "read_file" - - def get_description(self) -> str: - return "Read the content of a file" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "path": { - "type": "string", - "description": "File path relative to work directory", - } - }, - "required": ["path"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - file_path = arguments.get("path") - - if not file_path: - return json.dumps({"error": "Missing required parameter: path"}) - - full_path = os.path.join(context.work_dir, file_path) - - # Security check: prevent path traversal - abs_full = 
os.path.abspath(full_path) - abs_work = os.path.abspath(context.work_dir) - if not abs_full.startswith(abs_work): - return json.dumps({"error": "Access denied: path outside work directory"}) - - if not os.path.exists(full_path): - return json.dumps({"error": f"File not found: {file_path}"}) - - if not os.path.isfile(full_path): - return json.dumps({"error": f"Not a file: {file_path}"}) - - try: - with open(full_path, "r", encoding="utf-8") as f: - content = f.read() - - context.log_info(f"Tool: read_file({file_path}) - {len(content)} chars") - return content - - except UnicodeDecodeError: - return json.dumps( - {"error": "Cannot read file: not a text file or encoding issue"} - ) - - -class WriteFileTool(Tool): - """Write file content""" - - def get_name(self) -> str: - return "write_file" - - def get_description(self) -> str: - return "Write content to a file (creates directories if needed)" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "path": { - "type": "string", - "description": "File path relative to work directory", - }, - "content": { - "type": "string", - "description": "Content to write to the file", - }, - }, - "required": ["path", "content"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - file_path = arguments.get("path") - content = arguments.get("content") - - if not file_path: - return json.dumps({"error": "Missing required parameter: path"}) - - if content is None: - return json.dumps({"error": "Missing required parameter: content"}) - - full_path = os.path.join(context.work_dir, file_path) - - # Security check: prevent path traversal - abs_full = os.path.abspath(full_path) - abs_work = os.path.abspath(context.work_dir) - if not abs_full.startswith(abs_work): - return json.dumps({"error": "Access denied: path outside work directory"}) - - try: - # Create directory - dir_path = os.path.dirname(full_path) - if dir_path: - os.makedirs(dir_path, exist_ok=True) - - # Write 
file - with open(full_path, "w", encoding="utf-8") as f: - f.write(content) - - context.log_info(f"Tool: write_file({file_path}) - {len(content)} chars") - - return json.dumps( - {"success": True, "path": file_path, "size": len(content)} - ) - - except Exception as e: - return json.dumps({"error": f"Failed to write file: {str(e)}"}) - - -class ListFilesTool(Tool): - """List files in directory""" - - def get_name(self) -> str: - return "list_files" - - def get_description(self) -> str: - return "List files in a directory" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "path": { - "type": "string", - "description": "Directory path relative to work directory (default: .)", - } - }, - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - dir_path = arguments.get("path", ".") - full_path = os.path.join(context.work_dir, dir_path) - - # Security check - abs_full = os.path.abspath(full_path) - abs_work = os.path.abspath(context.work_dir) - if not abs_full.startswith(abs_work): - return json.dumps({"error": "Access denied: path outside work directory"}) - - if not os.path.exists(full_path): - return json.dumps({"error": f"Directory not found: {dir_path}"}) - - if not os.path.isdir(full_path): - return json.dumps({"error": f"Not a directory: {dir_path}"}) - - try: - files = os.listdir(full_path) - context.log_info(f"Tool: list_files({dir_path}) - {len(files)} items") - - return json.dumps({"path": dir_path, "files": files, "count": len(files)}) - - except Exception as e: - return json.dumps({"error": f"Failed to list directory: {str(e)}"}) diff --git a/workflow/steps/tools/presets.py b/workflow/steps/tools/presets.py deleted file mode 100644 index 8db2aff0..00000000 --- a/workflow/steps/tools/presets.py +++ /dev/null @@ -1,106 +0,0 @@ -"""Predefined tool sets""" - -from typing import List -from .base import Tool -from .file_tools import ( - ReadFileTool, - WriteFileTool, - ListFilesTool, - MakeDirTool, 
- DeleteFileTool, - GetPathTool, - GrepFilesTool, -) -from .workflow_tools import WorkflowAPITool -from .deepwiki_tools import DeepwikiAskTool, DeepwikiReadWikiTool -from .agent_tools import CallAgentTool -from .registry import ToolManager - - -def create_file_tools() -> List[Tool]: - """Create file operation tool set""" - return [ - ReadFileTool(), - WriteFileTool(), - ListFilesTool(), - MakeDirTool(), - DeleteFileTool(), - GetPathTool(), - GrepFilesTool(), - ] - - -def create_code_agent_tools() -> List[Tool]: - """Create Code Agent tool set (includes all required tools)""" - return [ - # File operations - ReadFileTool(), - WriteFileTool(), - ListFilesTool(), - MakeDirTool(), - DeleteFileTool(), - GetPathTool(), - GrepFilesTool(), - # Agent coordination - CallAgentTool(), - # Workflow API - WorkflowAPITool(), - # Deepwiki - DeepwikiAskTool(), - DeepwikiReadWikiTool(), - ] - - -def create_default_manager() -> ToolManager: - """Create default tool manager""" - manager = ToolManager() - manager.register_tools(create_file_tools()) - return manager - - -def create_code_agent_manager() -> ToolManager: - """Create Code Agent dedicated tool manager""" - manager = ToolManager() - manager.register_tools(create_code_agent_tools()) - return manager - - -# Predefined tool set configurations -PRESET_CONFIGS = { - "file_tools": { - "name": "File Operations", - "description": "Basic file system operations", - "tools": create_file_tools, - }, - "code_agent": { - "name": "Code Agent", - "description": "Tools for code generation and manipulation", - "tools": create_code_agent_tools, - }, -} - - -def get_preset(name: str) -> List[Tool]: - """ - Get predefined tool set - - Args: - name: Tool set name ("file_tools", "code_agent") - - Returns: - Tool list - - Raises: - ValueError: If tool set does not exist - """ - config = PRESET_CONFIGS.get(name) - if not config: - available = ", ".join(PRESET_CONFIGS.keys()) - raise ValueError(f"Unknown preset: {name}. 
Available: {available}") - - return config["tools"]() - - -def list_presets() -> List[str]: - """List all available predefined tool sets""" - return list(PRESET_CONFIGS.keys()) diff --git a/workflow/steps/tools/registry.py b/workflow/steps/tools/registry.py deleted file mode 100644 index 13922466..00000000 --- a/workflow/steps/tools/registry.py +++ /dev/null @@ -1,131 +0,0 @@ -"""Function Calling tool registry""" - -from typing import Dict, List, Any, Optional -from .base import Tool, ToolContext - - -class ToolRegistry: - """Tool registration and management""" - - def __init__(self): - self._tools: Dict[str, Tool] = {} - - def register(self, tool: Tool): - """Register a tool""" - self._tools[tool.name] = tool - - def register_all(self, tools: List[Tool]): - """Batch register tools""" - for tool in tools: - self.register(tool) - - def get(self, name: str) -> Optional[Tool]: - """Get tool""" - return self._tools.get(name) - - def list_tools(self) -> List[str]: - """List all tool names""" - return list(self._tools.keys()) - - def to_openai_format(self) -> List[Dict[str, Any]]: - """Convert to OpenAI Function Calling format""" - return [tool.to_openai_format() for tool in self._tools.values()] - - def execute(self, tool_name: str, arguments: Any, context: ToolContext) -> str: - """ - Execute tool - - Args: - tool_name: Tool name - arguments: Tool parameters - context: Execution context - - Returns: - Execution result - """ - tool = self.get(tool_name) - - if not tool: - return f'{{"error": "Unknown tool: {tool_name}"}}' - - return tool.safe_execute(arguments, context) - - def __len__(self) -> int: - return len(self._tools) - - def __repr__(self) -> str: - return f"ToolRegistry({len(self)} tools: {', '.join(self.list_tools())})" - - -class ToolManager: - """Tool manager (high-level wrapper)""" - - def __init__(self, registry: Optional[ToolRegistry] = None): - self.registry = registry or ToolRegistry() - self._execution_log: List[Dict[str, Any]] = [] - - def 
register_tool(self, tool: Tool): - """Register tool""" - self.registry.register(tool) - - def register_tools(self, tools: List[Tool]): - """Batch register tools""" - self.registry.register_all(tools) - - def get_tools_schema(self) -> List[Dict[str, Any]]: - """Get tool definitions (OpenAI format)""" - return self.registry.to_openai_format() - - def execute_tool( - self, - tool_name: str, - arguments: Any, - work_dir: str, - logger: Any = None, - **kwargs, - ) -> str: - """ - Execute tool call - - Args: - tool_name: Tool name - arguments: Tool parameters - work_dir: Working directory - logger: Logger - **kwargs: Other context parameters - - Returns: - Execution result - """ - # Create context - context = ToolContext(work_dir=work_dir, logger=logger, **kwargs) - - # Execute tool - result = self.registry.execute(tool_name, arguments, context) - - # Log execution - self._execution_log.append( - { - "tool": tool_name, - "arguments": arguments, - # Truncate long results - "result": result[:200] if len(result) > 200 else result, - } - ) - - return result - - def get_execution_log(self) -> List[Dict[str, Any]]: - """Get execution log""" - return self._execution_log - - def clear_log(self): - """Clear execution log""" - self._execution_log.clear() - - def get_tool_names(self) -> List[str]: - """Get all tool names""" - return self.registry.list_tools() - - def __repr__(self) -> str: - return f"ToolManager({len(self.registry)} tools registered)" diff --git a/workflow/steps/tools/workflow_tools.py b/workflow/steps/tools/workflow_tools.py deleted file mode 100644 index ab17e891..00000000 --- a/workflow/steps/tools/workflow_tools.py +++ /dev/null @@ -1,71 +0,0 @@ -"""Workflow internal API call tools""" - -import httpx -import asyncio -from typing import Dict, Any -from .base import Tool - - -class WorkflowAPITool(Tool): - """Generic tool for calling Workflow internal APIs""" - - def get_name(self) -> str: - return "call_workflow_api" - - def get_description(self) -> str: - 
return """Call internal workflow API endpoints. - Available endpoints: - - /verilator/verilog: Generate Verilog - - /verilator/build: Build verilator (params: jobs) - - /verilator/sim: Run simulation (params: binary, batch) - - /workload/build: Build workload (params: args) - - /sardine/run: Run sardine tests (params: workload)""" - - def get_parameters(self) -> Dict[str, Any]: - return { - "type": "object", - "properties": { - "endpoint": { - "type": "string", - "description": "API endpoint path (e.g., '/verilator/build')", - }, - "params": { - "type": "object", - "description": "Request parameters as JSON object", - "additionalProperties": True, - }, - }, - "required": ["endpoint"], - } - - def execute(self, arguments: Dict[str, Any], context: Any) -> str: - endpoint = arguments.get("endpoint") - params = arguments.get("params", {}) - - # Get workflow API address - import os - - workflow_host = os.getenv("WORKFLOW_HOST", "localhost") - workflow_port = os.getenv("WORKFLOW_PORT", "3001") - base_url = f"http://{workflow_host}:{workflow_port}" - url = f"{base_url}{endpoint}" - - try: - context.log_info(f"Calling workflow API: {url}") - context.log_info(f"Parameters: {params}") - - # Synchronous call (using httpx sync client) - response = httpx.post(url, json=params, timeout=300.0) - - if response.status_code == 200: - return str(response.json()) - else: - return str( - { - "error": f"API call failed with status {response.status_code}", - "response": response.text[:500], - } - ) - - except Exception as e: - return str({"error": f"Workflow API call failed: {str(e)}"}) diff --git a/workflow/steps/uvm/01_builddut_api_step.py b/workflow/steps/uvm/01_builddut_api_step.py deleted file mode 100644 index 596f78bd..00000000 --- a/workflow/steps/uvm/01_builddut_api_step.py +++ /dev/null @@ -1,27 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "UVM Build DUT", - "description": "build dut", - "path": 
"/uvm/builddut", - "method": "POST", - "emits": ["uvm.builddut"], - "flows": ["uvm"], -} - - -async def handler(req, context): - body = req.get("body") or {} - data = {"jobs": body.get("jobs", 16)} - await context.emit({"topic": "uvm.builddut", "data": data}) - - # ================================================================================== - # Wait for build result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/uvm/01_builddut_event_step.py b/workflow/steps/uvm/01_builddut_event_step.py deleted file mode 100644 index 59678fd4..00000000 --- a/workflow/steps/uvm/01_builddut_event_step.py +++ /dev/null @@ -1,61 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "UVM Build DUT", - "description": "build dut", - "subscribes": ["uvm.builddut"], - "emits": [], - "flows": ["uvm"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - build_dir = f"{bbdir}/bb-tests/uvbb/dut/build" - dut_dir = f"{bbdir}/bb-tests/uvbb/dut" - arch_dir = f"{bbdir}/arch" - # ================================================================================== - # Execute operation - # ================================================================================== - command = f"cd {arch_dir} && mill -i __.uvbb.runMain uvbb.Elaborate " - command += "--disable-annotation-unknown -strip-debug-info -O=debug " - command += f"--split-verilog -o={build_dir}" - result = stream_run_logger( - cmd=command, - 
logger=context.logger, - cwd=bbdir, - stdout_prefix="uvm build dut", - stderr_prefix="uvm build dut", - ) - - # Remove unwanted file - topname_file = f"{arch_dir}/BallTop.sv" - if os.path.exists(topname_file): - os.remove(topname_file) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # Continue routing - # Routing to verilog or finish workflow - # For run workflow, continue to verilog; for standalone clean, complete - # ================================================================================== - - return diff --git a/workflow/steps/uvm/03_build_api_step.py b/workflow/steps/uvm/03_build_api_step.py deleted file mode 100644 index e5fb39f7..00000000 --- a/workflow/steps/uvm/03_build_api_step.py +++ /dev/null @@ -1,27 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "UVM Build", - "description": "build uvm executable", - "path": "/uvm/build", - "method": "POST", - "emits": ["uvm.build"], - "flows": ["uvm"], -} - - -async def handler(req, context): - body = req.get("body") or {} - data = {"jobs": body.get("jobs", 16)} - await context.emit({"topic": "uvm.build", "data": data}) - - # ================================================================================== - # Wait for build result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/uvm/03_build_event_step.py b/workflow/steps/uvm/03_build_event_step.py deleted file mode 100644 index 51311692..00000000 --- 
a/workflow/steps/uvm/03_build_event_step.py +++ /dev/null @@ -1,109 +0,0 @@ -import os -import subprocess -import glob -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "make build", - "description": "build verilator executable", - "subscribes": ["uvm.build"], - "emits": [], - "flows": ["uvm"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - arch_dir = f"{bbdir}/arch" - dut_dir = f"{bbdir}/bb-tests/uvbb/dut" - build_dir = f"{bbdir}/bb-tests/uvbb/dut/build" - waveform_dir = f"{arch_dir}/waveform" - log_dir = f"{arch_dir}/log" - - # ================================================================================== - # Execute operation - # ================================================================================== - # Find sources - vsrcs = glob.glob(f"{build_dir}/**/*.v", recursive=True) + glob.glob( - f"{build_dir}/**/*.sv", recursive=True - ) - csrcs = ( - glob.glob(f"{dut_dir}/src/main/csrc/**/*.c", recursive=True) - + glob.glob(f"{dut_dir}/src/main/csrc/**/*.cc", recursive=True) - + glob.glob(f"{dut_dir}/src/main/csrc/**/*.cpp", recursive=True) - + glob.glob(f"{build_dir}/**/*.c", recursive=True) - + glob.glob(f"{build_dir}/**/*.cc", recursive=True) - + glob.glob(f"{build_dir}/**/*.cpp", recursive=True) - ) - - # Setup paths - inc_paths = [ - os.environ.get("RISCV", "") + "/include" if os.environ.get("RISCV") else "", - f"{arch_dir}/thirdparty/chipyard/tools/DRAMSim2", - build_dir, - f"{dut_dir}/src/main/csrc/include", - ] - inc_flags = " ".join([f"-I{p}" for p in inc_paths if p]) - - topname = "BallTop" - - cflags = f"{inc_flags} -DTOP_NAME='\"V{topname}\"' -std=c++17 " - ldflags = ( - 
f"-lreadline -ldramsim -lfesvr " - f"-L{arch_dir}/thirdparty/chipyard/tools/DRAMSim2 " - f"-L{arch_dir}/thirdparty/chipyard/toolchains/riscv-tools/riscv-isa-sim/build " - f"-L{arch_dir}/thirdparty/chipyard/toolchains/riscv-tools/riscv-isa-sim/build/lib" - ) - - obj_dir = f"{build_dir}/obj_dir" - subprocess.run(f"rm -rf {obj_dir}", shell=True) - os.makedirs(obj_dir, exist_ok=True) - - sources = " ".join(vsrcs + csrcs) - jobs = data.get("jobs", "") - - verilator_cmd = ( - f"verilator -MMD --build -cc --trace -O3 --x-assign fast --x-initial fast --noassert -Wno-fatal " - f"--timing -j {jobs} +incdir+{build_dir} --top {topname} {sources} " - f"-CFLAGS '{cflags}' -LDFLAGS '{ldflags}' --Mdir {obj_dir} --exe" - ) - - result = stream_run_logger( - cmd=verilator_cmd, - logger=context.logger, - cwd=bbdir, - stdout_prefix="uvm build", - stderr_prefix="uvm build", - ) - result = stream_run_logger( - cmd=f"make -C {obj_dir} -f V{topname}.mk {obj_dir}/V{topname}", - logger=context.logger, - cwd=bbdir, - stdout_prefix="verilator build", - stderr_prefix="verilator build", - ) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, result.returncode, continue_run=data.get("from_run_workflow", False) - ) - - # ================================================================================== - # Continue routing - # Routing to verilog or finish workflow - # For run workflow, continue to verilog; for standalone clean, complete - # ================================================================================== - - return diff --git a/workflow/steps/uvm/README.md b/workflow/steps/uvm/README.md deleted file mode 100644 index 79ec5c50..00000000 --- a/workflow/steps/uvm/README.md +++ /dev/null @@ -1,61 +0,0 @@ -# UVM Workflow - -UVM (Universal Verification Methodology) workflow 
in the Buckyball framework for building and running UVM verification environments. - -## API Usage - -### `builddut` -**Endpoint**: `POST /uvm/builddut` - -**Function**: Build DUT (Design Under Test) - -**Parameters**: -- **`jobs`** - Number of parallel build tasks, default 16 - -**Example**: -```bash -# Build DUT with default parallelism -bbdev uvm --builddut - -# Specify number of parallel tasks -bbdev uvm --builddut "--jobs 8" -``` - -### `build` -**Endpoint**: `POST /uvm/build` - -**Function**: Build UVM executable - -**Parameters**: -- **`jobs`** - Number of parallel build tasks, default 16 - -**Example**: -```bash -# Build UVM with default parallelism -bbdev uvm --build - -# Specify number of parallel tasks -bbdev uvm --build "--jobs 8" -``` - -## Typical Workflow - -```bash -# 1. Build DUT -bbdev uvm --builddut - -# 2. Build UVM environment -bbdev uvm --build -``` - -**Response Format**: -```json -{ - "status": 200, - "body": { - "success": true, - "processing": false, - "return_code": 0 - } -} -``` diff --git a/workflow/steps/verilator/01_clean_api_step.py b/workflow/steps/verilator/01_clean_api_step.py deleted file mode 100644 index b447792c..00000000 --- a/workflow/steps/verilator/01_clean_api_step.py +++ /dev/null @@ -1,26 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Verilator Clean", - "description": "clean build directory", - "path": "/verilator/clean", - "method": "POST", - "emits": ["verilator.clean"], - "flows": ["verilator"], -} - - -async def handler(req, context): - body = req.get("body") or {} - await context.emit({"topic": "verilator.clean", "data": {**body, "task": "clean"}}) - - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - 
await asyncio.sleep(1) diff --git a/workflow/steps/verilator/01_clean_event_step.py b/workflow/steps/verilator/01_clean_event_step.py deleted file mode 100644 index 626eba3e..00000000 --- a/workflow/steps/verilator/01_clean_event_step.py +++ /dev/null @@ -1,57 +0,0 @@ -import subprocess -import os -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "make clean", - "description": "clean build directory", - "subscribes": ["verilator.run", "verilator.clean"], - "emits": ["verilator.verilog"], - "flows": ["verilator"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - build_dir = f"{bbdir}/arch/build" - # ================================================================================== - # Execute operation - # ================================================================================== - command = f"rm -rf {build_dir}" - result = stream_run_logger( - cmd=command, - logger=context.logger, - cwd=bbdir, - stdout_prefix="verilator clean", - stderr_prefix="verilator clean", - ) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, - result.returncode, - continue_run=data.get("from_run_workflow", False), - extra_fields={"task": "clean"}, - ) - - # ================================================================================== - # Continue routing - # ================================================================================== - if data.get("from_run_workflow"): - await context.emit( - {"topic": 
"verilator.verilog", "data": {**data, "task": "run"}} - ) - - return diff --git a/workflow/steps/verilator/02_verilog_api_step.py b/workflow/steps/verilator/02_verilog_api_step.py deleted file mode 100644 index 5f900423..00000000 --- a/workflow/steps/verilator/02_verilog_api_step.py +++ /dev/null @@ -1,46 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - - -from utils.path import get_buckyball_path - - -config = { - "type": "api", - "name": "Verilator Verilog", - "description": "generate verilog code", - "path": "/verilator/verilog", - "method": "POST", - "emits": ["verilator.verilog"], - "flows": ["verilator"], -} - - -async def handler(req, context): - bbdir = get_buckyball_path() - body = req.get("body") or {} - - # Get config name, must be provided - config_name = body.get("config") - if not config_name or config_name == "None": - return { - "status": "error", - "message": "Configuration name is required. Please specify --config parameter.", - "example": 'bbdev verilator --verilog "--config sims.verilator.BuckyballToyVerilatorConfig"', - } - - data = { - "config": config_name, - "balltype": body.get("balltype"), - "output_dir": body.get("output_dir", f"{bbdir}/arch/build/"), - } - await context.emit({"topic": "verilator.verilog", "data": data}) - - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/verilator/02_verilog_event_step.py b/workflow/steps/verilator/02_verilog_event_step.py deleted file mode 100644 index fd84c8ec..00000000 --- a/workflow/steps/verilator/02_verilog_event_step.py +++ /dev/null @@ -1,93 +0,0 @@ -import os -import subprocess -import sys - -# Add the utils directory to the Python path -utils_path = 
os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "make verilog", - "description": "generate verilog code", - "subscribes": ["verilator.verilog"], - "emits": ["verilator.build"], - "flows": ["verilator"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - build_dir = data.get("output_dir", f"{bbdir}/arch/build/") - arch_dir = f"{bbdir}/arch" - - # Get config name, must be provided - config_name = data.get("config") - if not config_name or config_name == "None": - context.logger.error("Configuration name is required but not provided") - success_result, failure_result = await check_result( - context, - 1, - continue_run=False, - extra_fields={ - "task": "validation", - "error": "Configuration name is required. 
Please specify --config parameter.", - "example": 'bbdev verilator --verilog "--config sims.verilator.BuckyballToyVerilatorConfig"', - }, - ) - return failure_result - - context.logger.info(f"Using configuration: {config_name}") - - # ================================================================================== - # Execute operation - # ================================================================================== - if data.get("balltype"): - command = ( - f"mill -i __.test.runMain sims.verify.BallTopMain {data.get('balltype')} " - ) - else: - command = f"mill -i __.test.runMain sims.verilator.Elaborate {config_name} " - - command += "--disable-annotation-unknown -strip-debug-info -O=debug " - command += f"--split-verilog -o={build_dir}" - - result = stream_run_logger( - cmd=command, - logger=context.logger, - cwd=arch_dir, - stdout_prefix="verilator verilog", - stderr_prefix="verilator verilog", - ) - - # Remove unwanted file - topname_file = f"{arch_dir}/TestHarness.sv" - if os.path.exists(topname_file): - os.remove(topname_file) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, - result.returncode, - continue_run=data.get("from_run_workflow", False), - extra_fields={"task": "verilog"}, - ) - - # ================================================================================== - # Continue routing - # Routing to verilog or finish workflow - # For run workflow, continue to verilog; for standalone clean, complete - # ================================================================================== - if data.get("from_run_workflow"): - await context.emit( - {"topic": "verilator.build", "data": {**data, "task": "run"}} - ) - - return diff --git a/workflow/steps/verilator/03_build_api_step.py b/workflow/steps/verilator/03_build_api_step.py 
deleted file mode 100644 index 659a6d25..00000000 --- a/workflow/steps/verilator/03_build_api_step.py +++ /dev/null @@ -1,27 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Verilator Build", - "description": "build verilator executable", - "path": "/verilator/build", - "method": "POST", - "emits": ["verilator.build"], - "flows": ["verilator"], -} - - -async def handler(req, context): - body = req.get("body") or {} - data = {"jobs": body.get("jobs", 16)} - await context.emit({"topic": "verilator.build", "data": data}) - - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/verilator/03_build_event_step.py b/workflow/steps/verilator/03_build_event_step.py deleted file mode 100644 index efe50dd4..00000000 --- a/workflow/steps/verilator/03_build_event_step.py +++ /dev/null @@ -1,121 +0,0 @@ -import os -import subprocess -import glob -import sys - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "make build", - "description": "build verilator executable", - "subscribes": ["verilator.build"], - "emits": ["verilator.sim"], - "flows": ["verilator"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - arch_dir = f"{bbdir}/arch" - build_dir = f"{arch_dir}/build" - waveform_dir = f"{arch_dir}/waveform" - log_dir = f"{arch_dir}/log" - - # 
================================================================================== - # Execute operation - # ================================================================================== - # Find sources - vsrcs = glob.glob(f"{build_dir}/**/*.v", recursive=True) + glob.glob( - f"{build_dir}/**/*.sv", recursive=True - ) - csrcs = ( - glob.glob(f"{arch_dir}/src/csrc/**/*.c", recursive=True) - + glob.glob(f"{arch_dir}/src/csrc/**/*.cc", recursive=True) - + glob.glob(f"{arch_dir}/src/csrc/**/*.cpp", recursive=True) - + glob.glob(f"{build_dir}/**/*.c", recursive=True) - + glob.glob(f"{build_dir}/**/*.cc", recursive=True) - + glob.glob(f"{build_dir}/**/*.cpp", recursive=True) - ) - - # Setup paths - inc_paths = [ - os.environ.get("RISCV", "") + "/include" if os.environ.get("RISCV") else "", - f"{arch_dir}/thirdparty/chipyard/tools/DRAMSim2", - build_dir, - f"{arch_dir}/src/csrc/include", - ] - inc_flags = " ".join([f"-I{p}" for p in inc_paths if p]) - - topname = "TestHarness" - - cflags = f"{inc_flags} -DTOP_NAME='\"V{topname}\"' -std=c++17 " - ldflags = ( - f"-lreadline -ldramsim -lfesvr " - f"-L{arch_dir}/thirdparty/chipyard/tools/DRAMSim2 " - f"-L{arch_dir}/thirdparty/chipyard/toolchains/riscv-tools/riscv-isa-sim/build " - f"-L{arch_dir}/thirdparty/chipyard/toolchains/riscv-tools/riscv-isa-sim/build/lib" - ) - - obj_dir = f"{build_dir}/obj_dir" - subprocess.run(f"rm -rf {obj_dir}", shell=True) - os.makedirs(obj_dir, exist_ok=True) - - sources = " ".join(vsrcs + csrcs) - jobs = data.get("jobs", "") - - verilator_cmd = ( - f"verilator -MMD --build -cc --trace -O3 --x-assign fast --x-initial fast --noassert -Wno-fatal " - f"--trace-fst --trace-threads 1 --output-split 10000 --output-split-cfuncs 100 " - f"--unroll-count 256 " - f"-Wno-PINCONNECTEMPTY " - f"-Wno-ASSIGNDLY " - f"-Wno-DECLFILENAME " - f"-Wno-UNUSED " - f"-Wno-UNOPTFLAT " - f"-Wno-BLKANDNBLK " - f"-Wno-style " - f"-Wall " - f"--timing -j {jobs} +incdir+{build_dir} --top {topname} {sources} " - 
f"-CFLAGS '{cflags}' -LDFLAGS '{ldflags}' --Mdir {obj_dir} --exe" - ) - - result = stream_run_logger( - cmd=verilator_cmd, - logger=context.logger, - cwd=bbdir, - stdout_prefix="verilator build", - stderr_prefix="verilator build", - ) - result = stream_run_logger( - cmd=f"make -C {obj_dir} -f V{topname}.mk {obj_dir}/V{topname}", - logger=context.logger, - cwd=bbdir, - stdout_prefix="verilator build", - stderr_prefix="verilator build", - ) - - # ================================================================================== - # Return result to API - # ================================================================================== - success_result, failure_result = await check_result( - context, - result.returncode, - continue_run=data.get("from_run_workflow", False), - extra_fields={"task": "build"}, - ) - - # ================================================================================== - # Continue routing - # ================================================================================== - if data.get("from_run_workflow"): - await context.emit({"topic": "verilator.sim", "data": {**data, "task": "run"}}) - - return diff --git a/workflow/steps/verilator/04_sim_api_step.py b/workflow/steps/verilator/04_sim_api_step.py deleted file mode 100644 index dbfd4d00..00000000 --- a/workflow/steps/verilator/04_sim_api_step.py +++ /dev/null @@ -1,40 +0,0 @@ -import os -import sys -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Verilator Sim", - "description": "run verilator simulation", - "path": "/verilator/sim", - "method": "POST", - "emits": ["verilator.sim"], - "flows": ["verilator"], -} - - -async def handler(req, context): - body = req.get("body") or {} - binary = body.get("binary", "") - batch = body.get("batch", False) - if not binary: - return { - "status": 400, - "body": { - "success": False, - "failure": True, - "returncode": 400, - "message": "binary parameter is required", - }, - } - - await 
context.emit({"topic": "verilator.sim", "data": {**body, "task": "sim"}}) - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/verilator/04_sim_event_step.py b/workflow/steps/verilator/04_sim_event_step.py deleted file mode 100644 index 521236ad..00000000 --- a/workflow/steps/verilator/04_sim_event_step.py +++ /dev/null @@ -1,124 +0,0 @@ -import os -import subprocess -import sys -from datetime import datetime - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.search_workload import search_workload -from utils.event_common import check_result - -config = { - "type": "event", - "name": "make sim", - "description": "run simulation", - "subscribes": ["verilator.sim"], - "emits": [], - "flows": ["verilator"], -} - - -async def handler(data, context): - # ================================================================================== - # Get simulation parameters - # ================================================================================== - bbdir = get_buckyball_path() - arch_dir = f"{bbdir}/arch" - build_dir = f"{arch_dir}/build" - - # Generate timestamp - timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M") - - binary_name = data.get("binary", "") - success_result, failure_result = await check_result( - context, returncode=(binary_name == None), continue_run=True - ) - - binary_path = search_workload(f"{bbdir}/bb-tests/output/workloads/src", binary_name) - success_result, failure_result = await check_result( - 
context, returncode=(binary_path == None), continue_run=True - ) - if failure_result: - context.logger.error("binary not found", failure_result) - return - - # Create log and waveform directory - log_dir = f"{arch_dir}/log/{timestamp}-{binary_name}" - waveform_dir = f"{arch_dir}/waveform/{timestamp}-{binary_name}" - topname = "TestHarness" - - os.makedirs(log_dir, exist_ok=True) - os.makedirs(waveform_dir, exist_ok=True) - - bin_path = f"{build_dir}/obj_dir/V{topname}" - batch = data.get("batch", False) - - # Create log and waveform file - log_path = f"{log_dir}/bdb.log" - fst_path = f"{waveform_dir}/waveform.fst" - # Remove old waveform file - subprocess.run(f"rm -f {waveform_dir}/waveform.vcd", shell=True, check=True) - - # ================================================================================== - # Execute simulation script with streaming output - # ================================================================================== - # batch_param = "True" if batch else "False" - # sim_cmd = f"./scripts/sim.sh {bin_path} {binary_path} {log_dir}/stdout.log \ - # {log_dir}/disasm.log {batch_param} {vcd_path} {log_path}" - sim_cmd = ( - f"{bin_path} +permissive +loadmem={binary_path} +loadmem_addr=800000000 " - f"{'+batch ' if batch else ''} " - f"+fst={fst_path} +log={log_path} +permissive-off " - f"{binary_path} > >(tee {log_dir}/stdout.log) 2> >(spike-dasm > {log_dir}/disasm.log)" - ) - script_dir = os.path.dirname(__file__) - - result = stream_run_logger( - cmd=sim_cmd, - logger=context.logger, - cwd=script_dir, - stdout_prefix="verilator sim", - stderr_prefix="verilator sim", - executable="bash", - ) - success_result, failure_result = await check_result( - context, returncode=result.returncode, continue_run=True - ) - if failure_result: - context.logger.error("sim failed", failure_result) - return - - if os.path.exists(f"{waveform_dir}/waveform.fst.heir"): - subprocess.run( - f"gtkwave -f {waveform_dir}/waveform.fst -H 
{waveform_dir}/waveform.fst.heir", - shell=True, - check=True, - ) - - # ================================================================================== - # Return simulation result - # ================================================================================== - # This is the end point of the run workflow, status will no longer be set to processing - success_result, failure_result = await check_result( - context, - result.returncode, - continue_run=False, - extra_fields={ - "task": "sim", - "binary": binary_path, - "log_dir": log_dir, - "waveform_dir": waveform_dir, - "timestamp": timestamp, - }, - ) - - # ================================================================================== - # Finish workflow - # ================================================================================== - - return diff --git a/workflow/steps/verilator/05_run_api_step.py b/workflow/steps/verilator/05_run_api_step.py deleted file mode 100644 index 77bf175d..00000000 --- a/workflow/steps/verilator/05_run_api_step.py +++ /dev/null @@ -1,52 +0,0 @@ -import asyncio -from utils.event_common import wait_for_result - -config = { - "type": "api", - "name": "Verilator Complete Workflow", - "description": "trigger complete verilator workflow", - "path": "/verilator/run", - "method": "POST", - "emits": ["verilator.run"], - "flows": ["verilator"], -} - - -async def handler(req, context): - body = req.get("body") or {} - - config = { - "binary": body.get("binary", ""), - "config": body.get("config", "sims.verilator.BuckyballToyVerilatorConfig"), - "jobs": body.get("jobs", "16"), - "batch": body.get("batch", False), - "from_run_workflow": True, - } - - await context.emit({"topic": "verilator.run", "data": config}) - - # ================================================================================== - # Wait for simulation result - # - # Expected return result format: - # { - # "status": 200/400/500, - # "body": { - # "success": true/false, - # "failure": true/false, - # 
"processing": true/false, - # "return_code": 0, - # other fields - # } - # } - # - # Since the Motia framework wraps data in the data field, it needs to be unpacked - # if isinstance(result, dict) and 'data' in result: - # return result['data'] - # return result - # ================================================================================== - while True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/verilator/README.md b/workflow/steps/verilator/README.md deleted file mode 100644 index 25635f69..00000000 --- a/workflow/steps/verilator/README.md +++ /dev/null @@ -1,197 +0,0 @@ -# Verilator Simulation Workflow - -Hardware simulation workflow based on Verilator in the Buckyball framework, providing a complete automation flow from RTL generation to simulation execution. Verilator is a high-performance Verilog simulator that supports fast functional verification and performance analysis. - -## II. Original API Usage Guide - -#### `run` -**Endpoint**: `POST /verilator/run` - -**Function**: Execute complete workflow. 
Clean build directory, generate Verilog, compile Verilator into simulation file, and run simulation directly - -**Parameters**: - -- **`jobs`** - Number of parallel compilation tasks - - Default value: `16` -- **`binary`** [Required] - Test binary file path - - Default value: `""` - -**Example**: -```bash -# bbdev wrapper -bbdev verilator --run "jobs 256 --binary ${buckyball}/bb-tests/workloads/build/src/CTest/ctest_mvin_mvout_alternate_test_singlecore-baremetal --batch" - -# Raw command -curl -X POST http://localhost:5000/verilator/run -H "Content-Type: application/json" -d '{"jobs": 8, "binary": "/home/user/test.elf"}' -``` - - -#### `clean` - -**Endpoint**: `POST /verilator/clean` - -**Function**: Clean build folder - -**Parameters**: None - -**Example**: -```bash -curl -X POST http://localhost:5000/verilator/clean -``` - -#### `verilog` - -**Endpoint**: `POST /verilator/verilog` - -**Function**: Only generate Verilog code, without compilation and simulation - -**Parameters**: None - -**Example**: -```bash -curl -X POST http://localhost:5000/verilator/verilog -d '{"jobs": 8}' -``` - -#### `build` - -**Endpoint**: `POST /verilator/build` - -**Function**: Compile verilog source files and cpp source files into executable simulation file - -**Parameters**: - -- **`jobs`** - Number of parallel compilation tasks - - Default value: `16` - -**Example**: -```bash -curl -X POST http://localhost:5000/verilator/build -d '{"jobs": 16}' -``` - -#### `sim` - -**Endpoint**: `POST /verilator/sim` - -**Function**: Run existing simulation executable - -**Parameters**: - -- **`binary`** [Required] - Custom test binary file path - -**Example**: -```bash -curl -X POST http://localhost:5000/verilator/sim \ - -H "Content-Type: application/json" \ - -d '{"binary": "/home/user/test_program.elf"}' -``` - - - - -## II. 
Developer Documentation - -### Directory Structure - -``` -steps/verilator/ -├── 00_start_node_noop_step.py # Workflow entry node definition -├── 00_start_node_noop_step.tsx # Frontend UI component -├── 01_run_api_step.py # Complete workflow API entry -├── 01_clean_api_step.py # Clean API endpoint -├── 01_verilog_api_step.py # Verilog generation API endpoint -├── 01_build_api_step.py # Build API endpoint -├── 01_sim_api_step.py # Simulation API endpoint -├── 02_clean_event_step.py # Clean build directory -├── 03_verilog_event_step.py # Verilog code generation -├── 04_build_event_step.py # Verilator compilation -├── 05_sim_event_step.py # Simulation execution -├── 99_complete_event_step.py # Completion handling -├── 99_error_event_step.py # Error handling -└── README.md # This document -``` - -### Workflow Steps Detailed - -#### 1. Entry Node (`00_start_node_noop_step.py`) -- **Type**: `noop` node -- **Function**: Provide UI interface entry point -- **Frontend**: "Start Build Verilator" button - -#### 2. API Endpoints -- **Complete Workflow API** (`01_run_api_step.py`): `/verilator` → `verilator.run` -- **Clean API** (`01_clean_api_step.py`): `/verilator/clean` → `verilator.clean` -- **Verilog Generation API** (`01_verilog_api_step.py`): `/verilator/verilog` → `verilator.verilog` -- **Build API** (`01_build_api_step.py`): `/verilator/build` → `verilator.build` -- **Simulation API** (`01_sim_api_step.py`): `/verilator/sim` → `verilator.sim` - -#### 3. Clean Step (`02_clean_event_step.py`) -- **Type**: `event` step -- **Subscribes**: `verilator.run`, `verilator.clean` -- **Emits**: `verilator.verilog`, `verilator.complete` -- **Function**: Delete build directory, serves workflow or standalone operation - -#### 4. Verilog Generation (`03_verilog_event_step.py`) -- **Type**: `event` step -- **Subscribes**: `verilator.verilog` -- **Emits**: `verilator.build`, `verilator.complete` -- **Function**: Use mill to generate Verilog code to build directory - -#### 5. 
Verilator Compilation (`04_build_event_step.py`) -- **Type**: `event` step -- **Subscribes**: `verilator.build` -- **Emits**: `verilator.sim`, `verilator.complete` -- **Function**: Compile Verilog and C++ source files into executable simulation file - -#### 6. Simulation Execution (`05_sim_event_step.py`) -- **Type**: `event` step -- **Subscribes**: `verilator.sim` -- **Emits**: `verilator.complete` -- **Function**: Run simulation, supports custom binary parameter - -#### 7. Completion Handling (`99_complete_event_step.py`) -- **Type**: `event` step -- **Subscribes**: `verilator.complete` -- **Function**: Print success message, mark workflow as complete - -#### 8. Error Handling (`99_error_event_step.py`) -- **Type**: `event` step -- **Subscribes**: `verilator.error` -- **Function**: Print error message, handle workflow exceptions - -### Workflow Diagram - -```mermaid -graph TD; - API[POST /verilator
<br/>Complete Workflow] --> RUN[verilator.run] - - CLEAN_DIRECT[verilator.clean<br/>Single-step Clean] --> CLEAN_STEP[02_clean_event_step] - VERILOG_DIRECT[verilator.verilog<br/>Single-step Generate] --> VERILOG_STEP[03_verilog_event_step] - BUILD_DIRECT[verilator.build<br/>Single-step Build] --> BUILD_STEP[04_build_event_step] - SIM_DIRECT[verilator.sim<br/>
Single-step Simulation] --> SIM_STEP[05_sim_event_step] - - RUN --> CLEAN_STEP - CLEAN_STEP --> |Workflow Mode| VERILOG_STEP - CLEAN_STEP --> |Single-step Mode| COMPLETE[verilator.complete] - - VERILOG_STEP --> |Workflow Mode| BUILD_STEP - VERILOG_STEP --> |Single-step Mode| COMPLETE - - BUILD_STEP --> |Workflow Mode| SIM_STEP - BUILD_STEP --> |Single-step Mode| COMPLETE - - SIM_STEP --> COMPLETE - - COMPLETE --> COMPLETE_STEP[99_complete_event_step] - - CLEAN_STEP -.-> |Error| ERROR[verilator.error] - VERILOG_STEP -.-> |Error| ERROR - BUILD_STEP -.-> |Error| ERROR - SIM_STEP -.-> |Error| ERROR - - ERROR --> ERROR_STEP[99_error_event_step] - - classDef apiNode fill:#e1f5fe - classDef eventNode fill:#f3e5f5 - classDef stepNode fill:#e8f5e8 - classDef endNode fill:#fff3e0 -``` diff --git a/workflow/steps/workload/01_buidl_api_step.py b/workflow/steps/workload/01_buidl_api_step.py deleted file mode 100644 index daade0a0..00000000 --- a/workflow/steps/workload/01_buidl_api_step.py +++ /dev/null @@ -1,38 +0,0 @@ -import subprocess -import sys -import os -import asyncio -from utils.event_common import wait_for_result - -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - -from utils.path import get_buckyball_path - -config = { - "type": "api", - "name": "build workload", - "description": "build workload", - "path": "/workload/build", - "method": "POST", - "emits": ["workload.build"], - "flows": ["workload"], -} - - -async def handler(req, context): - bbdir = get_buckyball_path() - body = req.get("body") or {} - data = {"workload": body.get("workload", "")} - workload_dir = f"{bbdir}/bb-tests/workload" - await context.emit({"topic": "workload.build", "data": data}) - - # ================================================================================== - # Wait for simulation result - # ================================================================================== - while 
True: - result = await wait_for_result(context) - if result is not None: - return result - await asyncio.sleep(1) diff --git a/workflow/steps/workload/01_build_event_step.py b/workflow/steps/workload/01_build_event_step.py deleted file mode 100644 index 068013f1..00000000 --- a/workflow/steps/workload/01_build_event_step.py +++ /dev/null @@ -1,61 +0,0 @@ -from contextlib import redirect_stdout -import os -from re import T -import subprocess -import sys -import time - -# Add the utils directory to the Python path -utils_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) -if utils_path not in sys.path: - sys.path.insert(0, utils_path) - - -from utils.path import get_buckyball_path -from utils.stream_run import stream_run_logger -from utils.event_common import check_result - -config = { - "type": "event", - "name": "build workload", - "description": "build workload", - "subscribes": ["workload.build"], - "emits": [], - "flows": ["workload"], -} - - -async def handler(data, context): - bbdir = get_buckyball_path() - workload_dir = f"{bbdir}/bb-tests" - build_dir = f"{workload_dir}/build" - - # os.mkdir(f"{workload_dir}/build", exist_ok=True) - subprocess.run(f"rm -rf {build_dir} && mkdir -p {build_dir}", shell=True) - - # command = f"source {bbdir}/env.sh && cd {workload_dir}/build && cmake .. && make build-all" - command = f"source {bbdir}/env.sh && cd {build_dir} && cmake -G Ninja .. 
&& ninja -j{os.cpu_count()}" - context.logger.info( - "Executing workload command", {"command": command, "cwd": build_dir} - ) - result = stream_run_logger( - cmd=command, - logger=context.logger, - cwd=workload_dir, - executable="bash", - stdout_prefix="workload build", - stderr_prefix="workload build", - ) - - # ================================================================================== - # Return simulation result - # ================================================================================== - # This is the end of run workflow, status no longer set to processing - success_result, failure_result = await check_result( - context, result.returncode, continue_run=False - ) - - # ================================================================================== - # finish workflow - # ================================================================================== - return diff --git a/workflow/steps/workload/README.md b/workflow/steps/workload/README.md deleted file mode 100644 index a6bb3941..00000000 --- a/workflow/steps/workload/README.md +++ /dev/null @@ -1,39 +0,0 @@ -# Workload Workflow - -Workload build workflow in Buckyball framework, used to build test workloads and benchmark programs. 
- -## API Usage - -### `build` -**Endpoint**: `POST /workload/build` - -**Function**: Build workload - -**Parameters**: -- **`workload`** - Specify workload name to build - -**Examples**: -```bash -# Build specific workload -bbdev workload --build "--workload test_program" - -# Build all workloads -bbdev workload --build -``` - -**Response**: -```json -{ - "status": 200, - "body": { - "success": true, - "processing": false, - "return_code": 0 - } -} -``` - -## Notes - -- Workload source code located in `bb-tests/workload` directory -- Build results typically output to `bb-tests/workloads/build` directory diff --git a/workflow/utils/__init__.py b/workflow/utils/__init__.py deleted file mode 100644 index 6db5263f..00000000 --- a/workflow/utils/__init__.py +++ /dev/null @@ -1,10 +0,0 @@ -from .stream_run import stream_run -from .path import get_buckyball_path -from .port import find_available_port -from .search_workload import ( - search_workload, - search_workload_all, - search_workload_pattern, -) - -__all__ = ["stream_run", "get_buckyball_path", "find_available_port", "search_workload"] diff --git a/workflow/utils/event_common.py b/workflow/utils/event_common.py deleted file mode 100644 index 71796922..00000000 --- a/workflow/utils/event_common.py +++ /dev/null @@ -1,111 +0,0 @@ -""" -Common utility functions for all event steps. -""" - - -async def check_result(context, returncode, continue_run=False, extra_fields=None): - """ - Check returncode, create appropriate result objects and set state. 
- - Args: - context: The event context object - returncode: The return code (int) - continue_run: If True, set processing state instead of success/failure - extra_fields: Optional dictionary of extra fields to include in result body - - Returns: - tuple: (success_result, failure_result) - one will be None based on returncode and continue_run - """ - extra_fields = extra_fields or {} - - if continue_run: - await context.state.set(context.trace_id, "processing", True) - return None, None - elif returncode != 0: - failure_result = { - "status": 500, - "body": { - "success": False, - "failure": True, - "processing": False, - "returncode": returncode, - **extra_fields, - }, - } - await context.state.set(context.trace_id, "failure", failure_result) - return None, failure_result - else: - success_result = { - "status": 200, - "body": { - "success": True, - "failure": False, - "processing": False, - "returncode": returncode, - **extra_fields, - }, - } - await context.state.set(context.trace_id, "success", success_result) - return success_result, None - - -# ================================================================================== -# API waits for event return result -# -# Expected return result format: -# { -# "status": 200/400/500, -# "body": { -# "success": true/false, -# "failure": true/false, -# "processing": true/false, -# "return_code": 0, -# other fields -# } -# } -# -# Since the Motia framework wraps data in the data field, it needs to be unpacked -# if isinstance(result, dict) and 'data' in result: -# return result['data'] -# return result -# ================================================================================== - - -async def wait_for_result(context): - """ - Check for task completion state (success or failure). - Returns result if found, None if still processing. 
- - Args: - context: The event context object - - Returns: - dict or None: The result data if task completed, None if still processing - """ - # Check for success result - success_result = await context.state.get(context.trace_id, "success") - if success_result and success_result.get("data"): - # Filter out invalid null state - if success_result == {"data": None} or ( - isinstance(success_result, dict) - and success_result.get("data") is None - and len(success_result) == 1 - ): - await context.state.delete(context.trace_id, "success") - return None - context.logger.info("task completed") - - if isinstance(success_result, dict) and "data" in success_result: - return success_result["data"] - return success_result - - # Check for error status - failure_result = await context.state.get(context.trace_id, "failure") - if failure_result and failure_result.get("data"): - context.logger.error("task failed", failure_result) - - if isinstance(failure_result, dict) and "data" in failure_result: - return failure_result["data"] - return failure_result - - return None diff --git a/workflow/utils/path.py b/workflow/utils/path.py deleted file mode 100644 index 69320c09..00000000 --- a/workflow/utils/path.py +++ /dev/null @@ -1,6 +0,0 @@ -import os - - -def get_buckyball_path(): - current_dir = os.path.dirname(__file__) - return os.path.dirname(os.path.dirname(current_dir)) diff --git a/workflow/utils/port.py b/workflow/utils/port.py deleted file mode 100644 index 82e6bfed..00000000 --- a/workflow/utils/port.py +++ /dev/null @@ -1,16 +0,0 @@ -import socket - - -def find_available_port(start_port: int = 5000, end_port: int = 5500) -> int: - """Find an available port in the specified range""" - for port in range(start_port, end_port + 1): - try: - with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: - sock.bind(("localhost", port)) - return port - except OSError: - # Port is already in use, try next one - continue - - # If no port is available in the range, raise an 
exception - raise RuntimeError(f"No available port found in range {start_port}-{end_port}") diff --git a/workflow/utils/search_workload.py b/workflow/utils/search_workload.py deleted file mode 100644 index 0d4121f7..00000000 --- a/workflow/utils/search_workload.py +++ /dev/null @@ -1,72 +0,0 @@ -import os -from typing import Optional, List - - -def search_workload(search_dir: str, filename: str) -> Optional[str]: - """ - Recursively search for a specified filename in the directory and its subdirectories - - Args: - search_dir: Root directory to search - filename: Filename to search for - - Returns: - Absolute path of the found file, or None if not found - """ - if not os.path.exists(search_dir): - return None - - for root, dirs, files in os.walk(search_dir): - if filename in files: - return os.path.abspath(os.path.join(root, filename)) - - return None - - -def search_workload_all(search_dir: str, filename: str) -> List[str]: - """ - Recursively search for a specified filename in the directory and its subdirectories, returning all matches - - Args: - search_dir: Root directory to search - filename: Filename to search for - - Returns: - List of absolute paths of all found files - """ - results = [] - - if not os.path.exists(search_dir): - return results - - for root, dirs, files in os.walk(search_dir): - if filename in files: - results.append(os.path.abspath(os.path.join(root, filename))) - - return results - - -def search_workload_pattern(search_dir: str, pattern: str) -> List[str]: - """ - Recursively search for files matching a specified pattern in the directory and its subdirectories - - Args: - search_dir: Root directory to search - pattern: Filename pattern (supports wildcards * and ?) 
- - Returns: - List of absolute paths of all matching files - """ - import fnmatch - - results = [] - - if not os.path.exists(search_dir): - return results - - for root, dirs, files in os.walk(search_dir): - for file in files: - if fnmatch.fnmatch(file, pattern): - results.append(os.path.abspath(os.path.join(root, file))) - - return results diff --git a/workflow/utils/stream_run.py b/workflow/utils/stream_run.py deleted file mode 100644 index 6bd6dcb1..00000000 --- a/workflow/utils/stream_run.py +++ /dev/null @@ -1,172 +0,0 @@ -import subprocess -import threading -from typing import Optional, List, Callable - - -class StreamResult: - """Result object mimicking subprocess.CompletedProcess""" - - def __init__(self, returncode: int, stdout: str, stderr: str): - self.returncode = returncode - self.stdout = stdout - self.stderr = stderr - - -def stream_run( - cmd: str, - cwd: Optional[str] = None, - shell: bool = True, - executable: Optional[str] = None, - timeout: Optional[float] = None, - on_stdout: Optional[Callable[[str], None]] = None, - on_stderr: Optional[Callable[[str], None]] = None, - stdout_prefix: str = "STDOUT", - stderr_prefix: str = "STDERR", -) -> StreamResult: - """ - Execute command and stream output in real-time - - Args: - cmd: Command to execute - cwd: Working directory - shell: Whether to execute using shell - timeout: Timeout in seconds - on_stdout: Callback function for stdout lines - on_stderr: Callback function for stderr lines - stdout_prefix: Prefix for stdout output - stderr_prefix: Prefix for stderr output - - Returns: - StreamResult: Result object containing returncode, stdout, stderr - - Example: - def log_stdout(line): - logger.info(f'[STDOUT] {line}') - - def log_stderr(line): - logger.info(f'[STDERR] {line}') - - result = stream_run( - "make build", - cwd="/path/to/project", - on_stdout=log_stdout, - on_stderr=log_stderr - ) - """ - - def read_stream( - stream, output_list: List[str], callback: Optional[Callable], prefix: str - ): - 
"""Thread function to read stream output""" - try: - for line in iter(stream.readline, ""): - if line: - line = line.rstrip() - output_list.append(line) - if callback: - callback(line) - finally: - stream.close() - - # Start process - process = subprocess.Popen( - cmd, - cwd=cwd, - shell=shell, - stdout=subprocess.PIPE, - stderr=subprocess.PIPE, - text=True, - bufsize=1, - executable=executable, - ) - - stdout_lines = [] - stderr_lines = [] - - # Create threads to read stdout and stderr - stdout_thread = threading.Thread( - target=read_stream, - args=(process.stdout, stdout_lines, on_stdout, stdout_prefix), - ) - stderr_thread = threading.Thread( - target=read_stream, - args=(process.stderr, stderr_lines, on_stderr, stderr_prefix), - ) - - stdout_thread.start() - stderr_thread.start() - - try: - # Wait for process to finish (with timeout) - process.wait(timeout=timeout) - except subprocess.TimeoutExpired: - # Kill process on timeout - process.kill() - process.wait() - - # Wait for threads to finish - stdout_thread.join() - stderr_thread.join() - - return StreamResult( - returncode=process.returncode, - stdout="\n".join(stdout_lines), - stderr="\n".join(stderr_lines), - ) - - -def stream_run_logger( - cmd: str, - logger, - cwd: Optional[str] = None, - shell: bool = True, - executable: Optional[str] = None, - timeout: Optional[float] = None, - stdout_prefix: str = "STDOUT", - stderr_prefix: str = "STDERR", - verbose: bool = False, -) -> StreamResult: - """ - Convenience function for streaming output using logger - - Args: - cmd: Command to execute - logger: Logger instance - cwd: Working directory - shell: Whether to execute using shell - timeout: Timeout in seconds - stdout_prefix: Prefix for stdout output - stderr_prefix: Prefix for stderr output - verbose: Whether to use verbose output mode (verbose mode uses logger with timestamp, non-verbose prints directly) - - Returns: - StreamResult: Result object containing returncode, stdout, stderr - """ - - def 
log_stdout(line): - if verbose: - # Verbose mode: use logger.info, includes timestamp and task ID - logger.info(f"[{stdout_prefix}] {line}") - else: - # Non-verbose mode: print directly, use green for STDOUT - print(f"\033[32m[{stdout_prefix}]\033[0m {line}") - - def log_stderr(line): - if verbose: - # Verbose mode: use logger.info, includes timestamp and task ID - logger.info(f"[{stderr_prefix}] {line}") - else: - # Non-verbose mode: print directly, use red for STDERR - print(f"\033[31m[{stderr_prefix}]\033[0m {line}") - - return stream_run( - cmd=cmd, - cwd=cwd, - shell=shell, - executable=executable, - timeout=timeout, - on_stdout=log_stdout, - on_stderr=log_stderr, - stdout_prefix=stdout_prefix, - stderr_prefix=stderr_prefix, - ) diff --git a/workflow/vscode b/workflow/vscode deleted file mode 160000 index 0393e997..00000000 --- a/workflow/vscode +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 0393e997a706e320faeeae7296273667c30373db
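The removed `stream_run` helper drains the child's output on reader threads while the process is still running, then joins the threads after the process exits or is killed on timeout. The following is a minimal standalone sketch of that pattern, not the repository's code: `run_streaming` and its parameters are hypothetical names chosen for illustration, and unlike the original it merges stderr into stdout to stay short.

```python
import subprocess
import sys
import threading
from typing import Callable, List, Optional, Tuple


def run_streaming(
    cmd: List[str],
    on_line: Optional[Callable[[str], None]] = None,
    timeout: Optional[float] = None,
) -> Tuple[int, List[str]]:
    """Run cmd and forward each output line to on_line as it arrives.

    Hypothetical helper: returns (returncode, collected_lines).
    """
    lines: List[str] = []

    def pump(stream) -> None:
        # Read until EOF so the pipe never fills up and blocks the child.
        with stream:
            for raw in stream:
                line = raw.rstrip()
                lines.append(line)
                if on_line:
                    on_line(line)

    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: merged streams, for brevity
        text=True,
        bufsize=1,  # line-buffered in text mode
    )
    reader = threading.Thread(target=pump, args=(proc.stdout,))
    reader.start()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Same policy as the removed helper: kill on timeout, then reap.
        proc.kill()
        proc.wait()
    reader.join()
    return proc.returncode, lines


if __name__ == "__main__":
    code, out = run_streaming(
        [sys.executable, "-c", "print('hello'); print('world')"],
        on_line=lambda s: print(f"[STDOUT] {s}"),
    )
    print(code, out)
```

Passing the command as a list avoids `shell=True`, which the removed helper enabled by default; callers that need shell features would have to opt in explicitly.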

Project Directory Overview

Root directory

  • arch/ - Architecture design docs and code, written in Scala
  • bb-test/ - Tests
  • compiler/ - subtree: MLIR-based compiler, including LLVM and other submodules
  • docs/ - Documentation
  • scripts/ - Initialization scripts
  • sim/ - submodule: RISC-V simulator (spike)
  • thirdparty/ - submodules: third-party dependencies
    • chipyard/ - SoC design framework
    • circt/ - CIRCT circuit compiler
  • tools/ - submodules: tooling
    • motia/ - Backend service framework
    • vistools/ - Visualization tools
  • workflow/ - CI/CD workflow configuration

compiler/ internal structure

  • llvm/ - submodule: main LLVM project
  • thirdparty/ - submodules: compiler dependencies
    • mimalloc/ - Memory allocator
    • riscv-gnu-toolchain/ - RISC-V toolchain
  • examples/ - Assorted MLIR examples and models
  • frontend/ - Front-end code generation
  • midend/ - Mid-end optimization
  • tools/ - Compiler tools (buddy-opt, etc.)

Note: submodules must be updated separately; subtrees stay in sync with the main repository.
