Commit 104c7b3

repeat (#69)
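* repeat: ompsimd implementation complete, pending verification
* repeat: ompsimd implementation complete, pending verification
* repeat: cuda implementation complete, verified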
1 parent b875ed1 commit 104c7b3

33 files changed

Lines changed: 660 additions & 143 deletions

doc/excuter/op-mem-cuda/list.md

Lines changed: 63 additions & 62 deletions
@@ -2,105 +2,106 @@

This page is generated by `excuter/op-mem-cuda`. Do not edit it manually.

-### arg
+### matmul

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
-| vecset | none | [3 4 5]->shape | vecset(vector<any>:value)->(vector<any>:name) |
-| argset | none | argvalue->argname | argset(var<any>:value)->(var<any>:name) |
+| matmul | cublas | T3=T1 @ T2 | matmul(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |

-### tensorlife
+### init

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
-| renametensor | none | rename(newname)->T1 | renametensor(var<string>:new_name)->(tensor<any>:t) |
-| newtensor | none | T1 = zeros(shape) | newtensor(vector<int32>:shape)->(tensor<any>:tensor1) |
-| newtensor | none | T1 = zeros(shape) | newtensor(var<string>:shape)->(tensor<any>:tensor1) |
-| deltensor | none | del->T1 | deltensor()->(tensor<any>:t) |
-| copytensor | none | T2.data = T1.data | copytensor(tensor<any>:src)->(tensor<any>:dst) |
+| constant | miaobyte | constant(value)->T1 | constant(var<any>:value)->(tensor<any>:t) |
+| arange | miaobyte | arange(start,step)->T1 | arange(var<any>:start, var<any>:step)->(tensor<any>:t) |
+| uniform | miaobyte | uniform(low,high,seed)->T1 | uniform(var<any>:low, var<any>:high, var<int32>:seed)->(tensor<any>:t) |
+| dropout | miaobyte | dropout(p,seed)->A | dropout(var<float32>:p, var<int32>:seed)->(tensor<any>:A) |
+| normal | miaobyte | normal(mean,stddev,seed)->T1 | normal(var<any>:mean, var<any>:stddev, var<int32>:seed)->(tensor<any>:t) |

### io

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
-| loadtensordata | none | loadtensordata(path)->tensor | loadtensordata(var<string>:path)->(tensor<any>:t) |
-| save | none | save(T1,path) | save(tensor<any>:t, var<string>:path)->() |
+| load | none | load(path) | load(var<string>:path)->() |
| print | miaobyte | print(T1) | print(tensor<any>:t)->() |
| print | miaobyte | print(T1) | print(tensor<any>:t, var<string>:format)->() |
-| load | none | load(path) | load(var<string>:path)->() |
+| save | none | save(T1,path) | save(tensor<any>:t, var<string>:path)->() |
+| loadtensordata | none | loadtensordata(path)->tensor | loadtensordata(var<string>:path)->(tensor<any>:t) |

-### matmul
+### arg

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
-| matmul | cublas | T3=T1 @ T2 | matmul(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
+| argset | none | argvalue->argname | argset(var<any>:value)->(var<any>:name) |
+| vecset | none | [3 4 5]->shape | vecset(vector<any>:value)->(vector<any>:name) |

-### init
+### tensorlife

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
-| normal | miaobyte | normal(mean,stddev,seed)->T1 | normal(var<any>:mean, var<any>:stddev, var<int32>:seed)->(tensor<any>:t) |
-| dropout | miaobyte | dropout(p,seed)->A | dropout(var<float32>:p, var<int32>:seed)->(tensor<any>:A) |
-| uniform | miaobyte | uniform(low,high,seed)->T1 | uniform(var<any>:low, var<any>:high, var<int32>:seed)->(tensor<any>:t) |
-| arange | miaobyte | arange(start,step)->T1 | arange(var<any>:start, var<any>:step)->(tensor<any>:t) |
-| constant | miaobyte | constant(value)->T1 | constant(var<any>:value)->(tensor<any>:t) |
+| copytensor | none | T2.data = T1.data | copytensor(tensor<any>:src)->(tensor<any>:dst) |
+| newtensor | none | T1 = zeros(shape) | newtensor(vector<int32>:shape)->(tensor<any>:tensor1) |
+| newtensor | none | T1 = zeros(shape) | newtensor(var<string>:shape)->(tensor<any>:tensor1) |
+| renametensor | none | rename(newname)->T1 | renametensor(var<string>:new_name)->(tensor<any>:t) |
+| deltensor | none | del->T1 | deltensor()->(tensor<any>:t) |

### elementwise

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
-| switch | miaobyte | C=switch(tensors,cases) | switch(listtensor<any>:tensors, tensor<int32|bool>:cases)->(tensor<any>:result) |
-| greaterscalar | miaobyte | mask=compare(T1, scalar) | greaterscalar(tensor<any>:A, var<any>:scalar)->(tensor<bool>:mask) |
-| notequal | miaobyte | T1!=T2->mask | notequal(tensor<any>:A, tensor<any>:B, var<float32>:epsilon)->(tensor<bool>:mask) |
-| equalscalar | miaobyte | T1==scalar->mask | equalscalar(tensor<any>:A, var<any>:scalar, var<float32>:epsilon)->(tensor<bool>:mask) |
-| min | miaobyte | T3=min(T1, T2) | min(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
-| maxscalar | miaobyte | T3=max(T1, scalar) | maxscalar(tensor<any>:A, var<any>:scalar)->(tensor<any>:C) |
-| tan | miaobyte | T3=tan(T1) | tan(tensor<float64|float32>:A)->(tensor<float64|float32>:C) |
-| sin | miaobyte | T3=sin(T1) | sin(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
-| less | miaobyte | mask=compare(T1, T2) | less(tensor<any>:A, tensor<any>:B)->(tensor<bool>:mask) |
-| powscalar | miaobyte | T3=pow(T1, scalar) | powscalar(tensor<float64|float32>:A, var<float64|int32>:scalar)->(tensor<float64|float32>:C) |
-| rsubscalar | miaobyte | T3=scalar-T1 | rsubscalar(var<any>:scalar, tensor<any>:A)->(tensor<any>:C) |
-| divscalar | miaobyte | T3=scalar/T1 | divscalar(tensor<any>:A, var<any>:scalar)->(tensor<any>:C) |
-| log | miaobyte | T3=log(T1) | log(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
-| addscalar | miaobyte | T3=T1+scalar | addscalar(tensor<any>:A, var<any>:b)->(tensor<any>:C) |
+| pow | miaobyte | T3=pow(T1, T2) | pow(tensor<float64|float32>:A, tensor<float64|float32>:B)->(tensor<float64|float32>:C) |
+| max | miaobyte | T3=max(T1, T2) | max(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
+| mulscalar | miaobyte | T3=T1*scalar | mulscalar(tensor<any>:A, var<any>:b)->(tensor<any>:C) |
+| equal | miaobyte | T1==T2->mask | equal(tensor<any>:A, tensor<any>:B, var<float32>:epsilon)->(tensor<bool>:mask) |
+| mul | miaobyte | T3=T1*T2 | mul(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
+| exp | miaobyte | T3=exp(T1) | exp(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
+| subscalar | miaobyte | T3=T1-scalar | subscalar(tensor<any>:A, var<any>:b)->(tensor<any>:C) |
+| sqrt | miaobyte | T3=sqrt(T1) | sqrt(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
+| sub | miaobyte | T3=T1-T2 | sub(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
+| add | cublas | T3=T1+T2 | add(tensor<any>:a, tensor<any>:b)->(tensor<any>:c) |
+| add | miaobyte | T3=T1+T2 | add(tensor<any>:a, tensor<any>:b)->(tensor<any>:c) |
+| todtype | none | T3(dtypeA)->T1(dtypeB) | todtype(tensor<any>:a)->(tensor<any>:b) |
+| invert | miaobyte | T3=~T1 | invert(tensor<int64|int32|int16|int8|bool>:A)->(tensor<int64|int32|int16|int8|bool>:C) |
+| rdivscalar | miaobyte | T3=scalar/T1 | rdivscalar(var<any>:scalar, tensor<any>:A)->(tensor<any>:C) |
+| rpowscalar | miaobyte | T3=pow(scalar, T1) | rpowscalar(var<float32|int32>:scalar, tensor<float64|float32>:A)->(tensor<float64|float32>:C) |
+| cos | miaobyte | T3=cos(T1) | cos(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
| greater | miaobyte | mask=compare(T1, T2) | greater(tensor<any>:A, tensor<any>:B)->(tensor<bool>:mask) |
+| addscalar | miaobyte | T3=T1+scalar | addscalar(tensor<any>:A, var<any>:b)->(tensor<any>:C) |
+| div | miaobyte | T3=T1/T2 | div(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
+| divscalar | miaobyte | T3=scalar/T1 | divscalar(tensor<any>:A, var<any>:scalar)->(tensor<any>:C) |
| lessscalar | miaobyte | mask=compare(T1, scalar) | lessscalar(tensor<any>:A, var<any>:scalar)->(tensor<bool>:mask) |
-| cos | miaobyte | T3=cos(T1) | cos(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
+| less | miaobyte | mask=compare(T1, T2) | less(tensor<any>:A, tensor<any>:B)->(tensor<bool>:mask) |
| notequalscalar | miaobyte | T1!=scalar->mask | notequalscalar(tensor<any>:A, var<any>:scalar, var<float32>:epsilon)->(tensor<bool>:mask) |
+| rsubscalar | miaobyte | T3=scalar-T1 | rsubscalar(var<any>:scalar, tensor<any>:A)->(tensor<any>:C) |
+| sin | miaobyte | T3=sin(T1) | sin(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
| minscalar | miaobyte | T3=min(T1, scalar) | minscalar(tensor<any>:A, var<any>:scalar)->(tensor<any>:C) |
-| rpowscalar | miaobyte | T3=pow(scalar, T1) | rpowscalar(var<float32|int32>:scalar, tensor<float64|float32>:A)->(tensor<float64|float32>:C) |
-| rdivscalar | miaobyte | T3=scalar/T1 | rdivscalar(var<any>:scalar, tensor<any>:A)->(tensor<any>:C) |
-| todtype | none | T3(dtypeA)->T1(dtypeB) | todtype(tensor<any>:a)->(tensor<any>:b) |
-| add | cublas | T3=T1+T2 | add(tensor<any>:a, tensor<any>:b)->(tensor<any>:c) |
-| add | miaobyte | T3=T1+T2 | add(tensor<any>:a, tensor<any>:b)->(tensor<any>:c) |
-| sub | miaobyte | T3=T1-T2 | sub(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
-| sqrt | miaobyte | T3=sqrt(T1) | sqrt(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
-| subscalar | miaobyte | T3=T1-scalar | subscalar(tensor<any>:A, var<any>:b)->(tensor<any>:C) |
-| exp | miaobyte | T3=exp(T1) | exp(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
-| mul | miaobyte | T3=T1*T2 | mul(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
-| equal | miaobyte | T1==T2->mask | equal(tensor<any>:A, tensor<any>:B, var<float32>:epsilon)->(tensor<bool>:mask) |
-| mulscalar | miaobyte | T3=T1*scalar | mulscalar(tensor<any>:A, var<any>:b)->(tensor<any>:C) |
-| div | miaobyte | T3=T1/T2 | div(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
-| invert | miaobyte | T3=~T1 | invert(tensor<int64|int32|int16|int8|bool>:A)->(tensor<int64|int32|int16|int8|bool>:C) |
-| max | miaobyte | T3=max(T1, T2) | max(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
-| pow | miaobyte | T3=pow(T1, T2) | pow(tensor<float64|float32>:A, tensor<float64|float32>:B)->(tensor<float64|float32>:C) |
+| tan | miaobyte | T3=tan(T1) | tan(tensor<float64|float32>:A)->(tensor<float64|float32>:C) |
+| maxscalar | miaobyte | T3=max(T1, scalar) | maxscalar(tensor<any>:A, var<any>:scalar)->(tensor<any>:C) |
+| min | miaobyte | T3=min(T1, T2) | min(tensor<any>:A, tensor<any>:B)->(tensor<any>:C) |
+| log | miaobyte | T3=log(T1) | log(tensor<float64|float32|float16|bfloat16>:A)->(tensor<float64|float32|float16|bfloat16>:C) |
+| equalscalar | miaobyte | T1==scalar->mask | equalscalar(tensor<any>:A, var<any>:scalar, var<float32>:epsilon)->(tensor<bool>:mask) |
+| notequal | miaobyte | T1!=T2->mask | notequal(tensor<any>:A, tensor<any>:B, var<float32>:epsilon)->(tensor<bool>:mask) |
+| greaterscalar | miaobyte | mask=compare(T1, scalar) | greaterscalar(tensor<any>:A, var<any>:scalar)->(tensor<bool>:mask) |
+| switch | miaobyte | C=switch(tensors,cases) | switch(listtensor<any>:tensors, tensor<int32|bool>:cases)->(tensor<any>:result) |
+| powscalar | miaobyte | T3=pow(T1, scalar) | powscalar(tensor<float64|float32>:A, var<float64|int32>:scalar)->(tensor<float64|float32>:C) |

-### reduce
+### changeshape

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
-| prod | miaobyte | B = prod(A, axis=[1 2], keepdims=false) | prod(tensor<any>:A, vector<int32>:dims, var<bool>:keepdims)->(tensor<any>:B) |
-| reducemax | miaobyte | B = reducemax(A, axis=[1 2], keepdims=false) | reducemax(tensor<any>:A, vector<int32>:dims, var<bool>:keepdims)->(tensor<any>:B) |
-| sum | miaobyte | B = sum(A, axis=[1 2], keepdims=false) | sum(tensor<any>:A, vector<int32>:dims, var<bool>:keepdims)->(tensor<any>:B) |
-| reducemin | miaobyte | B = reducemin(A, axis=[1 2], keepdims=false) | reducemin(tensor<any>:A, vector<int32>:dims, var<bool>:keepdims)->(tensor<any>:B) |
+| reshape | miaobyte | T1.reshape(shape)->T2 | reshape(tensor<any>:A, vector<int32>:shape)->(tensor<any>:B) |
+| transpose | miaobyte | T2 = T1.transpose(dimorder=[1,0]) | transpose(tensor<any>:A, vector<int32>:dim_order)->(tensor<any>:C) |
+| concat | miaobyte | Tresult = concat([T1, T2...], axis=3) | concat(listtensor<any>:tensors, var<int32>:dim)->(tensor<any>:result) |
+| broadcastTo | miaobyte | T2 = T1.broadcastTo(new_shape=[4,3,2]) | broadcastTo(tensor<any>:A, vector<int32>:new_shape)->(tensor<any>:B) |
+| indexselect | miaobyte | T2 = T1.indexselect(index=[1,2], axis=1) | indexselect(tensor<any>:A, tensor<int64|int32>:indices, var<int32>:axis)->(tensor<any>:B) |
+| repeat | miaobyte | T2 = T1.repeat(repeats=[3 4 5]) | repeat(tensor<any>:A, vector<int32>:repeats)->(tensor<any>:B) |

-### changeshape
+### reduce

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
-| indexselect | miaobyte | T2 = T1.indexselect(index=[1,2], axis=1) | indexselect(tensor<any>:A, tensor<int64|int32>:indices, var<int32>:axis)->(tensor<any>:B) |
-| broadcastTo | miaobyte | T2 = T1.broadcastTo(new_shape=[4,3,2]) | broadcastTo(tensor<any>:A, vector<int32>:new_shape)->(tensor<any>:B) |
-| concat | miaobyte | Tresult = concat([T1, T2...], axis=3) | concat(listtensor<any>:tensors, var<int32>:dim)->(tensor<any>:result) |
-| transpose | miaobyte | T2 = T1.transpose(dimorder=[1,0]) | transpose(tensor<any>:A, vector<int32>:dim_order)->(tensor<any>:C) |
-| reshape | miaobyte | T1.reshape(shape)->T2 | reshape(tensor<any>:A, vector<int32>:shape)->(tensor<any>:B) |
+| reducemin | miaobyte | B = reducemin(A, axis=[1 2], keepdims=false) | reducemin(tensor<any>:A, vector<int32>:dims, var<bool>:keepdims)->(tensor<any>:B) |
+| sum | miaobyte | B = sum(A, axis=[1 2], keepdims=false) | sum(tensor<any>:A, vector<int32>:dims, var<bool>:keepdims)->(tensor<any>:B) |
+| reducemax | miaobyte | B = reducemax(A, axis=[1 2], keepdims=false) | reducemax(tensor<any>:A, vector<int32>:dims, var<bool>:keepdims)->(tensor<any>:B) |
+| prod | miaobyte | B = prod(A, axis=[1 2], keepdims=false) | prod(tensor<any>:A, vector<int32>:dims, var<bool>:keepdims)->(tensor<any>:B) |

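The only substantive change in this file is the new `repeat` row, which gives the signature `repeat(tensor<any>:A, vector<int32>:repeats)->(tensor<any>:B)` and the example `T1.repeat(repeats=[3 4 5])`. A minimal CPU reference sketch of what that operation plausibly computes follows, assuming it follows PyTorch-style `Tensor.repeat` semantics with exactly one repeat count per dimension of A: output dimension d has size `A.shape[d] * repeats[d]`, and each output element reads A at its coordinates taken modulo A's shape. The names `repeatRef` and `stridesOf` are hypothetical; this is not the ompsimd or CUDA implementation from the commit, only something one could compare results against.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: row-major (C-order) strides for a shape.
std::vector<std::size_t> stridesOf(const std::vector<int>& shape) {
    std::vector<std::size_t> s(shape.size(), 1);
    for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i)
        s[i] = s[i + 1] * static_cast<std::size_t>(shape[i + 1]);
    return s;
}

// Hypothetical CPU reference for repeat, assuming one repeat count per dim:
//   B.shape[d] = A.shape[d] * repeats[d]
//   B[i0, ..., in] = A[i0 % A.shape[0], ..., in % A.shape[n]]
// Writes the output shape into bShape and returns B's data in row-major order.
template <typename T>
std::vector<T> repeatRef(const std::vector<T>& a, const std::vector<int>& aShape,
                         const std::vector<int>& repeats, std::vector<int>& bShape) {
    if (repeats.size() != aShape.size())
        throw std::invalid_argument("repeats needs one entry per dimension of A");
    const std::size_t nd = aShape.size();
    std::size_t aCount = 1, bCount = 1;
    bShape.assign(nd, 0);
    for (std::size_t d = 0; d < nd; ++d) {
        if (aShape[d] < 0 || repeats[d] < 0)
            throw std::invalid_argument("shape and repeats must be non-negative");
        bShape[d] = aShape[d] * repeats[d];
        aCount *= static_cast<std::size_t>(aShape[d]);
        bCount *= static_cast<std::size_t>(bShape[d]);
    }
    if (a.size() != aCount)
        throw std::invalid_argument("data size does not match shape");

    const auto aStr = stridesOf(aShape);
    const auto bStr = stridesOf(bShape);
    std::vector<T> b(bCount);
    // Each output element is independent of the others: split its flat index
    // into per-dim coordinates, wrap each coordinate into A's extent, re-flatten.
    for (std::size_t flat = 0; flat < bCount; ++flat) {
        std::size_t rem = flat, src = 0;
        for (std::size_t d = 0; d < nd; ++d) {
            const std::size_t coord = rem / bStr[d];
            rem %= bStr[d];
            src += (coord % static_cast<std::size_t>(aShape[d])) * aStr[d];
        }
        assert(src < a.size());  // sanity check: source index stays inside A
        b[flat] = a[src];
    }
    return b;
}
```

For example, repeating the 2x3 tensor `{1, 2, 3, 4, 5, 6}` with repeats `{2, 1}` yields a 4x3 tensor whose last two rows repeat the first two. Because every output element depends only on its own flat index, the inner loop body maps naturally onto one CUDA thread per output element; whether the commit's kernel is organized that way is not shown in this diff.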