Recommendations on requirements for {{unicode(encode,decode)}} #6177
① This test covers only Chinese text in Unicode-escaped form (\uXXXX) and decodes it with a small Go program.
② The complete Unicode character data is published at: https://www.unicode.org/versions/Unicode16.0.0/#Components
③ The goal is for nuclei to decode Unicode escapes whenever they appear in matched content. It would also help if nuclei could encode a given text, so that the encoded form can be specified during verification.
I wrote a probe template, but the responses from some URLs contain Unicode-escaped characters. For example, baidu.com shows "\u767e\u5ea6\u4e00\u4e0b\uff0c\u4f60\u5c31\u77e5\u9053".

I usually extract the escaped text and then convert it back to Chinese by hand to validate it (conversion website: http://www.jsons.cn/unicode).
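The manual conversion step can be sketched in Go. This is only an illustration of the decoding that is being requested, not nuclei code: `decodeEscapes` is a hypothetical helper name, and it assumes the input contains no double quotes or raw line breaks, because it treats the text as the body of a Go string literal.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeEscapes is a hypothetical helper (not part of nuclei). It interprets
// \uXXXX escapes by treating s as the body of a Go double-quoted string
// literal. It returns an error if s is not a valid literal body.
func decodeEscapes(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	out, err := decodeEscapes(`\u767e\u5ea6\u4e00\u4e0b\uff0c\u4f60\u5c31\u77e5\u9053`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(out) // prints: 百度一下，你就知道
}
```

A built-in helper in nuclei could remove the need for the external conversion website.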

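yaml:
id: alive-check-20250328
info:
  name: alive-check
  author: alive-check
  severity: info
  description: status test
http:
  - raw:
      - |
        GET / HTTP/1.1
        Host: {{Hostname}}
    matchers-condition: and
    matchers:
      - status:
    extractors:
      - name: title
        group: 1
        regex:
      - group: 1
        regex: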
After some research, I found that Go can decode these escapes with its standard library. The test program below reads a local file, "unicodelist.txt", and decodes each line.
unicodelist.txt sample:
\u767e\u5ea6\u4e00\u4e0b\u006f\u006b\uff0c\u4f60\u5c31\u77e5\u9053\uff0c\ua\u662f\u0020\u0061\u0061
\u767e\u5ea6\u4e00\u4e0b\u006f\u006b
\u767e\u5ea6\u4e00\u4e0b\u006f\u006b
\u006f\u006b

main.go run result:

go test code:
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// unicodeRun matches one or more consecutive \uXXXX or \UXXXXXXXX escapes.
// It is compiled once instead of on every call.
var unicodeRun = regexp.MustCompile(`(\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8})+`)

// isHex reports whether c is an ASCII hexadecimal digit.
func isHex(c byte) bool {
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')
}

// fixBrokenUnicode replaces each malformed \u escape (fewer than four hex
// digits, such as \ua) with a line break and copies everything else unchanged.
func fixBrokenUnicode(data string) string {
	var result strings.Builder
	i := 0
	for i < len(data) {
		if !strings.HasPrefix(data[i:], `\u`) {
			result.WriteByte(data[i])
			i++
			continue
		}
		// Count the hex digits that follow \u, up to four.
		j := i + 2
		for j < len(data) && j-i < 6 && isHex(data[j]) {
			j++
		}
		if j-i == 6 {
			// Well-formed escape: keep it for EscapeUnicode to decode.
			result.WriteString(data[i:j])
		} else {
			// Malformed escape: drop \u and its hex digits, emit a line break.
			result.WriteString("\n")
		}
		i = j
	}
	return result.String()
}

// EscapeUnicode decodes runs of \uXXXX or \UXXXXXXXX escapes into UTF-8.
// Runs that strconv.Unquote rejects are left unchanged.
func EscapeUnicode(data []byte) []byte {
	return unicodeRun.ReplaceAllFunc(data, func(match []byte) []byte {
		str, err := strconv.Unquote(`"` + string(match) + `"`)
		if err != nil {
			return match
		}
		return []byte(str)
	})
}

func main() {
	file, err := os.Open("unicodelist.txt")
	if err != nil {
		fmt.Println("Failed to open the file:", err)
		return
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	lineNum := 1
	for scanner.Scan() {
		// Replace malformed escapes with line breaks, then decode the rest.
		fixedLine := fixBrokenUnicode(scanner.Text())
		decoded := EscapeUnicode([]byte(fixedLine))
		fmt.Printf("line %d : %s\n", lineNum, decoded)
		lineNum++
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("Error in reading the file:", err)
	}
}