Goals and References #1

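## Goals
* Implement a string similarity library in Go
* Handle Chinese text accurately (many existing libraries written by non-Chinese developers currently give poor results on Chinese)
* Integrate multiple similarity algorithms (edit distance, Hamming distance, Dice coefficient)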
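
The issue does not say why other libraries do poorly on Chinese. One common cause, assumed here, is comparing strings byte by byte: in Go, `len(s)` and `s[i]` work on UTF-8 bytes, while a character-level metric should work on runes. A minimal sketch of the difference (`RuneLen` is a hypothetical helper name, not part of any existing API):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// RuneLen returns the number of Unicode code points in s.
// This is the unit a character-level similarity metric should count.
func RuneLen(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	s := "中文abc"
	fmt.Println(len(s))     // 9: each of the two CJK characters takes 3 bytes
	fmt.Println(RuneLen(s)) // 5: two CJK characters plus three ASCII letters

	// Converting to []rune makes indexing character-based.
	r := []rune(s)
	fmt.Println(string(r[1])) // 文
}
```

Converting inputs to `[]rune` once at the start of each algorithm is one simple way to keep every metric character-based.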

## Levenshtein edit distance
* https://zhuanlan.zhihu.com/p/91667128
* https://www.jianshu.com/p/a617d20162cf
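(Both references above build a full matrix. After working through the algorithm, I realized the matrix is unnecessary: caching a single row along the x axis plus one diagonal value is enough, and the result is identical.)
* http://richardminerich.com/tag/damerau-levenshtein-distance/ (supplementary)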
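
The single-row idea described above can be sketched as follows. This is one possible reading of the note, not necessarily the implementation the library ends up with; the names `Levenshtein` and `min3` are illustrative. It keeps one row of the conceptual matrix and a `diag` variable holding the cell that the current write is about to overwrite.

```go
package main

import "fmt"

// Levenshtein returns the edit distance between a and b, counted in runes.
func Levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	if len(ra) == 0 {
		return len(rb)
	}
	if len(rb) == 0 {
		return len(ra)
	}

	// row[j] is the distance between the processed prefix of ra and rb[:j].
	row := make([]int, len(rb)+1)
	for j := range row {
		row[j] = j
	}

	for i := 1; i <= len(ra); i++ {
		diag := row[0] // cell (i-1, 0), the diagonal neighbour for j = 1
		row[0] = i
		for j := 1; j <= len(rb); j++ {
			up := row[j] // cell (i-1, j), not yet overwritten
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution (or match)
			row[j] = min3(up+1, row[j-1]+1, diag+cost)
			diag = up // becomes the diagonal neighbour for j+1
		}
	}
	return row[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(x, y, z int) int {
	if y < x {
		x = y
	}
	if z < x {
		x = z
	}
	return x
}

func main() {
	fmt.Println(Levenshtein("kitten", "sitting")) // 3
	fmt.Println(Levenshtein("编辑距离", "编辑距"))       // 1
}
```

Memory use is proportional to the length of `b` instead of the product of both lengths, while the returned distance is the same as with the full matrix.
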
## Hamming
* https://baike.baidu.com/item/%E6%B1%89%E6%98%8E%E8%B7%9D%E7%A6%BB/475174?fr=aladdin
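
For reference, Hamming distance counts the positions at which two equal-length sequences differ. A minimal rune-based sketch follows; the function name, the error value, and the choice to return an error on a length mismatch are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrLengthMismatch is returned when the inputs have different rune counts.
var ErrLengthMismatch = errors.New("hamming: inputs differ in length")

// Hamming returns the number of rune positions at which a and b differ.
func Hamming(a, b string) (int, error) {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return 0, ErrLengthMismatch
	}
	d := 0
	for i := range ra {
		if ra[i] != rb[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	d, err := Hamming("汉明距离", "汉明编码")
	fmt.Println(d, err) // 2 <nil>

	_, err = Hamming("abc", "ab")
	fmt.Println(err) // hamming: inputs differ in length
}
```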

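## Dice's coefficient
* https://blog.csdn.net/gjk0223/article/details/2314844

Each element of the set is a run of n consecutive characters (an n-gram). This is easy to overlook: n should be configurable, yet many open-source projects ignore it. The formula in the original paper is 2 * (intersection of a and b) / (len(a) + len(b)), where a and b are the n-gram sets of the two strings. The default is n = 2, but n = 2 is not very friendly to Chinese.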
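
A minimal sketch of the coefficient with a configurable n, treating each string as a set of rune n-grams. The names `Dice` and `ngrams` are illustrative, and the fallback for inputs too short to form any n-gram is an assumption, since the formula is undefined there.

```go
package main

import "fmt"

// ngrams returns the set of n-rune substrings of s.
func ngrams(s string, n int) map[string]struct{} {
	r := []rune(s)
	set := make(map[string]struct{})
	for i := 0; i+n <= len(r); i++ {
		set[string(r[i:i+n])] = struct{}{}
	}
	return set
}

// Dice returns 2*|A∩B| / (|A|+|B|), where A and B are the n-gram sets of a and b.
func Dice(a, b string, n int) float64 {
	if n < 1 {
		n = 1
	}
	sa, sb := ngrams(a, n), ngrams(b, n)
	if len(sa)+len(sb) == 0 {
		// Assumption: when neither input can form an n-gram,
		// fall back to exact comparison.
		if a == b {
			return 1
		}
		return 0
	}
	common := 0
	for g := range sa {
		if _, ok := sb[g]; ok {
			common++
		}
	}
	return 2 * float64(common) / float64(len(sa)+len(sb))
}

func main() {
	fmt.Println(Dice("night", "nacht", 2)) // 0.25
	fmt.Println(Dice("中国", "中华", 2))       // 0
	fmt.Println(Dice("中国", "中华", 1))       // 0.5
}
```

The last two calls show one plausible reason n = 2 is harsh on Chinese: words are often only two characters long, so a single differing character removes every shared bigram, while n = 1 still credits the shared character.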

## Jaro
* https://www.jianshu.com/p/a4af202cb702 (good)
* https://blog.csdn.net/asty9000/article/details/81348857

## TODO
* Damerau-Levenshtein - distance & normalized
* Jaro and Jaro-Winkler - this implementation of Jaro-Winkler does not limit the common prefix length
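
To illustrate what limiting the common prefix length means, here is a sketch of the Winkler adjustment applied to an already computed Jaro score. The function names and the `maxPrefix` parameter are hypothetical; the scaling factor 0.1 and the usual cap of 4 follow the commonly cited definition of Jaro-Winkler, not anything stated in this issue.

```go
package main

import "fmt"

// commonPrefixLen counts the leading runes shared by a and b.
func commonPrefixLen(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	n := 0
	for n < len(ra) && n < len(rb) && ra[n] == rb[n] {
		n++
	}
	return n
}

// WinklerBoost applies the prefix adjustment jaro + l*p*(1-jaro) to a
// precomputed Jaro score. maxPrefix <= 0 means the prefix length l is not capped.
func WinklerBoost(jaro float64, a, b string, maxPrefix int) float64 {
	const p = 0.1 // commonly used scaling factor
	l := commonPrefixLen(a, b)
	if maxPrefix > 0 && l > maxPrefix {
		l = maxPrefix
	}
	return jaro + float64(l)*p*(1-jaro)
}

func main() {
	a, b := "similaritylib-go", "similaritylib-rs"
	jaro := 0.9 // placeholder score, not the real Jaro value for a and b

	fmt.Println(WinklerBoost(jaro, a, b, 4) <= 1) // true: capped prefix stays in range
	fmt.Println(WinklerBoost(jaro, a, b, 0) > 1)  // true: uncapped prefix overshoots 1
}
```

With the cap, `l*p` never exceeds 0.4, so the adjusted score stays within [0, 1]. Without it, a shared prefix longer than 10 runes pushes the score above 1 unless the result is clamped separately.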

## Supplementary
https://help.highbond.com/helpdocs/analytics/13/scripting-guide/zh-cn/Content/lang_ref/functions/r_dicecoefficient.htm

## API design reference (naming)
* https://github.com/hbakhtiyor/strsim
## Reference for which algorithm names to use
* https://github.com/dguo/strsim-rs
