mirror of
https://github.com/go-gitea/gitea
synced 2025-07-27 04:38:36 +00:00
Updated tokenizer to better matching when search for code snippets (#32261)
This PR improves the accuracy of Gitea's code search. Currently, Gitea does not consider statements such as `onsole.log("hello")` as hits when the user searches for `log`. The culprit is how both ES and Bleve are tokenizing the file contents (in both cases, `console.log` is a whole token). In ES' case, we changed the tokenizer to [simple_pattern_split](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simplepatternsplit-tokenizer.html#:~:text=The%20simple_pattern_split%20tokenizer%20uses%20a,the%20tokenization%20is%20generally%20faster.). In such a case, tokens are words formed by digits and letters. In Bleve's case, it employs a [letter](https://blevesearch.com/docs/Tokenizers/) tokenizer. Resolves #32220 --------- Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
This commit is contained in:
Binary file not shown.
@@ -1,2 +1,2 @@
|
||||
P pack-393dc29256bc27cb2ec73898507df710be7a3cf5.pack
|
||||
P pack-a7bef76cf6e2b46bc816936ab69306fb10aea571.pack
|
||||
|
||||
|
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Reference in New Issue
Block a user