# This fork...

I'm maintaining this fork because the original author was not replying to issues or pull requests. For now I plan on maintaining this fork as necessary.

## Status

[Build Status](https://travis-ci.org/blevesearch/go-porterstemmer)

[Coverage Status](https://coveralls.io/r/blevesearch/go-porterstemmer?branch=HEAD)

# Go Porter Stemmer

A native Go clean room implementation of the Porter Stemming Algorithm.

This algorithm is of interest to people doing Machine Learning or
Natural Language Processing (NLP).

This is NOT a port. This is a native Go implementation from the human-readable
description of the algorithm.

I've tried to make it (more) efficient by NOT using strings internally, but
instead using []rune slices, and by reusing the same (array) buffer that backs
the []rune slice (and its sub-slices) at all steps of the algorithm.

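To illustrate the buffer-sharing idea described above, here is a minimal sketch using only the standard library. The helper `trimSuffix` is a hypothetical name and is not part of this package's API; it only shows that re-slicing a []rune drops a suffix without allocating a new array, and that the result still shares memory with the original.

```go
package main

import "fmt"

// trimSuffix is a hypothetical helper (not part of this library).
// It returns word without suffix by re-slicing, so the result shares
// word's backing array and no new memory is allocated.
func trimSuffix(word, suffix []rune) []rune {
	if len(suffix) > len(word) {
		return word
	}
	start := len(word) - len(suffix)
	for i, r := range suffix {
		if word[start+i] != r {
			return word
		}
	}
	return word[:start]
}

func main() {
	word := []rune("hopping")
	stem := trimSuffix(word, []rune("ing"))

	fmt.Println(string(stem)) // hopp

	// Writing through the sub-slice changes the original buffer too.
	stem[0] = 'b'
	fmt.Println(string(word)) // bopping
}
```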
For the Porter Stemmer algorithm, see:

http://tartarus.org/martin/PorterStemmer/def.txt      (URL #1)

http://tartarus.org/martin/PorterStemmer/             (URL #2)

# Departures

When I initially implemented the algorithm, it failed the tests at...

http://tartarus.org/martin/PorterStemmer/voc.txt      (URL #3)

http://tartarus.org/martin/PorterStemmer/output.txt   (URL #4)

... After reading the human-readable text over and over again to try to figure out
what error I had made (and doing all sorts of things to debug it), I came to the
conclusion that some of these tests were wrong according to the human-readable
description of the algorithm.

This led me to wonder whether other people's code that was passing these tests had
rules that were not in the human-readable description, which led me to look at the source
code here...

http://tartarus.org/martin/PorterStemmer/c.txt        (URL #5)

... When I looked there, I noticed that some items are marked as a "DEPARTURE",
which differ from the original algorithm. (There are 2 of these.)

I implemented these departures, and the tests at URL #3 and URL #4 all passed.

## Usage

To use this Go library, use something like:

    package main

    import (
      "fmt"
      "github.com/reiver/go-porterstemmer"
    )

    func main() {

      word := "Waxes"

      stem := porterstemmer.StemString(word)

      fmt.Printf("The word [%s] has the stem [%s].\n", word, stem)
    }

Alternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:

    package main

    import (
      "fmt"
      "github.com/reiver/go-porterstemmer"
    )

    func main() {

      word := []rune("Waxes")

      stem := porterstemmer.Stem(word)

      fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
    }

NOTE, however, that the above code may modify the original slice (named "word" in the example) as a side
effect, for efficiency reasons, and that the slice named "stem" in the example above may be a
sub-slice of the slice named "word".

Alternatively, if you already know that your word is lowercase (and you don't need
this library to lowercase your word for you), you can instead use code like:

    package main

    import (
      "fmt"
      "github.com/reiver/go-porterstemmer"
    )

    func main() {

      word := []rune("waxes")

      stem := porterstemmer.StemWithoutLowerCasing(word)

      fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
    }

Again NOTE (as with the previous example) that the above code may modify the original slice (named
"word" in the example) as a side effect, for efficiency reasons, and that the slice named "stem"
in the example above may be a sub-slice of the slice named "word".

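If the original word must survive unchanged, one option is to stem a copy. The sketch below is self-contained, so it uses a hypothetical stand-in, `stemInPlace`, that overwrites its input in the way the notes above warn about; it is not this library's code. `stemCopy` is likewise a hypothetical name. With the real package you would pass the copy to `porterstemmer.Stem` instead.

```go
package main

import "fmt"

// stemInPlace is a hypothetical stand-in for a stemmer that reuses its
// input buffer: it lowercases ASCII letters in place and drops a
// trailing "es". It is NOT the Porter algorithm.
func stemInPlace(word []rune) []rune {
	for i, r := range word {
		if r >= 'A' && r <= 'Z' {
			word[i] = r + ('a' - 'A')
		}
	}
	n := len(word)
	if n > 2 && word[n-2] == 'e' && word[n-1] == 's' {
		return word[:n-2]
	}
	return word
}

// stemCopy copies word first so the caller's slice is left untouched.
func stemCopy(word []rune) []rune {
	buf := make([]rune, len(word))
	copy(buf, word)
	return stemInPlace(buf)
}

func main() {
	word := []rune("Waxes")
	stem := stemCopy(word)

	// The original slice still holds "Waxes"; only the copy was changed.
	fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
}
```

The copy costs one allocation per word, so it gives back some of the efficiency that the []rune functions are meant to provide; it is only worth doing when the caller still needs the original slice.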