snowballstem

This repository contains the Go stemmers generated by the Snowball project. They are maintained outside of the core bleve package so that they can be more easily reused in other contexts.

Usage

Each stemmer package exports a single Stem() function, which operates on a snowball Env structure. The Env structure maintains all state for the stemmer. A new Env is created pointing at an initial string. After stemming, the result of the Stem() operation can be retrieved with the Current() method. The same Env can be reused for subsequent words by calling the SetCurrent() method before each Stem() call.

Example

package main

import (
	"fmt"

	"github.com/blevesearch/snowballstem"
	"github.com/blevesearch/snowballstem/english"
)

func main() {

	// words to stem
	words := []string{
		"running",
		"jumping",
	}

	// build new environment
	env := snowballstem.NewEnv("")

	for _, word := range words {
		// set up environment for word
		env.SetCurrent(word)
		// invoke stemmer
		english.Stem(env)
		// print results
		fmt.Printf("%s stemmed to %s\n", word, env.Current())
	}
}

This produces the following output:

$ ./snowtest
running stemmed to run
jumping stemmed to jump

Testing

The test harness for these stemmers is hosted in the main Snowball repository. There are functional tests built around the separate snowballstem-data repository, and there is support for fuzz-testing the stemmers there as well.

Generating the Stemmers

$ export SNOWBALL=/path/to/github.com/snowballstem/snowball/after/snowball/built
$ go generate

Updating the Go Generate Commands

A simple tool is provided to regenerate the go generate commands from the Snowball algorithms directory:

$ go run gengen.go /path/to/github.com/snowballstem/snowball/algorithms