mirror of https://github.com/go-gitea/gitea synced 2025-01-22 15:44:27 +00:00
Lunny Xiao 5dbf36f356
Issue search support elasticsearch (#9428)
* Issue search support elasticsearch

* Fix lint

* Add indexer name on app.ini

* add a warnning on SearchIssuesByKeyword

* improve code
2020-02-13 14:06:17 +08:00


Elastic 3.0

Elasticsearch 2.0 comes with some breaking changes. You will probably need to upgrade your application and/or rewrite part of it due to those changes.

We use that window of opportunity to also update Elastic (the Go client) from version 2.0 to 3.0. This introduces changes required by the Elasticsearch 2.0 update as well as changes that make Elastic cleaner by removing some old cruft.

So, to summarize:

  1. Elastic 2.0 is compatible with Elasticsearch 1.7+ and is still actively maintained.
  2. Elastic 3.0 is compatible with Elasticsearch 2.0+ and will soon become the new master branch.

The rest of the document is a list of all changes in Elastic 3.0.

Pointer types

All types have changed to be pointer types, not value types. This is not only cleaner but also simplifies the API, as illustrated by the following example:

Example for Elastic 2.0 (old):

q := elastic.NewMatchAllQuery()
res, err := client.Search("one").Query(&q).Do()  // notice the & here

Example for Elastic 3.0 (new):

q := elastic.NewMatchAllQuery()
res, err := client.Search("one").Query(q).Do()   // no more &
// ... which can be simplified as:
res, err = client.Search("one").Query(elastic.NewMatchAllQuery()).Do()

It also helps to prevent subtle issues.

Query/filter merge

One of the biggest changes in Elasticsearch 2.0 is the merge of queries and filters. In Elasticsearch 1.x, you had a whole range of queries and filters that were basically identical (e.g. term_query and term_filter).

The practical effect of the merge is that you can now use queries where you previously had to use filters. For Elastic 3.0 this means: We could remove a whole bunch of files. Yay!

Notice that some methods still have "filter" in their name, e.g. PostFilter. However, they now accept a Query where they used to accept a Filter.

Example for Elastic 2.0 (old):

q := elastic.NewMatchAllQuery()
f := elastic.NewTermFilter("tag", "important")
res, err := client.Search().Index("one").Query(&q).PostFilter(f).Do()

Example for Elastic 3.0 (new):

q := elastic.NewMatchAllQuery()
f := elastic.NewTermQuery("tag", "important") // it's a query now!
res, err := client.Search().Index("one").Query(q).PostFilter(f).Do()

Facets are removed

Facets have been removed in Elasticsearch 2.0. You need to use aggregations now.

Errors

Elasticsearch 2.0 returns more information about an error in the HTTP response body. Elastic 3.0 now reads this information and makes it accessible to the consumer.

Errors and all their details are now returned in Error.

HTTP Status 404 (Not Found)

When Elasticsearch does not find an entity or an index, it generally returns HTTP status code 404. In Elastic 2.0 this was a valid result and didn't raise an error from the Do functions. This has now changed in Elastic 3.0.

Starting with Elastic 3.0, there are only two types of responses considered successful. First, responses with HTTP status codes [200..299]. Second, HEAD requests which return HTTP status 404. The latter is used by Elasticsearch to e.g. check for existence of indices or documents. All other responses will return an error.
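
The rule above can be written down as a small predicate. This is only a sketch using the standard library; the name isSuccess is made up for illustration and is not part of Elastic's API.

```go
package main

import (
	"fmt"
	"net/http"
)

// isSuccess is a hypothetical helper that mirrors the rule described above:
// any 2xx response is successful, and so is a 404 in reply to a HEAD request
// (an existence check). Everything else counts as an error.
func isSuccess(method string, status int) bool {
	if status >= 200 && status <= 299 {
		return true
	}
	return method == http.MethodHead && status == http.StatusNotFound
}

func main() {
	fmt.Println(isSuccess(http.MethodGet, http.StatusOK))        // successful
	fmt.Println(isSuccess(http.MethodHead, http.StatusNotFound)) // successful: existence check
	fmt.Println(isSuccess(http.MethodGet, http.StatusNotFound))  // not successful
}
```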

To check for HTTP Status 404 (with non-HEAD requests), e.g. when trying to get or delete a missing document, you can use the IsNotFound helper (see below).

The following example illustrates how to check for a missing document in Elastic 2.0 and what has changed in 3.0.

Example for Elastic 2.0 (old):

res, err = client.Get().Index("one").Type("tweet").Id("no-such-id").Do()
if err != nil {
  // Something else went wrong (but 404 is NOT an error in Elastic 2.0)
}
if !res.Found {
	// Document has not been found
}

Example for Elastic 3.0 (new):

res, err = client.Get().Index("one").Type("tweet").Id("no-such-id").Do()
if err != nil {
  if elastic.IsNotFound(err) {
    // Document has not been found
  } else {
    // Something else went wrong
  }
}

HTTP Status 408 (Timeouts)

Elasticsearch now responds with HTTP status code 408 (Timeout) when a request fails due to a timeout. E.g. if you specify a timeout with the Cluster Health API, the HTTP response status will be 408 if the timeout is exceeded. See here for the specific commit to the Cluster Health API.

To check for HTTP Status 408, we introduced the IsTimeout helper.

Example for Elastic 2.0 (old):

health, err := client.ClusterHealth().WaitForStatus("yellow").Timeout("1s").Do()
if err != nil {
  // ...
}
if health.TimedOut {
  // We have a timeout
}

Example for Elastic 3.0 (new):

health, err := client.ClusterHealth().WaitForStatus("yellow").Timeout("1s").Do()
if elastic.IsTimeout(err) {
  // We have a timeout
}
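
To illustrate the idea behind status-based helpers such as IsNotFound and IsTimeout, here is a minimal sketch. It assumes a hypothetical error type statusError that carries the HTTP status code; the type and the lower-case helper names are invented for this example and do not show how Elastic implements its helpers.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// statusError is a hypothetical error type that remembers the HTTP status
// code of a failed request.
type statusError struct {
	Status int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("request failed with HTTP status %d", e.Status)
}

// hasStatus reports whether err is, or wraps, a *statusError with the given code.
func hasStatus(err error, code int) bool {
	var se *statusError
	return errors.As(err, &se) && se.Status == code
}

// isNotFound and isTimeout are illustrative counterparts of the helpers
// mentioned in the text.
func isNotFound(err error) bool { return hasStatus(err, http.StatusNotFound) }
func isTimeout(err error) bool  { return hasStatus(err, http.StatusRequestTimeout) }

func main() {
	err := fmt.Errorf("get document: %w", &statusError{Status: http.StatusNotFound})
	fmt.Println(isNotFound(err), isTimeout(err))
}
```

With helpers like these, callers branch on the kind of failure instead of inspecting fields of the response.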

Bulk Errors

The error response of a bulk operation used to be a simple string in Elasticsearch 1.x. In Elasticsearch 2.0, it returns a structured JSON object with a lot more details about the error. These errors are now captured in an object of type ErrorDetails which is used in BulkResponseItem.
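
A client that has to cope with both formats can try the plain string first and fall back to the structured object. The sketch below assumes the structured error carries "type" and "reason" keys; errorDetails and decodeBulkError are hypothetical names for illustration, not Elastic's own types, and the sample payloads are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorDetails is a hypothetical, trimmed-down view of a structured error.
type errorDetails struct {
	Type   string `json:"type"`
	Reason string `json:"reason"`
}

// decodeBulkError accepts either a JSON string (the older format) or a
// JSON object (the newer format) and returns a uniform value.
func decodeBulkError(raw []byte) (*errorDetails, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		// Older format: the whole error is a single string.
		return &errorDetails{Reason: s}, nil
	}
	var d errorDetails
	if err := json.Unmarshal(raw, &d); err != nil {
		return nil, err
	}
	return &d, nil
}

func main() {
	// Both payloads are invented samples.
	for _, raw := range []string{
		`"something went wrong"`,
		`{"type":"some_exception","reason":"something went wrong"}`,
	} {
		d, err := decodeBulkError([]byte(raw))
		if err != nil {
			fmt.Println("decode failed:", err)
			continue
		}
		fmt.Printf("type=%q reason=%q\n", d.Type, d.Reason)
	}
}
```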

Removed specific Elastic errors

The specific error types ErrMissingIndex, ErrMissingType, and ErrMissingId have been removed. They were only used by DeleteService and are replaced by a generic error message.

Numeric types

Elastic 3.0 has settled on float64 everywhere. Elastic 2.0 used a mix of float32 and float64. E.g. all boostable queries in Elastic 3.0 now have a boost of type float64 where it used to be float32.

Pluralization

Some services accept zero, one or more indices or types to operate on. E.g. the SearchService accepts a list of zero, one, or more indices to search and therefore had a func called Index(index string) and a func called Indices(indices ...string).

Elastic 3.0 now only uses the singular form which, where applicable, accepts variadic arguments. E.g. in the case of the SearchService, you now only have one func with the following signature: Index(indices ...string).

Notice that this change is limited to Index(...) and Type(...). There are other services with variadic functions. These have not been changed.

Multiple calls to variadic functions

Some services with variadic functions cleared the underlying slice when called, while other services just added to the existing slice. This has now been normalized to always add to the underlying slice.

Example for Elastic 2.0 (old):

// Would only clear scroll id "two"
// because ScrollId cleared the values when called multiple times
client.ClearScroll().ScrollId("one").ScrollId("two").Do()

Example for Elastic 3.0 (new):

// Now (correctly) clears both scroll id "one" and "two"
// because ScrollId no longer clears the values when called multiple times
client.ClearScroll().ScrollId("one").ScrollId("two").Do()
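
The append behavior can be sketched with a tiny builder. The clearScroll type below is hypothetical and only mimics the shape of such a service; it is not Elastic's implementation.

```go
package main

import "fmt"

// clearScroll is a hypothetical builder used only to illustrate the
// "always append" behavior of variadic setters.
type clearScroll struct {
	scrollIds []string
}

// ScrollId adds the given ids to the ids collected so far. It never
// discards ids from earlier calls.
func (s *clearScroll) ScrollId(ids ...string) *clearScroll {
	s.scrollIds = append(s.scrollIds, ids...)
	return s
}

func main() {
	s := (&clearScroll{}).ScrollId("one").ScrollId("two")
	fmt.Println(len(s.scrollIds), s.scrollIds) // both ids are kept
}
```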

Ping service requires URL

The Ping service raised some issues because it is different from all other services. If not explicitly given a URL, it always pings 127.0.0.1:9200.

Users expected to ping the cluster, but that is not possible as the cluster can be a set of many nodes: So which node do we ping then?

To make this clearer, the Ping function on the client now requires users to explicitly set the URL of the node to ping.

Meta fields

Many of the meta fields, e.g. _parent or _routing, are now part of the top level of a document and are no longer returned as part of the fields object. We had to change larger parts of e.g. the Reindexer to get it to work seamlessly with Elasticsearch 2.0.

Notice that all stored meta-fields are now returned by default.
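
As a rough illustration, a search hit with top-level meta fields could be decoded as follows. The hit struct is a hypothetical, trimmed-down type for this sketch, and the JSON document is an invented sample.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hit is a hypothetical, minimal representation of a search hit in which
// _routing and _parent sit next to _id instead of inside "fields".
type hit struct {
	Index   string          `json:"_index"`
	Type    string          `json:"_type"`
	ID      string          `json:"_id"`
	Routing string          `json:"_routing"`
	Parent  string          `json:"_parent"`
	Source  json.RawMessage `json:"_source"`
}

// parseHit decodes a single hit from its JSON form.
func parseHit(data []byte) (*hit, error) {
	var h hit
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	return &h, nil
}

func main() {
	// Invented sample document for illustration only.
	sample := `{"_index":"one","_type":"comment","_id":"1","_routing":"42","_parent":"42","_source":{"text":"hi"}}`
	h, err := parseHit([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(h.ID, h.Routing, h.Parent)
}
```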

HasParentQuery / HasChildQuery

NewHasParentQuery and NewHasChildQuery must now include both parent/child type and query. It is now in line with the Java API.

Example for Elastic 2.0 (old):

allQ := elastic.NewMatchAllQuery()
q := elastic.NewHasChildFilter("tweet").Query(&allQ)

Example for Elastic 3.0 (new):

q := elastic.NewHasChildQuery("tweet", elastic.NewMatchAllQuery())

SetBasicAuth client option

You can now tell Elastic to pass HTTP Basic Auth credentials with each request. In previous versions of Elastic you had to set up your own http.Transport to do this. This should make it more convenient to use Elastic in combination with Shield in its basic setup.

Example:

client, err := elastic.NewClient(elastic.SetBasicAuth("user", "secret"))
if err != nil {
  log.Fatal(err)
}
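
For comparison, the do-it-yourself approach mentioned above might have looked roughly like the following sketch, which wraps an http.RoundTripper using only the standard library. The names basicAuthTransport, roundTripFunc and seenCredentials are made up for this example, and the fake inner transport exists only so that the sketch runs without a network.

```go
package main

import (
	"fmt"
	"net/http"
)

// basicAuthTransport is a hypothetical RoundTripper that adds HTTP Basic
// Auth credentials to every outgoing request.
type basicAuthTransport struct {
	username, password string
	next               http.RoundTripper
}

func (t *basicAuthTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper should not modify the caller's request, so work on a clone.
	r := req.Clone(req.Context())
	r.SetBasicAuth(t.username, t.password)
	return t.next.RoundTrip(r)
}

// roundTripFunc adapts a plain function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// seenCredentials sends one request through basicAuthTransport and reports
// which credentials the inner (fake) transport received.
func seenCredentials(user, pass string) (gotUser, gotPass string, ok bool) {
	next := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		gotUser, gotPass, ok = r.BasicAuth()
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: r}, nil
	})
	client := &http.Client{Transport: &basicAuthTransport{username: user, password: pass, next: next}}
	resp, err := client.Get("http://127.0.0.1:9200/")
	if err != nil {
		return "", "", false
	}
	resp.Body.Close()
	return gotUser, gotPass, ok
}

func main() {
	fmt.Println(seenCredentials("user", "secret"))
}
```

In a real setup the inner transport would be http.DefaultTransport or a configured *http.Transport rather than a fake.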

Delete-by-Query API

The Delete-by-Query API is a plugin now. It is no longer a core part of Elasticsearch. You can install it as a plugin as described here.

Elastic 3.0 still contains the DeleteByQueryService, but you need to install the plugin first. If you don't install it and use DeleteByQueryService you will most probably get a 404.

An older version of this document stated the following:

Elastic 3.0 still contains the DeleteByQueryService but it will fail with ErrPluginNotFound when the plugin is not installed.

Example for Elastic 3.0 (new):

_, err := client.DeleteByQuery().Query(elastic.NewTermQuery("client", "1")).Do()
if err == elastic.ErrPluginNotFound {
	// Delete By Query API is not available
}

I have decided that this is not a good way to handle the case of a missing plugin. The main reason is that with this logic, you'd always have to check if the plugin is missing in case of an error. This is not only slow, but it also puts logic into a service where it should really be just opaque and return the response of Elasticsearch.

If you rely on certain plugins to be installed, you should check on startup. That's where the following two helpers come into play.

HasPlugin and SetRequiredPlugins

Some of the core functionality of Elasticsearch has now been moved into plugins. E.g. the Delete-by-Query API is a plugin now.

You need to make sure to add these plugins to your Elasticsearch installation to still be able to use the DeleteByQueryService. You can test this now with the HasPlugin(name string) helper in the client.

Example for Elastic 3.0 (new):

found, err := client.HasPlugin("delete-by-query")
if err == nil && found {
	// ... Delete By Query API is available
}

To simplify this process, there is now a SetRequiredPlugins helper that can be passed as an option func when creating a new client. If the plugin is not installed, the client will not be created in the first place.

// Will raise an error if the "delete-by-query" plugin is NOT installed
client, err := elastic.NewClient(elastic.SetRequiredPlugins("delete-by-query"))
if err != nil {
  log.Fatal(err)
}

Notice that there also is a way to define mandatory plugins in the Elasticsearch configuration file.

Common Query has been renamed to Common Terms Query

The CommonQuery has been renamed to CommonTermsQuery to be in line with the Java API.

Remove MoreLikeThis and MoreLikeThisField

The More Like This API and the More Like This Field query have been removed and replaced with the MoreLikeThisQuery.

Remove Filtered Query

With the merge of queries and filters, the filtered query became deprecated. While it is only deprecated and therefore still available in Elasticsearch 2.0, we have decided to remove it from Elastic 3.0. Why? Because we think that when you're already forced to rewrite much of your application code, it is a good chance to get rid of things that are deprecated as well. So you can simply replace your filtered query with a boolean query as described here.
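
To give an idea of the target shape, the sketch below builds the JSON body of a bool query with a "must" clause for the query part and a "filter" clause for the former filter part. It uses plain maps rather than Elastic's query types; boolQuery is a hypothetical helper and the field values are invented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// boolQuery is a hypothetical helper that wraps a query and a filter into
// the body of a bool query.
func boolQuery(query, filter map[string]interface{}) map[string]interface{} {
	return map[string]interface{}{
		"bool": map[string]interface{}{
			"must":   query,  // scored part, formerly the "query" of a filtered query
			"filter": filter, // non-scored part, formerly the "filter"
		},
	}
}

func main() {
	q := boolQuery(
		map[string]interface{}{"match_all": map[string]interface{}{}},
		map[string]interface{}{"term": map[string]interface{}{"tag": "important"}},
	)
	body, err := json.MarshalIndent(q, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(body))
}
```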

Remove FuzzyLikeThis and FuzzyLikeThisField

Both have been removed from Elasticsearch 2.0 as well.

Remove LimitFilter

The limit filter is deprecated in Elasticsearch 2.0 and becomes a no-op. Now is a good chance to remove it from your application as well. Use the terminate_after parameter in your search as described here to achieve similar effects.

Remove _cache and _cache_key from filters

Both have been removed from Elasticsearch 2.0 as well.

Partial fields are gone

Partial fields are removed in Elasticsearch 2.0 in favor of source filtering.

Scripting

A Script type has been added to Elastic 3.0. In Elastic 2.0, there were various places (e.g. aggregations) where you could just add the script as a string, specify the scripting language, add parameters etc. With Elastic 3.0, you should now always use the Script type.

Example for Elastic 2.0 (old):

update, err := client.Update().Index("twitter").Type("tweet").Id("1").
	Script("ctx._source.retweets += num").
	ScriptParams(map[string]interface{}{"num": 1}).
	Upsert(map[string]interface{}{"retweets": 0}).
	Do()

Example for Elastic 3.0 (new):

update, err := client.Update().Index("twitter").Type("tweet").Id("1").
	Script(elastic.NewScript("ctx._source.retweets += num").Param("num", 1)).
	Upsert(map[string]interface{}{"retweets": 0}).
	Do()
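
The idea of bundling script text, language and parameters in one value can be sketched as follows. The script type here is a hypothetical stand-in, not Elastic's Script type, and the JSON keys "inline", "lang" and "params" are an assumption about the request format rather than something stated in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// script is a hypothetical value that bundles everything about a script.
type script struct {
	inline string
	lang   string
	params map[string]interface{}
}

func newScript(inline string) *script {
	return &script{inline: inline, params: make(map[string]interface{})}
}

// Lang sets the scripting language and returns the script for chaining.
func (s *script) Lang(lang string) *script {
	s.lang = lang
	return s
}

// Param adds a single named parameter and returns the script for chaining.
func (s *script) Param(name string, value interface{}) *script {
	s.params[name] = value
	return s
}

// source returns a JSON-serializable form; the key names are assumptions.
func (s *script) source() map[string]interface{} {
	src := map[string]interface{}{"inline": s.inline}
	if s.lang != "" {
		src["lang"] = s.lang
	}
	if len(s.params) > 0 {
		src["params"] = s.params
	}
	return src
}

func main() {
	s := newScript("ctx._source.retweets += num").Param("num", 1)
	body, err := json.Marshal(s.source())
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(body))
}
```

Keeping these pieces in a single value means a service needs only one setter that takes a script, instead of separate setters for the text, the language and the parameters.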

Cluster State

The combination of Metric(string) and Metrics(...string) has been replaced by a single func with the signature Metric(...string).

Unexported structs in response

Services generally return a typed response from a Do func. Those structs are exported so that they can be passed around in your own application. In Elastic 3.0, however, (most) sub-structs are now unexported, meaning: You can only pass around the whole response, not sub-structures of it. This makes it easier to restructure responses according to the Elasticsearch API. See ClusterStateResponse as an example.

Add offset to Histogram aggregation

Histogram aggregations now have an offset option.

Services

REST API specification

As you might know, Elasticsearch comes with a REST API specification. The specification describes the endpoints in a JSON structure.

Most services in Elastic predated the REST API specification. We are in the process of bringing all these services in line with the specification. Services can be generated by go generate (not 100% automatic though). This is an ongoing process.

This probably doesn't mean a lot to you. However, you can now be more confident that Elastic supports all features that the REST API specification describes.

At the same time, the file names of the services are renamed to match the REST API specification naming.

REST API Test Suite

The REST API specification of Elasticsearch comes along with a test suite that official clients typically use to test for conformance. Up until now, Elastic didn't run this test suite. However, we are in the process of setting up infrastructure and tests to match this suite as well.

This process is not completed yet.