Conversation
d4b4225 to
015167a
Compare
|
noting that the java side is less cooperative (won't start the response until the request is complete, so can't add flow control), but I am grateful as it has forced me to a tidier solution. I will add a bulk doc update endpoint in the JAX-RS fashion, a domain object that represents a small group of updates to make, and have the erlang side accumulate a batch of docs then issue the POST update and wait for its successful response. We get flow control, we avoid introducing json-seq and I can remove the ugly code in this PR too. Will try to get a version of that pushed to this PR over the weekend. The batch size will be configurable but I should be able to find a decent default as I can induce the problem locally quite easily. |
015167a to
24f5230
Compare
|
pushed an update, not quite done yet ("purge with conflicts" test fails atm but the rest pass). |
24f5230 to
1802782
Compare
|
all tests passing now (was a silly mistake in purge_index) |
|
built 1 million doc index locally in 1m21s which I think is good |
1802782 to
d2994e7
Compare
| DbPurgeSeq = couch_db:get_purge_seq(Db), | ||
| ok = nouveau_api:set_purge_seq( | ||
| Index, PurgeAcc3#purge_acc.index_purge_seq, DbPurgeSeq | ||
| Index, PurgeAcc4#purge_acc.index_purge_seq, DbPurgeSeq |
There was a problem hiding this comment.
This is not new behavior in the pr, but just out of curiosity, why do we pass in both the db purge seq and the latest index purge seq when we set the purge seq in the index. How is this used on the java side.
I guess we expected cases where these may diverge. One I can think of is if the last purge info was already in the PurgeAcc1#purge_acc.exclude_list list, then we'd would not bump the the accumulator purge sequence. So set_purge_seq may be called as set_purge_seq(Index, 99, 100)? But I can't see how that would be useful or what it would mean on the java side).
|
|
||
| -define(JSON_CONTENT_TYPE, {"Content-Type", "application/json"}). | ||
|
|
||
| -deprecated([ |
There was a problem hiding this comment.
I wonder if we have to be this strict with deprecated calls for nouveau. This would be local calls on this node. The remote call is possibly between erlang and java not erlang and erlang.
There was a problem hiding this comment.
They are mostly notes for me to remove the functions in a future release. We need at least one release with them for smooth upgrade (upgrade nouveau jvm side, which understands individual updates and bulk updates, then the couchdb side, which will then only send bulk updates)
nickva
left a comment
There was a problem hiding this comment.
+1
With a few comments, one about purge is just curiosity about how it works, but not a blocker since it's existing behavior and we're not changing it.
The deprecation I was wondering if it's even worth bothering with, unless they leak to remote calls and may affect online cluster upgrades of course.
Overview
Nouveau switched from ibrowse to gun (and http/1.1 to http/2) in 3.5.0, in order to reduce the large number of connections made between couchdb and nouveau server. A user has found, on a larger test than I performed during code development, a significant indexing speed regression.
Before the ibrowse to gun transition the update requests to nouveau server used http pipelining (that is, multiple requests were made to the server, in order, without waiting for the responses). This was a significant optimization. With gun this was not possible (as http/2 uses multiplexing instead). The difference turns out much more significant that expected.
This PR adds a new endpoint on the nouveau server that supports bulk update. Each item in the bulk list is a document update or delete request.
This has demonstrated a substantial performance improvement.
The single doc update and delete endpoints will be removed in a future release but will remain for a time for backward compatibility.
Testing recommendations
Will be covered by automated tests
Related Issues or Pull Requests
#5894
Checklist
rel/overlay/etc/default.inisrc/docsfolder