bulk updates for performance #5896

Merged
rnewson merged 1 commit into main from nouveau-streaming-index-update on Mar 12, 2026

@rnewson rnewson commented Feb 20, 2026

Overview

Nouveau switched from ibrowse to gun (and http/1.1 to http/2) in 3.5.0, in order to reduce the large number of connections made between couchdb and nouveau server. A user has found, on a larger test than I performed during code development, a significant indexing speed regression.

Before the ibrowse to gun transition, the update requests to nouveau server used http pipelining (that is, multiple requests were made to the server, in order, without waiting for the responses). This was a significant optimization. With gun this was not possible (as http/2 uses multiplexing instead). The difference turns out to be much more significant than expected.

This PR adds a new endpoint on the nouveau server that supports bulk update. Each item in the bulk list is a document update or delete request.
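
As a rough illustration of the idea only (not this PR's code or wire format), the sketch below turns a list of changes into a bulk list in which each item is either an update or a delete. The module name, the keys of the input maps and the shapes of the output tuples are all assumptions invented for this example.

```erlang
-module(bulk_items_sketch).
-export([to_bulk_items/1]).

%% Hypothetical input: a list of maps such as
%%   #{id => <<"doc1">>, seq => 1, deleted => false, fields => []}.
%% Hypothetical output: one tagged tuple per change, in the same order.
%% None of these shapes are taken from the PR.
to_bulk_items(Changes) ->
    [to_item(Change) || Change <- Changes].

%% A deleted document becomes a delete item...
to_item(#{id := Id, seq := Seq, deleted := true}) ->
    {delete, Id, Seq};
%% ...anything else becomes an update item carrying its indexed fields.
to_item(#{id := Id, seq := Seq, fields := Fields}) ->
    {update, Id, Seq, Fields}.
```

Keeping the items in change order means the receiving side could apply them sequentially, much as it would have applied the individual requests.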

This has demonstrated a substantial performance improvement.

The single doc update and delete endpoints will be removed in a future release but will remain for a time for backward compatibility.

Testing recommendations

Will be covered by automated tests

Related Issues or Pull Requests

#5894

Checklist

  • This is my own work, I did not use AI, LLM's or similar technology
  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@rnewson rnewson force-pushed the nouveau-streaming-index-update branch 2 times, most recently from d4b4225 to 015167a February 25, 2026 20:08

rnewson commented Feb 26, 2026

Noting that the Java side is less cooperative (it won't start the response until the request is complete, so I can't add flow control), but I am grateful, as it has forced me to a tidier solution. I will add a bulk doc update endpoint in the JAX-RS fashion, with a domain object that represents a small group of updates to make, and have the Erlang side accumulate a batch of docs, then issue the POST update and wait for its successful response. We get flow control, we avoid introducing json-seq, and I can remove the ugly code in this PR too. I will try to get a version of that pushed to this PR over the weekend. The batch size will be configurable, but I should be able to find a decent default, as I can induce the problem locally quite easily.
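
The accumulate-then-POST approach can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: the module name, the `SendFun` callback (standing in for the synchronous POST to the bulk endpoint) and the `BatchSize` argument are hypothetical.

```erlang
-module(batch_sketch).
-export([index_all/3]).

%% Items    : list of update/delete items to send, in order.
%% BatchSize: maximum items per request (hypothetical knob, must be > 0).
%% SendFun  : fun(Batch) -> ok | {error, Reason}; stands in for the
%%            synchronous POST to the bulk endpoint.
%% Returns {ok, NumberOfBatchesSent} or the first error from SendFun.
index_all(Items, BatchSize, SendFun) when is_integer(BatchSize), BatchSize > 0 ->
    loop(Items, BatchSize, SendFun, [], 0, 0).

%% Input exhausted and nothing pending: done.
loop([], _BatchSize, _SendFun, [], _Len, Sent) ->
    {ok, Sent};
%% Input exhausted with a partial batch pending: flush it.
loop([], BatchSize, SendFun, Acc, _Len, Sent) ->
    flush([], BatchSize, SendFun, Acc, Sent);
%% Batch is full: flush before taking more items.
loop(Items, BatchSize, SendFun, Acc, Len, Sent) when Len >= BatchSize ->
    flush(Items, BatchSize, SendFun, Acc, Sent);
%% Otherwise keep accumulating (newest item first, reversed on flush).
loop([Item | Rest], BatchSize, SendFun, Acc, Len, Sent) ->
    loop(Rest, BatchSize, SendFun, [Item | Acc], Len + 1, Sent).

%% Send one batch and only continue after it succeeds; waiting here is
%% what provides the flow control.
flush(Items, BatchSize, SendFun, Acc, Sent) ->
    case SendFun(lists:reverse(Acc)) of
        ok -> loop(Items, BatchSize, SendFun, [], 0, Sent + 1);
        {error, _} = Error -> Error
    end.
```

With a batch size of 2, three items would go out as `[1, 2]` followed by `[3]`, and the second request is not issued until the first has returned `ok`.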

@rnewson rnewson force-pushed the nouveau-streaming-index-update branch from 015167a to 24f5230 March 7, 2026 18:45
@rnewson rnewson changed the title from "stream updates for performance" to "bulk updates for performance" Mar 7, 2026

rnewson commented Mar 7, 2026

pushed an update, not quite done yet ("purge with conflicts" test fails atm but the rest pass).

@rnewson rnewson force-pushed the nouveau-streaming-index-update branch from 24f5230 to 1802782 March 8, 2026 00:01

rnewson commented Mar 8, 2026

all tests passing now (was a silly mistake in purge_index)

rnewson commented Mar 8, 2026

built 1 million doc index locally in 1m21s which I think is good

@rnewson rnewson force-pushed the nouveau-streaming-index-update branch from 1802782 to d2994e7 March 12, 2026 13:23
  DbPurgeSeq = couch_db:get_purge_seq(Db),
  ok = nouveau_api:set_purge_seq(
-     Index, PurgeAcc3#purge_acc.index_purge_seq, DbPurgeSeq
+     Index, PurgeAcc4#purge_acc.index_purge_seq, DbPurgeSeq

This is not new behavior in the PR, but just out of curiosity: why do we pass in both the db purge seq and the latest index purge seq when we set the purge seq in the index? How is this used on the Java side?

I guess we expected cases where these may diverge. One I can think of is if the last purge info was already in the PurgeAcc1#purge_acc.exclude_list list; then we would not bump the accumulator purge sequence. So set_purge_seq may be called as set_purge_seq(Index, 99, 100)? But I can't see how that would be useful or what it would mean on the Java side.


-define(JSON_CONTENT_TYPE, {"Content-Type", "application/json"}).

-deprecated([

I wonder if we have to be this strict with deprecated calls for nouveau. These would be local calls on this node. The remote call is possibly between Erlang and Java, not Erlang and Erlang.

rnewson commented

They are mostly notes for me to remove the functions in a future release. We need at least one release with them for smooth upgrade (upgrade nouveau jvm side, which understands individual updates and bulk updates, then the couchdb side, which will then only send bulk updates)

@nickva nickva left a comment

+1

With a few comments: the one about purge is just curiosity about how it works and is not a blocker, since it's existing behavior and we're not changing it.

As for the deprecation, I was wondering if it's even worth bothering with, unless the calls leak into remote calls and may affect online cluster upgrades, of course.

@rnewson rnewson merged commit 3afc6fd into main Mar 12, 2026
60 checks passed
@rnewson rnewson deleted the nouveau-streaming-index-update branch March 12, 2026 16:45