Skip to content

Replace post meta sync storage with dedicated wp_collaboration table#11068

Open
josephfusco wants to merge 72 commits intoWordPress:trunkfrom
josephfusco:feature/sync-updates-table
Open

Replace post meta sync storage with dedicated wp_collaboration table#11068
josephfusco wants to merge 72 commits intoWordPress:trunkfrom
josephfusco:feature/sync-updates-table

Conversation

@josephfusco
Copy link

@josephfusco josephfusco commented Feb 26, 2026

The real-time collaboration sync layer currently stores messages as post meta, which works but creates side effects at scale. This moves it to a dedicated wp_collaboration table purpose-built for the workload.

Additionally, the subsystem has been renamed from 'sync' to 'collaboration' throughout. A deprecated wp-sync/v1 REST route alias is preserved for backward compatibility with the Gutenberg plugin during its transition.

The beta 1–3 implementation stores sync messages as post meta on a private wp_sync_storage post type. Post meta is designed for static key-value data, not high-frequency transient message passing. This mismatch causes:

1. Data loss during compaction

When the sync log gets long, a client cleans it up by deleting all messages then re-inserting the ones that still matter. Between delete and re-insert, the history is empty. Any editor polling during that window sees nothing; any edit sent during it is silently dropped. Editors poll multiple times per second, so the gap gets hit regularly with two or more people editing.

Reproduce with the included script:

npm run env:cli -- eval-file tools/local-env/scripts/collaboration-perf/DO_NOT_RELEASE_prove-beta3-post_meta_data_loss.php

2. Silent cursor collisions

Each sync message gets a millisecond timestamp as its ID. When two editors send updates within the same millisecond, both messages get the same ID. The second one is skipped. The edit vanishes with no error.

3. Cache thrashing

Every sync write touches post meta. Every post meta write invalidates the site-wide post query cache (wp_cache_set_posts_last_changed()). One open editor tab continuously invalidates caches for every visitor on the entire site. Five active editors can cause 20–30 cache invalidations per second, degrading object caching for front-end traffic.

Comparison

Problem Beta 1–3 (post meta) This PR (wp_collaboration table)
Compaction Multi-step delete + re-insert; race window Single atomic DELETE … WHERE cursor <
Message IDs Millisecond timestamps; collisions possible Auto-increment; guaranteed unique ordering
Cache impact Every write invalidates posts_last_changed No post meta hooks; no cache churn

A purpose-built table with auto-increment IDs eliminates all three at the root: no post meta hooks fire, compaction is a single atomic DELETE, and auto-increment IDs guarantee unique ordering. The WP_Collaboration_Storage interface and WP_HTTP_Polling_Collaboration_Server are unchanged.

Testing

# Unit tests
npm run test:php -- --filter WP_Test_REST_Collaboration_Server

# E2E tests — 3 concurrent users (presence, sync, undo/redo)
npm run test:e2e -- tests/e2e/specs/collaboration/

# Performance benchmark
npm run test:performance:collaboration

Note for beta 1–3 testers: If upgrading from a previous beta, the old wp_sync_updates table is no longer used and can be dropped: npm run env:cli -- db query "DROP TABLE IF EXISTS wp_sync_updates"

Credits

Trac ticket: https://core.trac.wordpress.org/ticket/64696

Use of AI Tools

Co-authored with Claude Code (Opus 4.6), used to synthesize discussion across related tickets and PRs into a single implementation. All code was reviewed and tested before submission.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@github-actions
Copy link

github-actions bot commented Feb 26, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props joefusco, peterwilsoncc, mindctrl, westonruter, paulkevan, dd32, czarate.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@github-actions
Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@josephfusco josephfusco changed the title feat: replace post meta sync storage with dedicated sync_updates table Replace post meta sync storage with dedicated sync_updates table Feb 26, 2026
Copy link
Contributor

@peterwilsoncc peterwilsoncc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass note inline.

This approach had occurred to me and I think it's probably quite a good idea.

But, as always, there is a but.

I know @m has been reluctant in the past to add new tables to the database schema. I guess (and Matt will be able to correct me) that's to avoid ending up with a database scheme of dozens and dozens of tables.

As not all sites run the database upgrade routine on a regular basis (I think wordpress.org is often behind) as it can be quite a burden on a site, if this approach is to be taken it would need to include a check for the table's existence in a couple of locations:

  • The option_{$option} hook
  • Before displaying the options field on the writing page

Some form of filtering would need to be available for sites that are using custom data stores.

Sorry to be that person. I'm not trying to be negative, just share knowledge I've learnt over the last few years.

@josephfusco
Copy link
Author

@peterwilsoncc Thanks for the thorough review — please don't apologize, this is exactly the kind of institutional knowledge that's invaluable.

As not all sites run the database upgrade routine on a regular basis (I think wordpress.org is often behind) as it can be quite a burden on a site, if this approach is to be taken it would need to include a check for the table's existence in a couple of locations:

Completely agree. I believe I've audited every code path that reads wp_enable_real_time_collaboration to map out what actually needs protection.

  • The option_{$option} hook

When the setting is saved, the flow through options.php → update_option() writes to the wp_options table. No code is hooked onto that option change that would interact with the sync_updates table, so toggling the setting is safe regardless of table state.

  • Before displaying the options field on the writing page

The checkbox in options-writing.php calls get_option() to check/uncheck itself — it doesn't query the sync_updates table, so it renders safely whether or not the upgrade has run.

The code path that touches the table is the REST route registration in rest-api.php:433, which instantiates WP_Sync_Table_Storage. That's already gated with a db_version >= 61697 check (c13f1cd), so the table is never accessed on sites that haven't run the upgrade.

Some form of filtering would need to be available for sites that are using custom data stores.

The wp_sync_storage filter (line 443) should cover this — it lets sites swap out WP_Sync_Table_Storage for any custom WP_Sync_Storage implementation (Redis, etc.).

Let me know if I've missed anything or if you think additional safeguards would be worthwhile!

@josephfusco josephfusco force-pushed the feature/sync-updates-table branch 2 times, most recently from 0104993 to c13f1cd Compare February 27, 2026 16:40
@westonruter
Copy link
Member

@peterwilsoncc:

As not all sites run the database upgrade routine on a regular basis (I think wordpress.org is often behind) as it can be quite a burden on a site, if this approach is to be taken it would need to include a check for the table's existence in a couple of locations:

  • The option_{$option} hook
  • Before displaying the options field on the writing page

Some form of filtering would need to be available for sites that are using custom data stores.

Yeah, this has been an issue, but I think it's primarily/only really an issue for database tables that are used for frontend features. Given that the proposed wp_sync_updates table here is exclusively used in the post editor which is in the admin, and access to the admin requires first doing a DB upgrade screen, then I think this wouldn't be a concern.

I know Matt has been reluctant in the past to add new tables to the database schema. I guess (and Matt will be able to correct me) that's to avoid ending up with a database scheme of dozens and dozens of tables.

I haven't touched custom tables for a long time, so I'm not aware of the state of the art here. But I do know that WooCommerce made the switch to using custom tables instead of postmeta for the sake of scalabilitiy, simplicity, and reliability. Doing a quick search in WPDirectory and I see that Jetpack, Yoast, Elementor, WPForms, WordFence, and many others use custom tables as well.

@westonruter

This comment was marked as off-topic.

@josephfusco josephfusco force-pushed the feature/sync-updates-table branch from 355e7f2 to 7f57c63 Compare February 27, 2026 21:53
@peterwilsoncc
Copy link
Contributor

peterwilsoncc commented Mar 1, 2026

Yeah, this has been an issue, but I think it's primarily/only really an issue for database tables that are used for frontend features. Given that the proposed wp_sync_updates table here is exclusively used in the post editor which is in the admin, and access to the admin requires first doing a DB upgrade screen, then I think this wouldn't be a concern.

This isn't quite correct, on multi site installs site (ie, non super admins) don't get prompted to upgrade the database tables and globally upgrading tables by super admins can be prevented using a filter:

add_filter( 'map_meta_cap', function( $caps, $cap ) {
	if ( 'upgrade_network' === $cap ) {
		$caps[] = 'do_not_allow';
	}
	return $caps;
}, 10, 4 );

To reproduce:

  1. Install MS
  2. Create a sub site
  3. Add a user to the network
  4. Add the new user as an admin on the sub site
  5. Login as sub site admin
  6. Bump DB version in version.php
  7. Visit dashboard as sub site admin
  8. Observe lack of db upgrade prompt
  9. Login as super admin
  10. Observe but don't click db upgrade prompt
  11. You can ignore the prompt as long as you wish as a super admin
  12. Add the code above as a mu-plugin
  13. Prompt disappears
  14. Visit /wp-admin/network/upgrade.php
  15. Observe permission denied message
  16. Remove mu-plugin added above to avoid confusion later.

Keep in mind that the sub site admin can enable site RTC on the writing page, even though they can't update the tables.

I haven't touched custom tables for a long time, so I'm not aware of the state of the art here. But I do know that WooCommerce made the switch to using custom tables instead of postmeta for the sake of scalabilitiy, simplicity, and reliability. Doing a quick search in WPDirectory and I see that Jetpack, Yoast, Elementor, WPForms, WordFence, and many others use custom tables as well.

Not really relevant for the core schema.

@westonruter
Copy link
Member

Not really relevant for the core schema.

Seems relevant to me, as plugins may opt to introduce new tables if the core schema doesn't provide an efficient way to store the desired data. In the same way, if a new feature for core can't be represented efficiently in the core schema, then so too should a new table be considered.

@peterwilsoncc
Copy link
Contributor

@westonruter Well, no, because my point wasn't about whether the tables would be more performant (I said it's probably a good idea in my comment), my point was that Matt has a reluctance to add more tables to the database schema. Whether that reluctance still remains is worth asking but if it does then the plugin actions aren't relevant to core.

josephfusco and others added 5 commits March 3, 2026 11:22
Co-authored-by: Peter Wilson <519727+peterwilsoncc@users.noreply.github.com>
Adds a db_version check before registering the collaboration REST
routes so sites that haven't run the upgrade routine yet don't hit
a fatal error from the missing sync_updates table.
Schedule a daily cron job to delete sync update rows older than
1 day, preventing unbounded table growth from abandoned
collaborative editing sessions. The expiration is filterable
via the wp_sync_updates_expiration hook.
Store the room identifier string directly instead of its md5 hash.
Room strings like "postType/post:42" are short, already validated by
the REST API schema, and storing them verbatim improves debuggability.
Renames all test_sync_* methods to test_collaboration_* and the
dispatch_sync helper to dispatch_collaboration for consistency
with the subsystem rename. Deprecated route test names are
intentionally preserved as test_deprecated_sync_route_*.
DO_NOT_RELEASE_prove-data-loss.php →
DO_NOT_RELEASE_prove-beta3-post_meta_data_loss.php
josephfusco and others added 18 commits March 5, 2026 13:50
…ion-server.php

Co-authored-by: Weston Ruter <westonruter@gmail.com>
Co-authored-by: Weston Ruter <westonruter@gmail.com>
…ion-server.php

Co-authored-by: Weston Ruter <westonruter@gmail.com>
wp_collaboration_inject_setting was hooked to admin_init, but
wp-core-data is not registered at that point, causing
wp_add_inline_script to silently fail. Move the hook to
enqueue_block_editor_assets where the handle is guaranteed to exist.

Also replace the fragile form-scraping setCollaboration helper with
requestUtils.updateSiteSettings and tolerate pre-existing test users
so interrupted runs do not break subsequent test executions.
Use WP_CLI::log status lines consistent with the performance
benchmark script instead of a progress bar.
Change the (room, event_type, client_id) index to UNIQUE and replace the
DELETE + INSERT pair in set_awareness_state() with a single INSERT … ON
DUPLICATE KEY UPDATE.  The row is never absent between statements, which
eliminates the window where a concurrent get_awareness_state() could see
zero rows for a client and prevents duplicate rows from two concurrent
set_awareness_state() calls.
Drop the inline DELETE from get_awareness_state().  The WHERE clause
already filters expired rows from results, so the DELETE was redundant
for correctness.  Removing it makes the read path truly read-only, which
matters because check_permissions() calls get_awareness_state() on every
poll — permission callbacks should not perform writes.

Cleanup of expired rows is handled by cron (wp_delete_old_collaboration_data)
and natural overwrite via the UPSERT in set_awareness_state().
- Update wp_delete_old_collaboration_data() docblock to mention awareness
  row cleanup alongside sync-update cleanup.
- Add inline comment explaining the 60-second threshold (2× the 30-second
  awareness timeout).
- Rename test to test_collaboration_awareness_preserved_across_separate_upserts
  to accurately describe what sequential dispatch proves.
- Update test_collaboration_expired_awareness_rows_cleaned_up to verify that
  expired rows are filtered from results (not returned) rather than deleted
  inline, and separately assert that wp_delete_old_collaboration_data()
  handles actual deletion.
Split awareness into its own table so the two concerns have independent
schemas matching their actual usage: wp_collaboration is append-only sync
updates (no client_id, no event_type), wp_awareness is per-client upsert
rows with a UNIQUE KEY on (room, client_id).
Update all storage class methods to target the correct table:
awareness reads/writes go to wp_awareness, sync update queries drop the
now-unnecessary event_type filter. Cron cleanup splits its DELETEs
across both tables with their respective retention windows.
Clear wp_awareness in setUp, remove event_type from test inserts,
add get_awareness_row_count() helper, and update cross-table assertions
to count each table independently.
@westonruter
Copy link
Member

westonruter commented Mar 7, 2026

@josephfusco It looks like the types can be made more specific. For example:

diff --git a/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php b/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
index 894777e608..8c62cc1ab0 100644
--- a/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
+++ b/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
@@ -15,6 +15,8 @@
  * @since 7.0.0
  *
  * @access private
+ *
+ * @phpstan-import-type AwarenessState from WP_Collaboration_Storage
  */
 class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
 	/**
@@ -73,7 +75,8 @@ class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
 	 *
 	 * @param string $room    Room identifier.
 	 * @param int    $timeout Seconds before an awareness entry is considered expired.
-	 * @return array<int, array{client_id: int, state: mixed, wp_user_id: int}> Awareness entries.
+	 * @return array<int, array> Awareness entries.
+	 * @phpstan-return array<int, AwarenessState>
 	 */
 	public function get_awareness_state( string $room, int $timeout = 30 ): array {
 		global $wpdb;
@@ -241,10 +244,10 @@ class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
 	 *
 	 * @global wpdb $wpdb WordPress database abstraction object.
 	 *
-	 * @param string $room       Room identifier.
-	 * @param int    $client_id  Client identifier.
-	 * @param array  $state      Serializable awareness state for this client.
-	 * @param int    $wp_user_id WordPress user ID that owns this client.
+	 * @param string               $room       Room identifier.
+	 * @param int                  $client_id  Client identifier.
+	 * @param array<string, mixed> $state      Serializable awareness state for this client.
+	 * @param int                  $wp_user_id WordPress user ID that owns this client.
 	 * @return bool True on success, false on failure.
 	 */
 	public function set_awareness_state( string $room, int $client_id, array $state, int $wp_user_id ): bool {
diff --git a/src/wp-includes/collaboration/interface-wp-collaboration-storage.php b/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
index 9550384da5..13dec0d8da 100644
--- a/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
+++ b/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
@@ -10,6 +10,8 @@
  * Interface for collaboration storage backends used by the collaborative editing server.
  *
  * @since 7.0.0
+ *
+ * @phpstan-type AwarenessState array{client_id: int, state: array<string, mixed>, wp_user_id: int}
  */
 interface WP_Collaboration_Storage {
 	/**
@@ -32,7 +34,8 @@ interface WP_Collaboration_Storage {
 	 *
 	 * @param string $room    Room identifier.
 	 * @param int    $timeout Seconds before an awareness entry is considered expired.
-	 * @return array<int, array{client_id: int, state: mixed, wp_user_id: int}> Awareness entries.
+	 * @return array<int, array> Awareness entries.
+	 * @phpstan-return array<int, AwarenessState>
 	 */
 	public function get_awareness_state( string $room, int $timeout = 30 ): array;
 
@@ -86,10 +89,10 @@ interface WP_Collaboration_Storage {
 	 *
 	 * @since 7.0.0
 	 *
-	 * @param string $room       Room identifier.
-	 * @param int    $client_id  Client identifier.
-	 * @param array  $state      Serializable awareness state for this client.
-	 * @param int    $wp_user_id WordPress user ID that owns this client.
+	 * @param string               $room       Room identifier.
+	 * @param int                  $client_id  Client identifier.
+	 * @param array<string, mixed> $state      Serializable awareness state for this client.
+	 * @param int                  $wp_user_id WordPress user ID that owns this client.
 	 * @return bool True on success, false on failure.
 	 */
 	public function set_awareness_state( string $room, int $client_id, array $state, int $wp_user_id ): bool;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants