Replace post meta sync storage with dedicated wp_collaboration table#11068
Replace post meta sync storage with dedicated wp_collaboration table#11068josephfusco wants to merge 72 commits intoWordPress:trunkfrom
wp_collaboration table#11068Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
sync_updates tablesync_updates table
peterwilsoncc
left a comment
There was a problem hiding this comment.
First pass note inline.
This approach had occurred to me and I think it's probably quite a good idea.
But, as always, there is a but.
I know @m has been reluctant in the past to add new tables to the database schema. I guess (and Matt will be able to correct me) that's to avoid ending up with a database scheme of dozens and dozens of tables.
As not all sites run the database upgrade routine on a regular basis (I think wordpress.org is often behind) as it can be quite a burden on a site, if this approach is to be taken it would need to include a check for the table's existence in a couple of locations:
- The
option_{$option}hook - Before displaying the options field on the writing page
Some form of filtering would need to be available for sites that are using custom data stores.
Sorry to be that person. I'm not trying to be negative, just share knowledge I've learnt over the last few years.
|
@peterwilsoncc Thanks for the thorough review — please don't apologize, this is exactly the kind of institutional knowledge that's invaluable.
Completely agree. I believe I've audited every code path that reads
When the setting is saved, the flow through options.php → update_option() writes to the wp_options table. No code is hooked onto that option change that would interact with the sync_updates table, so toggling the setting is safe regardless of table state.
The checkbox in The code path that touches the table is the REST route registration in
The Let me know if I've missed anything or if you think additional safeguards would be worthwhile! |
0104993 to
c13f1cd
Compare
Yeah, this has been an issue, but I think it's primarily/only really an issue for database tables that are used for frontend features. Given that the proposed
I haven't touched custom tables for a long time, so I'm not aware of the state of the art here. But I do know that WooCommerce made the switch to using custom tables instead of postmeta for the sake of scalabilitiy, simplicity, and reliability. Doing a quick search in WPDirectory and I see that Jetpack, Yoast, Elementor, WPForms, WordFence, and many others use custom tables as well. |
This comment was marked as off-topic.
This comment was marked as off-topic.
355e7f2 to
7f57c63
Compare
This isn't quite correct, on multi site installs site (ie, non super admins) don't get prompted to upgrade the database tables and globally upgrading tables by super admins can be prevented using a filter: add_filter( 'map_meta_cap', function( $caps, $cap ) {
if ( 'upgrade_network' === $cap ) {
$caps[] = 'do_not_allow';
}
return $caps;
}, 10, 4 );To reproduce:
Keep in mind that the sub site admin can enable site RTC on the writing page, even though they can't update the tables.
Not really relevant for the core schema. |
Seems relevant to me, as plugins may opt to introduce new tables if the core schema doesn't provide an efficient way to store the desired data. In the same way, if a new feature for core can't be represented efficiently in the core schema, then so too should a new table be considered. |
|
@westonruter Well, no, because my point wasn't about whether the tables would be more performant (I said it's probably a good idea in my comment), my point was that Matt has a reluctance to add more tables to the database schema. Whether that reluctance still remains is worth asking but if it does then the plugin actions aren't relevant to core. |
Co-authored-by: Peter Wilson <519727+peterwilsoncc@users.noreply.github.com>
Adds a db_version check before registering the collaboration REST routes so sites that haven't run the upgrade routine yet don't hit a fatal error from the missing sync_updates table.
Schedule a daily cron job to delete sync update rows older than 1 day, preventing unbounded table growth from abandoned collaborative editing sessions. The expiration is filterable via the wp_sync_updates_expiration hook.
Store the room identifier string directly instead of its md5 hash. Room strings like "postType/post:42" are short, already validated by the REST API schema, and storing them verbatim improves debuggability.
Renames all test_sync_* methods to test_collaboration_* and the dispatch_sync helper to dispatch_collaboration for consistency with the subsystem rename. Deprecated route test names are intentionally preserved as test_deprecated_sync_route_*.
DO_NOT_RELEASE_prove-data-loss.php → DO_NOT_RELEASE_prove-beta3-post_meta_data_loss.php
src/wp-includes/collaboration/class-wp-http-polling-collaboration-server.php
Outdated
Show resolved
Hide resolved
src/wp-includes/collaboration/class-wp-http-polling-collaboration-server.php
Outdated
Show resolved
Hide resolved
src/wp-includes/collaboration/class-wp-http-polling-collaboration-server.php
Show resolved
Hide resolved
…ion-server.php Co-authored-by: Weston Ruter <westonruter@gmail.com>
Co-authored-by: Weston Ruter <westonruter@gmail.com>
…ion-server.php Co-authored-by: Weston Ruter <westonruter@gmail.com>
wp_collaboration_inject_setting was hooked to admin_init, but wp-core-data is not registered at that point, causing wp_add_inline_script to silently fail. Move the hook to enqueue_block_editor_assets where the handle is guaranteed to exist. Also replace the fragile form-scraping setCollaboration helper with requestUtils.updateSiteSettings and tolerate pre-existing test users so interrupted runs do not break subsequent test executions.
Use WP_CLI::log status lines consistent with the performance benchmark script instead of a progress bar.
Change the (room, event_type, client_id) index to UNIQUE and replace the DELETE + INSERT pair in set_awareness_state() with a single INSERT … ON DUPLICATE KEY UPDATE. The row is never absent between statements, which eliminates the window where a concurrent get_awareness_state() could see zero rows for a client and prevents duplicate rows from two concurrent set_awareness_state() calls.
Drop the inline DELETE from get_awareness_state(). The WHERE clause already filters expired rows from results, so the DELETE was redundant for correctness. Removing it makes the read path truly read-only, which matters because check_permissions() calls get_awareness_state() on every poll — permission callbacks should not perform writes. Cleanup of expired rows is handled by cron (wp_delete_old_collaboration_data) and natural overwrite via the UPSERT in set_awareness_state().
- Update wp_delete_old_collaboration_data() docblock to mention awareness row cleanup alongside sync-update cleanup. - Add inline comment explaining the 60-second threshold (2× the 30-second awareness timeout). - Rename test to test_collaboration_awareness_preserved_across_separate_upserts to accurately describe what sequential dispatch proves. - Update test_collaboration_expired_awareness_rows_cleaned_up to verify that expired rows are filtered from results (not returned) rather than deleted inline, and separately assert that wp_delete_old_collaboration_data() handles actual deletion.
Split awareness into its own table so the two concerns have independent schemas matching their actual usage: wp_collaboration is append-only sync updates (no client_id, no event_type), wp_awareness is per-client upsert rows with a UNIQUE KEY on (room, client_id).
Update all storage class methods to target the correct table: awareness reads/writes go to wp_awareness, sync update queries drop the now-unnecessary event_type filter. Cron cleanup splits its DELETEs across both tables with their respective retention windows.
Clear wp_awareness in setUp, remove event_type from test inserts, add get_awareness_row_count() helper, and update cross-table assertions to count each table independently.
|
@josephfusco It looks like the types can be made more specific. For example: diff --git a/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php b/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
index 894777e608..8c62cc1ab0 100644
--- a/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
+++ b/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
@@ -15,6 +15,8 @@
* @since 7.0.0
*
* @access private
+ *
+ * @phpstan-import-type AwarenessState from WP_Collaboration_Storage
*/
class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
/**
@@ -73,7 +75,8 @@ class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
*
* @param string $room Room identifier.
* @param int $timeout Seconds before an awareness entry is considered expired.
- * @return array<int, array{client_id: int, state: mixed, wp_user_id: int}> Awareness entries.
+ * @return array<int, array> Awareness entries.
+ * @phpstan-return array<int, AwarenessState>
*/
public function get_awareness_state( string $room, int $timeout = 30 ): array {
global $wpdb;
@@ -241,10 +244,10 @@ class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
*
* @global wpdb $wpdb WordPress database abstraction object.
*
- * @param string $room Room identifier.
- * @param int $client_id Client identifier.
- * @param array $state Serializable awareness state for this client.
- * @param int $wp_user_id WordPress user ID that owns this client.
+ * @param string $room Room identifier.
+ * @param int $client_id Client identifier.
+ * @param array<string, mixed> $state Serializable awareness state for this client.
+ * @param int $wp_user_id WordPress user ID that owns this client.
* @return bool True on success, false on failure.
*/
public function set_awareness_state( string $room, int $client_id, array $state, int $wp_user_id ): bool {
diff --git a/src/wp-includes/collaboration/interface-wp-collaboration-storage.php b/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
index 9550384da5..13dec0d8da 100644
--- a/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
+++ b/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
@@ -10,6 +10,8 @@
* Interface for collaboration storage backends used by the collaborative editing server.
*
* @since 7.0.0
+ *
+ * @phpstan-type AwarenessState array{client_id: int, state: array<string, mixed>, wp_user_id: int}
*/
interface WP_Collaboration_Storage {
/**
@@ -32,7 +34,8 @@ interface WP_Collaboration_Storage {
*
* @param string $room Room identifier.
* @param int $timeout Seconds before an awareness entry is considered expired.
- * @return array<int, array{client_id: int, state: mixed, wp_user_id: int}> Awareness entries.
+ * @return array<int, array> Awareness entries.
+ * @phpstan-return array<int, AwarenessState>
*/
public function get_awareness_state( string $room, int $timeout = 30 ): array;
@@ -86,10 +89,10 @@ interface WP_Collaboration_Storage {
*
* @since 7.0.0
*
- * @param string $room Room identifier.
- * @param int $client_id Client identifier.
- * @param array $state Serializable awareness state for this client.
- * @param int $wp_user_id WordPress user ID that owns this client.
+ * @param string $room Room identifier.
+ * @param int $client_id Client identifier.
+ * @param array<string, mixed> $state Serializable awareness state for this client.
+ * @param int $wp_user_id WordPress user ID that owns this client.
* @return bool True on success, false on failure.
*/
public function set_awareness_state( string $room, int $client_id, array $state, int $wp_user_id ): bool; |
src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
Outdated
Show resolved
Hide resolved
…rage.php Co-authored-by: Weston Ruter <westonruter@gmail.com>
…hfusco/wordpress-develop into feature/sync-updates-table
The real-time collaboration sync layer currently stores messages as post meta, which works but creates side effects at scale. This moves it to a dedicated
wp_collaborationtable purpose-built for the workload.Additionally, the subsystem has been renamed from 'sync' to 'collaboration' throughout. A deprecated
wp-sync/v1REST route alias is preserved for backward compatibility with the Gutenberg plugin during its transition.The beta 1–3 implementation stores sync messages as post meta on a private
wp_sync_storagepost type. Post meta is designed for static key-value data, not high-frequency transient message passing. This mismatch causes:1. Data loss during compaction
When the sync log gets long, a client cleans it up by deleting all messages then re-inserting the ones that still matter. Between delete and re-insert, the history is empty. Any editor polling during that window sees nothing; any edit sent during it is silently dropped. Editors poll multiple times per second, so the gap gets hit regularly with two or more people editing.
Reproduce with the included script:
2. Silent cursor collisions
Each sync message gets a millisecond timestamp as its ID. When two editors send updates within the same millisecond, both messages get the same ID. The second one is skipped. The edit vanishes with no error.
3. Cache thrashing
Every sync write touches post meta. Every post meta write invalidates the site-wide post query cache (
wp_cache_set_posts_last_changed()). One open editor tab continuously invalidates caches for every visitor on the entire site. Five active editors can cause 20–30 cache invalidations per second, degrading object caching for front-end traffic.Comparison
wp_collaborationtable)DELETE … WHERE cursor <posts_last_changedA purpose-built table with auto-increment IDs eliminates all three at the root: no post meta hooks fire, compaction is a single atomic
DELETE, and auto-increment IDs guarantee unique ordering. TheWP_Collaboration_Storageinterface andWP_HTTP_Polling_Collaboration_Serverare unchanged.Testing
Note for beta 1–3 testers: If upgrading from a previous beta, the old wp_sync_updates table is no longer used and can be dropped:
npm run env:cli -- db query "DROP TABLE IF EXISTS wp_sync_updates"Credits
meta_idcursor approach, awareness→transients, race condition testing (wordpress-develop #11067)meta_valueindexing limitationsTrac ticket: https://core.trac.wordpress.org/ticket/64696
Use of AI Tools
Co-authored with Claude Code (Opus 4.6), used to synthesize discussion across related tickets and PRs into a single implementation. All code was reviewed and tested before submission.
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.