Simplify the definition of the consume function #1544

Open
YexuanXiao wants to merge 1 commit into microsoft:master from YexuanXiao:consume-upstream

@YexuanXiao YexuanXiao commented Mar 6, 2026

This PR extracts duplicate code logic into a function to reduce header file size, PCH size, and compilation time.

#1442 and #1448 removed the WINRT_IMPL_SHIM macro, causing the generated code to become bloated.
I counted 33,597 occurrences of the following pattern in the Windows namespace, which significantly increases header file size and slows down compilation.

        if constexpr (!std::is_same_v<D, %>)
        {
            winrt::hresult _winrt_cast_result_code;
            auto const _winrt_casted_result = impl::try_as_with_reason<%, D const*>(static_cast<D const*>(this), _winrt_cast_result_code);
            check_hresult(_winrt_cast_result_code);
            auto const _winrt_abi_type = *(abi_t<%>**)&_winrt_casted_result;
            _winrt_abi_type->%(%);
        }
        else
        {
            auto const _winrt_abi_type = *(abi_t<%>**)this;
            _winrt_abi_type->%(%);
        }%

This patch does not introduce any new logic; it merely adds one function call on top of the original code. In my testing, the header files shrink from 80 MB to 60 MB, the PCH file from 2.35 GB to 2.2 GB, and the PCH build time from 120 seconds to 110 seconds.

Now, the above pattern is reduced to 1 line, repeated 33,597 times.

        consume_noexcept_remove_overload<%, D>(static_cast<D const*>(this), &abi_t<%>::%%);%
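To illustrate the shape of this refactoring, here is a minimal, self-contained sketch of how the `if constexpr` branch can move into a single helper template. It is not the code from this PR: `FakeAbi`, `IThing`, `Widget`, `consume_sketch`, `try_as_sketch`, and `check_sketch` are hypothetical stand-ins for `abi_t<%>`, the interface type, a runtime class, the new helper, `impl::try_as_with_reason`, and `check_hresult`. It assumes C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an ABI interface such as abi_t<%>.
struct FakeAbi
{
    int calls = 0;
    std::int32_t get_Value(std::int32_t* out) noexcept
    {
        ++calls;
        *out = 42;
        return 0;
    }
};

// Hypothetical stand-in for check_hresult: throw on a failure code.
inline void check_sketch(std::int32_t hr)
{
    if (hr < 0) throw std::runtime_error("cast failed");
}

struct IThing;  // plays the role of the interface type
struct Widget;  // plays the role of a runtime class that must be cast first

// Hypothetical stand-in for impl::try_as_with_reason.
FakeAbi* try_as_sketch(Widget const* w, std::int32_t& hr) noexcept;

// The shared helper: the branch that used to be repeated in every generated
// method now lives here once. Only the branch matching D is instantiated.
template <typename Interface, typename D, typename Method, typename... Args>
void consume_sketch(D const* self, Method method, Args&&... args)
{
    if constexpr (!std::is_same_v<D, Interface>)
    {
        std::int32_t hr = 0;
        FakeAbi* abi = try_as_sketch(self, hr);
        check_sketch(hr);
        (abi->*method)(std::forward<Args>(args)...);
    }
    else
    {
        (self->abi->*method)(std::forward<Args>(args)...);
    }
}

// CRTP consumer: each generated method body is now a single call.
template <typename D>
struct consume_IThing
{
    std::int32_t Value() const
    {
        std::int32_t result = 0;
        consume_sketch<IThing, D>(static_cast<D const*>(this), &FakeAbi::get_Value, &result);
        return result;
    }
};

struct IThing : consume_IThing<IThing> { FakeAbi* abi = nullptr; };
struct Widget : consume_IThing<Widget> { FakeAbi* thing_abi = nullptr; };

FakeAbi* try_as_sketch(Widget const* w, std::int32_t& hr) noexcept
{
    hr = w->thing_abi ? 0 : -1;  // -1 models a failed cast
    return w->thing_abi;
}

// Exercises both branches and returns how many ABI calls were made.
int run_sketch()
{
    FakeAbi abi;
    IThing thing;
    thing.abi = &abi;
    Widget widget;
    widget.thing_abi = &abi;
    std::int32_t sum = thing.Value() + widget.Value();
    return sum == 84 ? abi.calls : -1;
}
```

Calling `run_sketch()` goes through the direct branch once (via `IThing`) and the cast branch once (via `Widget`). Whether a helper like this is fully inlined in an optimized build depends on the compiler, so this sketch says nothing about the generated code.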

@YexuanXiao
Author

@jonwis @dmachaj

@dmachaj
Contributor

dmachaj commented Mar 6, 2026

My biggest concern would be binary size and/or additional function calls for the resulting module. The revised consume_ method needed several iterations for exactly those reasons. The generated code is called a lot so small differences can add up.

I would recommend compiling something with full optimizations enabled and then looking at the disassembly for the consume_ function (or, more likely, the calling function where it was inlined). There should not be a function call to consume_noexcept_remove_overload. Rather, it should be inlined, avoiding the cost of a function call.

Basically, the consume_noexcept_remove_overload function should be a compile-time concept that isn't observable in the final optimized binary.

@YexuanXiao
Author

YexuanXiao commented Mar 7, 2026

> My biggest concern would be binary size and/or additional function calls for the resulting module. The revised consume_ method needed several iterations for exactly those reasons. The generated code is called a lot so small differences can add up.
>
> I would recommend compiling something with full optimizations enabled and then looking at the disassembly for the consume_ function (or, more likely, the calling function where it was inlined). There should not be a function call to consume_noexcept_remove_overload. Rather, it should be inlined, avoiding the cost of a function call.
>
> Basically, the consume_noexcept_remove_overload function should be a compile-time concept that isn't observable in the final optimized binary.

I wrote two tests, one that calls uri.AbsoluteUri() once and another that calls uri.AbsoluteUri() five times in succession, using x64 Release. I manually verified the assembly, and each call was inlined. There are many factors that determine how the compiler decides to inline, but I believe this sufficiently demonstrates that the new function does not hinder optimization.

@jonwis
Member

jonwis commented Mar 7, 2026

There are a few other large cppwinrt-using projects out there. One of the larger ones, and one that exercises most of these tools, is https://github.com/CommunityToolkit/Lottie-Windows: lots of QueryInterface, string management, moving up and down the interface hierarchy, etc. Consider using SizeBench to compare the outputs pre and post. I'm less worried about clock time and more about the codegen. I buy that it's probably going to be identical.

@sylveon
Contributor

sylveon commented Mar 7, 2026

Plugging https://github.com/TranslucentTB/TranslucentTB as another large-ish cppwinrt-using project :)

@YexuanXiao
Author

I used the animation LottieLogo1.json and verified that it loads and displays properly in my program.

The SizeBench results are as follows:

[Image: SizeBench results screenshot, 2026-03-07 091445]

I'm not familiar with SizeBench. If anyone would like to examine my program's results more deeply, I can send the pdb and exe files.

@YexuanXiao
Author

YexuanXiao commented Mar 7, 2026

Since the caller (consume_Type::Method) needs to pass the implementation type's member pointer to the new function template, the caller is the sole user of the instantiated function.

I believe the size improvements in the result come from the new function adding an additional noexcept layer. The results were quite surprising: some functions were eliminated completely thanks to this optimization.

[Image: screenshot, 2026-03-07 093808]
