Skip to content

Add Louvain community detection algorithm#453

Open
Becheler wants to merge 28 commits intoboostorg:developfrom
Becheler:feature/louvain-algorithm
Open

Add Louvain community detection algorithm#453
Becheler wants to merge 28 commits intoboostorg:developfrom
Becheler:feature/louvain-algorithm

Conversation

@Becheler
Copy link
Contributor

@Becheler Becheler commented Feb 2, 2026

Implement #447

  • Multi-level modularity optimization following Blondel et al. (2008)
  • Supports custom quality functions (other than modularity) with policy-based design (extensions to come to propose alternative quality functions to match gen-louvain)
  • Incremental quality tracking
  • Lazy rollback
  • Vertex shuffling
  • Competitive with established implementation, see benchmarks here

Review-driven changes implemented in this PR:

  • Keep revision history / metadata in git (remove embedded revision-history metadata)
  • Define Boost concepts for user-provided graph partition quality functions (including incremental variant)
  • Allow users to provide a quality function (support both naive and incremental implementations)
  • Make RNG configurable by the user (URBG-style), with a default
  • Use BOOST_ASSERT instead of assert
  • Fix size()/loop index type mismatch (std::size_t)
  • Return vertex partitions as contiguous integral labels (match BGL interfaces)
  • Use index-based storage instead of hash maps to access vertices (generic across graph types)
  • Update the documentation to reflect those changes
  • Provide the documentation requested by Joaquin (louvain_clustering, louvain_quality_functions)
  • Rework unfold helper/type based on reviewer suggestion (free function in detail namespace; avoid _t suffix for classes)
  • Assert/check add_edge insertion success when using add_edge return value

Review items still open:

  • Make internal containers fully user-configurable via templating the map type (avoid hard-coding std::map/std::set; consider flat_map default)

I reviewed allocating container in the file (4 maps/sets and ~13 vectors). They all live in louvain_detail internals or are local scratch inside the public function, they do not appear in the API signature.

The maps operate on types created by the internal aggregate() (agg_vertex_t, community_type) that the user never sees. The set only allocates in the slow/non-incremental fallback path. The final remap is O(k) and one-shot and exists only to give meaningful contiguous labels. The vectors back iterator_property_map, which requires random-access iterators.

All maps/sets already use boost::unordered_flat_map/unordered_flat_set so the flat_map suggestion is effectively already done.

Templating the container type would add a template parameter that leaks internal types into the public API for no user-observable benefit : the containers are transient scratch rebuilt each pass, not long-lived state.

In short I would rather keep the API simple for now and avoid making too much noise (the genericity level looks already fairly complex to me), but I would be happy to revisit if a concrete use case comes up. I would rather focus on a next PR about more direct benefits:

  • Supporting undirected CSR graphs
  • Supporting directed graphs (requires quality function tweaks)
  • Supporting generic termination criterion as different application domains may require different stopping conditions, including fixed gain thresholds (Campigotto et al., 2014), threshold scaling (Halappanavar et al., 2017), decisions learned from gain decay patterns, or number of vertices moved, or a mix of those.

@jeremy-murphy
Copy link
Collaborator

Not sure why this error is occurring only for OSX 11.7 C++14. I assume this is not specific to your code.

In file included from ../../../boost/container/detail/operator_new_helpers.hpp:26:
../../../boost/container/detail/aligned_allocation.hpp:87:26: error: no member named 'aligned_alloc' in the global namespace; did you mean 'aligned_allocate'?
   return rounded_size ? ::aligned_alloc(al, rounded_size) : 0;
                         ^~~~~~~~~~~~~~~
                         aligned_allocate
../../../boost/container/detail/aligned_allocation.hpp:77:14: note: 'aligned_allocate' declared here
inline void* aligned_allocate(std::size_t al, std::size_t sz)
             ^
1 error generated.

Copy link
Collaborator

@jeremy-murphy jeremy-murphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, more later.

Comment on lines +43 to +44
std::map<VertexDescriptor, WeightType> internal_weights;
std::map<VertexDescriptor, std::set<VertexDescriptor>> vertex_mapping;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally try to avoid hard-coding the choice of data structure, especially std::map and std::set, so instead of templating VertexDescriptor and WeightType we should template the whole map type, so users can use boost::unordered_map or some other kind of property map of their own choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhhh I see. I felt it was not in the BGL spirit to do so, and Joaquin also mentioned it, but I was not sure how to solve it without making the API heavy. Can we default to a concrete type to simplify user's experience ? Also, is it ok to use boost::unordered_map if it adds the constraint of key_type being hashable ? That was my idea behind using std::map

Copy link
Collaborator

@jeremy-murphy jeremy-murphy Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, we can still (and should) make the user experience nice with defaults. That's the great thing, we get both benefits, the cost is more work on the part of the library authors. :)
Again, I think astar is an example, but instead of using param = arg in the function definition, make an overload, like so:

auto user_friendly_foo(graph const &g) {
ConcreteA a;
ConcreteB b;
return generic_foo(g, a, b);
}

Sorry, typing on my phone, so please ignore random syntax errors.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, yeah, we probably shouldn't add new constraints into the default interface. Users will too easily assume that the constraint is mandatory.
So maybe use boost::flat_map as the default and users can always use a hash map if they want to.

Comment on lines +68 to +70
std::set<community_type> unique_communities;
std::map<community_type, vertex_descriptor> comm_to_vertex;
std::map<vertex_descriptor, std::set<vertex_descriptor>> vertex_to_originals;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should almost certainly be input parameters taken by non-const reference so that the user a) decides their type and b) automatically gets their value at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So they gain access to the whole hierarchy. Ufff it's a lot of guts leaking out haha
Will do, and tell you in case of problemsm thanks again for your time !

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong. But have a look at the astar API for some examples of prior art.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And on second thought, this is not a priority, we can always do it later. Getting it correct and fast are higher priorities.

this was not supposed to be commited :)
this was not supposed to be commited :)
// L_c = internal edge weight for community c
// k_c = sum of degrees in community c
// m = total edge weight / 2
struct newman_and_girvan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this modularity thing a general concept or does it apply to Louvain only?

Copy link
Contributor Author

@Becheler Becheler Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a yes and no situation.

  • The modularity can be used outside of louvain to assess partition quality of a graph.
  • But the current implementation with incremental computations (remove, insert, gain) is particularly suited for Louvain.
  • Making it generally useful would require to disentangle the two aspects.
  • But I would rather do it in a clustering folder so we can have:
include/boost/graph/clustering/
├── quality_functions.hpp        # 10 incremental metrics/criterions for gen-louvain
├── label_propagation.hpp      # another clustering method (future work)
├── leiden.hpp                          # Leiden algorithm (future work)
├── louvain.hpp                        # Louvain algorithm
├── girvan_newman.hpp         # Edge betweenness clustering (currently in bc_clustering.hpp)

Copy link
Collaborator

@jeremy-murphy jeremy-murphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more requests, still going...


<H3>Parameters</H3>

IN: <tt>const Graph&amp; g</tt>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm has the additional requirement that vertices are copyable, hashable etc., as they're internally stored in unordered_sets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I have been changing the vertices handling in this aspect because it was not friendly with some types of graphs. The interface now takes a VertexIndexMap but I still have to commit those changes, sorry 😓
I will update the documentation in that sense once I merged the new stuff

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still an open issue or resolved?

@jeremy-murphy
Copy link
Collaborator

Btw, are the benchmarks in the description current? I mean, did the performance change at all since you swapped some standard map containers for Boost ones?
I'm curious to know if it made a difference but in the description we just want whatever is the latest benchmark.

@Becheler
Copy link
Contributor Author

Becheler commented Feb 24, 2026

@jeremy-murphy

  • I have this benchmark repository here: https://github.com/Becheler/boost-graph-benchmarks/tree/main/louvain
  • they are run locally then results pushed, but we chatted with Joaquin about making it a CI thing
  • switching hashmaps did not improve performance, but I have been switching to indexing anyway because it was more generic for some types of graphs
  • I haven't done a full benchmark since the last changes (its a few hours of computation lol) so the figures are somewhat outdated
  • The seemingly really fast performance gain between genlouvain and igraph comes from a different tradeoff between termination criterion and partition quality:
    • I compared bgl 0 threshold against genlouvain 10-6 threshold.
    • I aligned their behavior along the same total number of passes
    • the decreasing curves answer "what percentage of total gain is still unrealized after n passes"
    • the little bars answer "how much this pass contributed"
    • and you can see the 5 levels of aggregation as the algorithm tries to find the best partition into communities
    • what is interesting is that local optimization in the first level eats most of the passes for very little gain.
    • min_inner_threshold affect the number of passes spent inside each level
    • min_outer_threshold affect the total number of levels explored
    • And so the graph shows that (for this graph case) min_inner_threshold is the main driver of the number of iterations:
image

@jeremy-murphy
Copy link
Collaborator

Btw, you can use - [ ] and - [x] in the markup to make a todo list and tick things off as they're done. It will make it easier for us as reviewers to know where you are up to. E.g.:

  • done
  • not done

Comment on lines +12 to +18
#ifndef BOOST_GRAPH_LOUVAIN_TRUST_AGGREGATED_Q
#define BOOST_GRAPH_LOUVAIN_TRUST_AGGREGATED_Q 0
#endif

#ifndef BOOST_GRAPH_LOUVAIN_TRACK_PEAK_Q
#define BOOST_GRAPH_LOUVAIN_TRACK_PEAK_Q 0
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I only just discovered these macros, sorry, and I'm just wondering if they are just meant for private experimental purposes or if users would want to use them too?

If they're for private experiments, I think I would prefer they were kept on a private branch. I mean, I think we should only merge into develop the code that is useful to everyone.

If they are useful to everyone, then they need to be documented and I don't think macros are really OK.

I think once this is resolved, then we're good to merge. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants