Add Louvain community detection algorithm#453
Add Louvain community detection algorithm#453Becheler wants to merge 28 commits intoboostorg:developfrom
Conversation
|
Not sure why this error is occurring only for OSX 11.7 C++14. I assume this is not specific to your code. |
jeremy-murphy
left a comment
There was a problem hiding this comment.
Just a few comments, more later.
| std::map<VertexDescriptor, WeightType> internal_weights; | ||
| std::map<VertexDescriptor, std::set<VertexDescriptor>> vertex_mapping; |
There was a problem hiding this comment.
We generally try to avoid hard-coding the choice of data structure, especially std::map and std::set, so instead of templating VertexDescriptor and WeightType we should template the whole map type, so users can use boost::unordered_map or some other kind of property map of their own choice.
There was a problem hiding this comment.
Mhhh I see. I felt it was not in the BGL spirit to do so, and Joaquin also mentioned it, but I was not sure how to solve it without making the API heavy. Can we default to a concrete type to simplify user's experience ? Also, is it ok to use boost::unordered_map if it adds the constraint of key_type being hashable ? That was my idea behind using std::map
There was a problem hiding this comment.
Oh yes, we can still (and should) make the user experience nice with defaults. That's the great thing, we get both benefits, the cost is more work on the part of the library authors. :)
Again, I think astar is an example, but instead of using param = arg in the function definition, make an overload, like so:
auto user_friendly_foo(graph const &g) {
ConcreteA a;
ConcreteB b;
return generic_foo(g, a, b);
}
Sorry, typing on my phone, so please ignore random syntax errors.
There was a problem hiding this comment.
Umm, yeah, we probably shouldn't add new constraints into the default interface. Users will too easily assume that the constraint is mandatory.
So maybe use boost::flat_map as the default and users can always use a hash map if they want to.
| std::set<community_type> unique_communities; | ||
| std::map<community_type, vertex_descriptor> comm_to_vertex; | ||
| std::map<vertex_descriptor, std::set<vertex_descriptor>> vertex_to_originals; |
There was a problem hiding this comment.
These should almost certainly be input parameters taken by non-const reference so that the user a) decides their type and b) automatically gets their value at the end.
There was a problem hiding this comment.
So they gain access to the whole hierarchy. Ufff it's a lot of guts leaking out haha
Will do, and tell you in case of problemsm thanks again for your time !
There was a problem hiding this comment.
I could be wrong. But have a look at the astar API for some examples of prior art.
There was a problem hiding this comment.
And on second thought, this is not a priority, we can always do it later. Getting it correct and fast are higher priorities.
this was not supposed to be commited :)
this was not supposed to be commited :)
| // L_c = internal edge weight for community c | ||
| // k_c = sum of degrees in community c | ||
| // m = total edge weight / 2 | ||
| struct newman_and_girvan |
There was a problem hiding this comment.
Is this modularity thing a general concept or does it apply to Louvain only?
There was a problem hiding this comment.
This is a yes and no situation.
- The modularity can be used outside of louvain to assess partition quality of a graph.
- But the current implementation with incremental computations (
remove,insert,gain) is particularly suited for Louvain. - Making it generally useful would require to disentangle the two aspects.
- But I would rather do it in a
clusteringfolder so we can have:
include/boost/graph/clustering/
├── quality_functions.hpp # 10 incremental metrics/criterions for gen-louvain
├── label_propagation.hpp # another clustering method (future work)
├── leiden.hpp # Leiden algorithm (future work)
├── louvain.hpp # Louvain algorithm
├── girvan_newman.hpp # Edge betweenness clustering (currently in bc_clustering.hpp)
jeremy-murphy
left a comment
There was a problem hiding this comment.
Couple more requests, still going...
…, type mismatch in for loop
|
|
||
| <H3>Parameters</H3> | ||
|
|
||
| IN: <tt>const Graph& g</tt> |
There was a problem hiding this comment.
The algorithm has the additional requirement that vertices are copyable, hashable etc., as they're internally stored in unordered_sets.
There was a problem hiding this comment.
You're right. I have been changing the vertices handling in this aspect because it was not friendly with some types of graphs. The interface now takes a VertexIndexMap but I still have to commit those changes, sorry 😓
I will update the documentation in that sense once I merged the new stuff
There was a problem hiding this comment.
Is this still an open issue or resolved?
|
Btw, are the benchmarks in the description current? I mean, did the performance change at all since you swapped some standard map containers for Boost ones? |
|
|
Btw, you can use - [ ] and - [x] in the markup to make a todo list and tick things off as they're done. It will make it easier for us as reviewers to know where you are up to. E.g.:
|
| #ifndef BOOST_GRAPH_LOUVAIN_TRUST_AGGREGATED_Q | ||
| #define BOOST_GRAPH_LOUVAIN_TRUST_AGGREGATED_Q 0 | ||
| #endif | ||
|
|
||
| #ifndef BOOST_GRAPH_LOUVAIN_TRACK_PEAK_Q | ||
| #define BOOST_GRAPH_LOUVAIN_TRACK_PEAK_Q 0 | ||
| #endif |
There was a problem hiding this comment.
Ahh, I only just discovered these macros, sorry, and I'm just wondering if they are just meant for private experimental purposes or if users would want to use them too?
If they're for private experiments, I think I would prefer they were kept on a private branch. I mean, I think we should only merge into develop the code that is useful to everyone.
If they are useful to everyone, then they need to be documented and I don't think macros are really OK.
I think once this is resolved, then we're good to merge. :)

Implement #447
Review-driven changes implemented in this PR:
Review items still open:
I reviewed allocating container in the file (4 maps/sets and ~13 vectors). They all live in
louvain_detailinternals or are local scratch inside the public function, they do not appear in the API signature.The maps operate on types created by the internal
aggregate()(agg_vertex_t,community_type) that the user never sees. The set only allocates in the slow/non-incremental fallback path. The final remap is O(k) and one-shot and exists only to give meaningful contiguous labels. The vectors backiterator_property_map, which requires random-access iterators.All maps/sets already use
boost::unordered_flat_map/unordered_flat_setso theflat_mapsuggestion is effectively already done.Templating the container type would add a template parameter that leaks internal types into the public API for no user-observable benefit : the containers are transient scratch rebuilt each pass, not long-lived state.
In short I would rather keep the API simple for now and avoid making too much noise (the genericity level looks already fairly complex to me), but I would be happy to revisit if a concrete use case comes up. I would rather focus on a next PR about more direct benefits: