Enumerative model checking tools are limited by the size of the state space to which they can be applied. Reduction modulo branching bisimulation usually results in a much smaller state space and therefore enables model checking much larger states spaces. We present an algorithm for reducing state spaces modulo branching bisimulation which is suitable for distributed implementation. The algorithm is based on partition refinement. In order to allow for distributed implementations, it works on transition systems which contain cycles of invisible steps, without eliminating strongly connected components first. In order to avoid fine grained parallelism, the algorithm refines the whole partition instead of just a single block in the partition. We prove correctness and also present some experimental results obtained with single threaded and distributed prototypes.