Why is there a division by "num_nodes" in the KL divergence?