topology: make for_each_node_with_cpus() O(N)
for_each_node_with_cpus() calls nr_cpus_node() at every iteration, which
makes it O(N^2). Kernel tracks such nodes with N_CPU record in node_states
array. Switching to it makes for_each_node_with_cpus() O(N).
Andrea:
Now we can include also offline nodes with CPUs assigned (assuming it's
possible). If checking the online state is required, the user must use
node_online() within the loop.
CC: Andrea Righi <arighi@nvidia.com>
CC:Tejun Heo <tj@kernel.org>
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>