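Regarding layer sizes in a deep belief network (DBN):
Layers should not get smaller, meaning each hidden layer should have at least as many units as the layer below it. Each newly added layer should also be initialized correctly, with its weights starting as the transpose of the weights of the RBM below it. Under these conditions, adding a layer is guaranteed to improve a variational lower bound on the log probability of the training data.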
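The two conditions can be sketched in plain Python, assuming weights are stored as lists of rows with `W[i][j]` linking lower visible unit `i` to lower hidden unit `j`. `layer_sizes_ok` and `init_next_layer` are hypothetical helpers, not part of any library. With the transposed initialization, the new RBM starts with the same distribution over its visible layer as the lower RBM's prior over its hidden units. Giving any extra hidden units zero weights and zero biases is an assumption made here. Each such unit only multiplies every unnormalized probability by the same constant, so a larger layer can still start from the same model.

```python
def layer_sizes_ok(sizes):
    """True if no layer is smaller than the layer directly below it."""
    return all(upper >= lower for lower, upper in zip(sizes, sizes[1:]))


def init_next_layer(W, b_vis, b_hid, n_new_hidden):
    """Initialize an RBM stacked on top of a trained RBM (hypothetical helper).

    The new RBM's visible layer is the lower RBM's hidden layer, and its
    weights start as the transpose of W. Assumption: hidden units beyond the
    first len(W) get zero weights and zero biases.
    """
    n_vis, n_hid = len(W), len(W[0])
    if n_new_hidden < n_vis:
        raise ValueError("new layer too small for a transposed initialization")
    # new_W[j][k] links lower hidden unit j to new hidden unit k.
    new_W = [[W[k][j] if k < n_vis else 0.0 for k in range(n_new_hidden)]
             for j in range(n_hid)]
    new_b_vis = list(b_hid)  # new visible layer = old hidden layer
    new_b_hid = list(b_vis) + [0.0] * (n_new_hidden - n_vis)
    return new_W, new_b_vis, new_b_hid


# Usage: a 4-6-6 stack, where the second RBM is initialized from the first.
sizes = [4, 6, 6]
W = [[0.1 * (i + j) for j in range(6)] for i in range(4)]  # 4 x 6 weights
new_W, new_b_vis, new_b_hid = init_next_layer(W, [0.0] * 4, [0.5] * 6, 6)
```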
Whether to use binary states or probabilities for the visible and hidden units:
Only the first hidden states (h0, the hidden units driven by the data) are sampled as binary values. The others (v0, v1 and h1) use real-valued probabilities.
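One contrastive divergence (CD-1) step for a binary RBM shows where each choice applies. This is a minimal pure-Python sketch, and `cd1_update` is a hypothetical helper. h0 is sampled as binary states and drives the reconstruction. v0 is treated as real-valued probabilities (for example, pixel intensities). v1 and h1 are kept as probabilities without sampling. Using p(h0 | v0) rather than the binary h0 in the positive statistics is an additional assumption here. It is a common way to reduce sampling noise and does not change which values drive the reconstruction.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cd1_update(W, b_vis, b_hid, v0, lr=0.1, rng=random):
    """One CD-1 step for a binary RBM (hypothetical helper).

    W[i][j] links visible unit i to hidden unit j. v0 holds real-valued
    probabilities in [0, 1]. Updates W, b_vis and b_hid in place and
    returns (h0, v1, h1) for inspection.
    """
    n_vis, n_hid = len(W), len(W[0])
    # Hidden units driven by the data: compute probabilities, then sample binary states.
    ph0 = [sigmoid(b_hid[j] + sum(v0[i] * W[i][j] for i in range(n_vis)))
           for j in range(n_hid)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Reconstruction driven by the binary h0: keep probabilities, do not sample.
    v1 = [sigmoid(b_vis[i] + sum(h0[j] * W[i][j] for j in range(n_hid)))
          for i in range(n_vis)]
    # Hidden units driven by the reconstruction: probabilities only.
    h1 = [sigmoid(b_hid[j] + sum(v1[i] * W[i][j] for i in range(n_vis)))
          for j in range(n_hid)]
    # Positive statistics use p(h0 | v0); negative statistics use v1 and h1.
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * h1[j])
    for i in range(n_vis):
        b_vis[i] += lr * (v0[i] - v1[i])
    for j in range(n_hid):
        b_hid[j] += lr * (ph0[j] - h1[j])
    return h0, v1, h1


# Usage: small random weights break symmetry between hidden units.
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.01) for _ in range(3)] for _ in range(4)]
b_vis, b_hid = [0.0] * 4, [0.0] * 3
h0, v1, h1 = cd1_update(W, b_vis, b_hid, [1.0, 0.0, 0.8, 0.2], rng=rng)
```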
How to measure the quality of a model when the probability it assigns to the data is hard to compute because of the partition function: