Why are neural networks' initial weights initialized as random numbers?
Question
Why are neural networks' initial weights initialized as random numbers? I had read somewhere that this is done to "break the symmetry" and that this makes the neural network learn faster. How does breaking the symmetry make it learn faster?
Wouldn't initializing the weights to 0 be a better idea? That way the weights would be able to find their values (whether positive or negative) faster?
Is there some other underlying philosophy behind randomizing the weights apart from hoping that they would be near their optimum values when initialized?
Explanation / Answer
The basic intuition behind initializing the weights to small (and different) values is to break the symmetry of the system, so that individual weights are free to move apart and settle on different values.
More concretely, you'd want your initial weights to be distinct, with a small gap between them. Those differences grow during training as each weight receives its own gradient, which lets the units specialize and helps the network converge faster, i.e. the learning process speeds up.
If you instead set all the weights to the same constant, every unit in a layer computes the same output and therefore receives the same gradient update, so the units remain identical copies of one another. That doesn't help much, especially if the initial values are very far from the final ones.
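Here is a minimal sketch (a hypothetical toy network in NumPy, with made-up shapes and data) that illustrates the symmetry problem: with constant initial weights, every hidden unit receives exactly the same gradient, whereas small random weights give each unit its own gradient so they can specialize.

```python
# Toy demonstration of symmetry breaking in weight initialization.
# All names and shapes here are illustrative, not from the original post.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # toy inputs: 8 samples, 3 features
y = rng.normal(size=(8, 1))          # toy regression targets

def hidden_gradients(W1, W2):
    """Gradient of a squared-error loss w.r.t. the first-layer weights
    of a 2-layer tanh network (manual backprop)."""
    H = np.tanh(X @ W1)               # hidden activations, shape (8, 4)
    pred = H @ W2                     # network output, shape (8, 1)
    d_pred = 2 * (pred - y) / len(X)  # dLoss/dpred
    d_H = (d_pred @ W2.T) * (1 - H**2)
    return X.T @ d_H                  # dLoss/dW1, shape (3, 4)

# Constant initialization: every hidden unit (column) gets the same gradient,
# so the units stay identical forever.
W1_const = np.full((3, 4), 0.5)
W2_const = np.full((4, 1), 0.5)
g_const = hidden_gradients(W1_const, W2_const)
print(np.allclose(g_const, g_const[:, :1]))   # True -> symmetric updates

# Small random initialization: gradients differ across units,
# so they can move apart and learn different features.
W1_rand = 0.01 * rng.normal(size=(3, 4))
W2_rand = 0.01 * rng.normal(size=(4, 1))
g_rand = hidden_gradients(W1_rand, W2_rand)
print(np.allclose(g_rand, g_rand[:, :1]))     # False -> symmetry broken
```

Note that zero initialization is the worst case of this: with all weights at 0, the gradient flowing back to the first layer is exactly zero as well, so the hidden layer would not move at all.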