The Residual Network (ResNet) Standard
ResNet, short for Residual Network, is a landmark model in the history of deep learning. Before ResNet, training very deep neural networks was extremely difficult because of the vanishing gradient problem: the signal used to update the weights becomes very small as it moves backward through many layers.
ResNet addresses this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around groups of layers instead of passing through all of them. Because of this, very deep networks, such as ResNet-50, ResNet-101, and ResNet-152, can be trained, even up to hundreds of layers.[1]
MATLAB Usage
In MATLAB, a pretrained ResNet-50 can be loaded with the resnet50 function. It is a common default choice for transfer learning experiments, because its behavior is well known and well understood.[2]
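As a rough sketch, the usual transfer-learning workflow replaces the 1000-class ImageNet head of the pretrained network with a new head for the target task. The layer names ('fc1000', 'ClassificationLayer_fc1000'), the number of classes, and the imdsTrain datastore below are illustrative assumptions rather than a fixed recipe; loading the model requires the Deep Learning Toolbox and the ResNet-50 support package.

% Sketch: transfer learning with a pretrained ResNet-50 in MATLAB.
% Layer names can be verified with analyzeNetwork(net) or by inspecting net.Layers.
net    = resnet50;                        % load the pretrained network
lgraph = layerGraph(net);                 % convert to an editable layer graph

numClasses = 5;                           % hypothetical number of target classes
lgraph = replaceLayer(lgraph, 'fc1000', ...
    fullyConnectedLayer(numClasses, 'Name', 'fc_new'));
lgraph = replaceLayer(lgraph, 'ClassificationLayer_fc1000', ...
    classificationLayer('Name', 'cls_new'));

% imdsTrain is a hypothetical imageDatastore holding the new training images.
options = trainingOptions('sgdm', 'InitialLearnRate', 1e-4, 'MaxEpochs', 5);
% trainedNet = trainNetwork(imdsTrain, lgraph, options);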
The Math Behind Vanishing Gradients and Training Problems
The central problem when training very deep neural networks is the vanishing gradient problem. The gradients computed during backpropagation become smaller and smaller as they move toward the earlier layers of the network, so the first layers almost stop learning.
This happens because of the chain rule. During backpropagation, the gradient at each layer is multiplied by the local derivatives of all the layers after it, so it becomes a product of many factors. When many of these factors are smaller than one, the product becomes tiny.
If the network uses activation functions that can saturate, or even ordinary non-linearities with unfavorable weight configurations, the gradients shrink very quickly and can decay toward zero exponentially. When this happens, the early layers of the model cannot learn properly, and the whole training process slows down dramatically or stops making progress.[3]
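As a back-of-the-envelope illustration, assuming the best case for a logistic sigmoid (whose derivative never exceeds 0.25), the gradient scale reaching the first layer can be bounded like this in plain MATLAB:

% The gradient reaching the first layer is (at best) a product of per-layer
% derivative factors; for a saturating sigmoid each factor is at most 0.25,
% so the product decays exponentially with depth.
sigmoidMaxSlope = 0.25;                   % maximum derivative of the logistic sigmoid
for depth = [5 20 50]
    gradScale = sigmoidMaxSlope ^ depth;  % best-case product of depth such factors
    fprintf('depth %3d: gradient scale <= %.3g\n', depth, gradScale);
end
% At depth 50 the bound is about 7.9e-31, so the earliest layers barely update.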
The researchers who created ResNet made an interesting observation along the way: a phenomenon called the degradation problem. At first, very deep networks were believed to fail only because of vanishing gradients. But even after the gradient problem was largely fixed with careful weight initialization and Batch Normalization (which keeps activations and gradients at a reasonable scale), very deep networks still performed worse than smaller ones.
For example, a 34-layer plain network had higher training error than an 18-layer one. This is strange, because the 34-layer network should be able to represent everything the 18-layer network can, and more. Yet in real experiments it trained worse and also had higher test error.
This observation is sometimes called the "ResNet Hypothesis." It shows that the problem is not just vanishing gradients. The real issue is that very deep networks are very hard to optimize: the learning process becomes slow and difficult, and the model does not converge easily in such a complex, non-linear space.[3]
Residual Learning
To address this optimization problem, ResNet introduced residual learning, implemented through skip, or shortcut, connections.
In a conventional deep network, many layers are stacked together and try to learn a mapping H(x) directly from the input x. The network must learn the full mapping by itself, which can be very hard when the network is very deep.

ResNet changes this idea. It adds a skip connection that jumps over one or more layers. Because of this shortcut, the network does not try to learn H(x) directly. Instead, it learns a residual function:

F(x) := H(x) - x

The output of the block becomes:

Y = F(x) + x
So instead of learning the whole mapping from scratch, the layers only learn the difference between the output and the input.
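As a minimal sketch in plain MATLAB, with W1 and W2 standing in as hypothetical weights of two stacked layers and max(x, 0) as the ReLU, the forward pass of one residual block looks like this:

% Forward pass of one residual block: the stacked layers compute the residual
% F(x), and the shortcut adds the unchanged input x back onto it.
n = 8;  h = 16;                      % arbitrary input and hidden sizes
x  = randn(n, 1);                    % block input
W1 = 0.1 * randn(h, n);              % hypothetical weights of the first layer
W2 = 0.1 * randn(n, h);              % hypothetical weights of the second layer

F = W2 * max(W1 * x, 0);             % residual branch: linear -> ReLU -> linear
Y = F + x;                           % skip connection: Y = F(x) + x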
The main advantage concerns optimization. If the best mapping is close to simply passing the input forward (an identity mapping), it is much easier for the nonlinear layers to learn F(x) = 0, which they can do by pushing their weights toward zero, than to force many nonlinear layers to behave exactly like an identity function.
Because of this, adding more layers should not make the network worse. In theory, a deeper network can always perform at least as well as a shallower one, because the extra blocks can simply learn F(x) = 0 and copy the shallower network's behavior. In practice, however, good training is still needed to make this work.[4]
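Continuing the same sketch, if training drives the residual weights to zero, the block reduces exactly to the identity mapping, which is what lets a deeper stack fall back to the behavior of a shallower one:

% A residual block with a zero residual branch passes its input through unchanged.
n = 8;  h = 16;
x  = randn(n, 1);
W1 = zeros(h, n);                    % residual weights pushed to zero
W2 = zeros(n, h);
Y  = W2 * max(W1 * x, 0) + x;        % F(x) = 0, so Y = x exactly
isequal(Y, x)                        % returns logical 1 (true)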
References
1. MathWorks. Train Residual Network for Image Classification. MATLAB Documentation. Available at: https://www.mathworks.com/help/deeplearning/ug/train-residual-network-for-image-classification.html
2. MathWorks. Pretrained Convolutional Neural Networks. MATLAB Documentation. Available at: https://www.mathworks.com/help/deeplearning/ug/pretrained-convolutional-neural-networks.html
3. Xu, G., Wang, X., Wu, X., Leng, X. and Xu, Y., 2024. Development of skip connection in deep neural networks for computer vision and medical image analysis: A survey. arXiv preprint arXiv:2405.01725.
4. Sandushi W. Understanding ResNet-50: Solving the Vanishing Gradient Problem with Skip Connections. Medium. Available at: https://medium.com/@sandushiw98/understanding-resnet-50-solving-the-vanishing-gradient-problem-with-skip-connections-5591fcb7ff74