The Residual Network (ResNet) Standard - Revision history

Ajay.paul@stud.hshl.de: /* Residual Learning */

2026-02-17T06:25:43Z

Residual Learning

← Older revision		Revision as of 06:25, 17 February 2026
Line 38:		Line 38:

	Because of this, adding more layers should not make the network worse. In theory, a deeper network can always perform at least as well as a shallower one, because the extra layers can simply learn <math>F(x) = 0</math> and reproduce the behavior of the shallower network. In practice, it still needs careful training to work properly.<ref>Sandushi W. ''Understanding ResNet-50: Solving the Vanishing Gradient Problem with Skip Connections''. Medium. Available at: https://medium.com/@sandushiw98/understanding-resnet-50-solving-the-vanishing-gradient-problem-with-skip-connections-5591fcb7ff74</ref>		Because of this, adding more layers should not make the network worse. In theory, a deeper network can always perform at least as well as a shallower one, because the extra layers can simply learn <math>F(x) = 0</math> and reproduce the behavior of the shallower network. In practice, it still needs careful training to work properly.<ref>Sandushi W. ''Understanding ResNet-50: Solving the Vanishing Gradient Problem with Skip Connections''. Medium. Available at: https://medium.com/@sandushiw98/understanding-resnet-50-solving-the-vanishing-gradient-problem-with-skip-connections-5591fcb7ff74</ref>

			This identity mapping means the original input <math>x</math> is added directly to the output of the following layers. This creates a second path along which the signal can move forward easily and the gradient can flow backward directly.

			During backpropagation, the gradient of the loss <math>L</math> with respect to the input <math>x</math> contains an extra additive term that comes from the skip connection. Because this term is added, rather than multiplied through many layers as in a plain deep network, it counteracts the shrinking of the gradient. In plain networks, repeated multiplication can shrink the gradient until it almost disappears. With skip connections, the loss signal from the final output still reaches the first layers at a useful strength, so the early layers learn better and training becomes more stable.
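
The extra additive term can be made explicit with a short derivation. This is only a sketch: the symbol <math>y</math> for the block output is an assumed notation not used in the text above, all quantities are treated as scalars for readability, and any activation applied after the addition is ignored.

```latex
% Single residual block; y denotes the block output (assumed notation)
y = x + F(x)
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x}
  = \frac{\partial L}{\partial y}\left(1 + \frac{\partial F}{\partial x}\right)
  = \underbrace{\frac{\partial L}{\partial y}}_{\text{skip path}}
  + \underbrace{\frac{\partial L}{\partial y}\,\frac{\partial F}{\partial x}}_{\text{residual path}}
```

Even if the residual-path factor <math>\frac{\partial F}{\partial x}</math> is close to zero, the skip-path term passes <math>\frac{\partial L}{\partial y}</math> to the earlier layer unchanged.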

			Sometimes the size of the feature maps changes inside the network: the spatial size can become smaller, or the number of channels can increase. In this case, the skip connection cannot simply add <math>x</math>, because the dimensions differ. ResNet therefore uses a <math>1 \times 1</math> convolution layer in the shortcut path to adjust the dimensions, applied with a stride when the spatial size shrinks. This ensures that both feature maps have the same shape and can be added without a mismatch, so the network works properly even when the size changes.
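
The dimension matching described above can be sketched in plain Python. This is a minimal illustration and not a real layer implementation: feature maps are nested lists indexed as [channel][row][column], the helper names <tt>conv1x1</tt>, <tt>shape</tt> and <tt>add_shortcut</tt> are hypothetical, the projection weights are fixed by hand instead of learned, and the main-path output is a zero-filled stand-in for <math>F(x)</math>.

```python
def conv1x1(x, weight, stride=1):
    """1x1 convolution: x is [C_in][H][W], weight is [C_out][C_in].

    Each output pixel is a weighted sum over the input channels at the
    same position; the stride subsamples rows and columns.
    """
    c_in = len(x)
    rows = range(0, len(x[0]), stride)
    cols = range(0, len(x[0][0]), stride)
    return [
        [[sum(w_o[c] * x[c][i][j] for c in range(c_in)) for j in cols]
         for i in rows]
        for w_o in weight
    ]


def shape(t):
    """Return (channels, height, width) of a nested-list feature map."""
    return (len(t), len(t[0]), len(t[0][0]))


def add_shortcut(x, fx, proj_weight=None, stride=1):
    """Add the shortcut to the main-path output fx.

    Without proj_weight the shortcut is the identity; with it, x is first
    projected by a 1x1 convolution so that its shape matches fx.
    """
    shortcut = x if proj_weight is None else conv1x1(x, proj_weight, stride)
    if shape(shortcut) != shape(fx):
        raise ValueError(f"shape mismatch: {shape(shortcut)} vs {shape(fx)}")
    return [
        [[s + f for s, f in zip(s_row, f_row)]
         for s_row, f_row in zip(s_ch, f_ch)]
        for s_ch, f_ch in zip(shortcut, fx)
    ]


# Input: 2 channels of 4x4. Main-path output: 3 channels of 2x2.
x = [[[c * 16 + i * 4 + j for j in range(4)] for i in range(4)]
     for c in range(2)]
fx = [[[0.0] * 2 for _ in range(2)] for _ in range(3)]
weight = [[1, 0], [0, 1], [1, 1]]  # hand-picked 3x2 projection weights

y = add_shortcut(x, fx, proj_weight=weight, stride=2)
print(shape(x), "->", shape(y))
```

Calling <tt>add_shortcut(x, fx)</tt> without a projection raises a <tt>ValueError</tt> here, which is the mismatch that the <math>1 \times 1</math> convolution is meant to remove.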

	== References ==		== References ==
	<references />		<references />

Ajay.paul@stud.hshl.de: /* The Math Behind Vanishing Gradients and Training Problems */

2026-02-16T12:20:39Z

The Math Behind Vanishing Gradients and Training Problems

@@ Line 19: / Line 19: @@
 This idea is sometimes called the "ResNet Hypothesis." It shows that the problem is not just vanishing gradients. The real issue is that very deep networks are hard to optimize: learning becomes slow and difficult, and the model does not converge easily in such a complex, non-linear space.
 <ref name="Xu2024">Xu, G., Wang, X., Wu, X., Leng, X. and Xu, Y., 2024. Development of skip connection in deep neural networks for computer vision and medical image analysis: A survey. arXiv preprint arXiv:2405.01725.</ref>
 == References ==
 <references />

Ajay.paul@stud.hshl.de: /* The Math Behind Vanishing Gradients and Training Problems */

2026-02-16T12:16:32Z

The Math Behind Vanishing Gradients and Training Problems

← Older revision		Revision as of 12:16, 16 February 2026
Line 18:		Line 18:

	This idea is sometimes called the "ResNet Hypothesis." It shows that the problem is not just vanishing gradients. The real issue is that very deep networks are hard to optimize: learning becomes slow and difficult, and the model does not converge easily in such a complex, non-linear space.		This idea is sometimes called the "ResNet Hypothesis." It shows that the problem is not just vanishing gradients. The real issue is that very deep networks are hard to optimize: learning becomes slow and difficult, and the model does not converge easily in such a complex, non-linear space.
			<ref name="Xu2024">Xu, G., Wang, X., Wu, X., Leng, X. and Xu, Y., 2024. Development of skip connection in deep neural networks for computer vision and medical image analysis: A survey. arXiv preprint arXiv:2405.01725.</ref>

	== References ==		== References ==
	<references />		<references />

Ajay.paul@stud.hshl.de: /* The Math Behind Vanishing Gradients and Training Problems */

2026-02-16T12:16:01Z

The Math Behind Vanishing Gradients and Training Problems

← Older revision		Revision as of 12:16, 16 February 2026
Line 13:		Line 13:
	If the network uses activation functions that can saturate, or even ordinary non-linear functions with certain weight configurations, the gradients shrink very quickly and can approach zero exponentially with depth. When this happens, the early layers of the model cannot learn properly, and the whole training process slows down considerably or even stalls.<ref name="Xu2024">Xu, G., Wang, X., Wu, X., Leng, X. and Xu, Y., 2024. Development of skip connection in deep neural networks for computer vision and medical image analysis: A survey. arXiv preprint arXiv:2405.01725.</ref>		If the network uses activation functions that can saturate, or even ordinary non-linear functions with certain weight configurations, the gradients shrink very quickly and can approach zero exponentially with depth. When this happens, the early layers of the model cannot learn properly, and the whole training process slows down considerably or even stalls.<ref name="Xu2024">Xu, G., Wang, X., Wu, X., Leng, X. and Xu, Y., 2024. Development of skip connection in deep neural networks for computer vision and medical image analysis: A survey. arXiv preprint arXiv:2405.01725.</ref>

			Researchers observed something interesting while developing the ResNet model: the degradation problem. At first, very deep networks were thought to fail only because of vanishing gradients. But even when the gradient problem was addressed with careful weight initialization and Batch Normalization (which keeps signals and gradients at a healthy scale), very deep networks still performed worse than shallower ones.

			For example, a plain 34-layer network had higher training error than a plain 18-layer network. This is surprising because the 34-layer network should be able to represent everything the 18-layer network can, and more. In experiments, however, it trained worse and also had higher test error.

			This idea is sometimes called the "ResNet Hypothesis." It shows that the problem is not just vanishing gradients. The real issue is that very deep networks are hard to optimize: learning becomes slow and difficult, and the model does not converge easily in such a complex, non-linear space.

	== References ==		== References ==
	<references />		<references />

Ajay.paul@stud.hshl.de: /* Performance */

2026-02-16T11:58:15Z

Performance

← Older revision		Revision as of 11:58, 16 February 2026
Line 2:		Line 2:

	ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of passing through all of them. Because of this, very deep networks can be trained, even with hundreds of layers, like ResNet-50, ResNet-101, and ResNet-152.<ref>MathWorks. ''Train Residual Network for Image Classification''. [Online]. Available at: https://www.mathworks.com/help/deeplearning/ug/train-residual-network-for-image-classification.html</ref>		ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of passing through all of them. Because of this, very deep networks can be trained, even with hundreds of layers, like ResNet-50, ResNet-101, and ResNet-152.<ref>MathWorks. ''Train Residual Network for Image Classification''. [Online]. Available at: https://www.mathworks.com/help/deeplearning/ug/train-residual-network-for-image-classification.html</ref>

	~~=== Performance ===~~
	ResNet-50 is widely regarded as the standard industry "workhorse." It offers a good balance between accuracy (about 76% top-1 on ImageNet) and computational cost. It works well for most tasks and is widely trusted.<ref>DeepLabCut. ''What neural network should I use? (Trade-offs, speed performance, and considerations)''. GitHub Wiki. Available at: https://github.com/DeepLabCut/DeepLabCut/wiki/What-neural-network-should-I-use%3F-(Trade-offs,-speed-performance,-and-considerations)</ref>

	=== MATLAB Usage ===		=== MATLAB Usage ===

Ajay.paul@stud.hshl.de on 13 February 2026 at 15:30

2026-02-13T15:30:43Z

← Older revision		Revision as of 15:30, 13 February 2026
Line 8:		Line 8:
	=== MATLAB Usage ===		=== MATLAB Usage ===
	It is easy to use in MATLAB through the <tt>resnet50</tt> function. Many people use it as the default choice for transfer learning experiments because its behavior is well understood.<ref>MathWorks. ''Pretrained Convolutional Neural Networks''. MATLAB Documentation. Available at: https://www.mathworks.com/help/deeplearning/ug/pretrained-convolutional-neural-networks.html</ref>		It is easy to use in MATLAB through the <tt>resnet50</tt> function. Many people use it as the default choice for transfer learning experiments because its behavior is well understood.<ref>MathWorks. ''Pretrained Convolutional Neural Networks''. MATLAB Documentation. Available at: https://www.mathworks.com/help/deeplearning/ug/pretrained-convolutional-neural-networks.html</ref>

			== The Math Behind Vanishing Gradients and Training Problems ==
			A main problem when training very deep neural networks is the vanishing gradient problem. It occurs when the gradients computed by backpropagation get smaller and smaller as they move toward the earlier layers of the network. As a result, the first layers almost stop learning.

			This follows from the chain rule. During backpropagation, the gradient at each layer is multiplied by the gradients of the layers after it, so the result is a product of many factors. When many of these factors are smaller than one, the product becomes very tiny.
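
The effect of multiplying many small factors can be illustrated with a toy calculation. This is a sketch under simplifying assumptions, not a model of a real network: every layer is a single sigmoid unit, all weights are 1, every pre-activation is 0 (where the sigmoid derivative reaches its maximum of 0.25), and the function names are hypothetical.

```python
import math


def sigmoid_derivative(z):
    """Derivative of the logistic sigmoid at z; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def backprop_factor(depth, weight=1.0, pre_activation=0.0):
    """Product of the per-layer chain-rule factors over `depth` layers.

    Each layer contributes weight * sigmoid'(pre_activation), so the
    gradient reaching the first layer is scaled by this product.
    """
    factor = 1.0
    for _ in range(depth):
        factor *= weight * sigmoid_derivative(pre_activation)
    return factor


for depth in (1, 5, 10, 20):
    print(f"depth {depth:2d}: gradient scaled by {backprop_factor(depth):.3e}")
```

Even in this best case for the sigmoid, the scaling factor is 0.25 raised to the number of layers, so it falls below one millionth after ten layers.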

			If the network uses activation functions that can saturate, or even ordinary non-linear functions with certain weight configurations, the gradients shrink very quickly and can approach zero exponentially with depth. When this happens, the early layers of the model cannot learn properly, and the whole training process slows down considerably or even stalls.<ref name="Xu2024">Xu, G., Wang, X., Wu, X., Leng, X. and Xu, Y., 2024. Development of skip connection in deep neural networks for computer vision and medical image analysis: A survey. arXiv preprint arXiv:2405.01725.</ref>


	== References ==		== References ==
	<references />		<references />

Ajay.paul@stud.hshl.de on 10 February 2026 at 15:09

2026-02-10T15:09:50Z

← Older revision		Revision as of 15:09, 10 February 2026
Line 3:		Line 3:
	ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of passing through all of them. Because of this, very deep networks can be trained, even with hundreds of layers, like ResNet-50, ResNet-101, and ResNet-152.<ref>MathWorks. ''Train Residual Network for Image Classification''. [Online]. Available at: https://www.mathworks.com/help/deeplearning/ug/train-residual-network-for-image-classification.html</ref>		ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of passing through all of them. Because of this, very deep networks can be trained, even with hundreds of layers, like ResNet-50, ResNet-101, and ResNet-152.<ref>MathWorks. ''Train Residual Network for Image Classification''. [Online]. Available at: https://www.mathworks.com/help/deeplearning/ug/train-residual-network-for-image-classification.html</ref>

			=== Performance ===
			ResNet-50 is widely regarded as the standard industry "workhorse." It offers a good balance between accuracy (about 76% top-1 on ImageNet) and computational cost. It works well for most tasks and is widely trusted.<ref>DeepLabCut. ''What neural network should I use? (Trade-offs, speed performance, and considerations)''. GitHub Wiki. Available at: https://github.com/DeepLabCut/DeepLabCut/wiki/What-neural-network-should-I-use%3F-(Trade-offs,-speed-performance,-and-considerations)</ref>

			=== MATLAB Usage ===
			It is easy to use in MATLAB through the <tt>resnet50</tt> function. Many people use it as the default choice for transfer learning experiments because its behavior is well understood.<ref>MathWorks. ''Pretrained Convolutional Neural Networks''. MATLAB Documentation. Available at: https://www.mathworks.com/help/deeplearning/ug/pretrained-convolutional-neural-networks.html</ref>

	== References ==		== References ==
	<references />		<references />

Ajay.paul@stud.hshl.de on 10 February 2026 at 14:56

2026-02-10T14:56:14Z

← Older revision		Revision as of 14:56, 10 February 2026
Line 1:		Line 1:
	ResNet, also called Residual Network, is a very important model in deep learning history. Before ResNet, training very deep neural networks was really hard because of the vanishing gradient problem. This means the signal used to update the weights ~~become~~ very small as it ~~move~~ backward through many layers.		ResNet, also called Residual Network, is a very important model in deep learning history. Before ResNet, training very deep neural networks was really hard because of the vanishing gradient problem. This means the signal used to update the weights becomes very small as it moves backward through many layers.

	ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of passing through all of them. Because of this, very deep networks can be trained, even with hundreds of layers, like ResNet-50, ResNet-101, and ResNet-152.		ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of passing through all of them. Because of this, very deep networks can be trained, even with hundreds of layers, like ResNet-50, ResNet-101, and ResNet-152.<ref>MathWorks. ''Train Residual Network for Image Classification''. [Online]. Available at: https://www.mathworks.com/help/deeplearning/ug/train-residual-network-for-image-classification.html</ref>


			== References ==
			<references />

Ajay.paul@stud.hshl.de: Page created: "ResNet, also called Residual Network, is a very important model in deep learning history. Before ResNet, training very deep neural networks was really hard because of the vanishing gradient problem. This means the signal used to update the weights become very small as it move backward through many layers. ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of pa…"

2026-02-10T12:20:54Z

Page created: "ResNet, also called Residual Network, is a very important model in deep learning history. Before ResNet, training very deep neural networks was really hard because of the vanishing gradient problem. This means the signal used to update the weights become very small as it move backward through many layers. ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of pa…"

New page

ResNet, also called Residual Network, is a very important model in deep learning history. Before ResNet, training very deep neural networks was really hard because of the vanishing gradient problem. This means the signal used to update the weights become very small as it move backward through many layers.

ResNet fixed this problem by adding skip connections, also known as shortcuts. These connections let the gradient flow around some layers instead of passing through all of them. Because of this, very deep networks can be trained, even with hundreds of layers, like ResNet-50, ResNet-101, and ResNet-152.