Zwicky-box analysis
Current version as of 14 April 2026, 13:49
Zwicky-Box Analysis for Models
We perform a Zwicky-box analysis of four CNN and Transformer models (ResNet-50, MobileNetV2, EfficientNet-B0, ViT-Base) across six image tasks. For each model we examine its key attributes: parameter count, FLOPs, input size, training-data requirements, inference speed, memory footprint, hardware target, explainability, noise robustness, and test accuracy.
We decompose the problem into key dimensions (model size, computational cost, speed, memory footprint, data requirements, transferability, explainability, noise handling, and segmentation output), each with several levels. We assign each model to its levels along these dimensions and evaluate it on tasks such as image restoration, image enhancement, denoising, supervised segmentation, conventional segmentation, and classification.
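The core of a Zwicky-box (morphological) analysis is enumerating every combination of levels across the chosen dimensions. A minimal Python sketch, using hypothetical dimension names and a reduced set of levels for illustration (the actual dimensions and levels are those of the morphological table in this analysis):

```python
from itertools import product

# Illustrative subset of the Zwicky-box dimensions; names and levels
# here are assumptions, not the full table from the analysis.
dimensions = {
    "model_size": ["small", "medium", "large"],
    "compute_cost": ["low", "high"],
    "data_need": ["modest", "huge"],
}

# Each cell of the Zwicky box is one combination of levels,
# i.e. the Cartesian product over all dimensions.
combinations = list(product(*dimensions.values()))

print(len(combinations))  # 3 * 2 * 2 = 12 cells
```

A concrete model (e.g. MobileNetV2 as small / low / modest) occupies exactly one cell of this box, and the analysis then asks which cells are feasible for each task.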
We find that the CNN models (ResNet-50 and EfficientNet-B0) deliver high accuracy and strong pixel-level outputs, but at a higher computational cost. MobileNetV2 is very compact (well suited to edge devices) but less accurate. ViT-Base is very large and data-hungry; it performs well only on classification, and only when trained on a very large dataset.
Below we present the morphological table, explain the model choice for each task, and provide a comparison table and a metric chart.
For pixel-level tasks (restoration, enhancement, or segmentation), CNNs (ResNet-50 or EfficientNet-B0) are the best choice. For mobile or fast real-time use, choose MobileNetV2 or a small EfficientNet variant. For classification only, with abundant training data, use ViT or a larger EfficientNet. The MATLAB Deep Learning Toolbox provides ready-to-use pretrained models such as resnet50, mobilenetv2, efficientnetb0, and visionTransformer.[1][2]
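The selection guidance above can be encoded as a small decision helper. This is a hypothetical Python sketch of the recommendation logic, not a MATLAB API; the task names and return strings are illustrative assumptions:

```python
# Hypothetical encoding of the task-to-model guidance from the analysis.
# Task names and recommendation strings are illustrative, not standardized.
def recommend_model(task, edge_device=False, large_dataset=False):
    if edge_device:
        # Compact model for mobile / real-time deployment.
        return "MobileNetV2"
    if task in ("restoration", "enhancement", "denoising", "segmentation"):
        # Pixel-level outputs favor the CNN backbones.
        return "ResNet-50 / EfficientNet-B0"
    if task == "classification" and large_dataset:
        # ViT pays off only with abundant training data.
        return "ViT-Base"
    # Default: a balanced CNN.
    return "EfficientNet-B0"

print(recommend_model("segmentation"))
print(recommend_model("classification", large_dataset=True))
print(recommend_model("classification", edge_device=True))
```

The helper mirrors the priority order of the text: hardware constraint first, then output type, then data availability.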
References
1. MathWorks. Pretrained Deep Neural Networks. Available at: https://www.mathworks.com/help/deeplearning/ug/pretrained-convolutional-neural-networks.html
2. MathWorks. visionTransformer - Pretrained vision transformer (ViT) neural network. Available at: https://www.mathworks.com/help/vision/ref/visiontransformer.html