= YOLO Model for LEGO Parts Detection =
The '''LEGO Parts Detection System''' is a computer vision application designed to automatically identify and classify specific LEGO bricks within images. This implementation utilizes the '''YOLO11''' (You Only Look Once, version 11) architecture from the Ultralytics library. The model is trained on a custom dataset of approximately 400 images using Google Colab and NVIDIA GPU acceleration.
== Overview ==
Object detection involves locating instances of objects of certain classes within an image. Unlike standard image classification (which assigns a single label to an image), this YOLO model predicts:
* '''Bounding Boxes:''' The spatial coordinates of the LEGO part.
* '''Class Probabilities:''' The specific type of LEGO brick (e.g., "2x4 Brick", "Technic Pin").
The system uses the `yolo11s` (Small) model variant, optimized for a balance between inference speed and detection accuracy, making it suitable for real-time applications.
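For orientation, here is a minimal sketch of how these predictions surface in the Ultralytics Python API; the weights path and the image name are illustrative assumptions, not values from the original setup.

<source lang="python">
from ultralytics import YOLO

# Load trained weights (path is hypothetical)
model = YOLO("runs/detect/train/weights/best.pt")

# Run detection on a single image (file name is hypothetical)
results = model.predict("lego_scene.jpg")

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])                # predicted class index
        conf = float(box.conf[0])               # class probability / confidence
        x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding box corners in pixels
        print(f"{result.names[cls_id]}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
</source>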
== Dataset Preparation ==
The performance of the model relies on a curated dataset processed through the following pipeline:

=== Data Collection and Annotation ===
* '''Source:''' 400 images containing target LEGO parts in various orientations, lighting conditions, and backgrounds.
* '''Annotation Tool:''' Label Studio.
* '''Label Format:''' YOLO standard format, where each image has a corresponding `.txt` file containing lines in the format `<class_id> <x_center> <y_center> <width> <height>`. All coordinates are normalized between 0 and 1 (see the parsing sketch below).
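To make the format concrete, the following sketch converts one label line back to pixel coordinates; the line content and the image resolution are invented for the example.

<source lang="python">
# Example label line (values are illustrative): class 2, box centered in the image
line = "2 0.5 0.5 0.25 0.1"

img_w, img_h = 1280, 720  # assumed image resolution

class_id, x_c, y_c, w, h = line.split()
x_c, w = float(x_c) * img_w, float(w) * img_w   # de-normalize horizontal values
y_c, h = float(y_c) * img_h, float(h) * img_h   # de-normalize vertical values

# Convert center/size representation to corner coordinates
x1, y1 = x_c - w / 2, y_c - h / 2
x2, y2 = x_c + w / 2, y_c + h / 2
print(int(class_id), x1, y1, x2, y2)  # 2 480.0 324.0 800.0 396.0
</source>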
=== Preprocessing ===
Before training, the dataset is split to ensure robust evaluation:
* '''Training Set:''' 90% of images (used to update model weights).
* '''Validation Set:''' 10% of images (used to evaluate performance during training).
* '''Configuration:''' A `data.yaml` file is generated dynamically to map the directory paths and class names for the training engine (an example is shown below).
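Based on the generation code later in this article, the resulting `data.yaml` takes roughly the following shape; the class names and the class count are illustrative placeholders, not the actual label map.

<source lang="yaml">
path: /content/data        # root directory of the split dataset
train: train/images        # training images, relative to path
val: validation/images     # validation images, relative to path
nc: 3                      # number of classes (illustrative)
names:                     # class names; list order defines the indices
- brick_2x4
- brick_2x2
- technic_pin
</source>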
== Network Architecture ==
The YOLO11 architecture is a single-stage object detector. It processes the entire image in a single forward pass, distinguishing it from two-stage detectors like R-CNN. The architecture consists of three main components:

{| class="wikitable"
! Component !! Function !! Description
|-
| Backbone || Feature Extraction || A Convolutional Neural Network (based on CSPDarknet) that downsamples the image to extract distinct features (edges, textures, shapes) at different scales. It utilizes C3k2 blocks (Cross Stage Partial networks with specific kernel sizes) to improve gradient flow and reduce computational cost.
|-
| Neck || Feature Fusion || Uses PANet (Path Aggregation Network) layers to combine features from different backbone levels. This ensures that the model can detect both large (close-up) and small (distant) LEGO parts effectively.
|-
| Head || Prediction || A decoupled head that separates the classification task (what is it?) from the regression task (where is it?). It outputs the final bounding boxes and class scores.
|}
== Mathematical Description of Core Operations ==

=== SiLU Activation Function ===
The hidden layers of the network use the Sigmoid Linear Unit (SiLU) activation function:

$\mathrm{SiLU}(x) = x \cdot \sigma(x) = \dfrac{x}{1 + e^{-x}}$

Because the function is smooth and retains a non-zero gradient for negative inputs, it allows for smoother gradient propagation compared to the traditional ReLU.
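A one-line NumPy sketch of the activation, for illustration only:

<source lang="python">
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    """SiLU(x) = x * sigmoid(x) = x / (1 + e^-x)."""
    return x / (1.0 + np.exp(-x))

print(silu(np.array([-2.0, 0.0, 2.0])))  # approx [-0.238, 0.0, 1.762]
</source>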
=== Intersection over Union (IoU) ===
To measure how well a predicted box overlaps with the ground truth box during training, the Intersection over Union metric is used:

$\mathrm{IoU}(B_p, B_{gt}) = \dfrac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}$

where $B_p$ is the predicted bounding box and $B_{gt}$ is the ground truth box.
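A minimal Python implementation of this metric, for boxes given as corner coordinates:

<source lang="python">
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])

    # Intersection area is zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
</source>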
=== Loss Function ===
The model optimizes a composite loss function that combines three distinct error measurements:

$\mathcal{L} = \lambda_{box} \mathcal{L}_{box} + \lambda_{cls} \mathcal{L}_{cls} + \lambda_{dfl} \mathcal{L}_{dfl}$

* '''Box Loss ($\mathcal{L}_{box}$):''' Measures the error in the coordinate predictions. YOLO11 typically uses CIoU (Complete IoU) loss, which accounts for overlap, center point distance, and aspect ratio consistency.
* '''Class Loss ($\mathcal{L}_{cls}$):''' Measures the error in classification using Binary Cross Entropy (BCE): $\mathcal{L}_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
* '''DFL Loss ($\mathcal{L}_{dfl}$):''' Distribution Focal Loss, used to refine the localization of the bounding box boundaries. (A code sketch of the BCE term and the weighted combination follows below.)
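A minimal NumPy sketch of the BCE term and the weighted combination; the λ values mirror commonly cited Ultralytics defaults but should be treated as assumptions here, not as the verified configuration of this project.

<source lang="python">
import numpy as np

def bce(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Binary cross entropy, averaged over all predictions."""
    eps = 1e-7  # guard against log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

def total_loss(box_loss, cls_loss, dfl_loss, lam_box=7.5, lam_cls=0.5, lam_dfl=1.5):
    """Composite loss as a weighted sum of the three components (weights are assumptions)."""
    return lam_box * box_loss + lam_cls * cls_loss + lam_dfl * dfl_loss
</source>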
=== Non-Maximum Suppression (NMS) ===
During inference, the model may predict multiple overlapping boxes for a single LEGO part. NMS filters these to keep only the best prediction (see the sketch after this list):

# Select the box with the highest confidence score.
# Calculate the IoU between this box and all remaining boxes.
# Discard every box whose IoU with the selected box exceeds a set threshold (e.g., 0.5).
# Repeat the procedure with the boxes that remain until none are left.
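A compact NumPy version of this greedy procedure (class-agnostic for brevity; an illustrative sketch, not the library implementation):

<source lang="python">
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Greedy NMS. boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))

        # IoU of the selected box against all remaining candidates
        x1 = np.maximum(boxes[best, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[best, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[best, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[best, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)

        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_best + areas - inter)

        # Keep only candidates that do not overlap the selected box too much
        order = order[1:][iou <= iou_thresh]
    return keep
</source>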
== Implementation ==

=== Training Configuration ===
The model is trained via the Ultralytics command-line interface, invoked from a Colab notebook cell. The training process runs for 60 epochs with an input image size of 640 pixels.
<source lang="python">
!yolo detect train \
  data=/content/data.yaml \
  model=yolo11s.pt \
  epochs=60 \
  imgsz=640
</source>
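The same run can equivalently be launched through the Ultralytics Python API, which is convenient when the call needs to be embedded in a larger script:

<source lang="python">
from ultralytics import YOLO

model = YOLO("yolo11s.pt")  # start from the pretrained 'Small' weights
model.train(data="/content/data.yaml", epochs=60, imgsz=640)
</source>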
=== Inference ===
Once trained, the best weights (`best.pt`) are used to predict classes on new images.
<source lang="python">
!yolo detect predict \
  model=runs/detect/train/weights/best.pt \
  source=data/validation/images \
  save=True
</source>
== Core Logic and Code Implementation ==
The implementation relies on a specific workflow to bridge the gap between raw data and the Ultralytics YOLO engine. The core logic involves dynamically generating configuration files and defining the training hyperparameters.
=== Automated Configuration Generation ===
YOLO models require a specific YAML configuration file to locate the dataset and identify class names. Instead of manually creating this file, the system uses a Python function to parse the raw label map (`classes.txt`) and generate the `data.yaml` file programmatically.
This ensures that the class indices ($0, 1, 2...$) perfectly match the class names (e.g., "brick_2x4") during training.
<source lang="python">
import yaml

def create_data_yaml(path_to_classes_txt, path_to_data_yaml):
    """
    Parses a raw text file of class names and generates
    the YAML configuration required by YOLO.
    """
    # 1. Read class names from the text file
    with open(path_to_classes_txt, 'r') as f:
        classes = [line.strip() for line in f.readlines() if line.strip()]

    # 2. Define the dictionary structure required by YOLO
    data = {
        'path': '/content/data',        # Root directory
        'train': 'train/images',        # Subpath to training images
        'val': 'validation/images',     # Subpath to validation images
        'nc': len(classes),             # Number of classes
        'names': classes                # List of class names
    }

    # 3. Serialize the dictionary to a YAML file
    with open(path_to_data_yaml, 'w') as f:
        yaml.dump(data, f, sort_keys=False)

# Execution
create_data_yaml('/content/custom_data/classes.txt', '/content/data.yaml')
</source>
=== Data Partitioning Logic ===
To obtain a reliable measure of generalization performance (and to detect overfitting), the raw dataset is split into training and validation subsets using a 90/10 ratio. This logic is handled by an external utility script (`train_val_split.py`) which shuffles the files to ensure a representative distribution of LEGO parts in both sets.
<source lang="python">
# 90% Training data, 10% Validation data
!python train_val_split.py --datapath="/content/custom_data" --train_pct=0.9
</source>
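The script itself is external to this article; a minimal sketch of what such a 90/10 split does could look like the following (the directory layout, file extensions, and function name are assumptions for illustration):

<source lang="python">
import random
import shutil
from pathlib import Path

def split_dataset(datapath: str, outpath: str, train_pct: float = 0.9) -> None:
    """Illustrative 90/10 split; not the actual train_val_split.py script."""
    images = sorted(Path(datapath, "images").glob("*.jpg"))
    random.shuffle(images)  # randomize so both sets see all part types
    n_train = int(len(images) * train_pct)

    for i, img in enumerate(images):
        subset = "train" if i < n_train else "validation"
        img_dir = Path(outpath, subset, "images")
        lbl_dir = Path(outpath, subset, "labels")
        img_dir.mkdir(parents=True, exist_ok=True)
        lbl_dir.mkdir(parents=True, exist_ok=True)

        shutil.copy(img, img_dir / img.name)
        label = Path(datapath, "labels", img.stem + ".txt")
        if label.exists():
            shutil.copy(label, lbl_dir / label.name)
</source>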
=== Model Initialization and Training ===
The core training loop is initiated via the command-line interface (CLI). The arguments define the starting weights, the input resolution, and the duration of the learning process.
<source lang="python">
# data:   path to the config generated above
# model:  load the 'Small' pretrained weights
# epochs: iterate over the dataset 60 times
# imgsz:  resize all inputs to 640x640 pixels
!yolo detect train data=/content/data.yaml model=yolo11s.pt epochs=60 imgsz=640
</source>
* '''Pre-trained Weights (`yolo11s.pt`):''' The model uses transfer learning, starting with weights learned from the COCO dataset rather than random values. This significantly speeds up convergence for the custom LEGO dataset.
* '''Image Size (`imgsz=640`):''' The native input resolution of the network. LEGO images are automatically resized (downsampled or upsampled) to this dimension before entering the backbone.
== Evaluation and Results ==
Upon completion of the training phase, the model's performance is qualitatively evaluated by running inference on unseen images from the validation set. The output consists of the original input images overlaid with prediction annotations.
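In a Colab notebook, the annotated outputs can be inspected directly. The following snippet assumes the Ultralytics default save location `runs/detect/predict`; the number of displayed images is arbitrary.

<source lang="python">
from pathlib import Path
from IPython.display import Image, display

# Show a handful of annotated validation images saved by the predict run
for img_path in sorted(Path("runs/detect/predict").glob("*.jpg"))[:5]:
    display(Image(filename=str(img_path), width=640))
</source>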