ZHANG Hu, LI Huiying, HU Kaihua
(School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, Jiangxi, China)
Extended abstract: [Background and purposes] As the fourth industrial revolution progresses, China's manufacturing industry is gradually moving toward intelligent production, intelligent detection and intelligent logistics. China is one of the world's largest producers, consumers and exporters of ceramic tiles. However, tile surface defect detection still relies on manual visual inspection, which is inefficient and costly and makes intelligent inspection difficult to achieve. With the development of deep learning, deep-learning-based object detection is expected to enable intelligent tile inspection and further improve tile production efficiency. In this context, this paper explores an improved YOLO11 algorithm for tile surface defect detection that makes the model lighter while improving detection performance. [Methods] The original dataset used in this study was collected from the internet and consists of 3,593 images with a resolution of 8192×6000 pixels and 1,975 images with a resolution of 4096×3500 pixels. Because the original images are high-resolution and a large proportion of the defect targets are small, a sliding-window slicing approach was adopted: the original images were cropped into 640×640-pixel tiles with a 25% overlap rate, which effectively alleviated the training difficulties caused by high-resolution data. After preprocessing, a ceramic tile surface defect dataset containing 20,736 images was constructed and split into training and validation sets at a ratio of 9:1. Four improvement strategies were adopted, based on the YOLO11n model.
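A minimal Python sketch of the sliding-window slicing and 9:1 split described above. It assumes the stride is the tile size times (1 - overlap), i.e. 640 × 0.75 = 480 pixels, and that one extra window per axis is shifted back so that it ends exactly at the image border; the abstract does not say how borders were handled. The function names `window_starts`, `tile_boxes` and `split_train_val` are hypothetical.

```python
import random

def window_starts(length: int, tile: int = 640, overlap: float = 0.25) -> list[int]:
    """Offsets of the tiles along one image axis (padding of small images not handled)."""
    if length <= tile:
        return [0]
    stride = int(tile * (1 - overlap))  # 640 * 0.75 = 480 px
    starts = list(range(0, length - tile + 1, stride))
    # Assumption: add one edge-aligned window so the image border is still covered.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

def tile_boxes(width: int, height: int, tile: int = 640, overlap: float = 0.25):
    """(x1, y1, x2, y2) crop boxes covering a width x height image."""
    return [(x, y, x + tile, y + tile)
            for y in window_starts(height, tile, overlap)
            for x in window_starts(width, tile, overlap)]

def split_train_val(items, train_ratio: float = 0.9, seed: int = 0):
    """Shuffle with a fixed seed, then split (0.9 gives a 9:1 train/val split)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    k = int(len(items) * train_ratio)
    return items[:k], items[k:]

if __name__ == "__main__":
    print(len(tile_boxes(8192, 6000)))  # 17 x 13 = 221 windows
    print(len(tile_boxes(4096, 3500)))  # 9 x 7 = 63 windows
    train, val = split_train_val(range(20736))
    print(len(train), len(val))  # 18662 2074
```

These are raw window counts per source image, before any filtering; the abstract does not state how the 20,736 final images were selected from them.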
(1) Considering the high proportion of small defects in the dataset and the importance of low-level detail information for extracting features of small objects, a fusion module was proposed and integrated into the backbone network. By fusing low-level feature maps during high-level feature extraction, it strengthens the interaction between abstract high-level semantic features and fine-grained low-level detail features, thereby improving the feature extraction ability of the model. (2) In the downsampling stages of the network, the MDown module partially replaced standard 3×3 convolutions: the feature map is split into two pathways, processed by convolution and max pooling respectively, yielding a lightweight downsampling architecture. (3) To address the many similar, redundant features in the feature maps generated by standard convolutions, the Ghost module was introduced to optimize the network architecture: intrinsic features are first generated by a standard convolution, and additional features are then produced through simple linear operations. (4) To cope with the complex texture patterns on tile surfaces, the Efficient Multi-scale Attention (EMA) mechanism, a hybrid attention that integrates both channel and spatial dimensions, was introduced to enhance the model's adaptability to complex environments and thereby improve its generalization capability. [Results] The experiments were conducted in a virtual environment deployed on the Hengyuan Cloud Platform, using a 16-core AMD EPYC 7J13 CPU and an NVIDIA RTX 4090D GPU. The system ran Ubuntu 22.04 LTS, with Python 3.11 as the programming language, PyTorch 2.4.0 as the deep learning framework, and the CUDA 12.1.1 toolkit for GPU acceleration. Training used the following parameters: 500 epochs, a batch size of 64, an initial learning rate of 0.01, a final learning rate of 0.01, the SGD optimizer, and a momentum of 0.937.
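To make the lightweighting in strategies (2) and (3) concrete, the sketch below counts convolution weights (bias and BatchNorm ignored) for a standard 3×3 convolution, a GhostNet-style layer, and a dual-path downsampling layer. The Ghost layout follows the general GhostNet formulation with ratio `s` and cheap depthwise `d`×`d` kernels; `s=2` and `d=3` are assumed defaults, not values from the paper. The abstract gives no internal details of MDown, so `dual_path_down_params` is a HYPOTHETICAL layout (half the channels through a 3×3 stride-2 convolution, the other half through a stride-2 max pool followed by a 1×1 convolution). These are per-layer counts only and do not reproduce the reported whole-network figures.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k convolution (bias and BatchNorm ignored)."""
    return (c_in // groups) * c_out * k * k

def ghost_params(c_in: int, c_out: int, k: int = 3, s: int = 2, d: int = 3) -> int:
    """GhostNet-style layer: c_out // s intrinsic maps from an ordinary k x k conv,
    the remaining maps from cheap d x d depthwise convs (the 'linear operations')."""
    m = c_out // s
    primary = conv_params(c_in, m, k)
    cheap = conv_params(m, m * (s - 1), d, groups=m)  # depthwise: s-1 maps per intrinsic map
    return primary + cheap

def dual_path_down_params(c_in: int, c_out: int) -> int:
    """HYPOTHETICAL MDown-like layout (assumption, not the paper's design):
    half the channels -> 3x3 stride-2 conv; the other half -> 3x3 stride-2
    max pool (no weights) -> 1x1 conv; the two outputs are concatenated."""
    h_in, h_out = c_in // 2, c_out // 2
    return conv_params(h_in, h_out, 3) + conv_params(h_in, h_out, 1)

if __name__ == "__main__":
    c = 128
    print(conv_params(c, c, 3), ghost_params(c, c), dual_path_down_params(c, c))
    # 147456 74304 40960
```

With `s=2` the Ghost layer needs roughly half the weights of the standard convolution, because only half of the output maps come from a full convolution and the depthwise "cheap" term is small.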
Compared with the baseline YOLO11n, the optimized model reduced the number of parameters by 31% and FLOPs by 26%, while improving precision by 5.2%, recall by 5.7%, mAP@0.5 by 3.9% and mAP@0.5:0.95 by 5.6%. The improved model reached 82.8% mAP@0.5, 0.5% higher than YOLO11s. [Conclusions] The improved YOLO11 model enhances object detection performance while keeping computational complexity low, providing a lightweight reference algorithm for tile surface defect detection tasks.
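The two accuracy metrics reported above differ only in the IoU thresholds used: mAP@0.5 counts a prediction as correct at IoU ≥ 0.5, while mAP@0.5:0.95 averages AP over the ten thresholds 0.50, 0.55, ..., 0.95 (the COCO convention). The sketch below illustrates this for a single class and a single image, with greedy score-ordered matching and all-point interpolated AP. It is a simplified illustration, not the evaluator used in the paper; COCO tools and Ultralytics differ in interpolation and class-averaging details, and the function names are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(dets, gts, thr):
    """dets: list of (score, box); gts: list of boxes (one image, one class)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        # Greedily match to the unmatched ground truth with the highest IoU.
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if not matched[j]:
                o = iou(box, g)
                if o > best:
                    best, best_j = o, j
        if best_j >= 0 and best >= thr:
            matched[best_j] = True
            tp.append(1)
        else:
            tp.append(0)  # false positive at this threshold
    recalls, precisions, cum = [], [], 0
    for i, t in enumerate(tp, start=1):
        cum += t
        recalls.append(cum / len(gts))
        precisions.append(cum / i)
    # All-point interpolation: area under the non-increasing precision envelope.
    ap, prev_r = 0.0, 0.0
    for i in range(len(recalls)):
        ap += (recalls[i] - prev_r) * max(precisions[i:])
        prev_r = recalls[i]
    return ap

def map_50_95(dets, gts):
    thresholds = [0.5 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95
    return sum(average_precision(dets, gts, t) for t in thresholds) / len(thresholds)

if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    # A high-scoring false positive ranked ahead of an exact match.
    dets = [(0.9, (50, 50, 60, 60)), (0.8, (0, 0, 10, 10))]
    print(average_precision(dets, gt, 0.5), map_50_95(dets, gt))  # 0.5 0.5
```

Because loosely localized boxes fail the stricter thresholds, mAP@0.5:0.95 rewards tighter localization than mAP@0.5; a box with IoU 0.72 scores AP 1 at five thresholds and 0 at the other five, giving 0.5.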
Key words: surface defect detection; YOLO11; EMA attention mechanism; Ghost module; lightweight model