Gearbox fault diagnosis method based on lightweight channel attention mechanism and transfer learning


ABSTRACT

In practical engineering, the working conditions of gearboxes are complex and variable. Under varying working conditions, the performance of intelligent fault diagnosis models degrades because valid samples are limited and the data distributions of gearbox signals differ greatly. To address these issues, this research proposes a gearbox fault diagnosis method that integrates a lightweight channel attention mechanism and further realizes cross-component transfer learning. First, the time–frequency distribution of the original signals is obtained by wavelet transform.


It can intuitively reflect the local characteristics of the signals. Secondly, based on a local cross-channel interaction strategy, a lightweight efficient channel attention mechanism (LECA) is designed, in which the kernel size of the 1D convolution is determined by the channel number and coefficients. Multi-scale feature input is used to retain more detailed features of different dimensions, and a lightweight convolutional neural network is constructed. Finally, a transfer learning method is applied to freeze the lower structures of the network and fine-tune the higher structures of the model using small samples. Experimental verification shows that the proposed model uses samples effectively. The application of transfer learning enables accurate and fast fault classification with small samples and achieves good gearbox fault diagnosis performance under varying working conditions and cross-component conditions.




INTRODUCTION

At present, mechanical equipment is widely


used in industrial production and intelligent manufacturing. However, its operating environment is complex in practical applications, and long-term operation can lead to aging and damage of components1. If critical components fail, equipment operation is affected, resulting in huge losses. The gearbox is a key transmission component of mechanical equipment and usually works under non-stationary, variable-load conditions. In the early stages, weak faults of mechanical components are often masked by environmental and other noise. If allowed to progress, they can interfere with the normal operation of equipment and even lead to casualties and other accidents2. Therefore, research on gearbox fault diagnosis can reduce the occurrence of major accidents and is of great significance for enhancing the reliability and security of equipment operation. The key points of gearbox fault diagnosis are signal feature extraction and fault pattern recognition. The main feature extraction methods include the Short-Time Fourier


Transform (STFT)3, Variational Mode Decomposition (VMD)4 and Wavelet Transform (WT)5, etc. Traditional pattern recognition algorithms mainly include Support Vector Machine (SVM)6, Sparse


Representation Classification (SRC)7, Artificial Neural Network (ANN)8, etc. As fault diagnosis moves into the big data era, the collected signals are not only large in volume but also complex and diverse in type, and traditional identification methods struggle to meet these demands. Intelligent identification of gearbox faults has therefore become a necessary research direction9,10. In recent years, deep learning11 has been one of the fastest-growing domains in machine learning and has found increasingly wide use in fault diagnosis. Among deep learning methods, the convolutional neural network (CNN) uses weight sharing and local receptive fields to reduce network complexity and computational cost12,13, and it has significant advantages in 2D image classification. CNN is widely adopted in mechanical equipment fault diagnosis due to its superior classification performance. For example, Ye et al.14 proposed


a new intelligent rolling bearing fault diagnosis method based on variational mode extraction (VME) and improved 1D-CNN, which had strong feature learning ability. Long et al.15 used pixel


filling method to convert signals into images and input these images into a 2D-CNN network to achieve high-precision fault classification. Yan et al.16 proposed a deep order-wavelet


convolutional variational autoencoder (DOWCVAE) network to identify bearing faults under fluctuating speed conditions. The research could improve feature learning ability of a plain


convolutional variational autoencoder. Zhang et al.17 designed a multi-branch residual convolutional neural network that achieved high-precision gearbox fault diagnosis. Although CNN has


strong feature extraction capabilities, key information can be weakened when max or average pooling directly merges features in a model. The attention mechanism is a good


solution to this problem18,19. Zhao et al.20 embedded an improved channel and spatial attention module in residual structure and focused attention on effective information of feature maps.


Li et al.21 combined Dual-stage Attention-based Recurrent Neural Network (DA-RNN) and Convolutional Block Attention Module (CBAM) to obtain a bearing fault diagnosis model, which achieved


good diagnosis results under unbalanced data conditions. Liu et al.22 constructed a stacked residual multi-attention network (SRMANet) to extract critical feature components of gearbox vibration


signals. Zhao et al.23 presented a novel rotor system fault diagnosis model based on parallel convolutional neural network architecture with attention mechanism (AMPCNN), which had good


performance for load adaptability and noise immunity. Ding et al.24 designed a feature-guided attention mechanism and embedded it into the residual network to enhance its generalization


ability. Li et al.25 integrated the convolutional neural network (CNN) with attention mechanisms to strengthen the representational power of fault samples. In summary, deep learning algorithms can adaptively extract fault features and provide strong fault classification performance. However, fault diagnosis with deep learning usually requires large numbers of samples, whereas the monitored gearbox signals consist mostly of normal operation data and fault samples are scarce. When samples are limited, the recognition precision and generalization capability of neural networks are weak. Transfer learning can better address such problems26. Zheng et al.27 introduced open-source bearing samples as source domain data and treated a target


domain bearing dataset as small samples; the transfer learning model was refined by a new optimal fusion strategy. Dong et al.28 proposed a small-sample intelligent bearing diagnosis method based on


dynamic model and transfer learning, aiming at the difficulty of obtaining fault data in practical engineering. Yu et al.29 proposed a feature fusion CNN based on transfer learning; the network showed strong robustness and high accuracy in bearing fault diagnosis experiments. He et al.30 combined a deep transfer learning method with an improved residual shrinkage network


to achieve cross-condition quantitative diagnosis of bearing faults. Li et al.31 proposed a planetary gears fault diagnosis approach based on intrinsic feature extraction and deep transfer


learning. Zhong et al.32 proposed a novel fault diagnosis method incorporating data augmentation and fine-tuning transfer learning, which combined synthesized samples and


original data to train the deep network. In summary, a transfer learning model only requires limited samples to train a network suitable for the current task. It solves the problem of


lacking numerous labeled samples. However, the above studies only build a fault diagnosis model for a single type of component and do not examine model performance on different


components. There are complex distribution differences in the fault signals generated by different components. Moreover, the effective gearbox fault samples are limited under variable


operating conditions. The above problems lead to poor diagnostic performance and weak generalization ability of models. Therefore, this research proposes a gearbox fault diagnosis method


based on lightweight channel attention mechanism and transfer learning. The method could solve the above problems and realize accurate classification of limited gearbox samples under varying


working conditions and cross-component conditions. The main contributions of this paper are as follows: (1) A new model based on the EfficientNetV2 network is proposed. It uses a channel attention mechanism with appropriate cross-channel interaction to mitigate the negative impact of dimension reduction. Multi-scale feature input is used to retain more detailed features of different dimensions. Based on a local cross-channel interaction strategy without dimensionality reduction, the cross-channel coverage, which depends on the channel number and coefficients, is adjusted, making the attention mechanism lightweight. (2) A transfer learning strategy is applied to extract features from limited samples. The strategy achieves high-precision fault diagnosis for small


samples of untrained working conditions and components. It expands the application range of transfer learning. Through experimental verification, the proposed model has strong generalization


ability. It could fit the fault distribution difference in different working conditions and components. Simultaneously, it still has good fault diagnosis performance under limited samples.


The rest of this paper is organized as follows. Section “Method” introduces the wavelet transform, the lightweight network, the channel attention mechanism and transfer learning in detail. In Section “Constructing gearbox fault diagnosis model”, a fault diagnosis model based on transfer learning and the LECA module is designed, and the detailed fault diagnosis flow is presented. In Section “Experimental verification”, the comprehensive performance of the model is demonstrated using the gearbox fault dataset published by Southeast University. Conclusions are presented in Section “Conclusion”.

METHOD

WAVELET TRANSFORM

Time–frequency analysis can directly display the mapping relationship between the time domain and the frequency domain of a signal and intuitively reflect its local characteristics. As one of the representative time–frequency analysis methods, the wavelet transform describes the time–frequency characteristics of raw signals by translating and stretching wavelet basis functions. It can flexibly change the window length according to frequency and provides good resolution for non-periodic signals. The wavelet transform can be expressed as the inner product of the signal \(x(t)\) and the wavelet basis function \({\psi }_{a,b}(t)\), as shown in Eqs. (1) and (2):

$$WT(a,b)=\frac{1}{\sqrt{a}}{\int }_{-\infty }^{\infty }x(t)\psi \left(\frac{t-b}{a}\right)dt$$ (1)

$${\psi }_{a,b}(t)=\frac{1}{\sqrt{a}}\psi \left(\frac{t-b}{a}\right)$$ (2)

where \(t\) is the time variable, \(b\) is a translation factor, and \(a\) is a stretching (scale) factor that controls the dilation of the wavelet basis function.
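As an illustration, the following minimal Python sketch uses the PyWavelets library to convert a 1-D vibration signal into a wavelet time–frequency map; the Morlet wavelet, the scale range and the sampling frequency are assumptions for demonstration, since the paper does not specify them.

```python
import numpy as np
import pywt  # PyWavelets

def signal_to_tf_map(x, fs, num_scales=64, wavelet="morl"):
    """Continuous wavelet transform of a 1-D vibration signal.

    Returns |WT(a, b)| as a 2-D (scales x time) array, which can then be
    rendered as an RGB image and resized to 224 x 224 for the CNN input.
    """
    scales = np.arange(1, num_scales + 1)
    coeffs, _freqs = pywt.cwt(x, scales, wavelet, sampling_period=1.0 / fs)
    return np.abs(coeffs)

# Usage on a synthetic two-tone signal (stand-in for a gearbox vibration record).
fs = 2000                               # assumed sampling frequency in Hz
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
tf_map = signal_to_tf_map(x, fs)
print(tf_map.shape)                     # (64, 2000)
```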


LIGHTWEIGHT CONVOLUTIONAL NEURAL NETWORKS

At present, gearbox fault diagnosis models can achieve high accuracy, but most of them have complex structures and occupy a lot of computing resources. Therefore, the lightweight, high-precision EfficientNetV2 model is selected as the basic network to reduce computation cost. EfficientNetV2 is a lightweight CNN designed with neural architecture search33. Its core structures are the MBConv and Fused-MBConv modules, as shown in Figs. 1 and 2. The MBConv module is composed of depthwise separable convolution (DSC) and SE modules; it first raises the dimension, then computes with DSC, and finally reduces the dimension through a convolution layer. Fused-MBConv replaces the DSC with a standard convolution to improve operating speed. The Fused-MBConv module is used in the lower structure of the network and MBConv is applied in the higher structure, which reduces the number of network parameters and improves computational speed.
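The following PyTorch sketch illustrates the two block types in simplified form; it omits the SE/attention sub-module, stochastic depth and the exact expansion ratios and kernel sizes used in EfficientNetV2, so it is an illustration of the structure rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class FusedMBConv(nn.Module):
    """Fused-MBConv: expand with a standard 3x3 conv, then project with a 1x1 conv."""
    def __init__(self, c_in, c_out, expand=4, stride=1):
        super().__init__()
        c_mid = c_in * expand
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 3, stride, 1, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
            nn.Conv2d(c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
        )
        self.use_skip = stride == 1 and c_in == c_out

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out

class MBConv(nn.Module):
    """MBConv: 1x1 expansion, 3x3 depthwise conv, 1x1 projection."""
    def __init__(self, c_in, c_out, expand=4, stride=1):
        super().__init__()
        c_mid = c_in * expand
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
            nn.Conv2d(c_mid, c_mid, 3, stride, 1, groups=c_mid, bias=False),  # depthwise
            nn.BatchNorm2d(c_mid), nn.SiLU(),
            # a channel attention module (SE originally, LECA in this paper) would sit here
            nn.Conv2d(c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
        )
        self.use_skip = stride == 1 and c_in == c_out

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out
```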


CHANNEL ATTENTION MECHANISM

Nowadays, the attention mechanism is extensively used in deep neural networks because it shares weights and strengthens effective information. To improve the ability to extract effective information from gearbox signals, this paper introduces a channel attention mechanism to optimize


the performance of the gearbox fault diagnosis model. In SE modules, dimension reduction leads to a decline in model learning ability. Wang et al.34 proposed an efficient channel attention (ECA) mechanism to address this issue. The architectural details of the ECA module are shown in Fig. 3. It adopts a local cross-channel interaction strategy without dimension reduction and obtains more accurate attention information by using a 1D convolutional layer to aggregate cross-channel information. First, the aggregated features of each channel are obtained by global average pooling. Second, the kernel size K is adaptively calculated from the channel number C. Finally, the weight of each channel is calculated with a 1D convolution and a sigmoid function.
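For reference, a minimal PyTorch sketch of such an ECA-style block is shown below (global average pooling, a 1D convolution across channels, and a sigmoid gate); the kernel-size rule and the defaults γ = 2, b = 1 follow the ECA formulation34.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: GAP -> 1D conv across channels -> sigmoid gate."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        k = int(abs(math.log2(channels) + b) / gamma)
        k = k if k % 2 else k + 1              # force an odd kernel size
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x):                       # x: (N, C, H, W)
        y = self.pool(x)                        # (N, C, 1, 1)
        y = y.squeeze(-1).transpose(1, 2)       # (N, 1, C)
        y = self.conv(y)                        # local cross-channel interaction
        y = y.transpose(1, 2).unsqueeze(-1)     # back to (N, C, 1, 1)
        return x * self.gate(y)                 # reweight each channel
```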


TRANSFER LEARNING

Transfer learning is a machine learning method that reapplies features learned from one task to a target task. In transfer learning, the domain to be learned is usually defined as the source domain \({D}_{s}=\{{x}_{s},P({x}_{s})\}\) with learning task \({T}_{s}=\{{y}_{s},{f}_{s}(\cdot )\}\). The domain to be solved is called the target domain \({D}_{t}=\{{x}_{t},Q({x}_{t})\}\) with learning task \({T}_{t}=\{{y}_{t},{f}_{t}(\cdot )\}\), where \({D}_{s}\ne {D}_{t}\) or \({T}_{s}\ne {T}_{t}\). Transfer learning aims to acquire


knowledge in \({D}_{s}\) and \({T}_{s}\) to help the learning of \({f}_{t}(\cdot )\)26. The fault diagnosis effect of a deep learning model is closely related to whether the training samples


are sufficient; only with many training samples can a high-precision deep learning model be obtained. The transfer learning method realizes knowledge transfer: knowledge learned from a source domain with sufficient data is applied to a target domain with few samples.

CONSTRUCTING GEARBOX FAULT DIAGNOSIS MODEL

To preserve more detailed features in samples, this


paper uses average pooling and max pooling to extract information from the original signals at two different scales. Average pooling reflects the global information of the feature maps and provides feedback for every point on the feature maps, while max pooling captures the local features of the signals and presents the overall trend of signal change35. In this paper, multi-scale feature input is applied to capture both global and local features of the samples, and a lightweight channel attention mechanism (LECA) is designed. The LECA consists of a 1D convolutional layer, a BN layer, max pooling, average pooling, and a hard-sigmoid activation function. It captures the dependency between channels by aggregating global and local features and adaptively calculates the kernel size from the channel number and coefficients. By subtracting the most suitable even number related to the channel coefficients, the attention mechanism becomes more lightweight. The stride is set to 2. The features learned by different convolution kernels are scored adaptively. The LECA architectural details are shown in Fig. 4. The expression of the channel attention mechanism is


described as follows. H and W represent the height and width of the input feature maps. C represents the number of channels. The expressions of average pooling and max pooling are shown in


Eqs. (3) and (4):

$${z}_{1}=\frac{1}{H\times W}{\sum }_{i=1}^{H}{\sum }_{j=1}^{W}f(i,j)$$ (3)

$${z}_{2}=\underset{i,j}{\mathit{max}}\,f(i,j)$$ (4)

where \({z}_{1}\) and \({z}_{2}\) are the outputs of global average pooling and global max pooling, and \(f\) is a 2D feature map. The channel weights are computed as

$$w=\sigma (C1{D}_{k}(y))$$ (5)

where \(C1{D}_{k}\) denotes a 1D convolution with kernel size K applied to the aggregated channel descriptor \(y\), \(\sigma\) is the hard-sigmoid activation function, and \(w\) is the resulting weight vector. The recalibrated features are

$${\widetilde{X}}_{c}=w\otimes f(i,j)$$ (6)

where \({\widetilde{X}}_{c}\) is the optimized feature matrix. Its


height, width and channel number are the same as the size of the input feature matrix. After adjusting the channel attention module, the important feature information will be enhanced. In


Fig. 4, K represents the coverage of local cross-channel interaction and is also the kernel size of the 1D convolutional layer. To lighten the attention module, the most suitable even number is subtracted from the kernel size of the original ECA; this even number is determined by the coefficients γ and b. The mapping relationship between K and the channel number C is as follows:

$$K=\varphi (C)={\left|\frac{{\mathit{log}}_{2}(C)+b}{\gamma }-{\left|\frac{\gamma }{b}\right|}_{even}\right|}_{odd}$$ (7)

where \({\left|\cdot \right|}_{odd}\) denotes the nearest odd number, \({\left|\cdot \right|}_{even}\) denotes the nearest even number, and the coefficients \(\gamma\) and \(b\) are set to 2 and 1, respectively.
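A possible reading of this design in code is sketched below. How the average- and max-pooled descriptors are combined before the shared 1D convolution, and the role of the BN layer and the stride-2 setting, are not fully specified above, so those details (summation, and omitting BN/stride) are assumptions rather than the authors' exact implementation.

```python
import math
import torch
import torch.nn as nn

def nearest_even(x: float) -> int:
    return int(round(x / 2.0)) * 2

def nearest_odd(x: float) -> int:
    k = int(round(x))
    return k if k % 2 else k + 1

def leca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Kernel size K of Eq. (7): the ECA rule shrunk by the nearest even number of |gamma/b|."""
    t = (math.log2(channels) + b) / gamma - nearest_even(abs(gamma / b))
    return max(1, nearest_odd(abs(t)))

class LECA(nn.Module):
    """Sketch of the lightweight channel attention block described above.

    Global average- and max-pooled descriptors are combined (summed here, an
    assumption), passed through a shared 1D convolution of kernel size K, and
    gated with hard-sigmoid.
    """
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        k = leca_kernel_size(channels, gamma, b)
        self.avg = nn.AdaptiveAvgPool2d(1)
        self.max = nn.AdaptiveMaxPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.gate = nn.Hardsigmoid()

    def forward(self, x):                                # x: (N, C, H, W)
        def squeeze(p):                                  # (N, C, 1, 1) -> (N, 1, C)
            return p.squeeze(-1).transpose(1, 2)
        y = self.conv(squeeze(self.avg(x))) + self.conv(squeeze(self.max(x)))
        y = self.gate(y).transpose(1, 2).unsqueeze(-1)   # back to (N, C, 1, 1)
        return x * y

# Resulting K for a few typical channel widths with gamma = 2, b = 1.
print([leca_kernel_size(c) for c in (24, 48, 64, 128, 256)])
```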


The proposed network adopts the MBConv and Fused-MBConv modules of the EfficientNetV2 network, replacing all SE modules with LECA modules. Since depthwise convolution runs slowly in shallow layers, the Fused-MBConv module is applied in the lower structures, and a 3 × 3 convolution is selected to quickly extract signal features and match the size of the time–frequency map. To overcome the gradient vanishing problem, the smooth SiLU function is selected as the activation function. In the deep structure, the MBConv module is used. To reduce the negative impact of the dimension reduction in SE, the LECA module is integrated to evaluate the importance of different channel features, highlighting important features and inhibiting invalid ones. The multi-scale feature input is used to preserve more detailed features of the signals. Thus, the feature extraction capability and robustness of the model are enhanced. The detailed structure and parameters of the model are shown in Table 1. Next,


transfer learning is introduced into the research. In the process of transfer learning, the transfer effect of data from similar fields is better than that of two domains with significant


differences. Under different working conditions, the samples of gearbox components have certain similarities. Therefore, the transferred parameters can be used as a powerful set of features


to reduce the complexity and training time of the network. The fault diagnosis flow of the proposed method is shown in Fig. 5. The detailed implementation steps are as follows:

1. Sample pretreatment. Wavelet transform is performed on the original signals to obtain RGB images (224 × 224 × 3). All obtained samples are divided into training and validation sets, used respectively for training and for evaluating the final effect of the model. One working condition is used as the source domain and the other working conditions are used as the target domain. The gearbox datasets are constructed under different working conditions.

2. Model training and transfer. The convolutional neural network is constructed and trained. The model weights, learning rate and other parameters are determined according to the training results. The LECA module is used to extract the key features of the fault signals. The lower structure of the network, including the first convolutional layer and the four Fused-MBConv modules, is frozen, and the higher structure is fine-tuned with the small samples of different working conditions (see the sketch following these steps). This reduces the sample distribution difference caused by different working conditions.

3. Model application. Validation samples from the target domain are input into the trained model, and the Softmax classification layer outputs the results to complete gearbox fault diagnosis under varying working conditions. Furthermore, cross-component fault diagnosis is realized following the same procedure.
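A minimal PyTorch sketch of the freeze-and-fine-tune step in (2) is given below. The attribute names (`stem`, `fused_blocks`, `classifier`) are hypothetical placeholders for the structure described above, not the actual module names of the implementation.

```python
import torch
import torch.nn as nn

def prepare_for_transfer(model: nn.Module, num_classes: int = 5, lr: float = 0.01):
    """Freeze the lower structure and fine-tune the higher structure on target-domain samples."""
    # Freeze the first convolution layer and the four Fused-MBConv modules
    # (hypothetical attribute names for the lower structure).
    for module in (model.stem, model.fused_blocks):
        for p in module.parameters():
            p.requires_grad = False

    # Replace the classifier head for the five gearbox health states; it stays trainable.
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)

    # Only parameters that still require gradients (MBConv blocks + head) are optimized.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    return model, optimizer
```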


EXPERIMENTAL VERIFICATION

DATA INTRODUCTION

The gearbox dataset from Southeast University is used as the experimental fault diagnosis data in this paper; it was obtained on a drivetrain dynamic simulator (DDS)26. The platform is composed of a programmable controller, a variable-speed drive motor, a two-stage planetary gearbox and a two-stage parallel-shaft spur gearbox, etc. The dataset includes two sub-datasets: a bearing dataset and a gearbox dataset. Each sub-dataset has one health state and four fault states; the detailed fault types are shown in Table 2. The sample size of the source domain is 4000, which is used to train the model. Each fault type consists of 800 samples, divided into a training set and a validation set in a 4:1 ratio. The sample size of the target domain is 1000, which is used to verify the generalization and transfer effects with small samples. Each fault type consists of 200 samples, divided into a training set and a validation set in a 1:4 ratio. The detailed partitioning is shown in Table 2.
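For illustration, this partitioning can be reproduced in PyTorch roughly as follows; the random tensors stand in for the wavelet images, and the fixed seed is an assumption for reproducibility.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in tensors with the shapes described above (real data would be the wavelet images).
source_ds = TensorDataset(torch.randn(4000, 3, 224, 224), torch.randint(0, 5, (4000,)))
target_ds = TensorDataset(torch.randn(1000, 3, 224, 224), torch.randint(0, 5, (1000,)))

g = torch.Generator().manual_seed(0)                                      # reproducible split
src_train, src_val = random_split(source_ds, [3200, 800], generator=g)    # source domain, 4:1
tgt_train, tgt_val = random_split(target_ds, [200, 800], generator=g)     # target domain, 1:4
print(len(src_train), len(src_val), len(tgt_train), len(tgt_val))
```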


Wavelet transform is used to process the vibration signals, mapping the original signals to 2D space. The resulting wavelet time–frequency images of the bearing and gear are shown in Figs. 6 and 7.

PARAMETER SETTINGS

The entire experiment is performed under the Ubuntu 18.04 operating system, using Python 3.8 and the PyTorch 1.8 framework, and runs on a computer with an Intel(R) Xeon(R) Gold 6330 processor and an NVIDIA GeForce RTX 3090 GPU. During the experiments, Adaptive Moment Estimation (Adam) is used to update the training parameters of all models, cross-entropy is used to calculate the loss, and dropout is set to 0.2. Batch size affects generalization performance and convergence speed: too small a batch size strongly disturbs the training process, while too large a batch size cannot reach the ideal accuracy within a limited number of epochs35. Therefore, the batch size is set to 32. In the initial stage, a higher learning rate can quickly approach the optimal solution, and learning rate decay lets the model make large weight adjustments at the start of training and more precise adjustments near the optimal solution in later stages. The initial learning rate is set to 0.01, and a learning rate decay strategy is used to optimize the training process, with validation accuracy as the indicator: the learning rate is reduced when the accuracy no longer rises. The experimental variable settings are shown in Table 3.
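These settings can be expressed in PyTorch roughly as follows; `ReduceLROnPlateau` is an assumed choice for the accuracy-triggered decay, since the exact scheduler is not named.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model: nn.Module, train_loader: DataLoader, val_loader: DataLoader,
          epochs: int = 50, lr: float = 0.01, device: str = "cuda"):
    """Training setup described above: Adam, cross-entropy, accuracy-triggered LR decay."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # Assumed scheduler: reduce LR when validation accuracy (mode="max") stops improving.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max",
                                                           factor=0.1, patience=3)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                correct += (model(images).argmax(1) == labels).sum().item()
                total += labels.numel()
        scheduler.step(correct / total)   # validation accuracy as the monitored indicator
    return model
```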


EXPERIMENT AND RESULT ANALYSIS

EXPERIMENTAL VERIFICATION OF ATTENTION MECHANISM

To explore the fault diagnosis effectiveness of the EfficientNetV2 network integrated with LECA, the classification accuracy of EfficientNetV2 combined with SE, ECA and LECA, respectively, is compared and analyzed. Bearing and gear datasets under the 20 Hz-0 V working condition are used for verification. Each model is trained 10 times to mitigate the influence of random initial values, and the average is


taken. The comprehensive performance of the models is shown in Table 4. As Table 4 shows, the accuracies of the three models are all above 90% on both datasets, and the diagnostic performance of LECA-EfficientNetV2 is the best among all compared methods. The model accuracy rises after the multi-scale input is considered, and a small convolution kernel can extract richer features. The accuracy reaches 99.38% and 99.75%, respectively, an improvement of about 3% compared with the ECA module, which demonstrates the effectiveness of the LECA module. In terms of


fault diagnosis efficiency, SE-EfficientNetV2 has the longest diagnosis time, with an average iteration time of about 14.61 s. SE has two fully-connected layers and high computational costs.


The kernel size of the LECA module is smaller than that of ECA, and LECA-EfficientNetV2 has the shortest diagnosis time, about 13.57 s, as shown in Fig. 8. By comprehensive comparison, with 50


iterations, the LECA-EfficientNetV2 model has the shortest diagnosis time and the highest accuracy. To further compare the three models, Fig. 9 presents the validation accuracy curves of the two experimental subjects over 50 iterations. Compared with the networks based on SE and ECA, the accuracy of the LECA-EfficientNetV2 model remains above 97% after about ten iterations, which further demonstrates that the model integrated with the LECA module can better complete gearbox fault diagnosis. To further explore the reasons for the misclassification of samples,


this paper presents the validation results of the bearing and gear data under the 20 Hz-0 V working condition in the form of confusion matrices, as shown in Fig. 10. The recognition capacity of SE-EfficientNetV2 on Ball and Surface faults is relatively poor, and the classification of Comb and Miss faults by ECA-EfficientNetV2 needs to be improved. In contrast,


LECA-EfficientNetV2 network greatly improves the recognition accuracy on different samples. Only a few samples are misclassified. Some Normal samples are misclassified as Comb on the bearing


dataset, and a few Miss samples are classified as Root on the gear dataset. The recognition precision reaches almost 100% on the other fault types, which again demonstrates the excellent fault feature


learning capacity of the LECA-EfficientNetV2 network. Therefore, the subsequent research establishes a transfer learning network based on LECA-EfficientNetV2.

EXPERIMENTAL VERIFICATION OF TRANSFER LEARNING FAULT DIAGNOSIS

The previous experiments were conducted on a dataset with sufficient fault samples. To achieve high-accuracy fault diagnosis for untrained


small samples, the transfer learning method is introduced and different types of transfer learning fault diagnosis tasks are set. After the network is trained on the source domain dataset, the lower structures of the network are frozen and the higher structures are fine-tuned with 250 target domain samples. Then 750 validation samples from the target domain are used to


estimate the classification capacity. To verify the performance of the proposed network, the fault diagnosis results are compared with seven other models, namely Vgg1336, ResNet5037,


MobileNetV3-L38, EfficientNetV1-b039, EfficientNetV2-S33, GhostNetV240 and FasterNet-T241. The above models are trained in the same way as the LECA-EfficientNetV2 network. To ensure the reliability of


experimental results, the average of 10 experiments is taken as the result. The detailed transfer experiments are shown in Table 5. T1 represents the transfer of bearing fault diagnosis knowledge learned from the source domain (20 Hz-0 V) to the target domain (30 Hz-2 V). T3 represents the transfer of bearing diagnosis knowledge learned from the source domain (20 Hz-0 V) to the gear target


domain (20 Hz-0 V). The classification accuracy of the eight transfer learning fault diagnosis models is shown in Fig. 11. The results show that the proposed model has stronger transfer feature learning ability than the other models. Its accuracy is 99.27% and 99.63% in T1 and T2, respectively, which proves the model is effective in diagnosing small-sample faults under variable working conditions. The accuracy in T3 and T4 is 99.15% and 99.02%, so the method can achieve cross-component fault diagnosis with good generalization. Among the compared methods, the accuracy of the EfficientNet series networks is lower than that of the proposed method in all four tasks, which shows that combining multi-scale feature input with the LECA module yields richer signal information. The accuracy of FasterNet-T2 is similar to that of GhostNetV2, both above 97%, while Vgg13 has poor generalization ability and the worst diagnostic effect. The above results demonstrate that the method can


effectively extract fault features under different working conditions and components. The fault diagnosis time of the eight transfer learning models is shown in Fig. 12. The proposed model achieves the shortest diagnosis time in all four tasks: 9.73 s, 9.58 s, 9.92 s and 9.79 s, respectively. It can therefore realize fast fault diagnosis under varying working conditions and across components. In addition to training time, FLoating-point Operations (FLOPs) and Parameters (Params) are usually regarded as indicators of the complexity


of the model. FLOPs denotes the number of floating-point operations and Params the number of model parameters35. The complexity of the eight networks is shown in Table 6. LECA-EfficientNetV2 has the fewest parameters, although its FLOPs are not the lowest, which may be related to the network depth, the different convolutions and other parameter settings. Combined with the fault diagnosis time, LECA-EfficientNetV2 meets the requirements for a lightweight model. To observe the distribution of the fault data intuitively, this paper uses the t-SNE method to visualize the classification process of the bearing data in the T1 task. The detailed feature distribution is shown in Fig. 13. The dimensionality reduction visualization shows that, before classification, the feature distributions of the various fault signals are clearly mixed and difficult to distinguish; as training progresses, five relatively distinct clusters emerge at the fully-connected layer.
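A typical way to produce such a visualization with scikit-learn is sketched below; the `forward_features` hook used to take the penultimate-layer output is a hypothetical attribute, not part of the described model.

```python
import numpy as np
import torch
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

@torch.no_grad()
def tsne_plot(model, loader, device="cuda"):
    """Project penultimate-layer features of validation samples to 2-D with t-SNE."""
    model.eval().to(device)
    feats, labels = [], []
    for images, y in loader:
        # Assumed: model.forward_features returns pooled features before the classifier head.
        f = model.forward_features(images.to(device))
        feats.append(f.flatten(1).cpu().numpy())
        labels.append(y.numpy())
    feats, labels = np.concatenate(feats), np.concatenate(labels)

    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(feats)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)
    plt.title("t-SNE of fully-connected layer features (T1 task)")
    plt.show()
```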


CONCLUSION

This paper proposes a new gearbox fault diagnosis method based on a lightweight channel attention mechanism and transfer learning. The method addresses the poor fault diagnosis performance caused by large sample distribution differences and limited samples. The bearing and gear datasets are used to verify the classification and generalization capacity of the proposed model. The conclusions are as follows. (1) LECA-EfficientNetV2 achieves 99.38%


and 99.75% accuracy on bearing and gear samples, respectively. The fault diagnosis time is 13.57 s and 13.22 s. Compared with SE-EfficientNetV2 and ECA-EfficientNetV2, LECA-EfficientNetV2


has the best diagnostic effect on both datasets. It can extract more detailed features and effectively complete gearbox fault diagnosis. (2) The transfer learning experiments show that LECA-EfficientNetV2 has the best diagnostic performance and generalization ability under different gearbox working conditions and components. The computational cost shows that the proposed method meets the requirement for a lightweight model. The proposed method can realize fast and accurate classification of gearbox faults, which is of great significance for solving the small-sample problem in practical engineering applications. This paper explores only two components, and the validation datasets are completely balanced, whereas in industrial environments the sample imbalance problem is prominent. The following aspects will be explored in the future: (1) further expanding the application scope of LECA-EfficientNetV2 to enhance model generalization ability; (2) further studying the model performance on imbalanced datasets while maintaining high accuracy.

DATA AVAILABILITY

The data may be available from the corresponding author upon request.


REFERENCES

1. Yan, X., She, D., Xu, Y. & Jia, M. Deep regularized variational autoencoder for intelligent fault diagnosis of rotor-bearing system within entire life-cycle process. _Knowl-Based. Syst._ 226, 107142 (2021).
2. Wang, K. & Qin, F. Fault diagnosis of gearbox based on Fourier Bessel EWT and manifold regularization ELM. _Sci. Rep.-UK_ 13(1), 14486 (2023).
3. Mishra, R. K., Choudhary, A., Fatima, S., Mohanty, A. R. & Panigrahi, B. K. A fault diagnosis approach based on 2D-vibration imaging for bearing faults. _J. Vib. Eng. Technol._ 11(7), 3121–3134 (2022).
4. Dou, S., Liu, Y., Du, Y., Wang, Z. & Jia, X. Research on feature extraction and diagnosis method of gearbox vibration signal based on VMD and ResNeXt. _Int. J. Comput. Int. Sys._ 16(1), 119 (2023).
5. Liu, Y., Dan, B., Yi, C., Huang, T. & Zhang, F. Self-matching extraction fractional wavelet transform for mechanical equipment fault diagnosis. _Meas. Sci. Technol._ 35(3), 035102 (2024).
6. Zhao, W., Lv, Y., Liu, J., Lee, C. K. M. & Tu, L. Early fault diagnosis based on reinforcement learning optimized-SVM model with vibration-monitored signals. _Qual. Eng._ 35(4), 696–711 (2023).
7. Jalali, A., Farsi, H. & Ghaemmaghami, S. A universal image steganalysis system based on double sparse representation classification (DSRC). _Multimed. Tools. Appl._ 77, 16347–16366 (2018).
8. Chen, W., Hsu, S. & Shen, H. Application of SVM and ANN for intrusion detection. _Comput. Oper. Res._ 32(10), 2617–2634 (2005).
9. Zhu, Z. et al. A review of the application of deep learning in intelligent fault diagnosis of rotating machinery. _Measurement_ 206, 112346 (2023).
10. Lu, Y., Mi, J., Liang, H., Cheng, Y. & Bai, L. Intelligent fault diagnosis of rotating machinery based on a novel lightweight convolutional neural network. _Proc. Inst. Mech. Eng. O-J. Risk Reliab._ 236(4), 554–569 (2022).
11. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. _Science_ 313(5786), 504–507 (2006).
12. Ruan, D., Han, J., Yan, J. & Gühmann, C. Light convolutional neural network by neural architecture search and model pruning for bearing fault diagnosis and remaining useful life prediction. _Sci. Rep.-UK_ 13(1), 5484 (2023).
13. Yan, X., Liu, Y., Xu, Y. & Jia, M. Multistep forecasting for diurnal wind speed based on hybrid deep learning model with improved singular spectrum decomposition. _Energy Convers. Manag._ 225, 113456 (2020).
14. Ye, M., Yan, X., Chen, N. & Jia, M. Intelligent fault diagnosis of rolling bearing using variational mode extraction and improved one-dimensional convolutional neural network. _Appl. Acoust._ 202, 109143 (2023).
15. Long, Y., Zhou, W. & Luo, Y. A fault diagnosis method based on one-dimensional data enhancement and convolutional neural network. _Measurement_ 180, 109532 (2021).
16. Yan, X., She, D. & Xu, Y. Deep order-wavelet convolutional variational autoencoder for fault identification of rolling bearing under fluctuating speed conditions. _Expert Syst. Appl._ 216, 119479 (2023).
17. Zhang, J., Zhang, Q., Qin, X. & Sun, Y. Robust fault diagnosis of quayside container crane gearbox based on 2D image representation in frequency domain and CNN. _Struct. Health. Monit._ 23(1), 324–342 (2024).
18. Chen, A., Li, X., Jing, H., Hong, C. & Li, M. Anomaly detection algorithm for photovoltaic cells based on lightweight multi-channel spatial attention mechanism. _Energies_ 16(4), 1619 (2023).
19. Zhu, J., Jiang, Q., Shen, Y., Xu, F. & Zhu, Q. Res-HSA: Residual hybrid network with self-attention mechanism for RUL prediction of rotating machinery. _Eng. Appl. Artif. Intel._ 124, 106491 (2023).
20. Zhao, Y., Chen, J., Xu, X., Lei, J. & Zhou, W. SEV-Net: Residual network embedded with attention mechanism for plant disease severity detection. _Concurr. Comp-Pract. E._ 33(10), e6161 (2021).
21. Li, J., Liu, Y. & Li, Q. Intelligent fault diagnosis of rolling bearings under imbalanced data conditions using attention-based deep learning method. _Measurement_ 189, 110500 (2022).
22. Liu, S., Huang, J., Ma, J. & Luo, J. SRMANet: Toward an interpretable neural network with multi-attention mechanism for gearbox fault diagnosis. _Appl. Sci._ 12(16), 8388 (2022).
23. Zhao, Z., Jiao, Y. & Zhang, X. A fault diagnosis method of rotor system based on parallel convolutional neural network architecture with attention mechanism. _J. Signal. Process. Syst._ 95(8), 965–977 (2023).
24. Ding, Y. et al. Deep imbalanced domain adaptation for transfer learning fault diagnosis of bearings under multiple working conditions. _Reliab. Eng. Syst. Safe._ 230, 108890 (2023).
25. Li, M., Peng, P., Zhang, J., Wang, H. & Shen, W. SCCAM: Supervised contrastive convolutional attention mechanism for ante-hoc interpretable fault diagnosis with limited fault samples. _IEEE Trans. Neural Netw. Learn._ 1–12. https://doi.org/10.1109/TNNLS.2023.3313728 (2023).
26. Shao, S., McAleer, S., Yan, R. & Baldi, P. Highly accurate machine fault diagnosis using deep transfer learning. _IEEE Trans. Ind. Inform._ 15(4), 2446–2455 (2019).
27. Zheng, Z., Fu, J., Lu, C. & Zhu, Y. Research on rolling bearing fault diagnosis of small dataset based on a new optimal transfer learning network. _Measurement_ 177, 109285 (2021).
28. Dong, Y., Li, Y., Zheng, H., Wang, R. & Xu, M. A new dynamic model and transfer learning based intelligent fault diagnosis framework for rolling element bearings race faults: Solving the small sample problem. _ISA Trans._ 121, 327–348 (2022).
29. Yu, D., Fu, H., Song, Y., Xie, W. & Xie, Z. Deep transfer learning rolling bearing fault diagnosis method based on convolutional neural network feature fusion. _Meas. Sci. Technol._ 35(1), 015013 (2023).
30. He, S., Zhu, L., Li, H., Hu, C. & Bao, J. Cross-condition quantitative diagnosis method for bearing faults based on IDRSN-ECDAN. _Meas. Sci. Technol._ 35(2), 025129 (2024).
31. Li, H. et al. Fault diagnosis of planetary gears based on intrinsic feature extraction and deep transfer learning. _Meas. Sci. Technol._ 34(1), 014009 (2023).
32. Zhong, H. et al. Fine-tuning transfer learning based on DCGAN integrated with self-attention and spectral normalization for bearing fault diagnosis. _Measurement_ 210, 112421 (2023).
33. Tan, M. & Le, Q. EfficientNetV2: Smaller models and faster training. In _International Conference on Machine Learning_, 10096–10106 (2021).
34. Wang, Q. et al. ECA-Net: Efficient channel attention for deep convolutional neural networks. In _Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition_, 11534–11542 (2020).
35. Huang, Y., Liao, A., Hu, D., Shi, W. & Zheng, S. Multi-scale convolutional network with channel attention mechanism for rolling bearing fault diagnosis. _Measurement_ 203, 111935 (2022).
36. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In _International Conference on Learning Representations_, 1–14 (2014).
37. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In _Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition_, 770–778 (2016).
38. Howard, A. et al. Searching for MobileNetV3. In _Proceedings of the IEEE/CVF International Conference on Computer Vision_, 1314–1324 (2019).
39. Tan, M. & Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In _International Conference on Machine Learning_, 6105–6114 (2019).
40. Tang, Y. et al. GhostNetV2: Enhance cheap operation with long-range attention. _Adv. Neural Inf. Process. Syst._ 35, 9969–9982 (2022).
41. Chen, J. et al. Run, don't walk: Chasing higher FLOPS for faster neural networks. In _Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition_, 12021–12031 (2023).


ACKNOWLEDGEMENTS

The authors would like to thank the editor and reviewers for the valuable comments and suggestions.

FUNDING

This work was supported by the Beijing Municipal Education Commission & Beijing Natural Science Foundation Co-financing Project (Grant Number KZ202210015019) and the Project of Construction and Support for High-level Innovative Teams of Beijing Municipal Institutions (Grant Number BPHR20220107).

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

Department of Mechanical and Electrical Engineering, Beijing Institute of Graphic Communication, Beijing, 102600, China: Xuemin Cheng, Shuihai Dou, Yanping Du & Zhaohua Wang

CONTRIBUTIONS

Study conception and design: X.C., Y.D.; data collection: X.C., Z.W.; analysis and interpretation of results: X.C., S.D.; draft manuscript preparation: X.C., S.D. All authors reviewed the results and approved the final version of the manuscript.


CORRESPONDING AUTHOR

Correspondence to Shuihai Dou.

ETHICS DECLARATIONS

COMPETING INTERESTS

The authors declare no competing interests.

ADDITIONAL INFORMATION

PUBLISHER'S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

RIGHTS AND PERMISSIONS

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Cheng, X., Dou, S., Du, Y. et al. Gearbox fault diagnosis method based on lightweight channel attention mechanism and transfer learning. _Sci Rep_ 14, 743 (2024). https://doi.org/10.1038/s41598-023-50826-6

Received: 30 October 2023. Accepted: 26 December 2023. Published: 07 January 2024. DOI: https://doi.org/10.1038/s41598-023-50826-6

