2021 May 5;11(1):9614.
doi: 10.1038/s41598-021-88799-z.

Deep convolution stack for waveform in underwater acoustic target recognition


Shengzhao Tian et al. Sci Rep.

Abstract

In underwater acoustic target recognition, deep learning methods have proved effective at recognizing targets from the original signal waveform. Previous methods often use large convolutional kernels to extract features at the start of the network, which leads to shallow, structurally imbalanced networks: the power of nonlinear transformation that depth provides is not fully exploited. The deep convolution stack is a flexible and structurally balanced network framework, yet it has not been well explored in underwater acoustic target recognition, even though it has proven effective in other deep learning fields. In this paper, a multiscale residual unit (MSRU) is proposed as the building block of a deep convolution stack network. Based on MSRU, a multiscale residual deep neural network (MSRDN) is presented to classify underwater acoustic targets. A dataset acquired in a real-world scenario is used to verify the proposed unit and model. The validity of MSRU is further demonstrated by adding it to generative adversarial networks. Finally, MSRDN achieves the best recognition accuracy of 83.15%, an improvement of 6.99% over structurally related networks that take the original signal waveform as input and 4.48% over networks that take a time-frequency representation as input.


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
The basic unit of DRSN. In each convolution layer, c is the number of output channels, k is the kernel size, and s is the stride.
Figure 2
The structure of the multiscale residual unit (MSRU). Two hyper-parameters, C and S, determine the output shape. The input has shape [B, C_in, L], where B is the batch size, C_in the channel number, and L the data length; the output has shape [B, C×4, L/S]. The hyper-parameter S is usually set to 1 or 2. The parallel multiscale convolution module consists of four convolutional layers with different kernel sizes followed by a channel concatenation. The soft threshold learning module consists of a global average pooling layer and two fully connected nonlinear transformation layers. In each convolution layer, c is the number of output channels, k is the kernel size, and s is the stride. All padding is “same”.
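The caption above can be sketched as a minimal NumPy forward pass. Everything below is an illustrative stand-in, not the paper's implementation: the random weights, the kernel sizes (3, 5, 7, 9), and the threshold rule (derived here from a global average of channel magnitudes, in place of the learned fully connected module) are assumptions made only to show the shape behavior [B, C_in, L] → [B, C×4, L/S] and the soft-thresholding operation.

```python
import numpy as np

def conv1d_same(x, w, stride=1):
    # x: [B, C_in, L], w: [C_out, C_in, k]; "same" padding, so L_out = ceil(L / stride)
    B, C_in, L = x.shape
    C_out, _, k = w.shape
    L_out = -(-L // stride)  # ceiling division
    pad_total = max((L_out - 1) * stride + k - L, 0)
    pl = pad_total // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pl, pad_total - pl)))
    out = np.zeros((B, C_out, L_out))
    for t in range(L_out):
        seg = xp[:, :, t * stride: t * stride + k]          # [B, C_in, k]
        out[:, :, t] = np.tensordot(seg, w, axes=([1, 2], [1, 2]))
    return out

def soft_threshold(x, tau):
    # shrink toward zero by threshold tau: sign(x) * max(|x| - tau, 0)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def msru_forward(x, C=8, S=2, kernel_sizes=(3, 5, 7, 9),
                 rng=np.random.default_rng(0)):
    # four parallel convolutions, each producing C channels, concatenated to 4C
    branches = []
    for k in kernel_sizes:
        w = rng.standard_normal((C, x.shape[1], k)) * 0.1   # random stand-in weights
        branches.append(conv1d_same(x, w, stride=S))
    y = np.concatenate(branches, axis=1)                    # [B, 4C, ceil(L/S)]
    # stand-in for the soft threshold learning module: a channel-wise threshold
    # computed from global average pooling of |y| (the paper learns it instead)
    tau = np.abs(y).mean(axis=2, keepdims=True) * 0.1       # [B, 4C, 1]
    return soft_threshold(y, tau)

x = np.random.default_rng(1).standard_normal((2, 1, 64))    # [B=2, C_in=1, L=64]
y = msru_forward(x, C=8, S=2)
print(y.shape)  # (2, 32, 32): channels 4*C = 32, length L/S = 32
```

The residual shortcut of the unit is omitted here to keep the sketch focused on the two modules the caption names.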
Figure 3
The structure of the multiscale residual deep neural network (MSRDN). The input has shape [B, 1, L], where B is the batch size and L the data length; the output has shape [B, Class_N], where Class_N is the number of predicted categories. The head of the network consists of four parallel convolutions with different kernel sizes; the main body of MSRDN is a stack of MSRUs. According to the value of the hyper-parameter C, the MSRUs are divided into four convolution stacks, which can be connected directly to one another thanks to the independence and flexibility of MSRU.
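Given the per-unit shape rule from Figure 2 (input [B, C_in, L], output [B, C×4, L/S]), the dimensions flowing through a stack of MSRUs can be tracked with a small helper. The channel counts and strides below are hypothetical values chosen only to illustrate how four convolution stacks compose; they are not the paper's configuration.

```python
def msru_out_shape(in_shape, C, S):
    # shape rule of one MSRU: [B, C_in, L] -> [B, 4*C, ceil(L/S)] ("same" padding)
    B, _, L = in_shape
    return (B, 4 * C, -(-L // S))

# hypothetical configuration: one stride-S unit per stack, C doubling per stack
shape = (16, 1, 16384)                       # [B, 1, L] raw waveform input
for C, S in [(16, 2), (32, 2), (64, 2), (128, 2)]:
    shape = msru_out_shape(shape, C, S)
print(shape)  # (16, 512, 1024): 4*128 channels, length 16384 / 2**4
```

Because each unit's output shape depends only on its own C and S, stacks can be chained in any order, which is the "independence and flexibility" the caption refers to.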
Figure 4
Time-frequency representation of each category.
Figure 5
The loss curves of the discriminators and generators. (a) Generator loss of BigGAN and MSBigGAN. (b) Discriminator loss of BigGAN and MSBigGAN. (c) Generator loss of WaveGAN and MSRWaveGAN. (d) Discriminator loss of WaveGAN and MSRWaveGAN. A smoothing function is applied with a smoothing factor of 0.8 in all subfigures.

