Overview of the entire WaveNet architecture. The dotted box marks one residual module; multiple such modules are stacked in the network, and K is the layer index. In each hidden layer, the layer's input is added to the output of its activation function (a residual connection), and the sum is passed to the next layer. A 1 × 1 convolution is used to reduce the number of channels. The activation outputs of all hidden layers are then summed (skip connections), passed through a series of further operations, and sent to the output layer. The output layer uses softmax to compute, for each sample point, a probability distribution over its possible values.
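
The data flow described in this caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation behind the figure: the gated tanh/sigmoid activation, the kernel size of 2, the doubling dilations, the ReLU and 1 × 1 output stage, the 256 output classes, and all names (`residual_block`, `wavenet_forward`, `C_RES`, `C_HID`, `C_SKIP`) are choices made here for illustration. The input is assumed to be already mapped to `C_RES` channels, and the weights are random.

```python
import numpy as np


def causal_dilated_conv(x, w, dilation):
    """Kernel-size-2 causal convolution. x: (T, C_in), w: (2, C_in, C_out)."""
    T, c_in = x.shape
    # Left-pad with zeros so the output at time t sees only x[t] and x[t - dilation].
    padded = np.concatenate([np.zeros((dilation, c_in)), x], axis=0)
    past = padded[:T]  # row t holds x[t - dilation], or zeros near the start
    return past @ w[0] + x @ w[1]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def residual_block(x, params, dilation):
    """One residual module: returns (input to the next layer, skip contribution)."""
    f = np.tanh(causal_dilated_conv(x, params["w_filter"], dilation))
    g = sigmoid(causal_dilated_conv(x, params["w_gate"], dilation))
    z = f * g  # gated activation output (assumed form of the activation)
    skip = z @ params["w_skip"]  # 1 x 1 convolution: C_HID -> C_SKIP channels
    out = x + z @ params["w_res"]  # 1 x 1 convolution back to C_RES, plus the original input
    return out, skip


def wavenet_forward(x, blocks, w_out1, w_out2):
    skip_sum = 0.0
    for params, dilation in blocks:
        x, skip = residual_block(x, params, dilation)
        skip_sum = skip_sum + skip  # add up the skip outputs of every layer
    # Assumed output stage: ReLU, 1 x 1 conv, ReLU, 1 x 1 conv, softmax.
    h = np.maximum(skip_sum, 0.0) @ w_out1
    logits = np.maximum(h, 0.0) @ w_out2
    return softmax(logits)  # (T, N_CLASSES): one distribution per sample point


# Hypothetical sizes and random weights, for illustration only.
T, C_RES, C_HID, C_SKIP, N_CLASSES = 16, 8, 12, 10, 256
rng = np.random.default_rng(0)


def init_block():
    return {
        "w_filter": 0.1 * rng.standard_normal((2, C_RES, C_HID)),
        "w_gate": 0.1 * rng.standard_normal((2, C_RES, C_HID)),
        "w_skip": 0.1 * rng.standard_normal((C_HID, C_SKIP)),
        "w_res": 0.1 * rng.standard_normal((C_HID, C_RES)),
    }


blocks = [(init_block(), 2 ** k) for k in range(4)]  # dilations 1, 2, 4, 8
w_out1 = 0.1 * rng.standard_normal((C_SKIP, C_SKIP))
w_out2 = 0.1 * rng.standard_normal((C_SKIP, N_CLASSES))

x = rng.standard_normal((T, C_RES))
probs = wavenet_forward(x, blocks, w_out1, w_out2)
```

Each row of `probs` is the softmax output for one sample point. Because every operation is either pointwise in time or causal, changing the input at time t leaves the distributions at all earlier times unchanged.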

