1- Introduction:
Music separation and classification are critical tasks in signal processing, particularly in the music domain. In general, singing voice separation systems have applications in a variety of areas, including automatic word recognition and alignment, singer identification, music information retrieval, karaoke, musical genre identification, melody extraction, audio signal categorization, and so on [1].
Currently, genre hierarchies, which are typically built manually by human specialists, are one way to organize music information on the Web [2]. This method, which may be automated via automatic musical genre classification, is an important component of a comprehensive music information retrieval system for audio data, and a framework also exists for developing and analysing musical content features [3]. Most of the audio analysis algorithms proposed for music employ feature extraction for similarity retrieval, classification, segmentation, and audio thumbnailing. Our research is a first step toward identifying and evaluating various types of Arabic music and isolating the Arabic singing voice from the background music. The rest of the paper is organized as follows: Section 2 reviews related work, Section 3 presents the proposed system, Section 4 covers the automatic classification and evaluation of the proposed features, and Section 5 gives conclusions and future work.
2- Related Work:
Analysis of the musical jins is one of the most common tasks in Music Information Retrieval (MIR). Much work exists on this task, but it has been limited to Western music, and work on Arabic music is nearly non-existent. This section presents studies related to separating the singing voice from the music background, the first step of the analysis process, and then to determining music genres, the second step.
In [4], bowing movements are used to impose additional constraints on the activation entries of an audio dictionary built with non-negative matrix factorization (NMF). However, this method is tested only on randomly produced video sequences of string instruments in which each player's distinct bowing actions are clearly captured.
In [5], similar work is proposed in which audio and visual correspondences are learned without supervision to aid source separation. This approach produces good audiovisual separation results for musical instrument performances, but does not separate vocals.
In [6], the author articulates a generic formalism for source model adaptation in a Bayesian framework. The suggested approach is then tested on the task of separating voice and music in popular songs. According to the results, the adaptation technique consistently and substantially improves separation performance.
In [7], after focusing on the fluctuation of the singing voice, a technique for enhancing the singing voice in monaural music signals is built on two-stage harmonic/percussive sound separation applied to spectrograms of different resolutions. The effectiveness of this technique relies heavily on the pitch estimation parameter.
In [8], the author suggested a novel unsupervised technique for singing voice separation that uses a rank-1 constraint to improve RPCA. The mixter and DSD100 datasets were employed in the experiments, and the findings demonstrated that CRPCA outperforms regular RPCA and WRPCA, especially when time-frequency masking is used.
In [9], Huang et al. started from the assumption that the music accompaniment is repetitive and varies little, whereas the singing voice is more varied but sparse in the audio mixture. Based on these assumptions, they applied the Robust Principal Component Analysis (RPCA) technique, which decomposes a matrix into low-rank and sparse components.
In [10], the work identified three types of folk music: German, Irish, and Austrian. To investigate statistical differences across the folk traditions, the dataset was evaluated and compared using several Hidden Markov Model (HMM) architectures. The classification performance averaged 77%.
In [11], Support Vector Machines (SVM), as a statistical machine learning approach, performed better than HMM at music genre classification (MGC).
In [12], frequency-domain properties and low-level features were selected using a genetic algorithm. For comparison, different classification approaches such as Naïve Bayes (NB), K-nearest neighbors (KNN), and SVM were applied.
In [13], the researchers used samples from the GTZAN dataset to distinguish between ten genres and achieved a classification accuracy of 80.1%.
Many different classifiers and feature extraction algorithms are used for automatic MGC; the classifiers include the Gaussian mixture model (GMM) [14], radial basis function (RBF) networks [15], AdaBoost [16], and semi-supervised methods [17].
3- The Proposed System:
This study discusses an Arabic music analysis system that employs artificial intelligence techniques and plays an essential role in music information retrieval (MIR). Our analysis system has two primary steps: extracting the melody by isolating the singing voice from the music background, and then analysing the melody to determine the genres that make it up.
A. Singing Voice Separation Using CRPCA
The suggested CRPCA, the first part of our analysis system, is discussed in this section. Although RPCA has been applied effectively to the singing voice separation problem, in minimizing the nuclear norm it overlooks the differing importance of the singular values and the computational cost of the SVD [18]. To address these two problems, CRPCA, a partial-sum minimization of singular values based on a rank-1 constraint, is applied. CRPCA minimizes the partial sum of singular values by fully exploiting a prior rank-1 constraint [19]; the constraint reflects the observation that, in many songs, the background music spans a richer range than the singing voice. Figure 1 depicts the stages of the CRPCA separation technique.
Figure 1. Block diagram of the CRPCA technique for the separation process
The proposed Arabic music database is the first database built for separating Arabic songs. It contains 100 Arabic songs ranging in length from 90 to 120 seconds and includes Arab singers such as Umm Kulthum, Abdel Halim, Najah, and others. The clips were collected from the Internet.
CRPCA is a variant of RPCA that separates singing voices using the rank-1 constraint. The model is defined as follows [20]:
Minimize ||Sx||∗ + λx ||Sy||1    (1)
Subject to Sx + Sy = Mx
where Mx, Sx, Sy ∈ R^(n1×n2); || · ||∗ denotes the nuclear norm (the sum of singular values) and || · ||1 the L1 norm (the sum of the absolute values of the matrix entries); λx > 0 is a trade-off parameter between the low-rank part L and the sparse part S, with λk = k / max(n1, n2).
Algorithm 1 (CRPCA) is used to separate the singing voice. M is the music signal derived from the observed audio data. After decomposition with CRPCA, we obtain a low-rank matrix L (the music accompaniment) and a sparse matrix S (the singing voice) [21].
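To make the decomposition in Eq. (1) concrete, the following is a minimal NumPy sketch of a plain RPCA solver using the inexact augmented Lagrange multiplier scheme with singular value thresholding. It is only an illustration: it omits the partial-sum / rank-1 refinement that distinguishes CRPCA from RPCA, and the parameter choices (λ, μ, ρ, tolerance) are conventional defaults rather than values taken from this paper. Applied to a magnitude spectrogram M, it returns the low-rank part L (accompaniment) and the sparse part S (voice) referred to above.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage operator."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_inexact_alm(M, lam=None, tol=1e-7, max_iter=500):
    """Decompose M into a low-rank part L and a sparse part S (plain RPCA).

    A CRPCA-style solver would additionally penalize only the singular
    values beyond a prior target rank of the accompaniment.
    """
    n1, n2 = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))        # conventional RPCA trade-off
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    norm_M = np.linalg.norm(M, 'fro')
    mu = 1.25 / np.linalg.norm(M, 2)            # penalty parameter
    rho = 1.5                                   # penalty growth factor
    Y = M / max(np.linalg.norm(M, 2), np.abs(M).max() / lam)   # dual variable
    for _ in range(max_iter):
        # low-rank update: singular value thresholding of M - S + Y/mu
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # sparse update: element-wise soft thresholding
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # dual ascent and penalty update
        Z = M - L - S
        Y = Y + mu * Z
        mu = rho * mu
        if np.linalg.norm(Z, 'fro') / norm_M < tol:
            break
    return L, S
```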
To improve separation performance, we apply ideal binary time-frequency masking (IBM) after the CRPCA separation. Mibm is defined as follows [22]:
Mibm(m, f) = 1 if |S(m, f)| > |L(m, f)|, and 0 otherwise    (2)
where S and L are the separated sparse (vocal) and low-rank (accompaniment) spectrograms, m indexes the time frame, and f the frequency bin.
Figure 2. A comparison of RPCA, WRPCA, CRPCA, and CRPCA with IBM in terms of SDR, SIR, and NSDR on our dataset
We separate the test dataset using CRPCA, applying a short-time Fourier transform (STFT) before decomposition and an inverse STFT (ISTFT) afterwards, and we employ the IBM technique to improve the separation results. Finally, the low-rank matrix L (accompanying music) and the sparse matrix S (singing voice) are obtained.
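As a rough sketch of this pipeline (not the exact implementation used in the paper), the code below applies an STFT to a mixture, decomposes its magnitude spectrogram with the plain-RPCA sketch above standing in for CRPCA, applies the binary mask of Eq. (2), and resynthesises both sources with the ISTFT. The file names, the 16 kHz rate, and the FFT and hop sizes are illustrative assumptions.

```python
import numpy as np
import librosa
import soundfile as sf

# load a mixed Arabic song clip (path and 16 kHz rate are illustrative)
y, sr = librosa.load("mixture.wav", sr=16000, mono=True)

# STFT of the mixture: keep magnitude and phase separately
D = librosa.stft(y, n_fft=1024, hop_length=256)
mag, phase = np.abs(D), np.angle(D)

# low-rank (accompaniment) / sparse (voice) decomposition of the magnitudes;
# rpca_inexact_alm is the plain-RPCA sketch above, standing in for CRPCA
L, S = rpca_inexact_alm(mag)

# ideal binary mask, Eq. (2): 1 where the sparse part dominates
mask = (np.abs(S) > np.abs(L)).astype(mag.dtype)

# resynthesise the two sources using the mixture phase
voice = librosa.istft(mask * mag * np.exp(1j * phase), hop_length=256)
accomp = librosa.istft((1.0 - mask) * mag * np.exp(1j * phase), hop_length=256)

sf.write("voice_estimate.wav", voice, sr)
sf.write("accompaniment_estimate.wav", accomp, sr)
```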
B. Arabic Music Genre Classification Using Support Vector Classification
Genre is one of the primary features that distinguishes music according to a certain set of patterns. Arabic music genres on the Internet are poorly defined, which makes computerized categorization of Arabic audio genres difficult. The first stage of this project is to build a well-annotated dataset that includes thirteen of the most well-known Arabic music genres.
Figure 3. The framework of the genre classification step
The large corpus of the Arabic Music Genre (AMG) dataset was developed by recording various audio snippets. The dataset contains thirteen distinct genre classes. Each music composition is 60 to 120 seconds long and is saved as a WAV audio file; the complete dataset occupies about 1200 MB.
Figures 4 and 5 illustrate the waveforms of two AMG genres, rast and nahwand, with time on the x-axis and amplitude on the y-axis. The visual representations of the two genres vividly demonstrate the difficulty of differentiating one from the other.
Figure 4. The Nahwand genre's spectrogram
Figure 5. Spectrogram of the Rast genre
The power spectrogram was created using the STFT with a sample rate (sr) of 44100 Hz.
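For reference, a power spectrogram of this kind can be computed as in the short sketch below; only the 44100 Hz sample rate comes from the text, while the clip name and the decibel conversion for display are illustrative choices.

```python
import numpy as np
import librosa

# load one genre clip at the sample rate stated above (file name is illustrative)
y, sr = librosa.load("nahwand_clip.wav", sr=44100)

# power spectrogram: squared magnitude of the STFT
S_power = np.abs(librosa.stft(y)) ** 2

# convert to decibels for display, as in Figures 4 and 5
S_db = librosa.power_to_db(S_power, ref=np.max)
```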
Davis et al. [24] proposed the Mel-Frequency Cepstral Coefficients (MFCCs) in 1980. MFCCs summarize the frequency distribution over a window, allowing analysis of both the time and frequency characteristics of the sound, and they remain one of the most widely used feature extraction methods.
Figure 6. MFCC spectrogram
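MFCC features of the kind shown in Figure 6 can be extracted with librosa roughly as follows; the 13 coefficients and the per-clip mean/standard-deviation summary are common defaults, not values specified in the paper.

```python
import numpy as np
import librosa

# load one clip from the AMG dataset (path is illustrative)
y, sr = librosa.load("rast_clip.wav", sr=44100)

# frame-wise MFCCs: one 13-dimensional vector per analysis window
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# summarize the clip as a fixed-length feature vector (mean and std per coefficient)
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```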
SVM is a supervised learning method that learns, from a specific training set, a decision function that best characterizes the training data. The method uses a hyperplane as the decision boundary that separates data points of different classes, and it is commonly employed in pattern recognition and classification. The hyperplane equation is defined as [26]:
w · x + b = 0    (3)
where w is the weight vector and b is the bias.
In this work, a multi-class SVM divides the research sample into 35 classes. Several techniques are available for handling multiple classes; here, a "one vs. rest" scheme is employed to separate the items of one class from those of all other classes [27]. The hyperplane that best distinguishes each class from the remaining elements is computed, so the N classes are handled by a set of binary classifiers, each trained to identify one class against the rest. SVM seeks the optimal separating hyperplane (i.e., a decision boundary separating instances of one class from those of another), and with a suitable nonlinear mapping, data from two classes can always be separated by a hyperplane. The SVM method is accurate because it can model complex, nonlinear decision boundaries [28].
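A one-vs-rest multi-class SVM of this kind can be assembled with scikit-learn roughly as sketched below; X and y are assumed to be the per-clip feature vectors and genre labels from the MFCC step, and the RBF kernel and 80/20 split are illustrative choices rather than the paper's settings.

```python
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# X: per-clip feature vectors, y: genre labels (assumed from the MFCC step above)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# one binary SVM per genre, each trained to separate that genre from the rest
clf = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale")))
clf.fit(X_train, y_train)

# recognition rate on the held-out clips
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```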
4- Experimental Results:
Our analysis system has been applied to two datasets. The first is an Arabic song database for singing voice separation, which contains male and female vocalists with a sampling rate of 16 kHz and audio clip lengths ranging from 90 to 120 seconds; the data were taken from Arabic music tracks that include more than two background instruments. The second is an Arabic music genre database for genre identification that comprises 35 classes of Arabic music genres.
CRPCA produces better separation results in terms of SDR, SIR, and NSDR. Table 1 displays the execution time of each technique on two identical datasets. With CRPCA, a prior target rank can be used to separate a singing voice from a mixed music signal.
GNSDR is computed by averaging the NSDRs over all mixtures, weighted by track length, and the separation quality is evaluated in terms of the source-to-artefact ratio (SAR), source-to-distortion ratio (SDR), and source-to-interference ratio (SIR) using BSS-EVAL [29]. GNSDR is defined as follows:
GNSDR = Σ(n=1..N) wn · NSDRn / Σ(n=1..N) wn
where N is the total number of tracks, wn is the length of track n, and n is a song index.
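These per-track metrics can be obtained with the BSS-EVAL implementation in the mir_eval package, and GNSDR then follows as the length-weighted average of the per-track NSDR values; the helpers below are a sketch under that assumption, with NSDR taken as the SDR improvement of the separated voice over the raw mixture.

```python
import numpy as np
import mir_eval

def track_metrics(ref_voice, ref_accomp, est_voice, est_accomp):
    """Per-track SDR/SIR/SAR of the voice estimate via BSS-EVAL."""
    refs = np.vstack([ref_voice, ref_accomp])
    ests = np.vstack([est_voice, est_accomp])
    sdr, sir, sar, _ = mir_eval.separation.bss_eval_sources(refs, ests)
    return sdr[0], sir[0], sar[0]          # index 0 = voice channel

def nsdr(ref_voice, est_voice, mixture):
    """NSDR: SDR improvement of the voice estimate over the raw mixture."""
    sdr_est, _, _, _ = mir_eval.separation.bss_eval_sources(
        np.atleast_2d(ref_voice), np.atleast_2d(est_voice))
    sdr_mix, _, _, _ = mir_eval.separation.bss_eval_sources(
        np.atleast_2d(ref_voice), np.atleast_2d(mixture))
    return float(sdr_est[0] - sdr_mix[0])

def gnsdr(nsdr_values, track_lengths):
    """GNSDR: per-track NSDRs averaged with track-length weights."""
    w = np.asarray(track_lengths, dtype=float)
    return float(np.sum(w * np.asarray(nsdr_values)) / np.sum(w))
```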
Table 1. Evaluation of 4 song files (90 s duration) separated with CRPCA

Song name | SDR (dB) | SIR (dB) | SAR (dB) | GNSDR (dB)
Ya msafer wahdak | 1.7360 | 5.0676 | 30.8097 | 2.6610
shaghuly f alhubi badr | 2.2495 | 7.0665 | 22.1440 | 1.2282
ana f intizarak | 2.0895 | 6.7545 | 16.408 | 3.2028
bent el balad | 1.0568 | 4.9963 | 20.4936 | 2.4452
According to the experimental results, pitch combined with MFCC is the most effective choice for extracting music signal features, and MFCC is a good feature to employ for individual recordings. The classification accuracy achieved by SVM reaches 99% in terms of the recognition rate defined below.
Figure 6.4. Experimental results of identifying the music genre in the first 40 seconds of the Arabic song shaghuly f alhubi badr
The recommended approach was tested on our dataset. The trial findings in terms of SDR, SIR, and NSDR clearly show that CRPCA achieves better separation results. The genre analysis and classification step was evaluated with the recognition rate defined in the following equation [30]:
Recognition rate = (number of correctly classified samples / total number of samples) × 100
The evaluation results indicate the effectiveness of the proposed system, with the accuracy reaching 99%.
5- Conclusions and Future Work:
The classification of musical genres is difficult given the large amount of new data produced in an unstructured manner, which makes it a topic of interest in Music Information Retrieval. Furthermore, certain musical genres are formed of similar sorts of instruments with similar rhythmic patterns, adding to the difficulty. In the future, we hope to create an automatic music genre classifier based on a deep learning model that could be included in automatic music information retrieval systems or applications that benefit from music genre information.