Linguistix: Speaker Recognition System
CSL2050: Pattern Recognition and Machine Learning
IIT Jodhpur
Project Summary
Linguistix: Speaker Recognition System explores speaker identification through classical machine learning methods. The project applies a diverse range of supervised models—including K-Nearest Neighbors, Support Vector Machines, Decision Trees, Artificial Neural Networks, Multi-layer Perceptron, and Naïve Bayes—as well as unsupervised techniques such as K-Means clustering and Gaussian Mixture Models (GMM).
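To make the unsupervised side concrete, here is a minimal sketch of GMM-based speaker identification: one Gaussian mixture is fit per speaker, and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood. The `features` dictionary and its random contents are placeholders for the project's actual feature extraction, not the repository's code.

```python
# Minimal sketch (not the repo's exact code): one GMM per speaker,
# identification by maximum average log-likelihood over test frames.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder for per-speaker acoustic features (e.g. MFCC frames):
# {speaker_id: array of shape (n_frames, n_features)}.
features = {s: rng.normal(loc=s, size=(200, 13)) for s in range(3)}

def train_speaker_gmms(features, n_components=8):
    """Fit one GaussianMixture on each speaker's feature frames."""
    models = {}
    for speaker, X in features.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        models[speaker] = gmm.fit(X)
    return models

def identify(models, X_test):
    """Return the speaker whose GMM best explains the test frames."""
    return max(models, key=lambda s: models[s].score(X_test))

models = train_speaker_gmms(features)
print(identify(models, rng.normal(loc=1, size=(50, 13))))  # expected: 1
```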
To enhance speaker representation and reduce model complexity, dimensionality reduction techniques like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are used. Ensemble strategies are also integrated, including ensemble GMMs for robust likelihood estimation and tree-based ensembles to address overfitting in decision trees.
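The sketch below shows the two reduction steps on placeholder data; the random `X_train`/`y_train` arrays stand in for the extracted speaker features. Note that LDA is supervised and yields at most one fewer dimension than the number of speakers, which is part of why it separates classes so effectively here.

```python
# Sketch of the two dimensionality reduction steps, on placeholder data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 40))      # placeholder feature matrix
y_train = rng.integers(0, 10, size=300)   # placeholder speaker labels

pca = PCA(n_components=0.95)              # keep 95% of total variance
X_pca = pca.fit_transform(X_train)        # unsupervised projection

lda = LinearDiscriminantAnalysis()        # supervised; <= n_classes - 1 dims
X_lda = lda.fit_transform(X_train, y_train)

print(X_pca.shape, X_lda.shape)           # e.g. (300, k) and (300, 9)
```

An ensemble GMM in this setting would pool log-likelihoods from several independently initialized GMM fits, while the tree-based ensembles correspond to the Bagging and AdaBoost rows in the results table below.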
A detailed experimental pipeline shows that the choice of preprocessing matters as much as the model: LDA in particular lifts KNN, Bayesian learning, and CNN classifiers to roughly 99.8% test accuracy, while PCA gives more mixed results across models. SVM, KNN, ANN, and Bayesian classifiers all perform strongly under their best-suited preprocessing.
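The sketch below shows one plausible shape of such a pipeline, an LDA step feeding an SVM, evaluated with the same train/validation/test accuracy breakdown used in the table that follows. The synthetic data and the 60/20/20 split are assumptions for illustration, not the project's actual configuration.

```python
# Sketch: LDA + SVM pipeline with the train/val/test reporting used below.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
y = rng.integers(0, 5, size=600)          # placeholder speaker labels
X = rng.normal(size=(600, 40))
X[:, :5] += y[:, None]                    # make some features class-informative

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

for split, (Xs, ys) in {"train": (X_train, y_train),
                        "validation": (X_val, y_val),
                        "test": (X_test, y_test)}.items():
    print(f"{split} accuracy: {accuracy_score(ys, clf.predict(Xs)):.2%}")
```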
The GitHub repository includes the feature extraction pipeline, model training and evaluation scripts, ensemble implementations, and preprocessed datasets, along with the contributor list and tech stack for transparency and reproducibility.
In summary, Linguistix presents a thorough and scalable approach to speaker recognition, integrating classical ML techniques with dimensionality reduction and ensemble learning for improved performance and interpretability.
Performance Summary of Various Models
Model | Train Acc (%) | Validation Acc (%) | Test Acc (%) |
---|---|---|---|
KNN (With LDA) | 99.89 | 100.00 | 99.80 |
KNN (With PCA) | 89.58 | 76.98 | 79.08 |
KNN (Entire Dataset) | 98.27 | 95.22 | 94.63 |
SVM (With FS) | 98.95 | 99.24 | 100.00 |
SVM (With PCA) | 98.12 | 99.21 | 99.40 |
SVM (With LDA) | 91.43 | 97.78 | 95.56 |
Bayesian Learning (Entire Dataset) | 95.96 | 75.00 | 74.70 |
Bayesian Learning (With FS) | 96.87 | 79.76 | 82.27 |
Bayesian Learning (With LDA) | 99.89 | 100.00 | 99.80 |
Bayesian Learning (With PCA) | 52.49 | 50.25 | 48.11 |
Decision Tree (Entire Dataset) | 100.00 | 63.35 | 61.43 |
Decision Tree (With PCA) | 100.00 | 14.34 | 14.12 |
Decision Tree (With LDA) | 100.00 | 14.34 | 14.12 |
Decision Tree (With PCA + LDA) | 100.00 | 63.35 | 61.43 |
Decision Tree (With LDA + t-SNE) | 99.60 | 1.86 | 1.59 |
Decision Tree (With LDA + UMAP) | 99.94 | 84.88 | 88.59 |
Decision Tree (With t-SNE) | 15.77 | 1.86 | 2.65 |
Decision Tree (With UMAP) | 16.51 | 11.41 | 12.20 |
Decision Tree (Raw Features) | 16.79 | 8.22 | 10.61 |
AdaBoost | 20.78 | 18.92 | 19.28 |
SAMME | 7.37 | 7.17 | 7.16 |
Bagging | 100.00 | 81.08 | 82.70 |
K-Means (With LDA) | 88.61 | 92.61 | 87.67 |
K-Means (With LDA + PCA) | 91.03 | 94.03 | 87.67 |
K-Means (Raw Features) | 8.68 | 9.09 | 37.27 |
ANN (With FS) | 99.72 | 80.79 | 82.42 |
ANN (With PCA) | 88.75 | 80.23 | 72.17 |
CNN (Raw Features) | 96.20 | 89.80 | 91.85 |
CNN (With LDA) | 99.88 | 99.75 | 99.80 |
CNN (With PCA) | 97.32 | 54.48 | 59.84 |
CNN (With LDA + PCA) | 100.00 | 99.75 | 99.80 |
Interactive Web-App
Spotlight Video
Our Team
- Vyankatesh Deshpande (B23CS1079)
- Shashank Parchure (B23CM1059)
- Atharva Honparkhe (B23EE1006)
- Abhinash Roy (B23CS1003)
- Namya Dhingra (B23CS1040)
- Damarasingu Akshaya Sree (B23EE1085)
Acknowledgment
We would like to express our heartfelt appreciation to Dr. Anand Mishra for giving us the opportunity to work on this project.