Linguistix: Speaker Recognition System

CSL2050: Pattern Recognition and Machine Learning

Vyankatesh Deshpande¹, Shashank Parchure¹, Atharva Honparkhe¹, Abhinash Roy¹, Namya Dhingra¹, Damarasingu Akshaya Sree¹

¹IIT Jodhpur

Project Summary

Linguistix: Speaker Recognition System explores speaker identification through classical machine learning methods. The project applies a diverse range of supervised models—including K-Nearest Neighbors, Support Vector Machines, Decision Trees, Artificial Neural Networks, Multi-layer Perceptron, and Naïve Bayes—as well as unsupervised techniques such as K-Means clustering and Gaussian Mixture Models (GMM).
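
As a rough illustration of one of the supervised models listed above, the sketch below implements a K-Nearest Neighbors majority vote in plain Python. The two-dimensional feature vectors, the speaker labels `spk_a` and `spk_b`, and the function name `knn_predict` are hypothetical and are not taken from the project repository; real speaker features would have many more dimensions.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Rank training indices by Euclidean distance to the query vector.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    # Count the labels of the k closest points and keep the most frequent one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy features for two speakers (illustration only).
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (2.0, 2.1), (2.2, 1.9), (1.9, 2.3)]
train_y = ["spk_a", "spk_a", "spk_a", "spk_b", "spk_b", "spk_b"]

pred_a = knn_predict(train_X, train_y, (0.1, 0.1))
pred_b = knn_predict(train_X, train_y, (2.1, 2.0))
print(pred_a, pred_b)
```

Each query is assigned to the speaker whose training points dominate its three nearest neighbors.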

To enhance speaker representation and reduce model complexity, dimensionality reduction techniques like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are used. Ensemble strategies are also integrated, including ensemble GMMs for robust likelihood estimation and tree-based ensembles to address overfitting in decision trees.
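
To make the idea behind LDA concrete, here is a minimal two-class sketch of Fisher's linear discriminant, which projects features onto the direction w = S_w^{-1}(mu_1 - mu_0). It is an illustration under assumptions, not the project's code: the toy data, the restriction to two speakers and two dimensions, and all function names are hypothetical.

```python
def mean_vector(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(2)]


def scatter_matrix(points, mu):
    """2x2 scatter matrix: sum of (x - mu)(x - mu)^T over one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s


def fisher_direction(class0, class1):
    """Fisher direction w = Sw^{-1} (mu1 - mu0) for two 2-D classes."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, mu0), scatter_matrix(class1, mu1)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[s0[r][c] + s1[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter is singular")
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # Apply the closed-form inverse of the 2x2 matrix sw to diff.
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (sw[0][0] * diff[1] - sw[1][0] * diff[0]) / det,
    ]


def project(point, w):
    """Scalar projection of a 2-D point onto direction w."""
    return point[0] * w[0] + point[1] * w[1]


# Hypothetical toy features for two speakers (illustration only).
speaker0 = [(1.0, 2.2), (2.0, 2.9), (3.0, 4.1), (4.0, 4.8)]
speaker1 = [(2.0, 0.8), (3.0, 2.1), (4.0, 2.9), (5.0, 4.2)]

w = fisher_direction(speaker0, speaker1)
proj0 = [project(p, w) for p in speaker0]
proj1 = [project(p, w) for p in speaker1]
print(w, max(proj0), min(proj1))
```

In this toy data neither coordinate separates the two speakers on its own, while the single projected value does, which is the kind of class-aware reduction LDA provides.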

A detailed experimental pipeline shows that LDA preprocessing gives the clearest gains: KNN, Bayesian learning, and CNN each reach 99.80% test accuracy with LDA, against 94.63%, 74.70%, and 91.85% without dimensionality reduction. PCA is less consistent, giving 99.40% with SVM but 79.08% with KNN and 48.11% with Bayesian learning. SVM with FS records the highest test accuracy at 100.00%.

The GitHub repository includes the feature extraction pipeline, model training and evaluation scripts, ensemble implementations, and preprocessed datasets. Contributors and the tech stack used are also provided for transparency and reproducibility.

In summary, Linguistix presents a thorough and scalable approach to speaker recognition, integrating classical ML techniques with dimensionality reduction and ensemble learning for improved performance and interpretability.

Performance Summary of Various Models

Model                                 Train Acc (%)  Validation Acc (%)  Test Acc (%)
KNN (With LDA)                        99.89          100.00              99.80
KNN (With PCA)                        89.58          76.98               79.08
KNN (Entire Dataset)                  98.27          95.22               94.63
SVM (With FS)                         98.95          99.24               100.00
SVM (With PCA)                        98.12          99.21               99.40
SVM (With LDA)                        91.43          97.78               95.56
Bayesian Learning (Entire Dataset)    95.96          75.00               74.70
Bayesian Learning (With FS)           96.87          79.76               82.27
Bayesian Learning (With LDA)          99.89          100.00              99.80
Bayesian Learning (With PCA)          52.49          50.25               48.11
Decision Tree (Entire Dataset)        100.00         63.35               61.43
Decision Tree (With PCA)              100.00         14.34               14.12
Decision Tree (With LDA)              100.00         14.34               14.12
Decision Tree (With PCA + LDA)        100.00         63.35               61.43
Decision Tree (With LDA + t-SNE)      99.60          1.86                1.59
Decision Tree (With LDA + UMAP)       99.94          84.88               88.59
Decision Tree (With t-SNE)            15.77          1.86                2.65
Decision Tree (With UMAP)             16.51          11.41               12.20
Decision Tree (Raw Features)          16.79          8.22                10.61
AdaBoost                              20.78          18.92               19.28
SAMME                                 7.37           7.17                7.16
Bagging                               100.00         81.08               82.70
K-Means (With LDA)                    88.61          92.61               87.67
K-Means (With LDA + PCA)              91.03          94.03               87.67
K-Means (Raw Features)                8.68           9.09                37.27
ANN (With FS)                         99.72          80.79               82.42
ANN (With PCA)                        88.75          80.23               72.17
CNN (Raw Features)                    96.20          89.80               91.85
CNN (With LDA)                        99.88          99.75               99.80
CNN (With PCA)                        97.32          54.48               59.84
CNN (With LDA + PCA)                  100.00         99.75               99.80

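
One way to read the table is through the gap between train and test accuracy, a simple indicator of overfitting. The sketch below computes that gap for four rows whose values are copied from the table above; the choice of rows and the helper name `generalization_gap` are ours, for illustration only.

```python
# (train %, test %) pairs copied from the performance table.
results = {
    "Decision Tree (Entire Dataset)": (100.00, 61.43),
    "Bagging": (100.00, 82.70),
    "KNN (With LDA)": (99.89, 99.80),
    "CNN (With PCA)": (97.32, 59.84),
}


def generalization_gap(train_acc, test_acc):
    """Train accuracy minus test accuracy, in percentage points."""
    return round(train_acc - test_acc, 2)


gaps = {name: generalization_gap(tr, te) for name, (tr, te) in results.items()}

# List models from the largest gap (most overfit) to the smallest.
ranked = sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
for name, gap in ranked:
    print(f"{name}: {gap:.2f} points")
```

On these four rows the single decision tree shows the largest gap (38.57 points) and Bagging narrows it to 17.30 points, which fits the use of tree-based ensembles against overfitting.
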
Top 3 Models

Interactive Web-App

Spotlight Video

Our Team

Vyankatesh Deshpande (B23CS1079)
Shashank Parchure (B23CM1059)
Atharva Honparkhe (B23EE1006)
Abhinash Roy (B23CS1003)
Namya Dhingra (B23CS1040)
Damarasingu Akshaya Sree (B23EE1085)

Acknowledgment

We would like to express our heartfelt appreciation to Dr. Anand Mishra for granting us the privilege to contribute to this project.
For any queries, please contact Vyankatesh Deshpande or raise an issue on GitHub.