The task of translating between programming languages differs from the challenge of translating natural languages in that programming languages are designed with a far more rigid set of structural and grammatical rules. Previous work has used a tree-to-tree encoder/decoder model to take advantage of the inherent tree structure of programs during translation. Neural decoders, however, by default do not exploit known grammar rules of the target language. In this paper, we describe a tree decoder that leverages knowledge of a language's grammar rules to exclusively generate syntactically correct programs. We find that this grammar-based tree-to-tree model outperforms the state-of-the-art tree-to-tree model in translating between two programming languages on a previously used synthetic task.
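One way to read "exclusively generate syntactically correct programs" is as grammar-constrained decoding: at every expansion step, the decoder's scores are masked so that only productions legal for the current nonterminal can be selected. The following is a minimal Python sketch with a toy expression grammar; the grammar, the `decode_tree` helper, and the stand-in score function are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Toy grammar: each nonterminal maps to its allowed productions.
# (Hypothetical grammar for illustration; the paper targets real programming languages.)
GRAMMAR = {
    "Expr": [("Expr", "+", "Expr"), ("Expr", "*", "Expr"), ("Num",)],
    "Num":  [("0",), ("1",), ("2",)],
}
PRODUCTIONS = [(nt, rhs) for nt, rules in GRAMMAR.items() for rhs in rules]

def decode_tree(nonterminal, score_fn, max_depth=5):
    """Expand `nonterminal` top-down, always picking a grammar-legal production."""
    if max_depth == 0:
        return nonterminal                          # truncate runaway recursion in this toy
    scores = score_fn(nonterminal)                  # one raw score per production
    mask = np.array([nt == nonterminal for nt, _ in PRODUCTIONS])
    scores = np.where(mask, scores, -np.inf)        # illegal productions can never be chosen
    _, rhs = PRODUCTIONS[int(np.argmax(scores))]
    return [decode_tree(sym, score_fn, max_depth - 1) if sym in GRAMMAR else sym
            for sym in rhs]

# Stand-in for the neural decoder's per-production scores.
rng = np.random.default_rng(0)
tree = decode_tree("Expr", lambda _: rng.normal(size=len(PRODUCTIONS)))
print(tree)  # a nested list in which every expansion follows a GRAMMAR production
```

In the paper's setting the production scores would come from the learned tree decoder; the masking step is what guarantees that every generated program parses under the target language's grammar.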
Machine Learning, Optimization, and Data Science, 2022
Highly parameterized deep neural networks are known to have strong data-memorization capability, but does this ability to memorize random data also extend to simple standard learning methods with few parameters? Following recent work exploring memorization in deep learning, we investigate memorization in standard non-neural learning models through the label recorder method, which uses a model's training accuracy on randomized data to estimate its memorization ability, giving a distribution- and regularization-dependent label recording score. Label recording scores can be used to measure how capacity changes in response to regularization and other hyperparameter choices. This method is fully empirical, easy to implement, and works for all black-box classification methods. The label recording score supplements existing theoretical measures of model capacity such as Rademacher complexity and Vapnik-Chervonenkis (VC) dimension, while agreeing with conventional intuitions regarding statistical learning processes. We find that memorization ability is not limited to over-parameterized models, but instead exists as a continuum, being present (to some degree) even in simple learning models with few parameters.
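Because the method is fully empirical and black-box, it is easy to sketch: shuffle the labels, fit the model, and read off its training accuracy. The snippet below is a minimal illustration using scikit-learn on a synthetic dataset; the `label_recording_score` helper and the choice of decision trees are our own stand-ins, not the paper's code or data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def label_recording_score(make_model, X, y, n_trials=5, seed=0):
    """Average training accuracy on randomly permuted labels.

    A score near 1.0 means the model can memorize arbitrary labelings of X;
    a score near chance level (~1/n_classes) means it effectively cannot.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        y_random = rng.permutation(y)              # destroy any real X -> y relationship
        model = make_model()
        model.fit(X, y_random)
        scores.append(model.score(X, y_random))    # accuracy on the *training* data
    return float(np.mean(scores))

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Capacity (and hence memorization ability) varies with regularization strength:
for depth in (2, 5, None):                         # None = unlimited tree depth
    score = label_recording_score(lambda d=depth: DecisionTreeClassifier(max_depth=d), X, y)
    print(f"max_depth={depth}: label recording score = {score:.2f}")
```

Varying a regularization-style hyperparameter such as `max_depth` shows the continuum described in the abstract: even small, heavily constrained models record some fraction of random labels, while unconstrained ones record nearly all of them.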
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Noise suppression models running in production environments are commonly trained on publicly available datasets. However, this approach leads to regressions in production environments due to the lack of training/testing on representative customer data. Moreover, due to privacy reasons, developers cannot listen to customer content. This 'ears-off' situation motivates augmenting existing datasets in a privacy-preserving manner. In this paper, we present Aura, a solution to make existing noise suppression test sets more challenging and diverse while limiting the sampling budget. Aura is 'ears-off' because it relies on a feature extractor and a metric of speech quality, DNSMOS P.835, both pre-trained on data obtained from public sources. As an application of Aura, we augment a current benchmark test set in noise suppression by sampling audio files from a new batch of 20K clean speech clips from Librivox mixed with noise clips obtained from AudioSet. Aura makes the existing benchmark test set harder by 100% in DNSMOS P.835, achieves a 26% improvement in Spearman's rank correlation coefficient (SRCC) compared to random sampling, and identifies 73% out-of-distribution samples to augment the test set.
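The abstract does not spell out Aura's sampling procedure, so the sketch below only illustrates the general idea it describes: selecting hard, diverse clips under a fixed budget from pre-computed embeddings and DNSMOS P.835 scores, without anyone listening to the audio. The `select_clips` helper, its weighting scheme, and the placeholder inputs are all assumptions for illustration, not Aura's algorithm.

```python
import numpy as np

def select_clips(embeddings, dnsmos_scores, budget, hardness_weight=0.5):
    """Greedily pick `budget` clips that are hard (low DNSMOS) and diverse.

    embeddings: (N, D) array from a pre-trained feature extractor.
    dnsmos_scores: (N,) predicted speech quality (lower = harder for the model).
    """
    hardness = 1.0 - (dnsmos_scores - dnsmos_scores.min()) / (np.ptp(dnsmos_scores) + 1e-9)
    chosen = [int(np.argmax(hardness))]                 # seed with the hardest clip
    dists = np.linalg.norm(embeddings - embeddings[chosen[0]], axis=1)
    while len(chosen) < budget:
        diversity = dists / (dists.max() + 1e-9)        # distance to the nearest chosen clip
        score = hardness_weight * hardness + (1 - hardness_weight) * diversity
        score[chosen] = -np.inf                         # never pick the same clip twice
        nxt = int(np.argmax(score))
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen

# Placeholder inputs standing in for embeddings/scores of Librivox + AudioSet mixtures.
rng = np.random.default_rng(0)
emb = rng.normal(size=(2000, 128))
mos = rng.uniform(1.0, 5.0, size=2000)
print(select_clips(emb, mos, budget=50)[:10])
```

The key property this sketch shares with the paper's setting is that selection operates only on model-derived features and predicted quality scores, which is what keeps the process 'ears-off'.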