While recent work has shown that scores from models trained by the ubiquitous masked language modeling (MLM) objective effectively discriminate probable from improbable sequences, it is still an open question whether these MLMs specify a principled probability distribution over the space of possible sequences. In this paper, we interpret MLMs as energy-based sequence models and propose two energy parametrizations derivable from the trained MLMs. In order to draw samples correctly from these models, we develop a tractable sampling scheme based on the Metropolis-Hastings Monte Carlo algorithm. In our approach, samples are proposed from the same masked conditionals used for training the masked language models, and they are accepted or rejected based on their energy values according to the target distribution. We validate the effectiveness of the proposed parametrizations by exploring the quality of samples drawn from these energy-based models for both open-ended unconditional generation and a conditional generation task of machine translation. We theoretically and empirically justify our sampling algorithm by showing that the masked conditionals on their own do not yield a Markov chain whose stationary distribution is that of our target distribution, and our approach generates higher quality samples than other recently proposed undirected generation approaches (Wang and Cho, 2019; Ghazvininejad et al., 2019).
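As a concrete illustration of the sampling scheme this abstract describes, here is a minimal sketch of one Metropolis-Hastings step. The functions `mlm_conditional` (the MLM's distribution over a masked position) and `energy` (one of the proposed parametrizations) are hypothetical placeholders for this sketch, not the paper's actual API.

```python
import math
import random

def mh_step(seq, mlm_conditional, energy):
    """One Metropolis-Hastings step over a token sequence (a Python list)."""
    pos = random.randrange(len(seq))       # pick a position to resample
    probs = mlm_conditional(seq, pos)      # dict: token -> p(token | rest of seq)
    old_tok = seq[pos]
    new_tok = random.choices(list(probs), weights=list(probs.values()))[0]
    proposal = seq[:pos] + [new_tok] + seq[pos + 1:]
    # Acceptance ratio: target ratio exp(-E(x') + E(x)) times the proposal
    # correction q(x | x') / q(x' | x). Because position `pos` is masked when
    # computing the conditional, the same `probs` serves both directions.
    log_alpha = (-energy(proposal) + energy(seq)
                 + math.log(probs[old_tok]) - math.log(probs[new_tok]))
    if math.log(random.random()) < min(0.0, log_alpha):
        return proposal                    # accept the proposed sequence
    return seq                             # reject; keep the current state
```

The acceptance test is exactly what distinguishes this sampler from simply resampling positions from the masked conditionals, which, as the abstract notes, does not by itself target the intended distribution.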
Proceedings of the AAAI Conference on Artificial Intelligence
Beam search is a desirable choice of test-time decoding algorithm for neural sequence models because it potentially avoids search errors made by simpler greedy methods. However, typical cross-entropy training procedures for these models do not directly consider the behaviour of the final decoding method. As a result, for cross-entropy trained models, beam decoding can sometimes yield reduced test performance when compared with greedy decoding. In order to train models that can more effectively make use of beam search, we propose a new training procedure that focuses on the final loss metric (e.g. Hamming loss) evaluated on the output of beam search. While well-defined, this "direct loss" objective is itself discontinuous and thus difficult to optimize. Hence, in our approach, we form a sub-differentiable surrogate objective by introducing a novel continuous approximation of the beam search decoding procedure. In experiments, we show that optimizing this new training objective...
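A toy sketch of why the direct loss needs smoothing: below, a plain peaked softmax stands in for the paper's continuous beam-search relaxation (a deliberate simplification, since the full soft beam is more involved), and an expected Hamming loss replaces the discontinuous hard one. All code and names are ours.

```python
import torch

def expected_hamming(soft_choices, targets):
    """Expected Hamming loss when each decoding decision is a distribution.
    soft_choices: (seq_len, vocab) with rows summing to 1; targets: (seq_len,)."""
    p_correct = soft_choices[torch.arange(len(targets)), targets]
    return (1.0 - p_correct).sum()

scores = torch.randn(5, 10, requires_grad=True)   # per-step decoding scores
soft = torch.softmax(scores / 0.1, dim=-1)        # peaked softmax ~= argmax
loss = expected_hamming(soft, torch.tensor([1, 4, 2, 0, 7]))
loss.backward()   # gradients reach the decoding scores, unlike a hard argmax
```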
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent work on controlled text generation has either required attribute-based fine-tuning of the base language model (LM), or has restricted the parameterization of the attribute discriminator to be compatible with the base autoregressive LM. In this work, we propose Mix and Match LM, a global score-based alternative for controllable text generation that combines arbitrary pre-trained black-box models for achieving the desired attributes in the generated text without involving any fine-tuning or structural assumptions about the black-box models. We interpret the task of controllable generation as drawing samples from an energy-based model whose energy values are a linear combination of scores from black-box models that are separately responsible for fluency, the control attribute, and faithfulness to any conditioning context. We use a Metropolis-Hastings sampling scheme to sample from this energy-based model using bidirectional context and global attribute features. We validate the effectiveness of our approach on various controlled generation and style-based text revision tasks by outperforming recently proposed methods that involve extra training, fine-tuning, or restrictive assumptions over the form of models.
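The energy described here is just a weighted sum of black-box scores, which the following minimal sketch makes explicit; the scorer names are hypothetical, and sampling would proceed via a Metropolis-Hastings step like the one sketched earlier.

```python
def mix_and_match_energy(text, scorers, weights):
    """Energy of a candidate text: negative weighted sum of black-box
    log-scores, so lower energy means a better candidate."""
    return -sum(w * s(text) for w, s in zip(weights, scorers))

# Hypothetical usage, one scorer per desideratum named in the abstract:
# scorers = [fluency_score, attribute_score, faithfulness_score]
# e = mix_and_match_energy("some candidate text", scorers, [1.0, 0.5, 0.5])
```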
The wide adoption and application of masked language models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities. Prior attempts at measuring leakage of MLMs via membership inference attacks have been inconclusive, implying potential robustness of MLMs to privacy attacks. In this work, we posit that prior attempts were inconclusive because they based their attack solely on the MLM's model score. We devise a stronger membership inference attack based on likelihood ratio hypothesis testing that involves an additional reference MLM to more accurately quantify the privacy risks of memorization in MLMs. We show that masked language models are indeed susceptible to likelihood ratio membership inference attacks: Our empirical results, on models trained on medical notes, show that our attack improves the AUC of prior membership inference attacks from 0.66 to an alarmingly high 0.90 level.
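The core of the attack is a simple likelihood-ratio statistic, sketched below under our own naming assumptions: `score_target` and `score_reference` stand in for the (pseudo) log-likelihood of a sample under the target MLM and the reference MLM respectively.

```python
def membership_score(sample, score_target, score_reference):
    """Likelihood-ratio statistic: higher values give more evidence that
    `sample` was in the target model's training data."""
    return score_target(sample) - score_reference(sample)

def predict_member(sample, score_target, score_reference, threshold):
    # `threshold` would be calibrated on known non-members, e.g. to hit
    # a desired false-positive rate.
    return membership_score(sample, score_target, score_reference) > threshold
```

The reference model normalizes away how intrinsically likely a sample is, which is what score-only attacks lack.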
Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias. However, when considering high-capacity neural parametrizations that condition on the whole input sequence, both model classes are theoretically equivalent in terms of the distributions they are capable of representing. Thus, the practical advantage of global normalization in the context of modern neural methods remains unclear. In this paper, we attempt to shed light on this problem through an empirical study. We extend an approach for search-aware training via a continuous relaxation of beam search (Goyal et al., 2017b) in order to enable training of globally normalized recurrent sequence models through simple backpropagation. We then use this technique to conduct an empirical study of the interaction between global normalization, high-capacity encoders, and search-aware optimization. We observe that in the context of ine...
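For readers unfamiliar with the local/global distinction this abstract turns on, here is a toy sketch in our own notation: local normalization applies a softmax at every decoding step, while global normalization scores whole sequences against a single partition function, which is exactly what makes exact training intractable at realistic sizes.

```python
import itertools
import math

def local_logprob(step_scores, seq):
    """Locally normalized: a softmax at every step given the gold prefix.
    step_scores[t] maps each candidate token to its score after seq[:t]."""
    logp = 0.0
    for t, y in enumerate(seq):
        log_z = math.log(sum(math.exp(v) for v in step_scores[t].values()))
        logp += step_scores[t][y] - log_z
    return logp

def global_logprob(score_fn, seq, vocab, length):
    """Globally normalized: one partition function over all sequences.
    Enumerable only at toy sizes, which is why training requires
    approximations such as the continuous beam-search relaxation."""
    log_z = math.log(sum(math.exp(score_fn(c))
                         for c in itertools.product(vocab, repeat=length)))
    return score_fn(tuple(seq)) - log_z
```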
This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of “Linguistic Rapid Response” to potential emergency humanitarian relief situations. In the absence of large annotated corpora, parallel corpora, treebanks, bilingual lexica, etc., we found the following to be effective: exploiting distributional regularities in monolingual data, projecting information across closely related languages, and utilizing human linguist judgments. We show promising results on both a four-month exercise in Sorani and a two-day exercise in Tajik, achieved with minimal annotation costs.
An antenna is a metallic device for radiating and receiving radio waves. The characteristics of an antenna depend on the material chosen, the shape, the size, and the feed used to drive the antenna; these design choices in turn affect the gain, input impedance, radiation pattern, and other properties. This paper concentrates on the Yagi-Uda loop array antenna: it reviews and compares the characteristics of circular loop arrays and square loop arrays with the objective of assessing the suitability of both for the IEEE 802.11 Wi-Fi standard.
This paper contributes to a growing body of evidence that, when coupled with appropriate machine-learning techniques, linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data. In particular, we show that phonological features outperform character-based models. PanPhon is a database relating over 5,000 IPA segments to 21 subsegmental articulatory features. We show that this database boosts performance in various NER-related tasks. Phonologically aware, neural CRF models built on PanPhon features are able to perform better on monolingual Spanish and Turkish NER tasks than character-based models. They have also been shown to work well in transfer models (as between Uzbek and Turkish). PanPhon features also contribute measurably to orthography-to-IPA conversion tasks.
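To make the representational contrast concrete, the toy sketch below replaces one-hot characters with articulatory feature vectors as model input. The three-feature table is made up for illustration; PanPhon itself maps 5,000+ IPA segments to 21 features.

```python
FEATURES = {            # segment -> (syllabic, voiced, nasal); toy values
    "a": ( 1,  1, -1),
    "m": (-1,  1,  1),
    "t": (-1, -1, -1),
}

def featurize(word):
    """Feature-vector sequence for a word: the model input that replaces
    one-hot character encodings."""
    return [FEATURES[seg] for seg in word]

print(featurize("mat"))   # [(-1, 1, 1), (1, 1, -1), (-1, -1, -1)]
```

The point of such features is that phonologically similar segments get similar vectors, so information transfers across related segments and languages in a way one-hot encodings cannot support.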
In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition – capitalization – is absent, as the language’s Perso-Arabic script does not make a distinction between uppercase and lowercase letters. We describe a system for deriving an inferred capitalization value from closely related languages by phonological similarity, and illustrate the system using several related Western Iranian languages.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
We propose a deep and interpretable probabilistic generative model to analyze glyph shapes in printed Early Modern documents. We focus on clustering extracted glyph images into underlying templates in the presence of multiple confounding sources of variance. Our approach introduces a neural editor model that first generates well-understood printing phenomena like spatial perturbations from template parameters via interpretable latent variables, and then modifies the result by generating a non-interpretable latent vector responsible for inking variations, jitter, noise from the archiving process, and other unforeseen phenomena associated with Early Modern printing. Critically, by introducing an inference network whose input is restricted to the visual residual between the observation and the interpretably-modified template, we are able to control and isolate what the vector-valued latent variable captures. We show that our approach outperforms rigid interpretable clustering baselines (Ocular) and overly-flexible deep generative models (VAE) alike on the task of completely unsupervised discovery of typefaces in mixed-font documents.
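The key inference trick, isolated in a toy PyTorch module of our own (the paper's actual networks operate on glyph images, not flat vectors): the recognition network sees only the residual between the observation and the interpretably-modified template, so the unconstrained latent can only explain what the interpretable variables could not.

```python
import torch
import torch.nn as nn

class ResidualInference(nn.Module):
    """Toy version (ours) of the paper's restriction: the inference network
    sees only the residual x - x_template, not the raw observation x."""

    def __init__(self, dim, zdim):
        super().__init__()
        self.enc = nn.Linear(dim, zdim)
        self.dec = nn.Linear(zdim, dim)

    def forward(self, x, x_template):
        residual = x - x_template        # what interpretable variables missed
        z = self.enc(residual)           # latent can only explain the residual
        return x_template + self.dec(z)  # template plus a learned correction
```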
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2017
We demonstrate that a continuous relaxation of the argmax operation can be used to create a differentiable approximation to greedy decoding for sequence-to-sequence (seq2seq) models. By incorporating this approximation into the scheduled sampling training procedure (Bengio et al., 2015), a well-known technique for correcting exposure bias, we introduce a new training objective that is continuous and differentiable everywhere and that can provide informative gradients near points where previous decoding decisions change their value. In addition, by using a related approximation, we demonstrate a similar approach to sampling-based training. Finally, we show that our approach outperforms cross-entropy training and scheduled sampling procedures in two sequence prediction tasks: named entity recognition and machine translation.
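A minimal sketch of the relaxation, assuming a peaked softmax as the continuous stand-in for argmax (the code and names below are ours, not the paper's release):

```python
import torch

def soft_argmax_embedding(logits, emb, temperature=0.1):
    """Differentiable stand-in for emb(argmax(logits)): a peaked softmax over
    the vocabulary followed by an expected embedding lookup."""
    return torch.softmax(logits / temperature, dim=-1) @ emb.weight

# Scheduled-sampling-style choice of the next decoder input (toy): with
# probability p feed the gold token's embedding, otherwise feed the soft
# embedding of the model's own prediction, keeping everything differentiable.
emb = torch.nn.Embedding(10, 8)
logits = torch.randn(10, requires_grad=True)
gold = torch.tensor(3)
p = 0.5
next_input = torch.where(torch.rand(()) < p,
                         emb(gold),
                         soft_argmax_embedding(logits, emb))
```

As the temperature shrinks, the soft embedding approaches the hard greedy choice while still passing gradients to earlier decoding decisions.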
Proceedings of the Workshop on Structured Prediction for NLP, 2016
We propose a multi-task learning objective for training joint structured prediction models when no jointly annotated data is available. We use conditional random fields as the joint predictive model and train their parameters by optimizing the marginal likelihood of all available annotations, with additional posterior constraints on the distributions of the latent variables imposed to enforce agreement. Experiments on named entity recognition and part-of-speech tagging show that the proposed model outperforms independent task estimation, and the posterior constraints provide a useful mechanism for incorporating domain-specific knowledge.
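A schematic of the objective's shape, under our own assumptions: the paper trains CRFs with posterior constraints, whereas the toy below simply adds a KL agreement penalty between two tasks' posterior marginals to their negative marginal log-likelihoods. The specific penalty form is our illustrative choice, not the paper's.

```python
import torch
import torch.nn.functional as F

def multitask_loss(loglik_ner, loglik_pos, post_ner, post_pos, lam=1.0):
    """Negative marginal log-likelihood of each task's available annotations
    plus a posterior-agreement penalty between the tasks' latent marginals.
    post_*: (n, k) rows of posterior marginals over shared latent decisions."""
    agreement = F.kl_div(post_ner.log(), post_pos, reduction="batchmean")
    return -(loglik_ner + loglik_pos) + lam * agreement
```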
2015 International Conference on Signal Processing and Communication (ICSC), 2015
In this paper, a compact 3D U-shaped Yagi-Uda nantenna with different high-permittivity and high-permeability materials is studied, and designs are compared as the spacing between the elements is varied. Modern communication demands high-speed data processing at nanometer wavelengths and relies on the speed and bandwidth of emitters and detectors at the nanoscale. To achieve this goal, antennas made of high-permittivity, high-permeability dielectric materials are required, as they allow the enhancement of light and support high-frequency RF communication and sensing. The optical Yagi-Uda nantenna offers a uniquely directive property that enables concepts like beamforming and beam scattering, and can be useful in many application areas. To this end, we study suitable material technologies in order to improve properties such as gain, return loss, and radiated power. The antennas are designed in ANSYS HFSS v15.0, varying the dielectric permittivity and permeability; the spacing between the elements depends on the operating wavelength.