Papers by Ulya Bayram

Intensive care units (ICUs) are divisions where critically ill patients are treated by medical experts. Automated clinical decision-making mechanisms are a vital and still unmet need for managing the large influx of patients, a need that became more apparent after the COVID-19 pandemic. Existing studies focus on determining the probability of patients dying in the ICUs and on prioritizing patients in dire need. Only a few studies have calculated the patient's probability of returning to the ICU after discharge. These studies reduce the problem to a binary task of predicting either mortality or re-admission. However, this is unrealistic, since both outcomes are real possibilities for each patient. In this interdisciplinary study, two main contributions are proposed to the state of the art in automated clinical decision-making: (1) using real-life data collected by healthcare professionals from thousands of ICU patients, three possible outcomes (recovery, mortality, and return to the intensive care unit within 30 days) are predicted for patients in intensive care instead of just one; (2) a novel feature extraction approach is proposed by the biomedical expert on our team. Four machine learning algorithms are applied to the finalized feature set to understand the difference between the binary and the multi-class classification problems. The results reach 78% success, demonstrating the feasibility of developing better clinical decision-making mechanisms for ICUs.

Journal of Information Technologies, 2022
Social media data can provide a general idea of people's response to the COVID-19 outbreak and its effects, but as a source of information it cannot be as objective as news articles. News articles are valuable data sources for natural language processing research, as they can reveal various paradigms about different phenomena related to the pandemic. This study uses a news collection spanning nine months from 2019 to 2020, containing COVID-19-related articles from various organizations around the world. The investigations conducted on the collection aim to reveal the repercussions of the pandemic at multiple levels. The first investigation uses statistics to identify the problems most frequently mentioned during the pandemic. The second investigation uses machine learning to determine the most prevalent topics in the articles, providing a better picture of pandemic-induced issues. The results show that the economy was among the most prevalent problems. The third investigation constructs lexical networks from the articles and reveals how the problems are related through nodes and weighted connections. The findings highlight the need for more research applying machine learning and natural language processing techniques to similar data collections to unveil the full repercussions of the pandemic.

In this study, we introduce a new network feature for detecting suicidal ideation from clinical texts and conduct various additional experiments to enrich the state of knowledge. We evaluate statistical features with and without stopwords, use lexical networks for feature extraction and classification, and compare the results with standard machine learning methods: a logistic classifier, a neural network, and a deep learning method. We utilize three text collections. The first two contain transcriptions of interviews conducted by experts with suicidal subjects (n=161 patients who experienced severe ideation) and control subjects (n=153). The third collection consists of interviews conducted by experts with epilepsy patients, a few of whom admitted to experiencing suicidal ideation in the past (32 suicidal and 77 control). The selected methods detect suicidal ideation with an average area under the curve (AUC) score of 95% on the merged collection with high suicidal ideation, and the trained models generalize to the third collection with an average AUC score of 1

Background: Probabilistic assessments of clinical care are essential for quality care. Yet machine learning (ML), which supports this care process, has been limited to categorical results. To maximize its usefulness, it is important to find novel approaches that calibrate the ML output to a likelihood scale. Current state-of-the-art calibration methods are generally accurate and applicable to many ML models, but improved granularity and accuracy of such methods would increase the information available for clinical decision making. A novel non-parametric Bayesian calibration approach is demonstrated on a variety of data sets, including simulated classifier outputs, biomedical data sets from the University of California, Irvine (UCI) Machine Learning Repository, and a clinical data set built to determine suicide risk from the language of emergency department patients. Results: The method is first demonstrated on support-vector machine (SVM) models, which generally produce well-behaved, well-understood scores. The method produces calibrations comparable to those of the state-of-the-art Bayesian Binning in Quantiles (BBQ) method when the SVM models are able to effectively separate cases and controls. However, as the SVM models' ability to discriminate between classes decreases, our approach yields more granular and dynamic calibrated probabilities than the BBQ method. Improvements in granularity and range are even more dramatic when the discrimination between the classes is artificially degraded by replacing the SVM model with an ad hoc k-means classifier. Conclusions: The method gives both clinicians and patients a more nuanced view of the output of an ML model, allowing better decision making. The method is demonstrated on simulated data, various biomedical data sets, and a clinical data set, to which diverse ML methods are applied. A trivial extension of the method to (non-ML) clinical scores is also discussed.
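Both BBQ and the method above turn raw classifier scores into probabilities. As background only, the sketch below shows the simplest building block that BBQ averages over: a single equal-frequency (quantile) histogram-binning calibrator. It is not the authors' non-parametric Bayesian method; the function names, the default of four bins, and the toy scores are illustrative assumptions.

```python
from bisect import bisect_left


def fit_quantile_binning(scores, labels, n_bins=4):
    """Fit an equal-frequency (quantile) histogram-binning calibrator.

    Illustrative baseline only, not the calibration method described above.
    Returns (edges, probs): edges[i] is the largest training score in bin i
    (for every bin except the last), and probs[i] is the fraction of
    positive labels observed in bin i.
    """
    if len(scores) != len(labels) or len(scores) < n_bins:
        raise ValueError("need aligned scores/labels and at least n_bins examples")
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    # Integer bounds give n_bins non-empty groups of (nearly) equal size.
    bounds = [i * n // n_bins for i in range(n_bins + 1)]
    edges, probs = [], []
    for b in range(n_bins):
        chunk = pairs[bounds[b]:bounds[b + 1]]
        probs.append(sum(label for _, label in chunk) / len(chunk))
        if b < n_bins - 1:
            edges.append(chunk[-1][0])
    return edges, probs


def calibrate(score, edges, probs):
    """Return the empirical positive rate of the bin the raw score falls into."""
    return probs[bisect_left(edges, score)]


# Hypothetical toy classifier scores and true labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
edges, probs = fit_quantile_binning(scores, labels)
print(edges, probs)  # [0.2, 0.4, 0.7] [0.0, 0.5, 0.5, 1.0]
```

With a fixed number of bins, every score in a bin maps to the same probability, so the output can take at most `n_bins` distinct values. That coarseness is the kind of granularity limit the abstract reports its method improves on relative to BBQ.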

With early identification and intervention, many suicide deaths are preventable. Tools that include machine learning methods have been able to identify suicidal language. This paper examines the persistence of this suicidal language up to 30 days after discharge from care. Method: In a multi-center study, 253 subjects were enrolled into either suicidal or control cohorts. Their responses to standardized instruments and interviews were analyzed using machine learning algorithms. Subjects were re-interviewed approximately 30 days later, and their language was compared to the original language to determine the presence of suicidal ideation. Results: The language characteristics used to classify suicidality at the initial encounter are still present in the speech 30 days later (AUC = 89% (95% CI: 85-95%), p < .0001), and algorithms trained on the second interviews could also identify the subjects who produced the first interviews (AUC = 85% (95% CI: 81-90%), p < .0001). Conclusions: This approach explores the stability of suicidal language. Using advanced computational methods, the results show that a patient's language remains similar 30 days after it is first captured, while responses to standard measures change.
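The AUC values above summarize how well a classifier's scores rank suicidal subjects above controls. A minimal sketch, assuming one real-valued score per subject and labels of 1 (suicidal) and 0 (control), computes AUC as the probability that a randomly chosen positive outscores a randomly chosen negative, which is the Mann-Whitney formulation. The function name and toy data are illustrative, and the sketch does not reproduce the confidence intervals or p-values reported in the abstract.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores count as half a correct ranking.
    Assumption: labels are 1 for positives and 0 for negatives.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores: one control (0.6) outranks one positive subject (0.4).
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic and meant only to make the definition concrete; a rank-based computation gives the same value more efficiently on large cohorts.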

The COVID-19 pandemic has sparked a remarkable volume of research literature, and scientists increasingly need intelligent tools to cut through the noise and uncover relevant research directions. In response, we propose a framework built around a novel weighted semantic graph model that efficiently compresses the research studies. We also present two analyses on this graph that offer alternative ways to uncover additional aspects of COVID-19 research. Design/methodology/approach: We construct the semantic graph using state-of-the-art Natural Language Processing (NLP) techniques on COVID-19 publication texts (>100,000 texts). Next, we conduct an evolutionary analysis to capture the changes in COVID-19 research across time. Finally, we apply a link prediction study to detect novel COVID-19 research directions that have not yet been discovered. Findings: The findings reveal the success of the semantic graph in capturing scientific knowledge and its evolution. Meanwhile, the prediction experiments achieve 79% accuracy in returning intelligible links, showing the reliability of the methods for predicting novel connections that could help scientists discover potential new directions. Originality/value: To our knowledge, this is the first study to propose a holistic framework that encodes scientific knowledge in a semantic graph, demonstrates an evolutionary examination of past and ongoing research, and offers scientists tools to generate new hypotheses and research directions through predictive modeling and deep machine learning techniques.
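The study's link prediction relies on predictive modeling and deep learning over the semantic graph, which the abstract does not detail. As a much simpler stand-in, the sketch below ranks unconnected node pairs in a weighted graph with a weighted common-neighbours heuristic, illustrating the general idea of proposing new edges from existing graph structure. The adjacency-dict representation, the min-weight scoring rule, the node names, and all function names are assumptions, not the authors' method.

```python
from itertools import combinations


def build_graph(weighted_edges):
    """Adjacency dict {node: {neighbour: weight}} for an undirected graph."""
    graph = {}
    for u, v, w in weighted_edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def predict_links(graph, top_k=3):
    """Rank non-adjacent node pairs by a weighted common-neighbours score.

    Assumed heuristic (not the study's deep model): each shared neighbour z
    of a candidate pair (u, v) contributes min(w(u, z), w(z, v)), the weakest
    link on the path u-z-v.
    """
    candidates = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already connected, nothing to predict
        shared = graph[u].keys() & graph[v].keys()
        if shared:
            score = sum(min(graph[u][z], graph[v][z]) for z in shared)
            candidates.append((score, u, v))
    # Highest score first; ties broken alphabetically for reproducibility.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return candidates[:top_k]


# Hypothetical concept graph with abstract node names.
edges = [("a", "b", 3), ("b", "c", 2), ("a", "d", 4), ("d", "c", 1), ("c", "e", 5)]
graph = build_graph(edges)
print(predict_links(graph))  # [(4, 'b', 'd'), (3, 'a', 'c'), (2, 'b', 'e')]
```

Heuristics like this only propose pairs that already share neighbours; learned models can, in principle, score more distant pairs, which is one reason a study might prefer them.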
Conference Presentations by Ulya Bayram

Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology, NAACL, 2022
In this shared task, we focus on detecting mental health signals in Reddit users' posts through two main challenges: A) capturing mood changes (anomalies) from the longitudinal set of posts (called timelines), and B) assessing the users' suicide risk levels. Our approaches leverage emotion recognition on linguistic content by computing emotion/sentiment scores with pre-trained BERT models on users' posts and feeding them to machine learning models, including XGBoost, Bi-LSTM, and logistic regression. For Task-A, we detect longitudinal anomalies using a sequence-to-sequence (seq2seq) autoencoder and capture regions of mood deviation. For Task-B, our two models use the BERT emotion/sentiment scores. The first computes emotion bandwidths, merges them with n-gram features, and employs logistic regression to detect users' suicide risk levels. The second predicts suicide risk at the timeline level using a Bi-LSTM on the Task-A results and sentiment scores. Our results outperformed those of most participating teams and ranked in the top three in Task-A. In Task-B, our methods surpassed all others and returned the best macro and micro F1 scores.
Unsupervised land use - Land cover classification for multispectral images
Evaluation of Textural Features for Multispectral Images
Comparison of cuboid and tracklet features for action recognition on surveillance videos
A framework for detecting complex events in surveillance videos
Improving Reliability with Dynamic Syndrome Allocation in Intelligent Software Defined Data Centers
Characterizing Data Dependence Constraints for Dynamic Reliability Using N-Queens Attack Domains
Preventing suicide requires early identification of suicidal ideation. In this research, we propose an approach to evaluate whether an individual's statements during a clinical interview can be classified as coming from a suicidal or non-suicidal mindset. To do so, we compare the statements with distinct lexical associative networks constructed from corpora of suicidal and control texts. Each node in these networks is a word, and the weight of the edge between every word pair indicates how strongly the words are associated in that corpus. Several association metrics are evaluated in this work. Preliminary results show good classification performance, with above 75% accuracy on novel test data.
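The comparison against two lexical associative networks can be sketched as follows, under stated assumptions: the abstract notes that several association metrics were evaluated without naming them, so this sketch uses a plain sentence-level co-occurrence count; tokenization is lowercase whitespace splitting; and a statement is assigned to whichever network gives its word pairs the higher mean association weight. All function names and the toy corpora are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_network(sentences):
    """Assumed metric: edge weight = number of sentences in which two words co-occur."""
    edges = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):  # sorted, so keys are canonical
            edges[pair] += 1
    return edges


def association(statement, network):
    """Mean edge weight over all word pairs in the statement (0.0 if no pairs)."""
    words = sorted(set(statement.lower().split()))
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(network.get(p, 0) for p in pairs) / len(pairs)


def classify(statement, suicidal_net, control_net):
    """Label by the network the statement associates with more strongly (ties -> control)."""
    s = association(statement, suicidal_net)
    c = association(statement, control_net)
    return "suicidal" if s > c else "control"


# Hypothetical toy corpora.
suicidal_net = build_network(
    ["i feel hopeless and alone", "nothing feels worth it anymore", "i feel so alone"]
)
control_net = build_network(
    ["i went hiking with friends", "work was busy but fine", "i feel fine today"]
)
print(classify("i feel alone", suicidal_net, control_net))   # suicidal
print(classify("work was fine", suicidal_net, control_net))  # control
```

Raw counts favour whichever corpus is larger, so a practical version would normalize the weights (for example, by corpus size or with an association measure such as pointwise mutual information) before comparing the two networks.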

Politics is an area of broad interest to policy-makers, researchers, and the general public. The recent explosion in the availability of electronic data and advances in data analysis methods, including techniques from machine learning, have led to many studies attempting to extract political insight from this data. Speeches in the U.S. Congress represent an exceptionally rich dataset for this purpose and have been analyzed by many researchers using statistical and machine learning methods. In this paper, we analyze House of Representatives floor speeches from the 1981-2016 period, with the goal of inferring the partisan affiliation of the speakers from their use of words. Previous studies with sophisticated machine learning models have suggested that this task can be accomplished with an accuracy in the 55 to 80% range, depending on the year. In this paper, we show that very comparable results can in fact be obtained using a much simpler linear classifier in word space, indicating that the use of words in partisan ways is not particularly complicated. Our results also confirm that, over the period of study, it has become steadily easier to infer partisan affiliation from political speeches in the United States. Finally, we make some observations about specific terms that Republicans and Democrats have favored over the years in service of partisan expression.
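The abstract does not say which simple linear classifier was used. As one possible instance, the sketch below trains a perceptron on bag-of-words counts: a speech is scored by a weighted sum of its word counts, and the sign of that sum gives the predicted party. The learning rule, the neutral "party A"/"party B" labels (+1/-1), the toy speeches, and all names are assumptions for illustration; in such a model, the largest-magnitude weights correspond to the kind of party-favoured terms discussed above.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words counts: the 'word space' the linear classifier operates in."""
    return Counter(text.lower().split())


def train_perceptron(docs, labels, epochs=10):
    """Learn per-word weights; assumed labels are +1 (party A) and -1 (party B)."""
    weights, bias = Counter(), 0.0
    feats = [featurize(d) for d in docs]
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(feats, labels):
            score = bias + sum(weights[w] * c for w, c in x.items())
            if y * score <= 0:  # misclassified or on the boundary: update
                for w, c in x.items():
                    weights[w] += y * c
                bias += y
                mistakes += 1
        if mistakes == 0:
            break  # training data is linearly separated
    return weights, bias


def predict(text, weights, bias):
    """Sign of the linear score: +1 for party A, -1 for party B."""
    score = bias + sum(weights[w] * c for w, c in featurize(text).items())
    return 1 if score > 0 else -1


# Hypothetical toy speeches with hypothetical party labels.
docs = ["lower taxes small business", "cut taxes now",
        "expand health care", "health care for workers"]
labels = [1, 1, -1, -1]
weights, bias = train_perceptron(docs, labels)
print(predict("taxes and business", weights, bias))  # 1
print(predict("health care", weights, bias))         # -1
```

A perceptron was chosen here only because it fits in a few lines of standard-library code; logistic regression or a linear SVM over the same features would be equally valid linear classifiers.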
In this shared task, we accept the challenge of constructing models to identify Twitter users who attempted suicide based on their tweets from the 30 and 182 days before the adverse event. We explore multiple machine learning and deep learning methods to identify a person's suicide risk based on the short-term history of their tweets. Taking the real-life applicability of the model into account, we make the design choice of classifying at the tweet level. By voting the tweet-level suicide risk scores through an ensemble of classifiers, we predict suicidal users 30 days before the event with an 81.8% true positive rate. However, tweet-level voting falls short on the six-month data, because the many tweets with weak suicidal ideation dilute the overall suicidal signal over the longer term.
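The tweet-level design can be sketched as a two-stage vote, under assumptions the abstract leaves open: each classifier in the ensemble emits a risk score per tweet, the ensemble averages those scores, a tweet is flagged when its average exceeds a threshold, and a user is flagged when the share of flagged tweets reaches a second threshold. The 0.5 thresholds, the function names, and the toy scores are hypothetical. The usage reproduces the dilution effect the abstract describes: the same two high-risk tweets that flag a user in a short window are outvoted in a longer one.

```python
from statistics import mean


def flag_tweets(tweet_scores, tweet_threshold=0.5):
    """tweet_scores: one list per tweet, holding each classifier's risk score.

    Assumption: a tweet is flagged when the ensemble's mean score exceeds the threshold.
    """
    return [mean(scores) > tweet_threshold for scores in tweet_scores]


def flag_user(tweet_scores, tweet_threshold=0.5, user_threshold=0.5):
    """Assumption: flag the user when at least `user_threshold` of their tweets are flagged."""
    flags = flag_tweets(tweet_scores, tweet_threshold)
    if not flags:
        return False
    return sum(flags) / len(flags) >= user_threshold


# Hypothetical scores from a three-classifier ensemble.
short_window = [[0.9, 0.8, 0.7], [0.6, 0.7, 0.8], [0.2, 0.1, 0.3]]
long_window = short_window + [[0.1, 0.2, 0.1]] * 5  # many weak-signal tweets added
print(flag_user(short_window))  # True  (2 of 3 tweets flagged)
print(flag_user(long_window))   # False (2 of 8 tweets flagged)
```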
Books by Ulya Bayram

Applying Machine Learning to Online Data?: Beware! Computational Social Science Requires Care.
The immense impact of social media on contemporary cultural evolution is undeniable, making social media an essential data source for computational social science studies. Alongside advancements in natural language processing and machine learning, computational social science researchers continuously adapt new techniques to data collected from social media. Although these developments are imperative for studying the sociological transformations in many communities, some inconspicuous problems loom on the horizon. This chapter addresses issues that may arise from the use of social media data, such as biased models. It also discusses various obstacles associated with machine learning methods and provides possible solutions and recommendations for overcoming them from an interdisciplinary perspective. In the long term, this chapter aims to guide computational social science researchers in their future studies, from what to be aware of during data collection to assembling an accurate experimental design.