Papers by Mihaela van der Schaar
We introduce an efficient and flexible framework for temporal filtering in wavelet-based scalable video codecs called unconstrained motion compensated temporal filtering (UMCTF). UMCTF allows for the use of different filters and temporal decomposition structures through a set of controlling parameters that may be easily modified during the coding process, at different granularities and levels. The proposed framework enables the adaptation of the coding process to the video content, network and end-device characteristics, allows

Interconnected autonomous systems (ASs) often share security risks. However, an AS lacks the incentive to make (sufficient) security investments if the cost exceeds its own benefit, even though doing so would be socially beneficial. In this paper, we develop a systematic and rigorous framework for analyzing and significantly improving the mutual security of a collection of ASs that interact frequently over a long period of time. Using this framework, we show that simple incentive schemes based on rating systems can be designed to encourage the ASs' security investments, thereby significantly improving their mutual security. When designing the optimal rating systems, we explicitly consider that monitoring the ASs' investment actions is imperfect and that the cyber-environment exhibits unique characteristics. An important consideration in this design is the heterogeneity of ASs in terms of both generated traffic and underlying connectivity. Our analysis shows that the optimal strategy recommended to the ASs on whether or not to make security investments emerges as a tradeoff between the performance gains achieved by ensuring the AS's compliance with the recommended strategy and the efficiency loss induced by the imperfect monitoring. When the monitoring errors are sufficiently small or the traffic and connectivity structure of the AS collection exhibits the "Maximal Critical Traffic (MCT)" property (i.e., the critical traffic of the whole collection is no less than that of any subset of the AS collection), it is optimal to recommend that all ASs make security investments. However, when this network property is not satisfied, improved performance can be achieved when some ASs are recommended NOT to make security investments. Many simple network topologies (e.g., the complete, "line", and "star" graphs) exhibit the "MCT" property. However, a common topology on the Internet, the "core-periphery" topology, does not possess the "MCT" property, and in this case we prove that whether or not it is optimal to recommend that all ASs make security investments depends on the AS collection size. Even though this paper considers a simplified model of the interconnected ASs' security, our analysis provides important and useful insights for designing rating systems that can significantly improve the mutual security of real networks in a variety of practical scenarios.

2014 52nd Annual Allerton Conference on Communication Control and Computing, Sep 1, 2014
Emerging stream mining applications require classification of large data streams generated by single or multiple heterogeneous sources. Different classifiers can be used to produce predictions. However, in many practical scenarios the distribution over data and labels (and hence the accuracies of the classifiers) may be unknown a priori and may change in unpredictable ways over time. We consider data streams that are characterized by their context information, which can be used as meta-data to choose which classifier should be used to make a specific prediction. Since the context information can be high dimensional, learning the best classifiers to make predictions using contexts suffers from the curse of dimensionality. In this paper, we propose a context-adaptive learning algorithm which learns online what is the best context, learner, and classifier to use to process a data stream. Then, we theoretically bound the regret of the proposed algorithm and show that its time order is independent of the dimension of the context space. Our numerical results illustrate that our algorithm outperforms most prior online learning algorithms, for which such online performance bounds have not been proven. Index Terms: Stream mining, context-adaptive learning, distributed multiuser learning, contextual bandits.

International Conference on Acoustics, Speech, and Signal Processing, Apr 19, 2009
In this paper, we consider social peer-to-peer (P2P) networks, where peers are sharing their resources (i.e., multimedia content and upload bandwidth). In the considered P2P networks, peers are self-interested, thereby determining their resource divisions (i.e., actions) among their associated peers such that their utility (e.g., multimedia quality) is maximized. Peers determine their optimal strategies for selecting their action based on a Markov Decision Process (MDP) framework, which enables the peers to maximize their cumulative utilities. We consider heterogeneous peers that have different and limited ability to characterize their resource reciprocations using only a limited number of states. We investigate how the limited number of states impacts the resource reciprocation and the resulting multimedia quality over time. Simulation results show that peers simultaneously refining their state descriptions can improve the multimedia quality in the resource reciprocation. Moreover, peers prefer to interact with other peers that have higher available upload bandwidths as well as have similar capabilities for refining their number of states.

IEEE/ACM Transactions on Networking, Jun 18, 2010
Distributed medium access control (MAC) protocols are essential for the proliferation of low cost, decentralized wireless local area networks (WLANs). Most MAC protocols are designed with the presumption that nodes comply with prescribed rules. However, selfish nodes have natural motives to manipulate protocols in order to improve their own performance. This often degrades the performance of other nodes as well as that of the overall system. In this work, we propose a class of protocols that limit the performance gain which nodes can obtain through selfish manipulation while incurring only a small efficiency loss. The proposed protocols are based on the idea of a review strategy, with which nodes collect signals about the actions of other nodes over a period of time, use a statistical test to infer whether or not other nodes are following the prescribed protocol, and trigger a punishment if a departure from the protocol is perceived. We consider the cases of private and public signals and provide analytical and numerical results to demonstrate the properties of the proposed protocols.
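The review strategy described in this abstract can be sketched roughly as follows. This is only a minimal illustration, not the paper's protocol: the function name `review_phase`, the binary signal model, the one-sided threshold test, and all parameter values are assumptions made for the example.

```python
from typing import Sequence


def review_phase(signals: Sequence[int], expected_rate: float, tolerance: float) -> bool:
    """Return True if a punishment should be triggered after a review period.

    signals: hypothetical binary observations collected over the review period
             (1 = the observed node appeared to act aggressively, 0 = it did not).
    expected_rate: fraction of 1s expected when the node follows the protocol.
    tolerance: how far the empirical rate may exceed expected_rate before the
               test concludes that a deviation occurred.
    """
    if not signals:
        return False  # nothing observed, so no basis for punishment
    empirical_rate = sum(signals) / len(signals)
    # One-sided threshold test (an assumed stand-in for the statistical test).
    return empirical_rate > expected_rate + tolerance


compliant = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # 20% aggressive signals
deviating = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # 70% aggressive signals
print(review_phase(compliant, expected_rate=0.2, tolerance=0.15))
print(review_phase(deviating, expected_rate=0.2, tolerance=0.15))
```

In this sketch a longer review period would make the empirical rate less noisy, so fewer compliant nodes would be punished by mistake; how the period length and threshold should be chosen is not specified here.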

CoRR, Apr 24, 2011
In this paper, we consider a two-sided digital content market, and study which of the two business modes, i.e., Business-to-Customer (B2C) and Customer-to-Customer (C2C), should be selected and when it should be selected. The considered market is managed by an intermediary, through which content producers can sell their contents to consumers. The intermediary can select B2C or C2C as its business mode, while the content producers and consumers are rational agents that maximize their own utilities. The content producers are differentiated by their content qualities. First, given the intermediary's business mode, we show that there always exists a unique equilibrium at which neither the content producers nor the consumers change their decisions. Moreover, if there are a sufficiently large number of consumers, then the decision process based on the content producers' naive expectation can reach the unique equilibrium. Next, we show that in a market with only one intermediary, C2C should be selected if the intermediary aims at maximizing its profit. Then, by considering a particular scenario where the contents are not highly substitutable, we prove that when the intermediary chooses to maximize the social welfare, C2C should be selected if the content producers can receive sufficient compensation for content sales, and B2C should be selected otherwise.

2014 IEEE Global Conference on Signal and Information Processing, Nov 18, 2014
Substantial empirical research has shown that the level of individualism vs. collectivism is one of the most critical and important determinants of societal traits, such as economic growth, economic institutions and health conditions. But the exact nature of this impact has thus far not been well understood in an analytical setting. In this work, we develop one of the first theoretical models that analytically studies the impact of individualism-collectivism on society. We model the growth of an individual's welfare (wealth, resources and health) as depending not only on the individual, but also on the level of collectivism, i.e., the level of dependence on the rest of the individuals in the society, which leads to a co-evolutionary setting. Based on our model, we are able to predict the impact of individualism-collectivism on various societal metrics, such as average welfare, average lifetime, total population, cumulative welfare and average inequality. We analytically show that individualism has a positive impact on average welfare and cumulative welfare, but comes with the drawbacks of lower average lifetime, lower total population and higher average inequality.

Standard Multi-Armed Bandit (MAB) problems assume that the arms are independent. However, in many application scenarios, the information obtained by playing an arm provides information about the remainder of the arms. Hence, in such applications, this informativeness can and should be exploited to enable faster convergence to the optimal solution. In this paper, we introduce and formalize the Global MAB (GMAB), in which arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. We propose a greedy policy for the GMAB which always selects the arm with the highest estimated expected reward, and prove that it achieves bounded parameter-dependent regret. Hence, this policy selects suboptimal arms only finitely many times, and after a finite number of initial time steps, the optimal arm is selected in all of the remaining time steps with probability one. In addition, we also study how the informativeness of the arms about each other's rewards affects the speed of learning. Specifically, we prove that the parameter-free (worst-case) regret is sublinear in time, and decreases with the informativeness of the arms. We also prove a sublinear in time Bayesian risk bound for the GMAB which reduces to the well-known Bayesian risk bound for linearly parameterized bandits when the arms are fully informative. GMABs have applications ranging from drug and treatment discovery to dynamic pricing.
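To make the idea of a greedy policy on globally informative arms concrete, here is a small sketch on a toy problem. Everything specific in it is an assumption for illustration and is not taken from the paper: the two-arm Bernoulli model, the reward functions mu_0(theta) = theta and mu_1(theta) = 1 - theta, the simple averaging estimator, and the name `greedy_gmab`. The paper's own estimator and assumptions may differ.

```python
import random


def greedy_gmab(theta: float, horizon: int, seed: int = 0):
    """Run a greedy policy on a toy two-arm Global MAB.

    Assumed toy model: Bernoulli rewards whose means are known functions of
    an unknown global parameter theta in [0, 1]:
    mu_0(theta) = theta and mu_1(theta) = 1 - theta.
    """
    rng = random.Random(seed)
    mu = [lambda t: t, lambda t: 1.0 - t]       # known expected-reward functions
    inverse = [lambda r: r, lambda r: 1.0 - r]  # map a reward to an estimate of theta
    counts = [0, 0]
    theta_sum = 0.0   # running sum of per-play estimates of theta
    theta_hat = 0.5   # initial guess before any observation

    for step in range(1, horizon + 1):
        # Greedy step: pick the arm with the highest estimated expected reward.
        arm = max(range(2), key=lambda k: mu[k](theta_hat))
        reward = 1.0 if rng.random() < mu[arm](theta) else 0.0
        counts[arm] += 1
        # Every play is informative about theta, whichever arm was chosen.
        theta_sum += inverse[arm](reward)
        theta_hat = theta_sum / step
    return counts, theta_hat


counts, theta_hat = greedy_gmab(theta=0.8, horizon=2000)
print(counts, round(theta_hat, 3))
```

Because each play updates the shared estimate of theta, the sketch needs no explicit exploration step, which is the intuition behind a purely greedy rule in this setting.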

In order to understand the complex interactions between different technologies in a communications market, it is of fundamental importance to understand how technologies affect the demand of users and competition between network service providers (NSPs). To this end, we analyze user subscription dynamics and revenue maximization in monopolistic and duopolistic communications markets. First, by considering a monopoly market with only one NSP, we investigate the impact of technologies on the users' dynamic subscription. It is shown that, for any price charged by the NSP, there exists a unique equilibrium point of the considered user subscription dynamics. We also provide a sufficient condition under which the user subscription dynamics converges to the equilibrium point starting from any initial point. We then derive upper and lower bounds on the optimal price and market share that maximize the NSP's revenue. Next, we turn to the analysis of a duopoly market and show that, for any charged prices, the equilibrium point of the considered user subscription dynamics exists and is unique. As in a monopoly market, we derive a sufficient condition on the technologies of the NSPs that ensures that the user subscription dynamics reaches the equilibrium point. Then, we model the NSP competition using a non-cooperative game, in which the two NSPs choose their market shares independently, and provide a sufficient condition that guarantees the existence of at least one pure Nash equilibrium in the market competition game.

IEEE Transactions on Signal Processing, 2015
In this paper, we develop online learning algorithms that enable the agents to cooperatively learn how to maximize the overall reward in scenarios where only noisy global feedback is available, without exchanging any information among themselves. We prove that our algorithms' learning regrets (the losses incurred by the algorithms due to uncertainty) are logarithmically increasing in time and thus the time average reward converges to the optimal average reward. Moreover, we also illustrate how the regret depends on the size of the action space, and we show that this relationship is influenced by the informativeness of the reward structure with regard to each agent's individual action. When the overall reward is fully informative, regret is shown to be linear in the total number of actions of all the agents. When the reward function is not informative, regret is linear in the number of joint actions. Our analytic and numerical results show that the proposed learning algorithms significantly outperform existing online learning solutions in terms of regret and learning speed. We illustrate how our theoretical framework can be used in practice by applying it to online Big Data mining using distributed classifiers.
2015 IEEE International Conference on Communications (ICC), 2015
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2012
We propose an incentive scheme based on intervention to sustain cooperation among self-interested users. In the proposed scheme, an intervention device collects imperfect signals about the actions of the users for a test period, and then chooses the level of intervention that degrades the performance of the network for the remaining time period. We analyze the problems of designing an optimal intervention rule given a test period and choosing an optimal length of the test period. The intervention device can provide the incentive for cooperation by exerting intervention following signals that involve a high likelihood of deviation. Increasing the length of the test period has two counteracting effects on the performance: it improves the quality of signals, but at the same time it weakens the incentive for cooperation due to increased delay.

IEEE Journal of Selected Topics in Signal Processing, 2015
Recommender systems, medical diagnosis, network security, etc., require ongoing learning and decision-making in real time. These, and many others, represent perfect examples of the opportunities and difficulties presented by Big Data: the available information often arrives from a variety of sources and has diverse features so that learning from all the sources may be valuable but integrating what is learned is subject to the curse of dimensionality. This paper develops and analyzes algorithms that allow efficient learning and decision-making while avoiding the curse of dimensionality. We formalize the information available to the learner/decision-maker at a particular time as a context vector which the learner should consider when taking actions. In general the context vector is very high dimensional, but in many settings, the most relevant information is embedded into only a few relevant dimensions. If these relevant dimensions were known in advance, the problem would be simple, but they are not. Moreover, the relevant dimensions may be different for different actions. Our algorithm learns the relevant dimensions for each action, and makes decisions based on what it has learned. Formally, we build on the structure of a contextual multi-armed bandit by adding and exploiting a relevance relation. We prove a general regret bound for our algorithm whose time order depends only on the maximum number of relevant dimensions among all the actions, which in the special case where the relevance relation is single-valued (a function), reduces to $\tilde{O}(T^{2(\sqrt{2}-1)})$; in the absence of a relevance relation, the best known contextual bandit algorithms achieve regret $\tilde{O}(T^{(D+1)/(D+2)})$, where $D$ is the full dimension of the context vector. Our algorithm alternates between exploring and exploiting and does not require observing outcomes during exploitation (so allows for active learning). Moreover, during exploitation, suboptimal actions are chosen with arbitrarily low probability. Our algorithm is tested on datasets arising from network security and online news article recommendations.

IEEE Conference on Decision and Control and European Control Conference, 2011
This paper focuses on analyzing the interactions emerging between users in online communities. Network utility maximization and other methods are not effective when the communities are composed of intelligent and self-interested users (multimedia social communities, social networks, etc.), because the interests of the individual users may be in conflict. In our prior work, we proposed designing protocols in a stationary community to provide users with incentives to voluntarily operate according to predetermined social norms and provide services. In this paper, we extend the study to analyze the interactions of self-interested users under a social norm in an online community of finite population, where the stationary property of the community does not hold. To optimize their long-term performance based on their knowledge, users adapt strategies to play their best response by solving individual stochastic control problems. Understanding the evolution of a community provides protocol designers with guidelines for designing social norms in which no user will have the incentive to adapt and deviate from the prescribed protocol, which in turn encourages cooperative behavior among users and achieves the optimal social welfare of the community.
2011 IEEE Global Telecommunications Conference - GLOBECOM 2011, 2011

2007 IEEE International Conference on Communications, 2007
We study the problem of optimal resource allocation for multiuser wireless video transmissions from an information-theoretic point of view. We show that the previously known optimal rate allocation solution in wireless multiaccess, which maximizes the weighted sum rate, is suboptimal in wireless video communications. We further derive the optimal video resource allocation by jointly considering the Application-MAC-PHY layers. This optimal scheme maximizes the weighted sum video quality of all video users for any feasible power control policy. We refer to this policy as Largest Quality Improvement Highest Possible Rate (LQIHPR). We propose a simple greedy algorithm for implementation. With the help of the inherent prioritization mechanism of video coders, we show that LQIHPR is universally optimal for all video coding schemes. Simulation results demonstrate the significant improvement that LQIHPR achieves over the conventional scheme. [Footnote 1] This problem has been partially considered in [11], but the study was only preliminary and incomplete.

2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2012
In online labor markets, experts sell their expertise to buyers. Despite the success and the perceived promise of online labor markets, they face a serious practical challenge: providing appropriate incentives for experts to participate and exert effort to accurately (successfully) complete tasks. Personal rating schemes have been proposed to address this challenge: they provide differentiated reward/punishment to experts in order to incentivize them to cooperate (i.e., to do their best to complete tasks). However, when the transactions in a market are subject to errors, the experts are frequently punished wrongly whenever personal rating schemes are deployed. This not only reduces the experts' incentives to cooperate, but also harms market performance, such as the obtained social welfare or revenue. To mitigate the problem of wrong punishments, we develop a novel game-theoretic formalism based on collective ratings. We formalize an online labor market as a two-sided trading platform where buyers and experts interact repeatedly. The market designer's problem is to create a market policy that maximizes the market's revenue subject to the constraints imposed by the characteristics of the market and the incentives of the participants. We propose to organize such markets by dividing experts into groups for which a collective rating is created and maintained based on the buyers' aggregated feedback. We analyze how the group size and the adopted rating scheme affect the market's revenue and the social welfare of the participants in the market, and determine the optimal design of the market policy. We show that collective ratings are surprisingly more effective and more robust than personal rating for a wide variety of online labor markets.