Abstract
Online advertising is emerging as a primary industry for major computer science companies such as Facebook, Google, Microsoft and Yahoo!. From a computing perspective, we face challenges to match the best ads to suitable users within fast enough response times on the massive industrial Petabyte scale of data. In this thesis, centered on computational problems related to online ad problems, the main contributions are: to provide a new statistical model for ad response prediction; a new parallel computing algorithm for model inference; a new optimization method in an online stochastic environment; and exploration and exploitation for ad selection in time-varying dynamic systems. First, we introduce a Bayesian Regression Model for click through rate prediction, and develop and parallelize learning and inference algorithms in the distributed Hadoop Map-Reduce framework. Then, we propose a Multi-Core Gibbs Sampling. Our exact parallel inference achieves near linear speedup in the complex statistical models on 1.8 million news articles from over 20 years of the New York Times. Moreover, we develop a voted Dual Average method for online classification, derive the training and generalization error bounds, and achieve state-of-the-art performance in parsing reranking. Finally, we discuss the exploration and exploitation in a more realistic scenario under a time-varying environment. We introduce different discount factors (e.g.exponential decay) according to the underlying dynamics; and consequently our algorithm is able to trade-off exploration and exploitation adaptively.