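About the speaker:
Dr. Zeyu Zheng (郑泽宇) is currently an assistant professor in the Department of Industrial Engineering and Operations Research at the University of California, Berkeley. He received a Ph.D. in operations research and a master's degree in economics from Stanford University, and a bachelor's degree in mathematics from Peking University. His main research interests include simulation, non-stationary stochastic modeling and decision making, and financial technology.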
About the talk:
We consider a single-product dynamic pricing problem under a specific non-stationary setting, in which the underlying demand process grows over time in expectation and possibly also in its level of random fluctuation. The decision maker sequentially sets a price in each time period and learns the unknown demand model, with the goal of maximizing expected cumulative revenue over the entire time horizon. We prove matching upper and lower bounds on regret and provide near-optimal pricing policies, showing how the growth rate of the random fluctuation over time affects both the best achievable regret order and the design of near-optimal policies. In the analysis, we show that, surprisingly, the optimal regret order differs depending on whether the seller knows the length of the time horizon in advance. We then extend the demand model so that the optimal price may vary with time, and present a novel, near-optimal policy for the extended model. Finally, we consider an analogous non-stationary setting in the canonical multi-armed bandit problem and point out that, in contrast to the non-stationary dynamic pricing problem, the optimal regret order is the same whether or not the length of the time horizon is known.