Yan Chen

About Myself

I'm Yan Chen, a 4th-year PhD student in Decision Sciences at the Fuqua School of Business, Duke University, advised by Professor Alexandre Belloni and Professor Yehua Wei. I obtained my Master's in Statistics from Stanford University in 2018 and my Bachelor's in Statistics from the Department of Mathematics at Nanjing University in 2016. I worked as a data scientist in industry from 2018 to 2022.

My research lies broadly at the intersection of operations research, statistics, econometrics and machine learning. I am interested in developing theoretically grounded methods for learning and inference that improve decision-making under uncertainty. Methodologically, my work draws on statistical learning theory, high-dimensional statistics, econometrics, causal inference, online learning and optimization. I apply these tools to problems motivated by artificial intelligence and operations management.

For the full list of my working papers and publications, please refer to my Google Scholar page.

Selected Working Papers

Online Pandora's Box for Contextual LLM Cascading
Alexandre Belloni, Yan Chen, Yehua Wei
arXiv

Motivated by Large Language Model (LLM) cascading, we propose an online contextual Pandora’s Box model for adaptively querying and selecting LLM APIs. In each period, a decision-maker observes a request context and faces a two-phase decision problem. In the query phase, the decision-maker sequentially queries APIs, where each query reveals a generated output and incurs an (output-dependent) cost. In the selection phase, the decision-maker selects one of the generated outputs to deploy and observes only the downstream reward of the deployed output. This output-mediated feedback structure differs from classical online contextual Pandora’s Box models, in which opening a box directly reveals its reward. Rather than estimating the full conditional output and cost distributions of each API, we directly model the reservation index and develop a learning approach for the query phase. Specifically, we impose a parametric structure on the contextual reservation index functions induced by Weitzman’s classical policy. Our policy combines generalized method of moments (GMM) type estimation of these reservation indices with UCB-style confidence bounds for both the indices and the shared output-level reward evaluator. Under regularity conditions, we prove that the resulting policy achieves dimension-dependent \(\widetilde{O}(\sqrt{T})\) cumulative regret over a horizon of \(T\) periods.
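
For readers unfamiliar with the reservation index mentioned above, here is a minimal sketch of the classical, non-contextual version. In Weitzman's setting, a box with random reward V and a fixed opening cost c has reservation index sigma solving E[(V - sigma)^+] = c. The discrete reward distribution, the fixed cost, and the function names below are illustrative assumptions; this is not the paper's GMM-based estimator and does not model output-dependent costs.

```python
from typing import Sequence


def expected_gain(values: Sequence[float], probs: Sequence[float], sigma: float) -> float:
    """E[(V - sigma)^+] for a discrete reward V taking values[i] with probability probs[i]."""
    return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))


def reservation_index(values: Sequence[float], probs: Sequence[float],
                      cost: float, tol: float = 1e-10) -> float:
    """Solve E[(V - sigma)^+] = cost for sigma by bisection (hypothetical helper).

    The expected gain is continuous and non-increasing in sigma, so the root
    is bracketed between min(values) - cost and max(values).
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    lo = min(values) - cost  # expected gain here is at least `cost`
    hi = max(values)         # expected gain here is 0, below `cost`
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_gain(values, probs, mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Reward is 0 or 1 with equal probability; opening costs 0.1.
# Then 0.5 * (1 - sigma) = 0.1, so sigma = 0.8.
sigma = reservation_index([0.0, 1.0], [0.5, 0.5], cost=0.1)
```

Under Weitzman's classical policy, boxes are opened in decreasing order of this index, and search stops once the best reward seen so far exceeds the largest index among unopened boxes.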

Compound Estimation for Binomials
Yan Chen, Lihua Lei
27th ACM Conference on Economics and Computation (EC 2026)
Presented at CODE@MIT (Poster) 2025 · Causal Data Science Meeting 2025 (Spotlight) · 2025 Stanford Causal Science Center Conference (Poster)
arXiv

Many applications involve estimating the means of multiple binomial outcomes: assessing intergenerational mobility of census tracts, estimating prevalence of infectious diseases across countries, and measuring click-through rates for different demographic groups. The most standard approach is to report the plain average of each outcome. Despite its simplicity, the resulting estimates are noisy when the sample sizes or mean parameters are small. In contrast, Empirical Bayes (EB) methods can boost the average accuracy by borrowing information across tasks. Nevertheless, EB methods require a Bayesian model in which the parameters are sampled from a prior distribution that, unlike in the commonly studied Gaussian case, is unidentified due to the discreteness of binomial measurements. Even if the prior distribution is known, the computation is difficult when the sample sizes are heterogeneous, as there is no simple joint conjugate prior for the sample size and mean parameter. In this paper, we consider the compound decision framework, which treats the sample sizes and mean parameters as fixed quantities. We develop an approximate Stein’s Unbiased Risk Estimator (SURE) for the average mean squared error given any class of estimators. For a class of machine learning-assisted linear shrinkage estimators, we establish asymptotic optimality, regret bounds, and valid inference. Unlike existing work, we work with the binomials directly without resorting to Gaussian approximations. This allows us to handle small sample sizes and/or mean parameters in both one-sample and two-sample settings. We demonstrate our approach using three datasets on firm discrimination, education outcomes, and innovation rates.

Adversarial Estimation of Assortment Probabilities under Independence Structure
Alexandre Belloni, Yan Chen, Matthew Harding
Revision at Operations Research
27th ACM Conference on Economics and Computation (EC 2026)
Presented at Joint Statistical Meetings 2025 · American Causal Inference Conference 2025 · California Econometrics Conference 2024 · Midwest Econometrics Group Conference 2024
arXiv | code

We consider the problem of estimating assortment probabilities, which is common in operations management applications such as product bundling and advertising. Existing approaches typically model each assortment as a category and apply multinomial models to estimate the choice probabilities; while computationally convenient, these methods do not exploit independence structures in the joint distribution and may therefore be statistically inefficient when the total number of items is large. Using the representation from Bahadur (1959), we relate the sparsity of the generalized correlation coefficients to the independence structure of the binary components. We formulate the problem as estimating a high-dimensional vector of generalized correlation coefficients, together with low- or moderate-dimensional nuisance parameters corresponding to the marginal probabilities. We develop a regularized adversarial estimator that attains the optimal rate under standard regularity conditions while remaining computationally feasible. The framework naturally extends to settings with covariates. We apply the proposed estimators to causal inference with multiple binary treatments and show substantial finite-sample improvements over non-adaptive methods. Numerical studies corroborate the theoretical results.

Publications

Compound Estimation for Binomials
Yan Chen, Lihua Lei
EC 2026 | 27th ACM Conference on Economics and Computation

Adaptive Estimation of Multivariate Binary Distributions under Sparse Generalized Correlation Structures
Alexandre Belloni, Yan Chen, Matthew Harding
EC 2026 | 27th ACM Conference on Economics and Computation

Optimal Downsampling for Imbalanced Classification with Generalized Linear Models
Yan Chen, Jose Blanchet, Krzysztof Dembczynski, Laura Fee Nern, Aaron Flores
AISTATS 2025 | International Conference on Artificial Intelligence and Statistics
paper

Downsampling, or under-sampling, is a technique used with large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We propose a pseudo maximum likelihood estimator and study its asymptotic normality when the population becomes increasingly imbalanced as the sample size grows. We provide theoretical guarantees for the introduced estimator. Additionally, we compute the optimal downsampling rate using a criterion that balances statistical accuracy and computational efficiency. Our numerical experiments, conducted on both synthetic and empirical data, further validate our theoretical results and demonstrate that the introduced estimator outperforms commonly available alternatives.
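
As background for why downsampling requires a correction at all, the sketch below shows the standard prior (intercept) correction for a logistic model fit after keeping each negative example with probability keep_rate. Downsampling inflates the odds of the positive class by a factor of 1 / keep_rate, so the fitted intercept is shifted by -log(keep_rate). This is a textbook adjustment and an assumption for illustration; it is not the pseudo maximum likelihood estimator proposed in the paper, and the function names are hypothetical.

```python
import math


def corrected_intercept(sample_intercept: float, keep_rate: float) -> float:
    """Map a logistic intercept fit on downsampled data back to the population scale."""
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    # Population log-odds = downsampled log-odds + log(keep_rate).
    return sample_intercept + math.log(keep_rate)


def population_probability(p_sample: float, keep_rate: float) -> float:
    """Convert a probability predicted on downsampled data to a population probability."""
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    if not 0.0 <= p_sample <= 1.0:
        raise ValueError("p_sample must be in [0, 1]")
    # Population odds = keep_rate * downsampled odds, written without dividing by zero.
    numerator = keep_rate * p_sample
    return numerator / (numerator + (1.0 - p_sample))


# A 1% positive rate looks like 10/109 after keeping 10% of the negatives;
# the correction recovers the original 1%.
p = population_probability(10.0 / 109.0, keep_rate=0.1)
```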

Concurrent Reinforcement Learning with Aggregated States via Randomized Least Squares Value Iteration
Yan Chen, Qinxun Bai, Yiteng Zhang, Maria Dimakopoulou, Shi Dong, Qi Sun, Zhengyuan Zhou
ICML 2025 | International Conference on Machine Learning
paper

Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions for a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents concurrently explore an environment. The theoretical results established in this work provide an affirmative answer to this question. We adapt the concurrent learning framework to randomized least-squares value iteration (RLSVI) with aggregated state representation. We demonstrate polynomial worst-case regret bounds in both finite- and infinite-horizon environments. In both setups, the per-agent regret decreases at an optimal rate of Θ(1/√N), highlighting the advantage of concurrent learning. Our algorithm exhibits significantly lower space complexity than those of Russo (2019) and Agrawal et al. (2021): we reduce the space complexity by a factor of K while incurring only a √K increase in the worst-case regret bound. Interestingly, our algorithm improves the worst-case regret bound of Russo (2019) by a factor of √H, matching the improvement in Agrawal et al. (2021). However, this result is achieved through a fundamentally different algorithmic enhancement and proof technique. Additionally, we conduct numerical experiments to demonstrate our theoretical findings.

Society of Agents: Regret Bounds of Concurrent Thompson Sampling
Yan Chen, Perry Dong, Qinxun Bai, Maria Dimakopoulou, Wei Xu, Zhengyuan Zhou
NeurIPS 2022 | Advances in Neural Information Processing Systems
paper

We consider the concurrent reinforcement learning problem where n agents simultaneously learn to make decisions in the same environment by sharing experience with each other. Existing works in this emerging area have empirically demonstrated that Thompson sampling (TS) based algorithms provide a particularly attractive alternative for inducing cooperation, because each agent can independently sample a belief environment (and compute a corresponding optimal policy) from the joint posterior computed by aggregating all agents' data, which induces diversity in exploration among agents while benefiting from the shared experience of all agents. However, theoretical guarantees in this area remain under-explored; in particular, no regret bound is known for TS-based concurrent RL algorithms. In this paper, we fill this gap by considering two settings. In the first, we study the simple finite-horizon episodic RL setting, where TS is naturally adapted into the concurrent setup by having each agent sample from the current joint posterior at the beginning of each episode. We establish an Õ(HS√(AT/n)) per-agent regret bound, where H is the horizon of the episode, S is the number of states, A is the number of actions, T is the number of episodes and n is the number of agents. In the second setting, we consider the infinite-horizon RL problem, where a policy is measured by its long-run average reward. Here, despite not having natural episodic breakpoints, we show that by a doubling-horizon schedule, we can adapt TS to the infinite-horizon concurrent learning setting to achieve a regret bound of Õ(DS√(ATn)), where D is the standard notion of diameter of the underlying MDP and T is the number of timesteps. Note that in both settings, the per-agent regret decreases at an optimal rate of Θ(1/√n), which manifests the power of cooperation in concurrent RL.

Talks

2026 Joint Statistical Meetings (upcoming)
August 2026 · Boston, MA

2026 INFORMS Revenue Management and Pricing Conference (upcoming)
July 2026 · University of Michigan

2026 Marketplace Innovation Workshop
May 2026 · Online

Invited Discussant, International Seminar on Selective Inference — Discussion of Prof. Nikos Ignatiadis' talk
video · January 2026 · Online

2025 Stanford Causal Science Center Conference (poster)
November 2025 · Stanford, CA

2025 Conference on Digital Experimentation (CODE) @ MIT (poster)
November 2025 · Cambridge, MA

2025 Causal Data Science Meeting
November 2025 · Online

2025 INFORMS Annual Meeting
October 2025 · Atlanta, GA

2025 Joint Statistical Meetings
August 2025 · Nashville, TN

2025 Revenue Management and Pricing Conference
July 2025 · Columbia University, NYC

2025 American Causal Inference Conference
May 2025 · Detroit, MI

Midwest Econometrics Group Conference 2024
November 2024 · Lexington, KY

2024 INFORMS Annual Meeting
October 2024 · Seattle, WA

2024 California Econometrics Conference
September 2024 · UC Davis, CA

2023 INFORMS Annual Meeting
October 2023 · Phoenix, AZ

Yahoo Research Seminar — Towards Improving Efficiency in Yahoo's Predictive Approaches
August 2023 · Online

NeurIPS 2022 Poster Session
November 2022 · New Orleans, LA

2022 INFORMS Annual Meeting
October 2022 · Indianapolis, IN

Teaching

Head TA (Fall 2025) — DEC-546: Modern Analytics: Data, Prediction and Actions
Master of Quantitative Management, Fuqua School of Business

Instructor (Summer 2024) — Math Summer Camp for Fuqua PhD
Teaching evaluation: 5.0 / 5.0

TA (Spring 2024) — Univ 103: Let's Talk About Digital You: A Technical and Ethical Exploration of a Data-Centric World
Duke Undergraduate

Head TA (Fall 2024) — DEC-546: Modern Analytics: Data, Prediction and Actions
Master of Quantitative Management, Fuqua School of Business

Head TA (Fall 2023) — DEC-546: Modern Analytics: Data, Prediction and Actions
Master of Quantitative Management, Fuqua School of Business

Service

Co-chair, International Student Affairs under Duke Graduate and Professional Student Government
Fall 2022 – Spring 2023

PhD Representative, Duke Graduate School EIS (English for International Students) Working Group
Appointed by Associate Dean for Graduate Programs · Fall 2023 – Spring 2024