Research: Digital Platforms, Privacy, Data Regulation, Experimental Designs and Optimization
Methods: Field Experimentation, Deep Learning, Computer Vision, Structural Econometrics, and Multi-Armed Bandits
Balancing User Privacy and Personalization (Revise & Resubmit at Marketing Science)
Joint with Cole Zuber
Awards: MSI A.G. Clayton Best Dissertation Proposal Award 2022 and Shankar-Spiegel Best Dissertation Proposal Award 2023
Privacy restrictions imposed by browsers such as Safari and Chrome limit the quality of the individual-level data used in personalization algorithms. This paper investigates the consequences of these restrictions for consumer, seller, and platform outcomes using data from Wayfair, a large US-based online retailer. Large-scale randomized experiments indicate that personalization increases seller and platform revenue and leads to better consumer-product matches, with 10% lower post-purchase product returns and a 2.3% higher repeat-purchase probability. Privacy restrictions can erode these benefits because they limit platforms' ability to personalize. We find that the two main policies imposed by Safari and Chrome disproportionately hurt price-responsive consumers and small or niche product sellers. To address this, we propose and evaluate a probabilistic recognition algorithm that associates devices with user accounts without observing exact user identity. Our findings demonstrate that this approach mitigates much of the welfare and revenue losses, striking a balance between privacy and personalization.
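The deployed recognition algorithm is proprietary; as a rough illustration of the general idea, a new device can be scored against each account's known devices on shared signals and linked only above a confidence threshold. All feature names, weights, and the threshold in the sketch below are hypothetical, not Wayfair's production values:

```python
# Hypothetical sketch of probabilistic device-to-account recognition.
# Feature names, weights, intercept, and threshold are illustrative assumptions.
import math
from dataclasses import dataclass

@dataclass
class Device:
    ip_prefix: str
    user_agent: str
    timezone: str

def match_score(device: Device, account_devices: list[Device]) -> float:
    """Probability-like score that `device` belongs to the account."""
    weights = {"ip_prefix": 2.0, "user_agent": 1.5, "timezone": 0.5}
    best = 0.0
    for known in account_devices:
        z = -3.0  # intercept: a match with any given account is rare a priori
        z += weights["ip_prefix"] * (device.ip_prefix == known.ip_prefix)
        z += weights["user_agent"] * (device.user_agent == known.user_agent)
        z += weights["timezone"] * (device.timezone == known.timezone)
        best = max(best, 1.0 / (1.0 + math.exp(-z)))  # logistic link
    return best

def link_device(device: Device, accounts: dict, threshold: float = 0.8):
    """Attach the device to the best-scoring account, or to none."""
    score, account_id = max(
        (match_score(device, devs), acc) for acc, devs in accounts.items()
    )
    return account_id if score >= threshold else None
```

In practice the weights would be learned from sessions where the true account is observed (e.g., logged-in visits), with the threshold tuned to trade off match coverage against false links.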
Bias-Aware Multi-Objective Optimization for Online Rankings: A Demand-Side Approach to Supply-Side Efficiency (Under Review)
Joint with Alibek Korganbekov and Aliya Korganbekova
Retail shipping is responsible for nearly 25% of global greenhouse gas emissions and costs retailers billions annually. Firms have long treated shipping optimization as a supply chain problem, investing in warehouses, routes, and infrastructure. We show that a largely overlooked lever, product ranking algorithms at the consumer search stage, can deliver substantial efficiency and sustainability gains without capital investment. We redesign ranking as a cost-conversion optimization, embedding shipping costs directly into the ranking algorithm. Because such re-ranking significantly reshapes which products are shown, it challenges both offline evaluation and online learning; we therefore extend Adaptive Doubly Robust methods with embedding-based neighborhood representations to correct for competitive-context bias. To test the algorithm, we ran a field experiment with 2.8 million consumers on a major e-commerce platform. This demand-side intervention reduced shipping costs by 4.2%, cut delivery distances by 34 miles per order, and lowered emissions by 3.8%, while preserving conversions and customer satisfaction. The emissions savings were equivalent to the annual electricity use of 50,000 U.S. households or the emissions of 58,000 gasoline cars. Where the ranking policy underperformed, it exposed structural rigidities such as brand-driven demand or inflexible warehouse networks. These results show that ranking algorithms can both shape demand and diagnose supply chain strategy, helping platforms align profitability with sustainability at scale.
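The paper's adaptive, embedding-based estimator is more involved than can be shown here; the sketch below illustrates only the two building blocks named above, a cost-conversion ranking score and a plain (non-adaptive) doubly robust off-policy value estimate, with all function names and the trade-off weight `lam` assumed:

```python
# Illustrative building blocks only, not the deployed system or the paper's
# full ADR extension.
import numpy as np

def rank_items(items, p_conv, margin, ship_cost, lam=1.0):
    """Order items by expected margin net of a shipping-cost penalty."""
    score = {i: p_conv[i] * (margin[i] - lam * ship_cost[i]) for i in items}
    return sorted(items, key=score.get, reverse=True)

def doubly_robust_value(rewards, actions, logged_propensities, target_probs, q_hat):
    """
    Off-policy value of a new ranking policy from logs of the old one:
    a reward-model term plus an importance-weighted residual correction.
    rewards: (n,), actions: (n,) ints, logged_propensities: (n,),
    target_probs and q_hat: (n, n_actions).
    """
    n = len(rewards)
    direct = (target_probs * q_hat).sum(axis=1)          # model-based term
    w = target_probs[np.arange(n), actions] / logged_propensities
    residual = w * (rewards - q_hat[np.arange(n), actions])
    return float(np.mean(direct + residual))
```

The doubly robust form is what makes offline evaluation of a re-ranking policy feasible: it stays consistent if either the reward model or the logged propensities are correct.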
Regulating Data Usage and Dual Platforms
Joint with Alibek Korganbekov
Awards: 2023 ISMS Sheth Foundation Best Dissertation Proposal Award
We examine the necessity and design of data usage regulation in B2B markets, focusing on concerns about platforms like Amazon that function as both marketplaces and sellers. The key issue is the alleged use of internal Amazon data to replicate top-performing products from third-party sellers. Using Deep Learning tools, we analyze visual and textual similarity between 624 Amazon Basics products and 2 million third-party seller products. The findings reveal significant and consistent similarities between private-label and third-party products across multiple product categories. Our research also questions the effectiveness of the duration-based regulation proposed by the European Union: Amazon takes an average of 2.5-3 years to imitate a product, while smaller sellers' products take approximately 5 years. Duration-based data regulation therefore gives Amazon ample time to collect data, suggesting the need for a seller-targeted data regulation approach instead.
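As a minimal sketch of the similarity analysis, assuming image and text embeddings from pretrained encoders (the specific models are not detailed here), private-label and third-party products can be compared by a blended cosine similarity; the 0.5 blend weight and field names are illustrative:

```python
# Illustrative similarity scoring; `img` / `txt` stand for embeddings from any
# pretrained image and text models, and the 0.5 blend weight is an assumption.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def product_similarity(a: dict, b: dict, w_img: float = 0.5) -> float:
    """Blend visual and textual cosine similarity between two products."""
    return w_img * cosine(a["img"], b["img"]) + (1 - w_img) * cosine(a["txt"], b["txt"])

def top_matches(basics_item: dict, third_party: list[dict], k: int = 10):
    """Third-party products most similar to an Amazon Basics product."""
    return sorted(third_party,
                  key=lambda p: product_similarity(basics_item, p),
                  reverse=True)[:k]
```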
Ranking Algorithms and Equilibrium Prices [slides]
Joint with Yufeng Huang and Aliya Korganbekova
How do ranking algorithms, which reward products with high historical sales quantities, shape equilibrium prices on e-commerce platforms? While conventional wisdom suggests that high rankings expand demand and lead to higher prices, these algorithms may also give sellers an incentive to lower prices as an investment in future rankings. We study an experiment on a major U.S. e-commerce platform that randomly boosted the rankings of a small set of products. Contrary to this conjecture, we find that products with improved rankings systematically decrease their prices. The effect is most pronounced among products that can maintain or improve their future rankings through price reductions, suggesting that the investment incentive dominates. We show that a simple dynamic model (capturing the trade-off between current profits and future rankings, and estimated using only pre-experiment data) predicts the observed pricing responses, while alternative explanations such as seller inexperience or heightened competitive pressure cannot account for the patterns. Counterfactual analyses reveal that the ranking algorithm's "memory" is pivotal: an intermediate forgetting rate, corresponding to an effective half-life of performance data of about 23 days, maximizes the investment incentive and lowers long-run prices by roughly 8 percent relative to a static benchmark, whereas very fast or very slow forgetting dampens competitive pressure and pushes prices back toward static levels. Our results show that platforms, even when they do not set prices directly, can steer competitive intensity and consumer welfare through simple, interpretable design choices in their ranking algorithms.
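The platform's actual scoring rule is not public; one simple parameterization consistent with the "memory" described above is an exponentially decayed sales score, where a daily retention factor delta gives past sales a half-life of ln(0.5)/ln(delta) days:

```python
# One simple parameterization consistent with the 23-day half-life above;
# the platform's true scoring rule is not public.
def forgetting_rate(half_life_days: float) -> float:
    """Daily retention delta such that a sale's weight halves every half-life."""
    return 0.5 ** (1.0 / half_life_days)

def update_score(prev_score: float, todays_sales: float, delta: float) -> float:
    """Exponentially decayed sales score used to order products."""
    return delta * prev_score + todays_sales

delta = forgetting_rate(23)  # ~0.970: the abstract's 23-day effective half-life
```

Under this rule, a price cut today raises sales that keep paying ranking dividends for weeks, which is exactly the investment incentive the experiment detects.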
External Validity and Interference: Adaptive Experimental Design in Online Pricing (sole author, slides available upon request)
Interference—violations of the Stable Unit Treatment Value Assumption (SUTVA)—poses a major challenge to causal inference in randomized experiments, especially in online marketplaces where one unit’s treatment can influence others’ outcomes. While cluster-randomized designs are often used to address this issue, they are statistically inefficient, costly to implement, and ill-suited when the experimenter’s goal is to estimate unit-level effects, such as product-level price elasticities. Partnering with a large European retailer, I conduct two field experiments—a product-level experiment and a cluster-level meta-experiment—which reveal that interference bias in standard designs ranges from 35% to 60%. I propose and validate an adaptive experimental design that dynamically prioritizes units with high expected impact, thereby improving statistical power, enhancing external validity, and reducing implementation costs. The findings show that adaptive experimentation can effectively manage interference while aligning experimental design with practical objectives in complex, interconnected settings.
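The paper's design is richer than this, but as a minimal sketch of the prioritization idea, assuming Gaussian posteriors with known noise variance and hypothetical inputs, units can be scored by revenue-weighted posterior uncertainty and the most informative ones experimented on each round:

```python
# Minimal sketch under assumed Gaussian posteriors; the inputs, the
# value-of-information rule, and the noise variance are hypothetical
# stand-ins for the paper's adaptive design.
import numpy as np

def select_units(posterior_var, revenue_weight, budget):
    """Pick the `budget` products with the highest revenue-weighted uncertainty."""
    priority = revenue_weight * np.sqrt(posterior_var)
    return np.argsort(priority)[-budget:]

def update_posterior(posterior_var, noise_var, chosen):
    """Precision-weighted update: experimenting on a unit shrinks its variance."""
    out = posterior_var.copy()
    out[chosen] = 1.0 / (1.0 / out[chosen] + 1.0 / noise_var)
    return out

# One round on 1,000 products with capacity for 50 concurrent price tests:
var = np.ones(1000)
weights = np.random.default_rng(0).random(1000)  # stand-in revenue shares
chosen = select_units(var, weights, budget=50)
var = update_posterior(var, noise_var=0.5, chosen=chosen)
```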
Privacy-Constrained Geo-Experiments (slides available upon request)
Joint with Jerry Chen
This paper studies advertising effectiveness in geo-level experiments under privacy-induced measurement error. Digital platforms increasingly use cluster-randomized designs, assigning treatment at the Designated Market Area (DMA) level, because of constraints on individual-level experimentation. However, platforms often cannot reliably identify a consumer's location unless the user logs in or clicks on an ad, leading to misclassification of treatment assignment and attenuation bias in estimated effects. We quantify the extent of this misclassification using internal platform data from a large-scale advertising experiment and find error rates as high as 22%. We propose two correction methods: a bias adjustment based on estimated classification accuracy, and a model-based estimator that leverages validation data and adjusts both the propensity score and the outcome model. Simulation evidence and an empirical application demonstrate that accounting for misclassification significantly improves inference on advertising effectiveness. Our findings offer methodological tools for experimental evaluation in privacy-constrained environments.
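As a hedged sketch of the first correction only (the model-based estimator is omitted), the classical result for a nondifferentially misclassified binary assignment is that the naive difference in means is attenuated by P(T=1|W=1) - P(T=1|W=0), which with balanced assignment reduces to sensitivity + specificity - 1; all numbers below are illustrative:

```python
# Sketch of the accuracy-based adjustment; sensitivity, specificity, and
# prevalence values are illustrative, not the paper's estimates.
def attenuation_factor(sensitivity, specificity, prevalence):
    """P(T=1 | W=1) - P(T=1 | W=0) for observed assignment W of true assignment T."""
    p1 = sensitivity * prevalence / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    p0 = (1 - sensitivity) * prevalence / (
        (1 - sensitivity) * prevalence + specificity * (1 - prevalence))
    return p1 - p0

def adjust_effect(naive_effect, sensitivity, specificity, prevalence=0.5):
    """Undo attenuation from misclassified geo assignment (constant-effect case)."""
    return naive_effect / attenuation_factor(sensitivity, specificity, prevalence)

# With 22% error on both sides and balanced assignment, the factor is
# 0.78 + 0.78 - 1 = 0.56, so naive estimates understate effects by ~44%.
corrected = adjust_effect(naive_effect=0.014, sensitivity=0.78, specificity=0.78)
```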
Developer Productivity and Cloud Computing Emissions (extended abstract available upon request)
Joint with Alibek Korganbekov
This paper examines how layoffs affect developer productivity and, consequently, cloud computing emissions. Using a natural experiment, we analyze how sudden workforce reductions affect developer productivity, cloud resource usage, and subsequent cloud-related environmental emissions. We partner with an anonymous company that relies on cloud services, combining its employee-level records with cloud usage, billing, and cloud-based CO2 emissions data. Our findings show that although layoffs reduce the dollar cost of cloud usage, they also reduce developer productivity by 1.6%. This results in increased cloud resource consumption and higher emissions due to less efficient code and longer processing times. The effect is driven primarily by the layoffs of more experienced engineers and mid-level managers. However, a substantial part of the negative productivity effects, and consequently of the negative emissions effects, can be mitigated when team managers have longer tenure with the company: a 1% longer manager tenure raises post-layoff productivity by 0.9%. This study highlights the complex interplay between workforce changes, productivity, and environmental impact, offering insights into managing cloud resources and emissions effectively in response to workforce fluctuations.
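A minimal sketch of the natural-experiment comparison, assuming a developer-by-week panel with hypothetical file and column names, would be a two-way fixed effects difference-in-differences regression with standard errors clustered at the team level:

```python
# Hedged sketch only; the file, panel structure, and column names are
# hypothetical, not the partner company's data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("developer_panel.csv")  # developer x week panel

# Developer and week fixed effects absorb the main terms, leaving the
# post-layoff productivity change on affected teams.
model = smf.ols(
    "productivity ~ affected_team:post + C(developer_id) + C(week)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["team_id"]})
print(model.params["affected_team:post"])
```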