CV
Education
- Doctor of Philosophy (PhD) in Biostatistics - University of California, Los Angeles
September 2024 - Present
Courses: Uncertainty in LLMs, Agent-based AI
- Master of Science in Computer Science - Northeastern University
January 2023 - May 2024
Courses: Algorithms, Distributed Databases
- Master of Science in Statistics and Operations Research - University of North Carolina at Chapel Hill
August 2018 - June 2020
Courses: Applied Statistics, Machine Learning, Time Series Forecasting
Work experience
Software Engineer Intern at Amazon
June 2024 - August 2024
Developed key features for Amazon B2B on the AWS platform to automate event-driven applications using EventBridge
Implemented enhancements in the Visibility Service using Java, JavaScript and TypeScript, improving real-time tracking and monitoring of important business transactions across the platform
Conducted integration tests for all Outbound services and events, ensuring smooth and error-free deployment
Designed and tested dashboards using CloudWatch to monitor service performance metrics
Data Scientist at ByteDance Ltd.
February 2021 - August 2022
Collaborated with a cross-functional team to deploy a dynamic subscription tool to improve customer experience on the platform
Verified product feasibility and deployed an XGBoost model to identify the most important customer indicators, informing product feature design
Applied interrupted time series analysis to estimate the potential revenue impact and to evaluate, in advance, the risk of a strategy restricting creators’ quotes
Conducted attribution analysis to evaluate marketing campaign performance, providing strong evidence to drive product marketing
Data Scientist at Blingby
August 2020 - December 2020
Built ETL pipelines with Apache Spark to transform raw data into features, combining business knowledge with statistical expertise
Developed and maintained web-based Tableau dashboards to update the daily data analysis report, increasing daily work efficiency by 20%
Machine Learning Intern at TouchSuite
June 2020 - August 2020
Queried and cleaned terabyte-scale order data from Azure SQL using pyodbc
Conducted online analytical processing (OLAP) to present critical sales performance across different dimensions
Developed item-based approaches to handle cold-start problems and tuned model hyperparameters with Spark ML cross-validation, reducing root mean square error by 10%
Skills
Programming Languages: Python, R, Java, JavaScript, SQL
Tools and Platforms: MySQL, Tableau, AWS, Spark
Publications
- Incentivizing Truthful Language Models via Peer Elicitation Games
NeurIPS 2025, September 2025
This paper introduces Peer Elicitation Games (PEG), a training-free, game-theoretic framework for aligning LLMs.