1 Pre-Experiment & Preparation
1.1 Define Clear Objective & Metrics
Move beyond a vague "it affects the final results." Specify exactly which part of the algorithm you are changing (e.g., scoring weights, match distance, ETA prediction model, dispatching logic) and which metric that change is expected to move.
1.2 Unit of Diversion (Randomization Unit)
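The unit of diversion (rider, driver, session, or whole city for a matching-algorithm change) determines what the experiment can detect and how variance must be computed. Below is a minimal sketch of deterministic bucketing, assuming the chosen unit's ID is hashed together with an experiment name so that each unit lands in the same arm on every request; the function name, experiment name, and 10% treatment share are illustrative assumptions, not part of this document.

```python
# Sketch: stable assignment of a randomization unit (rider, driver, or city ID)
# to an experiment arm. Hashing the unit ID with the experiment name keeps the
# assignment deterministic and independent across experiments.
import hashlib

def assign_variant(unit_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Map a unit of diversion to 'treatment' or 'control' via a hash bucket."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000          # stable bucket in [0, 10000)
    return "treatment" if bucket < treatment_share * 10_000 else "control"

# Hypothetical usage: divert 10% of riders to the new matching algorithm.
print(assign_variant("rider_42", "matching_algo_v2", treatment_share=0.10))
```

If trips from the same rider are correlated, diverting at the rider level (rather than per trip) keeps the independence assumption of standard tests closer to reality, at the cost of fewer, noisier units.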
1.3 Hypothesis Formulation
- Null Hypothesis (H0): The new matching algorithm does not change the mean of our primary metric (e.g., Total Completed Trips per day per city) compared to the old algorithm.
- Alternative Hypothesis (H1): The new matching algorithm does change the mean of our primary metric. This can be two-tailed ("is different") or one-tailed ("increases", if you have a strong directional belief).
- Sample size & power: use a power analysis with target power 1 − β (typically 80%) and significance level α (typically 5%) to size each arm; see the sample-size sketch after this list.
- Duration: run long enough to capture full weekly cycles.
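A minimal sample-size sketch, assuming a continuous primary metric (e.g., completed trips per city-day) and statsmodels' tt_ind_solve_power; the baseline mean, standard deviation, and 2% minimum detectable effect are illustrative placeholders, not values from this experiment.

```python
# Sketch: observations needed per arm to detect a small lift in the primary metric.
from statsmodels.stats.power import tt_ind_solve_power

baseline_mean = 1000.0   # hypothetical mean completed trips per city-day
baseline_std = 120.0     # hypothetical standard deviation of that metric
mde = 0.02               # minimum detectable effect: a 2% relative lift

effect_size = (baseline_mean * mde) / baseline_std   # Cohen's d

n_per_arm = tt_ind_solve_power(
    effect_size=effect_size,
    alpha=0.05,              # significance level
    power=0.80,              # 1 - beta
    alternative="two-sided",
)
print(f"Required observations per arm: {n_per_arm:.0f}")
```

Dividing the required count by the number of units observed per day gives a rough minimum duration, which should then be rounded up to whole weeks to respect the weekly-cycle requirement above.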
2 Experiment Execution & Monitoring
- Start with a small smoke test (e.g., 1% traffic) to check for critical bugs/crashes.
- Ramp up gradually (5% → 10% → 50%) while monitoring core system health metrics (latency, error rates).
- Use holdbacks if possible: keep a small portion of users (e.g., 1%) permanently in the control group to measure long-term effects and novelty biases.
- Real-Time Monitoring: watch guardrail metrics (error rates, latency) as traffic ramps up; see the guardrail-check sketch after this list.
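A minimal sketch of one real-time guardrail check that could run during the ramp, assuming per-arm error counts are available: a one-sided two-proportion z-test (statsmodels' proportions_ztest) flags the treatment arm when its error rate is significantly worse than control. The alert threshold and the example counts are assumptions for illustration.

```python
# Sketch: guardrail alert on error rate during a 1% smoke test / gradual ramp.
from statsmodels.stats.proportion import proportions_ztest

def guardrail_alert(errors_treat: int, requests_treat: int,
                    errors_ctrl: int, requests_ctrl: int,
                    alpha: float = 0.01) -> bool:
    """Return True if the treatment error rate is statistically worse than control."""
    _, p_value = proportions_ztest(
        count=[errors_treat, errors_ctrl],
        nobs=[requests_treat, requests_ctrl],
        alternative="larger",   # one-sided: treatment error rate > control
    )
    return p_value < alpha

# Hypothetical counts: a 1% treatment slice vs. the much larger control population.
if guardrail_alert(errors_treat=140, requests_treat=10_000,
                   errors_ctrl=9_500, requests_ctrl=1_000_000):
    print("Guardrail breached: pause the ramp-up and investigate before going to 5%.")
```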
3 Analysis & Hypothesis Testing
An important part of this phase is evaluating variance: even if a metric like CTR increases, high variance in the estimate means the observed lift may not be statistically significant, and the result is inconclusive. Two common variance tools are listed below, with a sketch after the list.
- Delta Method (for ratio metrics such as CTR, where the analysis unit differs from the randomization unit)
- Bootstrap (small samples, or metrics without a simple closed-form variance)
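A minimal sketch of both tools listed above, assuming user-level randomization with per-user clicks and views, so that CTR is a ratio metric (total clicks over total views) whose variance is not captured by a naive per-impression formula; the simulated data and helper names are illustrative.

```python
# Sketch: delta-method variance and a percentile-bootstrap CI for a ratio metric (CTR).
import numpy as np

rng = np.random.default_rng(0)

def delta_method_ratio_var(clicks: np.ndarray, views: np.ndarray) -> float:
    """Variance of sum(clicks)/sum(views) when the user is the randomization unit."""
    n = len(clicks)
    var_x, var_y = clicks.var(ddof=1), views.var(ddof=1)
    cov_xy = np.cov(clicks, views, ddof=1)[0, 1]
    r = clicks.mean() / views.mean()
    # First-order delta-method expansion of the ratio of means.
    return (var_x - 2 * r * cov_xy + r**2 * var_y) / (n * views.mean()**2)

def bootstrap_ratio_ci(clicks, views, n_boot=5_000, alpha=0.05):
    """Percentile bootstrap CI: resample users (not impressions) with replacement."""
    n = len(clicks)
    idx = rng.integers(0, n, size=(n_boot, n))
    ratios = clicks[idx].sum(axis=1) / views[idx].sum(axis=1)
    return np.quantile(ratios, [alpha / 2, 1 - alpha / 2])

# Simulated per-user data: views ~ Poisson(20), clicks ~ Binomial(views, 0.1).
views = rng.poisson(lam=20, size=500)
clicks = rng.binomial(views, 0.1)

ctr = clicks.sum() / views.sum()
se = np.sqrt(delta_method_ratio_var(clicks, views))
lo, hi = bootstrap_ratio_ci(clicks, views)
print(f"CTR = {ctr:.4f}, delta-method 95% CI = ({ctr - 1.96*se:.4f}, {ctr + 1.96*se:.4f})")
print(f"bootstrap 95% CI = ({lo:.4f}, {hi:.4f})")
```

Comparing treatment and control with variance estimated this way (rather than treating every impression as independent) avoids understating variance and over-calling wins.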