Philosophy

ELO vs OPR: Why We Need Both

A Brief History of ELO

The Elo rating system (commonly stylized ELO) was developed by physicist Arpad Elo in the 1960s for chess. Today it's used across competitive domains: FIFA world rankings, League of Legends matchmaking, FiveThirtyEight's NFL predictions, and professional esports. The system's power lies in its ability to predict outcomes and adapt based on results.

Why Not Just Use OPR?

If OPR estimates a team's scoring contribution, why can't we just add up OPRs to predict winners?

The Problem: Close Matches

Consider these two outcomes:

| Match | Red Score | Blue Score | Result |
|-------|-----------|------------|--------|
| Match A | 200 | 198 | Red Wins |
| Match B | 50 | 48 | Red Wins |

OPR sees these as completely different matches (200 vs 50 points). But for winning, they're equally valuable - a 2-point victory either way. ELO captures this: both red alliances get similar rating boosts because both achieved the outcome that matters.

When Each Metric Shines

📊 Use OPR For:

  • Predicting expected scores
  • Evaluating robot hardware capability
  • Alliance selection scouting
  • Identifying high-scoring partners

🎯 Use ELO For:

  • Predicting match winners
  • Measuring competitive success rate
  • Bracket placement and seeding
  • Cross-regional ranking

💡 Key Insight: A team scoring 150 points per match (high OPR) but consistently losing 180-150 will have lower ELO than a team scoring 120 but winning 120-110. OPR says the first team has a better robot; ELO says the second team wins more often. Both are true - they measure different things.

Core Metric

Normalized cELO: The Best of Both Worlds

Normalized Cumulative ELO (cELO) combines competitive success with absolute performance, adjusted for regional strength and meta evolution. It's our most comprehensive single metric for ranking teams globally.

The Three-Level System

Event ELO

Isolated rating from a single event's matches

cELO (Cumulative ELO)

Running total across all matches, exponentially weighted toward recent performance

Normalized cELO

cELO adjusted for regional strength and blended with cOPR-based absolute performance

Recency Weighting

Teams improve throughout the season. To reflect current skill rather than historical averages, we weight each match's influence by an exponentially decaying function of its age:

$$ w(t) = e^{-\lambda \cdot \Delta t} $$

Where Δt is days since the match and λ is the decay parameter. Recent matches contribute significantly more than older ones.
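
A minimal sketch of this weighting in Python; the decay constant here is an illustrative placeholder, not the tuned production value:

```python
import math

def recency_weight(days_since_match: float, decay: float = 0.02) -> float:
    """w(t) = e^(-lambda * dt): exponential decay for older matches.

    `decay` (lambda) is an illustrative placeholder, not the tuned value.
    """
    return math.exp(-decay * days_since_match)

print(recency_weight(0))    # today's match: 1.0
print(recency_weight(60))   # two months old: ~0.30
```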

The Regional Normalization Challenge

Consider two teams with identical 1600 cELO ratings:

  • Team A: Dominates weak region (15-0 record, avg opponent ELO: 1200)
  • Team B: Competes in elite region (8-7 record, avg opponent ELO: 1800)

Which team is truly stronger? Raw ELO can't distinguish between "big fish in small pond" and "contender among elites."

Hybrid Normalization Formula

Our normalization blends two components to create a globally fair rating:

1. Competitive Component (70% weight)

Traditional ELO from win/loss record - measures competitive success

2. Performance Component (30% weight)

Based on cOPR relative to global mean - measures absolute robot quality

$$ \text{Performance Component} = \text{Base}_{\text{evo}} + \left(\frac{\text{Team cOPR}}{\text{Global Mean cOPR}} - 1\right) \times \text{Range}_{\text{evo}} $$
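
A sketch of how the two components might blend using the 70/30 weights above; `base_evo` and `range_evo` are hypothetical stand-ins for the evolution-scaled constants, whose real values are not given here:

```python
def normalized_celo(raw_celo: float, team_copr: float, global_mean_copr: float,
                    base_evo: float = 1500.0, range_evo: float = 400.0) -> float:
    """Blend the competitive (70%) and performance (30%) components.

    base_evo and range_evo are illustrative stand-ins for the
    evolution-scaled constants in the formula above.
    """
    performance = base_evo + (team_copr / global_mean_copr - 1.0) * range_evo
    return 0.70 * raw_celo + 0.30 * performance

# A high raw rating earned against weak opposition gets pulled
# down by a cOPR well below the global mean.
print(round(normalized_celo(raw_celo=1800, team_copr=42, global_mean_copr=80)))  # 1653
```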

Evolution Scaling

To prevent artificial ceilings and account for meta evolution, the entire ELO scale adjusts proportionally to global scoring trends:

$$ \text{Evolution Factor} = \frac{\text{Current Season Global Mean cOPR}}{\text{Baseline Season cOPR}} $$

As teams collectively improve and raise the scoring ceiling, the ELO scale naturally inflates to match. A world-class team today might rate 2200, but if the meta doubles scoring capability in future seasons, world-class teams would rate ~4400.
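
A worked sketch of that example in code (the cOPR values are invented; only the doubling matters):

```python
def evolution_factor(current_mean_copr: float, baseline_mean_copr: float) -> float:
    """Ratio by which the whole ELO scale inflates with global scoring."""
    return current_mean_copr / baseline_mean_copr

# If the global mean cOPR doubles (say, from 60 to 120), a 2200-rated
# world-class team's scale shifts to ~4400, as in the text.
print(2200 * evolution_factor(120.0, 60.0))  # 4400.0
```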

Example: Cross-Regional Comparison

| Team | Region | Record | cOPR | Raw cELO | Normalized cELO |
|------|--------|--------|------|----------|-----------------|
| Elite Team | Strong | 12-3 | 140 | 1750 | 2178 |
| Stat Padder | Weak | 15-0 | 42 | 1800 | 1652 |

Despite the undefeated record, the stat padder's low cOPR reveals they're crushing weak opponents. Normalized cELO properly ranks the elite team higher for global comparison.

Use Cases

  • Cross-regional team comparisons and world rankings
  • Championship seeding and advancement predictions
  • Identifying underrated teams from highly competitive regions
  • Multi-season historical comparisons despite meta evolution

Performance

Cumulative Offensive Power Rating (cOPR)

While ELO measures ability to win, cOPR measures ability to score points. It isolates an individual team's contribution to alliance scores, with exponentially higher weight given to recent events.

The Alliance Score Problem

FTC matches are 2v2, but we only observe total alliance scores. If Red Alliance (Teams 123 + 456) scores 180 points, how much did each team contribute individually?

Linear System Solution

We model alliance scores as a linear system across many matches:

$$ \text{cOPR}_{\text{Team}_1} + \text{cOPR}_{\text{Team}_2} \approx \text{Alliance Score} $$

Over an event with N teams and M matches, this creates an overdetermined system \( Ax = b \), solved using Weighted Least Squares Regression.
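
As an illustration, here is a toy weighted least-squares setup with NumPy. The teams, alliance scores, and weights are all fabricated; a real solver would cover every alliance in every match of the event:

```python
import numpy as np

# Toy event: three teams (123, 456, 789), four observed alliance scores.
# Each row marks which two teams shared an alliance.
A = np.array([
    [1, 1, 0],   # 123 + 456
    [0, 1, 1],   # 456 + 789
    [1, 0, 1],   # 123 + 789
    [1, 1, 0],   # 123 + 456 again
], dtype=float)
b = np.array([180.0, 150.0, 170.0, 190.0])  # alliance scores (invented)
w = np.array([0.6, 0.8, 0.9, 1.0])          # recency weights (illustrative)

# Weighted least squares: scale each equation by sqrt(weight), then
# solve the ordinary least-squares problem min ||Ax - b||.
sqrt_w = np.sqrt(w)
copr, *_ = np.linalg.lstsq(A * sqrt_w[:, None], b * sqrt_w, rcond=None)
print(dict(zip(["123", "456", "789"], copr.round(1))))
```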

Time-Weighted Recency

Teams improve throughout the season. To emphasize current performance:

  • Most recent event: Full weight
  • Previous event: Reduced weight (exponential decay)
  • Older events: Progressively less influence

This makes cOPR more predictive of current capability than a simple average across all events.

💡 Why Weighted? A team that scored 40 OPR at their first event but now scores 100 OPR should be rated closer to 100, not 70 (the average).

Trend

Momentum

Momentum quantifies the rate of improvement over time. It answers: "Is this team getting better, staying stable, or declining?"

Methodology

We perform Weighted Least Squares regression on match scores over time, with higher weights on recent matches. The slope of the fitted line represents points-per-match improvement rate.

$$ \text{Score}(t) = \beta_0 + \beta_1 \cdot t + \epsilon $$

Where β₁ (the slope) indicates improvement direction:

  • Positive slope: Improving performance
  • Near-zero slope: Stable performance
  • Negative slope: Declining performance

The raw slope is normalized to a 0-100 scale for interpretability.
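
One plausible implementation sketch; the decay rate and the slope-to-0-100 mapping (a symmetric clamp around 50) are illustrative assumptions, not the production formula:

```python
import numpy as np

def momentum(scores, days, decay=0.02, slope_cap=5.0):
    """WLS slope of score vs. time, mapped onto 0-100 (50 = stable).

    `decay` and `slope_cap` (max points-per-day counted) are
    illustrative assumptions.
    """
    days = np.asarray(days, float)
    scores = np.asarray(scores, float)
    w = np.sqrt(np.exp(-decay * (days.max() - days)))  # favor recent matches
    X = np.column_stack([np.ones_like(days), days])    # intercept + slope
    beta, *_ = np.linalg.lstsq(X * w[:, None], scores * w, rcond=None)
    return 50.0 + 50.0 * np.clip(beta[1] / slope_cap, -1.0, 1.0)

print(momentum([60, 70, 75, 90], [0, 7, 14, 21]))  # improving -> above 50
```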

Reliability

Consistency Index

Consistency measures how reliably a team performs near their average. High consistency means few "bad matches," while low consistency indicates volatility.

Mathematical Foundation

Based on the Coefficient of Variation (CV):

$$ CV = \frac{\sigma}{\mu} $$

Where σ is standard deviation and μ is mean score. We invert and scale this to 0-100, where CV = 0 (perfect consistency) maps to 100.
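
A minimal sketch, assuming a 100 / (1 + CV) mapping (one plausible way to invert and scale; the production mapping may differ):

```python
import numpy as np

def consistency_index(scores) -> float:
    """Map CV = sigma/mu onto 0-100; CV = 0 (no variance) scores 100.

    The 100 / (1 + CV) inversion is an assumption for illustration.
    """
    scores = np.asarray(scores, float)
    return 100.0 / (1.0 + scores.std() / scores.mean())

print(round(consistency_index([90, 88, 92, 91])))    # tight spread: ~98
print(round(consistency_index([100, 50, 150, 60])))  # volatile: ~70
```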

💡 Why It Matters: A team with 100 ± 50 point variance is riskier for eliminations than a team scoring 90 ± 10, even if their averages are similar.

Penalties

Foul cOPR

Foul cOPR estimates the average penalty points a team concedes to opponents per match. Like scoring OPR, penalties are reported per alliance, so we use the same linear system approach to isolate individual responsibility.

Lower Foul cOPR is better. A team with 5.0 Foul cOPR contributes ~5 penalty points to opponents per match on average.

Time-Weighted Evolution

Foul cOPR uses the same recency weighting as scoring cOPR. Teams that clean up their driving or fix problematic mechanisms will see rapid improvement in this metric.

Ranking Points

RP Reliability

Ranking Points (Movement, Goal, Pattern) determine tournament seeding. RP Reliability estimates the probability of earning each RP type in the next match.

Bayesian Inference with Recency

We blend three statistical approaches:

  1. Historical Success Rate: Long-term track record
  2. Recency Weighting: Recent matches weighted exponentially higher
  3. Bayesian Smoothing: Prevents overfitting to small samples (e.g., 100% from 1 match becomes ~75% after smoothing)

$$ P(\text{RP}) = \frac{\sum_{i} w_i \cdot \text{Success}_i + \text{Prior Successes}}{\sum_{i} w_i + \text{Prior Trials}} $$

Where wᵢ are recency weights. This produces robust probabilities that adapt quickly to new strategies without overreacting to outliers.
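
A small sketch of this blend; the prior (half a success over one trial, which reproduces the ~75% figure from the example above) and the decay rate are illustrative choices:

```python
import math

def rp_probability(successes, days_ago, decay=0.05,
                   prior_successes=0.5, prior_trials=1.0):
    """Recency-weighted success rate with Bayesian smoothing.

    successes[i] is 1 or 0; the 50% prior worth one trial and the
    decay rate are illustrative values.
    """
    weights = [math.exp(-decay * d) for d in days_ago]
    hits = sum(w * s for w, s in zip(weights, successes))
    return (hits + prior_successes) / (sum(weights) + prior_trials)

print(rp_probability([1], [0]))  # one success -> 0.75, not 100%
```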

Predictions

Match Win Probability

Given two alliances, what's the probability each alliance wins?

ELO-Based Probability

The probability Alliance A defeats Alliance B follows a logistic curve:

$$ P(A \text{ wins}) = \frac{1}{1 + 10^{(R_B - R_A) / D}} $$

Where \(R_A\) and \(R_B\) are alliance ratings (the sum of both teams' Normalized cELOs) and D is a scaling constant.
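
A direct translation of the logistic formula; D = 400 is the classic chess constant, used here purely as an illustrative default:

```python
def win_probability(rating_a: float, rating_b: float, d: float = 400.0) -> float:
    """P(A wins) on the logistic curve; D = 400 is the classic chess
    constant, used here only as an illustrative default."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / d))

red = 1650 + 1480    # alliance rating = sum of the teams' Normalized cELOs
blue = 1520 + 1555   # (all ratings invented)
print(round(win_probability(red, blue), 2))  # 0.58: slight edge to Red
```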

Score Prediction Enhancement

We also estimate expected scores using cOPR and Foul cOPR:

$$ \text{Expected Score}_A = \sum \text{cOPR}_{A} + \sum \text{Foul cOPR}_{B} $$

Alliance A's expected score equals their teams' combined scoring ability plus penalties they'll draw from Alliance B.
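
A minimal sketch with invented numbers:

```python
def expected_score(own_coprs, opponent_foul_coprs) -> float:
    """Own scoring contributions plus penalty points the opponents concede."""
    return sum(own_coprs) + sum(opponent_foul_coprs)

# Red's teams score ~80 and ~65 cOPR; Blue concedes ~4 and ~6 Foul cOPR.
print(expected_score([80, 65], [4, 6]))  # 155 expected for Red
```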

💡 Two Models, One Prediction: If ELO predicts Red wins but score prediction favors Blue, we flag this as a high-uncertainty match requiring further analysis.