Methodology
Transparency in how we measure, test, and score email service providers.
Our Testing Approach
ESP Benchmarks employs a rigorous, automated testing methodology designed to produce objective, reproducible performance data. Our system runs around the clock, continuously evaluating every tracked email service provider under consistent conditions.
Automated test agents send emails through each provider at regular intervals, typically every 15 minutes for transactional providers and hourly for marketing-focused platforms. These agents operate from multiple geographic locations to capture regional performance variations. Each test includes timestamp logging at multiple checkpoints: API request initiation, API response receipt, delivery confirmation, and inbox arrival when measurable.
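In outline, one such timed test might look like the following sketch. The `client` SDK wrapper, the `wait_for_delivery` helper, and the checkpoint names are illustrative assumptions, not any provider's actual API:

```python
import time
import uuid

def run_send_test(client, region):
    """Time one send through a provider's API, logging each checkpoint.

    `client` is a hypothetical SDK wrapper: send() submits the message and
    wait_for_delivery() blocks until a delivery event arrives, returning
    the monotonic time of confirmation.
    """
    checkpoints = {}
    test_id = str(uuid.uuid4())  # correlates the send with inbox monitoring

    checkpoints["api_request_initiated"] = time.monotonic()
    client.send(to="probe@example.com", headers={"X-Test-Id": test_id})
    checkpoints["api_response_received"] = time.monotonic()

    # Delivery confirmation typically arrives asynchronously (webhook or event poll).
    checkpoints["delivery_confirmed"] = client.wait_for_delivery(test_id)

    return {
        "region": region,
        "api_latency": checkpoints["api_response_received"] - checkpoints["api_request_initiated"],
        "delivery_speed": checkpoints["delivery_confirmed"] - checkpoints["api_request_initiated"],
    }
```

The monotonic clock avoids skew from wall-clock adjustments between checkpoints on the same agent.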
We maintain test inboxes across major email providers including Gmail, Outlook, Yahoo Mail, and Apple iCloud. These inboxes allow us to measure not just delivery confirmation but actual inbox placement, distinguishing between primary inbox, tabs, and spam folder placement. Inbox monitoring operates on a 5-minute polling cycle to ensure prompt detection of delivered emails.
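Placement measurement ultimately reduces to mapping each mailbox provider's folder names onto a small set of categories. A minimal sketch, with illustrative folder names (actual names vary by mailbox provider and locale):

```python
def classify_placement(folder_name):
    """Map a mailbox folder name to a placement category.

    The folder-name sets below are examples only; real monitoring
    needs per-provider mappings (e.g. Gmail tabs vs Outlook folders).
    """
    name = folder_name.strip().lower()
    if name in {"inbox", "primary", "focused"}:
        return "primary"
    if name in {"promotions", "updates", "social", "forums", "other"}:
        return "tab"
    if name in {"spam", "junk", "junk email", "bulk"}:
        return "spam"
    return "unknown"
```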
Our testing methodology intentionally uses standardized email content that represents typical transactional and marketing messages. This approach ensures fair comparison across providers while avoiding content-specific deliverability variations. We periodically rotate test content to prevent pattern-based optimizations that could artificially inflate scores.
Statistical reliability is ensured through sample size. Each provider accumulates thousands of data points monthly, providing robust averages and stable percentile estimates. We apply outlier detection to identify anomalous results that may indicate testing-infrastructure issues rather than genuine provider performance variations.
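One common way to flag such anomalies is an interquartile-range rule; the sketch below is only illustrative, since the exact detection method used in the pipeline is not specified here:

```python
def flag_outliers(samples, k=3.0):
    """Return samples lying more than k * IQR outside the quartiles.

    A simple IQR fence; assumes a reasonably large sample (the crude
    index-based quartiles below are inaccurate for tiny inputs).
    """
    s = sorted(samples)
    n = len(s)
    q1 = s[n // 4]
    q3 = s[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in samples if x < lo or x > hi]
```

Flagged points would then go to manual review rather than being silently dropped, matching the review step described under Data Storage.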
Metric Definitions
Understanding our metrics is essential for meaningful provider comparison. Each metric captures a specific aspect of email service performance, and together they provide a comprehensive view of provider capability.
**Delivery Speed** measures the time elapsed between API request submission and delivery confirmation receipt. This metric reflects the end-to-end performance of the provider's sending infrastructure, including queue processing, MTA handoff, and delivery negotiation with recipient mail servers. We report average, median (P50), 95th percentile (P95), and 99th percentile (P99) values.
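The percentile values can be computed with a nearest-rank definition, sketched below; the interpolation method actually used in production is not specified, so treat this as one reasonable choice:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p percent of samples are at or below it."""
    s = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(s)))
    return s[rank - 1]
```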
**API Latency** measures only the time for the provider's API to acknowledge a send request, independent of actual email delivery. This metric is critical for application performance, as synchronous API calls directly impact user-facing request latency. Lower API latency enables higher throughput and better application responsiveness.
**Uptime** represents the percentage of time the provider's API and sending infrastructure were fully operational. We measure uptime at 1-minute intervals and calculate monthly and rolling averages. An incident is logged whenever our monitoring detects API unavailability or significantly degraded response times.
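With one check per minute, the monthly figure is a plain ratio of passing checks; a minimal sketch assuming one boolean result per 1-minute probe:

```python
def uptime_percentage(check_results):
    """Uptime as the share of passing fixed-interval checks.

    `check_results` is one boolean per 1-minute probe
    (True = operational). Returns a percentage.
    """
    if not check_results:
        return 0.0
    passing = sum(1 for ok in check_results if ok)
    return 100.0 * passing / len(check_results)
```

At this granularity, a 30-day month has 43,200 checks, so a single failed check costs about 0.0023 percentage points, enough resolution to distinguish 99.99% from 99.90%.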
**Deliverability** measures the percentage of sent emails that successfully reach the recipient's inbox (not spam folder). This metric is distinct from delivery confirmation rates, which can be high even when emails are being spam-filtered. Our inbox placement testing provides this ground-truth deliverability measurement.
**Bounce Rate** indicates the percentage of sent emails that were rejected by recipient mail servers. High bounce rates suggest either poor list hygiene support or infrastructure issues affecting sender reputation. We track both hard bounces (permanent failures) and soft bounces (temporary issues).
**Throughput** represents the maximum sustained sending rate a provider can support, measured in emails per second. This metric matters primarily for high-volume senders and time-sensitive campaigns requiring rapid delivery completion.
Scoring Formula
Our overall provider scores combine multiple performance dimensions into a single comparable metric. The scoring formula balances different aspects of email infrastructure quality according to their practical importance.
The overall score is calculated as a weighted average of five sub-scores:
- **Speed (20%)**: Derived from delivery speed and API latency metrics. Faster providers score higher, with diminishing returns beyond best-in-class thresholds.
- **Reliability (25%)**: Based primarily on uptime percentage, with additional weight for incident severity and frequency. A provider with 99.99% uptime scores significantly higher than one with 99.90%.
- **Deliverability (25%)**: Incorporates inbox placement rates across major mailbox providers, bounce rates, and spam complaint rates. This heavily weighted dimension reflects the fundamental importance of reaching the inbox.
- **Developer Experience (15%)**: A qualitative assessment of API design, SDK quality, documentation comprehensiveness, and integration ease. Evaluated by our engineering team using standardized criteria.
- **Value (15%)**: Compares pricing relative to performance. Providers offering strong performance at competitive prices score higher. This dimension prevents pure cost-cutting from dominating while ensuring value is recognized.
Each sub-score ranges from 0-100, and the weighted combination produces an overall score on the same scale. Scores are updated monthly based on rolling 90-day performance data, with recent months weighted slightly higher to reflect current provider status.
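Under the stated weights, the combination is a plain weighted average. A sketch (the dictionary keys are illustrative; the recency weighting of the rolling 90-day window is not modeled here):

```python
# Weights as published in the scoring formula above.
WEIGHTS = {
    "speed": 0.20,
    "reliability": 0.25,
    "deliverability": 0.25,
    "developer_experience": 0.15,
    "value": 0.15,
}

def overall_score(sub_scores):
    """Weighted average of the five 0-100 sub-scores."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)
```

This also makes the balance property concrete: uniform 85s across all five categories outscore a single 95 surrounded by 70s.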
This methodology intentionally favors providers excelling across multiple dimensions over those with a single standout metric. A provider with consistent mid-80s scores across all categories will typically outscore one with a 95 in one category but 70s in others.
Data Collection
Our data collection infrastructure spans multiple regions and operates continuously to capture comprehensive, geographically representative performance data.
**Geographic Distribution**: Monitoring agents are deployed in six locations: US East (Virginia), US West (Oregon), Europe (Frankfurt), Asia Pacific (Singapore and Tokyo), and South America (São Paulo). This distribution allows us to measure provider performance for senders and recipients worldwide, identifying regional variations that aggregate statistics might obscure.
**Test Volume**: We generate approximately 50,000 test transactions daily across all providers, accumulating roughly 1.5 million data points monthly. This volume ensures statistical reliability even for granular metrics such as P99 latency, and enables meaningful week-over-week trend analysis.
**Infrastructure**: Our monitoring infrastructure runs on dedicated hardware to eliminate noisy-neighbor effects that could impact measurement accuracy. Network connectivity uses direct peering where available to minimize external variables. All monitoring systems are redundant, with automatic failover ensuring no gaps in data collection.
**Data Storage**: Raw transaction data is retained for 24 months, enabling historical analysis and trend identification. Aggregated metrics are stored indefinitely. Our data pipeline includes automated anomaly detection that flags unusual results for manual review before inclusion in published statistics.
**Provider Access**: We maintain standard API access to all tracked providers, using the same authentication and rate limits available to typical customers. We do not request or accept special access that would make our results non-representative of normal customer experience.
**Verification**: Our methodology undergoes periodic third-party review to ensure accuracy and absence of bias. We welcome provider feedback on our testing approach and will investigate any concerns about measurement accuracy.
Independence & Transparency
ESP Benchmarks is committed to providing objective, unbiased performance data that serves the interests of engineering teams evaluating email infrastructure options.
**Editorial Independence**: Our rankings and recommendations are determined solely by measured performance data and standardized evaluation criteria. No provider can influence their scores through advertising, sponsorship, or other commercial relationships. We do not accept payment for reviews, placements, or favorable coverage.
**Funding Transparency**: ESP Benchmarks operates as an independent research initiative. Our primary revenue comes from premium research reports and consulting services that do not influence our public benchmark data. Any commercial relationships with tracked providers are disclosed prominently.
**Methodology Disclosure**: This methodology documentation is publicly available so that readers can understand how scores are calculated and evaluate the validity of our approach. We welcome scrutiny and feedback on our methodology.
**Conflict Management**: Team members with financial interests in any tracked provider are recused from evaluations and scoring decisions for that provider. We maintain strict separation between commercial activities and benchmark operations.
**Provider Communication**: We share benchmark results with providers before publication, allowing them to verify data accuracy and respond to findings. However, providers have no editorial control over our conclusions or recommendations.
**Corrections Policy**: When we identify errors in our data or analysis, we correct them promptly and transparently. Material corrections are noted in the affected reports. We maintain an errata log for reference.
Our goal is to be the trusted, neutral source for email infrastructure performance data. We believe transparency about our methods and independence is essential to earning and maintaining that trust.