A deep dive into the real-time email address verification process, and why we decided to do it differently

BigDataCloud · February 17, 2020

Email addresses have become more central to digital life, not less. Tickets, reservations, orders, account recovery: nearly every meaningful digital interaction relies on a valid email address reaching the right person. Capturing that address correctly at the point of entry is worth the effort.

When we built our email verification API, we researched every available approach and deliberately chose not to use some of the most common ones. Here is what we found, and why we made the choices we did.

Why invalid email addresses happen

An email address obtained directly from a customer can be invalid for two reasons:

  1. The customer deliberately provided incorrect information — spam, fraud, or simply not wanting to be contacted. This is the minority of cases.
  2. The customer made a typo. This is the vast majority of invalid addresses at point of entry.

Real-time verification addresses the second case. Sending a verification link after the fact addresses both, but at the cost of friction — a customer who doesn't receive a confirmation email often doesn't come back to correct it.

Syntax check

A valid email address is a string in the form local-part@domain, where the local part identifies a mailbox and the domain identifies the mail server responsible for it.

A basic sanity check (verifying that a single @ is present, that it is not at either edge, and that the domain contains at least one .) eliminates many obvious errors. But a thorough syntax check requires full compliance with the relevant standards: RFC 822, RFC 2822, and RFC 5321.

Developers often reach for a regex for this. The problem is that a fully RFC 822-compliant regex is extremely long and difficult to maintain, so shortcuts are common, and shortcuts wrongly reject valid addresses. The following are all perfectly valid email addresses (the last one under the internationalized email extension, RFC 6531):

  • " "@example.org
  • user.name+tag+sorting@example.com
  • 我買@屋企.香港

Our API performs a complete multi-standard compliance check rather than a regex shortcut.
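To make the difference concrete, here is a minimal Python sketch of the basic sanity check described above, next to a shortcut regex of the kind often found in form validators. Both function names and the pattern itself are illustrative assumptions, not part of our API; the point is only that the shortcut rejects all three valid addresses listed above.

```python
import re

# Hypothetical "shortcut" pattern, typical of what gets copied into form validators.
NAIVE_PATTERN = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# The three valid addresses from the list above.
VALID_EXAMPLES = [
    '" "@example.org',
    "user.name+tag+sorting@example.com",
    "我買@屋企.香港",
]


def basic_sanity_check(address: str) -> bool:
    """Exactly one @, not at either edge, and at least one dot in the domain."""
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    return bool(local) and bool(domain) and "." in domain


def naive_regex_check(address: str) -> bool:
    """Shortcut check: ASCII letters, digits and a few symbols only."""
    return NAIVE_PATTERN.match(address) is not None


if __name__ == "__main__":
    for example in VALID_EXAMPLES:
        print(example, basic_sanity_check(example), naive_regex_check(example))
```

The sanity check is deliberately permissive: it catches obvious slips such as a missing @, but it is not a substitute for a standards-compliant parser.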

Mail server check

Even a syntactically perfect email address is undeliverable if the domain has no active mail server. The second check confirms that the domain resolves in the global DNS and has a valid MX record — a DNS record that specifies which mail server handles incoming email for that domain.

We verify that the MX records point to valid domains, and that at least one of them resolves to an active, routable IP address.
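The decision logic of this step can be sketched as follows. This is an illustration under stated assumptions, not our implementation: the two lookup functions are hypothetical and injected by the caller, because the Python standard library does not resolve MX records and a real version would need a DNS library. The sketch also checks only that an address is globally routable; it does not confirm that a server is actually listening.

```python
import ipaddress
from typing import Callable, Iterable

# Both lookups are injected so the sketch runs without network access.
MxLookup = Callable[[str], Iterable[str]]  # domain -> MX host names
IpLookup = Callable[[str], Iterable[str]]  # host name -> IP address strings


def is_routable(ip_text: str) -> bool:
    """True for globally routable addresses (not private, loopback, and so on)."""
    try:
        return ipaddress.ip_address(ip_text).is_global
    except ValueError:
        # Not a parseable IPv4 or IPv6 address.
        return False


def has_usable_mail_server(domain: str, lookup_mx: MxLookup, lookup_ips: IpLookup) -> bool:
    """True if at least one MX host resolves to a routable IP address."""
    for host in lookup_mx(domain):
        if any(is_routable(ip) for ip in lookup_ips(host)):
            return True
    return False


if __name__ == "__main__":
    # Made-up records standing in for real DNS answers.
    fake_mx = {"example.com": ["mx1.example.com", "mx2.example.com"]}
    fake_ips = {"mx1.example.com": ["10.0.0.5"], "mx2.example.com": ["8.8.8.8"]}
    print(has_usable_mail_server(
        "example.com",
        lambda d: fake_mx.get(d, []),
        lambda h: fake_ips.get(h, []),
    ))
```

Injecting the lookups keeps the routing decision separate from the DNS client, which also makes it easy to test with canned records.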

Email verification API output example

Why we don't do mailbox existence checks

The only way to confirm that a specific mailbox exists, short of sending it a message, is to open an SMTP session with the mail server and ask it. The usual technique is called a broken SMTP handshake, and it is the approach taken by many email verification providers. We deliberately do not use it. Here is why.

SMTP does not provide a dependable way to check for mailbox existence: the VRFY command was meant for this, but most servers disable it. The workaround is to pretend you are sending an email, wait for the server to accept or reject the recipient address, then abort before actually sending anything. Mail servers increasingly detect and penalise this behaviour, and your sending IP and email address can end up blacklisted. If you use this approach at scale, you risk degrading your own deliverability.
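For readers unfamiliar with SMTP, the probe described above amounts to roughly the following command sequence. The function name and the probe.example host name are hypothetical, and the sketch only builds the list of commands for explanation; it opens no connection.

```python
from typing import List


def probe_commands(sender: str, recipient: str) -> List[str]:
    """Commands a broken-handshake probe would send, in order.

    The session stops after RCPT TO: there is no DATA command,
    so no message is ever transmitted.
    """
    return [
        "EHLO probe.example",        # greet the server (hypothetical host name)
        f"MAIL FROM:<{sender}>",     # pretend to start a message
        f"RCPT TO:<{recipient}>",    # the reply to this line is the "answer"
        "QUIT",                      # abort before sending anything
    ]


if __name__ == "__main__":
    for line in probe_commands("checker@probe.example", "someone@example.com"):
        print(line)
```

A server that sees many sessions ending right after RCPT TO has an easy signal that it is being probed rather than sent mail.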

Beyond the practical risks, the mailbox check has limited value:

  • Typos are far more common in the domain part than the local part. People remember their username; they type gamil.com or gmail.co instead of gmail.com. The mailbox check does not catch domain typos.
  • For major providers like Gmail, Hotmail, and Yahoo, almost every short username combination already exists. A mistyped address such as jonsmith@gmail.com instead of johnsmith@gmail.com most likely belongs to someone else, so a mailbox check passes it. There is no reliable way to tell the two apart without actually sending an email.
  • Many domains use catch-all configuration — the mail server accepts mail for any address at that domain. A mailbox check returns a positive result regardless of whether the specific address is real.
  • The handshake adds significant latency, making it unsuitable for real-time form validation.
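The catch-all point is easy to see with a toy model. The class below is a hypothetical stand-in for a mail server's reply to a recipient check, not a real SMTP implementation; the domains and mailboxes are made up. It uses the standard SMTP reply codes 250 (accepted) and 550 (mailbox unavailable).

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyMailServer:
    """Models only the accept/reject decision for a recipient address."""
    mailboxes: Set[str] = field(default_factory=set)
    catch_all: bool = False

    def rcpt_to(self, address: str) -> int:
        # A catch-all server accepts every recipient, real or not.
        if self.catch_all or address in self.mailboxes:
            return 250
        return 550


def mailbox_seems_to_exist(server: ToyMailServer, address: str) -> bool:
    """What a mailbox existence check would conclude from the reply code."""
    return server.rcpt_to(address) == 250


if __name__ == "__main__":
    strict = ToyMailServer(mailboxes={"alice@strict.example"})
    catch_all = ToyMailServer(mailboxes={"alice@catchall.example"}, catch_all=True)
    print(mailbox_seems_to_exist(strict, "alcie@strict.example"))
    print(mailbox_seems_to_exist(catch_all, "alcie@catchall.example"))
```

On the strict server the typo is rejected, but on the catch-all server the same typo is reported as an existing mailbox, so the check provides no information there.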

What we check

Email verification API — checks performed

Our Email Address Verification API performs the following checks:

  • Full multi-standard syntax compliance (RFC 822, RFC 2822, RFC 5321)
  • Domain check including mail server configuration and MX record validation
  • Check against known abusive email domains and accounts
  • Disposable email address detection

The result is fast, ethical verification suitable for real-time form validation — without putting your sending reputation at risk.