The Search For A Domain Name
I recently had need for a mid-sized amount of real-world data, necessary for testing purposes on low-end hardware (testing and demonstrating some of the new functionality of SQL Server 2005). I wanted something that wasn’t confidential, which excluded the easy choice of using business data, and I refrain from using artificial data. Around the same time I happened across the requisition process for the .COM/.NET and .EDU TLD zones, so I made a request for access.
Soon enough I had the 3.5GB of .COM domain names, along with 650MB of .NET, loaded into the database. (For all results in this entry I included only the .COM TLD, using the data as of 2pm on March 28th, 2006; I’ll analyze the others at a future date.) It was a great foundation for a lot of tests and demonstrations, and served my original goal admirably. I didn’t stop there, however: curiosity led me to do some basic analysis to see what sorts of domain names are registered, and how saturated the registry really is.
Note that these are the Verisign distributed zone files, which do not include entries that have no nameservers configured or that are in a hold state. While those comprise a very small minority of domain names, they do skew the results a bit. To improve accuracy when the sample set is small, for some of the tests I have validated the positives using the WHOIS infrastructure. (For instance, the zone file showed several two-letter sequences and a dozen three-letter sequences as “available”; all of them turned out to be in a hold state or to have no nameservers configured.) For aggregate results where they were inapplicable, I’ve filtered internationalized domain names (IDN, prefixed with xn--) from the results.
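To make the filtering concrete, here is a minimal Python sketch of how the distinct names could be pulled out of zone-file lines and the IDN entries dropped. It is not the SQL Server load I actually used; the function name `registered_names` and the assumed line layout (`<name> NS <nameserver>`, with names relative to the zone origin) are illustrative assumptions.

```python
from typing import Iterable, Set


def registered_names(zone_lines: Iterable[str], include_idn: bool = False) -> Set[str]:
    """Collect distinct names that have at least one NS record.

    Assumes each relevant line looks like '<name> NS <nameserver>'.
    Anything else (comments, blank lines, A records) is skipped.
    """
    names = set()
    for line in zone_lines:
        fields = line.split()
        if len(fields) < 3 or fields[1].upper() != "NS":
            continue
        name = fields[0].upper()
        # Assumption: a dot in the owner name means it is not a plain
        # second-level name relative to the origin, so ignore it.
        if "." in name:
            continue
        # IDN entries are stored in their encoded form, prefixed with xn--.
        if not include_idn and name.startswith("XN--"):
            continue
        names.add(name)
    return names
```

A domain with several nameservers appears on several lines, which is why the result is a set rather than a list.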
You’ve thought up a brilliant idea for a new Web 2.0, AJAX-enabled web app, or you’re about to release a thus-far-unnamed killer software app. Now you just need to find the perfect domain name for it to live at (and, in true new-economy fashion, you’ll base your corporate name upon whatever available domain name you find… PILLAGEANDPLUNDR Corporation).
You pull up GoDaddy and start punching in clever names, along with their many variations, only to find that they’re all seemingly taken.
“This can’t be!” you cry. “Has every possibility already been registered?”
Given that there are approximately 50 million .COM domains registered, it is indeed true that the low-hanging fruit domain names are overwhelmingly taken, and your chances of lucking upon an unnoticed available three-letter acronym (TLA) are close to zero. Your only recourse would be to haggle with domain
What About Acronyms?
If you want one of the 676 possible two-letter sequences, for instance for an acronym or abbreviation, you’re out of luck: they’re all taken. Even allowing for digits, giving 1,296 combinations, every single variation is taken.
Of course, that’s ignoring the fact that .COM registrars now mandate a 3-character minimum length, so it wouldn’t be an option anyway.
Of the 17,576 possible three-letter sequences, again every single one is already taken. Adding digits to the mix gives 46,656 combinations (note that I’m intentionally ignoring obtuse dashes for such short domain names, though technically they are legal from the second character onwards). This yields a larger number of garbage domain entries (either REGISTRAR-LOCKED, REDEMPTIONPERIOD, or with no nameservers), giving a false hope of 228 seemingly open domains that aren’t actually available.
If you’re dying to acquire great domains like 8VZ.com or Q6X.com, they’ll free up within a month, though it seems evident that there are swaths of domain speculators acquiring every variant when it becomes available, so they won’t go without a fight.
Stepping up to four-letter sequences, choosing among the 456,976 combinations, yields vastly greater availability (perhaps the set is a bit too large for domain speculators, given their unlikely success with random sequences), with 97,786 showing as open. A quick check verifies that most are legitimately available, including “choice” domains such as AGJV.com, EIYK.com, GZVW.com, and QFEV.com. Add digits into the mix and there are a massive 1.16 million open domains, so long as you’re looking for something like 7RG8.com or U3JZ.com. Choose one and then manufacture a ridiculous backronym to explain it.
Going to five-letter sequences (yet another five-letter acronym? YAFLA?), the possibilities are of course rich, again presuming that you’re willing to accept an arbitrary sequence of letters and/or digits and create a backronym to match. Using just letters you have 11,881,376 possibilities, of which approximately 11,015,028 are unclaimed.
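The totals quoted in this section are simply the alphabet size raised to the sequence length (26 letters, or 36 with digits). As a rough Python sketch of the arithmetic and of the availability check, assuming the registered names are already held in a set (the helper names `possible` and `unregistered` are mine, for illustration only):

```python
from itertools import product
import string

LETTERS = string.ascii_uppercase              # 26 symbols
LETTERS_AND_DIGITS = LETTERS + string.digits  # 36 symbols


def possible(alphabet: str, length: int) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return len(alphabet) ** length


def unregistered(registered: set, alphabet: str, length: int):
    """Yield every sequence of the given length that is not in `registered`."""
    for chars in product(alphabet, repeat=length):
        name = "".join(chars)
        if name not in registered:
            yield name


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, possible(LETTERS, n), possible(LETTERS_AND_DIGITS, n))
```

Enumerating every candidate is fine up to five letters (about 11.9 million names); for longer sequences a set-based or database join would be the more sensible route.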
How Long Are Most Domains?
Of course many of the registered domains are seldom, if ever, visited, with a huge percentage having nothing more than a parked page (users pay domain registrars to put up ads for themselves).
Thus, analyzing the domain database without taking into account popularity/traffic is of limited value, but it does provide for a bit of entertainment.
As mentioned, 100% of two- and three-letter domain names are taken, but the space starts to free up as the number of possibilities explodes, all the way up to 63-character domain names. The most popular registered domain name length is actually 11 characters, tailing off from there.
The fun doesn’t end at 31 characters, however. There are 253,000+ non-IDN domains that are 32 characters or longer, including 538 that are 63 characters long.
These include such superlative domains as ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ.com, WEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEB.com,
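For anyone wanting to reproduce the length breakdown, a histogram over name lengths is all it takes. This is a small Python sketch rather than the query I ran; `length_histogram` and `long_name_count` are hypothetical helper names, and the input is assumed to be an iterable of names without the .com suffix.

```python
from collections import Counter
from typing import Iterable


def length_histogram(names: Iterable[str]) -> Counter:
    """Map each name length to the number of names with that length."""
    return Counter(len(name) for name in names)


def long_name_count(names: Iterable[str], minimum: int = 32) -> int:
    """Count names that are at least `minimum` characters long."""
    hist = length_histogram(names)
    return sum(count for length, count in hist.items() if length >= minimum)
```

Calling `length_histogram(names).most_common(1)` gives the most popular length together with its count.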
What About Names?
The US Census Bureau has some handy common name files available on their site, so I thought I’d see how one’s luck would be trying to register their own name(s).
If you’re looking for a masculine domain name, you’ll be disheartened to learn that of the 1,219 male names listed by the US Census Bureau, every single one is registered. If you’re looking for something feminine, you’re in luck: as I type this, of the 2,841 female names listed by the Census, you can soon grab the lucrative, recently expired Erlinda.com, or Shanita.com, which is sitting in purgatory, though both are technically still taken.
On the family name front, 100% of the top 10,000 family names are registered.
Cross joining the top 300 male names with the top 300 family names finds that ~10,112 of the 90,000 possibilities aren’t registered, to the benefit of anyone named Antonio Hughes and Lawrence Torres out there! Similarly, cross joining the top 300 female names with the top 300 family names finds that ~14,103 possibilities are unclaimed.
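The cross join itself is just every first name paired with every family name, checked against the registered set. A hedged Python equivalent of that idea (the original was done in the database; `unregistered_full_names` is an illustrative name, and the tiny `registered` set in the usage lines is made up for the example, not real registry data):

```python
from itertools import product
from typing import Iterable, List, Set


def unregistered_full_names(first_names: Iterable[str],
                            family_names: Iterable[str],
                            registered: Set[str]) -> List[str]:
    """Return first+family concatenations that are absent from `registered`.

    All comparisons are done in upper case, since domain names are
    case-insensitive.
    """
    taken = {name.upper() for name in registered}
    result = []
    for first, family in product(first_names, family_names):
        candidate = (first + family).upper()
        if candidate not in taken:
            result.append(candidate)
    return result


if __name__ == "__main__":
    # Made-up sample data, purely to show the call shape.
    registered = {"ANTONIOTORRES", "LAWRENCEHUGHES"}
    print(unregistered_full_names(["Antonio", "Lawrence"],
                                  ["Hughes", "Torres"],
                                  registered))
```

With 300 names on each side this produces the 90,000 candidates mentioned above, which is small enough to check in memory.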
Domain Name Love
On the love front, 1958 (68.9%) of the 2841 possible ‘ILOVE’-prefixed female names (using the census set of names) sit unclaimed, which is surprising, as only 665 (54.5%) of 1219 ‘ILOVE’-prefixed male names remain available.
Continuing down that path, the seedier side of the internet is hardly a secret, and it’s evident in the DNS database as well. 268,971 domains contain the sequence SEX (11,333 of them also containing the sequence FREE), while 143,683 domains contain the sequence LOVE.
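Counts like these come down to a substring test over every name. A minimal Python sketch, assuming the names are available as an iterable of strings (`count_containing` is a hypothetical helper, not something from the original analysis):

```python
from typing import Iterable


def count_containing(names: Iterable[str], *sequences: str) -> int:
    """Count names that contain every one of the given sequences.

    Matching is case-insensitive and position-independent.
    """
    wanted = [seq.upper() for seq in sequences]
    return sum(
        1 for name in names
        if all(seq in name.upper() for seq in wanted)
    )
```

Passing two sequences, for example `count_containing(names, "LOVE", "FREE")`, gives the "also containing" style of figure quoted above.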
The most common letter to start a domain is S, with relatively few domains starting with Q, X, Y or Z. The most common digit to start a domain is, unsurprisingly, 1.
Every successful company has remoras and haters, so it was interesting to look at the number of suffixed alternatives for some well-known domains. While some of these are actually owned by the root domain owner, most are hangers-on and critics.
Samples include GOOGLE-AMERICA, GOOGLE-BUDDY, MICROSOFT-EBOOKS, SLASHDOTREVIEW, SLASHDOTSLASH, and YAHOO2007.
Hopefully this was a bit entertaining, and maybe even informative. I’m doing a much more intriguing, large-scale analysis (again, it’s a nice opportunity to demonstrate some of the new SQL Server 2005 functionality) that I’ll publish soon, but these were the low-hanging fruit.