EDIT (2015-05-19): The Name Navigator was a seldom-accessed tool (despite being rather awesome) that nonetheless got endlessly assaulted by poorly programmed bots requesting the same files over and over, so I took it offline. However, if anyone is interested in it, send me an email and I can give you 100% of the static files; in the end the design yielded a wonderfully static solution.
I’ve published v1.0 of the Name Navigator, which you can access at http://names.yafla.com. With it you can dive into millions of records with efficiency and ease. One of the greatest rewards for any developer is knowing that people enjoy your work, so I hope this provides value for someone.
To go back for a moment, the Social Security Administration has Social Security registration details from 1910 onwards available for download. After a discussion with some friends about the rise of the name Jennifer in the 1970s, and what precipitated it, I had to analyze the data for myself, which naturally led to me building a whole tool around the process.
The web app should be fully functional in all modern, major browsers, including mobile variants. You can Add To Homescreen in Safari on iOS and, on Android, via the Chrome Beta; in both cases you get more screen real estate and fewer distractions.
Some things of note:
- You can advance the year via either the navigation controls in the lower left (back, auto-progress, forward) or by dragging the year within the range, using your mouse or via touch.
- The year range has a corresponding popularity chart for whatever number of names you are investigating.
- You can click on the legend to switch between two views: the name as a % of that state's names, or the population of that name in a state versus the population of that name in the most populous state. This was a request from my wife, who was less curious about the proportional popularity of a name and instead wanted to see raw counts. For general names this means that California, Texas, New York, Florida, Illinois, and Pennsylvania will dominate.
- The details on the right list the name, total count, percentage (for that name relative to the total births for that gender), and rank of that name for the gender in the selected year.
- You can click on individual states! This filters both the graph under the year run, as well as the quantities in the details section. Using this you might determine that there were 4,957 Jennifers registered in New York state in 1973, comprising 4.01% of all females registered, becoming the #1 ranked, most popular name.
The data itself contains any name with at least five registered instances in the year for the state, the idea being that less common names are a privacy matter (I guess if you named your kid as your luggage lock code?), so for extremely rare names you may see the count jumping between 0 and 5, where in actuality it was probably 1-4 in each of the other years. Additionally you will find lots of overwhelmingly male names in the female name list, and vice versa. This may be gender confusion or just administrative errors, but it accounts for a very small percentage of the data.
On the technical side, this is one of those side projects that became very simple as it progressed. Right now it is nginx in front of a Go instance. The Go process loads in all of the name data, sorts it, aggregates it, builds rankings, etc. That Go instance is then lazily populating a static cache via nginx proxy caching, such that any given request is served once from the dynamic code, thenceforth from the static cache. This includes both name searches, where I’ve done some cleverness to ensure maximum reuse and performance, and individual name data.
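To make the "aggregates it, builds rankings" step a little more concrete, here is a minimal sketch of how it could look in Go. This is not the site's actual code: the `record`, `groupKey` and `ranked` types, the `buildRankings` function and the sample numbers are all made up for illustration, and it assumes the rows have already been parsed from the downloaded files.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical parsed row: one name's count for a state, gender and year.
type record struct {
	State  string
	Gender byte // 'F' or 'M'
	Year   int
	Name   string
	Count  int32
}

// groupKey identifies one ranking: a gender within a year, summed across states.
type groupKey struct {
	Year   int
	Gender byte
}

// ranked is one name's aggregated result within its group.
type ranked struct {
	Name    string
	Count   int32
	Percent float64 // share of all registrations in the group
	Rank    int     // 1 = most popular
}

// buildRankings sums counts across states, then sorts each year/gender
// group by descending count (ties broken alphabetically) and assigns ranks.
func buildRankings(rows []record) map[groupKey][]ranked {
	totals := make(map[groupKey]map[string]int32)
	for _, r := range rows {
		k := groupKey{r.Year, r.Gender}
		if totals[k] == nil {
			totals[k] = make(map[string]int32)
		}
		totals[k][r.Name] += r.Count
	}

	out := make(map[groupKey][]ranked, len(totals))
	for k, names := range totals {
		var sum int64
		list := make([]ranked, 0, len(names))
		for n, c := range names {
			sum += int64(c)
			list = append(list, ranked{Name: n, Count: c})
		}
		sort.Slice(list, func(i, j int) bool {
			if list[i].Count != list[j].Count {
				return list[i].Count > list[j].Count
			}
			return list[i].Name < list[j].Name
		})
		for i := range list {
			list[i].Rank = i + 1
			list[i].Percent = 100 * float64(list[i].Count) / float64(sum)
		}
		out[k] = list
	}
	return out
}

func main() {
	// Made-up sample rows, not real registration data.
	rows := []record{
		{"NY", 'F', 1973, "Jennifer", 300},
		{"CA", 'F', 1973, "Jennifer", 200},
		{"NY", 'F', 1973, "Amy", 300},
		{"CA", 'F', 1973, "Lisa", 200},
	}
	for _, r := range buildRankings(rows)[groupKey{1973, 'F'}] {
		fmt.Printf("#%d %s %d (%.1f%%)\n", r.Rank, r.Name, r.Count, r.Percent)
	}
}
```

Because everything is computed once at startup, each response is a pure function of the URL, which is what makes it safe to let a caching proxy keep the result indefinitely.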
When the 2013 data is released (presumably in March of 2014) I can purge the nginx cache and relaunch the Go instance with the updated data.
The process also maintains top n lists for names by jurisdiction by year, though I haven’t devised a way to incorporate that in the interface cleanly. It will come.
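For the curious, a top n list per jurisdiction per year can be sketched roughly like this. Again, this is only an illustration under my own assumptions: `entry`, `stateYear`, `topN` and `buildTopLists` are hypothetical names, and the sample counts are invented.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical name/count pair.
type entry struct {
	Name  string
	Count int32
}

// stateYear is a hypothetical key: one jurisdiction in one year.
type stateYear struct {
	State string
	Year  int
}

// topN returns the n highest-count entries, ties broken alphabetically.
// It sorts a copy so the caller's slice is left untouched.
func topN(entries []entry, n int) []entry {
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Count != sorted[j].Count {
			return sorted[i].Count > sorted[j].Count
		}
		return sorted[i].Name < sorted[j].Name
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

// buildTopLists precomputes the top n list for every jurisdiction/year.
func buildTopLists(data map[stateYear][]entry, n int) map[stateYear][]entry {
	out := make(map[stateYear][]entry, len(data))
	for k, entries := range data {
		out[k] = topN(entries, n)
	}
	return out
}

func main() {
	// Made-up sample counts, not real registration data.
	data := map[stateYear][]entry{
		{"NY", 1973}: {{"Amy", 30}, {"Jennifer", 50}, {"Lisa", 40}},
	}
	for _, e := range buildTopLists(data, 2)[stateYear{"NY", 1973}] {
		fmt.Println(e.Name, e.Count)
	}
}
```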
Anyways, this is one of the few projects that I do publicly, so I might try to tease some blog posts out of it. For instance, how horrendous RGB is versus the wonderful HSL/HSV. And I just generally like people enjoying the things I do (this is a major missing element when 99% of your work is proprietary and secret, targeting a very small userbase), so please share this tool if you find it interesting. It is, of course, completely non-commercial. I am not making bitbucks on this: it was just a lark with some data.
For those who look at the code and wonder what’s with all of the essentially manual element layout (versus, for instance, fixed), that was put there specifically for tablets, allowing you to zoom in on any section of the country yet maintain your popularity chart, controls, year slider, legend and details.
On the Go side there is some beautiful code and some ugly code. Data is stored in a results countType for instance (countType in this case is int32). My other primary performance option was simply one giant array using offsetting math to get the range of appropriate elements.
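The "one giant array" alternative could look something like the sketch below. This is an illustration of the general technique, not what the app does: the state-major layout, the `flatCounts` type, its methods, the year bounds and the state count are all my assumptions (1910 comes from the data's start, 2012 is assumed to be the latest year loaded, and 51 assumes the 50 states plus DC).

```go
package main

import "fmt"

// countType matches the element type mentioned above.
type countType int32

const (
	firstYear = 1910                     // the data starts in 1910
	lastYear  = 2012                     // assumed latest year loaded
	numYears  = lastYear - firstYear + 1 // years per state
	numStates = 51                       // assumed: 50 states plus DC
)

// flatCounts holds one name's counts for every state and year in a single
// slice, laid out state-major: all years for state 0, then state 1, etc.
type flatCounts []countType

func newFlatCounts() flatCounts {
	return make(flatCounts, numStates*numYears)
}

// offset converts a (state index, year) pair into a position in the slice.
// No bounds checking is done here; callers must pass valid values.
func offset(state, year int) int {
	return state*numYears + (year - firstYear)
}

func (f flatCounts) set(state, year int, c countType) {
	f[offset(state, year)] = c
}

// yearRange returns one state's counts for the years from..to (inclusive)
// as a sub-slice of the backing array, without copying.
func (f flatCounts) yearRange(state, from, to int) []countType {
	return f[offset(state, from) : offset(state, to)+1]
}

func main() {
	counts := newFlatCounts()
	const someState = 3 // hypothetical state index
	counts.set(someState, 1973, 4957)
	fmt.Println(counts.yearRange(someState, 1972, 1974))
}
```

The appeal is that a year range for a state is a single contiguous slice, so there is no per-lookup allocation or map access; the cost is that every name pays for every state/year slot, even the empty ones.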
EDIT: This app is currently seeing about 240 name searches per minute, which of course it is doing with a barely perceptible load. If network bandwidth were unlimited it should easily handle tens of thousands of lookups per second.