The Niche Encyclopedia Field Guide: 23 Reference Sites, How They Got Built, and What They Became

Last week I wrote about 36 niche encyclopedia ideas like Technovelgy.com. That post listed things you could build. This one looks at the things people already built: the actual reference sites of the internet, run by one obsessive person or two friends in a basement who somehow ended up cataloging an entire subculture.

I went through 23 of them. Founders, founding stories, traffic, money, who sold for what, who is still grinding. There are patterns. They are surprisingly consistent. And the threats are real, both Google updates and now LLMs eating the long tail.



What counts as a niche encyclopedia

Working definition: a website whose main job is to catalog every X for some narrow X, with one entry per item, searchable, cross referenced, ideally with a community of people who care more than is healthy. Wikipedia is too broad. A blog is too unstructured. A SaaS dashboard is not the thing. The niche encyclopedia is closer to a museum than a magazine.
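To make the definition concrete, here is a minimal Python sketch of the shape such a catalog tends to take: one record per item, a keyword search, and cross references between entries. The `Entry` and `Catalog` names, the fields, and the sample data are all hypothetical; none of the sites below is claimed to work this way.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One catalog item: exactly one entry per thing being cataloged."""
    slug: str
    title: str
    summary: str
    see_also: set[str] = field(default_factory=set)  # slugs of related entries


class Catalog:
    """Hypothetical in-memory catalog: one entry per slug, searchable, cross referenced."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        # Enforce "one entry per item": a second entry with the same slug is an error.
        if entry.slug in self._entries:
            raise ValueError(f"duplicate entry: {entry.slug}")
        self._entries[entry.slug] = entry

    def link(self, a: str, b: str) -> None:
        """Cross reference two existing entries in both directions."""
        self._entries[a].see_also.add(b)
        self._entries[b].see_also.add(a)

    def search(self, term: str) -> list[Entry]:
        """Case-insensitive substring search over titles and summaries."""
        needle = term.lower()
        return [
            e for e in self._entries.values()
            if needle in e.title.lower() or needle in e.summary.lower()
        ]

    def related(self, slug: str) -> list[Entry]:
        """Entries cross referenced from the given one, sorted by slug."""
        return [self._entries[s] for s in sorted(self._entries[slug].see_also)]


# Placeholder data, invented for illustration only.
catalog = Catalog()
catalog.add(Entry("item-a", "Item A", "An illustrative gadget."))
catalog.add(Entry("item-b", "Item B", "A second illustrative gadget."))
catalog.add(Entry("item-c", "Item C", "Something unrelated."))
catalog.link("item-a", "item-b")
```

The point of the sketch is the constraint, not the storage: a duplicate slug is rejected, so every item has exactly one home that other entries can point at.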

These sites tend to be the only place on earth where the data lives in a clean form. That is the moat. Google rewards them for years; sometimes the wiki community of fans is the only thing keeping the data alive at all.


The Solo Obsessive archetype

One person. Twenty plus years. No team. Eventually the world finds them.

1. Technovelgy.com (Bill Christensen, ~2000)

The patient zero of this whole post. Bill Christensen is a former technical writer who spent ten years writing manuals for Ford, Unisys, and Northern Telecom before turning his sci fi obsession into a website. Technovelgy now holds over 3,000 sci fi inventions cross referenced with 6,400+ "SF made real" articles: every time a real engineer builds something a novelist already imagined. He also writes for Space.com and Live Science. Pure passion project, ad supported, still updated.

2. Etymonline / Online Etymology Dictionary (Douglas Harper, 2001)

Douglas Harper started Etymonline in 2001 as a side project. The domain landed in 2003. He pulled etymologies from the OED, Barnhart, Klein, Weekley, and Watkins, and synthesized them into something readable. Today it covers 30,000+ words, runs Etymonline Premium (paid subscription), ships iOS and Android apps plus a Chrome extension, and has one associate editor (Talia Felix, since 2021). One guy, 25 years of compounding. It is the etymology dictionary the internet links to by default.

3. Behind the Name (Mike Campbell, 1996)

Mike Campbell started "The Etymology of First Names" in his college dorm in Victoria, BC in 1996, then renamed it BehindTheName.com in 1999. It is now THE reference for the meaning and history of first names from every culture. Sister sites for surnames and place names. Pure ad supported, still solo, still in Canada. The internet kept growing around him; he kept his data clean.

4. OEIS / On-Line Encyclopedia of Integer Sequences (Neil Sloane, 1964 to 1996)

This one is wild. Neil Sloane started recording integer sequences on punched cards as a Cornell grad student in 1964. He published two books of them. When the database got too big to print, he put it online by email in 1994 and as a website in 1996. He worked at AT&T Labs the whole time. In 2009 he transferred the IP to the OEIS Foundation. As of late 2025 it has over 390,000 sequences and grows by 30 a day. Mathematicians cite it in papers. It is the longest running niche encyclopedia I know of and proof that "this is just a hobby" can run for six decades and counting.


The Two Friends archetype

Two friends, an apartment, a hobby, a database. Half of these eventually got real money behind them. The other half stayed cozy.

5. BoardGameGeek (Scott Alden + Derk Solko, January 2000)

Scott "Aldie" Alden and Derk Solko launched BGG in January 2000. Alden never expected it to make money. Google AdSense changed that: by 2005 he could quit his day job. Now BGG runs ads, Patron support, "Geek Gold" cosmetic purchases, an annual fundraiser, a marketplace, AND a real life convention (BGG.CON, since 2005). In 2024 it had info on 150,000+ board games and around 300,000 daily active users. Possibly the cleanest example of a niche encyclopedia turning into a sustainable lifestyle business without VCs.

6. Discogs (Kevin Lewandowski, August 2000)

Kevin Lewandowski was an Intel programmer and a DJ. He registered discogs.com on August 30, 2000, to catalog his own electronic records. He literally built it in a closet (his apartment building had free T1 internet, the server lived in the manager's closet). The marketplace launched in late 2005 because users were already buying and selling through DMs. Today: 18 million+ user submitted releases, all genres, all formats. Probably the largest two sided marketplace ever built around a fan database. Still based in Portland.

7. MusicBrainz (Robert Kaye, July 2000)

Robert Kaye worked at EMusic on the FreeAmp player when he started MusicBrainz in July 2000 to fill the hole left by CDDB going proprietary. In 2004 he founded the MetaBrainz Foundation as a 501(c)(3): free for non commercial use, paid for commercial. MetaBrainz now stewards Picard, ListenBrainz, BookBrainz, Cover Art Archive. Sad note: Robert passed away on February 21, 2026. The foundation continues. This is the open data archetype: not a business, an institution.

8. MobyGames (Jim Leonard + Brian Hirt, March 1999)

Three high school friends. Jim Leonard had the idea, Brian Hirt wrote the code, and David Berk ran the business. Started March 1, 1999 with their personal collection of IBM PC games. Now 300,000+ games across hundreds of platforms. Crowdsourced, commercial database. Celebrated 25 years in 2024. The Wikipedia of video games before there was a Wikipedia.

9. RateYourMusic (Hossein Sharifi, December 2000)

Hossein Sharifi, software dev in Seattle, launched RYM on Christmas Eve 2000. It started as a tool to rate albums. It became the most opinionated music encyclopedia online. Films were added in 2009. In 2015 he set up Sonemic, Inc. to formalize things; in 2023 the brand started shifting to "Sonemic / Rate Your Music." He is still owner, dev, and sysadmin. The site has the densest concentration of music nerds per square pixel on the internet ahah.

10. Letterboxd (Matthew Buchanan + Karl von Randow, 2011)

One of the newest entries on this list and the best exit. Two New Zealand designers from Cactuslab launched Letterboxd in 2011. By 2023 it had crossed 10 million members in 200+ countries. In September 2023, Canadian holding company Tiny acquired a 60% majority stake at a ~$50 million valuation. Buchanan and von Randow kept minority stakes and stayed in charge. The proof that you can take a Discogs style "list every movie I watched" community and turn it into a $50M business inside 12 years.

11. Atlas Obscura (Joshua Foer + Dylan Thuras, 2009)

This one stretches the definition: it became media. Foer (yes, the "Moonwalking with Einstein" guy and brother of Jonathan Safran Foer) and Thuras met in 2007 over the Athanasius Kircher Society blog. They launched Atlas Obscura in 2009 as a guide to weird places. They have raised $37.9M total, including a $20M round led by Airbnb. Louise Story is now CEO. Books, experiences, a media operation. Still niche encyclopedia DNA at the core: every weird place gets one entry, georeferenced, with photos.


The Volunteer Collective archetype

No founder gets rich. The site is bigger than any one person.

12. ISFDB / Internet Speculative Fiction Database (Al von Ruff + Ahasuerus, 1995)

Al von Ruff started a searchable awards database in 1993. By 1995 he and "Ahasuerus" (a regular on rec.arts.sf.written) had built ISFDB. Editing opened to the public in 2006 with moderator review. As of today it has 2.38M story titles from 283,651 authors. Hosted by Texas A&M's Cushing Library. Volunteer run, donation supported, cited by academics and Hugo voters alike.

13. IMDB origins (Col Needham, October 1990)

Worth including because it started as a niche encyclopedia exactly like the rest. Col Needham was a film geek in Bristol. He maintained personal lists in the late 80s. In October 1990 he posted the first searchable version on the rec.arts.movies Usenet group. From 1990 to 1996 it was a worldwide volunteer effort. First ad revenue in 1996 (an Independence Day promo). Incorporated January 1996 with the volunteers as shareholders. Acquired by Amazon April 24, 1998. Needham still runs it, telecommuting from Bristol. Every other niche encyclopedia is downstream of this story.

14. IMFDB / Internet Movie Firearms Database (Bunni, May 2007)

Launched in May 2007 by a user named "Bunni" because the founder couldn't find reliable info on the guns in The Matrix. Now 32,473 articles. The community includes actual professional armorers and prop masters who sometimes upload photos of the literal piece used on set. Volunteer run, ad supported, not for sale. The model is: pick a niche specific enough that no media company would touch it, and let the obsessives fill it.

15. IPDB / Internet Pinball Database (David Byers → Chris Wolf, 1997)

Predates Wikipedia. David Byers, a Swedish pinball fan, launched Pinball Pasture in 1997. With Frank Laugh he grew it to 4,000 machines and 2,000 photos by 2001. Updates slowed; Chris Wolf and a group of new hobbyists bought the ipdb.org domain in December 2001 and Jay Stafford became the public face. Cataloging "virtually every pinball machine ever commercially made." A 2025 piece on Kineticist asked whether IPDB is dying as updates have slowed and the community has aged. The classic risk for these volunteer sites: succession.

16. Snopes (David + Barbara Mikkelson, 1994)

Started as the Urban Legends Reference Pages in 1994. David and Barbara Mikkelson met on Usenet's alt.folklore.urban; Snopes is borrowed from a Faulkner family of unpleasant people. Originally a niche encyclopedia of urban legends. Became a fact checking institution by 2010 (7 to 8M monthly uniques). Acquired drama, lawsuits, and a real ownership saga. David stepped down as CEO in September 2022; Chris Richmond (yes, the same Chris Richmond who bought TV Tropes; see below) took over. Employees moved to unionize in July 2025. Niche encyclopedia, then media institution, then labor dispute. The full lifecycle.


The Wiki Federation archetype

17. Fandom (formerly Wikia): Wookieepedia, Memory Alpha, Bulbapedia, Logopedia, ~250,000 wikis

Wikia launched in October 2004 as Wikicities, founded by Wikipedia's Jimmy Wales and Angela Beesley Starling. The thesis was simple: Wikipedia rejects fan content as unencyclopedic, so build a for profit company that hosts a MediaWiki site for every subculture. The flagships: Wookieepedia (Star Wars), Memory Alpha (Star Trek, joined Wikia in February 2005), and Bulbapedia (Pokemon, technically independent but in the same ecosystem). By 2007 Wookieepedia had 50,000+ articles. Ad supported, content under copyleft. The company was later renamed Fandom. It is the dominant business model for fan run encyclopedias today, even if every Star Wars nerd hates the ad density.


The Swipe File subgenre

A different beast: not "list every X" but "list every great example of X." The reference is curatorial rather than exhaustive. Two of these have eaten the marketing creator economy.

18. Swiped.co (Mike Schauer)

Mike Schauer (not Mike Driver, the search engines lie) spent six plus years studying conversion focused websites before launching Swiped to share what he found. 200+ categories, thousands of marketing and copywriting examples: ads, emails, pop ups, sales letters, direct mail. Each entry is annotated with the psychology and the lesson. A swipe file is a copywriter's encyclopedia of what works; Swiped is the public version. Labor of love, ad and affiliate supported.

19. Marketing Examples (Harry Dry, 2019)

Harry Dry, UK based ex web designer, launched marketingexamples.com in 2019. Pure swipe file format: one tweet sized lesson, one screenshot, repeat. He grew the newsletter from zero to 130,000 subscribers in around 3.5 years, no paid ads, no existing audience. Around 3,000 daily site visitors, ~200 daily newsletter signups. Monetized via newsletter sponsorships. The Reddit distribution playbook (post the whole article as value, plug newsletter at the bottom) is the textbook example of "make the thing so good it distributes itself."


How they actually make money

The patterns repeat. Here is what actually pays the bills, ranked by how often I saw it:

  1. Display ads. Everyone, basically. AdSense for the first decade, then Mediavine / Raptive / Ezoic / in house once traffic gets serious. Ad revenue is what kept BGG, Technovelgy, Behind the Name, Etymonline (free tier), IMFDB, RYM, and most of the wiki federation alive.
  2. Marketplace fees. The killer feature when the niche has physical goods. Discogs and BoardGameGeek both unlocked an order of magnitude more revenue once the "for sale" button appeared next to each catalog entry. The data attracted the buyers; the buyers attracted the sellers; the database financed itself.
  3. Patron / donation programs. BGG, OEIS, MetaBrainz, ISFDB. Works when the audience is small and zealous and hates ads. Almost never the main revenue line, but enough to feel sustainable.
  4. Premium subscription. Etymonline Premium. RYM has experimented. Letterboxd is the model that worked best: its Pro and Patron tiers grew with the audience.
  5. Newsletter sponsorships. The Marketing Examples model. Niche encyclopedia → newsletter → sponsorship slots. Easy to start, hard to scale past one person.
  6. Books. Atlas Obscura sold a book of the same name and it became a bestseller. The Mikkelsons published Snopes books. Sloane published OEIS books. The encyclopedia is a great proposal hook for a publisher.
  7. Real life events. BGG.CON. Some Atlas Obscura events. Marketing Examples meetups. Lower revenue but tightens the community.
  8. Licensing the data. AllMusic and MusicBrainz both license database access to commercial customers. The B2B side is invisible to readers but is often the biggest line item.

The exits

People do sell. Not always for life changing money, but sometimes.

  • IMDB → Amazon, April 1998. Undisclosed price, but as one of Amazon's first ever acquisitions it set the template for "buy the database, keep the founder."
  • AllMusic → Macrovision/Rovi, 2007, $72 million. Michael Erlewine's compulsive archive turned into a corporate asset. Now spun off into All Media Network, licensing data from its former parent.
  • Know Your Meme → Cheezburger, March 2011. Reported as a "low seven figure" deal. Cheezburger itself was acquired by Literally Media in 2016. Two consecutive ad media rollups.
  • TV Tropes → Chris Richmond + Drew Schoentrup, 2014. "Fast Eddie" sold to two operators who immediately founded an adtech company (Proper Media) to monetize it, alongside Snopes and Salon. Proper Media was sold to Sovrn in 2021. Same playbook: buy passion site, drop ads in, exit the adtech wrapper.
  • Snopes → Chris Richmond, ongoing (2022). Same Richmond. Niche encyclopedia adtech rollup is a real and underdiscussed strategy.
  • Letterboxd → Tiny, September 2023, ~$50 million. Cleanest exit on this list. Founders kept minority stakes and editorial control. The Tiny model (buy beloved internet businesses, leave them alone) maps perfectly onto niche encyclopedias.
  • Atlas Obscura → not exited but $37.9M raised, including $20M from Airbnb. The "go raise like a media business" path. Higher ceiling, more pressure, way more team.

The pattern: you can sell a niche encyclopedia to (a) a tech giant who wants the data, (b) an ad media rollup, or (c) a holding company like Tiny. None of these require you to be a unicorn; all of them require you to have built something Google sends people to.


The Google Helpful Content Update problem

Bad news first. Google's Helpful Content Update (the September 2023 version, followed by the March 2024 core update) hit niche sites brutally. Studies of independent niche sites between December 2023 and August 2024 reported:

  • Nearly half of audited sites lost more than 90% of their organic traffic.
  • 27% of niche sites in one study lost over 91%.
  • By mid 2024, niche site operators were reporting 30% to 90% traffic losses across the board.
  • 800+ sites were entirely de indexed.

Google's stated thesis: thin, templated, AI generated content is bad for users. The actual outcome: forums (Reddit) and large authoritative brands (Forbes, NYT) ate the rankings of small specialized sites. The collateral damage included real, human, lovingly curated reference sites. Helpful Content Update is, ironically, a great way to delete the helpful content of one obsessed person.

Survivors share traits: deep individual articles, real expertise, brand searches, multiple traffic sources beyond Google, communities, newsletters, RSS, social. The encyclopedias on this list that already had a community moat (BGG, Letterboxd, Discogs, RYM, Fandom wikis) shrugged off the update. The ones that lived on Google rankings alone are the ones in trouble.


The LLM threat (and the LLM gift)

Then ChatGPT showed up and asked "why would I click through to Etymonline when I can just ask?" Good question. The honest answer is that for casual queries you don't. The traffic at the very top of the funnel ("what does the name Lucia mean") is gone. People ask Claude or ChatGPT and never visit.

But there are three reasons not to panic about the niche encyclopedia format:

  1. LLMs are trained on these sites. Etymonline, OEIS, IMDB, Discogs, MusicBrainz, ISFDB, IMFDB: all of them are baked into every major model. The encyclopedias are the source. They are upstream of the answer. The question is whether being upstream pays.
  2. Structured data still wins on edge cases. An LLM will hallucinate the catalog number of a 1973 German pressing of a Can album. Discogs will not. For any task that requires precision or completeness (every single pinball machine, every single integer sequence, every single firearm in every Tarantino film), the database still wins.
  3. Community is the moat now. Letterboxd's value is not the film database; IMDB has that. The value is the friends, the lists, the weekly discussion, the seasonal movie diary. LLMs cannot replicate the social layer. The encyclopedias that survive the LLM era are the ones that doubled down on community when they had the chance.
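
Point 2 above can be illustrated with a small Python sketch: a structured lookup either returns the exact record or reports a miss, and a filter over the records is complete with respect to the data. The `Release` type, the `RELEASES` rows, and the catalog numbers are made-up placeholders, not real Discogs data and not its API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    artist: str
    title: str
    country: str
    year: int
    catalog_number: str


# Placeholder rows; every value here is invented for illustration only.
RELEASES = [
    Release("Example Band", "Example Album", "Germany", 1973, "EX-0001"),
    Release("Example Band", "Example Album", "UK", 1973, "EX-0002"),
    Release("Example Band", "Other Album", "Germany", 1975, "EX-0003"),
]


def find_pressing(title: str, country: str, year: int) -> Optional[Release]:
    """Return the exactly matching pressing, or None. Never a plausible guess."""
    for r in RELEASES:
        if (r.title, r.country, r.year) == (title, country, year):
            return r
    return None


def all_pressings(title: str) -> list[Release]:
    """Every recorded pressing of a title, in stored order."""
    return [r for r in RELEASES if r.title == title]
```

The design choice worth noticing is the `None` return: when the database has no matching row it says so, which is the behavior a generative model does not guarantee.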

The ones in genuine trouble: pure passion sites, with no community, no newsletter, no marketplace, no licensing deals, that depend on Google traffic for ads. Technovelgy is exactly that profile. It is also exactly the kind of site that should be paid by every model lab whose training set ate it. We are in the early innings of figuring out how that works ;)


What I take from all this

A few things stand out after going through 23 of these.

The compounding is absurd. Sloane started in 1964. Needham in 1990. AllMusic in 1991. Mikkelson in 1994. Campbell in 1996. IPDB in 1997. Most of these sites have a 25 year head start on whatever you start tomorrow. The good news: most niches still don't have a Technovelgy. The first person to be the Technovelgy of [your niche] still wins.

Solo is fine, but two people is better. The most resilient stories are friends with complementary skills: BGG (Aldie + Solko), MobyGames (Leonard + Hirt + Berk), Atlas Obscura (Foer + Thuras), Letterboxd (Buchanan + von Randow). Even MusicBrainz needed Robert plus the foundation. One person can start, but the ones that didn't burn out had a co founder.

Marketplace beats ads if your niche has stuff to buy. Discogs and BGG are the two healthiest businesses on this list and both unlocked their second life when they let users sell the things in the database. If your niche has physical (or digital) inventory, build the marketplace earlier than feels comfortable.

Don't depend on Google. Newsletter, RSS, community, Patreon, app, anything that creates a direct relationship. The encyclopedias that survived 2024 had multiple legs. The ones that didn't had one.

Pick a niche that LLMs can't fake. Niches with constantly updating data (live events, new releases, prices, locations), niches with deep correctness requirements (firearms, integer sequences, vinyl pressings), niches with social layers (Letterboxd, RYM). Avoid niches that are just "general knowledge about a topic" because that is exactly what LLMs are for.

The exit is real. Tiny, holding companies, ad media rollups, occasionally a tech giant. None of this is going to make you Mark Zuckerberg. All of it can buy you a house and a few years of freedom.

The motivation has to be obsession, not money. Every founder on this list is, frankly, a weirdo about their niche. Sloane records integer sequences for fun. Christensen actually reads sci fi novels and notes the gadgets. Schauer literally collects ads. Lewandowski is a DJ who couldn't stop cataloging records. If you don't already have the thing you want to catalog, don't start. If you do, the whole post is a permission slip ^^


Sources