

Open Source Microlink: Building a Self-Hosted Link Intelligence API

Microlink.io is a headless browser API. You send it a URL, it sends back metadata (title, description, image, author), a screenshot, or a PDF. It runs Puppeteer so you don't have to. The free tier gives you 100 requests per day. If you're building anything serious, you hit that ceiling fast.

The business is impressive for a solo product. Kiko Beats (the founder) has been running it since 2018, and it's genuinely good. But 100 requests/day free, then $9/month for 3,000 requests/day -- and you're not allowed to send sensitive URLs to a third-party server. If you work at a company with internal tooling, admin panels, or confidential documents, you literally can't use it.

Core thesis: An open source, self-hosted alternative to microlink.io wins on three vectors simultaneously -- privacy (URLs never leave your infra), cost at scale (no per-request pricing), and drop-in compatibility (same API shape, just swap the base URL). The open source project is the distribution strategy for the managed cloud product, which is where the revenue comes from. You don't need to beat microlink.io. You need to be the obvious answer for everyone who can't or won't use a third-party API.




2. Why Open Source Wins Here

Not every product category is a good fit for open source. This one is a great fit. Here's why.

Privacy is a hard blocker, not a preference

A startup building an internal knowledge base, a company with a private admin panel, a law firm with client document URLs -- none of these can use microlink. The URL itself is sensitive. Self-hosting is not a nice-to-have. It's the only viable option. Open source gives them auditability on top of self-hosting, which matters even more in regulated industries.

The cost curve works in your favor

Microlink charges per request. Compute is relatively cheap -- a Puppeteer screenshot costs maybe $0.001 in actual infra. Microlink's $9 tier buys 3,000 requests/day (roughly 90K/month), an implied rate of about $0.0001/request at full utilization; the margin comes from the fact that most users never get near their quota. Either way, the user paying $79/month for 50K requests per day would rather spin up a $20/month VPS and run it themselves. Open source makes that trivial. Your managed cloud competes on convenience, not on capability.

Microlink is a one-person project

Kiko Beats is a great developer. But one person can't build everything. The GitHub repo (github.com/microlinkhq) has issues sitting open for years. Feature requests that never ship. An open source community of 20 active contributors will outship a solo developer every time. This is the "community as co-developers" moat that commercial open source builds on.

Drop-in replacement is a powerful magnet

If your API is compatible with microlink's response format, switching is trivial. Change one environment variable. Every developer who has ever been frustrated with microlink's rate limits is a potential GitHub star. And GitHub stars are your distribution engine.

The ecosystem already exists

Puppeteer has 88K GitHub stars. Playwright has 66K. There are thousands of developers who know how to work with headless browsers but have never wrapped it into a clean, self-hostable API service. Your project gives them something to contribute to.


3. Positioning: The Three Messages That Land

You don't need a clever positioning statement. You need three sentences that immediately make the right person say "oh, this is exactly what I was looking for."

Message 1: "Self-hosted microlink.io"

Use this in the GitHub repo description. Use this in the README first paragraph. Use this in every "alternatives to microlink" thread you find. People are already searching for this. They know microlink. They want the thing microlink does, running on their own servers. You don't need to explain what link metadata extraction is. You don't need to educate. Just say what you are.

A self-hosted, open source alternative to microlink.io. Same API, your infrastructure, zero rate limits.

Message 2: "URLs never leave your network"

This is the message for the enterprise buyer, the security-conscious developer, the startup with an internal tool. Privacy as a feature, not a value statement. It's concrete. It answers the question "why not just use microlink?" without mentioning microlink.

Message 3: "Free forever if you self-host, $29/month if you want us to run it"

This is the pricing message that converts. Open source developers are allergic to SaaS pricing. They want to know upfront: can I just run this myself? Yes. Always. And when they eventually don't want to deal with ops, there's a $29/month option. No hidden limits. No feature gates on the self-hosted version.

What to name it

A few directions that work well:

Name | Vibe | Notes
Linkpeek | Friendly, descriptive | Clear what it does. .io probably available.
Previewr | Classic OSS naming | Slightly dated but very searchable.
Unfurl | Technical, exact | Already a Node.js lib (unfurl.js). Name conflict risk.
Metascrape | Descriptive, no nonsense | Good for SEO but a bit dry.
Linkshot | Screenshots + links | Captures both main features in one word.
Openlink | Open source signal in the name | Might conflict with the browser extension.
Peeky | Playful | Memorable. Logo potential. Not immediately descriptive.
Vizurl | Visual + URL | Short, unique, no obvious conflicts.

Best pick: Linkshot. It names both main features (link metadata + screenshots), it's a real word, the domain is plausible, and it reads well in a GitHub description. Runner-up: Vizurl. Shorter, unique, and the "viz" prefix signals visual output.


4. Technical Stack

This is not a hard product to build technically. The headless browser ecosystem is mature. The challenge is making it operationally simple to self-host, and making the API clean enough that contributors want to work on it.

Core engine

Use Playwright over Puppeteer. Playwright is actively maintained by Microsoft, supports Chromium/Firefox/WebKit, and has a better TypeScript API. Puppeteer is still valid but Playwright is where the community is going.

Language

Node.js/TypeScript for maximum contributor surface. Everyone who uses Puppeteer or Playwright knows Node. A Go rewrite might make the binary smaller and the deploy simpler, but your contributor pool would shrink by 80%. Ship in TypeScript, optimize later.

Metadata extraction layer

Don't just parse Open Graph tags manually. Combine multiple sources:

  • Open Graph tags (og:title, og:description, og:image)
  • Twitter card meta tags
  • JSON-LD structured data (Schema.org)
  • HTML head fallbacks (title, meta description)
  • RSS/Atom feed metadata for blog URLs
  • oEmbed for YouTube, Twitter, SoundCloud, etc.

The library metascraper (npm) -- built, as it happens, by the microlink team itself -- does most of this already. Use it. Don't rebuild what exists. Your job is the API wrapper and the self-hosting story, not the parsing logic.
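The fallback cascade above (Open Graph first, then Twitter cards, then plain HTML head) can be sketched as a pure function. This is an illustrative sketch only -- hand-rolled regex parsing with a fixed attribute order, not production-grade; metascraper does this properly with a real HTML parser:

```typescript
// Illustrative fallback cascade: Open Graph → Twitter card → HTML head.
// A real implementation should use metascraper, not regex.
interface Meta { title?: string; description?: string; image?: string }

function metaTag(html: string, attr: string, name: string): string | undefined {
  // Matches e.g. <meta property="og:title" content="...">
  // (attribute order is assumed fixed here for brevity).
  const re = new RegExp(
    `<meta\\s+${attr}=["']${name}["']\\s+content=["']([^"']*)["']`, "i");
  return re.exec(html)?.[1];
}

function extractMeta(html: string): Meta {
  const title =
    metaTag(html, "property", "og:title") ??
    metaTag(html, "name", "twitter:title") ??
    /<title>([^<]*)<\/title>/i.exec(html)?.[1];
  const description =
    metaTag(html, "property", "og:description") ??
    metaTag(html, "name", "twitter:description") ??
    metaTag(html, "name", "description");
  const image =
    metaTag(html, "property", "og:image") ??
    metaTag(html, "name", "twitter:image");
  return { title, description, image };
}
```

The priority order is the point: structured sources win, and the bare `<title>` tag is the last resort, which matches how microlink-style APIs behave in practice.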

Caching

Metadata and screenshots should be cached. Redis is the obvious choice. Make it optional -- the self-hosted version should work without Redis for low-traffic installs. Add Redis support via an environment variable. Cache TTL should be configurable, defaulting to 24 hours for metadata and 1 hour for screenshots (pages change more often visually).

Self-hosting story

This is more important than any feature. The self-hosting experience determines whether your GitHub README converts readers into users. The bar is:

  • One command to run: docker run -p 3000:3000 linkshot/linkshot
  • One docker-compose.yml for the full stack (app + Redis)
  • Zero config required for the basic case
  • Environment variables for everything else (port, cache TTL, API key auth, proxy config)
  • A health check endpoint at /health
  • A curl example in the README that works in 30 seconds
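The full-stack compose file from the checklist might look like this. Image and variable names are hypothetical -- they follow the "Linkshot" naming used in this document, not a published image:

```yaml
# docker-compose.yml -- hypothetical image and env var names.
services:
  linkshot:
    image: linkshot/linkshot:latest
    ports:
      - "3000:3000"
    environment:
      - REDIS_URL=redis://redis:6379   # omit to run without caching
      - CACHE_TTL_METADATA=86400       # seconds
    depends_on:
      - redis
  redis:
    image: redis:7-alpine
```

This is the whole self-hosting pitch in twelve lines: `docker compose up`, then curl port 3000.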

API compatibility

Mirror microlink's query parameter interface exactly for the core endpoints. Someone using https://api.microlink.io/?url=https://example.com should be able to switch to https://your-instance.com/?url=https://example.com with zero code changes. This is your migration story. Make it literally one environment variable change.
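The compatible surface is small: accept `?url=`, validate it, and return microlink's top-level envelope. The envelope shape sketched here (`status` plus `data`) is inferred from microlink's public responses -- verify it against their API docs before claiming drop-in compatibility:

```typescript
// Sketch of a microlink-compatible query handler. The { status, data }
// envelope is an assumption based on microlink's public responses.
interface ApiResponse {
  status: "success" | "fail";
  data: Record<string, unknown>;
}

function handleQuery(rawQuery: string): ApiResponse {
  const params = new URLSearchParams(rawQuery);
  const url = params.get("url");
  if (!url) {
    return { status: "fail", data: { message: "url parameter required" } };
  }
  try {
    new URL(url); // validate before handing anything to the browser layer
  } catch {
    return { status: "fail", data: { message: "invalid url" } };
  }
  // Real implementation: Playwright fetch + metadata extraction here.
  return { status: "success", data: { url, title: null, description: null } };
}
```

Keeping the handler a pure function of the query string also makes the compatibility contract trivially testable against recorded microlink responses.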

The Chrome/Chromium dependency problem

This is the hardest operational problem. Chromium is 300MB+ in the Docker image. It has system library dependencies that vary by Linux distro. The solution: use playwright-chromium and base your Docker image on mcr.microsoft.com/playwright, which handles all dependencies. Your resulting image will be ~800MB. That's fine. Alternatives like a bring-your-own-Chrome mode (point to a Chrome instance via CDP) should be optional for power users.
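A minimal Dockerfile on that base might look like the following (the version tag is illustrative -- pin it to whatever Playwright version is in your package.json, since the image must match the library):

```dockerfile
# Base image ships Chromium plus every system library it needs.
# Pin the tag to the Playwright version in package.json.
FROM mcr.microsoft.com/playwright:v1.44.0-jammy
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "dist/server.js"]
```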

Build estimate

A working v0.1 with metadata extraction and screenshots, Docker image, and API compatibility: 2-3 weekends by one developer who knows Node.js and has touched Playwright before. The hard parts are the operational details, not the logic.


5. Contributor Acquisition

Getting contributors is about making it easy to contribute and finding people who already have the problem you're solving. There are five channels, ordered by conversion rate.

Channel 1: Microlink's own GitHub issues (hottest)

Go to github.com/microlinkhq and look at open and closed issues. Filter by "self-host", "docker", "open source", "self-hosted". These are people who want your product and are already technical enough to open a GitHub issue. Some of them have been waiting years for microlink to ship a self-hosted version. A few of them will want to help build it.

Don't spam their issues with links to your project. That's bad form and Kiko will close the issues. Instead, note the GitHub usernames. Some of them will have public email addresses. Email them with the GitHub playbook framing: "I saw you opened issue #X asking for self-hosting. I built it. Would you try it and tell me what's broken?"

Channel 2: Puppeteer and Playwright contributors

People who have contributed to Puppeteer (1,300+ contributors) or Playwright (1,000+ contributors) already know the domain. They've debugged headless Chrome issues. They know what good looks like. A subset of them would love a cleaner project to contribute to -- one where they're not working on the browser engine itself but on the product layer on top. Find them via the contributor lists on those repos.

Channel 3: The "good first issue" pipeline

Before you launch, write a list of 10-15 GitHub issues tagged "good first issue". These should be real tasks, not fake "add a README badge" issues that experienced developers find insulting. Real good first issues for this project:

  • Add oEmbed support for Vimeo URLs
  • Add a configurable User-Agent header
  • Support PDF generation from raw HTML (not just URLs)
  • Add a /health endpoint with version info
  • Write a docker-compose example with Redis
  • Add support for HTTP proxy configuration via environment variable
  • Write integration tests for screenshot endpoint
  • Add rate limiting middleware (optional, disabled by default)

When you announce on Hacker News, Show HN traffic will hit the repo and some percentage of readers will look for a way to contribute. "Good first issue" is how they find their entry point.

Channel 4: r/selfhosted and adjacent communities

The r/selfhosted community (300K+ members) runs on Docker. They self-host everything. Your project is exactly the kind of thing they bookmark and try. Some of them will file bug reports. A smaller percentage will submit PRs. That's fine -- users-to-contributors is a low conversion rate by design. You need the user volume to surface the contributors.

Post when you launch v0.1. Don't post when the code isn't working yet. The r/selfhosted bar is a functioning Docker image with a clear README.

Channel 5: Direct outreach to adjacent OSS maintainers

There are several OSS projects that need exactly what you're building:

  • Memos -- open source note-taking, links need previews
  • Gitea -- open source GitHub, repository link cards
  • Outline -- open source Notion, paste a URL, get a card
  • BookStack -- open source documentation, embed previews
  • Wallabag -- open source read-later, metadata extraction

Contact their maintainers directly. Not "please add my project as a dependency". Instead: "I'm building an open source link preview service. I'd love to add first-class support for [project name]. Would you test it?" Give them a custom integration. Get a GitHub issue or discussion in their repo that mentions your project. That's SEO plus social proof.

Contribution infrastructure (don't skip this)

Before you post anywhere, have these ready:

  • CONTRIBUTING.md with clear setup instructions (the setup should take under 5 minutes)
  • A code of conduct (boring but expected; use Contributor Covenant)
  • Issue templates for bug reports and feature requests
  • A GitHub Actions CI pipeline that runs on every PR (lint + tests)
  • Clear commit message conventions (Conventional Commits is the standard)
  • A ROADMAP.md that shows where the project is going

Contributors don't contribute to projects that feel chaotic. They need to believe their PR will be reviewed and merged. Respond to every PR within 48 hours in the first six months. This is the most important commitment you can make.


6. User Acquisition

Users and contributors are different audiences and need different channels. Users just want the thing to work. Contributors want to feel ownership. Here are the eight user acquisition channels, ranked by expected volume and quality.

Channel 1: Hacker News Show HN

The canonical launch channel for developer tools. A well-executed Show HN for a self-hosted microlink alternative should land 300-800 GitHub stars in 24 hours if the demo works and the README is clean. The title matters enormously:

Show HN: Linkshot -- self-hosted link previews and screenshots, drop-in microlink.io replacement

Mention in the first comment: (1) what it does, (2) why you built it, (3) the drop-in compatibility story, (4) a live demo URL. Respond to every comment in the first 4 hours. Be honest about what's missing. HN rewards self-awareness.

Channel 2: "Self-hosted alternatives" SEO

Write a blog post titled "Self-hosted microlink.io alternative: full guide". This is direct SEO. People Google "microlink alternative", "self-hosted link preview", "microlink self hosted". These searches exist right now and return no good results. A 1,500-word post with a working Docker command will rank in the top 5 within 2-3 months if the domain has any authority. Publish it on your project's documentation site or on dev.to (which ranks fast for technical queries).

Channel 3: r/selfhosted

A well-timed post on r/selfhosted should get 200-500 upvotes if the project is solid. The community immediately understands self-hosted alternatives. Title your post something like: "I built a self-hosted link preview and screenshot API (open source, Docker one-liner)". Don't pitch. Show the Docker command. Show example output. Let the work speak.

Channel 4: r/webdev and r/javascript

Different audience. These are developers who might not be running their own infra but are building apps that need link previews. They're currently using microlink, iframely, or writing their own OG tag parser. Your project gives them a self-hostable option. Show the API call. Show the JSON response. Show the screenshot. Simple.

Channel 5: alternativeto.net and similar directories

List your project on alternativeto.net as an alternative to microlink.io. It's low-effort and generates a steady drip of referral traffic forever. Also add to:

  • awesome-selfhosted (GitHub list with 200K+ stars)
  • selfh.st (self-hosted app directory)
  • ProductHunt (for the cloud version launch)
  • stackshare.io

Channel 6: Docker Hub pull count as social proof

Publish to Docker Hub. Once you have 10K+ pulls, put that number in your README. "10,000+ Docker pulls" signals that real people are using this. It compounds: more visibility leads to more pulls leads to more credibility. Make Docker Hub your vanity metric in year one instead of GitHub stars (though you'll have both).

Channel 7: Integration-first growth

Write official integrations for the most popular self-hosted apps that need link previews. A one-click "add Linkshot to Outline" guide. A Gitea plugin. A Memos configuration snippet. These live in your documentation but also get posted in the support communities of those projects. It's free user acquisition from communities that are already pre-qualified.

Channel 8: "Why I built this" post

Write 800 words about the specific moment you realized you needed a self-hosted microlink. What were you building? What URL were you trying to preview that you couldn't send to a third-party API? How did you decide to build it yourself instead of paying? This story is more persuasive than any feature list. It finds the people who have had the exact same experience. Publish on your personal blog, cross-post to dev.to and Hashnode.


7. Monetization: From GitHub Stars to MRR

The open source project is the top of the funnel. The managed cloud is where money comes from. This is the standard commercial open source model: 95% of users self-host for free, 5% pay for the cloud because they don't want to deal with ops.

The cloud product

It's literally the same code you're giving away for free, deployed on your infrastructure. You handle updates, uptime, scaling, Chromium dependency hell. The user gets a subdomain and an API key and never thinks about Docker again. The value proposition is time, not features. Do not gate features behind the cloud version. That kills goodwill fast.

Pricing structure

Tier | Price | Requests/month | Screenshots | Support
Self-host | Free forever | Unlimited (your infra) | Yes | GitHub issues
Cloud Hobby | $9/month | 50,000 | Yes | Email, 48h
Cloud Pro | $29/month | 300,000 | Yes | Email, 24h
Cloud Team | $79/month | 1,000,000 | Yes | Slack, 4h
Enterprise | Custom | Unlimited | Yes | SLA, dedicated

Note: the Cloud Hobby tier offers fewer raw requests than microlink's $9 tier (50K/month vs. roughly 90K/month), but as a monthly pool rather than a daily cap -- and anyone who outgrows it can always self-host for free. Running on commodity VPS infrastructure rather than premium managed cloud keeps your cost structure low enough to hold these prices.

The conversion trigger

Users convert from self-hosted to cloud when: (a) they're spending more than $29/month in their own time maintaining it, or (b) they're at a company where they need an SLA and can't get IT to approve running their own Chromium server. Target the second persona for enterprise deals. Target the first persona with onboarding emails that show cost of self-hosting at scale.

Revenue projection

Stage | GitHub stars | Cloud customers | MRR
Month 3 | 500 | 15-30 | $300-$600
Month 6 | 1,500 | 50-80 | $1,000-$2,000
Month 12 | 3,000 | 120-200 | $3,000-$6,000
Month 24 | 7,000 | 300-500 | $9,000-$15,000
Month 36 | 12,000 | 600-1,000 | $18,000-$30,000

These are conservative. If a single Show HN hits the front page and stays there for 6 hours, you can skip straight to month 6 numbers in week 1. The compounding comes from the OSS project being referenced in blog posts, tutorials, and Stack Overflow answers over time.


8. The Three-Year Path

Year 1: Ship and find your community

Q1: Build v0.1. Metadata extraction + screenshots + Docker image + API compatibility. Launch on Show HN. Get to 500 GitHub stars. Find 3-5 people who actively file issues. Convert 1-2 of them into contributors.

Q2: Add PDF generation. Add Redis caching. Launch the cloud product at a soft beta. Start writing integration guides. Get listed on awesome-selfhosted. Post on r/selfhosted. Target: 1,500 stars, 30 cloud customers, $600 MRR.

Q3-Q4: Ship structured data extraction (the scraping feature). This opens up a new user segment -- people doing content monitoring, price tracking, research automation. Publish a library of extraction "recipes" (predefined selectors for common sites). The recipe library is community-driven and becomes a moat. Target: 3,000 stars, $3,000 MRR.

Year 2: Deepen the product and find the enterprise angle

The biggest signal you'll get in year 1 is which industries are using you for what. Link previews in internal documentation? Screenshot archiving for compliance? Content monitoring? Each of these is an enterprise wedge with its own procurement motion. Choose one to double down on. Build the features that enterprise buyers need: SSO, audit logs, usage reports, dedicated instances.

Target: 7,000 stars, 300 cloud customers, 2-3 enterprise contracts, $10,000-$15,000 MRR.

Year 3: The category question

By year 3 you have a choice. Stay in the "headless browser API" category (microlink's category) or expand into adjacent spaces: content intelligence, web archiving, automated thumbnail generation. The decision depends on where your customers are pulling you. Follow the pull, not the original vision.

Target: 12,000+ stars, fundable at seed if you want to raise, $25,000-$35,000 MRR on the conservative path. $50,000+ MRR if the enterprise motion works.


9. Risks and How to Manage Them

Risk 1: Microlink open sources itself

Kiko Beats has been asked about this multiple times and hasn't done it. The product has been closed since 2018. But it's not impossible. If it happens, your OSS project would compete directly with a 7-year-old codebase with more features. The hedge: build a genuinely better community and better documentation. Microlink's code quality is excellent but the community is nonexistent. A 20-person contributor community beats a solo dev even if the solo dev is brilliant.

Risk 2: Browserless.io captures the market

Browserless.io is an open source headless browser service. It's more powerful than what you're building (full Chrome DevTools Protocol access) but also more complex. It's not a drop-in microlink replacement. The audiences overlap but aren't identical. Browserless targets developers who want to run arbitrary browser scripts. Your project targets developers who want a clean REST API for the 90% use case. These can coexist.

Risk 3: The Chromium ops burden kills you

Running Chromium in production is genuinely annoying. Memory leaks. Process crashes. Page hang detection. Concurrency management. These are solved problems (Puppeteer Cluster, Playwright's browser pool), but you have to implement them correctly. Budget significant time for operational reliability. A self-hosted product that crashes once a week and requires a container restart is worse than a rate-limited SaaS.
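The core defensive pattern is small: never reuse a browser instance after a failed operation, always get a fresh one. A library-agnostic sketch (the Playwright or Puppeteer specifics live behind the factory function -- this wrapper only assumes the instance has a `close()` method):

```typescript
// Restart-on-failure wrapper: if a browser-backed operation throws
// (crashed Chromium, hung page), discard the instance and retry on a
// fresh one. Library specifics live behind `factory`.
interface Disposable { close(): Promise<void> }

async function withFreshInstance<B extends Disposable, T>(
  factory: () => Promise<B>,
  op: (browser: B) => Promise<T>,
  maxAttempts = 2,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const browser = await factory();
    try {
      return await op(browser);
    } catch (err) {
      lastError = err; // crashed or hung: this instance is untrusted now
    } finally {
      // Never let cleanup failures mask the real result.
      await browser.close().catch(() => {});
    }
  }
  throw lastError;
}
```

For higher concurrency the same idea generalizes to a pool that recycles instances after N pages served, which is roughly what Puppeteer Cluster and Playwright's own context reuse give you.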

Risk 4: Legal issues with scraping

Metadata extraction from public pages is almost certainly fine everywhere. Screenshots of public pages are almost certainly fine. Structured data extraction gets complicated when users start extracting data from sites that prohibit scraping in their ToS. This is a user's legal problem, not yours -- you're providing a tool, not a service. Include a clear disclaimer in your documentation and don't build features specifically designed for circumventing anti-bot measures.

Risk 5: Nobody pays for the cloud version

This is the real risk for any OSS product. Developers self-host and never upgrade. The conversion rate from self-hosted to paid is typically 1-3%. At 3,000 GitHub stars, assume 500-1,000 active self-hosters. At 2% conversion, that's 10-20 paying customers. You need volume. The answer is to make the cloud product compelling for specific personas (companies, teams, non-technical users), not for individual developers. Price the cloud product for the company's budget, not the developer's personal credit card.


10. Verdict

This is a real opportunity. Not a get-rich-quick play -- the revenue ceiling on a pure link preview API is lower than on a developer productivity tool. But as a first commercial OSS project, it has almost everything you want: a clear incumbent to position against, a real privacy/cost problem that the incumbent can't solve, a technical foundation that exists (Playwright, metascraper), and a community (r/selfhosted, the headless browser ecosystem) that is actively looking for exactly this.

The biggest strategic bet is whether you can expand past "link preview API" into something with a larger TAM. Content intelligence, web archiving, automated visual regression testing -- these are adjacent. Start with the microlink replacement. Let the users tell you where to go next.

The project is buildable by one developer in a month of weekends. The community-building takes a year. The revenue gets interesting in year two. If you're looking for a project that can fund a lifestyle business by year three without raising money, this fits. If you're looking for a $10M ARR play, this probably isn't it alone -- you'd need to find the vertical expansion where the market is larger.

Build it. Launch it. See who shows up. The answer to "is there a big market here?" is somewhere in your first 500 GitHub stars.