Our analytics dev-disable guard keyed on NODE_ENV, so a prod-built Docker image running on a dev tier silently fired 530 events into the production PostHog project over 90 days. The fix was a hostname gate, not a bigger env-var matrix.

NODE_ENV Is Not Production: How One Env Var Polluted Our PostHog Funnel

A funnel report on a Monday morning is the wrong place to discover that quick_check_started is firing from localhost:3000. It's even worse to discover, scrolling down, that signup_completed has been firing from dev.acurio.ch for the last three months: the dev tier, the box that is never supposed to feed the production analytics project. We had two leakage vectors, both routed through the same NODE_ENV-keyed dev-disable guard, and they had been quietly polluting our production funnel for ninety days.

The fix was small. The lesson behind it is the part worth writing down: in a world of prod-built Docker images, environment-keyed third-party SDKs, and multi-tier deployments, NODE_ENV === "production" is not a reliable answer to the question "should this code talk to the real production backend?" It's an answer to a different question, which is "is this a production build of the bundle?" Those two questions used to be the same. They are no longer.

The bug surfaced inside Acurio, our citation-verification product for academic theses. Acurio runs as a single self-hosted Next.js container on Coolify, with a separate dev tier (dev.acurio.ch) on a smaller box for pre-merge integration testing. Both tiers run the same image built by CI. That detail is what turned a single misjudged env-var check into a three-month analytics leak.

Why NODE_ENV Stopped Meaning What You Think It Means

PostHog's official Next.js guide, like most third-party SDK guides, hands you a snippet that gates initialisation on process.env.NODE_ENV === "production". The reasoning is straightforward and was correct a decade ago: production builds run with NODE_ENV=production, dev servers run with NODE_ENV=development, so this is a one-line filter that keeps tracking off your laptop. The Next.js documentation on NODE_ENV is explicit about the same convention.

The convention quietly broke as soon as we shipped a Docker image. A next build produces a standalone bundle with NODE_ENV baked to production. That image is then deployed to every tier that consumes it — prod, dev, staging, preview, anywhere a teammate happens to bun run build && bun run start on their laptop. On all of those, process.env.NODE_ENV === "production" is true, because they're running the production build. The variable lost its discriminative power the moment we stopped shipping dev and prod as separate artefacts.
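To make that loss of discriminative power concrete, here is a minimal sketch that models the tiers described above and runs the conventional gate over them. The `Deployment` type and the `nodeEnvGate` function are illustrative names, not part of our codebase; the hostnames are the ones mentioned in this post.

```typescript
// Hypothetical model of a deployment: which build it runs and where it is served.
interface Deployment {
  name: string;
  nodeEnv: "production" | "development"; // value baked in by the build
  hostname: string;
}

// The conventional gate from SDK guides: true for any production build.
function nodeEnvGate(d: Deployment): boolean {
  return d.nodeEnv === "production";
}

const deployments: Deployment[] = [
  { name: "prod tier", nodeEnv: "production", hostname: "app.acurio.ch" },
  { name: "dev tier (same image)", nodeEnv: "production", hostname: "dev.acurio.ch" },
  { name: "local prod build", nodeEnv: "production", hostname: "localhost" },
  { name: "local dev server", nodeEnv: "development", hostname: "localhost" },
];

// Names of every deployment the gate would allow to send events.
const tracked = deployments.filter(nodeEnvGate).map((d) => d.name);
console.log(tracked);
// → ["prod tier", "dev tier (same image)", "local prod build"]
```

Only the local dev server is filtered out. Every deployment that runs the production build passes the gate, regardless of where it is served from.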

Compounding this, the PostHog project key is a NEXT_PUBLIC_* value — public-by-design, baked into the client bundle at build time, by definition the same across every tier that pulls the same image. Even if you wanted to give the dev tier its own PostHog project, you couldn't do it through Coolify's runtime env without rebuilding the image with a different --build-arg. Our dev tier therefore inherited the prod PostHog key for free, and the only thing keeping its events out of the prod project was the NODE_ENV gate — which, as we now know, was returning true on the dev tier the whole time.

Over ninety days the result was roughly 530 contaminating events: quick_check_started and quick_check_completed from localhost:3000 (our own dev sessions), and signup_completed from dev.acurio.ch (a teammate testing the magic-link flow on the dev box). They are plausible events — they don't look like noise — which is why our funnel charts kept reporting suspicious "production" sign-ups that didn't appear in the database. The drift was small enough to ignore and large enough to skew any A/B comparison run in the same window.

The Hostname Gate

The fix that landed was deliberately small. We replaced the NODE_ENV gate at all three PostHog initialisation chokepoints with a check on the canonical production hostname:

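// The one string that decides whether tracking fires.
const PROD_HOST = "app.acurio.ch";

function isTrackingEnabled(): boolean {
  // No window on the server; the server-side gate handles that path.
  if (typeof window === "undefined") return false;
  // Debug-only opt-in: deliberately skips the host check.
  if (process.env.NEXT_PUBLIC_POSTHOG_ENABLE_IN_DEV === "true") return true;
  // Everything that is not the canonical production host is dropped.
  return window.location.hostname === PROD_HOST;
}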

The client (apps/web/lib/posthog-client.tsx) reads window.location.hostname. The landing site (apps/landing/src/lib/posthog.ts) checks against acurio.ch. The server (apps/web/lib/posthog-node.ts) parses BETTER_AUTH_URL and inspects the hostname there, because Node has no window. All three return false for any host that is not literally the canonical production domain. The dev tier, whose BETTER_AUTH_URL points at dev.acurio.ch, no-ops cleanly. A local bun run dev, a local bun run build && bun run start, a Coolify preview deploy, a teammate running the prod image from their laptop: they all resolve to a non-prod host, so SDK initialisation is silently skipped. The NEXT_PUBLIC_POSTHOG_ENABLE_IN_DEV=true escape hatch is preserved for the rare case where you actually want to debug PostHog locally; it skips the host check on purpose and is documented as a debug-only opt-in.
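The server-side variant is described above only in prose, so here is a rough sketch of what such a check could look like. The function name `isServerTrackingEnabled` and the choice to pass the environment in as a parameter are assumptions made for illustration and testability; this is not a copy of our posthog-node.ts, and it leaves the debug escape hatch out.

```typescript
const PROD_HOST = "app.acurio.ch";

// Sketch of a server-side gate (hypothetical helper). Node has no window,
// so the hostname is derived from the configured base URL instead.
// `env` is passed in for testability; in an app it would be process.env.
function isServerTrackingEnabled(
  env: Record<string, string | undefined>
): boolean {
  const baseUrl = env.BETTER_AUTH_URL;
  if (!baseUrl) return false; // unset: fail closed rather than default to prod
  try {
    // new URL() throws on strings that are not absolute URLs.
    return new URL(baseUrl).hostname === PROD_HOST;
  } catch {
    return false; // malformed URL: fail closed
  }
}

console.log(isServerTrackingEnabled({ BETTER_AUTH_URL: "https://app.acurio.ch" })); // → true
console.log(isServerTrackingEnabled({ BETTER_AUTH_URL: "https://dev.acurio.ch" })); // → false
console.log(isServerTrackingEnabled({})); // → false
```

The design choice worth noting is that every failure path returns false. A tier with a missing or malformed BETTER_AUTH_URL stays silent instead of falling back to tracking. One consequence is that a bare hostname without a scheme, such as "app.acurio.ch", also fails to parse and disables tracking.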

This is the same shape of fix as the slowapi rate-limit headers gotcha we wrote up in May: the library wasn't broken, the contract was. The advertised contract was "guard on NODE_ENV"; the actual contract that produces correct production-only behaviour is "guard on something that is definitionally true only at the production endpoint." A hostname check satisfies that. An env var burned into a multi-tier Docker image does not.

A footnote on the BETTER_AUTH_URL choice on the server: we considered using process.env.VERCEL_URL and process.env.COOLIFY_URL-style runtime injections, but those introduce another layer of "is this set everywhere we expect," and any tier that forgets to set it goes back to the default — which is, naturally, prod. Picking the canonical hostname as the single source of truth means there is exactly one string in the codebase that controls whether tracking fires, and an audit can grep for it in two seconds.

What This Means for Any Multi-Tier Deployment

Three takeaways, in the order they'll save you time on the next deployment.

Stop equating NODE_ENV with "is this the production deployment." It answers a different question: "is this a production build of the JavaScript bundle." The two were equivalent when your dev environment ran next dev and your prod environment ran next start on the same box. They are no longer equivalent on any setup where (a) the build artefact is a Docker image, (b) the image is reused across tiers, or (c) developers ever run a prod build locally to reproduce a bug. All three of those are now the default. If your gate is meant to express "talk to the real production backend," gate it on the actual production endpoint: a hostname, a deployment ID, or an env var that is unique to that one tier and that you control end-to-end.

Treat your public env vars as a deploy-time constant, not a runtime switch. Every NEXT_PUBLIC_* value is baked into the client bundle at build time. If you want the dev tier to talk to a different PostHog project, a different Sentry DSN, or a different Stripe key, you cannot do it by setting a different env var in Coolify at runtime — by then the bundle is already on the wire with the prod value embedded. You have to either rebuild the image per tier with different --build-arg values, or accept that the differentiation has to happen at runtime in code, after the variable has been read. We picked the second option because we wanted one image to deploy across all tiers, and the host gate makes that safe. If you go the first route, the Next.js public runtime configuration docs explain the trade-off.

Audit the third-party SDKs that key on environment, not just the ones you wrote. Sentry, PostHog, LaunchDarkly, Statsig, Segment, Datadog RUM — they all ship guides that use the same NODE_ENV shortcut, and they all carry the same risk. The PostHog leak we cleaned up is a category of bug: any analytics or error-tracking SDK initialised behind a NODE_ENV gate will silently emit into production from any non-production tier that happens to run a production build. A grep for NODE_ENV across lib/ is twenty seconds of work and surfaces the entire blast radius. The broader lesson — that the safest defaults change as the deployment topology changes — is the same one we keep circling in the pilot-to-production playbook for Swiss SMEs: patterns that were correct at one scale become wrong at the next, and the only way to catch the drift is to revisit them on schedule.
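If a plain grep is not convenient, the same audit can be approximated in a few lines of code. The sketch below is an illustration only: `findNodeEnvGates` and the `GateHit` type are hypothetical, the file names and contents are made up, and the input is an in-memory map of path to source text so the example has no file-system dependency.

```typescript
interface GateHit {
  file: string;
  line: number; // 1-based line number
  text: string; // trimmed source line
}

// Flags every line that mentions NODE_ENV. This is a plain substring match,
// so it also reports comments and strings; a human still reviews each hit.
function findNodeEnvGates(sources: Record<string, string>): GateHit[] {
  const hits: GateHit[] = [];
  Object.keys(sources).forEach((file) => {
    sources[file].split("\n").forEach((text, i) => {
      if (text.indexOf("NODE_ENV") !== -1) {
        hits.push({ file, line: i + 1, text: text.trim() });
      }
    });
  });
  return hits;
}

// Made-up file contents for illustration.
const hits = findNodeEnvGates({
  "lib/analytics.ts": 'if (process.env.NODE_ENV === "production") {\n  init();\n}',
  "lib/format.ts": "export const pad = (n: number) => String(n);",
});
console.log(hits.map((h) => h.file + ":" + h.line));
// → ["lib/analytics.ts:1"]
```

Each hit is a candidate, not a verdict: the question to ask of every one is whether the gate means "production build" or "production deployment."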

The headline lesson is small enough to fit on a sticky note: NODE_ENV === "production" is not a production-deployment check; it is a production-build check. If you need the first, gate on the canonical hostname. The reason is worth the longer read — every multi-tier deployment that ships a single Docker image will reproduce this bug, and the symptom will be a funnel chart that quietly disagrees with the database for as long as nobody looks.

If you are about to wire analytics into a Next.js product that ships through Coolify, Fly, Railway, or any container platform that reuses a single build artefact across tiers, the host gate is the cheap upgrade that prevents a quarter of polluted funnels. Read the Next.js NODE_ENV docs for what the variable actually guarantees, and if you want a second pair of eyes on your deployment topology before you go live, book a free AI Potenzial-Check.
