Last year I was building an app at GainForest. Standard stack: Next.js, Tailwind, shadcn/ui. I'd been using shadcn dialogs everywhere, and they're great. Genuinely. They're the best off-the-shelf dialog you can drop into a project today, and I'd use them again tomorrow.

But "great" doesn't mean "solves my problems," and pretty quickly I had a list of problems.

I'm a frontend engineer, and I have this involuntary need for things to be pixel perfect. If something looks slightly off, or feels slightly wrong, it eats at me until I fix it. The shadcn dialogs were good, but every time I shipped something with one of them, I'd notice another thing that wasn't quite right. After about a year of this, the small annoyances had compounded into something I couldn't ignore, and I ended up writing my own modal system from scratch.

This is the story of how that happened, what I broke along the way, and why I'm calling the final thing a modal, not a dialog, not a drawer.

Problem 1: dialogs look like shit on phones

This isn't shadcn's fault. It's dialogs' fault. Dialogs were designed for screens with elbow room. They sit in the middle of the viewport, they have horizontal margin, they expect a mouse. On a phone, a dialog is a cramped little box floating in a sea of dimmed background, and the keyboard popping up makes it worse.

The "correct" answer in the shadcn ecosystem is the Drawer. Slide it up from the bottom, give it a handle, let people swipe to dismiss. It's the right call on mobile. But put a drawer on a 27-inch monitor and now that looks wrong. So you have a Dialog component, a Drawer component, two different APIs, and a media query somewhere in your component file that decides which one to render. Fine, you can build an abstraction. Most teams don't, and you end up with inconsistency. Even if you do, you still haven't solved the other problems.

Problem 2: one dialog wants to open another dialog

Real product example. You have a payment dialog that pays via a blockchain wallet. The user opens it. Cool. But they're not connected to a wallet yet. So now you need to prompt them to connect first.

What do you do? The instinct is to throw a second dialog on top of the first. But:

  1. It's a UX violation. The accepted wisdom is one dialog at a time. Modal on modal is a sin.

  2. It looks bad. The first dialog peeks out from behind, the overlays double up, and the visual hierarchy is gone.

  3. The keyboard trap breaks. Tab far enough and focus lands on a button in the dialog underneath the one you're looking at. Watch the focus ring appear behind glass.

  4. Click anywhere outside the top dialog and you dismiss the bottom one, because the top one's "outside" includes the bottom dialog.

You can technically hack around each of these. None of the hacks make you feel proud.

Problem 3: dialogs returned as JSX are a paradigm crime

This one was the most personal. The conventional way to use a dialog in React is:

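import { useState } from "react";
import { Button } from "@/components/ui/button";
// PaymentDialog and Checkout are illustrative app components.
import { PaymentDialog } from "./payment-dialog";

export function Checkout() {
  const [open, setOpen] = useState(false);

  return (
    <>
      <Button onClick={() => setOpen(true)}>Pay</Button>
      <PaymentDialog open={open} onOpenChange={setOpen} />
    </>
  );
}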

If you only ever open it from a button click, fine. But the moment you want to open a dialog from plain JavaScript, say in response to an API failure or partway through an async wallet handshake, you're back in React state land. You have to call setOpen(true) from inside the handler, the dialog has to be rendered as part of the parent's tree, and if the trigger lives in a useEffect, congratulations: you've just added an extra render cycle to your app for what should be a one-line side effect.

The shape I wanted was always:

showModal(SomeModal);

That's it. A function. From anywhere. No state, no JSX, no useEffect gymnastics. Toast libraries figured this out years ago. Why are dialogs still stuck in the JSX-as-control-surface era?
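Toast libraries usually get this shape from a store that lives outside React: any code can write to it, and a single provider subscribes and renders whatever is inside. Here is a minimal sketch of that pattern, assuming a hypothetical `createModalStore` (not part of shadcn/ui or the repo linked later). The component type is left generic so the logic runs without React.

```typescript
// A stack entry: a generated id plus whatever the provider will render.
type ModalEntry<C> = { id: string; component: C };
type Listener = () => void;

// Hypothetical store living outside React, so event handlers, async flows,
// and error paths can open modals without owning any React state.
function createModalStore<C>() {
  let stack: readonly ModalEntry<C>[] = [];
  let nextId = 0;
  const listeners = new Set<Listener>();
  const emit = () => listeners.forEach((listener) => listener());

  return {
    // The imperative entry point: callable from anywhere.
    showModal(component: C): string {
      const id = `modal-${nextId++}`;
      // Replace the array instead of mutating it, so snapshots change identity.
      stack = [...stack, { id, component }];
      emit();
      return id;
    },
    // Close the topmost modal; a no-op when nothing is open.
    closeModal(): void {
      if (stack.length === 0) return;
      stack = stack.slice(0, -1);
      emit();
    },
    // subscribe/getSnapshot match the arguments React's
    // useSyncExternalStore(subscribe, getSnapshot) expects.
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    getSnapshot(): readonly ModalEntry<C>[] {
      return stack;
    },
  };
}
```

In a React app, `C` would be a component type and the provider would read the stack with `useSyncExternalStore(store.subscribe, store.getSnapshot)`. `store.showModal(SomeModal)` then gives the one-call shape. The methods close over local variables instead of using `this`, so they can be passed around unbound.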

The first attempted fix: someone's library on X

I was scrolling X one night and saw a clip of a guy demoing his solution to the stacked dialog problem. When you opened a second dialog, the first didn't get covered up. It scaled down slightly, dropped its opacity, and sat there in the background looking intentionally inactive. The new dialog floated in front of it. Focus trapping was handled. Outside clicks did the right thing. The whole interaction felt designed instead of patched together.

This was Toldo (toldo.vercel.app). I was sold. I cloned it into a side project.

Raphael Salaja (@raphaelsalaja) announcing Toldo on X, November 6, 2024: "Toldo: An elevated dialogs component for React. Built on top of Radix, it offers a simple API for developing beautiful dialogs."
https://x.com/raphaelsalaja/status/1854196335060402268

It didn't stick. Two reasons.

The first was paradigm. Toldo wanted you to declare all your modals up front, in one place, like routes. On paper that's organized. In practice it's an architectural noose. You can't co-locate a modal with the feature that owns it, every new modal is a trip to the central registry, and the bigger the app the worse it gets. It's the dialog version of putting all your routes in one giant App.js.

The second was that managing the stack didn't feel natural. The control surface didn't give me what I wanted.

But, and this is the important part, the aesthetic of stacked but deferential modals was real. That part was a genuinely good idea. So I did what any obsessive frontend dev does: I took the parts I liked and rewrote the rest.

Version 1: a context with a stack

The first version I built for myself was a context provider that held a stack of modals as React state. Hooks exposed a small set of operations:

  • pushModal(variant). Add a modal on top.

  • popModal(). Remove the top one.

  • clear(). Empty everything.

Each modal in the stack was a { id, content } pair. The provider rendered the whole stack, and the topmost one was the "active" modal. The ones below got that scale down, opacity down treatment. I was deep into Framer Motion by then, so I added a blur on the inactive ones too.
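Here is a minimal sketch of that stack as a pure reducer, assuming the provider keeps it in `useReducer`. The action names mirror `pushModal`, `popModal`, and `clear`, but the reducer itself and the `activeModal` helper are illustrative, not the author's code. `Content` stands in for a React node.

```typescript
// Each stack entry mirrors the { id, content } pair described above.
type StackEntry<Content> = { id: string; content: Content };

type StackAction<Content> =
  | { type: "push"; entry: StackEntry<Content> }
  | { type: "pop" }
  | { type: "clear" };

// Pure reducer: usable with React's useReducer inside a context provider.
function modalStackReducer<Content>(
  stack: StackEntry<Content>[],
  action: StackAction<Content>,
): StackEntry<Content>[] {
  switch (action.type) {
    case "push":
      return [...stack, action.entry];
    case "pop":
      // slice(0, -1) on an empty array is just [], so popping is always safe.
      return stack.slice(0, -1);
    case "clear":
      return [];
  }
}

// The topmost entry is the "active" modal; everything below it is inactive.
function activeModal<Content>(stack: StackEntry<Content>[]): StackEntry<Content> | undefined {
  return stack[stack.length - 1];
}
```

A provider would expose `pushModal(entry)` as `dispatch({ type: "push", entry })` (and likewise for pop and clear) and treat `activeModal(stack)` as the live modal.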

The blur was the sickest part. It made the inactive modals feel like they were behind frosted glass, like they were physically further away. Motion blur on transitions. Frame-perfect height interpolation. The whole thing felt buttery.
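The scale-down, opacity-down, blur treatment is a function of how far a modal sits below the top of the stack. Here is a sketch of that mapping using a hypothetical `layerTarget` helper. The numbers are made up for illustration.

```typescript
// Animation target for a modal at a given depth in the visible stack:
// depth 0 is the active (topmost) modal, 1 is directly behind it, etc.
// The numbers are illustrative guesses, not the values used in the real system.
type LayerTarget = { scale: number; opacity: number; filter: string };

function layerTarget(depth: number): LayerTarget {
  if (depth <= 0) {
    // Active modal: full size, fully opaque, sharp.
    return { scale: 1, opacity: 1, filter: "blur(0px)" };
  }
  // Clamp so a deep stack never shrinks or fades to nothing.
  const d = Math.min(depth, 3);
  return {
    scale: 1 - 0.05 * d,
    opacity: 1 - 0.2 * d,
    filter: `blur(${2 * d}px)`,
  };
}
```

With Framer Motion, each wrapper could take `animate={layerTarget(stack.length - 1 - index)}`. Framer Motion can animate `filter` strings such as `blur(4px)` alongside `scale` and `opacity`.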

And the API was finally what I wanted:

const { pushModal } = useModal();

// ...then, from any event handler or async flow in that component:
pushModal({ id: "connect-wallet", content: <ConnectWallet /> });

No modal JSX in the parent's render output. No open state. Define a modal anywhere in your codebase, push it from anywhere else. Done.

I shipped it. It worked. But there was a problem I'd been avoiding.

The phone problem was still there

The stacked dialog aesthetic only makes sense on a screen big enough to show the stack. On a phone, you don't have room for "the previous modal scales down slightly behind the current one." There is no behind. There's just the viewport. So the whole effect, the thing I'd been so proud of, was useless on the device most people would actually use this on.

This bothered me for a few months. I'd open the app on my phone, watch a modal pop up looking exactly as bad as a default dialog has always looked on mobile, and then close it and go back to whatever I was doing.

And then one day I had a different idea.

What if the stack wasn't visual?

Here's the reframe. The stack of modals is a data structure. There's no rule that says it has to render as a literal stack of visible elements. What if the stack only existed in code, and what the user actually saw was a single modal frame, the same frame, always, whose contents changed?

That changes the problem completely.

Instead of:

  • modal A renders

  • modal B pushes on top, A scales back

  • B closes, A scales forward

You get:

  • modal frame renders, showing A's content

  • B is pushed onto the stack. The frame's content animates from A to B.

  • B is popped. The frame's content animates back from B to A.

The frame itself never moves. It doesn't grow a clone behind it. It just sits there, like a window into the modal system, and the contents flow through it.
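Because only the top entry is rendered, the frame needs just two facts per change: which entry is now on top, and whether the stack grew or shrank. A hypothetical `swapDirection` helper sketches that. Treating a pop as the mirror image of a push is an assumption.

```typescript
// Direction of a content swap inside the single frame:
//  1 = forward (a push): the new content comes in from the right
// -1 = backward (a pop): the previous content comes back from the left
//  0 = nothing to animate (the top entry didn't change)
type SwapDirection = 1 | -1 | 0;

function swapDirection(prevIds: readonly string[], nextIds: readonly string[]): SwapDirection {
  const prevTop = prevIds[prevIds.length - 1];
  const nextTop = nextIds[nextIds.length - 1];
  if (prevTop === nextTop) return 0;
  // A stack that grew (or was swapped at equal depth) reads as moving forward.
  return nextIds.length >= prevIds.length ? 1 : -1;
}
```

The result is the kind of value a keyed animated child would receive as its `custom` prop, so the same variants can run forward or in reverse.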

I sketched out the animation in my head. When content changes:

  • The current content slides off to the left, scaling down slightly, opacity dropping, and blurring as it goes. That motion blur effect again, because it's still sick.

  • The new content slides in from the right with the opposite curve. Scaling up, fading in, unblurring.

  • At the same time, the modal frame's height animates to match the new content's height, smoothly, without ever dropping a frame.
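The enter and exit choreography in the list above maps onto Framer Motion's function variants, which receive a `custom` value. This sketch assumes the convention of 1 for a push and -1 for a pop. Every distance, scale, and blur amount is a guess for illustration.

```typescript
// Target shape for the sliding content; Framer Motion can animate all four
// properties, including the filter string.
type SlideTarget = { x: string; scale: number; opacity: number; filter: string };

const contentVariants = {
  // Incoming content starts off to the right on a push, off to the left on a pop,
  // slightly scaled down, transparent, and blurred.
  enter: (direction: number): SlideTarget => ({
    x: direction >= 0 ? "40%" : "-40%",
    scale: 0.92,
    opacity: 0,
    filter: "blur(8px)",
  }),
  // Resting state of the active content.
  center: { x: "0%", scale: 1, opacity: 1, filter: "blur(0px)" },
  // Outgoing content leaves the opposite way, mirroring the incoming curve.
  exit: (direction: number): SlideTarget => ({
    x: direction >= 0 ? "-40%" : "40%",
    scale: 0.92,
    opacity: 0,
    filter: "blur(8px)",
  }),
};
```

In a component, these would be wired up roughly as an `AnimatePresence` with `custom={direction}` and `initial={false}`, wrapped around a `motion.div` with `key={active.id}`, `custom={direction}`, `variants={contentVariants}`, `initial="enter"`, `animate="center"`, and `exit="exit"`. Passing `custom` to `AnimatePresence` as well matters. The exiting element has already left the React tree, so that is how it receives the current direction.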

That last part is the hardest. If you naively let the modal change height, you get a jolt. If you measure and tween, you have to do it without causing layout thrash. Framer Motion plus a ResizeObserver handles this beautifully. Measure the inner content with the observer, drive the outer height with a motion value, and the frame breathes in and out as content swaps.
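The measuring half can be sketched without React, assuming the observer constructor is passed in so it can be faked outside a browser. `trackContentHeight` is a hypothetical name, not the repo's `AnimateChangeInHeight`. It follows the same idea: observe the inner content, report its height, and let the caller animate the outer frame.

```typescript
// Minimal structural types so the helper also runs outside a browser.
// In the browser you would pass the global ResizeObserver constructor.
type SizeEntry = { contentRect: { height: number } };
type ObserverLike = { observe(target: unknown): void; disconnect(): void };
type ObserverCtor = new (callback: (entries: SizeEntry[]) => void) => ObserverLike;

// Watches the inner content and reports its height whenever it changes.
// The caller feeds that number into the outer frame's height animation.
function trackContentHeight(
  target: unknown,
  onHeight: (height: number) => void,
  Observer: ObserverCtor,
): () => void {
  let last: number | undefined;
  const observer = new Observer((entries) => {
    const height = entries[entries.length - 1]?.contentRect.height;
    // Skip duplicate reports so the outer frame doesn't re-animate needlessly.
    if (height === undefined || height === last) return;
    last = height;
    onHeight(height);
  });
  observer.observe(target);
  // Cleanup function, e.g. for a useEffect return.
  return () => observer.disconnect();
}
```

In a component, you'd call this in a `useEffect` with the inner element's ref and `ResizeObserver`, store the reported height, and pass it to a `motion.div` via `animate={{ height }}`. Setting `overflow: hidden` on that `motion.div` keeps content from spilling during the tween.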

The result was something I hadn't seen anyone else do. It felt like a single object that morphed between states, not a stack of cards. And, critically, it worked on phones, because there was only one frame at a time.

Even better, I made the frame portrait shaped. Content designed for a portrait frame looks dramatically better on a phone screen than content designed for a landscape dialog jammed into a 390px viewport.

But a portrait frame on a phone is still a dialog

I redesigned the visual style too: bigger corner radius, a more modern look, more breathing room. But even with the redesign, a dialog is still a dialog. On a phone, the correct container is a drawer. It comes up from the bottom, you can swipe it down, and it doesn't fight with the keyboard. There's a reason every native mobile OS does it this way.

So I bit the bullet. I pulled in shadcn's Drawer alongside the Dialog. I wrote a media query hook (useMediaQuery) that decides at runtime which mode we're in: dialog or drawer. The provider switches between the two based on screen size. The animations and the stack logic stay the same. The only thing that changes is the actual container. A centered dialog above some breakpoint, a bottom drawer below it.

The components that go inside the modal (ModalHeader, ModalTitle, ModalDescription, ModalFooter) are mode aware. ModalFooter reads from a context (ModalModeContext) and renders either a DialogFooter or a DrawerFooter underneath. The author of a modal doesn't have to care. They just write content, and the system handles the rest.
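The mode decision and the mode-aware pieces can be sketched without React. This sketch assumes a 768px breakpoint and injects `matchMedia` so the logic can be tested. `getModalMode`, `subscribeModalMode`, and `forMode` are hypothetical helpers, not the repo's `useMediaQuery` or `ModalModeContext`.

```typescript
type ModalMode = "dialog" | "drawer";

// Structural subset of window.matchMedia / MediaQueryList, so the logic can be
// exercised without a browser.
type MediaQueryListLike = {
  readonly matches: boolean;
  addEventListener(type: "change", listener: () => void): void;
  removeEventListener(type: "change", listener: () => void): void;
};
type MatchMedia = (query: string) => MediaQueryListLike;

// Assumed breakpoint: 768px is Tailwind's default `md` breakpoint.
const DESKTOP_QUERY = "(min-width: 768px)";

// Centered dialog at or above the breakpoint, bottom drawer below it.
function getModalMode(matchMedia: MatchMedia, query: string = DESKTOP_QUERY): ModalMode {
  return matchMedia(query).matches ? "dialog" : "drawer";
}

// Re-evaluates the mode whenever the media query flips; returns an unsubscribe.
function subscribeModalMode(
  matchMedia: MatchMedia,
  onChange: (mode: ModalMode) => void,
  query: string = DESKTOP_QUERY,
): () => void {
  const mql = matchMedia(query);
  const listener = () => onChange(mql.matches ? "dialog" : "drawer");
  mql.addEventListener("change", listener);
  return () => mql.removeEventListener("change", listener);
}

// Mode-aware parts pick their container-specific implementation, the way a
// footer could choose between DialogFooter and DrawerFooter.
function forMode<T>(mode: ModalMode, impls: Record<ModalMode, T>): T {
  return impls[mode];
}
```

A `useMediaQuery`-style hook could read `getModalMode` for its initial value and re-render from `subscribeModalMode`, passing `(q) => window.matchMedia(q)` in the browser. In Next.js, `window` doesn't exist during server rendering, so such a hook has to pick a default mode until it runs on the client.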

So now the full feature set was:

  • One frame, always. Single modal container, never stacked visually.

  • Content flows through the frame. Slide, blur, scale transitions in both directions, height morphs to match.

  • Dialog on desktop, drawer on mobile. Same API, same content, the right container for the device.

  • Stack-based state. Push, pop, clear. A clean imperative API.

  • No JSX returns. Define modals where they belong, open them from anywhere via a hook.

You can read the implementation here: https://github.com/GainForest/bumicerts-monorepo/tree/main/apps/bumicerts/components/ui/modal

A few notes on what's in the code, for anyone digging through it:

  • context.tsx is the brain. It holds the stack, decides between dialog and drawer mode, and renders the active stack of ModalWrappers inside whichever container is appropriate.

  • ModalWrapper.tsx is the per-modal motion shell. The slide, scale, blur, and opacity transitions all live here. It also handles focus trapping. When the active modal changes, focusable elements inside the wrapper get re-scoped, and Tab/Shift+Tab are intercepted to keep focus inside.

  • AnimateChangeInHeight.tsx is the small but critical piece that uses a ResizeObserver to drive the outer height as a motion value. This is what makes the frame breathe smoothly when content changes.

  • use-current-modal-info.ts does something a bit cheeky. It reads the currently active modal's title, description, and dismissibility directly from the DOM using data-* attributes. This lets each modal's content declare its own accessibility metadata without the modal author having to thread it through props.
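Two of those behaviors can be sketched as plain functions. Both are illustrative. `nextFocusIndex` shows the wrap-around rule a Tab/Shift+Tab trap needs. `readModalInfo` shows the data-* idea using hypothetical attribute names (`data-modal-title`, `data-modal-description`, `data-modal-dismissible`) that may not match the repo.

```typescript
// --- Focus trap: where Tab / Shift+Tab should move focus ---
// `count` is the number of focusable elements inside the active wrapper and
// `current` is the index of the focused one (-1 if focus has escaped).
function nextFocusIndex(count: number, current: number, shift: boolean): number {
  if (count === 0) return -1; // nothing focusable: caller can focus the wrapper itself
  if (current < 0) return shift ? count - 1 : 0; // pull stray focus back inside
  // Wrap around at both ends instead of letting focus leave the modal.
  return shift ? (current - 1 + count) % count : (current + 1) % count;
}

// --- Reading modal metadata from data-* attributes ---
// Structural subset of the DOM Element API, so this runs outside a browser too.
type ElementLike = {
  textContent: string | null;
  getAttribute(name: string): string | null;
  querySelector(selector: string): ElementLike | null;
};

type ModalInfo = { title: string | null; description: string | null; dismissible: boolean };

// Attribute names here are hypothetical, not necessarily the repo's.
function readModalInfo(root: ElementLike): ModalInfo {
  const title = root.querySelector("[data-modal-title]")?.textContent ?? null;
  const description = root.querySelector("[data-modal-description]")?.textContent ?? null;
  const flag = root
    .querySelector("[data-modal-dismissible]")
    ?.getAttribute("data-modal-dismissible");
  // Dismissible unless the content explicitly opts out.
  return { title, description, dismissible: flag !== "false" };
}
```

In a keydown handler, the wrapper would collect its focusable elements, call `event.preventDefault()` on Tab, and focus the element at the returned index. That way focus never reaches a modal that isn't active.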

Why call it a Modal instead of Dialog or Drawer? Because the system isn't either one. It's the abstraction above both. The user gets a dialog on desktop and a drawer on mobile from the same call site, with the same content, the same animations, the same stack. The word "modal" stops being the visual thing and becomes the concept: a transient overlay that demands attention. Whether it's centered or bottom-anchored is an implementation detail the system handles.

The landscape problem

I was riding high on this for a while. Then, a few weeks ago, we ran into something I hadn't planned for.

We needed to display an iframe inside a modal. The iframe was a third-party widget that only worked in landscape. Wide, not tall. Our modal frame was constant width and portrait shaped on purpose, because keeping width constant is what makes the height animation easy. If the width can change too, you've got to animate two dimensions simultaneously and keep the contents from looking like they're being squished through a pasta maker.

I'd been so attached to the constant-width invariant that I genuinely didn't want to touch it. But the requirement was real and the workaround would've been ugly: open the iframe in a new tab, or build a different modal system just for this one case. Both options sucked.

I figured this was a good moment to lean on AI and see how far it could get. I described the constraint, the existing animation system, and what I needed. The thing I was scared of, that allowing variable width would shatter the slick transitions, didn't happen. The fix was a dialogWidth field on each modal variant, threaded through to the dialog container as a Tailwind max-width class, with transition-[max-width] on the container itself so the frame morphs between widths the same way it morphs between heights. Smooth. Composable. No regression on the existing modals, which all default back to the original portrait width.
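Here is a minimal sketch of that fix. It assumes `dialogWidth` holds a complete Tailwind class and that the portrait default is `max-w-md`, since the actual default isn't stated. `ModalVariant` and `dialogContainerClass` are hypothetical names.

```typescript
// Hypothetical modal variant shape: dialogWidth is an optional Tailwind
// max-width class for the dialog container.
type ModalVariant = { id: string; dialogWidth?: string };

// Assumed portrait default; the real default class isn't stated.
const DEFAULT_DIALOG_WIDTH = "max-w-md";

// Builds the dialog container's classes for the active modal. Because
// max-width is the transitioned property, switching from max-w-md (28rem)
// to something like max-w-5xl (64rem) widens the frame smoothly.
function dialogContainerClass(active: ModalVariant | undefined): string {
  const width = active?.dialogWidth ?? DEFAULT_DIALOG_WIDTH;
  return ["w-full", width, "transition-[max-width]", "duration-300", "ease-out"].join(" ");
}
```

Keep each width as a complete literal class string such as `"max-w-5xl"`. Tailwind only generates CSS for class names it finds in source files, so a string assembled at runtime like `max-w-${size}` would not be picked up.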

That one modal now opens, the frame elegantly expands into a landscape rectangle, the iframe loads in, and when you close it the frame collapses back to portrait. It's the kind of detail nobody will consciously notice, and that's exactly the point.

What I learned

A few things stuck with me from this.

The "right" UX principle is sometimes wrong if you obey it literally. "One modal at a time" is good advice if you read it as don't make users juggle multiple competing focus contexts. It's bad advice if you read it as the literal pixel count of dialog elements on screen must never exceed one. The first version of my system had two dialogs visible at once and was still good UX, because the inactive one was visibly inactive. The principle is about cognitive load, not pixel count.

Composition over registration. Toldo wanted you to centralize modals. I wanted to define modals next to the feature that owns them and call them from anywhere. The hook-based approach beat the registry approach for much the same reason React Context displaced Redux in a lot of codebases: locality wins.

JSX is not the only control surface. A component that's only ever rendered to be hidden, then shown, then hidden again, doesn't belong in the render tree of its caller. Push it into a provider, control it imperatively, and your parent components get to forget it exists.

Constraints that enable animation are worth defending, until they aren't. The constant-width invariant was load-bearing for the height animation. I was right to protect it. I was also right, eventually, to relax it once a real use case required it. The trick is knowing the difference between a constraint that makes the system simpler and a constraint that's just become a habit.

Animations are not decoration. The blur on the inactive modal, the height tween when content changes, the slide-out, slide-in choreography. These aren't candy. They're how the system communicates state to the user. Without them, a modal swap is a jump cut. With them, it's a continuous experience. Users don't have to consciously parse what just happened.

I don't know if this modal component is the most perfect on earth. (I said that to myself at one point. I was being dramatic. It's pretty good, though.) But I do know that I haven't wanted to rewrite it. Every problem I had a year ago is gone. Every new problem we've thrown at it, landscape iframes included, has been a small, contained extension instead of a structural rewrite.

That, more than anything, is the metric I trust. Not whether a system is clever, but whether it stops being a problem.