Ruby Concurrency: What Actually Happens

Since I wrote about async Ruby and patched Solid Queue to support fibers, people keep asking the same questions. What happens when a fiber blocks? Don’t you still need threads? What about database transactions? What about Ractors?

This post answers all of it. From the ground up.

The four primitives

Ruby gives you four concurrency primitives: processes, threads, fibers, and Ractors. They nest. Every process has an implicit “main Ractor” where your code runs by default, so you never have to think about Ractors unless you explicitly create one. Without Ractors, the hierarchy is simply process – threads – fibers. With Ractors, it becomes:

graph TD
  P[Process] --> R1["Ractor 1 (GVL 1)"]
  P --> R2["Ractor 2 (GVL 2)"]
  R1 --> T1[Thread 1]
  R1 --> T2[Thread 2]
  R2 --> T3[Thread 3]
  T1 --> F1[Fiber A]
  T1 --> F2[Fiber B]
  T2 --> F3[Fiber C]
  T3 --> F4[Fiber D]
  T3 --> F5[Fiber E]

Think of your computer as an office building.

Processes are fully isolated: separate offices, each with its own locked door, furniture, and files. Each process has its own memory, its own Ruby VM, and its own GVL. When you run Puma with 3 workers, you get 3 processes. They can’t corrupt each other’s state because they don’t share memory. The OS schedules them independently. The cost: each one loads your entire application into memory.

Ractors sit between processes and threads: offices that share a mailroom but not their filing cabinets. Each Ractor has its own GVL, so threads in different Ractors can execute Ruby code truly in parallel, but they can only pass notes to each other – no shared mutable objects. You communicate via message passing, copying or moving data between them. Every Ruby process has a “main Ractor” where all your code runs by default. Creating additional Ractors is opt-in.

Threads live inside a process and share its memory: workers sharing the same office, accessing the same filing cabinets, coordinating to avoid collisions. The OS preemptively schedules them, meaning it can pause any thread at any time and switch to another. You don’t control when this happens. The GVL prevents threads from executing Ruby code in parallel, but it releases the lock during I/O. So two threads can wait on two different network calls simultaneously, but they can’t crunch numbers at the same time.
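
A minimal illustration of that I/O concurrency (a sketch; example.com stands in for any slow endpoint):

require "net/http"

# Two threads blocked on network I/O at the same time;
# the GVL is released while each waits on its socket
threads = 2.times.map do
  Thread.new { Net::HTTP.get(URI("https://example.com/")) }
end
threads.each(&:join)  # wall time ≈ one request, not two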

Fibers live inside a thread and are cooperatively scheduled: multiple tasks juggled by one worker at their desk. When they’re waiting for something – a phone call, a fax, a response – they set it aside and pick up the next task. A fiber runs until it explicitly yields. When it hits I/O – a network call, a database query, reading a file – it yields to the reactor, and another fiber picks up. No OS thread context switch for the fiber itself, no preemption. One thread can run thousands of fibers.

Here’s what that means for cost:

|                  | Process           | Ractor                   | Thread                         | Fiber                                       |
|------------------|-------------------|--------------------------|--------------------------------|---------------------------------------------|
| Memory           | full app copy     | ~thread + Ractor state   | ~8MB virtual stack reservation | ~4KB initial virtual stack, grows as needed |
| Creation time    | ~ms               | ~80μs                    | ~80μs                          | ~3μs                                        |
| Context switch   | kernel            | kernel (threads within)  | ~1.3μs (kernel)                | ~0.1μs (userspace)                          |
| Isolation        | Full (own memory) | Share-nothing (messages) | Shared memory                  | Shared thread                               |
| Parallelism      | Yes               | Yes (own GVL)            | No (shared GVL)                | No                                          |
| I/O concurrency  | Yes               | Yes                      | Yes                            | Yes                                         |
| Rails compatible | Yes               | No                       | Yes                            | Yes                                         |

Creation and switching benchmarks are from Samuel Williams’ fiber-vs-thread performance comparison. Fibers are created 20x faster and switch 10x faster than threads. The memory row is about virtual address space reserved by the platform/runtime, not resident memory. The benchmark reports actual RSS, where the gap is much smaller than the virtual stack numbers suggest. But the shape is still real: each thread is a kernel object with scheduler state and a stack reservation, while each fiber is scheduled in userspace. Ractors give you parallelism too, but can’t run Rails. Everything is a tradeoff.

How scheduling works

This is where most of the confusion lives. Let me show you what actually happens.

Preemptive scheduling (threads)

The OS controls when threads switch. Your code has no say. A thread could be paused mid-calculation, mid-assignment, mid-anything.

sequenceDiagram
  participant OS as OS Scheduler
  participant T1 as Thread 1
  participant T2 as Thread 2
  participant LLM as LLM API
  OS->>T1: Run
  T1->>LLM: Send request
  Note over T1: Blocks in I/O (parked)
  OS->>T2: Run
  T2->>LLM: Send request
  Note over T2: Blocks in I/O (parked)
  Note over OS: Both threads parked
  LLM-->>T1: Response ready
  LLM-->>T2: Response ready
  OS->>T1: Wake and run
  Note over T1: Processing response
  OS->>OS: Time slice expired
  OS->>T2: Preempt T1, run T2
  Note over T2: Processing response
  OS->>OS: Time slice expired
  OS->>T1: Resume T1
  Note over T1: Finish response
  OS->>T2: Resume T2
  Note over T2: Finish response

The OS can preempt runnable threads on a timer, but a thread blocked in I/O is parked until the socket is ready. That part matters: threads do not spin uselessly while waiting for tokens. The preemption happens when a thread is runnable – including in the middle of response processing, object allocation, assignment, or any other Ruby code.

For two threads doing I/O, this works fine. The overhead is noise. For 200 threads mostly waiting for LLM tokens, the problem is the one-operation-per-thread shape: 200 kernel threads, 200 stack reservations, 200 scheduler entries, and usually 200 copies of whatever per-thread application resources the worker holds.

This is also why a worker limit means different things in Solid Queue’s current thread mode and in the fiber mode from my patch. threads: 25 is both “run 25 jobs at once” and “create 25 kernel threads.” If all 25 jobs are streaming tokens, job 26 waits. fibers: 250 is mostly an admission limit for the reactor: run up to 250 jobs as fibers on the same thread, park the ones waiting on I/O, and resume them when ready. You still need limits because APIs, sockets, memory, and databases have limits. But the cap is no longer tied to one kernel thread per job.

Cooperative scheduling (fibers)

Fibers switch only when they choose to. In practice, the async gem makes this automatic: your code yields at I/O boundaries without you writing anything special.

sequenceDiagram
  participant R as Reactor
  participant F1 as Fiber 1
  participant F2 as Fiber 2
  participant LLM as LLM API
  R->>F1: Run
  F1->>LLM: Send request
  Note over F1: Yields (I/O wait)
  R->>F2: Run
  F2->>LLM: Send request
  Note over F2: Yields (I/O wait)
  Note over R: Both waiting, reactor sleeps
  LLM-->>F1: Response ready
  R->>F1: Resume immediately
  Note over F1: Processes response
  F1->>R: Done
  LLM-->>F2: Response ready
  R->>F2: Resume immediately
  Note over F2: Processes response
  F2->>R: Done

No OS thread context switch per fiber. No timer-based preemption between fibers. When a fiber yields, the reactor checks which fibers have I/O ready and resumes them. When nothing is ready, the reactor sleeps in the OS until something is. The kernel still does the I/O readiness work; Ruby just avoids one kernel thread per wait.
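
You can watch this hand-off with nothing but sleep, which the fiber scheduler intercepts (a minimal sketch):

require "async"

Async do |task|
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # Both fibers yield at sleep; the reactor parks them and
  # sleeps in the OS until a timer fires
  fibers = 2.times.map { task.async { sleep 1 } }
  fibers.each(&:wait)
  puts (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started).round(2)  # ≈ 1.0, not 2.0
end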

The GVL: why threads and fibers are more similar than you think

This is the part that makes thread-based Ruby less different from fiber-based Ruby than it first looks.

The GVL means only one thread can execute Ruby code at a time. Threads run in parallel only during I/O, when the GVL is released. So if your workload is I/O-bound – HTTP calls, database queries, LLM streaming – threads give you I/O concurrency, not parallelism.
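
A quick check on any CRuby (a rough sketch; exact timings vary by machine):

require "benchmark"

work = -> { 5_000_000.times { Math.sqrt(42.0) } }

Benchmark.bm(8) do |x|
  x.report("serial")  { 2.times { work.call } }
  # Two threads, but the GVL admits one at a time into Ruby code:
  # expect roughly the same wall time as serial
  x.report("threads") { 2.times.map { Thread.new(&work) }.each(&:join) }
end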

Fibers give you the same I/O concurrency. One fiber yields at I/O, another picks up. The difference: fibers do it without OS scheduling overhead, without the memory cost of a thread stack, and without needing a database connection per concurrent job.

If threads only help with I/O anyway, why pay their overhead?

There is one case where threads win: CPU-bound work that releases the GVL. Some C extensions (image processing, cryptographic operations) release the GVL while doing heavy computation. Multiple threads can then run those C extensions in parallel. Fibers can’t do that. They share a thread.

For actual Ruby-level CPU parallelism, you need processes or Ractors. Processes are production-ready and Rails-compatible. Ractors are lighter than processes, but still experimental.

What happens when a fiber hits I/O

This is the happy path and the most common question.

# Inside a fiber
response = Net::HTTP.get(URI("https://api.example.com/v1/completions"))

Here’s the full chain:

  1. Net::HTTP opens a socket and sends the request
  2. The socket isn’t readable yet (the server hasn’t responded)
  3. Ruby calls rb_io_wait on the socket
  4. The async gem’s Fiber.scheduler intercepts this call
  5. The scheduler suspends the current fiber and registers the socket with the event loop
  6. The reactor runs other fibers while this one sleeps
  7. When the socket becomes readable, the reactor resumes this fiber
  8. Net::HTTP reads the response as if nothing happened

Your code doesn’t change. No await, no callbacks, no promises. The same Net::HTTP.get call that works in a thread works in a fiber. The yield is invisible.
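
To see the chain pay off, run two of those calls side by side (a minimal sketch; the endpoint is a placeholder):

require "async"
require "net/http"

Async do |task|
  requests = 2.times.map do
    # Each call parks at step 5 and resumes at step 7,
    # so both requests are in flight at the same time
    task.async { Net::HTTP.get(URI("https://api.example.com/v1/completions")) }
  end
  requests.map(&:wait)
end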

Bob Nystrom called this the function color problem in 2015. In languages with async/await, every function is either sync or async. An async function can only be called with await, and await can only live inside another async function. The color spreads upward through your entire call stack.

Python:

# Python: the color spreads, and you need different libraries
async def get_user(id):
    async with aiohttp.ClientSession() as session:  # can't use requests
        response = await session.get(f"/users/{id}")  # must await
        return await response.json()                   # must await

async def handle_request():  # must be async because it calls get_user
    user = await get_user(1)  # must await

You can’t use requests in async Python without blocking the event loop. You need aiohttp, httpx in async mode, or a thread wrapper. You can’t use the blocking psycopg2 API as async I/O; you need asyncpg or Psycopg’s async API. The ecosystem splits: sync libraries and async libraries, doing the same thing differently.

JavaScript:

// JavaScript: same problem, less severe (Node has fewer library splits)
async function getUser(id) {
  const response = await fetch(`/users/${id}`);  // must await
  return await response.json();                   // must await
}

async function handleRequest() {  // must be async
  const user = await getUser(1);  // must await
}

Ruby:

# Ruby: no color
def get_user(id)
  response = Net::HTTP.get(URI("https://api.example.com/users/#{id}"))  # just a normal call
  JSON.parse(response)                            # just a normal call
end

def handle_request
  user = get_user(1)  # just a normal call
end

Same Net::HTTP. Same pg. Same call stack, as long as the library uses scheduler-aware Ruby I/O. The fiber scheduler intercepts I/O at the Ruby runtime level, below your code. Your methods don’t know and don’t care whether they’re running in a thread or a fiber.

What happens when a fiber does CPU-bound work

# Inside a fiber
100_000.times { Digest::SHA256.hexdigest("work") }

This blocks the reactor. No other fiber runs until it finishes. There’s no I/O boundary to yield at, so the fiber holds the thread.

sequenceDiagram
  participant R as Reactor
  participant F1 as Fiber 1 (CPU)
  participant F2 as Fiber 2 (I/O)
  R->>F1: Run
  Note over F1,F2: F1 doing CPU work...
  Note over F2: Waiting to run
  Note over F1,F2: F1 still computing...
  Note over F2: Still waiting
  F1->>R: Done
  R->>F2: Finally runs

This is not a bug. It’s the tradeoff of cooperative scheduling. Fibers are designed for I/O-bound work. CPU-bound work should go on a thread, where the OS can preempt it.
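
Reproducing the diagram takes a few lines (a sketch):

require "async"
require "digest"

Async do |task|
  task.async do
    # No I/O boundary, so this fiber never yields to the reactor
    100_000.times { Digest::SHA256.hexdigest("work") }
  end
  # This fiber only gets a turn once the CPU-bound one finishes
  task.async { puts "runs only after the CPU-bound fiber finishes" }
end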

With my fiber-mode patch for Solid Queue, this is a configuration choice:

workers:
  - queues: [ chat, turbo, notifications ]
    fibers: 50       # I/O-bound: use fibers
  - queues: [ cpu ]
    threads: 2        # CPU-bound: use threads

One backend, two modes, matching the concurrency model to the workload.

What happens when a fiber queries the database

The pg gem has supported Fiber.scheduler since v1.3.0. When a fiber executes a query, the pg gem sends it non-blockingly via PQsendQuery, then calls rb_io_wait on the PostgreSQL socket. The scheduler intercepts this, suspends the fiber, and lets others run while PostgreSQL processes the query.

# Inside a fiber
user = User.find(42)  # yields while waiting for PostgreSQL

The fiber yields. Other fibers run. When PostgreSQL responds, the reactor resumes the fiber. Your code doesn’t know the difference.
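
Two fibers with their own connections can have two queries in flight at once (a sketch, assuming a local database named "app"):

require "async"
require "pg"

Async do |task|
  queries = 2.times.map do
    task.async do
      conn = PG.connect(dbname: "app")  # assumption: local DB named "app"
      conn.exec("SELECT pg_sleep(1)")   # fiber parks on the PostgreSQL socket
      conn.finish
    end
  end
  queries.each(&:wait)  # ~1s total, not ~2s: the waits overlap
end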

Connection sharing

With threads, every thread can query the database at the same time. Each one needs its own connection. With fibers, the important difference is that ordinary Active Record query paths can release connections between DB operations, so a much smaller pool is often enough. If you need more concurrent DB access, increase the pool and fibers will check out separate connections concurrently. The reactor never preempts a fiber – it only switches when a fiber yields at an I/O boundary:

sequenceDiagram
  participant R as Reactor
  participant F1 as Fiber A
  participant F2 as Fiber B
  participant Pool as DB Pool (1 conn)
  participant PG as PostgreSQL
  participant HTTP as HTTP API
  R->>F1: Run
  F1->>Pool: Check out
  F1->>PG: SELECT * FROM users
  Note over F1: Yields (waiting for PG)
  R->>F2: Run
  F2->>HTTP: GET /api/data
  Note over F2: Yields (waiting for HTTP)
  PG-->>R: F1's result ready
  R->>F1: Resume
  F1->>Pool: Return
  F1->>R: Done
  HTTP-->>R: F2's result ready
  R->>F2: Resume
  F2->>Pool: Check out
  F2->>PG: UPDATE messages SET ...
  Note over F2: Yields (waiting for PG)
  PG-->>R: F2's result ready
  R->>F2: Resume
  F2->>Pool: Return
  F2->>R: Done

Active Record 7.2+ makes this work: ordinary query paths can release connections between DB operations instead of holding them for the fiber’s lifetime. Check out, query, return. The minimum pool size is often 3 per process (1 execution + 2 for worker overhead), but jobs that hold transactions, use connection-local session state, or explicitly pin connections need more. For DB-heavy workloads, bump the pool size.
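
Written out with the public pool API, the rhythm from the diagram looks like this (a sketch):

# Borrow a connection for one query, then hand it back
ActiveRecord::Base.connection_pool.with_connection do |conn|
  conn.execute("SELECT 1")
end
# The connection is back in the pool here, so another fiber can
# check it out while this fiber waits on an HTTP response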

What happens when a fiber starts a transaction

If fibers share a connection, can one fiber’s transaction leak into another?

No. Active Record handles this correctly.

When a fiber starts a transaction, it holds the connection for the entire duration – from BEGIN to COMMIT or ROLLBACK. The connection is not released mid-transaction. Other fibers that need the database wait for the connection to be returned.
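
In code, the holding window is exactly the transaction block (a sketch; the models and balances are illustrative):

# Hypothetical models; the point is the connection lifetime
Account.transaction do
  # Connection checked out at BEGIN and pinned to this fiber...
  from.update!(balance: from.balance - 100)
  to.update!(balance: to.balance + 100)
end
# ...and returned only after COMMIT/ROLLBACK; other fibers wait at the pool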

sequenceDiagram
  participant R as Reactor
  participant F1 as Fiber A
  participant F2 as Fiber B
  participant Pool as DB Pool (1 conn)
  participant PG as PostgreSQL
  R->>F1: Run
  F1->>Pool: Check out
  F1->>PG: BEGIN
  F1->>PG: UPDATE accounts SET ...
  Note over F1: Yields (waiting for PG)
  R->>F2: Run
  F2->>Pool: Check out
  Note over F2: Waits (connection held by F1)
  PG-->>F1: Result
  R->>F1: Resume
  F1->>PG: COMMIT
  F1->>Pool: Return
  F1->>R: Done
  Pool->>F2: Connection available
  F2->>PG: SELECT * FROM accounts
  Note over F2: Yields (waiting for PG)
  PG-->>F2: Result
  R->>F2: Resume
  F2->>Pool: Return
  F2->>R: Done

Under fiber isolation (config.active_support.isolation_level = :fiber), Active Record tracks connection ownership per fiber. The connection gets a real Monitor lock. No other fiber can touch it during a transaction.

Safe. No interleaving. Fiber B just waits.

For the target workload – LLM streaming, HTTP calls – database touches are short reads and status updates. Transactions are brief. The wait is negligible. If your jobs run long transactions, those jobs belong on a thread-based worker.

What happens when you have too many fibers

Fibers aren’t free. Each one uses memory (~4KB), and each one might hold open connections to external services. If you spawn 10,000 fibers that all hit the same API, you’re opening 10,000 connections to that API. The API will not be happy.

Async doesn’t eliminate resource limits; it changes where they show up. With threads, the limit is explicit: 25 threads, 25 concurrent jobs. With fibers, the limit is implicit: you keep going until something else breaks.

The fix is a semaphore. The FiberPool in my Solid Queue patch uses one:

require "async"
require "async/semaphore"

# `size`, `jobs`, and `perform_job` are placeholders
Async do
  semaphore = Async::Semaphore.new(size)

  jobs.each do |job|
    # Only `size` fibers run concurrently; the rest queue here
    semaphore.async { perform_job(job) }
  end
end

When you configure fibers: 100 with the patch, that’s not “unlimited fibers.” It’s a semaphore capping concurrency at 100. You control the ceiling.

“Why not just configure more Solid Queue threads?”

In plain Ruby, more threads can be reasonable. In Solid Queue thread mode, threads: 200 means more than “allow 200 jobs to wait on I/O.”

Kernel threads are the expensive unit. Samuel Williams’ benchmarks show fibers allocate 20x faster (~3μs vs ~80μs), switch 10x faster (~0.1μs vs ~1.3μs), and achieve 15x higher throughput (~80,000 vs ~5,000 requests/second). The OS can schedule thousands of threads, but scheduler entries, stack reservations, wakeups, and GVL coordination make that a poor default concurrency knob.
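
The shape (if not the exact numbers) is easy to reproduce with a crude loop; Samuel’s harness is more careful, but this sketch shows the gap:

require "benchmark"

n = 10_000
Benchmark.bm(14) do |x|
  # Create-and-finish cost for each primitive
  x.report("thread create") { n.times { Thread.new {}.join } }
  x.report("fiber create")  { n.times { Fiber.new {}.resume } }
end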

Solid Queue currently enforces a database-pool guard. Today it expects threads + 2 database connections per process, so 200 threads across 2 processes won’t boot unless the pool is at least 404. That guard may be conservative for I/O-heavy jobs; there’s an open issue about making it advisory or bypassable. But it is still a guard you hit today.

A blocked job still occupies its worker thread. The OS can park an LLM streaming thread until the socket is ready, but in Solid Queue thread mode it still consumes one of the configured thread workers. If all 25 are streaming tokens, job 26 waits.

Fibers make the Solid Queue limit mean “how many jobs may wait at once” instead of “how many kernel threads should exist.” They still need limits, but the limit is no longer one kernel thread per waiting job.

“Why not Ractors?”

Ractors solve a different problem. Fibers give you I/O concurrency – many things waiting at once. Ractors give you CPU parallelism – many things computing at once.

Here’s what they look like:

# Two Ractors computing fibonacci in parallel
def fibonacci(n) = n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2)

r1 = Ractor.new { fibonacci(38) }
r2 = Ractor.new { fibonacci(38) }

r1.value  # Ruby 4.0+ (blocks until the Ractor finishes)
r2.value  # Both ran in parallel, each with its own GVL

Each Ractor has its own GVL, so they can execute Ruby code truly in parallel across CPU cores. The tradeoff: strict isolation. You can only share immutable (frozen) objects. Everything else gets copied or moved between Ractors via message passing. Access a mutable variable from an outer scope? Ractor::IsolationError.
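
For example (a minimal sketch):

count = 0
# Raises Ractor::IsolationError: the block captures a mutable outer variable
Ractor.new { count += 1 }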

When Ractors win, they win big. Fibonacci(38) five times: 0.68s with Ractors vs 2.26s sequential. 3.3x speedup. Real parallelism.

But they are not a practical answer for Rails jobs yet:

  • Still experimental in Ruby 4.0. Creating a Ractor still emits the experimental API warning.
  • Many gems don’t work without changes. Gems that rely on mutable constants, global variables, class variables, or shared process state can hit Ractor::IsolationError.
  • No Rails integration. ActiveRecord, ActionCable, the router, the logger – Rails is built on shared mutable state. None of it runs inside a Ractor.
  • No Ractor-based job queue exists.
  • Still active bug surface. The Ruby bug tracker still has Ractor-related issues, including recent crash reports.

For I/O concurrency, Ractors don’t help at all. Each Ractor still has threads constrained by its own GVL. Fibers within those threads still do the actual I/O multiplexing. Ractors add CPU parallelism, which is not what LLM streaming needs.

For Rails jobs that need CPU parallelism today, processes are still the boring answer. Puma already uses that model for web workers. Ractors may become useful for isolated CPU-heavy Ruby work, but they are not the answer to this Solid Queue I/O problem.

“Isn’t this just what JavaScript does?”

No. I showed the code comparison above. JavaScript’s async/await is a colored concurrency model: the async keyword spreads upward through every caller. Ruby’s fibers are colorless: your existing code works unchanged, and the scheduler handles yields below your code.

There’s a deeper difference too. JavaScript async/await runs on an event loop. Ruby fibers run on top of a multi-threaded runtime. You can have multiple Ruby threads, each running its own reactor with its own fibers, and mix fibers and threads in the same application. Node can run JavaScript in parallel with worker_threads, but that’s a worker/isolate model, not the same thing as putting multiple reactors inside ordinary application threads.
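
That mixing looks like this (a sketch; do_io stands in for any scheduler-aware I/O call):

require "async"

# Two ordinary Ruby threads, each hosting its own reactor and fibers
threads = 2.times.map do
  Thread.new do
    Async do |task|
      10.times { task.async { do_io } }  # do_io is a hypothetical I/O call
    end
  end
end
threads.each(&:join)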

“Isn’t this just what Go does?”

Closer. Goroutines are lightweight, runtime-scheduled, and multiplexed across OS threads. Conceptually similar to Ruby fibers, but Go’s scheduler can also preempt goroutines.

Two differences:

  1. Go has true parallelism. Goroutines run across multiple OS threads with no GVL equivalent. CPU-bound goroutines run in parallel. Ruby fibers don’t.

  2. Ruby has existing code. If you have a Rails application with hundreds of thousands of lines of Ruby, you can add fiber-based concurrency without rewriting anything. Your models, your controllers, your views, your gems – they all work. With Go, you’re rewriting.

If you’re starting from scratch and need both I/O concurrency and CPU parallelism, Go is a strong choice. If you have a Ruby application and need I/O concurrency, fibers give you that without a rewrite.

“Fibers need Async do blocks. That’s still new syntax.”

Someone on Hacker News called this out: I said “no async/await” but the examples show Async do and .wait.

Here’s the actual change:

# Before
chat = RubyLLM.chat
response = chat.ask("Hello")

# After
Async do
  chat = RubyLLM.chat
  response = chat.ask("Hello")
end

Two lines of wrapping. Your application code inside doesn’t change. Your models don’t change. Your gems don’t change. Nothing gets a new keyword.

In Python, adopting async means rewriting every function signature in the call chain to async def, adding await to every call, and replacing or wrapping blocking libraries. requests becomes aiohttp or async httpx. Blocking database APIs become async database APIs. Your test framework changes. Your middleware changes. It’s a rewrite.

Two lines of wrapping vs. rewriting your stack. That’s not even the same conversation.

When to use what

flowchart TD
  A[What kind of work?] --> B{CPU-bound?}
  B -->|Yes| C{Need parallelism?}
  C -->|Yes| D{Rails?}
  D -->|Yes| E[Processes]
  D -->|No| H[Ractors]
  C -->|No| F[Threads]
  B -->|No| I[Fibers]

  • I/O-bound work (LLM streaming, HTTP calls, webhooks, email delivery): fibers. Low overhead, high concurrency, shared database connections.
  • CPU-bound work (image processing, data crunching, PDF generation): threads. The OS can preempt them, and C extensions can release the GVL for parallelism.
  • CPU parallelism with Rails: processes. Each one gets its own GVL, its own memory, its own everything. Puma already does this.
  • CPU parallelism without Rails: Ractors (when they graduate from experimental). Lighter than processes, true parallelism, but strict isolation means most gems don’t work.
  • All of them at once: that’s what a well-configured Rails app does. Puma forks processes. Each process runs threads. Fibers run inside those threads for I/O-heavy jobs. They coexist.
# Solid Queue with the fiber-mode patch: all three working together
workers:
  - queues: [ chat, turbo ]
    fibers: 50        # I/O-bound: fibers
    processes: 2       # parallelism: processes
  - queues: [ pdf, images ]
    threads: 4         # CPU-bound: threads
    processes: 1

No single model is universally better. The right answer is matching the model to the workload.


This covers every “what happens when” question I’ve gotten so far. If I missed yours, find me on Twitter; I’ll either update this post or write a follow-up.
