RubyLLM 1.16: Concurrent Tool Execution, Rails-Style Instrumentation, and api_base for Every Provider

When you first reach for an LLM library, the only question is whether it works. Can it call the model, parse the response, run a tool? Once your app is actually in production, the questions change. Is it fast? Can I see what it’s doing when something goes wrong? Can I send its traffic through my own infrastructure instead of straight out to the provider?

I released RubyLLM 1.16 today. It answers these production questions.

The three headline features are about speed, visibility, and control: tools that run concurrently, structured events for everything RubyLLM does, and a configurable base URL for every native provider. None of them change how you write your app. All of them matter the moment real traffic shows up.

Tools That Run Concurrently

When a model returns several tool calls in one response, it’s telling you those calls are independent. Get the weather, look up the stock price, fetch the exchange rate. The model didn’t ask for them in order. It asked for all of them.

RubyLLM has always run them one at a time. For tools that are CPU-bound that’s fine, but most tools aren’t. Most tools are an HTTP call, a database query, another LLM request. They spend their time waiting. Running three waits back to back, when you could have waited for all three at once, is time your user can’t even get back.
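
To put numbers on that, here is a minimal sketch in plain Ruby, with no RubyLLM involved. The tool names and the 0.2 second delays are hypothetical stand-ins for I/O waits; the point is only that three waits on three threads cost roughly one wait, not three.

```ruby
# Three stand-in "tools" that only wait, the way an HTTP call or a query would.
# Names and delays are hypothetical.
TOOLS = {
  weather: 0.2,
  stock_price: 0.2,
  exchange_rate: 0.2
}.freeze

def run_tool(name, delay)
  sleep(delay) # simulated I/O wait
  "#{name} done"
end

# Returns [block result, elapsed seconds] using a monotonic clock.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# One after another: the waits add up.
_, sequential_time = timed do
  TOOLS.map { |name, delay| run_tool(name, delay) }
end

# One thread per tool: the waits overlap.
_, concurrent_time = timed do
  TOOLS.map { |name, delay| Thread.new { run_tool(name, delay) } }.map(&:value)
end

puts format('sequential: %.2fs', sequential_time)
puts format('concurrent: %.2fs', concurrent_time)
```

Threads overlap here because sleeping, like most I/O, releases Ruby's global lock. A CPU-bound tool would not see the same gain.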

1.16 runs them together. Turn it on for every chat from one place:

RubyLLM.configure do |config|
  config.tool_concurrency = true # :threads, :fibers, true, or false
end

true uses :threads and needs no dependencies. If you’d rather not pay for a thread per tool, :fibers mode uses the async gem and gets you the same overlap on a single thread. That’s what I’d recommend for I/O-bound applications. Check out my previous posts on why I think async is the future of Ruby and what Ruby concurrency actually does.

When one conversation needs different behaviour than the rest, override it per chat:

chat.with_tools(Weather, StockPrice, Currency, concurrency: :fibers)
chat.with_tools(Weather, StockPrice, concurrency: false)

Inside Rails, each concurrent tool call runs wrapped in the Rails executor, so connection pools, CurrentAttributes, and reloading behave the way the rest of your app does. You don’t think about it. It just works.

And concurrency doesn’t make your UI wait for the slowest tool. Each result is added back to the conversation the moment that tool finishes, in completion order, so your streaming callbacks see results land as they happen. RubyLLM still gathers every result before going back to the model, but your users watch progress instead of a spinner.
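
The completion-order idea can be sketched with a thread-safe Queue. This is not RubyLLM's internals, just an illustration of the pattern under my own assumptions: the tool names and latencies are hypothetical, and run_concurrently is a made-up helper.

```ruby
# Hypothetical tools with different simulated latencies, in seconds.
LATENCIES = { slow_report: 0.3, weather: 0.1, stock_price: 0.2 }.freeze

# Runs every tool on its own thread, yields each name as soon as that tool
# finishes, and returns the full list only after every tool is done.
def run_concurrently(latencies)
  finished = Queue.new
  threads = latencies.map do |name, delay|
    Thread.new do
      sleep(delay)     # simulated I/O wait
      finished << name # report completion immediately
    end
  end

  results = Array.new(threads.size) do
    name = finished.pop # blocks until the next tool completes
    yield name if block_given?
    name
  end
  threads.each(&:join)
  results
end

order = run_concurrently(LATENCIES) { |name| puts "finished: #{name}" }
puts "all results gathered: #{order.inspect}"
```

The block plays the role of a streaming callback: it fires per tool, fastest first, while the caller still gets the complete set before moving on.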

Instrumentation Without Monkey Patching

You can’t operate what you can’t see. Some libraries popped up to add instrumentation to RubyLLM, but they monkey patch us. That’s an unnecessary maintenance burden.

RubyLLM 1.16 emits structured events for the work it does, the same way Rails does. In a Rails app they flow through ActiveSupport::Notifications automatically, and you subscribe the way you’d subscribe to any framework event:

# config/initializers/ruby_llm_instrumentation.rb
ActiveSupport::Notifications.subscribe('chat.ruby_llm') do |_name, _start, _finish, _id, payload|
  Rails.logger.info(
    provider: payload[:provider],
    model: payload[:model],
    input_tokens: payload[:input_tokens],
    output_tokens: payload[:output_tokens]
  )
end

Outside Rails, point config.instrumenter at anything that responds to instrument(name, payload) { ... } and wire it into OpenTelemetry, StatsD, or your own logger. The events cover the whole surface: HTTP requests, chat completions, tool calls, embeddings, and model registry refreshes, each carrying the provider, model, token usage, and the Ruby objects an observability adapter needs.
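
As a rough sketch of what such an object could look like: the only contract taken from the text is instrument(name, payload) { ... }. The class name, the events array, the timing logic, and the model name are all illustrative, and yielding the payload to the block is an assumption modelled on ActiveSupport::Notifications.instrument.

```ruby
# A minimal instrumenter sketch that records each event with its duration.
class TimingInstrumenter
  attr_reader :events

  def initialize
    @events = []
  end

  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Assumption: the block receives the payload, as it does in ActiveSupport.
    yield payload if block_given?
  ensure
    # Recorded even if the block raises, so failures are still visible.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @events << { name: name, duration_ms: elapsed * 1000.0, payload: payload }
  end
end

instrumenter = TimingInstrumenter.new
# In an app you would hand it over with something like:
#   RubyLLM.configure { |config| config.instrumenter = instrumenter }

# Simulating what a library does around a unit of work:
result = instrumenter.instrument('chat.ruby_llm', { provider: :openai, model: 'example-model' }) do |payload|
  payload[:input_tokens] = 12
  :response
end

puts result.inspect
puts instrumenter.events.first[:name]
```

From here, forwarding to StatsD or OpenTelemetry is a matter of replacing the array append with a call to your client.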

Those payloads can hold message content, tool arguments, and full provider responses, which is exactly the sensitive data you don’t want sprayed into logs by accident. So log or export those fields only when your policy allows it. The Instrumentation guide has the full payload reference.
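
One defensive pattern is an allowlist: log only the keys you have decided are safe and drop everything else by default. A small sketch, where the safe keys are the four used in the subscriber above and :messages and :response are hypothetical names standing in for sensitive fields:

```ruby
# Keys treated as safe to log. Anything not listed is dropped.
SAFE_PAYLOAD_KEYS = %i[provider model input_tokens output_tokens].freeze

def loggable(payload)
  payload.slice(*SAFE_PAYLOAD_KEYS)
end

# Hypothetical payload; :messages and :response are illustrative key names only.
payload = {
  provider: :anthropic,
  model: 'example-model',
  input_tokens: 120,
  output_tokens: 48,
  messages: ['my account number is ...'],
  response: { raw: '...' }
}

puts loggable(payload).inspect
```

An allowlist fails closed: a new payload field stays out of your logs until you add it on purpose.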

A Base URL for Every Native Provider

In production, your AI traffic rarely goes straight to the provider. It goes through a gateway that handles auth, a proxy that enforces rate limits, a private endpoint inside your network. RubyLLM let you point most providers at a custom base URL already. 1.16 fills the last gaps, so now every native provider has one:

RubyLLM.configure do |config|
  config.bedrock_api_base     = ENV['BEDROCK_API_BASE']
  config.mistral_api_base     = ENV['MISTRAL_API_BASE']
  config.perplexity_api_base  = ENV['PERPLEXITY_API_BASE']
  config.vertexai_api_base    = ENV['VERTEXAI_API_BASE']
  config.xai_api_base         = ENV['XAI_API_BASE']
end

Together with the bases already there for OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter, Azure, Ollama, and GPUStack, you can front any provider with your own infrastructure. Each override falls back to the provider’s default when unset, so nothing you already have changes.

While I was in the HTTP layer, I made the Faraday adapter configurable too:

RubyLLM.configure do |config|
  config.faraday_adapter = :async_http # or :typhoeus, :net_http, :httpx, etc.
end

It defaults to Net::HTTP, so nothing changes unless you ask. Reach for it when you want connection pooling, HTTP/2, or whatever adapter your app already standardizes on.

Transcription Words

Transcription now exposes word-level timing when the provider returns it, so you can build word-by-word highlighting on top of OpenAI’s verbose transcriptions:

transcription = RubyLLM.transcribe("interview.mp3", model: "whisper-1")
transcription.words # => [{ word:, start:, end: }, ...]
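
For highlighting, the core lookup is "which word is being spoken at this playback time?". A sketch, assuming each entry is a hash with :word, :start, and :end keys and times in seconds; the sample data and the active_word_index helper are made up for illustration.

```ruby
# Hypothetical word timings in the shape shown above.
WORDS = [
  { word: 'Thanks', start: 0.00, end: 0.42 },
  { word: 'for',    start: 0.42, end: 0.60 },
  { word: 'coming', start: 0.60, end: 1.10 }
].freeze

# Returns the index of the word spoken at `time`, or nil during silence.
def active_word_index(words, time)
  words.index { |w| time >= w[:start] && time < w[:end] }
end

index = active_word_index(WORDS, 0.5)
puts index ? "highlight: #{WORDS[index][:word]}" : 'nothing to highlight'
```

A linear scan is fine for short clips. For long recordings, a binary search over the start times would be the natural next step.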

Getting Ready for 2.0

Deprecation warnings are now yours to control:

RubyLLM.configure do |config|
  config.deprecation_behavior = :warn # :warn (default), :silence, or :raise
end

Set :raise in your test environment and a deprecated path fails the build the moment something hits it. That’s the cheapest possible way to be ready before those paths disappear in 2.0, instead of finding out on upgrade day.
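
In a Rails app that could look like the following sketch. The initializer path is just a conventional choice, and Rails.env.test? assumes Rails is loaded; outside Rails you would branch on whatever environment flag you already use.

```ruby
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  # Fail loudly in tests, keep the default warning everywhere else.
  config.deprecation_behavior = Rails.env.test? ? :raise : :warn
end
```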

Fixes and the Model Registry

A release this size carries a long tail of fixes. The ones worth calling out: Anthropic’s “prompt is too long” now raises ContextLengthExceededError so you can rescue it like any other context-length error, streaming parallel tool calls accumulate correctly, Bedrock reasoning streams properly, and Gemini function calls and inline images follow the spec. Active Storage handling in Rails got more careful about pending uploads, load order, and text attachments. And when configuration or a model lookup goes wrong, the error now tells you what happened and how to fix it.

The model registry is refreshed with the latest models, capabilities, and pricing. One fix there is worth a sentence: models.dev started shipping partial release dates like 2025-09 and 2025, RubyLLM was turning those into invalid timestamps, and model loading broke. 1.16 normalizes them to real dates so the registry keeps loading.
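
To illustrate the kind of normalization involved, here is a sketch that pads partial dates out to a full Date. Filling in the first of the month or January 1st is my assumption for the example, not necessarily the rule RubyLLM applies, and normalize_release_date is a hypothetical helper.

```ruby
require 'date'

# Turns "2025", "2025-09", or "2025-09-15" into a Date.
# Missing parts default to 1; anything unparseable returns nil.
def normalize_release_date(value)
  year, month, day = value.to_s.strip.split('-').map { |part| Integer(part, 10) }
  return nil unless year

  Date.new(year, month || 1, day || 1)
rescue ArgumentError
  # Covers non-numeric input and impossible dates such as month 13.
  nil
end

%w[2025 2025-09 2025-09-15 soon].each do |raw|
  puts "#{raw.inspect} => #{normalize_release_date(raw).inspect}"
end
```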

The full release notes have the complete list.

Use It

# Gemfile
gem 'ruby_llm', '~> 1.16'

# Shell
bundle update ruby_llm

It’s backwards compatible. Concurrency is opt-in, instrumentation stays inert until you subscribe, and every new *_api_base falls back to the provider default. Nothing you’ve built changes until you decide to reach for it. The boring infrastructure is just there now, waiting for the day your app stops being a demo and starts being production.