# RubyLLM 1.16: Concurrent Tool Execution, Rails-Style Instrumentation, and api_base for Every Provider

RubyLLM 1.16 runs your tools concurrently in threads or fibers, makes RubyLLM observable without monkey patching, and lets every native provider sit behind a proxy.

When you first reach for an LLM library, the only question is whether it works. Can it call the model, parse the response, run a tool? Once your app is in production, the questions change. Is it fast? Can I see what it's doing when something goes wrong? Can I send its traffic through my own infrastructure instead of straight out to the provider?

I released [RubyLLM](https://rubyllm.com) 1.16 today. It answers these production questions.

The three headline features are about speed, visibility, and control: tools that run concurrently, structured events for everything RubyLLM does, and a configurable base URL for every native provider. None of them change how you write your app. All of them matter the moment real traffic shows up.

## Tools That Run Concurrently

When a model returns several tool calls in one response, it's telling you those calls are independent. Get the weather, look up the stock price, fetch the exchange rate. The model didn't ask for them in order. It asked for all of them.

RubyLLM has always run them one at a time. For CPU-bound tools that's fine, but most tools aren't CPU-bound. Most tools are an HTTP call, a database query, another LLM request. They spend their time waiting. Running three waits back to back, when you could have waited for all three at once, is time your user never gets back.

1.16 runs them together. Turn it on for every chat from one place:

```ruby
RubyLLM.configure do |config|
  config.tool_concurrency = true # :threads, :fibers, true, or false
end
```

`true` uses `:threads` and needs no dependencies. If you'd rather not pay for a thread per tool, `:fibers` mode uses the `async` gem and gets you the same overlap on a single thread. It's the mode I'd recommend for I/O-bound applications. Check out my previous posts on [why I think async is the future of Ruby](/async-ruby-is-the-future/) and [what Ruby concurrency actually does](/ruby-concurrency-what-actually-happens/).

When one conversation needs different behavior than the rest, override it per chat:

```ruby
chat.with_tools(Weather, StockPrice, Currency, concurrency: :fibers)
chat.with_tools(Weather, StockPrice, concurrency: false)
```

Inside Rails, each concurrent tool call runs wrapped in the Rails executor, so connection pools, `CurrentAttributes`, and reloading behave the way the rest of your app does. You don't think about it. It just works.

And concurrency doesn't make your UI wait for the slowest tool. Each result is added back to the conversation the moment that tool finishes, in completion order, so your streaming callbacks see results land as they happen. RubyLLM still gathers every result before going back to the model, but your users watch progress instead of a spinner.
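
To make the "waiting at once" argument concrete, here is a minimal sketch in plain Ruby. It is not RubyLLM's implementation: the tool names and delays are made up, `sleep` stands in for I/O, and plain threads plus a `Queue` stand in for whatever RubyLLM does internally. It shows that three overlapping waits cost roughly as much as the slowest one, and that results can be collected in completion order.

```ruby
# Hypothetical tools that only wait, the way an HTTP call or query would.
TOOL_DELAYS = { weather: 0.3, stock_price: 0.2, exchange_rate: 0.1 }.freeze

def run_tool(name, delay)
  sleep(delay) # stand-in for waiting on I/O
  "#{name} done"
end

# Runs the block and returns [result, elapsed_seconds].
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# One wait after another: total time is roughly the sum of the delays.
def run_sequentially(tools)
  tools.map { |name, delay| run_tool(name, delay) }
end

# All waits overlap: total time is roughly the slowest delay.
# Results are returned in the order the tools finished.
def run_concurrently(tools)
  finished = Queue.new
  threads = tools.map do |name, delay|
    Thread.new { finished << run_tool(name, delay) }
  end
  results = Array.new(threads.size) { finished.pop }
  threads.each(&:join)
  results
end

_, sequential = timed { run_sequentially(TOOL_DELAYS) }
results, concurrent = timed { run_concurrently(TOOL_DELAYS) }

puts format('sequential: %.2fs, concurrent: %.2fs', sequential, concurrent)
puts results.inspect # fastest tool first
```

The sketch skips error handling on purpose: if a tool raised, the queue would never receive its result.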

## Instrumentation Without Monkey Patching

You can't operate what you can't see. Some libraries popped up to add instrumentation to RubyLLM, but they monkey patch us. That's an unnecessary maintenance burden.

RubyLLM 1.16 emits structured events for the work it does, the same way Rails does. In a Rails app they flow through `ActiveSupport::Notifications` automatically, and you subscribe the way you'd subscribe to any framework event:

```ruby
# config/initializers/ruby_llm_instrumentation.rb
ActiveSupport::Notifications.subscribe('chat.ruby_llm') do |_name, _start, _finish, _id, payload|
  Rails.logger.info(
    provider: payload[:provider],
    model: payload[:model],
    input_tokens: payload[:input_tokens],
    output_tokens: payload[:output_tokens]
  )
end
```

Outside Rails, point `config.instrumenter` at anything that responds to `instrument(name, payload) { ... }` and wire it into OpenTelemetry, StatsD, or your own logger. The events cover the whole surface: HTTP requests, chat completions, tool calls, embeddings, and model registry refreshes, each carrying the provider, model, token usage, and the Ruby objects an observability adapter needs.
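
As a rough sketch of what such an object could look like, here is a tiny in-memory instrumenter. The class name, the `Event` struct, and the example event name and payload are hypothetical. The only contract taken from above is `instrument(name, payload) { ... }`; yielding the payload to the block mirrors `ActiveSupport::Notifications` and is an assumption about what RubyLLM expects.

```ruby
# Hypothetical instrumenter: records each event's name, payload, and duration.
class TimingInstrumenter
  Event = Struct.new(:name, :payload, :duration)

  attr_reader :events

  def initialize
    @events = []
    @lock = Mutex.new # events may arrive from concurrent tool calls
  end

  # Runs the block, returns its value, and records how long it took.
  # The event is recorded even when the block raises.
  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield(payload) if block_given?
  ensure
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @lock.synchronize { @events << Event.new(name, payload, duration) }
  end
end

# Wiring it up would look something like:
#   RubyLLM.configure { |config| config.instrumenter = TimingInstrumenter.new }

instrumenter = TimingInstrumenter.new
instrumenter.instrument('chat.ruby_llm', { model: 'example-model' }) { sleep(0.01) }

event = instrumenter.events.first
puts format('%s took %.3fs', event.name, event.duration)
```

From here, forwarding `events` to OpenTelemetry or StatsD is a matter of replacing the array with a call into that client.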

Those payloads can hold message content, tool arguments, and full provider responses, which is exactly the sensitive data you don't want sprayed into logs by accident. So log or export those fields only when your policy allows it. The [Instrumentation guide](https://rubyllm.com/instrumentation) has the full payload reference.
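
One cautious way to follow that advice is an allowlist: log only the keys you have decided are safe and drop everything else. The sketch below uses the four payload keys from the subscriber example above. The helper name and the `:messages` key in the sample payload are hypothetical.

```ruby
# Keys that are safe to log. Everything else is dropped by default.
SAFE_PAYLOAD_KEYS = %i[provider model input_tokens output_tokens].freeze

# Returns a copy of the payload containing only allowlisted keys.
def loggable(payload)
  payload.slice(*SAFE_PAYLOAD_KEYS)
end

# Hypothetical payload, with one field that should never reach the logs.
payload = {
  provider: :openai,
  model: 'example-model',
  input_tokens: 120,
  output_tokens: 45,
  messages: ['confidential user prompt']
}

puts loggable(payload).inspect
```

Inside a subscriber, that becomes `Rails.logger.info(loggable(payload))`. An allowlist fails closed: a new payload field stays out of your logs until you add it on purpose.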

## A Base URL for Every Native Provider

In production, your AI traffic rarely goes straight to the provider. It goes through a gateway that handles auth, a proxy that enforces rate limits, a private endpoint inside your network. RubyLLM let you point most providers at a custom base URL already. 1.16 fills the last gaps, so now every native provider has one:

```ruby
RubyLLM.configure do |config|
  config.bedrock_api_base     = ENV['BEDROCK_API_BASE']
  config.mistral_api_base     = ENV['MISTRAL_API_BASE']
  config.perplexity_api_base  = ENV['PERPLEXITY_API_BASE']
  config.vertexai_api_base    = ENV['VERTEXAI_API_BASE']
  config.xai_api_base         = ENV['XAI_API_BASE']
end
```

Together with the bases already there for OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter, Azure, Ollama, and GPUStack, you can front any provider with your own infrastructure. Each override falls back to the provider's default when unset, so nothing you already have changes.

While I was in the HTTP layer, I made the Faraday adapter configurable too:

```ruby
RubyLLM.configure do |config|
  config.faraday_adapter = :async_http # or :typhoeus, :net_http, :httpx, etc.
end
```

It defaults to `Net::HTTP`, so nothing changes unless you ask. Reach for it when you want connection pooling, HTTP/2, or whatever adapter your app already standardizes on.

## Transcription Words

`Transcription` now exposes word-level timing when the provider returns it, so you can build word-by-word highlighting on top of OpenAI's verbose transcriptions:

```ruby
transcription = RubyLLM.transcribe("interview.mp3", model: "whisper-1")
transcription.words # => [{ word:, start:, end: }, ...]
```
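
For the highlighting use case, here is a small sketch of finding the word that is active at a given playback position. It assumes `words` is an array of hashes with symbol keys in the shape shown above and that `start` and `end` are in seconds. The sample data and helper names are made up.

```ruby
# Hypothetical sample in the { word:, start:, end: } shape, times in seconds.
SAMPLE_WORDS = [
  { word: 'Hello', start: 0.0, end: 0.4 },
  { word: 'from', start: 0.4, end: 0.6 },
  { word: 'RubyLLM', start: 0.7, end: 1.3 }
].freeze

# Index of the word being spoken at `seconds`, or nil during a pause.
def word_index_at(words, seconds)
  words.index { |w| seconds >= w[:start] && seconds < w[:end] }
end

# Renders the transcript with the active word wrapped in brackets.
def highlight(words, seconds)
  active = word_index_at(words, seconds)
  words.each_with_index.map { |w, i| i == active ? "[#{w[:word]}]" : w[:word] }.join(' ')
end

puts highlight(SAMPLE_WORDS, 0.5) # prints "Hello [from] RubyLLM"
```

A linear scan is fine for short clips. For long recordings, a binary search over `start` times would be the natural next step.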

## Getting Ready for 2.0

Deprecation warnings are now yours to control:

```ruby
RubyLLM.configure do |config|
  config.deprecation_behavior = :warn # :warn (default), :silence, or :raise
end
```

Set `:raise` in your test environment and a deprecated path fails the build the moment something hits it. That's the cheapest possible way to be ready before those paths disappear in 2.0, instead of finding out on upgrade day.

## Fixes and the Model Registry

A release this size carries a long tail of fixes. The ones worth calling out: Anthropic's "prompt is too long" now raises `ContextLengthExceededError` so you can rescue it like any other context-length error, streaming parallel tool calls accumulate correctly, Bedrock reasoning streams properly, and Gemini function calls and inline images follow the spec. Active Storage handling in Rails got more careful about pending uploads, load order, and text attachments. And when configuration or a model lookup goes wrong, the error now tells you what happened and how to fix it.

The model registry is refreshed with the latest models, capabilities, and pricing. One fix there is worth a sentence: models.dev started shipping partial release dates like `2025-09` and `2025`, RubyLLM was turning those into invalid timestamps, and model loading broke. 1.16 normalizes them to real dates so the registry keeps loading.
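
To illustrate the kind of normalization involved, here is a sketch that turns `2025-09` and `2025` into real dates. It is not RubyLLM's actual code. The helper name is hypothetical, and filling a missing month or day with `1` is an assumption about what a sensible default looks like.

```ruby
require 'date'

# Accepts "YYYY", "YYYY-MM", or "YYYY-MM-DD" and returns a Date.
# Missing month or day defaults to 1. Returns nil for anything else.
def normalize_release_date(value)
  match = /\A(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?\z/.match(value.to_s)
  return nil unless match

  year, month, day = match.captures.map { |part| part&.to_i }
  Date.new(year, month || 1, day || 1)
rescue ArgumentError
  nil # out-of-range values such as month 13
end

%w[2025-09-15 2025-09 2025 soon].each do |raw|
  puts "#{raw.inspect} -> #{normalize_release_date(raw).inspect}"
end
```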

The [full release notes](https://github.com/crmne/ruby_llm/releases/tag/1.16.0) have the complete list.

## Use It

```ruby
gem 'ruby_llm', '~> 1.16'
```

```bash
bundle update ruby_llm
```

It's backwards compatible. Concurrency is opt-in, instrumentation stays inert until you subscribe, and every new `*_api_base` falls back to the provider default. Nothing you've built changes until you decide to reach for it. The boring infrastructure is just there now, waiting for the day your app stops being a demo and starts being production.
