<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Casola Blog</title><description>Product updates, engineering deep dives, and practical guides from the Casola team.</description><link>https://www.casola.ai/</link><item><title>One inference platform, four API surfaces</title><link>https://www.casola.ai/blog/api-compatibility/</link><guid isPermaLink="true">https://www.casola.ai/blog/api-compatibility/</guid><description>How OpenAI-, Anthropic-, and Fal.ai-compatible clients share the same dispatch backend with Casola&apos;s native API, and where they can&apos;t</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Every inference request comes with a compliance certificate</title><link>https://www.casola.ai/blog/data-sovereignty/</link><guid isPermaLink="true">https://www.casola.ai/blog/data-sovereignty/</guid><description>Verifiable data residency built into every request, without dedicated infrastructure</description><pubDate>Wed, 08 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Building a GPU autoscaler that works: queueing theory and utilization metrics combined</title><link>https://www.casola.ai/blog/autoscaling/</link><guid isPermaLink="true">https://www.casola.ai/blog/autoscaling/</guid><description>Why utilization alone is the wrong scaling signal for GPU inference, and how arrival rate, Little&apos;s Law, and queue drain work better</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Where the milliseconds go in a GPU inference request</title><link>https://www.casola.ai/blog/latency/</link><guid isPermaLink="true">https://www.casola.ai/blog/latency/</guid><description>End-to-end latency decomposition across a multi-modal inference pipeline — and the five decisions that keep overhead off the critical path</description><pubDate>Wed, 11 Mar 2026 00:00:00 GMT</pubDate></item><item><title>GPU workers fail in interesting ways</title><link>https://www.casola.ai/blog/gpu-worker-failures/</link><guid isPermaLink="true">https://www.casola.ai/blog/gpu-worker-failures/</guid><description>From PCIe bus failures to cascading cloud outages: what actually breaks in a distributed GPU inference fleet, and how you build around it</description><pubDate>Thu, 26 Feb 2026 00:00:00 GMT</pubDate></item></channel></rss>