<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Inference on Caminho Solo</title><link>https://www.caminhosolo.com.br/en/tags/inference/</link><description>Recent content in Inference on Caminho Solo</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Wed, 01 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://www.caminhosolo.com.br/en/tags/inference/index.xml" rel="self" type="application/rss+xml"/><item><title>Liquid AI LFMs: Run Competitive AI Models Without Per-Token Costs</title><link>https://www.caminhosolo.com.br/en/2026/04/liquid-ai-lfm-solo-builders/</link><pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate><guid>https://www.caminhosolo.com.br/en/2026/04/liquid-ai-lfm-solo-builders/</guid><description>The decision most solo builders keep postponing. At some point every indie developer building with AI hits the same wall: the product works, users are coming in, and then you look at the API bill and realize your unit economics are broken.</description></item><item><title>vLLM: How to Serve LLMs in Production with High Throughput</title><link>https://www.caminhosolo.com.br/en/2026/03/vllm-inference-production/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://www.caminhosolo.com.br/en/2026/03/vllm-inference-production/</guid><description>TL;DR: vLLM is an open-source inference engine that delivers 2-4x more throughput than traditional solutions, with 50-80% lower costs than external APIs for high-volume usage. Recommended for products exceeding 100k tokens/month.</description></item></channel></rss>