Awesome Free Compute
Home / LLMs

🪙 The Free Token-Maxxing Guide (2026)

A messiah-guide to extracting the maximum amount of frontier-model usage — tokens and compute — for $0, legitimately.

Last verified: June 2026. Free tiers, models, and limits change weekly. Every number below was cross-checked against an official pricing/docs page or the auto-generated cheahjs/free-llm-api-resources list; where a provider hides exact caps behind a login or a JS page, it's flagged.

❗ Important

Token-maxx, don't token-cheat. Everything here uses tiers you genuinely qualify for. Creating fake/duplicate accounts, cycling cards to re-trigger trials, scraping/sharing keys, or scripting around rate limits violates provider ToS, gets you banned, and risks killing the free programs for everyone. The sustainable move is aggregating many legit sources, not cheating one source many times. See §9.

[!TIP] Making media, not just text? See the companion Free Audio Generation Guide 🎙️ — free music, voice (TTS + cloning), and transcription, with a hard focus on which "free" tools you can actually monetize.


Contents


TL;DR — the messiah stack

For a solo creator / power user who wants frontier quality at $0:

Need Use Why
🧠 Best free frontier brain Google AI Studio (browser) + free Gemini API key The only place you get true frontier (Gemini 3 Pro / 3.1 Pro) free, ~1M context, multimodal. Price = your data trains Google.
Fastest free tokens Cerebras (gpt-oss-120b ~3000 tok/s) 1,000,000 tokens/day free, blazing fast.
💻 Free coding Gemini Code Assist (~180k completions/mo) + Gemini CLI (1,000 agent req/day) ~90× Copilot Free's completion cap.
🤖 Free agentic coding Cline / Aider + a free key (Gemini-free → Groq-free → OpenRouter :free) Model-agnostic OSS agent on free backends = $0.
🔁 Recurring free GPU Modal ($5/mo free, $30/mo with a card on file) Best standing serverless-GPU allowance; resets monthly.
♾️ Truly unlimited Local (Ollama / LM Studio on Apple Silicon) Bounded only by your hardware, ToS-clean, private.
🧰 Glue LiteLLM or OpenRouter as one endpoint, fallback chain Route local → free → :free → cheap-paid automatically.
🇨🇳 Near-free tokens DeepSeek / Doubao cache-hit input ~100–350M cached input tokens per $1 (§11.1).
🎬 Free commercial video/image Open weights: Wan 2.2 / CogVideoX-2B / Kolors on free GPU No watermark, commercial-OK, $0 (§11.5).

One-liner: Chat & long-context → AI Studio. Coding → Gemini Code Assist + CLI. Agentic/overflow → Cline with Gemini-free + Groq-free + OpenRouter :free. Bulk → local models. Glue it with LiteLLM and lean on prompt caching + batch APIs.


1. What "frontier" means in 2026

As of mid-2026, the frontier tier is roughly GPT-5.5 / Gemini 3.1 Pro / Claude Opus 4.8. Free tiers usually hand you the mid model of the current generation (Flash / mini / Sonnet / Haiku) with limited or metered access to the very top model. The two big exceptions where you can touch a genuine frontier model for free: Google AI Studio (Gemini 3 Pro / 3.1 Pro in the browser) and, briefly, trial credits / student offers.


2. Free frontier in the browser (chat apps)

The zero-setup layer. Most give a frontier-ish model with dynamic, unpublished caps that silently downgrade you after a threshold.

App Free model Free limit Notes
Google AI Studio 🥇 Gemini 3 Pro / 3.1 Pro (frontier) Very generous browser use; "rate limits may apply" Best free frontier. Free-tier data used for training.
Gemini app Gemini 3 Flash (limited 3 Pro / Deep Think) Daily caps (unpublished); ~2-month AI Pro trial offered
ChatGPT Free GPT-5.x ("auto"), downgrades to mini after cap Dynamic, unpublished; now shows ads Keep threads short to avoid re-sending context.
Claude.ai Free Sonnet 4.5 (Opus is paid) Rolling ~5-hr window cap (unpublished) Web search included.
Microsoft Copilot GPT-5-class Generous; light peak throttling Best free "GPT-5 chat" fallback.
Perplexity Free Frontier in "Pro Search" Unlimited quick + a few Pro searches/day (~3–5, unverified) Best for cited web answers.
Duck.ai (DuckDuckGo) GPT-5 mini, Claude Haiku 4.5, Llama 4 Scout, gpt-oss-120b Free, rate-limited Private, no account.
DeepSeek / Qwen / Kimi / Le Chat V3.x/R1 · Qwen3 · K2 · Mistral-large-class Generous free Strongest "generous free" non-US chat for bulk work.
Meta AI Llama 4 Free (WhatsApp/IG/web) Region-limited.

Maximize: do long-context, multimodal, and reasoning-heavy work in AI Studio instead of burning capped ChatGPT/Claude messages. Caps on rows without a published number are approximate.


3. Free LLM APIs — the token buffet

Perpetual (or near-perpetual) free API tiers. Numbers verified against cheahjs/free-llm-api-resources and provider docs, June 2026.

Provider Top free model(s) Free limit Card? Notes
Google AI Studio / Gemini API 🥇 Gemini 3.5 Flash, 3 Flash, 2.5 Flash 250k TPM · 20 RPD · 5 RPM (3.1 Flash-Lite: 500 RPD); Gemma 3: 14,400 RPD No Best free access to a top closed model. Data trains Google.
Cerebras gpt-oss-120b, Llama 3.1 8B 30 RPM · 60k TPM · 1,000,000 tokens/day No ~3000 tok/s — fastest free tokens.
Groq Llama 3.3 70B, gpt-oss-120b, Qwen3-32B Per-model, e.g. Llama 3.3 70B = 1,000 RPD; Llama 3.1 8B = 14,400 RPD No Very fast; verify caps at /settings/limits.
OpenRouter Many :free slugs (gpt-oss-120b, Qwen3-Coder, GLM-4.5-Air, Kimi, Nemotron 3…) 20 RPM · 50 RPD (→ 1,000 RPD with a $10 lifetime top-up); shared quota No (for 50/day) Best rotating buffet + built-in fallbacks.
Mistral La Plateforme Open + proprietary Mistral 1 req/s · 500k TPM · 1,000,000,000 tokens/month Phone Free "Experiment" plan requires opting into data training.
Cloudflare Workers AI Llama 3.3 70B, gpt-oss-120b, GLM-4.7-flash, Kimi K2.6 10,000 neurons/day No Great background trickle; 70B outputs burn neurons fast.
Cohere command-a-* (incl. reasoning/vision) 20 RPM · 1,000 requests/month (shared) No Genuinely free eval keys, low cap.
GitHub Models GPT-5, o3/o4-mini, Grok 3, DeepSeek-R1, Llama 4, Phi-4 Tiny, tied to Copilot tier (Free ≈ 15 RPM/150 RPD) No Best legal way to sample closed top models.
NVIDIA NIM Nemotron, Llama, many open 40 RPM Phone Context-limited; good for POCs.
HF Inference Providers Routes to Groq/Cerebras/Together/etc. $0.10/mo credits (PRO $2/mo) No Universal fallback, tiny budget.
Vercel AI Gateway Routes to many providers $5/mo No One key, many models.

Best picks: Gemini (quality) → Cerebras (speed + volume) → Groq (speed) → OpenRouter :free (variety/overflow) → Cloudflare (automation trickle).

Not really free in 2026: Together (min $5 purchase), DeepSeek direct (balance-based), Fireworks (payment profile). Treat as cheap-paid, not free.


4. Free coding assistants + the BYO-key power move

Tool Free model Free limit Notes
Gemini Code Assist (individual) 🥇 Gemini 2.5/3-class, 128k context ~180,000 completions/month ("90× other free assistants") Most generous completions by far. VS Code + JetBrains.
Gemini CLI Gemini 3 (Flash+Pro), 1M context 60 RPM · 1,000 agent requests/day (personal account) Open-source (Apache-2.0).
GitHub Copilot Free GPT-5-class / Claude / Gemini (shared) 2,000 completions/mo + limited chat/agent Agent mode, MCP, Copilot CLI. ⚠️ New Pro/Pro+/Max/Student signups paused Apr 20 2026.
Cline / Roo Code (OSS) Any (BYO key) Unlimited if you BYO a free key Gemini, OpenRouter 200+, Groq/Cerebras, local Ollama/LM Studio.
Aider (OSS CLI) Any (BYO key) Unlimited (BYO) Great with Gemini-free / DeepSeek.
Continue.dev (OSS) Any (BYO key) Free (BYO) ⚠️ Repo now read-only / maintenance-freeze — prefer Cline/Roo.
Zed Limited hosted + BYO key Limited free hosted prompts; unlimited w/ own key Top models are Pro-only.

The power move: bring-your-own free key → $0 agentic coding

Run an OSS agent (Cline / Aider / Roo / Gemini CLI) and point it at a free API tier:

  1. Quality: Gemini free key — aistudio.google.com/apikey. Free "Gemini 3-class"; CLI gives 60 RPM / 1,000 req/day. (Prompts may train Google.)
  2. Speed: Groq free — fast Llama / gpt-oss / Qwen.
  3. Overflow: OpenRouter :free — a free DeepSeek/Qwen/Llama variant when the others cap out.

Recommended free stack: Cline (or Aider) + Gemini-free, with Groq-free secondary and OpenRouter :free as overflow. Add MCP servers for tools.


5. Trial credits — one-time fuel

Use once, on one real account, as intended. (From cheahjs/free-llm-api-resources.)

Provider Free credit Notes
Modal $5/mo free, $30/mo with a payment method on file Best recurring serverless GPU/CPU; pay-per-second.
Baseten $30 Pay by compute time.
NLP Cloud $15 Phone verification.
AI21 / Upstage $10 / 3 months each Jamba · Solar models.
SambaNova Cloud $5 / 3 months DeepSeek V3.x, Llama 4 Maverick, gpt-oss-120b. (A no-card free tier ~20 req/day/model also exists.)
Scaleway 1,000,000 free tokens Llama 3.3 70B, Qwen3, Mistral, gpt-oss-120b.
Alibaba Model Studio (Intl) 1,000,000 tokens/model Qwen open + proprietary; activate Intl/Singapore mode.
Inference.net $1 (+$25 on survey) Various open models.
Fireworks / Nebius / Hyperbolic $1 each Various open models.
Novita $0.50 for 1 year Various open models.

For cloud credits ($300 GCP, $200 Azure, AWS's new $100+$100, Oracle $300), and student/startup programs (GitHub Student Pack, Google for Startups up to $350k, AWS Activate up to $200k), see the main README §6.


6. Self-host OSS for "unlimited" tokens

When you run the model, tokens are free (bounded by hardware). All of these speak the OpenAI-compatible API, so they slot behind the same client code / hub.

Runtime Endpoint Best for
Ollama http://localhost:11434/v1 Easiest local; Mac-native.
LM Studio http://localhost:1234/v1 GUI + Responses API (works with Codex).
llama.cpp llama-server Lightweight C/C++, GGUF quant on CPU/GPU.
vLLM vllm serve <model> High-throughput serving, prefix caching; runs on Apple Silicon.

Hardware reality: Apple Silicon unified memory lets an M-series Mac address lots of RAM as effective "VRAM," which is why it's the popular local-LLM choice. A 16–24 GB machine comfortably runs 8–14B models (Llama 3 8B, Qwen3-14B, DeepSeek-R1-Distill); 32B+ needs more.

Free GPU to host bigger models / media: Hugging Face Spaces (free high-VRAM GPU time via ZeroGPU), Kaggle (30 GPU-hrs/week), Modal ($30/mo). Full details and the GPU→model map are in the main README §2–§3. Oracle's Always-Free ARM (4 OCPU / 24 GB) makes a perfect 24/7 CPU LLM backend via llama.cpp (8B at ~5–10 tok/s, permanent).


7. The orchestration layer (wire it together)

The whole stack is glued by the OpenAI Chat Completions schema — change only base_url + api_key to repoint any client. Pick one hub:

  • OpenRouter (hosted, zero-ops): :free models, a models: [...] array for automatic fallback, and provider routing (sort, data_collection:"deny", ZDR).
  • LiteLLM (self-hosted proxy, http://0.0.0.0:4000): unify 100+ providers, load-balance multiple free keys you legitimately own under one model_name, fallback across model groups, and hard-enforce RPM (enforce_model_rate_limits,
  • Redis for multi-instance) so you degrade gracefully instead of getting 429-banned.
   Apps / coding tools (Aider · Cline · Roo · your scripts)   ── all speak OpenAI API
                              │
                              ▼
          ┌──────────────────────────────────────────┐
          │  ONE HUB:  LiteLLM proxy  — or — OpenRouter │
          └───────┬───────────────────────┬────────────┘
   routing /      │                       │   :free models, provider routing,
   fallback /     │                       │   model-array fallbacks
   load-balance   ▼                       ▼
   ┌────────────┬───────────────┬───────────────┬──────────────────────────┐
   │ Free tiers │ Trial-credit  │ Your OTHER    │ LOCAL (unlimited):        │
   │ (Gemini,   │ providers     │ legit free    │ Ollama, LM Studio,        │
   │ Groq, …)   │ (Modal, …)    │ keys (yours)  │ llama.cpp, vLLM           │
   └────────────┴───────────────┴───────────────┴──────────────────────────┘

Tiered fallback chain (cheapest-first): local model → free hosted tier (Groq/Gemini/Cerebras) → OpenRouter :free → small paid model as last resort.

7.1 Add a gateway, tracing & tools (all $0)

Once a hub is routing your tokens, three cheap upgrades make the stack production-grade:

  • AI gateways (caching, retries, budgets, analytics in front of any provider): Cloudflare AI Gateway (managed, free — caching/rate-limit/logging/fallback; free plan keeps 100k logs, 10 gateways, and BYOK avoids markup), or self-host LiteLLM / Bifrost (Apache/MIT) for virtual keys + spend caps. Portkey & Helicone ship OSS gateways too.
  • Observability / evals (see where tokens go, catch regressions): 🥇 Langfuse (MIT — self-host, or 50k-units/mo free cloud) for traces, cost, prompt management and LLM-as-judge; lighter options are Helicone (Apache, 10k req/mo free), Arize Phoenix (local OTel) and OpenLLMetry (Apache, export to any OTel backend).
  • MCP (Model Context Protocol) — the open "USB-C for AI": one tool server (files, DBs, APIs, search) plugs into Claude, Copilot, Cline, Cursor. Find free servers in the official modelcontextprotocol/servers repo, the GitHub MCP Registry, Smithery, mcp.so, or the Docker MCP Catalog (300+ containerized — safest). ⚠️ Only attach trusted servers with least-privilege tokens — a malicious MCP server is a prompt-injection / secret-exfil vector.

$0 glue pick: LiteLLM (route) + Langfuse self-host (trace) + Docker MCP Toolkit (tools), plus n8n (self-host) or Dify (OSS) for visual automation / RAG. Need the data + hosting layer too? See the Free Ship-It Stack 🚀 (DBs, vector/RAG, embeddings, hosting, storage, auth — all $0, commercial-use traps flagged).


8. Quota-stretching: caching & batching

Make each free token go further:

  • Prompt caching (reuse a fixed prefix cheaply):
  • Anthropiccache_control: {type:"ephemeral"}; cache reads ≈10% of base input (~90% off).
  • Geminiimplicit caching auto-on for 2.5+; explicit via caches.create() (guaranteed savings). Put stable content at the start of the prompt.
  • DeepSeek — disk caching on by default, no code changes (prompt_cache_hit_tokens in usage).
  • Batch APIs (50% off + higher limits) for non-interactive jobs: OpenAI Batch (24-hr turnaround) and Anthropic Message Batches (most finish < 1 hr).
  • Trim & stabilize: keep a stable system-prompt prefix (maximizes cache hits + batch efficiency), summarize old turns, pin num_ctx locally to avoid silent truncation/OOM.

9. Legit vs risky

✅ Clearly legit & sustainable

Official free tiers (AI Studio, Groq, Cerebras, Cohere, NIM, Mistral, Cloudflare, GitHub Models) · trial credits used as intended on one real account · OpenRouter :free · OSS self-hosting · student/startup programs you actually qualify for.

Legit ≠ private. Many free tiers train on your inputs (Google AI Studio free, Mistral's free Experiment plan, etc.). Never paste secrets, proprietary, or sensitive code into a free tier.

⚠️ Grey / risky — flagged, not endorsed (no how-to)

Practice Why it's risky Consequence
gpt4free-style reverse proxies Scrape/automate vendor chatbots (HAR/cookies) — violates source ToS Constant breakage, takedowns, no privacy/reliability
Multi-accounting / fake accounts Circumvents the per-account caps that keep free tiers alive Bans/termination
Card cycling to re-trigger trials Defeats trial eligibility; crosses payment terms Payment bans, possible fraud liability
Sharing / scraping API keys Keys are credentials; unauthorized use Revocation; owner eats the bill; legal exposure
Bypassing rate limits Contravenes technical controls 429 → hard ban; collateral damage to the whole community

10. The stacking playbook

A realistic $0 routine for a solo AI creator:

  1. Daily driver: Google AI Studio for ideation/long-context/multimodal; Gemini Code Assist + CLI for coding.
  2. Agentic coding: Cline/Aider → Gemini-free → Groq-free → OpenRouter :free, behind LiteLLM with per-key RPM caps + fallbacks.
  3. Bulk / iterative grunt work: local Ollama/LM Studio (lint fixes, drafts).
  4. Speed bursts: Cerebras (1M tok/day) / Groq.
  5. GPU jobs (media, fine-tunes): Modal ($30/mo) for serverless bursts; Kaggle (30 h/wk) for long headless runs; Oracle A1 as a free 24/7 backend.
  6. Stretch quotas: prompt caching (~90% off) + batch APIs (50% off) on stable prompts.
  7. One-time fuel: spend trial credits (Modal/Baseten/SambaNova/Scaleway…) on real deployable workloads; claim GitHub Student Pack / startup programs if eligible.
  8. Hygiene: set budgets, disable auto-recharge, delete idle GPU/volumes, and never feed secrets to a training-on free tier.

11. The China play 🇨🇳 — cheapest tokens + free media

Chinese labs are in an all-out price/feature war, which is fantastic for a token-maxxer. Three things stand out: near-free token pricing via context caching, flat coding subscriptions at a fraction of Claude's price, and genuinely free image/video (especially via open weights). The trade-offs — data jurisdiction, watermarks, ToS — are real; see §11.6.

Versions move fast. As of June 2026 the current models are GLM-5.1 / 4.7, Kimi K2.7, MiniMax M3, DeepSeek V4, Kling 3.0, Hailuo 2.3, Seedance 2.0, CogVideoX-3, Wan 2.2, HunyuanVideo 1.5. Re-verify names before relying on them.

11.1 "100M tokens for ~$1" — the cache-hit trick

The viral claim is real, but only for input tokens served from cache — not output. The mechanism is context caching: a provider bills a repeated prompt prefix (system prompt, long doc, codebase, chat history) at ~1–2% of the normal price. Agentic coding naturally creates this, because every tool-call resends the same growing context. (DeepSeek prices verified on its official pricing page, 2026-06-12; the prefix cache is automatic — it only matches an identical prefix from token 0.)

Provider · model Cache-hit input /1M Cache-miss input /1M Output /1M $1 buys (cache-hit input)
DeepSeek V4 Flash $0.0028 $0.14 $0.28 ≈ 357M tokens
DeepSeek V4 Pro $0.003625 $0.435 $0.87 ≈ 276M tokens
ByteDance Doubao seed-1.6-flash ≈$0.0042 (¥0.03) ≈$0.021 (¥0.15) ≈$0.21 (¥1.50) ≈ 239M tokens
Doubao seed-1.6-flash batch ≈$0.0042 ≈$0.0105 (¥0.075) ≈$0.10 input ≈ 96M / $1
DeepSeek (2024 cache launch price) $0.014 $0.14 ≈ 71M tokens

Where the meme came from: Hacker News users (May 2026) reported running DeepSeek V4 Pro + opencode and burning ~100M tokens for ~$2 in a day, "majority cache tokens." Plausible against official pricing with a high cache-hit ratio and limited output.

The honest version: fresh input is ~$0.14/M (≈7M/$1) and output is 100–300× the cache-hit price, so a real mixed bill is higher. Mechanisms that actually work: automatic context cache (DeepSeek, Doubao, Kimi) + batch APIs (~50–60% off, non-real-time). ⚠️ A specific off-peak discount could not be confirmed on DeepSeek's live 2026 page (it existed historically) — treat off-peak as unverified.

11.2 Ultra-cheap "coding plan" subscriptions

Chinese labs sell flat monthly coding plans that undercut Claude Code massively. The wiring is officially supported, not a hack: most expose an Anthropic-compatible endpoint, so you point Claude Code / Cline / Roo at it:

// ~/.claude/settings.json  (Z.ai GLM example, per docs.z.ai)
{ "env": {
  "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
  "ANTHROPIC_AUTH_TOKEN": "<your_key>",
  "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.1",
  "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7"
}}
Plan Price (2026) Model(s) Rough quota Notes
Z.ai GLM Coding Plan – Lite ~$18/mo advertised (launched ~$3/mo late 2025) ⚠️verify live GLM-5.1 / 4.7 / 4.5-Air ~80 prompts/5h, ~400/wk; "≈3× Claude Pro usage" Flagship; widest tool support. ToS = supported tools only.
MiniMax Token Plan (ex-"Coding Plan") $20 / $50 / $120 MiniMax-M3 (+multimodal) 5-hr + weekly windows Bundles coding and image/video/music under one quota.
Moonshot Kimi K2.7 Code PAYG: $0.19 / $0.95 / $4.00 per 1M (hit/miss/out) kimi-k2.7-code, 256K ctx spend-budget caps Strong agentic coder; no cheap flat sub.
DeepSeek V4 PAYG: $0.14 / $0.28 (flash) per 1M deepseek-v4-flash/pro Cheapest credible Claude-Code backend.
Anthropic Claude (baseline) ~$20 Pro / ~$100–200 Max Claude The thing they're undercutting.
GitHub Copilot (baseline) $10 / $39 / $100 GPT/Claude/Gemini

⚠️ Alibaba Qwen's free coding tier was discontinued 2026-04-15 (1,000→100→0 req/day); Qwen Code now needs a paid plan or BYO key.

11.3 Free Chinese model tiers (text + multimodal)

Provider Genuinely-free API Free chat app Open weights Intl friction
Zhipu GLM / Z.ai 🥇 glm-4.7-flash, glm-4.5-flash, glm-4-flash (text); glm-4v-flash, glm-4.6v-flash (vision); cogview-3-flash (image) — all $0 ⚠️rate limits login-gated chat.z.ai GLM-4.5/4.7 (4.5 = MIT) Low — Z.ai takes intl email/cards
Tencent Hunyuan 1,000,000 tokens free, valid 1 year (Tencent Cloud) yuanbao.tencent.com Hunyuan-A13B (13B active, 256K ctx) Med — Cloud acct verification
Alibaba Qwen Model Studio Intl: 1M tokens/model, 90 days, Singapore region (enable "Free Quota Only" to avoid overage) chat.qwen.ai Qwen3 (Apache-2.0) Med
DeepSeek No free API (cheap PAYG) chat.deepseek.com DeepSeek-V3 / R1 Low
Moonshot Kimi No free API kimi.com Kimi K2 (huge) Low
ByteDance Doubao Ark trial credits only doubao.com High (CN-first)

🏆 The standout: Zhipu's *-Flash family — the only major Chinese lab with a genuinely $0 API spanning text, vision, and image (and historically video). Sign up on the international Z.ai portal (intl email/card OK).

Best $0 self-host weights (run free on free GPU — see §6): Qwen3, DeepSeek-V3/R1, GLM-4.5/4.7, Hunyuan-A13B, ERNIE 4.5, Kimi K2, Yi, MiniMax-M1.

11.4 Free image generation

Model Provider Free path Watermark / commercial Open weights Friction
CogView-3-Flash 🥇 Zhipu Free via API (Z.ai) ⚠️limits login-gated None / commercial OK Low
Kolors (可图) 🥇 Kuaishou Open weights (self-host) + Kling app None / commercial (<300M MAU) Zero (self-host)
Hunyuan-DiT Tencent Open weights (self-host) None / commercial (<100M MAU) Zero (self-host)
Dreamina / Seedream ByteDance App ~60 credits/day Watermark / personal only Low (Google login)
Tongyi Wanxiang Alibaba App (free) / API trial App watermark; API commercial High (app needs +86)
ERNIE-ViLG Baidu App ~50 signup credits Watermark / personal Extreme (+86, real-name)

Verdict: self-host Kolors (top quality, clean license, runs on free GPU) or hit the CogView-3-Flash API (no server needed) for commercial-grade cover art and thumbnails at $0.

11.5 Free video generation (for music videos)

The key insight: free app tiers almost all watermark and forbid commercial use — so for monetized music videos, the free and legally-clean path is OPEN WEIGHTS, run free on free GPU (HF ZeroGPU / Kaggle / Colab / Modal) via ComfyUI or Diffusers.

🔓 Self-host open weights — free, no watermark, commercial:

Model Best free-GPU fit VRAM (quantized) Clip / res License
CogVideoX-2B 🥇 lowest barrier Free Colab T4 ~4 GB 6s · 720×480 Apache-2.0
CogVideoX-5B / 1.5-5B Kaggle, HF ZeroGPU ~5–10 GB up to 10s · 1360×768 commercial OK
Wan 2.2 TI2V-5B 🥇 best license HF ZeroGPU Space, RTX 4090 ~16–24 GB 720p · 24fps, T2V+I2V Apache-2.0
HunyuanVideo 1.5 HF ZeroGPU / Modal; GGUF on 12–16 GB 12–24 GB (FP8/GGUF) 5s · 720p commercial except EU/UK/KR
Step-Video-T2V (multi-GPU only) ~78 GB ~8.5s heavy — skip on free

📱 Free app tiers (easy, but watermark + personal/non-commercial — good for drafts):

App Free allowance Free output Notes
Kling AI (kling.ai) 🥇 66 credits/day (≈1–3 clips) 5s · 720p Best quality free app; T2V+I2V+Motion Control. Free queue can wait hours.
Hailuo (hailuoai.video) daily bonus (≈2–3/day) ⚠️ 6s · 768p · 24fps 15 cinematic camera commands. Pair with MiniMax Music API for an original track.
Vidu / PixVerse monthly/daily credits ⚠️unverified varies Vidu = character-reference (consistent characters across a video).

Verdict for music videos: 1. 🥇 Wan 2.2 TI2V-5B (self-host) — cleanest license (Apache-2.0), 720p/24fps, runs on a free HF ZeroGPU Space or a 4090. Publishable & monetizable. 2. 🥇 CogVideoX-2B (self-host) — runs on a free Colab T4; churn many short clips to cut to a beat. 3. Kling / Hailuo free apps — fast drafts & storyboards (watermarked). 4. Cheapest paid fallback: Zhipu CogVideoX-3 API ~$0.20/video (4K/60fps/audio).

🎙️ Need the soundtrack too? Video is only half a music video — see the companion Free Audio Generation Guide for free, commercially-clean music (ACE-Step), voice/TTS, and lyric-timing/transcription (WhisperX), and the Media Post-Production Guide 🎬 to finish it (4K upscale, 60fps, mastering, subtitles, dubbing).

⚠️ "CogVideoX-Flash free" is not confirmed for 2026 — there's no free video model on international Z.ai (CogVideoX-3 is paid). The free Flash video tier was a Chinese-BigModel feature and may be retired; verify before relying on it.

11.6 China caveats

  • Data jurisdiction: these run under PRC jurisdiction and inputs may be used to train models. Don't send secrets, proprietary, or regulated code/codebases.
  • Watermark + non-commercial on free app tiers: for anything you'll monetize (incl. YouTube music videos), use open weights (Wan / CogVideoX / Hunyuan / Kolors) — free and commercially clean.
  • Don't prompt copyrighted characters: Disney/Universal/Warner sued MiniMax (Sept 2025) over Hailuo reproducing IP. Generate original characters only.
  • ToS: Z.ai's coding plan restricts use to officially supported tools and is individual-use only (repeated violations → ban). One legit account per service — Anthropic publicly accused MiniMax (Feb 2026) of using thousands of fraudulent accounts; don't be a cautionary tale.
  • Geopolitics/availability: some orgs prohibit sending code to Chinese endpoints; access and pricing can change (Qwen's free tier vanished; GLM's entry price rose). Verify current terms before committing.

Living lists to bookmark


Sources

Compiled June 2026 from a multi-model research sweep (GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro), with every number cross-checked against primary docs:

  • Free LLM tiers & limits: github.com/cheahjs/free-llm-api-resources
  • Gemini API pricing / rate limits: ai.google.dev/gemini-api/docs/pricing, /rate-limits
  • Gemini Code Assist: blog.google/.../gemini-code-assist-free/, developers.google.com/gemini-code-assist/resources/quotas
  • Gemini CLI: github.com/google-gemini/gemini-cli
  • GitHub Copilot Free + signup pause: docs.github.com/copilot/get-started/plans
  • OpenRouter: openrouter.ai/docs (free variants, fallbacks, provider routing)
  • LiteLLM: docs.litellm.ai/docs/proxy/load_balancing, /reliability
  • Cerebras: inference-docs.cerebras.ai; Groq: console.groq.com/docs/rate-limits
  • Cloudflare Workers AI: developers.cloudflare.com/workers-ai/platform/pricing
  • Prompt caching / batch: platform.claude.com/docs, ai.google.dev/gemini-api/docs/caching, developers.openai.com/api/docs/guides/batch
  • Local runtimes: ollama.com, lmstudio.ai, github.com/ggml-org/llama.cpp, github.com/vllm-project/vllm
  • Modal pricing: modal.com/pricing
  • China — token pricing: api-docs.deepseek.com/quick_start/pricing, /news/news0802 (context cache), /news/news250929 (V3.2 DSA), volcengine.com/docs/82379/1544106 (Doubao)
  • China — coding plans: docs.z.ai/devpack, platform.minimax.io/docs/token-plan, platform.kimi.ai/docs/pricing, api-docs.deepseek.com/guides/anthropic_api
  • China — free model tiers: docs.bigmodel.cn/cn/guide/models/free/* (GLM/CogView/CogVideoX Flash), cloud.tencent.com/document/product/1729/97731 (Hunyuan 1M/yr), alibabacloud.com/help/en/model-studio/new-free-quota
  • China — image/video (open weights): github.com/Kwai-Kolors/Kolors, github.com/Tencent/HunyuanDiT, github.com/Wan-Video/Wan2.2, github.com/THUDM/CogVideo, github.com/Tencent-Hunyuan/HunyuanVideo; apps: kling.ai, hailuoai.video

Numbers are point-in-time (June 2026) and change frequently — verify before relying on them. Corrections via PR welcome.


📝 Spotted a stale quota or a license that changed? This guide is open source — edit it on GitHub.