
GPU CLI

@gpucli

Run any command on a remote GPU from your terminal

4 Followers · 16 Following · 42 Posts · Joined 10.02.2026

Latest posts by GPU CLI @gpucli

Remember: Fine-Tuning teaches the model how to respond, not what to know, so build accordingly!

16.03.2026 23:01 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

3. Are you just trying to teach the model "new" information?

Opt for an alternative like RAG, since fine-tuning struggles to memorize specific facts.

16.03.2026 23:01 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

2. Are you trying to reduce prompt token usage at scale?

Use Fine-Tuning as it lets you remove hundreds of words of system prompting from every API call, reducing token usage, lowering latency, and saving money at scale over time.

16.03.2026 23:01 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

1. Are you trying to lock down consistent output formatting?

Use Fine-Tuning because it builds "muscle memory" into the model's weights, allowing it to follow complex structures (like JSON schemas) reliably without needing a long list of instructions every time.
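For formatting-focused fine-tuning, the training data itself carries the "instructions". A sketch of one training example in the common chat-style JSONL format (field names follow the widely used OpenAI-style schema; your provider's may differ, and the extraction task shown is purely illustrative):

```json
{"messages": [
  {"role": "user", "content": "Extract the order details: 2x GPU credits for Alice"},
  {"role": "assistant", "content": "{\"customer\": \"Alice\", \"item\": \"GPU credits\", \"quantity\": 2}"}
]}
```

(Wrapped here for readability; real JSONL puts each record on a single line.) A few hundred examples like this teach the schema without any system prompt.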

16.03.2026 23:01 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

#FineTuning is a powerful tool for levelling up your #LLM, but when should you use it and why?

Here's a quick checklist:

16.03.2026 23:01 πŸ‘ 0 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

3. Enable Dry Runs

If an agent runs the wrong command, it can cause real problems. A `--dry-run` flag is a crucial safety net: it lets agents validate the request locally and assess the effect of their actions before pulling the trigger.
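The pattern in miniature (a hypothetical `deploy` subcommand, not the real gpu CLI): check for the flag, report what would happen, and exit without side effects.

```shell
# Hypothetical sketch of a --dry-run safety net.
deploy() {
  dry_run=false
  for arg in "$@"; do
    [ "$arg" = "--dry-run" ] && dry_run=true
  done
  if [ "$dry_run" = true ]; then
    # Validate and report; make no changes.
    echo "dry-run: would start 2 workers (no changes made)"
    return 0
  fi
  echo "starting 2 workers..."
}

deploy --dry-run
```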

14.03.2026 16:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

2. Mitigate Common Agentic Errors

Where a human may make a typo, an agent may generate a path traversal or double-encode a URL. To mitigate this, ensure your CLI applies strict input hardening and sanitises everything.
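A minimal input-hardening sketch (illustrative only, not the gpu CLI's actual validation): reject traversal sequences and percent-encoded dots before any input touches the filesystem.

```shell
# Reject path traversal ("..") and double-encoded dots ("%2e"/"%2E").
safe_path() {
  case "$1" in
    *..*|*%2[eE]*)
      echo "rejected: $1"
      return 1 ;;
    *)
      echo "ok: $1" ;;
  esac
}

safe_path "../../etc/passwd" || true
safe_path "models/weights.bin"
```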

14.03.2026 16:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

1. Raw JSON > Custom Flags

While flags make passing arguments to the CLI easier for humans, agents prefer passing the JSON in its entirety. Add a `--json` path to commands so agents can pass the full API payload with zero translation loss.
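The two paths side by side, as a self-contained stub (subcommand and flag names are illustrative, not confirmed gpu CLI syntax): humans use flags the CLI translates, agents hand over the payload verbatim.

```shell
# Hypothetical subcommand accepting either flags or a raw --json payload.
run_deploy() {
  if [ "$1" = "--json" ]; then
    # Agent path: full API payload, zero translation loss.
    payload="$2"
  else
    # Human path: the CLI translates flags into the payload itself.
    payload="{\"template\": \"$2\", \"workers\": $4}"
  fi
  echo "$payload"
}

run_deploy --json '{"template": "vllm", "workers": 2}'
run_deploy --template vllm --workers 2
```

Both invocations produce the identical payload; the agent path just skips the flag-to-field mapping step.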

14.03.2026 16:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

#CLIs are becoming an increasingly important tool for #agents to leverage, but is your CLI designed to work with agents and not against them?

Here are 3 tricks to help agents get the most out of your CLI tool.

14.03.2026 16:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
GPU CLI Documentation | Run code on cloud GPUs by prefixing any command with `gpu run`

For more information on how to leverage GPU Serverless, check out our docs gpu-cli.sh/docs

13.03.2026 23:01 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Then run `gpu serverless deploy` and get your endpoint.

Your server provider handles worker provisioning & scaling while you keep a single CLI flow for deployment, status checking, warming & deletion.

13.03.2026 23:01 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

The model is simple: just start by defining your settings in the serverless section of your `gpu.jsonc`
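Something like the following, as a hedged sketch. The section and field names here are illustrative guesses, not confirmed config keys; check gpu-cli.sh/docs for the real schema.

```jsonc
// Hypothetical gpu.jsonc sketch -- field names are assumptions.
{
  "serverless": {
    "template": "vllm",   // or "comfyui", "whisper"
    "gpu": "A100",
    "minWorkers": 0,      // scale-to-zero when idle
    "maxWorkers": 2
  }
}
```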

13.03.2026 23:00 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

GPU Serverless deploys and manages serverless endpoints for templates like:
- ComfyUI
- vLLM
- Whisper

So you stop managing and start shipping

13.03.2026 23:00 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Most ML teams do not lose on model quality; they lose on deployment friction.

GPU Serverless is built for that specific gap:
- Local-first workflow
- Managed serverless endpoint
- No custom orchestration layer

13.03.2026 23:00 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Good news! You can have scale-to-zero GPU inference without babysitting pods.

`gpu serverless` gives you managed endpoint deploys, warmups, and lifecycle control directly from the CLI.

13.03.2026 23:00 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Open-source models in 2026 are closing the gap with their closed-source counterparts. Have we hit the point where every dev should be at least experimenting with them?

1️⃣ Already am
2️⃣ Planning to this month
3️⃣ Still not worth the infra hassle
4️⃣ APIs will always win


07.03.2026 16:29 πŸ‘ 2 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

Lots of core team members of Alibaba Qwen are resigning publicly on X.

The gaping hole that Qwen imploding would leave in the open research ecosystem will be hard to fill. The small models are irreplaceable.

I’ll do my best to keep carrying that torch. Every bit matters.

03.03.2026 18:10 πŸ‘ 106 πŸ” 11 πŸ’¬ 3 πŸ“Œ 2

Then run `gpu serverless deploy` and get your endpoint.

Your server provider handles worker provisioning & scaling while you keep a single CLI flow for deployment, status checking, warming & deletion.

03.03.2026 00:01 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

The model is simple: just start by defining the config in `gpu.jsonc`

03.03.2026 00:01 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

GPU Serverless deploys and manages serverless endpoints for templates like:
- ComfyUI
- vLLM
- Whisper

So you stop managing and start shipping

03.03.2026 00:01 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Most ML teams do not lose on model quality; they lose on deployment friction.

GPU Serverless is built for that specific gap:
- Local-first workflow
- Managed serverless endpoint
- No custom orchestration layer

03.03.2026 00:01 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Good news! You can have scale-to-zero GPU inference without babysitting pods.

`gpu serverless` gives you managed endpoint deploys, warmups, and lifecycle control directly from the CLI.

03.03.2026 00:01 πŸ‘ 1 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
GitHub - gpu-cli/skills: AI Dev Skills

Don't get caught with your pants down; find this skill and more in our repo github.com/gpu-cli/skills

28.02.2026 17:03 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Skills are handy, but importing unverified skills into a repo is one of the easiest ways to introduce security risks to your #VibeCoding projects.

That's why we created skill-shield, an easy way to validate and/or rewrite skills without security risks.

28.02.2026 17:03 πŸ‘ 1 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
GitHub - gpu-cli/skills: AI Dev Skills

Find this skill and more at our repo github.com/gpu-cli/skills

21.02.2026 17:02 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Context management is a full-time job you didn't audition for πŸ’Ό

Stop letting "context drift" kill your flow. Our context-curator skill automates agentic context so it evolves in step with your features.

Focus on the orchestration. Let your agents clean after themselves 🧹

21.02.2026 17:02 πŸ‘ 1 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

Hunyuan3D V2 is the top choice for AI-generated 3D right now, and the best part? You can run it locally for free.

Check the link in the comments.

26.01.2025 14:56 πŸ‘ 11 πŸ” 2 πŸ’¬ 2 πŸ“Œ 1

Finally, run `claude --model <model_name>` and you should see your model loaded and ready to go in the terminal!

18.02.2026 00:58 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Next, we need to set some environment variables. This can be done inline via the command line, or better yet in your shell config file (.zshrc/.bashrc)
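A hedged sketch of what those shell-config lines might look like. The variable names (`ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_MODEL`) and the model name are assumptions; confirm them against your Claude Code version's docs before relying on them.

```shell
# Add to ~/.zshrc or ~/.bashrc (variable names are assumptions).
export ANTHROPIC_BASE_URL="http://localhost:11434"  # Ollama; use :8000 for vLLM
export ANTHROPIC_AUTH_TOKEN="dummy"                 # local servers typically ignore it
export ANTHROPIC_MODEL="qwen2.5-coder"              # hypothetical: whatever model you serve
```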

18.02.2026 00:58 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

First, make sure your Ollama or vLLM setup is up and running. GPU CLI makes this incredibly easy, but just make sure you have your endpoints handy. For this walkthrough we'll assume you're serving Ollama from localhost:11434 and vLLM from localhost:8000

18.02.2026 00:58 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0