RTX 5090 in Australia March 2026 Pricing Stock Reality
RTX 5090 in AU is scarce and overpriced
Australia has RTX 5090 stock. Barely. And if you find one, you will pay a premium that feels detached from reality.
RTX 5090 in AU is scarce and overpriced
Australia has RTX 5090 stock. Barely. And if you find one, you will pay a premium that feels detached from reality.
Remote Ollama access without public ports
Ollama is at its happiest when it is treated like a local daemon: the CLI and your apps talk to a loopback HTTP API, and the rest of the network never finds out it exists.
Compose-first Ollama server with GPU and persistence.
Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.
HTTPS Ollama without breaking streaming responses.
Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.
RAG embeddings - Python, Ollama, OpenAI APIs.
If you are working through retrieval-augmented generation (RAG), this section walks through text embeddings in plain terms — what they are, how they fit search and retrieval, and how to call two common local setups from Python using Ollama or an OpenAI-compatible HTTP API (as many llama.cpp-based servers expose).
Serve open models fast with SGLang.
SGLang is a high-performance serving framework for large language models and multimodal models, built to deliver low-latency and high-throughput inference across everything from a single GPU to distributed clusters.
Hot-swap local LLMs without changing clients.
Soon you are juggling vLLM, llama.cpp, and more—each stack on its own port. Everything downstream still wants one /v1 base URL; otherwise you keep shuffling ports, profiles, and one-off scripts. llama-swap is the /v1 proxy before those stacks.
Most local AI setups start with a model and a runtime.
What actually happens when you run Ultrawork.
Oh My Opencode promises a “virtual AI dev team” — Sisyphus orchestrating specialists, tasks running in parallel, and the magic ultrawork keyword activating all of it.
OpenCode LLM test — coding and accuracy stats
I have tested how OpenCode works with several locally hosted on Ollama LLMs, and for comparison added some Free models from OpenCode Zen.
Meet Sisyphus and its specialist agent crew.
The biggest capability jump in OpenCode comes from specialised agents: deliberate separation of orchestration, planning, execution, and research.
OpenHands CLI QuickStart in minutes
OpenHands is an open-source, model-agnostic platform for AI-driven software development agents. It lets an agent behave more like a coding partner than a simple autocomplete tool.
Self-host OpenAI-compatible APIs with LocalAI in minutes.
LocalAI is a self-hosted, local-first inference server designed to behave like a drop-in OpenAI API for running AI workloads on your own hardware (laptop, workstation, or on-prem server).
Install Oh My Opencode and ship faster.
Oh My Opencode turns OpenCode into a multi-agent coding harness: an orchestrator delegates work to specialist agents that run in parallel.
How to Install, Configure, and Use the OpenCode
I keep coming back to llama.cpp for local inference—it gives you control that Ollama and others abstract away, and it just works. Easy to run GGUF models interactively with llama-cli or expose an OpenAI-compatible HTTP API with llama-server.
Artificial Intelligence is reshaping how software is written, reviewed, deployed, and maintained. From AI coding assistants to GitOps automation and DevOps workflows, developers now rely on AI-powered tools across the entire software lifecycle.