How Ollama Handles Parallel Requests

Configuring Ollama for parallel request execution.

When the Ollama server receives two requests at the same time, its behavior depends on its configuration and available system resources.
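Ollama's request handling is tuned through environment variables set before starting the server. A minimal sketch of one possible configuration (the variable names are from Ollama's documentation; the specific values here are illustrative, not recommendations):

```shell
# Number of requests each loaded model may process concurrently.
export OLLAMA_NUM_PARALLEL=4

# Maximum number of models kept loaded in memory at the same time.
export OLLAMA_MAX_LOADED_MODELS=2

# Maximum number of queued requests before the server rejects new ones.
export OLLAMA_MAX_QUEUE=256

ollama serve
```

If the GPU lacks memory for the requested level of parallelism, Ollama falls back to queueing requests and serving them as capacity frees up.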

Large Language Models Speed Test

Let's test LLM inference speed on GPU versus CPU.

Comparing the inference speed of several LLMs — llama3 (Meta), phi3 (Microsoft), gemma (Google), and mistral (open source) — on CPU and GPU.
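A speed comparison like this can be scripted against Ollama's HTTP API: a non-streaming response from `/api/generate` includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating), from which tokens per second follows directly. Below is a minimal sketch; the `benchmark` helper is a name I've introduced for illustration, and it assumes a local server on Ollama's default port 11434 with the models already pulled:

```python
import json
import urllib.request


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's eval counters into a tokens/sec rate."""
    return eval_count / (eval_duration_ns / 1e9)


def benchmark(model: str, prompt: str,
              host: str = "http://localhost:11434") -> float:
    """Send one non-streaming generate request; return tokens/sec."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return tokens_per_second(data["eval_count"], data["eval_duration"])
```

Running `benchmark` over each model name (e.g. "llama3", "phi3", "gemma", "mistral") once on a CPU-only host and once on a GPU host gives comparable tokens/sec figures. Note that `eval_duration` excludes prompt processing, so this measures generation speed specifically.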