How do I install the Ollama Python library for web search?

Install version 0.6.0 or later with pip install 'ollama>=0.6.0'. This version includes the web_search and web_fetch functions.

What is the difference between web_search and web_fetch in Ollama Python?

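web_search searches the internet and returns multiple search results, each with a title, URL, and content snippet. web_fetch retrieves the full content of a specific URL and returns the page title, the content as markdown, and the links found on the page.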

Which models work best for an Ollama search agent in Python?

Models with strong tool-use capabilities work best, including qwen3, gpt-oss, and cloud models such as qwen3:480b-cloud and deepseek-v3.1-cloud.

How should I handle large web search results in Python?

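Truncate results to fit within context limits. The recommended approach is to slice the result string to about 8000 characters (2000 tokens × 4 characters) before passing it to the model.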

What context length should I set for a Python search agent?

Set the context length to about 32000 tokens for reasonable performance. Search agents work best with the full context length, since web_search and web_fetch can return thousands of tokens.

Using the Ollama Web Search API in Python

Can I use Ollama Python web search with async code?

Yes. The Ollama Python library supports async operations. Use AsyncClient for non-blocking web search and fetch operations in async applications.

Build AI search agents with Python and Ollama.


Ollama's Python library now includes native Ollama web search capabilities. With just a few lines of code, you can augment your local LLMs with real-time information from the web, reducing hallucinations and improving accuracy.

Getting Started

How do I install the Ollama Python library for web search? Install version 0.6.0 or later with pip install 'ollama>=0.6.0'. This version includes the web_search and web_fetch functions.

pip install 'ollama>=0.6.0'

For Python environment and package management, you can use uv, a fast Python package manager, or set up a virtual environment with venv to keep your dependencies isolated.

Create an API key in your Ollama account and set it as an environment variable:

export OLLAMA_API_KEY="your_api_key"

On Windows PowerShell:

$env:OLLAMA_API_KEY = "your_api_key"
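
The web search functions need this key at call time, so it can help to fail fast with a clear message when it is missing. A minimal sketch, assuming the key is read from the `OLLAMA_API_KEY` variable set above; `require_api_key` is a hypothetical helper and does not validate the key against Ollama's service.

```python
import os


def require_api_key(var_name: str = "OLLAMA_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: it only checks that the variable is set and
    non-empty; it does not check whether Ollama accepts the key.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; create an API key in your Ollama account "
            "and export it before calling web_search or web_fetch."
        )
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key before the first `web_search` call.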

Basic Web Search

The simplest way to search the web with Ollama:

import ollama

# Simple web search
response = ollama.web_search("What is Ollama?")
print(response)

Output:

results = [
    {
        "title": "Ollama",
        "url": "https://ollama.com/",
        "content": "Cloud models are now available in Ollama..."
    },
    {
        "title": "What is Ollama? Features, Pricing, and Use Cases",
        "url": "https://www.walturn.com/insights/what-is-ollama",
        "content": "Our services..."
    },
    {
        "title": "Complete Ollama Guide: Installation, Usage & Code Examples",
        "url": "https://collabnix.com/complete-ollama-guide",
        "content": "Join our Discord Server..."
    }
]

Controlling the Number of Results

import ollama

# Get more results
response = ollama.web_search("latest AI news", max_results=10)

for result in response.results:
    print(f"📌 {result.title}")
    print(f"   {result.url}")
    print(f"   {result.content[:100]}...")
    print()

Fetching Full Page Content

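What is the difference between web_search and web_fetch in Ollama Python? web_search searches the internet and returns multiple search results, each with a title, URL, and snippet. web_fetch retrieves the full content of a specific URL and returns the page title, markdown content, and links. The markdown content returned by web_fetch is well suited to further processing. If you need to convert HTML to markdown in other contexts, see our guide on converting HTML to Markdown with Python.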

from ollama import web_fetch

result = web_fetch('https://ollama.com')
print(result)

Output:

WebFetchResponse(
    title='Ollama',
    content='[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama\n\n**Chat & build with open models**\n\n[Download](https://ollama.com/download) [Explore models](https://ollama.com/models)\n\nAvailable for macOS, Windows, and Linux',
    links=['https://ollama.com/', 'https://ollama.com/models', 'https://github.com/ollama/ollama']
)
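
The `links` list returned by `web_fetch` can seed a follow-up fetch. A minimal sketch, assuming you only want links on the same host as the page you fetched; `same_site_links` is a hypothetical helper that works on any list of URL strings, such as `result.links`.

```python
from urllib.parse import urldefrag, urlparse


def same_site_links(page_url: str, links: list[str]) -> list[str]:
    """Keep links on the same host as page_url, dropping fragments and duplicates.

    Order is preserved, so links that appear earlier on the page come first.
    """
    host = urlparse(page_url).netloc
    seen: set[str] = set()
    kept: list[str] = []
    for link in links:
        url, _fragment = urldefrag(link)  # "https://x.com/a#top" -> "https://x.com/a"
        if urlparse(url).netloc != host or url in seen:
            continue
        seen.add(url)
        kept.append(url)
    return kept
```

With the output above, `same_site_links('https://ollama.com', result.links)` would keep the two ollama.com links and drop the GitHub one.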

Combining Search and Fetch

A common pattern is to search first, then fetch the full content of the relevant results:

from ollama import web_search, web_fetch

# Search for information
search_results = web_search("Ollama new features 2025")

# Fetch the full content of the first result
if search_results.results:
    first_url = search_results.results[0].url
    full_content = web_fetch(first_url)

    print(f"Title: {full_content.title}")
    print(f"Content: {full_content.content[:500]}...")
    print(f"Links found: {len(full_content.links)}")
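
Fetching only the first result is fragile if that page fails to load. A minimal sketch that walks the search results in order and keeps the first `n` pages that fetch successfully; `fetch_first_pages` is a hypothetical helper, and `search`/`fetch` are passed in so you can supply `web_search` and `web_fetch` (or fakes in tests). It assumes the response shapes shown earlier (`.results[i].url` on search results, `.content` on fetched pages).

```python
from typing import Any, Callable


def fetch_first_pages(
    query: str,
    search: Callable[[str], Any],
    fetch: Callable[[str], Any],
    n: int = 2,
) -> list[tuple[str, str]]:
    """Return up to n (url, content) pairs, skipping pages that fail to fetch."""
    pages: list[tuple[str, str]] = []
    for item in search(query).results:
        if len(pages) >= n:
            break
        try:
            page = fetch(item.url)
        except Exception as exc:  # e.g. network errors, timeouts, blocked pages
            print(f"skipping {item.url}: {exc}")
            continue
        pages.append((item.url, page.content))
    return pages
```

With the real library this would be called as `fetch_first_pages("Ollama new features 2025", web_search, web_fetch)`.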

Building a Search Agent

Which models work best for an Ollama search agent in Python? Models with strong tool-use capabilities work best, including qwen3, gpt-oss, and cloud models such as qwen3:480b-cloud and deepseek-v3.1-cloud. For advanced use cases that need structured output from these models, see our guide on structured output from LLMs with Ollama and Qwen3.

First, pull a capable model:

ollama pull qwen3:4b

Simple Search Agent

A basic search agent that decides on its own when to search:

from ollama import chat, web_fetch, web_search

available_tools = {'web_search': web_search, 'web_fetch': web_fetch}

messages = [{'role': 'user', 'content': "what is ollama's new engine"}]

while True:
    response = chat(
        model='qwen3:4b',
        messages=messages,
        tools=[web_search, web_fetch],
        think=True
    )
    
    if response.message.thinking:
        print('🧠 Thinking:', response.message.thinking[:200], '...')

    if response.message.content:
        print('💬 Response:', response.message.content)

    messages.append(response.message)

    if response.message.tool_calls:
        print('🔧 Tool calls:', response.message.tool_calls)
        for tool_call in response.message.tool_calls:
            function_to_call = available_tools.get(tool_call.function.name)
            if function_to_call:
                args = tool_call.function.arguments
                result = function_to_call(**args)
                print('📥 Result:', str(result)[:200], '...')
                # Truncate the result to stay within the context length
                messages.append({
                    'role': 'tool',
                    'content': str(result)[:2000 * 4],
                    'tool_name': tool_call.function.name
                })
            else:
                messages.append({
                    'role': 'tool',
                    'content': f'Tool {tool_call.function.name} not found',
                    'tool_name': tool_call.function.name
                })
    else:
        break

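How should I handle large web search results in Python? Truncate results to fit within context limits. The recommended approach is to slice the result string to about 8000 characters (2000 tokens × 4 characters) before passing it to the model.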
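
The 2000 tokens × 4 characters rule of thumb can live in one helper so every agent truncates the same way. A minimal sketch; the four-characters-per-token ratio is a rough heuristic rather than a real tokenizer count, and `truncate_for_context` is a hypothetical name.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, not an exact tokenizer count


def truncate_for_context(result: object, max_tokens: int = 2000,
                         marker: str = "... [truncated]") -> str:
    """Convert a tool result to text and cap it at roughly max_tokens tokens."""
    text = str(result)
    max_chars = max_tokens * CHARS_PER_TOKEN  # 2000 * 4 = 8000 characters
    if len(text) <= max_chars:
        return text
    # Append a marker so the model can tell the content was cut off
    return text[:max_chars] + marker
```

Appending a marker, as the advanced agent below also does, lets the model know it is reading partial content rather than a complete page.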

Advanced Search Agent with Error Handling

An upgraded version with better error handling:

from ollama import chat, web_fetch, web_search

class SearchAgent:
    def __init__(self, model: str = 'qwen3:4b'):
        self.model = model
        self.tools = {'web_search': web_search, 'web_fetch': web_fetch}
        self.messages = []
        self.max_iterations = 10
        
    def query(self, question: str) -> str:
        self.messages = [{'role': 'user', 'content': question}]
        
        for iteration in range(self.max_iterations):
            try:
                response = chat(
                    model=self.model,
                    messages=self.messages,
                    tools=[web_search, web_fetch],
                    think=True
                )
            except Exception as e:
                return f"Error during chat: {e}"

            self.messages.append(response.message)

            # No tool calls means this is the final answer
            if not response.message.tool_calls:
                return response.message.content or "No response generated"

            # Execute the tool calls
            for tool_call in response.message.tool_calls:
                result = self._execute_tool(tool_call)
                self.messages.append({
                    'role': 'tool',
                    'content': result,
                    'tool_name': tool_call.function.name
                })

        return "Reached the maximum number of iterations without a final answer"
    
    def _execute_tool(self, tool_call) -> str:
        func_name = tool_call.function.name
        args = tool_call.function.arguments
        
        if func_name not in self.tools:
            return f"Unknown tool: {func_name}"

        try:
            result = self.tools[func_name](**args)
            # Truncate to stay within context limits
            result_str = str(result)
            if len(result_str) > 8000:
                result_str = result_str[:8000] + "... [truncated]"
            return result_str
        except Exception as e:
            return f"Tool error: {e}"

# Usage
agent = SearchAgent(model='qwen3:4b')
answer = agent.query("What are Ollama's latest features?")
print(answer)

Async Web Search

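Can I use Ollama Python web search with async code? Yes. The Ollama Python library supports async operations. Use AsyncClient for non-blocking web search and fetch operations in async applications. For a performance comparison of Python and other languages in serverless contexts, see our analysis of AWS Lambda performance across JavaScript, Python, and Golang.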

import asyncio
from ollama import AsyncClient

async def async_search():
    client = AsyncClient()
    
    # Run several searches concurrently
    tasks = [
        client.web_search("Ollama features"),
        client.web_search("local LLM tools"),
        client.web_search("AI search agents"),
    ]
    
    results = await asyncio.gather(*tasks)
    
    for i, result in enumerate(results):
        print(f"Search {i + 1}:")
        for r in result.results[:2]:
            print(f"  - {r.title}")
        print()

# Run the async search
asyncio.run(async_search())
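
`asyncio.gather` above starts every search at the same time. When you issue many queries, an `asyncio.Semaphore` can cap how many run at once, which also helps you stay within API rate limits. A minimal sketch: `gather_limited` is a hypothetical helper, `search` can be any async function shaped like `client.web_search`, and the default limit of 2 is arbitrary rather than a documented Ollama limit.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def gather_limited(
    queries: list[str],
    search: Callable[[str], Awaitable[Any]],
    limit: int = 2,
) -> list[Any]:
    """Run search(query) for every query, with at most `limit` running at once.

    Results come back in the same order as `queries`, like asyncio.gather.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(query: str) -> Any:
        async with semaphore:  # waits here while `limit` searches are in flight
            return await search(query)

    return await asyncio.gather(*(run_one(q) for q in queries))
```

Inside an async function you might call `await gather_limited(queries, AsyncClient().web_search)`.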

Async Search Agent

import asyncio
from ollama import AsyncClient

async def async_research_agent(question: str):
    client = AsyncClient()
    messages = [{'role': 'user', 'content': question}]
    
    while True:
        response = await client.chat(
            model='qwen3:4b',
            messages=messages,
            tools=[client.web_search, client.web_fetch],
        )
        
        messages.append(response.message)
        
        if not response.message.tool_calls:
            return response.message.content
        
        # Build one coroutine per supported tool call
        tool_names = []
        coroutines = []
        for tool_call in response.message.tool_calls:
            name = tool_call.function.name
            if name == 'web_search':
                coroutines.append(client.web_search(**tool_call.function.arguments))
            elif name == 'web_fetch':
                coroutines.append(client.web_fetch(**tool_call.function.arguments))
            else:
                continue
            tool_names.append(name)

        # Run the tool calls concurrently, then append the results in order
        results = await asyncio.gather(*coroutines)
        for tool_name, result in zip(tool_names, results):
            messages.append({
                'role': 'tool',
                'content': str(result)[:8000],
                'tool_name': tool_name
            })

# Run
answer = asyncio.run(async_research_agent("What's new in Python 3.13?"))
print(answer)

Context Length and Performance

What context length should I set for a Python search agent? Set the context length to about 32000 tokens. Search agents work best with the full context length, since web_search and web_fetch can return thousands of tokens.

from ollama import chat, web_search

# Use a larger context for search-heavy tasks
response = chat(
    model='qwen3:4b',
    messages=[{'role': 'user', 'content': 'Research the latest AI developments'}],
    tools=[web_search],
    options={
        'num_ctx': 32768,  # 32K context
    }
)
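
A rough budget shows why 32K is a comfortable size. Under the same four-characters-per-token heuristic, each 8000-character tool result costs about 2000 tokens. A minimal sketch of the arithmetic; the 4096-token reserve for the question, the model's thinking, and its final answer is an assumed figure, and `max_tool_results` is hypothetical.

```python
CHARS_PER_TOKEN = 4  # same rough heuristic as the 8000-character cap


def max_tool_results(num_ctx: int = 32768, result_chars: int = 8000,
                     reserve_tokens: int = 4096) -> int:
    """Estimate how many truncated tool results fit in the context window.

    reserve_tokens is an assumed allowance for the question, the model's
    thinking, and its final answer; tune it for your workload.
    """
    tokens_per_result = result_chars // CHARS_PER_TOKEN  # 8000 // 4 = 2000
    available = max(num_ctx - reserve_tokens, 0)
    return available // tokens_per_result


print(max_tool_results())              # prints 14
print(max_tool_results(num_ctx=8192))  # prints 2
```

Under these assumptions a 32768-token window leaves room for about 14 truncated tool results, while an 8192-token window leaves room for only 2.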

MCP Server Integration

Ollama provides a Python MCP server that enables web search in any MCP client. For a comprehensive guide to building MCP servers in Python with web search and scraping capabilities, see our detailed tutorial on building MCP servers in Python.

Cline Integration

Configure the MCP server in Cline's settings:

Manage MCP Servers → Configure MCP Servers → Add:

{
  "mcpServers": {
    "web_search_and_fetch": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "path/to/web-search-mcp.py"],
      "env": { "OLLAMA_API_KEY": "your_api_key_here" }
    }
  }
}

Codex Integration

Add the following to ~/.codex/config.toml:

[mcp_servers.web_search]
command = "uv"
args = ["run", "path/to/web-search-mcp.py"]
env = { "OLLAMA_API_KEY" = "your_api_key_here" }

Creating Your Own MCP Server

#!/usr/bin/env python3
"""A simple MCP server for Ollama web search."""

from mcp.server.fastmcp import FastMCP
from ollama import web_search, web_fetch

# FastMCP (part of the official MCP Python SDK) provides the @tool() decorator
mcp = FastMCP("ollama-web-search")

@mcp.tool()
def search_web(query: str, max_results: int = 5) -> str:
    """Search the web for information."""
    results = web_search(query, max_results=max_results)

    output = []
    for r in results.results:
        output.append(f"**{r.title}**\n{r.url}\n{r.content}\n")

    return "\n---\n".join(output)

@mcp.tool()
def fetch_page(url: str) -> str:
    """Fetch the full content of a web page."""
    result = web_fetch(url)
    return f"# {result.title}\n\n{result.content}"

if __name__ == "__main__":
    # Defaults to the stdio transport used by the Cline and Codex configs above
    mcp.run()

Practical Examples

These examples show real-world applications of Ollama's web search API. You can extend these patterns to build more complex systems; for example, combine them with PDF generation in Python to produce research reports.

News Summarizer

from ollama import chat, web_search

def summarize_news(topic: str) -> str:
    # Search for the latest news
    results = web_search(f"{topic} latest news", max_results=5)

    # Format the search results to pass to the model
    news_content = "\n\n".join([
        f"**{r.title}**\n{r.content}"
        for r in results.results
    ])

    # Ask the model for a summary
    response = chat(
        model='qwen3:4b',
        messages=[{
            'role': 'user',
            'content': f"Summarize these news items about {topic}:\n\n{news_content}"
        }]
    )
    
    return response.message.content

summary = summarize_news("artificial intelligence")
print(summary)

Research Assistant

from ollama import chat, web_search, web_fetch
from dataclasses import dataclass

@dataclass
class ResearchResult:
    question: str
    sources: list[str]
    answer: str

def research(question: str) -> ResearchResult:
    # Search for relevant information
    search_results = web_search(question, max_results=3)

    # Fetch the full content of the top sources
    sources = []
    full_content = []

    for result in search_results.results[:3]:
        try:
            page = web_fetch(result.url)
            sources.append(result.url)
            full_content.append(f"Source: {result.url}\n{page.content[:2000]}")
        except Exception:
            # Skip sources that fail to load (network errors, blocked pages)
            continue
    
    # Generate a comprehensive answer
    context = "\n\n---\n\n".join(full_content)

    response = chat(
        model='qwen3:4b',
        messages=[{
            'role': 'user',
            'content': f"""Answer this question based on the following sources: {question}

Sources:
{context}

Provide a comprehensive answer that cites the sources."""
        }]
    )
    
    return ResearchResult(
        question=question,
        sources=sources,
        answer=response.message.content
    )

# Usage
result = research("How does Ollama's new model scheduling work?")
print(f"Question: {result.question}")
print(f"Sources: {result.sources}")
print(f"Answer: {result.answer}")
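
The prompt asks the model to cite its sources, which is easier when each source has a short label. A minimal sketch of building a numbered context block so the model can answer with `[1]`, `[2]` markers; the format and the `number_sources` helper are assumptions for illustration, not something Ollama requires.

```python
def number_sources(pages: list[tuple[str, str]],
                   max_chars: int = 2000) -> tuple[str, list[str]]:
    """Build a numbered context string plus a matching legend of sources.

    pages: (url, content) pairs, e.g. collected from web_fetch results.
    Each page is cut to max_chars, matching the 2000-character slice above.
    """
    blocks: list[str] = []
    legend: list[str] = []
    for number, (url, content) in enumerate(pages, start=1):
        blocks.append(f"[{number}] {url}\n{content[:max_chars]}")
        legend.append(f"[{number}] {url}")
    return "\n\n---\n\n".join(blocks), legend
```

The context string can replace `context` in the prompt, and the legend can be printed under the answer so readers can map each `[n]` back to a URL.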

Recommended Models

Model	Parameters	Best for
`qwen3:4b`	4B	Fast local search
`qwen3`	8B	General-purpose agents
`gpt-oss`	Varies	Research tasks
`qwen3:480b-cloud`	480B	Complex reasoning (cloud)
`gpt-oss:120b-cloud`	120B	Long-form research (cloud)
`deepseek-v3.1-cloud`	-	Advanced analysis (cloud)

Best Practices

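Truncate results: Always truncate web results to fit context limits (~8000 characters)
Error handling: Wrap tool calls in try/except to handle network failures
Rate limiting: Respect the rate limits of Ollama's web search API
Context length: Use about 32000 tokens for search agents
Scalability: Use AsyncClient for concurrent operations
Testing: Write unit tests to make your search agent reliable
Python basics: Keep a Python quick reference handy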
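
The error-handling and rate-limiting points can be combined in a small retry wrapper. A minimal sketch, assuming failures surface as exceptions; which exception types Ollama raises for rate limiting is not covered here, so this retries on any `Exception`, and the delays are arbitrary defaults. `with_retries` is a hypothetical helper.

```python
import time
from typing import Any, Callable


def with_retries(func: Callable[..., Any], *args: Any, attempts: int = 3,
                 base_delay: float = 1.0, **kwargs: Any) -> Any:
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts, and
    re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))
```

For example, `with_retries(web_search, "latest AI news", max_results=5)` passes `max_results` through to `web_search` unchanged.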