New in Ollama: Controlling the Model's Chain of Thought


1. Overview

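Ollama can now enable or disable thinking. This gives users the flexibility to choose the model's thinking behavior for different applications and use cases.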

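When thinking is enabled, the response separates the model's thinking from its final output. When thinking is disabled, the model does not think and outputs its answer directly.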

Models that support thinking:

• DeepSeek R1
• Qwen3

2. Using the CLI

Thinking is enabled by default in the CLI, for example:

ollama run deepseek-r1

To disable thinking, type /set nothink in the interactive session.

To enable it again, type /set think in the interactive session.

You can also set the thinking mode directly when running the command:

ollama run deepseek-r1 --think        # enable thinking
ollama run deepseek-r1 --think=false  # disable thinking

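When running inference directly from a script, you can pass --hidethinking. The model still thinks, but only the final result is returned, without the thinking process.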

ollama run deepseek-r1:8b --hidethinking "Which is larger, 9.9 or 9.11?"
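For scripts that shell out to the CLI, the flags described in this section can be assembled programmatically. The following is a minimal sketch, assuming the ollama binary is on the PATH. The helper names build_run_command and ask are hypothetical, and only the flags mentioned in this article (--think, --think=false, --hidethinking) are used.

```python
import subprocess
from typing import List, Optional


def build_run_command(model: str, prompt: str,
                      think: Optional[bool] = None,
                      hide_thinking: bool = False) -> List[str]:
    """Assemble an `ollama run` argument list (hypothetical helper)."""
    cmd = ["ollama", "run", model]
    if think is True:
        cmd.append("--think")          # explicitly enable thinking
    elif think is False:
        cmd.append("--think=false")    # explicitly disable thinking
    if hide_thinking:
        cmd.append("--hidethinking")   # think, but print only the final answer
    cmd.append(prompt)
    return cmd


def ask(model: str, prompt: str) -> str:
    """Run the model once and return whatever the CLI prints to stdout.

    Assumes `ollama` is installed and the model has already been pulled.
    """
    cmd = build_run_command(model, prompt, hide_thinking=True)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Only prints the command; call ask(...) to actually run the model.
    print(build_run_command("deepseek-r1:8b",
                            "Which is larger, 9.9 or 9.11?",
                            hide_thinking=True))
```

Leaving think as None passes neither flag, so the CLI default applies.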

3. REST API

Ollama's generate API (/api/generate) and chat API (/api/chat) have both been updated to support thinking.

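A new think parameter has been added. Set it to true or false to enable or disable the model's thinking. When think is true, the response separates the model's thinking from its output. This lets developers build new kinds of application experiences, such as animating the thinking process in a graphical interface, or showing a thought bubble over a game NPC before it speaks. When think is false, the model does not think and outputs its answer directly.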
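To make the role of the think parameter concrete, here is a minimal Python sketch that builds a non-streaming /api/chat request using only the standard library. It assumes an Ollama server listening on the default local address http://localhost:11434. The names build_chat_request and chat are hypothetical helpers for illustration, not part of any Ollama library.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str, think: bool) -> urllib.request.Request:
    """Build a non-streaming chat request with the `think` flag set."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,    # True: thinking returned separately; False: no thinking
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, think: bool = True) -> dict:
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(model, prompt, think)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_chat_request("deepseek-r1", "What is 10 + 23?", think=False)
    print(request.data.decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect without a server running.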

Example of using the Ollama chat API with thinking enabled:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [
    {
      "role": "user",
      "content": "how many r in the word strawberry?"
    }
  ],
  "think": true,
  "stream": false
}'

Output

{
  "model": "deepseek-r1",
  "created_at": "2025-05-29T09:35:56.836222Z",
  "message": {
    "role": "assistant",
    "content": "The word \"strawberry\" contains **three** instances of the letter 'R' ...",
    "thinking": "First, the question is: \"how many r in the word strawberry?\" I need to count the number of times the letter 'r' appears in the word \"strawberry\". Let me write down the word:..."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 47975065417,
  "load_duration": 29758167,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 174191542,
  "eval_count": 2514,
  "eval_duration": 47770692833
}
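A client typically wants the thinking and the answer as two separate strings. The sketch below parses a response of the shape shown above. The SAMPLE string is an abbreviated, illustrative stand-in for a real reply, and split_response is a hypothetical helper name. It also assumes that the thinking field may be absent when thinking is disabled, and treats that case as an empty string.

```python
import json
from typing import Tuple

# Abbreviated, illustrative reply; real replies carry more fields.
SAMPLE = """
{
  "model": "deepseek-r1",
  "message": {
    "role": "assistant",
    "content": "The word \\"strawberry\\" contains three instances of the letter 'R'.",
    "thinking": "I need to count the letter 'r' in the word \\"strawberry\\"."
  },
  "done": true
}
"""


def split_response(raw: str) -> Tuple[str, str]:
    """Return (thinking, answer) from a non-streaming /api/chat reply."""
    data = json.loads(raw)
    message = data.get("message", {})
    thinking = message.get("thinking") or ""   # assumed absent when think is false
    answer = message.get("content") or ""
    return thinking, answer


if __name__ == "__main__":
    thinking, answer = split_response(SAMPLE)
    print("Thinking:", thinking)
    print("Answer:", answer)
```

A UI could then route the first string to a collapsible panel or thought bubble and the second to the main chat view.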

4. Python library

Please update to the latest version of the Ollama Python library.

pip install ollama

from ollama import chat

messages = [
  {
    'role': 'user',
    'content': 'What is 10 + 23?',
  },
]

# think=True asks the model to return its thinking separately from the answer
response = chat('deepseek-r1', messages=messages, think=True)

print('Thinking:\n========\n\n' + response.message.thinking)
print('\nResponse:\n========\n\n' + response.message.content)

5. JavaScript library

Please update to the latest version of the Ollama JavaScript library.

npm i ollama

import ollama from 'ollama'

async function main() {
  const response = await ollama.chat({
    model: 'deepseek-r1',
    messages: [
      {
        role: 'user',
        content: 'What is 10 + 23',
      },
    ],
    stream: false,
    think: true,
  })

  console.log('Thinking:\n========\n\n' + response.message.thinking)
  console.log('\nResponse:\n========\n\n' + response.message.content + '\n\n')
}

main()

Example of a streaming response with thinking:

import ollama from 'ollama'

async function main() {
  const response = await ollama.chat({
    model: 'deepseek-r1',
    messages: [
      {
        role: 'user',
        content: 'What is 10 + 23',
      },
    ],
    stream: true,
    think: true,
  })

  let startedThinking = false
  let finishedThinking = false

  for await (const chunk of response) {
    // Print a header once, when the first thinking chunk arrives
    if (chunk.message.thinking && !startedThinking) {
      startedThinking = true
      process.stdout.write('Thinking:\n========\n\n')
    } else if (chunk.message.content && startedThinking && !finishedThinking) {
      // First content chunk after thinking: switch to the response section
      finishedThinking = true
      process.stdout.write('\n\nResponse:\n========\n\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      process.stdout.write(chunk.message.content)
    }
  }
}

main()
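The same two-flag state machine can be sketched in Python. To keep the example runnable without a server, it works on plain dictionaries shaped like streamed chat chunks (a message object carrying either thinking or content). That shape is an assumption made for illustration, and render_stream is a hypothetical helper. Chunks returned by a real client may need to be adapted to this shape.

```python
from typing import Dict, Iterable, Iterator


def render_stream(chunks: Iterable[Dict]) -> Iterator[str]:
    """Yield printable text, inserting a header before each section."""
    started_thinking = False
    finished_thinking = False
    for chunk in chunks:
        message = chunk.get("message", {})
        thinking = message.get("thinking")
        content = message.get("content")

        if thinking and not started_thinking:
            # First thinking chunk: open the thinking section
            started_thinking = True
            yield "Thinking:\n========\n\n"
        elif content and started_thinking and not finished_thinking:
            # First content chunk after thinking: open the response section
            finished_thinking = True
            yield "\n\nResponse:\n========\n\n"

        if thinking:
            yield thinking
        elif content:
            yield content


if __name__ == "__main__":
    # Hand-written chunks standing in for a real stream.
    fake_chunks = [
        {"message": {"thinking": "10 + 23 "}},
        {"message": {"thinking": "is 33."}},
        {"message": {"content": "33"}},
    ]
    for piece in render_stream(fake_chunks):
        print(piece, end="", flush=True)
    print()
```

As in the JavaScript version, the response header is only emitted if thinking was seen first, so a stream produced with thinking disabled prints the content without any headers.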