# Onchain AI

The LLM canister is an onchain service that gives ICP canisters access to large language models without relying on HTTPS outcalls to external AI APIs. Your canister calls a shared system canister, which routes inference requests to nodes running model weights onchain. No API keys, no off-chain dependencies: AI inference becomes a native part of your canister logic.

## What the LLM canister provides

The LLM canister (canister ID: `w36hm-eqaaa-aaaal-qr76a-cai`) exposes two APIs:

- **Prompt API**: send a single text prompt and receive a text response. Best for one-shot interactions.
- **Chat API**: send a sequence of messages with roles (`system`, `user`, `assistant`) and receive the next assistant turn. Best for multi-turn conversations.

Currently supported models:

| Model | Identifier |
|-------|-----------|
| Llama 3.1 8B | `Llama3_1_8B` |

Inference is seeded from ICP's random beacon, making results deterministic per execution round and verifiable by the subnet.

**Cycles cost:** Inference is free during the initial rollout period. Pricing will be announced before the free period ends.

## How this differs from HTTPS outcalls

Using the LLM canister is different from calling an external AI API via [HTTPS outcalls](https-outcalls.md):

| | LLM canister | HTTPS outcalls to external AI |
|---|---|---|
| API keys required | No | Yes |
| Inference runs | Onchain (ICP nodes) | External provider (OpenAI, Anthropic, etc.) |
| Response determinism | Yes (random beacon seeded) | No |
| Model choice | ICP-hosted models only | Any provider's API |
| Response size | Up to 1000 output tokens | Provider-dependent |

Use the LLM canister when you want tamperproof, key-free inference with deterministic results. Use HTTPS outcalls when you need a specific commercial model, larger context windows, or higher output limits.

## Add the dependency

### Motoko

Add `llm` to your `mops.toml`:

```toml
[dependencies]
llm = "2.1.0"
```

Then run:

```sh
mops install
```

### Rust

Add `ic-llm` to your `Cargo.toml`:

```toml
[dependencies]
ic-cdk = "0.17.1"
ic-llm = "1.1.0"
```

## Prompt API

The prompt API sends a single text input to the model and returns a text response. Use it for one-shot tasks: summarization, classification, extraction, or simple Q&A.

### Motoko

```motoko
import LLM "mo:llm";

persistent actor {
  public func prompt(p : Text) : async Text {
    await LLM.prompt(#Llama3_1_8B, p);
  };
};
```

### Rust

```rust
use ic_cdk::update;
use ic_llm::Model;

#[update]
async fn prompt(prompt_str: String) -> String {
    ic_llm::prompt(Model::Llama3_1_8B, prompt_str).await
}
```
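For one-shot tasks such as classification, much of the work lies in constructing the prompt and interpreting the free-text reply. The following is a minimal sketch in plain Rust and is not part of `ic-llm`: the `Sentiment` enum and both helper functions are hypothetical names introduced for illustration, and the prompt wording is only one possible phrasing. The string returned by `classification_prompt` would be passed to a prompt endpoint like the one above, and the model's reply to `parse_sentiment`.

```rust
/// Hypothetical label set used only for this example.
#[derive(Debug, PartialEq)]
enum Sentiment {
    Positive,
    Negative,
    Neutral,
}

/// Builds a one-shot classification prompt that asks for a single-word answer.
fn classification_prompt(text: &str) -> String {
    format!(
        "Classify the sentiment of the following text as exactly one word: \
         positive, negative, or neutral.\n\nText: {}\n\nAnswer:",
        text
    )
}

/// Interprets the model's reply. Only the first alphabetic word is considered;
/// anything outside the expected label set is treated as unrecognized.
fn parse_sentiment(reply: &str) -> Option<Sentiment> {
    let lower = reply.trim().to_lowercase();
    let first = lower
        .split(|c: char| !c.is_alphabetic())
        .find(|w| !w.is_empty())?;
    match first {
        "positive" => Some(Sentiment::Positive),
        "negative" => Some(Sentiment::Negative),
        "neutral" => Some(Sentiment::Neutral),
        _ => None,
    }
}
```

Returning an `Option` leaves it to the caller to decide what happens when the model answers outside the expected label set.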

## Chat API

The chat API accepts a list of messages with roles and returns the assistant's next response. Use it for multi-turn conversations or when you need a system prompt to shape the model's behavior.

### Motoko

```motoko
import LLM "mo:llm";

persistent actor {
  public func chat(messages : [LLM.ChatMessage]) : async Text {
    let response = await LLM.chat(#Llama3_1_8B).withMessages(messages).send();
    switch (response.message.content) {
      case (?text) text;
      case null "";
    };
  };
};
```

**`ChatMessage` type:**

```motoko
type ChatMessage = {
  role : { #system_; #user; #assistant };
  content : Text;
};
```

### Rust

```rust
use ic_cdk::update;
use ic_llm::{ChatMessage, Model};

#[update]
async fn chat(messages: Vec<ChatMessage>) -> String {
    let response = ic_llm::chat(Model::Llama3_1_8B)
        .with_messages(messages)
        .send()
        .await;
    response.message.content.unwrap_or_default()
}
```

**`ChatMessage` type:**

```rust
pub struct ChatMessage {
    pub role: Role,       // Role::System | Role::User | Role::Assistant
    pub content: String,
}
```
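The chat examples above accept a ready-made message list. As a rough sketch of how a system prompt could be placed in front of a user message, the snippet below defines local stand-in types that mirror the `ChatMessage` shape shown above, so that it compiles without the `ic-llm` crate. The helper name `with_system_prompt` is hypothetical; in a canister you would use the crate's own types instead.

```rust
// Local stand-ins mirroring the message shape shown on this page.
// These are assumptions for illustration, not the `ic-llm` definitions.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Clone, Debug, PartialEq)]
struct ChatMessage {
    role: Role,
    content: String,
}

/// Builds a two-message request: the system prompt first, then the user's message.
fn with_system_prompt(system_prompt: &str, user_message: &str) -> Vec<ChatMessage> {
    vec![
        ChatMessage {
            role: Role::System,
            content: system_prompt.to_string(),
        },
        ChatMessage {
            role: Role::User,
            content: user_message.to_string(),
        },
    ]
}
```

The resulting list would then be handed to a chat endpoint in place of `messages`.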

### Building a conversation

To build a multi-turn conversation, accumulate messages in canister state and pass the full history on each call:

#### Motoko

```motoko
import LLM "mo:llm";
import Array "mo:core/Array";

persistent actor {
  var history : [LLM.ChatMessage] = [];

  public func send(userMessage : Text) : async Text {
    let userEntry = { role = #user; content = userMessage };
    let allMessages = Array.concat(history, [userEntry]);
    let response = await LLM.chat(#Llama3_1_8B).withMessages(allMessages).send();
    let assistantReply = switch (response.message.content) {
      case (?text) text;
      case null "";
    };
    let assistantEntry = { role = #assistant; content = assistantReply };
    history := Array.concat(allMessages, [assistantEntry]);
    assistantReply;
  };
};
```

#### Rust

```rust
use ic_cdk::update;
use ic_llm::{ChatMessage, Role, Model};
use std::cell::RefCell;

thread_local! {
    static HISTORY: RefCell<Vec<ChatMessage>> = RefCell::new(Vec::new());
}

#[update]
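async fn send(user_message: String) -> String {
    // Record the user's message. This write is committed at the `await` below,
    // so it stays in the history even if the inference call later fails.
    HISTORY.with(|h| {
        h.borrow_mut().push(ChatMessage {
            role: Role::User,
            content: user_message,
        });
    });
    // Send a snapshot of the full history. Other calls to `send` may run
    // while this one is waiting for the reply.
    let messages = HISTORY.with(|h| h.borrow().clone());
    let response = ic_llm::chat(Model::Llama3_1_8B)
        .with_messages(messages)
        .send()
        .await;
    let reply = response.message.content.unwrap_or_default();
    // Append the assistant's reply so the next call includes it.
    HISTORY.with(|h| {
        h.borrow_mut().push(ChatMessage {
            role: Role::Assistant,
            content: reply.clone(),
        });
    });
    reply
}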
```

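Note that the Rust example stores conversation history in heap memory (`thread_local!`), which is cleared on canister upgrade. In the Motoko example, `history` is a variable of a `persistent actor`, so it survives upgrades. For production Rust canisters, store history in stable memory so it persists across canister upgrades. See [data persistence](data-persistence.md) for details. Both examples also grow the history without bound, so a long conversation eventually exceeds the per-request limits listed under [Limitations](#limitations).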

## Limitations

During the initial rollout, the LLM canister enforces the following limits:

| Limit | Value |
|-------|-------|
| Max messages per chat request | 10 |
| Max prompt size | 10 KiB |
| Max output tokens | 1000 |
| Streaming | Not supported |

Requests that exceed these limits return an error. Design your application to stay within these bounds, for example by trimming old messages from conversation history before each call.
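One way to do the trimming mentioned above is sketched below. It keeps a leading system message, if there is one, and then as many of the most recent messages as fit. Several things here are assumptions: the types are local stand-ins for the message shape shown earlier, the function name `trim_history` is hypothetical, and the 10 KiB limit is interpreted as the total byte length of all message contents, which the table does not spell out.

```rust
// Limits taken from the table above.
const MAX_MESSAGES: usize = 10;
const MAX_PROMPT_BYTES: usize = 10 * 1024; // assumed to cover all contents combined

// Local stand-ins mirroring the message shape shown on this page.
#[derive(Clone, Debug, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Clone, Debug, PartialEq)]
struct ChatMessage {
    role: Role,
    content: String,
}

/// Returns the newest messages that fit within both limits.
/// A leading system message is always kept, even if it alone is oversized;
/// that case is not handled here and would still be rejected.
fn trim_history(history: &[ChatMessage]) -> Vec<ChatMessage> {
    // Split off a leading system message so it survives trimming.
    let (system, rest) = match history.first() {
        Some(first) if first.role == Role::System => (Some(first.clone()), &history[1..]),
        _ => (None, history),
    };

    let mut budget_msgs = MAX_MESSAGES;
    let mut budget_bytes = MAX_PROMPT_BYTES;
    if let Some(ref s) = system {
        budget_msgs -= 1;
        budget_bytes = budget_bytes.saturating_sub(s.content.len());
    }

    // Walk backwards from the newest message, keeping as many as fit.
    let mut kept: Vec<ChatMessage> = Vec::new();
    for msg in rest.iter().rev() {
        if kept.len() == budget_msgs || msg.content.len() > budget_bytes {
            break;
        }
        budget_bytes -= msg.content.len();
        kept.push(msg.clone());
    }
    kept.reverse(); // restore chronological order

    let mut out = Vec::with_capacity(kept.len() + 1);
    out.extend(system);
    out.extend(kept);
    out
}
```

Calling this on the full list, including the new user message, just before each request keeps the stored history intact while bounding what is sent.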

Because streaming is not supported, the LLM canister returns the complete response only when inference finishes.

## Deploy and test

### Local testing

The LLM canister is not available in a local replica. To develop locally, mock the LLM canister behind a canister interface:

```motoko
// mock_llm.mo: local test stub
import LLM "mo:llm";

persistent actor {
  public func chat(messages : [LLM.ChatMessage]) : async Text {
    "Mock response for: " # (if (messages.size() > 0) messages[messages.size() - 1].content else "");
  };
};
```

For integration tests that need real inference, deploy to mainnet and test there.
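Logic that does not depend on inference can also be tested without any canister at all. As a sketch of one possible structure, the snippet below keeps request construction and history updates in plain functions, so that a unit test can feed in a canned reply where the LLM canister's response would normally go. The types are local stand-ins for the message shape shown earlier, and `build_request` and `record_exchange` are hypothetical names, not part of `ic-llm`.

```rust
// Local stand-ins mirroring the message shape shown on this page.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Clone, Debug, PartialEq)]
struct ChatMessage {
    role: Role,
    content: String,
}

/// Builds the message list for the next request without modifying stored history.
fn build_request(history: &[ChatMessage], user_message: &str) -> Vec<ChatMessage> {
    let mut messages = history.to_vec();
    messages.push(ChatMessage {
        role: Role::User,
        content: user_message.to_string(),
    });
    messages
}

/// Records a completed exchange and returns the reply text.
/// `reply` is whatever the inference call produced; `None` becomes an empty string.
fn record_exchange(
    history: &mut Vec<ChatMessage>,
    user_message: &str,
    reply: Option<String>,
) -> String {
    let reply = reply.unwrap_or_default();
    history.push(ChatMessage {
        role: Role::User,
        content: user_message.to_string(),
    });
    history.push(ChatMessage {
        role: Role::Assistant,
        content: reply.clone(),
    });
    reply
}
```

With this split, a canister endpoint would call `build_request`, await the real inference call, and then call `record_exchange`, so the history changes only after a reply has arrived.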

### Deploy to mainnet

```sh
icp deploy -e ic
```

Once deployed, call your canister:

```sh
icp canister call -e ic <your-canister-id> prompt '("What is the Internet Computer?")'
```

## Full example

The complete chatbot example (with frontend) is available in the `dfinity/examples` repository:

- [Rust LLM chatbot](https://github.com/dfinity/examples/tree/master/rust/llm_chatbot)
- [Motoko LLM chatbot](https://github.com/dfinity/examples/tree/master/motoko/llm_chatbot)

Both examples include a browser UI and can be deployed to mainnet in a single command from [ICP Ninja](https://icp.ninja).

## Next steps

- [HTTPS outcalls](https-outcalls.md): call external AI APIs when you need more model options or larger context windows
- [Data persistence](data-persistence.md): persist conversation history across canister upgrades using stable memory
- [App architecture](../../concepts/app-architecture.md): understand where AI inference fits in a multi-canister application
