Secure Upgrades
Canister upgrades are one of the highest-risk operations in production. A bad upgrade can corrupt state, make the canister permanently non-upgradeable, or break clients. This guide covers the patterns and checks you need to upgrade safely.
Checklist
Use this before every production upgrade:
- Take a snapshot immediately before upgrading
- Run the upgrade locally first with `icp deploy`
- Verify data survives: write → upgrade → read
- Check Candid interface compatibility: no removed methods, no breaking type changes
- Avoid `pre_upgrade` hooks that serialize large state (use stable structures instead)
- In Motoko, use `persistent actor` (which eliminates the need for `pre_upgrade` hooks) and avoid manual `pre_upgrade`/`post_upgrade`
- Confirm you have a backup controller (you cannot recover from a trapped `post_upgrade` without one)
- Add a rollback plan: snapshot ID recorded, restore procedure tested
How upgrades work
When you run `icp deploy` on an existing canister, the IC executes the following sequence:
1. Stop the canister (waits for in-flight messages to complete)
2. Run `pre_upgrade` on the old code (if defined)
3. Replace the Wasm module with the new code
4. Run `post_upgrade` on the new code (if defined)
5. Restart the canister
Stable memory is preserved through steps 2–4. Heap memory is cleared when the new Wasm loads. If pre_upgrade or post_upgrade traps, the upgrade fails with different consequences:
| Hook | Trap result |
|---|---|
| `pre_upgrade` | Upgrade cancelled. Old code still running. State intact but may need attention. |
| `post_upgrade` | New Wasm installed but initialization failed. Canister may be in an inconsistent state. |
Both scenarios leave the canister in a difficult state. Prevention is far better than recovery.
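The sequence above can be illustrated with a toy model. This is only a sketch in plain Rust with no IC dependencies: `ToyCanister`, `toy_upgrade`, and the hook functions are hypothetical names invented for illustration, and the model covers just the success path and a trapping `pre_upgrade`. It shows that heap data is cleared when the new code loads while stable data is carried over.

```rust
use std::collections::HashMap;

/// Toy stand-in for a canister (hypothetical, not an IC API).
/// `heap` is wiped on upgrade, `stable` survives it.
#[derive(Clone, Debug, Default)]
struct ToyCanister {
    wasm_version: u32,
    heap: HashMap<String, u64>,
    stable: HashMap<String, u64>,
}

/// A pre_upgrade hook returns Err to model a trap.
type PreHook = fn(&mut ToyCanister) -> Result<(), String>;
/// The post_upgrade hook is infallible in this simplified model.
type PostHook = fn(&mut ToyCanister);

/// Models steps 2-4: run pre_upgrade, swap the code, run post_upgrade.
fn toy_upgrade(
    c: &mut ToyCanister,
    new_version: u32,
    pre: PreHook,
    post: PostHook,
) -> Result<(), String> {
    // Work on a staged copy so a trapping pre_upgrade leaves `c` untouched.
    let mut staged = c.clone();
    // Step 2: pre_upgrade on the old code; a trap cancels the upgrade.
    pre(&mut staged).map_err(|e| format!("pre_upgrade trapped: {}", e))?;
    // Step 3: the new Wasm loads and the heap is cleared.
    staged.wasm_version = new_version;
    staged.heap.clear();
    // Step 4: post_upgrade on the new code.
    post(&mut staged);
    *c = staged;
    Ok(())
}

fn noop_pre(_c: &mut ToyCanister) -> Result<(), String> {
    Ok(())
}

fn trapping_pre(_c: &mut ToyCanister) -> Result<(), String> {
    Err("instruction limit exceeded".to_string())
}

fn noop_post(_c: &mut ToyCanister) {}
```

Calling `toy_upgrade` with `trapping_pre` returns an error and leaves the version and heap unchanged; calling it with `noop_pre` bumps the version, empties `heap`, and keeps `stable`.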
Stable memory patterns
Motoko: use persistent actor
The persistent actor declaration automatically stores all let and var fields in stable memory. No serialization, no upgrade hooks, no instruction-limit traps.
```motoko
persistent actor Counter {
  var count : Nat = 0;

  public func increment() : async Nat {
    count += 1;
    count;
  };

  public query func get() : async Nat { count };

  // transient: resets to [] on each upgrade: correct for caches, transient logs, and reset-on-upgrade counters
  transient var recentCallers : [Principal] = [];
};
```

Key rules:
- All `let`/`var` fields persist automatically. No `stable` keyword needed
- `transient var` for caches or counters that should reset on upgrade
- Do not write manual `pre_upgrade`/`post_upgrade` hooks. The runtime handles everything
- If a persistent field’s type changes incompatibly, the upgrade traps. See Schema evolution.
Rust: use stable structures
In Rust, use ic-stable-structures to store data directly in stable memory. Data lives there from the start, so there is no serialization step on upgrade.
```rust
use ic_stable_structures::{
    memory_manager::{MemoryId, MemoryManager, VirtualMemory},
    DefaultMemoryImpl, StableBTreeMap, StableCell,
};
use std::cell::RefCell;

type Memory = VirtualMemory<DefaultMemoryImpl>;

// Each structure must have its own unique MemoryId: never reuse IDs
const USERS_MEM_ID: MemoryId = MemoryId::new(0);
const COUNTER_MEM_ID: MemoryId = MemoryId::new(1);

thread_local! {
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));

    static USERS: RefCell<StableBTreeMap<u64, Vec<u8>, Memory>> =
        RefCell::new(StableBTreeMap::init(
            MEMORY_MANAGER.with(|m| m.borrow().get(USERS_MEM_ID))
        ));

    static COUNTER: RefCell<StableCell<u64, Memory>> =
        RefCell::new(StableCell::init(
            MEMORY_MANAGER.with(|m| m.borrow().get(COUNTER_MEM_ID)),
            0u64,
        ).expect("Failed to init counter"));
}

#[ic_cdk::post_upgrade]
fn post_upgrade() {
    // Stable structures auto-restore: no deserialization needed.
    // Re-initialize timers or transient state here if required.
}
```

Warning: Each `MemoryId` must map to exactly one data structure for the lifetime of the canister. Reusing a `MemoryId` for a different structure after an upgrade corrupts both. Keep a written record of your `MemoryId` allocations and never reorder them.
Avoid pre_upgrade serialization
The serialization-based upgrade pattern is common in older Rust code but is fundamentally fragile:

```rust
// DO NOT DO THIS in production
#[ic_cdk::pre_upgrade]
fn pre_upgrade() {
    // If STATE is large, this hits the instruction limit and traps.
    // A trapped pre_upgrade prevents the upgrade: canister stays on old code.
    ic_cdk::storage::stable_save((STATE.with(|s| s.borrow().clone()),)).unwrap();
}
```

When `pre_upgrade` traps due to instruction exhaustion, the canister cannot be upgraded through the normal path. The `skip_pre_upgrade` flag, an emergency escape hatch in the management canister’s `install_code` API (see Management canister reference), bypasses the hook, but anything the hook would have saved is lost. Use stable structures so the upgrade path cannot brick itself under load.
Candid interface compatibility
The IC checks your new Wasm module’s Candid interface against the old one before completing the upgrade. If the new interface is not backward-compatible, the upgrade is rejected.
Safe changes:
| Change | Why it is safe |
|---|---|
| Add a new method | Existing clients don’t call it |
| Add optional parameters to an existing method | Old clients send no value; IC substitutes null |
| Remove trailing parameters from an existing method | Old clients send extra values; IC ignores them |
| Return additional values from a method | Old clients ignore extra return values |
| Change a parameter type to a supertype | Old values remain valid inputs |
| Change a return type to a subtype | New values remain valid for old clients |
Breaking changes (upgrade rejected or clients break):
| Change | Why it breaks |
|---|---|
| Remove a method | Clients calling it get errors |
| Add a required (non-optional) parameter | Old clients don’t send it |
| Change a parameter type to an incompatible type | Old clients send invalid values |
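The rules in the two tables boil down to: new parameters must accept everything old clients send, and new results must still satisfy what old clients expect. The sketch below models that in plain Rust under heavy simplification. It is not the real Candid subtyping algorithm: the `Ty` enum covers only `nat`, `int`, `text`, and `opt`, and all names (`Func`, `tuple_accepts`, `func_compatible`, `first_breaking_method`) are hypothetical.

```rust
use std::collections::BTreeMap;

/// Drastically simplified type universe (assumption, not full Candid).
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Nat,
    Int,
    Text,
    Opt(Box<Ty>),
}

/// A method signature: positional parameters and results.
struct Func {
    args: Vec<Ty>,
    rets: Vec<Ty>,
}

/// `a` is a subtype of `b`: every value of `a` is a valid `b`.
fn is_subtype(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::Nat, Ty::Int) => true,
        (Ty::Opt(x), Ty::Opt(y)) => is_subtype(x, y),
        _ => a == b,
    }
}

/// Can a tuple of type `sent` be read by code expecting `expected`?
/// Extra sent values are ignored; missing ones must be optional.
fn tuple_accepts(sent: &[Ty], expected: &[Ty]) -> bool {
    expected.iter().enumerate().all(|(i, want)| match sent.get(i) {
        Some(got) => is_subtype(got, want),
        None => matches!(want, Ty::Opt(_)),
    })
}

/// Old clients send old args and read old results.
fn func_compatible(old: &Func, new: &Func) -> bool {
    tuple_accepts(&old.args, &new.args) && tuple_accepts(&new.rets, &old.rets)
}

/// Returns a description of the first breaking change, if any.
fn first_breaking_method(
    old: &BTreeMap<&'static str, Func>,
    new: &BTreeMap<&'static str, Func>,
) -> Option<String> {
    for (name, old_f) in old {
        match new.get(name) {
            None => return Some(format!("{}: method removed", name)),
            Some(new_f) if !func_compatible(old_f, new_f) => {
                return Some(format!("{}: incompatible signature", name));
            }
            Some(_) => {}
        }
    }
    None
}
```

Methods that exist only in the new interface are never inspected, which matches the "add a new method" row. For real deployments, rely on the tooling's own compatibility check rather than a model like this.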
Example: safe evolution:
```candid
// Before
service counter : {
  add : (nat) -> ();
  get : () -> (int) query;
}

// After: safe: optional param added, new return value, new method
service counter : {
  add : (nat, label : opt text) -> (new_val : nat);
  get : () -> (nat, last_change : nat) query;
  reset : () -> ();
}
```

icp-cli checks Candid compatibility during deploy and prompts for confirmation if it detects a potentially breaking change. Use `--yes` in automated pipelines after manually verifying compatibility:

```bash
icp deploy my-canister -e ic --yes
```

Snapshot-based rollback
Always take a snapshot immediately before a risky upgrade. If the upgrade causes unexpected behavior, you can restore the previous state within minutes.
```bash
# 1. Stop the canister and create a snapshot
icp canister stop my-canister -e ic
icp canister snapshot create my-canister -e ic
# Note the snapshot ID printed in the output
icp canister start my-canister -e ic

# 2. Deploy the upgrade
icp deploy my-canister -e ic

# 3. Verify correctness
icp canister call my-canister health_check -e ic

# 4a. If everything works, clean up when no longer needed
icp canister snapshot delete my-canister <snapshot-id> -e ic

# 4b. If something is wrong, stop and restore
icp canister stop my-canister -e ic
icp canister snapshot restore my-canister <snapshot-id> -e ic
icp canister start my-canister -e ic
```

Snapshots capture the full canister state: Wasm module, Wasm heap memory, stable memory, and chunk store. Restoring from a snapshot brings back all of this state atomically.
See Canister snapshots for listing, downloading, and the state transfer workflow.
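If the rollback is scripted from a deployment tool, it can help to make "snapshot ID recorded" a hard precondition. The sketch below is a hypothetical helper in plain Rust that only builds the argument lists for the three restore commands shown in step 4b. The function names are invented, and actually running the commands (for example with `std::process::Command`) is left out.

```rust
/// Turns string slices into an owned argv vector.
fn argv(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

/// Builds the ordered commands for step 4b: stop, restore, start.
/// Refuses to produce a plan when no snapshot ID was recorded.
fn rollback_commands(
    canister: &str,
    snapshot_id: Option<&str>,
    env: &str,
) -> Result<Vec<Vec<String>>, String> {
    let id = match snapshot_id {
        Some(id) if !id.is_empty() => id,
        _ => return Err("no snapshot ID recorded: cannot roll back".to_string()),
    };
    Ok(vec![
        argv(&["icp", "canister", "stop", canister, "-e", env]),
        argv(&["icp", "canister", "snapshot", "restore", canister, id, "-e", env]),
        argv(&["icp", "canister", "start", canister, "-e", env]),
    ])
}
```

A pipeline could call `rollback_commands` before deploying and abort if it returns `Err`, so that an upgrade never starts without a usable restore path.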
Schema evolution
Upgrading canister code sometimes requires changing the shape of stored data. The rules differ by language.
Motoko
When upgrading a persistent actor, the runtime checks that every persistent field’s current type is compatible with the value stored in stable memory. Incompatible changes cause the upgrade to trap.
Safe changes:
- Add new `let` or `var` fields with initial values. The runtime initializes them on upgrade
- Add optional record fields (e.g., change `{ name : Text }` to `{ name : Text; email : ?Text }`)
- Widen a field’s type (e.g., `Nat` → `Int`)

Unsafe changes (upgrade traps):
- Remove or rename a persistent field
- Narrow a field’s type (e.g., `Int` → `Nat`)
- Change a non-optional field to an incompatible type
If you need to make an unsafe change, migrate the data in two upgrades: add the new field alongside the old one, upgrade once (both fields present), then upgrade again to remove the old field. Test this two-step process locally before deploying to mainnet.
Rust
Rust stable structures store each value as serialized bytes in stable memory, so schema evolution safety depends on the serialization format and versioning strategy.
Adding fields safely with Candid encoding:
```rust
use candid::{CandidType, Decode, Deserialize, Encode};
use ic_stable_structures::storable::{Bound, Storable};
use std::borrow::Cow;

#[derive(CandidType, Deserialize, Clone)]
struct UserV2 {
    id: u64,
    name: String,
    created: u64,
    // New optional field: safe to add: old records deserialize with None
    email: Option<String>,
}

impl Storable for UserV2 {
    // Unbounded avoids write failures when struct grows.
    // Bounded requires a fixed max_size; if encoded size exceeds it after
    // adding fields, writes trap.
    const BOUND: Bound = Bound::Unbounded;

    fn to_bytes(&self) -> Cow<'_, [u8]> {
        Cow::Owned(Encode!(self).expect("failed to encode UserV2"))
    }

    fn from_bytes(bytes: Cow<'_, [u8]>) -> Self {
        Decode!(&bytes, Self).expect("failed to decode UserV2")
    }
}
```

Rules:
- Use `Option<T>` for new fields: Candid deserializes absent fields as `None`, so old records remain readable after the upgrade
- Use `Bound::Unbounded` unless you have a strict size requirement
- Never reorder `MemoryId` allocations across upgrades: same effect as changing a field type
- For breaking schema changes, use a versioned enum and migrate records lazily on read
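The last rule, a versioned enum with lazy migration on read, can be sketched without any IC crates. The example below assumes a hand-rolled byte format with a leading version tag instead of Candid, and a hypothetical breaking change that splits a single name into `first` and `last`. All type and function names are illustrative only.

```rust
/// Current record shape (version tag 2).
#[derive(Clone, Debug, PartialEq)]
struct NameV2 {
    first: String,
    last: String,
}

/// Every shape that may still exist in storage.
#[derive(Clone, Debug, PartialEq)]
enum StoredName {
    V1 { full: String },
    V2(NameV2),
}

/// Appends a u32 little-endian length prefix followed by the UTF-8 bytes.
fn put_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Reads one length-prefixed string and advances `pos`; None on bad input.
fn take_str(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let b = bytes.get(*pos..*pos + 4)?;
    let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
    let start = *pos + 4;
    let s = std::str::from_utf8(bytes.get(start..start + len)?).ok()?;
    *pos = start + len;
    Some(s.to_string())
}

/// First byte is the version tag, the rest depends on the version.
fn encode(rec: &StoredName) -> Vec<u8> {
    let mut out = Vec::new();
    match rec {
        StoredName::V1 { full } => {
            out.push(1);
            put_str(&mut out, full);
        }
        StoredName::V2(n) => {
            out.push(2);
            put_str(&mut out, &n.first);
            put_str(&mut out, &n.last);
        }
    }
    out
}

fn decode(bytes: &[u8]) -> Option<StoredName> {
    let mut pos = 1;
    match *bytes.first()? {
        1 => Some(StoredName::V1 { full: take_str(bytes, &mut pos)? }),
        2 => {
            let first = take_str(bytes, &mut pos)?;
            let last = take_str(bytes, &mut pos)?;
            Some(StoredName::V2(NameV2 { first, last }))
        }
        _ => None, // unknown version tag
    }
}

/// Lazy migration: accept any stored version, return the latest shape.
fn read_latest(bytes: &[u8]) -> Option<NameV2> {
    Some(match decode(bytes)? {
        StoredName::V1 { full } => {
            // Hypothetical migration rule: split at the first space.
            let mut parts = full.splitn(2, ' ');
            let first = parts.next().unwrap_or("").to_string();
            let last = parts.next().unwrap_or("").to_string();
            NameV2 { first, last }
        }
        StoredName::V2(n) => n,
    })
}
```

In a canister, `decode` and `encode` would sit inside a `Storable` implementation, and a record would be rewritten as `V2` the next time an update call touches it. Old records therefore never need a bulk migration inside an upgrade hook.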
Testing upgrades locally
Never upgrade on mainnet without first verifying locally that data written before the upgrade is still readable after.
Motoko:
```bash
# Start local network
icp network start -d

# Deploy initial version
icp deploy backend

# Write data
icp canister call backend increment '()'
icp canister call backend increment '()'
icp canister call backend get '()'
# Returns: (2 : nat)

# Modify source code, then redeploy
icp deploy backend

# Verify data survived
icp canister call backend get '()'
# Must still return: (2 : nat)
```

Rust:
```bash
# Start local network
icp network start -d

# Deploy initial version
icp deploy backend

# Write data
icp canister call backend add_user '("Alice")'
icp canister call backend get_user_count '()'
# Returns: (1 : nat64)

# Modify source code, then upgrade
icp deploy backend

# Verify data survived
icp canister call backend get_user_count '()'
# Must still return: (1 : nat64)
```

If the count drops to zero after upgrade, your data is not in stable memory: review your storage declarations before touching mainnet.
For advanced scenarios (upgrade rollbacks, schema migrations, concurrent call safety), use PocketIC to script multi-step upgrade scenarios in a controlled environment.
Controller safety
You cannot upgrade a canister without a valid controller. Losing all controller keys leaves the canister permanently frozen at its current code: there is no recovery path on the IC.

```bash
# Check current controllers
icp canister settings show my-canister -e ic

# Add a backup controller before any risky upgrade
icp canister settings update my-canister --add-controller <backup-principal> -e ic
```

For production canisters:
- Maintain at least two controllers (primary identity + hardware wallet or multisig)
- For fully onchain governance, add an SNS or DAO canister as controller and remove personal principals
See Access management for detailed controller management patterns.
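The "at least two controllers" rule can also be enforced as a pre-flight check in deployment tooling. The following is a minimal sketch in plain Rust. It assumes the controller list has already been obtained (for example by parsing the output of the settings command) and treats principals as opaque strings; `check_backup_controller` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;

/// Passes only if `deployer` is a controller and at least one other
/// distinct controller exists as a backup.
fn check_backup_controller(controllers: &[&str], deployer: &str) -> Result<(), String> {
    // Deduplicate so a principal listed twice does not count as a backup.
    let distinct: BTreeSet<&str> = controllers.iter().copied().collect();
    if !distinct.contains(deployer) {
        return Err(format!("{} is not a controller and cannot upgrade", deployer));
    }
    if distinct.len() < 2 {
        return Err("no backup controller: add one before upgrading".to_string());
    }
    Ok(())
}
```

A deploy script could run this check first and refuse to continue on `Err`, which turns the checklist item into something that cannot be skipped by accident.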
Next steps
- Data persistence: stable structures and upgrade patterns in depth
- Canister lifecycle: the full upgrade sequence and install modes
- Canister snapshots: create and restore snapshots
- Testing strategies: test upgrade scenarios before deploying to mainnet
- Access management: manage controllers and prevent lock-out