Secure Upgrades

Canister upgrades are one of the highest-risk operations in production. A bad upgrade can corrupt state, make the canister permanently non-upgradeable, or break clients. This guide covers the patterns and checks you need to upgrade safely.

Checklist

Use this before every production upgrade:

Take a snapshot immediately before upgrading
Run the upgrade locally first with icp deploy
Verify data survives: write → upgrade → read
Check Candid interface compatibility: no removed methods, no breaking type changes
Avoid pre_upgrade hooks that serialize large state (use stable structures instead)
In Motoko, use persistent actor and avoid manual pre_upgrade/post_upgrade hooks (the runtime persists state without them)
Confirm you have a backup controller (you cannot recover from a trapped post_upgrade without one)
Add a rollback plan: snapshot ID recorded, restore procedure tested

How upgrades work

When you run icp deploy on an existing canister, the IC executes the following sequence:

1. Stop the canister (waits for in-flight messages to complete)
2. Run pre_upgrade on the old code (if defined)
3. Replace the Wasm module with the new code
4. Run post_upgrade on the new code (if defined)
5. Restart the canister

Stable memory is preserved through steps 2–4. Heap memory is cleared when the new Wasm loads. If pre_upgrade or post_upgrade traps, the upgrade fails, and the consequences depend on which hook trapped:

Hook	Trap result
`pre_upgrade`	Upgrade cancelled. Old code still running. State intact but may need attention.
`post_upgrade`	Upgrade rolled back. The canister returns to the old code and its pre-upgrade state, and every retry fails until the new hook is fixed.

Both scenarios leave the canister in a difficult state. Prevention is far better than recovery.
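
As a rough mental model of what survives a successful upgrade, the following Rust sketch may help. It is a toy, not an IC API: `ToyCanister` and its fields are hypothetical names, and it deliberately ignores hooks and traps. It only shows that a value kept on the heap resets while a value kept in stable memory carries over.

```rust
/// Toy model of a canister's two memory regions (hypothetical, not an IC API).
#[derive(Debug, Clone, PartialEq)]
struct ToyCanister {
    wasm_version: u32,
    heap_counter: u64,   // stands in for data kept only in Wasm heap memory
    stable_counter: u64, // stands in for data kept in stable memory
}

impl ToyCanister {
    fn new(wasm_version: u32) -> Self {
        ToyCanister { wasm_version, heap_counter: 0, stable_counter: 0 }
    }

    /// Bumps both counters so the effect of an upgrade is easy to compare.
    fn increment(&mut self) {
        self.heap_counter += 1;
        self.stable_counter += 1;
    }

    /// Models a successful upgrade: the Wasm module is replaced, the heap
    /// starts empty, and stable memory is carried over unchanged.
    fn upgrade(self, new_wasm_version: u32) -> Self {
        ToyCanister {
            wasm_version: new_wasm_version,
            heap_counter: 0,
            stable_counter: self.stable_counter,
        }
    }
}
```

Calling `increment` twice and then `upgrade(2)` on a fresh `ToyCanister::new(1)` leaves `stable_counter` at 2 and `heap_counter` at 0, which is the same symptom the local write → upgrade → read test is designed to catch.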

Stable memory patterns

Motoko: use `persistent actor`

The persistent actor declaration automatically stores all let and var fields in stable memory. No serialization, no upgrade hooks, no instruction-limit traps.

persistent actor Counter {
  var count : Nat = 0;

  public func increment() : async Nat {
    count += 1;
    count;
  };

  public query func get() : async Nat { count };

  // transient: resets to [] on each upgrade. Suitable for caches, transient logs, and reset-on-upgrade counters
  transient var recentCallers : [Principal] = [];
};

Key rules:

All let/var fields persist automatically. No stable keyword needed
Use transient var for caches or counters that should reset on upgrade
Do not write manual pre_upgrade/post_upgrade hooks. The runtime handles everything
If a persistent field’s type changes incompatibly, the upgrade traps. See Schema evolution.

Rust: use stable structures

In Rust, use ic-stable-structures to store data directly in stable memory. Data lives there from the start. No serialization step on upgrade.

use ic_stable_structures::{
    memory_manager::{MemoryId, MemoryManager, VirtualMemory},
    DefaultMemoryImpl, StableBTreeMap, StableCell,
};
use std::cell::RefCell;

type Memory = VirtualMemory<DefaultMemoryImpl>;

// Each structure must have its own unique MemoryId: never reuse IDs
const USERS_MEM_ID: MemoryId = MemoryId::new(0);
const COUNTER_MEM_ID: MemoryId = MemoryId::new(1);

thread_local! {
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));

    static USERS: RefCell<StableBTreeMap<u64, Vec<u8>, Memory>> =
        RefCell::new(StableBTreeMap::init(
            MEMORY_MANAGER.with(|m| m.borrow().get(USERS_MEM_ID))
        ));

    static COUNTER: RefCell<StableCell<u64, Memory>> =
        RefCell::new(StableCell::init(
            MEMORY_MANAGER.with(|m| m.borrow().get(COUNTER_MEM_ID)),
            0u64,
        ).expect("Failed to init counter"));
}

#[ic_cdk::post_upgrade]
fn post_upgrade() {
    // Stable structures auto-restore: no deserialization needed.
    // Re-initialize timers or transient state here if required.
}

Warning: Each MemoryId must map to exactly one data structure for the lifetime of the canister. Reusing a MemoryId for a different structure after an upgrade corrupts both. Keep a written record of your MemoryId allocations and never reorder them.
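
One way to keep that written record is to put it in the code itself. The sketch below is an assumption about how you might organize it, not part of ic-stable-structures: `MEMORY_ALLOCATIONS` and `first_duplicate_id` are hypothetical names, and the IDs are plain `u8` values matching what would be passed to `MemoryId::new`.

```rust
/// Hypothetical allocation table: one row per stable structure.
/// Treat it as append-only. Never renumber or reuse a retired ID.
const MEMORY_ALLOCATIONS: &[(u8, &str)] = &[
    (0, "USERS: StableBTreeMap<u64, Vec<u8>>"),
    (1, "COUNTER: StableCell<u64>"),
];

/// Returns the first ID that appears more than once in the table, if any.
fn first_duplicate_id(table: &[(u8, &str)]) -> Option<u8> {
    let mut seen = [false; 256];
    for &(id, _) in table {
        if seen[id as usize] {
            return Some(id);
        }
        seen[id as usize] = true;
    }
    None
}
```

A unit test that asserts `first_duplicate_id(MEMORY_ALLOCATIONS)` is `None` would make an accidental reuse fail in CI rather than on mainnet. The check cannot detect a reordering of existing rows, so code review still has to guard the append-only rule.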

Avoid `pre_upgrade` serialization

The serialization-based upgrade pattern is common in older Rust code but is fundamentally fragile:

// DO NOT DO THIS in production
#[ic_cdk::pre_upgrade]
fn pre_upgrade() {
    // If STATE is large, this hits the instruction limit and traps.
    // A trapped pre_upgrade prevents the upgrade: canister stays on old code.
    ic_cdk::storage::stable_save((STATE.with(|s| s.borrow().clone()),)).unwrap();
}

When pre_upgrade traps due to instruction exhaustion, the canister cannot be upgraded through the normal path. The skip_pre_upgrade flag, an emergency escape hatch in the management canister’s install_code API (see Management canister reference), bypasses the hook, but anything the hook would have saved is lost. Use stable structures so the upgrade path cannot brick itself under load.

Candid interface compatibility

The IC checks your new Wasm module’s Candid interface against the old one before completing the upgrade. If the new interface is not backward-compatible, the upgrade is rejected.

Safe changes:

Change	Why it is safe
Add a new method	Existing clients don’t call it
Add optional parameters to an existing method	Old clients send no value; IC substitutes `null`
Remove trailing parameters from an existing method	Old clients send extra values; IC ignores them
Return additional values from a method	Old clients ignore extra return values
Change a parameter type to a supertype	Old values remain valid inputs
Change a return type to a subtype	New values remain valid for old clients

Breaking changes (upgrade rejected or clients break):

Change	Why it breaks
Remove a method	Clients calling it get errors
Add a required (non-optional) parameter	Old clients don’t send it
Change a parameter type to an incompatible type	Old clients send invalid values

Example of a safe evolution:

// Before
service counter : {
  add : (nat) -> ();
  get : () -> (int) query;
}

// After (safe): optional param added, extra return values, get narrowed from int to nat (a subtype), new method
service counter : {
  add : (nat, label : opt text) -> (new_val : nat);
  get : () -> (nat, last_change : nat) query;
  reset : () -> ();
}
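
The rules in the tables above can be sketched as a small checker. This is a heavily simplified model written for illustration, assuming only four types and three subtyping rules; real Candid subtyping also covers records, variants, references, and further `opt` rules, so rely on the tooling check for actual deployments. `Ty`, `Func`, and `breaking_methods` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// A tiny subset of Candid types, just enough for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Nat,
    Int,
    Text,
    Opt(Box<Ty>),
}

/// Simplified `a <: b`: equal types, nat <: int, and covariant opt.
fn is_subtype(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::Nat, Ty::Int) => true,
        (Ty::Opt(x), Ty::Opt(y)) => is_subtype(x, y),
        _ => a == b,
    }
}

/// Tuple rule: each position the reader expects must either receive a
/// subtype or be an absent `opt`. Extra supplied values are ignored.
fn tuple_subtype(supplied: &[Ty], expected: &[Ty]) -> bool {
    expected.iter().enumerate().all(|(i, want)| match supplied.get(i) {
        Some(have) => is_subtype(have, want),
        None => matches!(want, Ty::Opt(_)),
    })
}

#[derive(Debug, Clone, PartialEq)]
struct Func {
    args: Vec<Ty>,
    rets: Vec<Ty>,
}

/// Old callers' arguments must be acceptable to the new method, and the
/// new method's results must be acceptable to old callers.
fn method_compatible(old: &Func, new: &Func) -> bool {
    tuple_subtype(&old.args, &new.args) && tuple_subtype(&new.rets, &old.rets)
}

/// Names of old methods that are missing or incompatible in the new service.
fn breaking_methods<'a>(
    old: &BTreeMap<&'a str, Func>,
    new: &BTreeMap<&'a str, Func>,
) -> Vec<&'a str> {
    old.iter()
        .filter(|(name, old_fn)| match new.get(*name) {
            Some(new_fn) => !method_compatible(old_fn, new_fn),
            None => true, // method was removed
        })
        .map(|(name, _)| *name)
        .collect()
}
```

Feeding the before and after services from the example into `breaking_methods` yields an empty list. Changing `add` to take a required `text` parameter, or dropping `get`, makes the corresponding method name show up in the result.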

icp-cli checks Candid compatibility during deploy and prompts for confirmation if it detects a potentially breaking change. Use --yes in automated pipelines after manually verifying compatibility:

icp deploy my-canister -e ic --yes

Snapshot-based rollback

Always take a snapshot immediately before a risky upgrade. If the upgrade causes unexpected behavior, you can restore the previous state within minutes.

# 1. Stop the canister and create a snapshot
icp canister stop my-canister -e ic
icp canister snapshot create my-canister -e ic
# Note the snapshot ID printed in the output
icp canister start my-canister -e ic

# 2. Deploy the upgrade
icp deploy my-canister -e ic

# 3. Verify correctness
icp canister call my-canister health_check -e ic

# 4a. If everything works, clean up when no longer needed
icp canister snapshot delete my-canister <snapshot-id> -e ic

# 4b. If something is wrong, stop and restore
icp canister stop my-canister -e ic
icp canister snapshot restore my-canister <snapshot-id> -e ic
icp canister start my-canister -e ic

Snapshots capture the full canister state: Wasm module, Wasm heap memory, stable memory, and chunk store. Restoring from a snapshot brings back all of this state atomically.

See Canister snapshots for listing, downloading, and the state transfer workflow.

Schema evolution

Upgrading canister code sometimes requires changing the shape of stored data. The rules differ by language.

Motoko

When upgrading a persistent actor, the runtime checks that every persistent field’s current type is compatible with the value stored in stable memory. Incompatible changes cause the upgrade to trap.

Safe changes:

Add new let or var fields with initial values. The runtime initializes them on upgrade
Add optional record fields (e.g., change { name : Text } to { name : Text; email : ?Text })
Widen a field’s type (e.g., Nat → Int)

Unsafe changes (upgrade traps):

Remove or rename a persistent field
Narrow a field’s type (e.g., Int → Nat)
Change a non-optional field to an incompatible type

If you need to make an unsafe change, migrate the data in two upgrades: add the new field alongside the old one, upgrade once (both fields present), then upgrade again to remove the old field. Test this two-step process locally before deploying to mainnet.

Rust

Rust stable structures store values as serialized bytes in stable memory. Schema evolution safety depends on the serialization format and versioning strategy.

Adding fields safely with Candid encoding:

use candid::{CandidType, Decode, Deserialize, Encode};
use ic_stable_structures::storable::{Bound, Storable};
use std::borrow::Cow;

#[derive(CandidType, Deserialize, Clone)]
struct UserV2 {
    id: u64,
    name: String,
    created: u64,
    // New optional field, safe to add: old records decode with email = None
    email: Option<String>,
}

impl Storable for UserV2 {
    // Unbounded avoids write failures when struct grows.
    // Bounded requires a fixed max_size; if encoded size exceeds it after
    // adding fields, writes trap.
    const BOUND: Bound = Bound::Unbounded;

    fn to_bytes(&self) -> Cow<'_, [u8]> {
        Cow::Owned(Encode!(self).expect("failed to encode UserV2"))
    }

    fn from_bytes(bytes: Cow<'_, [u8]>) -> Self {
        Decode!(&bytes, Self).expect("failed to decode UserV2")
    }
}

Rules:

Use Option<T> for new fields: Candid deserializes absent fields as None, so old records remain readable after the upgrade
Use Bound::Unbounded unless you have a strict size requirement
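Never reorder MemoryId allocations across upgrades: a structure would then read bytes written by a different structure, which corrupts data in the same way as an incompatible field type change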
For breaking schema changes, use a versioned enum and migrate records lazily on read
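
The last rule, a versioned enum with lazy migration on read, might look roughly like the sketch below. It uses only the standard library so it can run on its own: a `BTreeMap` stands in for `StableBTreeMap`, and the `Storable` implementation for the enum is omitted. `UserV1`, `UserV2`, `StoredUser`, and `read_user` are hypothetical names, and the rename of `name` to `display_name` is an assumed example of a breaking change.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
struct UserV1 {
    id: u64,
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
struct UserV2 {
    id: u64,
    display_name: String, // renamed from `name`: a breaking change
    email: Option<String>,
}

/// Stored form: the variant records which schema version wrote the record.
#[derive(Debug, Clone, PartialEq)]
enum StoredUser {
    V1(UserV1),
    V2(UserV2),
}

impl StoredUser {
    /// Converts any stored version into the latest schema.
    fn into_latest(self) -> UserV2 {
        match self {
            StoredUser::V1(old) => UserV2 {
                id: old.id,
                display_name: old.name,
                email: None,
            },
            StoredUser::V2(current) => current,
        }
    }
}

/// Reads a user and, if the record is still in an old version,
/// rewrites it in the latest version so the next read is cheap.
fn read_user(store: &mut BTreeMap<u64, StoredUser>, id: u64) -> Option<UserV2> {
    let stored = store.get(&id)?.clone();
    let needs_rewrite = !matches!(stored, StoredUser::V2(_));
    let latest = stored.into_latest();
    if needs_rewrite {
        store.insert(id, StoredUser::V2(latest.clone()));
    }
    Some(latest)
}
```

Because the rewrite mutates stored data, it only persists when `read_user` runs in an update call; a query call can still return the converted value, but its state changes are discarded. Keep the old variant in the enum until you are sure no record of that version remains.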

Testing upgrades locally

Never upgrade on mainnet without first verifying locally that data written before the upgrade is still readable after.

Motoko:

# Start local network
icp network start -d

# Deploy initial version
icp deploy backend

# Write data
icp canister call backend increment '()'
icp canister call backend increment '()'
icp canister call backend get '()'
# Returns: (2 : nat)

# Modify source code, then redeploy
icp deploy backend

# Verify data survived
icp canister call backend get '()'
# Must still return: (2 : nat)

Rust:

# Start local network
icp network start -d

# Deploy initial version
icp deploy backend

# Write data
icp canister call backend add_user '("Alice")'
icp canister call backend get_user_count '()'
# Returns: (1 : nat64)

# Modify source code, then upgrade
icp deploy backend

# Verify data survived
icp canister call backend get_user_count '()'
# Must still return: (1 : nat64)

If the count drops to zero after upgrade, your data is not in stable memory: review your storage declarations before touching mainnet.

For advanced scenarios (upgrade rollbacks, schema migrations, concurrent call safety), use PocketIC to script multi-step upgrade scenarios in a controlled environment.

Controller safety

You cannot upgrade a canister without a valid controller. Losing all controller keys leaves the canister permanently frozen at its current code: there is no recovery path on the IC.

# Check current controllers
icp canister settings show my-canister -e ic

# Add a backup controller before any risky upgrade
icp canister settings update my-canister --add-controller <backup-principal> -e ic

For production canisters:

Maintain at least two controllers (primary identity + hardware wallet or multisig)
For fully onchain governance, add an SNS or DAO canister as controller and remove personal principals

See Access management for detailed controller management patterns.

Next steps

Data persistence: stable structures and upgrade patterns in depth
Canister lifecycle: the full upgrade sequence and install modes
Canister snapshots: create and restore snapshots
Testing strategies: test upgrade scenarios before deploying to mainnet
Access management: manage controllers and prevent lock-out