Add instances option to target specific fleet instances #3925

Open
fededagos wants to merge 5 commits into dstackai:master from fededagos:feat/target-specific-fleet-instances

@fededagos fededagos commented Jun 1, 2026

Adds an instances option to run configurations (dev environments, tasks, services) that restricts a run to specific existing fleet instances.

Syntax

Long forms:

instances:
  - fleet: my-fleet
    instance: 3
  - name: my-fleet-1
  - hostname: worker-1

Short form for matching by instance name:

instances:
  - my-fleet-1

The fleet form also supports <project name>/<fleet name> for fleets from another project.
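
The accepted forms can be illustrated with a small normalization sketch. This is not the PR's code: `InstanceSelector` and `parse_selector` are hypothetical names, and the strict rejection of any other key combination is an assumption based on the "strict selector models" described under Implementation.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class InstanceSelector:
    """Hypothetical normalized selector; not the model used in the PR."""

    name: Optional[str] = None
    hostname: Optional[str] = None
    fleet: Optional[str] = None     # fleet name without the project prefix
    project: Optional[str] = None   # set only for "<project name>/<fleet name>"
    instance: Optional[int] = None  # instance number within the fleet


def parse_selector(raw: Union[str, dict]) -> InstanceSelector:
    """Turn one entry of the `instances` list into an InstanceSelector."""
    if isinstance(raw, str):
        # Short form: a bare string is treated as an instance-name selector.
        return InstanceSelector(name=raw)
    keys = set(raw)
    if keys == {"name"}:
        return InstanceSelector(name=raw["name"])
    if keys == {"hostname"}:
        return InstanceSelector(hostname=raw["hostname"])
    if keys == {"fleet", "instance"}:
        # "other-project/my-fleet" -> project="other-project", fleet="my-fleet"
        project, sep, fleet = raw["fleet"].rpartition("/")
        return InstanceSelector(
            fleet=fleet,
            project=project if sep else None,
            instance=int(raw["instance"]),
        )
    # Assumed strictness: any other key combination is rejected.
    raise ValueError(f"unsupported instance selector: {raw!r}")
```

Under these assumptions, `parse_selector("my-fleet-1")` and `parse_selector({"name": "my-fleet-1"})` produce the same selector, which is the sense in which the short form is shorthand for matching by name.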

Behavior

  • instances has allow-list semantics: a run is placed only on a matching existing instance.
  • When instances is set, dstack never provisions new instances to satisfy the run.
  • If no matching instance is available, the run fails with a no-capacity error; retry can be used to wait for a selected busy instance to free up.
  • A run is rejected up front if it specifies fewer instances than required by its node count.
  • New-capacity backend offers are skipped when instances is set because they cannot satisfy the selector.
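
The rules above can be sketched as a single placement function. This is a simplified illustration rather than the server's scheduling code: `Instance`, `NoCapacityError`, and `pick_instances` are hypothetical names, and matching is reduced to instance names only. The error text is the one reported in the negative test case under Testing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """Hypothetical stand-in for an existing fleet instance."""

    name: str
    busy: bool = False


class NoCapacityError(Exception):
    """Hypothetical error for a run that cannot be placed."""


def pick_instances(
    existing: List[Instance], selected_names: List[str], nodes: int = 1
) -> List[Instance]:
    # Up-front validation: the selector must list at least `nodes` instances.
    if len(selected_names) < nodes:
        raise ValueError("`instances` lists fewer instances than the run's node count")

    # Allow-list semantics: only existing, idle, matching instances qualify.
    allowed = set(selected_names)
    idle_matches = [i for i in existing if i.name in allowed and not i.busy]

    # No fallback to provisioning: too few matches is a no-capacity failure.
    if len(idle_matches) < nodes:
        raise NoCapacityError("Failed to use specified instances")
    return idle_matches[:nodes]
```

In this sketch a busy selected instance leads to `NoCapacityError`; a caller that retries on that error would pick the instance up once it becomes idle, which mirrors the role of `retry` described above.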

Implementation

  • Adds strict selector models to ProfileParams: name, hostname, and fleet + instance, while preserving the string shorthand as an instance-name selector.
  • Reuses the existing fleet/instance offer selection path and filters loaded instances by the selected instance selectors.
  • Supports both backend fleets and SSH fleets.
  • Supports qualified fleet references without broadening query complexity for the common unqualified case.
  • Keeps the change backward compatible by omitting unset instances for older client/server compatibility paths.
  • No DB schema change; the field is stored in the existing run/profile JSON.
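
The backward-compatibility point can be pictured with a minimal serialization sketch. The `profile_to_payload` helper and the payload shape are assumptions made for illustration; the PR's actual compatibility code is not shown on this page.

```python
from typing import Any, Dict, List, Optional


def profile_to_payload(
    name: str, instances: Optional[List[Any]] = None
) -> Dict[str, Any]:
    """Build a JSON-ready dict, omitting `instances` when it is unset."""
    payload: Dict[str, Any] = {"name": name}
    if instances is not None:
        # Only emit the new field when the user actually set it, so a peer
        # that does not know about `instances` never sees an unknown key.
        payload["instances"] = instances
    return payload
```

With this approach, a profile that does not use the option serializes exactly as it did before the change, while an explicitly set list (including an empty one) is preserved.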

Docs

Updated the shared fleet-management snippet and protips guide. The docs promote the explicit syntax first and keep the short instance-name syntax in a collapsible section.

Testing

  • uv run ruff check .
  • uv run pyright -p .
  • uv run pytest: 2607 passed, 1055 skipped
  • Local end-to-end testing with dstack server:
    • Backend AWS fleet: baseline fleets plus all four instances syntaxes completed successfully.
    • SSH fleet: created an EC2 instance from the dstack AWS AMI, configured it as an SSH fleet, and verified all four instances syntaxes completed successfully.
    • Negative case: nonexistent instance selector failed with “Failed to use specified instances” and did not provision another backend instance.

AI Assistance

This PR includes AI-assisted changes. The original PR noted Claude Code assistance; follow-up schema, implementation review, tests, docs, and E2E verification were assisted by Codex.

fededagos and others added 5 commits June 1, 2026 11:19
Introduce an `instances` run profile option that pins a run to specific
existing fleet instances (nodes). Each value matches an instance by its
name (e.g. `my-fleet-0`) or by its hostname/IP address.

When set, `filter_instances` keeps only matching instances and the job
assignment phase never provisions new capacity to satisfy a node
selector, terminating with a no-capacity error instead.

Reject runs that target fewer instances than the number of nodes they
require, surfaced during planning via `validate_run_spec_and_set_defaults`.

Exclude new-capacity backend offers from the run plan when `instances` is
set, since they are never provisioned and would otherwise mislead the
`dstack apply`/`dstack offer` output.

Add a 'Targeting specific instances' section to the shared fleets snippet
(dev environments, tasks, services) and a corresponding tip in the
protips guide.

Handle an explicit empty `instances` list consistently across the
assignment gate, plan output, and instance filtering by checking
`is not None` instead of truthiness, so an empty list targets existing
instances only (rather than silently allowing new-capacity provisioning
and showing unusable offers).

Add regression tests ensuring the instance selector is applied on the
multinode and shared-instances filter paths.
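
The difference between the two checks in the last commit can be shown in a few lines. Both function names are hypothetical and only illustrate the gate logic described in the commit message.

```python
from typing import List, Optional


def may_provision_truthiness(instances: Optional[List[str]]) -> bool:
    # Truthiness check: an empty list is falsy, so `instances: []` is
    # treated like "not set" and new capacity would be allowed.
    return not instances


def may_provision_is_none(instances: Optional[List[str]]) -> bool:
    # Identity check: only a genuinely unset option allows new capacity;
    # an explicit empty list still restricts the run to existing instances.
    return instances is None
```

The two functions agree for `None` and for a non-empty list; they differ only for `[]`, where the truthiness version returns `True` and the `is None` version returns `False`.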
@peterschmidt85 peterschmidt85 changed the title from "Add instances option to target specific fleet nodes" to "Add instances option to target specific fleet instances" on Jun 5, 2026
@peterschmidt85 peterschmidt85 requested a review from jvstme June 5, 2026 16:26