
[Bug]: 500 server error when re-applying replica groups service #3676

@r4victor

Description


Steps to reproduce

  1. Have an old replica group service that was submitted before "Set explicit GPU defaults in `ResourcesSpec` and improve default GPU vendor selection" (#3573).
  2. Re-apply the same service with the same name. The server fails with a 500 error and logs the following:
  Top-level `resources` is not allowed when `replicas` is a list. Specify `resources` in each replica group instead. (type=value_error)
__root__
  Missing configuration (type=value_error)

Actual behaviour

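The problem is that the stored spec of the old run can no longer pass the replica groups check, because the default resources changed in #3573. The check compares the top-level `resources` with a freshly constructed default `ResourcesSpec()`: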

resources = values.get("resources")
default_resources = ResourcesSpec()
if resources and resources.dict() != default_resources.dict():
    raise ValueError(
        "Top-level `resources` is not allowed when `replicas` is a list. "
        "Specify `resources` in each replica group instead."
    )
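To see why a previously accepted spec gets rejected, here is a minimal, self-contained sketch that uses stdlib dataclasses instead of pydantic. `ResourcesSpecV1`, `ResourcesSpecV2`, `parse_run_spec`, and the `cpu`/`gpu` fields and default values are hypothetical stand-ins, not dstack's real models. The only things carried over from the issue are that the stored run spec contains `resources` filled in with the defaults of the time, and that it is re-validated against the current defaults when read back.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ResourcesSpecV1:
    # Hypothetical defaults before the change.
    cpu: str = "2.."
    gpu: Optional[str] = None


@dataclass
class ResourcesSpecV2:
    # Hypothetical defaults after the change: GPU gets an explicit default.
    cpu: str = "2.."
    gpu: Optional[str] = "0"


def parse_run_spec(raw: str, spec_cls) -> Dict[str, Any]:
    """Parse a stored spec and apply the same kind of check as the quoted validator."""
    values: Dict[str, Any] = json.loads(raw)
    if "resources" in values:
        values["resources"] = spec_cls(**values["resources"])
    resources = values.get("resources")
    # Fragile: "non-default" is decided by comparing with whatever spec_cls() is *today*.
    if (
        isinstance(values.get("replicas"), list)
        and resources
        and asdict(resources) != asdict(spec_cls())
    ):
        raise ValueError(
            "Top-level `resources` is not allowed when `replicas` is a list. "
            "Specify `resources` in each replica group instead."
        )
    return values


# The old server stored the run spec with the then-current defaults serialized in.
stored_run_spec = json.dumps(
    {"replicas": [{"count": 1}], "resources": asdict(ResourcesSpecV1())}
)

parse_run_spec(stored_run_spec, ResourcesSpecV1)  # accepted under the old defaults
try:
    parse_run_spec(stored_run_spec, ResourcesSpecV2)  # same data, new defaults
except ValueError as exc:
    print("rejected:", exc)
```

The user never set `resources` explicitly, yet the same stored data is rejected once the defaults move, which matches the traceback path through `RunSpec.__response__.parse_raw`.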

Expected behaviour

Although we can handle the error, the check itself is fragile: it depends on the default `ResourcesSpec()` staying the same, so any future change to the defaults can make previously stored runs fail validation.
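One possible way to make the check independent of the defaults is to reject top-level `resources` only when the user actually supplied them, and to apply defaults afterwards. The sketch below assumes the check can see the raw input before defaults are filled in (in pydantic v1 this would be something like a `root_validator(pre=True)`); `check_top_level_resources`, `resolve_resources`, and the `ResourcesSpec` fields shown are hypothetical. On its own it would not rescue run specs already stored with the old defaults serialized in; those would still need the error handling mentioned above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ResourcesSpec:
    # Hypothetical fields and defaults; they can change without affecting the check.
    cpu: str = "2.."
    gpu: Optional[str] = "0"


def check_top_level_resources(raw: Dict[str, Any]) -> None:
    """Reject top-level `resources` only if the user set them explicitly.

    `raw` is assumed to be the input *before* defaults are applied, so the
    result does not depend on what ResourcesSpec() defaults to.
    """
    if isinstance(raw.get("replicas"), list) and raw.get("resources") is not None:
        raise ValueError(
            "Top-level `resources` is not allowed when `replicas` is a list. "
            "Specify `resources` in each replica group instead."
        )


def resolve_resources(raw: Dict[str, Any]) -> ResourcesSpec:
    # Defaults are applied only after validation.
    supplied = raw.get("resources")
    return ResourcesSpec(**supplied) if supplied is not None else ResourcesSpec()


check_top_level_resources({"replicas": [{"count": 1}]})  # accepted: nothing supplied
try:
    check_top_level_resources({"replicas": [{"count": 1}], "resources": {"gpu": "1"}})
except ValueError as exc:
    print("rejected:", exc)
```

The design choice here is to treat "not set" (`None`) as distinct from "set to the default", so the validation outcome is tied to what the user wrote rather than to the current defaults.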

dstack version

master

Server logs

  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/routers/runs.py", line 132, in get_plan
    run_plan = await runs.get_plan(
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/runs/__init__.py", line 347, in get_plan
    current_resource = await get_run_by_name(
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/runs/__init__.py", line 297, in get_run_by_name
    return run_model_to_run(run_model, return_in_api=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/runs/__init__.py", line 757, in run_model_to_run
    run_spec = get_run_spec(run_model)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/runs/__init__.py", line 117, in get_run_spec
    return RunSpec.__response__.parse_raw(run_model.run_spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pydantic/main.py", line 572, in pydantic.main.BaseModel.parse_raw
  File "pydantic/main.py", line 549, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 364, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 2 validation errors for RunSpecResponse
configuration -> ServiceConfigurationResponse -> __root__
  Top-level `resources` is not allowed when `replicas` is a list. Specify `resources` in each replica group instead. (type=value_error)
__root__
  Missing configuration (type=value_error)

Additional information

No response
