Skip to content

bloat: extract addDriver validation + log paths into [[gnu::cold]] helper #2979

@zackees

Description

@zackees

Tracked in #2974.

Current size

fl::ChannelManager::addDriver(int, fl::shared_ptr<fl::IChannelDriver>) is 1,756 B on ESP32-S3. The function is called only 6x total (all from BusTraits<(fl::Bus)1>::registerWithManager() during static init), so paying ~1.7 KB at every call site is poor density for a function that almost never runs in steady state.

Breakdown - hot vs cold

Source: src/fl/channels/manager.cpp.hpp lines 51-139.

HOT (always-taken on the happy path)

  • Null-pointer check on driver (predictable: never taken in real callers)
  • driver->getName() retrieval into engineName
  • Empty-name check (predictable: never taken - built-in drivers always return non-empty)
  • Linear scan of mDrivers for duplicate name (~1-4 entries, all miss on the happy path)
  • mDrivers.push_back({priority, driver, engineName, enabled})
  • fl::sort_small(...) to maintain priority order
  • Exclusive-driver enable check

COLD (only triggered on validation failure / replacement / debug logging)

  • FL_WARN("... Null driver provided") - log + early return
  • FL_WARN("... Engine has empty name ...") - log + early return
  • FL_DBG("... already registered ... idempotent no-op") - log + early return
  • FL_WARN("... Replacing existing driver ...") - log
  • FL_DBG("Waiting for all drivers to become READY ...") + waitForReady() call (490 B callee) + erase loop + FL_DBG("Removing old driver ...")
  • FL_DBG("Added driver ... caps: ...") block (already gated behind #if FASTLED_HAS_DBG, but the caps-string construction still emits some code)

Top callees inflating the symbol

  • fl::ChannelManager::waitForReady(unsigned long) - 490 B (only reached on replace path)
  • fl::vector_basic::push_back_move_impl(void*) - 234 B (hot)
  • fl::detail::log_emit - 222 B (cold, multiple call sites)
  • Several fl::string ops - ~250 B combined (mostly for log formatting)

Proposed change

Split into a hot core and a [[gnu::cold]] out-of-line helper:

[[gnu::cold]] [[gnu::noinline]]
static void addDriver_slow(ChannelManager* self,
                            int priority,
                            fl::shared_ptr<IChannelDriver> driver,
                            AddDriverError err,
                            const fl::string& engineName);

void ChannelManager::addDriver(int priority, fl::shared_ptr<IChannelDriver> driver) FL_NOEXCEPT {
    if (FL_UNLIKELY(!driver)) {
        return addDriver_slow(this, priority, driver, AddDriverError::NullDriver, {});
    }
    fl::string engineName = driver->getName();
    if (FL_UNLIKELY(engineName.empty())) {
        return addDriver_slow(this, priority, driver, AddDriverError::EmptyName, engineName);
    }
    // Duplicate scan - if hit, jump to slow path for replace/no-op/warn handling
    for (const auto& entry : mDrivers) {
        if (FL_UNLIKELY(entry.name == engineName)) {
            return addDriver_slow(this, priority, driver,
                                   (entry.driver == driver && entry.priority == priority)
                                       ? AddDriverError::IdempotentNoOp
                                       : AddDriverError::Replacing,
                                   engineName);
        }
    }
    // Happy path: no duplicate, just append + sort
    bool enabled = mExclusiveDriver.empty() || (engineName == mExclusiveDriver);
    mDrivers.push_back({priority, driver, engineName, enabled});
    fl::sort_small(mDrivers.begin(), mDrivers.end());
}

addDriver_slow owns:

  • All FL_WARN / FL_DBG emissions (log_emit ~222 B x N sites)
  • The waitForReady() call (490 B callee) used during the replace flow
  • The erase loop for the old duplicate
  • The post-replace push_back + sort_small (or factor those back through the hot path after _slow returns the cleaned-up state)
  • The #if FASTLED_HAS_DBG caps-string construction (already gated, but the strings/log_emit still pull weight on debug builds)

Estimated savings

  • Hot path shrinks to ~200 B (precondition checks + push_back + sort_small).
  • Cold helper carries the ~1.5 KB of validation strings, log_emit calls, waitForReady() glue, and erase logic.
  • Net icache savings on the hot path: ~800-1000 B per binary.
  • The cold helper ships once and lives in a .text.cold section, out of the hot icache region. [[gnu::cold]] tells GCC/clang to optimize it for size and place it away from the hot path.

Additional benefit

Once the hot path is ~200 B, the 6 call sites in BusTraits<...>::registerWithManager() become inlining candidates - currently the 1.7 KB body blocks inlining outright. Inlining the slim hot path may further reduce per-call overhead during static init.

Constraint

Logging stays enabled (per #2974). This change does NOT remove any log site - it moves them out of the hot icache path so the happy-path code stays dense.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions