Reduce per-symbol bloat in the FastLED 1-4KB function band on ESP32-S3 (logging must stay enabled) #2974

@zackees

Background

Per the symbol-bloat analysis of the latest ESP32-S3 Blink build (fbuild, post #2968):

  • Total flash: 330,054 B across 3,428 live symbols
  • 8 distinct fl::* functions land in the 1-4 KB band, totaling 13,738 B
  • ESP-IDF / libc cluster (~28 KB libc printf, ~10 KB ESP-IDF drivers, ~5 KB Arduino HAL) dominates the absolute top, but those are vendor-controlled. The 1-4 KB band is where FastLED has real leverage.

The smoking-gun pattern

Every single 1-4 KB FastLED symbol is dominated by FL_WARN/FL_ERROR/FL_INFO machinery. Verified across all 8:

| Callee shared by all 8 functions | Bytes |
|---|---|
| fl::detail::log_emit(log_kind, char const*, int, fl::sstream&) | 222 B |
| fl::fastled_file_offset(char const*) | 113 B |
| fl::sstream::appendFormatted(long) | 83 B |
| fl::sstream::appendFormatted(unsigned long) | 83 B |
| fl::string::string() | 30 B |
| fl::basic_string::~basic_string() | 30 B |
| fl::basic_string::append(char const*) | 27 B |
| _Unwind_Resume (C++ exception unwinding) | 156 B |
| __stack_chk_fail + __stack_chk_guard | 13 B |

These callees are shared, so each one costs its full size only once. The per-call-site inline cost is what bloats individual functions:

Per FL_WARN/FL_ERROR call site:
  ~200-300 B inline (sstream ctor + format << chain + log_emit call)
  ~100-200 B exception landing pad metadata (string/sstream dtors can throw)
A function with 3-5 log sites ends up with ~1-2 KB of pure logging overhead.
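
To make the per-call-site cost concrete, here is a minimal sketch of what a stream-style log macro conceptually expands to. `callsite_demo::Stream`, `callsite_demo::emit`, and `DEMO_WARN` are hypothetical stand-ins, not FastLED's actual definitions. The two `static_assert`s use standard C++ facilities to show one thing worth checking before touching destructors: in C++11 and later a destructor is implicitly `noexcept` unless declared otherwise, while any potentially throwing call made during the stream's lifetime still forces the compiler to emit a cleanup path that runs that destructor during unwinding.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <type_traits>
#include <utility>

namespace callsite_demo {

// Hypothetical stand-in for a string-building stream (NOT fl::sstream).
class Stream {
public:
    // Appends may allocate, so they are potentially throwing.
    Stream& operator<<(const char* s) { buf_ += s; return *this; }
    Stream& operator<<(long v) { buf_ += std::to_string(v); return *this; }
    const char* c_str() const noexcept { return buf_.c_str(); }
private:
    std::string buf_;  // non-trivial destructor: must run on every exit path
};

int g_emitted = 0;  // counts emitted records so the sketch can be checked

// Hypothetical out-of-line sink; this part is shared by all call sites.
void emit(const char* file, int line, const Stream& s) {
    assert(file != nullptr);
    ++g_emitted;
    std::printf("%s:%d: %s\n", file, line, s.c_str());
}

// The destructor is already non-throwing, but the append is not, so a
// cleanup path is still required around each append chain.
static_assert(std::is_nothrow_destructible<Stream>::value,
              "destructor is implicitly noexcept");
static_assert(!noexcept(std::declval<Stream&>() << "x"),
              "append is potentially throwing");

}  // namespace callsite_demo

// Everything inside this macro is expanded again at every call site:
// stream construction, the << chain, the emit call, and the cleanup.
#define DEMO_WARN(expr)                                        \
    do {                                                       \
        callsite_demo::Stream demo_stream_;                    \
        demo_stream_ << expr;                                  \
        callsite_demo::emit(__FILE__, __LINE__, demo_stream_); \
    } while (0)

namespace callsite_demo {

// Three log sites, so the expansion above appears three times in this body.
bool configure(long pin, long count) {
    if (pin < 0)      { DEMO_WARN("bad pin " << pin); return false; }
    if (count < 1)    { DEMO_WARN("bad count " << count); return false; }
    if (count > 1024) { DEMO_WARN("count " << count << " exceeds " << 1024L); return false; }
    return true;
}

}  // namespace callsite_demo
```

Calling `callsite_demo::configure(-1, 10)` takes the first log site. Actual byte counts depend on the compiler, optimization level, and the real `fl::sstream` implementation, so the figures above should be taken from the bloat report rather than from this sketch.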

Constraint

Logging must stay enabled. FASTLED_LOG_VERBOSITY=1 (the default for debug builds and when SKETCH_HAS_LARGE_MEMORY is enabled) must continue to produce full log output. Sub-issues that try to no-op log call sites are off the table.

What IS in scope: make the logging infrastructure itself cheaper. Examples:

  • Mark string/sstream destructors noexcept -> eliminate landing pads (saves ~100-200 B per call site)
  • Push more inline boilerplate into log_emit -> shrink the per-call-site burst
  • Consolidate multiple log sites in one function through expected<>::failure(...) carrying the error info to a single bottom-of-function log
  • Specialize basic_string hot path so writes from sstream don't pay the variant-storage dispatch
  • Extract validation/error paths into [[gnu::cold]] helpers (out-of-line, doesn't bloat the caller)
  • Convert dynamic-driver dispatch (which currently inlines multiple writer templates) to virtual dispatch
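
A rough sketch of how three of the ideas above could combine: formatting moves into a single out-of-line helper marked cold and `noexcept`, and several failure paths funnel into one log call at the bottom of the function. Everything here is hypothetical (`coldpath_demo::log_emit_fmt`, `ConfigError`, `configure`); it is not FastLED's API, and a plain enum stands in for the `expected<>`-style result mentioned in the list. The printf-style formatting is an assumption made for the sketch, chosen because it leaves no temporary with a destructor at the call site.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

namespace coldpath_demo {

enum class LogKind { Warn, Error };

// Last formatted message, kept in a fixed buffer so the sketch is checkable.
char g_last[128] = "";
int g_emitted = 0;

// Hypothetical helper: all formatting lives here, once.
//   noinline: keep the body out of callers
//   cold:     GCC/Clang hint that this path is rarely executed
//   noexcept: callers need no unwinding cleanup around the call
[[gnu::cold]] [[gnu::noinline]]
void log_emit_fmt(LogKind kind, const char* file, int line,
                  const char* fmt, ...) noexcept {
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(g_last, sizeof g_last, fmt, args);  // truncates, never overflows
    va_end(args);
    ++g_emitted;
    std::fprintf(stderr, "%s %s:%d: %s\n",
                 kind == LogKind::Error ? "ERROR" : "WARN", file, line, g_last);
}

// The call expression itself is non-throwing, which the compiler can verify.
static_assert(noexcept(log_emit_fmt(LogKind::Warn, "", 0, "")),
              "helper call is noexcept");

// Stand-in for an expected<>-style failure value: records which check failed.
enum class ConfigError { None, BadPin, BadCount, TooMany };

const char* describe(ConfigError e) noexcept {
    switch (e) {
        case ConfigError::BadPin:   return "bad pin";
        case ConfigError::BadCount: return "bad count";
        case ConfigError::TooMany:  return "count too large";
        case ConfigError::None:     break;
    }
    return "ok";
}

// Three validation failures, one log site.
bool configure(long pin, long count) noexcept {
    ConfigError err = ConfigError::None;
    if (pin < 0)           err = ConfigError::BadPin;
    else if (count < 1)    err = ConfigError::BadCount;
    else if (count > 1024) err = ConfigError::TooMany;

    if (err == ConfigError::None) return true;

    // Single bottom-of-function log: the message is still fully emitted.
    log_emit_fmt(LogKind::Warn, __FILE__, __LINE__,
                 "%s (pin=%ld, count=%ld)", describe(err), pin, count);
    return false;
}

}  // namespace coldpath_demo
```

With this shape, `coldpath_demo::configure(-1, 10)` still produces a full log line, so the logging constraint holds. Whether it actually shrinks the real functions would need to be confirmed against the bloat report, since the result depends on how the existing macros and `fl::sstream` are implemented.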

Sub-issues (will be linked as filed)

Cross-cutting (affects all 8 functions):

Per-function:

Expected combined savings

If the cross-cutting fixes (A + B + C) land and at least 4 of the per-function fixes ship, we should recover ~6-8 KB of flash on ESP32-S3 Blink. That is roughly 2% of the 330,054 B total, but more importantly it removes the pattern that makes every FastLED function pay the logging tax twice (call site + cleanup metadata).

Investigation tooling

  • Bloat report: .build/symbols/esp32s3-fbuild/report.md (3,428 symbols)
  • Back-reference graphviz sidecars: .build/symbols/esp32s3-fbuild/graphs/0001..0269.dot
  • Per-symbol analysis script: ci/tmp/fl_bloat_investigation.py
  • Per-function detail: .build/fl_bloat_investigation.md

Refs
