# MQTT broker broad analysis
## Scope
Broad passive review of the MQTT broker used for casa, focused on signs of broker stress, unusual traffic, excessive retained state, or noisy publishers that could affect performance or the network.
## Method
- Verified broker reachability on TCP/1883.
- Took a 45-second full-subscription sample (`#` plus `$SYS/#`, because `#` alone does not match `$`-prefixed topics) to capture retained-state bursts and broker metrics.
- Took a 60-second steady-state sample, ignoring a 5-second warmup, to separate normal retained snapshots from ongoing traffic.
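Both method steps can be sketched with the Python standard library. This is a minimal illustration, not the diagnostics script itself: the `Msg` record (arrival time in seconds since subscribing, topic, payload size, retain flag) is a hypothetical capture format, and the 60-second window is assumed to start when the warmup ends. Under MQTT 3.1.1 the broker sets the retain flag only on messages delivered because of a new subscription and clears it on live forwards, so the flag already separates the snapshot from ongoing traffic; the warmup cut is an extra margin.

```python
import socket
from typing import NamedTuple


class Msg(NamedTuple):
    """One captured publish (hypothetical capture format)."""
    ts: float       # seconds since the subscription was opened
    topic: str
    size: int       # payload length in bytes
    retained: bool  # retain flag as delivered to this subscriber


def broker_reachable(host: str, port: int = 1883, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to the broker port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def steady_state(msgs: list[Msg], warmup: float = 5.0, window: float = 60.0) -> list[Msg]:
    """Keep live (non-retained) messages that arrive after the warmup,
    inside a window of `window` seconds starting when the warmup ends."""
    return [
        m for m in msgs
        if not m.retained and warmup <= m.ts < warmup + window
    ]
```

A TCP connect only proves the port is open; it says nothing about authentication or MQTT-level health, which is why the `$SYS` sample is still needed.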
## Findings
### Overall status
No obvious broker health issue showed up. The broker appears stable and not under noticeable pressure:
| Signal | Observation |
|---|---|
| Connected clients | 46 |
| Max clients seen | 47 |
| Subscriptions | 945 |
| Dropped publishes | 0 |
| Retained messages stored | 1026 |
| Retained store size | 1,494,548 bytes |
| Heap current / max | 4,149,612 / 4,887,547 bytes |
The `$SYS` counters did not suggest backlog, client churn, or message loss.
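The health check behind the table can be sketched as parsing captured `$SYS` payloads into integers and flagging anything abnormal. The topic names below assume a Mosquitto-style `$SYS` tree and should be checked against the actual broker version. Mosquitto's `heap/maximum` is the peak heap seen so far, not a limit, so the sketch compares the current heap against `heap_budget`, a hypothetical threshold that is not part of the analysis.

```python
import re

# Mosquitto-style $SYS topic names (assumption: verify against your broker version).
SYS_TOPICS = {
    "connected": "$SYS/broker/clients/connected",
    "max_clients": "$SYS/broker/clients/maximum",
    "subscriptions": "$SYS/broker/subscriptions/count",
    "dropped": "$SYS/broker/publish/messages/dropped",
    "retained": "$SYS/broker/retained messages/count",
    "heap_current": "$SYS/broker/heap/current",
    "heap_max": "$SYS/broker/heap/maximum",
}


def parse_sys(payloads: dict[str, str]) -> dict[str, int]:
    """Map captured $SYS payloads (topic -> text) to integer metrics.
    Only the leading integer is kept, so '4149612' and '4149612 bytes' both parse;
    topics missing from the capture are simply skipped."""
    values = {}
    for name, topic in SYS_TOPICS.items():
        match = re.match(r"\s*(\d+)", payloads.get(topic, ""))
        if match:
            values[name] = int(match.group(1))
    return values


def health_warnings(values: dict[str, int], heap_budget: int = 64 * 1024 * 1024) -> list[str]:
    """Return warnings; an empty list means nothing stood out.
    heap_budget is a hypothetical threshold, not one taken from the analysis."""
    warnings = []
    if values.get("dropped", 0) > 0:
        warnings.append(f"{values['dropped']} dropped publishes")
    if values.get("heap_current", 0) > heap_budget:
        warnings.append(f"heap at {values['heap_current']} bytes exceeds budget")
    return warnings
```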
### Traffic shape
The first sample had a large initial burst, but it was mostly explained by retained-state replay and Zigbee2MQTT bridge metadata delivered immediately after subscribing:
- 970 retained messages were seen on connect.
- Largest payloads were:
  - `zigbee2mqtt_2/bridge/definitions`: 245,350 bytes
  - `zigbee2mqtt/bridge/definitions`: 217,824 bytes
  - `zigbee2mqtt/bridge/devices`: 82,585 bytes
  - `zigbee2mqtt_2/bridge/devices`: 81,884 bytes

That explains the one-shot peak of 1084 messages/second during the broad sample. It looks like a subscription snapshot, not an ongoing flood.
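Telling a subscribe-time snapshot apart from a live flood comes down to counting retained deliveries and ranking them by size, as in the minimal sketch below. It reuses a hypothetical `Msg` capture record (arrival time, topic, payload bytes, retain flag as delivered); `top_n` is an illustrative parameter.

```python
import heapq
from typing import NamedTuple


class Msg(NamedTuple):
    ts: float       # seconds since subscribing (hypothetical capture format)
    topic: str
    size: int       # payload bytes
    retained: bool  # retain flag as delivered


def retained_snapshot(msgs: list[Msg], top_n: int = 4) -> tuple[int, list[tuple[str, int]]]:
    """Count retained deliveries and list the largest ones by payload size."""
    retained = [m for m in msgs if m.retained]
    largest = heapq.nlargest(top_n, retained, key=lambda m: m.size)
    return len(retained), [(m.topic, m.size) for m in largest]
```

If the retained count accounts for nearly all messages in the first second or two, the peak is a snapshot, which matches the 970 retained messages seen here.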
### Steady-state load
After excluding the initial retained burst:
| Metric | Value |
|---|---|
| Sample window | 60 seconds |
| Non-retained messages | 1511 |
| Non-retained bytes | 42,949 |
| Average rate | 25.18 messages/second |
| Peak second | 97 messages |
| Unique topics seen | 202 |
That is a modest steady-state load: about 25 messages and roughly 716 bytes per second on average (42,949 bytes over 60 seconds). The broker handles this rate without signs of distress.
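The steady-state metrics are plain aggregates over the filtered live messages; for example, the average rate is the message count divided by the window, 1511 / 60 ≈ 25.18 messages/second. A sketch, assuming the same hypothetical `Msg` record and 1-second buckets aligned to the start of the capture:

```python
from collections import Counter
from typing import NamedTuple


class Msg(NamedTuple):
    ts: float       # seconds since subscribing (hypothetical capture format)
    topic: str
    size: int       # payload bytes
    retained: bool  # retain flag as delivered


def steady_metrics(msgs: list[Msg], window_s: float) -> dict[str, float]:
    """Summarise an already-filtered list of live (non-retained) messages."""
    per_second = Counter(int(m.ts) for m in msgs)  # 1-second buckets
    return {
        "messages": len(msgs),
        "bytes": sum(m.size for m in msgs),
        "avg_rate": round(len(msgs) / window_s, 2),
        "peak_second": max(per_second.values(), default=0),
        "unique_topics": len({m.topic for m in msgs}),
    }
```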
### Noisiest publishers
The clear dominant talker is a Shelly EM3 namespace:
- The root prefix `shellies` accounted for 1362 of 1511 steady-state messages (about 90%).
- The top topics were all under `shellies/shellyem3-485519D91C40/emeter/...`.
- Individual EM3 topics appeared 35 times in 60 seconds (roughly one update every 1.7 seconds), which is chatty but not bandwidth-heavy.

Important nuance:
- This is mostly a message-count issue, not a bandwidth issue.
- In the same steady-state sample, `shellies` produced only 7004 bytes in total, about 5 bytes per message.

So the Shelly EM3 is the main source of ongoing chatter, but it does not currently look like a broker or network problem by itself.
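Ranking publishers by message count and by bytes separately is what exposes the "chatty but small" pattern. A minimal sketch over hypothetical `(topic, payload_bytes)` pairs, grouping by the first topic level:

```python
from collections import Counter


def by_root_prefix(msgs: list[tuple[str, int]]) -> tuple[Counter, Counter]:
    """Aggregate (topic, payload_bytes) pairs by first topic level.
    Returns (message counts, byte totals) keyed by root prefix."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for topic, size in msgs:
        root = topic.split("/", 1)[0]
        counts[root] += 1
        sizes[root] += size
    return counts, sizes


def describe(counts: Counter, sizes: Counter, root: str) -> str:
    """One-line summary: share of all messages and mean payload size for a prefix."""
    total = sum(counts.values())
    share = 100 * counts[root] / total
    avg = sizes[root] / counts[root]
    return f"{root}: {counts[root]}/{total} messages ({share:.0f}%), {avg:.1f} bytes/message"
```

A prefix that tops `counts` but sits low in `sizes` is a message-rate concern rather than a bandwidth one, which is the situation described for `shellies`.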
### Large payloads
Outside the retained startup burst, large payloads were minimal:
- Only one large non-retained payload was observed in the steady-state sample: `frigate/stats` (10,743 bytes).

Large payloads from `frigate` snapshots and `hass.agent` thumbnails appeared early in the broad capture, but they did not recur as sustained heavy traffic in the steady-state sample.
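Separating a one-off large payload from sustained heavy traffic means counting how often each topic crosses a size threshold. A minimal sketch over hypothetical `(topic, payload_bytes)` pairs; the 10,000-byte default is an illustrative cut-off, not one used in the analysis:

```python
def large_payload_topics(msgs: list[tuple[str, int]],
                         threshold: int = 10_000) -> dict[str, tuple[int, int]]:
    """Per topic: how many payloads reached `threshold` bytes, and the largest one.
    A count of 1 suggests a one-off; a high count suggests sustained heavy traffic.
    The default threshold is illustrative only."""
    stats: dict[str, tuple[int, int]] = {}
    for topic, size in msgs:
        if size >= threshold:
            count, biggest = stats.get(topic, (0, 0))
            stats[topic] = (count + 1, max(biggest, size))
    return stats
```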
### Topic naming oddities
Several Zigbee2MQTT topics contain spaces, for example:
- `zigbee2mqtt/Btcino coso salotto/availability`
- `zigbee2mqtt/Letty Condizionatore Ufficio/availability`

This is not a broker anomaly, but it is worth noting:
- it can make tooling and ad-hoc topic handling more brittle
- it increases the chance of mistakes in scripts, automations, and CLI work
If you want cleaner topic hygiene, consider slug-style friendly names for Zigbee2MQTT devices.
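A slug-style convention can be sketched as lower-casing, dropping accents, and collapsing anything non-alphanumeric into underscores. The underscore separator is an assumed convention, not a Zigbee2MQTT requirement, and renaming a device changes its topic, so automations that reference the old name would need updating. The second helper lists the topics that would be affected.

```python
import re
import unicodedata


def slug_name(friendly_name: str) -> str:
    """Lower-case, ASCII-only, underscore-separated version of a friendly name
    (hypothetical naming convention)."""
    ascii_only = (
        unicodedata.normalize("NFKD", friendly_name)
        .encode("ascii", "ignore")  # drops combining accents left by NFKD
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_only.lower()).strip("_")


def topics_with_spaces(topics: list[str]) -> list[str]:
    """Return the distinct topics that contain a space, sorted."""
    return sorted({t for t in topics if " " in t})
```

For example, `slug_name("Letty Condizionatore Ufficio")` yields `letty_condizionatore_ufficio`.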
## Conclusion
### What looks healthy
- No dropped publishes
- No sign of broker backlog or unstable client churn
- Retained store is present but not unusually large
- Heap usage is not alarming
- Steady-state traffic volume is modest
### What stands out
- A retained-state burst on subscribe, mostly from Zigbee2MQTT bridge metadata. This is expected behavior and not a live flood.
- A very chatty Shelly EM3 publisher dominating message count. It is the main thing to watch, but at current byte volume it does not look harmful.
- Topic names with spaces in Zigbee2MQTT. Not a performance issue, but a maintainability footgun.
## Recommendation
No urgent remediation is indicated from this pass.
If you want to reduce noise further, the best next place to look is the publish frequency and configuration of `shellies/shellyem3-485519D91C40`, since that device is responsible for most of the ongoing message count.