# MQTT broker broad analysis
## Scope
Broad passive review of the MQTT broker used for **casa**, focused on signs of broker stress, unusual traffic, excessive retained state, or noisy publishers that could affect performance or the network.
## Method
1. Verified broker reachability on TCP/1883.
2. Took a **45-second full subscription sample** (`#` + `$SYS/#`) to capture retained-state bursts and broker metrics.
3. Took a **60-second steady-state sample** with a **5-second warmup ignored** to separate normal retained snapshots from ongoing traffic.
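
The warmup split in step 3 can be sketched as follows. This is a minimal sketch, assuming the capture was already stored as records holding the time since subscribing, the topic, the payload size, and the retain flag as delivered to the subscriber; the `Msg` type, the `steady_state` helper, and the `shellies/example/...` topic are hypothetical names for illustration, not the tooling actually used. It relies on MQTT 3.1.1 behavior, where the broker sets the retain flag only on messages delivered because of a new subscription.

```python
from typing import NamedTuple


class Msg(NamedTuple):
    """One captured message (hypothetical record layout)."""
    t: float        # seconds since the subscription was made
    topic: str
    size: int       # payload size in bytes
    retain: bool    # retain flag as delivered to the subscriber


def steady_state(msgs, warmup=5.0):
    """Drop the warmup window and retained replays, keep live traffic."""
    return [m for m in msgs if m.t >= warmup and not m.retain]


capture = [
    Msg(0.1, "zigbee2mqtt/bridge/devices", 82585, True),    # retained replay
    Msg(2.0, "shellies/example/emeter/0/power", 5, False),  # inside warmup
    Msg(7.5, "shellies/example/emeter/0/power", 5, False),  # live traffic
]
live = steady_state(capture)
print(len(live))  # 1
```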
## Findings
### Overall status
**No obvious broker health issue showed up.** The broker appears stable and not under noticeable pressure:

| Signal | Observation |
| --- | --- |
| Connected clients | 46 |
| Max clients seen | 47 |
| Subscriptions | 945 |
| Dropped publishes | 0 |
| Retained messages stored | 1026 |
| Retained store size | 1,494,548 bytes |
| Heap current / max | 4,149,612 / 4,887,547 bytes |

The `$SYS` counters did **not** suggest backlog, churn, or message loss.
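
A few derived figures make the raw counters easier to judge. The sketch below only does arithmetic on the values reported in the table; the dictionary keys and the `summarize` helper are hypothetical labels, not broker topic names.

```python
# Values copied from the table above; the keys are hypothetical labels.
stats = {
    "clients_connected": 46,
    "clients_max": 47,
    "dropped_publishes": 0,
    "retained_count": 1026,
    "retained_bytes": 1_494_548,
    "heap_current": 4_149_612,
    "heap_max": 4_887_547,
}


def summarize(s):
    """Derive a few ratios from the raw counters."""
    return {
        "avg_retained_bytes": s["retained_bytes"] / s["retained_count"],
        "heap_current_mib": s["heap_current"] / 2**20,
        "heap_max_mib": s["heap_max"] / 2**20,
        "clients_below_peak": s["clients_max"] - s["clients_connected"],
        "any_drops": s["dropped_publishes"] > 0,
    }


summary = summarize(stats)
print(round(summary["avg_retained_bytes"]))   # 1457
print(round(summary["heap_current_mib"], 2))  # 3.96
print(summary["clients_below_peak"])          # 1
```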
### Traffic shape
The first sample had a large initial burst, but it was mostly explained by **retained state replay** and **Zigbee2MQTT bridge metadata** sent immediately after subscribing:

- 970 retained messages were seen on connect.
- Largest payloads were:
  - `zigbee2mqtt_2/bridge/definitions` - 245,350 bytes
  - `zigbee2mqtt/bridge/definitions` - 217,824 bytes
  - `zigbee2mqtt/bridge/devices` - 82,585 bytes
  - `zigbee2mqtt_2/bridge/devices` - 81,884 bytes

That explains the one-shot peak of **1084 messages/second** during the broad sample. It looks like a subscription snapshot, **not** an ongoing flood.
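
One way to tell a snapshot from a flood is to find the busiest one-second bucket and check how much of it was retained replay. The following is a minimal sketch on synthetic data, assuming each captured message is reduced to a `(seconds_since_subscribe, retain_flag)` pair; `peak_second` is a hypothetical helper and expects a non-empty capture.

```python
from collections import Counter


def peak_second(msgs):
    """Return (second, total, retained) for the busiest one-second bucket.

    msgs is an iterable of (t_seconds, retain_flag) pairs and must not be empty.
    """
    total = Counter()
    retained = Counter()
    for t, retain in msgs:
        sec = int(t)
        total[sec] += 1
        if retain:
            retained[sec] += 1
    sec, count = max(total.items(), key=lambda kv: kv[1])
    return sec, count, retained[sec]


# Synthetic data: a retained replay in second 0, then sparse live traffic.
msgs = [(0.01 * i, True) for i in range(50)]
msgs += [(1.5, False), (2.5, False), (2.7, False)]
sec, count, kept = peak_second(msgs)
print(sec, count, kept)  # 0 50 50
```

If nearly all messages in the peak second carry the retain flag, the peak is the subscription snapshot rather than live traffic.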
### Steady-state load
After excluding the initial retained burst:

| Metric | Value |
| --- | --- |
| Sample window | 60 seconds |
| Non-retained messages | 1511 |
| Non-retained bytes | 42,949 |
| Average rate | 25.18 messages/second |
| Peak second | 97 messages |
| Unique topics seen | 202 |

That is a modest steady-state load: about 25 messages/second and roughly 716 bytes/second on average. The broker is handling this rate without signs of distress.
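
The metrics in the table can be derived from a capture along these lines. This is a sketch under the assumption that non-retained messages are available as `(t_seconds, topic, payload_bytes)` tuples; `steady_metrics` is a hypothetical helper and the sample data is synthetic.

```python
from collections import Counter


def steady_metrics(msgs, window_s):
    """Summarize non-retained traffic over a window of window_s seconds."""
    per_second = Counter(int(t) for t, _, _ in msgs)
    return {
        "messages": len(msgs),
        "bytes": sum(size for _, _, size in msgs),
        "avg_rate": len(msgs) / window_s,
        "peak_second": max(per_second.values(), default=0),
        "unique_topics": len({topic for _, topic, _ in msgs}),
    }


sample = [(0.2, "a/x", 10), (0.8, "a/y", 20), (1.1, "a/x", 10)]
print(steady_metrics(sample, window_s=2))

# Cross-check of the reported averages (figures from the table above).
print(round(1511 / 60, 2))     # 25.18 messages/second
print(round(42_949 / 60, 1))   # 715.8 bytes/second
print(round(42_949 / 1511, 2)) # 28.42 bytes per message
```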
### Noisiest publishers
The clear dominant talker is a **Shelly EM3** namespace:

- Root prefix `shellies` accounted for **1362 / 1511** steady-state messages (about 90%).
- The top topics were all from `shellies/shellyem3-485519D91C40/emeter/...`.
- Individual EM3 topics appeared **35 times in 60 seconds**, which is chatty but not bandwidth-heavy.

Important nuance:

- This is mostly a **message-count** issue, not a **bandwidth** issue.
- The same steady-state sample shows `shellies` produced only **7004 bytes** total, about 16% of the 42,949 non-retained bytes.

So the Shelly EM3 is the main source of ongoing chatter, but it does **not** currently look like a broker or network problem by itself.
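
The per-prefix breakdown can be reproduced by grouping on the first topic level. This is a minimal sketch, assuming the steady-state capture is a list of `(topic, payload_bytes)` pairs; `by_root_prefix` is a hypothetical helper, the sample topics are synthetic, and the two share calculations use the figures reported above.

```python
from collections import defaultdict


def by_root_prefix(msgs):
    """Aggregate (topic, payload_bytes) pairs by first topic level.

    Returns {prefix: (message_count, total_bytes)}.
    """
    agg = defaultdict(lambda: [0, 0])
    for topic, size in msgs:
        root = topic.split("/", 1)[0]
        agg[root][0] += 1
        agg[root][1] += size
    return {root: tuple(totals) for root, totals in agg.items()}


sample = [
    ("shellies/example/emeter/0/power", 5),
    ("shellies/example/emeter/1/power", 6),
    ("frigate/stats", 10_743),
]
print(by_root_prefix(sample))
# {'shellies': (2, 11), 'frigate': (1, 10743)}

# Shares of the steady-state sample attributed to `shellies`.
print(f"{1362 / 1511:.1%}")    # 90.1% of messages
print(f"{7004 / 42_949:.1%}")  # 16.3% of bytes
```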
### Large payloads
Outside the retained startup burst, large payloads were minimal:

- Only one large non-retained payload was observed in the steady-state sample:
  - `frigate/stats` - 10,743 bytes

Early large-byte topics from `frigate` snapshots and `hass.agent` thumbnails appeared in the broad capture, but they did **not** show up as sustained heavy traffic in the steady-state sample.
### Topic naming oddities
Several Zigbee2MQTT topics contain spaces, for example:

- `zigbee2mqtt/Btcino coso salotto/availability`
- `zigbee2mqtt/Letty Condizionatore Ufficio/availability`

This is **not** a broker anomaly, but it is worth noting:

- it can make tooling and ad-hoc topic handling more brittle
- it increases the chance of mistakes in scripts, automations, and CLI work

If you want cleaner topic hygiene, consider slug-style friendly names for Zigbee2MQTT devices.
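
As an illustration of what slug-style names could look like, the sketch below flags topics that contain spaces and proposes a replacement name. The `slugify` and `topics_with_spaces` helpers and the underscore convention are assumptions, not an existing Zigbee2MQTT feature.

```python
import re
import unicodedata


def slugify(name):
    """Lowercase, strip accents, and replace other character runs with '_'."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


def topics_with_spaces(topics):
    """Return the topics that contain at least one space."""
    return [t for t in topics if " " in t]


topics = [
    "zigbee2mqtt/Btcino coso salotto/availability",
    "zigbee2mqtt/Letty Condizionatore Ufficio/availability",
    "zigbee2mqtt/bridge/devices",
]
for topic in topics_with_spaces(topics):
    friendly_name = topic.split("/")[1]
    print(friendly_name, "->", slugify(friendly_name))
# Btcino coso salotto -> btcino_coso_salotto
# Letty Condizionatore Ufficio -> letty_condizionatore_ufficio
```

Because the friendly name is part of the topic, renaming a device changes its topics, so any subscribers or automations that reference the old name would need updating.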
## Conclusion
### What looks healthy
- No dropped publishes
- No sign of broker backlog or unstable client churn
- Retained store is present but not unusually large
- Heap usage is not alarming
- Steady-state traffic volume is modest
### What stands out
1. **A retained-state burst on subscribe**, mostly from Zigbee2MQTT bridge metadata. This is expected behavior and not a live flood.
2. **A very chatty Shelly EM3 publisher** dominating message count. It is the main thing to watch, but at current byte volume it does not look harmful.
3. **Topic names with spaces** in Zigbee2MQTT. Not a performance issue, but a maintainability footgun.
## Recommendation
No urgent remediation is indicated from this pass.
If you want to reduce noise further, the best next place to look would be the publish frequency/config of `shellies/shellyem3-485519D91C40`, since that device is responsible for most of the ongoing message count.