# MQTT broker broad analysis

## Scope

Broad passive review of the MQTT broker used for **casa**, focused on signs of broker stress, unusual traffic, excessive retained state, or noisy publishers that could affect performance or the network.

## Method

1. Verified broker reachability on TCP/1883.
2. Took a **45-second full subscription sample** (`#` + `$SYS/#`) to capture retained-state bursts and broker metrics.
3. Took a **60-second steady-state sample** with a **5-second warmup ignored** to separate normal retained snapshots from ongoing traffic.

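The split in step 3 between the retained snapshot and ongoing traffic can be sketched as a filter over captured messages. This is a minimal illustration, not the tooling used for this review: `Msg` and `steady_state` are hypothetical names, and the sketch assumes the 60-second window starts once the 5-second warmup has elapsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    t: float        # seconds since the subscription started
    topic: str
    size: int       # payload size in bytes
    retained: bool  # retain flag as delivered to the subscriber


def steady_state(msgs, warmup=5.0, window=60.0):
    """Keep non-retained messages that arrive after the warmup, within the window."""
    end = warmup + window  # assumption: the window is measured after the warmup
    return [m for m in msgs if not m.retained and warmup <= m.t < end]


# Tiny made-up capture, only to show the filter's behaviour.
capture = [
    Msg(0.1, "a/retained", 10, True),   # retained replay: dropped
    Msg(2.0, "b/early", 5, False),      # inside the warmup: dropped
    Msg(6.0, "c/live", 7, False),       # kept
    Msg(10.0, "d/retained", 3, True),   # retained flag set: dropped
    Msg(70.0, "e/late", 1, False),      # after the window: dropped
]
kept = steady_state(capture)
```

Filtering on both the retain flag and the warmup time is deliberately redundant: the flag removes the snapshot itself, while the warmup also discards any live traffic that overlaps the initial burst.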
## Findings

### Overall status

**No obvious broker health issue showed up.** The broker appears stable and not under noticeable pressure:

| Signal | Observation |
| --- | --- |
| Connected clients | 46 |
| Max clients seen | 47 |
| Subscriptions | 945 |
| Dropped publishes | 0 |
| Retained messages stored | 1026 |
| Retained store size | 1,494,548 bytes |
| Heap current / max | 4,149,612 / 4,887,547 bytes |

The `$SYS` counters did **not** suggest backlog, churn, or message loss.

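As a rough illustration of how the counters above can be read together, the following sketch derives a few ratios from the table values. The function name and dictionary keys are hypothetical, and how to interpret the heap ratio depends on what the broker actually reports as "max" (a peak observed so far versus a configured limit), which this review does not state.

```python
def health_summary(clients, subscriptions, dropped, heap_current, heap_max):
    """Derive a few simple ratios from counters read off the broker."""
    return {
        "message_loss_seen": dropped > 0,
        "heap_ratio": heap_current / heap_max,        # current heap relative to reported max
        "subs_per_client": subscriptions / clients,   # average subscriptions per client
    }


# Values copied from the table in this section.
summary = health_summary(
    clients=46,
    subscriptions=945,
    dropped=0,
    heap_current=4_149_612,
    heap_max=4_887_547,
)
```

With these inputs the heap ratio is about 0.85 and the average is about 20.5 subscriptions per client, with no message loss flagged.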
### Traffic shape

The first sample had a large initial burst, but it was mostly explained by **retained state replay** and **Zigbee2MQTT bridge metadata** sent immediately after subscribing:

- 970 retained messages were seen on connect.
- Largest payloads were:
  - `zigbee2mqtt_2/bridge/definitions` - 245,350 bytes
  - `zigbee2mqtt/bridge/definitions` - 217,824 bytes
  - `zigbee2mqtt/bridge/devices` - 82,585 bytes
  - `zigbee2mqtt_2/bridge/devices` - 81,884 bytes

That explains the one-shot peak of **1084 messages/second** during the broad sample. It looks like a subscription snapshot, **not** an ongoing flood.

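A peak-per-second figure like the one above can be obtained by bucketing message arrival times into whole seconds. The sketch below assumes timestamps are seconds relative to the start of the subscription; `peak_per_second` is a hypothetical helper, not the script used for this review.

```python
from collections import Counter
from math import floor


def peak_per_second(timestamps):
    """Return (second_index, count) for the busiest one-second bucket, or None."""
    buckets = Counter(floor(t) for t in timestamps)
    if not buckets:
        return None
    # Highest count wins; ties go to the earliest second.
    return max(buckets.items(), key=lambda kv: (kv[1], -kv[0]))


# Made-up arrival times: three messages in second 0, one in second 1, two in second 3.
example = peak_per_second([0.1, 0.2, 0.9, 1.5, 3.0, 3.1])
```

If the busiest bucket falls in the first second or two after subscribing, as in this example, the peak is consistent with a retained snapshot. A peak that recurs later in the capture would point to live traffic instead.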
### Steady-state load

After excluding the initial retained burst:

| Metric | Value |
| --- | --- |
| Sample window | 60 seconds |
| Non-retained messages | 1511 |
| Non-retained bytes | 42,949 |
| Average rate | 25.18 messages/second |
| Peak second | 97 messages |
| Unique topics seen | 202 |

That is a modest steady-state load: about 25 messages and roughly 716 bytes per second on average, with no signs of broker distress.

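The averages follow directly from the table. A short worked calculation, using only the figures reported above (the variable names are illustrative):

```python
# Figures from the steady-state table.
window_s = 60
messages = 1511
total_bytes = 42_949
peak_second = 97

avg_rate = messages / window_s           # messages per second
avg_bytes_per_s = total_bytes / window_s # bytes per second
avg_msg_size = total_bytes / messages    # bytes per message
peak_to_avg = peak_second / avg_rate     # burstiness of the busiest second

print(f"{avg_rate:.2f} msg/s, {avg_bytes_per_s:.1f} B/s, "
      f"{avg_msg_size:.1f} B/msg, peak/avg {peak_to_avg:.2f}")
```

The small average message size (under 30 bytes) is why the message count, rather than the byte volume, is the more informative number for this broker.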
### Noisiest publishers

The clearly dominant talker is a **Shelly EM3** namespace:

- Root prefix `shellies` accounted for **1362 / 1511** steady-state messages (about 90%).
- The top topics were all from `shellies/shellyem3-485519D91C40/emeter/...`.
- Individual EM3 topics appeared **35 times in 60 seconds** (roughly one update every 1.7 seconds per topic), which is chatty but not bandwidth-heavy.

Important nuance:

- This is mostly a **message-count** issue, not a **bandwidth** issue.
- The same steady-state sample shows `shellies` produced only **7004 bytes** total (about 16% of the 42,949 non-retained bytes).

So the Shelly EM3 is the main source of ongoing chatter, but it does **not** currently look like a broker or network problem by itself.

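The count-versus-bytes distinction can be made visible by grouping captured messages by their root topic prefix and tallying both dimensions. This is a minimal sketch under the assumption that each captured message is available as a `(topic, payload_size)` pair; `by_root_prefix` is a hypothetical helper and the sample data is made up.

```python
from collections import defaultdict


def by_root_prefix(messages):
    """Tally message count and payload bytes per first topic segment.

    messages: iterable of (topic, payload_size_in_bytes) pairs.
    Returns a list of (root, count, total_bytes), busiest root first.
    """
    totals = defaultdict(lambda: [0, 0])
    for topic, size in messages:
        root = topic.split("/", 1)[0]
        totals[root][0] += 1
        totals[root][1] += size
    rows = [(root, count, size) for root, (count, size) in totals.items()]
    return sorted(rows, key=lambda r: (-r[1], r[0]))


# Made-up sample: many tiny messages from one root, one large message from another.
sample = [
    ("shellies/dev/emeter/0/power", 5),
    ("shellies/dev/emeter/1/power", 5),
    ("shellies/dev/emeter/2/power", 6),
    ("frigate/stats", 10_000),
]
ranking = by_root_prefix(sample)
```

Sorting the same rows by bytes instead of count would put a different root on top, which is the situation described above: a publisher can dominate one ranking without mattering much in the other.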
### Large payloads

Outside the retained startup burst, large payloads were minimal:

- Only one large non-retained payload was observed in the steady-state sample:
  - `frigate/stats` - 10,743 bytes

Early large-byte topics from `frigate` snapshots and `hass.agent` thumbnails appeared in the broad capture, but they did **not** show up as sustained heavy traffic in the steady-state sample.

### Topic naming oddities

Several Zigbee2MQTT topics contain spaces, for example:

- `zigbee2mqtt/Btcino coso salotto/availability`
- `zigbee2mqtt/Letty Condizionatore Ufficio/availability`

This is **not** a broker anomaly, but it is worth noting:

- it can make tooling and ad-hoc topic handling more brittle
- it increases the chance of mistakes in scripts, automations, and CLI work

If you want cleaner topic hygiene, consider slug-style friendly names for Zigbee2MQTT devices.

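One possible way to derive slug-style names from the existing friendly names is sketched below. The `slugify` helper and its underscore separator are illustrative choices, not a Zigbee2MQTT convention. Because the topic follows the friendly name, anything subscribed to the old topics would need updating after a rename.

```python
import re
import unicodedata


def slugify(name, sep="_"):
    """Lowercase a friendly name and collapse anything non-alphanumeric into `sep`."""
    # Strip accents by decomposing and dropping non-ASCII code points.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", sep, ascii_name.lower()).strip(sep)


# Friendly names taken from the example topics above.
slugs = [slugify(n) for n in ("Btcino coso salotto", "Letty Condizionatore Ufficio")]
```

The resulting names contain no spaces, so they need no quoting in shell commands and no escaping in automation templates.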
## Conclusion

### What looks healthy

- No dropped publishes
- No sign of broker backlog or unstable client churn
- Retained store is present but not unusually large
- Heap usage is not alarming
- Steady-state traffic volume is modest

### What stands out

1. **A retained-state burst on subscribe**, mostly from Zigbee2MQTT bridge metadata. This is expected behavior and not a live flood.
2. **A very chatty Shelly EM3 publisher** dominating message count. It is the main thing to watch, but at current byte volume it does not look harmful.
3. **Topic names with spaces** in Zigbee2MQTT. Not a performance issue, but a maintainability footgun.

## Recommendation

No urgent remediation is indicated from this pass.

If you want to reduce noise further, the next place to look is the publish frequency and configuration of `shellies/shellyem3-485519D91C40`, since that device is responsible for most of the ongoing message count.