Rewriting Home Assistant Long-term statistics from InfluxDB

In an earlier post, I made an error that incorrectly aggregated the energy data which resulted in hugely inflated aggregated energy usage. All the un-aggregated data was accurate, but the sums were wrong. Luckily I had all the raw data stored in InfluxDB and could rebuild it.

In this post, I walk through how to re-write the Home Assistant Long-term statistics database to fix this mistake.

A grossly high electric bill

First, I opened up the Influx UI and constructed a query to generate the corrected metrics. The HA database requires two fields: sum and state. One describes the cumulative running sum, and the other is the sum at the end of the hour. I didn’t fully understand how sum differed from state, so I just assumed they were the same.

from(bucket: "energy-1h")
  |> range(start: -365d, stop: now())
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "sensor")
  |> filter(fn: (r) => r["entity_id"] != "main_panel_total_energy" and r["entity_id"] != "sub_panel_32_total_power" and r["entity_id"] != "daily_electricity_usage")
  |> group(columns: ["_measurement", "_field"])
  |> aggregateWindow(every: 1h, fn: sum, createEmpty: false)
  |> cumulativeSum()
  |> yield(name: "mean")

I ran this query, then clicked download to CSV.

That gave me a file that looked like the following:


Delete first 3 lines and one of the two trailing lines.

# tail -n+4 '2022-11-30_23 38_influxdb_data.csv' | head -n-1 > influxdata.csv

Make sure the file is nearby on the host running HomeAssistant, then stop HA and open up the file:

# sqlite3 home-assistant_v2.db
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.

I looked at my target table schema:

sqlite> .schema statistics 
CREATE TABLE statistics (
        id INTEGER NOT NULL, 
        created DATETIME, 
        start DATETIME, 
        mean FLOAT, 
        min FLOAT, 
        max FLOAT, 
        last_reset DATETIME, 
        state FLOAT, 
        sum FLOAT, 
        metadata_id INTEGER, 
        PRIMARY KEY (id), 
        FOREIGN KEY(metadata_id) REFERENCES statistics_meta (id) ON DELETE CASCADE
CREATE UNIQUE INDEX ix_statistics_statistic_id_start ON statistics (metadata_id, start);
CREATE INDEX ix_statistics_start ON statistics (start);
CREATE INDEX ix_statistics_metadata_id ON statistics (metadata_id);

Then import the CSV file into the database

sqlite> .mode csv
sqlite> .import "../influxdata.csv" influx
sqlite> .schema statistics
sqlite> .schema influx    
  "" TEXT,
  "result" TEXT,
  "table" TEXT,
  "_field" TEXT,
  "_measurement" TEXT,
  "_start" TEXT,
  "_stop" TEXT,
  "_time" TEXT,
  "_value" TEXT,
  "domain" TEXT,
  "entity_id" TEXT,
  "state" TEXT

Now we’re reading to replace the data. First, find the id of the entity to replace. The first column will be the id

sqlite> select * from statistics_meta where statistic_id = 'sensor.main_panel_total_energy';

sqlite> delete from statistics where metadata_id = 1;

Then insert the data and drop the temp table

sqlite> insert into statistics select null AS id, datetime(_time) as created, datetime(_time, '-1 hour') as start, null as mean, null as min, null as max, null as last_reset, cast(_value as float) as state, cast(_value as float) as sum, 1 as metadata_id from influx;

sqlite> drop table influx;

Then delete the history for the energy_cost because at the end of the next hour, HA will distort the metrics again:

sqlite> delete from states where entity_id = 'sensor.main_panel_total_power' or entity_id = 'sensor.main_panel_total_energy' or entity_id = 'sensor.main_panel_total_energy_cost';

Home Assistant stores some intermediary data in the statistics_short_term table. This may cause problems and I’ve had to delete those records sometimes

sqlite> delete from statistics_short_term where metadata_id = 1;

Then open home-assistant/.storage/core.restore_state and change the entry for main_panel_total_energy to match the latest value from InfluxDB.

Restart Home Assistant.

VoilĂ . The data is corrected:

Leave a Reply

Your email address will not be published. Required fields are marked *