# SQLite Backup & Data Integrity Plan

## Status Legend

- `[ ]` To do
- `[x]` Done
- `[~]` Partial / needs review

---

## Phase 1 — WAL Mode

**Goal:** Ensure SQLite uses Write-Ahead Logging for safe concurrent reads and hot backups.

- [ ] Connect to the DB and verify WAL is active:
  ```bash
  sqlite3 /path/to/your.db "PRAGMA journal_mode;"
  # Expected output: wal
  ```
- [ ] If not `wal`, enable it (run once; the setting is stored in the database file and persists):
  ```bash
  sqlite3 /path/to/your.db "PRAGMA journal_mode=WAL;"
  ```
- [ ] Confirm the `-wal` and `-shm` sidecar files appear next to the `.db` file while a connection is open after a write (SQLite normally deletes them when the last connection closes, so their absence between requests is not an error)
- [ ] Make sure the user PHP runs as can write to the *directory* containing the `.db`, not just the file itself: SQLite creates and deletes the `-wal` and `-shm` files there, and they need the same owner as the `.db`

---

## Phase 2 — Audit Log

**Goal:** Record every INSERT, UPDATE, and DELETE with the actor, timestamp, and a before/after snapshot.

### 2.1 — Create the table

- [ ] Add the `audit_log` table to the DB:
  ```sql
  CREATE TABLE IF NOT EXISTS audit_log (
      id         INTEGER PRIMARY KEY AUTOINCREMENT,
      timestamp  TEXT NOT NULL DEFAULT (datetime('now')),  -- UTC
      actor      TEXT NOT NULL,
      action     TEXT NOT NULL CHECK(action IN ('INSERT','UPDATE','DELETE')),
      table_name TEXT NOT NULL,
      record_id  INTEGER,
      old_data   TEXT,  -- JSON snapshot of the row before the change
      new_data   TEXT   -- JSON snapshot of the row after the change
  );
  ```

### 2.2 — Instrument PHP mutations

- [ ] Create a reusable `audit()` helper in PHP that accepts `$db, $actor, $action, $table, $id, $old, $new` and stores `$old`/`$new` as JSON; call it inside the same transaction as the write it records, so the data and the log cannot diverge
- [ ] Wrap every **DELETE** in the admin dashboard with `audit()`, capturing the row before deletion
- [ ] Wrap every **UPDATE** (form submissions and admin edits) with `audit()`, capturing before/after
- [ ] Wrap **INSERTs** for completeness (`new_data` only)
- [ ] Verify by triggering a test delete and querying `SELECT * FROM audit_log ORDER BY id DESC LIMIT 5;`

### 2.3 — Protect the audit log

- [ ] No UI should expose a "clear audit log" button
- [ ] SQLite has no per-user permissions (no `GRANT`/`REVOKE`), so any PHP connection that can write to the file can also delete from `audit_log`. Instead, add `BEFORE UPDATE` and `BEFORE DELETE` triggers on `audit_log` that call `RAISE(ABORT, ...)`, which makes the table append-only for application code. Anyone with shell access to the file can still drop the triggers, so this guards against bugs, not attackers

---

## Phase 3 — Soft Deletes

**Goal:** Prevent hard DELETEs on critical tables so data is always
recoverable instantly. Existing htmx elements that query `languages`/`keywords` must keep working transparently.

### 3.1 — Schema changes

- [ ] Identify all tables that htmx elements query (e.g. `languages`, `keywords`, any lookup/reference tables)
- [ ] Add a nullable `deleted_at` column to each (`NULL` means the row is live):
  ```sql
  ALTER TABLE languages ADD COLUMN deleted_at TEXT DEFAULT NULL;
  ALTER TABLE keywords  ADD COLUMN deleted_at TEXT DEFAULT NULL;
  -- repeat for other affected tables
  ```

### 3.2 — Replace DELETE queries

- [ ] Search the codebase (case-insensitively) for `DELETE FROM languages`, `DELETE FROM keywords`, etc.
- [ ] Replace each hard DELETE with a soft delete:
  ```php
  // Before
  $db->prepare("DELETE FROM languages WHERE id = ?")->execute([$id]);

  // After: mark the row as deleted instead of removing it.
  // "AND deleted_at IS NULL" keeps the original deletion time if the row is deleted twice.
  $db->prepare("UPDATE languages SET deleted_at = datetime('now') WHERE id = ? AND deleted_at IS NULL")
     ->execute([$id]);
  ```
- [ ] Keep the `audit()` call on each soft delete and record it with action `'DELETE'`, so the audit trail still shows who deleted what
- [ ] Do the same in any admin dashboard bulk-delete operations

### 3.3 — Filter deleted rows everywhere

- [ ] Add `WHERE deleted_at IS NULL` to **every** SELECT that feeds an htmx endpoint:
  ```sql
  -- Example
  SELECT * FROM languages WHERE deleted_at IS NULL ORDER BY name;
  SELECT * FROM keywords  WHERE deleted_at IS NULL ORDER BY name;
  ```
- [ ] Search all PHP files for every query that reads `languages` or `keywords` (not only `SELECT * FROM ...`, but also queries with explicit column lists and JOINs) and patch each one
- [ ] Test each htmx-driven element (dropdowns, tag lists, autocompletes) to confirm deleted entries no longer appear

### 3.4 — Admin: show soft-deleted entries

- [ ] Add an admin view that lists soft-deleted rows (`WHERE deleted_at IS NOT NULL`) with a **Restore** button
- [ ] The restore action sets `deleted_at = NULL` (log it with `audit()` as an `UPDATE`)

---

## Phase 4 — Hourly Snapshots via Cronjob

**Goal:** Automatically save compressed, timestamped copies of the DB locally: hourly snapshots kept for 30 days, daily snapshots for 90 days.
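A snapshot is only useful if it can actually be opened, so it helps to have a quick check that a given `.db.gz` decompresses cleanly and passes SQLite's own consistency check. The function below is a minimal sketch. It assumes bash, `gzip`, and the `sqlite3` CLI are installed. `verify_snapshot` is a hypothetical helper name and is not defined anywhere else in this plan. It decompresses the snapshot into a scratch file and runs `PRAGMA integrity_check`, which prints exactly `ok` for a healthy database.

```bash
#!/bin/bash
# Sketch only: verify_snapshot is a hypothetical helper, not part of the plan.
# Assumes gzip and the sqlite3 CLI are on PATH.

# Usage: verify_snapshot /path/to/db-<TIMESTAMP>.db.gz
# Returns 0 and prints "OK: <file>" if the snapshot decompresses and passes
# PRAGMA integrity_check; otherwise returns 1 with a message on stderr.
verify_snapshot() {
  local snapshot="$1"
  local tmp result

  if [[ ! -f "$snapshot" ]]; then
    echo "missing snapshot: $snapshot" >&2
    return 1
  fi

  tmp=$(mktemp) || return 1

  # Decompress into a scratch file so the stored snapshot is never opened for writing
  if ! gzip -dc "$snapshot" > "$tmp"; then
    echo "gzip failed: $snapshot" >&2
    rm -f "$tmp"
    return 1
  fi

  # integrity_check prints exactly "ok" when no problems are found;
  # errors such as "file is not a database" are captured via 2>&1
  result=$(sqlite3 "$tmp" "PRAGMA integrity_check;" 2>&1)

  # Remove the scratch copy plus any WAL sidecars sqlite3 may have created
  rm -f "$tmp" "$tmp-wal" "$tmp-shm"

  if [[ "$result" != "ok" ]]; then
    echo "integrity_check failed for $snapshot: $result" >&2
    return 1
  fi

  echo "OK: $snapshot"
}
```

You can source it into an interactive shell and point it at any file the backup script writes, or call it from a periodic check. A non-zero return means you should not rely on that snapshot.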
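### 4.1 — Create the backup script

- [ ] Create `/usr/local/bin/backup-sqlite.sh`:
  ```bash
  #!/bin/bash
  set -euo pipefail

  DB_PATH="/var/www/myapp/database.db"
  # Each tier (hourly, daily) gets its own folder, so one tier's pruning never deletes the other's files
  BACKUP_TIER="${BACKUP_TIER:-hourly}"
  BACKUP_DIR="/var/backups/myapp/$BACKUP_TIER"
  RETENTION_DAYS="${RETENTION_DAYS:-30}"

  TIMESTAMP=$(date +"%Y-%m-%dT%H-%M-%S")
  BACKUP_FILE="$BACKUP_DIR/db-$TIMESTAMP.db.gz"

  mkdir -p "$BACKUP_DIR"

  # Unique temp file per run, so the hourly and daily jobs can safely overlap at 02:00
  TMP_SNAPSHOT=$(mktemp /tmp/myapp-snapshot.XXXXXX)
  trap 'rm -f "$TMP_SNAPSHOT" "$TMP_SNAPSHOT-wal" "$TMP_SNAPSHOT-shm"' EXIT

  # Safe hot backup using SQLite's online backup API
  sqlite3 "$DB_PATH" ".backup '$TMP_SNAPSHOT'"

  # Do not keep a snapshot that fails SQLite's own consistency check
  if [[ "$(sqlite3 "$TMP_SNAPSHOT" 'PRAGMA integrity_check;')" != "ok" ]]; then
    echo "[$(date)] integrity_check failed, snapshot discarded" >&2
    exit 1
  fi

  gzip -c "$TMP_SNAPSHOT" > "$BACKUP_FILE"

  # Prune old backups (this tier only)
  find "$BACKUP_DIR" -name "*.db.gz" -mtime +"$RETENTION_DAYS" -delete

  echo "[$(date)] Backup written: $BACKUP_FILE"
  ```
- [ ] Make it executable:
  ```bash
  chmod +x /usr/local/bin/backup-sqlite.sh
  ```
- [ ] Run it manually once and verify that a `.db.gz` file appears in `/var/backups/myapp/hourly/`
- [ ] Test a restore by decompressing and opening the snapshot (replace `<TIMESTAMP>` with a real snapshot's timestamp):
  ```bash
  gunzip -c /var/backups/myapp/hourly/db-<TIMESTAMP>.db.gz > /tmp/test-restore.db
  sqlite3 /tmp/test-restore.db ".tables"
  ```

### 4.2 — Schedule with cron

- [ ] Open the crontab of a user that can read the DB and write to `/var/backups/myapp` and the log file:
  ```bash
  crontab -e
  ```
- [ ] Add hourly and daily jobs:
  ```cron
  # Hourly snapshot, kept 30 days (/var/backups/myapp/hourly)
  0 * * * * /usr/local/bin/backup-sqlite.sh >> /var/log/sqlite-backup.log 2>&1

  # Daily snapshot at 2am, kept 90 days (/var/backups/myapp/daily)
  0 2 * * * BACKUP_TIER=daily RETENTION_DAYS=90 /usr/local/bin/backup-sqlite.sh >> /var/log/sqlite-backup.log 2>&1
  ```
- [ ] After the next full hour, check the log: `tail -f /var/log/sqlite-backup.log`

---

## Phase 5 — Remote Sync *(for later)*

**Goal:** Push backups off the VM to a remote destination, so that a disk failure or the loss of the VM doesn't take your backup history with it.

- [ ] Choose a remote destination (Backblaze B2, S3, SFTP, etc.)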
- [ ] Install and configure rclone:
  ```bash
  apt install rclone
  rclone config   # set up a remote, name it "mybackups"
  ```
- [ ] Add remote sync to the backup script after the `gzip` step:
  ```bash
  rclone copy "$BACKUP_FILE" mybackups:myapp-backups/
  ```
- [ ] Enable versioning on the remote bucket (B2/S3) so even remote overwrites are recoverable
- [ ] Test a full restore from remote (replace `<TIMESTAMP>` with a real snapshot's timestamp):
  ```bash
  rclone copy mybackups:myapp-backups/db-<TIMESTAMP>.db.gz /tmp/
  gunzip /tmp/db-<TIMESTAMP>.db.gz
  sqlite3 /tmp/db-<TIMESTAMP>.db ".tables"
  ```
- [ ] (Optional) Set up a separate cron job to prune remote copies older than 6 months

---

## Quick Reference — Recovery Scenarios

| Scenario | Solution |
|---|---|
| Admin accidentally deleted a row | Set `deleted_at = NULL` on that row in the relevant table (or use the admin **Restore** button) |
| User submitted bad data via a form | Query `audit_log` for the row's `old_data` JSON and restore the values manually |
| Bulk accidental delete | Restore from the last hourly snapshot (this rolls back everything written since then; at most 1h of data is lost) |
| VM or disk failure | Pull the latest snapshot from remote (Phase 5) |
| "Who deleted this and when?" | `SELECT * FROM audit_log WHERE table_name = 'x' AND action = 'DELETE' ORDER BY id DESC` |
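The "Bulk accidental delete" and "VM or disk failure" rows both come down to rolling the live database back to a snapshot. The function below is a minimal sketch of that step, and it rests on several assumptions:

- bash, `gzip`, and the `sqlite3` CLI are installed.
- Snapshots are named `db-YYYY-MM-DDTHH-MM-SS.db.gz` in a single directory, as the backup script writes them.
- App writes are paused (e.g. via a maintenance mode) while it runs.

`restore_latest_snapshot` is a hypothetical helper name. The function uses the sqlite3 shell's `.restore` command, which copies the snapshot's pages into the live database through SQLite's backup API. That way SQLite itself deals with the live file's `-wal`/`-shm` state, which copying a file over the database with `cp` would not do. Before restoring, it saves the current state with `.backup` so that the rollback can be undone.

```bash
#!/bin/bash
# Sketch only: restore_latest_snapshot is a hypothetical helper, not part of the plan.
# Assumes gzip and the sqlite3 CLI are on PATH, snapshots are named
# db-YYYY-MM-DDTHH-MM-SS.db.gz in one directory, and app writes are paused.

# Usage: restore_latest_snapshot /path/to/live.db /path/to/snapshot-dir
restore_latest_snapshot() {
  local db_path="$1" backup_dir="$2"
  local latest tmp safety

  # Glob results are sorted and the timestamps are fixed-width,
  # so the last match is the newest snapshot
  local -a snapshots=("$backup_dir"/db-*.db.gz)
  if [[ ! -e "${snapshots[0]}" ]]; then
    echo "no snapshots found in $backup_dir" >&2
    return 1
  fi
  latest="${snapshots[${#snapshots[@]}-1]}"

  tmp=$(mktemp) || return 1
  if ! gzip -dc "$latest" > "$tmp"; then
    echo "gzip failed: $latest" >&2
    rm -f "$tmp"
    return 1
  fi

  # Refuse to restore from a snapshot that is itself damaged
  if [[ "$(sqlite3 "$tmp" 'PRAGMA integrity_check;' 2>&1)" != "ok" ]]; then
    echo "integrity_check failed: $latest" >&2
    rm -f "$tmp" "$tmp-wal" "$tmp-shm"
    return 1
  fi

  # Save the current state first, in case the rollback itself was a mistake
  safety="$db_path.pre-restore-$(date +%Y-%m-%dT%H-%M-%S)"
  if ! sqlite3 "$db_path" ".backup '$safety'"; then
    echo "could not save current state to $safety" >&2
    rm -f "$tmp" "$tmp-wal" "$tmp-shm"
    return 1
  fi

  # .restore copies the snapshot's pages into the live database via the backup API
  if ! sqlite3 "$db_path" ".restore '$tmp'"; then
    echo "restore failed; current state is saved in $safety" >&2
    rm -f "$tmp" "$tmp-wal" "$tmp-shm"
    return 1
  fi

  rm -f "$tmp" "$tmp-wal" "$tmp-shm"
  echo "Restored $db_path from $latest (previous state saved to $safety)"
}
```

Pass it the live database path and the folder holding the snapshots you want to roll back to. Because the `.pre-restore-*` copy is an ordinary SQLite file, a wrong rollback can be reversed the same way.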