docs: backend audit — add refactor & maintenance task list to TODO.md

Full analysis of PHP and SQLite layer covering:

Performance:
- WAL mode + cache_size pragma missing from Database constructor
- Separate is_published/year indexes force temp B-tree sort on every public query
- v_theses_full materialised as CTE on every query, indexes never used by view
- getAllPublishedTheses() runs full 15-join view just for the author name index

PHP / Database.php:
- 5 dead CRUD helpers (getOrientationId etc.) never called anywhere
- 13 alias methods doubling every lookup; pick canonical names and remove
- getPDO()/getConnection() leaking to 8 call-sites with raw SQL that belongs in DB layer
- Unparameterised query in edit.php line 155 (SQL injection, fix immediately)
- sanitize_string() HTML-escapes at write time — stores & in DB, breaks search/export
- Dead variable $problematique in formulaire.php (read from POST, never used)
- setThesisJury() has no transaction guard of its own
- DB config auto-detection silently uses test.db if file exists locally

Maintainability:
- edit.php (530 lines) mixes display + POST + file upload — extract action file
- Banner upload logic copy-pasted between formulaire.php and edit.php
- Junction-table loops open-coded in every action; add setThesisLanguages/Formats/Tags
- RateLimit writes a JSON file on every public request
- __wakeup() throws from public method (PHP 8 deprecation)
This commit is contained in:
Pontoporeia
2026-03-26 22:42:35 +01:00
parent b12ae73e91
commit 72daf46c46

107
TODO.md
View File

@@ -340,3 +340,110 @@ Goal: rename the tables and column to the canonical M2M pattern (`tags`, `thesis
- [x] Add admin user management UI — password change/set for PHP auth layer (`public/admin/account.php` + `actions/account.php`; "Compte" nav link; account CSS)
- [x] Merge `status.php` and `logs.php` into a single `system.php` page; remove "Statut" and "Journaux" nav links, add single "Système" link; preserve all existing content in their respective sections
- [x] Rework logs UI: replace the select-then-click-Afficher flow with instant tabs (nginx access, nginx error, php-fpm); switching tabs loads the selected log immediately without a form submit; add a copy-to-clipboard button per log view
---
## Refactor & Maintenance (backend audit 2026-03-26)
### A — SQLite / Query performance
- [ ] **WAL mode** — set `PRAGMA journal_mode = WAL` and `PRAGMA synchronous = NORMAL` in `Database::__construct()` after `foreign_keys = ON`.
Eliminates full-database read-locks on every write; makes concurrent PHP-FPM workers safe.
Also add `PRAGMA cache_size = -8000` (≈8 MB page cache) while there.
- [ ] **Composite index `(is_published, year DESC)`** on `theses` — every public query filters on both.
Currently `idx_theses_published` and `idx_theses_year` are separate; the query planner picks one
and sorts the other with a temp B-tree. A single covering index eliminates the sort:
`CREATE INDEX IF NOT EXISTS idx_theses_pub_year ON theses(is_published, year DESC);`
Add to `schema.sql` + a migration.
- [ ] **`v_theses_full` is always fully materialised** — every query against `v_theses_public` forces
SQLite to expand the entire view CTE (all 15 JOINs + 8 GROUP_CONCAT temp B-trees) before
applying the `WHERE`/`LIMIT`. The view cannot be used with an index. Fix: **replace `v_theses_public`
with a plain table query in `searchTheses` and `getPublishedTheses`** — build the minimal JOIN set
per query and let indexes filter `theses` first. Keep the views for reporting/admin use only.
- [ ] **`getAllPublishedTheses()` in `search.php`** — fetches every published thesis (all columns,
all JOINs) just to build the student-name index on the Répertoire page. This is a full table
scan through the fat view. Replace with a dedicated lean query:
`SELECT id, authors FROM v_theses_public ORDER BY authors ASC` — or add
`Database::getPublishedAuthors(): array` that queries `thesis_authors JOIN authors` directly,
avoiding the view entirely.
- [ ] **`migration 005` view is stale in the file** — `005_add_banner.sql` recreates the view still
referencing `thesis_keywords` / `keywords.keyword` (the old pre-migration-001 names).
The file is already applied to the live DB (correctly, since 001 ran first), but the migration
file itself is wrong and misleading. Fix: rewrite it to reference `thesis_tags` / `tags.name`,
or drop the view recreation from it (schema.sql is the canonical source).
### B — PHP / Database.php
- [ ] **Dead CRUD helpers**`getOrientationId()`, `getAPProgramId()`, `getFinalityId()`,
`getLanguageId()`, `getFormatId()` are defined in `Database.php` but **never called** anywhere
(forms now pass IDs directly from selects). Remove them to reduce surface area.
- [ ] **Alias proliferation**`getOrientations()`/`getAllOrientations()`, `getApPrograms()`/`getAllAPPrograms()`,
`getFinalityTypes()`/`getAllFinalityTypes()`, `getLanguages()`/`getAllLanguages()`,
`getFormatTypes()`/`getAllFormatTypes()`, `getLicenseTypes()`/`getAllLicenseTypes()`,
`findOrCreateKeyword()`, `getUsedKeywords()`**13 alias methods** pointing at 6 real ones.
Pick the canonical name for each pair, update all call-sites (there are few), and delete aliases.
Reduces Database.php from ~945 lines significantly.
- [ ] **`getPDO()` / `getConnection()` leaking to callers** — `edit.php`, `formulaire.php`,
`thanks.php`, `import.php`, `tfe.php`, `index.php`, `media.php`, `system.php` all call
`$db->getPDO()` or `$db->getConnection()` to run raw queries that belong in `Database.php`.
Each is a missed encapsulation:
- `tfe.php`: raw `SELECT access_type_id FROM theses WHERE id = ?` → add `getThesisAccessTypeId(int $id): ?int`
- `index.php`: raw `SELECT thesis_id, file_path FROM thesis_files WHERE … IN (…)` → add `getCoverPathsForTheses(array $ids): array`
- `media.php`: raw visibility join → move into `Database::getFileVisibility(string $path): ?int`
- `edit.php` (line 155): unparameterised `"… WHERE id = $thesisId"` **SQL injection risk** — fix immediately; also move to a DB method
- `edit.php`: raw `SELECT license_id, access_type_id, context_note FROM theses WHERE id = ?` → expose these via `getThesis()` (already returns `v_theses_full` which has `license_id`)
- `formulaire.php`: raw identifier-generation query + all junction-table INSERTs → encapsulate in `Database::createThesis(array $data): int`
- [ ] **`sanitize_string()` in `formulaire.php` applies `htmlspecialchars` at write time** —
HTML-escaping belongs at render time (in the template), not at storage time. Storing
`&` or `<` in the DB means search, export, and any non-HTML consumer sees corrupt data.
Remove `htmlspecialchars` from `sanitize_string()`; keep only `trim()`. The templates already
call `htmlspecialchars()` on output.
- [ ] **Dead variable `$problematique`**`formulaire.php` line 84 reads `$_POST["problématique"]`
into `$problematique` but the value is **never used** (no matching column, no INSERT reference).
Delete it.
- [ ] **`setThesisJury()` not wrapped in a transaction** — the method does a DELETE then multiple
INSERTs with no transaction guard of its own. If called from outside a transaction (e.g. a
future API endpoint) a partial failure leaves orphaned rows. Wrap the body in
`BEGIN … COMMIT / ROLLBACK` (check `$this->pdo->inTransaction()` to avoid nesting).
- [ ] **DB config auto-detection is fragile**`src/config.php` switches to `test.db` whenever the
file exists locally, which means a developer who ran tests and forgot to delete `test.db` will
silently hit test data on a local production-mirror. Make the default `prod`; require explicit
`DB_ENV=test` to use the test database.
### C — Code organisation / maintainability
- [ ] **`edit.php` does too much** — 530 lines combining form display, POST handling, file upload,
and all reference-data loading in one file. Extract the POST handler to
`public/admin/actions/edit.php` (matching the pattern already used by `formulaire.php`,
`tag.php`, `page.php`, etc.).
- [ ] **`formulaire.php` duplicates banner-upload logic verbatim from `edit.php`** — the MIME check,
size cap, `random_bytes` name, `chmod`, and `setBannerPath()` call are copy-pasted.
Extract a private `Database`-level helper or a standalone `uploadBanner(array $file): ?string`
utility function (returns the relative path or null) shared by both action files.
- [ ] **Junction-table INSERTs are open-coded in every action**`formulaire.php` and `edit.php`
both manually loop `INSERT INTO thesis_languages`, `thesis_formats`, `thesis_tags`.
Add `Database::setThesisLanguages(int $id, array $ids)`, `setThesisFormats(int $id, array $ids)`,
`setThesisTags(int $id, array $names)` — following the same delete-then-reinsert pattern
already used by `setThesisJury()`.
- [ ] **`RateLimit` uses per-file JSON on disk** — reads, writes, and `glob()`s the filesystem on
every public request. For a low-traffic art-school site this is fine, but it creates a
write-on-every-hit pattern. Consider switching to APCu (if available) or SQLite (single INSERT)
to avoid filesystem churn. At minimum, move the cache dir to `/tmp` or a dedicated
`storage/cache/` path that is excluded from deploy rsync.
- [ ] **`__wakeup()` singleton guard throws from a public method** — PHP 8.x deprecates
throwing exceptions from `__wakeup`. Change to `trigger_error(…, E_USER_ERROR)` or implement
`__serialize()`/`__unserialize()` that always throw.