Restructure repository and implement secure search feature

Phase 1: Consolidate shared infrastructure
- Create shared/ directory for common code
- Consolidate Database.php from front-backend and formulaire into unified shared/Database.php
  - Smart path detection for test.db vs posterg.db
  - Secure search with wildcard escaping and input validation
  - Support both singleton and direct instantiation patterns
  - Full CRUD methods for admin functionality
- Move RateLimit.php to shared/ (30 requests/min)
- Update all require paths across apps to use shared/

Phase 2: Reorganize directory structure
- Rename front-backend/ → apps/public/
- Rename formulaire/ → apps/admin/
- Rename db/ → database/
- Update all file paths for new structure
- Create root .gitignore excluding databases, cache, logs

Implement secure search feature
- Add apps/public/search.php with full-text search across theses
- Search filters: query, year, orientation, AP program, keywords
- Security features:
  - SQL injection prevention (prepared statements)
  - Wildcard injection prevention (escape % and _)
  - Input validation (max 200 chars, year range 1900-2100)
  - Rate limiting (30 req/min per IP)
  - Pagination limited to 100 results/page
  - XSS protection (htmlspecialchars on output)

Add comprehensive test suite
- Create apps/public/tests/ with proper structure
  - tests/Integration/SearchTest.php - 12 search scenarios
  - tests/Security/SecurityTest.php - vulnerability testing
  - tests/Unit/RateLimitTest.php - rate limit behavior
- Create database/fixtures/CreateTestDatabase.php
- Add apps/public/run-tests.php test runner
- All tests passing (4/4 suites)

Update deployment configuration
- Rename justfile 'sync' recipe to 'deploy'
- Create deploy group with separate deploy-public and deploy-admin
- Add test-deploy recipe for test database
- Exclude *.db, tests/, cache/, *.md from production deploy
- Deploy shared/ to both public and admin locations

Stats: +4482 insertions, -654 deletions across 72 files
Théophile Gervreau-Mercier
2026-01-28 10:24:36 +01:00
parent 95f52d549e
commit 467aced734
81 changed files with 6304 additions and 785 deletions

database/README.md Normal file

@@ -0,0 +1,244 @@
# Post-ERG Thesis Database Schema
SQLite database schema for managing final thesis projects (TFE) and doctoral theses at ERG.
## Overview
This schema supports all requirements from the technical specifications (`posterg_fiche-technique.md`):
- Multiple metadata categories (orientation, AP, finality, languages, formats, keywords)
- Multiple authors and supervisors per thesis
- Access control (Libre/Interne/Interdit)
- Licensing management
- File uploads (main TFE, annexes, written parts)
- Jury notes and points
- Publication workflow (submission → defense → publication)
- Editable static pages (charte, about, licenses, contact)
- Distinction between TFEs and doctoral theses
## Database Structure
### Core Tables
**`theses`** - Main thesis information
- Basic metadata (title, subtitle, year, identifier)
- Academic details (orientation, AP program, finality)
- Content (synopsis, jury notes, duration/size)
- Access control and licensing
- Publication workflow status
**`authors`** - Student/author information
- Name and contact email
**`supervisors`** - Thesis promoters
- Name of supervisor/promoter
**`thesis_files`** - Uploaded files
- Main TFE, annexes, written parts
- File metadata (path, size, MIME type)
**`pages`** - Static content pages
- Charte, about, licenses, contact pages
- Easily editable content
### Reference Tables (Predefined Lists)
- `orientations` - Arts Numériques, Dessin, Cinéma d'animation, etc.
- `ap_programs` - Narration Spéculative, DPM, APS, LIENS
- `finality_types` - Approfondi, Enseignement, Spécialisé
- `languages` - Français, Anglais, etc. (expandable)
- `format_types` - Site web, Audio, Vidéo, Performance, etc.
- `keywords` - Dynamic, expandable keyword list (max 10 per thesis)
- `access_types` - Libre, Interne, Interdit
- `license_types` - To be defined
### Junction Tables (Many-to-Many)
- `thesis_authors` - Links theses to authors
- `thesis_supervisors` - Links theses to supervisors
- `thesis_languages` - Multiple languages per thesis
- `thesis_formats` - Multiple formats per thesis
- `thesis_keywords` - Max 10 keywords per thesis
## Key Features
### 1. Flexible Metadata
- Multiple authors, supervisors, languages, formats, and keywords per thesis
- Predefined lists with ability to add new entries
- Proper normalization to avoid data duplication
### 2. Access Control
Three levels of access as specified:
- **Libre**: Freely accessible online and in library
- **Interne**: Physical access only, descriptive note online
- **Interdit**: No physical/online access, descriptive note only
**Important**: An access level can be tightened (e.g. Libre → Interne) but never relaxed afterwards (as per specs)
### 3. Publication Workflow
The schema tracks the complete lifecycle:
1. **Submission** (`submitted_at`) - Student submits TFE
2. **Defense** (`defense_date`) - Soutenance takes place
3. **Jury Review** (`jury_note_added`, `jury_points`, `context_note`)
4. **Publication** (`published_at`, `is_published = 1`)
**Important**: TFEs are NOT published immediately upon submission. They must wait for:
- Defense to occur
- Jury to add its context note, if any (optional, max 150 words)
- Jury points to be recorded
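The preconditions above could be checked in application code before `is_published` is set. The following is a minimal sketch in Python, used here only for illustration (the applications themselves are PHP). It assumes `defense_date` is stored as an ISO `YYYY-MM-DD` string and that a thesis row is available as a dictionary; the function name `can_publish` is hypothetical.

```python
from datetime import date


def can_publish(thesis: dict, today: date) -> bool:
    """Return True if a thesis row meets the publication preconditions.

    Assumes `defense_date` is an ISO date string (or None) and
    `jury_points` is None until the jury has recorded its points.
    """
    defense = thesis.get("defense_date")
    if defense is None or date.fromisoformat(defense) > today:
        return False  # the defense has not taken place yet
    if thesis.get("jury_points") is None:
        return False  # jury points are not recorded yet
    return True  # the context note is optional, so it is not checked here
```

A caller would run this check right before the update that sets `published_at` and `is_published = 1`.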
### 4. File Management
Support for multiple file types per thesis:
- Main TFE work
- Annexes
- Written part
- Other supporting files
### 5. Views for Easy Querying
**`v_theses_full`** - Complete thesis information with all related data
- Joins all tables
- Concatenates multiple values (authors, supervisors, keywords, etc.)
- Use for backend/admin interfaces
**`v_theses_public`** - Only published theses
- Filtered to `is_published = 1`
- Use for public-facing website
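To illustrate how such views might be built, here is a reduced sketch using Python's standard `sqlite3` module. It is not the actual `schema.sql`: only the authors relation is modelled, and the column names `title` and `name` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical versions of the real tables
CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                     is_published INTEGER NOT NULL DEFAULT 0);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE thesis_authors (thesis_id INTEGER, author_id INTEGER);

-- One row per thesis, with author names concatenated
CREATE VIEW v_theses_full AS
SELECT t.id, t.title, t.is_published,
       GROUP_CONCAT(a.name, ', ') AS authors
FROM theses t
LEFT JOIN thesis_authors ta ON ta.thesis_id = t.id
LEFT JOIN authors a ON a.id = ta.author_id
GROUP BY t.id;

-- Public view: same shape, published rows only
CREATE VIEW v_theses_public AS
SELECT * FROM v_theses_full WHERE is_published = 1;

INSERT INTO theses VALUES (1, 'Published work', 1), (2, 'Draft work', 0);
INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO thesis_authors VALUES (1, 1), (1, 2), (2, 1);
""")

public_rows = conn.execute(
    "SELECT id, title, authors FROM v_theses_public"
).fetchall()
```

When several junction tables are joined in the same query, the joined rows multiply and concatenated lists can contain duplicates; one correlated subquery per list is a common way around that.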
## Usage
### Initialize Database
```bash
sqlite3 posterg.db < schema.sql
```
### Example Queries
#### Get all published theses from 2025
```sql
SELECT * FROM v_theses_public WHERE year = 2025;
```
#### Get theses by orientation
```sql
SELECT * FROM v_theses_full
WHERE orientation = 'Vidéographie';
```
#### Get theses with specific keyword
```sql
SELECT t.* FROM v_theses_full t
JOIN thesis_keywords tk ON t.id = tk.thesis_id
JOIN keywords k ON tk.keyword_id = k.id
WHERE k.keyword = 'performance';
```
#### Get theses awaiting publication (submitted but not published)
```sql
SELECT * FROM theses
WHERE submitted_at IS NOT NULL
AND is_published = 0;
```
#### Update access type (can only restrict, not open)
```sql
-- Allowed: from Libre to Interne
UPDATE theses SET access_type_id = 2 WHERE id = 1;
-- Not allowed per specs: from Interdit to Libre
-- This should be enforced in application logic
```
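The SQL comment above leaves enforcement to application logic. One possible shape for that check, sketched in Python for illustration only (the applications are PHP), is shown below. The ranking Libre < Interne < Interdit follows the access levels described earlier; the names `ACCESS_RANK` and `is_allowed_access_change` are hypothetical.

```python
# Restrictiveness rank per access type; a higher number is more restricted.
ACCESS_RANK = {"Libre": 0, "Interne": 1, "Interdit": 2}


def is_allowed_access_change(current: str, new: str) -> bool:
    """Allow a change only if it keeps or tightens the access level."""
    return ACCESS_RANK[new] >= ACCESS_RANK[current]
```

The check would run before the `UPDATE`, with the update skipped and an error reported when it returns `False`.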
## Data Import Notes
Based on `Database_TFE_test.csv`:
### Current CSV Structure
- Identifiant (e.g., "2025-002")
- Titre, Sous-titre
- Auteur·ice(s) - comma-separated if multiple
- Contact - email
- Promoteur·ice(s) - comma-separated if multiple
- Format - comma-separated if multiple
- Année
- AP - abbreviation (DPM, LIENS, etc.)
- Orientation - abbreviation (SC, VI, CA, etc.)
- Finalité
- Mots-clés - comma-separated, max 10
- Synopsis
- Contexte - jury context note
- Remarques - internal notes
- Langue - language(s)
- Autorisation - access type
- License - license type
- taille - duration/size info
- Points sur 20 - jury points
- lien BAIU - institutional repository link
### Import Considerations
1. **Parse comma-separated values** for:
   - Authors (split and create entries in `authors` table)
   - Supervisors (split and create entries in `supervisors` table)
   - Formats (map to `format_types`)
   - Keywords (split and create/link in `keywords`)
   - Languages (split and map to `languages`)
2. **Map abbreviations**:
   - Orientations: SC → Sculpture, VI → Vidéographie, CA → Cinéma d'animation, etc.
   - AP: DPM, LIENS, APS (exact match)
3. **Handle missing data**:
   - Some fields in CSV are empty (AP, Orientation for some entries)
   - Use NULL in database
4. **Parse duration/size**:
   - Examples: "128 pages", "78 pages + ?? minutes", "68 minutes"
   - Extract numeric values for `duration_pages` and `duration_minutes`
   - Store original string in `file_size_info`
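The import considerations above could be sketched as small helper functions. This is an illustrative Python draft, not the project's import script (which does not exist yet). The helper names are hypothetical, and the orientation map contains only the three codes given above.

```python
import re
from typing import List, Optional

# Partial map: only the codes listed in this README; the real list is longer.
ORIENTATION_CODES = {
    "SC": "Sculpture",
    "VI": "Vidéographie",
    "CA": "Cinéma d'animation",
}


def split_multi(value: Optional[str]) -> List[str]:
    """Split a comma-separated CSV cell into trimmed, non-empty items."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def map_orientation(code: Optional[str]) -> Optional[str]:
    """Map an abbreviation to its full name; empty or unknown codes give None."""
    return ORIENTATION_CODES.get((code or "").strip())


def parse_size(raw: Optional[str]) -> dict:
    """Extract page and minute counts, keeping the original string as-is."""
    text = raw or ""
    pages = re.search(r"(\d+)\s*pages?", text)
    minutes = re.search(r"(\d+)\s*minutes?", text)
    return {
        "duration_pages": int(pages.group(1)) if pages else None,
        "duration_minutes": int(minutes.group(1)) if minutes else None,
        "file_size_info": raw,
    }
```

Placeholders such as `?? minutes` yield `None` for the numeric field, which maps to NULL in the database, while the original string is preserved in `file_size_info`.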
## Schema Design Decisions
### Why SQLite?
- Self-contained, serverless
- Easy to backup (single file)
- Good performance for this use case
- Simple to integrate with various tools
### Normalization Level
- 3rd Normal Form (3NF) for most tables
- Denormalized views for simpler read queries (SQLite views are not materialized, so they run their underlying joins on each query)
- Balance between flexibility and simplicity
### Extensibility
- New languages can be added via `languages` table
- Keywords are dynamic and grow with content
- License types can be defined later
- Static pages can be added via `pages` table
### Constraints
- CASCADE deletes on junction tables
- UNIQUE constraints on lookup table names
- NOT NULL on critical fields
- Automatic timestamps via triggers
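The constraint types listed above can be demonstrated on a reduced schema. The sketch below uses Python's standard `sqlite3` module and is not the actual `schema.sql`: the `updated_at` column and the trigger name are assumptions. Note that SQLite only enforces foreign keys, and therefore CASCADE deletes, when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,                        -- NOT NULL on a critical field
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP   -- hypothetical column
);
CREATE TABLE keywords (
    id INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL UNIQUE                -- UNIQUE lookup name
);
CREATE TABLE thesis_keywords (
    thesis_id INTEGER NOT NULL REFERENCES theses(id) ON DELETE CASCADE,
    keyword_id INTEGER NOT NULL REFERENCES keywords(id) ON DELETE CASCADE,
    PRIMARY KEY (thesis_id, keyword_id)
);
-- Refresh the timestamp whenever a thesis row changes
CREATE TRIGGER theses_touch AFTER UPDATE ON theses
BEGIN
    UPDATE theses SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
""")

conn.execute("INSERT INTO theses (id, title) VALUES (1, 'Example')")
conn.execute("INSERT INTO keywords (id, keyword) VALUES (1, 'performance')")
conn.execute("INSERT INTO thesis_keywords VALUES (1, 1)")

# Deleting the thesis removes its junction rows through the cascade.
conn.execute("DELETE FROM theses WHERE id = 1")
links_left = conn.execute("SELECT COUNT(*) FROM thesis_keywords").fetchone()[0]

# A second keyword with the same name violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO keywords (keyword) VALUES ('performance')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

In PHP, the same pragma would need to be issued on each new PDO connection for the cascade to take effect.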
## Important Business Rules
1. **No immediate publication**: TFEs must go through defense before publication
2. **Access restriction is one-way**: Can restrict but not open access
3. **Max 10 keywords** per thesis (enforce in application)
4. **Jury context note max 150 words** (enforce in application)
5. **Synopsis ~200 words** (guideline, not hard limit)
6. **Multiple selections allowed** for: languages, formats, authors, supervisors, keywords
7. **Doctoral theses**: Use `is_doctoral = 1` to distinguish from TFEs
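Rules 3 and 4 are left to the application. A minimal validation sketch follows, written in Python for illustration only. The function name `validate_submission` is hypothetical, and counting keywords case-insensitively after trimming is an assumption, not something the specs state.

```python
MAX_KEYWORDS = 10             # rule 3
MAX_CONTEXT_NOTE_WORDS = 150  # rule 4


def validate_submission(keywords, context_note=""):
    """Return a list of rule violations; an empty list means the input passes."""
    errors = []
    # Assumption: duplicates differing only in case or spacing count once.
    unique_keywords = {k.strip().lower() for k in keywords if k.strip()}
    if len(unique_keywords) > MAX_KEYWORDS:
        errors.append(
            f"too many keywords: {len(unique_keywords)} > {MAX_KEYWORDS}"
        )
    word_count = len((context_note or "").split())
    if word_count > MAX_CONTEXT_NOTE_WORDS:
        errors.append(
            f"context note too long: {word_count} > {MAX_CONTEXT_NOTE_WORDS} words"
        )
    return errors
```

Returning a list rather than raising on the first problem lets a form show every violation at once.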
## Next Steps
1. Create import script to load CSV data
2. Define license types
3. Build backend API for CRUD operations
4. Implement authorization checks
5. Create admin interface for easy editing
6. Build public-facing website using views