# Migration from YAML to SQLite

## Overview

The Post-ERG thesis submission form has been completely overhauled to use a SQLite database instead of flat YAML files. This provides better data integrity and querying capabilities, and prepares the system for a full-featured web application.

## What Changed

### Database Implementation

**Before:** Form data was saved as individual YAML files in `data/yaml/`, with file uploads scattered in `data/content/` and `data/cover/`.

**After:** All thesis data is now stored in a relational SQLite database (`../db/posterg.db`) with proper normalization and foreign key relationships.

### New Architecture

```
Form Submission Flow:
1. User fills out enhanced form (index.php)
2. Form validates input and begins database transaction
3. Creates/links: author, thesis, supervisors, keywords, languages, formats
4. Uploads files with random names for security
5. Records file metadata in database
6. Commits transaction (all-or-nothing)
7. Redirects to confirmation page showing database data
```

### Database Schema Highlights

- **19 tables** including junction tables and views
- **Normalized structure** (3rd Normal Form)
- **Automatic timestamps** via triggers
- **Cascade deletes** for referential integrity
- **Predefined lookup tables** for orientations, AP programs, finalities, etc.
- **Views** for simplified querying (`v_theses_full`, `v_theses_public`)

## New Files

### `Database.php`

Database helper class providing:

- PDO connection with error handling
- Transaction management
- Find-or-create methods for entities
- Prepared statement helpers
- Lookup methods for all reference data

**Key Methods:**

```php
$db = new Database();
$authorId = $db->findOrCreateAuthor($name, $email);
$keywordId = $db->findOrCreateKeyword($keyword);
$orientations = $db->getAllOrientations();
$thesis = $db->getThesis($id);
```

## Modified Files

### `index.php`

**Enhancements:**

- Dynamically loads form options from database
- Added fields per schema:
  - Subtitle (optional)
  - Synopsis (~200 words, required)
  - Finality (Approfondi/Enseignement/Spécialisé)
  - Languages (multiple selection with checkboxes)
  - Formats (multiple selection with checkboxes)
- Better form organization with sections
- Improved accessibility (proper labels, IDs)

**New Form Fields:**

| Field | Type | Required | Notes |
|-------|------|----------|-------|
| Subtitle | Text | No | New field |
| Synopsis | Textarea | Yes | ~200 words |
| Finality | Select | Yes | From `finality_types` table |
| Languages | Checkboxes | Yes | Multiple selection |
| Formats | Checkboxes | No | Multiple selection |

### `formulaire.php`

**Complete rewrite** with:

1. **Transaction-Based Processing:**
   - `BEGIN TRANSACTION` at start
   - All insertions in a single transaction
   - `COMMIT` on success or `ROLLBACK` on error
   - Ensures data consistency

2. **Prepared Statements:**
   - All SQL queries use PDO prepared statements
   - Protection against SQL injection
   - Parameter binding for all user input

3. **Entity Creation:**
   - Finds or creates authors (by name)
   - Finds or creates supervisors (by name)
   - Finds or creates keywords (by text)
   - Links all entities via junction tables

4. **Identifier Generation:**
   - Format: `YYYY-NNN` (e.g., "2026-001")
   - Automatically increments per year
   - Unique constraint in database

5. **File Handling:**
   - Random cryptographic filenames (32 hex chars)
   - Organized by year and identifier: `data/theses/YYYY/YYYY-NNN/`
   - Cover images stored separately: `data/covers/`
   - Metadata stored in `thesis_files` table

6. **Validation:**
   - Year range: 2000 to current year + 1
   - Max 10 keywords enforced
   - At least one language required
   - URL format validation
   - File type and size validation

### `thanks.php`

**Complete redesign:**

- Reads from database using thesis ID
- Displays data from `v_theses_full` view
- Shows all relationships: authors, supervisors, keywords, languages, formats
- Lists uploaded files with metadata (type, size, date)
- Responsive CSS grid layout
- Publication status indicator

**Security:**

- Validates thesis ID (integer only)
- Uses prepared statements
- No path traversal vulnerability
- Error messages don't expose system details

## Database Files

### `../db/posterg.db`

Initialized SQLite database with:

- 19 tables (11 core, 5 junction, 3 reference)
- 2 views (`v_theses_full`, `v_theses_public`)
- Predefined data:
  - 15 orientations
  - 4 AP programs
  - 3 finality types
  - 2 languages (French, English)
  - 7 format types
  - 3 access types
  - 4 static pages

### Schema Documentation

See `../db/README.md` and `../db/SETUP.md` for complete documentation.
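The identifier and filename rules listed for `formulaire.php` above could be implemented along the following lines. This is a minimal sketch, not the code in the repository: the function names `nextThesisIdentifier` and `randomUploadName`, the `identifier` column name, and the simplified `theses` table used for the demo are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the next free identifier for a year in the
 * YYYY-NNN format. The `identifier` column name is an assumption.
 * Intended to run inside the submission transaction; the UNIQUE constraint
 * remains the final guard if two submissions race.
 */
function nextThesisIdentifier(PDO $pdo, int $year): string
{
    // SUBSTR(identifier, 6) is the NNN part that follows "YYYY-".
    $stmt = $pdo->prepare(
        'SELECT MAX(CAST(SUBSTR(identifier, 6) AS INTEGER))
           FROM theses WHERE year = :year'
    );
    $stmt->execute([':year' => $year]);
    $last = (int) $stmt->fetchColumn(); // NULL (no thesis yet) becomes 0

    return sprintf('%04d-%03d', $year, $last + 1);
}

/**
 * Hypothetical helper: 32 hex characters from 16 random bytes, followed by
 * an extension that the caller has already validated.
 */
function randomUploadName(string $extension): string
{
    return bin2hex(random_bytes(16)) . '.' . strtolower($extension);
}

// Demo against a throwaway in-memory database (simplified, assumed schema).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    identifier TEXT NOT NULL UNIQUE,
    year INTEGER NOT NULL
)');
$pdo->exec("INSERT INTO theses (identifier, year) VALUES ('2026-001', 2026)");

echo nextThesisIdentifier($pdo, 2026), "\n"; // 2026-002
echo nextThesisIdentifier($pdo, 2027), "\n"; // 2027-001
echo randomUploadName('pdf'), "\n";          // 32 random hex chars + ".pdf"
```

Deriving the sequence from `MAX(...)` rather than `COUNT(*)` keeps numbering monotonic even if a thesis is later deleted; whether the real implementation does this is not stated in this document.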
## Security Improvements Retained

All security improvements from the previous commit are preserved:

- ✅ CSRF protection with session tokens
- ✅ Input validation and sanitization
- ✅ Prepared statements (SQL injection protection)
- ✅ Random filenames for uploads
- ✅ File type and size validation
- ✅ MIME type checking
- ✅ Error logging without exposing paths
- ✅ Path traversal protection

## Data Mapping

### YAML to Database Mapping

| Old YAML Field | New Database Location | Notes |
|----------------|----------------------|-------|
| `auteurice` | `authors.name` | Normalized, reusable |
| `email` | `authors.email` | Now in authors table |
| `année` | `theses.year` | Integer field |
| `titre` | `theses.title` | Required |
| - | `theses.subtitle` | New field |
| `description` | `theses.synopsis` | Renamed for clarity |
| `problématique` | (not yet used) | Can be added to schema |
| `orientation` | `theses.orientation_id` | Foreign key to orientations |
| `ap` | `theses.ap_program_id` | Foreign key to ap_programs |
| - | `theses.finality_id` | New field (required) |
| `promoteurice` | `supervisors.name` + `thesis_supervisors` | Many-to-many |
| `tag` | `keywords.keyword` + `thesis_keywords` | Many-to-many, max 10 |
| `lien` | `theses.baiu_link` | URL validation |
| `files` | `thesis_files` table | Full metadata |
| `couverture` | (stored as file, not in DB yet) | Could add cover_path column |

## Migration Path for Existing Data

If you have existing YAML files to import:

1. **Parse YAML files:**
   ```php
   use Symfony\Component\Yaml\Yaml;

   require 'vendor/autoload.php'; // Composer autoloader for symfony/yaml

   $yamlFiles = glob('data/yaml/*.yaml');
   foreach ($yamlFiles as $file) {
       $data = Yaml::parseFile($file); // array of the old form fields
       // ...
   }
   ```

2. **Insert into database:**
   ```php
   $db->beginTransaction();
   try {
       $authorId = $db->findOrCreateAuthor($data['auteurice'], $data['email']);
       // Insert thesis
       // Link relationships
       $db->commit();
   } catch (Exception $e) {
       $db->rollback(); // undo the partial import
       error_log('Import failed: ' . $e->getMessage()); // do not fail silently
   }
   ```

3. **Verify data:**
   ```sql
   SELECT COUNT(*) FROM theses;
   SELECT * FROM v_theses_full LIMIT 5;
   ```

## Testing Checklist

Before production deployment:

- [ ] Form loads without errors
- [ ] All dropdown options populate from database
- [ ] Form submission creates thesis record
- [ ] Author is created or found correctly
- [ ] Supervisors linked properly
- [ ] Keywords created and linked (test max 10)
- [ ] Languages required (test validation)
- [ ] Formats optional (test multiple selection)
- [ ] Files upload successfully
- [ ] File metadata recorded in database
- [ ] Thanks page displays all data correctly
- [ ] Transaction rollback works on error
- [ ] CSRF token validated
- [ ] Invalid data rejected (year, URL, etc.)

## Known Limitations

1. **No cover_path column:** Cover images are uploaded, but their path is not stored in the `theses` table (can be added)
2. **No problématique field:** The old field is not yet in the schema (can be stored in `theses.remarks` or a new column)
3. **File type detection:** Basic (by extension), could be enhanced
4. **No duplicate detection:** The same thesis can be submitted multiple times
5. **No edit capability:** Once submitted, there is no UI to edit a thesis (admin interface needed)

## Next Steps

1. **Initialize production database:**
   ```bash
   cd /path/to/production/db
   sqlite3 posterg.db < schema.sql
   ```

2. **Set permissions:**
   ```bash
   chmod 644 posterg.db
   chown www-data:www-data posterg.db
   ```
   The web server user also needs write access to the directory that contains the database, because SQLite creates its journal files next to the database file.

3. **Test form submission:**
   - Submit test thesis
   - Verify all fields saved
   - Check file uploads
   - Test thanks page

4. **Import existing data:**
   - Create migration script
   - Parse old YAML files
   - Bulk insert into database
   - Verify integrity

5. **Build admin interface:**
   - CRUD operations for theses
   - User management
   - Approval workflow
   - Bulk operations

6. **Build public website:**
   - Search and filter theses
   - Respect access controls
   - Display thesis details
   - Static pages management

## Compatibility Notes

### PHP Requirements

- PHP 7.4+ (tested on PHP 8.x)
- PDO extension with SQLite support
- Composer for Symfony YAML (still used for potential migration); note that `symfony/yaml` `^6.2` itself requires PHP 8.1 or later

### Database

- SQLite 3.8.0+
- File-based database (no server needed)
- Single file: `db/posterg.db`

### Dependencies

```json
{
  "require": {
    "symfony/yaml": "^6.2",
    "behat/transliterator": "^1.5"
  }
}
```

Note: The YAML library is retained for potential data migration from old files.

## Backup Strategy

The SQLite database is a single file, which makes it easy to back up:

```bash
# Simple copy (only safe while nothing is writing to the database)
cp db/posterg.db db/backups/posterg_$(date +%Y%m%d).db

# SQL dump (portable)
sqlite3 db/posterg.db .dump > backups/posterg_$(date +%Y%m%d).sql

# Compressed backup
tar -czf backups/posterg_$(date +%Y%m%d).tar.gz db/posterg.db data/
```

Set up automated daily backups via cron.

## Performance Considerations

- **Indexes:** All critical foreign keys and search fields indexed
- **Views:** Pre-computed joins for common queries
- **Transactions:** Ensure atomicity without locking issues
- **File I/O:** Random filenames prevent directory listing overhead

For large datasets (1000+ theses):

- Consider WAL mode: `PRAGMA journal_mode=WAL;`
- Optimize with `ANALYZE;` periodically
- Monitor database size and `VACUUM` if needed

## Rollback Plan

If issues arise, you can roll back to the YAML-based system:

1. Use previous jj commit: `jj checkout `
2. Old YAML files in `data/yaml/` are still intact
3. Database changes don't affect the old YAML code
4. Both systems can run in parallel during the transition

## Support

For questions or issues:

- Schema documentation: `db/README.md`
- Setup guide: `db/SETUP.md`
- Security details: `SECURITY.md`
- Technical specs: `db/posterg_fiche-technique.md`

---

- **Migration completed:** 2026-01-27
- **Database version:** 1.0
- **Form version:** 2.0 (SQLite)